Translation of the Arithmetic, Slicing, and Sorting sections of the official pandas documentation cookbook (4)

xiaoxiao, 2021-02-28

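Documentation version: 0.20.3. These examples were written with Python 3.4; earlier Python versions may need minor code adjustments. pandas (pd) and NumPy (np) are the only two packages imported by default. All other packages are imported explicitly for the benefit of new users. If any part of the translation is inaccurate, corrections are welcome.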

The examples in this document are drawn from classic questions asked on Stack Overflow and GitHub, which the author has distilled and summarized.

Arithmetic

Performing arithmetic on a MultiIndex that requires broadcasting

In [61]: cols = pd.MultiIndex.from_tuples([(x, y) for x in ['A', 'B', 'C'] for y in ['O', 'I']])

In [62]: df = pd.DataFrame(np.random.randn(2, 6), index=['n', 'm'], columns=cols); df
Out[62]:
          A                   B                   C
          O         I         O         I         O         I
n  1.920906 -0.388231 -2.314394  0.665508  0.402562  0.399555
m -1.765956  0.850423  0.388054  0.992312  0.744086 -0.739776

In [63]: df = df.div(df['C'], level=1); df
Out[63]:
          A                   B              C
          O         I         O         I    O    I
n  4.771702 -0.971660 -5.749162  1.665625  1.0  1.0
m -2.373321 -1.149568  0.521518 -1.341367  1.0  1.0
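The broadcasting step is easier to verify with fixed numbers than with random ones. The following is a minimal sketch, assuming a small deterministic frame (the values from `np.arange` and the names `ratio` and `a_by_hand` are illustrative, not from the cookbook). It shows that `level=1` matches the plain `O`/`I` columns of `df['C']` against the second level of the column MultiIndex.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [(x, y) for x in ["A", "B", "C"] for y in ["O", "I"]]
)
# Deterministic, non-zero values (assumed for illustration) so the
# division is easy to check by hand.
values = np.arange(1, 13, dtype=float).reshape(2, 6)
df = pd.DataFrame(values, index=["n", "m"], columns=cols)

# df["C"] has plain columns O and I; level=1 tells div to match them
# against the second level of df's column MultiIndex.
ratio = df.div(df["C"], level=1)

# The same result for a single top-level group, computed directly.
a_by_hand = df["A"] / df["C"]
```

Every top-level group (`A`, `B`, `C`) is divided column by column by the `C` group, so the `C` block of the result is all ones.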

Slicing

Slicing a MultiIndex with xs

In [64]: coords = [('AA', 'one'), ('AA', 'six'), ('BB', 'one'), ('BB', 'two'), ('BB', 'six')]

In [65]: index = pd.MultiIndex.from_tuples(coords)

In [66]: df = pd.DataFrame([11, 22, 33, 44, 55], index, ['MyData']); df
Out[66]:
        MyData
AA one      11
   six      22
BB one      33
   two      44
   six      55

Take the cross-section of the first level along the first axis of the index

In [67]: df.xs('BB', level=0, axis=0)  # Note: level and axis are optional, and default to zero
Out[67]:
     MyData
one      33
two      44
six      55

Take the cross-section of the second level along the first axis of the index

In [68]: df.xs('six', level=1, axis=0)
Out[68]:
    MyData
AA      22
BB      55
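By default, xs removes the level it selects on, as the two outputs above show. The sketch below rebuilds the same frame and contrasts that default with the `drop_level=False` option, plus a boolean mask that selects the same rows. The variable names `dropped`, `kept`, and `masked` are illustrative only.

```python
import pandas as pd

coords = [("AA", "one"), ("AA", "six"), ("BB", "one"), ("BB", "two"), ("BB", "six")]
index = pd.MultiIndex.from_tuples(coords)
df = pd.DataFrame([11, 22, 33, 44, 55], index, ["MyData"])

# Default behavior: the selected level is dropped from the result.
dropped = df.xs("six", level=1)

# drop_level=False keeps both index levels in the result.
kept = df.xs("six", level=1, drop_level=False)

# The same rows selected with a boolean mask on the second level.
masked = df[df.index.get_level_values(1) == "six"]
```

Keeping the level can be convenient when the result is later concatenated with, or aligned against, other slices of the same frame.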

Slicing a MultiIndex with xs, method 2 (this example uses .loc with slice objects)

In [69]: import itertools  # not imported by default, so import it explicitly
    ...: index = list(itertools.product(['Ada', 'Quinn', 'Violet'], ['Comp', 'Math', 'Sci']))

In [70]: headr = list(itertools.product(['Exams', 'Labs'], ['I', 'II']))

In [71]: indx = pd.MultiIndex.from_tuples(index, names=['Student', 'Course'])

In [72]: cols = pd.MultiIndex.from_tuples(headr)  # Notice these are un-named

In [73]: data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]

In [74]: df = pd.DataFrame(data, indx, cols); df
Out[74]:
               Exams     Labs
                   I  II    I  II
Student Course
Ada     Comp      70  71   72  73
        Math      71  73   75  74
        Sci       72  75   75  75
Quinn   Comp      73  74   75  76
        Math      74  76   78  77
        Sci       75  78   78  78
Violet  Comp      76  77   78  79
        Math      77  79   81  80
        Sci       78  81   81  81

In [75]: All = slice(None)

In [76]: df.loc['Violet']
Out[76]:
       Exams     Labs
           I  II    I  II
Course
Comp      76  77   78  79
Math      77  79   81  80
Sci       78  81   81  81

In [77]: df.loc[(All, 'Math'), All]
Out[77]:
               Exams     Labs
                   I  II    I  II
Student Course
Ada     Math      71  73   75  74
Quinn   Math      74  76   78  77
Violet  Math      77  79   81  80

In [78]: df.loc[(slice('Ada', 'Quinn'), 'Math'), All]
Out[78]:
               Exams     Labs
                   I  II    I  II
Student Course
Ada     Math      71  73   75  74
Quinn   Math      74  76   78  77

In [79]: df.loc[(All, 'Math'), ('Exams')]
Out[79]:
                 I  II
Student Course
Ada     Math    71  73
Quinn   Math    74  76
Violet  Math    77  79

In [80]: df.loc[(All, 'Math'), (All, 'II')]
Out[80]:
               Exams Labs
                  II   II
Student Course
Ada     Math      73   74
Quinn   Math      76   77
Violet  Math      79   80
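The `slice(None)` tuples above can also be written with `pd.IndexSlice`, which some readers find easier to scan. The sketch below is an alternative spelling, not the cookbook's own method. It rebuilds the same frame, and the names `idx`, `math_ii`, and `ada_to_quinn` are illustrative.

```python
import itertools

import pandas as pd

index = list(itertools.product(["Ada", "Quinn", "Violet"], ["Comp", "Math", "Sci"]))
headr = list(itertools.product(["Exams", "Labs"], ["I", "II"]))
indx = pd.MultiIndex.from_tuples(index, names=["Student", "Course"])
cols = pd.MultiIndex.from_tuples(headr)
data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]
df = pd.DataFrame(data, indx, cols)

idx = pd.IndexSlice

# Same selection as df.loc[(All, 'Math'), (All, 'II')] with All = slice(None).
math_ii = df.loc[idx[:, "Math"], idx[:, "II"]]

# Label slices include both endpoints: Ada through Quinn.
ada_to_quinn = df.loc[idx["Ada":"Quinn", "Math"], :]
```

Both spellings rely on the index being sorted when a label range such as `'Ada':'Quinn'` is used. The frame built here already is, because `itertools.product` emits the labels in sorted order.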

Sorting

Sorting by a specific column when the columns form a MultiIndex

In [81]: df.sort_values(by=('Labs', 'II'), ascending=False)
Out[81]:
               Exams     Labs
                   I  II    I  II
Student Course
Violet  Sci       78  81   81  81
        Math      77  79   81  80
        Comp      76  77   78  79
Quinn   Sci       75  78   78  78
        Math      74  76   78  77
        Comp      73  74   75  76
Ada     Sci       72  75   75  75
        Math      71  73   75  74
        Comp      70  71   72  73
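`sort_values` also accepts an ordered list of column keys, each given as a tuple when the columns form a MultiIndex. The sketch below rebuilds the frame from the slicing example and sorts by `('Labs', 'I')` descending, breaking ties by `('Exams', 'I')` ascending. The choice of columns and the name `by_labs_then_exams` are illustrative assumptions.

```python
import itertools

import pandas as pd

index = list(itertools.product(["Ada", "Quinn", "Violet"], ["Comp", "Math", "Sci"]))
headr = list(itertools.product(["Exams", "Labs"], ["I", "II"]))
indx = pd.MultiIndex.from_tuples(index, names=["Student", "Course"])
cols = pd.MultiIndex.from_tuples(headr)
data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]
df = pd.DataFrame(data, indx, cols)

# Primary key: Labs/I descending. Tie-breaker: Exams/I ascending.
by_labs_then_exams = df.sort_values(
    by=[("Labs", "I"), ("Exams", "I")], ascending=[False, True]
)

# sort_index restores the original lexicographic row order.
restored = by_labs_then_exams.sort_index()
```

The `ascending` list is matched position by position with the `by` list, so each key can have its own direction.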
When reposting, please credit the original URL: https://www.6miu.com/read-45970.html