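Create a Series whose index is a list of lists (or arrays); this produces a hierarchical index (MultiIndex). The sessions assume numpy and pandas have been imported as np and pd.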
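A self-contained sketch of this step, assuming a reasonably recent pandas. Deterministic values from np.arange stand in for np.random.randn so the results are reproducible, and pd.MultiIndex.from_arrays appears only to show that the nested lists yield an ordinary MultiIndex; the names outer, inner and explicit are illustrative.

```python
import numpy as np
import pandas as pd

# Outer and inner labels: position i of each list pairs up into one index entry.
outer = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c']
inner = [1, 2, 3, 1, 2, 3, 1, 2]

# np.arange is used instead of np.random.randn so the values are reproducible.
data = pd.Series(np.arange(8, dtype=float), index=[outer, inner])

# Passing a list of lists as the index builds a MultiIndex; building it
# explicitly from the same arrays gives an equal index.
explicit = pd.MultiIndex.from_arrays([outer, inner])

print(isinstance(data.index, pd.MultiIndex))  # True
print(data.index.nlevels)                     # 2
print(data.index.equals(explicit))            # True
print(data[('b', 2)])                         # 4.0
```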
In [29]: data = pd.Series(np.random.randn(8),
    ...:                  index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
    ...:                         [1, 2, 3, 1, 2, 3, 1, 2]])

In [30]: data
Out[30]:
a  1   -0.506962
   2    0.795603
   3    0.368363
b  1   -0.029980
   2   -0.642660
   3   -0.667930
c  1   -1.707709
   2    1.455244
dtype: float64

Viewing the index labels
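As a sketch of what the index is made of, assuming pandas 0.24 or later (where the per-entry positions are exposed as codes), the levels and codes can be read off directly. The data is the same reproducible stand-in built with np.arange, and idx is an illustrative name.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(8, dtype=float),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
                        [1, 2, 3, 1, 2, 3, 1, 2]])
idx = data.index

# levels: the distinct labels of each level, in sorted order.
print([level.tolist() for level in idx.levels])
# [['a', 'b', 'c'], [1, 2, 3]]

# codes: for each entry, the position of its label within that level
# (older pandas called this attribute `labels`).
print([code.tolist() for code in idx.codes])
# [[0, 0, 0, 1, 1, 1, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1]]

# get_level_values expands one level back into a label per entry.
print(idx.get_level_values(1).tolist())
# [1, 2, 3, 1, 2, 3, 1, 2]
```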
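In [31]: data.index
Out[31]:
MultiIndex(levels=[[u'a', u'b', u'c'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1]])

levels holds the distinct labels of each level, and labels holds, for each entry, the position of its label within the corresponding level. In pandas 0.24 and later, labels was renamed codes.

Selecting subsets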
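A sketch of the same kinds of selection written with .loc, plus xs (a cross-section method the original does not use), assuming a recent pandas and the reproducible stand-in data. Slicing on the outer level relies on the index being sorted, which it is here.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(8, dtype=float),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
                        [1, 2, 3, 1, 2, 3, 1, 2]])

# One outer label: the outer level is dropped, leaving the inner labels.
print(data.loc['a'].tolist())              # [0.0, 1.0, 2.0]

# A slice of outer labels keeps both levels (the index is already sorted).
print(data.loc['b':'c'].tolist())          # [3.0, 4.0, 5.0, 6.0, 7.0]

# Inner label 2 under every outer label, written with .loc ...
print(data.loc[:, 2].tolist())             # [1.0, 4.0, 7.0]

# ... or with xs, which selects a cross-section at a given level.
print(data.xs(2, level=1).index.tolist())  # ['a', 'b', 'c']
```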
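In [32]: data['a']
Out[32]:
1   -0.506962
2    0.795603
3    0.368363
dtype: float64

In [33]: data[:, 2]
Out[33]:
a    0.795603
b   -0.642660
c    1.455244
dtype: float64

data['a'] selects by the outer level, while data[:, 2] selects inner label 2 under every outer label.

Using unstack to rearrange the data into a DataFrame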
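A sketch of unstack on the reproducible stand-in data, assuming a recent pandas; it also shows the level and fill_value parameters, which the original session does not use. The names wide, by_outer and filled are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(8, dtype=float),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
                        [1, 2, 3, 1, 2, 3, 1, 2]])

# By default the innermost level (1, 2, 3) becomes the columns.
wide = data.unstack()
print(wide.shape)                   # (3, 3)
print(wide.loc['b', 2])             # 4.0
print(np.isnan(wide.loc['c', 3]))   # True: ('c', 3) never occurs in data

# level=0 pivots the outer level instead, so 'a', 'b', 'c' become columns.
by_outer = data.unstack(level=0)
print(by_outer.loc[2, 'b'])         # 4.0

# fill_value replaces the missing combinations instead of leaving NaN.
filled = data.unstack(fill_value=0)
print(filled.loc['c', 3])           # 0.0
```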
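In [34]: data.unstack()
Out[34]:
          1         2         3
a -0.506962  0.795603  0.368363
b -0.029980 -0.642660 -0.667930
c -1.707709  1.455244       NaN

The inner level becomes the columns. The combination (c, 3) does not exist in data, so its cell is NaN.

stack is the inverse of unstack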
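A sketch of the unstack/stack round trip on the reproducible stand-in data. Whether stack() drops the NaN at ('c', 3) on its own has varied across pandas versions, so this sketch calls dropna() explicitly; recent pandas 2.x releases may also emit a FutureWarning about the stack implementation here. The names wide and long_form are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(8, dtype=float),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
                        [1, 2, 3, 1, 2, 3, 1, 2]])

wide = data.unstack()  # 3 x 3 frame with NaN at ('c', 3)

# stack() moves the column labels back into the inner index level.
# Depending on the pandas version, the NaN at ('c', 3) may or may not be
# kept, so drop it explicitly to get the same result either way.
long_form = wide.stack().dropna()

print(len(long_form))                                   # 8
print(long_form.index.tolist() == data.index.tolist())  # True
print(long_form.tolist() == data.tolist())              # True
```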
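In [36]: data.unstack().stack()
Out[36]:
a  1   -0.506962
   2    0.795603
   3    0.368363
b  1   -0.029980
   2   -0.642660
   3   -0.667930
c  1   -1.707709
   2    1.455244
dtype: float64

In the output shown, the NaN introduced by unstack is dropped again, so the original eight entries are recovered.

Each axis of a DataFrame can have a hierarchical index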
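A sketch of selecting from a DataFrame with hierarchical indexes on both axes, assuming a recent pandas. The level names key1, key2, group and color are hypothetical, added only so that xs can refer to a column level by name.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['A', 'A', 'B'], ['Green', 'Red', 'Red']])

# Hypothetical level names, added only so levels can be referred to by name.
frame.index.names = ['key1', 'key2']
frame.columns.names = ['group', 'color']

# Selecting an outer column label returns the sub-frame under it.
print(frame['A'].columns.tolist())          # ['Green', 'Red']

# Selecting an outer row label works the same way along the index.
print(frame.loc['b'].values.tolist())       # [[6, 7, 8], [9, 10, 11]]

# Full tuples address one cell: row ('a', 2), column ('B', 'Red').
print(frame.loc[('a', 2), ('B', 'Red')])    # 5

# xs with axis=1 takes a cross-section of the column levels by name.
print(frame.xs('Red', level='color', axis=1).columns.tolist())  # ['A', 'B']
```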
In [40]: frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
    ...:                      index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    ...:                      columns=[['A', 'A', 'B'], ['Green', 'Red', 'Red']])

In [41]: frame
Out[41]:
         A        B
     Green Red  Red
a 1      0   1    2
  2      3   4    5
b 1      6   7    8
  2      9  10   11

Source:
Python for Data Analysis (Chinese edition: 《利用Python进行数据分析》)