Python Data Analysis and Presentation (5): Introduction to the Pandas Library

xiaoxiao, 2021-02-27

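I have recently been studying Python data analysis and presentation on the China University MOOC site, and I am writing my notes down here for reference.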

Introduction to the Pandas Library

Overview of the Pandas Library

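Pandas is a third-party Python library that provides high-performance, easy-to-use data types and analysis tools. It is built on NumPy and is commonly used together with NumPy and Matplotlib. It is conventionally imported as follows: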

import pandas as pd

The Pandas Series Type

A Series consists of a sequence of data values together with an associated index of labels.

Creating a Series

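A Series can be created from:
• a Python list; the index, if given, must have as many elements as the list
• a scalar value; the index determines the size of the Series
• a Python dict; the keys become the index, and an index argument selects (and orders) entries from the dict
• an ndarray; both the data and the index can be created from ndarrays
• other functions, such as range()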

In [1]: import pandas as pd

# Create from a list
In [2]: a = pd.Series([9, 8, 7, 6])

In [3]: a
Out[3]:
0    9
1    8
2    7
3    6
dtype: int64

In [4]: b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

In [5]: b
Out[5]:
a    9
b    8
c    7
d    6
dtype: int64

# Create from a scalar value
In [6]: s = pd.Series(25, index=['a', 'b', 'c'])

In [7]: s
Out[7]:
a    25
b    25
c    25
dtype: int64

# Create from a dict
In [8]: d = pd.Series({'a': 9, 'b': 8, 'c': 7})

In [9]: d
Out[9]:
a    9
b    8
c    7
dtype: int64

In [10]: e = pd.Series({'a': 9, 'b': 8, 'c': 7}, index=['c', 'a', 'b', 'd'])

In [11]: e
Out[11]:
c    7.0
a    9.0
b    8.0
d    NaN
dtype: float64

# Create from an ndarray (np.arange's default integer width is platform dependent, hence int32 here)
In [12]: import numpy as np

In [13]: n = pd.Series(np.arange(5))

In [14]: n
Out[14]:
0    0
1    1
2    2
3    3
4    4
dtype: int32

In [15]: n = pd.Series(np.arange(5), index=np.arange(9, 4, -1))

In [16]: n
Out[16]:
9    0
8    1
7    2
6    3
5    4
dtype: int32
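The list above also mentions creating a Series from other functions such as range(), which the session does not show. Here is a minimal sketch. It assumes that a range object works for both the data and the index, and the variable name r is only illustrative:

```python
import pandas as pd

# Illustrative only: one range supplies the values, another supplies the index labels
r = pd.Series(range(3), index=range(10, 13))
print(r)
```
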

Basic Series Operations

A Series has two parts, index and values. Operations on a Series resemble both ndarray operations and Python dict operations.

In [17]: b.index
Out[17]: Index(['a', 'b', 'c', 'd'], dtype='object')

In [18]: b.values
Out[18]: array([9, 8, 7, 6], dtype=int64)

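Series operations resemble ndarray operations:
• Indexing works the same way, using []
• NumPy operations and functions can be applied to a Series
• Several entries can be selected with a list of custom index labels
• A Series can be sliced by position (the automatic index); if it also has a custom index, the labels are sliced along with the values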
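The ndarray-like behaviour listed above can be seen in a short sketch, which rebuilds the Series b from the session. Positional access is written with .iloc because recent pandas versions deprecate treating plain integers inside [] as positions when a Series has a custom index. The variable names are illustrative:

```python
import numpy as np
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

first = b.iloc[0]          # by position (automatic index) -> 9
label = b['b']             # by custom index label -> 8
picked = b[['c', 'a']]     # a list of custom labels selects several entries
head = b.iloc[:3]          # positional slice; the labels 'a'..'c' come along
big = b[b > b.median()]    # Boolean mask, as with an ndarray
expd = np.exp(b)           # NumPy ufuncs return a Series with the same index
print(picked, head, big, expd, sep='\n\n')
```
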

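Series operations resemble Python dict operations:
• Values can be accessed by custom index label
• The reserved word in checks index labels only; it does not consider the automatic (positional) index
• The .get() method is available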
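The dict-like behaviour can be sketched the same way. This again rebuilds b from the session, and the default value 100 passed to .get() is arbitrary:

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

val = b['a']               # dict-style access by custom label -> 9
has_c = 'c' in b           # True: 'c' is an index label
has_0 = 0 in b             # False: `in` ignores positions
got = b.get('a')           # 9
missing = b.get('f', 100)  # 'f' is not a label, so the default 100 is returned
print(val, has_c, has_0, got, missing)
```
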

Series Alignment

Series + Series

In [22]: a = pd.Series([1, 2, 3], ['c', 'd', 'e'])

In [23]: b = pd.Series([9, 8, 7, 6], ['a', 'b', 'c', 'd'])

In [24]: a + b
Out[24]:
a    NaN
b    NaN
c    8.0
d    8.0
e    NaN
dtype: float64

In arithmetic, a Series automatically aligns data by index label. Labels that appear in only one operand produce NaN.
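Alignment is by label, not by position. In this small sketch the Series x and y are illustrative. They list the same labels in different orders, so each value is added to the value carrying the same label:

```python
import pandas as pd

x = pd.Series([1, 2], index=['p', 'q'])
y = pd.Series([10, 20], index=['q', 'p'])

total = x + y   # p: 1 + 20 = 21, q: 2 + 10 = 12
print(total)
```
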

The name Attribute of a Series

Both a Series object and its index can have a name, stored in the .name attribute.

In [31]: b.name          # None by default, so IPython displays nothing

In [32]: b.name = "Series object"

In [33]: b.index.name = "index column"

In [34]: b
Out[34]:
index column
a    9
b    8
c    7
d    6
Name: Series object, dtype: int64

The Pandas DataFrame Type

The DataFrame Type

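A DataFrame consists of a group of columns that share the same index. It is a tabular data type in which each column may hold a different value type. A DataFrame has both a row index and a column index. It is most often used for two-dimensional data, but it can also represent higher-dimensional data.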

A DataFrame can be created from:
• a two-dimensional ndarray
• a dict of one-dimensional ndarrays, lists, dicts, tuples, or Series
• a Series
• another DataFrame

# Create from a 2-D ndarray
In [35]: import pandas as pd

In [36]: import numpy as np

In [37]: d = pd.DataFrame(np.arange(10).reshape(2, 5))

In [38]: d
Out[38]:
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9

# Create from a dict of (one-dimensional) Series
In [42]: dt = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    ...:       'two': pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}

In [43]: d = pd.DataFrame(dt)

In [44]: d
Out[44]:
   one  two
a  1.0    9
b  2.0    8
c  3.0    7
d  NaN    6

In [46]: pd.DataFrame(dt, index=['a', 'b', 'c'], columns=['two', 'three'])
Out[46]:
   two three
a    9   NaN
b    8   NaN
c    7   NaN
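The creation list also mentions a dict of lists, which the session does not show. The minimal sketch below uses made-up column names and values, all hypothetical. It also illustrates two points from the DataFrame description: each column keeps its own dtype, and the frame has both a row index and a column index.

```python
import pandas as pd

# Hypothetical data: a dict of equal-length lists, one list per column
dl = {
    'name': ['x', 'y', 'z'],
    'score': [1.5, 2.0, 3.5],
    'count': [10, 20, 30],
}
df = pd.DataFrame(dl, index=['r1', 'r2', 'r3'])

print(df.dtypes)     # each column keeps its own dtype (text, float64, int64)
print(df.index)      # row index
print(df.columns)    # column index
print(df['score'])   # one column, returned as a Series
print(df.loc['r2'])  # one row, returned as a Series
```
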

Operations on Pandas Data Types

How can a Series or DataFrame object be changed?
• Adding or reordering entries: reindexing
• Deleting entries: drop

Reindexing

.reindex() can change or reorder the index of a Series or DataFrame. The parameters of .reindex(index=None, columns=None, …) are:

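• index, columns: the new custom row and column labels
• fill_value: the value used to fill positions that are missing after reindexing
• method: the fill method; ffill fills forward (carrying the last valid value forward), bfill fills backward (using the next valid value)
• limit: the maximum number of consecutive positions to fill
• copy: defaults to True and returns a new object; if False, the original object may be returned without copying when the new index is equivalent to the old one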
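A short sketch of these parameters on illustrative data. Note that method='ffill' requires the original index to be monotonic, which ['a', 'c', 'e'] is:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'c', 'e'])

# Reorder and extend; the new label 'b' gets fill_value instead of NaN
r1 = s.reindex(['e', 'c', 'a', 'b'], fill_value=0)

# Forward fill: each new label takes the last valid value before it
r2 = s.reindex(['a', 'b', 'c', 'd', 'e', 'f'], method='ffill')

# DataFrame: reorder rows and columns at once; the new column 'z' is all NaN
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])
r3 = df.reindex(index=['r2', 'r1'], columns=['y', 'x', 'z'])
print(r1, r2, r3, sep='\n\n')
```
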

Common Methods of Index Objects

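• .append(idx): concatenates another Index object, producing a new Index object
• .difference(idx): computes the set difference, producing a new Index object
• .intersection(idx): computes the intersection
• .union(idx): computes the union
• .delete(loc): deletes the element at position loc
• .insert(loc, e): inserts element e at position loc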
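Each of these methods returns a new Index and leaves the original unchanged. A sketch with two illustrative indexes:

```python
import pandas as pd

i1 = pd.Index(['a', 'b', 'c', 'd'])
i2 = pd.Index(['c', 'd', 'e'])

joined = i1.append(i2)         # concatenation; duplicates are kept
only_i1 = i1.difference(i2)    # set difference
common = i1.intersection(i2)   # labels present in both
both = i1.union(i2)            # labels present in either
shorter = i1.delete(0)         # remove the element at position 0
longer = i1.insert(1, 'z')     # insert 'z' at position 1
print(joined, only_i1, common, both, shorter, longer, sep='\n')
```
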

Dropping Specified Index Entries

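.drop() can delete specified row or column labels from a Series or DataFrame. On a DataFrame it drops rows by default; pass axis=1 to drop columns.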
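A minimal sketch on illustrative data. By default .drop() returns a new object and leaves the original untouched:

```python
import pandas as pd

s = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
s2 = s.drop(['b', 'c'])         # new Series without 'b' and 'c'; s is unchanged

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=['r1', 'r2', 'r3'])
no_row = df.drop('r2')          # rows are dropped by default (axis=0)
no_col = df.drop('y', axis=1)   # axis=1 drops columns
print(s2, no_row, no_col, sep='\n\n')
```
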

Arithmetic on Pandas Data Types

Rules for Arithmetic Operations

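• Arithmetic first aligns the operands by their row and column labels, then computes
• Positions filled in during alignment are NaN (missing values); when this happens, integer results become floating point
• Operations between 2-D and 1-D objects, or between 1-D and 0-D (scalar) objects, are broadcast
• Binary operations written with + - * / produce new objects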
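These rules can be seen in a sketch with two illustrative frames of different shapes. The 3x4 frame a and the 4x5 frame b share labels 0 to 2 for rows and 0 to 3 for columns. Every other position is missing on one side and becomes NaN.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape(3, 4))
b = pd.DataFrame(np.arange(20).reshape(4, 5))

summed = a + b      # 4x5 result; positions missing from a become NaN, so all columns are float
scaled = a * 10     # 2-D with 0-D: the scalar is broadcast to every element

row = pd.Series(np.arange(4))
shifted = a - row   # 2-D with 1-D: the Series is matched to the columns and subtracted from each row
print(summed, scaled, shifted, sep='\n\n')
```
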

Arithmetic in Method Form

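Each method takes the other operand d plus optional keyword arguments (for example axis and fill_value):
• .add(d, **kwargs): addition between objects
• .sub(d, **kwargs): subtraction between objects
• .mul(d, **kwargs): multiplication between objects
• .div(d, **kwargs): division between objects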
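The method form differs from the bare operators mainly through these optional arguments. fill_value substitutes a value for an entry that is missing on one side before computing; entries missing on both sides stay NaN. axis chooses whether a Series is matched against the columns or the rows. A sketch reusing the illustrative frames from the previous rule list:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape(3, 4))
b = pd.DataFrame(np.arange(20).reshape(4, 5))

summed = b.add(a, fill_value=100)   # an entry missing from a is treated as 100
product = a.mul(b, fill_value=0)    # an entry missing on one side is treated as 0

row = pd.Series(np.arange(3))
by_rows = a.sub(row, axis=0)        # axis=0: match the Series to the row index instead of the columns
print(summed, product, by_rows, sep='\n\n')
```
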

Rules for Comparison Operations

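• Comparison operations only compare elements with the same labels and do no alignment or filling; comparing two objects whose labels differ raises an error
• Operations between 2-D and 1-D objects, or between 1-D and 0-D (scalar) objects, are broadcast
• Binary operations written with > < >= <= == != produce Boolean objects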
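A sketch of these rules with illustrative frames. a and d have identical labels, so they compare element by element. Comparing a with the differently shaped b raises a ValueError instead of aligning:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape(3, 4))
d = pd.DataFrame(np.arange(12, 0, -1).reshape(3, 4))

same_labels = a > d                   # element-wise, Boolean DataFrame
vs_scalar = a > 5                     # 2-D with 0-D: broadcast
vs_row = a > pd.Series(np.arange(4))  # 2-D with 1-D: the Series is matched to the columns

b = pd.DataFrame(np.arange(20).reshape(4, 5))
mismatch = None
try:
    a > b                             # labels differ, so no alignment is attempted
except ValueError as err:
    mismatch = str(err)
print(same_labels, vs_scalar, vs_row, mismatch, sep='\n\n')
```
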

When reposting, please credit the original address: https://www.6miu.com/read-11913.html
