The exercises in this post come from the following site:
https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/Exercises.ipynb
For each of the four datasets…
Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: y=β0+β1x+ϵ (hint: use statsmodels and look at the Statsmodels notebook)

To compute these statistics I used NumPy functions such as np.average(), np.var() and np.corrcoef(), and I also got a first look at the statsmodels library.
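As a rough illustration of these calls, here is a minimal sketch for a single dataset. It makes several assumptions that are not in the original post: the x and y values are the first Anscombe dataset typed in by hand, the helper name summarize is hypothetical, the dictionary keys simply mirror the printed results, the mean and variance are taken over x only, and np.polyfit stands in for the statsmodels fit so that the sketch needs nothing beyond NumPy.

```python
import numpy as np

# First Anscombe dataset (dataset I), typed in by hand for illustration.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])


def summarize(x, y):
    """Hypothetical helper: summary statistics and a straight-line fit for one dataset."""
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept].
    # This is a stand-in for a statsmodels OLS fit, not the post's actual code.
    slope, intercept = np.polyfit(x, y, deg=1)
    return {
        "average": float(np.average(x)),         # mean of x (assumed to be what was reported)
        "variance": float(np.var(x)),            # population variance, ddof=0 by default
        "coef": float(np.corrcoef(x, y)[0, 1]),  # Pearson correlation between x and y
        "fit_result": [float(intercept), float(slope)],  # [β0, β1]
    }


summary = summarize(x, y)
print(summary)
```

Note that np.var() defaults to the population variance (ddof=0); passing ddof=1 would give the sample variance instead, which is a different number for the same data.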
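Across the four parts of the dataset, the mean, variance, correlation coefficient and fitted regression line are all strikingly similar. I printed the results in JSON format, as shown below.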
{
  "dataset_0": {
    "average": 9.0,
    "variance": 10.0,
    "coef": 0.81642051634484,
    "fit_result": [ 3.0000909090909085, 0.5000909090909091 ]
  },
  "dataset_1": {
    "average": 9.0,
    "variance": 10.0,
    "coef": 0.8162365060002428,
    "fit_result": [ 3.00090909090909, 0.5000000000000002 ]
  },
  "dataset_2": {
    "average": 9.0,
    "variance": 10.0,
    "coef": 0.8162867394895984,
    "fit_result": [ 3.002454545454544, 0.49972727272727285 ]
  },
  "dataset_3": {
    "average": 9.0,
    "variance": 10.0,
    "coef": 0.8165214368885028,
    "fit_result": [ 3.001727272727269, 0.499909090909091 ]
  }
}

Using Seaborn, visualize all four datasets.
hint: use sns.FacetGrid combined with plt.scatter
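This exercise asks us to display the data with the Seaborn library. Seaborn is a higher-level wrapper around matplotlib: the plots take less code and look better too!
I drew the plots in two ways.
The first uses Seaborn's lmplot function, which also draws the fitted regression line along the way.
The second follows the hint in the exercise and uses FacetGrid to lay out four plt.scatter scatter plots side by side.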
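import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="ticks")
# anscombe is assumed to be the DataFrame loaded earlier, with columns 'x', 'y' and 'data'
print(anscombe)

# Method 1: lmplot draws one panel per dataset and overlays the linear regression fit
sns.lmplot(x="x", y="y", data=anscombe, col='data')

# Method 2: FacetGrid creates one panel per dataset; map plt.scatter onto each panel
g = sns.FacetGrid(anscombe, col='data')
g = g.map(plt.scatter, 'x', 'y')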
