week14

xiaoxiao, 2021-02-28

Part I

For each of the four datasets…

    %matplotlib inline

    import random

    import numpy as np
    import scipy as sp
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    sns.set_style("dark")
    sns.set_context("talk")

    # If this nbviewer URL returns an HTML page rather than CSV,
    # point read_csv at a local copy of anscombe.csv instead.
    anascombe = pd.read_csv('https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/tree/master/data/anscombe.csv')
    anascombe.head()

Compute the mean and variance of both x and y

    # Select the two columns with a list; recent pandas releases reject the tuple form ['x', 'y'].
    print(anascombe.groupby('dataset')[['x', 'y']].mean())
    print(anascombe.groupby('dataset')[['x', 'y']].var())

               x         y
    dataset
    I        9.0  7.500909
    II       9.0  7.500909
    III      9.0  7.500000
    IV       9.0  7.500909

                x         y
    dataset
    I        11.0  4.127269
    II       11.0  4.127629
    III      11.0  4.122620
    IV       11.0  4.123249

Compute the correlation coefficient between x and y

    print(anascombe.groupby('dataset')[['x', 'y']].corr())

                      x         y
    dataset
    I       x  1.000000  0.816421
            y  0.816421  1.000000
    II      x  1.000000  0.816237
            y  0.816237  1.000000
    III     x  1.000000  0.816287
            y  0.816287  1.000000
    IV      x  1.000000  0.816521
            y  0.816521  1.000000

Compute the linear regression line (hint: use statsmodels and look at the Statsmodels notebook)

    # Fit y ~ x separately on each of the four datasets.
    lin_model1 = smf.ols('y ~ x', anascombe[anascombe['dataset'] == 'I']).fit()
    print(lin_model1.summary())
    lin_model2 = smf.ols('y ~ x', anascombe[anascombe['dataset'] == 'II']).fit()
    print(lin_model2.summary())
    lin_model3 = smf.ols('y ~ x', anascombe[anascombe['dataset'] == 'III']).fit()
    print(lin_model3.summary())
    lin_model4 = smf.ols('y ~ x', anascombe[anascombe['dataset'] == 'IV']).fit()
    print(lin_model4.summary())

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.667
    Model:                            OLS   Adj. R-squared:                  0.629
    Method:                 Least Squares   F-statistic:                     17.99
    Date:                Sat, 09 Jun 2018   Prob (F-statistic):            0.00217
    Time:                        00:02:39   Log-Likelihood:                -16.841
    No. Observations:                  11   AIC:                             37.68
    Df Residuals:                       9   BIC:                             38.48
    Df Model:                           1
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept      3.0001      1.125      2.667      0.026       0.456       5.544
    x              0.5001      0.118      4.241      0.002       0.233       0.767
    ==============================================================================
    Omnibus:                        0.082   Durbin-Watson:                   3.212
    Prob(Omnibus):                  0.960   Jarque-Bera (JB):                0.289
    Skew:                          -0.122   Prob(JB):                        0.865
    Kurtosis:                       2.244   Cond. No.                         29.1
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.666
    Model:                            OLS   Adj. R-squared:                  0.629
    Method:                 Least Squares   F-statistic:                     17.97
    Date:                Sat, 09 Jun 2018   Prob (F-statistic):            0.00218
    Time:                        00:02:39   Log-Likelihood:                -16.846
    No. Observations:                  11   AIC:                             37.69
    Df Residuals:                       9   BIC:                             38.49
    Df Model:                           1
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept      3.0009      1.125      2.667      0.026       0.455       5.547
    x              0.5000      0.118      4.239      0.002       0.233       0.767
    ==============================================================================
    Omnibus:                        1.594   Durbin-Watson:                   2.188
    Prob(Omnibus):                  0.451   Jarque-Bera (JB):                1.108
    Skew:                          -0.567   Prob(JB):                        0.575
    Kurtosis:                       1.936   Cond. No.                         29.1
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.666
    Model:                            OLS   Adj. R-squared:                  0.629
    Method:                 Least Squares   F-statistic:                     17.97
    Date:                Sat, 09 Jun 2018   Prob (F-statistic):            0.00218
    Time:                        00:02:39   Log-Likelihood:                -16.838
    No. Observations:                  11   AIC:                             37.68
    Df Residuals:                       9   BIC:                             38.47
    Df Model:                           1
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept      3.0025      1.124      2.670      0.026       0.459       5.546
    x              0.4997      0.118      4.239      0.002       0.233       0.766
    ==============================================================================
    Omnibus:                       19.540   Durbin-Watson:                   2.144
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):               13.478
    Skew:                           2.041   Prob(JB):                      0.00118
    Kurtosis:                       6.571   Cond. No.                         29.1
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.667
    Model:                            OLS   Adj. R-squared:                  0.630
    Method:                 Least Squares   F-statistic:                     18.00
    Date:                Sat, 09 Jun 2018   Prob (F-statistic):            0.00216
    Time:                        00:02:39   Log-Likelihood:                -16.833
    No. Observations:                  11   AIC:                             37.67
    Df Residuals:                       9   BIC:                             38.46
    Df Model:                           1
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept      3.0017      1.124      2.671      0.026       0.459       5.544
    x              0.4999      0.118      4.243      0.002       0.233       0.766
    ==============================================================================
    Omnibus:                        0.555   Durbin-Watson:                   1.662
    Prob(Omnibus):                  0.758   Jarque-Bera (JB):                0.524
    Skew:                           0.010   Prob(JB):                        0.769
    Kurtosis:                       1.931   Cond. No.                         29.1
    ==============================================================================
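The near-identical means, variances, correlations and regression coefficients reported above can be cross-checked without pandas or statsmodels. What follows is only a minimal sketch using the Python standard library. The inline values are the published Anscombe quartet, typed in here as an assumption in place of the CSV, and the names `QUARTET` and `summarize` are illustrative, not part of the original notebook.

```python
from statistics import mean, variance

# Assumption: the published Anscombe quartet values, entered inline instead of reading the CSV.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return sample statistics and the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variances (divide by n - 1)
    # Sample covariance, using the same n - 1 denominator as the variances.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    slope = cov / vx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": vx,
        "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

For a single predictor, the slope is the covariance divided by the variance of x, so this hand computation should agree with the `coef` column of the OLS summaries up to rounding.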

Use these fitted models to plot the regression lines and obtain the following figures.
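The figures themselves did not survive in this copy. One way to produce comparable plots is sketched below, with one scatter panel per dataset and the fitted line drawn on top. This is an assumption about what the figures showed, not the author's code. It swaps in `np.polyfit` for statsmodels (for one predictor both give the same least-squares line), uses inline quartet values in place of the CSV, and `plot_fits` is a hypothetical helper name.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumption: the published Anscombe quartet values, entered inline instead of reading the CSV.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def plot_fits(quartet):
    """Draw each dataset in its own panel together with its least-squares line."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)
    lines = {}
    for ax, (name, (x, y)) in zip(axes.flat, quartet.items()):
        # A degree-1 polyfit returns the coefficients as [slope, intercept].
        slope, intercept = np.polyfit(x, y, deg=1)
        xs = np.linspace(min(x), max(x), 50)
        ax.scatter(x, y)
        ax.plot(xs, intercept + slope * xs, color="red")
        ax.set_title(f"Dataset {name}: y = {intercept:.2f} + {slope:.2f}x")
        lines[name] = (float(slope), float(intercept))
    fig.tight_layout()
    return fig, lines


fig, lines = plot_fits(QUARTET)
```

Calling `fig.savefig("anscombe_fits.png")` afterwards writes the figure to disk (the file name is arbitrary). The four panels share nearly the same red line even though the point clouds look very different, which is the point of the exercise.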

Part II

Using Seaborn, visualize all four datasets. Hint: use sns.FacetGrid combined with plt.scatter.

    # Newer seaborn releases use `height` in place of the older `size` argument.
    g = sns.FacetGrid(anascombe, hue='dataset', height=9)
    g.map(plt.scatter, 'x', 'y')

When reposting, please credit the original address: https://www.6miu.com/read-2620899.html
