For each of the four datasets...
- Compute the mean and variance of both x and y
- Compute the correlation coefficient between x and y
- Compute the linear regression line: $y = \beta_0 + \beta_1 x + \epsilon$ (hint: use statsmodels and look at the Statsmodels notebook)
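The three tasks above can be sketched per dataset with a groupby. This is a minimal sketch under stated assumptions, not the notebook's own solution: the quartet values are embedded inline so the block runs offline (the notebook's `anascombe` frame is assumed to have the same `dataset`, `x`, `y` columns), `np.polyfit` is swapped in for statsmodels to keep dependencies small, and the names `quartet`, `summarize`, and `summary` are hypothetical.

```python
import numpy as np
import pandas as pd

# Anscombe's quartet, embedded inline (assumption: the notebook loads the
# same values into `anascombe` with columns dataset, x, y).
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
anscombe = pd.concat(
    [pd.DataFrame({"dataset": name, "x": x, "y": y})
     for name, (x, y) in quartet.items()],
    ignore_index=True,
)


def summarize(group):
    """Return the requested statistics for one dataset (hypothetical helper)."""
    # Degree-1 least-squares fit; polyfit returns the slope first.
    slope, intercept = np.polyfit(
        group["x"].to_numpy(dtype=float), group["y"].to_numpy(dtype=float), deg=1
    )
    return pd.Series({
        "mean_x": group["x"].mean(),
        "mean_y": group["y"].mean(),
        "var_x": group["x"].var(),   # sample variance (ddof=1)
        "var_y": group["y"].var(),
        "corr": group["x"].corr(group["y"]),  # Pearson correlation
        "intercept": intercept,      # beta_0
        "slope": slope,              # beta_1
    })


# One row of statistics per dataset
summary = pd.DataFrame(
    {name: summarize(group) for name, group in anscombe.groupby("dataset")}
).T
print(summary.round(3))
```

Fitting `smf.ols('y ~ x', group).fit()` on each group should give the same intercept and slope, since both are ordinary least squares. Working per dataset rather than on the pooled rows is what lets the four sets be compared side by side.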
Code
import numpy as np
import statsmodels.formula.api as smf

# Note: these statistics are computed over the pooled rows of all four datasets.
print('The mean of x is:', anascombe['x'].mean())
print('The mean of y is:', anascombe['y'].mean())
print('The variance of x is:', anascombe['x'].var())
print('The variance of y is:', anascombe['y'].var())

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix
print('The correlation coefficient between x and y:',
      np.corrcoef(anascombe['x'], anascombe['y'])[0, 1])

# Random split of roughly 70% train / 30% test
n = len(anascombe)
is_train = np.random.rand(n) < 0.7
train = anascombe[is_train].reset_index(drop=True)
test = anascombe[~is_train].reset_index(drop=True)

# Ordinary least squares fit of y = beta_0 + beta_1 * x on the training rows
lin_model = smf.ols('y ~ x', train).fit()
lin_model.summary()

Using Seaborn, visualize all four datasets.
Code
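# your code here
import matplotlib.pyplot as plt
import seaborn as sns

# One scatter panel per dataset, laid out in columns
m = sns.FacetGrid(anascombe, col="dataset")
m.map(plt.scatter, "x", "y")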