[Jupyter] Exercises

xiaoxiao2021-02-28 98

Part 1

For each of the four datasets...

Compute the mean and variance of both x and y

Compute the correlation coefficient between x and y

Compute the linear regression line: y = β0 + β1x + ϵ (hint: use statsmodels and look at the Statsmodels notebook)

OUTPUT

Code

import numpy as np
import statsmodels.formula.api as smf

# `anascombe` is the DataFrame loaded earlier in the notebook
# (columns: dataset, x, y).
print('The mean of x is:', anascombe['x'].mean())
print('The mean of y is:', anascombe['y'].mean())
print('The variance of x is:', anascombe['x'].var())
print('The variance of y is:', anascombe['y'].var())

# Pearson correlation: the off-diagonal entry of the 2x2 correlation matrix
print('The correlation coefficient between x and y:',
      np.corrcoef(anascombe['x'], anascombe['y'])[0, 1])

# Fit y ~ x on a random subset of roughly 70% of the rows;
# the split is random, so the fitted values change from run to run.
n = len(anascombe)
is_train = np.random.rand(n) < 0.7
train = anascombe[is_train].reset_index(drop=True)
test = anascombe[~is_train].reset_index(drop=True)
lin_model = smf.ols('y ~ x', train).fit()
lin_model.summary()
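The exercise asks for these statistics for each of the four datasets, whereas the code above computes them over all rows at once. A per-dataset version might look like the sketch below. It rests on a few assumptions: the data is Anscombe's quartet (hard-coded here from the published values, since the notebook's loading step is not shown), the frame has the columns dataset, x and y, and np.polyfit stands in for the statsmodels fit so the example needs only NumPy and pandas. The names summary and rows are illustrative.

```python
import numpy as np
import pandas as pd

# Anscombe's quartet, hard-coded so the sketch runs on its own.
# Assumption: the notebook's `anascombe` frame holds the same values
# in long form with columns "dataset", "x", "y".
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
xs = {"I": x123, "II": x123, "III": x123,
      "IV": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]}
ys = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
anascombe = pd.concat(
    [pd.DataFrame({"dataset": name, "x": xs[name], "y": ys[name]})
     for name in ys],
    ignore_index=True,
)

rows = {}
for name, group in anascombe.groupby("dataset"):
    # Ordinary least squares for y = beta0 + beta1 * x;
    # polyfit returns the highest-degree coefficient first.
    beta1, beta0 = np.polyfit(group["x"], group["y"], deg=1)
    rows[name] = {
        "mean_x": group["x"].mean(),
        "var_x": group["x"].var(),   # sample variance (ddof=1)
        "mean_y": group["y"].mean(),
        "var_y": group["y"].var(),
        "corr": group["x"].corr(group["y"]),  # Pearson correlation
        "beta0": float(beta0),
        "beta1": float(beta1),
    }

# One row per dataset, one column per statistic
summary = pd.DataFrame.from_dict(rows, orient="index")
print(summary.round(3))
```

The four rows should come out nearly identical (mean of x equal to 9, variance of x equal to 11, a slope close to 0.5), which is the point of the quartet: the summary statistics agree even though the scatter plots in Part 2 look very different. To stay with statsmodels, smf.ols('y ~ x', group).fit().params could replace the polyfit call inside the loop.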

Part 2

Using Seaborn, visualize all four datasets.

OUTPUT

Code

import matplotlib.pyplot as plt
import seaborn as sns

# One scatter panel per dataset, laid out side by side
m = sns.FacetGrid(anascombe, col="dataset")
m.map(plt.scatter, "x", "y")
