@100days of ml coding: Logistic Regression

xiaoxiao, 2021-07-04

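Logistic Regression

Logistic regression is used for classification problems. The goal is to predict which group an observed object belongs to, and the result is a discrete binary output. To make a prediction, the predicted probability has to be converted to a binary value. In a binary classification problem, the relationship between the probability and the independent variables is typically an S-shaped curve, which can be modeled with the sigmoid function. A threshold classifier then converts values in the range (0, 1) to 0 or 1. Logistic regression can also output the probability itself, not only a 0 or 1 decision.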
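The sigmoid-then-threshold step can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used later in the post: the `scores` array is a made-up set of linear scores, and the 0.5 cut-off is an assumed default.

```python
import numpy as np


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def classify(probabilities, threshold=0.5):
    """Convert probabilities to 0/1 labels using a cut-off (assumed 0.5)."""
    return (probabilities >= threshold).astype(int)


# Hypothetical linear scores (w.x + b) for five observations.
scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])

probabilities = sigmoid(scores)   # values strictly between 0 and 1
labels = classify(probabilities)  # discrete 0/1 decisions

print(probabilities)
print(labels)
```

Raising or lowering `threshold` trades false positives against false negatives, while the probabilities themselves stay unchanged.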

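| Linear regression | Logistic regression |
| --- | --- |
| The independent and dependent variables have a linear relationship | No linear relationship between the independent and dependent variables is required |
| Directly analyzes the relationship between the dependent variable y and the independent variable x | Analyzes the relationship between the probability that y takes a given value and the independent variable x |
| The output is continuous | The output is discrete |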
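The difference in output type can be seen on a small example. The sketch below fits both models from scikit-learn on the same toy data; the data and the query points in `X_new` are invented for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one feature, label is 1 for the larger values.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

linear = LinearRegression().fit(X, y)
logistic = LogisticRegression().fit(X, y)

# Query points, including two well outside the training range.
X_new = np.array([[0.0], [4.5], [12.0]])

# Linear regression returns unbounded continuous values.
linear_out = linear.predict(X_new)

# Logistic regression returns discrete class labels ...
logistic_labels = logistic.predict(X_new)
# ... and, on request, the probability of class 1.
logistic_proba = logistic.predict_proba(X_new)[:, 1]

print("linear:        ", linear_out)
print("logistic label:", logistic_labels)
print("logistic proba:", logistic_proba)
```

On this data the linear model's predictions fall below 0 and above 1 at the extreme query points, so they cannot be read as probabilities, whereas the logistic model's probabilities always stay inside (0, 1).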

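The following example analyzes the relationship between a user's age and salary and whether the user is willing to buy a product.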

    ### Step 1: Data preprocessing
    # Import libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the dataset: columns 2 and 3 are the features, column 4 is the label
    dataset = pd.read_csv('Social_Network_Ads.csv')
    X = dataset.iloc[:, [2, 3]].values
    Y = dataset.iloc[:, 4].values

    # Split the dataset into a training set and a test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

    # Feature scaling: fit on the training set only, then apply to both sets
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    ### Step 2: Logistic regression model
    from sklearn.linear_model import LogisticRegression
    classifier = LogisticRegression()
    classifier.fit(X_train, y_train)

    ### Step 3: Prediction
    y_pred = classifier.predict(X_test)

    ### Step 4: Evaluate the predictions
    # Build the confusion matrix
    from sklearn.metrics import confusion_matrix
    cm = confusion_matrix(y_test, y_pred)
    print(cm)

    # Plot the decision regions with matplotlib (training set)
    from matplotlib.colors import ListedColormap
    X_set, y_set = X_train, y_train
    X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                         np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
    plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
                 alpha=0.75, cmap=ListedColormap(('red', 'green')))
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    for i, j in enumerate(np.unique(y_set)):
        # color= (rather than c=) avoids the ambiguity of passing a single RGBA tuple
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    color=ListedColormap(('red', 'green'))(i), label=j)
    plt.title('LOGISTIC (Training set)')
    plt.xlabel('Age')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()

    # Same plot for the test set
    X_set, y_set = X_test, y_test
    X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                         np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
    plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
                 alpha=0.75, cmap=ListedColormap(('red', 'green')))
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    color=ListedColormap(('red', 'green'))(i), label=j)
    plt.title('LOGISTIC (Test set)')
    plt.xlabel('Age')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()
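Step 4 builds a confusion matrix, and a few summary numbers make it easier to read. The sketch below shows one way to derive accuracy, precision, and recall from a 2x2 matrix. The label arrays `y_true` and `y_hat` are hypothetical stand-ins for `y_test` and `y_pred`, so the numbers are not results from the Social_Network_Ads data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # share of all predictions that are correct
precision = tp / (tp + fp)       # share of predicted positives that are real
recall = tp / (tp + fn)          # share of real positives that were found

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Replacing the two hypothetical arrays with `y_test` and `y_pred` from the walkthrough gives the corresponding figures for the trained classifier.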

Data source for Social_Network_Ads.csv

https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/datasets/Social_Network_Ads.csv Sample of the data (figure omitted):

Results

[1] 100-Days-Of-ML-Code: https://github.com/Avik-Jain/100-Days-Of-ML-Code
