This article was written by Chen Yu, a brilliant senior of mine. With his permission I have reposted it on my blog; many thanks to him!
The code is on his GitHub at https://github.com/unnamed2/MDL. Feel free to check it out!
Implementing Machine Learning from Scratch (3):
3.0 Writing the Code
3.0.1 There is not much code: leaving aside the matrix operations and data reading, it comes to fewer than 150 lines, so here it is in full:
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>
#include <utility>
#include <random>
#include <algorithm>
#include <chrono>
#include "Matrix.h" // the matrix class from the MDL repository linked above

// Reads the MNIST samples as (image, one-hot label) pairs.
// The implementation is omitted in this post; see the repository.
template<size_t Row, size_t Col>
void misnt_data_reader(std::vector<std::pair<Matrix, Matrix>>& outputs) {}

// Small random initial weights in [0, 0.001].
float Random_0_1() {
    return (float)rand() / (float)RAND_MAX * 0.001f;
}

float Sigmoid(float x) {
    return 1.0f / (expf(-x) + 1.0f);
}

// Derivative of the sigmoid.
float Sigmoid_P(float x) {
    return Sigmoid(x) * (1.0f - Sigmoid(x));
}

float LeaklyReLU(float x) {
    if (x > 0.0f)
        return x;
    return 0.1f * x;
}

// Derivative of the leaky ReLU.
float LeaklyReLU_P(float x) {
    if (x > 0)
        return 1.0f;
    return 0.1f;
}

int main()
{
    std::vector<std::pair<Matrix, Matrix>> datas;
    printf("Initializing ...");
    misnt_data_reader<28 * 28, 1>(datas);
    printf("done.\ttraining...\n");

    const int sample_per_minibatch = 25;
    const int hiddens = 100;
    const float learning_rate = 0.01f / sample_per_minibatch;

    // Two-layer network: 784 inputs -> 100 sigmoid units -> 10 leaky ReLU outputs.
    Matrix W0(hiddens, 28 * 28, Random_0_1);
    Matrix B0(hiddens, 1, Random_0_1);
    Matrix W1(10, hiddens, Random_0_1);
    Matrix B1(10, 1, Random_0_1);

    std::random_device rd;
    std::mt19937 rng(rd());

    // Gradient accumulators, zero-initialized.
    Matrix partlW0(hiddens, 28 * 28, []() { return 0.0f; });
    Matrix partlB0(hiddens, 1, []() { return 0.0f; });
    Matrix partlW1(10, hiddens, []() { return 0.0f; });
    Matrix partlB1(10, 1, []() { return 0.0f; });

    for (int i = 0; i < 20; i++) {
        auto tp = std::chrono::system_clock::now();
        // Reshuffle the 50000 training samples each epoch.
        std::shuffle(datas.begin(), datas.begin() + 50000, rng);
        int DataOffset = 0;
        int ErrC = 0;
        for (int j = 0; j < 50000 / sample_per_minibatch; j++) {
            partlB0.SetValue(0.0f);
            partlB1.SetValue(0.0f);
            partlW0.SetValue(0.0f);
            partlW1.SetValue(0.0f);
            for (int k = 0; k < sample_per_minibatch; k++) {
                auto& samp = datas[DataOffset + k];
                // Forward pass.
                Matrix A0 = W0 * samp.first + B0;
                Matrix Z0 = A0.Apply(Sigmoid);
                Matrix A1 = W1 * Z0 + B1;
                Matrix Y = A1.Apply(LeaklyReLU);
                // Argmax over the 10 outputs; count a training error
                // if it does not hit the one-hot label.
                int idx = 0;
                for (int m = 0; m < 10; m++) {
                    if (Y(m, 0) > Y(idx, 0)) idx = m;
                }
                if (samp.second(idx, 0) < 0.9f) ErrC++;
                // Backward pass (~ is the transpose operator).
                Matrix Loss = Y - samp.second;
                Loss.ElementMultiplyWith(A1.Apply(LeaklyReLU_P));
                partlB1 += Loss;
                partlW1 += Loss * ~Z0;
                Loss = (~W1 * Loss);
                Loss.ElementMultiplyWith(A0.Apply(Sigmoid_P));
                partlB0 += Loss;
                partlW0 += Loss * ~samp.first;
            }
            // Gradient descent step with the accumulated minibatch gradients.
            W0 -= partlW0 * learning_rate;
            B0 -= partlB0 * learning_rate;
            W1 -= partlW1 * learning_rate;
            B1 -= partlB1 * learning_rate;
            DataOffset += sample_per_minibatch;
        }
        auto ed = std::chrono::system_clock::now();
        auto elapsed = ed - tp;
        // Evaluate on the 10000 held-out test samples.
        int errCount = 0;
        for (int j = 0; j < 10000; j++) {
            auto& samp = datas[50000 + j];
            Matrix A0 = W0 * samp.first + B0;
            Matrix Z0 = A0.Apply(Sigmoid);
            Matrix A1 = W1 * Z0 + B1;
            Matrix Y = A1.Apply(LeaklyReLU);
            int idx = 0;
            for (int m = 0; m < 10; m++) {
                if (Y(m, 0) > Y(idx, 0)) idx = m;
            }
            if (samp.second(idx, 0) < 0.9f) errCount++;
        }
        printf("Training %d / 20, loss %f%% on training set, loss %f%% on test set, training cost %d ms\n",
            i + 1, ErrC / 500.0f, errCount / 100.0f,
            std::chrono::duration_cast<std::chrono::duration<int, std::milli>>(elapsed).count());
    }
}
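For reference, the backward pass above matches backpropagation for the quadratic loss L = ½‖Y − t‖², which is where the (Y − t) term comes from. Writing x for the input image, t for the one-hot label, and, as in the code, A0 = W0 x + B0, Z0 = σ(A0), A1 = W1 Z0 + B1, Y = LeakyReLU(A1), the accumulated gradients are (⊙ is element-wise multiplication, i.e. ElementMultiplyWith; the transpose is written ~ in the code):

\[
\delta_1 = (Y - t) \odot \mathrm{LeakyReLU}'(A_1), \qquad
\frac{\partial L}{\partial W_1} = \delta_1 Z_0^{\top}, \qquad
\frac{\partial L}{\partial B_1} = \delta_1
\]
\[
\delta_0 = (W_1^{\top} \delta_1) \odot \sigma'(A_0), \qquad
\frac{\partial L}{\partial W_0} = \delta_0 x^{\top}, \qquad
\frac{\partial L}{\partial B_0} = \delta_0
\]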
3.0.2 Results
Training 1 / 20, loss 75.921997% on training set, loss 87.839996% on test set, training cost 5986 ms
Training 2 / 20, loss 58.952000% on training set, loss 61.270000% on test set, training cost 5982 ms
Training 3 / 20, loss 30.878000% on training set, loss 50.520000% on test set, training cost 5961 ms
Training 4 / 20, loss 12.102000% on training set, loss 38.549999% on test set, training cost 5979 ms
Training 5 / 20, loss 4.478000% on training set, loss 43.619999% on test set, training cost 5801 ms
Training 6 / 20, loss 6.040000% on training set, loss 37.169998% on test set, training cost 5911 ms
Training 7 / 20, loss 9.270000% on training set, loss 33.570000% on test set, training cost 5886 ms
Training 8 / 20, loss 4.076000% on training set, loss 32.689999% on test set, training cost 5920 ms
Training 9 / 20, loss 5.840000% on training set, loss 28.670000% on test set, training cost 6021 ms
Training 10 / 20, loss 9.454000% on training set, loss 29.030001% on test set, training cost 6032 ms
Training 11 / 20, loss 1.994000% on training set, loss 27.410000% on test set, training cost 5956 ms
Training 12 / 20, loss 7.128000% on training set, loss 28.510000% on test set, training cost 5881 ms
Training 13 / 20, loss 6.376000% on training set, loss 25.799999% on test set, training cost 5828 ms
Training 14 / 20, loss 4.098000% on training set, loss 28.410000% on test set, training cost 5866 ms
Training 15 / 20, loss 8.378000% on training set, loss 28.200001% on test set, training cost 5681 ms
Training 16 / 20, loss 3.168000% on training set, loss 25.540001% on test set, training cost 5814 ms
Training 17 / 20, loss 4.834000% on training set, loss 26.549999% on test set, training cost 5744 ms
Training 18 / 20, loss 0.086000% on training set, loss 24.250000% on test set, training cost 5901 ms
Training 19 / 20, loss 8.382000% on training set, loss 25.600000% on test set, training cost 5863 ms
Training 20 / 20, loss 4.632000% on training set, loss 31.549999% on test set, training cost 5776 ms
3.1 Overfitting
The results above show it plainly: the error on the training set is far lower than on the test set. By pass 18 the training-set accuracy is already above 99.9% (error 0.086%), yet the test-set accuracy is still only about 75%. In other words, the network has started "memorizing the answers": it does well on exactly the examples it was taught, and noticeably worse on anything it was not.
Machine learning offers many techniques for avoiding overfitting; later posts will show how to genuinely train a model whose accuracy stays above 99.9% no matter where it is measured.
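As a small preview, one of the simplest such techniques is L2 regularization, also called weight decay: add a penalty of (λ/2)(‖W0‖² + ‖W1‖²) to the loss, so every update also shrinks the weights toward zero. Below is a minimal sketch of how the gradient step in the listing above could be modified. It reuses only Matrix operators that already appear in the code; the value of lambda is a hypothetical, untuned starting point, and this is just one common remedy among several (early stopping and dropout are others), not necessarily the one the later posts will use.

// Hypothetical, untuned decay constant.
const float lambda = 0.0001f;

// Replacement for the gradient step inside the minibatch loop.
// The gradient of (lambda/2)*||W||^2 is lambda*W, so it is simply added
// to the accumulated data gradient. Biases are conventionally not decayed.
W0 -= (partlW0 + W0 * lambda) * learning_rate;
B0 -= partlB0 * learning_rate;
W1 -= (partlW1 + W1 * lambda) * learning_rate;
B1 -= partlB1 * learning_rate;

Note that because partlW0 and partlW1 accumulate 25 per-sample gradients while learning_rate divides by the minibatch size, the decay term here is effectively scaled down by that factor as well; a real implementation would tune lambda with this in mind.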