Improving Neural Networks
Improve NN
Table of Contents
- Improve NN
  - train/dev/test set
  - Bias/Variance
  - basic recipe
  - Regularization
    - Logistic Regression
    - Neural network
    - other ways
  - optimization problem
    - Normalizing inputs
    - vanishing/exploding gradients
    - weight initialization
    - gradient check
      - Numerical approximation
      - grad check
train/dev/test set
- 0.7/0.3 (no dev set) or 0.6/0.2/0.2 splits for small datasets, roughly 100-10000 examples
- 0.98/0.01/0.01 or even more skewed splits for big data, since 1% of millions of examples is already a large dev/test set
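As a minimal sketch of such a split (the function name `split_data` and the 0.6/0.2/0.2 default ratios here are illustrative, not prescribed by these notes):

```python
import numpy as np

def split_data(X, y, train=0.6, dev=0.2, seed=0):
    """Randomly split examples (rows of X) into train/dev/test sets."""
    m = X.shape[0]
    idx = np.random.default_rng(seed).permutation(m)
    n_train, n_dev = int(train * m), int(dev * m)
    tr, dv, te = np.split(idx, [n_train, n_train + n_dev])
    return (X[tr], y[tr]), (X[dv], y[dv]), (X[te], y[te])
```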
Bias/Variance
Bias measures a single model's ability to learn the underlying pattern; variance measures how stable the same model is across different datasets.

high variance -> high dev set error
high bias -> high train set error
basic recipe
high bias -> bigger network / train longer / more advanced optimization algorithms / NN architectures
high variance -> more data / regularization / NN architecture
Regularization
Logistic Regression
L2 regularization:

$$\min_{w,b} J(w,b),\qquad J(w,b)=\frac{1}{m}\sum_{i=1}^m\mathcal{L}(\hat y^{(i)},y^{(i)})+\frac{\lambda}{2m}\Vert w\Vert_2^2$$
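A small numpy sketch of this regularized cost for logistic regression (the sigmoid/cross-entropy form and the shapes X: (n_x, m), Y: (1, m) are assumptions of this example, not spelled out in the notes):

```python
import numpy as np

def l2_regularized_cost(w, b, X, Y, lambd):
    """Cross-entropy cost plus the L2 penalty (lambda / 2m) * ||w||_2^2."""
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))          # sigmoid predictions y_hat
    cross_entropy = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    l2_penalty = (lambd / (2 * m)) * np.sum(np.square(w))
    return cross_entropy + l2_penalty
```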
Neural network
Frobenius norm:

$$\Vert w^{[l]}\Vert^2_F=\sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}}\left(w_{i,j}^{[l]}\right)^2$$

Dropout (inverted dropout) regularization:

```python
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob  # keep each unit with probability keep_prob
a3 = np.multiply(a3, d3)                                   # zero out the dropped units
a3 /= keep_prob                                            # rescale so E[a3] is unchanged
```
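Run as a self-contained example (the activations `a3` and `keep_prob = 0.8` below are placeholder values for illustration). At test time no mask is applied; the `/= keep_prob` rescaling during training is what keeps the expected activations consistent between the two phases:

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8
a3 = np.random.randn(4, 5)                                 # placeholder layer-3 activations

d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob  # boolean mask, ~80% True
a3 = np.multiply(a3, d3)                                   # drop ~20% of units
a3 /= keep_prob                                            # inverted-dropout rescaling
print(a3)
```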
other ways
- early stopping
- data augmentation
optimization problem
These techniques speed up the training of your neural network.
Normalizing inputs
- subtract mean
$$\mu=\frac{1}{m}\sum_{i=1}^{m}x^{(i)},\qquad x:=x-\mu$$
- normalize variance
$$\sigma^2=\frac{1}{m}\sum_{i=1}^m\left(x^{(i)}\right)^2,\qquad x:=x/\sigma$$
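A minimal sketch putting both steps together (assuming column-wise examples, X: (n_x, m)); the same μ and σ computed on the training set should also be applied to dev/test data:

```python
import numpy as np

def normalize_inputs(X_train, X_test):
    """Subtract the training mean, then divide by the training std, feature-wise."""
    mu = np.mean(X_train, axis=1, keepdims=True)
    X_train = X_train - mu
    sigma = np.sqrt(np.mean(X_train ** 2, axis=1, keepdims=True))
    return X_train / sigma, (X_test - mu) / sigma
```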
vanishing/exploding gradients
$$y=w^{[L]}w^{[L-1]}\cdots w^{[2]}w^{[1]}x$$

$$w^{[l]}>I\rightarrow (w^{[l]})^L\rightarrow\infty,\qquad w^{[l]}<I\rightarrow (w^{[l]})^L\rightarrow 0$$
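A one-line numeric illustration of why this matters (the depth L = 50 and the scalar "weights" 1.5 and 0.5 are made-up demo values):

```python
L = 50
for w in (1.5, 0.5):
    print(w, "->", w ** L)   # 1.5**50 ~ 6.4e8 (explodes), 0.5**50 ~ 8.9e-16 (vanishes)
```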
weight initialization
$$\mathrm{var}(w)=\frac{1}{n^{[l-1]}}$$

```python
w = np.random.randn(*shape) * np.sqrt(1 / n_prev)  # n_prev = n^{[l-1]}, the fan-in of layer l
```
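A runnable sketch of this initialization for a whole network (the helper name `initialize` and the dict layout are illustrative); for ReLU layers, variance 2/n^{[l-1]} (He initialization) is commonly used instead:

```python
import numpy as np

def initialize(layer_dims, seed=3):
    """w[l]: (n[l], n[l-1]) scaled to variance 1/n[l-1]; b[l]: zeros."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params["w" + str(l)] = (rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
                                * np.sqrt(1 / layer_dims[l - 1]))
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```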
gradient check
Numerical approximation
$$f(\theta)=\theta^3,\qquad f'(\theta)\approx\frac{f(\theta+\varepsilon)-f(\theta-\varepsilon)}{2\varepsilon}$$

The two-sided difference has error $O(\varepsilon^2)$, versus $O(\varepsilon)$ for the one-sided difference, so it gives a much more accurate check.
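Checking this on f(θ) = θ³ at θ = 1 (ε = 0.01 is an arbitrary demo value):

```python
theta, eps = 1.0, 0.01
approx = ((theta + eps) ** 3 - (theta - eps) ** 3) / (2 * eps)
exact = 3 * theta ** 2
print(approx, exact, abs(approx - exact))   # 3.0001, 3.0, error ~1e-4 = eps**2
```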
grad check
$$d\theta_{approx}[i]=\frac{J(\theta_1,\ldots,\theta_i+\varepsilon,\ldots)-J(\theta_1,\ldots,\theta_i-\varepsilon,\ldots)}{2\varepsilon}\approx d\theta[i]$$

check:

$$\frac{\Vert d\theta_{approx}-d\theta\Vert_2}{\Vert d\theta_{approx}\Vert_2+\Vert d\theta\Vert_2}<10^{-7}$$
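A minimal sketch of the whole procedure, assuming the parameters have been flattened into a single vector `theta` and `J` is a function of that vector (both assumptions of this example, not fixed by the notes):

```python
import numpy as np

def grad_check(J, theta, dtheta, eps=1e-7):
    """Relative difference between analytic dtheta and the two-sided estimate."""
    approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        approx[i] = (J(plus) - J(minus)) / (2 * eps)
    return (np.linalg.norm(approx - dtheta)
            / (np.linalg.norm(approx) + np.linalg.norm(dtheta)))

# e.g. J(theta) = sum(theta**2) has gradient 2*theta:
theta = np.array([1.0, 2.0, 3.0])
print(grad_check(lambda t: np.sum(t ** 2), theta, 2 * theta))  # ~1e-10, passes
```

A ratio below 1e-7 suggests the backprop gradient is correct; around 1e-3 or larger usually means a bug.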
