, .
:
, . , 2012 .
AlexNet, , , () . , 2013 .
, Q-learning Atari. , , . , () , , .
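Evolution strategies (ES) are easy to implement and scale. Running on a cluster of 80 machines (1,440 CPU cores), our implementation trains a 3D MuJoCo humanoid walker in only 10 minutes (A3C on 32 cores takes about 10 hours). Using 720 cores, we also obtain performance comparable to A3C on Atari while cutting training time from 1 day to 1 hour.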
. , ( ), . ( ). , . , , . 1 . , , (.. ).
, ( ).
. , - , (, ). : , , . : , , , . . , . : , , .
, .. . , , . : , . , . , .
, , . , .
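In ES we forget entirely that there is an agent, an environment, or a neural network involved: 1,000,000 numbers go in (the parameters of the policy network) and 1 number comes out (the total reward). We want to find the best setting of those 1,000,000 numbers.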
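The algorithm itself is simple. Intuitively, optimization is a guess-and-check process: we start with random parameters and then repeatedly:
1) tweak the guess a bit randomly
2) move the guess slightly toward whichever tweaks worked better.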
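Concretely, at each step we take a parameter vector w and generate a population of, say, 100 slightly different vectors w1 ... w100 by jittering w with Gaussian noise. We then evaluate each of the 100 candidates independently by running the corresponding policy network in the environment for a while and adding up the rewards. The updated parameter vector becomes the weighted sum of the 100 vectors, where each weight is proportional to the total reward (i.e. more successful candidates get a higher weight). Mathematically, this is equivalent to estimating the gradient of the expected reward in parameter space using finite differences, except that we do it along only 100 random directions.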
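One iteration of the update used in this article's code sample can be written compactly as below. This is a sketch of the notation only: the symbols n, sigma, alpha, R and A are named after the code variables npop, sigma, alpha, R and A, and the reward standardization shown is the one the code sample applies, which is assumed here rather than claimed as the only possible choice.

```latex
\begin{aligned}
\epsilon_j &\sim \mathcal{N}(0, I), \qquad j = 1, \dots, n \\
R_j &= f(w + \sigma \epsilon_j) \\
A_j &= \frac{R_j - \operatorname{mean}(R)}{\operatorname{std}(R)} \\
w &\leftarrow w + \frac{\alpha}{n \sigma} \sum_{j=1}^{n} A_j \, \epsilon_j
\end{aligned}
```

The sum is a Monte Carlo estimate built on the identity $\nabla_w \, \mathbb{E}_{\epsilon}[f(w + \sigma \epsilon)] = \frac{1}{\sigma} \, \mathbb{E}_{\epsilon}[f(w + \sigma \epsilon) \, \epsilon]$ for Gaussian $\epsilon$. Subtracting the mean reward and dividing by the standard deviation rescales that estimate, so the step points in roughly the same direction with a magnitude that does not depend on the scale of the rewards.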
, .
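# simple example: minimize a quadratic around some solution point
import numpy as np

solution = np.array([0.5, 0.1, -0.3])

def f(w):
    # reward: negative squared distance to the solution (higher is better)
    return -np.sum((w - solution)**2)

npop = 50      # population size
sigma = 0.1    # noise standard deviation
alpha = 0.001  # learning rate
w = np.random.randn(3)  # initial guess
for i in range(300):
    N = np.random.randn(npop, 3)  # one Gaussian perturbation per population member
    R = np.zeros(npop)
    for j in range(npop):
        w_try = w + sigma*N[j]  # jitter w with the j-th perturbation
        R[j] = f(w_try)         # evaluate the perturbed parameters
    A = (R - np.mean(R)) / np.std(R)  # standardize rewards to zero mean, unit variance
    w = w + alpha/(npop*sigma) * np.dot(N.T, A)  # step toward better-scoring perturbations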
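The loop above can also be packaged as a reusable function, which makes it easier to rerun with a fixed seed and to check that it converges. The sketch below is illustrative only: the names es_maximize and reward, the seed parameter, and the guard against zero reward variance are assumptions added here and are not part of the original sample.

```python
import numpy as np

def es_maximize(f, w0, npop=50, sigma=0.1, alpha=0.001, iters=300, seed=0):
    """Maximize a black-box function f with the simple ES update (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(iters):
        # One Gaussian perturbation per population member.
        N = rng.standard_normal((npop, w.size))
        # Evaluate every perturbed parameter vector.
        R = np.array([f(w + sigma * n) for n in N])
        std = R.std()
        if std == 0:
            # Assumption: skip the step when all candidates score the same.
            continue
        A = (R - R.mean()) / std                    # standardized rewards
        w = w + alpha / (npop * sigma) * (N.T @ A)  # same update as the loop above
    return w

solution = np.array([0.5, 0.1, -0.3])

def reward(w):
    # Negative squared distance to the solution (higher is better).
    return -np.sum((w - solution) ** 2)

w_final = es_maximize(reward, w0=np.zeros(3))
print(w_final)
```

With these settings the printed vector should land close to solution. Only calls to f are needed, so any function that maps a parameter vector to a single score can be plugged in.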
, , : . , . . ( ). .
:
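- No need for backpropagation. ES requires only the forward pass of the policy, which makes the code shorter and 2-3 times faster in practice. There is no need to keep a record of episodes for a later update. There is also no need to worry about exploding gradients in recurrent networks. Finally, we can explore a much larger class of policies, including networks that are not differentiable (such as binary networks) or that contain complex modules (such as pathfinding or optimization layers).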
- . . ( ). .
- . , Atari . .
- . , . , . Q- , .
- . , , .. . , Seaquest , , .
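ES also has its challenges. For ES to work, adding noise to the parameters must lead to different outcomes, so that there is some gradient signal. For example, in Montezuma's Revenge a random network is very unlikely to get the key on the first level, while random actions occasionally manage it.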
Evolution Strategies as a Scalable Alternative to Reinforcement Learning.
.
. , .

https://habrahabr.ru/post/330342/