, 06 2017 . 14:38 +

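This result continues a modern trend of getting strong results from decades-old ideas. In 2012, the AlexNet paper showed how to design, scale, and train convolutional neural networks (CNNs) to achieve very strong results on image recognition. Similarly, in 2013 the Deep Q-Learning paper showed how to combine Q-learning with CNNs to play Atari games. Likewise, this work shows that evolution strategies (ES) achieve strong results on reinforcement learning (RL) benchmarks.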

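ES is simple to implement and scales well. It trains a 3D MuJoCo humanoid to walk in 10 minutes on a cluster of 80 machines (1,440 CPU cores), whereas A3C on 32 cores takes about 10 hours. Using 720 cores, ES reaches performance comparable to A3C on Atari while cutting training time from 1 day to 1 hour.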


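Reinforcement learning. Suppose we are given an environment (for example, a game) in which we want to train an agent. The agent's behavior is described by a policy function (the agent's brain). In practice, the policy is usually a neural network that takes the current state of the game as input and outputs the probability of each allowed action. A typical policy function might have about 1 million parameters. The task therefore comes down to finding parameter values with which the policy plays well (i.e., wins most games).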


, ( ).

. , - , (, ). : , , . : , , , . . , . : , , .


, .. . , , . : , . , . , .



, , . , .


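In ES, we forget that there is an agent, an environment, or a neural network: 1 million numbers go in (the parameters of the policy network) and 1 number comes out (the total reward). The goal is to find the best setting of those 1 million numbers. This is black-box optimization.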


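Intuitively, the optimization is a guess-and-check process. We start with random parameters and then repeatedly:

1) tweak the guess a little at random
2) move the guess slightly toward whichever tweaks worked better.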

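Concretely, at each step we take the parameter vector w and generate a population of, say, 100 slightly different vectors w1 ... w100 by jittering w with Gaussian noise. Each of the 100 candidates is then evaluated independently by running the corresponding policy in the environment and summing the rewards. The updated parameter vector is the weighted sum of the 100 vectors, with each weight proportional to the total reward (i.e., more successful candidates get a higher weight). This is, in effect, an estimate of the gradient of the expected reward from only 100 random directions.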


, .

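# simple example: minimize a quadratic around some solution point
import numpy as np
solution = np.array([0.5, 0.1, -0.3])
def f(w): return -np.sum((w - solution)**2)  # reward: higher is better, maximum at w == solution

npop = 50      # population size
sigma = 0.1    # noise standard deviation
alpha = 0.001  # learning rate
w = np.random.randn(3) # initial guess
for i in range(300):
  N = np.random.randn(npop, 3)  # one Gaussian perturbation per population member
  R = np.zeros(npop)            # reward of each perturbed candidate
  for j in range(npop):
    w_try = w + sigma*N[j]      # jitter the current parameters
    R[j] = f(w_try)             # evaluate the candidate
  A = (R - np.mean(R)) / np.std(R)  # standardize rewards to zero mean, unit variance
  w = w + alpha/(npop*sigma) * np.dot(N.T, A)  # move toward perturbations that scored better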
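
The same loop can be written without the inner Python loop over the population. The sketch below is one possible vectorized variant, not part of the original example: the function name `es_minimize_quadratic`, the `steps` and `seed` parameters, and the use of a seeded `numpy.random.default_rng` generator are assumptions added here for reproducibility.

```python
import numpy as np

def es_minimize_quadratic(solution, npop=50, sigma=0.1, alpha=0.001, steps=300, seed=0):
    """Vectorized variant of the loop above; returns the final parameter vector."""
    rng = np.random.default_rng(seed)                  # seeded generator (assumption)
    w = rng.standard_normal(solution.shape[0])         # initial guess
    for _ in range(steps):
        N = rng.standard_normal((npop, w.size))        # one noise vector per candidate
        W_try = w + sigma * N                          # all perturbed candidates at once
        R = -np.sum((W_try - solution) ** 2, axis=1)   # reward of each candidate
        A = (R - R.mean()) / R.std()                   # standardized rewards
        w = w + alpha / (npop * sigma) * (N.T @ A)     # same update rule as above
    return w

solution = np.array([0.5, 0.1, -0.3])
w_final = es_minimize_quadratic(solution)
print(np.sum((w_final - solution) ** 2))               # squared distance to the solution
```

The printed squared distance should be small after 300 steps. It does not reach exactly zero, because each update is built from a noisy sample of only 50 directions.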


, , : . , . . ( ). .


ES has several advantages over RL algorithms:

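  • No need for backpropagation. ES requires only the forward pass of the policy, which makes the code shorter and 2-3 times faster in practice. There is no need to keep a record of episodes for a later update. There is also no need to worry about exploding gradients in recurrent networks. Finally, ES can explore a much larger class of policies, including networks that are not differentiable (such as binary networks) or that contain complex modules (such as pathfinding).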

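  • Highly parallelizable. ES workers only need to exchange a few scalars. In RL, entire parameter vectors must be synchronized (which can be millions of numbers). In the experiments, this gave linear speedups as more CPU cores were added.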

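  • Higher robustness. Some hyperparameters that are hard to set in RL implementations, such as the frame-skip in Atari, matter much less in ES. ES works about equally well with any frame-skip.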

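  • Structured exploration. Some RL algorithms, especially policy gradients, start from random policies, which often shows up as random jitter in place for a long time. In Q-learning this effect is mitigated by epsilon-greedy policies, which can make the agent repeat a consistent action for a while. ES avoids the problem, because it can use deterministic policies and explore consistently.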

  • . , , .. . , Seaquest , , .
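
The parallelization point in the list above can be illustrated with shared random seeds. The sketch below is a single-process simulation under stated assumptions, not the authors' distributed implementation: `noise_for`, `worker_evaluate`, and `es_step` are hypothetical names, the "workers" are ordinary function calls, and the reward is the toy quadratic from the earlier example. The idea it shows is that each worker reports one scalar reward, while every perturbation is rebuilt locally from an integer seed agreed on in advance.

```python
import numpy as np

solution = np.array([0.5, 0.1, -0.3])

def f(w):
    # Same toy reward as the quadratic example: higher is better.
    return -np.sum((w - solution) ** 2)

def noise_for(seed, dim):
    # Anyone who knows the integer seed can regenerate the same perturbation.
    return np.random.default_rng(seed).standard_normal(dim)

def worker_evaluate(w, seed, sigma):
    # Hypothetical worker: perturb a local copy of w and return ONE scalar.
    return f(w + sigma * noise_for(seed, w.size))

def es_step(w, seeds, sigma=0.1, alpha=0.001):
    # "Communication": one scalar reward per worker, nothing else.
    rewards = np.array([worker_evaluate(w, s, sigma) for s in seeds])
    A = (rewards - rewards.mean()) / rewards.std()
    # The perturbations are rebuilt from the seeds, never sent over the network.
    N = np.stack([noise_for(s, w.size) for s in seeds])
    return w + alpha / (len(seeds) * sigma) * (N.T @ A)

w = np.zeros(3)
for step in range(300):
    seeds = range(step * 50, (step + 1) * 50)   # seed schedule agreed on in advance
    w = es_step(w, seeds)
print(f(w))
```

Because the update depends only on the seeds and the scalar rewards, every replica that applies `es_step` with the same inputs ends up with identical parameters, so the full parameter vector never has to be transmitted.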

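ES also has its challenges. For ES to work, adding noise to the parameters must lead to different outcomes, so that there is some gradient signal. For example, in Montezuma's Revenge a random network is very unlikely to get the key on the first level, whereas with random actions this occasionally happens.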
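
The loss of signal can be seen directly in the update rule from the quadratic example: if every perturbed candidate earns the same reward, the standardized rewards carry no information (and `np.std(R)` is zero, so the unguarded division fails). The sketch below is an illustration under assumptions: `sparse_reward` is a made-up task that pays out only inside a tiny target region, and `standardized_advantages` with its `eps` guard is a hypothetical helper, not something from the source.

```python
import numpy as np

def standardized_advantages(R, eps=1e-8):
    """Standardize rewards; return all zeros when every candidate scored the same."""
    R = np.asarray(R, dtype=float)
    spread = R.std()
    if spread < eps:
        return np.zeros_like(R)    # no candidate is better than another: no signal
    return (R - R.mean()) / spread

def sparse_reward(w):
    # Hypothetical sparse task: reward only within 0.01 of the point (5, 5, 5).
    return 1.0 if np.linalg.norm(w - 5.0) < 0.01 else 0.0

rng = np.random.default_rng(0)
w = np.zeros(3)
N = rng.standard_normal((50, 3))                        # 50 parameter perturbations
R = np.array([sparse_reward(w + 0.1 * n) for n in N])   # every candidate misses the target
A = standardized_advantages(R)
update = N.T @ A                                        # direction of the ES update
print(R.sum(), update)
```

Here all 50 candidates receive zero reward, so the update is the zero vector and the parameters do not move, no matter how many iterations are run.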

Evolution Strategies as a Scalable Alternative to Reinforcement Learning.


Original source: habrahabr.ru (comments, light).

https://habrahabr.ru/post/330342/
