Some recent worthwhile papers:
Discrete Sequential Prediction of Continuous Actions for Deep RL (https://arxiv.org/abs/1705.05035) – a modern, competitive approach to discretizing continuous action spaces one dimension at a time, which makes the global maximum of the value function over actions recoverable by exact search (unlike DDPG / NAF, which rely on local approximations); see the sketch below.
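A minimal sketch of the sequential selection idea, not the paper's exact architecture; the `q_values` stand-in, bin count, and dimensions are all invented for illustration:

```python
import numpy as np

N_BINS = 11        # discretization level per action dimension (assumed)
ACTION_DIMS = 3
bins = np.linspace(-1.0, 1.0, N_BINS)

def q_values(state, partial_action, dim):
    """Stand-in for a learned Q-network over (state, a_1..a_{dim-1});
    returns one Q-value per candidate bin of dimension `dim`."""
    rng = np.random.default_rng(abs(hash((dim, tuple(partial_action)))) % 2**32)
    return rng.standard_normal(N_BINS)   # placeholder for a real network

def select_action(state):
    """Greedy action built one dimension at a time: each step is an
    exact argmax over bins, so no local optimizer over actions is needed."""
    action = []
    for dim in range(ACTION_DIMS):
        q = q_values(state, action, dim)
        action.append(float(bins[np.argmax(q)]))
    return np.array(action)

print(select_action(state=np.zeros(4)))
```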
Count-Based Exploration in Feature Space for Reinforcement Learning (https://arxiv.org/abs/1706.08090) – a new optimistic exploration algorithm whose novelty bonus is grounded in how often learned state features have been observed, so uncertainty estimates generalise across similar states; a sketch of the bonus follows.
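A hedged sketch of a count-based bonus over features; the feature map `phi`, the bonus form `beta / sqrt(n)`, and the constants are simplifications, not the paper's exact construction:

```python
from collections import defaultdict
import numpy as np

counts = defaultdict(int)   # visit counts per feature key
BETA = 0.1                  # bonus scale (assumed)

def phi(state):
    """Stand-in for a learned feature map: coarse discretization so that
    nearby states share a feature key (this is where the bonus
    generalises across similar states)."""
    return tuple(np.round(np.asarray(state), 1))

def exploration_bonus(state):
    key = phi(state)
    counts[key] += 1
    return BETA / np.sqrt(counts[key])   # optimism decays with visits

# the agent learns from environment reward plus novelty bonus
r_total = 1.0 + exploration_bonus([0.31, -0.72])
print(r_total)
```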
Noisy Networks for Exploration (https://arxiv.org/abs/1706.10295) – exploration driven by learnable (trained with SGD) noise added to the network parameters. The results look remarkable; a sketch of a noisy layer follows.
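A minimal PyTorch sketch of a noisy layer in the spirit of the paper (factorised Gaussian noise, weight = mu + sigma * eps); hyperparameters and initialisation are simplified:

```python
import math
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer whose weights carry learnable noise: both mu and
    sigma are trained with SGD, so the network learns how much to explore."""
    def __init__(self, in_f, out_f, sigma0=0.5):
        super().__init__()
        bound = 1 / math.sqrt(in_f)
        self.w_mu = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.w_sigma = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
        self.b_mu = nn.Parameter(torch.zeros(out_f))
        self.b_sigma = nn.Parameter(torch.full((out_f,), sigma0 * bound))
        self.in_f, self.out_f = in_f, out_f

    @staticmethod
    def _f(x):                       # noise scaling used in the paper
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        eps_in = self._f(torch.randn(self.in_f))
        eps_out = self._f(torch.randn(self.out_f))
        w = self.w_mu + self.w_sigma * torch.outer(eps_out, eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return x @ w.t() + b

layer = NoisyLinear(4, 2)
print(layer(torch.zeros(1, 4)))      # output varies with each noise sample
```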
Teacher-Student Curriculum Learning (https://arxiv.org/abs/1707.00183) – a smart, adaptive curriculum built from two separate models: a teacher that keeps proposing the tasks on which the student is currently learning fastest (sketched below).
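A hedged sketch of the teacher side only; the paper evaluates several bandit-style teachers, and this epsilon-greedy variant over absolute learning progress is just the simplest flavour, with made-up tasks:

```python
import numpy as np

N_TASKS = 3
history = [[] for _ in range(N_TASKS)]    # recent scores per task

def learning_progress(scores, window=10):
    """Absolute slope of recent scores: fast improvement and fast
    forgetting both make a task worth practising."""
    s = scores[-window:]
    if len(s) < 2:
        return 1.0                         # force initial exploration
    return abs(np.polyfit(np.arange(len(s)), s, 1)[0])

def pick_task(eps=0.1):
    if np.random.rand() < eps:             # keep progress estimates fresh
        return np.random.randint(N_TASKS)
    return int(np.argmax([learning_progress(h) for h in history]))

for step in range(200):
    t = pick_task()
    score = np.random.rand() + 0.01 * step * (t == 1)   # fake: only task 1 improves
    history[t].append(score)
print([len(h) for h in history])           # task 1 ends up practised most
```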
Hindsight Experience Replay (https://arxiv.org/abs/1707.01495) – an implicit curriculum that makes very sparse rewards tractable by replaying episodes as if the goal had been what the agent actually achieved; see the sketch below.
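A hedged sketch of the "final" relabelling strategy; the transition format and `achieved_goal` extractor are invented for illustration:

```python
def achieved_goal(state):
    """Stand-in: in a real task this would extract, e.g., the object's
    position from the state."""
    return round(state, 1)

def her_relabel(episode):
    """episode: list of (state, action, next_state, goal) transitions.
    Returns extra transitions where the desired goal is replaced by the
    goal actually achieved at the end of the episode, so some replayed
    transitions are guaranteed to carry reward signal."""
    achieved = achieved_goal(episode[-1][2])    # what the agent really reached
    relabelled = []
    for s, a, s2, _g in episode:
        r = 0.0 if achieved_goal(s2) == achieved else -1.0   # sparse reward
        relabelled.append((s, a, s2, achieved, r))
    return relabelled

episode = [(0.00, +1, 0.07, 0.9), (0.07, +1, 0.12, 0.9)]
print(her_relabel(episode))   # goal 0.9 replaced by the achieved 0.1
```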
Observational Learning by Reinforcement Learning (https://arxiv.org/abs/1706.06617) – agents learn behaviours by observing other agents act in the environment.
Programmable Agents (https://arxiv.org/abs/1706.06383) – agents are given a program expressed in a formal language, learn to map the language's terms onto their perceptions, and finally become able to generalise to unseen terms and unseen circumstances.
Uncertainty Decomposition in Bayesian Neural Networks with Latent Variables (https://arxiv.org/abs/1706.08495) – separates the two fundamental components of predictive uncertainty (epistemic and aleatoric) and proposes a novel risk-sensitive objective for safe reinforcement learning; the decomposition is sketched below.
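A hedged toy sketch of the standard Monte Carlo decomposition (law of total variance); the paper works with BNNs with latent variables, while here a tiny ensemble stands in for posterior samples:

```python
import numpy as np

def predict(model, x):
    """Stand-in: each posterior sample returns (mean, variance) of p(y|x)."""
    w = model
    return w * x, 0.1 + 0.05 * x**2        # toy heteroscedastic noise model

models = [0.9, 1.0, 1.1]                    # toy "posterior samples" of weights
x = 2.0
means, variances = zip(*(predict(m, x) for m in models))

aleatoric = np.mean(variances)   # irreducible data noise (won't shrink with data)
epistemic = np.var(means)        # disagreement between posterior samples
print(aleatoric, epistemic, aleatoric + epistemic)   # total predictive variance
```

A risk-sensitive agent can then treat the two terms differently, e.g. avoid epistemic uncertainty while not being penalised for pure aleatoric noise.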
Gated-Attention Architectures for Task-Oriented Language Grounding (https://arxiv.org/abs/1706.07230) – an end-to-end trainable neural architecture for reinforcement learning that takes natural-language instructions as input; the gating mechanism is sketched below.
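A hedged PyTorch sketch of the gating idea: the instruction embedding is squashed into per-channel gates that multiplicatively attend over the visual feature maps (dimensions invented):

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    """Hadamard-product attention: each conv channel is scaled by a gate
    in [0, 1] computed from the instruction embedding."""
    def __init__(self, instr_dim, n_channels):
        super().__init__()
        self.gate = nn.Linear(instr_dim, n_channels)

    def forward(self, conv_feats, instr_emb):
        # conv_feats: (B, C, H, W); instr_emb: (B, instr_dim)
        g = torch.sigmoid(self.gate(instr_emb))     # (B, C) gates
        return conv_feats * g[:, :, None, None]     # broadcast over H, W

ga = GatedAttention(instr_dim=32, n_channels=64)
out = ga(torch.randn(2, 64, 7, 7), torch.randn(2, 32))
print(out.shape)   # torch.Size([2, 64, 7, 7])
```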
Constrained Policy Optimization (https://arxiv.org/abs/1705.10528, http://bair.berkeley.edu/blog/2017/07/06/cpo/) – one of the first works devoted to keeping the actions generated by a policy safe while that policy is being optimized; a simplified sketch of the constrained objective follows.
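CPO itself solves a trust-region problem with an explicit constraint; as a simpler illustration of the underlying constrained objective (maximize J(pi) subject to J_C(pi) <= d), here is a hedged Lagrangian-relaxation sketch, a different, simpler algorithm for the same problem:

```python
import torch

lam = torch.zeros(1, requires_grad=True)   # Lagrange multiplier, kept >= 0
lam_opt = torch.optim.SGD([lam], lr=1e-2)
COST_LIMIT = 25.0                          # d: allowed expected cost (assumed)

def update(reward_surrogate, cost_surrogate, measured_cost, policy_opt):
    """reward_surrogate / cost_surrogate: differentiable policy-gradient
    surrogates for J(pi) and J_C(pi); measured_cost: float estimate of
    the current expected episode cost."""
    # policy step: minimize -J(pi) + lambda * J_C(pi)
    loss = -reward_surrogate + lam.detach() * cost_surrogate
    policy_opt.zero_grad(); loss.backward(); policy_opt.step()
    # multiplier step: lambda rises while the constraint is violated,
    # falls back toward zero once the policy is safely inside the limit
    lam_opt.zero_grad()
    (-lam * (measured_cost - COST_LIMIT)).backward()
    lam_opt.step()
    with torch.no_grad():
        lam.clamp_(min=0.0)
```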
Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics (https://arxiv.org/abs/1706.04317) – efficient training and remarkable policy transfer between task variations, achieved by modelling causality explicitly in RL.
End-to-End Learning of Semantic Grasping (https://arxiv.org/abs/1707.01932) – an early end-to-end algorithm for robotic grasping in which the arm is guided by a user-specified class of the desired object.
Emergence of Locomotion Behaviours in Rich Environments (https://arxiv.org/abs/1707.02286, https://deepmind.com/blog/producing-flexible-behaviours-simulated-environments/) – a rich environment helps promote the learning of complex behaviour; in particular, a novel scalable variant of policy gradients (distributed PPO) lets agents learn very complex behaviours guided by a simple reward (the distance travelled); a sketch of the underlying PPO objective follows.
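The paper's distributed variant parallelises data collection and optimisation; the objective underneath is the clipped PPO surrogate, sketched here with invented numbers:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """logp_new / logp_old: log pi(a|s) under the current / behaviour
    policy; advantages: advantage estimates for the sampled actions."""
    ratio = torch.exp(logp_new - logp_old)             # importance weight
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()       # maximize the surrogate

loss = ppo_clip_loss(torch.tensor([-1.0, -0.5]),
                     torch.tensor([-1.1, -0.7]),
                     torch.tensor([0.5, -0.3]))
print(loss)   # clipping stops the new policy from straying too far
```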
Learning Human Behaviours from Motion Capture by Adversarial Imitation (https://arxiv.org/abs/1707.02201, https://deepmind.com/blog/producing-flexible-behaviours-simulated-environments/) – an adversarial approach to learning humanlike movement patterns from limited demonstrations that contain only partially observed state features, without access to actions, even when the demonstrations come from a body with different and unknown physical parameters; the imitation signal is sketched below.
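A hedged sketch of the adversarial signal: because the mocap demonstrations contain no actions, the discriminator sees only state features, and its confusion becomes the imitator's reward (network sizes invented):

```python
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 1))

def imitation_reward(state_feats):
    """GAIL-style reward -log(1 - D(s)): high where the discriminator
    believes the agent's states look like demonstration states."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state_feats))
        return -torch.log(1.0 - d + 1e-8)

def discriminator_loss(demo_feats, agent_feats):
    """Train D to label demonstration states 1 and agent states 0."""
    bce = nn.BCEWithLogitsLoss()
    return (bce(disc(demo_feats), torch.ones(len(demo_feats), 1)) +
            bce(disc(agent_feats), torch.zeros(len(agent_feats), 1)))

print(imitation_reward(torch.randn(4, 16)).shape)   # one reward per state
```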
Robust Imitation of Diverse Behaviours (https://deepmind.com/documents/95/diverse_arxiv.pdf, https://deepmind.com/blog/producing-flexible-behaviours-simulated-environments/) – the proposed model is a new type of variational autoencoder over demonstration trajectories that learns semantic policy embeddings, making imitation less sensitive to discrepancies between training and test data and avoiding the mode collapse that GAN-based imitation suffers from; the core idea is sketched below.
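A hedged sketch of the core idea: encode a whole demonstration trajectory into an embedding z, then condition the imitation policy on z, so distinct behaviours keep distinct embeddings instead of collapsing to one mode (architecture and dimensions invented):

```python
import torch
import torch.nn as nn

class TrajectoryVAE(nn.Module):
    def __init__(self, obs_dim=10, act_dim=3, z_dim=8, hidden=64):
        super().__init__()
        self.enc = nn.GRU(obs_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, z_dim)
        self.to_logvar = nn.Linear(hidden, z_dim)
        # decoder is a policy: actions given (observation, embedding z)
        self.policy = nn.Sequential(nn.Linear(obs_dim + z_dim, hidden),
                                    nn.Tanh(), nn.Linear(hidden, act_dim))

    def forward(self, traj_obs):                      # (B, T, obs_dim)
        _, h = self.enc(traj_obs)                     # summary of the trajectory
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparametrisation
        z_rep = z[:, None, :].expand(-1, traj_obs.size(1), -1)
        actions = self.policy(torch.cat([traj_obs, z_rep], dim=-1))
        return actions, mu, logvar    # reconstruction + KL losses computed elsewhere

model = TrajectoryVAE()
acts, mu, logvar = model(torch.randn(2, 20, 10))
print(acts.shape)   # torch.Size([2, 20, 3])
```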