BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
Belief Reward Shaping in Reinforcement Learning
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward
Deep Reinforcement Learning that Matters
Distributional Reinforcement Learning with Quantile Regression
Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
Feature Engineering for Predictive Modeling using Reinforcement Learning
Large Scaled Relation Extraction with Reinforcement Learning
Learning Structured Representation for Text Classification via Reinforcement Learning
Learning to Extract Coherent Summary via Deep Reinforcement Learning
MathDQN: Solving Arithmetic Word Problems via Deep Reinforcement Learning
Multi-Step Reinforcement Learning: A Unifying Algorithm
OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning
PAC Reinforcement Learning with an Imperfect Model
Personalizing a Dialogue System with Transfer Reinforcement Learning
Phase-Parametric Policies for Reinforcement Learning in Cyclic Environments
R^3: Reinforced Ranker-Reader for Open-Domain Question Answering
Rainbow: Combining Improvements in Deep Reinforcement Learning
Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition
Reinforced Multi-label Image Classification by Exploring Curriculum
Reinforcement Learning for Relation Classification from Noisy Data
Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets
Safe Reinforcement Learning via Shielding
Safe Reinforcement Learning via Formal Methods: Toward Safe Control Through Proof and Learning
Teaching a Machine to Read Maps with Deep Reinforcement Learning
Toward Deep Reinforcement Learning without a Simulator: An Autonomous Steering Example
Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces
Diverse Exploration for Fast and Safe Policy Improvement
Expected Policy Gradients
Hierarchical Policy Search via Return-Weighted Density Estimation
Knowledge-Based Policies for Qualitative Decentralized POMDPs
Learning with Options that Terminate Off-Policy
Optimizing Interventions via Offline Policy Evaluation: Studies in Citizen Science
Privacy-Preserving Policy Iteration for Decentralized POMDPs
An Optimal Online Method of Selecting Source Policies for Reinforcement Learning
Deep Q-learning from Demonstrations