Find an error with your paper? Please log in to CMT to fix any errors. Fixes will eventually be propagated here.

Orals | Spotlights | Posters

Orals

"How hard is my MDP?" The distribution-norm to the rescue

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$. In many problems, a good approximation o
2014/09/18