AI Researchers at the University of Tokyo Developed an Extended Photonic Reinforcement Learning Scheme That Moves from the Static Bandit Problem Towards a More Challenging Dynamic Environment