In contrast to the learning rate (Delta rule), the Kalman gain varies from trial to trial depending on the current variance of the expected reward’s prior distribution ($\hat{\sigma}^{2,\mathrm{pre}}_{c_t,t}$) and the estimated observation variance ($\hat{\sigma}^2_o$). The observation variance indicates how much the actual rewards vary around the (to be estima...
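
The contrast between a constant learning rate and a trial-varying Kalman gain can be illustrated with a minimal sketch. It assumes the standard scalar Kalman filter, in which the gain is the prior variance divided by the sum of prior and observation variance, and it ignores any between-trial drift of the rewards. The function names (`kalman_gain`, `kalman_update`, `delta_update`) and the numbers in the usage loop are hypothetical and chosen only for illustration; they are not taken from the source.

```python
def kalman_gain(prior_var, obs_var):
    """Scalar Kalman gain: the weight given to the new observation."""
    return prior_var / (prior_var + obs_var)


def kalman_update(prior_mean, prior_var, reward, obs_var):
    """One Bayesian update of the expected reward of the chosen option.

    Returns the posterior mean, the posterior variance, and the gain used.
    """
    gain = kalman_gain(prior_var, obs_var)
    post_mean = prior_mean + gain * (reward - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var, gain


def delta_update(value, reward, learning_rate):
    """Delta rule: same prediction-error form, but with a constant learning rate."""
    return value + learning_rate * (reward - value)


if __name__ == "__main__":
    # Assumed illustrative values: prior mean 0, prior variance 4,
    # observation variance 4, and the same option rewarded three times.
    mean, var, obs_var = 0.0, 4.0, 4.0
    for trial, reward in enumerate([10.0, 10.0, 10.0], start=1):
        mean, var, gain = kalman_update(mean, var, reward, obs_var)
        print(f"trial {trial}: gain={gain:.3f} mean={mean:.3f} var={var:.3f}")
```

In this sketch the gain shrinks as the prior variance shrinks, so later rewards move the estimate less than early ones, whereas `delta_update` weights every prediction error by the same `learning_rate`.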