Shefaye Khatam 2021, 9(4): 51-59
Volume 9, Issue 4 (Autumn 2021)
Developing a Reinforcement Learning Algorithm to Model Pavlovian Approach Bias on Bidirectional Planning
Reza Kakooee, Mohammad Taghi Hamidi Beheshti *, Mehdi Keramati
Department of Control, Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran , mbehesht@modares.ac.ir
Abstract:
Introduction: Decision-making in the human brain is controlled by two mechanisms: the Pavlovian and instrumental learning systems. The Pavlovian system learns stimulus-outcome associations independently of action, a process that manifests as a tendency to approach reward-associated stimuli. The instrumental controller, in contrast, learns action-outcome associations. Instrumental learning is not limited to the outcome of the current action; it may also evaluate a sequence of future actions through forward planning. Forward planning may not be the only planning process available to instrumental learning, however: humans may also use backward planning to evaluate action sequences, although backward planning has received less attention so far. Previous research has shown that, although Pavlovian and instrumental learning are independent, they interact: the Pavlovian approach tendency biases forward planning toward decisions that may be suboptimal from the instrumental perspective. The effect of Pavlovian learning on backward planning has not yet been studied.
Materials and Methods: We designed a navigation experiment that allows forward, backward, and bidirectional planning to be investigated. We also embedded Pavlovian approach cues in the maps to investigate how they bias the three forms of planning.
Results: Statistical analysis of the collected data indicates that backward planning exists and that Pavlovian approach cues bias planning. This bias is stronger in forward planning than in backward planning, and stronger still in bidirectional planning. Within the reinforcement learning framework, we developed a bidirectional planning algorithm that incorporates the Pavlovian approach tendency.
Conclusion: The simulation results are consistent with the experimental results and indicate that the effect of Pavlovian bias can be modeled as pruning of decision trees.
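To make the idea of bias-as-pruning concrete, the following is a minimal illustrative sketch and not the algorithm developed in the paper. It assumes an undirected graph, alternating breadth-first expansion from start and goal, and a hypothetical pruning rule in which a cue-marked successor crowds out all non-cued successors. The names `bidirectional_plan`, `cues`, `prune`, and the toy map are all assumptions introduced here for illustration.

```python
from collections import deque


def neighbors(graph, node, cues, prune):
    """Successors of node. With pruning on, cue-marked successors replace the rest."""
    succ = graph[node]
    if prune:
        cued = [s for s in succ if s in cues]
        if cued:
            return cued  # hypothetical rule: non-cued branches are pruned
    return succ


def expand(graph, frontier, parents, other_parents, cues, prune):
    """Expand one node of one search direction; return the meeting node, if any."""
    node = frontier.popleft()
    for nxt in neighbors(graph, node, cues, prune):
        if nxt in parents:
            continue  # already reached from this direction
        parents[nxt] = node
        if nxt in other_parents:
            return nxt  # the two searches have met
        frontier.append(nxt)
    return None


def join(meet, parents_f, parents_b):
    """Stitch the forward and backward half-paths together at the meeting node."""
    path, node = [], meet
    while node is not None:
        path.append(node)
        node = parents_f[node]
    path.reverse()
    node = parents_b[meet]
    while node is not None:
        path.append(node)
        node = parents_b[node]
    return path


def bidirectional_plan(graph, start, goal, cues=frozenset(), prune=False):
    """Alternate forward and backward expansion until the frontiers meet."""
    if start == goal:
        return [start]
    parents_f, parents_b = {start: None}, {goal: None}
    frontier_f, frontier_b = deque([start]), deque([goal])
    while frontier_f and frontier_b:
        meet = expand(graph, frontier_f, parents_f, parents_b, cues, prune)
        if meet is None:
            meet = expand(graph, frontier_b, parents_b, parents_f, cues, prune)
        if meet is not None:
            return join(meet, parents_f, parents_b)
    return None  # one frontier ran out before the searches met


# Toy map (assumed): a short route S-A-G and a longer route S-C-D-G, with a cue at C.
graph = {
    "S": ["A", "C"],
    "A": ["S", "G"],
    "C": ["S", "D"],
    "D": ["C", "G"],
    "G": ["A", "D"],
}

unbiased = bidirectional_plan(graph, "S", "G")                         # ['S', 'A', 'G']
biased = bidirectional_plan(graph, "S", "G", cues={"C"}, prune=True)   # ['S', 'C', 'D', 'G']
```

In this toy map the cue next to the start prunes the branch through A, so the biased planner returns the longer route through C. This mirrors, in a very simplified way, how an approach tendency could lead to decisions that are suboptimal from the instrumental perspective.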
Keywords: Decision Making, Strategic Planning, Conditioning, Operant, Computer Simulation
Type of Study: Research --- Open Access, CC-BY-NC | Subject: Cognitive Neuroscience




Kakooee R, Hamidi Beheshti MT, Keramati M. Developing a Reinforcement Learning Algorithm to Model Pavlovian Approach Bias on Bidirectional Planning. Shefaye Khatam. 2021; 9(4): 51-59.
URL: http://shefayekhatam.ir/article-1-2232-en.html


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The Neuroscience Journal of Shefaye Khatam