Path exploration with random network distillation on multi-agent reinforcement learning

Korawat Charoenpitaks

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/70347

Title:	Path exploration with random network distillation on multi-agent reinforcement learning
Other Titles:	การสำรวจเส้นทางด้วยการกลั่นตัวโครงข่ายแบบสุ่มบนการเรียนรู้เสริมกำลังหลายตัวแทน
Authors:	Korawat Charoenpitaks
Advisors:	Yachai Limpiyakorn
Other author:	Chulalongkorn University. Faculty of Engineering
Advisor's Email:	Yachai.L@Chula.ac.th
Subjects:	Reinforcement learning; Machine learning; การเรียนรู้แบบเสริมแรง; การเรียนรู้ของเครื่อง
Issue Date:	2019
Publisher:	Chulalongkorn University
Abstract:	Intrinsic motivation is a promising candidate for improving the performance of reinforcement learning algorithms in complex environments. The method enhances exploration capability without explicit guidance from the designer and works in any environment. This makes it suitable for multi-agent reinforcement learning, where the environment is more complex than usual. This research presents an exploration model that uses intrinsic motivation built from the random network distillation algorithm to improve the performance of multi-agent reinforcement learning, and compares it with the benchmark in different scenarios. The concept of a clipping ratio is introduced to enforce a limit on the optimization magnitude. Based on the extrinsic reward, the limit in the form of a clipping ratio truncates excessive magnitudes that may destabilize the optimization. The experiments were carried out on two different multi-agent architectures: 1) the Individual Intrinsic Motivation Architecture, and 2) the Centralized Intrinsic Motivation Architecture. The experimental results showed that, in very complex environments, the Centralized Intrinsic Motivation Architecture combined with a small clipping ratio could improve performance. Both architectures achieved a win rate of up to 70%, which is higher than the benchmark's best of 43% in the 2s3z environment.
Other Abstract:	Intrinsic motivation is a promising option for increasing the capability of reinforcement learning algorithms in complex environments. The method extends exploration capability without relying on explicit values from the designer, and it can be applied generally to any environment. This makes it suitable for multi-agent reinforcement learning, where the environment is more complex than usual. This research proposes an exploration model that uses intrinsic motivation from the random network distillation algorithm to improve the performance of multi-agent reinforcement learning, and compares the results with benchmark results in several environments. The researcher also introduces the concept of a clipping ratio to enforce a limit on the optimization magnitude, based on a ratio derived from the extrinsic reward. Using the clipping ratio truncates excess magnitude that may make the optimization unstable. The experiments were carried out on two different multi-agent architectures: the Individual Intrinsic Motivation Architecture and the Centralized Intrinsic Motivation Architecture. The experimental results show that, in very complex environments, the Centralized Intrinsic Motivation Architecture combined with a small clipping ratio improves performance more than usual, reaching a win rate of up to 70% in both architectures, which is higher than the best rate of 43% for the benchmark in other research tested on the 2s3z environment.
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2019
Degree Name:	Master of Science
Degree Level:	Master’s Degree
Degree Discipline:	Computer Science
URI:	http://cuir.car.chula.ac.th/handle/123456789/70347
URI:	http://doi.org/10.58837/CHULA.THE.2019.162
DOI:	10.58837/CHULA.THE.2019.162
Type:	Thesis
Appears in Collections:	Eng - Theses
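
The abstract describes an intrinsic reward derived from random network distillation and a clipping ratio that bounds it relative to the extrinsic reward. The record does not give the thesis's formulas, so the following is only a minimal sketch under stated assumptions: the class RandomNetworkDistillation, the function clip_intrinsic, the linear predictor, and the rule "cap the intrinsic reward at clip_ratio times an extrinsic reward scale" are all hypothetical names and choices made for illustration, not the author's implementation.

```python
import numpy as np


class RandomNetworkDistillation:
    """Toy RND module: a frozen random target and a trainable linear predictor."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen target network: random weights that are never updated.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor: trained to imitate the target on visited observations.
        self.w_pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def _target(self, obs):
        return np.tanh(obs @ self.w_target)

    def intrinsic_reward(self, obs):
        """Mean squared prediction error per observation (novelty signal)."""
        err = obs @ self.w_pred - self._target(obs)
        return np.mean(err ** 2, axis=-1)

    def update(self, obs):
        """One gradient-descent step on the predictor's mean squared error."""
        err = obs @ self.w_pred - self._target(obs)   # shape (batch, feat_dim)
        grad = 2.0 * obs.T @ err / err.size           # d(mean(err^2)) / d(w_pred)
        self.w_pred -= self.lr * grad


def clip_intrinsic(r_int, r_ext_scale, clip_ratio):
    """Assumed clipping rule: cap intrinsic reward at clip_ratio * |r_ext_scale|."""
    limit = clip_ratio * abs(r_ext_scale)
    return np.clip(r_int, 0.0, limit)


# Usage: observations seen repeatedly become less "novel" as the predictor fits them.
observations = np.random.default_rng(1).normal(size=(32, 8))
rnd = RandomNetworkDistillation(obs_dim=8)
novelty_before = rnd.intrinsic_reward(observations).mean()
for _ in range(200):
    rnd.update(observations)
novelty_after = rnd.intrinsic_reward(observations).mean()
bonus = clip_intrinsic(rnd.intrinsic_reward(observations), r_ext_scale=1.0, clip_ratio=0.1)
```

Under one possible reading of the two architectures, an individual variant would keep one such module per agent on that agent's local observation, while a centralized variant would keep a single module on the concatenated observations of all agents. That mapping is likewise an assumption of this sketch.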

Files in This Item:

File	Description	Size	Format
6170903021.pdf		3.38 MB	Adobe PDF
