Multi-agent deep reinforcement learning for cryptocurrency trading

Kittiwin Kumlungmak

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/83147

Title:	Multi-agent deep reinforcement learning for cryptocurrency trading
Other Titles:	Multi-agent deep reinforcement learning for cryptocurrency trading (translated from the Thai-language title)
Authors:	Kittiwin Kumlungmak
Advisors:	Peerapon Vateekul
Other author:	Chulalongkorn University. Faculty of Engineering
Issue Date:	2022
Publisher:	Chulalongkorn University
Abstract:	Reinforcement learning has emerged as a promising approach for improving profitability in cryptocurrency trading. However, the market's inherent volatility, especially during bearish periods, poses significant challenges. Existing work addresses this with single-agent techniques such as deep Q-network (DQN), advantage actor-critic (A2C), and proximal policy optimization (PPO), or with ensembles of them. Even so, the mechanisms these methods use to limit losses in bearish cryptocurrency markets are not robust, so the performance of reinforcement learning for cryptocurrency trading remains limited. To overcome this limitation, we present a novel cryptocurrency trading method based on multi-agent proximal policy optimization (MAPPO). Our approach uses a collaborative multi-agent scheme and a local-global reward function to optimize both individual and collective agent performance. Combining a multi-objective optimization technique with a multi-scale continuous loss (MSCL) reward, we train the agents with a progressive penalty mechanism designed to prevent consecutive losses of portfolio value. Evaluated against multiple baselines, our method achieves higher cumulative returns. Its advantage is clearest on the bearish test set, where it is the only approach to yield a profit: it achieves a cumulative return of 2.36%, while all baseline methods produce negative cumulative returns. Compared with FinRL-Ensemble, a reinforcement learning-based method, our approach achieves a 46.05% greater cumulative return on the bullish test set.
Other Abstract:	(Translated from Thai) Reinforcement learning has been applied to increase profits in cryptocurrency trading. However, market volatility, especially during bearish periods, has become a major obstacle in this area. Existing research attempts to address this problem with Deep Q-Network (DQN), Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO), or with ensembles of these techniques. However, the mechanisms used to reduce losses in bearish cryptocurrency markets are still not as effective as they should be, so the performance of reinforcement learning methods for cryptocurrency trading remains limited. To overcome this limitation, we present a new cryptocurrency trading technique that uses multi-agent learning and a local-global reward function to improve the collaborative performance of all agents and the performance of each individual agent at the same time. In addition, we use a multi-objective optimization technique and a penalty for consecutive losses, which we call the Multi-Scale Continuous Loss (MSCL) reward and which we adapted from a progressive penalty, to prevent consecutive losses of portfolio value. In evaluating the proposed method, we compared it with other popular techniques and found that our technique achieves a higher cumulative return. In particular, in the bearish market only our method generated a profit, with a cumulative return of 2.36%, while all other compared methods incurred losses. Compared with FinRL-Ensemble, a reinforcement learning-based method, our method achieved a cumulative return up to 46.05% higher in the bullish market.
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2022
Degree Name:	Master of Science
Degree Level:	Master's Degree
Degree Discipline:	Computer Science
URI:	https://cuir.car.chula.ac.th/handle/123456789/83147
URI:	http://doi.org/10.58837/CHULA.THE.2022.95
DOI:	10.58837/CHULA.THE.2022.95
Type:	Thesis
Appears in Collections:	Eng - Theses
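The results above are reported as cumulative returns, but the record does not define the metric. A common definition is the final portfolio value divided by the initial portfolio value, minus one. The minimal Python sketch below assumes that definition; the function name `cumulative_return` and the value series are hypothetical illustrations, not code or data from the thesis.

```python
from typing import Sequence


def cumulative_return(portfolio_values: Sequence[float]) -> float:
    """Fractional cumulative return: final value / initial value - 1.

    Assumed definition for illustration; the thesis record does not state
    the exact formula it uses.
    """
    if len(portfolio_values) < 2:
        raise ValueError("need at least an initial and a final portfolio value")
    initial, final = portfolio_values[0], portfolio_values[-1]
    if initial <= 0:
        raise ValueError("initial portfolio value must be positive")
    return final / initial - 1.0


# Hypothetical portfolio values over a test period (not thesis data).
values = [10_000.0, 9_700.0, 9_900.0, 10_500.0]
print(f"{cumulative_return(values):.2%}")  # 5.00%
```

Under this definition only the endpoints matter, so an intermediate drawdown (9,700 here) does not change the result; that is one reason a separate penalty on consecutive losses can be useful during training.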
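The abstract states that the agents are trained with a progressive penalty that discourages consecutive losses of portfolio value, but it does not give the MSCL formula. As a rough sketch under assumed details, a progressive penalty can subtract an amount that grows geometrically with the length of the current losing streak. `progressive_penalty_rewards`, `base_penalty`, and `growth` are hypothetical names and values, and the single streak counter here is a simplification, not the thesis's multi-scale reward.

```python
from typing import List, Sequence


def progressive_penalty_rewards(
    portfolio_values: Sequence[float],
    base_penalty: float = 0.001,  # hypothetical penalty for the first losing step
    growth: float = 2.0,          # hypothetical factor applied per extra losing step
) -> List[float]:
    """Per-step reward = step return minus a penalty that grows with each
    consecutive losing step (hypothetical progressive-penalty shaping)."""
    rewards: List[float] = []
    streak = 0  # number of consecutive steps in which portfolio value fell
    for prev, curr in zip(portfolio_values, portfolio_values[1:]):
        if prev <= 0:
            raise ValueError("portfolio values must be positive")
        step_return = curr / prev - 1.0
        if step_return < 0:
            streak += 1
            # Penalty sequence: base, base*growth, base*growth**2, ...
            penalty = base_penalty * growth ** (streak - 1)
        else:
            streak = 0  # any non-losing step resets the streak
            penalty = 0.0
        rewards.append(step_return - penalty)
    return rewards


rewards = progressive_penalty_rewards([100.0, 99.0, 98.0, 99.0])
print([round(r, 4) for r in rewards])  # [-0.011, -0.0121, 0.0102]
```

The second loss is penalized twice as heavily as the first, so a policy that lets losses run accumulates an increasingly negative reward, while a single isolated loss costs little extra.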
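The local-global reward function is likewise described only at a high level. One common way to balance individual and collective performance is to give each agent a weighted mix of its own reward and a shared team reward. The sketch below assumes that form and takes the team (global) reward to be the mean of the agents' local rewards; the names `local_global_rewards` and `weight` are hypothetical, and this is not the formula used in the thesis.

```python
from typing import List, Sequence


def local_global_rewards(local_rewards: Sequence[float], weight: float = 0.5) -> List[float]:
    """Blend each agent's own (local) reward with a shared (global) reward.

    Assumption: the global reward is the mean of the local rewards, and
    `weight` sets how much each agent values its own outcome.
    """
    if not local_rewards:
        raise ValueError("need at least one agent")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    global_reward = sum(local_rewards) / len(local_rewards)
    return [weight * r + (1.0 - weight) * global_reward for r in local_rewards]


# Three hypothetical agents with different step rewards.
blended = local_global_rewards([0.02, -0.01, 0.05], weight=0.5)
print([round(r, 4) for r in blended])  # [0.02, 0.005, 0.035]
```

With `weight=1.0` each agent optimizes only its own reward; with `weight=0.0` all agents share the team reward. Intermediate values let a losing agent still benefit when the team does well, which encourages collaboration.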

Files in This Item:

File	Description	Size	Format
6470140221.pdf		2.39 MB	Adobe PDF
