An approach for Thai chatbot construction using Siamese long short-term memory and text data-augmentation

ธนัญญา พีรพัฒนาการ

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/70355

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	บุญเสริม กิจศิริกุล	-
dc.contributor.author	ธนัญญา พีรพัฒนาการ	-
dc.contributor.other	Chulalongkorn University. Faculty of Engineering	-
dc.date.accessioned	2020-11-11T13:54:55Z	-
dc.date.available	2020-11-11T13:54:55Z	-
dc.date.issued	2562	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/70355	-
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2562 B.E. (2019 C.E.)	-
dc.description.abstract	Using chatbots to answer frequently asked questions from service users, such as general inquiries about a service provider, has become increasingly popular. In machine learning for chatbot construction, the dataset used to train the model is another important factor in making the model work effectively. This research was supported with data from the Metropolitan Electricity Authority of Thailand, which collected records of its customer question-answering service on social media channels. The resulting set contains fewer than 1,500 questions, so the limited quantity and diversity of the data directly affect machine learning. This research therefore proposes a data-augmentation approach that replaces a word with a word of similar meaning, chosen as the word whose vector has the smallest distance to the vector of the word being replaced in the original sentence, in order to increase the quantity and diversity of the data. The augmented dataset is then applied to a Long Short-Term Memory (LSTM) model combined with a distance measure, and three vector measures are tested: Euclidean Distance, Manhattan Distance, and Cosine Similarity. The model is used to retrieve the answer to a question received from a user. The experimental results show that the dataset improved with the proposed text data-augmentation method increases model performance compared with the original dataset.	-
dc.description.abstractalternative	The idea of using a chatbot is to provide answers to common questions. When training a chatbot, the training dataset is an important factor in helping the model learn and make accurate predictions. In this research, the question-answering dataset used for training and evaluating the system was provided by the Metropolitan Electricity Authority (MEA, กฟน.). The dataset contains fewer than 1,500 questions, which is small, and a small dataset often leads to poor model performance. This thesis presents a text data-augmentation method for increasing the amount of textual data. The approach creates new, diverse questions by using cosine similarity to find a word similar to a word in the original question and substituting it at the same position in the sequence. The model is a Siamese Long Short-Term Memory network combined with a distance-based similarity measure. For evaluation, three measures were compared, namely Euclidean Distance, Manhattan Distance, and Cosine Similarity, to find the most effective model. The experimental results show that training on the augmented dataset improves the performance of the learned model.	-
dc.language.iso	th	-
dc.publisher	Chulalongkorn University	-
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2019.1134	-
dc.rights	Chulalongkorn University	-
dc.subject	Machine learning	-
dc.subject	Robots -- Design	-
dc.subject	Short-term memory	-
dc.subject.classification	Engineering	-
dc.title	An approach for Thai chatbot construction using Siamese long short-term memory and text data-augmentation	-
dc.title.alternative	An approach for Thai chatbot construction using Siamese long short-term memory and text data-augmentation	-
dc.type	Thesis	-
dc.degree.name	Master of Science	-
dc.degree.level	Master's Degree	-
dc.degree.discipline	Computer Science	-
dc.degree.grantor	Chulalongkorn University	-
dc.email.advisor	Boonserm.K@Chula.ac.th	-
dc.identifier.DOI	10.58837/CHULA.THE.2019.1134	-
Appears in Collections:	Eng - Theses
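
The abstract describes two vector-based steps: replacing a word with its nearest neighbor in an embedding space, and comparing vectors with Euclidean distance, Manhattan distance, or cosine similarity. The following is a minimal sketch of those two ideas only, not the thesis implementation. The toy vocabulary, the three-dimensional vectors, and the function names (`nearest_word`, `augment`) are hypothetical and chosen for illustration; the Siamese LSTM encoder that would produce sentence vectors is not shown.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption): a real system would load
# pretrained word vectors for the target language instead.
TOY_VECTORS = {
    "bill": [0.9, 0.1, 0.0],
    "invoice": [0.8, 0.2, 0.1],
    "outage": [0.0, 0.9, 0.3],
    "blackout": [0.1, 0.8, 0.4],
    "pay": [0.5, 0.0, 0.9],
}


def nearest_word(word, vectors):
    """Return the other vocabulary word most similar to `word` by cosine."""
    target = vectors[word]
    candidates = [w for w in vectors if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


def augment(tokens, position, vectors):
    """Copy `tokens` and replace the token at `position` with its nearest word."""
    new_tokens = list(tokens)
    new_tokens[position] = nearest_word(tokens[position], vectors)
    return new_tokens


question = ["pay", "bill"]
augmented = augment(question, 1, TOY_VECTORS)
```

In this sketch, each call to `augment` yields one new variant of a question. In a retrieval setting, the same three measures could be applied to the sentence vectors produced by the two branches of a Siamese network to score how close a user question is to a stored question.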

Files in This Item:

File	Description	Size	Format
6170930921.pdf		2.55 MB	Adobe PDF
