Duplicate-sampling of difficult-to-classify scheme

Parinya Weangsamoot

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/18956

Title:	Duplicate-sampling of difficult-to-classify scheme
Other Titles:	Duplicate-sampling scheme for difficult-to-classify data (แบบแผนการสุ่มตัวอย่างซ้ำของข้อมูลที่ยากต่อการจัดกลุ่ม)
Authors:	Parinya Weangsamoot
Advisors:	Chidchanok Lursinsap; Krung Sinapiromsaran
Other author:	Chulalongkorn University. Faculty of Science
Advisor's Email:	Chidchanok.L@Chula.ac.th; Krung.S@Chula.ac.th
Subjects:	Neural networks (Computer sciences); Back propagation (Artificial intelligence); Machine learning
Issue Date:	2008
Publisher:	Chulalongkorn University
Abstract:	The multilayer perceptron (MLP) network has been applied successfully to many pattern classification problems. The best-known learning algorithm for training an MLP network is the backpropagation (BP) algorithm. One disadvantage of the BP algorithm is its lengthy computational time, and a number of techniques have been developed to address it. Most of those techniques are based on improving the convergence of the network's error. In this thesis, we present another view, based on the distribution of the data and on information gain, to solve the time-consumption problem: the duplicate-sampling of difficult-to-classify data scheme. The proposed technique is designed as an enhancement to be applied with the standard BP algorithm. It uses information gain to split the data space into subspaces through binary splitting of numeric attributes. Data located in impure subspaces are identified as difficult-to-classify data and are duplicated before the BP algorithm starts, so that BP learning places more emphasis on them during training. BP learning with this technique requires less computational time to achieve the same or higher accuracy. Experiments on eight data sets taken from the UCI repository indicate an improvement in computational time, and seven of the eight data sets showed a better result.
Other Abstract:	Artificial neural networks have been applied successfully to data classification problems. The learning method most commonly used to train a neural network is backpropagation. The problem with backpropagation is its long computation time, and various methods have therefore been developed to address it. Most of these methods are based on improving the convergence of the network's error. This research proposes a method for solving the long computation time of backpropagation, called the duplicate-sampling scheme for difficult-to-classify data, which is based on the distribution of the data and on information gain. The proposed method is designed as a tool that makes backpropagation work better. It uses information gain to divide the data domain into subdomains through binary splitting of numeric attributes. Data contained in subdomains where several classes are mixed are called difficult-to-classify data and are duplicated before backpropagation learning begins. While the neural network is trained with backpropagation, learning of the difficult-to-classify data is given more weight. Backpropagation augmented with this method lets the neural network spend less time learning while giving predictions at the same or a higher level. Experimental results on eight data sets from the UCI repository show a reduction in the time used by backpropagation. Seven of the eight data sets showed improved predictive performance of the neural network.
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2008
Degree Name:	Master of Science
Degree Level:	Master's Degree
Degree Discipline:	Computational Science
URI:	http://cuir.car.chula.ac.th/handle/123456789/18956
URI:	http://doi.org/10.14457/CU.the.2008.1844
metadata.dc.identifier.DOI:	10.14457/CU.the.2008.1844
Type:	Thesis
Appears in Collections:	Sci - Theses
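
The preprocessing step described in the abstract (split the data space by information gain on numeric attributes, treat samples in impure subspaces as difficult to classify, and duplicate them before BP training) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the midpoint thresholds, the `max_depth` stopping rule, and the `copies` parameter are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_binary_split(rows, labels, indices):
    """Return (gain, attribute, threshold) with the highest information gain, or None."""
    parent = entropy([labels[i] for i in indices])
    best = None
    for attr in range(len(rows[0])):
        values = sorted({rows[i][attr] for i in indices})
        # Assumed candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [labels[i] for i in indices if rows[i][attr] <= threshold]
            right = [labels[i] for i in indices if rows[i][attr] > threshold]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(indices)
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, attr, threshold)
    return best


def difficult_indices(rows, labels, indices=None, depth=0, max_depth=2):
    """Indices of samples left in impure subspaces once splitting stops."""
    if indices is None:
        indices = list(range(len(rows)))
    if len({labels[i] for i in indices}) <= 1:
        return []  # pure subspace: treated as easy to classify
    # Assumed stopping rule: stop at max_depth or when no split improves purity.
    split = best_binary_split(rows, labels, indices) if depth < max_depth else None
    if split is None or split[0] <= 0:
        return list(indices)  # impure subspace: difficult-to-classify samples
    _, attr, threshold = split
    left = [i for i in indices if rows[i][attr] <= threshold]
    right = [i for i in indices if rows[i][attr] > threshold]
    return (difficult_indices(rows, labels, left, depth + 1, max_depth)
            + difficult_indices(rows, labels, right, depth + 1, max_depth))


def duplicate_difficult(rows, labels, copies=1, max_depth=2):
    """Return the training set with difficult samples appended `copies` extra times."""
    hard = difficult_indices(rows, labels, max_depth=max_depth)
    new_rows = list(rows) + [rows[i] for i in hard] * copies
    new_labels = list(labels) + [labels[i] for i in hard] * copies
    return new_rows, new_labels, hard


if __name__ == "__main__":
    # Toy one-attribute data set: the classes overlap for values above 4.
    rows = [[float(v)] for v in range(1, 9)]
    labels = list("aaaababb")
    new_rows, new_labels, hard = duplicate_difficult(rows, labels, copies=1, max_depth=1)
    print("difficult sample indices:", hard)
    print("training set size:", len(rows), "->", len(new_rows))
```

Under these assumptions, the enlarged training set would then be passed unchanged to a standard BP trainer, so that samples from the overlapping region contribute more weight updates per epoch.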

Files in This Item:

File	Description	Size	Format
Parinya_we.pdf		1.39 MB	Adobe PDF
