Lightly-supervised learning methods for one-class text classification

Yiping Jin

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/73587

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Dittaya Wanvarie	-
dc.contributor.author	Yiping Jin	-
dc.contributor.other	Chulalongkorn University. Faculty of Science	-
dc.date.accessioned	2021-05-28T06:12:15Z	-
dc.date.available	2021-05-28T06:12:15Z	-
dc.date.issued	2018	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/73587	-
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2018	en_US
dc.description.abstract	This thesis introduces a lightly-supervised learning method for training text classifiers with very little manual labelling effort. We adapt two previous state-of-the-art lightly-supervised models, generalized expectation (GE) criteria (Druck et al., 2008) and multinomial naïve Bayes (MNB) with priors (Settles, 2011), to the one-class classification problem. Users only need to label a handful of keywords for the target category. We also combine the two models by letting MNB automatically augment the list of GE constraints. In addition, we ensemble the two families of classifiers to further improve accuracy. We applied our model to a real-world online advertising problem. On a corpus of online advertising data, the proposed model achieved the top macro-averaged F₁ of 0.69 and closed 50% of the gap between the previous state-of-the-art lightly-supervised models and a fully-supervised MaxEnt model.	en_US
dc.description.abstractalternative	This thesis presents a lightly-supervised learning method for building text classifiers that requires only a small amount of class labelling. We adapt two recent lightly-supervised models, generalized expectation criteria (GE criteria) (Druck et al., 2008) and Multinomial Naive Bayes (MNB) with prior knowledge (Settles, 2011), to the one-class classification problem. Users only need to provide keywords for the target class. We combine the two methods by letting MNB augment the list of GE constraints. In addition, we combine the outputs of the two classifiers to further improve accuracy. We applied the proposed model to online advertising, a real-world problem. On a corpus of online advertising text, the proposed model achieved a macro-averaged F₁ of 0.69, closing 50% of the gap between the previous lightly-supervised models and a maximum entropy (MaxEnt) classifier trained with full supervision.	en_US
dc.language.iso	en	en_US
dc.publisher	Chulalongkorn University	en_US
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2018.162	-
dc.rights	Chulalongkorn University	en_US
dc.title	Lightly-supervised learning methods for one-class text classification	en_US
dc.title.alternative	Lightly-supervised learning methods for single-class text categorization	en_US
dc.type	Thesis	en_US
dc.degree.name	Master of Science	en_US
dc.degree.level	Master's Degree	en_US
dc.degree.discipline	Computer Science	en_US
dc.degree.grantor	Chulalongkorn University	en_US
dc.email.advisor	Dittaya.W@chula.ac.th	-
dc.identifier.DOI	10.58837/CHULA.THE.2018.162	-
Appears in Collections:	Sci - Theses
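The abstract's core idea, training from a handful of labelled keywords instead of labelled documents, can be illustrated with a minimal sketch of multinomial naïve Bayes with keyword priors in the spirit of Settles (2011). This is not the thesis's implementation. It assumes the one-class task is framed as the target category versus a hypothetical catch-all "other" class, that each labelled keyword adds a `boost` pseudo-count to the target class's word distribution (the value 5 is an arbitrary illustrative choice), and that one EM round over unlabelled documents refines the model. The function names (`estimate`, `posterior`, `train`) and the toy documents are hypothetical, and the GE constraints and classifier ensemble described in the abstract are omitted.

```python
import math
from collections import Counter

# Hypothetical two-way setup (assumption): the target category vs. a catch-all "other" class.
CLASSES = ("target", "other")


def estimate(docs, posteriors, pseudo):
    """M-step: class log-priors and word log-probabilities from soft counts plus Dirichlet pseudo-counts."""
    log_prior, log_lik = {}, {}
    for c in CLASSES:
        mass = sum(p[c] for p in posteriors)  # expected number of documents in class c
        log_prior[c] = math.log((mass + 1.0) / (len(docs) + len(CLASSES)))
        counts = dict(pseudo[c])  # start from the pseudo-counts (the "priors")
        for doc, p in zip(docs, posteriors):
            for w, k in Counter(doc).items():
                counts[w] += p[c] * k  # soft word counts weighted by P(c | doc)
        total = sum(counts.values())
        log_lik[c] = {w: math.log(v / total) for w, v in counts.items()}
    return log_prior, log_lik


def posterior(doc, log_prior, log_lik):
    """E-step for one document: P(class | doc) normalised with log-sum-exp; out-of-vocabulary words are ignored."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in CLASSES
    }
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(scores[c] - top) / z for c in CLASSES}


def train(docs, keywords, boost=5.0, base=1.0, em_iters=1):
    """Fit MNB on unlabelled docs, using only keyword pseudo-counts as supervision."""
    vocab = {w for doc in docs for w in doc} | set(keywords)
    pseudo = {c: {w: base for w in vocab} for c in CLASSES}
    for w in keywords:
        pseudo["target"][w] += boost  # labelled keywords raise P(w | target)
    # No labelled documents: start from uniform posteriors so the pseudo-counts drive the first model.
    posts = [{c: 1.0 / len(CLASSES) for c in CLASSES} for _ in docs]
    model = estimate(docs, posts, pseudo)
    for _ in range(em_iters):
        posts = [posterior(d, *model) for d in docs]  # E-step over unlabelled docs
        model = estimate(docs, posts, pseudo)  # M-step with the same priors
    return model


# Toy unlabelled corpus (hypothetical) and a keyword list for a travel-ads target category.
docs = [
    "cheap flights book hotel deal".split(),
    "hotel booking discount travel".split(),
    "football match score goal".split(),
    "election vote parliament debate".split(),
]
model = train(docs, keywords={"hotel", "flights", "travel"})
p_hotel = posterior("book a hotel".split(), *model)["target"]
p_football = posterior("football match tonight".split(), *model)["target"]
print(f"P(target | hotel ad) = {p_hotel:.2f}, P(target | football) = {p_football:.2f}")
```

The keyword pseudo-counts play the role that labelled documents would play in ordinary supervised MNB, which is why the user only has to supply a keyword list. The headline result can be read in similar terms: closing 50% of the gap means (F₁ of the proposed model − F₁ of the previous lightly-supervised models) / (F₁ of MaxEnt − F₁ of the previous lightly-supervised models) ≈ 0.5; the individual baseline scores are not given in this record.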

Files in This Item:

File	Description	Size	Format
Sci_5972634023_Yiping Jin.pdf		1.04 MB	Adobe PDF
