Identification of non-coding RNAs in Mycobacterium Tuberculosis genome using combined computational approach

Natapol Pornputtapong; Chulalongkorn University. Faculty of Pharmaceutical Sciences; Chulalongkorn University. Department of Biochemistry; Chulalongkorn University. Biomedicinal Chemistry

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/56560

Title:	Identification of non-coding RNAs in Mycobacterium Tuberculosis genome using combined computational approach
Other Titles:	Locating non-coding RNAs in the Mycobacterium tuberculosis genome using several computational methods in combination
Authors:	Natapol Pornputtapong; Chulalongkorn University. Faculty of Pharmaceutical Sciences; Chulalongkorn University. Department of Biochemistry; Chulalongkorn University. Biomedicinal Chemistry
Advisors:	Vipavee P. Hoven; Chinae Thammarongtham
Advisor's Email:	Vipavee.P@Chula.ac.th
Subjects:	Mycobacterium tuberculosis -- Genome mapping; Mycobacterium tuberculosis -- Identification -- Data processing; Non-coding RNA; Computational biology; Drug development
Issue Date:	2008
Publisher:	Chulalongkorn University
Abstract:	Non-coding RNAs (ncRNAs), which are not translated into proteins, are now known to play important roles in cellular processes, including regulatory functions. To identify putative ncRNAs of Mycobacterium tuberculosis, a genome-wide computational screen was applied. Because the efficiency of currently available programs is limited, a combined approach was chosen, which required developing a new workflow. The core program of the workflow, RNAz, was first integrated with TBA. Testing the workflow on the Escherichia coli genome showed, however, that TBA produced a large number of false positives by generating misalignments. To solve this problem, a new genome-wide alignment protocol was developed by combining BLAST search with MAFFT multiple sequence alignment. Evaluated on E. coli ncRNA prediction, the protocol improved the sensitivity of RNAz results from 0.54 to 0.84 and the precision from 0.37 to 0.56, and reduced computation time from over 6 hours to 70 minutes. The protocol was therefore used instead of TBA for M. tuberculosis ncRNA gene identification, yielding 61 predicted loci. According to the M. tuberculosis H37Rv ncRNA annotation, 33 of the predicted loci lay in ncRNA gene regions. The other loci were mapped against promoter and terminator predictions: 22 loci had a transcription signal, and only one locus had a double transcription signal. A sequence similarity search matched 3 loci to two known RNA sequences in the database, ykoK and SRP. The ykoK element is a regulatory element of divalent cation-related genes, and SRP is involved in protein translocation in the cell. The resulting candidate loci were considered putative ncRNAs for further experimental verification.
Other Abstract:	It is now well known that non-coding RNAs play important roles in regulating cellular functions. Locating non-coding RNA genes by laboratory methods is usually difficult, so computational methods have been brought in to help find them. Computational methods still have limitations, however, and some of their results are inaccurate. This research therefore combined several methods into a workflow for finding non-coding RNA genes, bringing together the strengths of the individual methods. The core of the workflow uses the method believed to be the most accurate, prediction of RNA secondary structure, computed by the program RNAz, whose developers recommend using it together with TBA, a genome alignment program. Testing the workflow on the Escherichia coli genome, together with a literature review, showed that TBA produces many false positives and erroneous genome alignments. A new genome alignment program was therefore developed in this research, with BLAST and MAFFT as its main components. In tests on E. coli, it increased the sensitivity of RNAz prediction from 0.54 to 0.84 and the precision from 0.37 to 0.56. The program was then used to predict the locations of non-coding RNA genes in Mycobacterium tuberculosis H37Rv. The prediction found 61 regions in total that are likely to be non-coding RNA loci. Compared with the non-coding RNA genes already known in the M. tuberculosis H37Rv genome, 33 of these regions had already been reported in the M. tuberculosis H37Rv genome data. Combined with the results of promoter and terminator prediction, 22 regions carried a signal, and comparison against databases using BLAST showed that 3 regions matched known RNA sequences: the ykoK RNA, which regulates the expression of genes related to divalent (+2) metal ions, and the SRP RNA, which is involved in protein transport within the cell. All of the predicted regions are of interest for further laboratory study.
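The sensitivity and precision figures quoted in the abstract compare predicted loci against annotated ncRNA genes. The record does not say how a match was counted, so the following is only a minimal sketch, assuming that a prediction counts as a hit when it overlaps an annotated gene by at least one base. The function names and the toy coordinates are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive genome coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def sensitivity_precision(predicted: List[Interval],
                          annotated: List[Interval]) -> Tuple[float, float]:
    """Sensitivity = annotated genes recovered / all annotated genes.
    Precision   = predictions hitting a gene / all predictions.
    The one-base overlap criterion is an assumption of this sketch."""
    found = sum(any(overlaps(g, p) for p in predicted) for g in annotated)
    correct = sum(any(overlaps(p, g) for g in annotated) for p in predicted)
    sens = found / len(annotated) if annotated else 0.0
    prec = correct / len(predicted) if predicted else 0.0
    return sens, prec


# Made-up coordinates purely for illustration (not thesis data).
annotated = [(100, 200), (500, 600), (900, 950), (1500, 1600)]
predicted = [(150, 220), (580, 640), (700, 760), (1200, 1300)]

sens, prec = sensitivity_precision(predicted, annotated)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

A stricter criterion, such as requiring a minimum fraction of the gene to be covered, would change both numbers, so values computed this way are comparable only when the same matching rule is used.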
Degree Name:	Master of Science
Degree Level:	Master's Degree
Degree Discipline:	Biomedicinal Chemistry
URI:	http://cuir.car.chula.ac.th/handle/123456789/56560
URI:	http://doi.org/10.14457/CU.the.2008.1633
DOI:	10.14457/CU.the.2008.1633
Type:	Thesis
Appears in Collections:	Pharm - Theses

Files in This Item:

File	Description	Size	Format
Natapol Pornputtapong.pdf		1.06 MB	Adobe PDF
