
Identification Of Innate Lymphoid Cells Lineage Using Single-cell RNA Sequencing

Posted on: 2019-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: David Omar Ramirez Valle
GTID: 2370330566997335
Subject: Computer Science and Technology
Abstract/Summary:
This research aims to identify the different kinds of cells found in bone marrow samples of Mus musculus through the analysis of data obtained by single-cell RNA sequencing. To allow a detailed analysis, the raw data first pass through a filtering and preparation phase in which Linux tools are used to make the analysis faster and more efficient. After this initial processing, an in-depth analysis is performed with statistical and graphical software tools, following data mining techniques. The bone marrow is part of the lymphatic system and is where blood cells originate. It is also responsible for producing the white blood cells that carry out the functions of the immune system, and this is where the importance of this study lies: by analyzing and understanding the contents, behavior and differentiation of such cells, we may be able to prevent or treat a wide variety of health conditions.

The work opens with a basic introduction to the research, its tools and its objectives, together with details on the applications of related works, in order to place the reader in the context of the study. It then gives a brief description of the methodologies and principles used to obtain the raw data, as well as the advances and advantages achieved in the most recent studies. Next-generation sequencing technologies and lymphoid tissue are also briefly described so that the reader can gain a deeper and clearer understanding of this work.

The research can be divided into two main parts. The first part is the preparation of the data, whose objective was to transform and adapt the data so that they could be analyzed more easily. The raw data are very large and, on their own, too complex to study directly or to draw hypotheses or conclusions from. This part of the research was carried out on the Linux platform, along with a number of tools and software programs developed specifically for the analysis of genomic information. This section of the work introduces the raw data and explains the steps used to transform them: quality control, alignment and mapping to the mouse reference genome, and quantification of expressed genes. The output of the data preparation was a matrix containing the data of 760 cells and the genetic content of each of them.

The second part, unlike the initial steps, was developed in the R language. Its main objective was the statistical analysis of the data that had previously been transformed into a more manageable form. To achieve this, the data were again passed through several tools and processing algorithms: first, a quality control process, which also serves as a filtering step to eliminate cells that may not contain accurate or sufficient information; next, a differential analysis procedure to discard genes that show very low variation across the whole sample of cells; a dimensionality reduction algorithm to reduce the representation of the data to a simple two-dimensional plot; a clustering process, which facilitates the visualization of the different types of cells in the data set; principal component analysis, a statistical procedure that eases the identification and hierarchical ordering of the genes that contribute the most to the clustering; and finally, the graphical visualization of the data in a heat map, arranged using the results of the clustering and principal component steps.

Finally, as an interpretation of the statistical results, the last section of this work gives a detailed description of the analysis and of the method applied, the gene expression coloring method. The conclusions obtained in this study were compared with and corroborated by the results of other studies in order to identify the cell types within the clusters; the genes that help to characterize and understand the biological functions of the cells are identified and used to reach the final results.
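The first two steps of the statistical analysis described above (removing low-quality cells, then discarding genes with very low variation) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the thesis's R implementation; the function names, the toy count matrix and both thresholds are hypothetical assumptions, and population variance stands in for whichever variation measure the thesis actually used.

```python
from statistics import pvariance

# Hypothetical toy count matrix: gene -> counts per cell (not thesis data).
cell_ids = ["c1", "c2", "c3", "c4"]
counts = {
    "GeneA": [10, 0, 12, 0],
    "GeneB": [5, 5, 5, 0],
    "GeneC": [0, 8, 1, 0],
    "GeneD": [3, 4, 2, 1],
}


def filter_cells(matrix, ids, min_detected_genes):
    """Keep cells in which at least `min_detected_genes` genes have a nonzero count."""
    keep = [
        j for j in range(len(ids))
        if sum(1 for row in matrix.values() if row[j] > 0) >= min_detected_genes
    ]
    kept_ids = [ids[j] for j in keep]
    filtered = {gene: [row[j] for j in keep] for gene, row in matrix.items()}
    return filtered, kept_ids


def filter_genes(matrix, min_variance):
    """Keep genes whose population variance across cells is at least `min_variance`."""
    return {
        gene: row for gene, row in matrix.items()
        if pvariance(row) >= min_variance
    }


# Assumed thresholds, chosen only to make the toy example informative.
qc_counts, kept_ids = filter_cells(counts, cell_ids, min_detected_genes=2)
variable_counts = filter_genes(qc_counts, min_variance=1.0)

print(kept_ids)                 # cells that pass quality control
print(sorted(variable_counts))  # genes retained for downstream analysis
```

In this toy example the cell with a single detected gene is dropped, and the constant and near-constant genes are removed before any dimensionality reduction or clustering would be applied.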
Keywords/Search Tags:clustering, gene expression, genome, heat-map, principal component analysis, RNA sequencing