Abstract
Long noncoding RNAs (lncRNAs) are regulators of cell differentiation and development. The lncRNA transcriptome in human hematopoietic stem and progenitor cells is not comprehensively defined. We investigated lncRNAs in 979 human bone marrow-derived CD34+ cells by single cell RNA sequencing followed by de novo transcriptome reconstruction. We identified 3,173 lncRNAs in total, among which 2,365 were previously unknown, and we characterized lncRNA stem, differentiation, and maturation signatures. lncRNA expression exhibited high cell-to-cell variation, which was only apparent in single cell analysis. lncRNA expression followed a lineage-specific and highly dynamic pattern during early hematopoiesis. lncRNAs in hematopoietic cells closely correlated with protein-coding genes of known functions in the regulation of hematopoiesis and cell fate decisions, and the potential regulatory roles of lncRNAs in hematopoiesis were imputed by projection from protein-coding genes with a “guilt-by-association” approach. We characterized lncRNAs preferentially expressed in hematopoietic stem cells and in various downstream differentiated lineage progenitors. We also profiled lncRNA expression in single cells from patients with myelodysplastic syndromes and in aneuploid cells in particular. Our study provides a global view of lncRNAs in human hematopoietic stem and progenitor cells. We observed a highly ordered pattern of lncRNA expression and participation in regulation of early hematopoiesis, and coordinate aberrant messenger RNA and lncRNA transcriptomes in dysplastic hematopoiesis. (Registered at clinicaltrials.gov with identifiers: 00001620, 00001397)
Long noncoding RNAs (lncRNAs), which are defined as a subclass of noncoding RNAs, are longer than 200 nucleotides and lack protein coding capacity. lncRNAs are newly recognized as regulators of gene expression, transcriptionally and post-transcriptionally.31 Unlike messenger RNAs (mRNAs), which localize specifically to the cytoplasm, lncRNAs can occupy various nuclear compartments and/or the cytoplasm. lncRNAs function via RNA-DNA, RNA-RNA, and RNA-protein interactions.62 As a result, they affect multiple stages of gene regulation, including placement of chromatin marks, mRNA biogenesis, and signaling pathways.
lncRNA expression is tissue- and cell type-specific975 but less conserved across species than is mRNA expression.1110 lncRNAs have been linked to the development of several lineages in hematopoiesis and in the immune response. Some lncRNAs were found to be enriched in hematopoietic stem cells (HSCs)12 or dynamically expressed during erythropoiesis.1413 RNA interference studies have revealed that lncRNAs control HSC self-renewal and differentiation,12 erythroid precursor maturation,14 and granulocytic differentiation of hematopoietic stem and progenitor cells (HSPCs).15 Intergenic lncRNA signatures exhibit subset-specificity in T and B lymphocytes.1816 lincR-Ccr2-5’AS, together with GATA3, is essential in the regulation of gene expression and migration of Th2 cells.16 Downregulation of linc-MAF-4 skews T-cell differentiation towards the Th2 phenotype.17 TMEVPG1, a Th1-specific intergenic lncRNA, controls the expression of interferon-γ together with the Th1-specific transcription factor T-bet, and is critical in modulating susceptibility to infection with Theiler virus.2019 Expression of lncRNAs in pro-B and mature B cells is regulated by PAX5, a transcriptional factor required to specify B-cell lineage.18 Despite these many examples of specific functions for either stem cells or differentiated lineages, the repertoire of lncRNAs in human HSPCs has not been fully described.
Whole transcriptome sequencing allows large scale profiling of lncRNAs in tissues and diseases and, therefore, enables the identification of many putative lncRNAs.22215 lncRNAs in general are expressed at much lower levels242343 but are more cell type-specific than are mRNAs.259 Until recently, lncRNA expression was assessed by averaging transcriptomes of bulk RNA extracted from mixed cell populations, which limits the sensitivity to detect lncRNA expression in small cell populations and thus to resolve diversity within a cell type. With recent advances in single cell transcriptome profiling methods, many seemingly homogeneous cell populations have shown unexpected variability in gene expression. Recently published studies profiling lncRNAs at the single cell level have revealed the cell-specific expression of these RNAs.30265
In the current work, we performed single cell RNA sequencing (scRNA-seq) of 979 freshly isolated bone marrow-derived human CD34+ cells from both healthy donors and patients with myelodysplastic syndrome (MDS). Using de novo transcriptome reconstruction, we identified a total of 3,173 lncRNAs, including 2,365 potential novel lncRNAs not reported in public databases. We further characterized the features and expression patterns of lncRNAs in CD34+ cells, revealing stage- and lineage-specificity of lncRNA expression and putative functions in normal hematopoiesis. Expression and lineage-specificity of almost 40 lncRNAs, including novel ones, were validated by quantitative real-time polymerase chain reaction (RT-PCR). We also profiled lncRNAs in MDS cells, and in aneuploid cells in particular. Our study provides a global assessment of lncRNA biology in early human hematopoiesis.
Subjects and samples
Bone marrow samples from seven healthy donors and five MDS patients were obtained after written informed consent in accordance with the Declaration of Helsinki and under protocols (www.clinicaltrials.gov NCT00001620 and NCT00001397) approved by the Institutional Review Boards of the National Heart, Lung, and Blood Institute. Of the five patients with MDS, patients 1, 2, and 5 had evolved to MDS from aplastic anemia while patients 3 and 4 had de novo MDS. Fluorescence activated cell sorting (FACS) was performed using the FACSAria II Cell Sorter (BD Biosciences) after isolation of bone marrow mononuclear cells. The gating strategies are shown in Online Supplementary Figure S1A. CD34+CD38− and CD34+CD38+ cells from four healthy donors and patient 4 were sequenced separately, while only the CD34+ populations of patients 1, 2, 3, and 5 were sequenced due to limited cell numbers (Online Supplementary Figure S1B). The clinical characteristics of these patients have been published.31 Another set of bone marrow cells from a further three healthy donors was used for quantitative RT-PCR (Online Supplementary Figure S2).
Single cell RNA sequencing
The C1 Single-cell Auto Prep System (Fluidigm) was employed to perform SMARTer (Clontech) whole transcriptome amplification on as many as 96 individual cells, according to the manufacturer’s protocols (www.fluidigm.com). Whole transcriptome amplification products were converted to Illumina sequencing libraries using the Nextera XT DNA Sample Preparation Kit (Illumina). Final cDNA libraries were quantified using High Sensitivity DNA Kits (Agilent) and sequenced on a HiSeq 2500 or 3000 (Illumina), using the paired-end 75-bp protocol, as described previously.31 RNA-seq data from this study have been deposited at the National Center for Biotechnology Information Gene Expression Omnibus (accession number GSE99095), and updated with intermediate and result files from the lncRNA analysis. Aliquots of whole transcriptome amplification products were used for quantitative RT-PCR analysis.
Total reads were mapped to the reference genome (hg19) with RSubreader and gene-level read counts were calculated using featureCounts.32 Only data from high-quality cells with captured genes were utilized further. The schematic pipeline has been published.31 Aneuploidy was evaluated by three independent methods, including a sliding window analysis of copy number variations, chromosome relative expression value distribution, and analysis of the degree of loss of heterozygosity.
Identification and classification of long noncoding RNAs
After filtering computationally for quality,31 single cells were used to define lncRNAs with a pipeline adopted from published methods of identifying high-confidence gene models.2817161413 Fastq files of cells from each subject were merged. Reads were mapped to human genome hg19 with Tophat2 and assembled using Cufflinks packages.33 The assembled transcripts from all subjects were merged with Cuffmerge33 before removing genes with <200 nucleotides or containing single exons in order to obtain long transcripts. Assembled genes overlapping with known protein-coding genes were excluded, and we removed those with low expression (FPKM<2) to improve the reliability of the model. We investigated the coding potential of the remaining genes using three independent algorithms: (i) protein database homology with BlastX and Pfam 31.0 (hmmer2.0); (ii) codon potential assessment with CPAT;34 and (iii) presence of long open reading frames >100 amino acids with EMBOSS GetORF.35 Defined lncRNAs were compared with annotated databases from Ensembl, University of California Santa Cruz (UCSC) Genome Browser, and GENCODE:36 overlapping lncRNAs were defined as “annotated lncRNAs” and the others as putative “novel lncRNAs”. If supported by cap analysis of gene expression (CAGE) data,37 lncRNA transcripts obtained by the same filtering pipeline, but with medium expression levels (FPKM 0.1-2) were also defined to be expressed in human CD34 cells (Online Supplementary Methods and Results).
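The filtering steps described above can be sketched as a simple rule-based classifier. This is a minimal illustration, not the authors' pipeline: the `Transcript` record and its field names are hypothetical, the boolean flags stand in for the outputs of BlastX/Pfam and CPAT, and the sketch assumes that a call from any one of the three coding-potential checks is enough to reject a transcript. Only the numeric thresholds (200 nucleotides, multiple exons, FPKM of 2, FPKM 0.1-2 with CAGE support, open reading frames over 100 amino acids) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical summary record for one assembled transcript."""
    name: str
    length_nt: int               # transcript length in nucleotides
    n_exons: int                 # number of exons
    overlaps_coding: bool        # overlaps a known protein-coding gene
    fpkm: float                  # expression level
    longest_orf_aa: int          # longest open reading frame, in amino acids
    has_protein_homology: bool   # stand-in for a BlastX or Pfam hit
    cpat_coding: bool            # stand-in for a CPAT "coding" call
    cage_supported: bool = False  # supported by CAGE data


def classify(t: Transcript) -> str:
    """Return 'high', 'medium', or 'rejected' for a candidate lncRNA."""
    # Keep only long, multi-exonic transcripts.
    if t.length_nt < 200 or t.n_exons < 2:
        return "rejected"
    # Exclude anything overlapping known protein-coding genes.
    if t.overlaps_coding:
        return "rejected"
    # Assumption: any single coding-potential signal rejects the transcript.
    if t.has_protein_homology or t.cpat_coding or t.longest_orf_aa > 100:
        return "rejected"
    # High-confidence tier: FPKM of at least 2.
    if t.fpkm >= 2:
        return "high"
    # Medium tier: FPKM 0.1-2, accepted only with CAGE support.
    if t.fpkm >= 0.1 and t.cage_supported:
        return "medium"
    return "rejected"
```

In this sketch the expression filter is applied last so that the CAGE-supported medium tier can reuse the same structural and coding-potential checks; the order in the published pipeline may differ.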
Identification and characterization of long noncoding RNAs in human CD34+ hematopoietic cells
To assess lncRNA expression in human HSPCs, we purified CD34+ cells from the marrow of four healthy donors and five MDS patients and analyzed polyadenylated RNA by scRNA-seq. After filtering, 391 cells from healthy donors and 588 cells from MDS patients were retained for analysis, with over 9.1 billion 75 bp paired-end mapped reads in total and 7.7 million reads per cell on average. Using a published strategy,31 a total of 10,791 protein-coding genes were captured, 3,777 per cell on average.
To obtain reliable models of lncRNA expression, we followed a de novo transcript assembly pipeline (Figure 1A), in which “high-confidence” transcriptomes2817161413 from CD34 single cells of all nine subjects were merged in order to undergo multi-step filtering for: (i) overlap with known mRNA exon annotations, (ii) size and multiexonic selection, (iii) known protein domains, (iv) low levels of expression, and (v) predicted coding potential. Using this conservative multilayered analysis, we identified a total of 2,892 lncRNAs across 979 single human CD34 cells. To assign lncRNAs to specific classes, we examined their overlap with annotated noncoding genes present in public databases: 808 lncRNAs were previously annotated and 2,084 were putative novel lncRNAs (Figure 1B and Online Supplementary File 1). In addition, transcripts that were expressed at medium levels and supported by CAGE data37 were also defined to be lncRNAs (n=281) expressed in human CD34 cells (Online Supplementary File 2). Defined lncRNAs exhibited similarly low protein-coding potential (relative to protein-coding genes) as had previously annotated lncRNAs in the GENCODE database (Figure 1C). Such defined lncRNAs in single human CD34 cells were distributed across all chromosomes, at much lower average abundance than were protein-coding transcripts. Compared with protein-coding genes, lncRNA-encoding genes had fewer exons, were shorter and less well conserved. In general, lncRNA-encoding genes were enriched in 4-kb regions around the transcriptional start sites of their neighboring protein-coding genes, in agreement with previous work,38 suggesting that they share promoter regions [lncRNA-encoding genes show higher co-expression with protein-coding neighbors than do protein-coding gene pairs (see Online Supplementary Results “Characterization of lncRNAs defined in human CD34 hematopoietic cells”; Online Supplementary Figure S3)].
Detection of long noncoding RNAs with single cell RNA-sequencing
Expression of lncRNAs showed more variation among single cells than did the expression of coding transcripts (Figure 2A). Across all percentiles of gene expression levels, lncRNAs were expressed in smaller proportions of cells than were mRNAs (Figure 2B). Low overall expression of lncRNAs in bulk samples was likely partly attributable to limited but high expression of lncRNAs in a minority of cells or in small cell populations. Seven bulk samples of the CD34 population from the nine individuals studied were sequenced in parallel with single cells. We sought to compare the maximum abundance of mRNAs or lncRNAs versus housekeeping genes in bulk samples and individual cells,28 to quantify the power of gene expression detection by these different technical approaches. mRNAs were detected at a similar ratio to housekeeping genes in both bulk samples and single cells, but the ratio of maximum expression of lncRNAs relative to housekeeping genes was about 4-fold higher in single cells than in bulk samples. By scRNA-seq, the maximum expression of lncRNAs was similar to that of both mRNAs and housekeeping genes (Figure 2C). Genes with high variance tended to be captured by the single cell analysis rather than by the bulk approach (Online Supplementary Figure S4). Thus, lncRNA expression appeared to be better detected among single cells due to an expression pattern of high cell-to-cell variation and cell-specificity.
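The dilution effect described above can be made concrete with a toy calculation. The numbers below are invented for illustration and are not taken from the study: a transcript expressed strongly in a few cells and a transcript expressed weakly in every cell can have the same bulk average, while their maximum single cell expression and the fraction of expressing cells differ sharply.

```python
def bulk_average(per_cell):
    """Mean expression over all cells, as a bulk sample would report it."""
    return sum(per_cell) / len(per_cell)


def fraction_expressing(per_cell, threshold=0.0):
    """Fraction of cells with expression above the threshold."""
    return sum(1 for x in per_cell if x > threshold) / len(per_cell)


# Hypothetical expression values for 100 cells.
restricted = [200.0] * 5 + [0.0] * 95   # high in 5 cells, absent elsewhere
broad = [10.0] * 100                    # low but uniform in every cell

for label, values in (("restricted", restricted), ("broad", broad)):
    print(label, bulk_average(values), max(values), fraction_expressing(values))
```

Both profiles average to the same bulk value, but only the single cell view shows that the restricted transcript reaches a 20-fold higher maximum in a small minority of cells.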
We then sought to infer putative functions of defined lncRNAs in hematopoiesis by a comprehensive “guilt by association” approach (Online Supplementary Methods and Results), correlating expression of lncRNAs with protein-coding genes of known functions.4139154 Associated protein-coding genes of defined lncRNAs across CD34 cells were enriched in gene ontology (GO) terms related to myeloid cell differentiation, cell growth, and cellular functions including DNA repair, mRNA splicing, gene expression, and epigenetic regulation (Figure 2D), implicating lncRNAs in the regulation of human hematopoiesis and associated cellular functions.
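The core of a guilt-by-association analysis can be sketched as follows: correlate the expression of one lncRNA with every protein-coding gene across cells, then keep the best-correlated genes for downstream functional enrichment. This is a simplified sketch under stated assumptions. The use of Pearson correlation, the `min_r` cutoff, and the function names are illustrative choices, not details given in the text, and the gene ontology enrichment step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no association signal
    return cov / math.sqrt(var_x * var_y)


def associated_genes(lnc_expr, coding_expr, min_r=0.5):
    """Rank protein-coding genes by positive correlation with one lncRNA.

    lnc_expr: per-cell expression of the lncRNA.
    coding_expr: dict mapping gene name to per-cell expression.
    min_r: hypothetical correlation cutoff.
    """
    hits = [(gene, pearson(lnc_expr, expr)) for gene, expr in coding_expr.items()]
    hits = [(gene, r) for gene, r in hits if r >= min_r]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

The genes returned for each lncRNA would then be tested for enrichment of functional terms, and the enriched terms imputed to the lncRNA.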
Stage- and lineage-specific expression of long noncoding RNAs in normal hematopoiesis
To obtain a profile of lncRNA expression in normal human hematopoiesis, we assessed lncRNA expression in 391 CD34 cells from healthy donors. We first studied whether a lncRNA signature separated CD38 and CD38 cell populations. lncRNAs detected with 20 reads in at least 20 cells were retained, and highly variable lncRNAs were used for stage-specific analysis (Online Supplementary Figure S5A). The method of t-distributed stochastic neighbor embedding (t-SNE) was adopted for non-linear dimension reduction based solely on batch-corrected (by Combat/SVA) lncRNA expression (Online Supplementary Figure S5B). In an unsupervised t-SNE plot, sorted CD38 cells formed a cluster distinct from CD38 cells, while CD38 cells were more dispersed (Figure 3A). To determine stage specificity, we performed pair-wise comparison of lncRNA expression in CD38 cells relative to expression in CD38 cells. lncRNA expression exhibited substantial differences in two stages (Online Supplementary Table S3); heatmaps of differentially expressed mRNAs and lncRNAs of CD38 and CD38 populations are shown in Figure 3B.
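The detection filter mentioned above can be sketched in a few lines. This sketch assumes that "detected with 20 reads" means at least 20 reads in a cell; the input layout (a dictionary of per-cell read counts) and the function name are hypothetical. Selection of highly variable lncRNAs and batch correction are not shown.

```python
def detected_lncrnas(counts, min_reads=20, min_cells=20):
    """Keep lncRNAs with at least min_reads reads in at least min_cells cells.

    counts: dict mapping lncRNA name to a list of per-cell read counts.
    """
    kept = []
    for name, per_cell in counts.items():
        n_cells = sum(1 for c in per_cell if c >= min_reads)
        if n_cells >= min_cells:
            kept.append(name)
    return kept
```

A filter of this kind removes lncRNAs seen in too few cells to support clustering, at the cost of discarding transcripts restricted to very rare cell populations.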
We previously assigned single CD34 cells to a cell type according to their protein-coding transcriptome profiles, based on gene expression data from flow cytometrically-sorted cell populations.42 The cell types to which the single cells were assigned included HSC, multilymphoid progenitor (MLP), megakaryocyte-erythroid progenitor (MEP), granulocyte-monocyte progenitor (GMP), pro-B cell (ProB), and earliest thymic progenitor (ETP).31 We applied weighted gene co-expression network analysis43 to assess the potential functions of lncRNAs in CD38 and CD38 cells. When protein-coding and lncRNA-encoding genes were simultaneously analyzed, they clustered into seven unsupervised modules (Online Supplementary Table S4), and genes in individual modules were analyzed for GO term enrichment (Figure 3C). Genes in module 1 showed high enrichment of lymphocyte activation pathway genes, and their expression levels were higher in ProB and ETP than in other cell types. Genes in module 6 were enriched in the heme metabolic process, and they showed higher expression in MEP. These data suggest roles of lncRNAs in hematopoiesis and lineage specificity of lncRNA expression.
By t-SNE, cells tended to cluster according to cell types (Figure 4A, right) and were coincident with the pattern of hematopoietic differentiation based on mRNA expression in pseudotime ordering (Figure 4A, middle).31 Thus lncRNAs appeared as powerful as their protein-coding counterparts in resolving subtypes of CD34 cells. We then analyzed cell-type specificity of gene expression by cell-type variance (Figure 4B) and assessed a Jensen-Shannon score8 (JScore) (Figure 4C). lncRNA expression showed higher cell-type specificity than did mRNA expression (JScore, P=1×10). There was more cell-to-cell variation in lncRNA expression than in mRNA expression, even within the same cell type (Online Supplementary Figure S6). We investigated our dataset for lncRNA signatures in various lineages, using difference in expression in a lineage, relative to expression in all other subsets, by pairwise comparisons, at a threshold P<0.05 (Figure 4E and Online Supplementary Table S5). Heatmaps revealed that MLP had signatures of both mRNAs and lncRNAs similar to those of HSC, in contrast to distinctive gene expression patterns in other lineages. These data were congruent with those of earlier studies,444231 and indicated that HSC and MLP defined by a transcriptome signature were enriched in a phenotypically characterized CD34CD38 population, while the other lineages comprised the more heterogeneous CD34CD38 population. We examined overlap of lncRNA and mRNA expression among lineages: 94.8% of mRNAs were shared by at least five out of six lineages, but only 62.2% of lncRNAs were so widely expressed (Figure 4D, top panel); conversely, 81.4% of lineage-signature mRNAs were specific to only one lineage, while 92.2% of lncRNAs were equivalently specific (Figure 4D, bottom panel). Again, lncRNA expression appeared more lineage-restricted than did the counterpart coding gene expression. In summary, we found lncRNA expression to be highly stage- and lineage-specific during early hematopoiesis.
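A Jensen-Shannon specificity score of the kind referred to above can be sketched as follows. The text does not give the formula, so this uses one common formulation as an assumption: normalize a gene's mean expression across cell types into a distribution, compare it with the idealized profile of a gene expressed in exactly one cell type, and take one minus the square root of the Jensen-Shannon divergence, maximized over cell types. Function names are illustrative.

```python
import math


def _entropy(p):
    """Shannon entropy in bits, skipping zero entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2


def js_specificity(mean_expr):
    """Specificity score in [0, 1]; 1 means expression in a single cell type.

    mean_expr: mean expression of one gene in each cell type.
    """
    total = sum(mean_expr)
    if total == 0:
        return 0.0
    p = [x / total for x in mean_expr]
    best = 0.0
    for t in range(len(p)):
        # Idealized profile: all expression in cell type t.
        ideal = [1.0 if i == t else 0.0 for i in range(len(p))]
        distance = math.sqrt(max(js_divergence(p, ideal), 0.0))
        best = max(best, 1.0 - distance)
    return best
```

With six cell types, a gene expressed in only one of them scores 1, while a gene expressed evenly in all six scores far lower, which is the sense in which lncRNAs are described here as more cell-type specific than mRNAs.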
To confirm our findings of potential novel lncRNAs and lineage-specific expression patterns of lncRNAs, we compared our results with a publicly available dataset.44 This scRNA-seq study was conducted with human HSPCs sorted based on cell surface antigens (GSE75478). Lineage-specific lncRNAs (and mRNAs) defined in the current study were also detected and showed consistent lineage-specific expression in the two datasets (Online Supplementary Results and Online Supplementary Figures S7 and S8). We then assessed 39 lncRNAs and 14 mRNAs by quantitative RT-PCR of aliquots of whole transcriptome amplification from those 391 single CD34 cells and another set of flow cytometry-sorted bulk samples (Online Supplementary Methods and Results; Online Supplementary Table S6). All 39 signature lncRNAs, including 20 novel lncRNAs, were detectable in single cells and bulk samples by quantitative RT-PCR, indicating expression in human CD34 cells. We confirmed cell type assignment of single cells by expression of well-recognized mRNAs (Online Supplementary Figure S9C) and confirmed lineage-specific expression for 35 out of 39 lineage signature lncRNAs in single cells. Moreover, their lineage-specific expression patterns in single cells were reproducible in independent sorted bulk samples (Online Supplementary Figure S9A,B). Expression of these lineage-specific lncRNAs in hematopoietic differentiation, by scRNA-seq and quantitative RT-PCR, is illustrated in Figure 4F.
Coordinated activation and suppression of signature messenger RNAs and long noncoding RNAs during hematopoiesis
To systematically assess expression of lncRNAs that might be activated or suppressed during hematopoiesis, we focused on dynamic changes of the mRNA and lncRNA transcriptomes along differentiation trajectories defined by pseudotime ordering of HSC/MLP into MEP and GM/L (granulocyte/monocyte/lymphocyte progenitors) (Figure 5 and Online Supplementary Tables S7 and S8). Sequentially upregulated/downregulated mRNAs and lncRNAs along the two trajectories were analyzed and gene expression was visualized in heatmaps (MEP trajectory in Figure 5A and GM/L trajectory in Figure 5B). Common downregulated mRNAs in MEP and GM/L trajectories (Figure 5C) were involved in signaling pathways related to stemness, including NRF2, AP-1, ATF-2, C-MYB, HIF-1, and IL-6 signaling. Downregulated genes specifically in the MEP differentiation pathway were mostly enriched in T cells and for broad immune response; enrichment in the EPO signaling pathway was observed only among GM/L downregulated genes. Frequently upregulated genes were involved in DNA replication, cell cycle, and cell proliferation; genes specifically upregulated in GM/L were enriched in B- and T-cell signaling and immune response (Figure 5D, right); hemoglobin synthesis and androgen receptors were enriched only among MEP upregulated genes (Figure 5D, left). lncRNA expression along the two differentiation trajectories was synchronously coordinated with lineage-specific coding genes and interrelated in functional pathways of stemness, megakaryocyte/erythrocyte development, and granulocyte/monocyte/lymphocyte development. Collectively, these data suggest the ordered expression of lncRNAs in hematopoietic differentiation and involvement in the regulation of hematopoiesis.
Long noncoding RNAs are bound by lineage-specific transcription factors and might be regulated by epigenetic mechanisms
Transcription factors are critical in cell fate decisions and thus in the regulation of lineage-specific gene expression. Given the observation of highly ordered expression patterns of lncRNAs during hematopoiesis and co-expression with lineage-specific transcription factors, we investigated roles of lineage-specific transcription factors in regulating lncRNA expression during hematopoiesis. The transcription factor GATA1 regulates erythrocyte and megakaryocyte differentiation,4645 and indeed its expression was sequentially increased as HSC differentiate into MEP (Figure 5A). Using data obtained by chromatin immunoprecipitation sequencing (ChIP-seq) for GATA1 binding (Encode Ref# ENCSR000EFT), we found that GATA1 binding to promoters was higher for lncRNA-encoding genes (Figure 6A, top) as well as protein-coding genes (Figure 6A, bottom) preferentially expressed in MEP than for those preferentially expressed in other cell types. lncRNA-encoding genes preferentially expressed in MEP, such as SNHG3 and RP11-620J15.3 (Figure 6B), were bound by GATA1 and had high read coverage of active histone marks (H3K27Ac, H3K79me2, and H3K4me2) and low coverage of repressive histone marks (H3K27me3) in erythroid cells. Our analysis, together with published data,4739181614138 indicated that cell fate decisions were controlled by critical lineage-specific transcription factors, as evidenced by expression of both lineage-specific mRNAs and lncRNAs bound and regulated by corresponding transcription factors, probably involving epigenetic modification.
Long noncoding RNAs exhibit aberrant expression in aneuploid cells from patients with myelodysplastic syndromes
Gene expression of 588 single CD34+ cells from five MDS patients was compared with that of cells from four healthy donors. lncRNAs were differentially expressed in MDS cells compared with those from healthy donors (P<0.05): 372 and 590 lncRNAs were upregulated and downregulated, respectively (Figure 7A and Online Supplementary Table S10). By guilt-by-association, downregulated lncRNAs were associated with gene sets involved in immune response, cellular response, gene expression, and DNA damage response; upregulated lncRNAs were associated with gene sets involved in cell metabolism and cell signaling (Figure 7B,C).
We adopted three bioinformatics methods to distinguish cells with abnormal karyotypes from diploid cells.31 We observed that 200 and 56 lncRNAs were downregulated and upregulated, respectively, in monosomy 7 cells, compared to diploid cells (P<0.05) (Figure 7D and Online Supplementary Table S11). By guilt-by-association, downregulated lncRNAs were associated with genes involved in immune response, cell apoptosis and cell death, and DNA modification; upregulated lncRNAs displayed involvement in Ras signaling, Wnt signaling, and interleukin-8 production (Figure 7E,F).
In the current study, we profiled the repertoire of lncRNAs in human bone marrow-derived CD34 cells, with the goal of understanding lncRNA biology in early human hematopoiesis. The majority of the human genome is transcribed but only a small proportion of transcripts encode proteins,49484 and thus the number of lncRNA genes is predicted to be very large. Deep RNA sequencing followed by de novo transcriptome reconstruction was adopted for genome-wide annotation and functional characterization of novel lncRNAs.18161412 Moreover, by scRNA-seq, we and others observed higher cell-to-cell variation of lncRNA expression compared to mRNA expression.50302826 The validation of defined lncRNAs, including potential novel ones, with quantitative RT-PCR in single cells and a new set of sorted bulk samples proved the validity of scRNA-seq and bioinformatic analysis in defining lncRNAs in the current study. Our strategy of single cell deep sequencing in combination with de novo transcript assembly could be adopted to further facilitate annotation of the complete lncRNA repertoire.
The very large number of both annotated and novel lncRNAs presents a challenge to functional validation. Based on earlier studies,4139154 we adopted a systematic, computational guilt-by-association method, from which we could confirm defined lncRNAs in human HSPCs to be likely involved in hematopoietic differentiation and anticipated cell functions. Conventional functional validation of the many hundreds of known and new lncRNAs would be prohibitively costly and time-consuming; moreover, the choice of assays and conditions of testing is not obvious, nor is there an established statistic by which to judge correlation. We attempted to computationally distinguish lncRNA roles as primary and possibly regulatory from secondary and “epiphenomenal”. To this end, we first determined whether lncRNAs were preferentially expressed in specific cell types; if so, their functions were postulated to relate to lineage-specific protein-coding genes. We then applied pseudotime ordering to reconstruct hematopoietic differentiation in order to examine dynamic gene expression. HSCs are assumed to lose “stemness” and to progressively gain restricted lineage commitment gene expression during differentiation. Indeed, we observed repression of stemness genes and activation of the cell proliferation/metabolism gene program, accompanied by activation of lineage-specific genes and repression of genes of alternative differentiation pathways. By this analysis, we defined lncRNAs that are coordinately expressed in those gene modules and thus have a greater probability of regulatory roles in lineage specification. Our data should assist in narrowing the scope of future efforts, including in vitro perturbation and in vivo experiments, to study functions of individual lncRNAs in hematopoiesis.
The highly ordered expression pattern of lncRNAs during hematopoiesis implies regulatory constraint. Our analysis and earlier studies47398 indicated that lncRNAs are likely regulated by cell-type specific transcription factors.161413 The observation that lncRNAs exhibited higher expression variability than did mRNAs in the same regulatory program suggests more diverse and active expression of lncRNAs. lncRNAs exert regulatory roles transcriptionally and post-transcriptionally by a variety of mechanisms.61 These features of lncRNAs would make them more dynamic participants in cell states and biological processes, facilitating prompt adaptive responses to stimuli or perturbations, and add another layer of complexity in gene expression regulation and cell fate decision.
Our data indicated considerable stage- and lineage-specificity of lncRNAs in human HSPCs and potential engagement in early priming of cell fate, consistent with tissue- and cell type-specificity observed in previous studies.975, 1813 This conclusion was confirmed by extension to an external independent scRNA-seq study of 1,034 sorted single human HSPCs,45 and the reproducible lineage-specificity of 35 lncRNAs in both single cells and sorted bulk samples by quantitative RT-PCR. lncRNAs often form secondary structures and there are sensitive, rapid, low-cost methods readily available for lncRNA quantification, all of which make lncRNAs promising biomarkers for disease detection, diagnosis, and prognosis. One study based on microarray assay of bone marrow mononuclear cells from 176 adult patients with MDS established a four-lncRNA risk-scoring system that correlated with distinctive clinical features, and was an independent prognostic factor for survival and leukemia transformation.51 We also found lncRNAs to be dysregulated in MDS cells, but due to the limited number of patients, lncRNA signatures of MDS patients in the current study should be interpreted with caution. Nevertheless, our results were in agreement with reported microarray data from 183 MDS patients, which related abnormal lncRNAs with gene expression, cancer, and malignancy.52 Also, differentially expressed lncRNAs in monosomy 7 cells were involved in similar pathways as their mRNA counterparts in our previous study.31
Our results are not a complete profile of lncRNAs due to several limitations, especially the use of only polyA-enriched RNAs,8 and the limited cell numbers from a few individuals due to the high cost of scRNA-seq. Additionally, annotation of novel lncRNAs is context dependent. We adopted commonly used pipelines,18161412 but annotation might vary using different algorithms. Nevertheless, our work creates a model for future profiling of the repertoire of lncRNAs in other cell types. Lineage signatures of lncRNAs are comparison-based, and thus may vary when such comparisons are made among different subsets. Others have categorized HSCs versus cells of specific lineages and among differentiated cells or distinct subsets.1812 In contrast, we defined lncRNA signatures by making comparisons among subsets within a relatively homogeneous HSPC population, which may compromise our power to detect differences. Furthermore, pseudotime ordering reconstructs the hematopoietic hierarchy based on bioinformatic analysis of transcriptome similarity, and it has demonstrated high agreement with purified cell compartments;44 however, dynamic gene expression in hematopoiesis might be preferably assessed in purified cell populations obtained after physical sorting based on membrane proteins, including after induction of differentiation or other in vitro perturbations. Given the high cell-type specificity of lncRNAs, signature lncRNAs may be superior to mRNAs in discriminating and differentiating cell subsets or new cell types that cannot be easily distinguished based on cell surface markers. We did not compare the efficacy of lncRNAs and mRNAs in defining cell types due to a lack of detailed surface marker information for single cells. Future studies with larger cell numbers, complete surface marker characterization, and whole transcriptome expression data should be of great interest in defining new cells/subtypes.
Rapid evolution and low species conservation are features of lncRNAs,1110 making a human catalog a prerequisite to successful, clinically relevant lncRNA studies. Based on next-generation sequencing and single cell technology, we provide a global database that should be foundational for future studies of lncRNA biology in human HSPCs.
The authors acknowledge the support of the Trans-NIH Center for Human Immunology, Autoimmunity, and Inflammation (National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA). We thank patients and healthy volunteers who donated bone marrow. Sequencing and technical support were provided by the DNA Sequencing and Genomics Core of NHLBI. FACS sorting was performed by Keyvan Keyvanfar and the Flow Cytometry Core of NHLBI. This research was supported by an Intramural Research Program of the National Heart, Lung, and Blood Institute.
- *ZW and SG contributed equally to this work.
- Check the online version for the most updated information on this article, online supplements, and information on authorship & disclosures: www.haematologica.org/content/104/5/894
- Received October 12, 2018.
- Accepted November 22, 2018.
- Alvarez-Dominguez JR, Lodish HF. Emerging mechanisms of long noncoding RNA function during normal and malignant hematopoiesis. Blood. 2017; 130(18):1965-1975. https://doi.org/10.1182/blood-2017-06-788695
- Satpathy AT, Chang HY. Long noncoding RNA in hematopoiesis and immunity. Immunity. 2015; 42(5):792-804. https://doi.org/10.1016/j.immuni.2015.05.004
- Engreitz JM, Ollikainen N, Guttman M. Long non-coding RNAs: spatial amplifiers that control nuclear structure and gene expression. Nat Rev Mol Cell Biol. 2016; 17(12):756-770. https://doi.org/10.1038/nrm.2016.126
- Derrien T, Johnson R, Bussotti G. The GENCODE v7 catalog of human long non-coding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 2012; 22(9):1775-1789. https://doi.org/10.1101/gr.132159.111
- Cabili MN, Dunagin MC, McClanahan PD. Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol. 2015; 16:20. https://doi.org/10.1186/s13059-015-0586-4
- Djebali S, Davis CA, Merkel A. Landscape of transcription in human cells. Nature. 2012; 489(7414):101-108. https://doi.org/10.1038/nature11233
- Ulitsky I, Bartel DP. lincRNAs: genomics, evolution, and mechanisms. Cell. 2013; 154(1):26-46. https://doi.org/10.1016/j.cell.2013.06.020
- Cabili MN, Trapnell C, Goff L. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011; 25(18):1915-1927. https://doi.org/10.1101/gad.17446611
- Washietl S, Kellis M, Garber M. Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res. 2014; 24(4):616-628. https://doi.org/10.1101/gr.165035.113
- Pang KC, Frith MC, Mattick JS. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet. 2006; 22(1):1-5. https://doi.org/10.1016/j.tig.2005.10.003
- Wang J, Zhang J, Zheng H. Mouse transcriptome: neutral evolution of ‘non-coding’ complementary DNAs. Nature. 2004; 431(7010):1.
- Luo M, Jeong M, Sun D. Long non-coding RNAs control hematopoietic stem cell function. Cell Stem Cell. 2015; 16(4):426-438. https://doi.org/10.1016/j.stem.2015.02.002
- Alvarez-Dominguez JR, Hu W, Yuan B. Global discovery of erythroid long noncoding RNAs reveals novel regulators of red cell maturation. Blood. 2014; 123(4):570-581. https://doi.org/10.1182/blood-2013-10-530683
- Paralkar VR, Mishra T, Luan J. Lineage and species-specific long noncoding RNAs during erythro-megakaryocytic development. Blood. 2014; 123(12):1927-1937. https://doi.org/10.1182/blood-2013-12-544494
- Schwarzer A, Emmrich S, Schmidt F. The non-coding RNA landscape of human hematopoiesis and leukemia. Nat Commun. 2017; 8(1):218.
- Hu G, Tang Q, Sharma S. Expression and regulation of intergenic long noncoding RNAs during T cell development and differentiation. Nat Immunol. 2013; 14(11):1190-1198. https://doi.org/10.1038/ni.2712
- Ranzani V, Rossetti G, Panzeri I. The long intergenic noncoding RNA landscape of human lymphocytes highlights the regulation of T cell differentiation by linc-MAF-4. Nat Immunol. 2015; 16(3):318-325. https://doi.org/10.1038/ni.3093
- Brazão TF, Johnson JS, Müller J. Long noncoding RNAs in B-cell development and activation. Blood. 2016; 128(7):e10-19. https://doi.org/10.1182/blood-2015-11-680843
- Collier SP, Collins PL, Williams CL. Cutting edge: influence of Tmevpg1, a long intergenic noncoding RNA, on the expression of Ifng by Th1 cells. J Immunol. 2012; 189(5):2084-2088. https://doi.org/10.4049/jimmunol.1200774
- Vigneau S, Rohrlich PS, Brahic M. Tmevpg1, a candidate gene for the control of Theiler’s virus persistence, could be implicated in the regulation of gamma interferon. J Virol. 2003; 77(10):5632-5638. https://doi.org/10.1128/JVI.77.10.5632-5638.2003
- Iyer MK, Niknafs YS, Malik R. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet. 2015; 47(3):199-208. https://doi.org/10.1038/ng.3192
- Lei L, Xia S, Liu D, Li X. Genome-wide characterization of lncRNAs in acute myeloid leukemia. Brief Bioinform. 2018; 19(4):627-635.
- Heward JA, Lindsay MA. Long non-coding RNAs in the regulation of the immune response. Trends Immunol. 2014; 35(9):408-419. https://doi.org/10.1016/j.it.2014.07.005
- Ulitsky I, Bartel DP. lincRNAs: genomics, evolution, and mechanisms. Cell. 2013; 154(1):26-46. https://doi.org/10.1016/j.cell.2013.06.020
- Cabili MN, Trapnell C, Goff L. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011; 25(18):1915-1927. https://doi.org/10.1101/gad.17446611
- Wang J, Roy B. Single-cell RNA-seq reveals lincRNA expression differences in Hela-S3 cells. Biotechnol Lett. 2017; 39(3):359-366.
- Kim DH, Marinov GK, Pepke S. Single-cell transcriptome analysis reveals dynamic changes in lncRNA expression during reprogramming. Cell Stem Cell. 2015; 16(1):88-101. https://doi.org/10.1016/j.stem.2014.11.005
- Liu SJ, Nowakowski TJ, Pollen AA. Single-cell analysis of long non-coding RNAs in the developing human neocortex. Genome Biol. 2016; 17:67. https://doi.org/10.1186/s13059-016-0932-1
- Gawronski KAB, Kim J. Single cell transcriptomics of noncoding RNAs and their cell-specificity. Wiley Interdiscip Rev RNA. 2017; 8(6).
- Hu W, Wang T, Yang Y. Tumor heterogeneity uncovered by dynamic expression of long noncoding RNA at single-cell resolution. Cancer Genet. 2015; 208(12):581-586.
- Zhao X, Gao S, Wu Z. Single-cell RNA-seq reveals a distinct transcriptome signature of aneuploid hematopoietic cells. Blood. 2017; 130(25):2762-2773. https://doi.org/10.1182/blood-2017-08-803353
- Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014; 30(7):923-930. https://doi.org/10.1093/bioinformatics/btt656
- Trapnell C, Roberts A, Goff L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012; 7(3):562-578. https://doi.org/10.1038/nprot.2012.016
- Wang L, Park HJ, Dasari S. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013; 41(6):e74. https://doi.org/10.1093/nar/gkt006
- Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000; 16(6):276-277. https://doi.org/10.1016/S0168-9525(00)02024-2
- Harrow J, Frankish A, Gonzalez JM. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012; 22(9):1760-1774. https://doi.org/10.1101/gr.135350.111
- Hon CC, Ramilowski JA, Harshbarger J. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature. 2017; 543:199-204. https://doi.org/10.1038/nature21374
- Zhang K, Huang K, Luo Y. Identification and functional analysis of long non-coding RNAs in mouse cleavage stage embryonic development based on single cell transcriptome data. BMC Genomics. 2014; 15:845. https://doi.org/10.1186/1471-2164-15-845
- Guttman M, Amit I, Garber M. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009; 458(7235):223-227. https://doi.org/10.1038/nature07672
- Huarte M, Guttman M, Feldser D. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell. 2010; 142(3):409-419. https://doi.org/10.1016/j.cell.2010.06.040
- Yan X, Hu Z, Feng Y. Comprehensive genomic characterization of long non-coding RNAs across human cancers. Cancer Cell. 2015; 28(4):529-540. https://doi.org/10.1016/j.ccell.2015.09.006
- Laurenti E, Doulatov S, Zandi S. The transcriptional architecture of early human hematopoiesis identifies multilevel control of lymphoid commitment. Nat Immunol. 2013; 14(7):756-763. https://doi.org/10.1038/ni.2615
- Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008; 9:559. https://doi.org/10.1186/1471-2105-9-559
- Velten L, Haas SF, Raffel S. Human haematopoietic stem cell lineage commitment is a continuous process. Nat Cell Biol. 2017; 19(4):271-281. https://doi.org/10.1038/ncb3493
- Stachura DL, Chou ST, Weiss MJ. Early block to erythromegakaryocytic development conferred by loss of transcription factor GATA-1. Blood. 2006; 107(1):87-97. https://doi.org/10.1182/blood-2005-07-2740
- Shivdasani RA, Fujiwara Y, McDevitt MA. A lineage-selective knockout establishes the critical role of transcription factor GATA-1 in megakaryocyte growth and platelet development. EMBO J. 1997; 16(13):3965-3973. https://doi.org/10.1093/emboj/16.13.3965
- Guttman M, Donaghey J, Carey BW. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature. 2011; 477(7364):295-300. https://doi.org/10.1038/nature10398
- Ezkurdia I, Juan D, Rodriguez JM. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet. 2014; 23(22):5866-5878. https://doi.org/10.1093/hmg/ddu309
- Mattick JS, Rinn JL. Discovery and annotation of long noncoding RNAs. Nat Struct Mol Biol. 2015; 22(1):5-7. https://doi.org/10.1038/nsmb.2942
- Zhou F, Li X, Wang W. Tracing haematopoietic stem cell formation at single-cell resolution. Nature. 2016; 533(7604):487-492. https://doi.org/10.1038/nature17997
- Yao CY, Chen CH, Huang HH. A 4-lncRNA scoring system for prognostication of adult myelodysplastic syndromes. Blood Adv. 2017; 1(19):1505-1516. https://doi.org/10.1182/bloodadvances.2017008284
- Liu K, Beck D, Thoms JAI. Annotating function to differentially expressed LincRNAs in myelodysplastic syndrome using a network-based method. Bioinformatics. 2017; 33(17):2622-2630.