Short talk: Analysis of transposable elements in R and Bioconductor with atena

Beatriz Calvo-Serra, Robert Castelo

Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain

Abstract

Transposable elements (TEs) are DNA sequences that can mobilize within the genome through either a DNA or an RNA intermediate. Their insertions have resulted in a complex distribution of repeated elements that occupy approximately half of the human genome [1]. These elements, particularly endogenous retroviruses (ERVs), participate in physiological processes and have been implicated in the development of some human diseases [2]. They may exert their function through transcription, so RNA sequencing can be used to detect their expression. However, due to their repetitive nature, reads sequenced from TE RNA transcripts usually map to multiple genomic loci (i.e., multi-mapping reads) and are consequently discarded in standard RNA sequencing data processing pipelines. For this reason, dedicated TE analysis software exists, such as ERVmap [3], Telescope [4] and TEtranscripts [5]. These software packages, developed outside the R and Bioconductor ecosystem, do interact with it for the purpose of differential expression analyses. To facilitate expression quantification of TEs and its integration with other Bioconductor software, we have developed atena (https://bioconductor.org/packages/atena), an open source software package for the analysis of TE expression in R, available at Bioconductor. The atena package provides a faster and accurate implementation of these three methods, together with quick, flexible and straightforward access to, and processing of, RepeatMasker UCSC TE annotations [6]. In summary, atena facilitates the integration of TE annotation and expression quantification with the wide range of differential expression and functional analysis pipelines available in Bioconductor.

1. O’Neill K, Brocks D, Hammell MG. Mobile genomics: tools and techniques for tackling transposons. Philos Trans R Soc Lond B Biol Sci. 2020;375(1795):20190345.
2. Payer LM, Burns KH. Transposable elements in human genetic disease. Nat Rev Genet. 2019;20(12):760-772.
3. Tokuyama M, Kong Y, Song E, Jayewickreme T, Kang I et al. ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. PNAS. 2018;115(50):12565-12572.
4. Bendall ML, de Mulder M, Iñiguez LP, Lecanda-Sánchez A, Pérez-Losada M et al. Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression. PLoS Comput Biol. 2019;15(9):e1006453.
5. Jin Y, Tam OH, Paniagua E, Hammell M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics. 2015;31(22):3593-3599.
6. Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010.