MethQuant: A package providing entropy-based measures for quantifying patterns of DNA methylation heterogeneity in (single-cell) bisulfite sequencing data
Emanuel Sonder, Izaskun Mallona & Mark D. Robinson

DNA methylation is an essential epigenetic mark associated with the regulation of gene expression. In mammals, it typically affects the 5th carbon of cytosines situated in a CpG dinucleotide context. Hence, DNA methylation is a binary mark, being either present or absent at the individual base level. However, in bulk assays, per-CpG methylation rates averaged across cells often take intermediate values, indicating DNA methylation heterogeneity. Conceptually, DNA methylation can be heterogeneous across cells (bulk effects) but can also show disordered or aberrant patterns along the sequence within a single cell, e.g. due to stochastic loss of DNA methylation.

To characterize patterns of variability in DNA methylation along the genome, different entropy-based scores have been introduced and applied mainly to bulk bisulfite sequencing data (Scherer et al., 2020). To describe the sources of DNA methylation heterogeneity more fully, we suggest incorporating further heterogeneity scores (e.g. Sample Entropy) and applying them to data derived from single cells, with the aim of disentangling heterogeneity within cells from heterogeneity across cells.

Our package in development, MethQuant, provides several entropy-based scores for quantifying sources of DNA methylation heterogeneity in genomic regions of interest, using single-cell bisulfite sequencing (sc-BS) data. More specifically, MethQuant offers implementations of:

1. Shannon Entropy as a measure of inter-cell DNA methylation heterogeneity.
2. Sample Entropy as a measure of intra-cell DNA methylation heterogeneity.
3. Simulation functionalities for single-cell bisulfite sequencing data.

Our package aims to make use of and integrate with widely used packages, such as GenomicRanges and bsseq.
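To illustrate the two scores named above, the following is a minimal Python sketch and not MethQuant's implementation. It assumes methylation calls are coded as 1 (methylated) and 0 (unmethylated), that Shannon Entropy is taken over the distribution of whole methylation patterns across cells, and that Sample Entropy follows the standard ln(B/A) definition with template length m >= 1 and tolerance r = 0 (exact matches). The function names `shannon_entropy` and `sample_entropy` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(patterns):
    """Shannon entropy (in bits) of methylation patterns across cells.

    `patterns` holds one binary tuple per cell, all covering the same
    CpGs of a region (assumed encoding: 1 = methylated, 0 = unmethylated).
    """
    counts = Counter(tuple(p) for p in patterns)
    total = sum(counts.values())
    # Sum of p * log2(1 / p) over the distinct patterns.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sample_entropy(seq, m=2, r=0):
    """Sample entropy ln(B / A) of one cell's calls along a region.

    B counts pairs of matching templates of length m, A counts pairs of
    matching templates of length m + 1; self-matches are excluded.
    """
    n = len(seq)

    def matching_pairs(length):
        # The first n - m start positions are used for both lengths.
        templates = [seq[i:i + length] for i in range(n - m)]
        return sum(
            1
            for i in range(len(templates))
            for j in range(i + 1, len(templates))
            if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r
        )

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    if b == 0:
        return math.nan  # undefined: no matching templates of length m
    if a == 0:
        return math.inf  # no matches survive the extension to m + 1
    return math.log(b / a)


# Four cells split evenly between two patterns: one bit of entropy.
cells = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1)]
print(shannon_entropy(cells))       # 1.0

# A perfectly regular (alternating) sequence has zero sample entropy.
print(sample_entropy([0, 1] * 5))   # 0.0
```

In this sketch a low Shannon Entropy means that cells share the same pattern in the region, while a low Sample Entropy means that the calls within a single cell follow a regular, predictable pattern along the sequence.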
Feedback concerning the implementation and the handling of sc-BS data in accordance with existing Bioconductor packages would be particularly appreciated. To our knowledge, no package in the Bioconductor ecosystem is currently devoted to mining DNA methylation heterogeneity using entropy-based scores. Seamless integration within the Bioconductor project would, on the one hand, make entropy-based scores available in a user-friendly way and, on the other, provide a basis for adding other DNA methylation-associated measures.

Michael Scherer et al. “Quantitative comparison of within-sample heterogeneity scores for DNA methylation data”. In: Nucleic Acids Research 48 (Feb. 2020), e46. https://academic.oup.com/nar/article/48/8/e46/5760751