We identified a rare case of familial germline loss-of-function mutation in methyl-CpG binding domain 4, DNA glycosylase (MBD4) associated with early-onset acute myeloid leukemia (AML) in an Israeli Christian-Arab family from a highly endogamous community that reported no known consanguinity. This discovery prompted broader screening, revealing additional unrelated carriers of the same MBD4 frameshift mutation in the population. To investigate the consequences of heterozygous MBD4 loss, we analyzed clonal hematopoiesis (CH) and mutational patterns by whole-exome sequencing (WES) and found that heterozygous MBD4 deficiency is associated with increased mutagenesis, characterized particularly by CG>TG transitions.
The base excision repair pathway maintains genomic stability by correcting DNA mismatches. MBD4 and thymine-DNA glycosylase (TDG) specifically repair T:G mismatches caused by 5-methylcytosine deamination, with some substrate specificity.1-3 While biallelic TDG loss is embryonically lethal,4 MBD4 deficiency increases mutagenesis and tumor risk.5 Germline biallelic MBD4 loss of function has been linked to cancers, including AML, myelodysplastic syndrome, CH,6 and colonic polyposis, defining the MBD4-associated neoplasia syndrome (MANS).7 Heterozygous germline mutations with somatic loss of the wild-type allele have also been reported in uveal melanoma.8 Affected tumors typically exhibit excess CG>TG mutations, with recent reports suggesting a potential SBS96 signature specifically associated with biallelic MBD4 deficiency.9
While the phenotype associated with homozygous MBD4 mutation appears clearer, particularly concerning the hematopoietic system and gastrointestinal tract, the phenotype of the heterozygous state remains less well defined.
The case of a 42-year-old man diagnosed with AML, with eight CH mutations (predominantly CG>TG) detected by myeloid panel sequencing, was presented to our laboratory (Figure 1A). The patient’s sister had previously died of AML at the age of 30 years. WES of leukemic blasts revealed a germline biallelic 4-bp deletion in MBD4 (c.612_615del; p.Ser205ThrfsTer9), which was confirmed by amplicon sequencing of peripheral blood and shown to segregate with disease in the family (Figure 1B).
This variant is rare in the Genome Aggregation Database (gnomAD v4.1.0), with an allele frequency of 0.0000399, and has only been observed in the heterozygous state (0.00000248), with no individuals carrying biallelic mutations. A similar familial case of the same biallelic c.612_615del frameshift deletion was reported in a patient with colorectal adenomas and myelodysplastic syndrome that evolved to AML.10 Given the patient’s origin in an Israeli Christian-Arab community with notable endogamy and a high inbreeding coefficient,11 and the family’s denial of consanguinity, we hypothesized that this MBD4 mutation may represent a founder variant predisposing to MANS among Israeli Christian Arabs. In the current study, we aimed to assess the prevalence of the MBD4 c.612_615del mutation in the Christian-Arab population in Israel, and to study the consequences of MBD4 c.612_615del in heterozygous carriers.
Using amplicon sequencing, we conducted a pilot screening of healthy volunteers from the Israeli Christian-Arab community who were unrelated to one another and to the index family, in collaboration with the Orthodox Church and EMMS Nazareth Hospital. Volunteers with active oncological conditions were excluded.
To assess the phenotype of MBD4 c.612_615del heterozygous carriers, we performed deep targeted sequencing for CH mutations and WES of peripheral blood to evaluate the impact of MBD4 c.612_615del on mutagenesis.
The study received approval from the local ethics committee of the Weizmann Institute of Science (Institutional Review Board approval 1773-2).
Genomic DNA was extracted from peripheral blood using a Qiagen DNA purification kit. For WES, 1 µg of DNA was fragmented (~220 bp) and libraries were prepared using the xGen Exome v1.0 Panel (IDT). The sequencing was performed on an Illumina NovaSeq X Plus using 100 bp paired-end reads, with an average depth of ~50× and minimum target coverage of 20×.
Public datasets were harmonized for comparison. Beat AML12 samples were sequenced using the Illumina Nextera RapidCapture Exome kit. The Cancer Genome Atlas (TCGA) blood-derived normal samples, used as germline references for the TCGA project, were selected from chemotherapy-naïve, age-matched individuals across BRCA (breast invasive carcinoma), LGG (brain lower grade glioma), LIHC (liver hepatocellular carcinoma), and TGCT (testicular germ cell tumor) cohorts. Capture kits included Agilent v3/v5 and Roche VCRome. All raw FASTQ or aligned BAM files from public datasets were reprocessed through the same alignment and variant calling pipeline to minimize bias.
Reads were aligned to GRCh38/hg38 using BWA. Variant calling was performed with Mutect2 (GATK v4.1.7.0) in tumor-only mode with default parameters.13 Germline filtering used gnomAD v4.1.0 and the 1000 Genomes Project Phase 3 v5 as references. Orientation artifacts were modeled with LearnReadOrientationModel, and calls were filtered with FilterMutectCalls. We retained only PASS variants with a variant allele frequency between 0.11 and 0.8 and a TLOD (the log10 likelihood ratio score of the variant existing versus not existing) meeting a threshold of 40.
Additional filters excluded variants with: (i) strand bias (P<0.01, χ² test), (ii) an alternate/reference base quality difference >5, (iii) extreme depth (>150×), (iv) <3 alternate-supporting reads, (v) a gnomAD minor allele frequency (MAF) with −log10(MAF) <3.35, (vi) proximity <50 bp to another variant, and (vii) recurrence across individuals. Annotation was performed using ANNOVAR.
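The per-call thresholds listed above can be combined into a single predicate. The following is a minimal Python sketch, not the authors' pipeline code: the `Call` record and its field names are hypothetical, the bounds are assumed to be inclusive (the text does not say), and the filters that need extra inputs (strand bias, base-quality difference, proximity to other variants, and recurrence across individuals) are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical per-variant record; field names are illustrative only."""
    filter_status: str  # FILTER value after FilterMutectCalls
    vaf: float          # variant allele frequency
    tlod: float         # Mutect2 TLOD score
    depth: int          # total read depth at the site
    alt_reads: int      # reads supporting the alternate allele
    gnomad_maf: float   # gnomAD minor allele frequency (0.0 if absent)


def passes_filters(call: Call) -> bool:
    """Apply the thresholds quoted in the text (inclusive bounds assumed)."""
    if call.filter_status != "PASS":
        return False
    if not 0.11 <= call.vaf <= 0.8:
        return False
    if call.tlod < 40:
        return False
    if call.depth > 150:
        return False
    if call.alt_reads < 3:
        return False
    # Drop variants that are too common in gnomAD: -log10(MAF) below 3.35.
    if call.gnomad_maf > 0 and -math.log10(call.gnomad_maf) < 3.35:
        return False
    return True


calls = [
    Call("PASS", 0.25, 55.0, 60, 15, 0.0),    # kept
    Call("PASS", 0.25, 55.0, 60, 15, 0.001),  # dropped: common in gnomAD
    Call("PASS", 0.05, 55.0, 60, 3, 0.0),     # dropped: VAF below 0.11
]
kept = [c for c in calls if passes_filters(c)]
print(len(kept))
```

Keeping the thresholds in one function makes it straightforward to apply identical criteria to in-house and reprocessed public samples, which is the stated purpose of running all datasets through the same pipeline.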
Mutational signature analysis was performed with the MutationalPatterns R package for context-specific C>T substitutions, and with the signatureFit_pipeline() function from the signature.tools.lib package10 for SBS fitting, focusing on SBS1 and SBS5 (signatures associated with spontaneous and clock-like C>T transitions) and on SBS96. Signature fitting included 100 bootstraps. Deep targeted sequencing for CH was done with a 47-gene Molecular Inversion Probe panel,14 requiring depth >100×. Variant significance was assessed using a Poisson exact test with Benjamini–Hochberg multiple testing correction. Amplicon-based sequencing of MBD4 c.612_615del was performed using primers with 5′ Illumina adaptors (Fwd: CTACACGACGCTCTTCCGATCTttctgaagttaacatcatcaaca, Rev: CAGACGTGTGCTCTTCCGATCTaaccaaagtaacaattcaaactg) and sequenced on an Illumina MiniSeq (2×151 bp).
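For the variant significance step mentioned above (a Poisson exact test with Benjamini-Hochberg correction), the text does not give the parameterization. The sketch below is therefore only one plausible reading: it assumes a one-sided test of the observed alternate-read count against an expected count derived from a background error rate. The function names, the error rate, and the example read counts are all hypothetical.

```python
import math


def _poisson_pmf(i: int, lam: float) -> float:
    # Computed in log space to avoid overflow in the factorial.
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(k: int, lam: float) -> float:
    """One-sided exact P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        return max(0.0, 1.0 - sum(_poisson_pmf(i, lam) for i in range(k)))
    # Sum the upper tail directly so that very small p-values keep precision.
    total, i, term = 0.0, k, _poisson_pmf(k, lam)
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical candidate sites: (alternate reads, depth); assumed error rate.
error_rate = 0.001
sites = [(12, 1500), (3, 900), (40, 2000)]
raw = [poisson_upper_tail(alt, depth * error_rate) for alt, depth in sites]
print(benjamini_hochberg(raw))
```

The expected count is modeled here as depth times error rate; a site-specific or context-specific error model would slot into the same structure.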
For the index case, somatic myeloid mutations were profiled using the Oncomine™ Myeloid Assay (Ion Torrent S5, GRCh37 reference), using a 5% threshold, excluding synonymous and common single nucleotide polymorphisms.
Sequencing the peripheral blood from 312 healthy unrelated Christian-Arab individuals (18-93 years old) using amplicon-based sequencing of MBD4 c.612_615del identified three MBD4 c.612_615del heterozygous carriers (~1%) and no biallelic cases. None had a personal or family history of cancer. To assess CH, we analyzed 11 MBD4 c.612_615del heterozygous carriers: three from our population screen and eight relatives of the index patient. CH mutations were found in three, including the 14-year-old daughter of the index patient (Figure 2A).
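The ~1% figure follows directly from the screening counts (three carriers among 312 individuals). As an illustration of the uncertainty around an estimate from a pilot of this size, the Python sketch below computes the carrier and allele frequencies together with a Wilson 95% interval. The interval is not reported in the source and is added here only as a worked example; the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


carriers, screened = 3, 312
carrier_freq = carriers / screened        # fraction of individuals carrying the variant
allele_freq = carriers / (2 * screened)   # each heterozygote carries one of two alleles
low, high = wilson_interval(carriers, screened)

print(f"carrier frequency: {carrier_freq:.4f}")
print(f"allele frequency:  {allele_freq:.4f}")
print(f"Wilson 95% interval for carrier frequency: {low:.4f} to {high:.4f}")
```

The Wilson interval is used rather than the simple normal approximation because the latter behaves poorly when the count of positives is this small.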
Figure 1. The MBD4 biallelic c.612_615del patient’s somatic mutations, pedigree, and unique mutational signature extracted from leukemic blasts. (A) Table of somatic mutations in clonal hematopoiesis-related genes identified using a myeloid next-generation sequencing panel on leukemic blasts from a 42-year-old patient with acute myeloid leukemia (AML). Several mutations, including those in STAG2, TET2, DNMT3A, and IDH1, occur at CpG sites and represent C>T transitions characteristic of CG>TG mutagenesis. (B) Pedigree of the biallelic c.612_615del patient (marked with X) from an Israeli Christian-Arab family with the MBD4 mutation. Black represents the mutated allele, white indicates the wild-type allele, squares denote males, circles denote females. The year of birth is indicated. The index patient’s parents declined genetic testing. (C, D) Comparison of mutational signatures between the MBD4 biallelic c.612_615del patient and samples from the Beat AML leukemic cohort. The total number of mutations was 552 in the biallelic MBD4 c.612_615del patient and 746 across 11 Beat AML samples. The biallelic patient exhibited a significantly higher proportion of C>T transitions at CpG sites (light red bar), with error bars representing 95% confidence intervals (Fisher exact test, ***P=1.72×10⁻¹⁵). HGVS: Human Genome Variation Society; VAF: variant allele frequency.
To assess the mutational impact of the MBD4 c.612_615del variant, we compared WES of leukemic blasts from the biallelic case to the Beat AML samples, and WES from MBD4 c.612_615del heterozygous carriers to WES from peripheral blood of non-leukemic donors (in-house MBD4+/+ donors) and blood-derived normal cases from the TCGA database. WES of leukemic blasts from the biallelic c.612_615del AML patient confirmed previous observations, with 72% of C>T substitutions occurring at CpG sites (CG>TG), a significantly higher proportion than in AML samples from the Beat AML cohort (N=11, P=1.72×10⁻¹⁵, Fisher exact test) (Figure 1C, D). The biallelic c.612_615del patient also carried a higher overall mutation burden (552 mutations, N=1) than the average Beat AML sample (746 mutations combined across N=11, about 68 per sample). For the heterozygous state, we performed WES on peripheral blood from nine healthy MBD4 c.612_615del heterozygotes and compared their profiles to those of two non-leukemic control cohorts: (i) blood-derived normal samples from age-matched individuals in the TCGA database who had not received chemotherapy, and (ii) 19 non-leukemic MBD4+/+ donors sampled in-house to minimize potential sample handling bias.
MBD4 c.612_615del heterozygous carriers exhibited a significantly higher proportion of C>T transitions at CpG sites compared to both control groups (41% vs. 31% in MBD4+/+ in-house controls, P=0.00178; 41% vs. 28% in TCGA normal, P=0.0000245; Wilcoxon rank-sum test) (Figure 2B).
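The quantity compared here, the share of C>T transitions that fall at CpG sites, depends on reading each substitution in its sequence context. The following Python sketch shows one way to make that classification explicit. It is an illustration rather than the MutationalPatterns implementation, and the function names and the tuple layout `(ref, alt, context)` are assumptions.

```python
def is_cpg_transition(ref: str, alt: str, context: str) -> bool:
    """Return True for a C>T change at a CpG site (CG>TG).

    `context` is the three-base reference sequence centred on the variant,
    read on the forward strand. A G>A change preceded by C is the same
    event seen from the opposite strand.
    """
    ref, alt, context = ref.upper(), alt.upper(), context.upper()
    if len(context) != 3 or context[1] != ref:
        raise ValueError("context must be 3 bases centred on the reference base")
    if ref == "C" and alt == "T":
        return context[2] == "G"
    if ref == "G" and alt == "A":
        return context[0] == "C"
    return False


def cpg_fraction(variants: list[tuple[str, str, str]]) -> float:
    """Fraction of C>T (or strand-equivalent G>A) substitutions at CpG sites."""
    transitions = [
        v for v in variants
        if (v[0].upper(), v[1].upper()) in {("C", "T"), ("G", "A")}
    ]
    if not transitions:
        return float("nan")
    return sum(is_cpg_transition(*v) for v in transitions) / len(transitions)


# Toy input, not study data: two CpG transitions out of four C>T/G>A calls.
example = [
    ("C", "T", "ACG"),
    ("G", "A", "CGT"),
    ("C", "T", "ACA"),
    ("G", "A", "TGA"),
    ("A", "G", "TAC"),  # not a C>T transition, ignored in the denominator
]
print(cpg_fraction(example))
```

Computing this fraction per individual yields the per-sample values that a rank-based test such as the Wilcoxon rank-sum test can then compare between cohorts.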
We observed 3,274 mutations in MBD4 c.612_615del heterozygous carriers (N=9), compared to 5,530 in TCGA controls (N=36) and 4,174 in MBD4+/+ donors (N=19), supporting modest CpG-biased mutagenesis in carriers. No significant correlation between age and mutation count was observed (MBD4+/–: r=–0.5, P=0.17; MBD4+/+: r=0, P=0.99; TCGA: r=–0.06, P=0.75), suggesting that age alone does not explain the increased burden (Online Supplementary Figure S1A). The small size of our cohort and drift might reduce the power to detect age-related changes.
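Because the three cohorts differ in size, the totals above are easier to compare after dividing by the number of individuals. The short Python sketch below does that arithmetic on the reported totals only. It is a back-of-the-envelope normalization and not an analysis from the source, and it does not account for differences between capture kits.

```python
# Reported totals: (total mutations, number of individuals)
cohorts = {
    "MBD4 c.612_615del heterozygous carriers": (3274, 9),
    "TCGA blood-derived normals": (5530, 36),
    "In-house MBD4+/+ donors": (4174, 19),
}

mean_per_individual = {
    name: total / n for name, (total, n) in cohorts.items()
}

for name, mean in mean_per_individual.items():
    print(f"{name}: {mean:.1f} mutations per individual")
```

On these totals, carriers average roughly 364 mutations per individual, against roughly 154 for the TCGA normals and 220 for the in-house donors.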
Figure 2. Prevalence and mutation patterns in MBD4 c.612_615del heterozygous carriers in the Christian-Arab population. (A) Clonal hematopoiesis (CH) in heterozygous carriers: CH mutations were detected from peripheral blood DNA in three out of 11 MBD4 c.612_615del heterozygous carriers, including one 14-year-old individual. Human Genome Variation Society annotations are based on canonical transcripts: TP53 (NM_000546.6), DNMT3A (NM_022552.5), and EZH2 (NM_004456.4). (B) Comparison of the percentage of C>T transitions at CpG sites among MBD4 c.612_615del heterozygous carriers, in-house MBD4+/+ healthy donors, and whole-exome sequencing (WES) data from The Cancer Genome Atlas (TCGA) project extracted from peripheral blood. MBD4 c.612_615del heterozygous carriers exhibited significantly more C>T transitions at CpG sites (41% vs. 31% in MBD4+/+ controls, **P=0.00178; 41% vs. 28% in TCGA normal samples, ***P=0.0000245; Wilcoxon rank-sum test). (C) Mutational spectra from peripheral blood WES data across cohorts. Total mutations: MBD4 c.612_615del heterozygous carriers (N=9) = 3,274; TCGA (N=36) = 5,530; MBD4+/+ (N=19) = 4,174. VAF: variant allele frequency. **P<0.01; ***P<0.001.
We used the signatureFit_pipeline function from the signature.tools.lib R package to assess mutational signatures, focusing on SBS96. The MBD4 biallelic c.612_615del patient showed a stronger SBS96 fit than Beat AML samples (mean 0.453 vs. 0.318, P=9.42×10⁻⁶) (Online Supplementary Figure S1B). MBD4 c.612_615del heterozygous carriers had higher SBS96 contributions than both MBD4+/+ donors (mean 0.403 vs. 0.335, P<2.6×10⁻³⁴) and TCGA controls (0.403 vs. 0.252, P<2.6×10⁻³⁴) (Online Supplementary Figure S1C).
Altogether, this study demonstrates that the MBD4 c.612_615del variant contributes to increased somatic mutagenesis, with a more pronounced effect in the homozygous state. While MBD4 c.612_615del heterozygous carriers exhibit a milder phenotype, the consistent rise in C>T transitions at CpG sites suggests a progressive mutagenic process. Detection of these somatic mutations was probably made feasible by drift.15 Given the ~1% carrier frequency in Israeli Christian Arabs and the potential for long-term genomic instability, genetic counseling may be advisable. This may guide reproductive choices and clinical follow-up as penetrance and risks become clearer.
Footnotes
- Received February 26, 2025
- Accepted August 22, 2025
Disclosures
No conflicts of interest to disclose.
Funding
Grant support was provided by the Israel Science Foundation (1123/21) and Israel Cancer Research Fund (22-107-PG).
Acknowledgments
The results here are based, in part, upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga
References
1. Hendrich B, Hardeland U, Ng HH. The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature. 1999; 401(6750):301-304.
2. Wiebauer K, Jiricny J. In vitro correction of G.T mispairs to G.C pairs in nuclear extracts from human cells. Nature. 1989; 339(6221):234-236.
3. Moréra S, Grin I, Vigouroux A. Biochemical and structural characterization of the glycosylase domain of MBD4 bound to thymine and 5-hydroxymethyuracil-containing DNA. Nucleic Acids Res. 2012; 40(19):9917-9926.
4. Cortázar D, Kunz C, Selfridge J. Embryonic lethal phenotype reveals a function of TDG in maintaining epigenetic stability. Nature. 2011; 470(7334):419-423.
5. Millar CB, Guy J, Sansom OJ. Enhanced CpG mutability and tumorigenesis in MBD4-deficient mice. Science. 2002; 297(5580):403-405.
6. Sanders MA, Chew E, Flensburg C. MBD4 guards against methylation damage and germ line deficiency predisposes to clonal hematopoiesis and early-onset AML. Blood. 2018; 132(14):1526-1534.
7. Blombery P, Ryland GL, Fox LC. Methyl-CpG binding domain 4, DNA glycosylase (MBD4)-associated neoplasia syndrome associated with a homozygous missense variant in MBD4: expansion of an emerging phenotype. Br J Haematol. 2022; 198(1):196-199.
8. Villy MC, Le Ven A, Le Mentec M. Familial uveal melanoma and other tumors in 25 families with monoallelic germline MBD4 variants. J Natl Cancer Inst. 2024; 116(4):580-587.
9. Degasperi A, Zou X, Amarante TD. Substitution mutational signatures in whole-genome–sequenced cancers in the UK population. Science. 2022; 376(6591).
10. Palles C, West HD, Chew E. Germline MBD4 deficiency causes a multi-tumor predisposition syndrome. Am J Hum Genet. 2022; 109(5):953-960.
11. Haber M, Gauguier D, Youhanna S. Genome-wide diversity in the Levant reveals recent structuring by culture. PLoS Genet. 2013; 9(2):e1003316.
12. Beat Acute Myeloid Leukemia (AML) 1.0. 2024.
13. Bernstein N, Spencer Chapman M, Nyamondo K. Analysis of somatic mutations in whole blood from 200,618 individuals identifies pervasive positive selection and novel drivers of clonal hematopoiesis. Nat Genet. 2024; 56(6):1147-1155.
14. Biezuner T, Brilon Y, Arye AB. An improved molecular inversion probe based targeted sequencing approach for low variant allele frequency. NAR Genom Bioinform. 2022; 4(1):lqab125.
15. Zink F, Stacey SN, Norddahl GL. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood. 2017; 130(6):742-752.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.