Integration of Transcriptomics and Long-Read Genomics Prioritizes Structural Variants in Rare Disease

Tanner D Jensen, Bohan Ni, Chloe M Reuter, John E Gorzynski, Sarah Fazal, Devon Bonner, Rachel A Ungar, Pagé C Goddard, Archana Raja, Euan A Ashley, Jonathan A Bernstein, Stephan Zuchner, Undiagnosed Diseases Network, Michael D Greicius, Stephen B Montgomery, Michael C Schatz, Matthew T Wheeler, Alexis Battle

Genome Research (2025)

Department of Genetics | Department of Computer Science | Center for Undiagnosed Diseases | University of Miami | Stanford University

Abstract

Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to detect and interpret accurately. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) in whom short-read sequencing had identified no diagnostic mutations. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected an average of 716 rare (MAF < 0.01) SV alleles per genome with long reads, a 2.4x increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression in blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched near expression outliers (log odds ratio, LOR = 0.46). We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations and significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, these included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression to improve the prioritization of functional SVs and TREs in rare disease patients.
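
To make the enrichment statistic above concrete, the following is a minimal sketch of how a log odds ratio can be computed from a 2x2 table of genes (expression outlier or not, with or without a nearby rare enhancer-overlapping SV). It is not the authors' implementation: the function name, the choice of natural logarithm, the 0.5 pseudocount, and the example counts are all assumptions made for illustration, and the counts are invented rather than taken from the paper.

```python
import math


def log_odds_ratio(a, b, c, d, pseudocount=0.5):
    """Log odds ratio and its standard error for a 2x2 table.

    a: outlier genes WITH a nearby rare enhancer SV
    b: outlier genes WITHOUT one
    c: non-outlier genes WITH a nearby rare enhancer SV
    d: non-outlier genes WITHOUT one

    The pseudocount (Haldane-Anscombe style) is an illustrative choice
    that keeps the ratio finite when a cell is zero.
    """
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    lor = math.log((a * d) / (b * c))          # natural log (assumption)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error
    return lor, se


# Hypothetical counts, chosen only to show the call pattern.
lor, se = log_odds_ratio(30, 970, 200, 9800)
ci_low, ci_high = lor - 1.96 * se, lor + 1.96 * se
print(f"LOR = {lor:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A positive value means the variant class is seen more often near outlier genes than near non-outlier genes; a value of zero means no enrichment.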

Key words

Structural Variation

Chat Paper

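[Key points]: This paper presents an approach that integrates transcriptomics and long-read genomics to prioritize structural variants in rare disease, and develops Watershed-SV, a probabilistic model that combines expression data with SV-specific genomic annotations to improve the prioritization of functional SVs.

[Methods]: The authors used Oxford Nanopore long-read sequencing, optimized their SV detection pipelines, analyzed the data alongside 571 control long-read genomes, and developed the Watershed-SV model by assessing the relationship between SVs and gene expression.

[Experiments]: The study sequenced 68 individuals from the Undiagnosed Diseases Network (UDN) in whom short-read sequencing had identified no diagnostic mutations. It detected an average of 716 rare (MAF < 0.01) SV alleles per genome and assessed the functional effects of SVs using blood or fibroblasts from the same individuals, finding that rare SVs overlapping enhancers were associated with gene expression outliers. It also evaluated tandem repeat expansions (TREs), finding an average of 14 rare TREs per genome. Using Watershed-SV, it identified a median of eight high-confidence functional SVs per UDN genome.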
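
As a rough illustration of what the rare-variant cutoff (MAF < 0.01) means in practice, the sketch below computes an allele frequency from a reference panel and applies the cutoff. It is an assumption-laden example, not the paper's pipeline: the function names are hypothetical, the panel is treated as diploid, and it assumes frequencies are estimated from the 571 control genomes alone, which the summary does not state.

```python
def allele_frequency(allele_count, n_genomes, ploidy=2):
    """Fraction of chromosomes in the panel that carry the allele."""
    return allele_count / (n_genomes * ploidy)


def is_rare(allele_count, n_genomes, maf_cutoff=0.01, ploidy=2):
    """True if the allele frequency is strictly below the cutoff."""
    return allele_frequency(allele_count, n_genomes, ploidy) < maf_cutoff


# With 571 diploid genomes there are 1142 chromosomes, so the cutoff
# separates alleles seen on 11 chromosomes from those seen on 12.
n_controls = 571  # assumed to be the whole reference panel
for count in (1, 11, 12):
    freq = allele_frequency(count, n_controls)
    print(count, f"{freq:.4f}", is_rare(count, n_controls))
```

Under these assumptions, an SV allele would pass the rare filter only if it appears on at most 11 of the 1142 control chromosomes.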
