Scienmag

Shandong University Researchers Innovate Multi-Scale Feature Fusion and Weighted Ensemble Learning for Precise Promoter Identification Across Cell Lines

May 20, 2026
in Biology

Promoters are essential elements within the genome that orchestrate the initiation of gene expression by attracting transcriptional machinery to specific DNA sequences. These regulatory regions act as gatekeepers, dictating whether a gene is switched on or off in a particular cell type. This cell-type specificity is a major challenge for computational biologists attempting to accurately identify promoters because a sequence that functions as a promoter in one cellular environment may be inactive in another. The complexity intensifies due to the vast heterogeneity and sequence diversity of promoter regions across the genome. Traditional computational models, typically trained on data from a limited number of cell lines, often struggle to generalize to novel cellular contexts, resulting in reduced accuracy and robustness.

In response to these limitations, a dedicated research team at the Shenzhen Research Institute and the Schools of Mathematics and Software at Shandong University has developed an innovative deep learning-based framework known as MuSE-Promoter. This model tackles the challenge of promoter identification across multiple cell types by integrating diverse computational features and sophisticated neural network architectures. Unlike previous methods that rely heavily on a single type of input feature, MuSE-Promoter leverages a multimodal approach that incorporates semantic embeddings and handcrafted biophysical descriptors to capture various facets of promoter sequences.

The core strength of MuSE-Promoter lies in its ability to process raw DNA sequences via parallel computational branches. One branch extracts semantic embeddings using natural language processing techniques adapted for genomics, specifically DNABERT and Word2Vec algorithms. These embeddings capture the underlying “grammar” of regulatory DNA in a manner analogous to language models interpreting text. The other branch extracts handcrafted biophysical features, including tri-nucleotide physicochemical properties and reverse-complement k-mer frequencies. These features add complementary information regarding the structural and physicochemical attributes of DNA, which are pivotal for transcription factor binding and promoter functionality.
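To make the handcrafted branch more concrete, the following is a minimal sketch of reverse-complement k-mer frequencies, one of the descriptor types mentioned above. The function names and the default k = 2 are assumptions chosen for illustration; this is not the authors' code, and their exact normalization may differ.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Merge a k-mer with its reverse complement by keeping the
    lexicographically smaller of the two strings."""
    return min(kmer, reverse_complement(kmer))


def rc_kmer_frequencies(seq, k=2):
    """Count k-mers with each k-mer and its reverse complement pooled,
    then normalize by the number of valid k-mers in the sequence.

    Hypothetical helper: k-mers containing characters other than
    A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
            total += 1
    return {key: (counts[key] / total if total else 0.0) for key in keys}


# Example: a fixed-length feature vector for one short sequence.
features = rc_kmer_frequencies("TATAAAGGCGCGCC", k=2)
print(len(features), "features")
```

Pooling each k-mer with its reverse complement shrinks the feature space (10 features instead of 16 for k = 2) and makes the descriptor independent of which DNA strand was read.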

The combined features feed into a multi-scale convolutional neural network enhanced with squeeze-excitation attention mechanisms. This architecture is designed to detect sequence motifs of varying lengths efficiently, recognizing intricate patterns that may be critical for promoter activity. Following convolutional feature extraction, a transformer encoder models long-range interactions within the promoter sequence, accounting for dependencies that span tens or hundreds of base pairs. This step is crucial because promoter function often depends on complex interactions across extended regions rather than localized motifs alone.
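The idea of multi-scale convolution with squeeze-excitation gating can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the kernel widths (3, 5, 7), the random untrained weights, the single filter per width, and the two-unit bottleneck are hypothetical, and the transformer encoder stage is omitted. It shows the mechanism only, not the published architecture.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_relu(x, kernel):
    """Valid 1D convolution of a (length x 4) input with a (width x 4)
    kernel, followed by ReLU. Returns one output channel."""
    width = len(kernel)
    out = []
    for i in range(len(x) - width + 1):
        s = sum(x[i + j][c] * kernel[j][c] for j in range(width) for c in range(4))
        out.append(max(0.0, s))
    return out


def squeeze_excite(channels, w1, w2):
    """Rescale each channel by a gate in [0, 1] computed from the
    global averages of all channels."""
    # Squeeze: one summary value (global average) per channel.
    squeezed = [sum(ch) / len(ch) for ch in channels]
    # Excite: small bottleneck layer with ReLU, then sigmoid gates.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


def multi_scale_features(seq, widths=(3, 5, 7), seed=0):
    """Apply one random filter per kernel width, then gate the channels.
    Weights are random placeholders, not trained parameters."""
    rng = random.Random(seed)
    x = one_hot(seq)
    channels = []
    for width in widths:
        kernel = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(width)]
        channels.append(conv1d_relu(x, kernel))
    n = len(channels)
    w1 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(2)]  # n -> 2
    w2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n)]  # 2 -> n
    return squeeze_excite(channels, w1, w2)


scaled, gates = multi_scale_features("TATAAAGGCGCGCCATGC")
print("channel lengths:", [len(ch) for ch in scaled])
print("gates:", [round(g, 3) for g in gates])
```

Each kernel width responds to patterns of a different length, and the gates let the network emphasize whichever scale is most informative for a given input. In a trained model the kernels and gate weights would be learned rather than drawn at random.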

MuSE-Promoter further integrates the outputs from this deep learning backbone with predictions from a random forest classifier through a learnable weighted ensemble. This ensemble technique balances the strengths of neural networks and traditional machine learning methods, enhancing the overall robustness of predictions. Such a strategy mitigates overfitting, a common pitfall when models trained on one cell line are applied to others, facilitating more reliable cross-cell-line promoter identification.
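One simple way to realize a learnable weighted ensemble is to blend two probability outputs with a single weight that is fitted by gradient descent on held-out labels. The sketch below assumes exactly that form; the article does not specify how the weights are parameterized, so the sigmoid parameterization, learning rate, step count, and toy data are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, probs):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def blend(p_net, p_rf, w):
    """Convex combination of two probability lists with weight w on p_net."""
    return [w * a + (1 - w) * b for a, b in zip(p_net, p_rf)]


def fit_ensemble_weight(p_net, p_rf, y_true, lr=0.5, steps=500):
    """Learn w = sigmoid(alpha) by gradient descent on the log loss.

    Assumed setup: p_net and p_rf are validation-set probabilities from
    the neural network and the random forest, respectively.
    """
    alpha = 0.0  # sigmoid(0) = 0.5, i.e. start from an equal blend
    n = len(y_true)
    for _ in range(steps):
        w = sigmoid(alpha)
        grad = 0.0
        for a, b, y in zip(p_net, p_rf, y_true):
            p = min(max(w * a + (1 - w) * b, 1e-12), 1 - 1e-12)
            dloss_dp = (p - y) / (p * (1 - p))       # d(log loss)/dp
            grad += dloss_dp * (a - b) * w * (1 - w)  # chain rule to alpha
        alpha -= lr * grad / n
    return sigmoid(alpha)


# Toy validation data (made up for illustration).
y_val = [1, 0, 1, 0]
p_net = [0.9, 0.1, 0.8, 0.2]
p_rf = [0.6, 0.5, 0.4, 0.6]
w = fit_ensemble_weight(p_net, p_rf, y_val)
print("learned weight on the network:", round(w, 3))
```

Passing the raw parameter through a sigmoid keeps the weight between 0 and 1, so the blended output is always a valid probability.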

The researchers rigorously evaluated MuSE-Promoter on data from four human cell lines—GM12878, HeLa-S3, HUVEC, and K562—as well as on promoter datasets from the plant Arabidopsis thaliana encompassing both TATA-box and non-TATA promoters. The comparative analyses demonstrated that MuSE-Promoter consistently outperforms state-of-the-art promoter prediction tools such as iPro-WAEL and Z-curve. Its superiority is especially pronounced in challenging scenarios involving cross-cell-line generalization and differentiation between promoters and enhancers, which are often confounded due to overlapping regulatory characteristics.

In cross-cell-line validation tests, MuSE-Promoter achieved an average Area Under the Curve (AUC) of 0.991 and Matthews Correlation Coefficient (MCC) values above 0.92. These metrics indicate that the model generalizes promoter identification well beyond the training cell line, a notable advance over prior methods. The model’s learned sequence representations also showed clear separability between promoters and non-promoters in high-dimensional feature space, and the model assigned high importance weights to biologically established sequence features such as CGA, RCKmer (reverse-complement k-mer), and CC. The capacity to highlight such features underscores the model’s interpretability and its consistency with known molecular biology.
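For readers unfamiliar with the two metrics, here is a minimal sketch of how MCC and AUC are computed from labels and predictions using their standard definitions. The function names and toy inputs are illustrative assumptions and are unrelated to the study's data; AUC is taken here to mean the area under the ROC curve.

```python
import math


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0 or 1).
    Returns 0.0 when the denominator is zero, a common convention."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison: the fraction of
    (positive, negative) pairs in which the positive scores higher,
    counting ties as half. Quadratic in the number of samples, which is
    fine for a small illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: threshold scores at 0.5 to get hard predictions for MCC.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.95, 0.40, 0.30, 0.10, 0.85, 0.60]
preds = [1 if s >= 0.5 else 0 for s in scores]
print("MCC:", round(mcc(labels, preds), 3))
print("AUC:", round(auc(labels, scores), 3))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it stays informative when classes are imbalanced. AUC is threshold-free and measures how well the scores rank positives above negatives.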

Professor Hao Wu, co-corresponding author of the study, emphasizes that the strength of MuSE-Promoter derives from combining semantic DNA sequence representations with explicit biophysical insights. “This multi-modal fusion empowers the model to capture the nuanced regulatory language of DNA as well as its structural context, which are both critical for transcription factor recruitment and promoter function,” he notes. Such an integrated approach outperforms models that are limited to either sequence patterns or physicochemical properties alone.

Complementing these insights, Professor Zhangyu Mei highlights the model’s translational potential to advance genome annotation efforts. “MuSE-Promoter is poised to become an indispensable tool for large-scale promoter annotation projects. It enables researchers to decode cell-type-specific regulatory programs more accurately and to distinguish bona fide promoters from other regulatory elements such as enhancers,” Mei explains. This capability is a vital step towards building comprehensive maps of gene regulation that reflect cellular specificity and complexity.

Looking forward, the team aims to extend the MuSE-Promoter framework by integrating multi-omics data layers, including epigenomic marks and chromatin accessibility profiles, to refine promoter identification further. Additionally, the researchers plan to adapt the model to predict enhancer-promoter interactions, shedding light on higher-order gene regulatory networks involved in cellular differentiation and disease. These expansions will harness the power of deep learning to unravel even more intricate regulatory mechanisms underpinning genome function.

All code and datasets underpinning MuSE-Promoter have been made openly accessible via their GitHub repository, promoting transparency and enabling broader adoption by the genomics research community. This openness fosters collaborative developments and benchmarking against emerging tools in promoter prediction.

The implications of MuSE-Promoter resonate beyond bioinformatics, offering potential applications in synthetic biology, precision medicine, and developmental biology by facilitating targeted manipulation of gene expression. By accurately identifying promoters active across diverse cell types, scientists can design gene circuits with precise regulatory controls or uncover dysregulated promoters linked to disease states.

This breakthrough represents a crucial stride toward overcoming the enduring challenge of promoter identification amidst the complexity of cell-type specificity. By integrating advanced machine learning architectures with a rich tapestry of genomic features, MuSE-Promoter sets a new standard in computational genomics and promises to accelerate discoveries in gene regulation.

Subject of Research: Cells

Article Title: MuSE-Promoter: a multi-scale feature fusion and weighted ensemble learning method for identifying promoters across multiple cell lines

Web References: https://github.com/HaoWuLab-Bioinformatics/MuSE-Promoter

References: DOI 10.1016/j.mdmed.2026.100002

Image Credits: Xiao Bi, Zhangyu Mei & Hao Wu

Keywords: Bioinformatics, Genetics, Molecular biology, Mathematics, Technology, Biochemical engineering, Artificial intelligence

Tags: biophysical descriptors in promoter modeling, cell-type specific promoter prediction, computational biology in gene expression, cross-cell line genomic prediction, deep learning models for gene regulation, multi-scale feature fusion for promoter identification, multimodal neural networks for DNA analysis, promoter sequence diversity challenges, robust promoter detection algorithms, semantic embeddings in bioinformatics, Shandong University computational genomics research, weighted ensemble learning in genomics