Researchers at Harvard University and Northwestern University have developed a machine learning approach for designing intrinsically disordered proteins (IDPs) with customizable properties. The work addresses a longstanding challenge in protein science: even state-of-the-art AI tools, including AlphaFold, whose developers shared the 2024 Nobel Prize in Chemistry, cannot reliably predict or design proteins that resist adopting a fixed three-dimensional structure. Because approximately 30% of all human proteins fall into this intrinsically disordered category, the new methodology could have a significant impact on the biosciences, synthetic biology, and therapeutic development.
Intrinsically disordered proteins deviate from the traditional protein paradigm, where function is closely linked to a stable, folded structure. Instead, IDPs exist as dynamic ensembles of conformations, fluctuating constantly rather than settling into a singular architecture. This structural fluidity underpins their critical roles in biological processes such as molecular signaling, cross-linking, and environmental sensing, but it also presents a vexing obstacle for computational modeling. The transient and heterogeneous nature of IDPs means that conventional structure prediction algorithms, which rely on defined folding patterns, falter when applied to these proteins.
To surmount this barrier, the team, led by researchers at Harvard’s Paulson School of Engineering and Applied Sciences with collaborators at Northwestern, used automatic differentiation, a computational technique widely used in deep learning. Automatic differentiation computes exact derivatives of a program’s outputs with respect to its inputs; applied to a physical simulation, it shows how small changes at the amino acid sequence level translate into changes in ensemble behavior. In effect, the approach provides a physics-based, gradient-driven search over protein sequences, identifying those with specific dynamic properties rather than fixed structures.
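To make the idea concrete, here is a minimal sketch of forward-mode automatic differentiation in plain Python. It is an illustration only, not the researchers' code: the `Dual` class, the `relax` function, and the toy model (a single coordinate relaxing under a spring of stiffness `k`) are all assumptions chosen for brevity. The point it shows is that the derivative of the simulation's final state with respect to `k` is carried through every time step exactly, with no finite-difference approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A number paired with its derivative with respect to one chosen input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def relax(k, x0=1.0, dt=0.1, steps=10):
    """Toy 'simulation' (hypothetical): overdamped relaxation in a spring.

    Each step moves the coordinate down the energy gradient k * x.
    Works for both plain floats and Dual numbers.
    """
    x = x0
    for _ in range(steps):
        x = x - dt * k * x
    return x


def derivative(f, k):
    """d f / d k, obtained by seeding the input with derivative 1."""
    return f(Dual(k, 1.0)).dot


final_state = relax(2.0)
sensitivity = derivative(relax, 2.0)
print(final_state, sensitivity)
```

For this toy model the result can be checked by hand: the final state is (1 - dt*k)^steps, so its derivative with respect to k is -steps*dt*(1 - dt*k)^(steps - 1). Production tools apply the same principle to far larger simulations, usually in reverse mode so that one pass yields derivatives with respect to many parameters at once.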
The approach departs from purely data-driven AI models: instead of training machine learning systems solely on empirical protein structures, the researchers integrated physics-based molecular dynamics simulations directly into the optimization loop. By doing so, they generated “differentiable” IDPs, whose properties are tied to the physical laws governing molecular interactions and thermal fluctuations. This allows the rational design of proteins tailored to functions ranging from molecular connectors that form loops to sensors that react to environmental changes.
Ryan Krueger, a graduate student at Harvard and one of the co-lead authors, explained the motivation behind this approach: “We wanted to avoid training models on vast datasets with limited applicability and instead utilize existing, validated simulations to generate new protein designs directly informed by physical reality.” This contrasts with prior strategies that often relied heavily on patterns mined from known protein structures, which are ill-equipped to capture the dynamic heterogeneity of disordered proteins.
The work has practical implications. IDPs have been implicated in a variety of diseases, notably neurodegenerative disorders such as Parkinson’s disease, in which aberrant forms of alpha-synuclein, a prototypical intrinsically disordered protein, contribute to pathology. The ability to design IDPs with targeted functions and behaviors opens avenues not only for deeper mechanistic insight but also for therapeutic approaches that could modulate or mimic their activity.
From a technical standpoint, the research relied on gradient-based optimization methods, routinely used in neural network training, to iteratively refine protein sequences. These methods compute derivatives of an objective function with respect to sequence parameters, enabling the algorithm to “climb” toward configurations that exhibit the desired biophysical traits. Compared with heuristic or stochastic search techniques, gradient information can make the search through the vast combinatorial space of amino acid sequences more efficient and more precise.
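The loop described here can be sketched in a few lines of plain Python. Everything in the sketch is a stated assumption rather than the study's method: a hypothetical three-letter alphabet with charges +1, -1, and 0, a continuous relaxation in which each position holds softmax probabilities over those letters, and a toy objective (squared distance between the expected net charge and a target). The gradient is written out by hand here; in practice automatic differentiation would supply it.

```python
import math

# Hypothetical alphabet: residue letter -> charge (illustration only).
CHARGE = {"K": 1.0, "E": -1.0, "G": 0.0}
LETTERS = list(CHARGE)


def softmax(row):
    """Turn one position's logits into probabilities over the alphabet."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def expected_charge(logits):
    """Expected net charge of the relaxed (probabilistic) sequence."""
    total = 0.0
    for row in logits:
        probs = softmax(row)
        total += sum(p * CHARGE[a] for p, a in zip(probs, LETTERS))
    return total


def loss_and_grad(logits, target):
    """Squared error in expected charge and its gradient w.r.t. the logits."""
    err = expected_charge(logits) - target
    grad = []
    for row in logits:
        probs = softmax(row)
        mean_q = sum(p * CHARGE[a] for p, a in zip(probs, LETTERS))
        # d(expected charge)/d(logit_a) = p_a * (q_a - mean charge at this position)
        grad.append([2.0 * err * p * (CHARGE[a] - mean_q)
                     for p, a in zip(probs, LETTERS)])
    return err * err, grad


def design(target, length=6, lr=0.1, steps=300):
    """Plain gradient descent on the sequence logits."""
    logits = [[0.0] * len(LETTERS) for _ in range(length)]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(logits, target)
        history.append(loss)
        for row, g_row in zip(logits, grad):
            for j, g in enumerate(g_row):
                row[j] -= lr * g
    return logits, history


logits, history = design(target=2.0)
print(history[0], history[-1], expected_charge(logits))
```

The sketch optimizes probabilities, not a discrete sequence; turning the result back into actual residues is a separate step that it does not attempt. A real objective would also come from a simulated ensemble instead of a closed-form property such as net charge.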
Moreover, the team’s strategy builds on molecular dynamics, a computational method that simulates the physical motions of atoms and molecules over time. By coupling automatic differentiation with these physics-based simulations, the optimization can draw on the dynamic behavior of IDPs, including their transient interactions and conformational ensembles, to inform design decisions. This links theoretical modeling with functional protein engineering.
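What it means to differentiate an ensemble property can be shown on a toy system. The sketch below is not taken from the study, and whether the researchers use this particular estimator is not stated here. It assumes a single spring-like coordinate with energy 0.5*k*x^2, enumerates its Boltzmann ensemble on a grid, and computes how the mean squared extension responds to the stiffness `k` using a standard statistical-mechanics identity: for an observable O that does not itself depend on k, d<O>/dk = -beta * (<O * dE/dk> - <O> * <dE/dk>). The function names and grid settings are hypothetical.

```python
import math


def ensemble(k, beta=1.0, n=2001, x_max=10.0):
    """Grid of extensions x and their normalized Boltzmann probabilities."""
    xs = [-x_max + 2.0 * x_max * i / (n - 1) for i in range(n)]
    weights = [math.exp(-beta * 0.5 * k * x * x) for x in xs]
    z = sum(weights)
    return xs, [w / z for w in weights]


def mean_sq_extension(k, beta=1.0):
    """Ensemble average <x^2> for spring stiffness k."""
    xs, probs = ensemble(k, beta)
    return sum(p * x * x for x, p in zip(xs, probs))


def d_mean_sq_extension_dk(k, beta=1.0):
    """Derivative of <x^2> with respect to k via the covariance identity."""
    xs, probs = ensemble(k, beta)
    obs = [x * x for x in xs]          # observable O = x^2
    de_dk = [0.5 * x * x for x in xs]  # dE/dk for E = 0.5 * k * x^2
    mean_o = sum(p * o for p, o in zip(probs, obs))
    mean_de = sum(p * d for p, d in zip(probs, de_dk))
    mean_o_de = sum(p * o * d for p, o, d in zip(probs, obs, de_dk))
    return -beta * (mean_o_de - mean_o * mean_de)


k = 2.0
grad = d_mean_sq_extension_dk(k)

# Cross-check against a central finite difference.
h = 1e-5
finite_diff = (mean_sq_extension(k + h) - mean_sq_extension(k - h)) / (2.0 * h)
print(grad, finite_diff)
```

For a harmonic spring the continuum answer is known in closed form, <x^2> = 1/(beta*k), so the derivative is -1/(beta*k^2); the sketch can be compared against that value. In a real design problem the ensemble would be sampled by molecular dynamics instead of enumerated, and the parameter would describe the sequence, not a single stiffness.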
The study, published in Nature Computational Science, marks a step toward the rational design of biomolecules that elude conventional design frameworks. It comes at a time when advances in artificial intelligence are rapidly reshaping biological research, yet intrinsic disorder remains a frontier. The work was co-led by Krishna Shrinivas, an assistant professor at Northwestern University and former NSF-Simons QuantBio Fellow, alongside Michael Brenner, the Catalyst Professor of Applied Mathematics and Applied Physics at Harvard SEAS.
Support for this multi-institutional collaboration came from the National Science Foundation AI Institute of Dynamic Systems, the Office of Naval Research, and various Harvard-based research centers. The team’s expertise spanned applied mathematics, computational physics, and molecular biology, reflecting the interdisciplinary effort the problem required.
Looking ahead, the implications of this method extend beyond natural protein systems. In synthetic biology, engineered IDPs designed with specified properties could serve as novel biomaterials, adaptable sensors, or dynamic scaffolds. The ability to computationally tune sequence-ensemble-function relationships with such granularity offers a powerful toolkit for biotechnologists and pharmaceutical developers alike.
In summary, the team’s use of automatic differentiation within a physics-based simulation framework provides a data-efficient route to designing intrinsically disordered proteins. By working around the limitations of existing AI models, the research sets the stage for designing a previously inaccessible class of proteins with broad biological and clinical potential.
Subject of Research: Not applicable
Article Title: Generalized design of sequence–ensemble–function relationships for intrinsically disordered proteins
News Publication Date: 6-Oct-2025
Web References:
- Article DOI: 10.1038/s43588-025-00881-y
- Associated institutions: Harvard SEAS, Northwestern University
References:
- Original research published in Nature Computational Science
Image Credits: Ramanna Shrinivas
Keywords:
Protein folding, Protein expression, Protein stability, Proteins, Protein activity, Life sciences, Biochemistry, Biomolecules, Machine learning, Artificial neural networks, Deep learning, Computer science, Applied physics, Applied mathematics, Algorithms