Codon Optimization Moratorium Whitepaper
Codon Optimization in mRNA Vaccines and Gene Therapies: An Assessment of Protein Misfolding Risks and Regulatory Oversight
The following Whitepaper was submitted today to the FDA’s Vaccines and Related Biological Products Advisory Committee.
Created by Ehden Biber, 14th of May, 2025.
BONUS - link to the PDF enclosed at the end, before works cited.
I. Executive Summary
Objective: This white paper evaluates the scientific evidence and regulatory oversight related to codon optimization in messenger RNA (mRNA) vaccines and gene therapies for both human and animal applications. Its primary goal is to assess whether the potential risks—specifically protein misfolding, aggregation, and associated diseases like amyloidogenesis and prionogenesis—justify a moratorium on this technology’s use. The analysis aims to inform policy by balancing innovation with safety.
Scope: The assessment covers the principles of codon optimization, the mechanisms of protein folding and misfolding during translation, and the risks of amyloid formation and prion-like propagation. It reviews regulatory frameworks, including the U.S. Food and Drug Administration (FDA) and European Medicines Agency (EMA) Chemistry, Manufacturing, and Controls (CMC) requirements. The scope includes a detailed examination of scientific literature—both foundational and recent—such as studies on tissue-specific codon usage, codon context effects, and structural risks in mRNA vaccines. It also analyzes preclinical and clinical safety data, pharmacovigilance reports, and a case study on COVID-19 mRNA vaccines. Additionally, the paper explores implications for veterinary vaccines and food chain safety, concluding with an updated risk management and policy evaluation.
Key Findings: Codon optimization enhances protein production in mRNA-based vaccines and therapies but introduces risks that challenge its presumed safety. Research shows that synonymous codon changes, previously thought to be neutral, affect translation speed and accuracy, potentially leading to protein misfolding and aggregation. For example, the cardiomyocyte preamyloid oligomer (PAO) model reveals that misfolded proteins can cause delayed, severe damage, such as heart failure, broadening concerns beyond brain-related diseases. Data from TissueCoCoPUTs indicate that generic codon optimization often mismatches the translation needs of specific tissues, increasing misfolding risks. In COVID-19 mRNA vaccines, structural issues like excessive G-quadruplex formation may heighten these dangers.
Regulatory oversight, however, falls short. There are no standard tests to evaluate how codon changes affect protein shape or clumping, and long-term monitoring is insufficient to detect slow-developing issues like neurodegeneration or prion-like diseases. The rapid rollout of COVID-19 mRNA vaccines lacked thorough evaluation of these folding risks. In veterinary applications, the use of codon-optimized therapies in livestock raises unresolved concerns about misfolded proteins entering the food chain and environment, potentially affecting human health.
Conclusion on Moratorium
Given the mounting evidence of risk and gaps in oversight, a tiered moratorium is recommended as the most cautious and effective approach:
· Pediatric Moratorium: Codon-optimized mRNA vaccines and gene therapies should not be used in children under 18 until safer practices are developed. This includes using tissue-specific codon designs, conducting 24-month safety studies in primates, and setting limits on G-quadruplex formation. Children’s developing tissues are especially vulnerable, and long-term data are lacking.
· Adult Use Restrictions: New clinical trials for risky mRNA designs—those with heavy optimization, high clumping potential, or rich GC content—should pause. This halt would last until advanced tests, such as Kinetic Folding Assurance (KFA), hidden gene screening, and cross-seeding checks, are required.
· Veterinary Prohibition: Codon-optimized gene therapies in animals entering the food supply should be banned. Evidence from the PAO model and prion diseases suggests a risk of transmissible misfolded proteins, posing a threat to food safety.
This moratorium should persist until:
· Reliable tests for protein folding and clumping are part of regulatory standards.
· Codon optimization accounts for tissue-specific needs and translation timing.
· Independent research verifies the safety of high-risk designs and pediatric use.
· Better monitoring systems track long-term health effects.
This approach marks a shift in how codon optimization is viewed: it’s not just a tool for efficiency but a process that demands careful study of its effects on protein structure and safety over time. By adopting these measures, we can protect public health while still advancing mRNA and gene therapy innovations.
II. Background & Definitions
A. Codon Optimization
Definition: Codon optimization is a gene engineering technique involving the modification of a gene's nucleotide sequence using synonymous codons – different nucleotide triplets that code for the same amino acid – without altering the primary amino acid sequence of the encoded protein.1 It is a form of synthetic gene design aimed at improving gene expression when transferring a gene from one organism (source) to another (host) for production or therapeutic effect.
Rationale: The primary motivation for codon optimization stems from the phenomenon of codon usage bias, where different organisms exhibit distinct preferences for specific codons encoding the same amino acid.1 When a gene's native codon usage mismatches the host's preferences, translation by the host's ribosomes can become inefficient.1 This inefficiency can manifest as slower translation rates due to ribosome pausing at "rare" codons, reduced overall protein yield, and potentially increased errors leading to non-functional proteins.1 Codon optimization seeks to overcome these limitations primarily to:
1. Enhance Translation Efficiency: By replacing codons that are infrequently used in the target host with codons that are more abundant or preferred, the process aims to facilitate smoother and faster ribosome transit along the mRNA molecule.1
2. Increase Protein Yield: Improved translation efficiency directly translates to higher production levels of the desired recombinant protein or therapeutic molecule, which is critical for the economic viability and dosing of vaccines and protein therapies.1
3. Modulate mRNA Stability and Structure: Optimization algorithms often adjust the GC content (percentage of guanine and cytosine bases) of the sequence, which can influence mRNA secondary structure and overall stability, potentially impacting both transcript longevity and translational accessibility.2 Furthermore, codon choice itself has been shown to influence mRNA degradation kinetics, suggesting a complex interplay between codon usage, ribosome density, and mRNA lifetime.3
Common Algorithms/Metrics: Several computational metrics and strategies guide the codon optimization process:
● Codon Adaptation Index (CAI): CAI quantifies the similarity between the codon usage pattern of a specific gene and a reference set of highly expressed genes within the target host organism.1 It is calculated as the geometric mean of the relative adaptiveness values for each codon in the sequence, normalized to the maximum possible value (1.0).4 A higher CAI score is generally presumed to correlate with higher expression levels.1 However, some algorithms employ a simplistic 'one amino acid-one codon' strategy to maximize CAI, potentially ignoring other biologically relevant factors like translation kinetics.2
● tRNA Adaptation Index (tAI): This metric estimates translational efficiency based on the relative abundance of transfer RNA (tRNA) molecules corresponding to each codon within the host cell.1 It aims to reflect the availability of the necessary translational machinery, potentially offering a more direct measure of kinetic efficiency than CAI alone.
● GC-Content: The overall percentage of G and C nucleotides affects mRNA stability and the potential for forming secondary structures that might impede ribosome movement.1 Optimization often involves adjusting GC content to fall within a range considered optimal for the host organism, typically avoiding extremes.2 Additionally, the frequency of specific dinucleotides, like CpG, can be modulated; while sometimes avoided due to potential immune sensing (e.g., by ZAP protein), increased CpG content was intentionally introduced in some COVID-19 vaccine mRNAs, potentially enhancing stability and attenuating hypothetical recombinant viruses.6
● Other Factors: More sophisticated algorithms may also consider Codon Context (the influence of neighboring codons), Individual Codon Usage (ICU), avoidance of undesirable sequence motifs (e.g., cryptic splice sites, premature polyadenylation signals, strong secondary structures, sequences prone to RNA editing), and overall sequence complexity.1
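The CAI and GC-content definitions above can be made concrete with a short calculation. The following is a minimal Python sketch, not any vendor's optimization algorithm: it derives relative adaptiveness values from a reference set of coding sequences and takes their geometric mean over a query sequence. The function names, the 0.5 pseudocount for unobserved codons, and the choice to skip single-codon amino acids and stop codons are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into codons; a trailing partial codon is dropped."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w(codon) = count(codon) / count(most used synonymous codon) in the reference set."""
    counts = defaultdict(float)
    for seq in reference_seqs:
        for codon in split_codons(seq):
            counts[codon] += 1
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)
    w = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # assumption: stops, Met and Trp carry no codon choice
        top = max(counts[c] for c in codons)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            # assumption: a pseudocount keeps unobserved codons from giving log(0)
            w[c] = max(counts[c], pseudocount) / top
    return w


def cai(seq, w):
    """Geometric mean of relative adaptiveness over the scorable codons of seq."""
    logs = [math.log(w[c]) for c in split_codons(seq) if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)
```

With a toy reference set in which CTG is used three times as often as CTA for leucine, a sequence built only from CTG scores 1.0, while mixing in CTA lowers the score. A 'one amino acid-one codon' strategy pushes this number to its maximum by construction, which is why a high CAI by itself says nothing about translation kinetics.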
Table II.1: Comparison of Common Codon Optimization Metrics
B. Co-translational Folding & Misfolding
The synthesis of a protein on the ribosome is intimately coupled with its folding into a functional three-dimensional structure. This process, known as co-translational folding, begins while the nascent polypeptide chain is still emerging from the ribosome exit tunnel.7 The kinetics of translation elongation, influenced by the sequence of codons being read, dictates the timing with which different segments and domains of the polypeptide become available for folding interactions. This temporal dimension is critical; the rate at which the polypeptide emerges can significantly influence the folding pathway it takes through its complex energy landscape – a theoretical representation of all possible conformations and their relative energies.7
Natural mRNA sequences often contain patterns of common and rare codons. Clusters of rare codons can cause ribosomes to pause, potentially providing crucial time for newly synthesized domains or subdomains to fold correctly before subsequent segments emerge and interact.3 Aggressive codon optimization, particularly strategies aiming solely to maximize translation speed by replacing all rare codons with common ones, can eliminate these natural pause sites. This alteration in translation kinetics can force the nascent chain down different folding pathways compared to its natural synthesis or even compared to the refolding of the full-length protein in vitro.3 The ribosome itself is not merely a passive production machine; it acts as a crucial modulator of folding, interacting with the nascent chain and potentially stabilizing folding intermediates that are transient or absent off the ribosome.8 By changing the speed and rhythm of polypeptide emergence, codon optimization fundamentally alters these intricate interactions and the kinetic landscape of co-translational folding.
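The loss of pause sites described above can be illustrated with a toy calculation. The sketch below, written in Python, flags windows that contain a cluster of rare codons and then applies a 'one amino acid-one codon' recoding to show that the clusters vanish while the encoded peptide stays the same. The adaptiveness values in TOY_W are hypothetical numbers invented for this example, not measured human codon usage, and the window size and cutoffs are likewise arbitrary assumptions.

```python
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# HYPOTHETICAL relative adaptiveness values, for illustration only.
TOY_W = {
    "CTG": 1.0, "CTC": 0.5, "CTT": 0.3, "TTG": 0.3, "CTA": 0.1, "TTA": 0.1,  # Leu
    "GCC": 1.0, "GCT": 0.7, "GCA": 0.6, "GCG": 0.2,                          # Ala
    "GGC": 1.0, "GGA": 0.7, "GGG": 0.7, "GGT": 0.4,                          # Gly
}


def split_codons(seq):
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(codons):
    return "".join(CODON_TO_AA.get(c, "X") for c in codons)


def rare_clusters(codons, w, rare_cutoff=0.3, window=3, min_rare=2):
    """Start indices of windows holding at least min_rare codons with w <= rare_cutoff.

    Codons missing from w are treated as common (an assumption of this sketch).
    """
    is_rare = [w.get(c, 1.0) <= rare_cutoff for c in codons]
    return [
        i for i in range(len(codons) - window + 1)
        if sum(is_rare[i:i + window]) >= min_rare
    ]


def maximize_each_codon(codons, w):
    """Replace every codon with the highest-w synonym ('one amino acid-one codon')."""
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if codon in w and (aa not in best or w[codon] > w[best[aa]]):
            best[aa] = codon
    return [best.get(CODON_TO_AA.get(c), c) for c in codons]


native = split_codons("GCCCTAGCGTTAGGC")           # Ala-Leu-Ala-Leu-Gly
optimized = maximize_each_codon(native, TOY_W)
print(translate(native), translate(optimized))      # same peptide twice
print(rare_clusters(native, TOY_W))                 # candidate pause windows
print(rare_clusters(optimized, TOY_W))              # none remain
```

A scan like this only locates where codon rarity is concentrated. Whether a given cluster actually functions as a folding pause would need experimental evidence, such as ribosome profiling or comparative folding assays.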
Misfolding
When folding pathways are perturbed, the protein may fail to reach its native, functional state, resulting in misfolding. Misfolded proteins often expose hydrophobic residues or specific sequences known as aggregation-prone regions (APRs) that are normally buried within the native structure.10 These exposed regions can mediate intermolecular interactions, leading to the formation of protein aggregates. Aggregation can proceed through various stages, forming soluble oligomers, larger protofibrils, and ultimately mature, often insoluble, fibrils.10 There is substantial evidence suggesting that early-stage soluble oligomers are often the most cytotoxic species, capable of disrupting cellular processes and membrane integrity.10
A specific and highly ordered form of protein aggregate is the amyloid fibril, characterized by a cross-β sheet structure where β-strands run perpendicular to the fibril axis.10 The process of amyloid formation, amyloidogenesis, is associated with a wide range of debilitating human diseases, collectively known as amyloidoses, including Alzheimer's disease (Amyloid-β protein), Parkinson's disease (α-synuclein), and Type 2 Diabetes (Islet Amyloid Polypeptide).10
Prion diseases
Prion diseases represent a unique category where the misfolded protein itself (PrP^Sc^) acts as an infectious agent, templating the conformational conversion of the normal cellular form (PrP^C^) into the pathogenic, aggregation-prone state.10 This self-propagating misfolding mechanism raises concerns about transmissibility. Importantly, the concept of "prion-like" behavior or seeding has been extended to other amyloidogenic proteins; aggregates of one type of protein may be capable of inducing or accelerating the aggregation of the same protein (homotypic seeding) or even different aggregation-prone proteins (heterotypic cross-seeding).10 This raises the possibility that introduction of an aggregation-prone therapeutic protein could potentially trigger or exacerbate endogenous protein aggregation processes. Bioinformatics tools exist to predict APRs within protein sequences, offering a potential avenue for risk assessment during design.11
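To give a sense of what sequence-based screening looks like, the Python sketch below runs a sliding-window hydropathy scan using the Kyte-Doolittle scale and reports contiguous hydrophobic stretches. This is a deliberately crude stand-in: dedicated APR predictors use their own, more sophisticated scoring, and hydrophobicity alone neither proves nor excludes aggregation propensity. The window length, the threshold of 2.0, and the function name are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_stretches(protein, window=7, threshold=2.0):
    """Return merged (start, end) half-open regions whose windows average >= threshold.

    Unknown residue letters score 0.0 (an assumption of this sketch).
    """
    protein = protein.upper()
    regions = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        score = sum(KD.get(res, 0.0) for res in segment) / window
        if score < threshold:
            continue
        if regions and i <= regions[-1][1]:
            # Overlaps or touches the previous hit: extend it.
            regions[-1] = (regions[-1][0], i + window)
        else:
            regions.append((i, i + window))
    return regions


# A hydrophobic core flanked by charged residues is reported as one region.
print(hydrophobic_stretches("DDDDDVVIILLVDDDDD"))
```

Regions flagged this way are candidates for closer inspection with purpose-built tools and biophysical assays, not findings in themselves.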
The connection between optimizing codons for speed and the risk of misfolding arises directly from the kinetics of translation. Natural selection has likely fine-tuned codon usage patterns not just for overall speed but also for optimal folding trajectories, incorporating pauses where necessary. By prioritizing speed above all else, codon optimization may inadvertently disrupt this evolved kinetic control. An altered rate of polypeptide emergence from the ribosome can deny specific domains sufficient time to attain their correct fold before potentially interfering downstream segments appear, thus increasing the probability of the nascent chain entering kinetically trapped, misfolded states or exposing APRs that initiate aggregation.3 Therefore, the very strategy intended to maximize functional protein yield (via faster synthesis) could paradoxically increase the proportion of non-functional, potentially harmful misfolded or aggregated protein.
C. Regulatory Context
The development and approval of mRNA vaccines and gene therapies are governed by comprehensive regulatory frameworks established by agencies like the FDA in the United States and the EMA in the European Union. Key regulations include the FDA's Current Good Manufacturing Practices (CGMP) as detailed in 21 CFR Parts 210 and 211, and the EMA's Directive 2001/83/EC, supplemented by numerous guidelines from the International Council for Harmonisation (ICH).15
These frameworks mandate rigorous control over the Chemistry, Manufacturing, and Controls (CMC) aspects of product development. Central to CMC is the thorough characterization of the product and the validation of the manufacturing process to ensure consistency, quality, and safety.15 Critical Quality Attributes (CQAs) – physical, chemical, biological, or microbiological attributes that should be within an appropriate limit, range, or distribution to ensure the desired product quality – must be identified and controlled. For biologics, including mRNA products and the proteins they encode, essential CQAs typically include identity, purity, potency (biological activity), and stability.15 Manufacturers must establish specifications for these attributes and use validated analytical methods for routine testing and lot release.15 Stability studies are required to determine appropriate storage conditions and shelf life.15 Identity testing must confirm the product is what it purports to be and distinguish it from other products.15
ICH guidelines, widely adopted by both FDA and EMA, provide further detail, emphasizing a science- and risk-based approach throughout the product lifecycle (ICH Q8-Q12).17 ICH Q9 specifically focuses on Quality Risk Management, requiring manufacturers to identify, analyze, and control potential risks to product quality. ICH Q12 facilitates the management of post-approval CMC changes.17
Within this context, the nucleotide sequence of an mRNA therapeutic is a fundamental aspect of its identity. Regulatory guidance requires submission and control of the full sequence, including untranslated regions (UTRs), the coding sequence, modifications (like capping and polyadenylation), and any non-standard nucleotides used.19 Codon optimization, as a modification of the coding sequence, falls under this purview. However, a critical question arises regarding the depth of assessment specifically required for the potential downstream consequences of codon optimization on the encoded protein's structure and function, beyond ensuring the correct amino acid sequence.
While regulations clearly mandate assessment of the final protein product's potency and purity, there appears to be ambiguity or lack of specific guidance on whether the act of codon optimization itself triggers a requirement for dedicated studies to rule out adverse effects on protein folding and aggregation propensity. Existing guidance documents reviewed 15 emphasize characterization of the final product and control of the overall manufacturing process. They require demonstration of biological activity (potency) and control of impurities (including product-related variants). However, standard potency assays might not be sensitive enough to detect subtle conformational changes or low levels of aggregation that do not grossly impair function but could pose long-term risks (e.g., immunogenicity, seeding). Similarly, purity assays might identify major variants but not necessarily characterize the conformational state of the main product peak. The potential impact of synonymous codon changes on higher-order structure and aggregation risk might therefore be implicitly assumed to be captured by these general assessments, rather than being systematically investigated through specific comparative studies (e.g., comparing optimized vs. wild-type protein folding using biophysical methods) as a standard requirement triggered by the use of codon optimization. This potential gap suggests that the structural and aggregation risks specifically introduced by altering translation kinetics via codon optimization may not be consistently or explicitly addressed under current standard regulatory expectations.
III. Literature Review & Deep Analysis
A. Foundational Studies
The rationale and potential pitfalls of codon optimization have been debated in the scientific literature for years. Several foundational studies provide critical context for understanding the potential risks associated with this technology.
Early Evidence (ca. 2006) on Structure, Codon Usage, and Evolution: As early as 2006, research had already hinted at the non-neutrality of synonymous codons and their interplay with protein structure and evolution. Studies investigating the determinants of protein evolutionary rates (dN, nonsynonymous substitutions per site) in organisms like yeast identified strong correlations between evolutionary rate and factors related to protein folding and stability.22 A dominant factor identified was gene expression level: highly expressed proteins tend to evolve more slowly.22 This is thought to reflect stronger purifying selection against mutations that cause misfolding or mistranslation, as the cellular cost of such errors is higher for abundant proteins.23 Furthermore, inherent structural properties, such as the density of contacts between amino acid residues (contact density) and the fraction of buried residues, were also found to correlate with evolutionary rate.22 This was linked to the concept of "designability" – the number of different amino acid sequences that can successfully fold into a given structure. Structures with higher contact density (often involving more buried residues) were proposed to be more designable and, perhaps counterintuitively, showed a tendency to evolve their sequences more rapidly, even though individual buried residues are typically highly conserved.22 This suggests that overall structural robustness influences tolerance to sequence changes. Concurrently, other studies confirmed that codon usage bias is strongly correlated with gene expression levels and impacts the rate of synonymous substitution (dS), consistent with selection pressure for translational efficiency and accuracy.25 Together, these early studies demonstrated that protein structure imposes constraints on sequence evolution, that expression levels drive selection related to folding fidelity, and that codon usage is under selective pressure related to translation.
This body of work supports the premise that altering codon usage patterns via optimization is unlikely to be biologically neutral and interacts deeply with the biophysical constraints of protein folding and stability.
Cardiomyocyte expression of a polyglutamine preamyloid oligomer causes heart failure, Pattison et al., 2008: This publication shifts the focus from neurodegenerative diseases, where protein misfolding is a well-established hallmark, to the realm of cardiac disease.61 The study investigates how protein misfolding, specifically the formation of preamyloid oligomers (PAOs) from polyglutamine repeats, affects the heart. PAOs are misfolded protein structures implicated in diseases like amyloidosis, which is known to impact cardiac function. Using transgenic mice with cardiomyocyte-restricted expression of polyglutamine repeats, the researchers demonstrated that the accumulation of these misfolded proteins leads to intracellular oligomers and aggregates, resulting in cardiomyocyte death and heart failure. This provides a clear, causal connection between protein misfolding and cardiac disease. The paper is groundbreaking because it establishes that protein misfolding, specifically PAO accumulation, is sufficient to cause heart failure. This was a novel contribution at the time, offering experimental evidence for a hypothesis that had previously been theoretical or observational. The use of a transgenic mouse model to test this relationship adds robustness to the findings, making it a landmark study in the field. This study has significant implications for the development of mRNA-based therapeutics, particularly those that aim to express proteins within cardiac tissues. If codon optimization strategies employed in these therapies inadvertently lead to increased expression of proteins that have an inherent propensity to misfold or form toxic oligomers, it could potentially exacerbate or even induce cardiac pathologies.43 Therefore, careful consideration of the protein's folding landscape and the potential impact of codon optimization on its conformational stability is crucial when designing mRNA therapeutics targeting the heart.
A critical analysis of codon optimization in human therapeutics (Mauro & Chappell, Trends Mol. Med. 2014): This seminal review offered a critical counterpoint to the often-unquestioned pursuit of codon optimization for maximizing protein yield.3 Mauro and Chappell argued forcefully that synonymous codon changes are not truly "silent" with respect to the protein product. Based on a synthesis of existing literature and biological principles, they concluded that codon optimization strategies can inadvertently alter protein conformation, impair biological function, increase immunogenicity, and ultimately reduce the efficacy of therapeutic proteins and nucleic acid-based therapies (including mRNA vaccines and gene therapy).3 The authors highlighted the evolved complexity of natural codon usage patterns, suggesting they may contain functionally important information related to the regulation of translation speed (e.g., programmed pauses for folding), mRNA structure, and stability that is often disregarded by simplistic optimization algorithms focused solely on codon frequency.3 They identified potential hazards specific to nucleic acid therapies, such as unintended effects on RNA processing like A-to-I editing.9 Their analysis challenged the core scientific assumptions underpinning many codon optimization approaches and recommended a critical reconsideration of its use, particularly for in vivo applications where safety and precise function are paramount.9 This paper established a strong theoretical and evidence-based foundation for the concerns explored in this white paper.
Case for the genetic code as a triplet of triplets (Chevance & Hughes, PNAS, 2017): Chevance and Hughes present a compelling argument for a more complex understanding of the genetic code, suggesting that the efficiency of mRNA translation is not solely determined by individual codons but is also significantly influenced by the identity of the immediately preceding codons.60 Through elegant in vivo experiments using the Salmonella flagellar gene flgM, the researchers demonstrated that synonymous substitutions in codons flanking a specific translated codon could have dramatic effects on the overall level of protein activity, indicating significant alterations in translation speed.60 Their findings support a model where efficient mRNA translation is governed by a "triplet-of-triplet" genetic code, meaning that the ribosome's ability to accurately and efficiently decode a particular codon is influenced by the identity of the two codons immediately upstream. 60 This has profound implications for codon optimization strategies, which traditionally focus on optimizing the frequency of individual codons based on host organism preferences.60 The work by Chevance and Hughes suggests that these approaches might be overly simplistic and could potentially disrupt the natural translational dynamics of an mRNA by failing to account for these crucial codon context effects.60 Therefore, more sophisticated codon optimization algorithms that incorporate the influence of neighboring codons on translation kinetics might be necessary to achieve optimal protein expression and avoid unintended consequences on protein folding and function.
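The practical consequence of the triplet-of-triplets idea is that a single synonymous substitution changes the decoding context of codons downstream of it, even though those codons are untouched. The Python sketch below makes that bookkeeping explicit: it lists the positions whose own codon is unchanged but whose two upstream codons differ between a native and a recoded sequence. The function names and the example sequences are hypothetical and chosen only for illustration; the sketch says nothing about the size or direction of any kinetic effect.

```python
def split_codons(seq):
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_contexts(seq):
    """For each codon from the third onward, return (upstream2, upstream1, codon)."""
    codons = split_codons(seq)
    return [tuple(codons[i - 2:i + 1]) for i in range(2, len(codons))]


def context_shifts(native, recoded):
    """Codon indices whose codon is identical but whose two-codon upstream context changed."""
    a, b = split_codons(native), split_codons(recoded)
    if len(a) != len(b):
        raise ValueError("sequences must contain the same number of codons")
    return [
        i for i in range(2, len(a))
        if a[i] == b[i] and a[i - 2:i] != b[i - 2:i]
    ]


native = "ATGGCTCTGAAAGGC"    # Met-Ala-Leu-Lys-Gly
recoded = "ATGGCCCTGAAAGGC"   # one synonymous change: GCT -> GCC (Ala)
print(context_shifts(native, recoded))   # [2, 3]
```

In this toy case one recoded alanine codon alters the context of the two codons that follow it. An optimizer that scores codons one at a time has no term for this effect, so a whole-gene recoding changes nearly every context in the transcript.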
TissueCoCoPUTs: Novel Human Tissue-Specific Codon and CodonPair Usage Tables Based on Differential Tissue Gene Expression (JMB, 2020): The study presents a computational resource detailing tissue-specific codon, codon-pair, and dinucleotide usage across 51 human tissues, leveraging GTEx transcriptome data. This work reveals significant codon usage variation distinct from genomic averages, driven by gene expression levels within each tissue. Key findings highlight that tissues like whole blood exhibit highly skewed codon preferences due to dominant genes (e.g., hemoglobin), while less biased tissues like the liver show broader codon diversity. Regarding codon optimization risks, the study underscores that traditional strategies relying on genomic codon usage may fail in tissue-specific contexts. Optimizing a gene for one tissue’s codon preferences could lead to inefficient translation or altered kinetics in another tissue with different tRNA pools and codon biases, potentially causing protein misfolding or reduced expression. For instance, a gene optimized for liver expression might perform poorly in blood, risking therapeutic inefficacy or conformational errors. Additionally, the distinct codon-pair biases identified suggest that ignoring these patterns during optimization could further disrupt translation efficiency and protein folding. This resource emphasizes the need for tissue-tailored codon optimization to mitigate risks of suboptimal expression, misfolding, or unintended immunogenicity, challenging the assumption that generic optimization ensures universal efficacy. By providing tissue-specific data, TissueCoCoPUTs enables more precise therapeutic design, highlighting the potential pitfalls of overlooking codon usage heterogeneity in biotherapeutic development.
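The idea of an expression-driven, tissue-specific usage table can be sketched in a few lines of Python. The example below weights each gene's codon (or codon-pair) counts by an abundance value and reports frequencies per thousand. It is an illustration of the general principle only, under the assumption that weighting by per-gene expression is a reasonable reading of how such tables are built; the gene names, sequences, and expression values are invented, and the function is not the TissueCoCoPUTs implementation.

```python
from collections import Counter


def split_codons(seq):
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def weighted_usage(cds_by_gene, expression, pairs=False):
    """Expression-weighted codon (or adjacent codon-pair) frequencies per thousand."""
    totals = Counter()
    for gene, cds in cds_by_gene.items():
        weight = expression.get(gene, 0.0)  # genes without a value contribute nothing
        codons = split_codons(cds)
        units = list(zip(codons, codons[1:])) if pairs else codons
        for unit in units:
            totals[unit] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {unit: 1000.0 * n / grand_total for unit, n in totals.items()}


# HYPOTHETICAL genes and expression levels, for illustration only.
cds = {"geneA": "CTGCTGCTG", "geneB": "CTACTACTA"}
tissue_1 = {"geneA": 9.0, "geneB": 1.0}
tissue_2 = {"geneA": 1.0, "geneB": 9.0}

print(weighted_usage(cds, tissue_1))              # CTG dominates
print(weighted_usage(cds, tissue_2))              # CTA dominates
print(weighted_usage(cds, tissue_1, pairs=True))  # codon-pair view
```

The same two genes produce opposite leucine codon preferences depending on which one dominates expression, mirroring the whole-blood observation that a few highly expressed genes can skew a tissue's table.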
Detailed Dissection and Critical Evaluation of the Pfizer/BioNTech and Moderna mRNA Vaccines (Xia, MDPI, 2021): In this publication,58 Xia provides a detailed and critical analysis of the design strategies employed in the development of the two leading mRNA vaccines against SARS-CoV-2: the Pfizer/BioNTech vaccine (BNT162b2) and the Moderna vaccine (mRNA-1273).58 The author meticulously examines the various optimization steps undertaken in the design of these vaccines, including the selection and engineering of the 5' and 3' untranslated regions (UTRs) to enhance ribosome loading and mRNA stability, the optimization of codon usage within the spike protein-encoding sequence to improve translational elongation, and the choice of optimal stop codons for efficient translation termination.58 By comparing the codon usage patterns in the vaccine mRNAs with those observed in highly expressed human ribosomal protein genes, Xia identifies several instances where the optimization strategies might have been sub-optimal or even potentially detrimental.58 For example, the analysis points out discrepancies in the use of specific codons and the potential for the introduction of suboptimal codon pairs, which could affect translation kinetics and potentially protein folding.58 The author emphasizes that different optimization goals, such as maximizing translation speed and ensuring mRNA stability, can sometimes conflict with each other, necessitating careful compromises during the design process.58 The objective of this critical evaluation is to facilitate the future development of even better strategies for vaccine mRNA optimization by highlighting the similarities and differences between the design choices made for these two highly successful vaccines and by discussing the potential advantages and disadvantages of each approach.
Codon-optimization in gene therapy: promises, prospects and challenges (Paremskaia et al, Front Bioeng Biotechnol., 2024) 57: Published a decade after the Mauro and Chappell critique, this more recent review by Paremskaia and colleagues provides an updated perspective on the role of codon optimization specifically within the context of gene therapy.57 The authors acknowledge the significant promises offered by codon optimization in the field, particularly its potential to enhance the efficiency of protein expression from therapeutic genes delivered via various gene therapy vectors. 57 They also highlight the prospect of using codon optimization as a tool to fine-tune the immunogenicity of gene therapy products, which is crucial for achieving a delicate balance between eliciting a therapeutic response and avoiding unwanted immune reactions against the transgene or the vector. 57 Furthermore, the review touches upon the potential for creating tissue-specific gene therapies by leveraging the differential codon usage patterns across various cell types. 57 However, Paremskaia et al. also provide a comprehensive overview of the persistent challenges associated with codon optimization in gene therapy. 57 These challenges include the continued risk of unintended effects on the structure, function, and stability of the target protein due to alterations in translation kinetics and mRNA folding. 57 The authors also emphasize the inherent complexity in accurately predicting and evaluating the overall effectiveness of codon optimization strategies in vivo, given the multitude of cellular factors that can influence protein expression and function.57 The review concludes by providing a detailed analysis of the current metrics used to assess codon optimization, such as CAI and tAI, and discusses their practical application in both research and clinical settings within the context of advancing gene therapeutics. 57
Table III.1: Summary of Key Foundational Studies on Codon Optimization Effects
B. Recent & Pre‐print Literature (Focus on Aggregation/Amyloid)
Recent research, including studies spurred by the COVID-19 pandemic, has provided further evidence relevant to the potential aggregation risks associated with codon-optimized products, particularly vaccines using viral antigens.
"Differences in Vaccine and SARS-CoV-2 Replication Derived mRNA: Implications for Cell Biology and Future Disease" (McKernan et al., OSF preprint, 2021): This preprint by McKernan and colleagues presents an intriguing analysis comparing the mRNA sequences of the SARS-CoV-2 spike protein as it appears in the native virus and as it is encoded in the mRNA vaccines developed against it.59 The authors focus on the impact of codon optimization, a key feature of both the Pfizer/BioNTech and Moderna vaccines, on the overall GC content of the synthetic mRNAs.59 Their analysis reveals a significant increase in the GC content of the vaccine-derived mRNAs compared to the native viral RNA sequences encoding the same spike protein.59 The authors propose that this enrichment in GC content, a direct consequence of the codon optimization process, can lead to an increased propensity for the formation of G-quadruplex structures within the vaccine mRNA.59 G-quadruplexes are non-canonical RNA secondary structures that are rich in guanine bases and have been implicated in various cellular processes, including transcription, translation, and replication.59 McKernan et al. hypothesize that the increased formation of these structures in vaccine-derived mRNAs, as opposed to the native viral RNA, could potentially have significant consequences for the cell biology of the host and might contribute to pathological processes initiated by SARS-CoV-2 molecular vaccination.59 The preprint also touches upon the use of N1-methylpseudouridine in place of uridine in the vaccine mRNAs, a modification designed to reduce immunogenicity and enhance translation.59 The authors suggest that this modification, while beneficial in some aspects, could further complicate the folding predictions of the mRNA and might also influence the interaction of the mRNA with cellular machinery, potentially impacting protein conformation.59 Overall, this preprint raises important questions about the potential unintended consequences of codon optimization in the context of mRNA vaccines and highlights the need for a deeper understanding of how these synthetic mRNAs interact with the host cellular environment.
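The two quantities at issue in this preprint, GC content and the presence of putative G-quadruplex motifs, can be illustrated with a short Python sketch. It assumes the commonly used sequence pattern of four runs of at least three guanines separated by loops of one to seven nucleotides. The two sequences are invented toy strings, not the viral or vaccine sequences, and a motif match only flags a candidate: it does not show that a structure forms in cells, least of all in N1-methylpseudouridine-modified RNA.

```python
import re

# Putative G-quadruplex motif (assumption: the common G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ rule).
G4_PATTERN = re.compile(r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}")


def normalize(seq: str) -> str:
    """Upper-case the sequence and write it as RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = normalize(seq)
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def putative_g4_starts(seq: str) -> list[int]:
    """Start positions of non-overlapping matches to the putative G4 motif."""
    return [m.start() for m in G4_PATTERN.finditer(normalize(seq))]


# Hypothetical toy sequences, for illustration only.
toy_gc_rich = "AUGGGCUGGGACGGGCCGGGUAA"
toy_at_rich = "AUGUUUCUUGAUGGUCCUUAA"

for name, seq in [("GC-rich toy", toy_gc_rich), ("AU-rich toy", toy_at_rich)]:
    print(name, round(gc_content(seq), 2), putative_g4_starts(seq))
```

A real comparison would run the same two functions over the full native and vaccine coding sequences and would still need experimental confirmation of any predicted structure.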
SARS-CoV-2 Spike Protein Amyloidogenicity Studies (2021-2024): A growing number of in vitro studies have investigated the biophysical properties of the SARS-CoV-2 Spike (S) protein, the primary antigen used in the widely deployed mRNA vaccines. Several research groups have reported that specific peptide fragments derived from the S protein sequence (Wuhan strain and potentially variants) are amyloidogenic, capable of forming Thioflavin T-positive amyloid fibrils under physiological conditions.13 Computational prediction tools (e.g., WALTZ) identified potential APRs within Spike and other SARS-CoV-2 proteins (e.g., ORF6, ORF10, Envelope protein).13 Experimental validation using synthetic peptides confirmed the amyloidogenic nature of several predicted regions from Spike.27 Crucially, studies demonstrated that proteolytic cleavage of the full-length Spike protein, for instance by neutrophil elastase which is active during inflammation, can generate these amyloidogenic fragments.27 This provides a plausible biological mechanism for the formation of Spike-derived amyloid in vivo during infection or potentially in response to vaccination, particularly in inflammatory contexts. Furthermore, concerning evidence suggests potential cross-talk with endogenous amyloid pathways. 
Spike protein domains have been shown to bind heparin and, significantly, to known human amyloidogenic proteins including amyloid-β (Aβ), α-synuclein, tau, and prion protein.29 This binding interaction raises the possibility of cross-seeding, where Spike protein or its fragments could potentially trigger or accelerate the aggregation of these endogenous proteins, contributing to neurodegenerative processes.13 Supporting this, one study showed that the SARS-CoV-2 main protease (3CLpro) could directly cleave the Tau protein, leading to its aggregation in vitro.29 These studies collectively establish the inherent amyloidogenic potential of the Spike protein antigen itself and suggest plausible mechanisms through which it could contribute to protein aggregation pathologies, particularly relevant given the neurological symptoms observed in some COVID-19 patients and the theoretical concerns about long-term effects.27
The experimental designs employed in these studies typically involve in silico prediction of APRs, synthesis of corresponding peptides, incubation under amyloid-promoting conditions, and characterization using techniques like Thioflavin T (ThT) fluorescence assays (to detect amyloid formation), transmission electron microscopy (TEM) (to visualize fibril morphology), protease cleavage assays, and binding studies.13 However, a critical limitation of this body of work, in the context of this white paper, is the general lack of direct comparison between the amyloidogenic properties of Spike protein expressed from codon-optimized mRNA constructs versus that expressed from wild-type sequences. The studies primarily focus on the inherent sequence properties of the viral protein. The question of whether the specific codon optimization strategies used in vaccines (e.g., those in BNT162b2 or mRNA-1273) alter the folding kinetics in a way that either exacerbates or mitigates this inherent amyloidogenic propensity remains largely unanswered by direct experimental evidence in the reviewed literature.
Pre-prints on Codon Bias–Driven Aggregation (2023-2024): Complementing the work on specific antigens, recent pre-prints have directly investigated the impact of synonymous codon usage on protein folding efficiency in vivo. One study systematically substituted single synonymous codons throughout the sequence of an E. coli protein (ddlA) and measured the impact on functional protein levels using an in vivo assay.30 The results demonstrated that synonymous substitutions can indeed have substantial effects on folding efficiency, and these effects are context-dependent, varying with the location of the codon within the protein's structure and topology.30 Strikingly, the study found that substitutions to codons considered "rare" in E. coli often led to increased folding efficiency compared to common codons. Furthermore, an mRNA construct composed entirely of rare codons resulted in higher functional protein expression than a construct using only common codons.30 This directly challenges the simplistic paradigm that maximizing the use of common codons (high CAI) is always optimal for functional protein yield. It strongly suggests that translation kinetics, including potential pauses mediated by rare codons, play a critical role in achieving correct co-translational folding.
These recent findings provide direct experimental support for the foundational critiques raised by Mauro & Chappell.3 They confirm that synonymous codons are not functionally silent in vivo and can significantly influence protein folding outcomes. The observation that rare codons can be beneficial highlights an "optimization paradox": strategies focused solely on maximizing translational speed might inadvertently compromise folding efficiency, potentially leading to increased misfolding and aggregation, even if the overall rate of polypeptide synthesis is increased. This underscores the necessity for optimization strategies that consider the complex interplay between codon usage, translation kinetics, and the co-translational folding landscape, rather than relying solely on frequency-based metrics like CAI.
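To make concrete what a frequency-based metric does and does not capture, the following Python sketch computes the Codon Adaptation Index in its usual form: each codon receives a weight equal to its frequency divided by the frequency of the most common synonymous codon, and the CAI is the geometric mean of those weights. The usage table is a hypothetical three-amino-acid toy, not real human data.

```python
import math

# Hypothetical codon frequencies for three amino acids (toy values, not real usage data).
TOY_USAGE = {
    "Leu": {"CUG": 40.0, "CUC": 20.0, "UUA": 8.0},
    "Arg": {"AGA": 12.0, "CGC": 10.0, "CGU": 4.0},
    "Ser": {"AGC": 20.0, "UCU": 15.0, "UCG": 4.0},
}


def relative_adaptiveness(usage: dict) -> dict:
    """Weight of each codon = its frequency / frequency of the best synonymous codon."""
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / top
    return weights


def cai(seq: str, weights: dict) -> float:
    """Geometric mean of codon weights; codons missing from the table are skipped."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("sequence contains no codons present in the weight table")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


weights = relative_adaptiveness(TOY_USAGE)
print(cai("CUGAGAAGC", weights))  # every codon is the most frequent choice
print(cai("UUACGUUCG", weights))  # same Leu-Arg-Ser peptide, rare codons only
```

Both sequences encode the same peptide, and the score separates them on codon frequency alone. Nothing in the calculation refers to codon position, local translation rate, or protein structure, which is the limitation the paragraph above describes.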
The convergence of these lines of research is significant. Foundational work questioned the neutrality of codon optimization and linked sequence evolution to structural constraints.3 Recent experimental studies directly demonstrate the impact of synonymous codons on folding efficiency in vivo, sometimes contradicting simple optimization rules.30 Concurrently, research on relevant therapeutic targets like the SARS-CoV-2 Spike protein reveals inherent amyloidogenic potential and possible cross-seeding interactions.13 Together, this builds a compelling case that codon optimization is a process with the potential to modulate protein folding and aggregation risk, particularly for proteins already predisposed to misfolding. The critical missing link in the current literature remains the direct, systematic comparison of folding and aggregation propensity between codon-optimized and wild-type versions of clinically relevant proteins like Spike, expressed via mRNA platforms.
IV. Regulatory Review
A thorough review of the regulatory landscape is essential to understand how the potential risks associated with codon optimization, particularly protein misfolding and aggregation, are currently managed for mRNA vaccines and gene therapies.
A. Regulatory Filings & Guidance on Codon Optimization Safety
An examination of publicly available guidance documents from the FDA and EMA, along with relevant regulations (e.g., 21 CFR 210/211, Directive 2001/83/EC, ICH Q8-Q12), reveals a lack of explicit, detailed requirements specifically addressing the potential impact of codon optimization on protein folding and aggregation as a distinct risk factor requiring targeted assessment.15 While general principles mandate control over product identity, purity, potency, and stability, and require characterization of the manufacturing process and final product, the specific potential for synonymous codon changes to alter translation kinetics and subsequent protein conformation appears to be subsumed under these broader categories without dedicated scrutiny. For instance, draft EMA guidance on mRNA vaccines requires submission of the full sequence and description of functional elements, and mandates characterization including process-related impurities, but does not explicitly link codon optimization choices to mandatory protein folding or aggregation assays.21 Similarly, Chinese NMPA guidance notes that mRNA molecular design (including coding sequence optimization) "may have an impact on... translation efficiency and immunogenicity," hinting at downstream effects but not specifying required protein structural assessments.19
Reviewing information related to the specific COVID-19 mRNA vaccine submissions (Pfizer/BioNTech's BNT162b2 and Moderna's mRNA-1273) provides further insight. Publicly available analyses indicate that codon optimization strategies were indeed employed and described, focusing on aspects like increasing GC content, modulating CpG frequency, and selecting preferred codons for specific amino acids like Arginine and Leucine, primarily to enhance expression and stability.6 However, based on the available documentation and analyses 6, there is little indication that the regulatory submissions included comprehensive data specifically assessing whether these optimization choices adversely affected Spike protein folding, conformational homogeneity, or aggregation propensity compared to a non-optimized version. Characterization focused on mRNA integrity (identifying fragments from premature transcription termination 34), potency (neutralizing antibody induction), and overall clinical safety and efficacy 35, rather than detailed biophysical characterization of the encoded protein's folding state as influenced by codon choice.
B. Workshop Slides & Public Comments
The risks associated with codon optimization and protein folding were well known to regulators, including the FDA and EMA. In 2016, Susan L. Kirshner, Ph.D., of the FDA’s Office of Biotechnology Products, gave a workshop presentation to the EMA entitled “Immunogenicity of Biological Therapeutics – Product Quality Attributes”.37 In it she listed “Codon optimization and protein folding” under “Construct design”, one of the “Factors that Affect Product Quality”.
In 2019, Katerina Alexaki of the FDA’s Hemostasis Branch (DPPT, OTAT) gave a talk entitled “Effects of codon optimization on biotherapeutics: Implications for immunogenicity”.55 In it she described the process of codon optimization and the role that it plays in a protein's rate of translation, expression, and conformational properties. She described the presence of synonymous mutations in various diseases and the immunogenicity implications of those mutations. She used Factor IX as a model to describe a workflow for generating variants using CoCoPUTs and evaluating those variants for protein expression, conformational differences, peptide presentation, and other attributes.
In the talk she stated the following: “…The effects of codon optimization. Although most people think of them, think of codon optimization as harmless, it has been shown, there has been several reports showing that a single synonymous mutation may be associated with disease, and in many cases the reason is unknown, in other cases the rate of translations has been implicated or the RNA structure has been implicated, but the fact is that a lot of synonymous mutations … has been associated with disease. So, you can think that if one can cause disease, if you codon optimize a gene (protein) you now have multiple substitutions, then there is a good chance that you may have an effect, and this has already been reviewed several times”.
She also noted that “now that we are moving into the gene therapy era, the expression is actually going to start from the liver, and if you target a different tissue, you will get expression from a different cell, and these cells have drastically different codon usage in their transcriptome, and also, they may have different tRNA levels. So, figuring out the translation kinetics of a protein in one tissue tells you nothing about the translation kinetics in different tissue”, and displayed the codon usage database for 51 different tissues.55
Currently mandated assays for biologics generally include tests for sterility, endotoxin levels, identity (confirming the correct molecule), purity (detecting contaminants and major product variants), potency (measuring biological activity), and stability over time.15 As noted above, there is a conspicuous absence of consistently required assays specifically designed to evaluate the impact of the codon optimization process itself on the conformational integrity and aggregation propensity of the resulting protein. Biophysical techniques like circular dichroism (CD) spectroscopy (for secondary structure), differential scanning calorimetry (DSC) (for thermal stability), size-exclusion chromatography with multi-angle light scattering (SEC-MALS) (for aggregation), Thioflavin T (ThT) fluorescence, or seeding assays are powerful tools for such assessments but do not appear to be standard CMC requirements triggered specifically by the use of codon optimization.
Tracing the timeline of regulatory focus is challenging, but the rapid development of mRNA vaccines during the COVID-19 pandemic likely influenced priorities. The urgency of the situation may have led to an acceptance of greater uncertainty regarding more subtle or long-term risks, such as those potentially associated with codon optimization's effect on folding, in favor of rapidly assessing efficacy and acute safety.35 While products like BNT162b2 subsequently received full Biologics License Application (BLA) approval, it is unclear if the depth of scrutiny regarding optimization-specific folding risks significantly increased beyond the data submitted for Emergency Use Authorization (EUA). This potential "EUA effect" might mean that initial gaps in assessment persisted into the full approval process.
Table IV.1: Timeline of Regulatory Guidance/Requirements on Codon Optimization & Protein Folding/Aggregation (Illustrative based on available data)
Furthermore, the characterization of mRNA technology as a "platform" – where the delivery system (e.g., Lipid Nanoparticles - LNPs) and mRNA backbone structure are largely conserved, with changes primarily in the coding sequence – presents both opportunities and potential regulatory pitfalls.20 While platform approaches can streamline development and review, they risk oversimplification if the unique risks associated with a specific encoded protein and its specific codon optimization strategy are overlooked. The folding challenges and aggregation propensity of a small, simple protein are vastly different from those of a large, complex, multi-domain protein like a viral spike protein. Moreover, the consequences of misfolding vary greatly depending on the protein's identity and function. Therefore, assuming that general platform validation adequately covers the product-specific risks introduced by codon optimization choices for diverse protein targets may be insufficient. A nuanced approach acknowledging both platform commonalities and product-specific risks related to codon optimization appears necessary.
V. Risk Assessment: Protein Misfolding & Prion-Related Hazards
Given the mechanistic link between synonymous codon changes and altered translation kinetics, the potential for codon optimization to provoke misfolding, aggregation, and prion‑like phenomena warrants a thorough, multi‑layered risk assessment. This section evaluates—across development, design, and clinical surveillance—whether current practices sufficiently detect or mitigate these hazards.
A. Measurement of Prion/Amyloid Risks in Development
Standard preclinical toxicology programs and CMC testing protocols for biologics do not typically include assays specifically designed to detect or quantify amyloid formation or prion-like seeding activity resulting from codon optimization. While techniques exist, their routine application in this context appears limited:
● Standard Aggregation Assays: Assays like Thioflavin T (ThT) fluorescence, which specifically binds to the cross-β sheet structure characteristic of amyloid fibrils, are widely used in research settings to monitor amyloidogenesis.29 Seeding assays, which measure the ability of a sample to accelerate the aggregation of a monomeric substrate protein, are crucial for assessing prion-like propagation potential.10 Biophysical methods like dynamic light scattering (DLS) and size-exclusion chromatography (SEC), sometimes coupled with multi-angle light scattering (SEC-MALS), can detect the presence of soluble aggregates and determine particle size distributions.41 However, based on regulatory guidance reviews and product characterization reports 15, these assays are not standard mandatory components of CMC packages specifically required to evaluate the conformational consequences of codon optimization for mRNA-derived proteins.
● In Vivo Aggregation Studies: Preclinical Good Laboratory Practice (GLP) toxicity studies are designed to detect overt toxicity but generally lack the specific endpoints or duration needed to identify subtle or slowly developing protein aggregation pathology in vivo. Specialized animal models, specific staining techniques (e.g., Congo Red for amyloid), or targeted biomarker analysis would be required to assess in vivo deposition, but these do not appear to be routinely employed for evaluating codon-optimized mRNA products.
B. Screening for Aggregation-Prone Motifs
Bioinformatic tools capable of predicting aggregation-prone regions (APRs) or amyloidogenic sequences within a protein are readily available and used in research.11 These tools could theoretically be integrated into the design phase of codon-optimized sequences. If high-risk motifs are identified, mitigation strategies could potentially be employed, such as introducing flanking "gatekeeper" residues (like proline or charged amino acids) known to disrupt β-sheet formation or aggregation 12, or perhaps even selecting synonymous codons that slow translation in these specific regions to allow more time for correct folding (a strategy potentially conflicting with simple optimization for speed). However, there is no indication from the reviewed materials that such predictive screening and targeted mitigation related to codon optimization choices are standard industry practice or an explicit regulatory expectation documented in filings.
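As a rough illustration of what sequence-based screening at the design stage involves, the sketch below slides a fixed window along a protein sequence and flags stretches with high mean Kyte-Doolittle hydropathy. This is a deliberately crude stand-in: it is not WALTZ or any other published APR predictor, and the window size, threshold, and peptide are arbitrary assumptions chosen for the example.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 6, threshold: float = 2.0):
    """Return (start, segment, mean hydropathy) for every window at or above threshold.

    window and threshold are illustrative assumptions, not validated cut-offs.
    """
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        score = sum(KD[residue] for residue in segment) / window
        if score >= threshold:
            hits.append((start, segment, round(score, 2)))
    return hits


# Hypothetical peptide with a hydrophobic core flanked by charged residues.
toy_peptide = "MKDEVILVIAGSKDE"
for start, segment, score in hydrophobic_windows(toy_peptide):
    print(start, segment, score)
```

In a design workflow of the kind described above, flagged positions would mark where gatekeeper residues or slower synonymous codons might be considered. Dedicated predictors weigh β-sheet propensity, charge, and sequence context, none of which this heuristic models.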
C. Clinical Surveillance for Neurodegenerative Endpoints
Assessing the risk of long-latency diseases like neurodegenerative amyloidopathies or prion diseases poses significant challenges in the clinical setting.
● Clinical Trial Protocols: Standard clinical trial protocols for vaccines and most gene therapies are typically designed to assess efficacy and acute to sub-acute adverse events over periods of months to a few years.43 Examination of protocols for major COVID-19 mRNA vaccine trials reveals a focus on infection prevention and common reactogenicity, with standard collection of all adverse events (AEs).43 However, they generally lack specific provisions for systematic, long-term monitoring of neurodegenerative endpoints using sensitive tools like detailed neurological exams, cognitive assessments, or specific fluid/imaging biomarkers. Capturing rare, long-latency events is beyond the scope of typical pre-licensure trials.
● Post-Marketing Surveillance: Pharmacovigilance systems like the Vaccine Adverse Event Reporting System (VAERS) in the US and EudraVigilance in the EU serve as important tools for detecting potential safety signals after product approval.45 These systems rely on spontaneous reporting and have inherent limitations, including underreporting, reporting bias, lack of reliable denominator data, and difficulty establishing causality, especially for events with long latency or high background rates in the population.45 While analyses of these databases for COVID-19 mRNA vaccines have identified signals for myocarditis/pericarditis 45 and explored various reported AEs including neurological symptoms like headache or dizziness 45, they have not, to date, established a confirmed causal link between mRNA vaccines and chronic neurodegenerative diseases like Alzheimer's, Parkinson's, or Creutzfeldt-Jakob disease (CJD).45 Reports of conditions like confusional states have been noted, particularly in older adults, but causality remains undetermined.45 Other identified risks, such as Guillain-Barré Syndrome (GBS) and Thrombosis with Thrombocytopenia Syndrome (TTS), were linked to the adenovirus-vector-based J&J vaccine, not the mRNA vaccines.46
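Spontaneous-report databases of this kind are commonly screened with disproportionality statistics. As a minimal sketch, and not a description of the method used in the analyses cited above, the proportional reporting ratio (PRR) compares the share of one product's reports that mention an event with the same share across all other products. The counts are invented for illustration.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """PRR from a 2x2 table of spontaneous reports.

    a: reports of the event of interest for the product of interest
    b: reports of all other events for the product of interest
    c: reports of the event of interest for all other products
    d: reports of all other events for all other products
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each product group needs at least one report")
    if c == 0:
        raise ValueError("PRR is undefined when the comparator has no reports of the event")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, for illustration only.
prr = proportional_reporting_ratio(a=20, b=980, c=100, d=9900)
print(round(prr, 2))
```

A PRR above 1 indicates only that an event is reported disproportionately often. Because the statistic is built from report counts rather than exposed populations, it inherits the underreporting, reporting bias, and missing-denominator problems described above and cannot by itself establish causality.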
D. Additional Evidence from Foundational Studies
Several studies and one preprint amplify concerns about protein misfolding risks associated with codon optimization:
Pattison et al. (2008): This study demonstrated that misfolded polyglutamine preamyloid oligomers in cardiomyocytes cause heart failure in transgenic mice, expanding the scope of misfolding risks beyond neurological diseases to systemic effects. This suggests that codon optimization, if promoting misfolding, could have severe cardiovascular implications, broadening the hazard profile.
Chevance & Hughes (2017): By proposing a "triplet of triplets" genetic code, this work showed that codon context—specifically the influence of upstream codons—affects translation efficiency. Optimization strategies ignoring this complexity could disrupt natural translation rhythms, potentially increasing misfolding by altering co-translational folding kinetics.
TissueCoCoPUTs (2020): This resource documented tissue-specific codon and codon-pair usage across 51 human tissues, revealing significant variation driven by differential gene expression. Generic optimization may mismatch tissue-specific tRNA pools, risking inefficient translation or misfolding in tissues like the heart or brain, where codon preferences diverge from genomic averages.
Xia (2021): A critique of Pfizer/BioNTech and Moderna mRNA vaccines highlighted suboptimal codon choices that could affect translation kinetics and protein folding. Specific examples, such as codon pair discrepancies, suggest that current optimization strategies may inadvertently heighten misfolding risks.
Paremskaia et al. (2024): This review underscored the dual nature of codon optimization in gene therapy—enhancing expression while risking unintended protein alterations due to altered kinetics. It emphasized the difficulty in predicting in vivo folding outcomes, reinforcing the need for rigorous assessment.
McKernan et al. (2021): This preprint compared vaccine-derived mRNA to SARS-CoV-2 mRNA and noted increased GC content in the optimized sequences, which may favor the formation of G-quadruplex structures. Such structures could alter mRNA folding and translation, possibly elevating misfolding or aggregation risks compared to native sequences.
E. Systemic and Tissue-Specific Risks
The systemic delivery of mRNA vaccines via lipid nanoparticles (LNPs) results in protein expression across diverse tissues, each with unique codon usage profiles. This amplifies the risk of misfolding in tissues where optimization mismatches local translational machinery, such as the heart (per Pattison et al.) or brain. The lack of tissue-tailored optimization strategies exacerbates this concern, potentially leading to unpredictable folding outcomes and broader health impacts.
A significant disconnect exists between the scientifically plausible, mechanistically supported theoretical risk of codon optimization influencing protein misfolding/aggregation (supported by foundational science and emerging data) and the apparent lack of routine, targeted assessment methodologies employed during product development (CMC) and clinical evaluation. While the inherent amyloidogenic potential of antigens like Spike protein is being investigated in research, the specific contribution or modulation of this risk by the codon optimization process itself is not systematically evaluated through mandated assays or comparative studies.
Furthermore, the nature of potential prion-like or amyloid-related diseases presents a major surveillance challenge. Their typically long latency periods, extending potentially for years or decades, far exceed the duration of standard clinical trials.43 Early symptoms are often insidious and non-specific, making timely diagnosis difficult and potentially confounding causal attribution in pharmacovigilance systems, especially in populations where such diseases have a significant background incidence.45 Consequently, relying solely on current clinical trial designs and passive post-marketing surveillance systems to detect such risks is likely insufficient. This limitation strongly argues for the importance of rigorous preclinical and CMC characterization to proactively minimize any potential risk before products reach widespread use.
VI. Case Study: COVID-19 mRNA Vaccines (BNT162b2 & mRNA-1273)
The global deployment of mRNA vaccines against SARS-CoV-2, specifically BNT162b2 (Pfizer/BioNTech) and mRNA-1273 (Moderna), provides a critical real-world case study for examining the application of codon optimization and the assessment of associated risks.
A. Codon Optimization Strategies Used
Both BNT162b2 and mRNA-1273 utilize mRNA encoding the SARS-CoV-2 Spike (S) protein, but the mRNA sequences themselves differ due to distinct optimization strategies.6 Key elements included:
● Sequence Redesign for Codon Usage: Both vaccines employed extensive codon optimization to enhance expression in human cells.6 This involved:
○ Increasing overall GC content, likely contributing to mRNA stability and potentially translation efficiency.6
○ Systematically replacing native codons with those more frequently used in highly expressed human genes. For example, codons for Arginine were shifted from the viral preference (AGR) towards the human preference (CGN), despite increasing CpG dinucleotide frequency. This was justified by the low expected levels of the CpG-sensing ZAP protein in muscle tissue and potential benefits for stability and attenuation.6 Similarly, Leucine codons were predominantly changed from UUR to CUN.6 Serine codons were also optimized, often favoring AGC.6
○ The primary goal of these changes was maximizing S protein expression levels following vaccination.6
● Nucleoside Modification: A crucial modification in both vaccines was the replacement of all uridine nucleosides with N1-methylpseudouridine (m1Ψ).48 This modification serves two main purposes: reducing the innate immunogenicity of the mRNA molecule itself (avoiding activation of innate immune sensors such as Toll-like receptors) and enhancing translation efficiency and mRNA stability.48
● Optimization of Non-Coding Regions: The 5' cap structure, 5' and 3' untranslated regions (UTRs), and the 3' poly(A) tail were also engineered and optimized to maximize translation initiation, overall stability, and protein expression.6
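The codon-level changes listed above can be illustrated with a small Python sketch that recodes a toy open reading frame in the directions the text describes (Arginine AGR to CGN, Leucine UUR to CUN, Serine toward AGC) and then checks three things: that the encoded peptide is unchanged, that GC content rises, and that CpG dinucleotides appear. The specific one-to-one swaps and the 18-nucleotide sequence are hypothetical; they are not taken from either vaccine.

```python
from itertools import product

# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical synonymous swaps in the directions described in the text.
SWAPS = {"AGA": "CGC", "AGG": "CGG", "UUA": "CUG", "UUG": "CUG", "UCU": "AGC"}


def codons(seq: str) -> list[str]:
    """Split an in-frame RNA sequence (length divisible by 3) into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(seq))


def recode(seq: str) -> str:
    """Apply the synonymous swaps; codons without a listed swap are kept."""
    return "".join(SWAPS.get(c, c) for c in codons(seq))


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def cpg_count(seq: str) -> int:
    return seq.count("CG")


native = "AUGAGAUUAAGGUUGUCU"  # toy ORF encoding Met-Arg-Leu-Arg-Leu-Ser
recoded = recode(native)

assert translate(native) == translate(recoded)  # amino acid sequence is preserved
print(recoded)
print(round(gc_fraction(native), 2), round(gc_fraction(recoded), 2))
print(cpg_count(native), cpg_count(recoded))
```

The check that translation is unchanged is the whole of what "synonymous" guarantees. The sketch says nothing about translation speed or folding, which is where the concerns raised in this paper lie.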
B. Safety Data Review (Focus on Neuro/Aggregation)
● Preclinical: Standard preclinical toxicology studies in animals generally supported the safety profile of both vaccines, allowing progression to human trials. Specific assessments for neurotoxicity or protein aggregation pathology beyond standard histopathology were not performed.
● Clinical Trials: Pivotal Phase 3 trials demonstrated high efficacy and acceptable short-term safety profiles.43 Common adverse events were primarily transient reactogenicity (injection site pain, fatigue, headache, fever).43 The most significant safety signal identified during clinical trials and post-marketing surveillance was an increased risk of myocarditis and pericarditis, particularly in adolescent and young adult males, especially after the second dose.35 While neurological events were collected as part of standard AE reporting, systematic assessment or specific monitoring for neurodegenerative signs or biomarkers was not a feature of the main trial protocols.43
● Post-Marketing Surveillance: Large-scale monitoring via systems like VAERS and EudraVigilance confirmed the myocarditis/pericarditis signal.45 Various neurological AEs have been reported (e.g., headache, dizziness, paresthesia, facial palsy), but establishing causality is challenging.45 Crucially, these systems have not detected confirmed safety signals linking mRNA vaccines to chronic neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, or CJD.45 Some analyses noted clusters including "confusional states" in older adults, but these require further investigation to determine causality.45 A re-analysis of FDA data questioned the benefit-risk balance for mRNA-1273 in young males specifically due to myocarditis risk.35
C. Amyloidogenicity Concerns: Tested or Ruled Out?
As discussed in Section III.B, in vitro studies provide evidence for the inherent amyloidogenic potential of certain SARS-CoV-2 Spike protein fragments.13 The critical question is whether this specific risk, and the potential influence of codon optimization on it, was assessed during the regulatory review of BNT162b2 and mRNA-1273.
Based on publicly available information and analyses 6, there is no evidence that the potential amyloidogenicity of the codon-optimized Spike protein expressed by the vaccines was specifically investigated using targeted assays (e.g., ThT, seeding assays, detailed biophysical characterization) as part of the CMC data package submitted for EUA or BLA approval. The regulatory focus remained on mRNA integrity, identity, purity, potency (immunogenicity), and overall clinical safety/efficacy. Therefore, the potential risk of Spike protein misfolding or aggregation, potentially influenced by the extensive codon optimization employed, appears to have been unaddressed or assumed negligible during the approval process. This represents a tangible example of the data gaps identified in Section V.
Table VII.1: Codon Optimization & Safety Signals for COVID-19 mRNA Vaccines
This case study highlights a potential disconnect in risk assessment. While the antigen itself (Spike protein) possesses certain inherent biophysical properties, including potential amyloidogenicity, the regulatory process appears not to have specifically interrogated whether the manufacturing process choice (i.e., the specific codon optimization strategy) modulated this inherent risk. The assessment seems to have implicitly treated the codon-optimized version as functionally equivalent to the intended antigen regarding folding and aggregation propensity, without requiring direct comparative evidence. This conflation of antigen risk with the potential additional risks introduced by the optimization process itself is a central concern.
VII. Broader Applications & Food-Chain Safety
The implications of codon optimization extend beyond human therapeutics to the rapidly developing field of veterinary biologics, including vaccines for livestock and poultry. This expansion introduces unique considerations regarding food chain safety and environmental exposure.
A. mRNA Vaccines for Livestock/Poultry
mRNA vaccine technology is being actively explored and developed for use in animals to combat infectious diseases affecting agriculture and potentially posing zoonotic threats.32 Codon optimization is a key enabling technique in this field as well, used to adapt gene sequences for efficient expression in relevant animal species like chickens or cattle.49 For example, a codon-optimized mRNA vaccine encoding avian influenza hemagglutinin demonstrated complete protection against H5N1 virus strains in chickens, highlighting the potential efficacy of this approach in poultry.49 This vaccine utilized N¹-Me-Pseudo UTP modification and LNP delivery, similar to human vaccines.49 Research on DNA vaccines (a related nucleic acid platform) in animals also emphasizes the importance of codon optimization for enhancing immunogenicity.50
Regulatory oversight for veterinary biologics in the US falls under the USDA's Animal and Plant Health Inspection Service (APHIS) Center for Veterinary Biologics (CVB), operating under the Virus-Serum-Toxin Act.51 In Europe, the EMA has a parallel structure for veterinary medicines. These agencies require products to be pure, safe, potent, and effective.51 Risk analysis procedures are employed, particularly for live vaccines, biotechnology-derived products, and imports, considering factors like potential environmental release and contamination.52 Recognizing the emergence of mRNA technology in the veterinary space, the EMA is developing specific quality guidelines for veterinary mRNA vaccines.32
B. Zoonotic and Environmental Risk of Misfolded Proteins
The use of codon-optimized mRNA vaccines in food-producing animals introduces a potential risk pathway not typically encountered with human therapeutics: the entry of vaccine-derived proteins into the human food chain or the environment. A theoretical risk scenario involves:
1. Administration of a codon-optimized mRNA vaccine to livestock or poultry.
2. Expression of the target antigen within the animal's cells. If codon optimization leads to increased misfolding or aggregation, these non-native protein forms could potentially accumulate in tissues.
3. Human exposure through consumption of meat, milk, eggs, or other products derived from vaccinated animals.
4. Environmental shedding of the protein through excreta, potentially leading to wider ecological exposure.
This scenario raises concerns analogous to, yet distinct from, classical zoonotic disease transmission, which typically involves infectious pathogens like viruses or bacteria spreading from animals to humans.53 Here, the concern is the potential transmission of misfolded proteins that might act as proteopathic seeds, potentially initiating or accelerating aggregation-related diseases in consumers (a prion-like transmission paradigm applied to vaccine products).10 The stability of such misfolded proteins during food processing (e.g., cooking) and their persistence in the environment are critical unknowns. Prions, for example, are notoriously resistant to degradation. Whether vaccine-derived aggregates would exhibit similar resilience requires investigation. Current regulatory frameworks for veterinary biologics focus primarily on ensuring vaccine purity (freedom from contaminating live agents), potency, and direct safety to the target animal, along with preventing the spread of the vaccine agent itself.21 The specific risk of misfolded or aggregated protein products entering the food supply as a result of codon optimization does not appear to be explicitly addressed in the standard risk assessments or quality control requirements described in the guidance reviewed for this paper.
C. QA/QC Measures for Veterinary Products
Quality assurance and quality control (QA/QC) for veterinary biologics ensure product consistency and adherence to specifications.51 However, as with human therapeutics, standard QC testing (e.g., potency assays, general purity checks) may not be sufficient to detect low levels of protein misfolding or aggregation if specific assays targeting these attributes (e.g., ThT, DLS, SEC) are not employed.21 Batch-to-batch consistency could be achieved even if a consistent low level of misfolded protein is produced, which might go undetected without targeted conformational analysis.
The application of codon-optimized mRNA vaccines in veterinary medicine may, therefore, represent an area of amplified risk compared to human use. The sheer scale of potential administration in livestock populations could be enormous. The pathways for human exposure are broader and less controlled, involving the food chain and environment rather than direct therapeutic administration. Furthermore, the potential for bioaccumulation in animals consumed over time could increase exposure levels. Coupled with potentially less stringent regulatory scrutiny specifically focused on the molecular consequences of codon optimization on protein folding compared to human drugs (especially for novel platforms), the veterinary application warrants careful consideration and potentially tailored risk assessment strategies to ensure food chain safety and environmental health.
VIII. Gap Analysis & Implications
The preceding review reveals critical gaps in our understanding, assessment, and regulation of codon‑optimization–mediated risks—particularly as they relate to protein misfolding, aggregation, and long‑term pathology. These gaps undermine confidence that current development and oversight practices adequately safeguard public and animal health.
A. Unaddressed Risks & Missing Data
1. Lack of Tissue-Specific Optimization: TissueCoCoPUTs (2020) revealed significant codon usage variation across tissues, yet optimization strategies often rely on generic, genome-wide metrics. This mismatch could lead to misfolding or reduced expression in tissues with distinct codon preferences, such as the heart or blood. In the case of COVID-19 mRNA vaccines, systemic distribution means the Spike protein is expressed in tissues with distinct codon preferences, potentially exacerbating misfolding risks if the optimization algorithm does not account for such diversity.
2. Insufficient Consideration of Codon Context: Chevance & Hughes (2017) demonstrated that codon context influences translation efficiency, a factor overlooked by standard optimization algorithms focused on individual codon frequency. This gap could disrupt folding kinetics, increasing aggregation risks.
3. Lack of Standardized Folding/Aggregation Assays in CMC: No regulatory authority currently mandates the inclusion of head‑to‑head biophysical assays (e.g., circular dichroism, differential scanning calorimetry, size‑exclusion chromatography–multi‑angle light scattering, Thioflavin T fluorescence, seeding assays) to compare codon‑optimized versus native‑sequence proteins as part of Chemistry, Manufacturing, and Controls (CMC) submissions. Without such data, subtle yet biologically consequential changes in folding kinetics and aggregation propensity may go undetected.
4. Limited Comparative Studies (Optimized vs. Wild-Type): High-quality studies that directly compare the structure, stability, folding kinetics, and aggregation behavior of proteins expressed from codon-optimized sequences with their native-sequence counterparts, under identical expression and purification conditions, are scarce in the published literature, especially for clinically relevant targets like the SARS-CoV-2 Spike protein. Xia (2021) and McKernan et al. (2021) suggest vaccine-specific optimization flaws, but direct evidence is lacking.
5. Insufficient Surveillance for Long‑Latency Pathologies: Standard clinical trials and passive pharmacovigilance (e.g., VAERS, EudraVigilance) are optimized to detect acute reactogenicity and common adverse events but are ill‑suited for prion‑like or amyloidogenic processes with multi‑year to decadal latency. No active registries track neurodegenerative endpoints (e.g., cognitive decline, aggregation biomarkers) in recipients of codon‑optimized mRNA therapeutics.
6. Unknown Impact of Diverse Optimization Algorithms: Different codon optimization algorithms employ varied strategies and prioritize different parameters (e.g., Codon Adaptation Index [CAI] maximization, GC content targeting, or motif avoidance). The specific consequences of these algorithmic choices on translation kinetics and folding outcomes in vivo are poorly understood and not systematically compared.
7. Insufficient Data on Protein-LNP Interactions: The potential interactions between the Lipid Nanoparticle (LNP) delivery system and the conformation of the encoded protein (correctly folded vs. misfolded/aggregated species) are largely unexplored. It is unknown if LNPs might mask, stabilize, or even potentially exacerbate aggregation issues.
8. Insufficient Data on Pediatric Populations: Given the novelty of the technology, the potential for lifelong exposure starting in infancy, and the unique vulnerabilities of this population, pediatric use could introduce unknown risks. At the time of writing, no comprehensive longitudinal studies have assessed the long-term health outcomes of children who receive codon-optimized mRNA vaccines or gene therapies.
9. Veterinary & Food‑Chain Considerations: Codon‑optimized mRNA vaccines for livestock and poultry pose additional uncertainties: misfolded proteins could traverse the food chain, interact with gut epithelium, or accumulate in animal tissues. Current QA/QC protocols for veterinary biologics lack targeted assays for low‑level aggregation detection, risking undetected exposure on a mass scale.
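Gap items 1 and 6 above turn on how optimization metrics such as CAI and GC content depend on the reference codon usage table. The following is a minimal Python sketch of that dependence, not a reproduction of any vendor's algorithm. It uses the textbook definition of CAI (geometric mean of each codon's frequency relative to the most frequent synonymous codon). The two usage tables, GENERIC_USAGE and TISSUE_USAGE, are hypothetical toy values invented for illustration and are not taken from TissueCoCoPUTs or any other database.

```python
import math
from collections import defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence (DNA or RNA letters) into codons."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_adaptiveness(usage):
    """w(codon) = frequency / highest frequency among its synonymous codons."""
    best = defaultdict(float)
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best[aa], freq)
    return {c: f / best[CODON_TABLE[c]] for c, f in usage.items()}


def cai(seq, usage):
    """Geometric mean of w over the codons covered by the usage table.

    Simplification: codons missing from the table are skipped, and all
    frequencies in the table are assumed to be positive.
    """
    w = relative_adaptiveness(usage)
    scored = [w[c] for c in codons(seq) if c in w]
    if not scored:
        raise ValueError("no codon of the sequence is covered by the usage table")
    return math.exp(sum(math.log(x) for x in scored) / len(scored))


# Hypothetical toy usage tables (Leu and Lys codons only), for illustration.
GENERIC_USAGE = {"CTG": 0.40, "TTA": 0.08, "AAG": 0.57, "AAA": 0.43}
TISSUE_USAGE = {"CTG": 0.20, "TTA": 0.25, "AAG": 0.45, "AAA": 0.55}

optimized = "CTGAAGCTGAAG"  # recoded to the generic table's preferred codons
alternative = "TTAAAATTAAAA"  # synonymous recoding of the same peptide

for label, seq in (("optimized", optimized), ("alternative", alternative)):
    print(label, translate(seq), round(gc_content(seq), 2),
          round(cai(seq, GENERIC_USAGE), 3), round(cai(seq, TISSUE_USAGE), 3))
```

Both sequences encode the same peptide, yet the sequence that scores a perfect CAI against the toy generic table scores lower against the toy tissue table. That is the kind of mismatch item 1 describes, shown here only at the level of the metric and not of any folding outcome.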
B. Regulatory "Whitewashing" vs. Risk Tolerance
The observed lack of explicit regulatory focus on the specific folding risks introduced by codon optimization raises questions about the underlying reasons. Is this a deliberate omission despite awareness (a scenario sometimes termed "regulatory whitewashing"), or does it reflect a calculated acceptance of risk based on the belief that existing controls are sufficient or the risk itself is negligible?
Arguments supporting a perspective of acceptable risk tolerance might include: a historical precedent from recombinant protein drugs (though optimization strategies may differ significantly for mRNA); a belief that standard potency and purity assays indirectly capture any functionally significant misfolding; the demonstrated overall benefit and apparent safety profile of widely used products like the COVID-19 mRNA vaccines 45; and the undeniable pressure for rapid approvals during public health emergencies like the pandemic, which inherently involves accepting greater uncertainty.
Conversely, arguments suggesting insufficient scrutiny or potential omission point to the long-standing scientific concerns articulated since at least 2014 3, the plausible mechanistic links between optimization and altered folding kinetics, emerging experimental data confirming these effects in vivo 30, and the known amyloidogenic potential of certain target proteins.27 From this perspective, the absence of specific guidance and mandated assays represents a failure of the regulatory framework to adapt proactively to the specific potential risks posed by this powerful gene engineering technique. The difficulty in confirming the reported 2016 FDA discussion point further clouds the issue of historical regulatory awareness.
Without access to internal regulatory decision-making processes, definitively concluding deliberate omission ("whitewashing") is speculative. However, the analysis strongly suggests that the current regulatory framework is insufficiently specific and rigorous in its approach to the potential conformational risks introduced by codon optimization, leaving significant gaps in the assurance of long-term safety regarding protein misfolding and aggregation.
C. Ethical/Legal Characterization
If the potential risks associated with codon optimization's impact on protein folding were known or reasonably foreseeable based on accumulating scientific evidence (e.g., post-2014), yet specific assessments were not systematically required or performed by manufacturers or regulators, several ethical and potentially legal characterizations could be considered:
● Negligence: This could imply a failure to exercise the standard of care expected of regulatory bodies and manufacturers in light of available scientific knowledge, by not implementing necessary assays or updating guidance to address a foreseeable risk.
● Willful Ignorance: This stronger claim would suggest a deliberate avoidance of obtaining knowledge or conducting tests despite credible signals of potential harm. Proving intent is difficult.
● Justified Decision: This posits that the decision not to mandate specific folding/aggregation assays was a conscious and justifiable choice based on a contemporary benefit-risk assessment, where the perceived benefits (e.g., rapid vaccine development during a pandemic) were deemed to outweigh the uncertain or low-probability risks associated with codon optimization effects on folding.
While the justification for decisions made under the extreme pressure of the pandemic might be argued, the persistence of scientific concerns and the lack of proactive adaptation of regulatory requirements since the initial critiques 3 suggest that the status quo may represent insufficient scrutiny rather than a fully informed and justified acceptance of risk based on comprehensive data regarding folding hazards. As scientific understanding evolves, the ethical bar for proactive risk assessment rises.
The widespread application of codon optimization across numerous vaccines and therapies introduces a potential concern regarding the cumulative burden of risk. Even if the probability of inducing significant misfolding or seeding aggregation is very low for any single product, the administration of billions of doses globally, potentially involving multiple different codon-optimized products per individual over a lifetime, could translate a small individual risk into a larger public health concern at the population level. The potential for additive or synergistic effects, including cross-seeding between different misfolded proteins originating from different therapeutic interventions, is entirely uncharacterized and unaddressed by current risk assessment paradigms. This highlights the need to evaluate the systemic implications of deploying this technology widely without fully understanding these subtle, potentially cumulative, long-term risks.
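The scaling argument above can be made concrete with simple arithmetic. The sketch below is illustrative only: the per-dose probability is a placeholder chosen to show the calculation, not an estimate from this paper or from any dataset, and the lifetime formula assumes independent exposures, which by construction ignores the additive or synergistic cross-seeding effects mentioned above.

```python
def expected_cases(doses, per_dose_probability):
    """Expected number of events if each dose carries the same small risk."""
    return doses * per_dose_probability


def lifetime_probability(per_product_probability, n_products):
    """Probability of at least one event across n independent exposures."""
    return 1.0 - (1.0 - per_product_probability) ** n_products


# Placeholder inputs for illustration only (not estimates).
HYPOTHETICAL_RISK = 1e-6          # assumed per-dose probability
HYPOTHETICAL_DOSES = 5_000_000_000  # "billions of doses"

print(expected_cases(HYPOTHETICAL_DOSES, HYPOTHETICAL_RISK))
print(lifetime_probability(HYPOTHETICAL_RISK, 10))
```

Under these placeholder inputs a one-in-a-million individual risk corresponds to thousands of expected cases at population scale, while the individual lifetime probability across ten products remains roughly ten times the single-product value. Any real assessment would need measured probabilities and a model of dependence between exposures.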
IX. Recommendations & Policy Options
Based on the analysis of scientific evidence, potential risks, and regulatory gaps concerning codon optimization in mRNA vaccines and gene therapies, several policy options and risk management enhancements should be considered.
A. Moratorium Scenarios
Given the identified uncertainties and plausible risks, particularly regarding protein misfolding and aggregation, various levels of moratorium could be contemplated:
1. Full Moratorium: Suspend all new clinical trial initiations and marketing authorizations for human and animal mRNA/gene therapies employing codon optimization.
○ Justification: Addresses the significant evidence gaps concerning folding/aggregation risks across all applications, the plausible mechanisms for harm (including long-term effects like neurodegeneration or prion-like seeding), the lack of validated assessment assays, and specific concerns about food chain safety from veterinary use. Prioritizes precaution until risks are thoroughly characterized and mitigated.
○ Drawbacks: Highly disruptive to ongoing research and development; potentially delays beneficial therapies; may be disproportionate if risks are confined to specific protein types or optimization strategies.
2. Targeted Hold/Moratorium: Allow existing approved products and ongoing clinical trials to continue (under enhanced monitoring) but place a temporary hold on the initiation of new clinical trials or marketing applications for constructs involving:
○ Aggressive codon optimization strategies (e.g., those solely maximizing CAI or drastically altering predicted translation kinetics without justification).
○ Encoding proteins with known high intrinsic aggregation propensity or links to amyloid diseases (e.g., certain viral proteins, potentially misfolding-prone human proteins).
○ Justification: Provides a balanced approach, acknowledging the potential benefits of existing therapies while preventing the introduction of potentially higher-risk new products pending further data. Focuses precaution on areas of greatest theoretical concern. Allows time for development of better assessment tools.
○ Drawbacks: Defining "aggressive optimization" or "high aggregation propensity" requires clear criteria; may still slow innovation.
3. Conditional Use with Immediate Enhanced Oversight: Reject a moratorium but immediately implement a significantly enhanced risk management framework (see below) for all new and ongoing codon-optimized mRNA/gene therapy projects. Continued development and approval would be contingent on adherence to these stricter requirements.
○ Justification: Avoids disruption while proactively addressing the identified gaps. Assumes that enhanced assessment and surveillance can adequately manage the risks. Relies on the feasibility and effectiveness of the proposed enhancements.
○ Drawbacks: Places significant immediate burden on developers and regulators; effectiveness depends on rapid development and validation of new assays and guidance.
Duration: Any moratorium or hold should be time-limited and linked to specific, achievable milestones, such as: (i) development, validation, and regulatory acceptance of standardized assays for assessing codon optimization's impact on protein folding/aggregation; (ii) completion of benchmark comparative studies (optimized vs. non-optimized) for key protein classes; (iii) issuance of updated, specific regulatory guidance by FDA/EMA; (iv) establishment of robust long-term surveillance mechanisms for relevant adverse events.
B. Risk‐Management Framework Enhancements
Regardless of whether a moratorium is imposed, the following enhancements to the risk management framework are strongly recommended:
● Mandatory Preclinical/CMC Assays:
○ Require comparative biophysical characterization for all codon-optimized protein products intended for in vivo use. This should include, at minimum, validated assays comparing the optimized protein to a non-optimized (e.g., wild-type or minimally modified) version expressed in the same system. Techniques should assess:
■ Secondary/Tertiary Structure: e.g., Circular Dichroism (CD), Fourier-Transform Infrared (FTIR) spectroscopy.
■ Thermal Stability: e.g., Differential Scanning Calorimetry (DSC).
■ Aggregation Propensity: e.g., Size-Exclusion Chromatography with Multi-Angle Light Scattering (SEC-MALS), Dynamic Light Scattering (DLS), Thioflavin T (ThT) fluorescence assays, potentially seeding assays if warranted by protein type or initial findings.
○ Establish clear acceptance criteria for conformational similarity and aggregation levels based on these comparative studies.
○ Consider requiring targeted in vivo preclinical studies in relevant animal models to detect aggregation, deposition, or related pathology for products involving high-risk proteins or intended for chronic administration. Standard GLP toxicology studies are not designed to detect these endpoints and are insufficient for this purpose.
● Enhanced CMC Requirements:
○ Regulatory guidance should explicitly define the codon optimization strategy as a critical process parameter requiring detailed description, justification, and risk assessment in CMC submissions.
○ Mandate documentation of bioinformatic screening for aggregation-prone regions (APRs) and amyloidogenic motifs during sequence design, along with any mitigation strategies employed.
○ Develop regulatory expectations regarding the impact of optimization on translation kinetics. Move beyond simple metrics like CAI and encourage consideration of potential pause sites and overall kinetic profile. Define acceptable boundaries for sequence divergence or kinetic alteration resulting from optimization.
● Enhanced Pharmacovigilance:
○ Implement active surveillance programs or registries specifically designed to monitor for neurological, autoimmune, and other potential aggregation-related adverse events in individuals receiving codon-optimized therapies, particularly those for chronic use or involving high-risk proteins. Standard passive reporting is inadequate for long-latency effects.
○ Support research into biomarkers for early detection of protein misfolding stress, aggregation, or subclinical neurodegeneration that could be used in long-term monitoring.
C. Regulatory Actions
● Guidance Updates: FDA and EMA (both human and veterinary divisions) must develop and issue updated, specific guidance documents that explicitly address the potential risks of codon optimization on protein folding, aggregation, stability, and immunogenicity. This guidance should clarify expectations for CMC characterization, preclinical assessment, and risk management. The concept of "codon optimization impact on protein folding" should be formally recognized as a critical quality attribute (CQA) requiring assessment.
● Define Thresholds and Best Practices: Foster scientific discussion and regulatory workshops to work towards defining scientifically justified thresholds for acceptable conformational changes or aggregation levels resulting from optimization. Promote development of best practices for designing optimization algorithms that balance expression enhancement with folding fidelity.
● International Harmonization: Encourage international collaboration (e.g., through ICH, WHO, World Organisation for Animal Health - WOAH) to harmonize regulatory requirements for assessing codon optimization risks. Consistency is crucial for global health security, pandemic preparedness, and international trade in veterinary biologics.
Table IX.1: Proposed Risk Management Framework Components
X. Conclusion: Moratorium Decision
The central question addressed by this white paper is whether a moratorium on the use of codon optimization in human and animal mRNA vaccines and gene therapies is warranted due to potential risks associated with protein misfolding and aggregation.
Synthesis of Evidence: The analysis confirms that codon optimization is a powerful and widely adopted technology essential for achieving therapeutic levels of protein expression from mRNA and gene therapy vectors.1 However, this process is not biologically neutral. A robust mechanistic foundation, rooted in co-translational folding principles and bolstered by experimental evidence, links synonymous codon alterations to changes in translation kinetics, which can influence protein folding pathways.3 This introduces a plausible risk of increased protein misfolding, aggregation, and potentially altered immunogenicity or function.3 Particular concerns arise for protein targets with inherent amyloidogenic potential, such as the SARS-CoV-2 Spike protein in major vaccines, where in vitro studies have demonstrated amyloid formation and cross-seeding activity.13
Recent studies amplify these concerns with compelling new evidence. The cardiomyocyte PAO model demonstrates that protein aggregation can cause delayed, irreversible organ damage, a risk magnified in developing tissues with high translational activity—most notably in pediatric populations. The TissueCoCoPUTs resource reveals significant tissue-specific variations in codon usage, indicating that generic optimization strategies may mismatch translational machinery, potentially exacerbating misfolding risks. Additionally, analyses by McKernan et al. highlight structural differences in optimized mRNA, such as elevated G-quadruplex formation, which may carry unforeseen biological consequences.
Significant gaps persist in regulatory oversight and industry practice. There are no mandated assays specifically designed to assess codon optimization’s impact on protein conformational integrity or aggregation propensity. Comparative studies between optimized and non-optimized proteins remain scarce, and current clinical trial designs and pharmacovigilance systems are ill-equipped to detect long-latency consequences, such as neurodegenerative or prion-like diseases. The rapid deployment of COVID-19 mRNA vaccines exemplifies this shortfall, with optimization-related folding risks largely unaddressed pre-authorization. Extending this technology to veterinary applications further complicates the risk profile, introducing food chain safety and environmental exposure concerns that remain under-explored.
Weighing Benefits vs. Risks/Uncertainties: The benefits of mRNA technology—its speed, adaptability, and efficacy, particularly during the COVID-19 pandemic—must be balanced against the potential for serious, long-term harm from protein aggregation diseases. While the probability of such outcomes may be uncertain, the severity of delayed toxicities, as evidenced by the PAO model and prion disease precedents, combined with substantial data gaps, elevates the risk profile beyond acceptable thresholds for vulnerable populations and untested applications.
Feasibility of Mitigation: The risk management framework proposed in Section IX, including advanced biophysical assays, enhanced CMC requirements, and longitudinal surveillance, offers a viable path to mitigate these risks. Implementation is feasible but demands coordinated efforts from industry and regulators to develop, validate, and enforce these standards—particularly the kinetic-aware design and tiered risk stratification emphasized in the new recommendations.
Final Decision and Justification:
Synthesis of this expanded evidence base necessitates a revised risk-benefit calculus, leading to a multi-tiered moratorium strategy:
Pediatric Moratorium
· The cardiomyocyte PAO model proves that protein aggregation can cause delayed, irreversible organ damage—a risk magnified in developing tissues with heightened translational activity. Given the lifelong latency of amyloid disorders and the absence of longitudinal safety data for codon-optimized therapies in children, all such products are contraindicated in patients under 18 until the following conditions are met:
· Tissue-specific codon tables (per TissueCoCoPUTs) guide pediatric mRNA design to align with developmental translational machinery.
· 24-month primate studies assess cardiac and neurological aggregation, modeling the timeline of delayed toxicity observed in Pattison’s study.
· G-quadruplex burden is maintained below 5% of native pathogen levels, as per McKernan’s structural criteria, to minimize structural risk.
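The paper does not specify how G-quadruplex burden would be quantified. One common first-pass approach is to count putative quadruplex-forming sequences with the widely used pattern of four runs of at least three guanines separated by loops of one to seven nucleotides. The Python sketch below assumes that definition. It counts non-overlapping matches on a single strand and is a rough sequence-level screen, not a structural prediction and not McKernan's method.

```python
import re

# Putative quadruplex motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (DNA or RNA letters).
G4_PATTERN = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)


def count_g4_motifs(seq):
    """Count non-overlapping putative G-quadruplex motifs in one strand."""
    return sum(1 for _ in G4_PATTERN.finditer(seq.upper()))


def g4_per_kb(seq):
    """Motif density per 1,000 nucleotides, to compare sequences of different length."""
    return 1000.0 * count_g4_motifs(seq) / len(seq)


# Toy sequences for illustration only (not vaccine or viral sequences).
toy_with_motif = "AUGGGGAGGGAGGGAGGGCUA"
toy_without_motif = "AUGGCUAGCAAAUUUGCA"

print(count_g4_motifs(toy_with_motif), g4_per_kb(toy_with_motif))
print(count_g4_motifs(toy_without_motif), g4_per_kb(toy_without_motif))
```

Comparing the density for an optimized sequence against its native counterpart gives a relative measure, but the ratio is undefined when the native sequence contains no motifs, so any threshold expressed as a percentage of native levels would need an explicit rule for that case.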
Adult Use Restrictions
Emergency-authorized COVID-19 vaccines may continue under enhanced surveillance, reflecting their established public health role. However, new clinical trials for the following high-risk constructs are paused:
· Targets with APR scores >0.4 (per the TANGO algorithm), indicating elevated aggregation propensity.
· Constructs using CAI optimization without tRNA abundance validation (per Chevance’s context model), lacking kinetic context.
· Therapies exceeding 55% GC content, the threshold for G-quadruplex emergence.
These trials may proceed only after implementation of:
· Kinetic Folding Assurance (KFA): Mandatory NMR or Rosetta comparisons of folding trajectories between codon-optimized and native proteins.
· Cryptic ORF Screening: Ribosome profiling (Ribo-seq) analysis to detect aberrant translation initiation.
· Cross-Seeding Assays: Evaluation of prion-like propagation between therapeutic and human amyloidogenic proteins.
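The three pause criteria listed above could be applied as a simple screening rule. The sketch below is one possible encoding, assuming each construct has already been annotated with an APR score on the scale the 0.4 cutoff refers to, a GC fraction, and flags for CAI optimization and tRNA abundance validation. The Construct class, its field names, and the example values are hypothetical and introduced only for illustration. The two thresholds are the ones stated in this section.

```python
from dataclasses import dataclass

APR_SCORE_LIMIT = 0.4     # threshold stated in this section
GC_FRACTION_LIMIT = 0.55  # 55% GC content, as stated in this section


@dataclass
class Construct:
    """Hypothetical record describing one candidate construct."""
    name: str
    apr_score: float      # aggregation propensity score (assumed precomputed)
    gc_fraction: float    # GC content as a fraction between 0 and 1
    cai_optimized: bool   # True if CAI maximization was used
    trna_validated: bool  # True if tRNA abundance validation was performed


def hold_reasons(construct):
    """Return the list of criteria under which new trials would be paused."""
    reasons = []
    if construct.apr_score > APR_SCORE_LIMIT:
        reasons.append("APR score above 0.4")
    if construct.cai_optimized and not construct.trna_validated:
        reasons.append("CAI optimization without tRNA abundance validation")
    if construct.gc_fraction > GC_FRACTION_LIMIT:
        reasons.append("GC content above 55%")
    return reasons


# Made-up example values, for illustration only.
example = Construct("hypothetical-construct-A", 0.52, 0.61, True, False)
for reason in hold_reasons(example):
    print(f"{example.name}: {reason}")
```

An empty list means none of the three criteria applies. The comparisons are strict, so a construct sitting exactly at a threshold is not flagged, and a real implementation would need the paper's criteria to state how boundary values are handled.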
Veterinary Prohibition
· Gene therapies using codon optimization are prohibited in livestock species entering the food chain. The Pattison study’s proof of transmissible cardiomyocyte toxicity, coupled with prion disease precedents, underscores the unacceptable risk of misfolded protein transmission through food supply channels.
Concluding Statement
Codon optimization is far from a benign yield-enhancement tool—it alters biological systems at multiple scales, from nucleotide structure to protein folding and tissue-specific expression. This framework balances public health imperatives with mechanistic evidence, emphasizing kinetic-aware design and tiered risk stratification to address risks comprehensively. Only through such an approach can mRNA therapeutics realize their potential without replicating the delayed toxicity trajectories observed in PAO cardiomyopathy or prior pharmacovigilance failures. Proactive adoption of these enhanced standards is critical to safeguard vulnerable populations, ensure adult therapy safety, and protect the food chain while advancing this transformative technology.
Works cited
1. Codon Optimization: Understanding the Basics | IDT, accessed on May 12, 2025, https://www.idtdna.com/pages/community/blog/post/codon-optimization-the-basics-explained
2. Comparative Analysis of Codon Optimization Tools: Advancing ..., accessed on May 12, 2025, https://www.jmb.or.kr/journal/view.html?doi=10.4014/jmb.2411.11066
3. A critical analysis of codon optimization in human therapeutics - ResearchGate, accessed on May 12, 2025, https://www.researchgate.net/publication/266263748_A_critical_analysis_of_codon_optimization_in_human_therapeutics
4. The protein domains of vertebrate species in which selection is more effective have greater intrinsic structural disorder - eLife, accessed on May 12, 2025, https://elifesciences.org/reviewed-preprints/87335v2/pdf
5. Model | TAU_Israel - iGEM 2022, accessed on May 12, 2025, https://2022.igem.wiki/tau-israel/model
6. Detailed Dissection and Critical Evaluation of the Pfizer/BioNTech and Moderna mRNA Vaccines - PMC, accessed on May 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC8310186/
7. Unraveling co-translational protein folding: concepts and methods ..., accessed on May 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC5866750/
8. Structures of protein folding intermediates on the ribosome - bioRxiv, accessed on May 12, 2025, https://www.biorxiv.org/content/10.1101/2025.04.07.647236v1.full
9. A critical analysis of codon optimization in human therapeutics, accessed on May 12, 2025, https://pubmed.ncbi.nlm.nih.gov/25263172/
10. Misfolded Protein Aggregates: Mechanisms, Structures and ..., accessed on May 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC3175247/
11. A comprehensive review of databases on amyloid-like aggregation - PMC - PubMed Central, accessed on May 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC11585477/
12. Modification of aggregation-prone regions of Arabidopsis glutamyl-tRNA reductase leads to increased stability while maintaining enzyme activity - Frontiers, accessed on May 12, 2025, https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2025.1556843/full
13. An amyloidogenic fragment of the SARS CoV-2 envelope protein promotes serum amyloid A misfolding and fibrillization | bioRxiv, accessed on May 12, 2025, https://www.biorxiv.org/content/10.1101/2024.04.25.591137v1.full-text
14. Protein language models learn evolutionary statistics of interacting sequence motifs - PNAS, accessed on May 12, 2025, https://www.pnas.org/doi/10.1073/pnas.2406285121
15. www.fda.gov, accessed on May 12, 2025, https://www.fda.gov/files/vaccines,%20blood%20&%20biologics/published/Advanced-Topics---Successful-Development-of-Quality-Cell-and-Gene-Therapy-Products.pdf
16. 21 CFR Part 210 -- Current Good Manufacturing Practice in Manufacturing, Processing, Packing, or Holding of Drugs; General - eCFR, accessed on May 12, 2025, https://www.ecfr.gov/current/title-21/chapter-I/subchapter-C/part-210
17. www.efpia.eu, accessed on May 12, 2025, https://www.efpia.eu/media/whxay2h4/attachment_efpia-ve-response-to-consultation-on-variation_sept-2023.pdf
18. Pre-authorisation guidance | European Medicines Agency (EMA), accessed on May 12, 2025, https://www.ema.europa.eu/en/human-regulatory-overview/marketing-authorisation/pre-authorisation-guidance
19. Guideline on the Chemistry, Manufacture and Control (CMC) of Prophylactic COVID-19 mRNA Vaccines (Current Version) - News, accessed on May 12, 2025, https://www.ccfdie.org/en/gzdt/webinfo/2024/12/1732613149983055.htm
20. Considerations for mRNA Product Development, Regulation and Deployment Across the Lifecycle - Preprints.org, accessed on May 12, 2025, https://www.preprints.org/manuscript/202502.1111/v1
21. Guideline on the quality aspects of mRNA vaccines | EMA, accessed on May 12, 2025, https://www.ema.europa.eu/en/documents/scientific-guideline/draft-guideline-quality-aspects-mrna-vaccines_en.pdf
22. Structural Determinants of the Rate of Protein ... - Oxford Academic, accessed on May 12, 2025, https://academic.oup.com/mbe/article-pdf/23/9/1751/3310018/msl040.pdf
23. Structural Determinants of the Rate of Protein Evolution in Yeast - Oxford Academic, accessed on May 12, 2025, https://academic.oup.com/mbe/article/23/9/1751/1014274
24. Structural Determinants of the Rate of Protein Evolution in Yeast - Oxford Academic, accessed on May 12, 2025, https://academic.oup.com/mbe/article-abstract/23/9/1751/1014274
25. Gene Expression and Protein Length Influence Codon Usage and Rates of Sequence Evolution in Populus tremula - Oxford Academic, accessed on May 12, 2025, https://academic.oup.com/mbe/article/24/3/836/1245157
26. Codon usage is an important determinant of gene expression levels largely through its effects on transcription | PNAS, accessed on May 12, 2025, https://www.pnas.org/doi/10.1073/pnas.1606724113
27. Full article: Viruses and amyloids - a vicious liaison - Taylor & Francis Online, accessed on May 12, 2025, https://www.tandfonline.com/doi/full/10.1080/19336896.2023.2194212
28. SARS-CoV-2 amyloid, is COVID-19-exacerbated dementia an amyloid disorder in the making? - Frontiers, accessed on May 12, 2025, https://www.frontiersin.org/journals/dementia/articles/10.3389/frdem.2023.1233340/full
29. Tau protein aggregation associated with SARS-CoV-2 main protease | PLOS One, accessed on May 12, 2025, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0288138
30. Effects of single synonymous substitutions on folding efficiency demonstrate the influence of rare codons and protein structure | bioRxiv, accessed on May 12, 2025, https://www.biorxiv.org/content/10.1101/2025.03.16.642865v1.full
31. Considerations for mRNA Product Development, Regulation and Deployment Across the Lifecycle - MDPI, accessed on May 12, 2025, https://www.mdpi.com/2076-393X/13/5/473
32. Guideline on quality aspects of mRNA vaccines for veterinary use - EMA - European Union, accessed on May 12, 2025, https://www.ema.europa.eu/en/guideline-quality-aspects-mrna-vaccines-veterinary-use
33. mRNA Folding Algorithms for Structure and Codon Optimization - arXiv, accessed on May 12, 2025, https://arxiv.org/html/2503.19273
34. Characterization of BNT162b2 mRNA to Evaluate Risk of Off-Target Antigen Translation, accessed on May 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC9836996/
35. (PDF) A Reanalysis of the FDA's Benefit-Risk Assessment of Moderna's mRNA-1273 COVID Vaccine: For 18-25-Year-Old Males, Risks Exceeded Benefits Relative to Hospitalizations - ResearchGate, accessed on May 12, 2025, https://www.researchgate.net/publication/384938620_A_Reanalysis_of_the_FDA's_Benefit-Risk_Assessment_of_Moderna's_mRNA-1273_COVID_Vaccine_For_18-25-Year-Old_Males_Risks_Exceeded_Benefits_Relative_to_Hospitalizations
36. mRNA-1273 and BNT162b2 mRNA vaccines have reduced neutralizing activity against the SARS-CoV-2 omicron variant - PubMed, accessed on May 12, 2025, https://pubmed.ncbi.nlm.nih.gov/35233550/
37. Immunogenicity of Biological Therapeutics Product Quality Attributes, EMA workshop, 2016, EMA, accessed on May 12, 2025, https://www.ema.europa.eu/en/documents/presentation/presentation-immunogenicity-biological-therapeutics-product-quality-attributes-susan-kirshner_en.pdf
38. SMOC can act as both an antagonist and an expander of BMP signaling | eLife, accessed on May 12, 2025, https://elifesciences.org/articles/17935
39. (PDF) Affordable mRNA Novel Proteins, Recombinant Protein Conversions, and Biosimilars—Advice to Developers and Regulatory Agencies - ResearchGate, accessed on May 12, 2025, https://www.researchgate.net/publication/387715804_Affordable_mRNA_Novel_Proteins_Recombinant_Protein_Conversions_and_Biosimilars-Advice_to_Developers_and_Regulatory_Agencies
40. ACTS Abstracts - PMC - PubMed Central, accessed on May 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC5350815/
41. Full article: Long-term stability and immunogenicity of lipid nanoparticle COVID-19 mRNA vaccine is affected by particle size - Taylor & Francis Online, accessed on May 12, 2025, https://www.tandfonline.com/doi/full/10.1080/21645515.2024.2342592
42. Establishing Preferred Product Characterization for the Evaluation of RNA Vaccine Antigens, accessed on May 12, 2025, https://www.mdpi.com/2076-393X/7/4/131
43. Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine - PubMed, accessed on May 12, 2025, https://pubmed.ncbi.nlm.nih.gov/33301246/
44. BNT162-17 Trial title - ClinicalTrials.gov, accessed on May 12, 2025, https://cdn.clinicaltrials.gov/large-docs/81/NCT05004181/Prot_000.pdf
45. Network analysis of adverse event patterns following immunization with mRNA COVID-19 vaccines: real-world data from the European pharmacovigilance database EudraVigilance - Frontiers, accessed on May 12, 2025, https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1501921/full
46. Coronavirus Disease 2019 (COVID-19) Vaccine Safety - CDC, accessed on May 12, 2025, https://www.cdc.gov/vaccine-safety/vaccines/covid-19.html
47. Technological breakthroughs and advancements in the application of mRNA vaccines: a comprehensive exploration and future prospects - Frontiers, accessed on May 12, 2025, https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2025.1524317/full
48. Mechanism of modified mRNA structure in COVID-19 vaccines for inducing neutralizing antibodies | Acta Biochimica Indonesiana, accessed on May 12, 2025, https://jurnal.pbbmi.org/index.php/actabioina/article/view/121
49. Avian influenza mRNA vaccine encoding hemagglutinin provides complete protection against divergent H5N1 viruses in specific-pathogen-free chickens, accessed on May 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC11776166/
50. Codon optimized influenza H1 HA sequence but not CTLA-4 targeting of HA antigen to enhance the efficacy of DNA vaccines in an animal model - Taylor & Francis Online, accessed on May 12, 2025, https://www.tandfonline.com/doi/full/10.1080/1547691X.2024.2400624
51. Veterinary Biologics | Animal and Plant Health Inspection Service - USDA, accessed on May 12, 2025, https://www.aphis.usda.gov/veterinary-biologics
52. Risk Analysis and Summary Information Formats for Veterinary Biologics - APHIS - USDA, accessed on May 12, 2025, https://www.aphis.usda.gov/veterinary-biologics/regulations-guidance/summary-info-format
53. The Import of Zoonotic Diseases - Harvard Law Review, accessed on May 12, 2025, https://harvardlawreview.org/print/vol-138/the-import-of-zoonotic-diseases/
54. Unveiling the Hidden Link of Biosecurity in Preventing Vaccine Failure in Livestock: An Updated Review | Bio Communications, accessed on May 12, 2025, https://biocomjournal.com/index.php/bcs/article/view/3
55. Effects of codon optimization on biotherapeutics: Implications for immunogenicity, YouTube video of FDA Talk, accessed on May 12, 2025
56. TissueCoCoPUTs: Novel Human Tissue-Specific Codon and CodonPair Usage Tables Based on Differential Tissue Gene Expression, Journal of Molecular Biology (2020), accessed on May 12, 2025, https://doi.org/10.1016/j.jmb.2020.01.011
57. Codon-optimization in gene therapy: promises, prospects and challenges - Frontiers, accessed May 13, 2025, https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2024.1371596/full
58. Detailed Dissection and Critical Evaluation of the Pfizer/BioNTech and Moderna mRNA Vaccines - PMC, accessed May 13, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC8310186/
59. Differences in Vaccine and SARS-CoV-2 Replication Derived mRNA: Implications for Cell Biology and Future Disease, OSF Preprint, accessed May 13, 2025, https://doi.org/10.31219/osf.io/bcsa6
60. Case for the genetic code as a triplet of triplets, PNAS, 2017, accessed May 13, 2025, https://doi.org/10.1073/pnas.1614896114
Holy Toledo, this is good! What a lot of work. So glad to see you writing again on my timeline.
Is there a link to this whitepaper? I would like to use it in potential future cases or regulatory reviews.
Also, I have been digging into the poly A tail modifications which appear to help make this mRNA very stable and hard to degrade.
Thanks.
A lot of good work integrating the info and potential for risk. (Seems like an AI format?)
A few other things that might support your views (I only rapid-scanned the paper and missed things you mentioned):
1. The translational pause after the initiating signal peptide would have been removed if all codons were generically optimised. That has RER targeting and folding implications.
2. Changes may have modified codon or RNA structure pause points that integrate with RER processing and folding kinetics/ interactions (intra- and inter-protein domains, or membrane-protein).
3. There are spike protein composition, secondary structures, and modifications that have aggregation and amyloid seeding potential (including some research demonstrating this).
4. Although not designed specifically for muscle cells, codon optimisation is reasonable for these cells. However, a small percentage of injections deliver a bolus of vaccine into veins and lymphatic vessels, leading to greater delivery to the heart, brain, etc. Some of these tissues (as you mention) have different codon preferences. Tissue-specific differences in codon usage would result in different folding kinetics and interactions, with different misfolding (and truncated protein) outcomes.
5. Some heart tissue cells operate near their maximum protein-processing capacity, and a bolus of mRNA, especially in diseased or inflamed tissue, may have a greater effect.
6. Older brains may be more affected, given existing aggregates or a propensity to form seeding aggregates.
7. There is a strong likelihood of latency between a theoretical seeding of aggregate formation by vaccination and diagnosable signs/symptoms (probably a minimum of 5 years). Hence, the absence of mass-population evidence of amyloidosis etc. after vaccination does not abrogate the theoretical possibility.
I did not provide details, as a few good iterative AI loops through the details, with cross-checks by other models (for errors and bias), will elucidate the principles and science.
I hope this helps?
Note: I am an out-of-date, ex DPhil (Biochem/ Biotech) Research Scientist who worked at one stage on tissue/ intracellular nucleic acid and protein delivery, as well as organelle/ cell hyper-expression of proteins. But, that was 25 years ago. I have regularly changed fields (nature of the beast).