Decoding the Genetic Code
Large-Scale Language Models for Codon Optimization and Enhanced Protein Synthesis
Master's thesis, Bachelor's thesis
This project aims to use deep learning, specifically a large language model (LLM), to identify patterns in the gene sequences of homologous proteins that are associated with high expressibility. Using these patterns, we want to predict how well heterologous proteins can be produced from their DNA sequences and to validate these predictions experimentally. In addition, the project lays the groundwork for a new codon optimization framework that our interdisciplinary team is developing collaboratively; its specific design and implementation remain flexible at this stage.
