Sequence Optimization Consideration

Heterologous protein expression

A heterologous protein expression project can prove to be a demanding process. Among the many factors that may affect successful expression, the coding sequence itself has been shown in many instances to be a central issue. You may benefit from redesigning an existing DNA sequence, or creating one de novo from the protein sequence, so that it is well suited to expression.

Poor or no expression, truncated proteins, and amino acid misincorporation are the most common consequences of codon bias and an unbalanced tRNA pool.

This issue has been extensively studied for E. coli expression and is well exemplified by the arginine codons AGG and AGA (read by the same tRNA), which are hardly ever found in highly expressed E. coli ORFs but occur with significantly higher frequency in many other organisms. It therefore makes sense to replace them with codons used more frequently in your host organism, e.g. CGT or CGC.
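The replacement described above can be sketched in a few lines of Python. This is a minimal illustration, not a full optimizer: the function name `replace_rare_arg_codons` and the default choice of CGT are assumptions made for the example, and the input is assumed to be an in-frame ORF written with the letters A, C, G and T.

```python
# Arginine codons that are rare in highly expressed E. coli genes (from the text).
RARE_ARG_CODONS = {"AGG", "AGA"}


def replace_rare_arg_codons(orf, preferred="CGT"):
    """Swap in-frame AGG/AGA codons for a synonymous arginine codon.

    `preferred` defaults to CGT; CGC would be an equally valid choice here.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    # Split into codons so that only in-frame triplets are touched.
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(preferred if c in RARE_ARG_CODONS else c for c in codons)


if __name__ == "__main__":
    print(replace_rare_arg_codons("ATGAGGAAAAGATAA"))
```

Working codon by codon matters: an AGG that only appears across a codon boundary (as in AAG GAA) does not encode arginine and is left unchanged.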

Frequency of the codons AGG + AGA (% of all Arg codons)

Expression host (highly expressed genes; Henaut and Danchin, Escherichia coli and Salmonella, Vol. 2, Ch. 114:2047-2066, 1996):
  • E. coli

Recombinant gene source (all genes frequencies):
  • A. thaliana
  • C. elegans
  • D. melanogaster
  • H. sapiens
  • S. cerevisiae
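The quantity named in the table heading, AGG + AGA as a percentage of all arginine codons, can be computed for any set of coding sequences. The sketch below is a hedged example: the function name `agg_aga_percent` is hypothetical, the inputs are assumed to be in-frame ORFs, and the result depends entirely on which genes you supply (highly expressed genes only, or all genes).

```python
from collections import Counter

# The six codons that encode arginine in the standard genetic code.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def agg_aga_percent(orfs):
    """Return AGG + AGA as a percentage of all Arg codons in the given ORFs."""
    counts = Counter()
    for orf in orfs:
        orf = orf.upper()
        # Step through the sequence one in-frame codon at a time.
        for i in range(0, len(orf) - 2, 3):
            codon = orf[i:i + 3]
            if codon in ARG_CODONS:
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no arginine codons at all
    return 100.0 * (counts["AGG"] + counts["AGA"]) / total


if __name__ == "__main__":
    print(agg_aga_percent(["ATGAGGCGTCGCAGATAA"]))
```

Comparing this figure for the source gene with the figure for the expression host gives a quick indication of how much arginine codon adjustment a sequence is likely to need.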

Gene design

Designing a gene that will express a recombinant protein in a particular organism boils down to choosing the most appropriate triplet for each amino acid. With 64 codons available for 20 amino acids plus termination, there is considerable flexibility to include other constraints in addition to codon bias adjustment, such as:

  • Gene & protein engineering: Addition or removal of specific motifs (e.g. restriction sites), tags for purification, multiple stops, etc.
  • Gene manufacturing: Maintain average GC content, avoid long repeats, palindromes, etc.
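Some of the constraints listed above lend themselves to simple automated checks on a candidate sequence. The following sketch tests GC content and looks for unwanted restriction sites on either strand. The helper names (`gc_content`, `find_motif`, `check_design`) and the 40-60% GC window are assumptions chosen for illustration, not recommended values; the recognition sequences for EcoRI (GAATTC) and BamHI (GGATCC) are the standard ones.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_motif(seq, motif):
    """0-based start positions where the motif occurs on either strand."""
    seq, motif = seq.upper(), motif.upper()
    targets = {motif, reverse_complement(motif)}
    n = len(motif)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] in targets]


def check_design(seq, forbidden_sites, gc_range=(0.40, 0.60)):
    """Report GC content and any forbidden sites found in a candidate gene.

    `gc_range` is an illustrative window, not a recommendation.
    """
    gc = gc_content(seq)
    hits = {name: find_motif(seq, site) for name, site in forbidden_sites.items()}
    return {
        "gc": gc,
        "gc_ok": gc_range[0] <= gc <= gc_range[1],
        "sites": {name: pos for name, pos in hits.items() if pos},
    }


if __name__ == "__main__":
    sites = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}
    print(check_design("ATGGAATTCCGTTAA", sites))
```

A site flagged by such a check can usually be removed by swapping one codon for a synonymous one, which is where the flexibility of the genetic code becomes useful.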