Cloning forever

Virginia Tech

No design needed!

Over the last six months, I spent considerable time and effort trying to engage biologists in discussions about the design of expression vectors. My goal has been to better understand the design rules that apply in different domains so that we can formalize them in GenoCAD. This effort has been an eye-opening experience. It has helped me understand how foreign the notion of design is to most biologists. Most people think that their project is fairly simple and does not call for any kind of design tool.

For example, a company had difficulty expressing an enzyme critical to the development of a new line of services. Their expression system did not yield the quantities of soluble protein they needed; they were off by more than an order of magnitude. The enzyme was a complex composed of two subunits. Yet the team in charge of the project thought that optimizing this vector was trivial. It was supposedly just a matter of adding solubility tags and possibly a chaperone to help with folding. Put this way, doesn't it sound simple indeed?

If the problem is so simple, then I wonder why the team in charge of this project has not yet delivered a system that meets the business requirements. If it were simple, why didn't they get the right plasmid the first time? Why didn't they include a solubility tag and a chaperone from the start? What makes them think that the next plasmid will work when the previous one did not? If the next plasmid does not work, how many more iterations will be needed? How long will it take, and how much money will it cost, to deliver what the company needs?

Next plasmid...

Interestingly, the biologists who do not think they need design tools are often craving better tools to support their laboratory work. Tools for primer design, planning of cloning experiments, and sample tracking rank very high on their wish list. I think this partly explains their lack of interest in the design aspect of their work.

The vast majority of potential users of design tools have been trained to clone DNA. They are completely focused on the process of deriving new DNA molecules from existing ones. When they say that their project is simple, they don't mean that it will be simple to deliver an expression system that works. What they mean is that it is simple to figure out the next plasmid to make. For instance, if it is easy to add a solubility tag to the coding sequence of a gene in an existing plasmid, they assume there is nothing more to think about. The next plasmid is chosen mostly on ease-of-cloning criteria, not on functional criteria.

Two stones don't make a house

There are several problems with this perspective:

  • Plasmids are derived from one another. They generally carry legacy sequences that can have undesirable effects.
  • Little thought is given to the biology of the expression vector.
  • The plasmids may not have been sequenced in a very long time, if ever.
  • Little planning goes into the design of the experiment. The outcome and timing of this haphazard process are fairly unpredictable.
  • The lack of structure makes it very difficult to build models from the collected data that could support rational improvement of the expression system.

Deriving plasmids from one another based on ease of cloning is akin to building a house from materials picked up at random near the construction site. Anyone would be lucky to get a crude shack this way. A decent home requires blueprints, procured supplies, and generally a fair amount of planning before the project is shovel-ready.

Festina Lente

Looking at this problem through the eyes of an engineer, I can see many design decisions that need to be made:

  • Should we link the two subunits?
  • Should we put them in one cistron under the control of a single promoter?
  • Should we put them in different cassettes or even plasmids?
  • What solubility tag should we use?
  • Should we tag one or two subunits?
  • Should we use a strong promoter? A weak promoter? An inducible promoter or a constitutive promoter?
  • Should we use a strong ribosome binding site (RBS) or a weak RBS?
  • Should we put the chaperone on the same plasmid or a different one?

And for every part used in the expression system, we would have to decide which sequence to use, given that dozens of variants of each of these sequences are available.

The challenge is that there are many more possible designs than a team can test. Making and testing them is slow and expensive. So the protein expression team needs to evaluate all the possibilities carefully, prioritize them, and break them down into batches by considering expression, manufacturability, and purification criteria. That's what design is about: looking at all the options and coming up with a plan to explore them systematically until a solution is found.
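To make the size of the problem concrete, here is a minimal Python sketch that enumerates a design space, filters it with a couple of design rules, and splits what remains into batches. Everything in it is a hypothetical illustration: the factor names and options loosely mirror the questions listed above, the tag names (MBP, GST, SUMO) are merely common examples rather than recommendations, the two consistency rules are assumptions rather than GenoCAD rules, and the batch size of 24 is arbitrary. It ignores the choice among sequence variants for each part, which would multiply the space further.

```python
from itertools import product
from math import prod

# Hypothetical design factors loosely mirroring the questions in the text.
# Option lists are illustrative placeholders, not recommendations.
DESIGN_FACTORS = {
    "subunit_linkage": ["separate", "fused"],
    "layout": ["one_promoter", "two_cassettes", "two_plasmids"],
    "solubility_tag": ["none", "MBP", "GST", "SUMO"],
    "tagged_subunits": ["not_applicable", "subunit_1", "subunit_2", "both"],
    "promoter_strength": ["strong", "weak"],
    "promoter_regulation": ["inducible", "constitutive"],
    "rbs_strength": ["strong", "weak"],
    "chaperone": ["none", "same_plasmid", "separate_plasmid"],
}


def enumerate_designs(factors):
    """Yield every combination of options (a full factorial) as a dict."""
    names = list(factors)
    for values in product(*(factors[name] for name in names)):
        yield dict(zip(names, values))


def is_consistent(design):
    """Apply two illustrative design rules (assumptions, not GenoCAD rules)."""
    # Assumed rule: a fused complex is a single coding sequence, so it
    # cannot be split across separate cassettes or plasmids.
    if design["subunit_linkage"] == "fused" and design["layout"] != "one_promoter":
        return False
    # Assumed rule: which subunit carries the tag only matters when a tag is used.
    has_tag = design["solubility_tag"] != "none"
    return has_tag == (design["tagged_subunits"] != "not_applicable")


def make_batches(designs, batch_size):
    """Split designs into consecutive batches of at most batch_size."""
    return [designs[i:i + batch_size] for i in range(0, len(designs), batch_size)]


if __name__ == "__main__":
    total = prod(len(options) for options in DESIGN_FACTORS.values())
    consistent = [d for d in enumerate_designs(DESIGN_FACTORS) if is_consistent(d)]
    batches = make_batches(consistent, batch_size=24)  # arbitrary batch size
    print(f"{total} raw combinations, {len(consistent)} consistent designs, "
          f"{len(batches)} batches")
```

With these coarse choices, the sketch yields 2304 raw combinations, of which 960 pass the two rules, or 40 batches of 24 constructs. The sketch keeps plain enumeration order; the prioritization step described above would sort the consistent designs by expression, manufacturability, or purification criteria before batching.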

Using design software would help (or force) the team to step back and think about the design of the expression vector. It would help them plan their experiment. And if the experiments are properly planned, it will be possible down the road to use mathematical models to optimize the expression of the gene.

Festina lente is a classical adage and oxymoron meaning "make haste slowly", sometimes rendered in English as "more haste, less speed". Design of DNA sequences is a good illustration of this expression. Design may feel like a waste of time and a distraction from the next experiment. However, activities like planning and cloning should be performed with a proper balance of urgency and diligence. If tasks such as deriving a new vector from an existing one are overly rushed, mistakes are made and good long-term results are not achieved. "Wasting" a few days looking at different options is cheap insurance against costly mistakes that delay the successful outcome of a research project.


This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.