Enlighten Biology Note on Using DNA Sequencing.


The development of DNA profiling and DNA sequencing has led to the development of new areas of bioscience that help us analyze, understand, and make use of all the data generated.

Computational biology and bioinformatics

People often use the terms computational biology and bioinformatics interchangeably.

However, the two terms describe different aspects of the application of computer technology to biology.

Bioinformatics is the development of the software and computing tools needed to organize and analyze raw biological data, including the development of algorithms, mathematical models, and statistical tests that help us to make sense of the enormous quantities of data being generated.

Computational biology then uses this data to build theoretical models of biological systems, which can be used to predict what will happen in different circumstances.

Computational biology is the study of biology using computational techniques, especially in the analysis of huge amounts of biodata.

For example, it is important in the analysis of the data from sequencing the billions of base pairs in DNA, for working out the 3D structures of proteins, and for understanding molecular pathways such as gene regulation.

It helps us to use the information from DNA sequencing, for example, in identifying genes linked to specific diseases in populations and in determining the evolutionary relationships between organisms.

Genome-wide comparisons

As whole genome sequencing has become increasingly automated, it has become cheaper and faster, leading to some amazing advances in biology.

The field of genetics that applies DNA sequencing methods and computational biology to analyze the structure and function of genomes is called genomics.

Analyzing the human genome

Since the first complete draft of the human genome was published in 2003, tens of thousands of human genomes have been sequenced as part of research projects such as the 10 000 Genomes Project (UK10K) and, most recently, the 100 000 Genomes Project.

Computers can analyze and compare the genomes of many individuals, revealing patterns in the DNA we inherit and the diseases to which we are vulnerable.

This has enormous implications for health management and the field of medicine in the future.

Genomics is changing the face of epidemiology.

However, scientists increasingly recognize that, except in a few relatively rare genetic diseases caused by changes in a single gene, our genes work together with the environment to affect our physical characteristics, our physiology, and our likelihood of developing certain diseases.

Analyzing the genomes of pathogens

Sequencing the genomes of pathogens including bacteria, viruses, fungi, and Protoctista has become fast and relatively cheap.

This enables:

  • Doctors to find out the source of an infection, for example, bird flu or MRSA in hospitals.
  • Doctors to identify antibiotic-resistant strains of bacteria, ensuring antibiotics are only used when they will be effective and helping to prevent the spread of antibiotic resistance.

For example, the bacteria that cause tuberculosis (TB) are difficult to culture and slow growing, and some strains are resistant to most antibiotics.

Whole genome analysis makes it easier to track transmission and to plan suitable treatment options.

This has enormous implications for the successful treatment of this potentially fatal disease, especially as TB is spreading fast around the world again, linked to the spread of HIV/AIDS.

  • Scientists to track the progress of an outbreak of a potentially serious disease and monitor potential epidemics, for example, flu each winter, and Ebola virus in 2014/15.
  • Scientists to identify regions in the genome of pathogens that may be useful targets in the development of new drugs and to identify genetic markers for use in vaccines.
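
One simple way to picture how genome comparison helps to trace an outbreak is to count the positions at which two aligned pathogen genomes differ. Isolates with very few differences are more likely to belong to the same chain of transmission. The sketch below is only an illustration: the isolate names and the 12-base sequences are hypothetical, and real analyses work on whole genomes with dedicated software.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_distances(isolates: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the number of differences for every pair of named isolates."""
    return {
        (name_a, name_b): snp_distance(isolates[name_a], isolates[name_b])
        for name_a, name_b in combinations(sorted(isolates), 2)
    }


# Hypothetical, toy-sized aligned fragments taken from three patients.
isolates = {
    "patient_1": "ATGCCGTAACGT",
    "patient_2": "ATGCCGTAACGA",
    "patient_3": "ATGTCGAAACTT",
}

distances = pairwise_distances(isolates)

# The pair with the fewest differences is the most likely transmission link.
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```

Here patient_1 and patient_2 differ at a single position, so their infections are the most closely related of the three.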

Identifying species (DNA barcoding)

Using traditional methods of observation, it can be very difficult to determine which species an organism belongs to or if a new species has been discovered.

Genome analysis provides scientists with another tool to aid in species identification, by comparison to a standard sequence for the species.

The challenge for scientists is to produce stock sequences for all the different species.

One useful technique is to identify particular sections of the genome that are common to all species but vary between them, so that comparisons can be made. This technique is referred to as DNA barcoding.

In the International Barcode of Life (iBOL) project, scientists identify species using relatively short sections of DNA from a conserved region of the genome.

For animals, the region chosen is a 648 base-pair section of the mitochondrial DNA in the gene cytochrome c oxidase, which codes for an enzyme involved in cellular respiration.

This section is small enough to be sequenced quickly and cheaply, yet varies enough to give clear differences between species.
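
The idea of matching an unknown sample against stored barcodes can be sketched in a few lines. This is a minimal illustration, not the method used by iBOL: the species names, the 20-base sequences and the identity threshold are all hypothetical, and a real barcode would be the full 648 base-pair region compared after proper alignment.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percentage of matching positions between two aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for q, r in zip(query, reference) if q == r)
    return 100.0 * matches / len(reference)


def identify_species(query, reference_barcodes, threshold=90.0):
    """Return (species, identity) for the best-matching reference barcode.

    If even the best match is below the (assumed) threshold, the species
    is reported as None, which may point to an unrecorded species.
    """
    best_species, best_identity = None, 0.0
    for species, barcode in reference_barcodes.items():
        identity = percent_identity(query, barcode)
        if identity > best_identity:
            best_species, best_identity = species, identity
    if best_identity < threshold:
        return None, best_identity
    return best_species, best_identity


# Hypothetical reference library of barcode fragments.
reference_barcodes = {
    "species_a": "ACGTTGCAAGCTTAGGCTAA",
    "species_b": "ACGATGCTAGCTAAGGCTTA",
}

match = identify_species("ACGTTGCAAGCTTAGGCTAT", reference_barcodes)
no_match = identify_species("TTTTTTTTTTTTTTTTTTTT", reference_barcodes)
print(match)
print(no_match)
```

The first sample differs from the species_a barcode at only one position, so it is assigned to that species. The second sample matches neither reference well enough and is left unidentified.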

In land plants, that region of the DNA does not evolve quickly enough to show clear differences between species, but two regions in the DNA of the chloroplasts have been identified that can be used in a similar way to identify species.

The barcoding system is not perfect. So far, scientists have not come up with suitable regions for fungi and bacteria, and they may not be able to do so, but DNA sequencing is nevertheless having a big impact on classification.

Searching for evolutionary relationships

Genome sequencing has given scientists a powerful tool to help them understand the evolutionary relationships between organisms.

DNA sequences of different organisms can be compared. Because the basic mutation rate of DNA can be calculated, scientists can estimate how long ago two species diverged from a common ancestor.
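
The reasoning behind dating a divergence can be shown with a simple calculation. If d is the proportion of sites that differ between two aligned sequences and r is the mutation rate per site per year, the time since the common ancestor is roughly t = d / (2r). The factor of 2 is there because both lineages have been accumulating changes since they split. The sketch below assumes a constant rate and ignores repeated changes at the same site; the sequences and the rate are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def divergence_time_years(seq_a: str, seq_b: str,
                          rate_per_site_per_year: float) -> float:
    """Estimate the time since two aligned sequences shared an ancestor.

    Uses t = d / (2 * r), where d is the proportion of differing sites
    and r is the (assumed constant) mutation rate per site per year.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    d = differences / len(seq_a)
    return d / (2.0 * rate_per_site_per_year)


# Hypothetical data: 2 differences in 100 aligned sites (d = 0.02)
# and an assumed rate of 1e-8 changes per site per year.
species_1 = "A" * 100
species_2 = "A" * 98 + "GG"

estimate = divergence_time_years(species_1, species_2, 1e-8)
print(f"Estimated divergence: about {estimate:,.0f} years ago")
```

With these made-up numbers the estimate comes out at about one million years.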

DNA sequencing enables scientists to build up evolutionary trees with an accuracy they have never had before.

Genomics and proteomics

Proteomics is the study and amino acid sequencing of an organism’s entire protein complement.

Traditionally, scientists thought that each gene codes for a particular protein, but we now know that there are 20 000 to 25 000 coding genes in human DNA but a very different number of unique proteins.

Estimates range from somewhere between 250 000 and 1 000 000 different proteins down to only 17 000 to 18 000, so there is still a lot of work to be done.

More scientific evidence is emerging that highlights the complexity of the relationship between the genotype and the phenotype of an individual.

The DNA sequence of the genome should, in theory, enable you to predict the sequence of the amino acids in all of the proteins it produces.

The evidence is that the sequence of the amino acids is not always what would be predicted from the genome sequence alone.
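
The 'in theory' prediction amounts to reading the coding DNA three bases at a time and looking up each codon in the standard genetic code. The sketch below shows only that naive step, using a made-up 12-base sequence. It deliberately ignores everything that can happen between the gene and the finished protein, which is why its output is not always what the cell actually produces.

```python
from itertools import product

# Standard genetic code, written out in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Predict the amino acid sequence (one-letter codes) from coding DNA.

    Reading starts at the first base and stops at the first stop codon.
    """
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence: ATG (Met), GCC (Ala), AAA (Lys), TGA (stop).
predicted = translate("ATGGCCAAATGA")
print(predicted)
```

For this toy sequence the predicted protein is methionine, alanine, lysine, written MAK in one-letter code.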

Some genes can code for many different proteins.


The mRNA transcribed from the DNA in the nucleus includes both the exons and introns.

Before it lines up on the ribosomes to be translated, this ‘pre-mRNA’ is modified in several ways.

The introns are removed, and in some cases, some of the exons are removed as well.

Then the exons to be translated are joined together by enzyme complexes known as spliceosomes to give the mature functional mRNA.

The spliceosomes may join the same exons in a variety of ways.

As a result, a single gene may produce several versions of functional mRNA, which in turn would code for different arrangements of amino acids, giving different proteins and resulting in several different phenotypes.
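
To see how quickly the number of possible mRNAs grows, the sketch below lists every version that could be made from one gene under a simplified model. The model is an assumption made for illustration only: the first and last exons are always kept, any of the internal exons may be left out, and the exons that remain are joined in their original order. The exon sequences are hypothetical.

```python
from itertools import combinations


def splice_isoforms(exons: list[str]) -> list[str]:
    """List the mature mRNAs possible when internal exons may be skipped.

    Simplified model: the first and last exons are always kept, and the
    kept exons are joined in their original order. Needs at least 2 exons.
    """
    first, *internal, last = exons
    isoforms = []
    for number_kept in range(len(internal) + 1):
        for kept in combinations(internal, number_kept):
            isoforms.append(first + "".join(kept) + last)
    return isoforms


# Hypothetical gene with four exons (introns already removed).
exons = ["AUGGCU", "GAA", "CCG", "UAA"]

isoforms = splice_isoforms(exons)
for mrna in isoforms:
    print(mrna)
```

With two internal exons there are four possible mRNAs from this one gene. Each extra internal exon doubles the number in this model, which helps to explain how a limited number of genes can give rise to a much larger number of proteins.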

Protein modification

Some proteins are modified by other proteins after they are synthesized.

A protein that is coded for by a gene may remain intact or it may be shortened or lengthened to give a variety of other proteins.

The study of proteomics is constantly giving us increasing knowledge of the extremely complex relationship between the genotype and the phenotype.

Synthetic biology

The ability to sequence the genome of organisms and understand how each sequence is translated into amino acids, along with the ever-increasing ability of computers to store, manipulate, and analyze the data, has led to the development of a new field of science called synthetic biology.

Synthetic biology is defined by the Royal Society as ‘an emerging area of research that can broadly be described as the design and construction of novel artificial biological pathways, organisms or devices, or the redesign of existing natural biological systems’.

Synthetic biology includes many different techniques.

These include:

  • genetic engineering – this may involve a single change in a biological pathway or relatively major genetic modification of an entire organism (further detail is given in the next topic)
  • use of biological systems or parts of biological systems in industrial contexts, for example, the use of fixed or immobilized enzymes and the production of drugs from microorganisms
  • the synthesis of new genes to replace faulty genes, for example, in developing treatments for cystic fibrosis (CF), scientists have attempted to synthesize functional genes in the laboratory and use them to replace the faulty genes in the cells of people affected by CF
  • the synthesis of an entirely new organism. In 2010, scientists announced that they had created an artificial genome for a bacterium and successfully replaced the original genome with this new, functioning genome.

Synthetic life

  • Scientists have developed some new nucleotide bases (not adenine, thymine, cytosine, or guanine) which, in a test tube, can be incorporated into a strand of DNA by special enzymes. The bases fit together well – they are not held by hydrogen bonds like the natural bases.
  • In 2014, scientists introduced a small section of DNA made with these synthetic bases into bacteria. They found that this unique DNA, including the synthetic nucleotide bases, was replicated time after time as long as they supplied the bacteria with the synthetic bases.
  • If these bases can be incorporated into the main DNA of an organism, and then transcribed into RNA, synthetic biologists will have synthetically expanded the genetic code for the very first time.

