An algorithm for precision medicine (plus a quick intro to biology)


Precision medicine promises to deliver “the right drug to the right patient at the right time.”

Delivering on that promise swiftly and efficiently remains a significant challenge, yet the individual components necessary to conduct precision medicine do exist today.

Motivated patients, advocates and clinicians can stitch these components together.

This article exists as a living document to provide a guide for precision medicine for motivated patients (or their advocates) and the clinicians and scientists willing to work with them on their journey.

The high-level goal of this document is to answer the question, “What do I have to do to find or create the right drug?”

The overarching emphasis is on taking the steps necessary:

  1. to be able to test whether a compound may be a treatment;
  2. to find candidate compounds which may be a treatment; and
  3. to screen these candidate compounds using that test.

This guide is written for a patient with a (suspected or confirmed) monogenic condition, although some of the principles should be adaptable to other conditions with a genetic basis.

The article is split into two parts:

  1. The first part of the article is a guide to help motivated patients working with clinicians and scientists identify rational next steps toward treatment.

  2. The second part of the article is a Q&A-based glossary that doubles as a tutorial for patients teaching themselves the fundamentals of precision medicine.

I strongly encourage critical feedback and suggestions.

The guide contains the following high-level steps:

  1. Sequencing and diagnosis.
  2. Finding other patients and building community.
  3. Preparing for scientific investigation.
  4. Identifying strategies for therapeutics.
  5. Exploring specific therapies.
  6. Discovering drug candidates.
  7. Medicinal chemistry.
  8. Clinical trials.

Acknowledgements: Kim Splinter contributed substantial edits, revisions and corrections to an earlier version of this document, and is a significant author of the present version. Dr. Karen Ho has also been an invaluable scientific mentor, having introduced me to many of the techniques below and answered countless questions.


Before you read any further, please realize that this document has been constructed by the father of a patient with an ultra-rare genetic disorder – not a licensed medical professional.

It is a distillation of three years of experience after my son was diagnosed as the first patient with his particular genetic disorder.

As a result, it is critical to validate any information in this article with a trained medical professional and a scientific team with domain expertise.

If the process described herein leads you to make predictions about potential therapies, do not attempt them. Consult your health care provider.

For patients and their advocates, my hope is that this guide will serve as a launching point for questions and discussions with professionals, and that it will save individuals on similar journeys significant time.

I spoke at Harvard Medical School about the thought process that ultimately led to this article, and the talk is available online:

(There is significantly more detail in this article than in the talk.)

A word on audience

This document is split into a guide followed by an introductory Q&A.

The first part – the guide – is written for an audience with a basic grasp of modern biology.

To make the guide more accessible, I have sometimes used terms more intuitive to a lay audience (such as mutation) at first instead of the more precise term often used by professionals (such as variant or allele).

Most patients and advocates (myself included) don’t come equipped with a grasp of modern biology, so the second part of this document is a Q&A, and the guide has links into the Q&A to explain technical terms as necessary.

Moreover, you don’t have to read the guide in its entirety.

You can trace out the steps relevant to you.

It may be beneficial for newcomers to scan the entire Q&A first, and then circle back to the guide.

If you’re already trained in a technical field, then I recommend Quickstart Molecular Biology to get up to speed rapidly:

If you have a technical background, you can digest this book in about a day.

Step 1: Sequencing and diagnosis

How does precision medicine diagnose patients?

If you already have a diagnosis, you can skip to Step 2.

Exome and genome sequencing are powerful techniques for diagnosing conditions with a suspected genetic cause.

For these conditions, sequencing has the advantage of being able to look at many genes simultaneously.

For many patients on diagnostic odysseys, it is the first step toward taking therapeutic action.

Sequencing often finds genetic changes, or mutations, of interest. (Geneticists often use the closely related word variant to describe a specific genetic change or mutation.)

After sequencing, it is important to interpret the meaning of the mutations discovered.

For patients who have been on intractable diagnostic odysseys, the NIH-funded Undiagnosed Diseases Network specializes in using advanced precision medicine to deliver diagnoses.

Interpreting the meaning of mutations

Since every human being carries many mutations, merely finding a mutation is not grounds for concluding that it is the cause of a condition.

In fact, most mutations are benign on their own – or not disease-causing if only one allele is affected – so it is important to consider combinations of mutations as well.

Each mutation (or the genotype collectively) must be examined to determine whether or not it could plausibly contribute to the phenotype of the patient.

To be clear, while there are some principles that can aid in interpreting mutations, it is not a standardized process (nor is it ever likely to become one, unless we find a way to standardize science itself).

Achieving consistent interpretations of a mutation or mutations is a significant challenge at present.

The ACMG variant analysis guidelines provide a baseline set of techniques and resources to use during the interpretation of variants.

Several factors may aid the process of interpretation:

  1. Finding additional patients.
  2. Assessing segregation and inheritance patterns.
  3. Analyzing frequency in the population.
  4. Analyzing conservation.
  5. Exploring the type of mutation and functional predictions.
  6. Conducting functional studies in a lab.

Aiding interpretation: Finding additional patients

Given the difficulty of interpreting mutations, finding another patient with a matching or similar genotype is the ideal means of confirming pathogenicity.

Searching databases such as PubMed, ClinVar, DECIPHER, OMIM, HGMD and dbGaP may turn up another case for a gene of interest.

Less conventional resources, such as Google and Wikipedia, should also be searched in case clinicians, researchers or even patients themselves have posted relevant information online about a gene or variant.

When searching less structured resources, it is important to consider all names for a gene (in both humans and other organisms).

For example, the equivalent of the gene NGLY1 is called PNGase in other organisms (and more recently is sometimes erroneously called CDGIV or CDG1V).

Aiding interpretation: Assessing segregation and inheritance patterns

If more than one relative is impacted, combining genotypic and phenotypic data from multiple relatives – both healthy and affected – can aid in determining the pathogenicity of a mutation and the pattern of inheritance of a condition.

If a condition is dominant, then relatives with a causative pathogenic allele should present with the condition.

If a condition is autosomal recessive, then only relatives with two causative pathogenic alleles should present with the condition.
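The two rules above can be turned into a rough consistency check. The following is a minimal sketch, not a clinical tool: it assumes an autosomal gene and full penetrance, and the Relative record and the example family are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Relative:
    name: str
    pathogenic_alleles: int  # 0, 1 or 2 copies of the suspected allele(s)
    affected: bool           # does this relative present with the condition?


def consistent_with(relatives, mode):
    """Return True if every relative fits the given inheritance mode.

    Assumes full penetrance and an autosomal gene. Real families often
    violate both assumptions, so a mismatch is a prompt for scrutiny,
    not proof that the allele is benign.
    """
    # Dominant: one allele suffices. Recessive: two alleles are required.
    threshold = {"dominant": 1, "recessive": 2}[mode]
    return all((r.pathogenic_alleles >= threshold) == r.affected for r in relatives)


# Hypothetical family: two carrier parents, one affected child, one carrier sibling.
family = [
    Relative("mother", 1, False),
    Relative("father", 1, False),
    Relative("child", 2, True),
    Relative("sibling", 1, False),
]

print(consistent_with(family, "recessive"))  # True
print(consistent_with(family, "dominant"))   # False
```

In this made-up family, healthy carriers rule out a fully penetrant dominant model, while the recessive model fits every relative.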

Reasoning backward, potential patterns of inheritance also provide grounds for additional scrutiny. For example:

Aiding interpretation: Population frequency

The frequency of an allele in a genomic population database such as ExAC also hints at pathogenicity: pathogenic alleles tend to be rarer.

Aiding interpretation: Analyzing conservation

Evolutionary conservation, examined using tools such as the UCSC genome browser, can also be supporting evidence of pathogenicity.

For example, if a particular amino acid remains the same in versions of the gene across species, it is an indication that natural selection is protecting against change in that amino acid, and changes to such an amino acid are more likely to be pathogenic.
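As a rough illustration of that idea, the sketch below scores each column of a small alignment by the fraction of species sharing the most common residue. The sequences are invented and assumed to be already aligned; real analyses use proper multiple sequence alignments and more careful scoring.

```python
from collections import Counter


def column_conservation(aligned_sequences, position):
    """Most common residue at a 0-based alignment column, and the
    fraction of sequences that share it."""
    column = [seq[position] for seq in aligned_sequences.values()]
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)


# Hypothetical, already-aligned protein fragments (not real sequences).
alignment = {
    "human":     "MKTCLQ",
    "mouse":     "MKTCLE",
    "zebrafish": "MRTCIQ",
    "fly":       "MKSCLQ",
}

for pos in range(6):
    residue, fraction = column_conservation(alignment, pos)
    print(pos + 1, residue, f"{fraction:.2f}")
```

A column where every species agrees (such as the cysteine at position 4 here) is the kind of site where a change would draw more suspicion.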

Looking at genes that co-evolve with a gene of interest with ERC analysis may also yield clues as to the functional role of the gene.

Aiding interpretation: Predicting functional impact

In general, determining the impact of a mutation will require consulting the literature and experts for the genes and mutations of interest.

Tools such as PolyPhen2, MutationTaster, SIFT, and the Ensembl variant predictor will attempt to predict the effect of mutations for proteins, although a geneticist or genetic counselor needs to audit the results.

Related genes should be studied as well for known or suspected pathogenicity. The STRING database and Genemania report potential interactions between proteins that could suggest additional hypotheses and analyses.

Searching Google and PubMed may identify animal or cell models with mutations in a gene of interest demonstrating an effect relevant to the clinical presentation.

The type of a mutation is also important in predicting impact on function:

Given the location and type of the mutation, expertise in the structure of the protein or the gene may yield insight into its potential impact.

Example: Mutations in a functional domain

Mutations within a domain of function should be viewed with increased suspicion, as these have a greater chance of disrupting the activity of the protein.

Consult the NCBI protein database for known domains for a protein.

Computer modeling may be able to predict alteration of binding affinity.

Example: Mutations impacting post-translational modification

A mutation (especially a missense mutation or an in-frame mutation) that enables or disables post-translational modifications warrants greater scrutiny for pathogenicity.

There are a variety of tools for predicting post-translational modification sites.

For example, the loss of a phosphorylation site that is used to inhibit activity could lead to gain of function.
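The same kind of reasoning can be sketched for any modification that depends on a short sequence motif. The example below is an illustration only, and it swaps in a different modification than the phosphorylation example above: it uses the N-linked glycosylation sequon (asparagine, then any residue except proline, then serine or threonine) and reports sequons lost or gained by a missense change. The peptide is made up, and the presence of a motif only marks a candidate site, not a confirmed one.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein):
    """1-based positions of the Asn in each candidate sequon."""
    return {m.start() + 1 for m in SEQUON.finditer(protein)}


def apply_missense(protein, position, new_residue):
    """Return the protein with the residue at a 1-based position replaced."""
    return protein[:position - 1] + new_residue + protein[position:]


def sequon_changes(protein, position, new_residue):
    """Candidate sequons lost and gained when a missense change is applied."""
    before = sequon_positions(protein)
    after = sequon_positions(apply_missense(protein, position, new_residue))
    return {"lost": sorted(before - after), "gained": sorted(after - before)}


# Hypothetical peptide, not a real protein.
peptide = "MANGSLKNPTQ"
print(sequon_changes(peptide, 5, "A"))  # {'lost': [3], 'gained': []}
print(sequon_changes(peptide, 9, "A"))  # {'lost': [], 'gained': [8]}
```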

Example: Mutations impacting post-primary structure

A mutation (especially a missense mutation or an in-frame mutation) that could potentially modify the secondary, tertiary or quaternary structure of a protein (or resulting complex) warrants scrutiny for pathogenicity.

To determine the impact on structure, either a structure obtained from crystallography or a simulated protein folding is necessary.

For example, two cysteines distant from one another in the sequence can form disulfide bonds within a protein as it folds. Changing either cysteine into a different amino acid breaks the ability to form the disulfide bond.

Example: Mutations that impact splicing

Mutations (including intronic mutations) that disrupt RNA splicing (which might even appear to be synonymous in some cases) can severely limit or eliminate production of the protein.

The tool ESE2 can help identify splicing errors.

Example: Mutations in non-coding regions

Mutations outside of the exome can be difficult to interpret.

For example, if a mutation were to occur in the promoter region for a gene, it could disrupt transcription of the gene, leading to loss of function for that allele.

Aiding interpretation: Functional studies

Functional studies of mutations in a laboratory may help determine pathogenicity.

Functional studies may involve analysis of cell lines or even the construction of a model organism, such as a mouse, fly or worm.

The model organism will be genetically modified to have a mutation equivalent to the one under suspicion.

In practice, functional studies require identifying an expert in the gene or related genes to design an experiment that would support or refute pathogenicity for the genotype.

For example, if a mouse bearing the mutation(s) (or equivalent) presents with a phenotype corresponding to the patient, then this is evidence of pathogenicity.

For interpretation, the Monarch Initiative can compare phenotypes across species.

If studying cell lines with an assay related to activity of the gene reveals that the mutation has caused gain of function for the gene involved, then the mutation should draw additional scrutiny.

If no assays exist to examine the direct or indirect hypothesized mechanisms of harm, it may be necessary to discover an assay.

See also:

Interpreting chromosomal abnormalities

High-resolution karyotyping and next-generation sequencing can also detect chromosomal abnormalities.

Chromosomal abnormalities usually impact many genes simultaneously, leading to duplicated copies of many genes or deleted copies of many genes (or both).

The next step is to determine which genes have been impacted, and then to attempt molecular therapeutics on a gene-by-gene basis.

Using the UCSC genome browser, one can look up the genes found in the affected regions.
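Conceptually, that lookup is an interval-overlap query. The sketch below shows the idea with placeholder gene names and coordinates; in practice, the gene coordinates would come from an annotation source such as the genome browser, on the same genome build as the diagnostic report.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_region(region, genes):
    """Names of genes that are wholly or partly inside the region."""
    return sorted(name for name, span in genes.items() if overlaps(region, span))


# Placeholder gene names and coordinates, for illustration only.
genes = {
    "GENE_A": Interval("chr3", 100_000, 150_000),
    "GENE_B": Interval("chr3", 180_000, 260_000),
    "GENE_C": Interval("chr3", 400_000, 450_000),
    "GENE_D": Interval("chr7", 120_000, 200_000),
}

deleted = Interval("chr3", 140_000, 300_000)
print(genes_in_region(deleted, genes))  # ['GENE_A', 'GENE_B']
```

Note that a gene only partly inside the region (GENE_A here) is still reported, since a partial deletion or duplication can disrupt it.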

A diagnostic report should indicate the specific abnormality:

  • In a chromosomal deletion, the set of genes in the fragment has been deleted.

  • In a chromosomal duplication, the set of genes in the fragment has been duplicated.

  • In a chromosomal translocation, fragments of two chromosomes have swapped, which may be balanced (indicating no genes lost or duplicated) or unbalanced (indicating possible duplicated and lost genes).

A diagnostic report should also include the regions impacted in cytogenetic notation.

Next steps if there is no diagnosis

Unfortunately, sequencing does not always lead to a diagnosis.

In some cases, there are multiple plausible candidate mutations, but none can be definitively linked to the condition, and in others, there are simply no plausible candidate mutations.

  • If plausible but not definitive candidates emerge among the mutations, then the next steps are to determine which approaches were used to analyze pathogenicity and to attempt those which were not. These may include techniques less common in a clinical setting, such as:

    1. structural analysis via simulated protein folding;
    2. conducting functional studies; and
    3. finding additional patients.

  • If no plausible candidates are identified, then the next steps are:

    1. to use genome sequencing if exome sequencing was used;
    2. to consider somatic mosaicism and other potentially confounding factors; and
    3. to pursue other “omics” beyond genomics, including transcriptomics and proteomics.

  • While unconventional, crowd-sourcing the interpretation of a mutation over social media (such as in a blog post) may yield insight.

Step 2: Finding other patients and building community

How can patients discover each other and build communities?

It is difficult to find a therapy for a single patient.

A community is useful not only for the resources and talents its members bring, but because it takes a community of patients to isolate the core phenotype of a disorder.

Examining the frequency of potentially pathogenic alleles in a database like ExAC or the Exome Variant Server allows estimation of the number of other patients in the world.
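For an autosomal recessive condition, one common back-of-the-envelope estimate treats the combined frequency q of pathogenic alleles as if the population were in Hardy-Weinberg proportions, so that roughly q² of people carry two pathogenic alleles. The sketch below makes that arithmetic explicit. The Hardy-Weinberg model, the allele frequency and the population figure are all assumptions chosen for illustration, and database cohorts may not represent the world population.

```python
def estimate_recessive_patients(allele_frequency, population):
    """Expected number of people with two pathogenic alleles, assuming
    Hardy-Weinberg proportions (incidence of about q squared)."""
    return allele_frequency ** 2 * population


def estimate_carriers(allele_frequency, population):
    """Expected number of carriers with exactly one pathogenic allele (2pq)."""
    q = allele_frequency
    return 2 * (1 - q) * q * population


q = 1 / 10_000          # hypothetical combined frequency of pathogenic alleles
world = 8_000_000_000   # rough world population, an assumption

print(round(estimate_recessive_patients(q, world)))  # 80
print(round(estimate_carriers(q, world)))
```

Even under these generous assumptions the estimate is only an order of magnitude, but it can indicate whether to expect a handful of other patients or thousands.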

There are several genetic databases available to search for a matching patient, such as:

And, there is a large collection of such databases pooled together through MatchMaker Exchange.

Using the internet and social media can also help to identify these patients.

When encountering a possible new patient, it is important to realize that matching on the same gene in a diagnostic report does not automatically imply that he or she has the same disease.

For each potential second patient, his or her genotype must be examined to ensure that it is sufficiently similar and that the presumed pattern of inheritance matches.

For example, if a disorder is recessive, it is important to ensure that the second patient is not simply a carrier.

See also:

Next steps

As a patient community begins to grow, there are important next steps to be taken in parallel, including:

Step 3: Preparing a scientific foundation

What should a patient or community do in order to enable the science necessary to identify and act on therapeutic strategies?

Identifying or discovering therapeutics may require significant scientific investigation. It is critical then to prepare the foundation for investigating the disease in parallel with identifying therapeutic strategies.

Most or all of the following activities should be conducted as soon as possible, especially since some can take months or years to complete:

  1. natural history studies of patients;

  2. investigational studies into mechanism of harm;

  3. identifying and developing assays;

  4. identifying biomarkers;

  5. establishing cell lines;

  6. establishing biobanks;

  7. establishing model organisms;

  8. investigational studies into phenotypic modifiers;

  9. establishing or enhancing the patient community.

Most of these tasks will require identifying an expert.

Phenotyping via natural history studies

A natural history study for a cohort of patients is a scientific study that observes how phenotypes evolve for each patient individually and collectively over time.

It is important to study many patients, because the phenotype for most conditions varies, and a natural history study will be needed to identify the core of the phenotype.

Regulatory agencies such as the FDA prefer having strong, longitudinal natural history data for clinical trials.

Natural history studies are critical for:

  • being able to predict the progression of a condition;
  • associating genotype with phenotype;
  • identifying the core features of a disorder; and
  • uncovering biomarkers.

Treatment strategies directed at the core features of a disorder are more likely to bring broad relief, while identification of ancillary genes that modify the phenotype may provide therapeutic insights and targets.

Deep phenotyping as part of natural history studies can also uncover potential biomarkers for use while conducting clinical trials.

Investigational studies into mechanism of harm

Determining the pathogenicity of a mutation often does not involve causally linking the underlying gene or variant to all of the high-level symptoms of the patient. (For instance, when a second case is identified to confirm the cause, there may be no knowledge as to why it is causal.)

It is difficult to treat a disease if the chain of events (starting with a mutation) that causes harm to the patient is not understood.

The chain of causality between a genetic defect and a high-level symptom can be lengthy, but it is critical to uncover the full chain.

In some cases, targeting a later link in the chain is easier than targeting the initial cause, with the caveat that focusing on downstream mechanisms may bring less general relief.

Investigational studies can proceed bottom-up from the cell biology level, or they can move top-down from the patients and model organisms (as in natural history studies). Ideally, these studies should move in both directions and generate hypotheses for the other to test.

Identifying assays

For each mechanism of harm, investigational studies should aim to discover laboratory assays that can observe the hypothesized or confirmed mechanism.

An assay is, effectively, a standardized laboratory experiment.

The purpose of an assay is to measure a specific feature of a biological system.

For example:

  • Protein-expression assays measure the amount of a particular protein present in a sample (usually with an antibody).

  • Enzymatic activity assays measure the level of activity or function present for a particular enzyme.

  • Electrophysiological assays measure electrical properties of cells. (Genetic epilepsies may benefit from this broad category of assay.)

Having a suite of assays that can probe different links of the mechanism of harm is critical for validating compounds predicted to have therapeutic benefit.

For example, a good assay might fluoresce under UV light in the presence of a compound that corrects some aspect of the mechanism of harm.

Assays are necessary for validating potential compounds.

Precise assays are also a prerequisite for conducting high-throughput screening.

There are roughly three kinds of assays, in order of increasing complexity:

  • cell-free assays isolate the key components of a cellular or chemical process;

  • cell-based assays operate on patient cells or on cells that model the disease; and

  • model-organism-based assays observe the phenotype of a model organism.

Lower-complexity assays tend to be more economical for high-throughput screening, while higher-complexity assays tend to produce stronger candidates. (For instance, if a compound works on a model organism, at least some aspect of the delivery challenge in medicinal chemistry has been solved.)

A word of caution on assay development

Designing scientific experiments to measure a feature of interest is a fundamentally creative process, so assay discovery is subject to the same constraints that govern scientific progress itself.

Designing an assay will require finding an expert in the relevant biology.

While it is hard to systematize or automate the process of assay discovery, Recursion Pharmaceuticals has computer vision algorithms that attempt to discover cell-based assays when there are morphological changes to the cell as the result of a disorder.

Identifying biomarkers

Basic research into the mechanism of harm should also be aimed at identifying biomarkers: observable indicators of the disorder.

Biomarkers help measure the effectiveness of therapeutic approaches directly or indirectly, and are essential when conducting clinical trials.

Establishing cell lines

For basic investigational studies into the cell biology of a disorder, there are two broad categories of cell lines:

  • Patient-derived cell lines such as fibroblasts, lymphoblasts and iPSC use donated patient tissue, such as blood, skin, muscle or liver.

  • Constructed cell lines introduce a patient’s genotype into an existing cell line to mimic the condition in that cell line.

For each disorder, it is important to create the cell lines with the strongest phenotype relative to the high-level phenotype of the patients. The strongest phenotypes are most useful for validation and screening.

For instance, if a disorder has strong liver involvement, then hepatocytes may provide a strong cellular representation of the disorder, while if it is neurological in nature, then neurons may provide a strong representation.

As cell lines become established, it is important to conduct investigational studies that establish phenotypes for the different cell types.

Patient-derived cell lines

The type of cell line necessary for studying a disorder varies with the disorder, but fibroblasts and lymphoblastoid lines are relatively common, if only for their durability.

Stem cell lines (usually induced pluripotent stem cells (iPSC) made from patients) can also be useful in studying a disorder, because they are differentiable into other cell types, such as brain cells or liver cells.

Stem cell lines cost around USD 5,000 to USD 10,000 to create in an academic laboratory, but can take months to create, so it is advisable to start the process early. Differentiation into other cell types can take longer still.

It is also advisable to use as many patients as possible, since some patients will inevitably provide more robust cell lines than others.

Constructed cell lines

Since some genetic conditions make culturing patient-derived cells challenging, an alternative approach is to introduce a patient’s pathogenic mutations directly into an existing cell line.

For instance, cancer cell lines are sometimes used for their robust growth properties, although cancer cell lines can alter the phenotype of the cells.

An advantage of using existing cell lines is that, in addition to growing well, they can be created relatively quickly and cheaply for almost any tissue type.

The chief disadvantage is that their fidelity to the actual condition and real patients may not be clear.

Establishing biobanks and reagent repositories

As a patient community grows, establishing biobanks with cell lines and repositories of reagents can accelerate research and make it easier to compare and reproduce results between research labs.

A set of reagents that will likely prove useful are antibodies to any proteins of interest. (Antibodies can be used in a lab to detect the presence of a particular protein.)

Patient communities will likely want to partner with an existing medical research institute such as a university, the NIH or a non-profit such as Sanford-Burnham-Prebys or Coriell for biobanking of cell lines and tissues.

Because patient tissue is extremely valuable, communities should also consider creating a procedure for donating the bodies of deceased patients to aid in investigation of the disorder, should patients or parents choose to do so.

Creating model organisms

Creating model organisms – such as flies, mice, worms and yeast – can aid in interpreting variants during diagnosis, in understanding the mechanism of harm and in validating compounds during drug development.

If a model organism recapitulates the phenotype of a patient, then it is evidence that the modeled genotype is pathogenic.

Even if a model organism does not share the same phenotype as human patients, it may still be useful for scientific investigations, so long as it has a phenotype.

If a model organism responds to a therapy, then it can aid in validating predicted therapeutics.

The choice of model organism in disease research is guided by the exhibition of a strong phenotype for the disorder and/or its correspondence with the human phenotype.

After the construction of a model organism, a critical next step is to characterize the phenotype of the model.

Characterizing the phenotype of a model organism may be significantly more labor than creating the initial organism itself, but it is a critical step, since it is difficult to test therapeutics without a robust phenotype with high statistical confidence.

Many academic laboratories and institutes such as The Jackson Laboratory can construct and phenotype model organisms.

In the context of human disease, there are three categories of model organisms commonly employed:

Finding secondary genes that modify the phenotype

In addition to identifying mechanisms of harm, pre-translational scientific investigations can also aim to identify genes that modify the phenotype of a condition.

For instance, in a loss of function disorder, there may be another gene that can compensate for the role of the lost activity. (In a metabolic disorder, this would be called an alternative pathway.)

If an assay that measures the activity of the deficient enzyme is available, then it is possible to conduct a genetic screen for alternate pathways.

If a compensatory gene is discovered, then the next step is to target it for increasing gene expression.

In some cases, disabling or decreasing expression of a second gene may actually suppress the phenotype of the disorder. Suppressor genes offer additional therapeutic targets for inhibition. A genetic screen can identify suppressor genes as well.

For general interactions between genes, the STRING database and Genemania provide information on known relationships.

For determining which genes co-evolve together – which yields clues to function – one can use ERC analysis.

Step 4: Identifying strategies for therapeutics

Which therapeutic strategies are in scope for a disease?

In order to explore therapeutic strategies for a genetic condition, it is critical to understand (1) the type and location of the mutation; (2) the type and function of the affected protein (or non-coding RNA); and (3) the primary and downstream mechanisms of harm.

Which strategies for molecular therapeutics apply depends upon the type of the mutation, the type of the protein and the maturity of understanding of the mechanism of harm. These strategies include:

  • screening for candidate compounds;
  • a general strategy for changes in degree;
  • targeting a mutation;
  • targeting a metabolic defect;
  • targeting a transporter protein defect;
  • targeting a receptor defect;
  • targeting gene regulatory defects;
  • targeting a structural protein defect;
  • targeting a non-coding RNA defect;
  • targeting a proteinopathy.

Screening for candidate compounds

Regardless of the underlying cause, much of modern drug development rests on screening compound libraries for an effect on the mechanism of harm.

As a result, it is difficult to engage in screening without first conducting investigational studies into mechanism of harm.

Once an assay for measuring a mechanism of harm has been discovered, the next step is to conduct high-throughput screening.

In addition to manual screening, virtual screening may be able to make therapeutic predictions without resorting to bench science.

If screening yields any hits, the next step is compound validation.
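What counts as a "hit" depends on the assay, but a common starting point is to compare each compound's readout against untreated control wells. The following is a minimal sketch under that assumption: the readouts are invented, the threshold of three standard deviations is an arbitrary illustrative choice, and a higher readout is assumed to mean a correction of the mechanism of harm.

```python
from statistics import mean, stdev


def call_hits(readouts, negative_controls, threshold=3.0):
    """Compounds whose readout lies more than `threshold` standard
    deviations above the mean of the negative-control wells.

    Returns a dict mapping compound name to its rounded z-score.
    """
    mu = mean(negative_controls)
    sigma = stdev(negative_controls)
    scores = {name: (value - mu) / sigma for name, value in readouts.items()}
    return {name: round(z, 2) for name, z in scores.items() if z > threshold}


# Invented assay readouts (arbitrary units), for illustration only.
negative_controls = [100, 104, 98, 102, 96]
readouts = {
    "compound_1": 101,
    "compound_2": 130,
    "compound_3": 97,
    "compound_4": 112,
}

hits = call_hits(readouts, negative_controls)
print(sorted(hits))  # ['compound_2', 'compound_4']
```

Hits called this way are only candidates. They still need to be repeated and validated, ideally in more than one assay.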

Targeting a mutation

Some therapeutic strategies focus directly on the mutation itself, without regard to the type or role of the corresponding protein:

Readthrough therapeutics

If a disorder is caused by a premature stop mutation, then readthrough therapeutics are in scope.

Readthrough compounds attempt to override premature stop codons, ultimately converting them to an amino acid in the process of protein synthesis.
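To make "premature stop" concrete, the sketch below translates a short coding sequence with the standard genetic code and reports the first stop codon that appears before the end. The two sequences are invented for illustration; a real analysis would use the reference transcript for the gene.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop."""
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def premature_stop(cds):
    """1-based codon index of the first stop before the final codon, else None."""
    protein = translate(cds)
    index = protein.find("*")
    return index + 1 if 0 <= index < len(protein) - 1 else None


# Invented sequences: a C>T change at base 4 turns CAG (Gln) into TAG (stop).
reference = "ATGCAGAAATGGTTGTAA"
mutant = "ATGTAGAAATGGTTGTAA"

print(translate(reference), premature_stop(reference))  # MQKWL* None
print(translate(mutant), premature_stop(mutant))        # M*KWL* 2
```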

Keeling et al. (2014) provide an overview of the readthrough therapeutics space.

When the mutation of interest is a premature stop, then testing readthrough compounds on cell cultures is a reasonable next step, and the following compounds may be useful in those tests:

  • G418 is a potent readthrough inducer, useful in a laboratory setting as a measure of the potential of this approach, although it is too toxic to be used therapeutically.

  • Gentamicin (which has problematic side effects) is also known to induce readthrough in some cases.

  • Ataluren, which is purported to induce readthrough, is available in Europe.

Exon-skipping therapeutics

If a damaging mutation occurs in an exon that is non-critical to the resulting protein, then exon-skipping may be a viable therapeutic approach.

Databases such as the Ensembl genome browser can provide the exons and introns for a gene.

In addition, Ensembl will also provide alternate transcripts that have been identified. If an alternate transcript exists that skips the exon containing the mutation, this is a positive (though not essential) indication that exon-skipping is a viable therapeutic approach.

To skip over a mutation-bearing exon, one creates an antisense oligonucleotide, sufficiently complementary to the mutation and surrounding nucleotides, that can induce RNA splicing to skip the exon during construction of mRNA.

In theory, an antisense oligonucleotide sequence could be customized for any disorder in which it is reasonable to skip an exon.
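Two small computations sit underneath this idea, sketched below with invented inputs. First, "complementary" means the oligonucleotide is the reverse complement of its RNA target. Second, one widely used sanity check, which is an addition here rather than something stated above, is that skipping an internal coding exon keeps the downstream reading frame only if the exon's length is a multiple of three. Real oligonucleotides are chemically modified and their target sites are chosen empirically, so this is only a starting point for discussion with an expert.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(rna_target):
    """Reverse complement of an RNA target, written 5' to 3'."""
    return rna_target.translate(COMPLEMENT)[::-1]


def skipping_preserves_frame(exon_lengths, skipped_index):
    """True if removing the exon at skipped_index (0-based) leaves
    downstream codons in the same reading frame."""
    return exon_lengths[skipped_index] % 3 == 0


# Hypothetical target sequence and exon lengths, for illustration only.
target = "GGAUCCAAGU"
exon_lengths = [120, 93, 101, 150]

print(antisense(target))                          # ACUUGGAUCC
print(skipping_preserves_frame(exon_lengths, 1))  # True  (93 = 31 * 3)
print(skipping_preserves_frame(exon_lengths, 2))  # False (101 is not a multiple of 3)
```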

Exon skipping is being actively pursued as a means to convert cases of Duchenne muscular dystrophy into the less severe Becker muscular dystrophy.

If an exon-skipping compound is identified or constructed, the next step is compound validation.

A general strategy: Targeting changes in degree

Many (but not all) genetic disorders can be lumped into one of three categories according to their impact on the function of either the original gene or a process with which the gene interacts: total loss of function, partial loss of function or gain of function.

A total loss of function often results when:

  1. both alleles for a gene in an autosomal recessive disorder are impacted by loss of function mutations; or

  2. the only allele in an X-linked disorder is impacted by a loss of function mutation.

A partial loss of function often results when one allele in a haploinsufficient gene suffers a loss of function mutation.

A gain of function results when a mutation causes an increase in activity that disrupts regular functioning.
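These rules of thumb can be written as a small lookup. The sketch below is illustrative only: the inheritance labels and function name are invented for this example, and the rules describe what "often" happens rather than a definitive classification.

```python
def classify_change(inheritance, lof_alleles=0, gain_of_function=False):
    """Rough category for a mutation's impact, following the rules above.

    inheritance: "autosomal_recessive", "x_linked_single_allele" or
    "haploinsufficient" (hypothetical labels chosen for this sketch).
    lof_alleles: number of alleles carrying a loss of function mutation.
    """
    if gain_of_function:
        return "gain of function"
    if inheritance == "autosomal_recessive" and lof_alleles == 2:
        return "total loss of function"
    if inheritance == "x_linked_single_allele" and lof_alleles == 1:
        return "total loss of function"
    if inheritance == "haploinsufficient" and lof_alleles == 1:
        return "partial loss of function"
    return "unclassified"  # many disorders do not fit these three categories

# High-level strategies named in this guide for each category.
STRATEGIES = {
    "total loss of function": [
        "restore the lost function",
        "compensate for the lost function",
        "suppress aggravating factors",
    ],
    "partial loss of function": ["increase activity"],
    "gain of function": ["suppress activity in the gene or its pathway"],
}

category = classify_change("autosomal_recessive", lof_alleles=2)
print(category, STRATEGIES.get(category, []))
```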

Therapeutic strategies for total loss of function

A total loss of function in a gene most commonly happens in recessive disorders with two loss-of-function alleles or in X-linked disorders with a single loss-of-function allele.

There are three high-level strategies for total loss of function:

  1. Restore the lost function.
  2. Compensate for the lost function.
  3. Suppress aggravating factors.

Restoring lost function

If a missing enzyme could be delivered to the correct part of the cell, then the next steps include enzyme synthesis and development of enzyme replacement therapy.

If a missing protein could be delivered from another tissue, then exploring genetically-motivated transplantation is a next step.

Compensating for lost function

In some cases, a second gene may have a degree of redundancy with a lost function.

Thus, a next step is investigational studies to look for compensating genes.

If total loss of function is leading to insufficiency or absence of a particular metabolite, then another next step is to explore a metabolic diet to deliver the metabolite.

If an insufficient or absent metabolite cannot be obtained through diet, then it should be considered a target for drug development via medicinal chemistry.

Suppressing aggravating factors

In a total loss of function, a genetic suppressor screen can identify secondary genes that worsen the condition.

Given the difficulty in restoring lost function, a genetic suppressor screen is a highly advisable strategy for developing therapeutics.

In terms of therapeutic strategies, each gene hit in a genetic suppressor screen should be treated as if the disorder were caused by a gain of function in that gene. That is, the aggravating gene should be targeted by inhibitors.

If total loss of function is leading to accumulation of a harmful metabolite, then another next step is to explore a metabolic diet to reduce consumption of the harmful metabolite or its precursors.

Therapeutic strategies for partial loss of function

If there is diminished – but not lost – activity for a gene, then the high-level strategy is to increase activity.

Mutations in haploinsufficient genes can lead to disorders with partial loss of function.

Because there is residual activity, there are additional strategies to be considered in addition to those for total loss of function:

  • A next step is to explore increasing expression of the target gene with the aim of boosting expression of the wild-type allele as a means of compensating for the loss of function in a mutant allele.

  • If residual activity is insufficient for downstream processes, it may also be useful to consider increasing the inputs to the activity; that is, increasing the number of substrates or agonists in an attempt to resolve the insufficiencies.

Under either approach, the next step is to validate any candidates.

Therapeutic strategies for gain of function

If a disorder is caused by a gain of function in a gene or is aggravated by activity in another gene, then the high-level strategy is to suppress activity in that gene or in its pathway.

A next step is to search for suppressors – inhibitors, blockers or antagonists – of the target gene. Google and PubMed may identify initial hits for such compounds.

The Guide to Pharmacology contains target-specific inhibitors for many genes.

In addition to searching for suppressors, a next step is to explore decreasing expression for a gene.

If upstream or downstream elements in the affected pathways are known, then applying the same gain of function strategies to each element in the pathway may serve to counteract a gain of function upstream or downstream.

If no inhibitor, blocker or antagonist is known, then structure-based drug development and virtual screening are potential next steps.

For any compounds that turn up, the next step is compound validation.

Targeting a metabolic pathway defect

If a mechanism of harm for the disorder disturbs a metabolic pathway – as when mutations disrupt a gene encoding an enzyme – then the general strategy applies.

In metabolic pathway defects, it may be possible to intervene upstream or downstream of the defect. Pathway databases such as BioCyc may help identify additional targets for intervention.

In any case, but especially in the event of a total loss of function, an additional next step is to explore a metabolic diet.

If a missing metabolite cannot be effectively consumed in the diet, then the missing metabolite is a target for drug delivery via medicinal chemistry.

Targeting a membrane transporter protein defect

If a mechanism of harm for the disorder disrupts a transporter protein (such as an ion channel), the defect may increase, decrease or halt the flow of a molecule across a membrane.

A “loss of function” mutation in a membrane transporter protein means the protein is broken in some way. Since a membrane transporter protein acts like an automatic door, there are two ways that a door can suffer a “loss of function”:

  1. a door that is stuck open lets in too much; while

  2. a door that is stuck closed lets in too little.

When dealing with a defect in a membrane transporter, it is absolutely critical to know whether the defect is causing a gain in traffic or a loss in traffic.

As such, the general strategy for targeting a degree of change in function is in scope, except that the equivalent of an inhibitor is often called a blocker.

Cystic fibrosis, a total loss of function (in the sense of closure) in a chloride channel, is an example of a transporter protein defect with a recent track record of success in finding treatments.

Targeting a receptor defect

(If a mechanism of harm impacts a receptor, then the high-level general strategy for a change in degree applies, but there are additional possibilities due to special properties of receptors.)

Receptors are more complex in their interactions than enzymes, because they have a baseline level of activity – constitutive activity – even in the absence of the ligand (agonist) which stimulates them.

With a partial loss of function, a next step is to look for agonists of the receptor.

In the case of a gain of function, increasing antagonists may help, but increasing inverse agonists may help as well.
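The difference between an antagonist and an inverse agonist can be illustrated with a deliberately simplified toy model. Everything in the sketch below is an assumption made for illustration: the 0-to-1 activity scale, the competitive-binding formula and the numbers are not taken from any measured receptor.

```python
def receptor_activity(constitutive, ligands):
    """Toy receptor activity on a 0..1 scale (illustrative model only).

    ligands: list of (relative_concentration, bound_activity) pairs, where
    relative_concentration is the ligand concentration divided by its
    dissociation constant, and bound_activity is the receptor's activity
    while that ligand is bound. Unbound receptors signal at `constitutive`.
    """
    total = sum(conc for conc, _ in ligands)
    activity = constitutive * (1.0 / (1.0 + total))  # unbound receptors
    for conc, bound_activity in ligands:
        activity += (conc / (1.0 + total)) * bound_activity
    return activity

constitutive = 0.5                         # hypothetical elevated baseline
agonist = (1.0, 1.0)                       # natural ligand, fully activating
neutral_antagonist = (10.0, constitutive)  # occupies the site, baseline unchanged
inverse_agonist = (10.0, 0.0)              # occupies the site and silences it

untreated = receptor_activity(constitutive, [agonist])
with_antagonist = receptor_activity(constitutive, [agonist, neutral_antagonist])
with_inverse = receptor_activity(constitutive, [agonist, inverse_agonist])
print(untreated, with_antagonist, with_inverse)
```

In this toy model the antagonist can only push activity back toward the constitutive level, while the inverse agonist can push it below that level, which is why both are worth considering in a gain of function.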

Receptors can also be viewed as the starting points of the metabolic pathways that they kick off, so it may be easier to target a metabolic pathway behind the receptor than the receptor defect itself.

For any compound suggested by these strategies, the next step is compound validation.

Targeting gene regulatory defects

Apart from a genetic disorder’s primary mechanism of harm, disregulation of other genes can account for harm in these disorders as well.

Most genes have a role to play in the regulation of other genes: increasing expression of one gene may increase or decrease the expression of another gene.

As a result, altering the expression of the protein impacted by a mutation can have downstream effects through gene regulatory networks.

In addition, some genes – such as those involved in chromatin modification or histone/DNA methylation – engage in regulation of other genes as their primary function.

To target the primary effects of a mutation in a regulatory gene and the downstream effects of other mutations, transcriptomics and proteomics can reveal the extent to which other genes have been disregulated, and can suggest therapeutic strategies for restoring a baseline gene expression profile.

Targeting a structural protein defect

If a mechanism of harm disrupts a protein whose primary purpose is structural (as in dystrophin), then it is challenging to replace the structure.

The Duchenne Muscular Dystrophy community is an exemplar in developing strategies for tackling the absence of a structural protein.

Given the pharmacological challenges in therapeutically delivering a structural protein, strategies focusing on the mutation such as exon-skipping and readthrough are next steps, assuming the mutations are in scope.

Though difficult, investments in basic science for gene-editing may be advisable.

If it is suspected that the mutant protein would have some value, but protein quality control mechanisms are too aggressively degrading the protein, then a next step is to search for stabilizers for the mutant protein.

Targeting a non-coding RNA defect

If the primary cause of a disorder is an error (or a loss) of non-coding RNA – which is presumed to be rare relative to disorders caused by defects in proteins – then there are two additional high-level strategies:

  1. delivering the missing non-coding RNA; and

  2. editing the defects in the non-coding region.

RNA is straightforward to synthesize, but targeted delivery is a significant challenge in the application of medicinal chemistry.

However, targeted gene-editing has additional (likely more difficult) challenges.

Targeting a proteinopathy

If a mechanism of harm disturbs protein folding, then in some cases the new foldings (or foldings to which the proteins have become susceptible) are actively malignant.

In many cases, cells can detect misfolded and/or mutant proteins through quality control mechanisms, and degrade them.

In some diseases, improper protein folding causes toxic protein aggregation.

For instance, amyloid misfolding features in diseases such as prion disease, amyloidosis, Alzheimer’s, Huntington’s and Parkinson’s, as it allows aggregation of misfolded proteins.

In disorders in which misfolding is driving malignant behavior, several high-level strategies are in scope, including:

Heat shock proteins aid protein folding and are often naturally upregulated when a cell is stressed (although heat shock therapeutics have been challenging to develop due to toxicity).

Autophagy is the process by which cells digest defective or excessive components, and upregulation of this process may be beneficial in proteinopathies.

If protein aggregation specifically is problematic, then an additional strategy is to characterize sites on the protein that allow aggregates to form and to identify inhibitors that can interfere with these sites, with the aim of preventing aggregates from forming.

For any identified compound, the next step is compound validation.

Targeting the phenotype

It is possible to bypass the primary mechanisms and instead look for therapeutics based on biomarkers and/or phenotypes in cells and/or model organisms.

  • At the level of a patient, phenotypic targeting is simply symptomatic treatment. For instance, if a symptom of the disorder is epilepsy, one can try known anticonvulsants.

  • If investigational studies into the mechanism of harm have found a biomarker in cells and a high-precision assay has been discovered to measure that biomarker, then a next step is high-throughput screening.

  • If investigational studies into the mechanism of harm have found a phenotype in model organisms, then a next step is model-organism screening.

Crowd-sourcing and crowd-screening

In some cases, conducting precision medicine means conducting science.

Science is a process that depends on collaboration and creativity, so tapping the collective creativity and wisdom of the Internet can accelerate the process.

For example:

  • Crowd-sourcing variant interpretation allows experts on relevant genes to provide their insight on pathogenicity, and it opens up the possibility of finding a matching case.

  • Crowd-screening suggestions through social media for potential therapeutics allows experts to contribute rationally predicted therapeutics (in contrast to blind high-throughput screening approaches).

Mark2Cure is a crowd-screening platform from the Su Lab at Scripps that enables large-scale, crowd-sourced biocuration of the medical research literature related to a disease.

Biocuration enables bioinformatics techniques to mine the newly structured data for relationships between diseases, potential drugs and genes.

Of course, soliciting advice from social media requires filtering out advice without a plausible scientific basis, but it can be a powerful mechanism for generating leads.

Step 5: Exploring specific therapies

How does one implement a specific therapeutic strategy?

Once a therapeutic strategy has been identified, the next step is to examine specific approaches to implement that strategy.

Increasing gene expression

In a disorder that involves loss of function, if there exists a functioning copy of a gene that has identical or sufficiently similar function, then increasing expression of that gene may have a therapeutic effect.

Or, in disorders where a mutant protein with reduced function is leading to disease, increasing expression of the mutant protein may raise activity levels high enough to be therapeutic.

In either case, the next step is to predict upregulators for the RNA of the target gene.

Decreasing gene expression

In a gain of function disorder or a disorder for which a suppressor gene has been identified, reducing expression of a target gene may provide a therapeutic effect.

The next step is to predict downregulators for the RNA of the target gene.

Predicting compounds to modify gene expression

To predict which compounds may be able to upregulate or downregulate the expression of a target gene, one can consult the Connectivity Map (cMap) and LINCS cloud databases, which contain the results of experiments measuring the effect of a library of compounds on RNA expression for many genes.

In some cases, the target gene may not be present in the databases, but if the target gene is regulated by another gene in the database, then one can attempt to indirectly regulate the target gene.

Any hits produced through this approach should proceed to compound validation.
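As a rough sketch of how such data might be queried once downloaded, the example below ranks compounds by their measured effect on a gene. The compound names, gene names and values are hypothetical, and the table layout is an assumption made for this example rather than the actual format of cMap or LINCS cloud.

```python
# Hypothetical expression changes (log2 fold change) per compound and gene.
# Real values would come from a resource such as cMap or LINCS cloud.
expression_changes = {
    "compound_A": {"GENE1": 1.8, "REG1": 0.1},
    "compound_B": {"GENE1": -0.9, "REG1": 1.2},
    "compound_C": {"REG1": 2.0},
}

def rank_compounds(changes, gene, direction="up"):
    """Compounds that move `gene` in the wanted direction, strongest first."""
    sign = 1 if direction == "up" else -1
    scored = [
        (compound, profile[gene])
        for compound, profile in changes.items()
        if gene in profile and sign * profile[gene] > 0
    ]
    return sorted(scored, key=lambda item: sign * item[1], reverse=True)

# Direct query: candidate upregulators and downregulators of the target.
print(rank_compounds(expression_changes, "GENE1", "up"))
print(rank_compounds(expression_changes, "GENE1", "down"))

# Indirect query: if the target were missing from the data, one could rank
# compounds against a gene assumed (hypothetically) to regulate it.
print(rank_compounds(expression_changes, "REG1", "up"))
```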

Designing a metabolic diet

In the case of a lost metabolic pathway, in which inputs no longer convert to outputs, two mechanisms of harm should be expected:

  1. accumulation of inputs; and
  2. a deficiency in the outputs.

If there is no alternate pathway to metabolize the input, then (1) should be examined and if no alternate pathway to synthesize the output exists, then (2) should be examined.

For example, in the disorder PKU, total loss of function in phenylalanine hydroxylase leads to an inability to convert the amino acid phenylalanine into tyrosine.

This suggests two strategies:

  • Limiting consumption of phenylalanine.

  • Increasing consumption of tyrosine.

In fact, strictly limiting consumption of phenylalanine in the diet is an effective treatment for the disorder, and tyrosine supplementation is beneficial as well.
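The reasoning behind a metabolic diet can be sketched as a small decision function. The function and parameter names are hypothetical, and the output is only a list of ideas to investigate, not a dietary recommendation.

```python
def diet_strategies(substrate, product,
                    alternate_clearance=False, alternate_synthesis=False):
    """Candidate dietary strategies for a lost conversion substrate -> product.

    alternate_clearance: True if another pathway can metabolize the substrate.
    alternate_synthesis: True if another pathway can synthesize the product.
    """
    strategies = []
    if not alternate_clearance:
        # The input is expected to accumulate, so examine restricting it.
        strategies.append(f"limit consumption of {substrate}")
    if not alternate_synthesis:
        # The output is expected to be deficient, so examine supplying it.
        strategies.append(f"increase consumption of {product}")
    return strategies

# The PKU example from the text: phenylalanine is not converted to tyrosine.
print(diet_strategies("phenylalanine", "tyrosine"))
```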

As another example, patients with CDG Ib – a total loss of function in the gene MPI – lack an enzyme to interconvert mannose-6-phosphate and fructose-6-phosphate. Because this enzyme is the primary endogenous source of mannose-6-phosphate, the loss of the enzyme results in a deficiency of mannose-6-phosphate, a critical precursor to a process called glycosylation.

Adding mannose supplementation to the diet is an effective treatment for the disorder.

A more common metabolic diet is the restriction of lactose-bearing dairy products in individuals with lactose intolerance, a result of insufficient or absent quantities of the enzyme lactase, which breaks down lactose into galactose and glucose for further digestion.

Finding stabilizers for mutant proteins

In the event that the mutant protein is predicted to have residual function, but quality control mechanisms within the cell (such as endoplasmic-reticulum associated degradation) are degrading the protein, the goal of stabilization is to find a molecule that interacts with the mutant protein to prevent degradation.

In general, finding stabilizers may require virtual screening and structure-based drug design.

In the specific case where mutant enzymes may retain activity if they could properly fold, but poor ability to fold leads to degradation of the mutant proteins, there is work showing that potent inhibitors of the mutant enzymes at low concentrations may be able to induce proper folding (Fan, 2003), thereby preventing their destruction and rescuing activity.

Enzyme replacement therapy

In disorders lacking an enzyme, enzyme replacement therapy, which replaces the missing enzyme, may be able to provide therapeutic relief.

There are substantial drug delivery challenges in enzyme replacement therapy, but these vary in difficulty depending on the tissues and intracellular compartments that need to be targeted.

The first step toward enzyme replacement therapy is being able to synthesize the enzyme in a biologically active form.

Therapeutic enzyme synthesis generally uses the transfection of Chinese Hamster Ovary (CHO) cells with DNA containing the gene that encodes the desired enzyme.

In properly tuned bioreactors, transfected CHO cells can generate large quantities of the target enzyme with post-translational modifications compatible with mammals.

Once the enzyme is synthesizable and validated, applying medicinal chemistry will likely be required to ensure that the enzyme is delivered to the correct tissues and/or intracellular compartments.

Genetically-motivated transplantation

In some cases, transplanting organs, tissue or cells that do not contain the underlying genetic defect can be therapeutic.

In particular, if a disorder results from a missing gene product and that gene product can be delivered from another tissue to other cells in the body, then organ and bone-marrow stem cell transplantation are in scope.

In a bone marrow transplant, the patient’s bone marrow is depleted and then a transfusion of donor stem cells is provided to regrow the bone marrow.

Moreover, because the donor stem cells don’t carry the mutation, as they differentiate in organs and tissues in the body, they will produce cells not affected by the disorder.

In theory, gene editing could be used on induced pluripotent stem cell lines for an autologous bone marrow transplant, although more basic research into accurate gene editing is required before this could be considered a realistic possibility.

Stem cell therapeutics

Stem cells have attracted attention for their regenerative therapeutic potential, and there are certainly disorders and injuries which stand to benefit from them.

Unfortunately, the scope for stem cells in treating genetic disorders is more limited.

In disorders for which genetically-motivated transplantation is in scope, it is conceivable that stem cell lines could be genetically modified to remove a mutation, and then transfused back into a patient.

Two further obstacles make autologous stem cell therapeutics challenging for genetic disorders for the near future:

  • transplantation with stem cells increases cancer risk; and

  • error rates in gene-editing further increase cancer risk.

Despite more limited prospects for treatment, stem cell lines are valuable in investigational studies because they can differentiate into different cell types, sparing the need to extract those tissues from patients.

Gene-editing and gene therapy

Gene editing and gene therapy are a theoretical silver bullet for all genetic disorders, and with enough investment in basic scientific research, they will almost certainly one day become reality.

Conceptually, gene editing involves editing out defects in an organism’s genome by inserting, replacing or deleting elements in an organism’s genetic code.

Practically, there are three major high-level challenges with most approaches:

  1. Delivering the gene-editing agents to every cell (or every cell of interest).

  2. Ensuring that the editing error rate is low enough to avoid introducing additional mutations (and likely cancer in the process).

  3. Managing the immune response to delivery vectors.

For delivering gene-editing agents, engineered viruses are the most popular platform.

New genetic material (such as a functioning version of a gene) can be delivered directly as a separate fragment of DNA called a plasmid.

Alternatively, genetic material can be integrated into the host genome via techniques like zinc-finger nucleases, TALENs, CRISPR/Cas9 or meganucleases.

Transcriptomics-driven therapeutics

In the context of human disease, transcriptomics (or RNA sequencing) has the potential to identify genes disregulated as the consequence of the disease.

Transcriptomics also has the potential to bring a diagnosis when genome or exome sequencing fails, because RNA sequencing can pick up unusual splice errors or transcript variants that could be hard to find with static sequencing techniques alone.

If substantial disregulation is identified, then databases like cMap and LINCS cloud can be used to compute a perturbagenic cocktail of compounds designed to bring gene regulation closer to baseline.

Care must be taken in interpreting results, as some disregulation could be a compensatory response to the defect. Some apparent “disregulation” could simply be background variation in the individual.

Transcriptomics requires conducting RNA sequencing on as many patients and close relatives as possible in order to increase statistical confidence and separate core disregulation from transcriptional artifacts.

Restoring baseline expression where disregulation was compensatory could be anti-therapeutic.
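One simple way to think about "bringing gene regulation closer to baseline" is to score each compound by how strongly its expression signature opposes the disease signature. The sketch below uses a plain negated dot product as that score; this is a simplification chosen for illustration, not the scoring method used by cMap or LINCS cloud, and all gene names and values are hypothetical. It scores single compounds rather than computing a cocktail.

```python
# Hypothetical log2 fold changes: patient samples relative to baseline.
disease_signature = {"GENE1": 2.0, "GENE2": -1.5, "GENE3": 1.0}

# Hypothetical log2 fold changes induced by each compound.
compound_signatures = {
    "compound_X": {"GENE1": -1.0, "GENE2": 1.0, "GENE3": 0.0},
    "compound_Y": {"GENE1": 1.0, "GENE2": -0.5},
    "compound_Z": {"GENE3": -2.0},
}

def reversal_score(disease, compound, exclude=()):
    """Positive when the compound pushes genes opposite to the disease.

    exclude: genes suspected to be compensatory, which should not be
    pushed back toward baseline.
    """
    return -sum(
        change * compound.get(gene, 0.0)
        for gene, change in disease.items()
        if gene not in exclude
    )

ranked = sorted(
    compound_signatures,
    key=lambda name: reversal_score(disease_signature, compound_signatures[name]),
    reverse=True,
)
print(ranked)
```

Passing a suspected compensatory gene through `exclude` removes it from the score, which reflects the caveat above that restoring baseline expression for such genes could be anti-therapeutic.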

An advantage of a transcriptomics-driven approach to disease therapeutics is that it holds the potential of addressing a broad class of downstream mechanisms simultaneously, and it could be utilized even in the absence of a firm diagnosis, because RNA sequencing can capture a snapshot of the mechanism of harm as it passes through the transcriptome.

The disadvantage of a transcriptomics-driven approach is that it does not target the primary mechanism of harm.

While transcriptomics could be used on the level of an individual patient given sufficient samples, it is certainly more effective with RNA sequencing data available from the larger population, since this should allow it to identify core disregulation in a disorder.

RNA sequencing is not generally available in a clinical context, so this approach requires working with an academic partner.

Proteomics-driven therapeutics

Proteomics measures the protein types and quantities present in an organism across specific tissues, environments and times.

For human disease, proteomics can identify proteins disregulated as a consequence of the disease.

SomaLogic has a platform for blood-based proteomics that has the potential to identify biomarkers in human disease, even down to a personalized level (Hathout, et al., 2015).

As with transcriptomics, disregulated proteins found in proteomics may also suggest regulatory strategies for therapeutics, with the caveat that some proteins may be disregulated as a compensatory response.

Step 6: Discovering drug candidates

Given a target for a therapy, how do you find a drug that hits the target?

Conducting high-throughput screening

High-throughput screening attempts to test thousands of drug candidate compounds simultaneously using robotics to automate the process.

High-throughput screening requires the selection of a compound library and the discovery of a high-precision assay that can recognize when a mechanism of harm has been mitigated.

Assays must be engineered to have a high signal-to-noise ratio to rule out excessive false positives in large screens.

For each compound that gets a hit, the next step is to validate the compound.

Structure-based drug design and virtual screening

Structure-based drug design and virtual screening are computational methods for designing and searching for drug candidates (which are generally inhibitors).

In structure-based drug design, the objective is to design a small molecule that is roughly opposite in structure and charge to a target domain on an enzyme.

Virtual screening scans compound libraries for potential ability to bind with and inhibit target domains on proteins.

Because of the approximative nature of computational methods, predicted candidates from these methods should proceed to compound validation.

High-fidelity virtual screening would often require intractable simulations with molecular dynamics, so docking simulations may be used in lieu of full physical simulation.

There is software available for conducting these simulations:

  • PyRx can screen a protein against possible inhibitors.
  • ZINC is a database of structures for commercially available compounds.
  • FAF-Drugs3 is a filtering package to predict pharmacokinetics.
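Before docking, compound libraries are often pre-filtered for drug-likeness. The sketch below applies Lipinski's rule of five, a widely used heuristic swapped in here for illustration; it is not the method implemented by FAF-Drugs3 or any other tool named above, and the compound records and field names are hypothetical.

```python
def rule_of_five_violations(compound):
    """Count violations of Lipinski's rule of five for one compound record."""
    checks = [
        compound["molecular_weight"] > 500,  # daltons
        compound["logp"] > 5,                # octanol-water partition coefficient
        compound["h_bond_donors"] > 5,
        compound["h_bond_acceptors"] > 10,
    ]
    return sum(checks)

def prefilter(library, max_violations=1):
    """Names of compounds with at most `max_violations` violations."""
    return [
        compound["name"]
        for compound in library
        if rule_of_five_violations(compound) <= max_violations
    ]

# Hypothetical compound records with precomputed properties.
library = [
    {"name": "candidate_1", "molecular_weight": 320.4, "logp": 2.1,
     "h_bond_donors": 2, "h_bond_acceptors": 5},
    {"name": "candidate_2", "molecular_weight": 812.9, "logp": 6.3,
     "h_bond_donors": 7, "h_bond_acceptors": 14},
    {"name": "candidate_3", "molecular_weight": 540.0, "logp": 4.2,
     "h_bond_donors": 3, "h_bond_acceptors": 8},
]
print(prefilter(library))
```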

For conducting follow-up simulations on hits with full molecular dynamics, both NAMD and GROMACS can be used.

Conducting model organism screening

Once a model organism has been created for a disorder and its phenotype has been robustly characterized, then the organism may be used as a platform for screening potentially therapeutic compounds.

For some model organisms, it is possible to employ automation to conduct phenotyping, which may enable large numbers of compounds to be tested.

Conducting genetic suppressor screening

A compound-based screen on model organisms can identify compounds that impact the phenotype, but a suppressor screen can identify genes that modify the phenotype, and these genes may be useful as the basis for investigating therapeutics.

In a suppressor screen, mutagenic agents are introduced into a large population of model organisms.

If any of the resulting double mutant organisms show improvement in their phenotype, then the mutant can be sequenced to determine which gene was modified.

Validating a candidate compound

If a screen produces a hit or a compound is hypothesized to be therapeutic, the critical next step is to validate the compound in a laboratory setting.

Initial validation involves using the appropriate assays discovered while investigating the mechanism of harm to determine whether that compound mitigates a particular mechanism.

For example, if a read-through compound is predicted to increase the expression of the wild-type protein, an antibody for the protein should be able to detect its presence.

If validation with cells succeeds or validation with cells is not possible, the next step is validation against the phenotype of model organisms.

If cell-based and organism-based validation succeed, the next step is to apply medicinal chemistry to the compound to convert it to clinical material suitable for clinical trials in humans.

Step 7: Applying medicinal chemistry to candidates

How does a hit on a screen become a drug?

When compounds are first identified either through screening or rational predictions, it is most likely not the case that these compounds will have regulatory approval, or even that these compounds will be non-toxic and effective in patients.

As strategies for identifying and developing molecular therapeutics begin to yield these candidates, for any candidates without regulatory approval, medicinal chemistry will be required to transform these compounds into a form suitable for conducting a clinical trial.

As a broader discipline, medicinal chemistry aims to manipulate the efficacy, toxicity and delivery of a compound.

In other words, medicinal chemistry is a multidisciplinary engineering process that begins with a molecule that demonstrates efficacy on an assay in cells and ends with a derivative of that molecule that is intended to be safe and effective.

In general, medicinal chemistry challenges have to be re-solved for each molecule, although some platforms, such as exosomal encapsulation (see the review in Batrakova and Kim, 2015), provide the possibility of aiding delivery for a larger class of molecules.

Crossing the blood-brain barrier

While there are many challenges in medicinal chemistry, one often stands out, especially in diseases impacting the brain: crossing the blood-brain barrier.

The selective permeability of the endothelial cells in the brain prevents many molecules from crossing, which makes drug delivery to the brain a major challenge.

Enzyme replacement challenges

Enzyme replacement therapy (and large molecule therapy in general) also requires special considerations, both for its increased challenges in crossing the blood-brain barrier and also for the need to control targeting to specific tissues or intracellular compartments.

Enzyme replacement may also require targeting a specific organ, tissue or intracellular compartment.

Within a cell, targeting the lysosome with a synthetic enzyme is perhaps among the easiest, because the natural process of endocytosis tends to direct large molecules to the lysosome.

For delivery to the cytoplasm of the cell, attaching cell-penetrating peptides (such as the TAT peptide) to an enzyme can improve cell penetrance.

PEGylation of the enzyme may also be beneficial in reducing the immunogenicity of the protein (reducing side effects) and in reducing renal clearance (improving availability and increasing half-life).

Step 8: Conducting clinical trials

How does one validate the safety and efficacy of a compound?

Clinical trials attempt to determine the safety and efficacy of a therapeutic, and they are required by regulatory agencies in most countries before a compound may be marketed.

Clinical trials typically have four phases:

  • Phase 0: First-in-human. Pharmacodynamics and pharmacokinetics study. About a dozen volunteers.

  • Phase 1: Safety testing. Dose range determination. Side effect observation. A few dozen volunteers.

  • Phase 2: Effectiveness testing. A few hundred patient volunteers.

  • Phase 3: Large-scale safety and effectiveness testing. A few thousand patient volunteers.

Clinical trials are often placebo-controlled and double-blinded: some participants receive the therapeutic while others receive a placebo, and neither the participants nor the investigators know who is receiving which until the trial ends.

Conducting a clinical trial on a single patient

Perhaps one of the greatest epistemological and regulatory challenges for precision medicine is that there may be so few patients that a placebo-controlled trial is unlikely to yield the statistical confidence necessary to validate the approach.

At the moment, there is no regulatory framework in place for single-patient trials, although proposals for “n=1” trials are circulating in the academic community.

In the U.S., if a patient wishes to take a compound that does not have FDA approval, the manufacturer must agree to provide it, and the patient must petition the FDA for permission through expanded access.

Part II: Questions and answers

This is part two of the guide.

It began as a glossary, but has evolved into a question and answer format.

I am striving to explain entries in more patient-friendly language and in the context of human disease.

In fact, not all of the questions below refer to topics above; some are there because they may appear in a diagnostic report.

You can read each entry as needed as a reference, but I have tried to order the questions so that you can also read it top to bottom as a tutorial on genetics and precision medicine.

It is by no means complete, and I expect to be updating this segment of the guide regularly.

If you’re already trained in a field of science or engineering, then, once again, I recommend Quickstart Molecular Biology.

It’s a rapid introduction to the field, targeted at those that already have a technical background (broadly speaking).

What is a phenotype?

In the context of a patient, a phenotype is a collection of symptoms for a disorder.

More generally, phenotype refers to the observable or measurable characteristics of an organism, whether in cells, model organisms or human patients.

Everything observable – from hair color to seizures to blood platelet levels – counts as part of the phenotype.

What is a genome?

The human genome is an instruction manual for building and operating a human being at the molecular level.

This instruction manual exists in every cell of the human body, and it is encoded in the long string-like molecules of DNA.

What is a genetic condition?

A genetic condition results from damaging alterations to the genome of an organism.

Most genetic conditions are the result of alterations inherited from one or both parents, and in these, the alterations are present in every cell.

There is a less common class of genetic conditions in which only some fraction of a patient’s cells experiences a condition – a situation known as somatic mosaicism.

Cancer is an example of a genetic condition that begins with alterations to the genome of a single cell in a previously healthy patient.

What is an exome?

The exome is the part of the genome that contains instructions for constructing proteins, and it constitutes about 1% of the entire human genome.

The remainder of the genome outside of the exome is the non-coding region.

Although the exome is only a small fraction of the genome, it is estimated that most genetic disorders arise from mutations that alter proteins.

Sequencing the exome instead of the whole genome is an economical way to look for the root cause of genetic disorders.

What is a mutation?

A mutation is an alteration of the genome.

What is sequencing?

Sequencing is a process that uses cells (usually from blood) to read the genome (or exome) of an individual.

Sequencing permits the identification of mutations.

At present, sequencing the exome is less expensive than sequencing the entire genome.

What is a gene?

A gene is a region in the genome that encodes the instructions for building a gene product.

What is a gene product?

A gene product is a molecule encoded by a gene.

There are two kinds of gene products: proteins and non-coding RNAs.

In most genetic disorders studied today, errors in protein-coding genes are responsible, but there are disorders, such as Prader-Willi Syndrome, in which non-coding RNAs are implicated.

A protein-coding gene is a region of DNA that contains the instructions for building a protein written in the standard genetic code.

What is a protein?

Proteins are a class of molecules that plays a significant role in life.

Proteins are the key actors in cells, and they play many roles, including:

  • enabling molecular transformations and reactions (cellular metabolism);
  • serving as structures within and between cells;
  • mediating communication within and between cells;
  • serving as molecular transporters;
  • conducting cell replication; and
  • building and modifying other proteins.

Structurally, a protein is a sequence of amino acids that folds into a 3D shape in order to perform its function.

The instructions for building a protein will be found in a gene, and the instructions will be written in the standard genetic code.

What is DNA?

DNA is a large molecule that stores the information within the genome.

Viewed as information, a DNA molecule is a long word written in an alphabet containing the letters A, T, C and G.

At a structural level, DNA is composed of two opposing strands, each of which is a sequence of nucleotides. Together, the two strands form the famous double helix.

The two opposing strands in a molecule of DNA are related: A’s and T’s pair together between opposing strands and C’s and G’s pair together, as in the following simple example of two strands:

A - T
G - C
T - A
C - G
T - A

A pair of nucleotides linked together within DNA is known as a [base pair].
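For readers who think in code, the pairing rule can be sketched in a few lines of Python. This is only an illustration: the name `opposing_strand` is made up for this example, and the sketch pairs letters position by position without modeling the direction in which each strand is read.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def opposing_strand(strand: str) -> str:
    """Return the letter that pairs with each letter of `strand`, in order."""
    return "".join(PAIRS[letter] for letter in strand)

# The left-hand strand from the example above, read top to bottom.
print(opposing_strand("AGTCT"))  # prints TCAGA
```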

What’s an example of a protein-coding gene in DNA?

The following DNA sequence is a simple gene that encodes a protein found in the saliva of Gila monsters:

AACCTGTATATTCAGTGGCTGAAAGATGGCGGCCCGAGCAGCGGCCGCCCGCCGCCGAGCTGA

The process that converts this DNA sequence into its corresponding protein is protein synthesis.

What is a nucleotide?

A nucleotide is a molecule that represents one of the letters in the alphabet for DNA or RNA.

The four nucleotides in DNA are adenine (A), thymine (T), guanine (G) and cytosine (C).

The four nucleotides in RNA are adenine (A), uracil (U), guanine (G) and cytosine (C).

What is a base pair?

A base pair is a pairing of two complementary nucleotides in DNA on opposite strands of the DNA helix.

A is complementary to T, and C is complementary to G.

[base pair]

What is RNA?

RNA is another information-bearing molecule made of nucleotides. It is similar to DNA, except that it is single-stranded, and in place of thymine there is uracil.

In terms of information content, RNA is a second molecular alphabet composed of A, U, C and G.

When building proteins, genes within DNA are transcribed into RNA, and the RNA is processed before being translated into a protein.

(T becomes U during transcription from DNA to RNA.)
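Viewed purely as text, that letter swap is a one-line operation. The sketch below assumes the DNA is given as the coding strand, so the RNA reads the same with U in place of T; the function name `transcribe` is chosen for this example, and the sketch ignores the processing that RNA undergoes afterwards.

```python
def transcribe(coding_dna: str) -> str:
    """Rewrite a DNA coding sequence in the RNA alphabet (T becomes U)."""
    return coding_dna.upper().replace("T", "U")

print(transcribe("ATGTGA"))  # prints AUGUGA
```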

In addition to bearing information, some RNA molecules, known as non-coding RNAs, do not translate into proteins, but still have an active biological role.

What is non-coding RNA (ncRNA)? What is functional RNA?

A non-coding RNA molecule (ncRNA) is an RNA molecule that does not end up being translated into a protein.

Non-coding RNA may be called functional RNA to emphasize the fact that even though it does not translate into a protein, it may still have an active biological role, especially in terms of gene regulation.

What is the non-coding region of the genome?

The non-coding region of the genome is the region that does not encode proteins.

Some regions in the non-coding region contain non-coding RNAs that do not become proteins, yet play an active role in the cell.

What is an oligonucleotide?

An oligonucleotide (oligo being Greek meaning “a few”) is a short strand of RNA or DNA.

These short sequences can play active biological roles, especially in gene regulation and splicing.

Synthetic oligonucleotides form a potential basis for some therapies.

What is a variant/allele?

A variant or allele is a version of a gene.

Though all humans have roughly the same set of genes, each of us has two alleles for most genes – one from the chromosome from our father, the other from the chromosome from our mother.

(The exceptions are genes on the X and Y chromosomes in males, who carry only one copy of each of these chromosomes.)

For most genes, there is a large collection of common variants that make up the bulk of the alleles in the population.

Mutations in a gene produce new alleles.

What is a wild-type allele?

A wild-type allele for a gene is a commonly occurring non-pathogenic version found in nature.

In disorders that impact only one of two alleles for a gene, it may be therapeutic to target the wild-type allele by boosting its activity in loss of function disorders and reducing its activity in gain of function disorders.

How do mutations change a genome?

A mutation is an alteration of the genome (inserting, changing or deleting letters).

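In the context of the Gila monster gene, we can imagine a mutation that transforms the sequence:

AACCTGTATATTCAGTGGCTGAAAGATGGCGGCCCGAGCAGCGGCCGCCCGCCGCCGAGCTGA

into the sequence:

AACGTGTATATTCAGTGGCTGAAAGATGGCGGCCCGAGCAGCGGCCGCCCGCCGCCGAGCTGA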

In this case, the fourth letter was changed from C to G.

Geneticists have even developed a notation (called HGVS notation) for describing mutations at the DNA level, and this one would be written c.4C>G.
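As a rough illustration of how mechanical this notation is, here is a minimal Python sketch that applies a substitution such as c.4C>G to a coding sequence. The function name `apply_substitution` is hypothetical, the sequence is the Gila monster gene assembled from the codon listing elsewhere in this guide, and only the simplest single-letter form of the notation is handled.

```python
import re

# The Gila monster gene, assembled from the codon listing elsewhere in this guide.
GENE = ("AACCTGTATATTCAGTGGCTGAAAGATGGCGGC"
        "CCGAGCAGCGGCCGCCCGCCGCCGAGCTGA")

def apply_substitution(coding_dna: str, hgvs: str) -> str:
    """Apply a simple substitution such as 'c.4C>G' to a coding sequence."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs)
    if match is None:
        raise ValueError(f"not a simple substitution: {hgvs}")
    position, old, new = int(match.group(1)), match.group(2), match.group(3)
    index = position - 1  # the notation counts from 1; Python counts from 0
    if coding_dna[index] != old:
        raise ValueError(f"expected {old} at position {position}")
    return coding_dna[:index] + new + coding_dna[index + 1:]

mutant = apply_substitution(GENE, "c.4C>G")
print(GENE[:6], "->", mutant[:6])  # prints AACCTG -> AACGTG
```

The check that the original letter really is a C at position 4 is worth keeping in any real tool: a mismatch usually means the mutation was reported against a different transcript.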

Under some circumstances, a mutation (or collection of mutations) may lead to a genetic disorder.

And, when a mutation happens in an individual cell later in life, it may give rise to cancer.

What is a de novo mutation?

A de novo (a Latin expression implying newness) mutation is one that is unique to a child, and not found in either parent.

Through chance, every human being carries a few de novo mutations.

While de novo mutations are usually harmless on their own, they are often scrutinized in cases of rare disease.

What is an inherited mutation?

A mutation is inherited if it has been passed from parent to child.

What is a germline mutation?

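A germline mutation is one carried in an egg or sperm cell, so a child conceived from that cell has the mutation in every cell of the body.

By contrast, a de novo mutation that arises after fertilization, early in the development of an organism, reaches only part of the body: the cells, tissues and organs that descend from the mutant cell carry the mutation, but the rest do not.

The result of such a post-fertilization mutation is somatic mosaicism.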

What is a genotype?

A genotype is the set of alleles for a specific organism.

In the context of disease, genotype also refers to the specific alleles responsible for causing the disease.

During the diagnostic phase of a genetic disorder, genotype may also refer to the collection of mutations under suspicion uncovered by sequencing.

What is a pathogenic variant/mutation?

A variant is pathogenic if it can cause disease.

What is pattern of inheritance?

In the context of human disease, the pattern of inheritance for disease refers to how genes must be inherited from parents in order to exhibit the disease.

The three common patterns of inheritance for genetic disorders are:

  • autosomal recessive, in which there is a one in four chance that a child of two carriers will have the disorder;

  • autosomal dominant, in which there is a one in two chance that a child of a carrier will have the disorder; and

  • X-linked, in which there is a one in two chance that the son of a mother carrying the disorder will have the disorder while daughters have a one in two chance of being a carrier.
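The autosomal odds quoted above can be checked by listing every combination of alleles a child might inherit. The Python sketch below does this for the recessive and dominant cases; the labels "A" (working allele) and "a" (pathogenic allele) and the function names are chosen for this example only.

```python
from fractions import Fraction
from itertools import product

def child_genotypes(parent1: str, parent2: str) -> list:
    """List the four equally likely allele pairs a child can inherit."""
    return [a + b for a, b in product(parent1, parent2)]

def fraction_affected(parent1: str, parent2: str, is_affected) -> Fraction:
    """Fraction of the equally likely child genotypes that are affected."""
    children = child_genotypes(parent1, parent2)
    return Fraction(sum(is_affected(child) for child in children), len(children))

# Recessive: two carriers; a child needs two pathogenic alleles to be affected.
recessive = fraction_affected("Aa", "Aa", lambda g: g.count("a") == 2)
# Dominant: one affected parent; a single pathogenic allele is enough.
dominant = fraction_affected("Aa", "AA", lambda g: "a" in g)
print(recessive, dominant)  # prints 1/4 1/2
```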

What is a chromosome?

At a structural level, a chromosome is a long string of DNA plus packaging material that holds it together and aids in regulating [gene expression].

At a genetics level, a chromosome is a collection of genes.

Humans carry 23 pairs of chromosomes (for 46 total).

One of these 23 pairs is the sex chromosome pairing – two X chromosomes for women, an X and a Y chromosome in men.

The remaining 22 pairs are autosomes.

What is an autosome?

An autosome is any chromosome other than a sex chromosome. Within each pair, each autosome carries a set of genes redundant (often called “homologous”) with the other.

This redundancy built into autosome pairs provides protection against mutations: if one allele of a gene is damaged, there is a good chance the allele on the other autosome is still viable, and in many cases, one functional copy of a gene is sufficient.

When a condition is “autosomal” it means that it involves one of the 22 pairs of autosomal chromosomes.

What is an X-linked condition?

An X-linked condition is one in which a gene on the X chromosome is impacted.

Because men have only one copy of the X chromosome, these conditions tend to present only in men, while women are carriers.

In an X-linked condition, the odds that a mother who carries the condition will have an affected son are 1 in 4 for each pregnancy.

If the mother knows she is carrying a boy, the odds that he is affected rise to 1 in 2.
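The difference between "1 in 4" and "1 in 2" is simply a question of which outcomes are being counted. A short Python sketch, assuming a carrier mother and an unaffected father (the labels "X", "x" and "Y" are chosen for this example), makes the counting explicit:

```python
from fractions import Fraction
from itertools import product

# Carrier mother: one working X ("X") and one X with the mutation ("x").
# Unaffected father: one working X and one Y.
mother, father = ["X", "x"], ["X", "Y"]
children = list(product(mother, father))  # four equally likely combinations

sons = [child for child in children if "Y" in child]
affected_sons = [child for child in sons if "x" in child]

# Before the sex of the child is known: affected sons out of all children.
p_affected_son = Fraction(len(affected_sons), len(children))
# Once the child is known to be a boy: affected sons out of sons only.
p_affected_given_son = Fraction(len(affected_sons), len(sons))
print(p_affected_son, p_affected_given_son)  # prints 1/4 1/2
```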

What is the difference between homozygous and heterozygous?

An individual is homozygous for a gene if they have two identical alleles for that gene.

An individual is heterozygous for a gene if they have two different alleles for a gene.

What is a compound heterozygous individual?

In human disease, an individual is compound heterozygous for a gene if they have two different alleles for a gene, yet have a recessive disease caused by that gene.

This is usually the result of two different loss of function alleles.

What is haplosufficiency?

With the exception of genes on the X and Y chromosomes in men, each gene has two copies in the genome.

A haplosufficient gene is one for which only one functioning allele is necessary for full function.

A haploinsufficient gene requires both copies of the gene for full function.

When a loss of function variant is discovered for an autosomal gene, it is important to consider whether that gene is haploinsufficient.

What is a dominant disorder?

If a pathogenic mutation can cause disease by itself, then it is dominant.

If a parent has a dominant pathogenic mutation, that parent should have the disorder, and one out of every two children on average will have the same disorder.

What is a recessive disorder?

If a pathogenic mutation requires being paired with another pathogenic mutation (usually in the same gene) to cause disease, then it is recessive.

If someone has only one copy of a recessive mutation, then they are a carrier.

Many genetic conditions are recessive because humans carry two copies of most genes (one on each autosome), and usually only one working copy of a gene is necessary.

What is a carrier?

In the context of human disease, someone is a carrier for a recessive disorder if they have a pathogenic mutation in a gene that can cause the disorder, but also a functioning copy of that gene.

Carriers often have no symptoms, and in some cases have mild symptoms.

If two carriers for an autosomal recessive condition have children, on average, one out of four children will have the condition.

If a carrier (a mother) for an X-linked condition has a child, then there is a one out of two chance that any boy will have the condition.

What is an exon? What is an intron?

In most genes, the sequence of DNA for a gene will be composed of both exons and introns.

Exons are the subsequences of a gene that encode protein structure, while the remaining regions between exons – introns – are ignored during protein construction.

When proteins are constructed, there is a splicing phase that removes introns.

Exome sequencing focuses primarily on exons.

What is transcription?

Transcription is the first phase of converting a gene encoded in DNA into a protein.

In transcription, the DNA for a gene is copied into its corresponding RNA transcript.

After some additional processing, RNA is then translated into an amino acid sequence, and the amino acid sequence folds into a protein.

Some strategies for treating genetic disorders such as readthrough and exon-skipping involve interacting with the RNA for a gene after transcription.

What is an example of a coding RNA sequence after transcription?

For example, once transcribed into the protein-coding portion of the RNA, the code for the protein from the Gila monster looks like:

AACCUGUAUAUUCAGUGGCUGAAAGAUGGCGGCCCGAGCAGCGGCCGCCCGCCGCCGAGCUGA

What is a transcript?

A transcript is the RNA molecule that has been produced for a gene.

(A transcript may also be called the primary transcript to distinguish it from RNA that has been processed through mechanisms such as splicing.)

What is messenger RNA (mRNA)?

Messenger RNA (mRNA) is the RNA that remains after splicing, and is destined for translation into a protein.

What is RNA sequencing?

RNA sequencing takes a snapshot of the active RNA transcripts in a cell at a given time.

RNA sequencing yields insights into which genes are expressed by a particular cell and the quantity of expression.

What is transcriptomics?

Transcriptomics uses RNA sequencing to interrogate the transcripts present in specific cells and across time and environments.

Transcriptomics often seeks to discover regulatory relationships between genes.

In the context of human disease, transcriptomics can identify dysregulated genes and suggest corrective therapies.

What is an amino acid?

An amino acid is an individual building block for a protein: proteins are created as chains of amino acids joined together.

Each amino acid has a side chain, with properties such as hydrophobicity (attracted to or repelled by water) and electrical charge (positive or negative).

To a large degree, the properties of the side chains determine the structure and function of the entire protein.

(Side chains are not the whole story: some proteins also require chaperones to achieve their intended structure.)

For instance, mutations in DNA can change one amino acid to another in the resulting protein, which may in turn enhance or degrade the structure and function of the resulting protein.

There are 20 standard amino acids ordinarily used for protein synthesis:

  • Alanine (Ala, A)
  • Arginine (Arg, R)
  • Asparagine (Asn, N)
  • Aspartic acid (Asp, D)
  • Cysteine (Cys, C)
  • Glutamine (Gln, Q)
  • Glutamic acid (Glu, E)
  • Glycine (Gly, G)
  • Histidine (His, H)
  • Isoleucine (Ile, I)
  • Leucine (Leu, L)
  • Lysine (Lys, K)
  • Methionine (Met, M)
  • Phenylalanine (Phe, F)
  • Proline (Pro, P)
  • Serine (Ser, S)
  • Threonine (Thr, T)
  • Tryptophan (Trp, W)
  • Tyrosine (Tyr, Y)
  • Valine (Val, V)

Each amino acid is represented by one or more codons in the standard genetic code.

Under the running example of the Gila monster protein, the chain of amino acids encoded by the gene becomes:

Asn-Leu-Tyr-Ile-Gln-Trp-Leu-Lys-Asp-Gly-Gly-Pro-Ser-Ser-Gly-Arg-Pro-Pro-Pro-Ser

How are proteins created?

During the construction of a protein (often called protein synthesis), a gene is first transcribed into RNA.

Once transcribed into RNA, the introns are spliced out.

The resulting RNA, which contains only the exons, is called messenger RNA.

The messenger RNA is translated into a sequence of amino acids according to the standard genetic code.

Many proteins are further modified through post-translational modifications after synthesis.

What is RNA splicing?

When the DNA for a gene is first transcribed into RNA, it contains both exons (which contain part of the code for a protein) and introns (which do not).

RNA splicing is the phase in which the nucleotides for introns are removed.

For protein-coding genes, the resulting RNA is messenger RNA.

Mutations in introns that impact splicing can still result in pathogenic alterations to the resulting protein.

In some cases, splicing will also remove exons, resulting in a transcript variant.
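Splicing can be pictured as cutting a string at known positions and joining the pieces that remain. The Python sketch below uses a made-up transcript (exons in upper case, introns in lower case, purely for readability) and a hypothetical `splice` function; real splicing is directed by signals in the RNA itself rather than by a list of coordinates.

```python
def splice(primary_transcript: str, exons: list) -> str:
    """Join the exon segments of a transcript, dropping everything between them.

    `exons` holds (start, end) positions, counted from 1 and inclusive.
    """
    return "".join(primary_transcript[start - 1:end] for start, end in exons)

# A made-up transcript with three exons separated by two introns.
transcript = "AUGGCC" + "guaagu" + "UUUAAA" + "guccag" + "GGGUGA"
exons = [(1, 6), (13, 18), (25, 30)]

# Ordinary splicing keeps all three exons.
print(splice(transcript, exons))                 # prints AUGGCCUUUAAAGGGUGA
# Leaving out the middle exon produces a transcript variant.
print(splice(transcript, [exons[0], exons[2]]))  # prints AUGGCCGGGUGA
```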

What is a transcript/splice variant?

When RNA splicing removes one or more exons, it creates transcript variants.

Transcript variants lead to different proteins, some of which have modified functionality.

Attempting to force the synthesis of a transcript variant by deliberately skipping an exon is a therapeutic strategy for some mutations.

Since mutations are usually reported with respect to a transcript variant, it is important to make sure that the transcript variants are identical when comparing mutations, or else to interconvert the mutations between transcript variants.

What is the standard genetic code? What is a codon?

The standard genetic code maps three-letter patterns in DNA (or RNA), called codons, to instructions for building a protein.

A codon is thus an individual instruction in the list of instructions for building a protein.

A codon encodes one of two types of instructions:

  • insert a specific amino acid next; or

  • stop production of the protein.

For example, the DNA codon ATG (which becomes AUG in RNA) means “insert a methionine next,” while DNA codon TGA (which becomes UGA in RNA) means “stop.”

There are 64 possible codons, but they encode only 20 distinct amino acids (plus “stop”), so most amino acids are encoded by more than one codon. (For example, AAA and AAG both mean “insert a Lysine.”)

Under this interpretation of DNA, we can re-orient the Gila monster gene into codons and their corresponding instructions, much as if it were a computer program or a recipe:

AAC;  // Insert Asparagine
CTG;  // Insert Leucine
TAT;  // Insert Tyrosine
ATT;  // Insert Isoleucine
CAG;  // Insert Glutamine
TGG;  // Insert Tryptophan
CTG;  // Insert Leucine
AAA;  // Insert Lysine
GAT;  // Insert Aspartic Acid
GGC;  // Insert Glycine
GGC;  // Insert Glycine
CCG;  // Insert Proline
AGC;  // Insert Serine
AGC;  // Insert Serine
GGC;  // Insert Glycine
CGC;  // Insert Arginine
CCG;  // Insert Proline
CCG;  // Insert Proline
CCG;  // Insert Proline
AGC;  // Insert Serine
TGA;  // Stop

After running this program, we have the following sequence of amino acids stitched together:

Asn-Leu-Tyr-Ile-Gln-Trp-Leu-Lys-Asp-Gly-Gly-Pro-Ser-Ser-Gly-Arg-Pro-Pro-Pro-Ser

What is protein translation?

During protein synthesis, protein translation is the construction of an amino acid sequence from the corresponding RNA under the standard genetic code.
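Translation can be sketched as a table lookup applied three letters at a time. The Python below is a minimal illustration, not a substitute for a bioinformatics library: it builds the standard genetic code from a compact 64-letter string (one amino acid letter per codon, with "*" for stop), works on DNA letters as the codon listing in this guide does, and uses names such as `translate` and `CODON_TABLE` that are chosen for this example.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: DNA codon -> one-letter amino acid, "*" means stop.
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(coding_dna: str) -> str:
    """Read codons left to right, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# The Gila monster gene, assembled from the codon listing in this guide.
GENE = ("AACCTGTATATTCAGTGGCTGAAAGATGGCGGC"
        "CCGAGCAGCGGCCGCCCGCCGCCGAGCTGA")
print(translate(GENE))  # prints NLYIQWLKDGGPSSGRPPPS
```

The output uses the one-letter abbreviations from the amino acid list earlier in this guide (N for asparagine, L for leucine, and so on).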

What is proteomics?

Proteomics interrogates all of the proteins present in specific cells and across time and environments.

At present, there are a variety of approaches for conducting proteomics.

In the context of human disease, proteomics can identify dysregulated proteins and suggest corrective therapies.

What are the types of mutations in proteins?

With an understanding of the standard genetic code, it is possible to categorize mutations according to their impact on the protein.

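The major categories for mutations in the protein-coding regions of genes are:

  • premature stop (nonsense) mutations;
  • frameshift mutations;
  • in-frame mutations;
  • missense mutations; and
  • synonymous mutations.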

How do you interpret HGVS notation for mutations?

The HGVS specification provides a shorthand notation for describing mutations in a gene.

Variants are commonly reported in two different ways at the same time, in the coding DNA notation and in the protein-coding notation.

The coding DNA notation (prefixed with c.) indicates at which nucleotide in the coding DNA the mutation starts and what kind of mutation it was.

The coding DNA is the DNA that remains when the introns are spliced out.

For example:

  • c.24G>C means that the 24th nucleotide was changed from G to C.

  • c.24del means that the 24th nucleotide was deleted. (Possibly also written with the deleted letter, as in c.24delC.)

  • c.67_68insT means that a T was inserted between nucleotides 67 and 68.

The protein-coding notation (prefixed with p.) indicates the effect of a mutation on the amino acid sequence for a protein.

For example:

  • p.R401X means that the 401st codon was changed from an arginine codon to a stop codon. (Possibly also written p.Arg401Ter or p.Arg401*)

  • p.Q631SfsX7 means that the 631st codon was changed from a glutamine to a serine due to a frameshift mutation that resulted in a new stop codon 7 codons away. (Possibly also written as p.Gln631fsTer7 or p.Gln631fs*7)
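To show how regular the coding DNA notation is, here is a small Python sketch that recognizes three simple forms: substitutions such as c.24G>C, insertions such as c.67_68insT, and single-position deletions written as c.24del or c.24delC. It covers only a tiny subset of the HGVS specification, and the names `PATTERNS` and `describe` are hypothetical.

```python
import re

# Each entry pairs a kind of mutation with a pattern for its simplest written form.
PATTERNS = [
    ("substitution", re.compile(r"c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])")),
    ("deletion", re.compile(r"c\.(?P<pos>\d+)del(?P<ref>[ACGT]*)")),
    ("insertion", re.compile(r"c\.(?P<pos>\d+)_(?P<end>\d+)ins(?P<alt>[ACGT]+)")),
]

def describe(hgvs: str) -> dict:
    """Break a simple coding DNA description into its named parts."""
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(hgvs)
        if match:
            return {"kind": kind, **match.groupdict()}
    raise ValueError(f"unsupported notation in this sketch: {hgvs}")

print(describe("c.24G>C"))
# prints {'kind': 'substitution', 'pos': '24', 'ref': 'G', 'alt': 'C'}
print(describe("c.67_68insT"))
# prints {'kind': 'insertion', 'pos': '67', 'end': '68', 'alt': 'T'}
```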

What is a premature stop / framestop / nonsense mutation?

A premature stop (also called nonsense) mutation truncates construction of the protein by turning a codon for an amino acid into a stop codon.

For example, changing the first A to T in AGA turns it into TGA.

AGA codes for the amino acid arginine, but TGA codes for stop.

Truncation usually destroys the function of the resulting protein. (In fact, nonsense-mediated decay may even prevent the production of proteins with such mutations.)

For example, the mutation p.R401X (possibly also written p.Arg401Ter or p.Arg401*) indicates that the 401st amino acid (an arginine) has been replaced by a stop codon.

For a premature stop mutation, readthrough compounds should be investigated as potential therapeutics.

What is a frameshift mutation?

A mutation that inserts or deletes a number of nucleotides that is not a multiple of three will cause a “frameshift” in which subsequent codons are misinterpreted.

Frameshifts almost always cause a loss of function mutation in the resulting protein.

For example, c.1891del (possibly also written c.1891delC, and reported at the protein level as p.Q631SfsX7) indicates that the 1,891st nucleotide in the coding DNA for a protein (in this case a cytosine) was deleted, which caused the codon at position 631 (formerly a glutamine) and all subsequent codons to be garbled, until the introduction of a stop seven codons down.
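The garbling is easy to see by regrouping letters into codons after a deletion. The Python sketch below uses the first five codons of the Gila monster gene listed in this guide and deletes the fourth letter; the function names are chosen for this example.

```python
def codons(coding_dna: str) -> list:
    """Split a coding sequence into consecutive three-letter groups."""
    return [coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3)]

def delete_nucleotide(coding_dna: str, position: int) -> str:
    """Remove the letter at `position`, counting from 1."""
    return coding_dna[:position - 1] + coding_dna[position:]

start = "AACCTGTATATTCAG"  # first five codons of the Gila monster gene
print(codons(start))                        # prints ['AAC', 'CTG', 'TAT', 'ATT', 'CAG']
print(codons(delete_nucleotide(start, 4)))  # prints ['AAC', 'TGT', 'ATA', 'TTC', 'AG']
```

Every codon after the deletion has changed, which is why a single missing letter can scramble the rest of a protein.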

What is an in-frame mutation?

An in-frame mutation is an insertion or deletion in a protein-coding region in which the number of nucleotides added or removed is a multiple of three.

This may add or remove one or several amino acids.

In-frame mutations are generally less damaging than premature stop or frameshift mutations.

What is a missense mutation?

A missense mutation changes a codon from one amino acid to another.

If a missense changes the type of the side chain of the amino acid (e.g. hydrophobic to polar, positive to negative) it is more likely to damage the function of the protein than missense mutations that do not.

For example, p.W244R indicates a tryptophan has become an arginine at codon 244 (which changes a hydrophobic side chain into a positively charged side chain). In some cases, this could be a loss of function mutation.

To be clear, even a mutation that does not change the side chain type (such as alanine to valine) can still be pathogenic, as it can alter the function of the protein, as in the case of prion disease.

If the effect of a missense mutation is unclear, simulating protein folding may provide insight into the pathogenicity based on its impact on structure.

What is a synonymous mutation?

A synonymous mutation changes the underlying nucleotides for a codon, but it does not change the amino acid.

For example, a mutation that changes TTA into TTG probably has no effect, since both codons insert the amino acid Leucine.
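For changes that swap one letter for another within a single codon, the category can be worked out by looking up the codon before and after the change. The Python sketch below is a minimal illustration with hypothetical names (`classify`, `CODON_TABLE`); it handles substitutions only, since frameshift and in-frame mutations depend on how many letters were inserted or deleted rather than on a single codon.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: DNA codon -> one-letter amino acid, "*" means stop.
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify(ref_codon: str, alt_codon: str) -> str:
    """Categorize a substitution by comparing what the two codons encode."""
    ref, alt = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref == alt:
        return "synonymous"
    if alt == "*":
        return "premature stop"
    if ref == "*":
        return "stop lost"  # the original stop codon now encodes an amino acid
    return "missense"

print(classify("TTA", "TTG"))  # prints synonymous
print(classify("AGA", "TGA"))  # prints premature stop
print(classify("TGG", "CGG"))  # prints missense
```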

What is protein folding?

Once a protein is synthesized as a chain of amino acids, that chain begins folding into a 3D structure determined by the side chains on its amino acids, by properties of its environment such as acidity (pH) and temperature, and by the presence of chaperone proteins.

As a principle, folding tends to minimize the number of hydrophobic (“water-fearing”) side chains on the exterior of the final form.

The final structure of the protein determines its function.

As an example, the Gila monster protein, after folding into its 3D shape, looks like:

Trp Cage Protein

Proteins that fail to fold correctly can lead to human disease.

What is primary protein structure?

The primary structure of a protein is the sequence of amino acids from which it is made.

What is secondary protein structure?

Secondary protein structure refers to commonly occurring local units of structure in proteins that span several amino acids. A beta sheet is an example of a commonly occurring secondary unit of structure.

What is a tertiary protein structure?

Tertiary protein structure refers to the 3D shape of a protein after folding is complete.

What is a quaternary protein structure?

Quaternary protein structure refers to the arrangements of multiple proteins in a complex.

What is a protein complex?

A protein complex is a multi-protein structure.

What is simulated protein folding?

Protein folding simulation is a technique in computational biology for predicting the folded structure of a protein.

While ab initio protein folding simulations can be computationally expensive and accuracy is problematic, a computed folding may offer evidence as to whether domains of function in a protein are still functional, or whether their binding affinity has been altered.

Programs such as Phyre2 attempt to predict the folding of a protein from sequence data, which may yield insight into the structure of a mutant protein.

What is crystallography?

Crystallography is a collection of methods for studying and determining the structure of a crystal.

By freezing proteins into crystals, crystallography (in particular, x-ray crystallography) can determine their 3D structure.

What is an antisense oligonucleotide?

An antisense [oligonucleotide] is a short sequence of RNA or DNA designed to bind to a particular target sequence and effectively nullify it.

For example, if the RNA sequence to be muted is AUAG, then the antisense oligonucleotide is UAUC.

Because antisense oligonucleotides bind to their complements, they can mute them during protein translation.
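The complement in the example above can be computed letter by letter. In the Python sketch below, the name `antisense` is chosen for illustration. One caveat I should flag: paired strands run in opposite directions, so when an oligonucleotide is written in the conventional reading direction, the same letters appear in reverse order.

```python
# RNA pairing rule: A pairs with U, and C pairs with G.
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}

def antisense(target_rna: str) -> str:
    """Return the letter that pairs with each letter of the target, in order."""
    return "".join(RNA_PAIRS[letter] for letter in target_rna)

print(antisense("AUAG"))        # prints UAUC (aligned position by position)
print(antisense("AUAG")[::-1])  # prints CUAU (the same oligo, read in its own direction)
```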

In the case of [knockdown] model organisms, these complementary fragments of RNA can silence an entire gene.

In the case of exon-skipping therapeutics, an antisense oligonucleotide for a particular exon (such as the one containing the harmful mutation) can cause the exon to be spliced out during protein construction.

What is a post-translational modification?

Proteins are often modified after translation by processes such as glycosylation, phosphorylation or methylation.

The sites at which proteins are modified are often determined by consensus sequences among amino acids in a protein.

There are a variety of tools for predicting post-translational modification sites.

For example, N-linked glycans are usually attached to proteins at an asparagine which is followed by any amino acid except proline followed by either a serine or threonine.

A mutation that alters the consensus sequence – for instance changing the asparagine to an aspartate – would likely prevent N-linked glycans from being attached, which may in turn alter the function or stability of the resulting protein.
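The consensus sequence for N-linked glycans translates directly into a text pattern. The Python sketch below scans a protein written in one-letter amino acid codes for candidate sites; the protein sequences are made up for the example, the function name is hypothetical, and a match only means the pattern is present, not that a glycan is actually attached in the cell.

```python
import re

# N, then any amino acid except P, then S or T.
# The lookahead lets overlapping candidate sites be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def n_glycosylation_sites(protein: str) -> list:
    """Return the positions (counting from 1) of asparagines that fit the pattern."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]

# A made-up protein: N at 2 fits (N-G-S), N at 6 does not (N-P-T), N at 9 fits (N-A-T).
print(n_glycosylation_sites("MNGSANPTNAT"))  # prints [2, 9]
# Changing the asparagine at position 2 to an aspartate (D) removes that site.
print(n_glycosylation_sites("MDGSANPTNAT"))  # prints [9]
```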

What functions do proteins have?

Proteins serve a variety of functions within a cell.

In a genetic disorder that involves a protein, the type of protein involved may influence therapeutic strategies.

Major types of proteins include:

  • enzymes;
  • receptors;
  • chaperones;
  • hormone/signaling proteins;
  • storage proteins;
  • motor proteins;
  • immune proteins;
  • protective proteins;
  • transporter proteins;
  • structural proteins; and
  • regulatory proteins.

Some proteins belong to more than one type.

What is a domain of function?

A domain of function is a region within a protein that is responsible for a specific function.

A given protein may have several domains which work together, domains that work independently or some combination thereof.

When analyzing mutations, it is useful to examine their potential impact on the known domains of function within that protein.

What is an enzyme? What is a metabolic pathway?

An enzyme is a protein that enables chemical reactions and molecular transformations.

Thus, an enzyme creates a metabolic pathway from the inputs of a reaction to its outputs.

A metabolic pathway is an enzyme-driven process for conducting chemical reactions and molecular transformations.

A pathway diagram for the enzyme lactase illustrates the pathway:

              ----> galactose
lactose ---
              ----> glucose

Some humans with mutations in the gene for lactase – LCT – cannot produce lactase, and as a result, they cannot digest lactose, a condition known as lactose intolerance.

(In fact, as humans age, they produce less lactase, resulting in increasing lactose intolerance as this pathway shuts down.)

Collectively, enzymes and the pathways they define determine the metabolism of an organism.

What is a substrate?

A substrate is a compound on which an enzyme acts.

For example, for the enzyme lactase, its substrate is lactose.

What is a transporter protein?

A membrane transporter protein is like an automatic door on the surface of the cell (or an intracellular compartment) that only opens for specific molecules.

Membrane transporter proteins are selectively permeable membrane-bound proteins that regulate the movements of molecules inside and outside of a cell and between intracellular compartments.

Ion channels are an example of a transporter protein.

What is an ion channel?

Ion channels are formed by membrane-bound transporter proteins that regulate the flow of ions into and out of a cell or intracellular compartment.

A defect in an ion channel protein can lead to a channelopathy.

What is a channelopathy?

Channelopathies are disorders caused by defects in ion channel proteins – a specific class of transporter protein – which regulate the flow of ions across a membrane.

What is a receptor?

Receptors are (generally) membrane-bound proteins that transmit signals across a membrane boundary, often from outside a cell to inside a cell.

When an agonist docks with its target receptor, it causes the release of a secondary messenger compound on the opposite side of the membrane.

Receptors can have a baseline activity level – known as constitutive activity – even in the absence of their agonist.

With receptors, antagonists play the role of inhibitors, blocking agonists from reaching the receptor and preventing the transmission of a signal.

Unlike enzymes, receptors may also be susceptible to inverse agonists, which lower their constitutive activity. (Enzymes have no background activity in the absence of a substrate.)

What is constitutive activity?

Constitutive activity is the baseline activity of a receptor in the absence of an agonist.

What is a secondary messenger?

A secondary messenger is the compound released by a receptor when stimulated by an agonist on the opposing side of a membrane.

A secondary messenger may trigger a cascade of reactions to the external stimulus.

What is an agonist?

An agonist is an agent that stimulates a receptor to release its secondary messenger.

What is an antagonist?

An antagonist is an agent that prevents an agonist from stimulating a receptor.

What is an inverse agonist?

An inverse agonist lowers the constitutive activity of a receptor.

What is a chaperone?

A chaperone protein aids in the folding of other proteins, or helps to maintain protein folding in the presence of stress (such as a higher temperature).

What is cytogenetic notation?

Cytogenetic notation is the notation used to describe regions within chromosomes.

The general format of the notation is:

<chromosome number> 'p' or 'q' <band number> . <sub-band number>

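For example, in 3p24.2, the 3 indicates chromosome 3, p indicates the short arm of the chromosome (q would indicate the long arm), 24 identifies the band (strictly, region 2, band 4, counting outward from the centromere), and .2 indicates the second sub-band within that band.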
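The format above can be sketched as a small parser. This is a minimal illustration that handles only the simple single-location form shown here; the names `CytoLocation` and `parse_cytogenetic` are hypothetical, and real cytogenetic reports use a much richer notation (ranges, deletions, duplications, translocations) that this sketch does not attempt to cover.

```python
import re
from typing import NamedTuple, Optional


class CytoLocation(NamedTuple):
    chromosome: str           # "1" through "22", "X" or "Y"
    arm: str                  # "p" (short arm) or "q" (long arm)
    band: str                 # band number, e.g. "24"
    sub_band: Optional[str]   # sub-band number, e.g. "2", or None if absent


# Assumption: only the simple form <chromosome><p|q><band>[.<sub-band>] is accepted.
_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+)(?:\.(\d+))?$")


def parse_cytogenetic(location: str) -> CytoLocation:
    """Split a simple cytogenetic location such as '3p24.2' into its parts."""
    match = _PATTERN.match(location.strip())
    if match is None:
        raise ValueError(f"not a simple cytogenetic location: {location!r}")
    chromosome, arm, band, sub_band = match.groups()
    return CytoLocation(chromosome, arm, band, sub_band)


if __name__ == "__main__":
    print(parse_cytogenetic("3p24.2"))
```

A location without a sub-band, such as Xq28, parses with `sub_band` set to `None`.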

Characteristic bands show up on chromosomes when they’re treated with trypsin and stained (typically with Giemsa stain), and these bands are used to identify regions.

In a condition with chromosomal abnormalities, it is important to determine from the cytogenetic notation which regions of the chromosome have been deleted or duplicated.

The UCSC genome browser can list all of the impacted genes in a region.

With the advent of sequencing, chromosomal abnormalities are now often reported in very specific ranges of base pairs.

What is somatic mosaicism?

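Somatic mosaicism is the technical term for what happens when a mutation arises after fertilization, during development, rather than being inherited through the germline; as a result, only some of the individual's cells carry the mutation.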

For example, if a mutation happens in a cell when a developing human being consists of roughly ten cells, then that mutation may be present in about 10% of the resulting cells.

Somatic mosaicism can lead to an individual suffering from a disorder incompletely, and it can also complicate diagnostic sequencing: if the tissue used for sequencing does not contain the pathogenic mutation, then sequencing will not find it.
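The arithmetic behind the ten-cell example can be sketched as follows. This is a deliberately simplified model, not a description of real embryology: it assumes that every cell present at the time of the mutation contributes equally to the final body, and that the mutation is heterozygous (present on one of two gene copies), so roughly half of the sequencing reads from an affected cell would show it. The function names are hypothetical.

```python
def mosaic_cell_fraction(cells_at_mutation: int) -> float:
    """Expected fraction of cells carrying a mutation that arose in a single
    cell when the embryo had `cells_at_mutation` cells.

    Assumption: all cells present at that time contribute equally to the body.
    """
    if cells_at_mutation < 1:
        raise ValueError("cells_at_mutation must be at least 1")
    return 1.0 / cells_at_mutation


def heterozygous_read_fraction(cell_fraction: float) -> float:
    """Expected fraction of sequencing reads showing the mutation.

    Assumption: the mutation is heterozygous, so each affected cell carries it
    on one of its two copies of the gene.
    """
    return cell_fraction / 2.0


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        cells = mosaic_cell_fraction(n)
        reads = heterozygous_read_fraction(cells)
        print(f"{n:>3} cells at mutation: ~{cells:.0%} of cells, ~{reads:.1%} of reads")
```

The later the mutation arises, the smaller the affected fraction, which is one reason a mosaic mutation can be easy to miss in sequencing data even when the sampled tissue does contain it.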

What is a mechanism of harm?

A mechanism of harm is the process by which a disease causes harm.

The primary mechanism of harm is the root cause of the disorder, which, in the case of genetic disorders, is a mutation or group of mutations.

A downstream mechanism of harm is a later link in the chain of causes.

For example, in cystic fibrosis, according to Ratjen (2009), the initial loss of function in the CFTR gene leads to:

  • defective chloride and thiocyanate ion transport across cell membranes;

  • which leads to loss of surface liquid in the airway;

  • which leads to destabilization of cilia and loss of mucociliary transport;

  • which leads to retention of phlegm;

  • which leads to infection;

  • which leads to inflammation;

  • which in turn aggravates the retention of phlegm.

It may be possible to devise therapeutic strategies that intervene at any link in the chain between mechanisms of harm.
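The chain above can be written down as a simple data structure, which makes it easy to ask what lies downstream of any given link. This is only an illustrative sketch of the cystic fibrosis example; the names `CF_CHAIN`, `FEEDBACK` and `downstream_of` are hypothetical, and a real disorder may have branching mechanisms that a single list cannot represent.

```python
# Ordered chain of mechanisms of harm for the cystic fibrosis example.
# The first entry is the primary mechanism; the rest are downstream mechanisms.
CF_CHAIN = [
    "loss of function in the CFTR gene",
    "defective chloride and thiocyanate ion transport",
    "loss of surface liquid in the airway",
    "destabilization of cilia and loss of mucociliary transport",
    "retention of phlegm",
    "infection",
    "inflammation",
]

# Feedback link from the example: inflammation aggravates retention of phlegm.
FEEDBACK = {"inflammation": "retention of phlegm"}


def downstream_of(link, chain=CF_CHAIN):
    """Return the mechanisms that follow `link` in the chain.

    A therapy that intervenes at `link` aims to prevent everything returned here.
    """
    position = chain.index(link)  # raises ValueError for an unknown link
    return chain[position + 1:]


if __name__ == "__main__":
    for step in downstream_of("retention of phlegm"):
        print("downstream:", step)
```

Listing the links explicitly also makes the feedback loop visible: an intervention against inflammation addresses a late link, but it may also relieve an earlier one.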

What is a model organism?

A model organism is an organism that has been genetically modified or bred to exhibit a specific phenotype or the analog of a human disorder.

Common model organisms include:

  • E. coli bacteria;
  • yeast (often Saccharomyces cerevisiae);
  • worms (often Caenorhabditis elegans);
  • fruit flies (often Drosophila melanogaster);
  • zebrafish (Danio rerio);
  • mice; and
  • rats;

and there are dozens of other model organisms also used in research.

In the context of genetic disorders, model organisms may have human mutations introduced (a knock-in), or an entire gene removed (a knockout).

Creating model organisms is useful in many stages of disease research, from discovery and diagnosis to the development of therapeutics.

What is a knockout organism?

A knockout organism is one in which a gene has been removed.

Knockout organisms are useful for studying the role of a particular gene.

Knockout organisms are also useful for studying recessive disorders in which the primary mechanism is total loss of function in a gene.

What is a knockdown organism?

A knockdown organism is one in which the expression of a target gene has been reduced.

By introducing interfering RNA for a specific gene into cells, it is possible to dial down (and even eliminate) the expression of a target gene.

When the interfering RNA is introduced exogenously, the interfering RNA is eventually depleted and gene activity is restored.

It is also possible to construct permanent knockdowns, in which gene expression is dialed down through genetic modification.

Because the amount of interfering RNA can be varied, it is possible to study the effect of differing levels of expression of a gene.

What is a knock-in organism?

A knock-in organism is one in which a specific gene has been introduced.

In organisms that contain the equivalent of a human gene, the human version can be inserted to test its functional equivalence and conservation.

In addition, a disease-causing version of a gene from a human patient can be knocked in to create a more faithful model organism in which to study a disorder.

Some therapeutic strategies can only be tested on knock-in models. For example, over-expression of a mutant allele only works if there is a mutant allele to upregulate.

What is a fibroblast?

In the context of human disease, a fibroblast cell line is a cell line usually created from the skin of patients (and sometimes family members).

What is a stem cell? What is an iPS cell?

A stem cell is an immature cell that can be differentiated into many different cell types, e.g., skin, cardiac tissue, neurons.

Embryonic stem cells are the most differentiable.

Induced pluripotent stem cells (iPS cells) are mature adult cells that have been reprogrammed to behave like early stem cells.

What are the advantages of iPS cell lines?

For genetic disorders, iPS cells make it possible to obtain tissue types which may otherwise be difficult or impossible to extract directly from patients.

What is a biomarker?

A biomarker is an observable indicator of a disease state, often used in clinical trials to measure the effectiveness of a therapy.

An example of a biomarker is the presence of oligosaccharides in urine; this is a biomarker for many lysosomal storage diseases.

As another example, low protein in the cerebrospinal fluid (CSF) is a biomarker for NGLY1 deficiency.

What is an assay?

In biochemistry, an assay is a technique for measuring a property of interest in a sample.

An assay is defined by both a process and a collection of material necessary to execute the process.

Assays are often run by hand, but in some cases, they may be automated for high-throughput screening.

From the perspective of practicing precision medicine, it is critical to note that developing an assay may require an act of scientific discovery, which in turn hinges on the creative process underpinning scientific progress.

For example, an assay that targets the primary mechanism of harm in an enzyme deficiency disorder will (somehow) measure the activity of the enzyme.

Running the assay on patient cells should show no activity for the enzyme, while running the assay on control cells should show activity for the enzyme.

In this example, the assay is useful for validating compounds predicted to restore the activity of the missing enzyme.

For instance, if patient cells are treated with the compound and then run through the assay, and the assay shows activity, then the compound may be a candidate for drug development.
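The logic of this example can be sketched in a few lines. Everything here is hypothetical: the function names, the compound names, the readings and the 20% threshold are made up for illustration, and a real assay would need replicates, statistics and controls designed by the scientist running it. The sketch rescales each treated reading so that untreated patient cells sit at 0% and control cells sit at 100%.

```python
from statistics import mean


def percent_of_control(treated, patient_untreated, control):
    """Rescale treated-cell activity: 0% = untreated patient cells, 100% = control cells."""
    baseline = mean(patient_untreated)
    window = mean(control) - baseline
    if window <= 0:
        raise ValueError("assay does not separate control cells from patient cells")
    return 100.0 * (mean(treated) - baseline) / window


def find_candidates(readings_by_compound, patient_untreated, control, threshold=20.0):
    """Return compounds whose rescaled activity meets the (hypothetical) threshold."""
    hits = {}
    for compound, readings in readings_by_compound.items():
        score = percent_of_control(readings, patient_untreated, control)
        if score >= threshold:
            hits[compound] = score
    return hits


if __name__ == "__main__":
    # Made-up readings in arbitrary activity units.
    control_cells = [100, 96, 104]
    patient_cells = [2, 1, 3]
    treated_cells = {
        "compound_A": [51, 49, 53],
        "compound_B": [2, 3, 1],
    }
    for name, score in find_candidates(treated_cells, patient_cells, control_cells).items():
        print(f"{name}: {score:.0f}% of control activity")
```

The check on `window` reflects the point made above: the assay is only useful if it shows a clear difference between patient cells and control cells in the first place.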

What is a molecular therapy?

A molecular therapy is one that targets the molecular basis for a disorder.

In a genetic disorder, a molecular therapy can attempt to correct a mutation itself at a genetic level (as in gene-editing or readthrough therapeutics), or it can attempt to compensate at the protein level (as in mutant protein stabilization), or it can intervene at a downstream mechanism of harm with a molecular basis.

What is gene expression?

Gene expression refers to the presence and quantity of the gene product associated with a gene, and to the act of creating the gene product.

Increasing gene expression for a protein-coding gene implies increasing the amount of the protein encoded by that gene.

In some contexts, increasing gene expression refers to increasing the RNA transcription for the gene (and assuming a corresponding increase in the protein).

What is gene regulation?

Gene regulation refers to the processes and entities that manage the expression of genes.

Even though every cell (except for sperm and eggs) contains every gene for an individual, not every cell expresses all of the gene products associated with every gene, and even within a cell, the proteins being expressed depend on the environment and state of the cell.

What is a gene regulatory network?

Genes exert regulatory effects on each other: the expression of one gene may increase or decrease the expression of another gene.

RNA sequencing, the basis for transcriptomics, examines a snapshot of the RNA content in a cell, from which regulatory relationships can be inferred.

A gene regulatory network expresses the regulatory dependence relationships between genes.
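One common way to picture such a network is as a directed graph whose edges are labeled as activating or repressing. The sketch below is a minimal illustration under that assumption; the class, its methods and the gene names (geneA, geneB, geneC) are hypothetical, and real networks inferred from RNA sequencing data carry uncertainty and strengths that a simple +1/-1 label does not capture.

```python
from collections import defaultdict


class GeneRegulatoryNetwork:
    """Directed graph: regulator -> target, labeled +1 (increases) or -1 (decreases)."""

    def __init__(self):
        self._edges = defaultdict(dict)

    def add_regulation(self, regulator, target, effect):
        if effect not in (1, -1):
            raise ValueError("effect must be +1 (increases) or -1 (decreases)")
        self._edges[regulator][target] = effect

    def targets(self, regulator):
        """Genes directly regulated by `regulator`, with the sign of each effect."""
        return dict(self._edges.get(regulator, {}))

    def path_effect(self, path):
        """Sign of the indirect effect along a chain of genes (product of edge signs)."""
        effect = 1
        for regulator, target in zip(path, path[1:]):
            effect *= self._edges[regulator][target]  # KeyError if the edge is absent
        return effect


if __name__ == "__main__":
    network = GeneRegulatoryNetwork()
    network.add_regulation("geneA", "geneB", +1)  # geneA increases expression of geneB
    network.add_regulation("geneB", "geneC", -1)  # geneB decreases expression of geneC
    print(network.targets("geneA"))
    print(network.path_effect(["geneA", "geneB", "geneC"]))
```

In this toy network, raising the expression of geneA would be expected to lower the expression of geneC indirectly, because the activating and repressing effects multiply along the path.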

What is an antibody?

Within the immune system, an antibody is a kind of protein that is adapted to recognize many different (presumably infectious) agents.

Antibodies have a Y-like structure, and the tips contain a region that can be varied significantly to recognize a wide variety of molecules.

Once an antibody recognizes its target, it marks the target for destruction (or may actively disable the target on its own).

Some therapeutics are also based on developing custom antibodies to recognize harmful agents.

In laboratory science, special antibodies are often developed to detect the presence and quantity of particular agents.

What is binding affinity?

With respect to a given structure such as DNA or a domain on a protein, the binding affinity of another molecule is the strength with which that second molecule binds to the structure.

Drug design often depends on optimizing binding affinity.

How do I find the right scientific expert?

One of the most challenging aspects of precision medicine from the patient side is the identification of qualified and appropriate scientific expertise.

The Weber Lab at Harvard has created search engines for expertise.

In standard medicine, patients expect to be able to work with a single physician, but in precision medicine, each stage may require a different physician or scientist.

Evaluating the quality and fit of a potential scientific partner may also be challenging for non-academics.

Questions to answer when considering the fit of a scientist include:

  • Does the scientist have publications that relate to the genes, models or mechanisms of interest?

  • Does the scientist have research funding already aligned with genes, models or mechanisms of interest?

If funding a scientist, it is important to request a plan that explains and justifies both the basic and translational scope of research efforts.
