Before you read any further, please realize that this document has been constructed by the father of a patient with an ultra-rare genetic disorder – not a licensed medical professional.
It is a distillation of three years of experience after my son was diagnosed as the first patient with his particular genetic disorder.
As a result, it is critical to validate any information in this article with a trained medical professional and a scientific team with domain expertise.
If the process described herein leads you to make predictions about potential therapies, do not attempt them. Consult your health care provider.
For patients and their advocates, my hope is that this guide will serve as a launching point for questions and discussions with professionals, and that it will save individuals on similar journeys significant time.
I spoke at Harvard Medical School about the thought process that ultimately led to this article, and the talk is available online:
(There is significantly more detail in this article than in the talk.)
A word on audience
The first part – the guide – is written for an audience with a basic grasp of modern biology.
To make the guide more accessible, I have sometimes used terms more intuitive to a lay audience (such as mutation) at first instead of the more precise term often used by professionals (such as variant or allele).
Most patients and advocates (myself included) don’t come equipped with a grasp of modern biology, so the second part of this document is a Q&A, and the guide has links into the Q&A to explain technical terms as necessary.
Moreover, you don’t have to read the guide in its entirety.
You can trace out the steps relevant to you.
It may be beneficial for newcomers to scan the entire Q&A first, and then circle back to the guide.
If you’re already trained in a technical field, then I recommend Quickstart Molecular Biology to get up to speed rapidly:
If you have a technical background, you can digest this book in about a day.
The guide contains the following high-level steps:
Step 1: Sequencing and diagnosis
For these conditions, sequencing has the advantage of being able to look at many genes simultaneously.
For many patients on diagnostic odysseys, it is the first step toward taking therapeutic action.
Sequencing often finds genetic changes, or mutations, of interest. (Geneticists often use the closely related word variant to describe a specific genetic change or mutation.)
After sequencing, it is important to interpret the meaning of the mutations discovered.
Interpreting the meaning of mutations
In fact, most mutations are benign on their own – or not disease-causing if only one allele is affected – so it is important to consider combinations of mutations as well.
To be clear, while there are some principles that can aid in interpreting mutations, it is not a standardized process (nor is it ever likely to become one, unless we find a way to standardize science itself).
Achieving consistent interpretations of a mutation or mutations is a significant challenge at present.
The ACMG variant analysis guidelines provide a baseline set of techniques and resources to use during the interpretation of variants.
Several considerations may aid the process of interpretation:
- Finding additional patients.
- Assessing segregation and inheritance patterns.
- Analyzing frequency in the population.
- Analyzing conservation.
- Exploring the type of mutation and functional predictions.
- Conducting functional studies in a lab.
Aiding interpretation: Finding additional patients
Less conventional resources, such as Google and Wikipedia, should also be searched in case clinicians, researchers or even patients themselves have posted relevant information online about a gene or variant.
When searching less structured resources, it is important to consider all names for a gene (in both humans and other organisms).
For example, the equivalent of the gene NGLY1 is called PNGase in other organisms (and more recently is sometimes erroneously called CDGIV or CDG1V).
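When searching unstructured resources, one simple way to make sure no name is forgotten is to build a single query from every known alias. The sketch below is only an illustration: the function name `build_alias_query` is hypothetical, and it assumes the search tool accepts quoted terms joined by `OR`, which should be checked for each resource.

```python
def build_alias_query(aliases):
    """Join every known name for a gene into one quoted OR-query string."""
    seen = set()
    unique = []
    for name in aliases:
        cleaned = name.strip()
        key = cleaned.lower()
        # Skip blanks and case-insensitive duplicates, keeping the first spelling seen.
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return " OR ".join(f'"{name}"' for name in unique)


print(build_alias_query(["NGLY1", "PNGase", "ngly1"]))  # "NGLY1" OR "PNGase"
```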
Aiding interpretation: Assessing segregation and inheritance patterns
If more than one relative is impacted, combining genotypic and phenotypic data from multiple relatives – both healthy and affected – can aid in determining the pathogenicity of a mutation and the pattern of inheritance of a condition.
If a condition is dominant, then relatives with a causative pathogenic allele should present with the condition.
If a condition is autosomal recessive, then only relatives with two causative pathogenic alleles should present with the condition.
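The two rules above can be sketched as a small consistency check over a family. This is a minimal illustration, not a clinical tool: the function name and data layout are hypothetical, and it assumes an autosomal gene with full penetrance, which real families may violate.

```python
def consistent_with_model(relatives, model):
    """Check whether every relative's status fits a simple inheritance model.

    relatives: list of (pathogenic_allele_count, affected) tuples.
    model: "dominant" or "recessive".
    Assumes an autosomal gene and full penetrance (an illustrative simplification).
    """
    # Number of pathogenic alleles at which the condition is expected to appear.
    needed = {"dominant": 1, "recessive": 2}[model]
    return all((count >= needed) == affected for count, affected in relatives)


family = [
    (2, True),   # patient: two pathogenic alleles, affected
    (1, False),  # mother: one pathogenic allele, healthy
    (1, False),  # father: one pathogenic allele, healthy
    (0, False),  # sibling: no pathogenic alleles, healthy
]
print(consistent_with_model(family, "recessive"))  # True
print(consistent_with_model(family, "dominant"))   # False
```

A result of False does not rule a mutation out; it only flags that the simple model does not explain the family as recorded.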
Reasoning backward, potential patterns of inheritance also provide grounds for additional scrutiny. For example:
Because of their rarity, cases where both alleles of an autosomal gene harbor an apparent loss of function mutation (either in compound heterozygous form or homozygous form) should raise suspicion of a possible recessive disorder.
If a mutation is de novo, then it warrants greater scrutiny.
Aiding interpretation: Population frequency
The frequency of an allele in a genomic population database such as ExAC also hints at pathogenicity: pathogenic alleles tend to be rarer.
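As a rough sketch of how a frequency filter might work, the snippet below keeps only variants at or below a cutoff. The cutoff of 0.001, the variant labels and the function name are all hypothetical; an appropriate threshold depends on the disorder and its inheritance pattern, and the frequencies would come from a database export rather than being typed in by hand.

```python
def rare_variants(frequencies, max_frequency=0.001):
    """Return variant labels whose population allele frequency is at or below the cutoff.

    frequencies: dict mapping a variant label to its allele frequency,
    or to None when the variant is absent from the database.
    """
    rare = []
    for variant, freq in frequencies.items():
        # Absence from the database is treated as "rare" in this sketch.
        if freq is None or freq <= max_frequency:
            rare.append(variant)
    return rare


example = {"variant_A": 0.12, "variant_B": 0.00002, "variant_C": None}
print(rare_variants(example))  # ['variant_B', 'variant_C']
```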
Aiding interpretation: Analyzing conservation
Evolutionary conservation, examined using tools such as the UCSC genome browser, can also be supporting evidence of pathogenicity.
For example, if a particular amino acid remains the same in versions of the gene across species, it is an indication that natural selection is protecting against change in that amino acid, and changes to such an amino acid are more likely to be pathogenic.
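To make the idea of a conserved amino acid concrete, the sketch below scores one column of a protein alignment by the fraction of species sharing the most common residue. The sequences are made-up fragments, the function name is hypothetical, and real conservation scores (such as those shown in genome browsers) use more sophisticated models.

```python
from collections import Counter


def column_conservation(aligned_sequences, position):
    """Fraction of sequences sharing the most common residue at a 0-based alignment column."""
    column = [seq[position] for seq in aligned_sequences]
    most_common_count = Counter(column).most_common(1)[0][1]
    return most_common_count / len(column)


# Toy, made-up aligned fragments standing in for the same protein in four species.
alignment = [
    "MKCLR",
    "MKCIR",
    "MRCLR",
    "MKCVR",
]
print(column_conservation(alignment, 2))  # 1.0 (every species has C here)
print(column_conservation(alignment, 3))  # 0.5 (this position varies)
```

Under this toy measure, a mutation at the fully conserved position would deserve more suspicion than one at the variable position.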
Looking at genes that co-evolve with a gene of interest with ERC analysis may also yield clues as to the functional role of the gene.
Aiding interpretation: Predicting functional impact
Tools such as PolyPhen2, MutationTaster, SIFT, and the Ensembl variant predictor will attempt to predict the effect of mutations for proteins, although a geneticist or genetic counselor needs to audit the results.
Related genes should be studied as well for known or suspected pathogenicity. The STRING database and Genemania report potential interactions between proteins that could suggest additional hypotheses and analyses.
The type of a mutation is also important in predicting impact on function:
Given the location and type of the mutation, expertise in the structure of the protein or the gene may yield insight into its potential impact.
Example: Mutations in a functional domain
Mutations within a domain of function should be viewed with increased suspicion, as these have a greater chance of disrupting the activity of the protein.
Consult the NCBI protein database for known domains of a protein.
Computer modeling may be able to predict alteration of binding affinity.
Example: Mutations impacting post-translational modification
There are a variety of tools for predicting post-translational modification sites.
For example, the loss of a phosphorylation site that is used to inhibit activity could lead to gain of function.
Example: Mutations impacting post-primary structure
A mutation (especially a missense mutation or an in-frame mutation) that could potentially modify the secondary, tertiary or quaternary structure of a protein (or resulting complex) warrants scrutiny for pathogenicity.
For example, two cysteines distant from one another in the sequence can form disulfide bonds within a protein as it folds. Changing either cysteine into a different amino acid breaks the ability to form the disulfide bond.
Example: Mutations that impact splicing
The tool ESE2 can help identify splicing errors.
Example: Mutations in non-coding regions
Mutations outside of the exome can be difficult to interpret.
Aiding interpretation: Functional studies
Functional studies of mutations in a laboratory may help determine pathogenicity.
Functional studies may involve analysis of cell lines or even the construction of a model organism, such as a mouse, fly or worm.
The model organism will be genetically modified to have a mutation equivalent to the one under suspicion.
For interpretation, the Monarch Initiative can compare phenotypes across species.
If studying cell lines with an assay related to activity of the gene reveals that the mutation has caused gain of function for the gene involved, then the mutation should draw additional scrutiny.
Interpreting chromosomal abnormalities
High-resolution karyotyping and next-generation sequencing can also detect chromosomal abnormalities.
Chromosomal abnormalities usually impact many genes simultaneously, leading to duplicated copies of many genes or deleted copies of many genes (or both).
The next step is to determine which genes have been impacted, and then to attempt molecular therapeutics on a gene-by-gene basis.
Using the UCSC genome browser, one can look up the genes found in the affected regions.
A diagnostic report should indicate the specific abnormality:
In a chromosomal deletion, the set of genes in the fragment have been deleted.
In a chromosomal duplication, the set of genes in the fragment have been duplicated.
In a chromosomal translocation, fragments of two chromosomes have swapped, which may be balanced (indicating no genes lost or duplicated) or unbalanced (indicating possible duplicated and lost genes).
A diagnostic report should also include the regions impacted in cytogenetic notation.
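Once the affected region is known in genomic coordinates, listing the impacted genes amounts to an interval-overlap lookup. The sketch below shows that logic on made-up data: the gene names, coordinates and function name are hypothetical, and it assumes the cytogenetic notation has already been converted to start and end positions on the same genome build as the gene annotations.

```python
def genes_in_region(genes, chrom, start, end):
    """Return names of genes overlapping [start, end] on the given chromosome.

    genes: list of (name, chromosome, gene_start, gene_end) tuples.
    """
    hits = []
    for name, gene_chrom, gene_start, gene_end in genes:
        # Two intervals overlap when each one starts before the other ends.
        if gene_chrom == chrom and gene_start <= end and start <= gene_end:
            hits.append(name)
    return hits


# Hypothetical gene names and coordinates, for illustration only.
annotations = [
    ("GENE_A", "chr1", 1_000, 5_000),
    ("GENE_B", "chr1", 8_000, 12_000),
    ("GENE_C", "chr2", 2_000, 4_000),
]
print(genes_in_region(annotations, "chr1", 4_500, 9_000))  # ['GENE_A', 'GENE_B']
```

Note that a gene only partially covered by a deletion or duplication still appears in the result, which is usually what is wanted when deciding which genes need a closer look.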
If a causal mutation is identified, the next steps are:
If plausible but not definitive candidates emerge among the mutations, then the next steps are to determine which approaches were used to analyze pathogenicity and to attempt those that were not, including techniques less common in a clinical setting, such as:
If no plausible candidates are identified, then the next steps are:
While unconventional, crowd-sourcing the interpretation of a mutation over social media (such as in a blog post) may yield insight.
Step 2: Finding other patients
When encountering a possible new patient, it is important to realize that matching on the same gene in a diagnostic report does not automatically imply that he or she has the same disease.
For example, if a disorder is recessive, it is important to ensure that the second patient is not simply a carrier.
If a second case confirms the cause of the disorder, then the next step is molecular therapeutics.
As a patient community begins to grow, there are important next steps to be taken in parallel, including:
conducting natural history studies of existing patients;
creating non-profit foundations to fund research; and
establishing a patient registry.
Step 3: Preparing for therapeutic development
In order to explore therapeutic strategies for a genetic condition, it is critical to understand (1) the type and location of the mutation; (2) the type and function of the affected protein (or non-coding RNA); and (3) the primary and downstream mechanisms of harm.
In parallel with seeking therapeutics, there are additional tasks that will either accelerate the search or become necessary once drug targets are identified:
Most of these tasks will require identifying an expert.
Phenotyping via natural history studies
A natural history study for a cohort of patients is a scientific study that observes how phenotypes evolve for each patient individually and collectively over time.
Regulatory agencies such as the FDA insist on having strong, longitudinal natural history data for clinical trials.
Natural history studies are critical for:
- being able to predict the progression of a disease;
- associating genotype with phenotype;
- identifying the core features of a disorder; and
- uncovering biomarkers.
In any given patient, some features of a disease may be specific to that patient due to interactions with environment and with other genes.
These are not part of the core features of the disorder, and are ancillary to the primary mechanism of harm.
Treatment strategies directed at the core features of a disorder are more likely to bring broad relief, while identification of ancillary genes that modify the phenotype may provide therapeutic insights.
Investigational studies into mechanism of harm
Determining the pathogenicity of a mutation often does not involve causally linking the underlying gene or variant to all of the high-level symptoms of the patient. (For instance, when a second case is identified to confirm the cause, there may be no knowledge as to why it is causal.)
It is difficult to treat a disease if the chain of events (starting with a mutation) that causes harm to the patient is not understood.
The chain of causality between a genetic defect and a high-level symptom can be lengthy, but it is critical to uncover the full chain.
In some cases, targeting downstream mechanisms of harm is easier than targeting the initial cause, with the caveat that focusing on downstream mechanisms may bring less general relief.
Investigational studies can proceed bottom-up from the cell biology level, or they can move top-down from the patients and model organisms (as in natural history studies). Ideally, these studies should move in both directions and generate hypotheses for the other to test.
For each mechanism of harm, investigational studies should aim to discover laboratory assays that can observe the hypothesized or confirmed mechanism.
The purpose of an assay is to measure a specific feature of a biological system.
Having a suite of assays that can probe different links of the mechanism of harm is critical for validating compounds predicted to have therapeutic benefit.
For example, a good assay might fluoresce under UV light in the presence of a compound that corrects some aspect of the mechanism of harm.
Assays are necessary for validating potential compounds.
They are also a prerequisite for conducting high-throughput screening.
There are roughly three kinds of assays, in order of increasing complexity:
cell-free assays isolate the key components of a cellular or chemical process;
cell-based assays operate on patient cells or on cells that model the disease; and
model-organism-based assays observe the phenotype of a model organism.
Lower-complexity assays tend to be more economical for conducting high-throughput screening, while higher-complexity assays tend to produce stronger candidates. (For instance, if a compound works on a model organism, at least some aspect of the delivery challenge in medicinal chemistry has been solved.)
A word of caution on assay development
Designing scientific experiments to measure a feature of interest is a fundamentally creative process, so assay discovery is subject to the same constraints that govern scientific progress itself.
Designing an assay will require finding an expert in the relevant biology.
While it is hard to systematize or automate the process of assay discovery, Recursion Pharmaceuticals has computer vision algorithms that attempt to discover cell-based assays when there are morphological changes to the cell as the result of a disorder.
Basic research into the mechanism of harm should also be aimed at identifying biomarkers: observable indicators of the disorder.
Biomarkers help measure the effectiveness of therapeutic approaches directly or indirectly, and are essential when conducting clinical trials.
Establishing cell lines
For basic investigational studies into the cell biology of a disorder, patient cell lines are invaluable.
The type of cell line necessary for studying a disorder varies with the disorder, but fibroblasts and lymphoblastoid lines are relatively common, if only for their durability.
Stem cell lines (usually induced pluripotent stem cells, or iPSCs, made from patients) can also be useful in studying a disorder, because they can be differentiated into other cell types.
As cell lines become established, it is important to conduct investigational studies that establish phenotypes for the different cell types.
For each disorder, it is important to create the cell lines with the strongest phenotype relative to the high-level phenotype of the patients. The strongest phenotypes are most useful for validation and screening.
For instance, if a disorder has strong liver involvement, then hepatocytes may provide a strong cellular representation of the disorder, while if it is neurological in nature, then neurons may provide a strong representation.
Establishing biobanks and reagent repositories
As a patient community grows, establishing biobanks with patient cell lines and repositories of reagents (such as antibodies) can accelerate research and make it easier to compare and reproduce results between research labs.
Patient communities will likely want to partner with an existing medical research institute such as a university, the NIH or a non-profit such as Sanford-Burnham-Prebys or Coriell for biobanking of cell lines and tissues.
Because patient tissue is extremely valuable, communities should also consider creating a procedure for donating patient bodies to aid in investigation of the disorder, should patients or parents choose to do so.
Creating model organisms
If a model organism recapitulates the phenotype of a patient, then it is evidence that the modeled genotype is pathogenic.
The choice of model organism in disease research is guided by the exhibition of a strong phenotype for the disorder and its correspondence with the human phenotype.
After the construction of a model organism, a critical next step is to characterize the phenotype of the model.
Characterizing the phenotype of a model organism may be significantly more labor than creating the initial organism itself, but it is a critical step, since it is difficult to test therapeutics without a robust phenotype with high statistical confidence.
Many academic laboratories and institutes such as The Jackson Laboratory can construct and phenotype model organisms.
In the context of human disease, there are three categories of model organisms commonly employed:
Finding modifier genes
For instance, in a loss of function disorder, there may be another gene that can compensate for the role of the lost activity. (In a metabolic disorder, this would be called an alternative pathway.)
If an assay that measures the activity of the deficient enzyme is available, then it is possible to conduct a genetic screen for alternate pathways.
If a compensatory gene is discovered, then the next step is to target it for increasing gene expression.
In some cases, disabling or decreasing expression of a second gene may actually suppress the phenotype of the disorder. Suppressor genes offer additional therapeutic targets for inhibition. A genetic screen can identify suppressor genes as well.
For determining which genes co-evolve together – which yields clues to function – one can use ERC analysis.
Crowd-sourcing and crowd-screening
In some cases, conducting precision medicine means conducting science.
Science is a process that depends on collaboration and creativity, so tapping the collective creativity and wisdom of the Internet can accelerate the process.
Crowd-sourcing variant interpretation allows experts on relevant genes to provide their insight on pathogenicity, and it opens up the possibility of finding a matching case.
Crowd-screening suggestions through social media for potential therapeutics allows experts to contribute rationally predicted therapeutics (in contrast to blind high-throughput screening approaches).
Biocuration enables bioinformatics techniques to mine the newly structured data for relationships between diseases, potential drugs and genes.
Of course, soliciting advice from social media requires filtering out advice without a plausible scientific basis, but it can be a powerful mechanism for generating leads.
Step 4: Identifying strategies for therapeutics
The set of strategies for molecular therapeutics depends upon the type of the mutation, the type of the protein and the maturity of understanding around the mechanism of harm. These strategies include:
- screening for candidate compounds;
- a general strategy for changes in degree;
- targeting a mutation;
- targeting a metabolic defect;
- targeting a transporter protein defect;
- targeting a receptor defect;
- targeting gene regulatory defects;
- targeting a structural protein defect;
- targeting a non-coding RNA defect;
- targeting a proteinopathy.
Screening for candidate compounds
Regardless of the underlying cause, much of modern drug development rests on screening compound libraries for an effect on the mechanism of harm.
As a result, it is difficult to engage in screening without first conducting investigational studies into mechanism of harm.
Once an assay for measuring a mechanism of harm has been discovered, the next step is to conduct high-throughput screening.
In addition to manual screening, virtual screening may be able to make therapeutic predictions without resorting to bench science.
If screening yields any hits, the next step is compound validation.
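What counts as a hit depends on the assay, but one common convention is to flag compounds whose reading sits far from untreated controls. The sketch below uses a z-score cutoff for that purpose. This is a swapped-in illustration rather than a method prescribed by this guide: the cutoff of 3, the compound labels, the readings and the function name are all hypothetical, and it assumes that a higher reading means a stronger corrective effect and that the control readings are not all identical.

```python
from statistics import mean, stdev


def call_hits(control_readings, compound_readings, z_cutoff=3.0):
    """Return compounds whose reading is at least z_cutoff control SDs above the control mean."""
    mu = mean(control_readings)
    sigma = stdev(control_readings)  # sample standard deviation of the controls
    hits = []
    for compound, reading in compound_readings.items():
        z = (reading - mu) / sigma
        if z >= z_cutoff:
            hits.append(compound)
    return hits


# Made-up assay readings for untreated control wells and three compounds.
controls = [98.0, 101.0, 100.0, 102.0, 99.0]
screen = {"compound_1": 100.5, "compound_2": 140.0, "compound_3": 97.0}
print(call_hits(controls, screen))  # ['compound_2']
```

Any compound flagged this way is only a candidate; it still has to go through compound validation.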
Targeting a mutation
Some therapeutic strategies focus directly on the mutation itself, without regard to the type or role of the corresponding protein:
While not generally feasible at present, gene therapeutics provide the possibility of correcting or compensating for a mutation at the DNA level.
If a disorder is caused by a premature stop mutation, then readthrough therapeutics are in scope.
Keeling et al. (2014) provide an overview of the readthrough therapeutics space.
When the mutation of interest is a premature stop, then testing readthrough compounds on cell cultures is a reasonable next step, and the following compounds may be useful in those tests:
G418 is a potent readthrough inducer that is useful in a laboratory setting as a measure of the potential of this approach, although it is too toxic to be used therapeutically.
Gentamicin (which has problematic side effects) is also known to induce readthrough in some cases.
Ataluren, which is purported to induce readthrough, is available in Europe.
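Whether a mutation actually creates a premature stop can be checked directly from the coding sequence. The sketch below scans a sequence codon by codon using the three standard stop codons. The sequences are short made-up examples and the function name is hypothetical; it assumes the input is the coding sequence in DNA letters, starting at the first codon and ending with the normal stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(coding_sequence):
    """Return the 1-based codon number of the first stop before the final codon, or None."""
    seq = coding_sequence.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # The last codon is expected to be the normal stop, so it is excluded.
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            return number
    return None


reference = "ATGCAGTGGAAATAA"  # ATG CAG TGG AAA TAA
mutant = "ATGCAGTGAAAATAA"     # TGG changed to TGA at codon 3
print(first_premature_stop(reference))  # None
print(first_premature_stop(mutant))     # 3
```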
If a damaging mutation occurs in an exon that is non-critical to the resulting protein, then exon-skipping may be a viable therapeutic approach.
Databases such as the Ensembl genome browser can provide the exons and introns for a gene.
In addition, Ensembl will also provide alternate transcripts that have been identified. If an alternate transcript exists that skips the exon containing the mutation, this is a positive (though not essential) indication that exon-skipping is a viable therapeutic approach.
To skip over a mutation-bearing exon, an antisense oligonucleotide sufficiently complementary to the mutation and surrounding nucleotides is created that can induce RNA splicing to skip the exon during construction of mRNA.
In theory, an antisense oligonucleotide sequence could be customized for any disorder in which it is reasonable to skip an exon.
Exon skipping is being actively pursued as a means to convert cases of Duchenne muscular dystrophy into the less severe Becker muscular dystrophy.
If an exon-skipping compound is identified or constructed, the next step is compound validation.
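Two small calculations come up when reasoning about exon-skipping, sketched below with hypothetical function names. The first derives the complementary (antisense) sequence for a target region. The second checks whether removing an exon keeps the downstream codons in frame, which is an assumption added here as one plausible reading of "non-critical": an exon whose length is not a multiple of three would shift the reading frame if skipped. Real antisense oligonucleotides use modified chemistries and are designed against pre-mRNA, so this is only the base-pairing logic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(target):
    """Reverse complement of a target sequence written 5'->3' in DNA letters."""
    return target.upper().translate(COMPLEMENT)[::-1]


def skip_preserves_frame(exon_length):
    """True when removing an exon of this length keeps downstream codons in frame."""
    return exon_length % 3 == 0


print(antisense("GATTACA"))       # TGTAATC
print(skip_preserves_frame(150))  # True
print(skip_preserves_frame(151))  # False
```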
A general strategy: Targeting changes in degree
Many (but not all) genetic disorders can be lumped into one of three categories according to their impact on the function of either the original gene or a process with which the gene interacts: total loss of function, partial loss of function or gain of function.
A total loss of function often results when:
both alleles for a gene in an autosomal recessive disorder are impacted by loss of function mutations; or
the only allele in an X-linked disorder is impacted by a loss of function mutation.
A partial loss of function often results when one allele in a haploinsufficient gene suffers a loss of function mutation.
A gain of function results when a mutation causes an increase in activity that disrupts regular functioning.
If a mechanism of harm is caused by a partial loss of function, then the next step is to consider therapeutic strategies for partial loss of function.
If a mechanism of harm is caused by a gain of function, then the next step is to consider therapeutic strategies for gain of function.
Therapeutic strategies for total loss of function
There are three high-level strategies for total loss of function:
- Restore the lost function.
- Compensate for the lost function.
- Suppress aggravating factors.
Restoring lost function
If a missing enzyme could be delivered to the correct part of the cell, then the next steps include enzyme synthesis and development of enzyme replacement therapy.
If a missing protein could be delivered from another tissue, then exploring genetically-motivated transplantation is a next step.
Compensating for lost function
In some cases, a second gene may have a degree of redundancy with a lost function.
Thus, a next step is investigational studies to look for compensating genes.
If total loss of function is leading to insufficiency or absence of a particular metabolite, then another next step is to explore a metabolic diet to deliver the metabolite.
If an insufficient or absent metabolite cannot be obtained through diet, then it should be considered a target for drug development via medicinal chemistry.
Suppressing aggravating factors
In a total loss of function, a genetic suppressor screen can identify secondary genes that worsen the condition.
Given the difficulty in restoring lost function, a genetic suppressor screen is a highly advisable strategy for developing therapeutics.
In terms of therapeutic strategies, each gene hit from a genetic suppressor screen should be treated as if the disorder were caused by a gain of function in that gene. That is, the aggravating gene should be targeted by inhibitors.
If total loss of function is leading to accumulation of a harmful metabolite, then another next step is to explore a metabolic diet to reduce consumption of the harmful metabolite or its precursors.
Therapeutic strategies for partial loss of function
If there is diminished – but not lost – activity for a gene, then the high-level strategy is to increase activity.
Mutations in haploinsufficient genes can lead to disorders with partial loss of function.
Because there is residual activity, there are strategies to be considered in addition to those for total loss of function:
If residual activity is insufficient for downstream processes, it may also be useful to consider increasing the inputs to the activity; that is, increasing the number of substrates or agonists in an attempt to resolve the insufficiencies.
Under either approach, the next step is to validate any candidates.
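To see why increasing inputs can help only up to a point, suppose purely for illustration that the affected activity is an enzyme following Michaelis-Menten kinetics. That model is an assumption introduced here, not something established for any particular disorder, and all of the numbers and names below are made up. Under that assumption, adding substrate raises the output of the residual enzyme, but never beyond its reduced maximum rate.

```python
def reaction_rate(substrate, v_max, k_m):
    """Michaelis-Menten rate: v = v_max * [S] / (K_m + [S])."""
    return v_max * substrate / (k_m + substrate)


# Hypothetical values: the mutation halves the maximum rate (v_max).
healthy = reaction_rate(substrate=10, v_max=100, k_m=10)
mutant_baseline = reaction_rate(substrate=10, v_max=50, k_m=10)
mutant_boosted = reaction_rate(substrate=30, v_max=50, k_m=10)

print(healthy)          # 50.0
print(mutant_baseline)  # 25.0
print(mutant_boosted)   # 37.5
```

In this toy case, tripling the substrate recovers part of the lost output but cannot exceed the mutant's ceiling of 50, which is one reason any such candidate still needs validation.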
Therapeutic strategies for gain of function
If a disorder is caused by a gain of function in a gene or is aggravated by activity in another gene, then the high-level strategy is to suppress activity in that gene or in its pathway.
A next step is to search for suppressors – inhibitors, blockers or antagonists – of the target gene. Google and PubMed may identify initial hits for such compounds.
The Guide to Pharmacology contains target-specific inhibitors for many genes.
In addition to searching for suppressors, a next step is to explore decreasing expression for a gene.
If upstream or downstream elements in the affected pathways are known, then applying the same gain of function strategies to each element in the pathway may serve to counteract a gain of function upstream or downstream.
If no inhibitor, blocker or antagonist is known, then structure-based drug development and virtual screening are potential next steps.
For any compounds that turn up, the next step is compound validation.
Targeting a metabolic pathway defect
In metabolic pathway defects, it may be possible to intervene upstream or downstream of the defect. Pathway databases such as BioCyc may help identify additional targets for intervention.
In any case, but especially in the event of a total loss of function, an additional next step is to explore a metabolic diet.
If a missing metabolite cannot be effectively consumed in the diet, then the missing metabolite is a target for drug delivery via medicinal chemistry.
Targeting a membrane transporter protein defect
A “loss of function” mutation in a membrane transporter protein means the protein is broken in some way. Since a membrane transporter protein acts like an automatic door, there are two ways that a door can suffer a “loss of function”:
a door that is stuck open lets in too much; while
a door that is stuck closed lets in too little.
When dealing with a defect in a membrane transporter, it is absolutely critical to know whether the defect is causing a gain in traffic or a loss in traffic.
As such, the general strategy for targeting a degree of change in function is in scope, except that the equivalent of an inhibitor is often called a blocker.
Cystic fibrosis, a total loss of function (in the sense of closure) in a chloride channel, is an example of a transporter protein defect with a recent track record of success in finding treatments.
Targeting a receptor defect
Receptors are more complex in their interactions than enzymes, because they have a baseline level of activity – constitutive activity – even in the absence of the ligand (agonist) which stimulates them.
With a partial loss of function, a next step is to look for agonists of the receptor.
Receptors can also be viewed as the starting points of the metabolic pathways that they kick off, so it may be easier to target a metabolic pathway behind the receptor than the receptor defect itself.
For any compound suggested by these strategies, the next step is compound validation.
Targeting gene regulatory defects
Apart from a genetic disorder’s primary mechanism of harm, dysregulation of other genes can account for harm in these disorders as well.
Most genes have a role to play in the regulation of other genes: increasing expression of one gene may increase or decrease the expression of another gene.
As a result, altering the expression of the protein impacted by a mutation can have downstream effects through gene regulatory networks.
In addition, some genes – such as those involved in chromatin modification or histone/DNA methylation – engage in regulation of other genes as their primary function.
To target the primary effects of a mutation in a regulatory gene and the downstream effects of other mutations, transcriptomics and proteomics can reveal the extent to which other genes have been dysregulated, and can suggest therapeutic strategies for restoring a baseline gene expression profile.
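As a rough picture of what a transcriptomic comparison produces, the sketch below computes a log2 fold change between patient and control expression for each gene and keeps the genes that differ by at least a chosen amount. The gene names, expression values, cutoff, pseudocount and function name are all hypothetical, and a real analysis would use replicates and statistical testing rather than a single pair of measurements.

```python
import math


def changed_genes(patient, control, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose expression differs strongly from control."""
    changes = {}
    # Only genes measured in both samples can be compared.
    for gene in sorted(patient.keys() & control.keys()):
        # The pseudocount avoids dividing by zero for unexpressed genes.
        ratio = (patient[gene] + pseudocount) / (control[gene] + pseudocount)
        log2_fc = math.log2(ratio)
        if abs(log2_fc) >= min_abs_log2_fc:
            changes[gene] = log2_fc
    return changes


# Made-up expression values for four hypothetical genes.
patient_expr = {"GENE_A": 7, "GENE_B": 15, "GENE_C": 10, "GENE_D": 63}
control_expr = {"GENE_A": 31, "GENE_B": 15, "GENE_C": 10, "GENE_D": 15}
print(changed_genes(patient_expr, control_expr))  # {'GENE_A': -2.0, 'GENE_D': 2.0}
```

A negative value marks a gene that is less expressed in the patient than in the control, and a positive value marks one that is more expressed, so restoring a baseline profile would mean pushing each toward zero.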
Targeting a structural protein defect
If a mechanism of harm disrupts a protein whose primary purpose is structural (as in dystrophin), then it is challenging to replace the structure.
The Duchenne Muscular Dystrophy community is an exemplar in developing strategies for tackling the absence of a structural protein.
Given the pharmacological challenges in therapeutically delivering a structural protein, strategies focusing on the mutation such as exon-skipping and readthrough are next steps, assuming the mutations are in scope.
Though difficult, investments in basic science for gene-editing may be advisable.
If it is suspected that the mutant protein would have some value, but protein quality control mechanisms are too aggressively degrading the protein, then a next step is to search for stabilizers for the mutant protein.
Targeting a non-coding RNA defect
If the primary cause of a disorder is an error (or a loss) of non-coding RNA – which is presumed to be rare relative to disorders caused by defects in proteins – then there are two additional high-level strategies:
delivering the missing non-coding RNA; and
editing the defects in the non-coding region.
RNA is straightforward to synthesize, but targeted delivery is a significant challenge in the application of medicinal chemistry.
However, targeted gene-editing has additional (likely more difficult) challenges.
Targeting a proteinopathy
If a mechanism of harm disturbs protein folding, then in some cases the new foldings (or the foldings to which the protein has become susceptible) are actively malignant.
In many cases, cells can detect misfolded and/or mutant proteins through quality control mechanisms, and degrade them.
In some diseases, improper protein folding causes toxic protein aggregation.
For instance, amyloid misfolding features in diseases such as prion disease, amyloidosis, Alzheimer’s, Huntington’s and Parkinson’s, as it allows aggregation of misfolded proteins.
In disorders in which misfolding is driving malignant behavior, several high-level strategies are in scope, including:
identifying compounds that can stabilize the misfolded protein;
creating monoclonal antibodies to target malignant proteins;
Heat shock proteins aid protein folding and are often naturally upregulated when a cell is stressed (although heat shock therapeutics have been challenging to develop due to toxicity).
Autophagy is the process by which cells digest defective or excessive components, and upregulation of this process may be beneficial in proteinopathies.
If protein aggregation specifically is problematic, then an additional strategy is to characterize sites on the protein that allow aggregates to form and to identify inhibitors that can interfere with these sites, with the aim of preventing aggregates from forming.
For any identified compound, the next step is compound validation.
Targeting the phenotype
At the level of a patient, phenotypic targeting is simply symptomatic treatment. For instance, if a symptom of the disorder is epilepsy, one can try known anticonvulsants.
If investigational studies into the mechanism of harm have found a biomarker in cells and a high-precision assay has been discovered to measure that biomarker, then a next step is high-throughput screening.
Step 5: Exploring specific therapies
Increasing gene expression
In a disorder that involves loss of function, if there exists a functioning copy of a gene that has identical or sufficiently similar function, then increasing expression of that gene may have a therapeutic effect.
Or, in disorders where a mutant protein with reduced function is leading to disease, increasing expression of the mutant protein may raise activity levels high enough to be therapeutic.
In either case, the next step is to predict upregulators for the RNA of the target gene.
Decreasing gene expression
In a gain of function disorder or a disorder for which a suppressor gene has been identified, reducing expression of a target gene may provide a therapeutic effect.
The next step is to predict downregulators for the RNA of the target gene.
Predicting compounds to modify gene expression
To predict which compounds may be able to upregulate or downregulate the expression of a target gene, one can consult the Connectivity Map (cMap) and LINCS cloud databases, which contain the results of experiments measuring the effect of a library of compounds on RNA expression for many genes.
In some cases, the target gene may not be present in the databases, but if the target gene is regulated by another gene in the database, then one can attempt to indirectly regulate the target gene.
Any hits produced through this approach should proceed to compound validation.
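As a rough illustration of the idea, the sketch below ranks compounds by how strongly they move a target gene in a small expression table. The table, the compound names, the gene names and the threshold are all invented for the example; this is not the cMap or LINCS interface, only the kind of lookup those databases make possible.

```python
# Toy expression-signature table: compound -> {gene: log2 fold change}.
# Every name and number here is hypothetical, for illustration only.
signatures = {
    "compound_A": {"GENE1": 1.8, "GENE2": -0.2},
    "compound_B": {"GENE1": -1.1, "GENE2": 0.9},
    "compound_C": {"GENE1": 0.3, "GENE2": 2.4},
    "compound_D": {"GENE2": -1.5},  # GENE1 was not measured for this compound
}


def rank_regulators(signatures, gene, direction="up", min_abs_change=1.0):
    """Return (compound, log2 fold change) pairs that move `gene` in the
    requested direction by at least `min_abs_change`, strongest first."""
    sign = 1.0 if direction == "up" else -1.0
    hits = []
    for compound, profile in signatures.items():
        if gene not in profile:
            continue  # no measurement for this gene, so no prediction
        change = profile[gene]
        if sign * change >= min_abs_change:
            hits.append((compound, change))
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


upregulators = rank_regulators(signatures, "GENE1", direction="up")
downregulators = rank_regulators(signatures, "GENE1", direction="down")
print(upregulators, downregulators)
```

A ranking like this only produces hypotheses; each predicted compound would still need to go through compound validation.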
Designing a metabolic diet
In the case of a lost metabolic pathway, in which inputs no longer convert to outputs, two mechanisms of harm should be expected:
(1) accumulation of inputs; and
(2) a deficiency in the outputs.
If there is no alternate pathway to metabolize the input, then (1) should be examined, and if no alternate pathway to synthesize the output exists, then (2) should be examined.
For example, in the disorder PKU, total loss of function in phenylalanine hydroxylase leads to an inability to convert the amino acid phenylalanine into tyrosine.
This suggests two strategies:
Limiting consumption of phenylalanine.
Increasing consumption of tyrosine.
In fact, strictly limiting consumption of phenylalanine in the diet is an effective treatment for the disorder, and tyrosine supplementation is beneficial as well.
As another example, patients with CDG Ib – a total loss of function in the gene MPI – lack an enzyme to interconvert mannose-6-phosphate and fructose-6-phosphate. Because this enzyme is the sole provider of mannose-6-phosphate, the loss of the enzyme results in a deficiency of mannose-6-phosphate, a critical precursor to a process called glycosylation.
Adding mannose supplementation to the diet is an effective treatment for the disorder.
A more common metabolic diet is the restriction of lactose-bearing dairy products in individuals with lactose intolerance, a result of insufficient or absent quantities of the enzyme lactase, which breaks down lactose into galactose and glucose for further digestion.
Finding stabilizers for mutant proteins
In the event that the mutant protein is predicted to have residual function, but quality control mechanisms within the cell (such as endoplasmic-reticulum associated degradation) are degrading the protein, the goal of stabilization is to find a molecule that interacts with the mutant protein to prevent degradation.
In general, finding stabilizers may require virtual screening and structure-based drug design.
In the specific case where mutant enzymes may retain activity if they could properly fold, but poor ability to fold leads to degradation of the mutant proteins, there is work showing that potent inhibitors of the mutant enzymes at low concentrations may be able to induce proper folding (Fan, 2003), thereby preventing their destruction and rescuing activity.
Enzyme replacement therapy
In disorders lacking an enzyme, enzyme replacement therapy, which replaces the missing enzyme, may be able to provide therapeutic relief.
There are substantial drug delivery challenges in enzyme replacement therapy, but these vary in difficulty depending on the tissues and intracellular compartments that need to be targeted.
The first step toward enzyme replacement therapy is being able to synthesize the enzyme in a biologically active form.
Therapeutic enzyme synthesis generally uses the transfection of Chinese Hamster Ovary (CHO) cells with DNA containing the gene that encodes the desired enzyme.
In properly tuned bioreactors, transfected CHO cells can generate large quantities of the target enzyme with post-translational modifications compatible with mammals.
In some cases, transplanting organs, tissue or cells that do not contain the underlying genetic defect can be therapeutic.
In particular, if a disorder results from a missing gene product and that gene product can be delivered from another tissue to other cells in the body, then organ and bone-marrow stem cell transplantation are in scope.
In a bone marrow transplant, the patient’s bone marrow is depleted and then a transfusion of donor stem cells is provided to regrow the bone marrow.
Moreover, because the donor stem cells don’t carry the mutation, as they differentiate in organs and tissues in the body, they will produce cells not affected by the disorder.
In theory, gene editing could be used on induced pluripotent stem cell lines for an autologous bone marrow transplant, although more basic research into accurate gene editing is required before this could be considered a realistic possibility.
Stem cell therapeutics
Stem cells have attracted attention for their regenerative therapeutic potential, and there are certainly disorders and injuries which stand to benefit from them.
Unfortunately, the scope for stem cells in treating genetic disorders is more limited.
In disorders for which genetically-motivated transplantation is in scope, it is conceivable that stem cell lines could be genetically modified to remove a mutation, and then transfused back into a patient.
Two further obstacles make autologous stem cell therapeutics challenging for genetic disorders for the near future:
transplantation with stem cells increases cancer risk; and
error rates in gene-editing further increase cancer risk.
Despite more limited prospects for treatment, stem cell lines are valuable in investigational studies because they can differentiate into different cell types, sparing the need to extract those tissues from patients.
Gene-editing and gene therapy
Gene editing and gene therapy are together a theoretical silver bullet for all genetic disorders, and with enough investment in basic scientific research, they will almost certainly one day become reality.
Conceptually, gene editing involves editing out defects in an organism’s genome by inserting, replacing or deleting elements in an organism’s genetic code.
Practically, there are three major high-level challenges with most approaches:
Delivering the gene-editing agents to every cell (or every cell of interest).
Ensuring that the editing error rate is low enough to avoid introducing additional mutations (and likely cancer in the process).
Managing the immune response to delivery vectors.
For delivering gene-editing agents, engineered viruses are the most popular platform.
New genetic material (such as a functioning version of a gene) can be delivered directly as a separate fragment of DNA called a plasmid.
Alternatively, genetic material can be integrated into the host genome via techniques like Zinc fingers, TALENs, CRISPR/Cas9 or meganucleases.
In the context of human disease, transcriptomics has the potential to identify genes disregulated as the consequence of the disease.
Care must be taken in interpreting results, as some disregulation could be a compensatory response to the defect. Some apparent “disregulation” could simply be background variation in the individual.
Transcriptomics requires conducting RNA sequencing on as many patients and close relatives as possible in order to increase statistical confidence and separate core disregulation from transcriptional artifacts.
Restoring baseline expression where disregulation was compensatory could be anti-therapeutic.
An advantage of a transcriptomics-driven approach to disease therapeutics is that it holds the potential of addressing a broad class of downstream mechanisms simultaneously, and it could be utilized even in the absence of a firm diagnosis, because RNA sequencing can capture a snapshot of the mechanism of harm as it passes through the transcriptome.
The disadvantage of a transcriptomics-driven approach is that it does not target the primary mechanism of harm.
While transcriptomics could be used on the level of an individual patient given sufficient samples, it is certainly more effective with RNA sequencing data available from the larger population, since this should allow it to identify core disregulation in a disorder.
RNA sequencing is not generally available in a clinical context, so this approach requires working with an academic partner.
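To make the idea of separating core disregulation from background variation more concrete, here is a minimal sketch. It assumes normalized expression values for a handful of patients and relatives (all numbers and gene names are invented) and flags only genes that shift in the same direction, by at least a chosen log2 fold change, in every patient. A real analysis would use proper statistical tooling and far more samples.

```python
from math import log2
from statistics import mean

# Hypothetical normalized expression values (arbitrary positive units).
patients = {
    "patient_1": {"GENE1": 2.0, "GENE2": 10.0, "GENE3": 40.0},
    "patient_2": {"GENE1": 2.5, "GENE2": 22.0, "GENE3": 44.0},
    "patient_3": {"GENE1": 1.5, "GENE2": 9.0, "GENE3": 36.0},
}
relatives = {
    "mother": {"GENE1": 8.0, "GENE2": 10.0, "GENE3": 10.0},
    "father": {"GENE1": 8.0, "GENE2": 12.0, "GENE3": 10.0},
}


def consistent_changes(patients, relatives, min_abs_log2fc=1.0):
    """Flag genes that are shifted in the same direction in every patient,
    relative to the mean expression among unaffected relatives."""
    profiles = list(patients.values()) + list(relatives.values())
    shared_genes = set.intersection(*(set(profile) for profile in profiles))
    calls = {}
    for gene in sorted(shared_genes):
        baseline = mean(profile[gene] for profile in relatives.values())
        changes = [log2(profile[gene] / baseline) for profile in patients.values()]
        if all(change >= min_abs_log2fc for change in changes):
            calls[gene] = "up"
        elif all(change <= -min_abs_log2fc for change in changes):
            calls[gene] = "down"
    return calls


calls = consistent_changes(patients, relatives)
print(calls)
```

In this toy data, GENE2 is elevated in only one patient, so it is treated as individual background variation rather than as part of the disorder. Even a consistent shift could still be compensatory, so the output is a list of questions, not of targets.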
Proteomics measures the protein types and quantities present in an organism across specific tissues, environments and times.
For human disease, proteomics can identify proteins disregulated as a consequence of the disease.
As with transcriptomics, disregulated proteins found in proteomics may also suggest regulatory strategies for therapeutics, with the caveat that some proteins may be disregulated as a compensatory response.
Step 6: Discovering drug candidates
Conducting high-throughput screening
High-throughput screening attempts to test thousands of drug candidate compounds simultaneously using robotics to automate the process.
High-throughput screening requires the selection of a compound library and the discovery of a high-precision assay that can recognize when a mechanism of harm has been mitigated.
Assays must be engineered to have a high signal-to-noise ratio to rule out excessive false positives in large screens.
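One statistic commonly used by screening groups to judge whether an assay separates signal from noise well enough is the Z'-factor, which compares the spread of positive and negative control wells with the distance between their means. The guide itself does not prescribe this metric; the sketch below, with invented well readings, is only meant to show what "high signal-to-noise" can look like as a number.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|.
    Values close to 1 indicate well-separated controls."""
    separation = abs(mean(positive_controls) - mean(negative_controls))
    spread = stdev(positive_controls) + stdev(negative_controls)
    return 1.0 - 3.0 * spread / separation


# Hypothetical plate readings (arbitrary units).
tight_assay = z_prime([100, 102, 98, 101, 99], [10, 12, 8, 11, 9])
noisy_assay = z_prime([100, 60, 140, 80, 120], [10, 50, -30, 30, -10])
print(round(tight_assay, 3), round(noisy_assay, 3))
```

By the usual convention, an assay with a Z'-factor above roughly 0.5 is considered suitable for a large screen, while a value near or below zero means the controls overlap too much to trust individual hits.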
For each compound that gets a hit, the next step is to validate the compound.
Structure-based drug design and virtual screening
Structure-based drug design and virtual screening are computational methods for designing and searching for drug candidates (which are generally inhibitors).
In structure-based drug design, the objective is to design a small molecule that is roughly opposite in structure and charge to a target domain on an enzyme.
Virtual screening scans compound libraries for potential ability to bind with and inhibit target domains on proteins.
Because of the approximative nature of computational methods, predicted candidates from these methods should proceed to compound validation.
High-fidelity virtual screening would often require intractable simulations with molecular dynamics, so docking simulations may be used in lieu of full physical simulation.
There is software available for conducting these simulations:
- PyRx can screen a protein against possible inhibitors.
- ZINC is a database of structures for commercially available compounds.
- FAF-Drugs3 is a filtering package to predict pharmacokinetics.
Conducting model organism screening
Once a model organism has been created for a disorder and its phenotype has been robustly characterized, then the organism may be used as a platform for screening potentially therapeutic compounds.
For some model organisms, it is possible to employ automation to conduct phenotyping, which may enable large numbers of compounds to be tested.
Conducting genetic suppressor screening
A compound-based screen on model organisms can identify compounds that impact the phenotype, but a suppressor screen can identify genes that modify the phenotype, and these genes may be useful as the basis for investigating therapeutics.
In a suppressor screen, mutagenic agents are introduced into a large population of model organisms.
If any of the resulting double mutant organisms show improvement in their phenotype, then the mutant can be sequenced to determine which gene was modified.
If knocking down a second gene suppresses the phenotype in a screen, then the next step is to apply therapeutic strategies as if the second gene were causing disease through gain of function, such as developing an inhibitor.
If increasing the activity of a second gene suppresses the phenotype in a screen, then a next step is to explore therapeutic RNA upregulation in this second gene.
Validating a candidate compound
If a screen produces a hit or a compound is hypothesized to be therapeutic, the critical next step is to validate the compound in a laboratory setting.
For example, if a read-through compound is predicted to increase the expression of the wild-type protein, an antibody for the protein should be able to detect its presence.
If validation with cells succeeds or validation with cells is not possible, the next step is validation against the phenotype of model organisms.
If cell-based and organism-based validation succeed, the next step is to apply medicinal chemistry to the compound to convert it to clinical material suitable for clinical trials in humans.
Step 7: Applying medicinal chemistry to candidates
When compounds are first identified, either through screening or rational predictions, it is most likely not the case that these compounds will have regulatory approval, or even that they will be non-toxic and effective in patients.
As strategies for identifying and developing molecular therapeutics begin to yield these candidates, for any candidates without regulatory approval, medicinal chemistry will be required to transform these compounds into a form suitable for conducting a clinical trial.
As a broader discipline, medicinal chemistry aims to manipulate the efficacy, toxicity and delivery of a compound.
In other words, medicinal chemistry is a multidisciplinary engineering process that begins with a molecule that demonstrates efficacy on an assay in cells and ends with a derivative of that molecule that is intended to be safe and effective.
In general, medicinal chemistry challenges have to be re-solved for each molecule, although some platforms, such as exosomal encapsulation (see the review in Batrakova and Kim, 2015), provide the possibility of aiding delivery for a larger class of molecules.
Crossing the blood-brain barrier
While there are many challenges in medicinal chemistry, one often stands out, especially in diseases impacting the brain: crossing the blood-brain barrier.
The selective permeability of the endothelial cells in the brain prevents many molecules from crossing, which makes drug delivery to the brain a major challenge.
Enzyme replacement challenges
Enzyme replacement therapy (and large molecule therapy in general) also requires special considerations, both for its increased challenges in crossing the blood-brain barrier and also for the need to control targeting to specific tissues or intracellular compartments.
Enzyme replacement may also require targeting a specific organ, tissue or intracellular compartment.
Within a cell, targeting the lysosome with a synthetic enzyme is perhaps among the easiest, because the natural process of phagocytosis tends to direct large molecules to the lysosome.
For delivery to the cytoplasm of the cell, attaching cell-penetrating peptides (such as the TAT peptide) to an enzyme can improve cell penetrance.
PEGylation of the enzyme may also be beneficial in reducing the immunogenicity of the protein (reducing side effects) and in reducing renal clearance (improving availability and increasing half-life).
Step 8: Conducting clinical trials
Clinical trials attempt to determine the safety and efficacy of a therapeutic, and they are required by regulatory agencies in most countries before a compound may be marketed.
Clinical trials typically have four phases:
Phase 0: First-in-human. Pharmacodynamics and pharmacokinetics study. About a dozen volunteers.
Phase 1: Safety testing. Dose range determination. Side effect observation. A few dozen volunteers.
Phase 2: Effectiveness testing. A few hundred patient volunteers.
Phase 3: Large-scale safety and effectiveness testing. A few thousand patient volunteers.
Clinical trials are often placebo-controlled and double-blinded, so that some participants are receiving a therapeutic and others are receiving a placebo.
Conducting a clinical trial on a single patient
Perhaps one of the greatest epistemological and regulatory challenges for precision medicine is that there may be so few patients that a placebo-controlled trial is unlikely to yield the statistical confidence necessary to validate the approach.
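A little arithmetic shows why. Assume, purely for illustration, a randomized placebo-controlled trial analyzed with a one-sided exact (permutation) test, where the best possible result is that the treated patients are exactly the ones who do best. The smallest p-value the trial can ever produce is then one divided by the number of ways the patients could have been assigned. The function name below is mine, and the choice of test is an assumption.

```python
from math import comb


def smallest_possible_p_value(n_treated, n_placebo):
    """Best-case one-sided p-value for an exact permutation test:
    1 / (number of ways to choose which patients get the treatment)."""
    return 1 / comb(n_treated + n_placebo, n_treated)


for n_treated, n_placebo in [(1, 1), (2, 2), (3, 3), (5, 5)]:
    p = smallest_possible_p_value(n_treated, n_placebo)
    print(n_treated + n_placebo, "patients:", round(p, 4))
```

Under these assumptions, a trial with four patients cannot reach the conventional 0.05 threshold even if the therapy works perfectly, and a trial with six patients can only just reach it.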
At the moment, there is no regulatory framework in place for single-patient trials, although proposals for “n=1” trials are circulating in the academic community.
In the U.S., if a patient wishes to take a compound that does not have FDA approval, the manufacturer must agree to provide it, and the patient must petition the FDA for permission through expanded access.
Part II: Questions and answers
This is part two of the guide.
It began as a glossary, but has evolved into a question and answer format.
I am striving to explain entries in more patient-friendly language and in the context of human disease.
In fact, not all of the questions below refer to topics above; some are there because they may appear in a diagnostic report.
You can read each entry as needed as a reference, but I have tried to order the questions so that you can also read it top to bottom as a tutorial on genetics and precision medicine.
It is by no means complete, and I expect to be updating this segment of the guide regularly.
If you’re already trained in a field of science or engineering, then, once again, I recommend Quickstart Molecular Biology:
It’s a rapid introduction to the field, targeted at those that already have a technical background (broadly speaking).
What is a phenotype?
In the context of a patient, a phenotype is a collection of symptoms for a disorder.
More generally, phenotype refers to the observable or measurable characteristics of an organism, whether in cells, model organisms or human patients.
Everything observable – from hair color to seizures to blood platelet levels – counts as part of the phenotype.
What is a genome?
The human genome is an instruction manual for building and operating a human being at the molecular level.
This instruction manual exists in every cell of the human body, and it is encoded in the long string-like molecules of DNA.
What is a genetic condition?
A genetic condition results from damaging alterations to the genome of an organism.
Most genetic conditions are the result of alterations inherited from one or both parents, and in these, the alterations are present in every cell.
There is a less common class of genetic conditions in which only some fraction of a patient’s cells experiences a condition – a situation known as somatic mosaicism.
Cancer is an example of a genetic condition that begins with alterations to the genome of a single cell in a previously healthy patient.
What is an exome?
Sequencing the exome instead of the whole genome is an economical way to look for the root cause of genetic disorders.
What is a mutation?
A mutation is an alteration of the genome.
What is sequencing?
Sequencing is a process that uses cells (usually from blood) to read the genome (or exome) of an individual.
Sequencing permits the identification of mutations.
At present, sequencing the exome is less expensive than sequencing the entire genome.
What is a gene?
A gene is a region in the genome that encodes the instructions for building a gene product.
What is a gene product?
A gene product is a molecule encoded by a gene.
In most genetic disorders studied today, errors in protein-coding genes are responsible, but there are disorders, such as Prader-Willi Syndrome, in which non-coding RNAs are implicated.
What is a protein?
Protein is a class of molecules that plays a significant role in life.
Proteins are the key actors in cells, and they play many roles, including:
- enabling molecular transformations and reactions (cellular metabolism);
- serving as structures within and between cells;
- mediating communication within and between cells;
- serving as molecular transporters;
- conducting cell replication; and
- building and modifying other proteins.
What is DNA?
DNA is a large molecule that stores the information within the genome.
Viewed as information, a DNA molecule is a long word written in an alphabet containing the letters A, T, G and C.
At a structural level, DNA is composed of two opposing strands, and each strand is a sequence of nucleotides, and together, the two strands form the famous double helical structure.
The two opposing strands in a molecule of DNA are related: A’s and T’s pair together between opposing strands and C’s and G’s pair together, as in the following simple example of two strands:
A - T
G - C
T - A
C - G
T - A
A pair of nucleotides linked together within DNA is known as a base pair.
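For readers who think in code, the pairing rule can be written as a small lookup table. This is a minimal sketch that reproduces the five-pair example above; it deliberately ignores the direction in which each strand is read.

```python
# Pairing rule for DNA: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the opposing strand, letter by letter."""
    return "".join(PAIRS[base] for base in strand)


strand = "AGTCT"  # the left-hand letters of the example above
opposing = complement(strand)
for left, right in zip(strand, opposing):
    print(left, "-", right)
```

Because each letter determines its partner, knowing one strand is enough to reconstruct the other.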
What’s an example of a protein-coding gene in DNA?
What is a nucleotide?
The four nucleotides in DNA are adenine (A), thymine (T), guanine (G) and cytosine (C).
The four nucleotides in RNA are adenine (A), uracil (U), guanine (G) and cytosine (C).
What is a base pair?
A is complementary to T.
C is complementary to G.
What is RNA?
In terms of information content, RNA is a second molecular alphabet composed of the letters A, U, G and C.
In addition to bearing information, some RNA molecules, known as non-coding RNAs, do not translate into proteins, but still have an active biological role.
What is non-coding RNA (ncRNA)? What is functional RNA?
A non-coding RNA molecule (ncRNA) is an RNA molecule that does not end up being translated into a protein.
Non-coding RNA may be called functional RNA to emphasize the fact that even though it does not translate into a protein, it may still have an active biological role, especially in terms of gene regulation.
What is the non-coding region of the genome?
The non-coding region of the genome is the region that does not encode proteins.
Some regions in the non-coding region contain non-coding RNAs that do not become proteins, yet play an active role in the cell.
What is an oligonucleotide?
Synthetic oligonucleotides form a potential basis for some therapies.
What is a variant/allele?
A variant or allele is a version of a gene.
Though all humans have roughly the same set of genes, each of us has two alleles for most genes – one from the chromosome from our father, the other from the chromosome from our mother.
(The exception is for genes on the Y chromosome in males.)
For most genes, there is a large collection of common variants that make up the bulk of the alleles in the population.
Mutations in a gene produce new alleles.
What is a wild-type allele?
A wild-type allele for a gene is a commonly occurring non-pathogenic version found in nature.
In disorders that impact only one of two alleles for a gene, it may be therapeutic to target the wild-type allele by boosting its activity in loss of function disorders and reducing its activity in gain of function disorders.
How do mutations change a genome?
A mutation is an alteration of the genome (inserting, changing or deleting letters).
In the context of the Gila monster gene, we can imagine a mutation that transforms the sequence:
into the sequence:
In this case, the fourth letter was changed from C to G.
Geneticists have even developed a notation (called HGVS notation) for describing mutations at the DNA level, and this one would be called c.4C>G.
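To show how mechanical this notation is, here is a small sketch that applies a simple substitution such as c.4C>G to a coding sequence. The sequence used is a made-up placeholder, not the Gila monster gene, and the parser handles only single-letter substitutions, which is a tiny subset of HGVS.

```python
import re


def apply_substitution(coding_sequence, hgvs):
    """Apply a simple HGVS substitution such as 'c.4C>G' to a DNA string."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs)
    if match is None:
        raise ValueError("only simple substitutions like c.4C>G are supported")
    position = int(match.group(1))
    reference, alternate = match.group(2), match.group(3)
    index = position - 1  # HGVS counts letters starting from 1
    if index >= len(coding_sequence):
        raise ValueError("position lies beyond the end of the sequence")
    if coding_sequence[index] != reference:
        raise ValueError("reference letter does not match the sequence")
    return coding_sequence[:index] + alternate + coding_sequence[index + 1:]


original = "ATGCATTAA"  # hypothetical sequence, for illustration only
mutated = apply_substitution(original, "c.4C>G")
print(original, "->", mutated)
```

The check on the reference letter matters in practice: if the letter named in the notation does not match the sequence, the mutation was probably reported against a different transcript.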
Under some circumstances, a mutation (or collection of mutations) may lead to a genetic disorder.
And, when a mutation happens in an individual cell later in life, it may give rise to cancer.
What is a de novo mutation?
A de novo (a Latin expression implying newness) mutation is one that is unique to a child, and not found in either parent.
Through chance, every human being carries a few de novo mutations.
While de novo mutations are usually harmless on their own, they are often scrutinized in cases of rare disease.
What is an inherited mutation?
A mutation is inherited if it has been passed from parent to child.
What is a germline mutation?
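A germline mutation is one present in the egg or sperm, and therefore in every cell of the resulting organism. By contrast, a somatic mutation is a de novo mutation that occurs after conception, early in development of an organism.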
The cells, tissues and organs that descend from the mutant cell carry the mutation, but the rest do not.
The result is somatic mosaicism.
What is a genotype?
A genotype is the set of alleles for a specific organism.
In the context of disease, genotype also refers to the specific alleles responsible for causing the disease.
During the diagnostic phase of a genetic disorder, genotype may also refer to the collection of mutations under suspicion uncovered by sequencing.
What is a pathogenic variant/mutation?
A variant is pathogenic if it can cause disease.
What is pattern of inheritance?
In the context of human disease, the pattern of inheritance for disease refers to how genes must be inherited from parents in order to exhibit the disease.
The three common patterns of inheritance for genetic disorders are:
X-linked, in which there is a one in two chance that the son of a mother carrying the disorder will have the disorder while daughters have a one in two chance of being a carrier.
What is a chromosome?
At a structural level, a chromosome is a long string of DNA plus packaging material that holds it together and aids in regulating gene expression.
At a genetics level, a chromosome is a collection of genes.
Humans carry 23 pairs of chromosomes (for 46 total).
One of these 23 pairs is the sex chromosome pairing – two X chromosomes for women, an X and a Y chromosome in men.
Each of the remaining 22 pairs is a pair of autosomes.
What is an autosome?
Within each pair of chromosomes, each autosome carries a set of genes redundant (often called “homologous”) with the other.
This redundancy built in to autosome pairs provides protection against mutations: if one allele of a gene is damaged, there is a good chance the allele on the other autosome is still viable, and in many cases, one functional copy of a gene is sufficient.
When a condition is “autosomal” it means that it involves one of the 22 pairs of autosomal chromosomes.
What is an X-linked condition?
In an X-linked condition, the odds that a mother who carries the condition will have an affected son are 1 in 4.
If the mother knows she is carrying a boy, the odds of being affected increase to 1 in 2.
What is the difference between homozygous and heterozygous?
What is a compound heterozygous individual?
This is usually the result of two different loss of function alleles.
What is haplosufficiency?
With the exception of the Y chromosome in men, each gene has two copies in the genome.
A haplosufficient gene is one for which only one functioning allele is necessary for full function.
A haploinsufficient gene requires both copies of the gene for full function.
When a loss of function variant is discovered for an autosomal gene, it is important to consider whether that gene is haploinsufficient.
What is a dominant disorder?
If a pathogenic mutation can cause disease by itself, then it is dominant.
If a parent has a dominant pathogenic mutation, that parent should have the disorder, and one out of every two children on average will have the same disorder.
What is a recessive disorder?
If a pathogenic mutation requires being paired with another pathogenic mutation (usually in the same gene) to cause disease, then it is recessive.
If someone has only one copy of a recessive mutation, then they are a carrier.
Many genetic conditions are recessive because humans carry two copies of most genes (one on each autosome), and usually only one working copy of a gene is necessary.
What is a carrier?
In the context of human disease, someone is a carrier for a recessive disorder if they have a pathogenic mutation for a gene that can cause the disorder, but also a functioning copy of that gene as well.
Carriers often have no symptoms, and in some cases have mild symptoms.
If two carriers for an autosomal recessive condition have a child, on average, one out of four children will have the condition.
If a carrier (a mother) for an X-linked condition has a child, then there is a one out of two chance that any boy will have the condition.
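These odds can be checked by listing every equally likely combination of the alleles each parent can pass on. The sketch below does this for two carriers of an autosomal recessive condition and for a mother who carries an X-linked condition. The labels ("A"/"a" and "X"/"x") are my own shorthand, with the lowercase letter marking the pathogenic copy.

```python
from fractions import Fraction
from itertools import product


def autosomal_recessive_risk():
    """Both parents are carriers: one working allele 'A', one pathogenic 'a'."""
    mother = father = ("A", "a")
    children = list(product(mother, father))  # one allele from each parent
    affected = [child for child in children if child == ("a", "a")]
    return Fraction(len(affected), len(children))


def x_linked_risk():
    """Mother carries a pathogenic X ('x'); father passes on an X or a Y."""
    mother = ("X", "x")
    father = ("X", "Y")
    children = list(product(mother, father))
    sons = [child for child in children if child[1] == "Y"]
    affected_sons = [child for child in sons if child[0] == "x"]
    any_child = Fraction(len(affected_sons), len(children))
    given_boy = Fraction(len(affected_sons), len(sons))
    return any_child, given_boy


print(autosomal_recessive_risk())  # chance per child for two carriers
print(x_linked_risk())  # (chance per child, chance if the child is a boy)
```

These are averages over many births; each child is an independent draw, so a family can see none or several affected children.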
What is an exon? What is an intron?
In most genes, the sequence of DNA for a gene will be composed of both exons and introns.
Exons are the subsequences of a gene that encode protein structure, while the remaining regions between exons – introns – are ignored during protein construction.
When proteins are constructed, there is a splicing phase that removes introns.
Exome sequencing focuses primarily on exons.
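As a simplified picture of what splicing does, the sketch below keeps the exons of a toy gene and discards the intron between them. The sequence and the exon coordinates are invented, and the example works on DNA letters for readability even though splicing actually acts on RNA.

```python
def splice(gene, exons):
    """Join the exon segments of `gene`, dropping everything in between.
    `exons` is a list of (start, end) positions, counted from 0, end excluded."""
    return "".join(gene[start:end] for start, end in exons)


# Hypothetical gene: exons in upper case, the intron in lower case.
gene = "ATGGCC" + "gtaagtttcag" + "AAATGA"
exons = [(0, 6), (17, 23)]

spliced = splice(gene, exons)
print(spliced)
```

Skipping an exon, as in exon-skipping therapies, corresponds to leaving one of the coordinate pairs out of the list.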
What is transcription?
What is an example of a coding RNA sequence after transcription?
For example, once transcribed into the protein-coding portion of the RNA, the code for the protein from the Gila monster looks like:
What is a transcript?
A transcript is the RNA molecule that has been produced for a gene.
(A transcript may also be called the primary transcript to distinguish it from RNA that has been processed through mechanisms such as splicing.)
What is messenger RNA (mRNA)?
What is RNA sequencing?
RNA sequencing yields insights into which genes are expressed by a particular cell and the quantity of expression.
What is transcriptomics?
Transcriptomics often seeks to discover regulatory relationships between genes.
In the context of human disease, transcriptomics can identify disregulated genes and suggest corrective therapies.
What is an amino acid?
An amino acid is an individual building block for a protein: proteins are created as chains of amino acids joined together.
Each amino acid has a side chain, with properties such as hydrophobicity (attracted to or repelled by water) and electrical charge (positive or negative).
To a large degree, the properties of the side chains determine the structure and function of the entire protein.
(For example, some proteins require chaperones to achieve their intended structure.)
There are 20 standard amino acids ordinarily used for protein synthesis:
- Alanine (Ala, A)
- Arginine (Arg, R)
- Asparagine (Asn, N)
- Aspartic acid (Asp, D)
- Cysteine (Cys, C)
- Glutamine (Gln, Q)
- Glutamic acid (Glu, E)
- Glycine (Gly, G)
- Histidine (His, H)
- Isoleucine (Ile, I)
- Leucine (Leu, L)
- Lysine (Lys, K)
- Methionine (Met, M)
- Phenylalanine (Phe, F)
- Proline (Pro, P)
- Serine (Ser, S)
- Threonine (Thr, T)
- Tryptophan (Trp, W)
- Tyrosine (Tyr, Y)
- Valine (Val, V)
Under the running example of the Gila monster protein, the chain of amino acids encoded by the gene becomes:
How are proteins created?
During the construction of a protein (often called protein synthesis), a gene is first transcribed into RNA.
Many proteins are further modified through post-translational modifications after synthesis.
What is RNA splicing?
Mutations in introns that impact splicing can still result in pathogenic alterations to the resulting protein.
What is a transcript/splice variant?
When RNA splicing removes one or more exons, it creates transcript variants.
Transcript variants lead to different proteins, some of which have modified functionality.
Attempting to force the synthesis of a transcript variant by deliberately skipping an exon is a therapeutic strategy for some mutations.
Since mutations are usually reported with respect to a transcript variant, it is important to make sure that the transcript variants are identical when comparing mutations, or else to interconvert the mutations between transcript variants.
What is the standard genetic code? What is a codon?
The standard genetic code maps three-letter patterns in DNA to codons.
A codon is an individual instruction in the list of instructions for building a protein.
A codon encodes one of two types of instructions:
insert a specific amino acid next; and
stop production of the protein.
For example, the DNA codon ATG (which becomes AUG in RNA) means “insert a methionine next,” while the DNA codon TGA (which becomes UGA in RNA) means “stop.”
There are 64 possible codons, but they encode only 20 distinct amino acids (plus stop), so most amino acids are encoded by more than one codon. (For example, AAA and AAG both mean “insert a lysine.”)
Under this interpretation of DNA, we can re-orient the Gila monster gene into codons and their corresponding instructions, much as if it were a computer program or a recipe:
AAC; // Insert Asparagine
CTG; // Insert Leucine
TAT; // Insert Tyrosine
ATT; // Insert Isoleucine
CAG; // Insert Glutamine
TGG; // Insert Tryptophan
CTG; // Insert Leucine
AAA; // Insert Lysine
GAT; // Insert Aspartic Acid
GGC; // Insert Glycine
GGC; // Insert Glycine
CCG; // Insert Proline
AGC; // Insert Serine
AGC; // Insert Serine
GGC; // Insert Glycine
CGC; // Insert Arginine
CCG; // Insert Proline
CCG; // Insert Proline
CCG; // Insert Proline
AGC; // Insert Serine
TGA; // Stop
After running this program, we have the following sequence of amino acids stitched together:
What is protein translation?
During protein synthesis, protein translation is the construction of an amino acid sequence from the corresponding RNA under the standard genetic code.
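To make the "program" analogy concrete, here is a minimal Python sketch of translation under the standard genetic code. It builds the full 64-entry codon table from a compact string (one-letter amino acid codes, with `*` for stop) and then runs the Gila monster codons listed earlier. The function and variable names are illustrative choices, and DNA letters are used to match the listing, although the cell translates from RNA.

```python
BASES = "TCAG"
# Amino acids (one-letter codes, "*" = stop) for all 64 codons, listed in the
# conventional order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(codons: list[str]) -> str:
    """Translate DNA codons into one-letter amino acid codes, halting at a stop."""
    protein = []
    for codon in codons:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gila_codons = ("AAC CTG TAT ATT CAG TGG CTG AAA GAT GGC GGC CCG AGC AGC "
               "GGC CGC CCG CCG CCG AGC TGA").split()

print(len(CODON_TABLE))               # 64
print(len(set(AMINO_ACIDS) - {"*"}))  # 20
print(translate(gila_codons))         # NLYIQWLKDGGPSSGRPPPS
```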
What are the types of mutations in proteins?
With an understanding of the standard genetic code, it is possible to categorize mutations according to their impact on the protein.
The major categories for mutations in the protein-coding regions of genes are:
- premature stop mutations;
- frameshift mutations;
- in-frame mutations;
- missense mutations; and
- synonymous mutations.
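The categories above can be told apart mechanically by comparing a reference coding sequence with a mutated one. The Python sketch below is a deliberately rough classifier over a made-up five-codon sequence; `classify` and the toy sequences are hypothetical, and edge cases (such as a mutation that destroys the normal stop codon) are not handled.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate coding DNA into one-letter amino acid codes, halting at a stop."""
    protein = ""
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein

def classify(reference: str, mutant: str) -> str:
    """Roughly classify a coding-sequence change by its effect on the protein."""
    length_change = len(mutant) - len(reference)
    if length_change % 3 != 0:
        return "frameshift"       # reading frame is shifted
    if length_change != 0:
        return "in-frame"         # whole codons inserted or deleted
    ref_protein, mut_protein = translate(reference), translate(mutant)
    if mut_protein == ref_protein:
        return "synonymous"       # same amino acids
    if len(mut_protein) < len(ref_protein):
        return "premature stop"   # protein is truncated
    return "missense"             # an amino acid was swapped

def seq(spaced: str) -> str:
    """Remove the spaces used below to make codon boundaries visible."""
    return spaced.replace(" ", "")

reference = seq("ATG AGA TTA GGC TGA")  # toy sequence: Met Arg Leu Gly stop

print(classify(reference, seq("ATG TGA TTA GGC TGA")))  # premature stop
print(classify(reference, seq("ATG AA TTA GGC TGA")))   # frameshift
print(classify(reference, seq("ATG AGA GGC TGA")))      # in-frame
print(classify(reference, seq("ATG AGA TCA GGC TGA")))  # missense
print(classify(reference, seq("ATG AGA TTG GGC TGA")))  # synonymous
```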
How do you interpret HGVS notation for mutations?
Variants are commonly reported in two different ways at the same time: in the coding DNA notation (prefixed with c.) and in the protein-coding notation (prefixed with p.).
c.24G>C means that the 24th nucleotide was changed from G to C.
c.24del (in older reports, c.24delC) means that the 24th nucleotide was deleted.
c.67_68insT means that a T was inserted between nucleotides 67 and 68.
The protein-coding notation (prefixed with p.) indicates the effect of a mutation on the amino acid sequence for a protein.
p.Q631SfsX7 means that the 631st codon was changed from a glutamine to a serine due to a frameshift mutation that resulted in a new stop codon 7 codons away. (Possibly also written as p.Gln631fsTer7 or p.Gln631fs*7.)
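For readers who want to decode many variants at once, the simple coding DNA forms can be read with regular expressions. The Python sketch below handles only the three forms discussed here (a single-nucleotide substitution, a single-position deletion and an insertion); `describe` is a hypothetical helper, not part of any HGVS library, and real reports use many more forms than this covers.

```python
import re

# Patterns for three simple coding-DNA (c.) forms only.
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# The deleted base is sometimes written out next to "del", so tolerate it.
DELETION = re.compile(r"^c\.(\d+)[ACGT]?del[ACGT]?$")
INSERTION = re.compile(r"^c\.(\d+)_(\d+)ins([ACGT]+)$")

def describe(variant: str) -> str:
    """Return a plain-English reading of a simple c. variant description."""
    if m := SUBSTITUTION.match(variant):
        position, old, new = m.groups()
        return f"nucleotide {position} changed from {old} to {new}"
    if m := DELETION.match(variant):
        return f"nucleotide {m.group(1)} deleted"
    if m := INSERTION.match(variant):
        left, right, bases = m.groups()
        return f"{bases} inserted between nucleotides {left} and {right}"
    return "not recognized by this sketch"

print(describe("c.24G>C"))      # nucleotide 24 changed from G to C
print(describe("c.24del"))      # nucleotide 24 deleted
print(describe("c.67_68insT"))  # T inserted between nucleotides 67 and 68
```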
What is a premature stop / framestop / nonsense mutation?
A premature stop (also called nonsense) mutation truncates construction of the protein by turning a codon for an amino acid into a stop codon.
For example, changing the first nucleotide of AGA from A to T turns it into TGA. AGA codes for the amino acid arginine, but TGA codes for stop.
Truncation usually destroys the function of the resulting protein. (In fact, nonsense-mediated decay may even prevent the production of proteins with such mutations.)
For example, the mutation p.R401X (possibly also written p.Arg401Ter or p.Arg401*) indicates that the 401st amino acid (an arginine) has been replaced by a stop codon.
For a premature stop mutation, readthrough compounds should be investigated as potential therapeutics.
What is a frameshift mutation?
A mutation that inserts or deletes a number of nucleotides that is not a multiple of three will cause a “frameshift” in which subsequent codons are misinterpreted.
Frameshifts almost always cause loss of function in the resulting protein.
For example, c.1891del (also written c.1891delC), reported at the protein level as p.Q631SfsX7, indicates that the 1,891st nucleotide in the coding DNA for a protein (in this case a cytosine) was deleted, which caused the codon at position 631 (formerly a glutamine) and all subsequent codons to be garbled, until the introduction of a stop seven codons down.
What is an in-frame mutation?
An in-frame mutation is an insertion or deletion, in a protein-coding region, of a number of nucleotides that is a multiple of three.
This may add or remove one or several amino acids.
In-frame mutations are generally less damaging than premature stop or frameshift mutations.
What is a missense mutation?
A missense mutation changes a codon from one amino acid to another.
If a missense mutation changes the type of the side chain of the amino acid (e.g., hydrophobic to polar, positive to negative), it is more likely to damage the function of the protein than a missense mutation that does not.
For example, p.W244R indicates a tryptophan has become an arginine at codon 244 (which changes a hydrophobic side chain into a positively charged side chain). In some cases, this could be a loss of function mutation.
To be clear, even a mutation that does not change the side chain type (such as alanine to valine) can still be pathogenic, as it can alter the function of the protein, as in the case of prion disease.
If the effect of a missense mutation is unclear, simulating protein folding may provide insight into the pathogenicity based on its impact on structure.
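A first-pass check on a missense mutation is whether it changes the side-chain class. The Python sketch below uses one common textbook grouping of the 20 standard amino acids; groupings differ between sources, so the table is an assumption, and `side_chain_change` is a hypothetical helper. As noted above, an unchanged class does not mean a mutation is harmless.

```python
# One common textbook grouping of the 20 standard amino acids by side chain.
# Sources place glycine, cysteine, histidine and tyrosine differently, so
# treat this table as an assumption rather than a definitive classification.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("AVLIMFWPG", "hydrophobic"),
    **dict.fromkeys("STNQYC", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}

def side_chain_change(missense: str) -> tuple[str, str]:
    """Return (old class, new class) for a short-form missense such as 'p.W244R'."""
    old, new = missense[2], missense[-1]
    return SIDE_CHAIN_CLASS[old], SIDE_CHAIN_CLASS[new]

print(side_chain_change("p.W244R"))  # ('hydrophobic', 'positive')
# Hypothetical alanine-to-valine variant: the class stays the same.
print(side_chain_change("p.A10V"))   # ('hydrophobic', 'hydrophobic')
```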
What is a synonymous mutation?
A synonymous mutation changes the underlying nucleotides for a codon, but it does not change the amino acid.
For example, a mutation that changes another leucine codon (such as TTA) into TTG probably has no effect, since both codons insert the amino acid leucine.
What is protein folding?
Once a protein is synthesized as a chain of amino acids, that chain begins folding into a 3D structure determined by the side chains on its amino acids, by properties of its environment such as acidity (pH) and temperature, and by the presence of chaperone proteins.
As a principle, folding tends to minimize the number of hydrophobic (“water-fearing”) side chains on the exterior of the final form.
The final structure of the protein determines its function.
As an example, the Gila monster protein, after folding into its 3D shape, looks like:
Proteins that fail to fold correctly can lead to human disease.
What is primary protein structure?
The primary structure of a protein is the sequence of amino acids from which it is made.
What is secondary protein structure?
Secondary protein structure refers to commonly occurring local units of structure in proteins that span several amino acids. A beta sheet is an example of a commonly occurring secondary unit of structure.
What is a tertiary protein structure?
Tertiary protein structure refers to the 3D shape of a protein after folding is complete.
What is a quaternary protein structure?
Quaternary protein structure refers to the arrangements of multiple proteins in a complex.
What is a protein complex?
A protein complex is a multi-protein structure.
What is simulated protein folding?
Protein folding simulation is a technique in computational biology for predicting the folded structure of a protein.
While ab initio protein folding simulations can be computationally expensive and accuracy is problematic, a computed folding may offer evidence as to whether domains of function in a protein are still functional, or whether their binding affinity has been altered.
Programs such as Phyre2 attempt to predict the folding of a protein from sequence data, which may yield insight into the structure of a mutant protein.
What is crystallography?
Crystallography is a collection of methods for studying and determining the structure of a crystal.
By crystallizing proteins, crystallography (in particular, X-ray crystallography) can determine their 3D structure.
What is an antisense oligonucleotide?
An antisense oligonucleotide is a short sequence of RNA or DNA designed to bind to a particular target sequence and effectively nullify it.
For example, if the RNA sequence to be muted is AUAG, then the antisense oligonucleotide is UAUC (the base-by-base complement, pairing A with U and G with C).
Because antisense oligonucleotides bind to their complements, they can mute them during protein translation.
In the case of knockdown model organisms, these complementary fragments of RNA can silence an entire gene.
In the case of exon-skipping therapeutics, an antisense oligonucleotide for a particular exon (such as the one containing the harmful mutation) can cause the exon to be spliced out during protein construction.
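Computing the complement of a target sequence is mechanical, as the Python sketch below shows for RNA. Because paired strands run in opposite directions, the complement is often written reversed so that it reads in the conventional 5'-to-3' direction; both forms are shown. The function names are illustrative, and the four-letter target is only a toy example.

```python
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def complement(rna: str) -> str:
    """Base-by-base complement of an RNA sequence, in the same left-to-right order."""
    return "".join(RNA_PAIRS[base] for base in rna)

def antisense(rna: str) -> str:
    """The complement written 5'-to-3', i.e. reversed relative to the target."""
    return complement(rna)[::-1]

print(complement("AUAG"))  # UAUC
print(antisense("AUAG"))   # CUAU
```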
What is a post-translational modification?
Proteins are often modified after translation by processes such as glycosylation, phosphorylation or methylation.
The sites at which proteins are modified are often determined by consensus sequences among the amino acids in a protein.
There are a variety of tools for predicting post-translational modification sites.
A mutation that alters the consensus sequence – for instance changing the asparagine to an aspartate – would likely prevent N-linked glycans from being attached, which may in turn alter the function or stability of the resulting protein.
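As an illustration of a consensus sequence, the Python sketch below searches for the widely used N-linked glycosylation pattern: an asparagine (N), followed by any amino acid except proline, followed by a serine (S) or threonine (T). That pattern is supplied here as background rather than taken from this guide, the two sequences are invented, and a match marks only a candidate site, not a confirmed modification.

```python
import re

# Asparagine (N), then any residue except proline (P), then serine (S) or
# threonine (T). The lookahead allows overlapping candidate sites to be found.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_positions(protein: str) -> list[int]:
    """1-based positions of asparagines that sit in an N-X-S/T pattern."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]

# Toy sequences (hypothetical, not from a real protein).
normal = "MKNGSAL"
mutant = "MKDGSAL"  # asparagine at position 3 replaced by aspartate

print(sequon_positions(normal))  # [3]
print(sequon_positions(mutant))  # []
```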
What functions do proteins have?
Proteins serve a variety of functions within a cell.
In a genetic disorder that involves a protein, the type of protein involved may influence therapeutic strategies.
Major types of proteins include:
- hormone/signaling proteins;
- storage proteins;
- motor proteins;
- immune proteins;
- protective proteins;
- transporter proteins;
- structural proteins; and
- regulatory proteins.
Some proteins fall into more than one of these types.
What is a domain of function?
A domain of function is a region within a protein that is responsible for a specific function.
A given protein may have several domains which work together, domains that work independently or some combination thereof.
When analyzing mutations, it is useful to examine their potential impact on the known domains of function within that protein.
What is an enzyme? What is a metabolic pathway?
An enzyme is a protein that enables chemical reactions and molecular transformations.
Thus, an enzyme creates a metabolic pathway from the inputs of the reaction to its outputs.
A metabolic pathway is an enzyme-driven process for conducting chemical reactions and molecular transformations.
A pathway diagram for the enzyme lactase illustrates the pathway:
          ----> galactose
         /
lactose ---
         \
          ----> glucose
Some humans with mutations in the gene for lactase – LCT – cannot produce lactase, and as a result, they cannot digest lactose, a condition known as lactose intolerance.
(In fact, as humans age, they produce less lactase, resulting in increasing lactose intolerance as this pathway shuts down.)
Collectively, enzymes and the pathways they define determine the metabolism of an organism.
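The lactase pathway can also be written down as a small lookup table, which makes it easy to see what happens when the enzyme is missing. This Python sketch is only an illustration; `PATHWAYS` and `digest` are hypothetical names, and real metabolism involves many interacting pathways.

```python
# Each enzyme maps one substrate to the products of its reaction.
PATHWAYS = {
    "lactase": ("lactose", ["galactose", "glucose"]),
}

def digest(compound: str, working_enzymes: set[str]) -> list[str]:
    """Return the products if a working enzyme acts on the compound;
    otherwise return the compound unchanged."""
    for enzyme, (substrate, products) in PATHWAYS.items():
        if substrate == compound and enzyme in working_enzymes:
            return products
    return [compound]

print(digest("lactose", {"lactase"}))  # ['galactose', 'glucose']
print(digest("lactose", set()))        # ['lactose']
```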
What is a substrate?
A substrate is a compound on which an enzyme acts.
For example, for the enzyme lactase, its substrate is lactose.
What is a transporter protein?
A membrane transporter protein is like an automatic door on the surface of the cell (or an intracellular compartment) that only opens for specific molecules.
Membrane transporter proteins are selectively permeable membrane-bound proteins that regulate the movements of molecules inside and outside of a cell and between intracellular compartments.
Ion channels are an example of a transporter protein.
What is an ion channel?
Ion channels are formed by membrane-bound transporter proteins that regulate the flow of ions into and out of a cell or intracellular compartment.
A defect in an ion channel protein can lead to a channelopathy.
What is a channelopathy?
Channelopathies are disorders caused by defects in ion channel proteins – a specific class of transporter protein – which regulate the flow of ions across a membrane.
What is a receptor?
Receptors are (generally) membrane-bound proteins that transmit signals across a membrane boundary, often from outside a cell to inside a cell.
When an agonist docks with its target receptor, it causes the release of a secondary messenger compound on the opposite side of the membrane.
Receptors can have a baseline activity level – known as constitutive activity – even in the absence of their agonist.
With receptors, antagonists play the role of inhibitors, blocking agonists from reaching the receptor and preventing the transmission of a signal.
Unlike enzymes, receptors may also be susceptible to an inverse agonist, which lowers their constitutive activity. (Enzymes have no background activity in the absence of a substrate.)
What is constitutive activity?
Constitutive activity is the baseline activity of a receptor in the absence of an agonist.
What is a secondary messenger?
A secondary messenger may trigger a cascade of reactions to the external stimulus.
What is an agonist?
What is an antagonist?
What is an inverse agonist?
What is a chaperone?
A chaperone protein aids in the folding of other proteins, or helps to maintain protein folding in the presence of stress (such as a higher temperature).
What is cytogenetic notation?
Cytogenetic notation is the notation used to describe regions within chromosomes.
The general format of the notation is:
<chromosome number> 'p' or 'q' <band number> . <sub-band number>
For example, in 3p24.2, 3 indicates chromosome 3, p indicates the short arm of the chromosome, 24 is the 24th colored band, and 2 is the second sub-band within the 24th band.
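The general format above maps directly onto a regular expression. The Python sketch below splits a simple band description into its parts; `parse_band` is a hypothetical helper that covers only this basic format, not ranges or the fuller notation used in clinical reports.

```python
import re

# <chromosome> 'p' or 'q' <band> . <sub-band>, with the sub-band optional.
BAND = re.compile(
    r"^(?P<chromosome>\d{1,2}|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<band>\d+)"
    r"(?:\.(?P<sub_band>\d+))?$"
)

def parse_band(notation: str) -> dict:
    """Split a simple cytogenetic band description into its parts."""
    m = BAND.match(notation)
    if m is None:
        raise ValueError(f"not a simple band notation: {notation!r}")
    parts = m.groupdict()
    # 'p' is the short arm of the chromosome and 'q' is the long arm.
    parts["arm"] = "short" if parts["arm"] == "p" else "long"
    return parts

print(parse_band("3p24.2"))
# {'chromosome': '3', 'arm': 'short', 'band': '24', 'sub_band': '2'}
```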
Characteristic bands show up on chromosomes when they’re treated with trypsin and then stained (commonly with Giemsa dye), and these bands are used to identify regions.
In a condition with chromosomal abnormalities, it is important to determine from the cytogenetic notation which regions of the chromosome have been deleted or duplicated.
The UCSC genome browser can list all of the impacted genes in a region.
With the advent of sequencing, chromosomal abnormalities are now often reported in very specific ranges of base pairs.
What is somatic mosaicism?
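Somatic mosaicism is the technical term for what happens when a mutation arises early on during development, after fertilization, rather than being inherited through the germline; as a result, only some of the individual's cells carry it.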
For example, if a mutation happens in a cell when a developing human being consists of roughly ten cells, then that mutation may be present in about 10% of the resulting cells.
Somatic mosaicism can lead to an individual suffering from a disorder incompletely, and it can also complicate diagnostic sequencing: if the tissue used for sequencing does not contain the pathogenic mutation, then sequencing will not find it.
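The arithmetic behind the ten-cell example is sketched below in Python, together with one reason mosaic mutations are easy to miss: if the mutation sits on only one of the two copies of a gene in each affected cell, the fraction of gene copies carrying it is half the fraction of cells. The sketch assumes every early cell contributes equally to the final tissue, which is a simplification; real tissues can differ widely.

```python
from fractions import Fraction

def mosaic_fraction(cells_at_mutation: int) -> Fraction:
    """Fraction of cells carrying a mutation that arose in one cell of an
    embryo of the given size, assuming all cells contribute equally."""
    return Fraction(1, cells_at_mutation)

cell_fraction = mosaic_fraction(10)
# If the mutation is on one of the two gene copies in each affected cell,
# only half of the gene copies in those cells carry it.
allele_fraction = cell_fraction / 2

print(float(cell_fraction))    # 0.1
print(float(allele_fraction))  # 0.05
```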
What is a mechanism of harm?
A mechanism of harm is the process by which a disease causes harm.
The primary mechanism of harm is the root cause of the disorder, which in the case of genetic disorders, is a mutation or group of mutations.
A downstream mechanism of harm is a later link in the chain of causes.
For example, in cystic fibrosis, according to (Ratjen, 2009) the initial loss of function in the CFTR gene leads to:
defective chloride and thiocyanate ion transport across cell membranes;
which leads to loss of surface liquid in the airway;
which leads to destabilization of cilia and loss of mucociliary transport;
which leads to retention of phlegm;
which leads to infection;
which leads to inflammation;
which in turn aggravates the retention of phlegm.
It may be possible to devise therapeutic strategies that intervene at any link in the chain between mechanisms of harm.
What is a model organism?
A model organism is an organism that has been genetically modified or bred to exhibit a specific phenotype or the analog of a human disorder.
Common model organisms include:
- E. coli bacteria;
- yeast (often Saccharomyces cerevisiae);
- worms (often Caenorhabditis elegans);
- fruit flies (often Drosophila melanogaster);
- zebrafish (Danio rerio); and
- mice.
There are dozens of other model organisms also used in research.
In the context of genetic disorders, model organisms may have human mutations introduced (a knock-in), or an entire gene removed (a knockout).
Creating model organisms is useful in many stages of disease research, from discovery and diagnosis to the development of therapeutics.
What is a knockout organism?
A knockout organism is one in which a gene has been removed.
Knockout organisms are useful for studying the role of a particular gene.
Knockout organisms are also useful for studying recessive disorders in which the primary mechanism is total loss of function in a gene.
What is a knockdown organism?
By introducing interfering RNA for a specific gene into cells, it is possible to dial down (and even eliminate) the expression of a target gene.
When the interfering RNA is introduced exogenously, the interfering RNA is eventually depleted and gene activity is restored.
It is also possible to construct permanent knockdowns, in which gene expression is dialed down through genetic modification.
Because the amount of interfering RNA can be varied, it is possible to study the effect of differing levels of expression of a gene.
What is a knock-in organism?
A knock-in organism is one in which a specific gene has been introduced.
In organisms that contain the equivalent of a human gene, the human version can be inserted to test its functional equivalence and conservation.
In addition, a disease-causing version of a gene from a human patient can be knocked in to create a more faithful model organism in which to study a disorder.
Some therapeutic strategies can only be tested on knock-in models. For example, over-expression of a mutant allele only works if there is a mutant allele to upregulate.
What is a fibroblast?
In the context of human disease, a fibroblast cell line is a cell line usually created from the skin of patients (and sometimes family members).
What is a stem cell? What is an iPS cell?
A stem cell is an immature cell that can be differentiated into many different cell types, e.g., skin, cardiac tissue, neurons.
Embryonic stem cells are the most differentiable.
Induced pluripotent stem cells (iPS cells) are mature adult cells that have been reprogrammed to behave like early stem cells.
What are the advantages of iPS cell lines?
For genetic disorders, iPS cells make it possible to obtain tissue types which may otherwise be difficult or impossible to extract directly from patients.
What is a biomarker?
A biomarker is an observable indicator of a disease state, often used in clinical trials to measure the effectiveness of a therapy.
An example of a biomarker is the presence of oligosaccharides in urine; this is a biomarker for many lysosomal storage diseases.
As another example, low protein in the cerebrospinal fluid (CSF) is a biomarker for NGLY1 deficiency.
What is an assay?
In biochemistry, an assay is a technique for measuring a property of interest in a sample.
An assay is defined by both a process and a collection of material necessary to execute the process.
Assays are often run by hand, but in some cases, they may be automated for high-throughput screening.
From the perspective of practicing precision medicine, it is critical to note that developing an assay may require an act of scientific discovery, which in turn hinges on the creative process underpinning scientific progress.
For example, an assay that targets the primary mechanism of harm in an enzyme deficiency disorder will (somehow) measure the activity of the enzyme.
Running the assay on patient cells should show no activity for the enzyme, while running the assay on control cells should show activity for the enzyme.
In this example, the assay is useful for validating compounds predicted to restore the activity of the missing enzyme.
For instance, if patient cells are treated with the compound and then run through the assay, and it shows activity, then the compound may be a candidate for drug development.
What is a molecular therapy?
A molecular therapy is one that targets the molecular basis for a disorder.
In a genetic disorder, a molecular therapy can attempt to correct a mutation itself at a genetic level (as in gene-editing or readthrough therapeutics), or it can attempt to compensate at the protein level (as in mutant protein stabilization), or it can intervene at a downstream mechanism of harm with a molecular basis.
What is gene expression?
Increasing gene expression for a protein-coding gene implies increasing the protein encoded by that gene.
In some contexts, increasing gene expression refers to increasing the RNA transcription for the gene (and assuming a corresponding increase in the protein).
What is gene regulation?
Gene regulation refers to the processes and entities that manage the expression of genes.
Even though every cell (except for sperm and eggs) contains every gene for an individual, not every cell expresses all of the gene products associated with every gene, and even within a cell, the proteins being expressed depend on the environment and state of the cell.
What is a gene regulatory network?
Genes exert regulatory effects on each other: the expression of one gene may increase or decrease the expression of another gene.
RNA sequencing, the basis for transcriptomics, examines a snapshot of the RNA content in a cell, from which regulatory relationships can be inferred.
A gene regulatory network expresses the regulatory dependence relationships between genes.
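One simple way to picture a gene regulatory network is as a graph whose edges are marked "increases" or "decreases". The Python sketch below follows the signs through a tiny made-up network; the gene names and edges are hypothetical, and real regulation is quantitative and context-dependent rather than a matter of signs alone.

```python
# Hypothetical genes and edges: +1 means "increases expression of",
# -1 means "decreases expression of".
NETWORK = {
    "geneA": {"geneB": +1},
    "geneB": {"geneC": -1},
    "geneC": {},
}

def downstream_effects(gene: str, change: int = +1) -> dict[str, int]:
    """Propagate an increase (+1) or decrease (-1) in one gene through the network."""
    effects: dict[str, int] = {}
    stack = [(gene, change)]
    while stack:
        current, sign = stack.pop()
        for target, edge_sign in NETWORK[current].items():
            if target not in effects:
                effects[target] = sign * edge_sign
                stack.append((target, effects[target]))
    return effects

# Lowering geneA lowers geneB, which in turn releases geneC from repression.
print(downstream_effects("geneA", -1))  # {'geneB': -1, 'geneC': 1}
```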
What is an antibody?
Within the immune system, an antibody is a kind of protein that is adapted to recognize many different (presumably infectious) agents.
Antibodies have a Y-like structure, and the tips contain a region that can be varied significantly to recognize a wide variety of molecules.
Once an antibody recognizes its target, it signals it for destruction (or may actively disable the target on its own).
Some therapeutics are also based on developing custom antibodies to recognize harmful agents.
In laboratory science, special antibodies are often developed to detect the presence and quantity of particular agents.
What is binding affinity?
Drug design often depends on optimizing binding affinity.
How do I find the right scientific expert?
One of the most challenging aspects of precision medicine from the patient side is the identification of qualified and appropriate scientific expertise.
The Weber Lab at Harvard has created search engines for expertise:
In standard medicine, patients expect to be able to work with a single physician, but in precision medicine, each stage may require a different physician or scientist.
Evaluating the quality and fit of a potential scientific partner may also be challenging for non-academics.
Questions to answer when considering the fit of a scientist include:
Does the scientist have publications that relate to the genes, models or mechanisms of interest?
Does the scientist have research funding already aligned with genes, models or mechanisms of interest?
If funding a scientist, it is important to request a plan that explains and justifies both the basic and translational scope of research efforts.