Mendelian randomisation is a technique which, fuelled by the results of genome-wide association (GWA) studies, can be used to test for causal relationships between intermediate phenotypes, such as metabolite levels, and outcomes, such as cardiovascular disease (Evans and Davey Smith 2015). Much faster and cheaper than randomised controlled trials, and relatively free from the biases of observational studies, it has the potential to identify new drug targets and to reduce attrition rates in the pharmaceutical development pipeline.
The field of observational epidemiology is to thank for the discovery of links between many environmental exposures and disease, including smoking, ionising radiation, benzene, asbestos, diethylstilbestrol, and thalidomide (Trichopoulos 1995). However, these were likely the low-hanging fruit with large effect sizes, and more recent studies have often produced false positives. There are also a number of methodological biases in observational studies which have somewhat undermined the credibility of the field in recent years (Taubes 1995).
Firstly, correlation, which is all that observational epidemiological studies record, does not imply causation. For example, confounding (omitted-variable bias) occurs when an external variable affects both the exposure and the outcome. Additionally, the outcome may affect the exposure (reverse causation) (Smith and Ebrahim 2003).
Many measurements of exposures are also liable to large errors and bias, especially those from retrospective questionnaires. Additionally, publication bias, in which negative results are not published, results in a skewed perception of potential risks. Collider bias, although avoidable, is another potential problem, whereby conditioning on a common effect of two variables can induce an association between them where none exists (Cole et al. 2010).
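Collider bias can be seen in a small simulation. The sketch below is illustrative only: it assumes two independent standard-normal variables, A and B, and a hypothetical common effect C = A + B + noise. Restricting the analysis to individuals with a high value of C (one way of conditioning on the collider) induces a negative association between A and B, even though there is none in the full sample.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x, i.e. cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(42)
n = 100_000

# A and B are simulated independently, so they have no true association.
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
# C is a common effect (a collider) of both A and B.
c = [ai + bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]

# "Conditioning" on the collider: analyse only people with a high value of C.
selected = [i for i in range(n) if c[i] > 1.0]

slope_all = ols_slope(a, b)
slope_selected = ols_slope([a[i] for i in selected], [b[i] for i in selected])

print(f"B on A, whole sample:        {slope_all:+.3f}")       # close to zero
print(f"B on A, restricted to C > 1: {slope_selected:+.3f}")  # clearly negative
```

The threshold C > 1 is arbitrary; selecting on C in other ways produces a similar spurious association.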
Associations which were proposed but then found to be non-causal include: vitamin E supplementation and coronary heart disease, beta carotene and lung cancer, hormone replacement therapy and cardiovascular disease, high HDL cholesterol (HDL-C) and cardiovascular disease, and electromagnetic radiation and leukemia (Evans and Davey Smith 2015).
Randomised controlled trials (RCTs), although they do not suffer from confounding or reverse causation, can be prohibitively expensive, slow, unethical, or impractical.
Mendelian randomisation (MR) is a technique which provides a middle ground and can, in theory, provide evidence of causal associations between exposures and outcomes (Smith and Ebrahim 2003). It is, in effect, a naturally occurring RCT in which randomisation is achieved by Mendel’s second law of independent assortment rather than by the investigator (Davey Smith 2007). Some genetic variants affect intermediate phenotypes such as LDL concentration and hence can be used as a proxy for the effect of an intervention, such as statins, in an RCT (Voight et al. 2012). These genetic variants are randomised by Mendel’s second law in the same way that a treatment is randomised in an RCT. Hence any confounding variables should occur at equal frequency in the groups with and without the variant (i.e. there should be no difference between the groups apart from the instrumental genetic variant and its effects), and so, as in RCTs, there should be no omitted-variable bias. Furthermore, as the genetic variant is (usually) fixed at conception, reverse causation cannot occur (an exception being in cancers) (Evans and Davey Smith 2015).
This approach was first widely used in econometrics, where it is called instrumental variable analysis; its formalisation as a directed acyclic graph is shown in figure 1. IV represents the instrumental variable (the genetic variant in MR), E the exposure (such as lipid levels), Y the outcome (such as cardiovascular disease), and U the confounders (such as smoking) (Didelez and Sheehan 2007).
Figure 1: Instrumental variable analysis
For the MR approach to work, the IV must act only by changing the exposure: it must not act directly on the outcome or on any other variables (U). This is represented by the red lines in figure 1.
The MR approach was first suggested by Katan in 1986 (Katan 2004), although the suggestion was not taken up at the time, and MR studies have only recently become more popular (figure 2). Katan described how the various apolipoprotein E (Apo E) isoforms could be used as instrumental variables to probe the association of serum cholesterol levels with cancer risk. This is because the function of Apo E is to remove cholesterol from the serum, and each isoform differs in its ability to do so.
Figure 2: Growth of the number of MR studies, as estimated by a PubMed search for "mendelian randomisation" OR "mendelian randomization" on 7 December 2015 (US National Library of Medicine 2015).
Estimating Causal Effect Size
A test for a causal effect of an exposure on an outcome can be performed by testing the regression coefficient of the outcome on the instrumental variable (βY|IV) against the null hypothesis that it is zero. To estimate the size of the causal effect, this coefficient can simply be divided by the regression coefficient of the exposure on the instrumental variable (βE|IV), giving the causal estimate βY|IV / βE|IV.
This is called the ratio method or the Wald estimator. For multiple IVs a different method, two-stage least squares (2SLS), is typically used. In the first stage the exposure is regressed on the IVs, and in the second stage the outcome is regressed on the fitted exposure values from the first stage. With a single IV, 2SLS reduces to the Wald estimator. Other estimators, including likelihood-based and Bayesian methods, are also available (Burgess, Small, and Thompson 2015).
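The estimators above can be illustrated with simulated data. The following sketch (pure Python, no external libraries) assumes three hypothetical, independent variants, a hypothetical unmeasured confounder U and a true causal effect of 0.5; none of these values come from a real study. It compares a naive regression of Y on E, which is biased by U, with the ratio estimate from one variant and a 2SLS estimate from all three, and checks that 2SLS with a single instrument reproduces the ratio estimate.

```python
import random

def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    k = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients of y on the columns of `rows`, via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(3)
n = 50_000
freqs = [0.2, 0.3, 0.4]    # hypothetical allele frequencies of three independent variants
alphas = [0.3, 0.25, 0.2]  # hypothetical per-allele effects of each variant on E
beta_true = 0.5            # hypothetical causal effect of E on Y

G, E, Y = [], [], []
for _ in range(n):
    g = [sum(rng.random() < p for _ in range(2)) for p in freqs]  # allele counts 0/1/2
    u = rng.gauss(0, 1)                                           # unmeasured confounder U
    e = sum(a * gi for a, gi in zip(alphas, g)) + u + rng.gauss(0, 1)
    G.append(g)
    E.append(e)
    Y.append(beta_true * e + u + rng.gauss(0, 1))  # G affects Y only through E

# Naive observational regression of Y on E: biased because U affects both.
_, beta_naive = ols([[1.0, e] for e in E], Y)

# Ratio (Wald) estimate using only the first variant: βY|IV / βE|IV.
g1 = [g[0] for g in G]
c0, beta_e_iv = ols([[1.0, x] for x in g1], E)
_, beta_y_iv = ols([[1.0, x] for x in g1], Y)
beta_wald = beta_y_iv / beta_e_iv

# 2SLS with all three variants.
# Stage 1: regress the exposure on an intercept plus all instruments.
gamma = ols([[1.0] + g for g in G], E)
E_hat = [gamma[0] + sum(cf * gi for cf, gi in zip(gamma[1:], g)) for g in G]
# Stage 2: regress the outcome on the fitted exposure values.
_, beta_2sls = ols([[1.0, eh] for eh in E_hat], Y)

# With one instrument, 2SLS reproduces the Wald ratio exactly.
_, beta_2sls_single = ols([[1.0, c0 + beta_e_iv * x] for x in g1], Y)

print(f"naive: {beta_naive:.3f}  Wald (1 IV): {beta_wald:.3f}  2SLS (3 IVs): {beta_2sls:.3f}")
```

Running the two stages by hand as above gives the correct 2SLS point estimate, but the standard errors from a naively fitted second-stage regression are not correct, because they ignore the uncertainty in the first stage; dedicated IV software corrects for this.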
Variants of Mendelian Randomisation
The estimates of βE|IV and βY|IV need not be calculated from the same sample. This is useful because variant–disease associations are often ascertained in case-control studies, whereas variant–exposure associations are often ascertained in population cohort studies. Hence this approach, called “two-sample MR”, enables the use of the large samples which have already been collected (Pierce and Burgess 2013; Burgess et al. 2015).
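In two-sample MR the ratio is usually computed from published summary statistics. Below is a minimal sketch, assuming hypothetical summary statistics for a single variant, that both association estimates refer to the same effect allele, and that the two samples are independent (so the two estimates are uncorrelated). The first-order standard error ignores uncertainty in βE|IV, while the second-order delta-method term also propagates it.

```python
import math

def two_sample_ratio(beta_exp, se_exp, beta_out, se_out):
    """Wald ratio from summary statistics of two independent samples.

    beta_exp, se_exp: variant-exposure association and its standard error (sample 1)
    beta_out, se_out: variant-outcome association and its standard error (sample 2)
    Both associations must be expressed per copy of the same effect allele.
    """
    ratio = beta_out / beta_exp
    # First-order delta-method SE: treats beta_exp as known without error.
    se_first = se_out / abs(beta_exp)
    # Second-order SE: also propagates the uncertainty in beta_exp.
    se_second = math.sqrt(se_out**2 / beta_exp**2 + beta_out**2 * se_exp**2 / beta_exp**4)
    return ratio, se_first, se_second

# Hypothetical summary statistics for one variant (not from any real study).
ratio, se1, se2 = two_sample_ratio(beta_exp=0.10, se_exp=0.01, beta_out=0.05, se_out=0.01)
lower, upper = ratio - 1.96 * se2, ratio + 1.96 * se2
print(f"ratio = {ratio:.3f}, SE (1st order) = {se1:.3f}, SE (2nd order) = {se2:.3f}")
print(f"approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

The first-order standard error can understate the uncertainty when βE|IV is imprecisely estimated, which is why the second-order version is sometimes preferred.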
Caveats of Mendelian Randomisation
Although MR studies have enormous potential to alleviate some of the problems which have plagued observational epidemiology, with far fewer downsides than RCTs, there are also significant problems with MR studies which should be kept in mind when assessing the strength of the evidence they provide (Pickrell 2015).
Genetic variants typically explain only a small fraction of the population variance in a phenotype. Hence any difference in outcome due to a variant is also likely to be small, so a very large sample size, typically tens of thousands of individuals, is required to detect a statistically significant effect (Pierce and Burgess 2013). Fortunately, the GWAS community has already genotyped and phenotyped many hundreds of thousands of individuals as part of various projects, and the two-sample MR approach enables the use of data from these projects (Burgess et al. 2015). Causal effect sizes can even be estimated using only summary statistics, although individual-level data are required to test the core assumptions of the MR method. Additionally, if multiple genetic variants are known to affect the same exposure, they can be combined to increase power (S. Burgess et al. 2015).
Furthermore, because the genetic variant is present from conception, it changes the exposure over a much longer period than an RCT or a drug treatment, which is often not prescribed until later in life, once a disease has manifested. Hence genetic variants typically need not have effects as large as those of drugs; however, this means that effect sizes must be extrapolated in order to predict the effects of drugs a priori (Burgess et al. 2012).
One of the largest problems in MR studies is the assumption that the genetic variants used as instrumental variables directly affect only the exposure variable. If a variant directly affects more than one phenotype (i.e. has pleiotropic effects), then one of the core assumptions of MR is violated, and the strongest interpretation of the results no longer holds (Davey Smith and Hemani 2014). Note that effects on other variables which result from a change in the exposure variable (vertical pleiotropy) do not violate the assumption. Proteins are known to work in highly interconnected networks, with some functioning as part of large complexes, so variants may well affect multiple phenotypes (Solovieff et al. 2013).
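When several independent variants affect the same exposure, one common way to combine their summary statistics is the fixed-effect inverse-variance weighted (IVW) estimator, which is equivalent to a weighted average of the per-variant ratio estimates. The sketch below uses hypothetical summary statistics for four variants assumed to be uncorrelated (not in linkage disequilibrium) and first-order weights; it is one possible estimator among several, not necessarily the one used in the studies cited above.

```python
import math

def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate and its standard error.

    Equivalent to a weighted regression of beta_out on beta_exp with the intercept
    fixed at zero and weights 1 / se_out**2 (first-order weights).
    """
    w = [1.0 / s**2 for s in se_out]
    num = sum(wi * bx * by for wi, bx, by in zip(w, beta_exp, beta_out))
    den = sum(wi * bx**2 for wi, bx in zip(w, beta_exp))
    return num / den, 1.0 / math.sqrt(den)

# Hypothetical summary statistics for four uncorrelated variants (not real data).
beta_exp = [0.10, 0.08, 0.05, 0.12]      # variant-exposure associations
beta_out = [0.052, 0.037, 0.028, 0.058]  # variant-outcome associations
se_out = [0.010, 0.012, 0.015, 0.011]    # standard errors of beta_out

estimate, se = ivw(beta_exp, beta_out, se_out)
# Per-variant first-order ratio standard errors, for comparison.
single_ses = [s / abs(bx) for bx, s in zip(beta_exp, se_out)]
print(f"IVW estimate {estimate:.3f} (SE {se:.3f}); single-variant SEs: "
      + ", ".join(f"{x:.3f}" for x in single_ses))
```

The combined standard error is smaller than the first-order standard error of any single variant's ratio estimate, which is the gain in power described above.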
However, it has been suggested that Egger regression can be used to obtain a consistent estimate of the causal effect, together with an estimate of the systematic bias due to pleiotropy, provided that the pleiotropic effects of the variants are independent of their effects on the exposure (the InSIDE assumption). In Egger regression, bias due to pleiotropy is modelled in the same way as small-study bias in meta-analysis, and the average pleiotropic effect is estimated by the intercept of the regression line (Bowden, Davey Smith, and Burgess 2015). Furthermore, using multiple instrumental variables, as well as increasing power, also reduces possible bias due to pleiotropy, as it is unlikely that they all have pleiotropic effects in the same direction (Smith 2015).
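MR-Egger can be sketched as a weighted linear regression of the variant–outcome associations on the variant–exposure associations, with weights equal to one over the squared standard error of each variant–outcome association, after orienting every variant so that its variant–exposure association is positive. The slope is the causal estimate and the intercept the average directional pleiotropic effect; fixing the intercept at zero instead gives the IVW estimate. The summary statistics below are hypothetical and noise-free, constructed so that every variant has the same directional pleiotropic effect of 0.02 on top of a causal effect of 0.5 (which satisfies InSIDE). Only point estimates are computed; standard errors and the residual-variance adjustments used in practice are omitted.

```python
def mr_egger(beta_exp, beta_out, se_out):
    """MR-Egger point estimates (slope, intercept) from summary statistics.

    Each variant is first oriented so that its variant-exposure association is
    positive (both signs flipped when needed); then beta_out is regressed on
    beta_exp with an intercept, using weights 1 / se_out**2.
    """
    pairs = [(abs(bx), by if bx >= 0 else -by) for bx, by in zip(beta_exp, beta_out)]
    w = [1.0 / s**2 for s in se_out]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, pairs)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, pairs)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, pairs))
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, pairs))
    slope = sxy / sxx
    return slope, my - slope * mx  # causal estimate, average pleiotropic effect

def ivw(beta_exp, beta_out, se_out):
    """IVW estimate: the same weighted regression with the intercept fixed at zero."""
    w = [1.0 / s**2 for s in se_out]
    num = sum(wi * bx * by for wi, bx, by in zip(w, beta_exp, beta_out))
    den = sum(wi * bx**2 for wi, bx in zip(w, beta_exp))
    return num / den

# Hypothetical, noise-free summary statistics: pleiotropy 0.02 plus causal effect 0.5,
# defined relative to the exposure-increasing allele of each variant.
beta_exp = [0.10, 0.08, -0.05, 0.12, 0.03]
beta_out = [(0.02 + 0.5 * abs(bx)) * (1 if bx > 0 else -1) for bx in beta_exp]
se_out = [0.010, 0.012, 0.015, 0.011, 0.009]

egger_slope, egger_intercept = mr_egger(beta_exp, beta_out, se_out)
ivw_estimate = ivw(beta_exp, beta_out, se_out)
print(f"MR-Egger slope {egger_slope:.3f}, intercept {egger_intercept:.3f}; IVW {ivw_estimate:.3f}")
```

In this constructed example the shared pleiotropic effect pulls the IVW estimate away from the true value, whereas the Egger slope recovers it. With real, noisy data MR-Egger is considerably less precise than IVW.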
Canalisation and Intervention
Canalisation is the process whereby, during development, an organism can counteract the presence of a genetic variant to produce an invariant phenotype. For example, inhibition of myoglobin in mice disrupts myocardial function, suggesting that deleterious genetic variants could do the same. However, myoglobin knockout mice have apparently normal myocardial function, suggesting that a compensatory mechanism is present during development (Garry et al. 1998).
A similar process is human intervention. For example, if a genetic variant causes higher LDL cholesterol (LDL-C) levels, people carrying this variant are more likely to be prescribed statins, which lower LDL-C levels (Cohen, Stender, and Hobbs 2014). Hence, although MR can help to remove many confounding variables, some may still be present.
As in GWAS, if the frequencies of the genetic variants used as instrumental variables differ between populations (population stratification), this could bias the investigation and produce false positives (Burgess et al. 2012). Currently, MR studies deal with this by using single populations.
An exception to Mendel’s second law of independent assortment is linkage disequilibrium. Variants which are close together (on the order of 10 kb) are more likely to be inherited together because recombination between them is less frequent (Lawlor et al. 2007). This means that associations with other, trait-influencing genetic variants have the potential to confound MR studies, perhaps especially as associated SNPs often cluster together.
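A sketch of how population stratification can create a false positive, assuming two hypothetical subpopulations that differ in both allele frequency and average outcome, while the variant has no effect on the outcome within either subpopulation. Pooling the subpopulations produces a spurious variant–outcome association, whereas analysing each population separately does not.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x, i.e. cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(11)
n_per_pop = 50_000
# Hypothetical subpopulations: (allele frequency, mean outcome).
populations = {"pop1": (0.2, 0.0), "pop2": (0.6, 1.0)}

pooled_g, pooled_y, within = [], [], {}
for name, (freq, mean_y) in populations.items():
    g = [sum(rng.random() < freq for _ in range(2)) for _ in range(n_per_pop)]
    # Within each subpopulation the outcome does not depend on genotype at all.
    y = [mean_y + rng.gauss(0, 1) for _ in range(n_per_pop)]
    within[name] = ols_slope(g, y)
    pooled_g += g
    pooled_y += y

pooled = ols_slope(pooled_g, pooled_y)
print(f"pooled slope: {pooled:+.3f}")
for name, s in within.items():
    print(f"{name} slope: {s:+.3f}")
```

In this example, restricting the analysis to a single population, as described above, removes the spurious association.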
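The strength of linkage disequilibrium between two biallelic loci is often summarised by D = p_AB − p_A p_B and r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where p_AB is the frequency of the haplotype carrying alleles A and B. Under random mating, D shrinks by a factor (1 − c) each generation, where c is the recombination fraction. The sketch below uses hypothetical haplotype frequencies and assumes c ≈ 10⁻⁴ for variants roughly 10 kb apart (based on the rough rule of thumb of about 1 cM per Mb in humans) to show why nearby variants remain correlated for many generations while unlinked loci do not.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: population frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b
    r2 = d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

def d_after(d0, c, generations):
    """Expected D after a number of generations of random mating, recombination fraction c."""
    return d0 * (1 - c) ** generations

d0, r2 = ld_stats(p_ab=0.40, p_a=0.5, p_b=0.5)  # hypothetical haplotype frequencies
close = d_after(d0, c=1e-4, generations=100)     # assumed c for variants ~10 kb apart
unlinked = d_after(d0, c=0.5, generations=100)   # loci on different chromosomes
print(f"D = {d0:.3f}, r^2 = {r2:.2f}")
print(f"after 100 generations: linked D = {close:.4f}, unlinked D = {unlinked:.1e}")
```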
Lack of Suitable Genetic Variants
Given these restrictions on the genetic variants suitable for use as instrumental variables, it may sometimes be hard or even impossible to find a valid instrumental variable. However, with the recent growth in the number of trait-associated SNPs (figure 3), this has become much easier (Ebrahim and Davey Smith 2008; Sleiman and Grant 2010). Furthermore, it may not be obvious at first where to look for variants. For example, say you wanted to test whether organophosphates, which are found in insecticides and herbicides, cause cancer. It is not obvious that any genetic variants would predispose someone to organophosphate exposure. However, there are proteins (for example paraoxonase) which are involved in the metabolism of these compounds, and hence genetic variants affecting these proteins could alter the effective exposure by altering the half-life of the compounds within the body (Cherry et al. 2002).
Figure 3: Approximate cumulative number of trait-associated SNPs achieving genome-wide significance (p < 5 × 10⁻⁸) in the NHGRI-EBI GWAS Catalog (NHGRI-EBI 2015).
Another example concerns nutritional studies, where a number of variants exist which correlate with food intake: for example, variants in the genes for alcohol dehydrogenase, lactase, and taste receptor type 2 member 38 (TAS2R38) are strongly associated with alcohol, dairy, and bitter food intake respectively (Holmes et al. 2014; Honkanen et al. 1997).
Applications in Drug Development
As well as providing information as to which environmental exposures should be avoided in order to protect our health, MR has the potential to help reduce the attrition rate of drug development and alleviate the productivity crisis in the pharmaceutical industry (Mokry et al. 2014).
Many drugs which undergo development fail at the final stages of clinical trials due to lack of efficacy (Cook et al. 2014; Paul et al. 2010). MR is a fast and relatively cheap method which could be used to supplement the evidence used to decide whether or not large investments should be made in taking a candidate drug through clinical trials.
For example, observational epidemiology and molecular biology had suggested that higher serum HDL-C levels were protective against cardiovascular disease and that inhibiting cholesteryl ester transfer protein (CETP) could increase HDL-C. However, such inhibitors have so far failed to pass phase III clinical trials. For example, Pfizer’s torcetrapib, although it succeeded in raising HDL-C levels by almost 75%, also resulted in a 25% increase in cardiovascular disease. Dalcetrapib (Hoffmann–La Roche) and evacetrapib (Eli Lilly) have followed similar fates (Rader and Degoma 2014; Mullard 2015a). Despite this, anacetrapib (Merck) is also in phase III clinical trials. These failures represent billions of dollars of investment which could potentially have been spent more effectively. An MR study published in 2012 used a SNP in the endothelial lipase gene, as well as a genetic score, as instrumental variables with almost 21,000 cases, and concluded that “a 1 SD increase in HDL cholesterol due to genetic score was not associated with risk of myocardial infarction” (Voight et al. 2012; Harrison, Holmes, and Humphries 2012). However, they noted that a 1 SD increase in LDL-C due to a genetic score for LDL-C was associated with an increased risk of myocardial infarction. This is concordant with the fact that there are already licensed drugs which act on LDL-C (statins).
MR studies can also be used to identify drugs likely to have good efficacy, as well as to discount drugs whose targets have no causal link to disease. For example, an MR study which used a nonsense mutation in PCSK9 as an instrumental variable to study the effect of LDL-C on coronary heart disease led to the development of anti-PCSK9 antibodies, which were recently licensed (Cohen et al. 2006; Mullard 2015b). In a similar vein, another study found that “Loss-of-function mutations in APOC3 were associated with low levels of triglycerides and a reduced risk of ischemic cardiovascular disease”, and there is currently an APOC3 inhibitor in phase II clinical trials (Jørgensen et al. 2014; Heart 2014).
Predicting Side Effects
MR studies can also help to determine whether a drug is likely to have undesirable on-target side effects. For example, statin use has been associated with an increased risk of type 2 diabetes. In an MR study, variants in HMGCR, the gene encoding HMG-CoA reductase, that lower LDL-C were also associated with an increased risk of type 2 diabetes; hence this side effect is likely to be an on-target effect that would result from any drug targeting this protein (Swerdlow et al. 2015).
Discovery and Repositioning
MR studies can also be used to find new causal associations, which has the potential both to identify new drug targets and to repurpose current drugs for other conditions. Repurposing has the advantage that preclinical research and phase I safety trials have already been completed (Ashburn and Thor 2004).
For example, an antibody against the interleukin-6 receptor (tocilizumab) has been approved for rheumatoid arthritis. However, variants in the IL6R gene are also associated with coronary heart disease, suggesting that tocilizumab could be repurposed for this condition (IL6-MR-Consortium 2012).
A variant of the PheWAS (phenome-wide association study) methodology incorporating an MR approach has recently been proposed. MR studies are usually hypothesis-driven, which can bias the study itself and encourage publication bias, much like candidate gene studies in the pre-GWAS era. The MR-PheWAS method instead uses a “hypothesis-free” approach in which a given exposure is tested as a potential cause of many different outcomes using associated genetic variants (Millard et al. 2015).
One caveat in using MR to find new drug targets is that modifying some exposures, although they are causal in the development of a disease, will not reverse that disease once it has developed. For example, smoking intensity is causal for lung cancer, but stopping smoking after cancer has developed will not remove the cancer. Furthermore, not all targets found by MR approaches will be druggable (Mokry et al. 2014).
Mendelian randomisation is a promising approach which uses the vast information uncovered by GWAS to improve both public health policy and pharmaceutical development. However, some uncertainty remains about the validity of the technique’s assumptions (Pickrell 2015). Despite this, it seems likely that new variants of the method, as well as more causal predictions which go on to be verified by RCTs, will allay these fears (Evans and Davey Smith 2015).
Ashburn, Ted T, and Karl B Thor. 2004. “Drug repositioning: identifying and developing new uses for existing drugs.” Nature Reviews. Drug Discovery 3 (8): 673–83. doi:10.1038/nrd1468.
Bowden, Jack, George Davey Smith, and Stephen Burgess. 2015. “Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression.” International Journal of Epidemiology 44 (2): 512–25. doi:10.1093/ije/dyv080.
Burgess, S., D. S. Small, and S. G. Thompson. 2015. “A review of instrumental variable estimators for Mendelian randomization.” Statistical Methods in Medical Research, 1–26. doi:10.1177/0962280215597579.
Burgess, S., N. J. Timpson, S. Ebrahim, and G. Davey Smith. 2015. “Mendelian randomization: where are we now and where are we going?” International Journal of Epidemiology 44 (2): 379–88. doi:10.1093/ije/dyv108.
Burgess, Stephen, Adam Butterworth, Anders Malarstig, and Simon G Thompson. 2012. “Use of Mendelian randomisation to assess potential benefit of clinical intervention.” BMJ (Clinical Research Ed.) 345 (November): e7325. doi:10.1136/bmj.e7325.
Burgess, Stephen, Robert A Scott, Nicholas J Timpson, George Davey Smith, and Simon G Thompson. 2015. “Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors.” European Journal of Epidemiology. Springer Netherlands, 543–52. doi:10.1007/s10654-015-0011-z.
Cherry, Nicola, Mike Mackness, Paul Durrington, Andrew Povey, Martin Dippnall, Ted Smith, and Bharti Mackness. 2002. “Paraoxonase (PON1) polymorphisms in farmers attributing ill health to sheep dip.” Lancet 359 (9308): 763–64. doi:10.1016/S0140-6736(02)07847-9.
Cohen, Jonathan C, Eric Boerwinkle, Thomas H Mosley, and Helen H Hobbs. 2006. “Sequence Variations in PCSK9, Low LDL, and Protection against Coronary Heart Disease.” New England Journal of Medicine 354 (12): 1264–72. doi:10.1056/NEJMoa054013.
Cohen, Jonathan C., Stefan Stender, and Helen H. Hobbs. 2014. “APOC3, Coronary Disease, and Complexities of Mendelian Randomization.” Cell Metabolism 20 (3). Elsevier Inc.: 387–89. doi:10.1016/j.cmet.2014.08.007.
Cole, S. R., R. W. Platt, E. F. Schisterman, H. Chu, D. Westreich, D. Richardson, and C. Poole. 2010. “Illustrating bias due to conditioning on a collider.” International Journal of Epidemiology 39 (2): 417–20. doi:10.1093/ije/dyp334.
Cook, David, Dearg Brown, Robert Alexander, Ruth March, Paul Morgan, Gemma Satterthwaite, and Menelas N Pangalos. 2014. “Lessons learned from the fate of AstraZeneca’s drug pipeline: a five-dimensional framework.” Nature Reviews. Drug Discovery 13 (6). Nature Publishing Group: 419–31. doi:10.1038/nrd4309.
Davey Smith, G., and G. Hemani. 2014. “Mendelian randomization: genetic anchors for causal inference in epidemiological studies.” Human Molecular Genetics 23 (R1): R89–98. doi:10.1093/hmg/ddu328.
Davey Smith, George. 2007. “Capitalizing on Mendelian randomization to assess the effects of treatments.” Journal of the Royal Society of Medicine 100 (9): 432–5. doi:10.1258/jrsm.100.9.432.
Didelez, Vanessa, and Nuala A. Sheehan. 2007. “Mendelian Randomization as an Instrumental Variable Approach to Causal Inference.” Statistical Methods in Medical Research 16: 309–30. doi:10.1177/0962280206077743.
Ebrahim, Shah, and George Davey Smith. 2008. “Mendelian randomization: can genetic epidemiology help redress the failures of observational epidemiology?” Human Genetics 123 (1): 15–33. doi:10.1007/s00439-007-0448-6.
Evans, David M., and George Davey Smith. 2015. “Mendelian Randomization: New Applications in the Coming Age of Hypothesis-Free Causality.” Annual Review of Genomics and Human Genetics 16 (1): 327–50. doi:10.1146/annurev-genom-090314-050016.
Garry, D J, G A Ordway, J N Lorenz, N B Radford, E R Chin, R W Grange, R Bassel-Duby, and R S Williams. 1998. “Mice without myoglobin.” Nature 395 (6705): 905–8. doi:10.1038/27681.
Harrison, Seamus C., Michael V. Holmes, and Steve E. Humphries. 2012. “Mendelian randomisation, lipids, and cardiovascular disease.” The Lancet 380 (9841): 543–45. doi:10.1016/S0140-6736(12)60481-4.
Heart, National. 2014. “Loss-of-Function Mutations in APOC3, Triglycerides, and Coronary Disease.” New England Journal of Medicine 371 (1): 22–31. doi:10.1056/NEJMoa1307095.
Holmes, M. V., C. E. Dale, L. Zuccolo, R. J. Silverwood, Y. Guo, Z. Ye, D. Prieto-Merino, et al. 2014. “Association between alcohol and cardiovascular disease: Mendelian randomisation analysis based on individual participant data.” BMJ 349: g4164. doi:10.1136/bmj.g4164.
Honkanen, R., H. Kröger, E. Alhava, P. Turpeinen, M. Tuppurainen, and S. Saarikoski. 1997. “Lactose intolerance associated with fractures of weight-bearing bones in finnish women aged 38–57 years.” Bone 21 (6): 473–77. doi:10.1016/S8756-3282(97)00172-5.
IL6-MR-Consortium. 2012. “The interleukin-6 receptor as a target for prevention of coronary heart disease: a mendelian randomisation analysis.” The Lancet 379 (9822). Elsevier Ltd: 1214–24. doi:10.1016/S0140-6736(12)60110-X.
Jørgensen, Anders Berg, Ruth Frikke-Schmidt, Børge G. Nordestgaard, and Anne Tybjærg-Hansen. 2014. “Loss-of-Function Mutations in APOC3 and Risk of Ischemic Vascular Disease.” New England Journal of Medicine 371 (1): 32–41. doi:10.1056/NEJMoa1308027.
Katan, Martijn B. 2004. “Apolipoprotein E isoforms, serum cholesterol, and cancer.” International Journal of Epidemiology 33 (1): 9. doi:10.1093/ije/dyh312.
Lawlor, Debbie A., Roger M. Harbord, Jonathan A. C. Sterne, Nic Timpson, and George Davey Smith. 2007. “Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology.” Statistics in Medicine 27: 1133–63. doi:10.1002/sim.3034.
Millard, Louise A. C., Neil M. Davies, Nic J. Timpson, Kate Tilling, Peter A. Flach, and George Davey Smith. 2015. “MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization.” Scientific Reports 5 (February). Nature Publishing Group: 16645. doi:10.1038/srep16645.
Mokry, Lauren E, Omar Ahmad, Vincenzo Forgetta, George Thanassoulis, and J Brent Richards. 2014. “Mendelian randomisation applied to drug development in cardiovascular disease: a review.” Journal of Medical Genetics 52 (2): 71–79. doi:10.1136/jmedgenet-2014-102438.
Mullard, Asher. 2015a. “CETP set-back, again.” Nature Reviews Drug Discovery 14 (11). Nature Publishing Group: 739–39. doi:10.1038/nrd4781.
Mullard, Asher. 2015b. “PCSK9 inhibitors are go.” Nature Reviews Drug Discovery 14 (9). Nature Publishing Group: 593–93. doi:10.1038/nrd4730.
NHGRI-EBI. 2015. “GWAS Catalog.” http://www.ebi.ac.uk/gwas/docs/downloads.
Paul, Steven M, Daniel S Mytelka, Christopher T Dunwiddie, Charles C Persinger, Bernard H Munos, Stacy R Lindborg, and Aaron L Schacht. 2010. “How to improve R&D productivity: the pharmaceutical industry’s grand challenge.” Nature Reviews. Drug Discovery 9 (3): 203–14. doi:10.1038/nrd3078.
Pickrell, Joseph. 2015. “Fulfilling the promise of Mendelian randomization.” BioRxiv, 018150. doi:10.1101/018150.
Pierce, Brandon L., and Stephen Burgess. 2013. “Efficient design for mendelian randomization studies: Subsample and 2-sample instrumental variable estimators.” American Journal of Epidemiology 178 (7): 1177–84. doi:10.1093/aje/kwt084.
Rader, D J, and E M Degoma. 2014. “Future of cholesteryl ester transfer protein inhibitors.” Annual Review of Medicine 65: 385–403. doi:10.1146/annurev-med-050311-163305.
Sleiman, Patrick M A, and Struan F A Grant. 2010. “Mendelian randomization in the era of genomewide association studies.” Clinical Chemistry 56 (5): 723–8. doi:10.1373/clinchem.2009.141564.
Smith, George D., and Shah Ebrahim. 2003. “’Mendelian randomization’: Can genetic epidemiology contribute to understanding environmental determinants of disease?” International Journal of Epidemiology 32 (1): 1–22. doi:10.1093/ije/dyg070.
Smith, George Davey. 2015. “Mendelian randomization : a premature burial?” BioRxiv. doi:10.1101/021386.
Solovieff, Nadia, Chris Cotsapas, Phil H. Lee, Shaun M. Purcell, and Jordan W. Smoller. 2013. “Pleiotropy in complex traits: challenges and strategies.” Nature Reviews Genetics 14 (7). Nature Publishing Group: 483–95. doi:10.1038/nrg3461.
Swerdlow, Daniel I, David Preiss, Karoline B Kuchenbaecker, Michael V Holmes, Jorgen E L Engmann, Tina Shah, Reecha Sofat, et al. 2015. “HMG-coenzyme A reductase inhibition, type 2 diabetes, and bodyweight: evidence from genetic analysis and randomised trials.” The Lancet 385 (9965): 351–61. doi:10.1016/S0140-6736(14)61183-1.
Taubes, Gary. 1995. “Epidemiology Faces Its Limits.” Science 269: 164–69.
Trichopoulos, Dimitrios. 1995. “The Discipline of Epidemiology.” Science, no. 2.
US National Library of Medicine. 2015. “PubMed Search.” http://www.ncbi.nlm.nih.gov/pubmed/?term=%22mendelian+randomization%22+or+%22mendelian+randomisation%22.
Voight, Benjamin F, Gina M Peloso, Marju Orho-Melander, Ruth Frikke-Schmidt, Maja Barbalic, Majken K Jensen, George Hindy, et al. 2012. “Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study.” Lancet 380 (9841): 572–80. doi:10.1016/S0140-6736(12)60312-2.
This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.