John Carlisle is a British anaesthesiologist who works at the seaside Torbay Hospital near Exeter, on the English Channel. Despite holding no professorship or any academic post at all, he is a legend in medical research, because his formidable statistical skills and his fearlessness in using them exposed the scientific fraud of several of his esteemed anaesthesiologist colleagues and professors: the retraction record holder Yoshitaka Fujii and his partner Yuhji Saitoh, as well as Scott Reuben and Joachim Boldt. His method needs no access to the original data: the numbers presented in the published paper suffice to check whether they are actually real. Carlisle was also fortunate to have the support of his journal, Anaesthesia, when his methodology found evidence of data manipulation in clinical trials it had published. Now, the editor Carlisle has dropped a major bomb by exposing many likely rigged clinical trial publications not only in his own Anaesthesia, but in five more anaesthesiology journals and two “general” ones, the stellar medical research outlets NEJM and JAMA. The clinical trials exposed in the latter two for their unrealistic statistics therefore come from various fields of medicine, not just anaesthesiology. The medical publishing scandal Carlisle has now caused is complete, and the elite journals had no choice but to announce investigations, which they even intend to coordinate. Time will show how seriously their effort is meant.
Carlisle’s bombshell paper “Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals” was published today in Anaesthesia, Carlisle 2017, DOI: 10.1111/anae.13962. It is accompanied by an explanatory editorial, Loadsman & McCulloch 2017, DOI: 10.1111/anae.13938. A Guardian article written by Stephen Buranyi provides the details. There is also another, earlier editorial in Anaesthesia, which explains Carlisle’s methodology rather well (Pandit, 2012).
What Carlisle tested was how realistic the variability reported for the baselines of the patient cohorts in a study is. Such baselines are categorical parameters like sex or the presence vs absence of a certain illness, or continuous parameters like the patients’ body weight, blood pressure or other measurable physiological values. When patients are distributed into separate cohorts for the purpose of a clinical trial (say, one to receive an intervention and the other serving as control), an objective triallist should allocate them randomly, which means the mean values in each cohort should be similar. These numbers are of course easy to fake. The statistical trap for cheaters lies elsewhere: in faking a realistic variance (V), the average squared deviation from the mean (or the standard deviation, SD, which is sqrt(V)). The standard error gets smaller the bigger the sample gets; this is also why clinical trials involving only a handful of patients (say, 10 or 30 participants) are to be taken with the greatest caution: a single “outlier” or wrongly measured patient can skew the entire analysis, and the error is simply too big for the applied therapeutic effect to appear significant at all. Of course, you might obtain no effect despite a very large trial cohort simply because your clinical intervention (your pills or therapy) does not work at all. In that case, once you have decided to fake your baseline values to procure some significance, you will find it very hard to fake standard deviations that look anywhere near realistic. Of course, hardly any peer reviewer or statistics editor would check those during peer review anyway. But Carlisle now did check, and for the first time he did so not just for his own journal Anaesthesia, but for a bunch of other journals, which certainly did not ask for (and probably did not welcome) this kind of post-publication peer review.
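The shrinking of the standard error with sample size is easy to see in a quick simulation (a minimal illustrative sketch, not Carlisle’s code; the population values, a mean body weight of 70 kg with SD 10, are invented for the example):

```python
import random
import statistics

random.seed(0)

def sem_of_sample_means(n, trials=2000):
    """Empirical standard deviation of the sample mean for cohorts of
    size n, drawn from a population with mean 70 and SD 10 (think of
    body weight in kg). Theory says it shrinks like SD / sqrt(n)."""
    means = [statistics.mean(random.gauss(70, 10) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# Quadrupling the cohort size halves the standard error of the mean:
print(sem_of_sample_means(10))  # close to 10 / sqrt(10), i.e. about 3.2
print(sem_of_sample_means(40))  # close to 10 / sqrt(40), i.e. about 1.6
```

This is why a tiny trial can neither show a convincing effect nor, conversely, produce the suspiciously tidy baseline agreement that Carlisle’s screen looks for.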
In a clinical trial paper, you must declare the actual number of your trial participants (100 people, 1000, 2000) in total and for each intervention group, and once you have done this, the standard deviations for the patient cohorts can easily be calculated and compared with the ones you provided in your publication. The editorial published in parallel explains:
“Essentially, Carlisle’s method identifies papers in which the baseline characteristics (e.g. age, weight) exhibit either too narrow or too wide a distribution than expected by chance, resulting in an excess of p values close to either one or zero”.
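The logic behind this can be illustrated with a small simulation (a sketch of the idea only, not Carlisle’s actual method, which handles many more cases; the age distribution used, mean 60 and SD 12, is made up): if two cohorts really are drawn from the same patient population, the p-value of a baseline comparison is uniformly distributed between 0 and 1, so a pile-up near 0 or 1 across many trials is a red flag.

```python
import math
import random
import statistics

random.seed(1)

def baseline_p_value(n=50):
    """Honest random allocation: draw one baseline variable (say, age)
    for two groups from the SAME population, and return the p-value of
    a two-sample t-test on the difference in group means."""
    a = [random.gauss(60, 12) for _ in range(n)]
    b = [random.gauss(60, 12) for _ in range(n)]
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        statistics.variance(a) / n + statistics.variance(b) / n)
    # Normal approximation of the t distribution (fine for n = 50)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

p_values = [baseline_p_value() for _ in range(5000)]
# Under genuine randomisation roughly 10% of p-values land in each
# decile; fabricated baselines produce an excess near 0 (groups too
# different) or near 1 (groups suspiciously similar).
print(sum(p < 0.1 for p in p_values) / len(p_values))
```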
Carlisle now applied this statistical screen to over 5000 clinical trials in different journals, and noticed that in around 1-2% of them the paired baseline SD values made absolutely no sense at all, even with the maximum of goodwill. He even accounted for all kinds of possible author mistakes and typos, to allow for the possibility of an honest error. And he set his threshold extremely stringently, at p < 0.0001, to make sure that anything caught there could never, under any realistic circumstances, reflect the proclaimed random distribution of patients. The authors either rigged the distribution of their control and intervention cohorts, or they simply faked the numbers retrospectively, all to feign a significant effect of their clinical intervention. As already mentioned, one needs rather advanced statistical skills to fake SD values that seem realistic in such a case. Here is Carlisle’s description of his method:
“I extracted baseline summary data for continuous variables, reported as mean (SD) or mean (SEM). I did not study trials for which participant allocation was not described as random, or trials that did not report baseline continuous variables, or those that reported a different summary measure, such as median (IQR or range). I defined ‘baseline’ as a variable measured before groups were exposed to the allocated intervention, variables such as age, height, ‘baseline’ blood pressure or serum sodium concentration. I excluded variables that had been stratified. I recorded whether the allocation sequence had been generated in blocks, permuted or otherwise, which could reduce the distribution of means for time-varying measurements”.
There was also a very peculiar control sample Carlisle used: retracted clinical trials, like those of Fujii, where data manipulation was either admitted or to be expected. Carlisle took them as a gold standard to calibrate his analysis. Some of these retracted papers in fact missed the threshold of p < 0.0001 and were undetectable through this generously set gate, which rather indicates how badly manipulated the other papers that did exceed the threshold must be. As Carlisle put it:
“Some p values were so extreme that the baseline data could not be correct: for instance, for 43/5015 unretracted trials the probability was less than 1 in 10^15 (equivalent to one drop of water in 20,000 Olympic-sized swimming pools)”.
What about those which escaped, then? Carlisle offered another way to catch potential cheaters: once suspicious but not yet conclusive values emerge for the same author across several independent papers, a further statistical analysis can be applied to prove fraud.
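One standard way to pool suspicion across independent papers is Fisher’s method for combining p-values; whether this is the exact analysis Carlisle has in mind is my assumption, but it illustrates the principle that several individually unremarkable results can be damning together:

```python
import math

def fisher_combine(p_values):
    """Fisher's method: under the null hypothesis, -2 * sum(ln p_i)
    follows a chi-squared distribution with 2k degrees of freedom."""
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # The chi-squared survival function for even degrees of freedom 2k
    # has a closed form: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (stat / 2.0) / i
        total += term
    return math.exp(-stat / 2.0) * total

# Three baseline comparisons, none conclusive on its own, combine to a
# p-value that is much harder to explain away:
print(fisher_combine([0.04, 0.03, 0.05]))  # ≈ 0.0035
```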
These are the papers and journals which Carlisle analysed:
“I wanted to determine whether data distributions in trials published in specialist anaesthetic journals have been different to distributions in non-specialist medical journals. I analysed the distribution of 72,261 means of 29,789 variables in 5087 randomised, controlled trials published in eight journals between January 2000 and December 2015: Anaesthesia (399); Anesthesia and Analgesia (1288); Anesthesiology (541); British Journal of Anaesthesia (618); Canadian Journal of Anesthesia (384); European Journal of Anaesthesiology (404); Journal of the American Medical Association (518) and New England Journal of Medicine (935). I chose these journals as I had electronic access to the full text”.
Obviously, a subscription paywall is a great tool to hide fraud behind. It apparently did prevent Carlisle from accessing a number of publications, given his hospital’s limited access to the medical literature. This might be one reason why medical publishing is so reluctant to move to open access. Carlisle concludes:
“Fraud, unintentional error, correlation, stratified allocation and poor methodology might have contributed to the excess of randomised, controlled trials with similar or dissimilar means, a pattern that was common to all the surveyed journals. It is likely that this work will lead to the identification, correction and retraction of hitherto unretracted randomised, controlled trials”.
The papers are in fact all easily identifiable using Supplementary Table 1 of the Carlisle 2017 paper; the ones at the top are the most problematic, namely those with the most ridiculously unrealistic distributions of baseline values. Together with some explanations about the analysed values and the authors’ potential sources of error, Carlisle provides the year, issue and page number of each analysed publication for every journal. Entering these into an internet search immediately yields the exact publication in question. There is no escape; the information about the phony data in a number of clinical trials is out there. As Loadsman & McCulloch wrote in their editorial, inviting journal editors to correct and retract unreliable papers:
“Each editor only has to work his/her way down the list. We cannot say at what point in the list editors should desist, and the journals will need to exercise their own discretion”.
As examples, I list below some papers straight off the top of the lists for JAMA and NEJM, which I could unmask in this way. The JAMA list even includes one paper retracted for fraud; another paper had a correction of its baseline values issued. Maybe someone can even make an automatic conversion of Carlisle’s Excel document, with hyperlinks to the papers? In any case, the affected journals cannot easily ignore this, and I will be updating this text below with their responses.
Update 7.06.2017: in this added comment, I now also list the anaesthesiology journal papers which Carlisle found most problematic (p<0.00001).
- Effect of Metformin and Rosiglitazone Combination Therapy in Patients With Type 2 Diabetes Mellitus A Randomized Controlled Trial
Vivian Fonseca, MD; Julio Rosenstock, MD; Rita Patwardhan, PhD; Alan Salzman, MD, PhD
JAMA. 2000;283(13):1695-1702. doi:10.1001/jama.283.13.1695
- Ketoconazole for Early Treatment of Acute Lung Injury and Acute Respiratory Distress Syndrome A Randomized Controlled Trial
The ARDS Network Authors for the ARDS Network
JAMA. 2000;283(15):1995-2002. doi: 10.1001/jama.283.15.1995
- Management of Chronic Tension-Type Headache With Tricyclic Antidepressant Medication, Stress Management Therapy, and Their Combination A Randomized Controlled Trial
Kenneth A. Holroyd, PhD; Francis J. O’Donnell, DO; Michael Stensland, MS; Gay L. Lipchik, PhD; Gary E. Cordingley, MD, PhD; Bruce W. Carlson, PhD
JAMA. 2001;285(17):2208-2215. doi:10.1001/jama.285.17.2208
- Effect of Blood Pressure Lowering and Antihypertensive Drug Class on Progression of Hypertensive Kidney Disease Results From the AASK Trial
Jackson T. Wright, Jr, MD, PhD; George Bakris, MD; Tom Greene, PhD; Larry Y. Agodoa, MD; Lawrence J. Appel, MD, MPH; Jeanne Charleston, RN; DeAnna Cheek, MD; Janice G. Douglas-Baltimore, MD; Jennifer Gassman, PhD; Richard Glassock, MD; Lee Hebert, MD; Kenneth Jamerson, MD; Julia Lewis, MD; Robert A. Phillips, MD, PhD; Robert D. Toto, MD; John P. Middleton, MD; Stephen G. Rostand, MD; for the African American Study of Kidney Disease and Hypertension Study Group
JAMA. 2002;288(19):2421-2431. doi:10.1001/jama.288.19.2421
- Impact of Electron Beam Tomography, With or Without Case Management, on Motivation, Behavioral Change, and Cardiovascular Risk Profile A Randomized Controlled Trial
Patrick G. O’Malley, MD, MPH; Irwin M. Feuerstein, MD; Allen J. Taylor, MD
JAMA. 2003;289(17):2215-2223. doi: 10.1001/jama.289.17.2215
- Effect of Testosterone Supplementation on Functional Mobility, Cognition, and Other Parameters in Older Men A Randomized Controlled Trial
Marielle H. Emmelot-Vonk, MD; Harald J. J. Verhaar, MD, PhD; Hamid R. Nakhai Pour, MD, PhD; André Aleman, PhD; Tycho M. T. W. Lock, MD; J. L. H. Ruud Bosch, MD, PhD; Diederick E. Grobbee, MD, PhD; Yvonne T. van der Schouw, PhD
JAMA. 2008;299(1):39-52. doi:10.1001/jama.2007.51
- Effect of Physical Activity on Cognitive Function in Older Adults at Risk for Alzheimer Disease A Randomized Trial
Nicola T. Lautenschlager, MD; Kay L. Cox, PhD; Leon Flicker, MBBS, PhD; Jonathan K. Foster, DPhil; Frank M. van Bockxmeer, PhD; Jianguo Xiao, MD, PhD; Kathryn R. Greenop, PhD; Osvaldo P. Almeida, MD, PhD
JAMA. 2008;300(9):1027-1037. doi:10.1001/jama.300.9.1027
Incorrect Data (JAMA, January 21, 2009—Vol 301, No. 3): In the Original Contribution entitled “Effect of Physical Activity on Cognitive Function in Older Adults at Risk for Alzheimer Disease: A Randomized Trial” published in the September 3, 2008, issue of JAMA (2008;300: 1027-1037), incorrect data were reported in Table 7, which appears on page 1036. In the row “Total ADAS-Cog [Alzheimer Disease Assessment Scale–Cognitive Subscale]” and in the “Control Group” columns, the geometric mean (SD) for the completers should have been “6.4 (1.8)” and “10.6 (1.4)” for the dropouts.
- Laparoscopic Adjustable Gastric Banding in Severely Obese Adolescents A Randomized Trial
Paul E. O’Brien, MD, FRACS; Susan M. Sawyer, MBBS, MD, FRACP; Cheryl Laurie, RN, BHSc; Wendy A. Brown, MBBS, PhD, FRACS; Stewart Skinner, MBBS, PhD, FRACS; Friederike Veit, MBBS, MD, FRACP; Eldho Paul, MSc; Paul R. Burton, MBBS, FRACS; Melanie McGrice, BSc, M Nutr Diet; Margaret Anderson, BHIM, Grad Dip HA; John B. Dixon, MBBS, PhD, FRACGP
JAMA. 2010;303(6):519-526. doi:10.1001/jama.2010.81
- Cognitive Behavioral Therapy for Treatment of Chronic Primary Insomnia A Randomized Controlled Trial
Jack D. Edinger, PhD; William K. Wohlgemuth, PhD; Rodney A. Radtke, MD; Gail R. Marsh, PhD; Ruth E. Quillian, PhD
JAMA. 2001;285(14):1856-1864. doi:10.1001/jama.285.14.1856
- Chemoembolization Combined With Radiofrequency Ablation for Patients With Hepatocellular Carcinoma Larger Than 3 cm A Randomized Controlled Trial
Bao-Quan Cheng, MD, PhD; Chong-Qi Jia, PhD; Chun-Tao Liu, MD; Wei Fan, MD; Qing-Liang Wang, MD; Zong-Li Zhang, MD, PhD; Cui-Hua Yi, MD, PhD
JAMA. 2008;299(14):1669-1677. doi:10.1001/jama.299.14.1669
Retraction: JAMA. 2009;301(18):1931. doi:10.1001/jama.2009.640
- High-Dose Atorvastatin after Stroke or Transient Ischemic Attack
The Stroke Prevention by Aggressive Reduction in Cholesterol Levels (SPARCL) Investigators
N Engl J Med 2006; 355:549-559 August 10, 2006 DOI: 10.1056/NEJMoa061894
- Horse versus Rabbit Antithymocyte Globulin in Acquired Aplastic Anemia
Phillip Scheinberg, M.D., Olga Nunez, R.N., B.S.N., Barbara Weinstein, R.N., Priscila Scheinberg, M.S., Angélique Biancotto, Ph.D., Colin O. Wu, Ph.D., and Neal S. Young, M.D.
N Engl J Med 2011; 365:430-438 August 4, 2011 DOI: 10.1056/NEJMoa1103975
- Primary Prevention of Cardiovascular Disease with a Mediterranean Diet
Ramón Estruch, M.D., Ph.D., Emilio Ros, M.D., Ph.D., Jordi Salas-Salvadó, M.D., Ph.D., Maria-Isabel Covas, D.Pharm., Ph.D., Dolores Corella, D.Pharm., Ph.D., Fernando Arós, M.D., Ph.D., Enrique Gómez-Gracia, M.D., Ph.D., Valentina Ruiz-Gutiérrez, Ph.D., Miquel Fiol, M.D., Ph.D., José Lapetra, M.D., Ph.D., Rosa Maria Lamuela-Raventos, D.Pharm., Ph.D., Lluís Serra-Majem, M.D., Ph.D., Xavier Pintó, M.D., Ph.D., Josep Basora, M.D., Ph.D., Miguel Angel Muñoz, M.D., Ph.D., José V. Sorlí, M.D., Ph.D., José Alfredo Martínez, D.Pharm, M.D., Ph.D., and Miguel Angel Martínez-González, M.D., Ph.D., for the PREDIMED Study Investigators*
N Engl J Med 2013; 368:1279-1290 April 4, 2013 DOI: 10.1056/NEJMoa1200303
- Extended Antiretroviral Prophylaxis to Reduce Breast-Milk HIV-1 Transmission
Newton I. Kumwenda, Ph.D., Donald R. Hoover, Ph.D., Lynne M. Mofenson, M.D., Michael C. Thigpen, M.D., George Kafulafula, M.B., B.S., Qing Li, M.Sc., Linda Mipando, M.Sc., Kondwani Nkanaunena, M.Sc., Tsedal Mebrahtu, Sc.M., Marc Bulterys, M.D., Ph.D., Mary Glenn Fowler, M.D., M.P.H., and Taha E. Taha, M.D., Ph.D.
N Engl J Med 2008; 359:119-129 July 10, 2008 DOI: 10.1056/NEJMoa0801941
- Treatment of Periodontitis and Endothelial Function
Maurizio S. Tonetti, D.M.D., Ph.D., Francesco D’Aiuto, D.M.D., Ph.D., Luigi Nibali, D.M.D., Ph.D., Ann Donald, Clare Storry, B.Sc., Mohamed Parkar, M.Phil., Jean Suvan, M.Sc., Aroon D. Hingorani, Ph.D., Patrick Vallance, M.D., and John Deanfield, M.B., B.Chir.
N Engl J Med 2007; 356:911-920 March 1, 2007 DOI: 10.1056/NEJMoa063186
- Effect of Bronchoconstriction on Airway Remodeling in Asthma
Christopher L. Grainge, Ph.D., Laurie C.K. Lau, Ph.D., Jonathon A. Ward, B.Sc., Valdeep Dulay, B.Sc., Gemma Lahiff, B.Sc., Susan Wilson, Ph.D., Stephen Holgate, D.M., Donna E. Davies, Ph.D., and Peter H. Howarth, D.M.
N Engl J Med 2011; 364:2006-2015 May 26, 2011 DOI: 10.1056/NEJMoa1014350
Updates about journal replies
Reply from Howard Bauchner, Editor-in-Chief, JAMA and The JAMA Network:
“We receive numerous allegations about various issues related to the articles we publish.
This allegation will be treated in a similar manner. We will assess the validity of the allegation, and potentially contact the individual making the allegation for more information, or the author of the article.
Authors are always offered the option to respond to allegations that are deemed valid”
Reply from Jeffrey Drazen, Editor-in-Chief New England Journal of Medicine (NEJM):
“We are in the process of studying Dr. Carlisle’s methods and examining the points raised by him”.
Reply from Martin Tramèr, Editor-in-Chief European Journal of Anaesthesiology (EJA):
“It is about possible fraud, not about accusations or evidence.
We will look into this”.
Reply from Hugh Hemmings, Editor-in-Chief of the British Journal of Anaesthesia:
“We are currently reviewing the study by Carlisle and the data in question to determine our course of action. While concerning, there are no allegations of fraud, so we will have to carefully review the data before proceeding”.
Reply from Hilary Grocott, Editor-in-Chief, Canadian Journal of Anesthesia:
“We take this matter very seriously and intend to investigate”.
Reply from Andrew Klein, Editor-in-Chief of Anaesthesia:
“Our journal, Anaesthesia, intends to contact the authors of the trials identified by this study, so that we can discuss with them any errors or issues with the published data in line with COPE guidance.
The six Editors-in-Chief of the anaesthetic journals all met together yesterday (05 June 2017) to discuss the Carlisle paper and their approach following its publication, and we will be following up with them individually and as a group over the next few weeks and months.
I have received further emails from the NEJM and JAMA following the publication of the article and will also keep in touch with them both. I cannot comment as to what each Editor-in-Chief chooses to do and how exactly they will proceed as I do not know – I have explained to you our intentions at Anaesthesia. However, be assured I will be following this up, as above. I would point out the accompanying editorial to the Carlisle paper in our journal (Widening the search for suspect data – is the flood of retractions about to become a tsunami?) which discusses how journals may proceed further.”
Reply from Evan Kharasch, Editor-in-Chief Anesthesiology:
“Upon my return we will be evaluating and analyzing the article by Dr. Carlisle, and those articles published by Anesthesiology cited therein, to determine if there were any issues and whether any actions are indicated”.
Reply (after a reminder) from Jean-Francois Pittet, Editor-in-Chief Anesthesia and Analgesia:
“We will be evaluating and analyzing the methodology and findings of Dr. Carlisle, and specifically, those articles that were published by Anesthesia & Analgesia and cited therein, to determine if there are any issues and whether any actions are indicated”.