"MetroHealth, Explorys use huge patient database to revolutionize medical research"

As I have written in the past, beware claims of "revolutions" where healthcare or medical research are involved. In a slide in my recent presentation to the Health Informatics Society of Australia on healthcare IT trust, I asked:

[slide image not reproduced]

In that same talk, I pointed out that "revolutions" usually have downsides, and that IT always produces winners and losers (per the empirical research of Social Informatics).

I also asked "have we suffered a complete breakdown in the scientific method with regard to EHR and clinical IT?" in a 2009 post on uncontrolled EHR data and comparative effectiveness studies at this link.  

In the Aug. 29, 2012 article "MetroHealth, Explorys use huge patient database to revolutionize medical research" in The Plain Dealer (Ohio), the following claims are proffered. I am assuming that, as with newspaper articles I contribute to, the researchers were involved in this article's provenance and content:

Large databases of electronic medical records hold great promise for medical research. In theory they can provide doctors access to huge amounts of anonymous patient data, allowing large-scale population studies without the cost and hassle of patient recruitment, review boards and staff training.

Now, a team of data experts at MetroHealth Medical Center and the Cleveland Clinic Innovations spinoff company Explorys has shown just how powerful such medical records can be: In three months, they've replicated a major medical study that took a Norwegian team 14 years to research and report. And they've done it at a fraction of the cost, with a sample about 40 times as large.

The local effort, led by MetroHealth Chief Medical Informatics Officer Dr. David Kaelber, was possible because of Explorys' database of 14 million electronic medical records gathered from 12 major health systems.

14 million records gathered from 12 major health systems, which may in fact be using disparate EHR systems or versions, is almost indisputably not controlled data.

A comparison to a more rigorous study was conceptualized and performed:

Most of all though, he [Kaelber] needed to prove that it worked -- that he could get exactly the same results as traditional research studies that required more time and resources.

Enter the Norwegian study. In 1994, a group of researchers started registering 26,714 people in the northern region's largest city, Tromso. It was part of a much longer study on the population's heart disease risk. They recorded height, weight and other measures of obesity, then followed participants for 13 years, recording any blood clots they had.

Their conclusion: the combination of obesity and a tall stature significantly increases risk of blood clots, especially in men.

The same hypothesis was tested with the uncontrolled EHR data, with claims that (obviously) the time and expense were lower, at least in the instant sense. The costs of EHR implementation and maintenance across 12 major health systems could run into the billions of dollars, at a time when a single health system can easily spend $100 million over just a few years, as Bob Wachter points out here (one hundred million dollars for UCSF):

... The sample: 959,030 patients with medical records in the Explorys platform. The method: using software to search for blood clots in the health and claims records contained in the database, and then looking for patterns in those patients' height and weight. The result: Exactly the same as the Tromso study.

"And actually the statistical significance of our study was much, much higher because our sample size was 40 times greater," Kaelber said. Kaelber and his team published their results online in July in the Journal of the American Medical Informatics Association.

The conclusion to the article here (subscription required) was:

With the right clinical research informatics tools and EHR data, some types of very large cohort studies can be completed [and, by implication, trusted and acted upon at local, regional or societal levels - ed.] with minimal resources.

This data was, and I don't think this can be refuted, poorly controlled. There are limits to what "data cleansing" and standardization can accomplish with such data. In methodologies like this, confounders are too numerous to list comprehensively, but a few are: missing data; variability in observations (inter- and intra-observer variability); data originating from different vendor systems with different terminology and definitions; data input by myriad people of different backgrounds (students, MDs, RNs, etc.) with differing interpretations of terminology; and different pressures creating bias (time, reimbursement maximization, litigation avoidance).

Several questions: 

  • What is the likelihood these results were themselves a chance outcome? Correlation is not proof, especially where n=1. 
  • How do we reliably identify ante hoc the "types of very large cohort studies" that we can complete?
  • What is the likelihood that such techniques are generalizable to all studies (such as comparative effectiveness research, currently a governmental initiative), especially where issues and confounders might be subtle?
  • Related to generalizability supra, where are the scientific studies that prove methodologies like this are valid and reliable for other than, say, low-granularity epidemiological purposes?  Put more colloquially - how do we know what these efforts attempt to do, in what is almost indisputably a radical deviation from controlled-trial norms, is not "computational alchemy" (i.e., attempting to turn lead into gold)?
  • Is there rigorous data on attempted studies like this that failed to correlate with more traditional research?  Until such failures are studied, and some sense is gained of whether a radical approach using uncontrolled data such as this fails to a concerning extent, claims of "revolutions" need to be postponed.

Worst case scenario: large dataset, radical analytics and a political agenda can lead to undesired outcomes.  We need to know what we are doing.


More on these issues in a 2009 article I authored here.

I have invited Dr. Kaelber to comment on the questions I raise. 

Note: I am not challenging this specific study per se, which is an experiment, but am challenging the generalizability of this type of "from data rags to information riches" methodology, as I've done in other posts on similar topics. I am also challenging the overconfidence expressed in the press, which is what the public, including our lawmakers and the people responsible for resource allocation, read. I think it's highly premature to write of "revolutionizing medical research."

Sep. 4, 2012

Here is the reply from Dr. Kaelber -

We actually cover many of the points you raise in the discussion of the article.

I think it is important if we are focusing on a scientific discussion of a scientific article to focus on the primary source - the scientific article itself, and not a secondary source - the reporting of the scientific article.

One of the major points in the discussion (quoted from the published manuscript) is that we feel that there are three keys to using pooled, standardized, normalized, and de-identified EHR data, including:

1. Understanding data sources - understanding of the characteristics of the underlying EHR data sources, including a data dictionary and ontologies used.

2. Corroborating data findings - internal and/or external methods to corroborate retrospective EHR cohort study data and/or findings. For example, manual chart review of a sample of the EHR data (internal validation), or other studies demonstrating similar results (external validation), ideally coupled with a biologically plausible hypothesis.

3. Clinical data versus research data - recognizing that retrospective EHR data were typically collected for clinical and not research purposes. Therefore, depending on the type of data, the quality may not meet typical research standards. In some cases, the large quantity of clinical data may help mitigate the fact that it was not collected with the higher precision and accuracy that can occur as part of a prospective research study.

As I was also quoted in the Plain Dealer article as saying this is like a new type of power drill for those who have been used to using hand drills for retrospective (electronic or paper) chart reviews. To be used appropriately, researchers need to understand the opportunities and disadvantages of using of the new tool and be trained on how to use it.

I would also add that, again, as was stated in the introduction for the published paper that clinical research informatics as a field is in its infancy and we need more people involved in the field, more tools for people in the field, and more experience of people using tools in the field to really understand both the contributions and limitations for electronic health record data to advance clinical research.

I hope these comments are of some interest/use.

-David

My thoughts are that the use of experimental EHR-based methodologies like this, which could have significant social implications, requires significant caution and validation. I think we are in agreement on that point. The methodology of using EHR data from myriad sources needs to be shown valid, as well as proven not to be invalid, before the findings of "new" studies that are not attempting to duplicate prior ones are translated to the bedside or beyond.

The disadvantages include, among other issues (and perhaps most important), the spread of premature over-optimism about health IT, such as in newspapers. That problem has, I believe, contributed greatly to the prevalent hyper-enthusiasm and "there's little supportive data - nevertheless, we believe - so let's spend $15+ billion" attitudes about current health IT, as I wrote about, for instance, here (ONC's 'Data Palooza') and here (on MU Stage 2 justifications).

Finally, I would question whether increasing the quantity of data could ever reliably compensate for lack of quality (unless, perhaps, the additions were themselves of high quality). Adding low-quality data to already low-quality data should not produce better results, it seems to me.
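
To illustrate the quantity-versus-quality point, here is a minimal simulation sketch in Python. Everything in it is an assumption I am making purely for illustration: the "true" value, the size of the systematic recording error and the noise level are hypothetical numbers, not figures from the Explorys data or the JAMIA paper. It draws a small sample and one 40 times larger from the same biased source and reports the mean with an approximate 95% confidence interval.

```python
import math
import random
import statistics

# All three constants are hypothetical, chosen only for illustration.
TRUE_VALUE = 100.0   # the quantity we would like to estimate
BIAS = 2.0           # systematic error shared by every recorded value
NOISE_SD = 10.0      # random measurement noise (standard deviation)


def biased_sample_summary(n, seed=0):
    """Return (mean, approximate 95% CI half-width) for n biased, noisy values."""
    rng = random.Random(seed)
    values = [TRUE_VALUE + BIAS + rng.gauss(0.0, NOISE_SD) for _ in range(n)]
    mean = statistics.fmean(values)
    # Normal-approximation interval: 1.96 standard errors either side.
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    # 1,000 versus 40,000: the larger sample is 40 times the smaller one.
    for n in (1_000, 40_000):
        mean, half_width = biased_sample_summary(n)
        print(f"n={n:>6}: mean={mean:.2f} +/- {half_width:.2f}, "
              f"error vs truth={mean - TRUE_VALUE:+.2f}")
```

Under these assumptions the interval narrows as the sample grows, but the estimate stays off by roughly the size of the bias. A larger sample makes the answer look more certain without making it more correct, which is the concern when "much, much higher" statistical significance is attributed to sample size alone. This is a toy model of a single shared bias; real EHR data may carry many biases in different directions, and the sketch says nothing about whether the study in question suffers from any of them.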

-- SS