By: Hannah Lin (CC '23)
Can you talk a bit about your background?
My name is George Hripcsak. I went to medical school at Columbia, trained in internal medicine, and later got a master’s degree in biostats. I’ve been in the Department of Biomedical Informatics since 1988. I’m currently the Vivian Beaumont Allen Professor and Chair of the department, and I do research in data science and healthcare.
Can you describe your COVID research at the moment?
New York is one of the epicenters of COVID-19, unfortunately. We’ve collected a lot of data on patients, and there are a lot of questions about which treatments are effective and what the risk factors are. We’ve done some work local to Columbia and some collaborative international research.
I’ll talk about the international one first: OHDSI [pronounced odyssey], which stands for Observational Health Data Sciences and Informatics, is an international collaboration with 300 researchers across 30 countries and 600 million unique patients in our federated database. Columbia University is the coordinating center for OHDSI, and I’m the director of that coordinating center. With the appearance of COVID-19, we engaged in COVID research. We were actually supposed to run our annual European symposium at Oxford in the UK just as everything was shutting down, so we turned it into a virtual symposium. It turned into a study-a-thon (like a hackathon to do a study). We spent four days straight, 24 hours a day, working on COVID research, and that is what got us started. We had 350 people and we used the Microsoft Teams platform. Because we had Asia, Europe, and North America, the different time zones meant there was always something going on, and we were getting up for calls around the clock, pretty much. And we’ve continued for several weeks now.
There are three things we do in OHDSI for COVID research: characterization, estimation, and prediction. Characterization is measuring how often different things happen in the disease. You may have seen news stories that older people, people with hypertension, and people with various other risk factors get the disease or serious complications more often. We’re looking at databases in Korea, Spain, and the UK, and several databases in the US: Columbia, the Veterans Administration, Tufts, Stanford, and others. We find that, compared to influenza, COVID is a young person’s disease. That is, influenza affects old people and people with chronic diseases much more severely. With COVID-19, we’re seeing more younger people than we would have expected based on the influenza experience. COVID-19 still affects older people and those with chronic disease more severely, but healthy people are also getting ill at a much higher rate than we see with influenza.
There are a number of questions that pop up, like what role does obesity play? And is the greater COVID-19 risk with hypertension due to hypertension or to the drugs used to treat hypertension? Are older people getting sick because they have chronic diseases or are people with chronic disease getting sick because they’re older, or is it both things independently? Once you start asking questions about what is causing an effect, then you move away from characterization to population-level estimation. This includes risk factors and treatment effects.
Here is the first estimation study on COVID-19 patients that OHDSI is doing. There’s a class of drugs related to angiotensin—angiotensin converting enzyme inhibitors and angiotensin receptor blockers (ACEi and ARB). Based on the physiology, some posit that they predispose you to get infected and get complications once you’re infected. Others posit that they help patients with COVID. We’re carrying out a study to answer that question right now. We’re doing it extremely rigorously. We create our study protocol very carefully. We’re defining all our endpoints; we’re defining all the phenotypes. We’re publishing the protocol publicly so that we can’t change it mid-study. When we run the study, we blind ourselves to the results so that no one’s allowed to see them. We then unblind everyone at once to the answer across all sites so no one can tweak the statistical methods to get the answer they wanted to see.
The third OHDSI product is patient-level prediction: given my personal risk factors, what’s my risk of being infected? What’s my risk of severe complications? This can be used to decide if it is safe for someone at the hospital to go home.
The first COVID study OHDSI did was actually not on COVID-19 patients, but on patients who were taking hydroxychloroquine and azithromycin, to see what their safety risk was, just in case we give the drugs to people such as healthcare workers as prophylaxis against getting COVID. What extra risk are they taking on by taking the drug, and does that outweigh any benefits from taking the drug? What we found was that for a 30-day period, hydroxychloroquine is relatively safe, but taken in combination with azithromycin, which is what many people are doing, there is a notably increased risk of sudden cardiac death from arrhythmias, which stem from the known side effects of those two drugs. That study was used by the EMA (the European Medicines Agency, the European counterpart of the Food and Drug Administration) in their decisions on how to treat those two drugs and was also used by the US FDA.
We’re also doing Columbia-specific studies. We published a characterization study on Columbia patients. Second-year Columbia medical students pitched in to help get data out of the electronic health record in a medically interpreted form. They filled out an extensive spreadsheet with hundreds of items that characterized patients with COVID, and then they did the analysis. There were about 30 medical students with four of them as leaders, and they did an excellent job. Their paper just came out in the British Medical Journal (BMJ). We found a few things. First, as others have shown, older people with hypertension, diabetes, or obesity tend to be at greater risk. We also noticed a high prevalence of kidney disease, which has also been noted in other New York hospitals but was higher than seen in China or Korea. And we saw a bimodal distribution of intubation. Counting from when symptoms started (not when patients entered the hospital), there was one peak of intubation at 3-4 days, and the highest peak was at about 9 days. Once you got to 15 days, the risk dropped pretty low, so much so that if you see a patient whose symptoms started two or more weeks ago, you have to worry less about intubation and respiratory failure. It might therefore be safer to send them home for observation rather than admit them. But a patient who is only a week in hasn’t even reached the highest peak of intubation yet (9 days).
With the Department of Medicine, there was a paper on hydroxychloroquine—not on its safety but on its effectiveness. That study didn’t find a difference in patients who took it versus those who did not take it; it was published in the New England Journal of Medicine. There are several other papers coming out from our local research.
Have you faced any setbacks in your research?
As we do international research, each country and each site within a country is coding COVID-19 slightly differently because it’s new. And the standards organizations that create the codes that you encode disease with were scrambling to create new codes. But even with their codes, it ended up going into the databases in different forms and you end up creating different definitions. One question is: do we change our definitions for each site or does each site modify the data in their database to better match more standard definitions? That slowed us down.
For randomized trials, you can easily assign causality. Because you randomize patients to two groups and see which did better, you’re pretty sure that the two groups were equivalent at the beginning. In observational research like we’re doing where we look back in the patient’s database, you don’t know if the group who got the drug and the group who didn’t get the drug were equivalent. For example, for patients who were more ill, the doctor might think of giving them hydroxychloroquine more often. Therefore, it might look like patients who got hydroxychloroquine are dying more often. Or patients who were “do not resuscitate”—therefore, those who were not going to be intubated—might get the drug more often or less often. Therefore, the 2 groups aren’t balanced, and you need techniques to account for that. In the setting of COVID-19, it’s harder to do that because patients are coming in, and everything’s going on at once. On the first day, you might be getting the drug or not, you might be intubated already, you might be intubated on the floor before a bed is ready for you in the ICU, and things like that. And people are so busy that it’s hard to document properly in the electronic health record—do they record it accurately? Do we know what time everything happened? If someone gets a drug and they get intubated, which happened first? Do they count it against the drug as a failure? Sorting all that out is actually quite difficult and it makes the studies that try to look at effectiveness—these population-level estimation studies—quite difficult.
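To make the imbalance problem concrete, here is a minimal simulated sketch in Python. It is not the method used in the OHDSI or Columbia studies; every number in it (severity rate, treatment probabilities, death rates) is invented for illustration, and the helper names are hypothetical. The simulated drug has no effect at all, yet sicker patients receive it more often, so a naive comparison makes it look harmful. Stratifying by severity, one simple adjustment technique among many, makes the apparent effect disappear.

```python
import random


def simulate_patients(n=20000, seed=1):
    """Simulate (severe, treated, died) records; all probabilities are invented."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        severe = rng.random() < 0.3
        # Sicker patients are more likely to be given the drug.
        treated = rng.random() < (0.7 if severe else 0.2)
        # Death depends only on severity: the drug has NO effect here.
        died = rng.random() < (0.4 if severe else 0.05)
        patients.append((severe, treated, died))
    return patients


def death_rate(patients, treated, severe=None):
    """Death rate among treated or untreated patients, optionally within one stratum."""
    group = [p for p in patients
             if p[1] == treated and (severe is None or p[0] == severe)]
    return sum(p[2] for p in group) / len(group)


def risk_difference(patients, severe=None):
    """Treated minus untreated death rate, overall or within a severity stratum."""
    return (death_rate(patients, True, severe)
            - death_rate(patients, False, severe))


if __name__ == "__main__":
    patients = simulate_patients()
    print(f"naive difference:         {risk_difference(patients):+.3f}")
    print(f"within severe patients:   {risk_difference(patients, True):+.3f}")
    print(f"within mild patients:     {risk_difference(patients, False):+.3f}")
```

In real records, severity is rarely captured by a single clean flag, and the timing problems described above (which happened first, the drug or the intubation?) make the adjustment much harder than this toy case suggests.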
We see in the literature a plethora of studies being published about COVID-19, and a lot of them are contradictory. Some say a drug works, and some say it doesn’t work. Researchers go in with an instinct that something is going to work or not, and you never know if they’re picking the analysis because it matches what they expected, which is not what you want. That’s why you blind your results: if you can’t see what’s coming out, you can’t modify it by adding a variable or changing the analysis method a little bit, which pushes it, even subconsciously, in the direction you wanted. Otherwise, if a result doesn’t make sense, you say, “Why did that happen? Let me try this.” Now the result matches your expectation and you accept it because you think you’re doing the right thing, when in fact all you’ve done is flip a coin until you got a head and said, “See? They’re all heads!” So those are the most difficult things.
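The coin-flipping point can be illustrated with a small simulation. This is a hedged sketch, not anything taken from the studies discussed: the function names, sample sizes, and the 0.05 threshold are assumptions chosen for illustration, and each "alternative analysis" is simplified to an independent look at data in which the treatment truly does nothing. An analyst who reports one prespecified analysis is wrong about 5% of the time; one who keeps trying analyses until something looks significant is wrong far more often.

```python
import random
from statistics import NormalDist


def null_p_value(rng, n=30):
    """Two-sided z-test p-value for two groups drawn from the SAME distribution."""
    treated_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    control_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    standard_error = (2.0 / n) ** 0.5  # both groups have known unit variance
    z = (treated_mean - control_mean) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_analyses, n_studies=2000, alpha=0.05, seed=0):
    """Share of null studies declared 'significant' when the analyst may try
    n_analyses independent analyses and keep the best-looking one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [null_p_value(rng) for _ in range(n_analyses)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(f"one prespecified analysis: {false_positive_rate(1):.3f}")
    print(f"best of ten analyses:      {false_positive_rate(10):.3f}")
```

With fully independent analyses, the chance of at least one false positive among ten is 1 - 0.95^10, or about 40%. Real re-analyses of the same data are correlated, so the inflation is usually smaller than that, but it is still there, which is the reason for publishing the protocol and blinding the results.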
Would you say that COVID-19 has shifted dramatically any of your goals or thoughts about the health record system?
The fact that we were able to produce those codes quickly was actually quite good, so I was happy with the response of the standards organizations.
Electronic health records have been in the news in recent years because of their increased use and the burden they place on providers. In the setting of COVID-19, providers were in even more of a hurry than usual, and that probably limited their ability to document things in some ways, so improving electronic health records to reduce the documentation burden would be very useful. That said, the fact that we were able to do the studies this quickly was a benefit of having electronic health records. Now, we’re social distancing. Imagine: in the old days you would have to go to the floor where the patient was being treated to get the chart, read it, and abstract it. Now, you can go into the computer and get the data that were entered. I think we need to continue to improve them, but overall, they were a benefit in this case.
I know you’re a trained internist too—do you still see patients?
No, I stopped seeing patients after residency and board certification in 1988.
What started your interest in working on the health record system?
I always had computers and mathematics as a hobby, so I was always doing that as a kid—well, not computer science as a kid since there were no home computers at the time—but electronics as a kid. I was interested through college, and it was only in med school that I realized I could make a career out of it. In senior year of residency, I applied for a fellowship—it wasn’t official, since it was a brand-new field—and started doing it and learning more and basically never got up from that chair. I loved it so much I stayed in it. That was 1988 at Columbia, and we didn’t have full electronic health records then. We had a few electronic systems: the lab system was electronic, and a couple other things were. We built on that and built the whole electronic health record system. Over the years, we’ve switched systems; most recently, we purchased Epic Systems, an electronic health record that went live on February 1st. That was one of our difficulties—it had just gone live and we didn’t have experience extracting data from it, so we scrambled to do that. Anyway, I had an interest in math and later computers, and I just found I loved what I was doing with computers in medicine.
Are there any common misconceptions you’ve heard regarding COVID-19 that you’d like to rectify or respond to?
The idea that if you’re young, you’re safe is not quite true. With influenza, if you’re young, you’re not completely safe even then; with COVID-19, if you’re young, you’re even less safe. Most people who are young do fine, but not everyone, unfortunately.
What is your perspective on the future—both regarding your own research and the broader impacts of COVID-19?
Well, I think OHDSI, for example, is moving slowly on these estimation studies to get them right and to publish the protocols. Over time, we’ll collect better data and better understand both COVID-19 itself and how best to respond to future pandemics. I think the usefulness of OHDSI to the US FDA and to the EMA in Europe is becoming recognized, and there’s probably going to be more collaboration with those two agencies going forward. The OHDSI format, OMOP, was adopted by an NIH program to assemble a database of COVID-19 patients, which is being run by NCATS, one of the institutes and centers within NIH. It’s called the N3C project.
For our work, we’ve learned a lot about how to analyze patients with COVID-19, as opposed to outpatients who, say, have hypertension and start a drug, where you see how it goes over the course of a year. Now we’re learning how to do the same kind of research on patients who come in and, within a day, this happens, and within two days, that happens, because things are happening much more acutely. We need more work on how to do this well in the acute setting. And the importance of having these electronic data is just being further confirmed.
I love the name OHDSI, by the way. I thought it looked a little clunky, but the actual pronunciation sounds great.
Yes, and the tools in our projects are named after figures from Greek mythology. There’s ACHILLES and ATLAS, and if you go through the OHDSI website, you’ll see more. SCYLLA and CHARYBDIS are the two COVID-19 projects.