AI Is Only as Good as Its Training Data
Garbage in, garbage out
AI engines, or large language models (LLMs), can only be as good as the information they are trained on. If they are trained on poor information, their conclusions will be faulty, and that appears to be the case with the question of whether ivermectin is useful for treating SARS-CoV-2 infections.
Consider the hypothetical case in which 80% of the population believes that the Earth is flat. What would AI tell us? It would probably say the Earth was flat, though it might mention that others claim it is spherical (actually, it’s an oblate spheroid).
It turns out, according to an LLM, that approximately 2% to 10% of Americans believe, to some extent, that the Earth is flat. And so AI engines won't tell us the Earth is flat. They will tell us that some people believe the world is flat, but that those people are wrong and that those who believe it is spherical are correct. On this issue, AI engines offer correct information, matching their training data.
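To make the "garbage in, garbage out" mechanism concrete, here is a deliberately crude sketch in Python. It treats a "model" as nothing more than a tally of claims in a training corpus; real LLMs are enormously more sophisticated, but their dependence on the distribution of their training data is the point of the caricature.

```python
from collections import Counter

# A toy "model" that simply echoes the majority claim in its training data.
# This is a deliberate caricature: real LLMs are vastly more complex, but
# their answers are likewise shaped by the text they are trained on.
def train(corpus: list[str]) -> Counter:
    return Counter(corpus)

def answer(model: Counter) -> str:
    # Respond with the claim seen most often during "training."
    return model.most_common(1)[0][0]

# Hypothetical world: 80% of training statements say the Earth is flat.
flat_world_corpus = ["the Earth is flat"] * 80 + ["the Earth is spherical"] * 20
print(answer(train(flat_world_corpus)))   # -> "the Earth is flat"

# Something closer to the real world: only a small minority say it is flat.
real_world_corpus = ["the Earth is flat"] * 5 + ["the Earth is spherical"] * 95
print(answer(train(real_world_corpus)))   # -> "the Earth is spherical"
```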
However, there are other popular beliefs that deviate from the best available information, cases where the common perception is simply wrong. You may know of such cases, and I'm sure we could think of a dozen or more. Consider the issue of using ivermectin to address infections by SARS-CoV-2, the virus that causes COVID-19.
I asked ChatGPT version 5.2 the following question: “Is ivermectin useful for treating COVID-19?”
“Short answer: No. High-quality clinical evidence shows that Ivermectin is not effective for preventing or treating COVID-19.”
It added:
“What the evidence shows:
Randomized controlled trials (including large, well-designed studies) found no meaningful benefit for symptom duration, hospitalization, or death.
Early small studies that suggested benefit were later shown to have serious methodological problems, and some were retracted.
Multiple systematic reviews and meta-analyses conclude there’s no reliable evidence of clinical benefit.”
Unfortunately, these conclusions don’t comport with the available empirical evidence.
Then ChatGPT repeated what the FDA, the NIH, and the WHO have said about ivermectin for SARS-CoV-2: that it doesn't work. David R. Henderson and I wrote an article in the Wall Street Journal challenging the FDA's statements about ivermectin; those statements changed shortly after our article was published. We have no direct proof that our article was the reason, but we'll take credit.
Next, I asked ChatGPT to summarize the key clinical trials. It listed four. Below I provide my comments on these four published papers.
The TOGETHER Trial (Reis et al., 2022) was both big and bad. Big in the sense that it included so many patients that it overwhelms other clinical trials in meta-analyses. Bad in the sense that this is the same clinical trial in which David and I discovered so many problems that we wrote about here and here. The TOGETHER Trial suffered from a severe conflict of interest, broken trial blinding, failures of treatment randomization, multiple impossible numbers, uncorrected errors, protocol violations, a broken promise to release the data, and authors who did not respond to queries.
The TOGETHER Trial was so flawed that I’m left wondering whether the cause was incompetence or fraud. How could a clinical trial be so bad and yet have so many praise it?
Naggie et al. (2022 and 2023) had substantial errors and anomalies. This study contained flawed data and calculations. About 16% of the patients were missing from the original analysis. After these problems were raised, the authors reanalyzed the data to include the missing patients. However, the revised results changed so dramatically that they raise serious concerns about the reliability of the new analysis.
For example, 8.0% of the patients in the original dataset experienced an adverse event, compared with only 0.4% of the newly added patients. Similarly, 2.4% of the original patients were missing a symptom severity measurement, while 32% of the newly added patients lacked this measurement. Differences of this magnitude are unlikely to occur by chance and suggest deeper problems with the data or analysis.
López-Medina et al. (2021) was partially funded by the drug companies Sanofi Pasteur, GlaxoSmithKline, Janssen, Merck, and Gilead, which stood to gain from bad news about ivermectin, and was condemned as fatally flawed by more than 100 physicians in an open letter to the Journal of the American Medical Association. More criticisms are here, here, and here. Critics have identified several methodological problems. They argue that ivermectin was administered too late in the disease course, that patients were instructed to take it on an empty stomach, and that the trial population consisted largely of mildly ill, otherwise healthy individuals. There are also reports that some participants assigned to the control group had already taken ivermectin.
More significantly, the investigators changed the trial’s primary endpoint—the key outcome used to determine whether the drug is effective—halfway through the study. Changing a primary endpoint after a trial has begun is generally regarded as a serious breach of clinical trial protocol and raises substantial concerns about the reliability of the results.
The design of Hayward et al. (2024) raises questions about whether the trial gave ivermectin a fair test. The lead investigator, Chris Butler, conducted two overlapping clinical trials evaluating potential COVID-19 treatments. The trial of molnupiravir, an expensive branded drug, administered treatment early in the course of infection, enrolled older and higher-risk patients, used an appropriate dose, and included a large number of participants.
In contrast, the trial of ivermectin, a cheap generic drug, allowed treatment delays of up to two weeks after symptom onset, enrolled younger and lower-risk patients, used a relatively low dose for a short duration, instructed participants to take the drug without food—contrary to standard recommendations—and included far fewer participants. These design differences raise questions about whether the ivermectin trial was structured in a way that would allow the drug to demonstrate a meaningful clinical benefit.
Some who have studied Hayward et al. have identified 50 problems with the clinical trial, including inconsistent results and the paper's hiding of results that favored ivermectin.
How could anyone say these were good clinical trials?
ChatGPT reported that "multiple systematic reviews and meta-analyses conclude there's no reliable evidence of clinical benefit." However, the only one it listed is Cochrane (Popp et al., 2022). David and I have an article set to be published in Cato's Regulation about the Cochrane report, which claims that ivermectin was ineffective against COVID. I don't want to repeat our findings in detail here, but we found that the Cochrane conclusion was faulty. As we document in the Regulation article, the Cochrane report was too selective about which data to include, and much of the data it did select was of poor quality. Overall, the Cochrane report is an example of "garbage in, garbage out." Instead of upholding Cochrane's reputation for quality and, therefore, being the last word on ivermectin, this study is, unfortunately, specious.
While ChatGPT mentioned only Cochrane, there are nine meta-analyses that I am aware of. Six reported a benefit for ivermectin treatment while three didn’t.
For the six meta-analyses that reported a benefit:
Bryant et al. (2021) found a 62% reduced risk of mortality with ivermectin.
Hariyanto et al. (2021) found a 69% lower risk.
Kory et al. (2021) found a 71% lower risk.
Lawrie et al. (2021) found an 83% lower risk.
Nardelli et al. (2021) found a 79% lower risk.
Zein et al. (2021) found a 61% lower risk.
These six meta-analyses all reported results that were highly statistically significant. The one with the highest ("worst") p-value still had p=0.005, one tenth the standard accepted limit of 0.05.
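For readers curious how a meta-analysis turns many individual trials into a single pooled estimate and p-value, here is a minimal sketch of the standard inverse-variance (fixed-effect) method in Python. The study numbers are invented for illustration; this is not a reconstruction of any of the meta-analyses above.

```python
import math

# Hypothetical per-study results as (log risk ratio, standard error).
# These numbers are invented for illustration and are NOT taken from
# any of the meta-analyses discussed above.
studies = [
    (math.log(0.40), 0.45),  # small trial:  risk ratio 0.40
    (math.log(0.55), 0.30),  # medium trial: risk ratio 0.55
    (math.log(0.70), 0.25),  # larger trial: risk ratio 0.70
]

# Inverse-variance weights: more precise studies count for more.
weights = [1.0 / se**2 for _, se in studies]
pooled_log_rr = sum(w * lrr for (lrr, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

# Two-sided p-value from the normal approximation.
z = pooled_log_rr / pooled_se
p = math.erfc(abs(z) / math.sqrt(2))

pooled_rr = math.exp(pooled_log_rr)
print(f"Pooled risk ratio: {pooled_rr:.2f}")
print(f"Risk reduction:    {100 * (1 - pooled_rr):.0f}%")
print(f"Two-sided p-value: {p:.4f}")
```

In this made-up example, three trials that are individually borderline combine into a pooled risk reduction of roughly 40% with p below 0.01. That is the strength of meta-analysis I return to below: many small studies become, in effect, one large one.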
All three of the meta-analyses that didn’t report a benefit had fundamental problems. The weaknesses of Cochrane (Popp et al., 2022) were mentioned above.
Song et al. (2024) is tainted by the damning complication that one of the study’s researchers, Andrew Hill, was recorded on video (at the 5:26 point) admitting that the conclusions of his ivermectin research were not based on the data but were requested by a funding organization! Not to worry, he assures us, because he’ll fix the mistake “later.”
A third meta-analysis, Hernandez et al. (2024), shares with Cochrane the use of many deeply flawed studies.
ChatGPT's conclusions about the usefulness of ivermectin contradict the clinical record, which comprises 106 clinical trials enrolling approximately 220,000 patients. For the most important clinical outcome, mortality, the combined results of the 53 studies that measured it suggest that ivermectin can reduce mortality by 47%, a result that is highly statistically significant (p < 0.0001).
Next I gave ChatGPT the following prompt: “There have been 106 clinical trials of ivermectin. In the 53 studies that included mortality, ivermectin was shown to reduce the risk of mortality by 47% and the results are highly statistically significant. Why have you ignored these clinical trials in your assessment of ivermectin?”
The response was unsatisfying: ChatGPT said it included only trials that were randomized, controlled, adequately powered, and transparently conducted. Yet one of the strengths of meta-analysis is precisely the combination of small studies into what is effectively one large study, and among meta-analyses, six report a benefit from ivermectin treatment while three don't. I asked further clarifying questions and got reiterations of the "good" trials and the recommendations of the FDA, the WHO, and the NIH.
In conclusion, ChatGPT told me that ivermectin doesn't work for COVID-19 because, I would guess, most of the information it was trained on said that ivermectin doesn't work for COVID-19. However, based on my research, that conclusion is incorrect.
I am reassured, however, that ChatGPT doesn’t claim the world is flat.

