(Update: While this post may still be of some interest, I no longer consider the reasoning directly relevant to the question of the external validity of the Baird et al. study. The reason for this is that the high prevalences were spread across both the control group and half of the treatment group, and I don't know how long these high prevalences lasted in the control group, relative to the 2.4 extra years of deworming that the treatment group received.)
This is a sort of ancillary post to a forthcoming, more detailed post on the cost-effectiveness of mass deworming programmes in humans. The context here is that GiveWell's recommendation of deworming is based in large part on an analysis of Baird et al. (working paper). This paper looks at the earnings of a group of people in Kenya who had been dewormed at school, and the authors find substantially increased incomes relative to a control group. Unfortunately, flooding caused very high worm burdens during the second year of the study, so we can't expect that the results generalise across the rest of sub-Saharan Africa. But given the prevalences in the study sample, and estimated prevalences across the continent, it seems reasonable that we could come up with an appropriate discount factor for the external validity of Baird et al.'s results. Unfortunately, I haven't been able to come up with such a discount factor – essentially the best I think we can do is a sort of fudged guess.
[Remark: the high prevalence in the second year of the study is not the only reason to have an external validity discount factor, but it is a main reason. I'll write more in the upcoming post.]
The short version of why I haven't succeeded: I don't know (and perhaps no-one knows) the relationship between worm burden and the probability or degree of developmental impairment at an individual level. That is, if someone has a very high worm burden, how much is that going to cost them in adulthood? I don't know.
The long version follows.
The basic idea is that worm infections – ascariasis, trichuriasis, hookworm, and schistosomiasis – can affect children's physical and mental development, leaving them with life-long impairments that affect their earnings in adulthood. These infections can be treated very cheaply by administering deworming tablets to all students at schools in infected areas, several times a year.
The way the WHO estimate the number of people at risk of developmental impairment due to worm infections (link), and from there the total estimated disease burden, is to define a threshold worm burden (for example, 15 Ascaris lumbricoides worms in a child aged 5-9), estimate through a modelling process the number of people above the threshold, and then apply the same disability weight to all people above the threshold. (They then multiply this by 3%, on the grounds that only a minority of the people infected will have the life-long impairment.)
So, a natural approach to solve our problem with the Baird et al. paper is to see how many of the children in the study's high-prevalence year were above the relevant thresholds, and compare to the number in the typical-prevalence year, and do a division. (A natural further step would be to compare the typical above-threshold percentage in that part of Kenya with the above-threshold percentages across sub-Saharan Africa.)
But in an email discussion, Alexander Berger of GiveWell pointed out that worm burdens aren't linear. What follows is my attempt to learn how worm burdens are distributed, and I end by throwing my hands in the air and giving up trying to reach a conclusion on the appropriate discount factor for the study's external validity.
Worm burdens in a community approximately follow a negative binomial distribution. Unfortunately, there are quite a number of different conventions to describe these distributions, and the terminology used in the medical literature is quite different to that in the Wikipedia article. I'll follow instead the notation in Anderson and May. Here is a screenshot from Amazon's "search inside":
The aggregation parameter k can take real values, in which case the factorials (which come from a binomial coefficient) are replaced by appropriate gamma functions.
[Wikipedia's definition uses r (the same as our k) and p, with the binomial coefficient, in our notation, being multiplied by $(1-p)^k p^i$, where i is the number of worms. The parameters are related by m/k = p/(1-p), and you can check that it's all consistent. Note that some sources replace p by (1-p): it's a standardisation nightmare.]
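For reference, here is one standard way of writing the negative binomial with mean m and aggregation parameter k. It is consistent with the relations in the remark above, taking p = m/(m+k), but it is a sketch in my own typesetting and may not match the textbook's layout exactly:

$$
P(i) \;=\; \frac{\Gamma(k+i)}{i!\,\Gamma(k)}
\left(\frac{k}{k+m}\right)^{k}
\left(\frac{m}{k+m}\right)^{i},
\qquad i = 0, 1, 2, \dots
$$

Here P(i) is the probability that a host carries exactly i worms. For integer k the ratio of gamma functions reduces to the binomial coefficient (k+i-1)! / (i! (k-1)!).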
While I think I understand this now, I'm not completely satisfied with my ability to reproduce the histograms in the textbook. This is from the following page (I've blacked out the caption for the histogram not shown):
We're mostly looking at the black bars, though we can get the total number of people by looking at the grey bars, which are all integer frequencies (the y-axis is not measured in percentages: the grey bars sum to 84). The (black) bin labelled '85' appears to be erroneous, since it's the same height as '75'. But ignoring that one, we can count pixels to see numerically what the "expected" frequency in each worm-burden bin is: 8.6, 29.5, 14.1, 9.2, 6.6, 5.0, 3.9....
If I then plug in the given values of m = 24.54 and k = 0.618 and generate the distribution, I find that N(0) = 8.5, N(1-10) = 29.2, N(11-20) = 14.1, N(21-30) = 9.1, N(31-40) = 6.2, N(41-50) = 4.4, ....
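The binned frequencies can be generated along these lines. This is a minimal sketch, assuming the standard mean/aggregation form of the negative binomial and a total of 84 hosts (the sum of the grey bars); the function names and the choice of bin edges are mine, not the textbook's.

```python
from math import exp, lgamma, log


def nb_pmf(i, m, k):
    """P(exactly i worms) for a negative binomial with mean m, aggregation k."""
    # Work in logs so that non-integer k and large i are handled cleanly.
    log_coeff = lgamma(k + i) - lgamma(i + 1) - lgamma(k)
    return exp(log_coeff + k * log(k / (k + m)) + i * log(m / (k + m)))


def expected_counts(m, k, n_hosts, bins):
    """Expected number of hosts in each inclusive (lo, hi) worm-count bin."""
    return [
        n_hosts * sum(nb_pmf(i, m, k) for i in range(lo, hi + 1))
        for lo, hi in bins
    ]


# Assumed bins: 0 worms, then 1-10, 11-20, ..., 41-50.
BINS = [(0, 0)] + [(lo, lo + 9) for lo in range(1, 51, 10)]
counts = expected_counts(m=24.54, k=0.618, n_hosts=84, bins=BINS)

for (lo, hi), count in zip(BINS, counts):
    print(f"N({lo}-{hi}) = {count:.1f}")
```

Shifting the bin edges by one (for example 0-9, 10-19, ...) is an easy way to test whether ambiguous bin labelling explains the gap with the pixel counts.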
The agreement between theory and theory is not great, even allowing for some pixel counting errors and assuming that the histogram bins aren't clearly labelled.
So, having established that I might not know what I'm doing, let's move on to the problem of worm burdens in the Kenyan study. Take roundworm (A. lumbricoides) in Miguel and Kremer (an earlier study on the same Kenyan programme). Pre-treatment (Table II), there was a moderate-heavy infection prevalence of 16%, with a mean intensity of 2334 eggs per gram of faecal matter; for the control group (Table V) the same moderate-heavy prevalence was 24%.
Let's say that there are 1000 epg per worm, so that the mean worm burden is 2.334 worms per person. Then with k=0.54 (the usual value used for A. lumbricoides), we'd expect 17% of the population to have at least 5 worms. Moderate-heavy infection here is defined as 5000 epg, which I'm assuming to be 5 worms; 17% compares well with the actual 16%, so this small part of the calculation is approximately consistent. (It doesn't work so well at predicting the total prevalence (light or moderate-heavy): predicted 59%, actual 42%. That doesn't bother me, though, since I'm not expecting great curve fitting from only a couple of reported values and a guess at the epg to worm conversion.)
Now let's consider the mean worm burden for those individuals in the model with at least 5 worms: these people average 8.5 worms each.
If we increase the mean worm burden to 3.2, then 24% of our modelled population has at least 5 worms, as in the higher-prevalence year of the study. But now, of this larger group of people above the threshold, the mean worm burden is 9.7.
So as well as an increase in the prevalence of moderate-heavy roundworm infection by 50% (8 percentage points), those in the moderate-heavy group are experiencing an increased worm burden of about 14%.
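The roundworm numbers above can be reproduced with a short script. It is a sketch under the same assumptions as the text (1000 epg per worm, so the 5000 epg threshold is 5 worms, and k = 0.54); the function and variable names are hypothetical.

```python
from math import exp, lgamma, log


def nb_pmf(i, m, k):
    """P(exactly i worms) for a negative binomial with mean m, aggregation k."""
    log_coeff = lgamma(k + i) - lgamma(i + 1) - lgamma(k)
    return exp(log_coeff + k * log(k / (k + m)) + i * log(m / (k + m)))


def above_threshold(m, k, threshold=5):
    """Return (share of hosts with >= threshold worms, their mean burden)."""
    below = [nb_pmf(i, m, k) for i in range(threshold)]
    share = 1.0 - sum(below)
    # Worms carried by above-threshold hosts = total mean minus the
    # contribution from hosts below the threshold.
    worms_above = m - sum(i * p for i, p in enumerate(below))
    return share, worms_above / share


K = 0.54  # aggregation parameter assumed for A. lumbricoides

# Pre-treatment mean burden: 2334 epg at an assumed 1000 epg per worm.
share_typical, mean_typical = above_threshold(2.334, K)
prevalence_typical = 1.0 - nb_pmf(0, 2.334, K)

# Mean burden chosen so that about 24% of hosts are above the threshold.
share_flood, mean_flood = above_threshold(3.2, K)

print(f"typical year: {share_typical:.0%} above threshold, "
      f"mean {mean_typical:.1f} worms, any infection {prevalence_typical:.0%}")
print(f"flood year:   {share_flood:.0%} above threshold, "
      f"mean {mean_flood:.1f} worms")
print(f"increase in above-threshold mean burden: "
      f"{mean_flood / mean_typical - 1:.0%}")
```

The mean burden above the threshold is computed from the below-threshold terms only, which avoids having to truncate an infinite sum.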
We could imagine following a similar procedure for whipworm, hookworm, and the schistosomes (though I don't believe we know much about the numbers of schistosomes in infected people). But even if I or someone else did all that, what could we conclude? I would be reluctant to boldly say that developmental impacts are proportional to the worm burden above threshold – I don't think we have the evidence to make any claim on the relation between worm burden and developmental impact at an individual level. And what about those people with two or more infections? And so on.
The GiveWell spreadsheet bases its external validity correction on the prevalence of any moderate-heavy helminth infection: 37% pre-treatment, 66% in the control group.* They think that 37/66 = 0.56 would be too high a factor to use (i.e., not discount enough), because the 66% group would have higher mean worm burdens above threshold (or, equivalently, more people with very high worm burdens). So instead they use an odds ratio: (0.37/(1-0.37)) / (0.66/(1-0.66)) = 0.3025.
*The latter figure is 52%, adjusted upwards to 66% on the grounds that the control group benefited from the deworming applied to the treatment group – the two groups came into contact with each other, and there wasn't as much transmission of infection as there would have been in the absence of the programme. While such a spillover effect is a useful feature of a deworming programme, I don't think it should be taken into account when deriving the external validity correction: we should compare the prevalence experienced by the control group to the prevalence experienced in more normal years.
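For concreteness, the two candidate factors discussed above (the plain prevalence ratio and the odds ratio) work out as follows. This is only the arithmetic; the variable names are mine.

```python
def odds(p):
    """Convert a probability (prevalence) into odds."""
    return p / (1.0 - p)


pre_treatment = 0.37  # moderate-heavy prevalence before treatment
control = 0.66        # adjusted control-group prevalence

prevalence_ratio = pre_treatment / control
odds_ratio = odds(pre_treatment) / odds(control)

print(f"prevalence ratio: {prevalence_ratio:.4f}")
print(f"odds ratio:       {odds_ratio:.4f}")
```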
I think that using an odds ratio like this suggests a mathematical reasoning for the discount factor that just doesn't exist: we might have 0.37/0.66 * somefudgefactor, or just someotherfudgefactor < 0.37/0.66. Whatever the appropriate numbers to use are, I think it is better to be clear that there's a big element of guesswork in it.
At the time of writing, I don't really have anything against a discount factor of 30% due to the high prevalence experienced by the control group. But I wouldn't have anything against 60% either.