With air raids in Libya and escalating conflict in Ivory Coast, we’ve been asked a seemingly simple question: how do epidemiologists keep track of the victims of violence?
Is counting civilian casualties even a public health question? Les Roberts of Columbia University argues it is: “It is odd that the logic of epidemiology embraced by the press every day regarding new drugs or health risks somehow changes when the mechanism of death is their armed forces.” There’s a practical issue as well: public health organizations, including government and non-governmental aid agencies, are expected to clean up the consequences of war, so they should have some understanding of a war’s impact if they are to allocate resources and construct programs in its wake.
While there are as yet essentially no reliable data on the impact of the Libyan or Ivory Coast conflicts, the question of how casualties are assessed can be answered retrospectively: by looking at the controversy surrounding studies of deaths in Iraq. In this blog entry, we recount the major studies of deaths during the Iraq war since 2003, and discuss how to navigate the methodological and political controversies surrounding the epidemiology of conflict.
There were at least six major estimates of deaths during the Iraq war, most of them accounting for deaths since March 2003. The numbers vary considerably—from around 100,000 civilian deaths according to the Iraq Body Count project (which uses media reports to document deaths, supplemented by third party sources like morgues), to between three and twelve times that number according to two surveys published in The Lancet.
The Lancet surveys, and the controversy surrounding them, teach us some important lessons about epidemiology in wartime. The first survey, published in October 2004, estimated 98,000 excess Iraqi deaths in the 18 months during and after the 2003 invasion. The second survey, published two years later, estimated over 650,000 excess deaths related to the war (about 2.5% of the population) through the end of June 2006. The numbers drew controversy because they were far higher than the Iraq Body Count project’s estimates, and George Bush dismissed them as “pretty well discredited”, although it’s not clear by whom.
Where do the numbers come from?
In the first survey, the study authors used the same methodology they had used in the Congo, where their estimates contributed to the political impetus to deliver $150 million in aid to the region. The technique involved estimating total mortality in the country through door-to-door surveys asking who had died in each household, to avoid the problem of missing deaths that would be excluded from official media reports. A key goal was to account for all deaths, including those not directly attributable to a bullet or bomb; this method captures the indirect consequences of war, such as deaths from infections that went untreated because of hospital power outages or damage to a pharmaceutical factory. The authors then subtracted the pre-war mortality rate from the total mortality estimate during the conflict to estimate “excess mortality”.
But because an evenly-distributed survey across the entire country would be nearly impossible to execute during wartime, the authors used a “cluster sampling” method that divided the country into 33 randomly-selected, roughly equally-populated regions. A random point was chosen within each region, and then a “cluster” of about 30 households around each point was surveyed. In total, the authors sampled 988 households to estimate the relative risk of death after the invasion compared with the period preceding it, excluding an outlier city (Fallujah, where fighting was heavier than elsewhere). Based on the survey, the study authors calculated that the post-invasion risk of death was about 1.5 times the pre-war mortality rate (about 2.5 times the pre-war rate if Fallujah is included).
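The core arithmetic (pool deaths and person-time across the sampled clusters, then take the ratio of the post-invasion rate to the pre-invasion rate) can be sketched in a few lines of Python. Every number below is invented for illustration; these are not the study’s data.

```python
# Toy version of the survey arithmetic. All counts are invented.
clusters = [
    # (pre-war deaths, pre-war person-years, post-invasion deaths, post person-years)
    (2, 210.0, 4, 240.0),
    (1, 195.0, 2, 220.0),
    (3, 205.0, 5, 235.0),
]

# Pool deaths and person-time across clusters, then form period rates.
pre_deaths  = sum(c[0] for c in clusters)
pre_py      = sum(c[1] for c in clusters)
post_deaths = sum(c[2] for c in clusters)
post_py     = sum(c[3] for c in clusters)

pre_rate  = pre_deaths / pre_py      # deaths per person-year before the war
post_rate = post_deaths / post_py    # deaths per person-year after the invasion

relative_risk = post_rate / pre_rate
print(f"pre-war rate:  {1000 * pre_rate:.1f} per 1,000 per year")
print(f"post-war rate: {1000 * post_rate:.1f} per 1,000 per year")
print(f"relative risk: {relative_risk:.2f}")
```

The Lancet authors fit a more elaborate model than this, but the point estimate is, roughly speaking, this ratio of rates.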
What constitutes bias?
The criticisms of this approach were both political and scientific. Here, we’ll stick to the scientific ones. The first question that arises from this methodology is: how do we calculate confidence intervals around these estimates? The authors estimated the variance in their results within each cluster, both for the pre-war mortality rate and for several post-invasion intervals, to examine how much reporting varied over time and between clusters. They analyzed the data using a standard statistical model of death risk by place and time, using the variance around the model’s mean to estimate the standard error of their calculations (alternatively, one could argue that a normal distribution should not be used for such a calculation, and that the confidence intervals would be wider if the randomization in the selection of clusters were fully accounted for). Some criticism stemmed from the fact that the confidence intervals were quite broad once the variable rates of death among clusters and across time were accounted for: the 95% confidence interval around the final death toll ranged from 8,000 to 194,000 excess deaths, corresponding to a relative risk of between 1.1 and 2.3 (excluding Fallujah). Does this make the results “biased”, as some commentators have suggested? Not really; as Klim McPherson put it in the BMJ, “To confuse imprecision with bias is unjustified.” Such variation is probably inherent to any wartime household survey, whose results will differ depending on whom you ask and on what has happened in different communities at different times; it does not, however, make the mean estimate of relative risk biased in any particular direction.
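One intuitive way to see why the intervals come out so wide is to resample whole clusters. This is not the model the Lancet authors actually fit (they used a statistical model of death risk by place and time); it is a simple percentile bootstrap over invented cluster data, with one heavily affected cluster standing in for a Fallujah-like outlier.

```python
# A percentile bootstrap that resamples whole clusters with replacement.
# This is NOT the Lancet authors' model; it just illustrates how
# between-cluster variability widens a confidence interval.
# All counts are invented.
import random

random.seed(0)

clusters = [
    # (pre-war deaths, pre person-years, post deaths, post person-years)
    (2, 200.0, 3, 220.0),
    (1, 190.0, 2, 210.0),
    (2, 205.0, 2, 230.0),
    (1, 195.0, 12, 215.0),  # a heavily affected, Fallujah-like cluster
]

def relative_risk(sample):
    pre = sum(c[0] for c in sample) / sum(c[1] for c in sample)
    post = sum(c[2] for c in sample) / sum(c[3] for c in sample)
    return post / pre

rr_point = relative_risk(clusters)

# Resample clusters (not individuals) 10,000 times and take percentiles.
rrs = sorted(
    relative_risk([random.choice(clusters) for _ in clusters])
    for _ in range(10_000)
)
lo, hi = rrs[int(0.025 * len(rrs))], rrs[int(0.975 * len(rrs))]
print(f"RR = {rr_point:.2f}, bootstrap 95% CI ({lo:.2f}, {hi:.2f})")
```

Dropping or keeping the outlier cluster swings both the point estimate and the interval, which is the Fallujah problem in miniature.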
So while this estimate from The Lancet is higher than the Iraq Body Count estimate, that may be because the latter relies only on official media reports while the Lancet study used household surveys and accounted for indirect deaths.
Bias can occur through over-estimates or inaccurate reporting from household members who feel compelled to report more deaths during the conflict than before, whether to attribute blame or to seek compensation. The survey samplers did verify death certificates from most households, but this did not necessarily apply to the pre-war mortality rate, which some authors have argued was lower in The Lancet study than in other reports. Either this implies inaccurate reporting, or the sample in The Lancet study was not sufficiently representative of the country as a whole. Such an under-estimate of pre-war mortality would make the relative risk estimate for civilian casualties artificially high.
An opposing source of bias, however, is inherent to modern warfare itself. Because wartime deaths cluster heavily around key bombings or points of conflict, randomly sampling a few households around one point in a region may miss a higher number of deaths elsewhere in that region, or vice versa (too few clusters to account for the geographically unequal distribution of deaths). This is particularly concerning because not all of the randomly selected points could be visited by surveyors: locations closed by checkpoints or considered too dangerous may not have been fully surveyed. Such inaccessibility implies that the areas experiencing the highest rates of death were those most likely to be excluded from cluster sampling.
Main street bias?
The second Lancet survey brought up more complex concerns about the cluster sampling methodology. This survey, conducted between May 20 and July 10, 2006, included 1,849 randomly selected households averaging seven members each. One person in each household was asked about deaths in the 14 months before the invasion and in the period after. The interviewers asked for death certificates 87% of the time; when they did, more than 90% of households produced them. Because of clerical errors, three clusters were excluded from the analysis. The study concluded that the mortality rate per 1,000 population per year was 5.5 (95% CI, 4.3–7.1) in the pre-invasion period and 13.3 (95% CI, 10.9–16.1) in the post-invasion period. The authors estimated that between March 2003 and June 2006, 654,965 (392,979–942,636) excess deaths occurred in Iraq (about 600,000 of them violent), an estimate roughly ten-fold higher than the Iraq Body Count and other estimates. That’s about 500 deaths per day.
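The headline figure can be roughly reproduced from the reported rates with back-of-the-envelope arithmetic. The population size and period length below are our own round-number assumptions (the paper itself weighted by province), so we only expect to land in the right neighborhood of the published estimate.

```python
# Back-of-the-envelope check on the second survey's headline number,
# using the rates reported in the paper. Population and period length
# are assumed round numbers, not figures from the paper.
pre_rate  = 5.5 / 1000     # deaths per person per year, pre-invasion
post_rate = 13.3 / 1000    # deaths per person per year, post-invasion
population = 27_000_000    # assumed: Iraq had roughly 27 million people
years = 3.25               # March 2003 through June 2006

excess_deaths = (post_rate - pre_rate) * population * years
print(f"~{excess_deaths:,.0f} excess deaths")
```

This comes out near 680,000, within a few percent of the published 654,965; the residual gap reflects the paper’s province weighting and our population assumption.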
This survey was notable for having a larger sample size than its predecessor, and as a result it produced narrower confidence intervals. But one aspect of the methodology drew the most criticism: the authors sampled from “a random selection of main streets,” defined as “major commercial streets and avenues,” along with a “list of residential streets crossing” those main streets.
In the journal Science, two authors took the surveyors to task: “Main street bias inflates casualty rates since conflict events such as car bombs, drive-by shootings, artillery strikes on insurgent positions, and marketplace explosions gravitate toward the same neighborhood types that the [Lancet] researchers surveyed… In short, the closer you are to a main road, the more likely you are to die in violent activity. So if researchers only count people living close to a main road, then it comes as no surprise they will over-count the dead.”
How much does “main street bias” inflate an estimate of civilian casualties? One approach to evaluating the “external validity” of the results is to compare them to independent data, which the authors tried to do in this graph:
Looks like the survey and two other sources of data are pretty similar, right? The authors used this graph to conclude that “other sources corroborate our findings about the trends in mortality over time”. But do you see their error? The graph confuses two different kinds of data: the Iraq Body Count and Department of Defense figures are cumulative deaths over time (left axis), while the authors’ household survey data are a rate of deaths per period of time (right axis). The two sets of data should not be compared on the same graph, and their similar-looking slopes should not be interpreted as similar trends unless the data are converted into the same units. If we re-plotted all three sources as deaths per month, or as cumulative deaths, The Lancet data would stand out, rising far faster than the other two sources.
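The fix is mechanical: put both series in the same units before comparing slopes. Differencing a cumulative tally gives deaths per period, and accumulating a per-period rate gives a cumulative count. All numbers below are invented for illustration.

```python
# Two series must be in the same units before their slopes are comparable.

# A cumulative tally (e.g. a body-count-style total at the end of each period):
cumulative = [10_000, 25_000, 45_000, 70_000]

# Differencing recovers deaths per period:
per_period = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]
print(per_period)    # [10000, 15000, 20000, 25000]

# Conversely, a survey-style per-period rate series accumulates into a total:
rates = [200, 550, 1_300, 2_900]   # deaths per period
accumulated = []
running_total = 0
for r in rates:
    running_total += r
    accumulated.append(running_total)
print(accumulated)   # [200, 750, 2050, 4950]
```

Only after one of these conversions does it make sense to plot the sources on a shared axis and compare trends.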
As a better comparison, we can contrast The Lancet surveys with the only other peer-reviewed survey published to date: an official estimate from the Iraqi government and the World Health Organization, which surveyed 9,345 households across Iraq in what is known as the “Iraqi Family Health Survey”. The study, published in The New England Journal of Medicine, estimated 151,000 violent deaths (95% CI, 104,000–223,000) from March 2003 through June 2006, using a cluster survey method. The distribution of deaths among provinces was fairly similar between this study and the Iraq Body Count website, although the survey’s results were again higher, consistent with the greater sensitivity of household interviews compared with media reviews. (The Iraq Body Count site collected news reports of civilian deaths that appeared in at least two independent media sources, cross-checked against hospital records, morgues, and nongovernmental organizations; unlike the surveys, it does not include combatant deaths, and it has since revised its inclusion criteria to no longer require two sources.) The second Lancet study was also smaller (1,849 households in 47 clusters) than the Iraqi Family Health Survey (9,345 households in 1,086 clusters; see map below).
The New England Journal group attempted to correct for the under-reporting that occurs when all members of a family are killed, and its sample may have been more representative of the country as a whole. The remaining difference between the studies may be the result of main street bias, but other authors have suggested that the Lancet surveys were conducted too quickly, raising questions about whether much of the data was checked for legitimacy and whether the reported percentage of households producing death certificates was plausible (many will note that the Lancet authors drew criticism for not fully disclosing their survey materials and data, some of which appear not to have been approved by ethics boards).
Beyond the specific controversies about The Lancet surveys, what can we learn about future civilian death toll estimates from epidemiologists?
We have to acknowledge that the data gathering is hard and dangerous, and often plagued by politics as competing interests try to inflate or deflate the results. The resulting estimates are likely to be very imprecise, but still better than having no data at all from which to set aid and rehabilitation budgets.
Cluster sampling is probably safer and more feasible than trying to randomly sample a vast region during warfare, but it is inherently limited by the fact that bombs are also clustered, whether around main streets or not. Whether a design has enough clusters to represent the population well cannot be computed from some abstract formula that accounts for how much bombing or fighting is concentrated in certain regions at certain times. There is no simple equivalent to the sample-size calculation we perform for standard cohort studies, so epidemiologists will argue a great deal about whether clusters are sufficient in size and scope, and how to correct for biases in reporting and sampling.
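There is, to be fair, a standard first-pass formula from survey statistics: the Kish design effect, which inflates variance by a factor of 1 + (m - 1) * rho for average cluster size m and intra-cluster correlation rho. The trouble is that rho for violent death in wartime is both large and unknowable in advance. The sketch below loosely borrows the second Lancet survey’s sample sizes; the correlation values are invented.

```python
# Kish design effect for a cluster sample: the variance of an estimate
# inflates by 1 + (m - 1) * icc, where m is the average number of people
# per cluster and icc is the intra-cluster correlation of the outcome.
def design_effect(cluster_size: float, icc: float) -> float:
    return 1.0 + (cluster_size - 1.0) * icc

def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    return n / design_effect(cluster_size, icc)

n = 1_849 * 7   # households times ~7 members each, as in the 2006 survey
m = 40 * 7      # roughly 40 households per cluster, ~7 members each

# Invented icc values: even modest clustering of deaths shrinks the
# effective sample size dramatically.
for icc in (0.01, 0.05, 0.2):
    print(f"icc={icc}: effective sample size ~ {effective_sample_size(n, m, icc):,.0f}")
```

At an ICC of just 0.05, fewer than 900 of the nearly 13,000 people surveyed count as statistically independent; and the formula still assumes every cluster was reachable and representative in the first place.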
It’s likely that household surveys capture something that media reports do not: the indirect deaths that come not from bullets or bombs but from water contamination and hospital closures, which leave people to die at home as unofficial or delayed casualties. Future surveys should also include injuries and disability, which remain unaccounted for in research papers focused on the “hard outcome” of mortality.
We can argue all day as epidemiologists about the magnitude of error in overall civilian casualty estimates. This matters at a political level, in terms of giving voice to those who lose their lives in conflict. But at a logistical level, the key question is not the overall “final tally” of deaths but what the line-item direct and indirect causes of death actually are, and where they are concentrated. In the spirit of public health, we should analyze where different types of death are occurring in order to prevent future illness. The surveys from Iraq reveal when deaths resulted from contaminated water or the breakdown of healthcare infrastructure rather than from violence alone, and they suggest where delayed injuries and deaths may be concentrated. The Iraqi Family Health Survey, for instance, reported a large number of non-violent excess deaths.
In such cases, household surveys give us clear direction for intervention, revealing underlying trends in geographic mortality risk that daily media reports about bombs and bullets often obscure. The fact that Fallujah was such an outlier in the original Lancet survey suggests that the burden of death borne by households in that city demands a proportionate aid response, yet such coordination of aid with need is, surprisingly, not yet the norm. The next step in this work, after all, is to avert further deaths, not to keep bickering over confidence intervals.