I really enjoy learning about the Census Bureau and its processes. So I’ve been especially fond of Michael Hait’s “anatomy” series and his other articles about the census. After reading his latest census-oriented post, I started musing about the 1890 census. The population schedules for the 1890 census were mostly destroyed in a fire that occurred in the Commerce Department building in January of 1921. The Associated Press reported that several senators believed that cigarette smoking had been the cause of the fire. There were calls for immediate bans on smoking in federal buildings; however, that wouldn’t happen for many more decades.
The destroyed census records were probably not burned but rather affected by water damage. In any event, the absence of an 1890 census can be a real problem for some researchers. Certainly there are other sources available; some even are described as census substitutes. These latter are most frequently directories of some sort or another that list individuals by name or occupation or address for a particular time. A complete census substitute may consist of a number of such records.
But the census substitutes and other records used to fill the 1890 to 1900 gap are scattered and sporadic. It would really be nice to have an 1890 census.
Well, let’s ask our fairy godmother to reconstruct the 1890 census. Or couldn’t we actually do it ourselves, using presently available technology? Maybe. For a moment, as we go down this fantasy trail, let’s set aside issues of cost, including cost-benefit analysis. We’ll get to all of that later.
My idea goes like this: we would gather all the available information relevant to the 1890 census and we would dump it into a supercomputer that would crunch the numbers and come out with population schedules for every state just as if the original population schedules existed.
Here’s the data we would input:
- the 1880 census
- the 1870 census
- the 1900 census
- the 1910 census
[the purpose of selecting these years for census data is to engage in a bit of "triangulation" with respect to individuals and their residences in 1890.]
We would also input:
- digitized data of known migration patterns.
- digitized biographical information of people known to have been alive between 1890 and 1900.
- information about deaths known to have occurred between 1880 and 1900.
- information about the rate of growth of each state and county in the country from 1880 to 1900.
- marriage data from 1880 to 1900.
- available military records from 1880 to 1900.
- a harvesting of names from all publications issued from 1880 to 1900, with locations attached to the names.
I guess that comes down to just about everything for which we have a record from 1880 to 1900. We would program the computer with a set of rules to allow it to connect individual lives from these various sources. Another rule would have to do with linking families together.
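To make the idea of a linking rule a little more concrete, here is a minimal sketch in Python of what one such rule might look like. Everything in it is an assumption for illustration: the `Record` fields, the thresholds, and the people in the example are all made up, and a real project would need far more careful matching than this. The rule simply says that two records are probably the same person if the names are spelled similarly, the estimated birth years are close, and the birthplaces agree.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    """One person as listed in one source (all fields are hypothetical)."""
    source: str       # e.g. "1880 census"
    name: str
    birth_year: int   # estimated, e.g. census year minus stated age
    birthplace: str
    county: str


def name_similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1], tolerant of spelling variants."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_person(a: Record, b: Record,
                min_name_score: float = 0.85,
                max_year_gap: int = 2) -> bool:
    """Illustrative rule: similar name, close birth year, same birthplace."""
    return (name_similarity(a.name, b.name) >= min_name_score
            and abs(a.birth_year - b.birth_year) <= max_year_gap
            and a.birthplace.lower() == b.birthplace.lower())


def link(earlier: list[Record], later: list[Record]) -> list[tuple[Record, Record]]:
    """Pair an earlier record with a later one only when exactly one matches."""
    pairs = []
    for rec in earlier:
        candidates = [c for c in later if same_person(rec, c)]
        if len(candidates) == 1:  # ambiguous or missing matches are left unlinked
            pairs.append((rec, candidates[0]))
    return pairs


# Made-up example people, not real census entries.
in_1880 = [Record("1880 census", "John Schmidt", 1850, "Ohio", "Franklin")]
in_1900 = [
    Record("1900 census", "John Schmitt", 1851, "Ohio", "Franklin"),
    Record("1900 census", "Mary Jones", 1860, "Ohio", "Franklin"),
]
for before, after in link(in_1880, in_1900):
    print(f"{before.name} ({before.source}) -> {after.name} ({after.source})")
```

The sketch deliberately refuses to link a record when more than one candidate fits, on the theory that those cases are better left for a human to resolve. A rule for linking families could work the same way, comparing whole households instead of single individuals.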
Remember the goal is to produce population schedules that would replicate the original 1890 census population schedules.
I’m thinking that in the abstract this is a plausible idea. And I’d bet there’s enough computing power in the world to do this.
Now to deal with reality. One question is: how would we judge the accuracy of these reconstructed 1890 census population schedules? That’s an excellent question. As in all scientific endeavors, we would have to establish some confidence level at which we would be satisfied with the result. Think for a moment about a point that Michael Hait raised in his latest post on the census: how reliable are any census records? We could judge the accuracy of the reconstituted 1890 census population schedules by identifying sources that would likely lead to a specific individual being located where the reconstituted schedules say he is. We would randomly sample the population schedules and compare them to other known records. Recall that among the other known records are some surviving population schedules for a very few counties in a very few states. Those could be part of the data set or they could be held out as a control set. That could be one of the devices we use to sample the reconstituted schedules for accuracy.
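The control-set idea can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: that each person has some identifier, that we compare at the county level, and that the surviving schedules were kept out of the data the computer used. The identifiers and county names below are placeholders, not real records.

```python
from __future__ import annotations

import random


def holdout_accuracy(reconstructed: dict[str, str],
                     surviving: dict[str, str],
                     sample_size: int,
                     seed: int = 0) -> float:
    """Share of sampled control-set people placed in the right county.

    Both dicts map a (hypothetical) person identifier to a county name.
    `surviving` stands for the surviving 1890 schedules held out as a control.
    """
    if not surviving or sample_size <= 0:
        raise ValueError("need a non-empty control set and a positive sample size")
    rng = random.Random(seed)          # fixed seed so the sample is repeatable
    people = sorted(surviving)
    sample = rng.sample(people, min(sample_size, len(people)))
    hits = sum(1 for p in sample if reconstructed.get(p) == surviving[p])
    return hits / len(sample)


# Placeholder data: four people in the control set, three placed correctly.
surviving = {"p1": "County A", "p2": "County B", "p3": "County C", "p4": "County D"}
reconstructed = {"p1": "County A", "p2": "County B", "p3": "County C", "p4": "County X"}
print(holdout_accuracy(reconstructed, surviving, sample_size=4))
```

The number that comes out is the kind of figure we could hold up against whatever confidence level we had decided on beforehand.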
But in some sense the accuracy sampling issue has already been answered for us. We would subject our Reconstituted 1890 Census Schedules to the elements of the Genealogical Proof Standard.
- “A reasonably exhaustive search:” this would have been done through the data collection process.
- “Complete and accurate citation of all sources:” the computer will do this.
- “Analyze and correlate the collected information to assess its quality as evidence:” the computer will do this.
- “Resolve any conflicts caused by contradictory items of evidence or information contrary to the conclusion”: this would be a shared human/computer task.
- “Arrive at a ‘soundly reasoned, coherently written conclusion’:” this would be the end product.
If the project meets these criteria, then we may deem it sufficiently accurate for genealogical research purposes.
Another question is how much will this cost? Answer: I don’t know.
And yet another question, really the most basic one: why?
And my answer is: (1) because we can (I think); (2) because it would be interesting; (3) because it may spawn new applications useful for other things that we cannot now anticipate; (4) because it would be fun!
Now I’m no techie, but there seems to be some logic in the argument that this can be done. I’m not saying it should be done or that it’s necessary to be done but that it could be done. Do you think I’m right? Do you think this is actually feasible?
I’m not exactly holding my breath waiting for anyone to come rushing in to do this project, but I’d like to hear some educated views on whether it could be done.
November 7, 2009 Saturday at 7:08 pm