[First posted on December 16, 2017 at rgr-cyt.org.]
The National Archives and Records Administration (NARA) has released a sixth set of records under the JFK Records Act. The release, which took place on December 15, was announced here. According to the announcement,
At this point, with the exception of 86 record identification numbers where additional research is required by the National Archives and the other agencies, all documents subject to section 5 of the JFK Act have been released either in full or in part
This may therefore be the end of NARA posting ARC records. Or it may not. Many of the records released this year have been redacted. Whether to lift these redactions is not yet decided, but will be decided by April 16, 2018. My hope is that where redactions are lifted, NARA will re-post these files, with the original text replacing the redactions. Anyone who is taking a vote on this, please note that my hand is raised in favor of re-posting.
Going back to the current release, #6 is rather complicated if you want to count what’s now available. NARA provides a spreadsheet of releases with lots of important metadata (here), but you still have to do some number crunching to get the correct figures. The remainder of this post will explain how I crunched. A follow-up post will raise some questions that I still have.
Spreadsheets, filenames, and RIF numbers
After each release from the JFK ARC, NARA has posted a spreadsheet that gives important information about the records that are being released. The records are also posted online as either audio (.wav) files or pdf files. These pdf files are generally scans of original documents, usually produced by government agencies. Which records should be released was determined by the Assassination Records Review Board, an independent government agency established under the JFK Assassination Records Collection Act (JFK ARCA). This law was intended to open up all U.S. government records relating to the assassination of President Kennedy to the maximum extent possible, and the ARRB adopted a VERY broad definition of what counts as relevant to the assassination.
NARA’s spreadsheet of records in the releases is cumulative. That is, there is not a separate spreadsheet for each release. Instead, there is only one spreadsheet which is updated for each release, with information on the most recently released records at the top of the sheet. The cumulative spreadsheet for the December 15 release has 35,557 rows, but this does not mean there are 35,557 files posted at NARA; to understand why, one needs to crunch numbers.
The two key bits of information in each spreadsheet row are the name of the file posted at NARA and a RIF number. A RIF (Record Identification Form) is a form developed by NARA and the ARRB for use in a database of JFK records. The ARCA mandated the creation of this database. A RIF includes a bunch of information, but the key component is the RIF number. This is a unique number which identifies every individual record in the database.
The relation between a document and a “record” in NARA’s database is not simple. There are cases where the same document may have more than one RIF number. For example, the ARRB may have acquired the same document from two different sources: say, a document from the FBI investigation of the JFK assassination, and a copy of that document from one of the government bodies that investigated the FBI’s investigation. These are usually treated as two different records. Another example would be where single components of larger documents are assigned one RIF number, and the larger document they came from is assigned another RIF number.
Another complication occurs in the NARA release spreadsheet. The simplest situation for the spreadsheet data would be if there were a one-to-one correspondence between the name of the file posted at NARA for each release and a RIF number from the JFK database. In other words, each file would represent one record in the database. This is not the case, however. Instead, a single filename may be listed multiple times in the spreadsheet, each time with a different RIF number. I discussed a number of these cases in earlier posts. A single RIF number may also be linked to more than one file. I found several cases of this as well, which I also discussed in earlier posts. The cumulative spreadsheet for the 6th release has more of both of these. Hence the need for additional number crunching.
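To make the two kinds of mismatch concrete, here is a minimal Python sketch of the sort of tally involved. It is not the actual procedure used for this post: the function name `tally`, the row layout (a list of file names paired with a RIF number), and the toy data are all assumptions made for illustration, and nothing here reflects the real column layout of NARA's spreadsheet.

```python
from collections import defaultdict


def tally(rows):
    """Summarize the file/RIF relationship in a list of spreadsheet rows.

    rows: list of (file_names, rif_number) pairs, one pair per row.
    file_names is a list because a row may name more than one file.
    (This row layout is hypothetical, chosen only for the example.)
    """
    rifs_by_file = defaultdict(set)
    files_by_rif = defaultdict(set)
    for file_names, rif in rows:
        for name in file_names:
            rifs_by_file[name].add(rif)
            files_by_rif[rif].add(name)
    return {
        "rows": len(rows),
        "distinct_files": len(rifs_by_file),
        "distinct_rifs": len(files_by_rif),
        # Files that show up under more than one RIF number.
        "files_with_several_rifs": sorted(
            f for f, rifs in rifs_by_file.items() if len(rifs) > 1
        ),
        # RIF numbers that point at more than one file.
        "rifs_with_several_files": sorted(
            r for r, files in files_by_rif.items() if len(files) > 1
        ),
    }


# Made-up rows, NOT taken from the NARA spreadsheet.
toy_rows = [
    (["a.pdf"], "RIF-1"),           # one file, one RIF number
    (["b.pdf"], "RIF-2"),           # b.pdf appears in two rows ...
    (["b.pdf"], "RIF-3"),           # ... under two RIF numbers
    (["c.pdf", "d.pdf"], "RIF-4"),  # one RIF number, two files
]

summary = tally(toy_rows)
print(summary)
```

In this toy case the row count, the file count, and the RIF count all happen to be 4, yet the data contain both kinds of mismatch, so matching totals alone would not show that the correspondence is one-to-one.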
Release 6 counts
The revised spreadsheet for release 6 has an additional 4238 rows of record metadata compared to the spreadsheet for release 5. According to the NARA announcement, however, there are 3539 “documents” posted on NARA’s website. This difference arises in part because in the new rows, there are many cases where a single filename occurs in multiple rows. Each row in the spreadsheet has a different RIF number, so this means one file is listed under different RIF numbers.
This, I believe, is what NARA means when it says
Within records released on December 15th, there are instances where multiple record identification numbers are associated with the same pdf. This is due to the fact that the files were scanned in batches.
I don’t understand what “files were scanned in batches” means or why it would cause this situation. But in any case, this explains, in part, the discrepancy between the number of pdf files posted at NARA and the number of spreadsheet rows. In one case, a single file posted at NARA is listed in 25 different rows. Unfortunately, most of the metadata is missing for these rows, so it is quite hard to say why one file should be listed so many times.
Multiple RIF numbers for one file do not completely resolve the discrepancy between the number of spreadsheet rows and the number of files posted at NARA, however. The second reason for this discrepancy is that there are 12 rows in the spreadsheet that list multiple files. This means that there are multiple files with one RIF number.
In the previous cases where I found this, the different files keyed to the same RIF number occurred in different releases, and it seemed that the later versions of the files either had extra material added or redactions removed. But in the case of the files in release 6 where multiple files are associated with one RIF, I have not yet figured out what is going on. As an example of this, there is one case where a single row with one RIF number lists four different files (RIF 124-10183-10291). When one adds up all these multiple files listed under one RIF, one gets another 15 files.
Summing up, there are 3510 files listed in the spreadsheet for release #6 which are associated with 4226 RIF numbers. There are also 27 files listed under 12 RIF numbers. 4226+12 = 4238 and this is the total number of new rows in the release 6 spreadsheet (new as compared to the spreadsheet for release 5). The total number of files in release 6 should therefore be 3510+27 = 3537. NARA, however, says there are 3539. I have not yet figured out where the missing 2 files are.
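The arithmetic in this summary can be re-checked with a few lines of Python. The variable names are illustrative only; the figures are the ones quoted in this post.

```python
# Figures quoted in the post for the rows new in release 6.
rows_single_file = 4226    # rows (RIF numbers) that each list one file
rows_multi_file = 12       # rows (RIF numbers) that each list several files
files_single = 3510        # distinct files behind the single-file rows
files_multi = 27           # files listed in the multi-file rows
nara_document_count = 3539 # "documents" according to NARA's announcement

new_rows = rows_single_file + rows_multi_file
files_counted = files_single + files_multi
extra_files = files_multi - rows_multi_file
shortfall = nara_document_count - files_counted

print("new rows:", new_rows)
print("files counted:", files_counted)
print("files beyond one per multi-file row:", extra_files)
print("NARA count minus files counted:", shortfall)
```

The difference of 27 files over 12 rows is 15, which appears to be the "another 15 files" mentioned earlier, and the remaining gap against NARA's figure is the 2 files still unaccounted for.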