Re-counting records at Mary Ferrell

This is a follow-up to a note I posted over two years ago (March 2018).

That note looked at the question of how many records were listed in the ACRS, NARA’s on-line database of ARC metadata. [Note 1: As discussed numerous times on this website, the ACRS is not a list of every record and/or item in the ARC. A page is coming soon to this website that will explain exactly what the ACRS does and does not cover.] Since the ACRS has become available on-line, with a little bit of programming one should be able to answer that question definitively: just scan and scrape it. In fact, the Mary Ferrell website has a copy of the entire ACRS from mid-2015 which was produced in just this way. Call this copy MF15.

Origin and uses of MF15

MF15 came from a programmer named Ramon Herrera, who gave one copy to the Mary Ferrell Foundation and maintains another copy on his own website. The copy of MF15 on the MF website also comes with an interface which presents extremely useful summaries of ACRS data. This interface, dubbed the JFK Database Explorer, is available here.

The ACRS is not, of course, set in stone. In fact, as noted elsewhere on this blog, the ACRS is now off-line and undergoing maintenance (as of October 2020). This will probably result in significant changes to the ACRS, due to the 2017-2018 release of most of the ARC documents and passages withheld from the public.

When the ACRS comes back on-line, the MF15 copy will serve as a baseline for showing exactly what has been added, removed, or changed in it.
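A comparison of this kind can be sketched in a few lines of Python. This is only a minimal sketch, assuming each copy of the ACRS has already been loaded into a dictionary keyed by RIF number. The function name diff_snapshots, the "pages" field, and the sample rows are all hypothetical and made up for illustration; they are not real ACRS data.

```python
def diff_snapshots(old, new):
    """Compare two snapshots, each a dict mapping RIF number -> metadata dict.

    Returns three sorted lists of RIF numbers: added, removed, changed.
    """
    old_keys, new_keys = set(old), set(new)
    added = sorted(new_keys - old_keys)
    removed = sorted(old_keys - new_keys)
    # "Changed" means the RIF number is in both copies but the metadata differs.
    changed = sorted(k for k in old_keys & new_keys if old[k] != new[k])
    return added, removed, changed


# Made-up rows for illustration only (hypothetical RIF numbers and fields).
mf15 = {
    "999-00000-00001": {"pages": 2},
    "999-00000-00002": {"pages": 5},
}
later_copy = {
    "999-00000-00001": {"pages": 3},  # metadata changed
    "999-00000-00003": {"pages": 1},  # new entry
}

added, removed, changed = diff_snapshots(mf15, later_copy)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Keying on the RIF number is a design choice that follows from the way the ACRS itself identifies entries; any other unique key would work the same way.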

So how many?

One question that is bound to come up again is how many records are actually listed in the ACRS. Oddly, MF15 did not completely settle this question. This was the point of that old post. Instead of one answer, I found three. The first answer, according to the Mary Ferrell website, was 319,106. At the same time, MF also quoted Martha Murphy, supervisor of the ARC at NARA, as saying there were 318,866. [Note 2: For these two numbers see here.] Using the JFK Database Explorer, the MF resource page described above, I came up with a third count: 318,733.

This was the confused state described in my March 2018 note. Recently I have taken another look at the question, and come up with a definitive answer: the total number of records in MF15 is indeed 319,106.

NARA officially adopted this figure in October 2017. [Note 3: The webpage that says this has been replaced by an announcement of ongoing maintenance to the ACRS. To see a Wayback Machine copy of the page click here.] This left only the discrepancy I found in the JFK Database Explorer in 2018. This discrepancy has now disappeared in the current version of the Database Explorer.

If you just want a total for MF15, you can stop here. Beware, however! Just because there are 319,106 entries in MF15 doesn’t mean there are 319,106 records at NARA. If you want to understand this, you will have to read on.

Source of the discrepancy

First, you have to know how the records in the ARC are indexed. For each record, a form was completed called a Record Identification Form (RIF) that includes detailed metadata for the record, such as date, number of pages, recipient(s), agency providing the document, agency originating the document, and so on. Each of these RIFs is identified by a unique RIF number. For an example, see the top sheet here, which shows the metadata for record 104-10001-10010.

The first three digits in each RIF number represent the agency that provided the record. 104 refers to the CIA, 124 refers to the FBI, and so on. Looking at this page of Mary Ferrell’s JFK Database Explorer, we can thus see the total number of records for each of these three-digit prefixes in MF15.
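For readers working from their own copy of the data, the same per-agency tally can be sketched in Python. This is a minimal sketch, assuming the RIF numbers are available as plain strings in the form shown above; the function names are hypothetical, and the last sample number is made up for illustration.

```python
from collections import Counter


def agency_prefix(rif_number):
    """Return the three-digit agency prefix of a RIF number like '104-10001-10010'."""
    return rif_number.split("-")[0]


def count_by_agency(rif_numbers):
    """Tally RIF numbers by their agency prefix."""
    return Counter(agency_prefix(r) for r in rif_numbers)


# The first two RIF numbers appear in this post; the third is made up.
sample = ["104-10001-10010", "124-10001-10013", "124-00000-00001"]
counts = count_by_agency(sample)
print(counts["104"], counts["124"])
```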

Fortunately, I kept a copy of this summary page from 2018. Comparing it with the current summary page, we can see where the differences came from. The record counts for three agencies changed between 2018 and 2020:

Agency    2018 count    2020 count
104           85,492        85,493
119            5,145         5,155
124          148,049       148,411

These differences (1 + 10 + 362) add up to 373 more ARC records in 2020 than in 2018. Adding 373 to my 2018 count of 318,733 gives 319,106, so the current version of MF15 now matches the total which MF first gave in 2018. So which records were added to MF15, and why weren’t they in the earlier version? These questions took a bit of searching and head-scratching to answer.
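The arithmetic is easy to check. The short Python sketch below uses only the counts from the table above and my 2018 total; the variable names are my own.

```python
# Per-agency record counts from the 2018 and 2020 summary pages.
counts_2018 = {"104": 85492, "119": 5145, "124": 148049}
counts_2020 = {"104": 85493, "119": 5155, "124": 148411}

# Difference per agency prefix, then the overall difference.
diffs = {agency: counts_2020[agency] - counts_2018[agency] for agency in counts_2018}
total_added = sum(diffs.values())

print(diffs)                 # {'104': 1, '119': 10, '124': 362}
print(total_added)           # 373
print(318733 + total_added)  # 319106
```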

A data entry problem

It turns out that these records all have something in common. Take a look at this record: 124-10001-10013. This RIF number is in MF15, but it has NO metadata. [Note 4: I checked the copy at Ramon Herrera’s website, and it is the same: a number but no data.] The reason, as the “subjects” and “filenum” fields inform us, is that “NO DOCUMENT EXISTS FOR THIS RECORD NUMBER.”

This is a data entry problem. Somehow, a RIF number was entered in the ACRS which was not assigned to any document. All 373 of the new entries seem to reflect the same issue. For some reason, these entries were excluded from the earlier version of the JFK Database Explorer which I looked at in 2018. They have now been added to the 2020 version.

As a practical matter, we can just accept this and go on. Notice, however, that this also means we cannot look at the total number of ENTRIES in the ACRS and take this to be the total number of DOCUMENTS in the ARC! Let all us bean counters take a lesson from this.
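The distinction between entries and documents can be made concrete with a small Python sketch. It assumes each entry has been loaded as a dictionary with "subjects" and "filenum" keys (the two field names quoted above) and that blank entries carry the marker text in one of those fields. The function names and the sample rows are hypothetical, made up purely for illustration.

```python
BLANK_MARKER = "NO DOCUMENT EXISTS FOR THIS RECORD NUMBER"


def is_blank_entry(entry):
    """True if the 'subjects' or 'filenum' field carries the no-document marker."""
    return any(
        BLANK_MARKER in (entry.get(field) or "")
        for field in ("subjects", "filenum")
    )


def tally(entries):
    """Return (total entries, blank entries, entries that describe a document)."""
    blanks = sum(1 for e in entries if is_blank_entry(e))
    return len(entries), blanks, len(entries) - blanks


# Made-up rows for illustration only (hypothetical RIF numbers and values).
sample = [
    {"rif": "999-00000-00001", "subjects": "PLACEHOLDER SUBJECT", "filenum": "PLACEHOLDER"},
    {"rif": "999-00000-00002", "subjects": BLANK_MARKER, "filenum": BLANK_MARKER},
    {"rif": "999-00000-00003", "subjects": "PLACEHOLDER SUBJECT", "filenum": None},
]

total, blanks, documents = tally(sample)
print(total, blanks, documents)
```

Even a count produced this way only removes the entries that announce themselves as blank; it says nothing about other ways an entry count and a document count might diverge.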

My 2 cents

373 is not a big number, but I have read posts on the internet that cite smaller discrepancies in the ACRS with the deepest suspicion. Is NARA concealing ARC documents? In fact, there are many such gaps in numbering throughout the Collection; after all, it was put together over decades, often by hand.

It is of course a fair question to ask how these “blank entries” came about in the ACRS. Note that most of these entries are FBI records. Although I cannot point to any statements from the FBI’s Task Force on JFK records, or from NARA itself, it is not hard to come up with a possible explanation based on the structure of FBI files.

An FBI file is itself a collection of individual documents that are called “serials”. For example, 105-82555-1 represents the first document or serial in the FBI Headquarters file on Lee Harvey Oswald. It is not unusual, however, for serials to be incorrectly numbered, so that the seriation is not continuous, or accidentally omitted, so that there are documents which were not originally included in a series.

A number of these “blank entries” were perhaps produced when the FBI staff inputting data for NARA simply input RIF numbers and serial numbers based on overall counts which failed to note that a serial number had been skipped. Suppose, for example, that in the sequence 105-82555-1, 105-82555-2, 105-82555-3, serial 2 had been skipped, but continuous RIF numbers had already been input for three records. This would mean that one RIF number would necessarily link to no data. Instead of the FBI or NARA deleting this blank entry, it may have simply been loaded into the ACRS, to the confusion of later users.
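The scenario is simple enough to sketch in Python. This illustrates my guess only, not any documented FBI or NARA procedure. It assumes serial identifiers end in a plain integer, as in the example above (real files may not be so tidy), and the function names are hypothetical.

```python
def serial_of(identifier):
    """Extract the trailing serial number from an identifier like '105-82555-3'."""
    return int(identifier.rsplit("-", 1)[1])


def missing_serials(serials):
    """Return the serial numbers absent between the lowest and highest seen."""
    present = set(serials)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# Documents actually found in the file: serial 2 was skipped.
found = ["105-82555-1", "105-82555-3"]
gaps = missing_serials(serial_of(s) for s in found)

# If RIF numbers were assigned from the overall count (1 through 3),
# each gap leaves one RIF number with no document behind it.
rif_slots = max(serial_of(s) for s in found)
blank_entries = rif_slots - len(found)

print(gaps, blank_entries)
```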

There are other ways that these blank entries could have been produced; the point is that these are surely an artifact of data entry problems, not of conspiratorial schemes to conceal ARC documents.