Seeking Validation?
The time has come for me to reckon with EAD.
EAD, or Encoded Archival Description, has been part of my life for a couple of years now. During grad school, I experimented a lot with converting the hierarchical, tag-based XML data of EAD into tabular data. One result was a paper in which I visualized metadata from a Smithsonian Folklife archival collection as a spatial network and analyzed the geographic spread of the genre tags assigned to its audio recordings. Archives-as-data projects continue to entice me, but I haven't had time to pursue them recently.
Now that I am a processing archivist, I am on the other side of EAD. Rather than converting from EAD to tabular data, I'm essentially doing the opposite. As I work with a collection, I keep a series-by-series folder list in spreadsheet format. When it's time to encode, I use this as the jumping-off point for my collection inventory. This is, in many ways, an outdated way of creating EAD. Many archival collections management systems, including the ever-popular ArchivesSpace, automate EAD production. Collections are cataloged in a field-by-field user interface, more similar to library or museum cataloging, and the software is able to convert this information to an exportable EAD file.
Unfortunately, my institution does not currently use archives software with this capability. We use ReDiscovery Proficio, which will likely be the topic of many future blog posts... While Proficio's data structure is actually very well suited to the hierarchical arrangement of archival collections, we have not yet found a way to export EAD files automatically from collection catalog entries. So if we want to make our finding aids accessible on RIAMCO, our statewide online collections portal, we are left to hand-encode our EAD files.
The trouble with hand encoding...
Have you ever hand-coded an HTML site? Or, for my digital humanists, have you manually added tags to a piece of text to make it a TEI file? If so, you have experienced the epic highs and lows, the triumphs and defeats, of manual markup.
In markup languages like HyperText Markup Language (HTML) and eXtensible Markup Language (XML), tags, defined by <angle brackets>, tell the computer what to do with a piece of text. The computer also needs to be told when to stop listening to a tag, which happens when it reaches a closing tag: the same tag name with a / (slash) in front of it, as in </p>.
In HTML, tags can define segments of the text. For example:
<html> <!-- Hello computer, this is an HTML document. -->
  <head> <!-- This tells the computer that we're in the header of the document. -->
    <title>This is a title!</title>
  </head> <!-- And now the header is over! -->
  <body> <!-- Now we're on to the body of the document. -->
    <h1>Section Heading</h1>
    <p>This is my first paragraph.</p>
    <p>And this is my second paragraph.</p>
  </body> <!-- Now the body is done. -->
</html> <!-- The HTML document is now over! -->
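The closing tags matter more than they might seem. HTML itself is forgiving (browsers will happily render a page with a missing </p>, and that tag is even optional in HTML), but XML, and therefore EAD, is strict. The example above happens to be well-formed XML as well, so here is a minimal sketch using Python's standard-library xml.etree.ElementTree parser to show what happens when one closing tag is dropped. The check function is a hypothetical helper for illustration.

```python
import xml.etree.ElementTree as ET

# The HTML example above, without its annotations. It is also well-formed
# XML, so Python's standard-library XML parser can read it.
complete = """<html>
  <head><title>This is a title!</title></head>
  <body>
    <h1>Section Heading</h1>
    <p>This is my first paragraph.</p>
    <p>And this is my second paragraph.</p>
  </body>
</html>"""

# The same document with the closing </p> removed from the first paragraph.
missing_close = complete.replace(
    "This is my first paragraph.</p>", "This is my first paragraph."
)


def check(markup: str) -> str:
    """Return 'ok' if the markup parses as XML, otherwise the parser's error."""
    try:
        ET.fromstring(markup)
        return "ok"
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return f"error: {err}"


print(check(complete))       # ok
print(check(missing_close))  # an error message mentioning "mismatched tag"
```

The parser does not guess where the first paragraph was supposed to end; it stops and reports where the nesting broke down.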
Encoded Archival Description is written in XML, which likewise uses tags. But instead of telling a computer how to render different kinds of text, EAD tags identify pieces of archival metadata:
<ead> <!-- Hello computer, this is an EAD document. -->
  <control> <!-- This section contains administrative information. -->
    <recordid>XXX.XX.XX</recordid>
    <filedesc> <!-- This next section is going to describe THIS file. -->
      <titlestmt> <!-- In this bit, I'll be writing the title. -->
        <titleproper>This is the Title</titleproper>
      </titlestmt>
      ...
    </filedesc>
  </control>
  <archdesc> <!-- Now this section actually describes the collection. -->
    <did> <!-- Descriptive Identification: here I'm bundling together a bunch of different types of information -->
      ... <!-- and so on and so forth... -->
    </did>
  </archdesc>
</ead>
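Because each EAD tag names a specific piece of archival metadata, a program can pull those fields out directly, which is what makes converting EAD into tabular data possible in the first place. Below is a minimal Python sketch using the standard-library xml.etree.ElementTree module; it is illustrative only, not a reconstruction of any particular project's code. The trimmed EAD3 fragment (including the hypothetical unittitle "Example Collection") is not a complete or valid finding aid, and the sketch assumes the standard EAD3 namespace, http://ead3.archivists.org/schema/.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used in the search paths below (assumed: standard EAD3).
EAD3_NS = {"ead": "http://ead3.archivists.org/schema/"}

# A hypothetical, heavily trimmed EAD3 fragment, not a valid finding aid.
sample = """<ead xmlns="http://ead3.archivists.org/schema/">
  <control>
    <recordid>XXX.XX.XX</recordid>
    <filedesc>
      <titlestmt>
        <titleproper>This is the Title</titleproper>
      </titlestmt>
    </filedesc>
  </control>
  <archdesc level="collection">
    <did>
      <unittitle>Example Collection</unittitle>
    </did>
  </archdesc>
</ead>"""


def summarize(ead_text: str) -> dict:
    """Flatten a few metadata fields from an EAD3 document into one table row."""
    root = ET.fromstring(ead_text)
    return {
        "recordid": root.findtext("ead:control/ead:recordid", namespaces=EAD3_NS),
        "title": root.findtext(
            "ead:control/ead:filedesc/ead:titlestmt/ead:titleproper",
            namespaces=EAD3_NS,
        ),
        "unittitle": root.findtext(
            "ead:archdesc/ead:did/ead:unittitle", namespaces=EAD3_NS
        ),
    }


print(summarize(sample))
# {'recordid': 'XXX.XX.XX', 'title': 'This is the Title', 'unittitle': 'Example Collection'}
```

Note the ead: prefixes in the search paths. Because EAD3 elements live in a namespace, a plain path like control/recordid would match nothing and return None, which is one reason namespace problems can make otherwise reasonable EAD look broken to software.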
There are dozens of EAD tags, and two versions of EAD are currently in use. The current version, EAD3, was last updated in 2021. The previous version, EAD2002, is officially deprecated, but it is still the version RIAMCO uses. My absolute favorite resource for working in EAD is EADiva, a totally free and open reference site developed by archivist Ruth Kitchin Tillman.
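The two versions also live in different XML namespaces, which is how software tells them apart. As a minimal sketch, assuming the namespace URIs used for EAD3, the EAD3 "undeprecated" variant, and the EAD2002 schema, a short Python function can guess a file's version from its root element. The function name ead_version and its return labels are hypothetical.

```python
import xml.etree.ElementTree as ET

# Root-element namespaces for the EAD versions discussed above.
KNOWN_NAMESPACES = {
    "http://ead3.archivists.org/schema/": "EAD3",
    "http://ead3.archivists.org/schema/undeprecated/": "EAD3 (undeprecated)",
    "urn:isbn:1-931666-22-9": "EAD2002",
}


def ead_version(xml_text: str) -> str:
    """Guess the EAD version of a document from its root element's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, _, local = root.tag[1:].partition("}")
    else:
        namespace, local = "", root.tag
    if local != "ead":
        return "not an EAD document"
    # EAD2002 files built against the DTD may carry no namespace at all.
    return KNOWN_NAMESPACES.get(namespace, "unknown or missing namespace")


print(ead_version('<ead xmlns="urn:isbn:1-931666-22-9"/>'))  # EAD2002
```

A missing namespace is ambiguous on its own, so the sketch reports it rather than guessing.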
I learned EAD sort of backwards. I first used EAD as a data source, as mentioned at the beginning of this post, and then learned to encode. This has given me a unique perspective, and I feel I have a particularly strong understanding of the EAD structure as a result.
But encoding errors happen.
No matter how well you understand EAD or how careful you are, errors will always creep in when you hand-encode it. Typos happen whenever you write an essay, too; that doesn't mean you aren't a good writer.
The difference is that modern word processors, such as Microsoft Word and Google Docs, flag the errors they detect. Sometimes those flags aren't actually errors, but your attention has been drawn to them so that you can make the call. Some XML editors will alert you to basic structural errors, such as a mismatched tag, but they usually can't check your file against a specific XML schema to tell you whether you've made mistakes relative to that schema.
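The distinction here is between well-formed XML, where every tag is opened, closed, and nested properly, and valid XML, which also follows the rules of a specific schema such as EAD3's. A minimal sketch using Python's standard library shows the gap: the parser rejects a mismatched tag, but it accepts a misspelled element name as long as it is opened and closed consistently. Both fragments and the is_well_formed helper are hypothetical examples.

```python
import xml.etree.ElementTree as ET

# A mismatched tag: the kind of mistake many XML editors will catch.
mismatched = "<did><unittitle>Letters</unitdate></did>"

# A misspelled element name, opened and closed consistently. The markup is
# well-formed, but <unititle> is not an EAD element, so it is not valid EAD.
misspelled = "<did><unititle>Letters</unititle></did>"


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; this says nothing about schema validity."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(mismatched))  # False
print(is_well_formed(misspelled))  # True, even though the element name is wrong
```

Catching the second kind of mistake requires checking the document against the EAD3 schema itself, not just parsing it.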
In our particular workflow, our only chance to have errors detected is when we upload a completed finding aid to RIAMCO. Before the finding aid is published, we can view a rendered preview. If there is a major error, the preview looks bizarre, which signals that something is wrong but not what. If there isn't a major error, the preview makes it easier to copy-edit the actual text of the finding aid, but every correction still has to be made in the XML document. Then the old version has to be deleted from RIAMCO, the new version uploaded, and the proofreading done again. Rinse and repeat.
A better system?
Necessity is the mother of invention, and she has inspired me to build a more robust EAD validator! At the moment, it only works with EAD3 files: because EAD2002 is deprecated, I'm having a harder time getting the validation code to recognize the EAD2002 XML namespace. But I've decided to put the EAD3 version out into the world now. Input an EAD3 file and you'll receive a table of errors, listing each error message (generated by the R XML package) alongside the line number where the error occurs in your document. I've found it much easier to correct my EAD this way, and I hope it's helpful to others as well!
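The validator itself is built on the R XML package and checks files against the EAD3 schema. The Python sketch below is not that code; it only illustrates the shape of the output, a table of line numbers and messages, using a stand-in rule set: a hypothetical, deliberately tiny list of allowed EAD3 element names, plus a check that each element sits in the standard EAD3 namespace (so a file using the undeprecated namespace would be flagged). It uses the standard-library xml.sax module because SAX reports the line number of each element as it is read; if the parser hits a well-formedness error, that error becomes the final row. The names ErrorCollector, error_table, and KNOWN_ELEMENTS are illustrative.

```python
import io
import xml.sax
import xml.sax.handler

EAD3_NS = "http://ead3.archivists.org/schema/"

# Hypothetical, deliberately tiny subset of EAD3 element names. A real
# validator checks against the full EAD3 schema; this set only illustrates.
KNOWN_ELEMENTS = {
    "ead", "control", "recordid", "filedesc", "titlestmt", "titleproper",
    "archdesc", "did", "unittitle", "unitdate", "dsc", "c01",
}


class ErrorCollector(xml.sax.handler.ContentHandler):
    """Collect (line number, message) rows for elements that break the toy rules."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def startElementNS(self, name, qname, attrs):
        namespace, local = name  # namespace is None for un-namespaced elements
        line = self._locator.getLineNumber()
        if namespace != EAD3_NS:
            self.rows.append((line, f"<{local}> is not in the EAD3 namespace"))
        elif local not in KNOWN_ELEMENTS:
            self.rows.append((line, f"unknown element <{local}>"))


def error_table(xml_text: str) -> list:
    """Return (line, message) rows in document order; a parse error ends the list."""
    handler = ErrorCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    parser.setContentHandler(handler)
    try:
        parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    except xml.sax.SAXParseException as err:
        handler.rows.append((err.getLineNumber(), err.getMessage()))
    return handler.rows


sample = """<ead xmlns="http://ead3.archivists.org/schema/">
  <archdesc level="collection">
    <did>
      <unittitel>Letters</unittitel>
      <unitdate>1920-1935</unitdat>
    </did>
  </archdesc>
</ead>"""

for line, message in error_table(sample):
    print(f"line {line}: {message}")
# line 4: unknown element <unittitel>
# line 5: mismatched tag
```

Because SAX reads the document top to bottom, the rows come out in line order. A real schema validator would also catch problems this toy list cannot, such as required elements that are missing or elements placed where the schema does not allow them.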
In addition to the widget below, the validator is hosted on ShinyApps.
Note: if you get an error message in red, this might be due to an EAD namespace declaration error. Try using this set of declarations:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="schema/ead3.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<ead xmlns="http://ead3.archivists.org/schema/undeprecated/">
  <control audience="external" countryencoding="iso3166-1" dateencoding="iso8601" scriptencoding="iso15924" relatedencoding="MARC21" repositoryencoding="iso15511" langencoding="iso639-2b">