Indiana Jones and the lost file format

Going out with an archaeology undergrad in college meant date night sometimes kicked off with a lecture in the UCD Archaeology Dept. The excavations of the Danevirke (an earthen wall to keep the Vikings out of northern Germany) might not kick off St. Valentine’s night for many but, dear reader, I have that t-shirt. A quiz question on TV recently about the tablets of the Epic of Gilgamesh brought me back to college days and an interesting discussion on file formats.

The Epic of Gilgamesh is a pre-Homeric epic poem going back at least four thousand years, set in Uruk in present-day Iraq. I don’t plan to go into the story itself, as to do so would probably require the first adult-rated blog post on Teachnet. My interest is in the clay tablets on which the story has been passed down to us. The Epic of Gilgamesh is the second oldest text and the first piece of narrative literature with a file format: the clay tablet. The change from an oral culture to a script-based culture still permeates our own present culture.

Gilgamesh Dream Tablet Credit: commons.wikimedia.org used under Public Domain

Baked clay tablets are not the most durable form of saving knowledge or data, and the existence of the said tablets is a boon to archaeologists and literary scholars. Some of the original tablets are missing chunks or are worn away from centuries of handling. Yet clay may prove more durable than code. Software is fragile: unlike words carved in stone, it can get deleted or corrupted. I think everyone reading this blog post has experienced the same range of emotions as the author with a broken USB key containing the final revision of an important document.

It raises some questions. How do you identify that file with a weird extension? How do you manage data formats? How do you plan for archived documents saved on external hard drives in a cardboard box in that store cupboard that no one enters?

On a few occasions, a query for an obscure filetype has brought me to fileinfo.com. This site claims to have entries for 10,000 file types as well as a free-to-try file viewer that opens over 400 file types. Search for an extension and it will list the file types that use it and identify the software packages that will open the file. It is a useful resource to save in your bookmarks.
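When the extension itself is missing or misleading, the first few bytes of a file often give the game away, as many formats start with a fixed signature (a so-called magic number). The snippet below is only a minimal sketch of that idea in Python. The function names `guess_file_type` and `identify` and the short signature table are my own illustration, not anything taken from fileinfo.com, and a real tool would know far more formats.

```python
# A small, hand-picked table of well-known file signatures ("magic numbers").
# This list is illustrative only and nowhere near complete.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive (also used by .docx, .xlsx, .odt)"),
    (b"{\\rtf", "Rich Text Format document"),
]


def guess_file_type(header: bytes) -> str:
    """Return a description for the first matching signature, else 'unknown'."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


def identify(path: str) -> str:
    """Read the first 16 bytes of a file and guess its type from them."""
    with open(path, "rb") as handle:
        return guess_file_type(handle.read(16))
```

Calling `identify` on that mystery file from the cardboard box will either name one of the formats in the table or admit defeat with "unknown", at which point it is back to the bookmarks.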

How to manage data formats for future use? The US National Archives has a very useful page on the preferred formats for a range of media such as audio or video files, with the focus on uncompressed or raw data formats. It might be a prudent use of time to create a low-compression video file to go with the HD widescreen version of the school concert, or a Rich Text Format transcript of an interview with a WWII veteran, in case the video or audio format becomes obsolete or the file gets corrupted. Hopefully, an agency like Software Heritage, whose mission is to collect, preserve and share all software that is publicly available in source code form, will make it possible to retrieve even the most obscure file format.

From a previous career as a software tester, I know that many software packages have backward compatibility with previous versions. Failing that, you may be forced to seek an emulator. A site like winworldpc hosts many versions of operating systems and software packages, but many require some skill and advanced knowledge to get them to work. Archive.org also has a software library with a slant towards old video games. There is a quest to rival Indiana Jones’ search for the Holy Grail: to find a Universal Translator for old, obsolete computer files that could easily work in a web browser. It is an ongoing project.

One final flogging of the archaeologist and old software theme might raise a wry smile from those of a certain age. A couple of years ago, a buried treasure of dubious value was uncovered. Atari in the 1980s produced an ET video game that was so bad that they had to bury the unsold copies in a dump in the desert.

So it raises one final question… What will open Gilgamesh.claytablet?
