Monday, 17 November 2014

Routine can kill passion (and mess up your data)

Documenting what you do, step by step, sounds easy. But it is not. Think, for example, of describing your morning routine. Would you be able to? And how accurately? Let’s give it a try: you wake up, get out of bed, prepare your coffee or tea... Wait. We forgot to say that you put your slippers on. Oh, and before that, that you probably turned off your alarm. You see? The things that we do automatically can be among the most complicated to document. So, when I started documenting my work, I realised how many small (and not so small) transformations and adjustments I apply to my data, without even thinking. Then I wondered if these actions should be documented as well.
The problem is, as always, where to draw the line: when does “more information” become “too much information”? I have tried to keep the ontology slim, so that its complexity is not off-putting for other researchers. However, the ontology is, in principle, always open to further specification, which users can decide to adopt or not.
Just to give an example, I want to mention some of the operations that virtual archaeologists, in my experience, perform so often that they might go unnoticed. 

Elements of a series: isDerivedFrom
In the real world, of course, every object is unique. If measured, all the columns in the colonnade of the Iseum would have similar but different values. I have decided that the level of granularity of my representation doesn’t require that precision. Therefore, as in many other models of ancient buildings, all my columns have been artificially assumed to be identical (and perfectly aligned). Only one has been measured on site (the one that looked best preserved), and the others are duplicates of it. To express this process, the column sub-element that has actually been measured is documented as based on hard measurements (taken by me and available online at a certain URL), while all the others are recorded as derived from another element, i.e. they take their values from the only measured one.
My measurements of the ekklesiasterion of the Iseum.

Elements of a series: isConformedTo
Another possibility is that a series of elements, such as the arches on the east wall of the ekklesiasterion, has actually been measured individually but, for various reasons, it is not considered relevant to represent the differences visually in the model. In the case of the ekklesiasterion, my assumption is that the differences between the arches are mainly due to weathering and other accidents. And, although they were never perfectly identical in the past, my guess is that they were meant to look so (apart from the central one, which is wider), so I think it made sense to model just one arch and clone it four times. It is also a more economical approach from a modelling point of view.
How should this process be represented in the documentation? In this case, all the arches have been measured, but they have been «conformed» to an average value (the word is a work-in-progress label. Any better ideas? «regularised»? «normalised»?). In the documentation, they have an attribute whose value is the range between the lowest and the highest values measured, and another that expresses this range as a percentage of the average value. That sounds confusing… So, here is the example: the four arches of the east wall of the ekklesiasterion (I have left out the wider central one) have widths between 159 and 164 cm. All four of them therefore have, as the value of hasWidth, the average of 162 cm. However, the arches (transitions) also have two further attributes: “isConformedTo: average of four (159/164)” and “hasVariation: 3%” (again, the label is a work in progress), i.e. the spread as a percentage of the average value: 5 cm out of 162 cm, or roughly 3%.
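The arithmetic behind these two attributes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conform`, its parameters and the dictionary layout are hypothetical and not part of the ontology, and the inputs are just the lowest, highest and average widths quoted above.

```python
def conform(lowest_cm, highest_cm, average_cm, count):
    """Summarise a measured series that is modelled with one shared value.

    Hypothetical helper: the keys mirror the working labels used in the
    text, but the structure itself is an assumption made for illustration.
    """
    # Spread between the extreme measurements (164 - 159 = 5 cm here).
    spread_cm = highest_cm - lowest_cm
    # The spread expressed as a percentage of the average, rounded.
    variation_pct = round(spread_cm / average_cm * 100)
    return {
        "hasWidth": average_cm,
        "isConformedTo": f"average of {count} ({lowest_cm:g}/{highest_cm:g})",
        "hasVariation": f"{variation_pct}%",
    }


# The four arches of the east wall, with the figures given in the post.
arches = conform(lowest_cm=159, highest_cm=164, average_cm=162, count=4)
print(arches["hasVariation"])
```

The original measurements are not stored here on purpose: in the documentation they stay reachable through the link to the source, so the summary only needs the range and the average.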

If stating that the columns of a colonnade have not been measured individually can sound unnecessary and pedantic (and, maybe, it actually is…), conforming the values of elements that have been measured might sound like a loss of information. However, the documentation of the 3D element always includes a link to the original measurements, in case they are needed at a different stage of the research or by other scholars.

Monday, 3 November 2014

Embrace your inner Dr. Frankenstein: documenting heterogeneous sources

While writing the documentation of my 3D model in RDF, I have noticed a few things that I had not realised before I started thinking about it.
When I started my research on the ontology, I assumed that assigning one source to each element of the 3D model would be more than enough to document a 3D visualisation of cultural heritage.
But then I found out that a single source could not always provide all the information I was looking for. I (and possibly many others in this field) have to put together pieces of information that not only come from various archives but often differ in format, author and history. I know, it sounds like a terrible mess…

The ekklesiasterion of the Iseum in Pompeii, the north wall visible between the arches of the east one.
Picture from pompeiiinpictures
For example, let’s look at the hypothetical restoration of the Iseum as it was before the catastrophe. If we take the north wall of the ekklesiasterion, we can say that the width of the wall has actually been measured. So, the source for that specific information is the measurement taken on site, recorded by the researcher (in this case me) and available online at a certain URL. The depth of the wall, however, cannot really be measured, and definitely not with the equipment I had with me. So, the value I have assigned to the depth of the north wall of the ekklesiasterion in my model is simply based on the depth of the east wall of the same room, which can be measured because it has arches in it. The guess is supported by the fact that the depth of the walls appears to be quite consistent across the entire architectural complex. So, the source for this second piece of information is another element (the east wall) that has actually been measured. Finally, it is not possible to know how tall the wall was before the eruption.
For the more hypothetical elements, I have relied on Piranesi’s drawings as they have proven to be a thorough and, all in all, acceptably reliable visualisation. Thus, the height of the north wall has yet another source.

As you can see, the problem here is that not just each element, but even each dimension of the element can have a different source (it’s not always the case, but it has happened).
For this reason, I have decided to enter, for each feature, transition or constraint, the attributes hasHeight, hasWidth and hasDepth, and to use them not only to express the numeric value but also (or mainly) to connect each value to its source.
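To make the idea concrete, here is a minimal Python sketch of a per-dimension record for the north wall. It is not the RDF documentation itself: the class names `Dimension` and `Element`, the field names and the source labels are all hypothetical, and the values are left empty because the actual figures are not quoted in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dimension:
    """One dimension of an element: its value and where the value comes from."""
    value_cm: Optional[float]  # None: the real figure is not given in the post
    source: str                # hypothetical free-text label for the source


@dataclass
class Element:
    """A modelled element whose height, width and depth carry their own source."""
    name: str
    has_width: Dimension
    has_depth: Dimension
    has_height: Dimension

    def sources(self):
        """Return the distinct sources behind this element's dimensions."""
        dims = (self.has_width, self.has_depth, self.has_height)
        return {d.source for d in dims}


north_wall = Element(
    name="north wall of the ekklesiasterion",
    has_width=Dimension(None, "on-site measurement"),
    has_depth=Dimension(None, "derived from the east wall"),
    has_height=Dimension(None, "Piranesi's drawings"),
)

# One wall, three dimensions, three different sources.
print(len(north_wall.sources()))
```

Attaching the source to the dimension rather than to the element is the whole point: a single `source` field on `Element` could not describe this wall.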
Is this level of documentation, although expressed synthetically through RDF triples, sustainable? I’m not sure yet…

Boris Karloff(*) as the Creature of Dr. Frankenstein
Image from giphy
To achieve greater consistency, I could have tried to derive all the information from the richest source, which is probably Piranesi. This would have been a perfectly acceptable choice, and the outcome would have been “a hypothetical restoration of the Iseum in Pompeii according to Francesco Piranesi”.
Nonetheless, I followed a different approach. Although I don’t want to rank the different sources in general terms, I have chosen to use hard measurements whenever they were available. Also, information derived geometrically from the actual remains has been considered preferable to information derived from drawings or other secondary sources. Piranesi’s data, in the end, have mostly been used for the things that cannot be measured, that I didn’t measure (for various reasons) or that do not exist anymore.
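The order followed for this particular model can be written down as a small Python sketch. It is only an illustration of the choice described above, not a general ranking of sources and not part of the ontology: the names `SourceKind` and `pick_source` are hypothetical.

```python
from enum import IntEnum


class SourceKind(IntEnum):
    """Kinds of source, numbered in the order they were preferred in this model."""
    HARD_MEASUREMENT = 1      # taken on site
    DERIVED_FROM_REMAINS = 2  # worked out geometrically from what survives
    DRAWING_OR_SECONDARY = 3  # e.g. Piranesi's drawings


def pick_source(available):
    """Return the most preferred kind among those available for a dimension.

    Raises ValueError if `available` is empty, i.e. nothing documents the value.
    """
    return min(available)


# The depth of a wall that could not be measured directly:
chosen = pick_source(
    [SourceKind.DRAWING_OR_SECONDARY, SourceKind.DERIVED_FROM_REMAINS]
)
print(chosen.name)
```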

I know that this choice makes my model a little Frankenstein of information. But, first, even the most detailed elevation or cross-section cannot show all the information needed to produce a 3D model that can be viewed through 360° in space.
Second, my aim is not to produce a groundbreaking new hypothesis on the restoration of the Iseum but to provide a way to connect the 3D model to its sources. From this perspective, it is actually interesting for me to see how far I can stretch my system, and to give an idea of the richness and diversity of the data that virtual archaeologists deal with.

(*) trivia: glorious actor Boris Karloff is one of King's College London's illustrious alumni.