Working with tabular data in the museum context

This short introduction provides guidance on preparing tabular data from disparate sources in the museum context. The aim is to make data from different sources compatible, so that records can be compared and combined. This is essential for creating comprehensive and insightful analyses, facilitating research across institutions, and improving the understanding of museum collections in research.

1. Data Inventory:

Identify Sources: Compile a list of all data sources relevant to the museum context. For each institution, record a unique ID, the address and contact information, the formats in which the data was received, and any other data repositories the museum maintains, such as websites, online catalogs, or open data repositories. This list is the source guide for building a data management system.
Understand Data Structures: Gain a thorough understanding of the structure and format of each data source. Note variations in column names, data types, and any unique identifiers. Store every data source with a reference to the institution's identifier.
Always ask museums to export data in the way they store it. You, not the museum experts, are responsible for cleaning or restructuring the data.
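As a starting point for understanding a data structure, the column names and rough value types of an export can be listed automatically. The following is a minimal sketch using only the Python standard library. The institution ID MUS-001, the column names, and the sample rows are hypothetical, and the type guess is deliberately crude.

```python
import csv
import io

def guess_type(value):
    """Classify one cell as 'empty', 'number', or 'text'."""
    if value is None or value.strip() == "":
        return "empty"
    try:
        float(value)
    except ValueError:
        return "text"
    return "number"

def summarize_columns(csv_text):
    """Map each column name to the set of value types found in it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            summary[name].add(guess_type(row.get(name)))
    return summary

# Hypothetical export from a hypothetical institution "MUS-001".
export = "inv_no,title,height_cm\n2023.1.23,Bowl,12.5\n2023.1.24,Mask,\n"

# Store the summary under the institution's identifier.
inventory = {"MUS-001": summarize_columns(export)}
```

A column that reports both number and empty, as height_cm does here, points to missing values that need attention during cleaning.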

2. Standardization:

Column Mapping: For each source, create an index that pairs every original field title with its standardized field title. The example data structure at the end of this section lists fields that can guide a standardized mapping of disparate fields from various data structures in the museum context.
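Such an index can be kept as one small lookup table per source. The sketch below assumes two hypothetical institutions (MUS-001, MUS-002) with hypothetical original field titles. Fields without a mapping are reported rather than silently dropped.

```python
# One mapping per source: original field title -> standardized field title.
# Institution IDs and original titles are hypothetical examples.
COLUMN_MAPS = {
    "MUS-001": {"inv_no": "object id", "title": "designation", "height_cm": "height"},
    "MUS-002": {"Inventarnummer": "object id", "Bezeichnung": "designation"},
}

def standardize_row(institution_id, row):
    """Rename the fields of one row; return (standardized row, unmapped titles)."""
    mapping = COLUMN_MAPS[institution_id]
    standardized = {}
    unmapped = []
    for original, value in row.items():
        if original in mapping:
            standardized[mapping[original]] = value
        else:
            unmapped.append(original)
    return standardized, unmapped

row = {"inv_no": "2023.1.23", "title": "Bowl", "shelf": "A3"}
standardized, unmapped = standardize_row("MUS-001", row)
```

The list of unmapped titles shows which original fields still need a decision: map them, or document why they are left out.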
Data Cleaning: Address inconsistencies, missing values, and errors. Always have a backup of the original data when starting to clean.
Tools such as OpenRefine are powerful starting points for researchers without programming skills.
Meta-data fields, such as the field title, describe the value in the field. Only write the kind of value that the meta-data field indicates. Two common mistakes in writing data are:
- descriptors that are already mentioned in the meta-data field are repeated as values in the field (for example the unit of measurement, cm)
- additional descriptors that are not mentioned in the meta-data field become values (for example Last, First Name and a role such as collector in a field that is dedicated only to standardized names, such as provenance names).
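Both mistakes can often be repaired with simple pattern matching. The sketch below is one possible approach, not a complete cleaning routine. The function names are hypothetical, and values that do not match the expected pattern are left for manual review.

```python
import re

# A number (dot or comma as decimal separator), optionally followed by a unit.
MEASUREMENT = re.compile(r"\s*(\d+(?:[.,]\d+)?)\s*([A-Za-z]*)\s*")

# A name followed by a role in parentheses, e.g. "Last, First (collector)".
NAME_WITH_ROLE = re.compile(r"\s*(.+?)\s*\(([^()]+)\)\s*")

def split_measurement(raw):
    """Return (number, unit or None), or None if the value needs manual review."""
    match = MEASUREMENT.fullmatch(raw)
    if match is None:
        return None
    number, unit = match.groups()
    return float(number.replace(",", ".")), unit or None

def split_name_and_role(raw):
    """Return (name, role or None), separating a trailing role in parentheses."""
    match = NAME_WITH_ROLE.fullmatch(raw)
    if match is None:
        return raw.strip(), None
    return match.group(1), match.group(2).strip()

print(split_measurement("12.5 cm"))
print(split_name_and_role("Doe, Jane (collector)"))
```

The unit can then be checked against the field title, and the role can be moved into its own field.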
Relationships can be built by indexing any value that needs additional project-specific information. For example, Provenance Name, ID, Role and Biography can be four columns in a separate spreadsheet. Other tables can then reference this spreadsheet by Provenance Name and/or ID.
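The idea of referencing a separate table can be sketched with two small in-memory tables. All IDs, names, and field titles below are hypothetical placeholders.

```python
# Lookup table: one entry per provenance name, keyed by a project-specific ID.
provenance_names = {
    "P001": {"name": "Doe, Jane", "role": "collector", "biography": "(free text)"},
    "P002": {"name": "Roe, Richard", "role": "dealer", "biography": "(free text)"},
}

# Object table: stores only the ID, not the name, role, or biography.
objects = [
    {"object id": "2023.1.23", "prior owner id": "P001"},
    {"object id": "2023.1.24", "prior owner id": "P999"},
]

def resolve_prior_owner(record):
    """Return the referenced provenance entry, or None if the ID is unknown."""
    return provenance_names.get(record["prior owner id"])

# IDs that point nowhere indicate a gap in the lookup table.
unresolved = [r["object id"] for r in objects if resolve_prior_owner(r) is None]
```

Because the biography is stored once, a correction in the lookup table applies to every object that references the ID.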

The following examples show how to standardize common formats:

Object Terms: Use the Getty Vocabulary, the UK Museum Documentation Standard, or Nomenclature for Museum Cataloging as your main reference for standardizing object terms, and cite the source.
Names: Use Wikidata and the Getty Vocabulary as your main references for standardizing names, and cite the source. Write each name as Last Name, First Name and separate multiple names with a semicolon: Last Name, First Name; Last Name, First Name
Locations: Reference GeoNames or Wikidata for naming conventions of any Locations.
Dates: Use the ISO 8601 format to write dates: YYYY for a year, YYYY-MM for a year and month, and YYYY-MM-DD for a full date.
Duration: Write durations as hour:minute:second, for example 00:01:00 for one minute.
Measurements: Indicate the unit of measurement in the field title and write only numbers into the field.
Provenance: Create Provenance Events in one column in the following way:
Date of Event, Source (Last Name, First Name), Target (Last Name, First Name), Intermediary (Last Name, First Name), Method of Transfer, Location, Archival Resource, Notes
Exhibition History: Create Exhibition Events in one column in the following way:
Date of Exhibition, Artists (Last Name, First Name; ), Curator (Last Name, First Name; ), Title, Location, Institution, Bibliography
Object ID: Use a pattern such as AccessionYear.AccessionGroup.AccessionObject, for example 2023.1.23. The pattern varies from institution to institution, and other short forms can be used. However, it is important not to use special characters, strings (words) or spaces in the object ID.
Images: Image IDs should correspond to the object ID, with the image number appended: AccessionYear.AccessionGroup.AccessionObject.ImageNumber, for example 2023.1.23.65
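Several of these conventions can be checked automatically. The sketch below assumes the AccessionYear.AccessionGroup.AccessionObject pattern from the example above; institutions that use another pattern need a different expression. The function names are hypothetical.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?")
OBJECT_ID = re.compile(r"[0-9]{4}\.[0-9]+\.[0-9]+")

def is_iso_date(value):
    """Accept YYYY, YYYY-MM, or YYYY-MM-DD if it is a real calendar date."""
    if not ISO_DATE.fullmatch(value):
        return False
    parts = [int(part) for part in value.split("-")]
    parts += [1] * (3 - len(parts))  # fill a missing month or day with 1
    try:
        date(*parts)
    except ValueError:
        return False
    return True

def is_object_id(value):
    """Check the year.group.object pattern: digits and dots only."""
    return OBJECT_ID.fullmatch(value) is not None

def image_belongs_to(image_id, object_id):
    """An image ID is the object ID plus a dot and an image number."""
    prefix = object_id + "."
    return image_id.startswith(prefix) and image_id[len(prefix):].isdigit()

def split_names(value):
    """Split 'Last, First; Last, First' into (last, first) pairs."""
    names = []
    for entry in value.split(";"):
        entry = entry.strip()
        if entry:
            last, _, first = entry.partition(",")
            names.append((last.strip(), first.strip()))
    return names
```

For example, is_iso_date("2023-02-30") is rejected because the day does not exist, and is_object_id("2023 1 23") is rejected because it contains spaces.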

Here is an example of a data structure as a starting point:

Object ID: Number

Designation: Vocabulary

Description: Free Text

material: Vocabulary

length: Number

width: Number

height: Number

diameter: Number

weight: Number

object type: Vocabulary

colour: Vocabulary

iconography: Vocabulary

object count: Number

inscriptions: Free Text

date made: ISO

producer: Vocabulary

production technique: Vocabulary

production place: Vocabulary

prior owner: Vocabulary

prior owner role: Vocabulary

provenance: Standard

date collected: ISO

place collected: Vocabulary

reign associated name: Vocabulary

ethnic attribution: Vocabulary

department and institution: Standard

accession date: ISO

accession method: Vocabulary

notes: Free Text

exhibition history: Standard

display status: Standard

bibliography: Standard

legal status: Standard

condition: Standard

license: Standard

images: Standard

weblink: URL
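A structure like this can also be written down in machine-readable form, so that every incoming record is compared against it. The sketch below covers only an excerpt of the fields above. The function name and the sample record are hypothetical.

```python
# Excerpt of the field list above; extend it with the remaining fields.
SCHEMA = {
    "object id": "Number",
    "designation": "Vocabulary",
    "description": "Free Text",
    "material": "Vocabulary",
    "height": "Number",
    "date made": "ISO",
    "provenance": "Standard",
    "weblink": "URL",
}

def check_fields(record):
    """Return (missing, unexpected) field names, compared case-insensitively."""
    present = {name.strip().lower() for name in record}
    missing = sorted(set(SCHEMA) - present)
    unexpected = sorted(present - set(SCHEMA))
    return missing, unexpected

record = {"Object ID": "2023.1.23", "Designation": "bowl", "shelf": "A3"}
missing, unexpected = check_fields(record)
```

Missing fields are not necessarily errors, since not every museum records every field, but unexpected fields usually mean the column mapping is incomplete.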

3. Data Enrichment rather than data cleaning:

If possible, enrich datasets with additional information to enhance their value. This may include linking to external databases or incorporating supplementary data. New data fields can carry an identifier such as RD (research data), which distinguishes the museum’s original data from the data newly cleaned or enriched by the researcher.
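One way to keep original and research data apart is to write every new value into a separate, marked field instead of overwriting the original. This is a minimal sketch; the function name, the field titles, and the values are hypothetical.

```python
def add_research_field(record, field, value, marker="RD"):
    """Return a copy of the record with a new, clearly marked research field."""
    key = f"{marker} {field}"
    if key in record:
        raise KeyError(f"{key!r} already exists")
    enriched = dict(record)  # the museum's original record stays untouched
    enriched[key] = value
    return enriched

original = {"object id": "2023.1.23", "production place": "paris"}
enriched = add_research_field(original, "production place", "Paris")
```

The original value remains in production place, and the standardized value sits next to it in RD production place.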

4. Integration:

Database Setup: Choose a central repository or database for integrated data. Ensure it supports the unique identifiers noted in the data inventory and provides the necessary scalability for future data growth.

Tools such as Excel, Airtable or Google Sheets are a good starting point as a central repository.
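If a project outgrows a spreadsheet, the same structure can be moved into a small database. The sketch below uses SQLite, which is not one of the tools named above and is chosen here only because it ships with Python. The table name, column names, and rows are hypothetical. It shows how combining the institution ID with the object ID keeps identifiers unique across sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    """
    CREATE TABLE objects (
        institution_id TEXT NOT NULL,
        object_id      TEXT NOT NULL,
        designation    TEXT,
        PRIMARY KEY (institution_id, object_id)
    )
    """
)

# Two institutions may use the same object ID without conflict.
rows = [
    ("MUS-001", "2023.1.23", "Bowl"),
    ("MUS-002", "2023.1.23", "Mask"),
]
conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0]
```

Inserting the same institution ID and object ID a second time raises sqlite3.IntegrityError, which catches accidental duplicate imports.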

5. Documentation:

Metadata Documentation: Create comprehensive documentation for each dataset, including metadata such as data source, last update, and any transformations applied.
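Such documentation can be stored next to each dataset as a small file. The sketch below records the three items named above and writes them as JSON. The class name, the field names, and all values are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class DatasetMetadata:
    institution_id: str
    data_source: str
    last_update: str  # ISO date, YYYY-MM-DD
    transformations: list = field(default_factory=list)

    def log(self, description):
        """Record one transformation applied to the dataset."""
        self.transformations.append(description)

meta = DatasetMetadata("MUS-001", "collection export (CSV)", "2023-01-23")
meta.log("renamed 'inv_no' to 'object id'")
meta.log("removed unit 'cm' from 'height' values")

# Text that can be saved alongside the dataset, for example as a .json file.
sidecar = json.dumps(asdict(meta), indent=2)
```

Logging each transformation as it happens makes it possible to trace any value in the integrated data back to the original export.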

Data Dictionary: Develop a data dictionary that defines the meaning and structure of each column. This aids in maintaining consistency and assists other users in understanding the integrated data.

6. Quality Assurance:

Validation: Implement validation checks to ensure data accuracy and integrity. Regularly audit the integrated dataset to catch any anomalies or discrepancies.
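A regular audit can start with a few mechanical checks. The sketch below looks for missing or duplicate object IDs and for non-numeric values in the measurement fields of the example data structure. The function names and sample records are hypothetical, and a real audit would add further checks.

```python
from collections import Counter

NUMERIC_FIELDS = ("length", "width", "height", "diameter", "weight")

def is_number(value):
    """True if the value can be read as a plain number."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True

def audit(records):
    """Return a list of (row number, field, problem) tuples."""
    problems = []
    id_counts = Counter(record.get("object id", "") for record in records)
    for row_number, record in enumerate(records, start=1):
        object_id = record.get("object id", "")
        if not object_id:
            problems.append((row_number, "object id", "missing"))
        elif id_counts[object_id] > 1:
            problems.append((row_number, "object id", "duplicate"))
        for name in NUMERIC_FIELDS:
            value = record.get(name, "")
            if value not in ("", None) and not is_number(value):
                problems.append((row_number, name, "not a number"))
    return problems

records = [
    {"object id": "2023.1.23", "height": "12.5"},
    {"object id": "2023.1.23", "height": "12.5 cm"},
    {"object id": "", "width": "3"},
]
problems = audit(records)
```

Each reported problem names the row and field, so it can be corrected in the integrated dataset and noted in the documentation.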

Conclusion:

By following these steps, you can effectively prepare and integrate tabular data from disparate sources in the museum context. This process enhances the reliability and utility of the data, supporting informed decision-making and promoting a deeper understanding of museum collections.