Transcription and accuracy

Transcribing the census is a massive exercise. Every digitised document has to be read and transcribed by hand, amounting to more than seven billion keystrokes over the course of the project.

With this volume of keystrokes, errors are inevitable. During transcription, however, we apply a number of processes, developed over our many years of digitising censuses and other historical documents, to correct the most obvious errors and keep inaccuracy to a minimum.

The transcription is designed as a finding aid for the original documents, which should be treated as the ‘source of truth’. Despite the errors that inevitably creep in, most users are able to find their ancestors.

The challenges of the 1911 census


The 1911 census poses particular problems. In previous censuses the transcribers worked from the enumerators’ summary books, but for 1911 the core documents are the original household pages. This means the transcribers have to decipher the handwriting of eight million different people, a challenging task.

Accuracy levels


The National Archives set an accuracy threshold of 98.5 per cent, and at launch the accuracy of the 1911census.co.uk website exceeded this figure. Nevertheless, we are continually working to raise accuracy further, in a number of ways:

Batch sampling


All of our transcriptions undergo thorough batch sampling by the transcription house, by The National Archives and by our in-house quality control team. Any batch failing to meet the required level of accuracy is rejected and re-keyed.
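The accept-or-re-key rule can be illustrated with a short Python sketch. The text does not say how accuracy is measured, so this assumes it is the share of sampled fields whose keyed value matches a checker’s reading of the original page; the function names, the dict-per-record layout and the random sampling are all hypothetical, not a description of the actual quality-control system.

```python
import random

# Accuracy threshold set by The National Archives (98.5 per cent).
ACCURACY_THRESHOLD = 0.985


def field_accuracy(keyed, reference):
    """Share of fields where the keyed value matches the reference reading.

    Assumption: each record is a dict of field name -> value, and
    `reference` holds a checker's reading of the same original entries.
    """
    total = correct = 0
    for keyed_rec, ref_rec in zip(keyed, reference):
        for field, ref_value in ref_rec.items():
            total += 1
            if keyed_rec.get(field) == ref_value:
                correct += 1
    return correct / total if total else 1.0


def check_batch(keyed, reference, sample_size, threshold=ACCURACY_THRESHOLD, seed=0):
    """Check a random sample of a batch; return (accuracy, accepted)."""
    rng = random.Random(seed)
    picked = rng.sample(range(len(keyed)), min(sample_size, len(keyed)))
    accuracy = field_accuracy([keyed[i] for i in picked],
                              [reference[i] for i in picked])
    # A batch below the threshold would be rejected and re-keyed.
    return accuracy, accuracy >= threshold


# Usage: a 100-record batch with one mis-keyed field, fully sampled.
reference = [{"surname": "Smith", "age": str(n)} for n in range(100)]
keyed = [dict(rec) for rec in reference]
keyed[42]["surname"] = "Smyth"
print(check_batch(keyed, reference, sample_size=100))  # (0.995, True)
```

Here 199 of the 200 sampled fields match, so the batch clears the 98.5 per cent threshold; ten wrong fields out of 200 (95 per cent) would cause it to be rejected.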

User reporting


Users of 1911census.co.uk can also report transcription errors to us. Each report is reviewed by the transcription team, and approved changes are incorporated into the search results, usually within a month (when the next data upload is made to the website).

Our policy is to accept only changes that match the entry on the original page; we are unable to make changes based on information supplied to us from a different source.

Data standardisation


When we transcribe the original records we ‘key as seen’ in order to reproduce the original faithfully, even where it is not what you’d expect. However, a faithful transcription is not always the most useful one when you are trying to search.

For example, ‘St John’s Street’ may be written in a number of different ways, such as Saint John’s Street, St John’s St or St John’s Street. In some records it has even been written as SJS.

We use our discretion, together with our knowledge of how people search, to standardise certain fields, so that when you enter a term in one of the search fields, the search picks up all of its variations. This is a long-term process: we need to examine large collections of data to assess which elements need standardising, and in some cases standardisation is much easier to apply once the data is complete.
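As an illustration only, the sketch below shows one way such standardisation could work: the value keyed as seen is left untouched, while searches compare a standardised key looked up in a variant table. The table `STREET_VARIANTS` and the helpers `normalise`, `search_key` and `search` are hypothetical, and real standardisation would involve far more rules than a single lookup.

```python
import re

# Hypothetical variant table: normalised spelling -> standard form used for searching.
STREET_VARIANTS = {
    "saint john's street": "St John's Street",
    "st john's st": "St John's Street",
    "st john's street": "St John's Street",
    "sjs": "St John's Street",
}


def normalise(text):
    """Lower-case, unify curly apostrophes and collapse runs of whitespace."""
    text = text.replace("\u2019", "'").lower()
    return re.sub(r"\s+", " ", text).strip()


def search_key(value):
    """Standard form for a known variant, otherwise the normalised value itself."""
    key = normalise(value)
    return STREET_VARIANTS.get(key, key)


def search(records, field, query):
    """Match on standardised keys; the records themselves stay keyed as seen."""
    target = search_key(query)
    return [rec for rec in records if search_key(rec[field]) == target]


records = [
    {"id": 1, "street": "Saint John\u2019s Street"},
    {"id": 2, "street": "St John's St"},
    {"id": 3, "street": "SJS"},
    {"id": 4, "street": "High Street"},
]
print([rec["id"] for rec in search(records, "street", "St John's Street")])  # [1, 2, 3]
```

Keeping the table separate from the records mirrors the ‘key as seen’ principle: the transcription is never altered, and new variants can be added to the table as larger collections of data are reviewed.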

Why don’t we double key?


It is possible to reduce transcription errors by ‘double-keying’ every entry. In this process, every document is transcribed twice by different people, the two results are compared, and any differences are resolved by hand.

This doubles the transcription cost but does not significantly improve accuracy, since no process can reach 100 per cent. The extra cost would have been passed on to the public, resulting in much higher prices for the census service.
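The comparison step of double keying can be sketched in Python. This is a minimal illustration assuming each keying of an entry is a dict of field name to value; `compare_keyings` is a hypothetical helper, not part of any system described here. It only flags disagreements, which would then be resolved by hand.

```python
def compare_keyings(first, second):
    """Return (field, first_value, second_value) for each field where the keyings differ.

    A field missing from one keying is reported with None for that side.
    """
    fields = sorted(set(first) | set(second))
    return [
        (field, first.get(field), second.get(field))
        for field in fields
        if first.get(field) != second.get(field)
    ]


# Usage: two independent keyings of the same household entry.
keying_a = {"forename": "William", "surname": "Hargreaves", "age": "38"}
keying_b = {"forename": "William", "surname": "Hargraves", "age": "38"}
for field, a, b in compare_keyings(keying_a, keying_b):
    # Each disagreement would go to a person who checks the original page.
    print(f"{field}: {a!r} vs {b!r}")  # surname: 'Hargreaves' vs 'Hargraves'
```

If both keyers misread an entry in the same way, this comparison flags nothing, which illustrates why even double keying cannot guarantee 100 per cent accuracy.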

Unreadable and contestable data


In a small number of cases, the transcribers come across data that is either ‘illegible’ or ‘contestable’. Illegible data either cannot be read at all, or so little of it can be read that it makes no sense. Contestable data can be read, but two or more people may read it differently, and it is not possible to say definitively which version is right.

In such cases the transcribers use a question mark to indicate the data that cannot be read with confidence:

Transcription table.
