Transcription from handwriting

Prior to offering transcription of scanned printed text as a service, I built up some experience with Project Gutenberg. I didn’t really have any experience of transcribing handwritten texts, but when offered the chance to try, I gave it a go. Some thoughts on the process….

With scanned printing, there are really three stages. The first is OCR (Optical Character Recognition), then proofreading, then final formatting. OCR happens by computer and is pretty fast; it just requires the right software. Proofreading (making sure the text has been captured accurately and getting rid of spurious bits and pieces) is slower. The final formatting, once you have a good text, is a quick pass through. All of these stages are present for handwritten texts too, but the OCR has to be done by eye! (If there’s software that can reliably read handwriting, I’d like to know about it!)

Proofreading is quite slow – especially when there are things like names of people or places that may have been obvious to the writer, but aren’t so obvious without their mental context. It helps to have some sort of overview of the whole document, as the same names may crop up elsewhere.

The final format will depend on what is to be done with the document, but once the text is in place, it’s easy enough to bash it into any required page or file format. The initial transcription from handwriting was shared between several people, and it was interesting how much extra work was required simply to get the different extracts back into the same format. Note to self: make sure this is defined properly in advance next time!
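One way to act on that note to self is to write the agreed format down as a small script and run every transcriber’s extract through it before merging. This is a minimal sketch, assuming plain-text extracts keyed by the page on which each one starts; the particular rules (straight quotes, Unix line endings, no trailing spaces, one blank line between paragraphs) are a hypothetical house style chosen purely for illustration.

```python
import re
import unicodedata

# Hypothetical house style, agreed before transcription starts.
# These rules are illustrative assumptions, not a standard.
QUOTE_MAP = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes -> straight
    "\u201c": '"', "\u201d": '"',  # curly double quotes -> straight
}

def normalise_extract(text: str) -> str:
    """Bring one transcriber's extract into the agreed plain-text format."""
    text = unicodedata.normalize("NFC", text)               # one form for accented letters
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # Unix line endings
    for curly, straight in QUOTE_MAP.items():
        text = text.replace(curly, straight)
    # Strip trailing spaces from every line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse runs of blank lines to a single blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"

def merge_extracts(extracts: dict[int, str]) -> str:
    """Join extracts keyed by starting page, in page order, one blank line apart."""
    return "\n".join(normalise_extract(extracts[page]) for page in sorted(extracts))
```

Because every extract passes through the same function, differences between transcribers’ habits disappear before the pieces are joined, rather than being tidied up by hand afterwards.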

What was most interesting was the sense of personal involvement in people’s stories. We were transcribing a kind of visitors’ book. To follow small elements of the family history over the years was a surprisingly touching experience.

Where to keep things?

It’s ironic that in an age when it is so easy to generate media, we have a real issue with media permanence. Phones have largely displaced cameras and video recorders as means of storing memories, and indeed they are capable of very high quality in simple terms – my current phone takes pictures with a higher resolution than any digital camera I have had, and also records HD videos. There are plenty of options for displaying media – we have used various companies to create photobooks, for example, and other people have digital picture frames. However, it takes much more discipline to get the media off the phone and onto somewhere that they can be enjoyed, even by the photographer, let alone anybody else.

There’s also the problem of how permanent the medium is. The fact that we don’t all remember Betamax and LaserDiscs is a warning to us. It’s all very well converting photos into JPEG and LPs into MP3 files – but what if in three years’ time, I’ve lost the disk they are stored on? Or in five years’ time, nobody even reads that file format any more?

This issue exists with traditional media as well. Part of the motivation for setting up the service that we offer was consciousness of the fact that, too often, when the sad time comes to clear a house, it will reach the stage where it is easier to toss stuff into a skip than it is to spend time processing it, or space storing it. On the CJK Digital Facebook page are scans of some old postcards, once part of a fairly substantial collection, which had simply been dropped off at a second-hand bookshop. Most won’t get even that far.

The file format may be an issue, although certain formats are now so ubiquitous that it’s hard to imagine them ever becoming unreadable. The bigger issue, though, is where to store things. CDs/DVDs are one option: estimates of their longevity are of the order of 100 years. Cloud storage (uploading files to a company that keeps them online for you) is another. There are privacy risks, but having a company undertake to keep your files secure, with its own backup service and protection against things like fire and theft, is probably as secure as it gets.

There are various providers, most of whom will charge for anything more than a limited service. One option that is particularly interesting, especially if you already have Amazon Prime, is Amazon’s Cloud Drive. Included in the price is the ability to store any number of photos. In addition to a desktop application, there are phone apps, which permit you to echo photos straight to your storage.

The slide problem

It seems that, absent large-scale facilities, I happened on the best way of scanning slides.

We’ve been using a flat-bed scanner with a frame to hold the slides. It is a cumbersome process. Slides have to be cleaned as well as possible and placed into the frame on the scanner. If lucky, or more accurately, if the slides are “normal”, the software works out where they are and how big they are, and produces a fair scan. It’s possible to scan around 30 an hour. If unlucky, it may be necessary to switch from an “auto” mode to a “professional” mode to define the edges of the image accurately. This slows things down even more. The scanning software has built-in dust removal and colour correction algorithms, and will scan at … well, a higher resolution than I can imagine people asking for. But it’s laborious.

So I started looking at dedicated slide and film scanners. I have ordered one which seemed to perform well according to reviewers (a 14 megapixel sensor, adaptors for various media, some ability to adjust the image, and saving direct to a memory card). These scanners are substantially faster, but until you get to the really high-end ones (over £1000), they still seem to have their drawbacks. One of the major criticisms is that they chop the edges off the image: reviewers widely report that 10-15% of the image is lost on a wide range of these scanners. That’s disappointing, as the slide frame provides what ought to be a good, clear boundary for the image. The problem isn’t avoided entirely with flat-bed scanners; it is one of the reasons I frequently had to engage “professional” mode when using ours. But at least there it’s possible to tell the scanner, “No, I definitely want you to scan up to these edges.”
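It is worth seeing how small a trim adds up to that much loss. This sketch assumes the reviewers’ 10-15% refers to image area, and that the scanner crops evenly on all sides while keeping the aspect ratio; neither detail is stated in the reviews. The 36 x 24 mm figure is the nominal 35 mm film frame (slide mount openings are often a little smaller).

```python
import math

def edge_loss_per_side(area_lost: float) -> float:
    """Fraction of the frame width (and height) trimmed from EACH side,
    assuming an even crop that preserves the aspect ratio."""
    kept_linear = math.sqrt(1.0 - area_lost)  # kept width / full width
    return (1.0 - kept_linear) / 2.0          # shared equally between two sides

FRAME_WIDTH_MM = 36.0  # nominal width of a 35 mm film frame

for lost in (0.10, 0.15):
    side = edge_loss_per_side(lost)
    print(f"{lost:.0%} of area lost -> {side:.1%} per side, "
          f"about {side * FRAME_WIDTH_MM:.1f} mm at each end of the width")
```

Under those assumptions it works out at roughly 2.6% to 3.9% per side, or about 0.9 to 1.4 mm at each end of the 36 mm width: a sliver at each edge, but enough to lose a head or a signpost at the margin.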

Another discovery was that not all scans turn out to be worth keeping. Some people are in the position of being able to filter down the slides they definitely want to keep prior to getting them scanned. But in other cases, people don’t know what they’ve got until they start to look at them again.

So I’m thinking that the workflow needs adapting…

  • Clean slides as far as possible – get rid of markings, dust etc.
  • Do a “fast scan” with the slide scanner, and review the slides to determine which ones are to be scanned properly. This will still return a good high-resolution JPG file, but is not so time consuming.
  • For these, do a “slow scan” using the flatbed scanner, ensuring that the whole image is captured. Software dust removal or colour correction can be done on this, and image manipulation software can be used to improve it further.

Realistically, a “slow scan” takes a considerable amount of human input, and so will cost more. But if the number of “slow scans” can be reduced by having the slide scanner do “fast scans” first, then the overall cost could come down. And some of the grind of lining up slides in the flatbed scanner might also be eliminated!
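The trade-off can be put in numbers. In this sketch the flatbed rate of about 30 slides an hour comes from the experience described above; the fast-scan rate and the share of slides worth keeping are illustrative guesses, not measurements.

```python
def scanning_hours(n_slides: int, keep_fraction: float,
                   fast_per_hour: float, slow_per_hour: float) -> tuple[float, float]:
    """Return (flatbed-only hours, two-pass hours).

    Flatbed-only: every slide gets a slow scan.
    Two-pass: every slide gets a fast scan, then only keepers get a slow scan.
    """
    flatbed_only = n_slides / slow_per_hour
    keepers = n_slides * keep_fraction
    two_pass = n_slides / fast_per_hour + keepers / slow_per_hour
    return flatbed_only, two_pass

# 30/hour on the flatbed is from experience; 120/hour for the slide scanner
# and keeping 40% of slides are hypothetical values for illustration.
before, after = scanning_hours(600, keep_fraction=0.4, fast_per_hour=120, slow_per_hour=30)
print(f"flatbed only: {before:.1f} h, two-pass: {after:.1f} h")
```

More generally, the two-pass route saves time whenever the fraction kept is below 1 − (slow rate ÷ fast rate), which is 0.75 with the rates assumed here. If nearly every slide is a keeper, the fast pass is just extra work.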

Copying VHS videos

We have had the ability to burn the contents of VHS videos to DVDs for some time, but we’ve just taken a significant step forwards. Rather than simply letting our trusty Toshiba machine zap the contents straight onto DVD, we can now capture the video from the cassette onto a computer, tidy it up, and then burn it onto a DVD. This opens up various options. It’s still possible to simply dump the video contents straight onto a DVD so that it plays just like the tape would. But it’s also now possible to divide a video up into chapters and put a menu on the front of the DVD, making it easier to find what you’re looking for. In principle it is easier to edit videos as well, if that is desirable.
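Where a tape has no obvious breaks to chapter on, one simple fallback is to drop a chapter mark at fixed intervals. This sketch only computes the timestamps; how they get into the DVD depends on the authoring software, so the comma-separated H:MM:SS output is an assumed format rather than any particular program’s syntax. Chapters that follow the content (the ceremony, the speeches, and so on) would normally be chosen by hand.

```python
def chapter_marks(duration_s: int, interval_s: int) -> list[str]:
    """Chapter start times every interval_s seconds, formatted H:MM:SS."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    marks = []
    for start in range(0, duration_s, interval_s):
        hours, rest = divmod(start, 3600)
        minutes, seconds = divmod(rest, 60)
        marks.append(f"{hours}:{minutes:02d}:{seconds:02d}")
    return marks

# A one-hour recording with a chapter every 15 minutes.
print(",".join(chapter_marks(60 * 60, 15 * 60)))
```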

Significant amounts of family memories were stored on video cassettes in the 80s and 90s – wedding videos, and family camcorder recordings – even digital ones, before data storage was as cheap and easy as it is today – and we may be able to help you see those recordings again. We would charge around £8 per hour for a simple transfer from VHS to DVD. The charge for adding a menu system would depend on its complexity, but for guidance, it would be possible to add a simple menu for about £5.
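As a rough guide to what a batch of tapes might come to, the two figures above can be combined. This sketch assumes charging pro rata by running time and, if menus are wanted, one simple menu per tape; both are illustrative assumptions, and an actual quote would depend on the job.

```python
RATE_PER_HOUR = 8.0  # "around £8 per hour" for a simple VHS-to-DVD transfer
SIMPLE_MENU = 5.0    # "about £5" for a simple menu

def estimate_quote(tape_minutes: list[int], menus: bool = False) -> float:
    """Rough guide price in pounds for transferring a batch of tapes.

    Assumes pro-rata charging by running time and, if menus are
    wanted, one simple menu per tape (illustrative assumptions).
    """
    transfer = RATE_PER_HOUR * sum(tape_minutes) / 60
    menu_cost = SIMPLE_MENU * len(tape_minutes) if menus else 0.0
    return transfer + menu_cost

# Three tapes of 3 hours, 2 hours and 90 minutes, each with a simple menu.
print(f"£{estimate_quote([180, 120, 90], menus=True):.2f}")
```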

As mentioned in an earlier post, the law has now changed, making it legal to make personal copies of videos. It may be the case that you have video recordings of things that it’s not possible to find on DVD. Copy protection notwithstanding, if you own an original video, the law would now be on your side if you wish to burn it onto a DVD to shift it into a new format.

Personal copies and the law

The law was recently changed to officially permit what had already been happening since time immemorial. The Copyright and Rights in Performances (Personal Copies for Private Use) Regulations 2014 says that

The making of a copy of a work, other than a computer program, by an individual does not infringe copyright in the work…

There are restrictions – it must be for the personal use of the person making the copy, who can’t gain commercially from it, and they must own the original from which the copy was made. But making back-up copies and copies in different formats are all legitimate now.

This means that you can legally convert CDs to MP3 files for your own collection, or put them on iTunes or Google Play. Converting your own LPs into MP3 files or burning them to a CD is also now officially legitimate. You can make your own compilations to listen to in the car. In principle, if you wish to make an electronic version of a book to study on a computer, this is also okay.

However, the law doesn’t permit the giving or receiving of mix tapes or mix CDs – a copy would be going to a person who doesn’t own the original.

An interesting question

There’s a section that follows the regulations, with the heading:


Remedy where restrictive measures prevent or restrict personal copying


As far as I can tell, this provides that, if a copyright owner has a mechanism in place to prevent someone from making personal copies, that person can complain to the Secretary of State. The Secretary of State can then intervene if he or she decides that the person is being prevented from making a personal copy, the right to which is provided by these regulations.

This has little bearing on listening to music, since most obvious forms of personal copying are already achievable. However, I suspect it has interesting implications for video cassettes. Commercial video cassettes have copy protection schemes, and VHS/DVD combination recorders won’t bypass them, so people have not been able to transfer the content of commercial VHS video cassettes to DVD. The implication of these regulations seems to be that this restriction is likely to face a legal challenge, and copyright holders of copy-protected videos could, in theory, be forced by the government to provide a means whereby owners can obtain personal copies without restriction.


It’s quite feasible to convert a text from a physical book to an electronic book. However, it’s a multiple stage process.

The first stage is scanning the physical text. Here’s a scanned image from a book called “A Memoir of Adolph Saphir D.D.”.

Next, Optical Character Recognition (OCR) software has to be used to convert this from an image into text. This is pretty intensive in terms of computer power. I use ABBYY FineReader 10 Professional Edition. Here’s a sample of the output from this (though much of the formatting has been lost in copying from a Word document into Blogger):


TT has been impossible to publish sooner the Memoir of the lamented Dr. Adolph Saphir. On account of his sudden death, which followed so closely that of his wife, there was a delay in the settlement of his affairs; and, consequently, no access could be had to documents of any kind till about the middle of last year—a year after his death. When I was then asked to write the Memoir, much time and labour were required to collect letters and documents from friends and correspondents of Dr. Saphir. But though there has consequently been delay, the Memoir will, I believe and hope, be not less valued by devoted friends, of whom he had very many, nor less interesting to the general public.

A good-quality scan makes a difference: by comparing the image and the text, you can see how good a job the software has done in “reading” the image.
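That comparison can also be made as a number. A common measure is the character error rate: the edit distance (insertions, deletions and substitutions) between the OCR output and the proofread text, divided by the length of the proofread text. This sketch assumes the proofread version of the sample’s opening begins “IT”, so the only difference is the first letter.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def character_error_rate(ocr_text: str, proofread_text: str) -> float:
    """Edits needed per character of the proofread (reference) text."""
    return edit_distance(ocr_text, proofread_text) / max(len(proofread_text), 1)

ocr = "TT has been impossible to publish sooner"
checked = "IT has been impossible to publish sooner"  # assumed proofread reading
print(edit_distance(ocr, checked), f"{character_error_rate(ocr, checked):.3f}")
```

Here one substitution in 40 characters gives a rate of 0.025. Across a whole page, a low rate still means errors are scattered through the text, which is why proofreading remains essential.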

However, the most intensive stage is still to come: proofreading the text that has been produced. FineReader will highlight places where it was unsure about the conversion from image to text, so the file can be edited directly in the software. Alternatively, a rough word processor file can be used as a starting point with reference to the original document. In either case, the scan and OCR stages are pretty much just a question of getting round to them and then letting the computer run. The proofreading stage is a project in its own right.
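FineReader does this highlighting itself. With an engine that can export a confidence score per word (Tesseract can, for example), the same idea can be sketched as a checklist of doubtful words for the proofreader. The 0-100 scale, the threshold of 80 and the sample values below are assumptions for illustration, not real output from FineReader or any other engine.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    text: str
    confidence: float  # assumed 0-100 scale, as reported by the OCR engine
    line: int

def words_to_check(words: list[OcrWord], threshold: float = 80.0) -> list[OcrWord]:
    """Return non-blank words the engine was unsure about, in reading
    order, so a proofreader can jump straight to them."""
    return [w for w in words if w.text.strip() and w.confidence < threshold]

# Hypothetical values for illustration only.
page = [
    OcrWord("TT", 41.0, 1),
    OcrWord("has", 96.0, 1),
    OcrWord("been", 95.0, 1),
    OcrWord("Saphir.", 72.0, 2),
]
for w in words_to_check(page):
    print(f"line {w.line}: {w.text!r} ({w.confidence:.0f}%)")
```

A threshold that is too high floods the proofreader with false alarms; too low and real errors slip through unflagged, so it is worth tuning against a page that has already been checked by hand.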