Authors: Andreas Schmid, Lorenz Heckelbacher, Raphael Wimmer

Published in: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems

Date: 2022-04-27

We modified a flatbed scanner by adding an infrared LED to its light guide. As the ink of most pens is invisible in the infrared spectrum, this scanner can be used to extract handwritten annotations from printed documents.

Despite ever-improving digital ink and paper solutions, many people still prefer printing out documents for close reading, proofreading, or filling out forms. However, to incorporate paper-based annotations into digital workflows, handwritten text and markings need to be extracted. Common computer-vision and machine-learning approaches require extensive sets of training data or a clean digital version of the document. We propose a simple method for extracting handwritten annotations from laser-printed documents using multispectral imaging. While black toner absorbs infrared light, most inks are invisible in the infrared spectrum. We modified an off-the-shelf flatbed scanner by adding a switchable infrared LED to its light guide. By subtracting an infrared scan from a color scan, handwritten text and highlighting can be extracted and added to a PDF version. Initial experiments show accurate, high-quality results on a test data set of 93 annotated pages. Thus, infrared scanning seems like a promising building block for integrating paper-based and digital annotation practices.
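The subtraction step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy is available, that both scans are 8-bit and already pixel-aligned, and that a fixed threshold is good enough. The function name `extract_annotations`, the threshold of 40, and the use of the darkest color channel as the "darkness" measure are all hypothetical choices made for illustration.

```python
import numpy as np


def extract_annotations(color_scan, ir_scan, threshold=40):
    """Separate ink from toner by comparing a color scan with an infrared scan.

    color_scan: uint8 array of shape (height, width, 3), RGB.
    ir_scan:    uint8 array of shape (height, width), same page, pixel-aligned.
    threshold:  illustrative cut-off, not a value taken from the paper.

    Returns (mask, annotations): a boolean ink mask and an RGB image that
    keeps only the masked pixels on a white background.
    """
    # Widen to a signed type so the subtraction cannot wrap around.
    color = color_scan.astype(np.int16)
    ir = ir_scan.astype(np.int16)

    # "Darkness" is how far a pixel is from white. Using the darkest color
    # channel means a saturated highlighter color also counts as dark.
    color_darkness = 255 - color.min(axis=2)
    ir_darkness = 255 - ir

    # Toner is dark in both scans and paper is bright in both, so both
    # cancel out. Ink is dark in color but bright in infrared, so it remains.
    difference = color_darkness - ir_darkness
    mask = difference > threshold

    annotations = np.full_like(color_scan, 255)
    annotations[mask] = color_scan[mask]
    return mask, annotations


if __name__ == "__main__":
    # Synthetic 1x3 "page": blank paper, black toner, blue pen ink.
    color_page = np.array([[[255, 255, 255], [0, 0, 0], [0, 0, 200]]], dtype=np.uint8)
    ir_page = np.array([[255, 0, 250]], dtype=np.uint8)
    ink_mask, ink_only = extract_annotations(color_page, ir_page)
    print(ink_mask)
```

In practice the two scans would likely need registration and noise handling before a per-pixel comparison like this is reliable; the sketch leaves both out.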