Book scanning

Book scanning is the process of converting physical books into digital images or electronic books (e-books) via image scanning. This is a much less time-intensive method than re-typing all of the text; before scanning became feasible, re-typing was generally the only option.

Once a book has been digitally scanned, the images are available for rapid distribution, reproduction, and on-screen reading. Such book images are commonly stored in DjVu, Portable Document Format (PDF), or Tagged Image File Format (TIFF) files. Further benefits come from using optical character recognition (OCR) to convert the images of book pages into a machine-processable encoding of the book's text. This dramatically reduces the storage needed for the book and allows the text to be reformatted, searched, or used as input for text processing applications such as natural language processing.
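The storage saving from OCR can be made concrete with rough arithmetic. The sketch below compares an uncompressed black-and-white page image with the plain text of the same page. The page size (US Letter), the 300 dpi resolution, and the figure of about 2,000 characters per page are assumptions chosen for illustration, not numbers from this article.

```python
# Illustrative storage comparison for a single printed page.
# The page size, scan resolution and character count are assumptions
# chosen for the example, not figures taken from this article.


def raw_bitonal_image_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size of a 1-bit-per-pixel (black/white) page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bits = width_px * height_px
    return (bits + 7) // 8  # round up to whole bytes


def plain_text_bytes(text: str) -> int:
    """Size of the page's OCR text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    image = raw_bitonal_image_bytes(8.5, 11, 300)  # assumed US Letter page, 300 dpi
    text = plain_text_bytes("x" * 2000)            # stand-in for ~2,000 characters of text
    print(image, text, round(image / text))        # prints: 1051875 2000 526
```

Real page images are compressed (both DjVu and TIFF support compression), which narrows the gap considerably, so the ratio of roughly 500 to 1 is only an upper-bound illustration, not a measured figure.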

Commercial book scanners

Commercial book scanners are not like normal image scanners. They usually consist of a high-quality digital camera mounted on some sort of frame, with light sources on either side of the camera, giving a person or machine easy access to turn the pages of the book. Some models use V-shaped book cradles, which support the book's spine and also center the book automatically.

The advantage of this type of scanner is that it is very fast compared with overhead scanners. It is also much more cost-effective: traditional overhead scanners normally cost from US$10,000 upwards.

Book scanning by organizations on a large scale

Projects like Project Gutenberg, Google Book Search, and the Open Content Alliance scan books on a large scale.

One of the main challenges is the sheer volume of books to be scanned, expected to run into the tens of millions. All of them must be scanned and then made searchable online for the public to use as a universal library. Large organizations currently rely on three main approaches: outsourcing, scanning in-house with commercial book scanners, and scanning in-house with robotic scanning solutions.
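Making scanned books "searchable online" ultimately means indexing the OCR text. A common structure for this is an inverted index, which maps each word to the pages on which it appears. The sketch below is a minimal, in-memory illustration of that idea, assuming each page arrives as a hypothetical `(book_id, page_number, ocr_text)` tuple; a collection of tens of millions of books would need a dedicated search engine rather than a Python dictionary.

```python
import re
from collections import defaultdict

# Minimal inverted index over OCR output. Each page is a hypothetical
# (book_id, page_number, ocr_text) tuple; the index maps every lower-cased
# word to the set of (book_id, page_number) locations containing it.

WORD = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return WORD.findall(text.lower())


def build_index(pages):
    index = defaultdict(set)
    for book_id, page_no, text in pages:
        for word in tokenize(text):
            index[word].add((book_id, page_no))
    return index


def search(index, query: str):
    """Return the sorted locations that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())  # AND semantics: intersect per word
    return sorted(hits)


if __name__ == "__main__":
    pages = [
        ("book-a", 1, "Book scanning converts pages"),
        ("book-a", 2, "OCR makes pages searchable"),
        ("book-b", 1, "Searchable text, not just images"),
    ]
    index = build_index(pages)
    print(search(index, "searchable pages"))  # prints: [('book-a', 2)]
```

Using `index.get` rather than indexing the `defaultdict` directly keeps lookups for unknown words from inserting empty entries into the index.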

With outsourcing, books are often shipped to low-cost scanning providers in countries such as India or China. Alternatively, for reasons of convenience, safety, and improved technology, many organizations choose to scan in-house, using either time-consuming overhead scanners or substantially faster digital camera-based scanning solutions; the latter is the method employed by the Internet Archive as well as Google. Traditional methods have included cutting off the book's spine, scanning the pages in a scanner with automatic page-feeding capability, and rebinding the loose pages afterwards.

Once a page is scanned, its data is either entered manually or captured via OCR, another major cost of book scanning projects.
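When OCR output is compared against a manually keyed transcription, one widely used accuracy measure is the character error rate (CER): the edit (Levenshtein) distance between the OCR text and the reference text, divided by the length of the reference. The article does not prescribe any particular metric; the sketch below is a generic, dependency-free illustration of that measure, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, reference_text: str) -> float:
    """Edit distance from OCR output to the reference, per reference character."""
    if not reference_text:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, reference_text) / len(reference_text)


if __name__ == "__main__":
    # Typical OCR confusions: "m" read as "rn" and "o" read as "0".
    print(character_error_rate("rn0dern", "modern"))  # prints: 0.5
```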

Because of copyright issues, most scanned books are out of copyright; however, Google Book Search is known to scan books still protected by copyright unless the publisher specifically excludes them.

Destructive scanning

For book scanning on a low budget, the least expensive method is to cut off the binding of the book or magazine. This converts it into a sheaf of loose-leaf papers, which can then be loaded into a standard automatic document feeder and scanned using common, inexpensive scanning technology. While this is not a desirable solution for very old or uncommon books, it is useful for scanning books and magazines that are not expensive collector's items and whose scanned content is easy to replace. The process poses two technical difficulties: the cutting and the scanning.

Cutting

The proper way to cut a stack of 500 to 1,000 pages in one pass is with a guillotine paper cutter. This is a large steel table with a paper vise that screws down onto the stack and firmly secures it before cutting. The cut is made with a large sharpened steel blade, which moves straight down and cuts the entire length of each sheet at once. A lever on the blade permits several hundred pounds of force to be applied for a quick one-pass cut.
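How a lever turns modest hand effort into several hundred pounds of force follows from the law of the lever: balancing moments about the pivot gives input force times effort arm equals output force times load arm. The sketch below assumes an idealized single lever with no friction; real guillotine cutters may use different linkages, and the arm lengths and hand force in the example are hypothetical.

```python
def lever_output_force(input_force: float, effort_arm: float, load_arm: float) -> float:
    """Force delivered by an ideal lever (no friction, rigid arms).

    Balancing moments about the pivot gives
    input_force * effort_arm == output_force * load_arm.
    Both arm lengths must use the same unit; the force unit carries through.
    """
    if effort_arm <= 0 or load_arm <= 0:
        raise ValueError("arm lengths must be positive")
    return input_force * effort_arm / load_arm


if __name__ == "__main__":
    # Hypothetical figures: 40 lbf of hand force applied 50 in from the pivot,
    # with the blade linkage 5 in from the pivot.
    print(lever_output_force(40, 50, 5))  # prints: 400.0
```

The tenfold ratio of the arms is what multiplies the force; lengthening the handle or moving the blade linkage closer to the pivot increases it further.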

A clean cut through a thick stack of paper cannot be made with a traditional, inexpensive sickle-shaped hinged paper cutter. These cutters are intended for only a few sheets, with about ten sheets being the practical limit. A large stack of paper applies torsional forces to the hinge, pulling the blade away from the cutting edge on the table. The farther the cut moves from the hinge, the less accurate it becomes and the more force is required to hold the blade against the cutting edge.

The guillotine cutting process dulls the blade over time, so it must be resharpened. Coated paper, such as slick magazine paper, dulls the blade more rapidly than plain book paper because of its kaolinite clay coating. Additionally, cutting through an entire hardcover book causes excessive wear because the blade must also pass through the hardcover backing. Instead, the outer cover is removed and only the internal bound paper stack is cut.

Scanning

Once the pages are freed from the spine, they can be scanned one sheet at a time on a traditional flatbed scanner, but this is a slow and laborious process. It is much easier to scan the material with an automatic document feeder (ADF).

Some books are difficult to scan in an ADF because they have decorative riffled edging or pages whose outer edges curve in an arc due to a non-flat binding. An ADF is intended to scan pages of uniform shape and size, so this nonuniform sizing can lead to improper scanning. For such books, the riffled or curved edges are guillotined off to make the outer edges flat and smooth before the binding is cut.

The slick coated paper of magazines and bound textbooks can be difficult for the rollers in an ADF to pick up and guide along the paper path. An ADF that uses a series of rollers and channels to flip sheets over may suffer many jams and misfeeds; generally, a paper path that is as straight as possible, with few bends and curves, causes fewer problems. The clay coating can also rub off the paper over time and coat the sticky pickup rollers, loosening their grip on the paper, so the ADF rollers may need periodic cleaning to prevent slipping.

Magazines can pose a bulk-scanning challenge because of small, nonuniform sheets of paper in the stack, such as subscription cards and fold-out pages. These need to be removed before the bulk scan begins; they are either scanned separately, if they include worthwhile content, or simply left out of the scan process.
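Because an ADF expects sheets of uniform size, the pre-scan step of pulling out inserts and odd-sized pages amounts to finding sheets that differ from the stack's dominant size. The sketch below assumes each sheet's width and height have already been measured in millimetres; the function name, the input format, and the 2 mm tolerance are all hypothetical.

```python
from collections import Counter


def flag_nonuniform_sheets(sizes, tolerance_mm=2.0):
    """Return indices of sheets that differ from the stack's most common size.

    `sizes` is a list of measured (width_mm, height_mm) pairs, one per sheet.
    """
    if not sizes:
        return []
    # Round to whole millimetres so small measurement noise does not split the vote.
    rounded = [(round(w), round(h)) for w, h in sizes]
    (ref_w, ref_h), _count = Counter(rounded).most_common(1)[0]
    return [
        i
        for i, (w, h) in enumerate(sizes)
        if abs(w - ref_w) > tolerance_mm or abs(h - ref_h) > tolerance_mm
    ]


if __name__ == "__main__":
    # Hypothetical measurements: regular pages, a subscription card, a fold-out.
    stack = [(210, 276), (210.4, 276.1), (140, 100), (210, 276), (420, 276)]
    print(flag_nonuniform_sheets(stack))  # prints: [2, 4]
```

Taking the most common size as the reference, rather than a fixed paper size, lets the same check work for books and magazines of any format.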

Non-destructive scanning


In recent years, software-driven machines and robots have been developed to scan books without disbinding them, preserving both the contents of the document and a digital photographic archive of its current state. This trend is due in part to steadily improving imaging technologies that allow a high-quality digital archive image of a rare or fragile book to be captured in a reasonably short time with little or no damage. Some high-end scanning systems use vacuum, air, and static charges to turn pages while imaging is performed automatically, usually by a high-resolution camera located over an adjustable V-shaped cradle. The images are then transferred from the imaging device to various editing suites, which can further process them into either an archival-quality file, such as TIFF or JPEG 2000, or a web-friendly output, such as JPEG or PDF.

See also

  • Robotic book scanner
  • Planetary scanner
  • Institutional Repository
  • Digital Library
  • Optical character recognition


Book archives

  • Search for free online books by author, title, keyword, etc.
  • Read and share free online books.
  • Internet Public Library
     Search for free online books by author, title, keyword, etc.
  • A project to convert many of Project Gutenberg's books to PDF format.
  • Collection of Welsh books of national cultural interest which have long been out of print.
  • Google Book Search
     Search for books from numerous sources. Full text available for works in the public domain.


External links