Microsoft Office Document Imaging
Microsoft Office Document Imaging (MODI) is a Microsoft Office application that supports editing documents scanned by Microsoft Office Document Scanning. It was first introduced in Microsoft Office XP and is included in later Office versions, including Office 2007. It is no longer available in Office 2010. According to Microsoft, MODI allows users to:
  • Scan single or multi-page documents.
  • Produce editable text from a scanned document using optical character recognition (OCR).
  • Copy and export scanned text and images to Microsoft Word.
  • View a scanned document (the software does not permit navigating among multiple documents).
  • Search for text within scanned documents.
  • Easily reorganize scanned document pages.
  • Send scanned documents via e-mail or Internet fax.
  • Annotate scanned documents, including using ink on a Tablet PC.


Although the native file format of MODI is MDI (Microsoft Document Imaging Format), MODI can read and write a small variety of TIFF files. It can also save OCR text into the original TIFF file. However, MODI produces .tif files that violate the TIFF standard and are usable only by the Microsoft Office Document Imaging products. JPEG images can be recovered from these files with data carving tools such as foremost, which are designed to cull intact files from images of damaged hard drives. The OCR text in these files is visible in a binary editor.
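The carving idea mentioned above can be illustrated with a short Python sketch. This is not how foremost is implemented; it is a deliberately naive scan that assumes each embedded JPEG begins with the standard start-of-image marker and runs to the next end-of-image marker. The function name `carve_jpegs` is hypothetical, and images that contain embedded thumbnails would need a real JPEG segment parser.

```python
from typing import List

SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(data: bytes) -> List[bytes]:
    """Return every byte run that starts at an SOI marker and ends at the next EOI marker.

    Naive by design: an EOI inside an embedded thumbnail would end the run early.
    """
    found = []
    pos = 0
    while True:
        start = data.find(SOI, pos)
        if start == -1:
            break  # no further candidate images
        end = data.find(EOI, start + len(SOI))
        if end == -1:
            break  # start marker without a terminator: stop scanning
        found.append(data[start:end + len(EOI)])
        pos = end + len(EOI)
    return found
```

To try it on a real file, read the file in binary mode, pass the bytes to `carve_jpegs`, and write each returned run to its own .jpg file.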

In its default mode, the OCR engine deskews and re-orients the page where required. If the document object's Save method is then called, the deskewed and re-oriented images are saved back into the original image file.

Programmability

Via the Component Object Model (COM), MODI provides an object model based on 'document' and 'image' (page) objects. One feature that has elicited particular interest on the Web is MODI's ability to convert scanned images to text under program control, using its built-in OCR engine.

The MODI object model is accessible from development tools that support COM by adding a reference to the Microsoft Office Document Imaging 11.0 Type Library. The MODI Viewer control is accessible from any development tool that supports ActiveX controls by adding Microsoft Office Document Imaging Viewer Control 11.0 or 12.0 (MDIVWCTL.DLL) to the application project. These files are usually located in C:\Program Files\Common Files\Microsoft Shared\MODI.

The MODI control became accessible in the Office 2003 release; although the associated programs were included in the earlier Office XP, the object model was not exposed to programmatic control there.

A simple example in Visual Basic .NET follows:


Dim inputFile As String = "C:\test\multipage.tif"
Dim strRecText As String = ""
Dim Doc1 As MODI.Document

Doc1 = New MODI.Document()
Doc1.Create(inputFile) ' open the source TIFF file
Doc1.OCR()  ' run OCR on all pages of the multi-page TIFF file
Doc1.Save() ' save the deskewed, re-oriented images and the OCR text back to inputFile

' Work through each page of results and collect the recognized text
For imageCounter As Integer = 0 To (Doc1.Images.Count - 1)
    strRecText &= Doc1.Images(imageCounter).Layout.Text
Next

' Write the OCR text out to disk (fully qualified, so no Imports statement is needed)
System.IO.File.AppendAllText("C:\test\testmodi.txt", strRecText)

Doc1.Close() ' clean up
Doc1 = Nothing

Changes since Office 2003 Service Pack 3

In Office 2003 Service Pack 3, Microsoft removed the association of the .TIF and .TIFF file extensions with Microsoft Office Document Imaging as part of the Service Pack's security changes. In addition, TIFF files can no longer use JPEG compression. No detail is given about what the security issue was.
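To find out whether an existing TIFF file is affected by the JPEG compression restriction, its Compression tag can be inspected. The following Python sketch reads tag 259 from the first image file directory (IFD) of a classic TIFF file; per the TIFF 6.0 specification, the value 6 denotes old-style JPEG and 7 denotes JPEG. The function names are hypothetical, only the first page is examined, and BigTIFF files are not handled.

```python
import struct

COMPRESSION_TAG = 259
JPEG_CODES = (6, 7)  # 6 = old-style JPEG, 7 = JPEG


def first_page_compression(data: bytes) -> int:
    """Return the Compression tag value of the first IFD, or 1 (uncompressed) if absent."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        (tag,) = struct.unpack(endian + "H", data[entry:entry + 2])
        if tag == COMPRESSION_TAG:
            # Compression is a SHORT stored in the first 2 bytes of the 4-byte value field
            (value,) = struct.unpack(endian + "H", data[entry + 8:entry + 10])
            return value
    return 1  # TIFF 6.0 default when the tag is missing


def uses_jpeg_compression(data: bytes) -> bool:
    """True if the first page of the TIFF data is JPEG-compressed."""
    return first_page_compression(data) in JPEG_CODES
```

A multi-page file may use a different compression scheme on each page, so a thorough check would follow the next-IFD offsets and repeat the test for every page.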

In Office 2010, MODI is fully deprecated. This change also affects the setup tree, which no longer shows the MODI Help, OCR, or Indexing Service Filter nodes on the Tools menu. The Internet Fax feature in Office 2010 uses the Windows Fax printer driver to generate a fixed file format (TIFF). MODI and all its components are also deprecated for 64-bit Office 2010.

Alternatives to MODI for Office 2010 Users

Users of Office 2010, which lacks MODI, have several alternatives, including:
  • Follow Microsoft's suggestions, which include installing only the MODI software from Microsoft Office 2007. (This installation process might also work with earlier versions of Office): http://support.microsoft.com/kb/982760
  • Install the Alterna-TIFF viewer: either ActiveX control (for IE) or browser plug-in (for other browsers): http://www.alternatiff.com/
  • Install Black Ice's TIFF Viewer and plug-in: http://www.blackice.com/TIFFViewer.htm
  • Install Cartesian Product's CPC viewer: either CPC View ax (ActiveX for IE) or CPC Lite pi (plug-in for other browsers): http://www.cartesianinc.com/Products/CPCLite/

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 