OUR SERVICES

OCR Conversion
With our OCR software, we can convert your printed documents into accurate digital text. Converting printed documents through OCR was never this easy until our software team developed the tool, which the team continues to maintain.
Document Scanning
IT Datasoft provides document scanning, indexing, archiving, and retrieval services for a wide variety of applications. Scanning documents can be a cost-effective alternative to the long-term storage of paper.
Pages of forms are scanned and converted into bit-mapped (usually TIFF) images, which are either compressed and stored for later batch processing or passed immediately in uncompressed format to an ICR engine for recognition.
Image analysis
The document image is cleaned up: character image quality is improved using image enhancement techniques, and background "noise" is removed from the form.
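As a rough illustration of the noise-removal idea, the sketch below clears isolated ink pixels ("speckle") from a small binary image. The function name remove_speckle, the 0/1 pixel encoding, and the neighbor threshold are assumptions made for this example only; they do not describe the actual enhancement techniques used by the ICR software.

```python
def remove_speckle(image, min_neighbors=1):
    """Return a copy of a binary image (1 = ink, 0 = background) in which
    ink pixels with fewer than `min_neighbors` ink neighbors are cleared."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            # Count ink pixels among the (up to) eight surrounding pixels.
            neighbors = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbors += image[rr][cc]
            if neighbors < min_neighbors:
                cleaned[r][c] = 0  # isolated dot: treat as background noise
    return cleaned
```

A real system would typically work on grayscale scans and use more robust filters; this only shows the principle of discarding marks that are too small to be part of a character.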
Form Processing
We can capture data accurately from all types of handwritten and typed forms in the most cost-effective manner.
Just as documents must be prepared in order to be fed into a scanner by removing staples, smoothing wrinkles, positioning them for optimal registration, etc., so the image of a form document must be prepared by following these steps before it can be intelligently recognized.
Form Alignment
The image is registered and deskewed by the ICR software, which automatically aligns the form by locating special symbols on the document, called registration marks, and using them as guides.
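The geometry behind deskewing can be sketched as follows, assuming two registration marks that should lie on the same horizontal line of the form. The function names and the (x, y) pixel coordinates are hypothetical; how the ICR software actually finds the marks is not covered here.

```python
import math

def skew_angle_degrees(left_mark, right_mark):
    """Angle of the line through two registration marks, relative to horizontal."""
    dx = right_mark[0] - left_mark[0]
    dy = right_mark[1] - left_mark[1]
    return math.degrees(math.atan2(dy, dx))

def deskew_point(point, pivot, angle_degrees):
    """Rotate `point` about `pivot` by -angle_degrees, undoing the measured skew."""
    theta = math.radians(-angle_degrees)
    x, y = point[0] - pivot[0], point[1] - pivot[1]
    return (
        pivot[0] + x * math.cos(theta) - y * math.sin(theta),
        pivot[1] + x * math.sin(theta) + y * math.cos(theta),
    )
```

Applying deskew_point to every pixel coordinate (or, in practice, to the image as a whole with an imaging library) brings the two marks back onto one horizontal line.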
Form background removal
This stage is not necessary if the form was originally printed in a colored ("drop-out") ink that is invisible to the scanner being used.
If colored ink is not used, the form image may contain lines, boxes, fine print, and other form attributes (the passive data) that tend to confuse the ICR engine. These form attributes must be removed from the image of the form so that only the character images (the active data) are left behind. Broken and fragmented characters are automatically repaired and restored to their original shapes.
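One simple way to picture this step is to subtract an image of the blank form from the filled-in form. The sketch below assumes both images are binary (1 = ink), already aligned, and the same size; the function name is hypothetical and real engines use more elaborate methods.

```python
def remove_form_background(filled, blank_template):
    """Keep only ink that is present on the filled form but absent
    from the blank form (lines, boxes, fine print)."""
    return [
        [1 if f == 1 and t == 0 else 0 for f, t in zip(filled_row, blank_row)]
        for filled_row, blank_row in zip(filled, blank_template)
    ]
```

Note that this naive subtraction also erases character pixels wherever handwriting crosses a form line, which is one plausible reason a repair pass for broken characters is needed afterwards.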
Character segmentation
Sophisticated software routines analyze, separate, and break down the character fields into isolated characters. If the form is "ICR-friendly", characters are segmented with the aid of graphic devices such as boxes, tick marks, and connected boxes called "combs" that force the form user to separate the characters legibly from one another.
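When characters are cleanly separated, as boxes and combs encourage, segmentation can be as simple as splitting the field wherever a column contains no ink. The following is a minimal sketch of that column-projection idea under the assumption of a binary field image (1 = ink); the function name is illustrative only.

```python
def segment_columns(field):
    """Return (start, end) column ranges, end exclusive, for each run of
    adjacent columns that contains at least one ink pixel."""
    cols = len(field[0])
    has_ink = [any(row[c] for row in field) for c in range(cols)]
    segments, start = [], None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                      # a character begins
        elif not ink and start is not None:
            segments.append((start, c))    # a blank column ends it
            start = None
    if start is not None:
        segments.append((start, cols))     # character touches the right edge
    return segments
```

Touching or overlapping handwriting defeats this approach, which is why forms designed for ICR try to keep each character in its own box.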
Character Classification
Individual characters are classified by ICR algorithms according to their ASCII category and assigned a confidence value, which is an index of how "certain" the ICR engine "feels" about the selection it has made. Alternate character choices are ranked according to those values, so that they can be incorporated into editing procedures that improve ICR accuracy. For example, the alternate choice "1" might be used instead of the first-ranked choice "I" when contextual analysis reports that the field is all-numeric.
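The "1" versus "I" example can be sketched in a few lines. The candidate list, the confidence numbers, and the function name pick_character are made up for illustration; an actual engine would supply its own ranked alternates.

```python
def pick_character(candidates, field_type="alphanumeric"):
    """Pick the highest-confidence candidate that the field type allows.

    candidates: list of (character, confidence) pairs from the classifier.
    Returns the chosen pair, or None if no candidate fits the field type.
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    if field_type == "numeric":
        allowed = [pair for pair in ranked if pair[0].isdigit()]
    elif field_type == "alphabetic":
        allowed = [pair for pair in ranked if pair[0].isalpha()]
    else:
        allowed = ranked
    return allowed[0] if allowed else None

# Hypothetical classifier output for one character image.
candidates = [("I", 0.62), ("1", 0.35), ("l", 0.03)]
```

With these values, an unconstrained field keeps the first-ranked "I", while a numeric field falls back to the alternate "1".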
Form Identification
The document is identified by certain predefined characteristics that the ICR software is trained to look for, so that the zones containing the fields designated for recognition can be located by a customized, predefined ICR template. Form ID attributes can include form numbers, corporate logos, or the name of the form itself imprinted somewhere on the form.
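For the simplest kind of Form ID attribute, a printed form number, identification amounts to a lookup. The sketch below assumes the text near the top of the page has already been read; the form numbers and template names are invented for the example, and logo matching is not covered.

```python
def identify_form(header_text, known_forms):
    """Return the template name whose form ID appears in the header text,
    or None if the form is not recognized."""
    # Normalize case and whitespace so scanning variations do not matter.
    normalized = " ".join(header_text.upper().split())
    for form_id, template_name in known_forms.items():
        if form_id.upper() in normalized:
            return template_name
    return None

# Hypothetical form numbers mapped to hypothetical template names.
KNOWN_FORMS = {
    "Form CL-100": "claim_template",
    "Form EN-200": "enrollment_template",
}
```

An unrecognized form returns None, which a workflow could use to route the page to an operator instead of guessing a template.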
Character Field Location
The predefined ICR template automatically locates the fields that contain character data. The template identifies which individual fields on the form image require character recognition, and what the nature of those fields is: hand print, machine print, numeric, alphabetic, alphanumeric, etc. The template also identifies which areas are barcodes or check box recognition zones.
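A template of this kind can be pictured as a list of named zones, each with a position and a description of what it contains. The class name FieldZone, its attributes, and the pixel-box convention below are assumptions for illustration, not the format of any particular ICR product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldZone:
    name: str
    box: tuple                      # (left, top, right, bottom) in pixels
    kind: str                       # "hand_print", "machine_print", "barcode", "checkbox"
    charset: str = "alphanumeric"   # "numeric", "alphabetic", or "alphanumeric"

def crop_zone(image, zone):
    """Cut the zone's rectangle out of an image stored as a list of rows."""
    left, top, right, bottom = zone.box
    return [row[left:right] for row in image[top:bottom]]

def zones_for_character_recognition(template):
    """Select the zones that need character recognition, skipping
    barcode and check box zones."""
    return [z for z in template if z.kind in ("hand_print", "machine_print")]
```

Each cropped zone would then be passed to the recognizer suited to its kind and character set.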
Post-Processing
The initial or "raw" recognition results are validated using edit procedures such as grammatical rules, spell-checkers, dictionaries, check-sum routines, and look-up tables.
Ambiguous and erroneous data fields (the "rejects") are identified and sent to data entry operators at workstations for manual correction.
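The sketch below shows how two of these edit procedures, a check-sum routine and a look-up table, might decide whether a field is accepted or rejected. The Luhn check digit is used only as a familiar example of a check-sum; the field kinds, the confidence threshold of 0.8, and the function names are assumptions for illustration.

```python
def luhn_valid(digits):
    """Check a string of digits against the Luhn check-digit scheme."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def validate_field(value, confidence, kind, lookup=None, min_confidence=0.8):
    """Return "accept" or "reject" for one recognized field."""
    if confidence < min_confidence:
        return "reject"                       # ambiguous recognition
    if kind == "checksum_number":
        ok = value.isdigit() and luhn_valid(value)
    elif kind == "lookup":
        ok = lookup is not None and value in lookup
    else:
        ok = True                             # no edit rule for this kind
    return "accept" if ok else "reject"
```

Fields that come back as "reject" are the ones that would be queued for an operator.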
Manual correction of rejected character fields
The way rejected data is presented to the data entry operator can dramatically affect both the speed and the accuracy of the reject repair process. In particular, the data entry GUI is important because the ergonomics of data entry determine whether an operator can reach his or her maximum correction speed.
With the opportunity for error compounding at each successive step, it is remarkable that ICR accuracy rates can match (and sometimes exceed) human performance levels.
© IT Datasoft, New Delhi, India. All Rights Reserved.