
Automated Legal Redaction

The Open Government Act of 2007 emphasizes putting administrative and court
records in the public domain as much as possible, and this is supported by the
Obama administration. Because of the vast number of such records, the redaction
process is increasingly being automated, at least for an initial screening, because it
is not humanly possible to sift through so many documents and edit out personal
information such as Social Security Numbers, names of minors and dates of birth.
Automated legal redaction software not only removes visible information that has
been identified for editing, but also helps to identify and remove
metadata about the information and any hidden tracking data that could otherwise
be recovered later with specialist tools. While major software
developers such as Microsoft and Adobe provide their own redaction tools, several
stand-alone automated legal redaction tools are also available.
Automated legal redaction software works in several ways: it can delete
the selected information and then create a new file without it; it
can replace the information with garbage lettering (in effect replacing one string of
binary data with another, meaningless one); or it can draw rectangles on top
of the marked sections, generate an image of the page and then flatten it, so that
the original redacted information is lost. One can then print out these images or
run them through OCR software in batch mode for electronic storage and retrieval.
Such software often features self-learning capabilities as well, and
tries to match patterns in your previous work to predict potential candidates for
removal.
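The second approach above, overwriting marked information with meaningless characters, can be sketched in a few lines of Python. This is a minimal illustration on plain text and not a description of how any particular product works; the pattern set, the `redact_text` function and the filler character are all assumptions made for the example.

```python
import re

# Hypothetical patterns for two of the items named earlier. Real records
# would need broader, locale-aware rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def redact_text(text, filler="X"):
    """Return (new_text, counts) with every pattern match overwritten by filler."""
    counts = {}
    for name, pattern in PATTERNS.items():
        # Overwrite each match with filler of the same length so layout is kept.
        text, n = pattern.subn(lambda m: filler * len(m.group()), text)
        counts[name] = n
    return text, counts

redacted, counts = redact_text("SSN 123-45-6789, born 4/12/1990.")
print(redacted)
print(counts)
```

Keeping the filler the same length as the original preserves the page layout but also reveals how long the removed value was; a fixed-length filler would avoid that at the cost of shifting the text.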
However, a document that has been processed by automated legal
redaction software should ideally be screened by a human prior to publication to
check that all sensitive information has been removed, and that no clues have been
left from which it could be recovered.

2. (Redact a PDF)
Requirements for PDF Redaction
Legal and administrative requirements are increasingly placing more documents in
the public domain, and electronic documents, including PDFs, are no exception.
Before publishing these files, however, it is necessary to redact a PDF either
manually or through batch software. There are several techniques for
redacting a PDF properly and avoiding common mistakes that inadvertently
leave data in the redacted PDF file. Examples of such mistakes are making the text
the same color as the background, or drawing a rectangle over the text: in both
cases the text is still in the file, and anyone can select it and read it.
Using Acrobat's built-in tool
The full version of Adobe Acrobat comes with a tool for automatic redaction, and
offers several options. When you start to redact a PDF using this tool, it parses the
document automatically, identifies private data according to criteria you set,
removes the data, and then creates a new file so that one cannot access redacted
data anymore. It can also load a pre-defined wordlist and search the document
according to this list. It is relatively easy and fast to redact a PDF using this tool.
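Word-list searching of the kind described above can be approximated as follows. The sketch works on text that has already been extracted from a document and does not use Acrobat's actual interface; `load_wordlist`, `find_terms` and the one-term-per-line list format are assumptions for illustration.

```python
import re

def load_wordlist(lines):
    """Build a term list from an iterable of lines, one term per line (assumed format)."""
    return [line.strip() for line in lines if line.strip()]

def find_terms(text, terms):
    """Return sorted (start, end, term) tuples for case-insensitive whole-word hits."""
    hits = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), term))
    return sorted(hits)

terms = load_wordlist(["Jane Doe\n", "\n", "Acme Holdings\n"])
text = "Filed by jane doe on behalf of Acme Holdings."
for start, end, term in find_terms(text, terms):
    print(start, end, term)
```

Returning character offsets rather than modifying the text directly leaves the choice of redaction method (deletion, overwriting or flattening) to a later step.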
Adobe has also released a short video on its website which takes you through
the steps and teaches you how to redact a PDF.
Batch Mode Redaction
You can also use stand-alone plugins that can handle a large number of files and let
you redact PDFs in batch mode, thus saving you a considerable amount of time.
Batch mode redaction, however, should be used with caution: always
check some files at random to make sure that no clues to the missing data have
been left behind inadvertently. Once you have had some practice and mastered the
art of setting up proper filters, you can redact a PDF with confidence.
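The random spot-check recommended above can be made systematic. The following is a minimal sketch; the function name, the 10% sampling fraction and the file names are assumptions, and the right fraction depends on how sensitive the batch is.

```python
import random

def pick_spot_checks(filenames, fraction=0.1, minimum=1, seed=None):
    """Choose a random subset of a processed batch for manual review."""
    if not filenames:
        return []
    count = max(minimum, round(len(filenames) * fraction))
    count = min(count, len(filenames))
    rng = random.Random(seed)  # passing a seed makes the selection reproducible
    return sorted(rng.sample(filenames, count))

batch = [f"case_{i:03d}.pdf" for i in range(1, 41)]
print(pick_spot_checks(batch, fraction=0.1, seed=7))
```

Recording the seed alongside the batch lets a reviewer later show which files were checked.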

3. (Forms processing OCR)

Reading hand-written forms into a computer


Optical Character Recognition, or OCR, software can be very useful when processing
forms. After you scan forms filled in by hand into your computer, you can run them
through forms processing OCR software, which will recognize the handwritten
characters and transcribe them into the word processing software of your choice.
Forms processing OCR applications have improved considerably in both accuracy and
speed of character recognition, and you can now process a batch of files very
quickly, usually with an accuracy above 90%.
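An accuracy figure such as the one quoted above is typically measured per character against a manually verified transcription. The text does not say which definition is meant; the sketch below assumes one common choice, one minus the edit distance divided by the length of the reference text.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def character_accuracy(reference, recognized):
    """Share of reference characters recovered correctly, floored at 0."""
    if not reference:
        return 1.0 if not recognized else 0.0
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))

# Two misread characters ("0" as "O", "1" as "l") in a 12-character field.
print(character_accuracy("Invoice 2041", "Invoice 2O4l"))
```

Note that a per-character figure can look high even when many fields contain at least one error, so field-level accuracy is worth tracking separately for forms.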
Types of forms processing OCR
Several variations of OCR are available, including OMR (Optical Mark
Recognition), BCR (Bar Code Recognition) and MICR (Magnetic Ink Character
Recognition). These are used to process different types of forms and markings, such
as bank cheques, product bar-codes, check-boxes in ballot papers and survey forms,
and special characters and symbols. Most of these tools, however, expect forms in a
pre-defined format covering settings such as the scan resolution (a pre-agreed DPI
of the scanned image), color mode, layout and orientation. In addition to English, a
few OCRs can also process forms written in some other languages.
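A pre-defined input format like this can be checked before a scan is submitted for recognition. The sketch below is only an illustration: the `ScanSpec` fields and their default values are assumptions, and a real OCR product documents its own requirements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanSpec:
    """Input format an OCR engine expects (all values are illustrative)."""
    min_dpi: int = 300
    color_modes: tuple = ("grayscale", "bitonal")
    orientation: str = "portrait"

def check_scan(dpi, color_mode, orientation, spec=ScanSpec()):
    """Return a list of problems; an empty list means the scan fits the spec."""
    problems = []
    if dpi < spec.min_dpi:
        problems.append(f"resolution {dpi} DPI is below the required {spec.min_dpi}")
    if color_mode not in spec.color_modes:
        problems.append(f"color mode '{color_mode}' is not one of {spec.color_modes}")
    if orientation != spec.orientation:
        problems.append(f"orientation '{orientation}' should be '{spec.orientation}'")
    return problems

print(check_scan(200, "color", "portrait"))
print(check_scan(300, "grayscale", "portrait"))
```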
Customizing OCR processing
OCR software can be set up and customized in various ways. You can use pre-defined
templates to match the scanned form image and then process it; you can
use special modules of the OCR software to read images in batch mode; or you can
set up patterns in the form of word lists, which the software can use to recognize
form characters more reliably. Well-designed OCR software also performs tasks such as
insertion of data into electronic forms, archiving and retrieval, data scrubbing,
and highlighting of unrecognized characters so they can be filled in manually. It is
also possible to read data from PDFs containing images through OCR software.
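The word-list idea and the highlighting of unrecognized input can be illustrated together. This sketch uses Python's standard `difflib` module to snap each recognized token to the closest entry in a word list and flags anything that has no close match; the function name and the 0.8 similarity cutoff are assumptions, and commercial OCR engines apply such lists internally in their own way.

```python
import difflib

def correct_tokens(tokens, word_list, cutoff=0.8):
    """Snap OCR tokens to the closest word-list entry; flag the rest for review."""
    known = {w.lower(): w for w in word_list}
    corrected, unrecognized = [], []
    for index, token in enumerate(tokens):
        if token.lower() in known:
            corrected.append(known[token.lower()])
            continue
        match = difflib.get_close_matches(token.lower(), list(known), n=1, cutoff=cutoff)
        if match:
            corrected.append(known[match[0]])
        else:
            corrected.append(token)
            unrecognized.append(index)  # position to highlight for manual entry
    return corrected, unrecognized

tokens = ["Texas", "Flor1da", "Qx#z"]
print(correct_tokens(tokens, ["Texas", "Florida", "Ohio"]))
```

A lower cutoff corrects more misreadings but also raises the risk of silently replacing a valid value with a wrong list entry.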

4. (Template based forms processing)

Converting manually filled forms into electronic documents


Forms that are filled in by hand need to be scanned and then converted into
electronic documents, and you have the option of using Optical Character
Recognition (OCR) software for processing the scanned form images. OCR software
reads the images and processes the data based on existing templates to convert them
into editable documents which you can copy into any word processing software. For
large volumes of forms, you can either use a pre-defined template,
edit a template to suit your requirements, or even build a template from scratch.
Advantages of template based forms processing
In recent years OCR software has improved in both accuracy and speed of
processing forms, and you no longer need to correct the numerous recognition
mistakes that earlier generations of software used to make. In fact, most OCR
software today is driven by templates, and if you use high performance scanners to
scan in large batches of forms regularly, then it is best to process them through
these templates, which have pre-defined fields that include text boxes, check-mark
boxes and radio buttons. The software then simply inserts the data into the proper
fields of the template, and the electronic version of the form is ready.
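One way to picture a template is as a list of named regions on the page, each with a field type. The sketch below assigns recognized items to fields by position; the `Field` structure, the pixel coordinates and the `(x, y, text)` item format are assumptions chosen for illustration rather than any vendor's template format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    kind: str    # "text" or "checkbox" in this sketch
    box: tuple   # (left, top, right, bottom) in pixels

def fill_template(fields, items):
    """Assign recognized items (x, y, text) to the field whose box contains them."""
    record = {f.name: (False if f.kind == "checkbox" else "") for f in fields}
    for x, y, text in items:
        for f in fields:
            left, top, right, bottom = f.box
            if left <= x <= right and top <= y <= bottom:
                if f.kind == "checkbox":
                    record[f.name] = text.strip() != ""  # any mark counts as ticked
                else:
                    record[f.name] = (record[f.name] + " " + text).strip()
                break
    return record

template = [
    Field("surname", "text", (50, 100, 400, 140)),
    Field("subscribe", "checkbox", (50, 200, 80, 230)),
]
items = [(60, 120, "Nguyen"), (65, 215, "X"), (900, 900, "stray mark")]
print(fill_template(template, items))
```

Items that fall outside every field, such as the stray mark here, are simply ignored, which is one reason template-driven processing tends to be more robust than free-form recognition.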
Reuse an existing template or develop your own?
Most OCR software comes with pre-defined templates that conform to best practices
in designing forms that are user-friendly as well as machine readable. Of course you
can make minor alterations to these templates to suit your needs, such as renaming
a field or changing its data type. But if you have a form that is not being
processed well by existing templates, you can build your own using the SDKs provided
by the OCR developers, and design a custom template that can process the form
according to your exact specifications.
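The minor alterations mentioned above, renaming a field or changing its data type, amount to deriving a modified copy of a stock template. A minimal sketch follows; `FieldSpec`, `customize` and the type labels are hypothetical names and do not correspond to any particular vendor's SDK.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FieldSpec:
    name: str
    data_type: str   # e.g. "text", "date", "number" (illustrative labels)

def customize(template, renames=None, type_changes=None):
    """Return a modified copy of a template; the original list is left untouched."""
    renames = renames or {}
    type_changes = type_changes or {}
    result = []
    for field in template:
        new_name = renames.get(field.name, field.name)
        new_type = type_changes.get(field.name, field.data_type)
        result.append(replace(field, name=new_name, data_type=new_type))
    return result

stock = [FieldSpec("dob", "text"), FieldSpec("amount", "text")]
custom = customize(stock, renames={"dob": "date_of_birth"}, type_changes={"dob": "date"})
print(custom)
```

Both dictionaries are keyed by the original field name, so a field can be renamed and retyped in the same call.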

5. (Automated redaction tool with OCR)

The case for automated redaction tools


The Obama administration is laying great emphasis on open and transparent
governance, which calls for placing an increasing number of legal, administrative
and financial documents in the public domain. At the same time, privacy concerns
require that sensitive data such as Social Security Numbers, names and logos be
redacted, and the sheer volume of documents means that you must use automated
redaction tools combined with OCR techniques.
The OCR and redaction workflow
Once you have identified a stack of manually filled forms that need redaction, you
will typically require a heavy-duty scanner that can scan these forms quickly and
in high detail, and automatically send the images to the OCR software. The OCR
software then scrubs the data, converts it into editable documents based on its own
templates, and passes them to the word processing software that you have specified.
Many OCR packages are also equipped with their own tools for automated redaction in
batch mode, and once you specify the rules and word lists, you will get fully
redacted forms for storage and retrieval. The redaction tool usually generates a new
document without the deleted data, so that the data cannot be recovered from it.
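The last two steps of this workflow, applying rules and word lists to the OCR output and then writing a new document without the matched data, can be sketched as follows. The OCR result is represented here as a list of `(text, bounding_box)` pairs, which is an assumed format; the single SSN rule and the function names are likewise illustrative.

```python
import re

SSN = re.compile(r"^\d{3}-\d{2}-\d{4}$")   # illustrative rule

def regions_to_redact(ocr_words, word_list, rules=(SSN,)):
    """Pick the bounding boxes of OCR words that match a rule or a listed word."""
    listed = {w.lower() for w in word_list}
    boxes = []
    for text, box in ocr_words:
        if text.lower() in listed or any(r.match(text) for r in rules):
            boxes.append(box)
    return boxes

def apply_redaction(ocr_words, boxes):
    """Build new output that omits redacted words instead of merely hiding them."""
    return [(text, box) for text, box in ocr_words if box not in boxes]

# Hypothetical OCR output: (recognized text, (left, top, right, bottom)).
words = [("Name:", (10, 10, 60, 30)), ("Rivera", (70, 10, 130, 30)),
         ("SSN:", (10, 40, 50, 60)), ("123-45-6789", (60, 40, 170, 60))]
boxes = regions_to_redact(words, word_list=["Rivera"])
print(apply_redaction(words, boxes))
```

The returned boxes could also be used to black out the corresponding regions of the scanned image, so that the text layer and the image stay consistent.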
Advantages of a redaction tool with OCR
Even though most document handling software such as Adobe Acrobat or Microsoft Word
comes with built-in redaction tools, these are at a disadvantage when you need to
process high volumes of forms in a short time and with high accuracy. There are two
reasons. First, with a separate redaction tool you have to take the output documents
from the OCR software and then start the redaction process afresh. Second, OCR
software is better at recognizing patterns than other redaction tools. Hence
automatic redaction combined with OCR gives you output with higher accuracy, and
greater confidence that no undesirable data is left behind.

6. (Web based PDF compression)


The need for Web based PDF compression
PDF files that contain a lot of high-resolution graphics or other embedded
data can become quite large, and you will often need to compress
a PDF file if you wish to send it via email. When compressing a PDF, you will be
faced with two problems: standard zipping software does not shrink PDFs by much,
and if you do not have the full version of Acrobat, you cannot use its built-in
compression tools. It is in such cases that you need web based PDF compression
sites.
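The text does not say why zipping helps so little. A common explanation is that the streams and images inside a PDF are usually stored in compressed form already, and compressing compressed data gains almost nothing. The sketch below demonstrates that effect with Python's standard `zlib` module on generated sample text; it does not read an actual PDF, and the vocabulary and sizes are arbitrary.

```python
import random
import zlib

# Build some compressible sample text from a seeded generator.
rng = random.Random(0)
vocabulary = ["court", "record", "public", "filing", "order", "motion", "clerk", "docket"]
raw = " ".join(rng.choices(vocabulary, k=20000)).encode("ascii")

once = zlib.compress(raw, 9)     # stands in for a PDF's internally compressed streams
twice = zlib.compress(once, 9)   # what a zip tool would then attempt

print(len(raw), len(once), len(twice))
print(f"first pass keeps {len(once) / len(raw):.0%} of the size, "
      f"second pass keeps {len(twice) / len(once):.0%}")
```

This is why dedicated PDF compressors work on the content itself, for example by lowering image resolution, rather than wrapping the file in another layer of general-purpose compression.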
How does it work?
There are many websites where you can upload your PDF and they will do the
compression for you; they will then provide a link from which you can download
your compressed files. They will also offer you options such as screen-based, low
quality output versus print-based, medium quality output, and fast compression
versus slower compression. These sites compress the images so that they become
slightly lower in quality, but still give good output. If you opt for premium
services, you can increase the maximum PDF size you are allowed to upload, as well
as access some more compression options.
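Most of the size reduction from a "screen" or "print" setting comes from resampling images to a lower resolution. The arithmetic can be sketched as follows; the DPI values attached to the two presets are assumptions for illustration, since each service chooses its own.

```python
def downsampled_size(width_px, height_px, source_dpi, target_dpi):
    """Pixel dimensions after resampling to a lower DPI at the same printed size."""
    if target_dpi >= source_dpi:
        return width_px, height_px          # never upsample
    scale = target_dpi / source_dpi
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))

# Assumed presets; actual services choose their own values.
PRESETS = {"screen": 72, "print": 150}

source = (2550, 3300)   # a US Letter page scanned at 300 DPI
for label, dpi in PRESETS.items():
    w, h = downsampled_size(*source, source_dpi=300, target_dpi=dpi)
    share = (w * h) / (source[0] * source[1])
    print(f"{label}: {w}x{h} pixels, {share:.1%} of the original pixel count")
```

Because the pixel count falls with the square of the DPI ratio, even a moderate reduction in resolution removes most of the image data.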
Pros and cons of web based compression
As with any other web based service, online PDF compression raises security and
privacy concerns. If you are dealing with sensitive documents then a wiser choice is
to buy the full version of Acrobat; but for the vast majority of cases a web based
compression tool is a better choice. These sites take the hassle of installation and
upgrades away from you, and you can just choose from the options without
knowing details such as the best compression schemes. In fact, the professional sites
also advise you on the techniques that will work best on your particular document,
and make sure that your uploaded documents are not put in the public domain.
