
Challenges in Archiving Print and Electronic Government Documents

Amanda Marie Yetter

While searching through the Government Documents ListServ, I noticed a question about archiving print and electronic government documents. Rolanda L. Ridley, a Government Documents Librarian at the Leonard S. Washington Memorial Library at Southern University at New Orleans (LA), asked: "I have a question brought to me by our cataloging department. They have been finding that there are no records for the tangible documents that we receive. They find records for the electronic items, but none for the print."¹ Ridley goes on to suggest that the U.S. Government Printing Office (GPO) might have stopped providing records for tangible items, but she seemed uncertain whether this had actually happened. Webster's English Dictionary defines tangible as "capable of being felt, seen, or noticed; substantial, real."² This definition would exclude items that are located on the Web, yet the GPO is providing more and more government documents only in online form. According to www.gpo.gov, the GPO is "the authoritative source for providing both descriptive and subject cataloging for Federal Government Documents." Ridley's question went unanswered on the ListServ, and it raises broader questions about the challenges of archiving print and electronic government documents as a whole. Government documents encompass a plethora of items, including Supreme Court Records, Congressional Records, and documents relating to health and welfare, in both electronic and print forms. So, how does the Federal Government archive information? While there are many ways to archive print and electronic forms, copyright, born-digital information, dead links, and keeping information current stand out as major issues in archiving and cataloging such documents.

1. Rolanda Ridley, e-mail to GovDoc-L, September 15, 2010.
2. Webster's English Dictionary (Scotland: Geddes & Grosset, 2003).

One particular way of collecting and archiving information is the Electronic Research Collection. The Electronic Research Collection is a cooperative project of the Department of State, the Federal Depository Library Program (FDLP), and the University of Illinois at Chicago Library to electronically archive material from the Department of State Web site. As material is superseded or new editions of materials are compiled, the older electronic versions are moved to this Web site. Another resource is the National Security Archive, a nongovernmental, nonprofit organization whose goals include creating an archive of declassified government publications and information. Most of its documents have been obtained through Freedom of Information Act requests, and its Web site provides access to the full text of many publications relating to foreign policy and international relations.³ The Online Dictionary of Library and Information Science (ODLIS) describes the Freedom of Information Act (FOIA) as follows: "Passed by Congress in 1966, FOIA guarantees right of access to unclassified government information to any American who submits a written request to see copies of specific records or documents. The Act exempts from disclosure information that might prove harmful to national defense, foreign relations, law enforcement, commercial interests of third parties, or personal privacy. The intent behind FOIA is to make government more transparent and accountable to citizens and to prevent secrecy from being used for illegitimate purposes. Similar legislation has been enacted in most European and UK countries. FOIA applies only to federal agencies and does not create right of access to records held by Congress, the courts, or state or local government agencies (each state has enacted its own laws concerning access)."⁴
Other initiatives to archive the Web include the National Digital Information Infrastructure and Preservation Program (NDIIPP), the Web-at-Risk project, and the Digital Library Federation (DLF). According to "Cataloging and Archiving State Government Publications" by Colleen Valente: "In 2000 the United States Congress recognized that a wealth of digital information had been created and needed to be collected and preserved for the future. Congress responded by funding the National Digital Information Infrastructure and Preservation Program (NDIIPP), overseen by the Library of Congress, to develop a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations."⁵

3. J. Sears and Marilyn K. Moody, Using Government Information Sources: Electronic and Print, 3rd ed. (Phoenix, AZ: Oryx Press, 2001).
4. Online Dictionary of Library and Information Science (ODLIS), http://www.lu.com/odlis/ (accessed October 12, 2010).

Another initiative is the Web-at-Risk project. "The Web-at-Risk project was funded by the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP), and charged with enabling the archiving and preservation of government and political information on the Web. When project staff assessed the preservation needs of a group of curators and librarians, they found that for many institutions, documents found on government websites are a preservation priority, rather than the sites themselves. However, locating and harvesting these documents as they become available remains a significant challenge. Attempting to access PDF copies of federal government documents through the Wayback Machine's archived copies of the www.gpo.gov website reveals that many of the documents have been listed in a robots.txt file on the U.S. Government Printing Office's (GPO's) site. This file instructs Web crawlers, such as those used by the Internet Archive, to ignore certain areas of the site. The files are generally used to exclude files from search engine indexes, but in this case they seem to have prevented the documents from being archived as part of the GPO's site."⁷ According to the Digital Library Federation (DLF) Web site, the DLF is a consortium of libraries and related agencies that are pioneering the use of electronic-information technologies to extend collections and services (http://www.clir.org/dlf.html).
DLF is committed to identifying standards and best practices for digital collections and network access, coordinating research and development in the use of information technology by libraries, and assisting in the initiation of projects and services that individual libraries lack the means to develop on their own.⁴ There is also the Digital Millennium Copyright Act (DMCA), which, according to ODLIS, is "legislation passed by Congress and signed into law in October 1998 to prepare the United States for the ratification of international treaties protecting copyrights to intellectual property in digital form, drafted in 1996 at a conference of the World Intellectual Property Organization (WIPO). The bill was supported by the software and entertainment industries and opposed by the library, research, and education communities."

5. C. Valente, "Cataloging and Archiving State Government Publications," Cataloging and Classification Quarterly 48, no. 4 (March 2010): 315–329.

Other related initiatives to preserve sound recordings and government documents include the oral history initiative. Oral history is important in the area of government documents because it provides first-hand information in a form that can be replayed repeatedly on a given medium. "The federal government has conducted an enormous number of oral histories to preserve institutional memory from the early days of agencies, to document lessons learned, to increase visibility of an historical office within a large agency, to enhance museum exhibits and historic sites, and for various other reasons."⁶ How does this initiative keep working without breaking copyright law? This is because, as a general rule, "the U.S. copyright statute protects all varieties of literary, musical, dramatic, choreographic, pictorial, graphic, sculptural, and audiovisual works and sound recordings as soon as they are fixed in what the statute calls 'any tangible medium of expression.'"⁷

Copyright Office records are open to public inspection and searching from 8:30 A.M. to 5 P.M., eastern time, Monday through Friday, except federal holidays. The records available to the public include an extensive card catalog, an automated catalog containing records from 1978 forward, record books, and microfilm records of assignments and related documents. Other records, including correspondence files and deposit copies, are not open to the public for searching; however, they may be inspected upon request and payment of a $65 per hour search fee.⁴ The beauty of the Internet is that searching is available at all hours of the day rather than being confined to set hours. With this in mind, the Copyright Office might benefit from making its records searchable via the Web.
6. T. Finchum, "Sharing Government Documents with the People," DttP: Documents to the People 38, no. 1 (Spring 2010): 32–35.
7. L. Wilson, The Copyright Guide: A Friendly Guide to Protecting and Profiting from Copyrights, rev. ed. (New York: Allworth Press, 2000).

What items are protected under the current Copyright Act? Bryan Carson states in The Law of Libraries and Archives that "Unpublished lectures, letters, and manuscripts are the staples of historical and archival work. The current Copyright Act provides that all of the materials covered by this statute will automatically become copyrighted upon creation, regardless of whether the work is published or whether it includes a formal notice of copyright."⁸ If that is the case, what happens to items that are born digital and never published in the sense of being issued in paper form? Szydlowski grapples with this idea in "Archiving the Web: It's Going to Have to Be a Group Effort," stating that "The idea of the Internet as a primary publishing platform is not a futuristic scenario; government documents, newspapers, and even the output of some academic disciplines appear on the Web first, and may not appear in print at all."⁹ What will future librarians do when faced with archiving and preserving government documents that are born digital and available only on the Web? Some changes have been made over the past twenty years to prepare for this. As Morrison explains, "One integral part of the linking to other agency-hosted online documents is the GPO's PURL program. A URL is a web page's unique electronic address, but URLs can change in a blink of an eye. For those documents whose electronic copies are on an agency website and whose permanent access is not under the direct control of the GPO, GPO catalogers assign a PURL (persistent uniform resource locator), which looks like a regular URL but functions like a switching service that links the end user to the desired web page, wherever it currently resides."¹⁰ With a PURL, "If a URL changes, it is the PURL provider's responsibility to correct the link. The PURL program is administered by the OCLC and though any organization can become a PURL provider the GPO is one of the major users of the system. All current online government documents are assigned PURLs by GPO catalogers, and the PURL appears as a live link in GPO catalog records."¹⁰

8. Bryan Carson, The Law of Libraries and Archives (Lanham, MD: Scarecrow Press, 2007).
9. N. Szydlowski, "Archiving the Web: It's Going to Have to Be a Group Effort," The Serials Librarian 59, no. 1: 35–39, http://dx.doi.org/10.1080/03615260903534908.
10. Andrea Morrison, ed., Managing Electronic Government Information in Libraries: Issues and Practices (Chicago: American Library Association, 2008).

This does not make the work of maintaining current records any easier for librarians or archivists, but it does make URLs easier to maintain: the PURL itself stays the same while the address behind it is updated, so the patron or user is still linked to the document after it moves. As Wilson states, "The Internet Archive is a wonderful service, but librarians and archivists should not be lulled into thinking that the job of archiving the Web content that is most important to our patrons will be done by someone else."⁷ This means that librarians and archivists, not just the GPO, will have to do the major legwork needed to preserve government documents. While the GPO provides good tools and has begun new initiatives to preserve and archive the Web, the problem is ongoing. What will happen to the various media used to archive the Web today, and to the storage space needed to make sure that the information can be housed and maintained properly? Morrison mentions the Floppy Disk Project and other disappearing digital formats. How many people still have floppy disk drives on their computers? As technology changes and forms of storage and information retrieval develop, the media in which we store information need to be updated, upgraded, and migrated to current formats in order to preserve our historical past for the future.¹⁰ Another way to preserve and maintain documents is the CyberCemetery, a cooperative program, created through the University of North Texas's partnership with the GPO, that addresses the problem of disappearing agency pages.¹⁰ Here, users can search and browse a plethora of government information from agency pages that no longer exist. Other digital information preserved by the CyberCemetery includes that from government entities dissolved by mandate of Congress, such as the National Education Goals Panel (1990–2002), and temporary government entities such as independent counsels. The CyberCemetery is also a particularly good resource for [entities created for] a specific task or time period and then dissolved.¹⁰ Unfortunately, when I checked the CyberCemetery, it too appeared to have been dissolved and to have ceased operation. However, the Web site states that "The CyberCemetery is an archive of government websites that have ceased operation (usually websites of defunct government agencies and commissions that have issued a final report). This collection features a variety of topics indicative of the broad nature of government information. In particular, this collection features websites that cover topics supporting the university's curriculum and particular program strengths."¹¹ Another initiative used in the preservation of materials is Google's Book Search project. Google is well known as one of the primary Internet search engines. In 2004, Google launched a massive digitization project in conjunction with Harvard University, Stanford University, the University of Michigan, Oxford University, and the New York Public Library. Contrary to the buzz at that time, these partnerships do not entail digitizing the entire collections of these libraries, but even so, the projected number of digitized titles is amazing. Assuming similar scanning costs, Google will spend $750 million to scan the 30 million volumes contained in the collections of the five participating libraries. Since the original announcement, other libraries have joined the project with varying degrees of commitment, including the Library of Congress, the University of California System, Madrid's Complutense University, the Libraries of the University of Wisconsin at Madison, and the Wisconsin Historical Society Library.
Many of the titles to be digitized in the Google plan are government documents, and questions have been raised in the government documents preservation community concerning the authenticity and authority of these digital copies, an important point when dealing with documents that may at some point become exhibits in court cases or other legal proceedings. The idea of preserving these documents and other titles is a good one, but Google has not made a commitment to provide long-term access to these digitized copies or to migrate them to new electronic formats as they appear in the marketplace. Copyright issues remain a concern and would prevent free public access to some documents. The jury is still out on the ultimate value to future researchers of Google's digitization project.¹⁰
11. University of North Texas, CyberCemetery, University of North Texas Government Documents Department, http://govinfo.library.unt.edu/ (accessed October 12, 2010).

Google's project brings to mind the National Archives and the archiving work of the National Archives and Records Administration. According to its Web site: "The National Archives and Records Administration (NARA) is the nation's record keeper. Of all documents and materials created in the course of business conducted by the United States Federal government, only 1%-3% are so important for legal or historical reasons that they are kept by us forever. Those valuable records are preserved and are available to you, whether you want to see if they contain clues about your family's history, need to prove a veteran's military service, or are researching an historical topic that interests you."¹² Through NARA, users can search for information pertaining to government documents and then purchase copies for a fee or locate them in a depository library. In my experience, NARA is a great source for information about various wars and other military matters that is often difficult to find in a federal depository library. The Web site is easy to navigate, allowing searches by subject and SuDoc number, and it even offers simple definitions of what an archive is and what a record is. All of this information is pertinent to government documents and to maintaining current and accurate records for archival purposes. With regard to problems in cataloging government documents, Valente notes: "The cataloging problems identified need to be assigned to very experienced, if not professional cataloging staff. Good guess work, based on judgment was frequently the only way to resolve problems. Of all the problems that had to be dealt with, authority work was usually the most troublesome. The lack of authority records for Alabama agencies made it difficult to clearly express relationships among agencies and their subordinate units.
Not only do records for subordinate bodies often not exist but, when they do, the records often lack useful cross references to earlier forms of names. Most of the existing records are too old to reflect more recent changes. For instance, the State Motor Pool became the Office of Fleet Management sometime after 2000 under the main entry heading: Alabama. State Motor Pool (OCLC #45414710). The description of the body of the record indicates that the name Fleet Management was in use but the relationship between it and the Motor Pool was, apparently, unclear to the cataloger. In order to resolve the problem of choosing the correct agency name, the cataloger made an added entry for: Alabama. Dept. of Finance. Fleet Management. This may have been an attempt to cover all possibilities. The lack of clarity is still apparent today on the agency's Web site. The front page features the name State Motor Pool prominently and does so in a way that suggests that it is a unit of the Office of Fleet Management. They are actually one and the same entity, but it required a phone call to the director's office to clarify the situation."⁵

12. National Archives and Records Administration, http://www.archives.gov (accessed October 12, 2010).

As the Alabama example shows, some situations need to be clarified: names and authority records such as those for the Department of Finance's Fleet Management need to be updated and made accessible to patrons and users seeking information. In conclusion, the majority of the legwork needed to archive government documents effectively needs to be carried out by trained catalogers, not just by librarians or the GPO. In the realm of born-digital information, archivists at projects like the CyberCemetery need to make sure that information is current and up-to-date so that users, patrons, and librarians can reach records quickly and effectively. The GPO estimated in 2004 that 50 percent of all U.S. federal documents were born digital, that is, created digitally with no print version. Two years later the Pew Internet and American Life Project reported that 73 percent of American adults were Internet users. If one wants to copyright a book, get an extension for filing a tax return, track satellites with NASA, get a small business loan, or find out how a senator voted on a particular bill, the information may be found on a government website.¹⁰ Information that is saved and stored needs to be kept on current media and in forms that allow quick and easy access to records. NARA, for example, makes its records easy to search online, with prices provided online for users and patrons. With this in mind, is the future of the FDLP in jeopardy because of the growth of the Internet and of born-digital information available only online? What about the future of librarians as a whole? Regardless of the strength of the Internet with regard to government documents, the FDLP will still exist, because its depository libraries house the documents necessary to give quick and dirty answers to people in a short amount of time.
Librarians and catalogers will always have a role as the finding mechanism and search aid that helps patrons find information quickly and effectively. The Internet will continue to grow and change as technology advances and as patrons' and users' technological awareness increases. Librarians will always be present to bridge the gap between knowledge and the searching and finding functions of such programs, whether at the library or through an "Ask the Librarian" button online. Without libraries housing the information needed for government documents, catalogers to catalog that information, and users seeking it, the library as a whole may disappear. However, searching government documents will never disappear, because users are intrigued by what the government produces, need the services the government provides, and are intrigued by the new types of media the government finds to store such historical relics. With Google's initiative and the oral history initiative underway, government documents will continue to be at the forefront of preservation for years to come.

Bibliography

Carson, Bryan M. The Law of Libraries and Archives. Lanham, MD: Scarecrow Press, 2007.

Finchum, T. "Sharing Government Documents with the People." DttP: Documents to the People 38, no. 1 (Spring 2010): 32–35.

Morrison, Andrea, ed. Managing Electronic Government Information in Libraries: Issues and Practices. Chicago: American Library Association, 2008.

National Archives and Records Administration. http://www.archives.gov (accessed October 12, 2010).

Online Dictionary of Library and Information Science (ODLIS). http://www.lu.com/odlis/ (accessed October 12, 2010).

Ridley, Rolanda. E-mail to GovDoc-L, September 15, 2010.

Sears, J., and Marilyn K. Moody. Using Government Information Sources: Electronic and Print. 3rd ed. Phoenix, AZ: Oryx Press, 2001.

Szydlowski, N. "Archiving the Web: It's Going to Have to Be a Group Effort." The Serials Librarian 59, no. 1: 35–39. http://dx.doi.org/10.1080/03615260903534908.

University of North Texas. CyberCemetery. University of North Texas Government Documents Department. http://govinfo.library.unt.edu/ (accessed October 12, 2010).

Valente, Colleen. "Cataloging and Archiving State Government Publications." Cataloging & Classification Quarterly 48, no. 4 (March 2010): 315–329. http://dx.doi.org/10.1080/01639371003614726 (accessed October 12, 2010).

Webster's English Dictionary. Scotland: Geddes & Grosset, 2003.

Wilson, Lee. The Copyright Guide: A Friendly Guide to Protecting and Profiting from Copyrights. Rev. ed. New York: Allworth Press, 2000.
