The Library of Congress >> To Preserve and Protect

Preservation and Security Challenges

18. The Coming Crisis in Preserving Our Digital Cultural Heritage
Clifford A. Lynch

This paper offers a brief survey and synthesis of several developments in areas as diverse as intellectual property law, the marketplace for cultural and intellectual goods, and the technologies involved in maintaining digital information across long periods of time. These developments are converging to create a crisis in our ability to preserve our cultural heritage as this heritage increasingly migrates into digital formats.

In the historical period we are just leaving behind, the stewardship of cultural and intellectual heritage was primarily concerned with the acquisition and subsequent preservation of physical artifacts. Copies of books, sound recordings, photographs, prints, pamphlets, and other materials were made available in the public marketplace. (There was also, of course, a market in original unique objects such as paintings, but this involves different issues, which I do not consider in this paper.) Libraries and other institutions concerned with preserving our heritage obtained copies through purchase, donation, or other means and then kept the artifacts in trust for society. Anything broadly available commercially or for free was available to libraries as well as to individuals. This was enabled by a legal framework that included both copyright law and the doctrine of first sale.

There were of course problems. Some artifacts offered in the marketplace were poorly constructed for long-term persistence, for instance, books printed on acid paper; the intent of the producers was inexpensive mass production for a consumer market. Some important cultural materials were not sold as artifacts, and libraries had trouble obtaining copies, as has been the case with television broadcasts or films before the emergence of the videocassette marketplace. Other artifacts could be used only with technical playback systems that quickly became obsolete or unavailable, for instance, early computer games and some audio and video materials. In an ironic twist of fate, the only surviving record we have of many early films is the paper prints that were deposited for copyright registration in 1894 and the following decades. The films themselves were produced for limited distribution on volatile nitrate film stock that is long gone, and the studios that created them often did not even try to preserve them; indeed, many early studios simply went out of business. But the system worked well enough—particularly for print and sound recordings, where a mass market existed from the beginning. Libraries were able to have access to the vast majority of our cultural and intellectual heritage and to select what they wished to preserve from this treasure trove, and our civilization is immeasurably richer for this.

More and more of our society's new cultural and intellectual works are being produced in digital forms. Older works are also being repackaged (sometimes with important enhancements) as digital products. As this migration takes place, we are seeing the emergence of a new and very different marketplace in intellectual and cultural goods. In this new marketplace, content is moving to disembodied collections of bits that are delivered over the network, removed from any specific artifactual "carrier." Even in the still-numerous transitional cases where carrier media remain in use, the complexity and rate of obsolescence of the playback system technology mean that it will become increasingly commonplace to find media that can no longer be played. There are relatively short windows of opportunity when content can be copied and reformatted from one medium to another while old and new playback technologies briefly coexist in the marketplace. Preservation requires active management and continual vigilance.

The terms of availability are changing as well. Rather than being sold as an artifact, content is made available to the public under constrained license terms that restrict sharing, copying, transfer of ownership, display, performance, and other use, sometimes far beyond the customary constraints imposed on artifacts by copyright law. In the most extreme cases, consumers do not obtain works at all, but rather the right to experience a work for a limited time under a pay-per-view or similar rental framework, with no guarantee that a work enjoyed today will still be available to be enjoyed tomorrow, even if the consumer is prepared to pay the additional fees. These "pay-per-view" arrangements convert a much larger class of works than the traditional performing arts into ephemeral, transient, experiential things that sometimes may only be shared or revisited through memory and re-description rather than through revisiting the work itself.

Content that is available to the consumer may simply not be available to libraries under terms that allow long-term retention and future provision to the interested public. Although there has never been an obligation on the part of publishers and other content distributors to accommodate libraries among their customers (and thus ensure that their materials will be available to the society for the long term) as part of marketing their wares to the general public, in an era characterized by a marketplace in artifacts, it was very hard to avoid doing so.

In the new world of digital content and commerce governed by license, it is very easy to target experiential consumer markets while explicitly excluding long-term access (ownership of copies) either by private collectors or by cultural heritage institutions such as libraries. This shift is well illustrated by the new characterization of music as a "service" that is being promoted by some parts of the recording industry. Rather than acquiring ownership of copies of specific musical works, consumers pay for a subscription that allows them to listen on demand to a large but perhaps ever-changing corpus of music, the details of which are determined by the industry. This may be attractive to the consumer, but it is problematic for organizations concerned with the long-term preservation of the cultural record.

We are still limited in what we know about either the costs or the best technical strategies for preserving digital content. In particular, we face difficult intellectual issues about exactly what we are trying to preserve. But we are developing a broad consensus in several areas. First, we need to focus on the bits, and not the artifacts that may temporarily carry them. The bits that define the works—and not the media that may house the bits at any given time—are what are important and what need to be preserved. In a world of short-lived artifacts and even shorter-lived playback systems, we cannot count on bits stored on and bound to artifacts to be reliably readable in the long term simply because we have placed the artifacts on shelves. Instead, the strategy is to copy bits from older storage technologies to newer ones on a continuous basis, taking advantage of those periods when generations of technologies overlap and copying can be done inexpensively, and without incurring the risks involved in making assumptions about the shelf-life of the various media. Preservation of digital materials is a continuous, active process (requiring steady funding), rather than a practice of benignly neglecting artifacts stored in a hospitable environment, perhaps punctuated by interventions every few decades for repairs.
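The copying strategy described above can be sketched in a few lines of code. The following is a minimal illustration, not a description of any archive's actual practice: it assumes that a work is held as a single file, that the old and new storage are both reachable as ordinary file paths, and that a SHA-256 checksum is an acceptable test that the bits arrived unchanged. The function names sha256_of and refresh_copy are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, destination: Path) -> str:
    """Copy a file to newer storage and confirm the bits are identical.

    Returns the shared digest; raises IOError if the copy differs.
    """
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        # Discard the bad copy so it is never mistaken for the work.
        destination.unlink()
        raise IOError(f"fixity check failed for {destination}")
    return expected
```

Recording the returned digest alongside the work would let a later copy, made when the next generation of storage arrives, be checked against the same value. This sketch addresses only the bits themselves; it does nothing about the format migration and rights questions taken up in the rest of the paper.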

Second, we recognize that maintaining bits in a digital world depends not only on storage hardware but also on software to interpret the bits and that software systems and standards (image, audio, and video formats, for example) will evolve over time. Because of this, not only do we need to simply copy bits, but we also periodically need to reformat works from older standards to newer ones to ensure that we will continue to have available software that can interpret the bits. Whereas we know a great deal about the mechanics of how to manage bits across time, we have no general theory of how to manage the migration of formats across time as standards and software evolve, though there is some basis for optimism about our ability to successfully navigate format migrations case by case assuming we are able to ensure that digital materials receive sustained and careful attention and stewardship. This is the best understanding we have today about preserving digital information, along with a well-honed sense of the fragility of complex digital information such as interactive computer games or simulations that depend not just on rendering content for the human perceptual system but on the integral participation of computing systems in mediating the "performance" or execution of the digital work.

Third and last, we have learned that preserving digital works is difficult, even if we can easily read all the bits when the work first arrives at the archive that is to manage it, and even if the format of the work is well documented. But we are facing disturbing developments that make the task of preservation infinitely more difficult. Some works are no longer available in the marketplace as open, documented files of bits. Rather, they are encrypted and wrapped in protective active software systems that perform and enforce rights management by preventing copying. New laws (discussed below) have made it a crime to attempt to bypass these protections (though there are some exemptions), but even leaving aside the legal issues, these protective measures vastly complicate the copying and reformatting that is necessary to preserve digital works.

It is probably not an exaggeration to say that the most fundamental problem facing cultural heritage institutions is the ability to obtain digital materials together with sufficient legal rights to be able to preserve these materials and make them available to the public over the long term. Without explicit and affirmative permissions from the rights-holders, this is likely to be impossible. Such permission is no longer part of the standard commercial framework as we have moved toward licensing and pay-per-view agreements and away from a marketplace dominated by long-lived artifacts. Indeed, recent legal developments (in particular, some of the provisions of the Digital Millennium Copyright Act, such as those dealing with anticircumvention) have made it much more difficult for libraries to act to preserve digital content in the absence of explicit permissions from rights-holders. Legislation such as the Uniform Computer Information Transactions Act (UCITA) has helped to legitimize pay-per-view and licensing frameworks. The Sonny Bono Copyright Term Extension Act, by stretching out the term of copyright, has also made libraries more dependent on obtaining explicit permissions to ensure that digital materials are preserved for the long term. Because of the new, extraordinarily long terms of copyright, it is even more improbable that artifacts bearing digital content, or the digital content itself, will remain readable throughout the duration of copyright without active stewardship.

In this new world, then, content may not be preserved by traditional cultural memory institutions, except perhaps by the Library of Congress, by virtue of its special, peculiar status, under American copyright deposit (and the issues here are not entirely clear at present), or by other national libraries under their own national copyright deposit arrangements, unless the rights-holders take steps to make sure that this happens. This prospect places a heavy burden on these national libraries and is particularly dangerous because of the high degree of reliance on a handful of unique institutions that are subject to the vagaries of politically based funding and policy direction. The preservation of large portions of our cultural heritage may depend critically on ample and consistent annual funding to a single institution. A few years of budget austerity could cause large portions of this record to vanish. Previously, for print collections, a wide range of public and private funding sources underwrote the preservation of the record, and the nature of preserving print was such that it could survive considerable periods of lean budgets. An equally diverse group of institutions both public and private (as well as individual collectors) actually collected and preserved the materials, and these materials are widely distributed.

We have several sets of issues to consider as we look beyond the possible special role of the national libraries, which can invoke copyright deposit regulations as a means of obtaining control of copies of materials for preservation. (Note that there is a broader, and much more complex and controversial issue involved here, which is largely beyond the scope of this paper: who gets access to the materials and under what terms? The problem is one of finding a balance that does not destroy the marketplace in cultural and intellectual goods, but that still provides some measure of access to the public through cultural memory organizations. Negotiating this balance will be an extraordinary challenge. I am focusing here more narrowly on preservation.) If the broader community of libraries, archives, museums, universities, and other cultural heritage institutions is to exercise stewardship over our intellectual record in the digital age, as matters stand today these institutions will need to obtain permissions to perform these functions. How and why might they obtain such permissions?

For publishers in the consumer marketplace, the concerns are with revenue maximization and asset management through managed availability. At best, questions of long-term preservation of their wares as cultural heritage are irrelevant (think about broadcasts of the nightly news, or about newspapers migrating to the digital medium). At worst, preservation runs actively counter to their economic interests (here, think about entertainment products like music). To be clear, there is growing recognition that archives of various materials do have economic value, and many content producers (such as the newspaper industry) are offering products that involve archives; but this is changing content into new products, not preserving it for the longer term when it is no longer viable as a product. It is not at all clear why these content-owners (particularly the smaller or newer organizations that have not yet become cultural institutions in their own right) will even bother to spend the time and money to engage the issues of putting permissions in place to ensure preservation, much less actually grant the needed permissions.

There is another large class of content that can best be termed "ephemera"—network-based analogs of pamphlets, broadsides, menus, transportation schedules, and the like. Historically, if libraries could obtain a copy of such items, they could preserve them using the framework of copyright, but for current materials, legal agreements and permissions are needed. Yet the authors of these works are often not major economic players, or they produce content as a byproduct of other economic activities; they do not have the funding or the interest to enter into such legal agreements. Indeed, it is often impossible even to identify the authors of such works or to engage them in a discussion about such agreements. One can see this problem vividly in the efforts of the Internet Archive, which simply collects publicly accessible Web pages on a continuing basis and archives them under what are at best uncertain legal auspices. The notion of the Internet Archive actually negotiating with the author of each Web site and obtaining permission to make, store, and maintain a copy of the site is literally unthinkable.

In the old world of physical artifacts, simply by publishing their works so that an archive or library could obtain a copy, authors would in effect enter into the necessary agreements to ensure that their works would be archived as a byproduct, but this is no longer the case. The category of ephemera is actually very broad (consider advertising, for example) and grows ever broader as more people employ the Web as a democratic, low-barrier-to-entry means of sharing their ideas. Access to the digital printing press has truly become available to almost everyone, but without the historic properties that accompanied the physical output of the older printing press and that were so essential to preservation.

It is informative to look at the case of academic and scholarly journals that have moved to the digital world. This is a very different situation than we find in the consumer marketplace. Here, in most cases, libraries constitute the primary marketplace. But even more to the point, these journals exist to serve their authors and readers, who are scholars operating within a strong culture of the importance of maintaining the intellectual record. Organizations such as the Coalition for Networked Information, the Council on Library and Information Resources, and the Association of Research Libraries have sponsored a number of meetings over the past few years to try to address issues of archiving scholarly journals of record as they move to digital form. One very strong message that has emerged from these discussions is that there is a deeply held shared commitment to archiving and to the integrity of the intellectual record as represented by these publications. The publishers of these works, by and large, have made it clear that they are prepared to assign the necessary rights and permissions to libraries to ensure that these works are archived and maintained for the long term. They understand that they have an obligation to do so in order to keep faith with their authors and readers, and that if they do not do so, they will not be able to continue to attract authors and readers as their publications migrate to digital form. Even if they were unwilling to do this, it seems likely that libraries, as their primary customers, could persuade them to do so, but such market pressure appears to be largely unnecessary. There is a shared, common set of values that says that preservation is essential and that the appropriate permissions simply must be put in place to make it happen.

Some difficult technical, economic, and organizational issues need to be resolved in order to put an effective and comprehensive system of archiving for scholarly journals of record in place. And yet, these developments—and, most particularly this strong affirmation of common values related to the integrity and preservation of the intellectual record—leave me optimistic that the problems will be solved.

But contrast this to mass market cultural products. We have little evidence so far that creators, consumers, and publishers in this world have been able to articulate a similar set of shared values around preservation. Many rights-holders are keen on the notion that they can simply withdraw a work from circulation at will, regardless of how many people may have seen it and the extent of the work's impact on society. As discussed already, there is a new emphasis on content that is offered only on a limited-time and limited-use basis rather than having copies distributed for continued consideration and reassessment. And, of course, the impact of a work and its cultural value may be perceived only in retrospect. Orwellian scenarios involving the purging or rewriting of what is clearly well-established cultural and intellectual history are actually embraced by some as desirable and even attractive consequences of the new technologies of content control and the new licensing frameworks.

The issues in question are actually quite profound and nuanced intellectually, and it is not clear what the right answers are. We have traditions of creators as owners, enjoying the exercise of both property and moral rights over their creations. But offsetting this, for example, we have a strong historical tradition that considers publication an essentially irrevocable act: once a work is published, it cannot be withdrawn from the public record. As a society, we generally reject government censorship and are at best deeply uncomfortable with the idea that the exercise of ownership rights can reverse the act of publication rather than amend it. In this tradition, it may be possible to prevent new copies from entering the hands of the public, reducing the work to a rare and specialized, but not inaccessible, part of that public record. The work becomes something that may be consulted, perhaps with some difficulty or inconvenience, without necessarily being available in new copies for new purchase. In effect, our cultural and intellectual record has been supplied by the consumer marketplace over time but has existed distinct and independent from the present status of that marketplace at any particular point in time. Rather, the cultural record has represented a summation of all that has ever been available in the marketplace.

Today, in a radically altered legal and technical landscape, it appears possible to change all this—but should we, and if so, on what basis? Libraries and other cultural heritage organizations have traditionally served as the society's advocates for preservation. One can all too readily envision futile attempts by these cultural heritage organizations to intervene in the new consumer marketplaces, where content is made available only on a pay-per-view basis. Libraries simply do not represent a significant sector of the marketplace, and they are likely to be told either to accept the same terms as every other consumer or to refrain from licensing the product if they do not like the terms—or even that the distributor of the work simply does not care to do business with libraries at all. If this disenfranchising of our cultural heritage institutions, this elimination of any opportunity to preserve these materials, occurs, it will hurt our society in the long term. Such an issue is unlikely to be resolved by marketplace forces.

It is all too easy to invoke a sort of narrow legal and economic determinism here, to simply say that our current laws and marketplaces empower the rights-holder (and even perhaps the consumer) and that this is a good thing. We may be tempted to make vague references to the inevitability and necessity of the globalization and harmonization of intellectual property law, or to argue the economic need to maintain parity with European Community copyright law and policy, all as a way of abdicating any real responsibility for social consequences. Standing in opposition to these developments, but too often overlooked today, is the fundamental constitutional construction of intellectual property in American society.

Intellectual property rights are not just another form of property rights; they are a part of a pact between creators and society as a whole. These rights are a tool to advance "science and useful arts," as specified in the U.S. Constitution. Rights are assigned for a limited term, with the intent that after that term, works will become part of a national intellectual patrimony, a part of the public domain.

As I understand it, the Constitution does not speak directly to a public intellectual and cultural record, though copyright deposit legislation looks to the ongoing construction of such a record. And surely such a record is vital, not only as precursor to the public domain but also as a necessary prerequisite for an informed, educated, accountable, vital, and democratic society. Our society, at least as we conceive of it today, needs its libraries and its intellectual and cultural record. Perhaps the framers of the Constitution saw such a record as a thing that would evolve and thrive naturally and hence needed only limited protection through provisions such as the freedom of the press (though certainly our ideas about how broad, and how public, such a record should be have expanded since the writing of the Constitution, and have developed in tandem with the evolution of democratic cultural heritage institutions such as public museums and libraries). Perhaps the framers could not foresee the constellation of economic, technical, and legal forces that today is assembling to threaten the existence and integrity of such a record, and thus felt no need to build in explicit protection against these forces. This is an area where constitutional, legal, political, historical, and cultural scholars are shaping the discussion that we need to have.

Interestingly, the risks we suffer are not those of direct government control over the intellectual record. The vision that George Orwell portrayed in 1984 (to cite one canonical example from a rich genre) was of a totalitarian government that had obtained comprehensive control over this record and that continually rewrote it in order to maintain power and to further its own ends. What is threatening us today is not an abuse of centralized power, but rather a low-key, haphazard deterioration of the intellectual and cultural record that is driven primarily by economic motivations and the largely unintended and unforeseen consequences of new intellectual property laws that were enacted at the behest of powerful commercial interests and in the context of new and rapidly evolving technologies.

It is time for a blunt, fundamental discussion about the importance of preserving our social, cultural, and intellectual heritage as a key public policy goal; about the need to maintain this as a record that is held in trust for all citizens and that can be consulted by all citizens. We need to explore if and how to formalize new principles: for example, once a work has influenced the thinking of millions of people, it must, at some level, become part of the heritage of society as a whole, and we as citizens must have some rights and capabilities to revisit it. In other words, there is a point at which works that reach the public must become in some sense part of a public record. We need to be clear about how the social and intellectual record differs from the marketplace in intellectual properties and the extent to which this record is permitted to encroach upon the unfettered operation of the marketplace. And perhaps, in order to encourage the development and maintenance of this record, we need to make it easy for ephemera to enter this record and subsequently to be preserved without special actions on the part of creators. We need to consider whether restrictions on use or easy incorporation into the public record, and later the public domain, should be the default mode of operation in the absence of specific, affirmative actions by creators or their agents.

But we also need to be absolutely clear that the social and intellectual record at issue here is not necessarily something that is available instantly without charge, and without limitation from any computer connected to the Internet; it is something that is held in trust, collectively, by our cultural memory institutions. We must still address the exquisitely complex and delicate problem of how we can provide at least some level of access to this record (and what levels of access to what part of the record) without damaging the marketplace that creates so much of its vibrancy and richness.

A particular group of questions to which we must be sensitive concerns the rights of authors and other creators, as distinct from the rights of publishers and other large corporate entities that often present themselves as speaking on behalf of creators. My focus here is not primarily economic; on an economic basis there is often considerable alignment between authors and publishers, and the central issue I am concerned with here is what can be preserved, not the ability of authors to derive income from a marketplace in their works. The most recent revisions of American copyright law have begun to introduce European notions of "moral" rights of creators into the discussion, in part because of international harmonization. At least in theory, the new legal and technical capabilities give creators (or their assignees) an unprecedented ability to withdraw their works from circulation or otherwise control how they are seen after publication (or perhaps more appropriately, after they are granted broad availability, because the idea of publication per se seems to be ever more elusive). The fear is that moral rights will not be invoked by creators to protect the integrity of their works, but that they will become the tool of other interests in manipulating availability for other ends.

There are many kinds of creators with many purposes. A poet no longer comfortable with his or her youthful published works and who would just as soon see them forgotten is very different from someone now nominated for high office who is haunted by an embarrassing speech from a few years past that he or she would like to expunge from the record before the news media can obtain copies. Both of these cases are in turn very different from investigations of attempts to manipulate the price of stocks over time through the message-board discourse that has developed among investors in the digital world. Although all of these might be grouped together under a legalistic analysis, I think that the public would have very different degrees of sympathy for the rights of the creator to withdraw his or her works from public scrutiny from one scenario to another. The correct answers here are anything but clear because of this enormous variation, but the questions need to be part of our conversation about the future of the intellectual record, particularly in conjunction with the possible emergence of technologies that can "undo" publication or other broad distribution.

The recent Tasini v. New York Times et al. litigation is an excellent illustration of some of the issues and dilemmas that we must face in addressing the maintenance of an effective intellectual and cultural record in digital form as a public policy goal, and of balancing this goal with the rights of creators. The Tasini case also illustrates the problems of sheer scale, of practicality, and of overhead and transaction costs that may arise in trying to honor creators' rights as we try to migrate much of our existing cultural record to digital form in a context of extremely lengthy terms of copyright protection. It is somewhat different from the other situations I have discussed but has important resonances.

In Tasini we have a situation where the courts found that a number of authors had suffered an injustice. Their rights to control and benefit from the use of their works had not been respected. But redressing these abuses could have a high social cost: the potential corruption of key parts of our intellectual record. These authors contributed materials to major newspapers and magazines of record that were read by millions, and their works were reproduced in digital representations of these publications of record, thus providing an accurate digital representation of record that reflected the earlier printed works. The authors argued, and the courts agreed, that because the publishers did not have the rights to supply their works for inclusion in these digital compendia, their works should now be removed unless the publishers come to terms with the authors and obtain their permissions. Pragmatically, this presents a real problem for the publishers: there are many authors involved and many works involved, and simply contacting all of them and concluding the necessary negotiations is a huge, perhaps impossible, task. Many database providers have removed substantial numbers of articles from their databases as a result of the decision. The only good news here is that although the integrity of the digital record has been damaged, we still have print and microform copies of the original newspapers to refer to (however inconvenient this may be).

We must find ways to avoid such debacles in the future, particularly when we may no longer have the earlier print record as a recourse.

The public policy discussion needs to focus on questions about what sort of intellectual and cultural record we need to maintain and why, and about what authorizations are necessary to assemble and maintain this record and to protect its integrity. Legal issues, including perhaps the need for new legislation or for changes to existing legislation, should follow from these broader public policy goals. We should not allow the existing legal frameworks and marketplace practices to overly constrain our thinking about what goals are possible or desirable. We must not let the public debate be dominated by technical legal issues about the interpretation of currently existing legislation. The digital age will be very different, and some key laws on the books today have been enacted very early in the transition to this digital age. Our understanding, insight, and wisdom about the nature of a digital world are naturally and necessarily limited. Some of those laws (for example, the Digital Millennium Copyright Act) are already producing what many believe are undesirable and unintended consequences as we begin to see their first applications in actual cases.

One thing is clear. Without such a public policy debate and the changes that may result from it, if we simply let existing legal and marketplace forces continue along their current trajectory, we may face a crisis in our ability to capture and preserve our cultural and intellectual record in the emerging digital age. Future scholars may look back at the early years of the twenty-first century as a dark age in which we irrevocably lost much of our cultural memory because libraries and other cultural heritage organizations could no longer function effectively, and in which even individual collectors of intellectual and cultural works, who have often historically served as a safety net for libraries, lost much of their ability to build and keep collections. And these future scholars may also recognize the society of the early twenty-first century as deeply troubled by a loss of accountability and of intellectual and artistic continuity, and haunted by recurrent bouts of amnesia about the basis and nature of its own activities and actions. A systemic failure of our cultural heritage institutions is likely to exact a real price on the society overall, not just on our commitment to the importance of scholarly inquiry.

   September 15, 2008