ELECTRONIC INFORMATION AND DIGITIZATION
Preservation and Security Challenges
18. The Coming Crisis in Preserving Our Digital Cultural Heritage
Clifford A. Lynch
This paper offers a brief survey and synthesis of
several developments in areas as diverse as intellectual property law,
the marketplace for cultural and intellectual goods, and the technologies
involved in maintaining digital information across long periods
of time. These developments are converging to create a crisis in our
ability to preserve our cultural heritage as this heritage increasingly
migrates into digital formats.
In the historical period we are just leaving behind,
the stewardship of cultural and intellectual heritage was primarily
concerned with the acquisition and subsequent preservation of physical
artifacts. Copies of books, sound recordings, photographs, prints,
pamphlets, and other materials were made available in the public
marketplace. (There was also, of course, a market in original unique
objects such as paintings, but this involves different issues, which I
do not consider in this paper.) Libraries and other institutions
concerned with preserving our heritage obtained copies through
purchase, donation, or other means and then kept the artifacts in trust
for society. Anything broadly available commercially or for free was
available to libraries as well as to individuals. This was enabled by
a legal framework that included both copyright law and
the doctrine of first sale.
There were of course problems. Some artifacts offered
in the marketplace were poorly constructed for long-term persistence,
for instance, books printed on acid paper; the intent of the producers
was inexpensive mass production for a consumer market. Some important
cultural materials were not sold as artifacts, and libraries had trouble
obtaining copies, as has been the case with television broadcasts or
films before the emergence of the videocassette marketplace. Other
artifacts could be used only with technical playback systems that quickly
became obsolete or unavailable, for instance, early computer games and
some audio and video materials. In an ironic twist of fate, the only
surviving record we have of many early films is the paper prints that
were deposited for copyright registration in 1894 and the following
decades. The films themselves were produced for limited distribution on
volatile nitrate film stock that is long gone, and the studios that
created them often did not even try to preserve them; indeed, many
early studios simply went out of business. But the system worked well
enough, particularly for print and sound recordings, where a mass
market existed from the beginning. Libraries were able to have access to
the vast majority of our cultural and intellectual heritage and to
select what they wished to preserve from this treasure trove, and our
civilization is immeasurably richer for this.
More and more of our society's new cultural and
intellectual works are being produced in digital forms. Older works are
also being repackaged (sometimes with important enhancements) as
digital products. As this migration takes place, we are seeing the
emergence of a new and very different marketplace in intellectual and
cultural goods. In this new marketplace, content is moving to
disembodied collections of bits that are delivered over the network,
removed from any specific artifactual "carrier." Even in the
still-numerous transitional cases where carrier media remain in
use, the complexity and rate of obsolescence of the playback system
technology mean that it will become increasingly commonplace to find
media that can no longer be played. There are relatively short windows
of opportunity when content can be copied and reformatted from one
medium to another while old and new playback technologies briefly
coexist in the marketplace. Preservation requires active management and
continual vigilance.
The terms of availability are changing as well.
Rather than selling an artifact, content is made available to the public
under constrained license terms that restrict sharing, copying,
transfer of ownership, display, performance, and other use, sometimes
far beyond the customary constraints imposed on artifacts by copyright
law. In the most extreme cases, consumers do not obtain works at all,
but rather the right to experience a work for a limited time under a
pay-per-view or similar rental framework, with no guarantee that a work
enjoyed today will still be available to be enjoyed tomorrow, even if
the reader is prepared to pay the additional fees. These
"pay-per-view" arrangements convert a much larger class of works
than the traditional performing arts into ephemeral, transient,
experiential things that sometimes may only be shared or revisited
through memory and re-description rather than through revisiting the
work itself.
Content that is available to the consumer may simply
not be available to libraries under terms that allow long-term retention
and future provision to the interested public. Publishers and other content
distributors have never been obliged to accommodate libraries among their
customers (and thus ensure that their materials will be available to
society over the long term) when marketing their wares to the general
public; but in an era characterized by a marketplace in artifacts, it was
very hard to avoid doing so.
In the new world of digital content and commerce
governed by license, it is very easy to target experiential consumer
markets while explicitly excluding long-term access (ownership of
copies) either by private collectors or by cultural heritage
institutions such as libraries. This shift is well illustrated by the
new characterization of music as a "service" that is being promoted by
some parts of the recording industry. Rather than acquiring ownership
of copies of specific musical works, consumers pay for a subscription
that allows them to listen on demand to a large but perhaps ever-changing
corpus of music, the details of which are determined by the
industry. This may be attractive to the consumer, but it is problematic
for organizations concerned with the long-term preservation of the
cultural record.
We are still limited in what we know about either the
costs or the best technical strategies for preserving digital content.
In particular, we face difficult intellectual issues about exactly what
we are trying to preserve. But we are developing a broad consensus in
several areas. First, we need to focus on the bits, and not the
artifacts that may temporarily carry them. The bits that define the
works, and not the media that may house the bits at any given
time, are what are important and what need to be preserved. In a
world of short-lived artifacts and even shorter-lived playback
systems, we cannot count on bits stored on and bound to artifacts to be
reliably readable in the long term simply because we have placed the
artifacts on shelves. Instead, the strategy is to copy bits from older
storage technologies to newer ones on a continuous basis, taking
advantage of those periods when generations of technologies overlap and
copying can be done inexpensively, and without incurring the risks
involved in making assumptions about the shelf-life of the various
media. Preservation of digital materials is a continuous, active
process (requiring steady funding), rather than a practice of benignly
neglecting artifacts stored in a hospitable environment, perhaps
punctuated by interventions every few decades for repairs.
Second, we recognize that maintaining bits in a
digital world depends not only on storage hardware but also on software
to interpret the bits and that software systems and standards (image,
audio, and video formats, for example) will evolve over time. Because of
this, not only do we need to simply copy bits, but we also periodically
need to reformat works from older standards to newer ones to ensure that
we will continue to have available software that can interpret the bits.
Whereas we know a great deal about the mechanics of how to manage bits
across time, we have no general theory of how to manage the migration of
formats across time as standards and software evolve, though there is
some basis for optimism about our ability to successfully navigate
format migrations case by case assuming we are able to ensure that
digital materials receive sustained and careful attention and stewardship.
This is the best understanding we have today about preserving
digital information, along with a well-honed sense of the fragility of
complex digital information such as interactive computer games or
simulations that depend not just on rendering content for the human
perceptual system but on the integral participation of computing systems
in mediating the "performance" or execution of the digital work.
Third and last, we have learned that preserving
digital works is difficult, even if we can easily read all the bits when
the work first arrives at the archive that is to manage it, and even if
the format of the work is well documented. But we are facing disturbing
developments that make the task of preservation infinitely more
difficult. Some works are no longer available in the marketplace as
open, documented files of bits. Rather, they are encrypted and wrapped
in protective active software systems that perform and enforce
rights management by preventing copying. New laws (discussed below)
have made it a crime to attempt to bypass these protections (though
there are some exemptions), but even leaving aside the legal issues,
these protective measures vastly complicate the copying and reformatting
that is necessary to preserve digital works.
It is probably not an exaggeration to say that the
most fundamental problem facing cultural heritage institutions is the
ability to obtain digital materials together with sufficient legal
rights to be able to preserve these materials and make them available to
the public over the long term. Without explicit and affirmative
permissions from the rights-holders, this is likely to be impossible.
Such permission is no longer part of the standard commercial framework
as we have moved toward licensing and pay-per-view agreements and away
from a marketplace dominated by long-lived artifacts. Indeed,
recent legal developments (in particular, some of the provisions of
the Digital Millennium Copyright Act, such as those dealing with
anticircumvention) have made it much more difficult for libraries
to act to preserve digital content in the absence of explicit
permissions from rights-holders. Legislation such as the Uniform
Computer Information Transaction Act (UCITA) has helped to legitimize
pay-per-view and licensing frameworks. The Sonny Bono Copyright Term
Extension Act, by stretching out the term of copyright, has also made
libraries more dependent on obtaining explicit permissions to ensure
that digital materials are preserved for the long term. Because of the
new, extraordinarily long terms of copyright, it is even more improbable
that artifacts bearing digital content, or the digital content itself,
will remain readable throughout the duration of copyright without
active stewardship.
In this new world, then, content may not be preserved
by traditional cultural memory institutions, except perhaps by
the Library of Congress, by virtue of its special,
peculiar status, under American copyright deposit (and the issues here
are not entirely clear at present), or by other national libraries
under their own national copyright deposit arrangements, unless the
rights-holders take steps to make sure that this happens. This
prospect places a heavy burden on these national libraries and is
particularly dangerous because of the high degree of reliance on a
handful of unique institutions that are subject to the vagaries of
politically based funding and policy direction. The preservation of
large portions of our cultural heritage may depend critically on ample
and consistent annual funding to a single institution. A few years of
budget austerity could cause large portions of this record to vanish.
Previously, for print collections, a wide range of public and private
funding sources underwrote the preservation of the record, and the
nature of preserving print was such that it could survive considerable
periods of lean budgets. An equally diverse group of institutions both
public and private (as well as individual collectors) actually
collected and preserved the materials, and these materials are widely
distributed.
We have several sets of issues to consider as we look
beyond the possible special role of the national libraries, which can
invoke copyright deposit regulations as a means of obtaining control of
copies of materials for preservation. (Note that there is a broader, and
much more complex and controversial issue involved here, which is
largely beyond the scope of this paper: who gets access to the materials
and under what terms? The problem is one of finding a balance that does
not destroy the marketplace in cultural and intellectual goods, but that
still provides some measure of access to the public through cultural
memory organizations. Negotiating this balance will be an extraordinary
challenge. I am focusing here more narrowly on preservation.) If the
broader community of libraries, archives, museums, universities, and
other cultural heritage institutions is to exercise stewardship over
our intellectual record in the digital age, as matters stand today
these institutions will need to obtain permissions to perform these
functions. How and why might they obtain such permissions?
For publishers in the consumer marketplace, the
concerns are with revenue maximization and asset management through managed
availability. At best, questions of long-term preservation of their
wares as cultural heritage are irrelevant (think about broadcasts of the
nightly news, or about newspapers migrating to the digital medium). At
worst, preservation runs actively counter to their economic interests (here,
think about entertainment products like music). To be clear, there is
growing recognition that archives of various materials do have economic
value, and many content producers are offering products that involve
archives (such as the newspaper industry); but this is changing content
into new products, not preserving it for the longer term when it is no
longer viable as a product. It is not at all clear why these
content-owners (particularly the smaller or newer
organizations that have not yet become cultural institutions in their
own right) will even bother to spend the time and money to engage
the issues of putting permissions in place to ensure preservation, much
less actually grant the needed permissions.
There is another large class of content that can best
be termed "ephemera": network-based analogs of pamphlets,
broadsides, menus, transportation schedules, and the like.
Historically, if libraries could obtain a copy of such items, they could
preserve them using the framework of copyright, but for current
materials, legal agreements and permissions are needed. Yet the authors
of these works are often not major economic players, or they produce
content as a byproduct of other economic activities; they do not have
the funding or the interest to enter into such legal agreements. Indeed,
it is often impossible even to identify the authors of such works or
to engage them in a discussion about such agreements.
One can see this problem vividly in the efforts of the Internet Archive,
which simply collects publicly accessible Web pages on a continuing
basis and archives them under what are at best uncertain legal auspices.
The notion of the Internet Archive actually negotiating with the author
of each Web site and obtaining permission to make, store, and maintain a
copy of the site is literally unthinkable.
In the old world of physical artifacts, simply by
publishing their works so that an archive or library could obtain a copy,
authors would in effect enter into the necessary agreements to ensure
that their works would be archived as a byproduct, but this is no longer
the case. The category of ephemera is actually very broad (consider
advertising, for example) and grows ever broader as more people
employ the Web as a democratic, low-barrier-to-entry means of sharing
their ideas. The digital printing press has truly become available to
almost everyone, but without the properties of the older printing
press's physical output that were historically so essential to
preservation.
It is informative to look at the case of academic and
scholarly journals that have moved to the digital world. This is a very
different situation than we find in the consumer marketplace. Here, in
most cases, libraries constitute the primary marketplace. But even more
to the point, these journals exist to serve their authors and readers,
who are scholars operating within a strong culture of the importance of
maintaining the intellectual record. Organizations such as the Coalition
for Networked Information, the Council on Library and Information
Resources, and the Association of Research Libraries have sponsored a
number of meetings over the past few years to try to address issues of
archiving scholarly journals of record as they move to digital form. One
very strong message that has emerged from these discussions is that
there is a deeply held shared commitment to archiving and to the
integrity of the intellectual record as represented by these
publications. The publishers of these works, by and large, have made it
clear that they are prepared to assign the necessary rights and
permissions to libraries to ensure that these works are archived and
maintained for the long term. They understand that they have an
obligation to do so in order to keep faith with their authors and
readers, and that if they do not do so, they will not be able to
continue to attract authors and readers as their publications migrate to
digital form. Even if they were unwilling to do this, it seems likely
that libraries, as their primary customers, could persuade them to do
so, but such market pressure appears to be largely unnecessary. There is
a shared, common set of values that says that preservation is essential
and that the appropriate permissions simply must be put in place to make
it happen.
Some difficult technical, economic, and
organizational issues need to be resolved in order to put an effective
and comprehensive system of archiving for scholarly journals of record
in place. And yet, these developments (and, most particularly, this
strong affirmation of common values related to the integrity and
preservation of the intellectual record) leave me optimistic that
the problems will be solved.
But contrast this with mass-market cultural products.
We have little evidence so far that creators, consumers, and publishers
in this world have been able to articulate a similar set of shared
values around preservation. Many rights-holders are keen on the notion
that they can simply withdraw a work from circulation at will,
regardless of how many people may have seen it and the extent of the
work's impact on society. As discussed already, there is a new emphasis
on content that is offered only on a limited-time and limited-use basis
rather than having copies distributed for continued consideration and
reassessment. And, of course, the impact of a work and its
cultural value may be perceived only in retrospect.
Orwellian scenarios involving the purging or rewriting of what is
clearly well-established cultural and intellectual history are
actually embraced by some as desirable and even attractive consequences
of the new technologies of content control and the new licensing
frameworks.
The issues in question are actually quite profound
and nuanced intellectually, and it is not clear what the right answers
are. We have traditions of creators as owners, enjoying the exercise of
both property and moral rights over their creations. But offsetting
this, for example, we have a strong historical tradition that considers
publication an essentially irrevocable act, that once a work is
published, it cannot be withdrawn from the public record. As a society,
we generally reject government censorship and are at best deeply
uncomfortable with the idea that the exercise of ownership rights can
reverse the act of publication rather than amend it. In this tradition,
it may be possible to prevent new copies from entering the hands of the
public, reducing the work to a rare and specialized, but not
inaccessible part of that public record. The work becomes something that
may be consulted, perhaps with some difficulty or inconvenience, without
necessarily being available in new copies for new purchase. In effect,
our cultural and intellectual record has been supplied by the consumer
marketplace over time but has existed distinct and independent from the
present status of that marketplace at any particular point in time.
Rather, the cultural record has represented a summation of all that has
ever been available in the marketplace.
Today, in a radically altered legal and technical
landscape, it appears possible to change all this. But should we,
and if so, on what basis? Libraries and other cultural heritage organizations
have traditionally served as the society's advocates for
preservation. One can all too readily envision futile attempts
by these cultural heritage organizations to intervene
in the new consumer marketplaces, where content is made available only
on a pay-per-view basis. Libraries simply do not represent a
significant sector of the marketplace, and they are likely to be told
either to accept the same terms as every other consumer or to refrain
from licensing the product if they do not like the terms, or even
that the distributor of the work simply does not care to do business
with libraries at all. If this disenfranchising of our cultural
heritage institutions, this elimination of any opportunity to preserve
these materials, occurs, it will hurt our society in the long term. Such
an issue is unlikely to be resolved by marketplace forces.
It is all too easy to invoke a sort of narrow legal
and economic determinism here, to simply say that our current laws and
marketplaces empower the rights-holder (and even perhaps the consumer)
and that this is a good thing. We may be tempted to make vague
references to the inevitability and necessity of the globalization and
harmonization of intellectual property law or to argue the economic need
to maintain parity with European Community copyright law and policy, all
as a way of abdicating any real responsibility for social consequences.
Standing in opposition to these developments, but too often overlooked
today, is the fundamental constitutional construction of intellectual
property in American society.
Intellectual property rights are not just another
form of property rights; they are part of a pact between creators and
society as a whole. These rights are a tool to promote the "Progress of
Science and useful Arts," as specified in the U.S. Constitution. Rights are
assigned for a limited term, with the intent that after that term,
works will become part of a national intellectual patrimony, a part of
the public domain.
As I understand it, the Constitution does not speak
directly to a public intellectual and cultural record, though
copyright deposit legislation looks to the ongoing construction of
such a record. And surely such a record is vital, not
only as a precursor to the public domain but also as a necessary
prerequisite for an informed, educated, accountable, vital, and democratic
society. Our society, at least as we conceive of it today, needs its
libraries and its intellectual and cultural record. Perhaps the framers
of the Constitution saw such a record as a thing that would evolve and
thrive naturally and hence needed only limited protection through
provisions such as the freedom of the press (though certainly our ideas
about how broad, and how public, such a record should be have expanded
since the writing of the Constitution, and have developed in tandem with
the evolution of democratic cultural heritage institutions such as
public museums and libraries). Perhaps the framers could not foresee
the constellation of economic, technical, and legal forces that today is
assembling to threaten the existence and integrity of such a record, and
thus felt no need to build in explicit protection against these forces.
This is an area where constitutional, legal, political, historical, and
cultural scholars are shaping the discussion that we need to have.
Interestingly, the risks we face are not those of
direct government control over the intellectual record. The vision that
George Orwell portrayed in 1984 (to cite one canonical example
from a rich genre) was of a totalitarian government that had obtained
comprehensive control over this record and that continually rewrote it
in order to maintain power and to further its own ends. What is
threatening us today is not an abuse of centralized power, but rather a
low-key, haphazard deterioration of the intellectual and cultural record
that is driven primarily by economic motivations and the largely
unintended and unforeseen consequences of new intellectual property laws
that were enacted at the behest of powerful commercial interests and in
the context of new and rapidly evolving technologies.
It is time for a blunt, fundamental discussion about
the importance of preserving our social, cultural, and intellectual
heritage as a key public policy goal; about the need to maintain this
as a record that is held in trust for all citizens and that can be
consulted by all citizens. We need to explore whether and how to formalize
new principles: for example, once a work has influenced the thinking of
millions of people, it must, at some level, become part of the heritage
of society as a whole, and we as citizens must have some rights and
capabilities to revisit it. In other words, there is a point at which
works that reach the public must become in some sense part of a public
record. We need to be clear about how the social and intellectual
record differs from the marketplace in intellectual properties and the
extent to which this record is permitted to encroach upon the
unfettered operation of the marketplace. And perhaps, in order to
encourage the development and maintenance of this record, we need to
make it easy for ephemera to enter this record and subsequently to be
preserved without special actions on the part of creators. We need to
consider whether restrictions on use or easy incorporation into the
public record, and later the public domain, should be the default mode
of operation in the absence of specific, affirmative actions by creators
or their agents.
But we also need to be absolutely clear that the
social and intellectual record at issue here is not necessarily
something that is available instantly without charge, and without limitation
from any computer connected to the Internet; it is something that
is held in trust, collectively, by our cultural memory institutions. We
must still address the exquisitely complex and delicate problem of how
we can provide at least some level of access to this record (and what
levels of access to what part of the record) without damaging the
marketplace that creates so much of its vibrancy and richness.
A particular group of questions to which we must be
sensitive concerns the rights of authors and other
creators, as distinct from the rights of publishers and other large
corporate entities that often present themselves as speaking on behalf
of creators. My focus here is not primarily economic; on an economic
basis there is often considerable alignment between authors and
publishers, and the central issue I am concerned with here is what can
be preserved, not the ability of authors to derive income from a
marketplace in their works. The most recent revisions of American
copyright law have begun to introduce European notions of "moral"
rights of creators into the discussion, in part because of international
harmonization. At least in theory, the new legal and technical capabilities
give creators (or their assignees) an unprecedented ability to
withdraw their works from circulation or otherwise control how they are
seen after publication, or perhaps, more appropriately, after they
are granted broad availability, because the idea of publication per se
seems to be ever more elusive. The fear is that moral rights will not be
invoked by creators to protect the integrity of their works, but that
they will become the tool of other interests in manipulating
availability for other ends.
There are many kinds of creators with many purposes.
A poet no longer comfortable with his or her youthful published works
and who would just as soon see them forgotten is very different from
someone now nominated for high office who is haunted by an embarrassing
speech from a few years past that he or she would like to expunge from
the record before the news media can obtain copies. Both of these cases
are in turn very different from investigations of attempts to
manipulate the price of stocks over time through the message-board
discourse that has developed among investors in the digital world.
Although all of these might be grouped together under a legalistic
analysis, I think that the public would have very different degrees of
sympathy for the rights of the creator to withdraw his or her works
from public scrutiny from one scenario to another. The correct answers
here are anything but clear because of this enormous variation, but the
questions need to be part of our conversation about the future of the
intellectual record, particularly in conjunction with the possible
emergence of technologies that can "undo" publication or other broad
distribution.
The recent Tasini v. New York Times et al.
litigation is an excellent illustration of some of the issues and
dilemmas that we must face in addressing the maintenance of an effective
intellectual and cultural record in digital form as a public policy
goal, and of balancing this goal with the rights of creators. The
Tasini case also illustrates the problems of sheer scale, of
practicality, and of overhead and transaction costs that may arise in
trying to honor creators' rights as we try to migrate much of our
existing cultural record to digital form in a context of extremely
lengthy terms of copyright protection. It is somewhat different from the
other situations I have discussed but has important resonances.
In Tasini we have a situation where the courts
found that a number of authors have suffered an injustice. Their rights
to control and benefit from the use of their works have not been
respected. But redressing these abuses could have a high social cost:
the potential corruption of key parts of our intellectual record. These
authors contributed materials to major newspapers and magazines of
record that were read by millions, and their works were reproduced in
digital representations of these publications of record, thus providing
an accurate digital representation of record that reflected the earlier
printed works. The authors argued, and the courts agreed, that
because the publishers did not have the rights to supply their works
for inclusion in these digital compendia, their works should now be
removed unless the publishers come to terms with the authors and obtain
their permissions. Pragmatically, this presents a real problem for the publishers: there
are many authors involved and many works involved, and simply
contacting all of them and concluding the necessary negotiations is a
huge, perhaps impossible, task. Many database providers have
removed substantial numbers of articles from their databases as a
result of the decision. The only good news here is that although the
integrity of the digital record has been damaged, we still have print
and microform copies of the original newspapers to refer to (however
inconvenient this may be).
We must find ways to avoid such debacles in future,
particularly when we may no longer have the earlier print record as a
recourse.
The public policy discussion needs to focus on
questions about what sort of intellectual and cultural record we need to
maintain and why, and what authorizations are necessary to assemble and
maintain this record and to protect its integrity. Legal
issues (including perhaps the need for new legislation, or for
changes to existing legislation) should follow from these broader
public policy goals. We should not allow the existing legal frameworks
and marketplace practices to overly constrain our thinking about what
goals are possible or desirable. We must not let the public debate be
dominated by technical legal issues about the interpretation of
currently existing legislation. The digital age will be very different,
and some key laws on the books today were enacted very early in the
transition to this digital age. Our understanding, insight, and wisdom
about the nature of a digital world are naturally and necessarily
limited. Some of those laws (for example, the Digital Millennium
Copyright Act) are already producing what many believe are
undesirable and unintended consequences as we begin to see their first
applications in actual cases.
One thing is clear. Without such a public policy
debate, and the changes that may result from it, simply letting
existing legal and marketplace forces continue to operate along their
current trajectory may leave us facing a crisis in our ability
to capture and preserve our cultural and intellectual record in the
emerging digital age. Future scholars may look back at the early years
of the twenty-first century as a dark age in which much of our cultural
memory was irrevocably lost because libraries and other
cultural heritage organizations could no longer function effectively,
and even individual collectors of intellectual and cultural
works, who have often historically served as a safety net for libraries,
had lost much of their ability to build and keep collections. And these
future scholars may also see the society of the early twenty-first
century as one deeply troubled by a loss of accountability and of
intellectual and artistic continuity and haunted by recurrent bouts of
amnesia about the basis and nature of its own activities and actions. A
systemic failure of our cultural heritage institutions is likely to
exact a real price on the society overall, not just on our commitment
to the importance of scholarly inquiry.