Scanning the Past: Building a Digital Archive

The Zen of Xerox

My research career began with scanning. For five days straight, I mediated and meditated up in the University of Calgary English department, scanning an entire filing cabinet that contained over fifteen years of research material. Two years later, here I am: burning a hypertext poem in my kitchen sink, and writing my honours thesis on the effects of digital archiving.

Like facsimiles, digital archives are products of human interpreters who select, encode, and publish the texts that we access. The encoding process inevitably places certain texts above others, just as Google generates webpages with the greatest number of hits. The growing number of digital archives (and the disappearance of printed ones) is radically transforming how we distribute and access texts. As a medium of presentation, the archive is “a man-made channel with its own meanings, which has been designed to enable the movement of particular kinds of material, and which is susceptible to reinvention and creative reappropriation by its users” (Scott-Warren 4). Moreover, the digital archive contains a material history of its own, having been “coded with computational instructions” which are shaped by its “specific circumstances of manufacture” (Mak 68). The folders I created in DevonThink two years ago codify that digital library: taxonomically branching into subfolders and tags, which replace (but do not replicate) the hulking metal filing cabinet of the 90s.

C: WPFILE.TXT07:3004_05_13

The history of the book, then, is best described as a series of mediations: where history simply confirms that new versions of a work “will be created,” whether by its author, successive editors, readers and writers, or RAs with scanning equipment (McKenzie 37). What Gary Taylor terms the “New Textualism” is just another chapter in that history, where rather than printing the King James Bible, we xerox it. This is what Bob Stein[1] refers to as the book ‘flip-flop’ phenomenon: the atomization of the book from the tangible to the digital; from Word-Processor to printer, printer to scanner, scanner to digital archive again, revealing, Stein argues, the fundamental instability of all ‘material’ texts, and the fallacy of the book as a stable object and authentic referent (Stein).[2] Digitization is “not eliminating printed books” but is “actually producing more of them” (Ong 135). The decline in the production of Renaissance texts[3] is concealed by a proliferation of versions:[4]

We effectively reproduce fewer works, but we produce more versions of the fewer works we do reproduce. We therefore feel that we ‘know’ those few works with an unparalleled breadth and intimacy; moreover, we test and confirm all our cultural theories against the database of those few works. That diminishing number of works thereby becomes the measure of all things. It is not simply that we concentrate more and more of our attention on Shakespeare; even within the Shakespeare canon, we concentrate upon a diminishing number of works—just as late classical culture concentrated its attention upon a small fraction of the plays of Sophocles, Euripides, Aeschylus. (Taylor 50)

Taylor heralds a second Dark Ages, where the gems of classical antiquity, along with the works of Will Shakespeare, are lost. The remaining text files will resemble Taylor’s poem, which he includes in his elegantly titled essay, “C:WPFILE.TXT05:4110_07_98”:

Welcome to the dark side of Barthesian textuality, where the interconnectedness and interdependency of the digital archive precipitate its own collapse. In the Kingdom of Gates, no single file “has any independent viability; if the network to which it belongs collapses, or becomes obsolete, the individual text file becomes illegible” (Taylor 52). Unless librarians install sufficient safety mechanisms, digitalism risks “the loss of all but a tiny, idealised remnant of the past” (Taylor 52). XML files are precarious textual bodies, vulnerable to viruses, coffee-spills, force-quits, and server-crashes.

As Jason Scott-Warren writes, living in a digital age makes readers “ever more likely to forget that some kinds of information can only be supplied by books as material objects,” and that “this information—so far from being an ‘optional extra’—may well be constitutive” (241). The speed at which we are scanning, copying, pasting, and OCRing literature leaves us insufficient time to consider its material properties, and how those properties might be more properly encoded. Facsimiles become originals, as online databases replace museum archives:

The very act of creating a digital version of a text raises issues as much theoretical as technical. . . The world quickly divides . . . between those who consider the visual and material properties of a work as information to be preserved, and those who do not, those for whom content resides in a string of characters in a standard edition and those for whom publication and print history are an aspect of all documents. Decisions about what information matters will determine the features of texts that are marked, disregarded, and even discarded as increasing amounts of humanities materials are available exclusively in online formats. (Drucker)

Archives are founded on ideology: those elements of a text that we, as readers and potential encoders, consider most important, be it the nonverbal aspects of the mise-en-page, or punctuation in Milton’s Il Penseroso. Digital archives are the capsules of our history; there is “always a critical reference, a gesture to the past” that makes digitally encoded pages meaningful (Mak 71).[5] At the same time, the history of the facsimile should not be confused with the history of its original, for codes tell the story of digitization, “not of its exemplar” (Mak 71).

In Memoriam Ex Librum

In the past ten years, America has seen a surge in bookless libraries.[6] A new book-free library in San Antonio, called BiblioTech, plans to loan out ereaders instead of paperbacks.[7] Clients can check out one of the $100 devices for a two-week period, after which time the ereader will go dead so the customer will have nothing worth keeping. The architecture of the library is à la Steve Jobs, as Nicholas Carr jokingly notes:

The building is being designed to resemble an Apple Store, aseptic and brightly lit, with long ranks of iMacs and an info-barista manning the reference desk-cum-genius bar. The patrons, as the artist’s rendering indicates, will be wraiths.

Several other US libraries have gone digital: California’s Newport Beach library;[8] the Tucson-Pima Public Library System in Arizona;[9] and the University of Texas at San Antonio library.[10] The disappearance of print libraries has kindled elegies for the book, notably the vanitas paintings of Ed Ruscha, who treats “the book as a kind of empty Pop container or Conceptual signboard” in the medium of painting, “collapsing trash onto treasure” (Rosenberg). At seventy-five, Ruscha doesn’t read on a Kindle or iPad:

I don’t even use a computer. Every day I am reminded how far behind the world of technology I am. I’m not a great reader, either, but I love books, the physical objects of them. (Ruscha, qtd in Vogel)

Self-proclaimed as “the Henry Ford of book making,” Ruscha is concerned primarily with how books are constructed and “approaches books in the manner of a nose-to-tail chef, using parts of them that are often overlooked and under-appreciated, the spines and endpapers and edging” (Ruscha; Rosenberg).[11] The display at the Gagosian Gallery in Chelsea, called “Reading Ed Ruscha” (below), is an extract from Ruscha’s larger 2012 Austrian exhibition and includes (left to right) “Old Book Back Then,” “Old Book With Wormholes,” and “Old Book Today”:[12]

[Photograph: Librado Romero, The New York Times]

As Karen Rosenberg observes, Ruscha’s work “seems to divorce books from the act of reading,” and “invents new ways to read” (Rosenberg). Two of Ruscha’s paintings are of open, blank books floating above palindromes, complicating the usual left-to-right movement of English script in the spirit of his earlier “Mirror Paintings,” which set palindromes against a backdrop of mountain landscapes. The juxtaposition of ‘book’ with ‘text’ highlights the conventionality of reading not just pages, but also paintings; an English viewer is defamiliarized, and cannot help but scan Ruscha’s pages from left to right.

Though fire is hazardous to Ruscha’s works, his paintings are not, like digital artworks, prone to deletion. Unlike the more traditional medium of painting, code art is mainly about the process rather than the product of creation. Artworks like Golan Levin’s become irreproducible when the original programs are lost.[13] All media, like floppy disks, become obsolete and code artists work tirelessly to prevent losses resulting from the constant arrival of new software.[14] While it is easy to identify the ‘work’ of art in terms of painting or sculpture, digital artworks escape our attention because we typically separate the ‘actual’ artwork from “the technology that supports it” (Fino-Radin, qtd in Waxman).


Print mentality is deeply entrenched in bibliographic studies—so much so, that even those scholars most passionate about preserving textual artefacts remain oblivious to the signifying capabilities of the codex. Graham Caie’s statement, though well-meaning, falls under this category:

One of the joys of teaching medieval literature is to see the transformation in a student when working with a manuscript. This is not always possible, and indeed, we must consider the damage done to the original if it is constantly consulted. With such sophisticated techniques as enhancement of texts through digitisation there will be little need in the future to subject precious texts to the constant handling that is ruining them. Only the very few codicologists with interest in foliation or binding will need to inspect them in the future, as harassed librarians can point to the ‘virtual manuscript’ in electronic form. Although the smell cannot yet be reproduced (though this will surely be a matter of time) almost every other aspect of the manuscript can be conveyed in electronic form and available to those who live far from original copies. Clues about cost, appearance, readership and ownership can be gleaned from close examination of the digitised facsimile that can include the flyleaves, while punctuation and other markings can give hints on contemporary presentation of the text. (Caie 36)

At the end of the day, ‘precious texts’ transcend their materiality. Even once digital technology can replicate the smell of old flyleaves; even once it can reproduce wisps of marginalia in fine, OCRed detail; it only does so with the help of codes and algorithms that alter the bibliographic DNA, and therefore the meaning of the text in elemental ways—ways that neither senior scholars like Caie, nor clumsy undergraduates, should ignore. The notion that digitization is a form of democratization is one that implicitly favours Western nations and perspectives, where ‘those who live far from original copies’ are ‘those’ living in Canada, Europe, and the United States.

In contrast, Umberto Eco sees the book as a sophisticated social networking tool that cannot be enhanced or replaced:

The book is like the spoon, the scissors, the hammer, the wheel. Once invented, it cannot be improved. You cannot make a spoon that is better than a spoon . . . Perhaps it will evolve in terms of components; perhaps the pages will no longer be made of paper. But it will still be the same thing. (Eco 5)

Up till now, the book has been the most portable and compact tool of human communication; as a spoon, hammer, or wheel, the paperback has been a dependable little machine.[15] In declaring that the book is more or less ‘the same thing’ despite developments in technology, Eco falls prey, like Caie, to print mentality: conceiving of the book as an Idea transcendent of materiality. As I discuss in “Pages,” digital paginae do signify differently from printed ones and, as such, do not constitute ‘the same thing.’ Katherine Hayles makes a similar statement:

Print books are far too hardy, reliable, long-lived, and versatile to be rendered obsolete by digital media. Rather, digital media have given us an opportunity we have not had for the last several hundred years: the chance to see print with new eyes, and with it, the possibility of understanding how deeply literary theory and criticism have been imbued with assumptions specific to print. (33)

Digitization is not the end of the book; rather it is the beginning of a digital humanities that, by necessity, practices media-specific bibliography. The evolution of the codex demands both methodological and terminological shifts in the academy, as digitization affects how texts are distributed and therefore received:

[Table image: “lucky notes”]

Add books to the first column, and ebooks to the second, and this table summarizes the necessary terminological shift. As more books and folios are digitized, literary scholars must rethink their critical lexicon: is encoding equivalent to writing? Will future readers browse Shakespeare? Digitization is not only a transformation of the medium, but also of the language we use to talk about and interpret literary texts.

  1. Stein is the founder and co-director of the New York Institute for the Future of the Book.
  2. See my earlier post about the ‘flip-flop’ phenomenon, here.
  3. Relative to the Victorian period; see Taylor p. 49.
  4. In support of Taylor’s theory, here is a list of current digital Shakespeare projects: Shakespeare’s First Folio; Shakespeare’s First Folio Folger copy 68; Shakespeare’s First Folio Folger copy 5; MIT Shakespeare; The Shakespeare Quartos Archive; Treasures in full; Internet Shakespeare Editions; Rare Book Room; Furness Library (Shakespeare and others); Renascence Editions; Early Modern Literary Studies List; EMLS list of just plays; Queen’s Men Editions; Lost Plays Database; and Digital Renaissance Editions.
  5. Printed pages also tell stories: enwreathed in marginalia, water stains and (for some students I know) crammed with cracker crumbs, which track the human presence in the material text.
  6. Indeed, on 18 Apr. 2013, the Digital Public Library of America was launched: making the holdings of America’s research libraries, archives, and museums freely available online to Americans and the world.
  7. Playing on the Spanish word for library, biblioteca.
  8. Which attempted in 2001 to make its old library bookless, but withdrew the plans amid public outcry.
  9. Which opened a bookless branch in 2002, but added books five years later at the community’s request because the library was, ironically, built in an area without Internet access.
  10. Which opened one of America’s first bookless academic libraries in 2010.
  11. Indeed, Ruscha’s ‘bookishness’ reflects his training at the Chouinard Art Institute in commercial design and typography, as well as his firsthand experience in publishing houses.
  12. For more of Ruscha’s work, see Kunsthaus Bregenz.
  13. See “Alphabet Synthesis Machine,” in particular.
  14. Organizations like Rhizome collaborate with digital artists to preserve their artwork.
  15. For an account of Sterne’s book-machine, see “The Algorithmic and the Transcendental: Tristram Shandy as an Exploding Machine.”
