Last week was rather busy for me. Right after I got back to the Dartmouth campus from a tour of the composting facility I had to hightail it to Carson (the “pseudo-building” glued to the side of Berry Library) to attend a class in Digital Document Management (DDM). Over the course of two hours, Dartmouth Records Manager Wess Jolley [PDF] described some of the ideas and theory behind DDM and presented analysis of the progress Dartmouth has made towards transitioning from paper-only offices to largely DDM-only offices.
Although document management at Dartmouth is large and complicated (Jolley likened us to a “small town” in terms of document diversity), the College is making steady progress towards bringing DDM to the various departments. As Dartmouth doesn’t even have DDM software selected yet, and given the hurdles associated with transitioning and retraining employees to use a digital system, it will likely be several years before a majority of the departments are wholly switched to a digital document management system.
When I walked into Carson 61 at 9:55am, hoping my shoes didn’t still smell like compost, I wasn’t sure what to expect. I signed up for the Digital Document Management course on a bit of a whim: a coworker of mine had signed up for a class on Photoshop, and when I browsed to the page of tech classes Dartmouth offered I found the class description for DDM interesting:
Digital Document Management
In this session we will delve deeper into new technologies for
managing digital content.
Over 80% of all digital data being managed today is in the form of
document based data. As we move into the future of recordkeeping, we
will find ourselves forced to manage these objects through their
entire life cycle in digital form. “Printing it and filing it” is
quickly becoming a thing of the past! In fact, today even paper
documents are frequently becoming part of digital record keeping
systems through the use of document imaging technology.
This course will look in detail at document management technology and
strategies, and provide baseline knowledge for planning the record
keeping system of the future. We will look at how digital documents
can be routed electronically for processing through workflow
technology that eliminates the needs to move paper from office to
office. And we’ll talk about productivity gains from having your
information at your desktop, whenever you need it.
If you want to be on the cutting edge of technology, and to make
quantum leaps in your operational efficiency, then digital document
management technology (and this course) is right for you!
Instructor: Wess Jolley, CRM, Records Manager
Storing data in computers has a great number of benefits including easy transmission, very simple backups (just save the transmitted data), full text search (just try doing that, Gutenberg Bible!), and prevention of degradation. The methods we choose to store data and the ways in which we make it available for searching and combining can result in very powerful new tools. In short, “digital is cool.” So I signed up for the class.
Jolley started off by showing us a “Do You Know…” set of slides describing the pace at which the human race is generating new data. As you might imagine, the quantity of data is exploding at lightning speed, and as a greater and greater percentage of people are using computers, the amount of data and the rate at which it is being produced only continues to grow. At this point I’d like to take the time to apologize for contributing to the problem by writing this blog post 🙂
Jolley next gave a brief outline of the class, part 5 of 5 in the Record Management Training Cycle, and mentioned that people who had attended all 5 classes were going to get their diplomas. Wait, what? Part 5 of what?
Yes, it turns out that I had noobishly chosen to attend part 5 of 5 classes on Record Management at Dartmouth, skipping the previous four classes: (1) Managing Records in an Office Environment, (2) Using Records Management Services, (3) Vital Records Identification, Protection, and Disaster Recovery, and (4) Introduction to Digital Records. As I was attending on my own initiative I wasn’t worried about getting a diploma, but I did feel a bit sheepish stepping in so late in the game. In my defense, I’m not sure there was any way for me to know that this was a multi-part course from the class description! As it turns out, Jolley’s slides and handouts provided plenty of information, making the session work out fine as a stand-alone class.
“Dartmouth College is failing,” Jolley told us, “Dartmouth College is failing to effectively manage the vast majority of the institution’s electronic information.” What a way to start off a class! With that powerful, no-holds-barred, straight-shooting analysis of the situation at Dartmouth, Jolley had my rapt attention. Fortunately for us, it’s not just the College. Lots of institutions, in particular academic institutions, are struggling with DDM.
Dartmouth is currently having trouble trying to manage information as records. We can back up all of the information on each of our computers, but without identifying the relevant pieces that should exist as records (and aren’t just temporary files, sketches, and other ephemera playing merely a transitory supporting role), relevant data gets lost in a sea of information.
Jolley compared Dartmouth to a small town as it, unlike most companies and organizations, has to deal with such a diverse set of documents. We’ve got student records and professor/department records. Dartmouth owns land in New Hampshire and Vermont, operates buildings, runs a heating plant and a composting facility and owns a majority stake in the Hanover Water Works, not to mention 27,000 acres of forest! Dartmouth has students that live on campus, utility bills, fire sprinkler and alarm systems, insurance for chemistry labs, insurance for axe throwing on the Forestry Team, a General Counsel’s office, and medical records.
Given the wide array of documents Dartmouth must manage, the question arises of how far all of these records can be integrated to facilitate sharing. With questions of client confidentiality at the General Counsel’s office and patient medical record privacy and HIPAA obligations at Dick’s House and the Medical School, Jolley agreed that some departments will have to maintain their own, private repositories of data. He guesstimated that about 80% of Dartmouth’s documents could live in one central repository.
Dartmouth currently produces structured data and unstructured data. Structured data includes billing records, student grades, and all of that stuff that already lives in database tables. The structured data is under control at Dartmouth. Unfortunately, Dartmouth creates a whole lot of unstructured data, things like emails, PowerPoint presentations, and reports. Dartmouth created 7.5 billion unstructured documents in 2004. That number has likely increased in the last 5 years.
Ten years ago digital document systems were failing. The digital systems available at that point presented too big of a change from paper-based systems. In the intervening years two things have happened: (1) the systems have improved in design and speed, and (2) the user base has become more familiar and comfortable with computers. While a conversion to a DDM system at Dartmouth 10 years ago would have been very difficult, it’s now not only possible but economically effective for the College to move to a DDM system as soon as possible.
One big issue is record classification. Do you automatically classify a record or do you ask the user to classify the data as they enter it into the system? There’s the “five second rule” that states that if it takes longer than five seconds to do something, a user won’t do it. So if it takes longer than about five seconds to put a document into the document management system, users will stop entering documents because the bar is set too high. In terms of Cost/Effort vs. Benefit, the effort right now to declare and classify a document and put it into the system is pretty high, and the user gets no significant benefit. Jolley made the observation that the pain of using a paper-based system has to be great enough that users want to switch to a digital alternative.
Using systems like “drop box” folders that will automatically create records and integrating DDM into applications is critical to the successful replacement of a dead-tree office with a digital office. Users should be able to just click a button in an application and have a document or email go directly into the repository. Some metadata can be picked up from the environment, but we can also give users the opportunity to add additional information at that time.
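To make the “drop box” idea a little more concrete, here is a minimal sketch in Python of how a folder could be swept into records, with some metadata picked up from the environment and the rest supplied by the user. This is my own illustration, not anything Jolley showed: the function name `collect_records` and the record fields are hypothetical, and a real DDM system would hand the results to its repository rather than return a list.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
import tempfile


def collect_records(drop_box: Path, extra_metadata: Optional[dict] = None) -> list:
    """Build one record (a plain dict) for every file sitting in drop_box."""
    records = []
    for path in sorted(drop_box.iterdir()):
        if not path.is_file():
            continue  # skip sub-folders; only files become records
        stat = path.stat()
        record = {
            # Metadata picked up automatically from the environment.
            "filename": path.name,
            "size_bytes": stat.st_size,
            "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        }
        # Optional metadata the user chose to add at declaration time.
        record.update(extra_metadata or {})
        records.append(record)
    return records


if __name__ == "__main__":
    # Demonstrate with a throwaway folder standing in for the drop box.
    with tempfile.TemporaryDirectory() as tmp:
        box = Path(tmp)
        (box / "budget-memo.txt").write_text("Approved by the committee.")
        for rec in collect_records(box, {"department": "Records Management"}):
            print(rec["filename"], rec["size_bytes"], rec["department"])
```

The point of the design is the five second rule: the user only drops a file into a folder, and anything the system can figure out on its own never has to be typed in.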
Enterprise Content Management (ECM) is all about Declaration and Classification. All else is secondary. Declaration “freezes” the record. Modifications may be made later, but the management system must version the file so that the original record is retrievable. There are a number of existing pieces of software that can provide version control, like CVS and Git, but a DDM system suitable for a large organization such as Dartmouth must include more structure and provide better control (perhaps cryptographic) guaranteeing the creation date and initial contents of a record in the system.
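Here is a rough Python sketch of what “freezing” a record on declaration might look like: each version is stored alongside a SHA-256 digest of its contents, and later modifications are appended rather than overwriting the original. The names `Record`, `Version`, and `declare` are hypothetical, and this is only an illustration of the idea. In particular, a digest can show that contents have not changed, but it cannot by itself prove when a record was created; that would take something like a trusted timestamping service or digital signatures, which this sketch leaves out.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of a record's contents."""
    content: bytes
    sha256: str
    declared_at: datetime  # local clock only; not a trusted timestamp


@dataclass
class Record:
    """A declared record whose versions are appended, never edited."""
    title: str
    versions: list = field(default_factory=list)

    def add_version(self, content: bytes) -> Version:
        digest = hashlib.sha256(content).hexdigest()
        version = Version(content, digest, datetime.now(timezone.utc))
        self.versions.append(version)
        return version

    @property
    def original(self) -> Version:
        """The contents exactly as they were at declaration."""
        return self.versions[0]

    def verify(self) -> bool:
        """Check that every stored version still matches its digest."""
        return all(
            hashlib.sha256(v.content).hexdigest() == v.sha256
            for v in self.versions
        )


def declare(title: str, content: bytes) -> Record:
    """Declare a new record, freezing its initial contents as version 1."""
    record = Record(title)
    record.add_version(content)
    return record


record = declare("Budget memo", b"Approved by the committee.")
record.add_version(b"Approved by the committee, with amendments.")
print(record.original.content)  # the declared original is still retrievable
print(len(record.versions), record.verify())
```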
“Don’t overestimate the technological savviness of users,” Jolley told us, “many don’t even know that they can save a document to a different location using the ‘Save As’ dialog.” That’s a really critical point to remember as Dartmouth moves forward with new technological improvements. One of the biggest impediments to rolling out advanced digital technologies is the disconnect between users and the tools. You could have the best DDM software in the world, but without the proper training and continued support, your departments would get nothing done.
The ability to integrate DDM features into applications sounds like a good idea, but I wonder how this will work at Dartmouth. Jolley showed us a mock-up of a Blitzmail client with an additional “Declare” button. Integration of DDM tools could speed up workflow, but if Dartmouth is moving away from Blitzmail and other programs for which we have control of the source towards proprietary, hosted software like Google Mail, how can we hope to integrate the features we need into someone else’s software? Maybe Dartmouth will use Google Mail as the backend and use a modifiable client like Thunderbird as the user interface? One can only hope!
One interesting point brought up by Jolley was “How do we ensure the secure deletion of documents at the end of their life cycle?” Currently Dartmouth stores all of its records in paper boxes. Once a time-to-retain date has passed for a given box, the records management people carefully review their records to make sure they’re not destroying the wrong data, and then shred/burn/explode the sucker. Simple and effective. But it can’t work that way for digital records. If we have all of our records stored in a database sitting on top of a huge Network Attached Storage (NAS) disk, then when we delete a set of records from the database, those records will still exist in backups of the database. It’s not uncommon to retain nightly, monthly, and yearly backups, so a given document could be deleted but still exist in backups for over a year.
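To see how long a “deleted” document can linger, here is a small Python sketch that works out when the last backup containing it finally expires. The backup dates and retention periods below are made up for illustration (I don’t know Dartmouth’s actual backup schedule), and `Backup` and `last_surviving_copy` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Backup:
    taken_on: date
    keep_days: int  # hypothetical retention period for this backup

    @property
    def expires_on(self) -> date:
        return self.taken_on + timedelta(days=self.keep_days)


def last_surviving_copy(backups, created: date, deleted: date) -> Optional[date]:
    """Return the date the last backup holding the document expires, if any."""
    # A backup holds the document if it was taken while the document existed.
    holding = [b for b in backups if created <= b.taken_on < deleted]
    return max((b.expires_on for b in holding), default=None)


backups = [
    Backup(date(2009, 1, 1), keep_days=730),  # yearly backup, kept two years
    Backup(date(2009, 6, 1), keep_days=90),   # monthly backup
    Backup(date(2009, 6, 14), keep_days=7),   # nightly backup
]
gone = last_surviving_copy(
    backups, created=date(2008, 12, 1), deleted=date(2009, 6, 15)
)
print(gone)  # 2011-01-01
```

With these assumed numbers, a document deleted in June 2009 would still be recoverable from the yearly backup until January 2011, which is exactly the kind of gap a secure-deletion policy has to account for.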
Dartmouth Special Collections does not currently have a method set up for storing digital documents. “If someone wants a document to be around in a hundred years,” Jolley told us, “I tell people to print it out and send it to us.”
It’s too bad Special Collections doesn’t have a digital repository set up yet. When I was an undergrad at Dartmouth I played around with the digital preservation of materials. I scanned all of the pages of the Dartmouth Songbook and put them up on my website. There have been several versions of the songbook printed, but I could only use the first couple of editions, as those are the only ones for which copyright has expired and which are in the public domain.
Not merely content with scans of the Dartmouth Songs, I started a painstaking process of typesetting the songs using the Free Software program Lilypond, a tool for music notation and “automatic engraving”. After a few songs my attention moved on to other things, but I still have the files. Hopefully one day soon Dartmouth will have proper facilities to make both the scans and other versions of the data — for example a re-typesetting of the music — available online.
Jolley mentioned that hardware and software can foil preservation. Lack of interoperability between systems can cause massive problems. This sounds like a great opportunity to push for the use of open file formats and for the use of Free Software. If the owners of the data (Dartmouth) have control over the formats in which the data is stored and have the source code to the programs that are used to store, manipulate, and present that data, then the owners have a much more powerful and much more valuable repository. The use of open file formats and Free Software in the storage of data gives the user power and flexibility. Because of the incredible importance of maintaining legible, accessible records, I’d argue that using proprietary software and closed formats to build these digital filing cabinets is dangerous and ill-conceived, from both a legal and an archival standpoint.
We’re currently living in a “Digital Dark Age,” with documents being lost every day. This is occurring not only at Dartmouth but across the web with the loss of sites like Geocities. Jolley didn’t spend much time discussing what’s being lost outside of Dartmouth as that was outside the scope of the class, but as an archivist I think that he and I are in agreement that we need to develop technologies to preserve as much of the historical digital data as we can. One example off the top of my head is all of the random pieces of software and icons and documents that I’ve only seen on the Dartmouth “Potlatch” FTP servers. Delightful hacks like the “Too Many Macs” Mac Classic system extension may already be lost if we don’t start preserving snapshots of Dartmouth’s digital heritage.
At the end of his talk Jolley encouraged us to step up and “Start thinking NOW about the future.” You can all be Digital Records Evangelists and promote a technological infrastructure to which the institution is committed. “Dream big!”, one of his slides proclaims. That really got me thinking about my work to digitize the Dartmouth Songbook, not just as flat images, but as a starting point on top of which other media and content could be layered.
If I may stray slightly away from the administrative side of Digital Document Management and into the cool, delightful embrace of Webster Hall and the Special Collections seductively hiding within, please consider the possibilities if Dartmouth were to provide not only a DDM system for the accountants and the lawyers and all of the record tedium necessary for smooth operation of the College, but were to provide a repository able to store structured data from Dartmouth’s collections and from the students, professors, employees, alumni, and members of the Dartmouth Community.
Imagine if you could navigate to some permalink for the Dartmouth Songbook like http://collections.dartmouth.edu/permalink/dartmouth-1915-songbook/ and then be presented with digital scans of all of the pages of the book. The page could have a sidebar showing other editions of the book, and include links to other items on the digital collections website including Dartmouth groups performing the songs, sheet music for the songs, history of the songs, and information about the students and alumni that wrote the songs. While it could greatly benefit from outside user contributions, an initial shot at such a site wouldn’t even need to accept outside contributions of data: Dartmouth’s own collections contain more than enough data to keep a team of archivists, professors, and student interns busy scanning, tagging, cross-linking, and structuring for years!
With dreams now floating about my head of converting Dartmouth’s collections into delicious online databases of structured, open-file-formatted, permissively-licensed data, it really seems a shame to end this post so soon. But at 2600 words and counting, I’m not sure how many readers have managed to make it this far. Half of them are probably asleep (or dead) and the other half are probably skimming through these words so they can go have lunch (which isn’t a bad idea; I’m quite hungry myself).
Under the watchful eye of Wess Jolley, I think that Dartmouth’s migration to DDM systems is proceeding well. Moving to an all-digital system isn’t going to happen overnight, and there are going to be hurdles and hiccups getting some departments and some professors working smoothly with the new all-digital workflows, but at the end of the day it’s going to be worth it. Education will be key, as will the careful selection of client software that is intuitive and server software that can integrate with Dartmouth’s existing systems using open interfaces to avoid the “data silo” problem. I’ll be very interested to see how Dartmouth implements its DDM systems and to see how well the transition goes for more technical groups, such as the CS Department, vs. less technical groups such as the English Department. With luck, Dartmouth will emerge from the transition with much faster, more efficient department workflows. And we’ll probably even save some trees in the process!
Jolley gave us several paper handouts. As much as I like electronic documents, paper handouts are still the crispest, least-likely-to-lose-power format out there. Also, you can take notes on them!
Some of these are probably available somewhere online. The handouts included:
- A printout of the 88-slide PowerPoint presentation that framed the class
- A handout talking about “The Document Life Cycle”
- A Glossary of ECM Terms from AIIM, the ECM association
- ECM 101: The technologies, tools, and methods used to capture, manage, store, preserve, and deliver content across an enterprise. (Jolley quipped that whoever designed this document completely forgot about the Destroy step!)
- Ten Steps to an Effective E-Records Implementation