On “repository rat”

I’d like to welcome my good colleague Shane Beers to the biblioblogosphere. Shane took over my duties at George Mason, and has done a lot better with them than I ever did. I’m happy to see other repository managers blogging, and thrice happy to see Shane.

He brings up something that I’ve heard from other people as well: annoyance at my insistence on the phrase “repository-rat” to refer to librarians who manage institutional repositories. Some of that is me, and some of it is deliberate and calculated rhetorical strategy. It seems worth picking apart.

The “me” part, I confess, is of a piece with my steadfast refusal to take myself and what I do too seriously. Back in the day, I called myself a conversion peasant. Now I’m a repository-rat. I’m stubborn about this, and I don’t anticipate changing it… but I also recognize that it leaks into how I refer to other repository managers, as well as the specialty as a whole, and I see how that can feel like disdain.

It isn’t. It takes quite a bit of dedication to stick with IRs, and an impressive array of skills to manage one well. (I’m not saying I do, mind. Not for me to say. But I’m steeped in this field, I know whom I respect, and I know what they are capable of.) Moreover, these dedicated, skilled people have to persevere in the face of widespread ignorance, apathy, and even opprobrium directed at them, never mind lousy software and badly-stacked odds.

Which leads me to the rhetorical-strategy bit. I feel like a rat in the wainscoting, ignored and despised and isolated. Why shouldn’t I? Why should I be any prouder of what I do than my employer (which has partially defunded my service), my profession (which barely acknowledges I exist and makes no effort to support me), or the open-access movement (which openly insults me when it doesn’t ignore me)? Why should I pretend to support and respect I don’t actually have?

And why is it uniquely my responsibility to redress these issues? If the institution I work for, the profession I have joined, or the open-access movement I am part of would like me to stop referring to myself as a rodent, howsabout they toss me a bone so I can move up the animal taxonomy a bit?

Like the immortal archy, I see things from the under side. There’s use in that, I maintain, just as there’s use in colleagues such as Shane asserting themselves to raise the profile of our work and the esteem in which it is held. I’m on their side, I truly am—I just approach the work from a different angle.

insects are not always
going to be bullied
by humanity
some day they will revolt
i am already organizing
a revolutionary society to be
known as the worms turnverein

—Don Marquis


A courteous interface is a marvelous thing. It gets out of the way. It intuits what you want, squeezing every tiny bit of information possible out of whatever tidbits you feed it. It doesn’t bother you with its nasty little internal troubles. It’s Jeeves, there with a pick-me-up when you’ve got a drink-fueled headache.

DSpace’s administrative and item-submission interfaces are more like the temporary Jeeves replacement Bertie got stuck with once, the guy who snarled all the time and snaffled socks. They are about as courteous as a New York cabdriver in heavy traffic. As a result, they waste incredible amounts of human time: my time, my sysadmin’s time, my submitters’ time, the time of dozens of admins just like me. I promised to talk about that, so I will.

For example. Just this morning I got an unhappy email from a submitter who didn’t have access to all the collections in a given community. The said collections are two or three levels deep because of intervening subcommunities—and while I’m talking about wasted time, I’ll spend a few words on wasted cognitive capacity, because I have yet to meet anyone for whom the DSpace distinction between communities and collections is intuitive or useful. My submitters expect to be able to submit items to communities. They do not understand why some items on the sitemap (which is how they think of the communities-and-collections page) are bold and others aren’t. I hate wasting time and effort explaining this stupid and essentially otiose distinction.

Right. Back to my submitter and her problem. I had to click open every single collection in order to click again to check its submitter list. For those collections she didn’t have submit access to, adding it was a four-click process and could have been more: click to open the eperson list, click to go to the last page, click to select her address (she’s late in the alphabet), click to update the submitter group. Wasted. Time.

And don’t get me started on DSpace’s repo-rat–hostile habit of building impenetrable names for otherwise-unnamed submitter groups. COLLECTION_27_SUBMIT. Yeah, that makes all kinds of sense in my little rat brain, how about yours? (If you’re wondering, the number is the collection’s database identifier, which is almost impossible to figure out from the DSpace UI. Real friendly, DSpace.) And these names proliferate like rats, because there’s no way to tell DSpace “use the people I just told you about, plzkthx” without going through the added hassle of creating and naming an actual group, and no way to tell DSpace “use the standard access rules for this community” or “use the access rules for this other collection.”
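To make the complaint concrete, here is a minimal sketch of the kind of translation a courteous interface could do with those group names. It assumes only what is described above: the name embeds the collection's database identifier. The lookup table, the collection name in it, and the function name are all hypothetical; a real installation would have to pull the names from the DSpace database.

```python
import re

# Hypothetical lookup from collection database ID to a human-readable name.
# In a real installation this would come from the repository's database.
COLLECTION_NAMES = {27: "Basketology Journal Archive"}

# Matches auto-generated names such as COLLECTION_27_SUBMIT.
GROUP_PATTERN = re.compile(r"^COLLECTION_(\d+)_(.+)$")


def describe_group(group_name, names=COLLECTION_NAMES):
    """Turn an auto-generated group name into something a person can read."""
    match = GROUP_PATTERN.match(group_name)
    if match is None:
        # Not an auto-generated name, so somebody named it by hand. Leave it.
        return group_name
    collection_id = int(match.group(1))
    role = match.group(2).lower().replace("_", " ")
    collection = names.get(collection_id, f"collection #{collection_id}")
    return f"{role} group for {collection}"


print(describe_group("COLLECTION_27_SUBMIT"))
```

Even the fallback ("collection #27") is no worse than the raw name, and the happy path saves a trip to the database.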

So then I needed to set up a new collection for her. Could DSpace pick up on the submitter-selection work I’d already wasted a bunch of time doing? Could it hell. I had to go through the same clickety-clickety process all over again. There’s no access templating in DSpace; every single collection in every single community is sui generis. Just imagine how much time I get to waste when someone leaves the university and someone else takes over their DSpace deposit duties! Woo-hoo! Because obviously I don’t have anything important to do with my time.
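For illustration, access templating need not be complicated. The sketch below is not DSpace code; it assumes a made-up data model in which each collection simply maps to a set of submitter email addresses, and every name in it is hypothetical. It shows the two operations I keep doing by hand: copying one collection's submitters to another, and swapping a departing depositor for a successor everywhere at once.

```python
def copy_submitters(submitters, source, target):
    """Give `target` every submitter that `source` already has."""
    merged = set(submitters.get(target, set())) | set(submitters[source])
    submitters[target] = merged
    return merged


def replace_submitter(submitters, departing, successor):
    """Swap one person for another in every collection; return what changed."""
    changed = []
    for collection, people in submitters.items():
        if departing in people:
            people.discard(departing)
            people.add(successor)
            changed.append(collection)
    return sorted(changed)


# Hypothetical collections and addresses, for demonstration only.
access = {
    "Working Papers": {"troia@example.edu", "zeno@example.edu"},
    "Theses": {"zeno@example.edu"},
}
copy_submitters(access, "Working Papers", "Technical Reports")
print(replace_submitter(access, "zeno@example.edu", "acqua@example.edu"))
```

Two function calls instead of four-plus clicks per collection per person.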

Which brings us to the DSpace deposit interface. To be clear, I’m working from 1.4.2 here, not 1.5—but let’s be clear about something else too, namely that 1.5 doesn’t fix all of these warts, though the Configurable Submission system is indeed a step forward. So let’s waste some time, everybody!

You start your submission from a collection page, or you start from My DSpace, in which case it asks you to pick a collection. What does it do with this collection information? It determines whether you have deposit access, duh, and if your friendly neighborhood repository-rat has spent time customizing a metadata form for that collection, it uses that form. (Does DSpace ask on collection creation which metadata forms to use? It does not. That’s configured via a file called input-forms.xml on the server. Mm-hm, that’s right, I have nothing better to do with my time than seek out and edit—twice, because I keep a version in source control—bitsy little XML files DSpace leaves all over creation.) Anything else? Like surveying existing items in that collection for commonalities in order to prepopulate metadata fields? Nah. Machine learning would save a human being’s time or something. Can’t have that.
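For anyone who has not gone hunting through input-forms.xml yet, here is a rough sketch of reading the collection-to-form mapping out of it. The fragment reflects the form-map layout as I understand it from 1.4.x, so check it against your own copy before trusting it; the handle 1234/56, the form name "basketology", and both function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the form-map section; verify against your own file.
SAMPLE = """
<input-forms>
  <form-map>
    <name-map collection-handle="default" form-name="traditional" />
    <name-map collection-handle="1234/56" form-name="basketology" />
  </form-map>
</input-forms>
"""


def forms_by_collection(xml_text):
    """Return a dict mapping collection handle to metadata form name."""
    root = ET.fromstring(xml_text.strip())
    return {
        entry.get("collection-handle"): entry.get("form-name")
        for entry in root.iter("name-map")
    }


def form_for(handle, mapping):
    """Pick the form for a collection, falling back to the default entry."""
    return mapping.get(handle, mapping.get("default"))


mapping = forms_by_collection(SAMPLE)
print(form_for("1234/56", mapping))
print(form_for("9999/1", mapping))
```

A listing like this is exactly what a collection-creation screen could show, instead of leaving it buried in a file on the server.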

Next you run into this screen, which I loathe with a white-hot loathing neutron stars might envy:

[Image: First DSpace submission screen]

The top question is just goofy. In my experience, this is true for less than one-tenth of one percent of submissions. The Québécois might have a use for that checkbox, but how many DSpace installations does Québec have exactly, and why exactly wouldn’t a Québécois installation just put in dc.title.alternative by default? So why is every submitter into every DSpace installation forced to cope with that moronic checkbox for every single submission? Because DSpace doesn’t give a tinker’s damn about anybody’s time or cognitive load, that’s why. The default is correct, at least, but that’s decidedly small comfort.

(I suspect there’s a librarian at the bottom of this interface wart somewhere. What about MARC 246, someone must have screamed. Guess what? I don’t care about MARC 246. I care about efficient use of person-hours, which that checkbox unquestionably isn’t. I love my fellow librarians, except when I hate them. I hate them when they gleefully glomp every iota of patron time and effort they can get their little mitts on.)

The middle question is difficult to understand (for my submitters, anyway; more of them get it wrong than right), and DSpace doesn’t explain why you have to answer it. I get a lot of questions from submitters about putting in publication dates and citations, because my submitters don’t mentally connect those fields with that checkbox. But that’s what that checkbox does when checked: it adds fields to the next metadata screen for dc.date.issued, dc.publisher, and dc.identifier.citation. (How many repository-rats running DSpace just learned something? Don’t be embarrassed. It was months before I figured it out, too, and I had to go in and read code before I had it sussed.)

But it gets better (for “worse” values of “better”). Imagine Ulysses Acqua for a moment, trying to be nice to Dr. Troia and the little open-access basketology journal she wants to archive. He uses the input-forms.xml file to make a custom metadata form that puts basic citation information for the basketology journal in dc.identifier.citation so Dr. Troia doesn’t have to retype it every time. When Dr. Troia submits her first article, she doesn’t think to tick the middle checkbox, and DSpace doesn’t tick it for her. What happens?

SHE GETS AN ERROR MESSAGE. I kid you not. AN ERROR MESSAGE. It reads “You’ve indicated that your submission has not been published or publicly distributed before, but you’ve already entered an issue date, publisher and/or citation. If you proceed, this information will be removed, and DSpace will assign an issue date.”

I—I—I honestly have no words. Do I need them? Maybe I do. The Jeeves interface never, ever, EVER threatens to discard information Bertie has provided it. It’s hard enough to pry useful information out of Bertie as it is! And talk about your bizarrely opaque, unhelpful, and inappropriately finger-wagging error messages! (How does Dr. Troia fix the problem, if she wants to keep her citation information or date or whatever? The message doesn’t even say.) I am just agog that this grotesque interaction exists in a production software system.
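What would the Jeeves version look like? A minimal sketch, and emphatically not DSpace's actual code: if any of the three publication fields already holds a value (say, because the metadata form prefilled it), treat the checkbox as ticked and keep the information. The field names are the ones mentioned above; the function name and the plain-dict metadata model are assumptions made for the example.

```python
# The three fields the middle checkbox controls.
PUBLICATION_FIELDS = ("dc.date.issued", "dc.publisher", "dc.identifier.citation")


def infer_previously_published(metadata, box_ticked):
    """Decide whether to treat the item as previously published.

    A prefilled publication field counts as a "yes", so nothing the
    submitter (or the repository-rat) already supplied gets discarded.
    """
    has_publication_data = any(metadata.get(field) for field in PUBLICATION_FIELDS)
    return bool(box_ticked) or has_publication_data


# Dr. Troia forgets the checkbox, but the form prefilled her citation.
prefilled = {"dc.identifier.citation": "Journal of Basketology"}
print(infer_previously_published(prefilled, box_ticked=False))
```

No error message, no threat, no lost citation. The submitter never has to know the question existed.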

(Yes, of course I’ve triggered it. How do you think I figured out it exists? I don’t go looking for smelly garbage like this, I assure you.)

But it even gets worse than that. Weird interactions between input-forms.xml and the deposit code can make checkboxes on this page disappear when they shouldn’t. I haven’t dug into how this happens—but it bit me hard, such that I had to be unhelpful and take a date.issued out of a thesis metadata form in input-forms.xml. Because hey, troubleshooting DSpace’s sclerotic deposit system is such a productive use of my time!

Returning to our initial screen once more: there is absolutely no need whatever to ask the submitter about multiple files. None. Simply assume that submissions may have more than one file! Asking submitters to think about it up-front instead of at upload is wasted time.

So there we have it. An entire wasted screen, multiplied by untold numbers of DSpace submissions. There’s plenty more in there, the licensing system not least; Jeeves interface, not so much.

EPrints, as a rule, is a much better gentleperson’s personal gentleperson than DSpace. EPrints, for example, asks for item type up front, and configures its deposit screens to match, without the intervention of either submitter or repository-rat. Who knows, this politeness may have something to do with developer attitude. The last time I waxed profane on matters repository-interface-ish, Les Carr was in my inbox less than a day later asking eagerly, “is this what you mean? would this solution I just came up with work for you?” Whereas DSpace gets on my case for being negative. I’m just sayin’ here.

No. No, I’m not just sayin’. It runs deeper than that. I’ve occasionally seen a few nods in the DSpace developer community toward EPrints interface accomplishments. Unfortunately, the feel of the discourse I’ve seen is “look at all the shiny AJAX! we want that!”

This is not about shiny AJAX, people. It’s not about shiny at all. This is about DSpace not wasting my time. There’s a ton of work DSpace could do with the aim of removing time-wasters before anyone writes a single line of JavaScript or de-uglifies a single line of CSS. To do so, though, DSpace developers will have to learn to give a damn about my time and the amount of it DSpace has wasted and continues to waste. I see next to zero evidence of that learning taking place. (Tim gets it, which is why I say “next to zero” rather than just plain zero.)

Stop. Wasting. My. Time. That’s far and away the most important interface-development priority DSpace should adopt. For values of “me” that include “all repository-rats and willing depositors,” of course. DSpace’s interface needs to sit down at its mama’s knee and learn some courtesy.