Miniature disasters and minor catastrophes

KT Tunstall’s wonderful song is playing on Pandora as I type this, and it’s just so fitting I have to use it as this post title!

This is a tale of beating DSpace and OS X with many, many rocks until they sorta-kinda work. I present it here in hopes of sparing someone else considerable annoyance.

One of my best clients emailed me with a “please fix this link in my HTML item” request. Simple enough, right?

The said HTML item is nested in folders three deep. This means that DSpace’s regular exporter breaks, because it’s not smart enough to create intermediate folders. Joy.

So I kicked that up to the dspace-tech list, and got a kind response from Larry Stone of MIT: “use the METS packager export instead.” I did, and lo! it worked.

So I twiddled the file needing twiddling, zipped up the whole, and tried to put it back. First the METS ingester barfed because I’d zipped the folder containing all the files, not the files themselves. Okay, durrr, I felt stupid and zipped the files properly.

Then the METS ingester barfed because unbeknownst to me, Mac OS X’s native zip utility adds OS X-specific junk into the zip file. Quite properly, the ingester said primly, “Your METS manifest doesn’t match your actual files. Go forth and fix it.” The solution to this little difficulty turned out to be YemuZip, which can emit a normal zip file.
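For anyone who would rather script this step than install another app, here is a minimal sketch in Python. It zips the contents of the package folder (not the folder itself, see the previous blunder) and leaves out the usual macOS leftovers. The function names and the junk list are my own assumptions for illustration, not anything DSpace or YemuZip defines; adjust the list to whatever actually shows up in your folders.

```python
import zipfile
from pathlib import Path

# Names macOS tends to leave behind. This list is an assumption; extend as needed.
JUNK_NAMES = {".DS_Store", "__MACOSX"}


def is_junk(relative_path):
    """True for .DS_Store, anything under __MACOSX, and AppleDouble '._' files."""
    return any(
        part in JUNK_NAMES or part.startswith("._")
        for part in relative_path.parts
    )


def zip_package(folder, zip_path):
    """Zip the files inside `folder`, with paths relative to it, skipping junk.

    Returns the list of entry names written, for comparison with the manifest.
    """
    folder = Path(folder)
    zip_path = Path(zip_path)
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and the archive itself if it lives in the folder.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            relative = path.relative_to(folder)
            if is_junk(relative):
                continue
            archive.write(path, relative.as_posix())
            written.append(relative.as_posix())
    return written
```

Calling `zip_package("exported_item", "exported_item.zip")` (both names hypothetical) returns the entry names it wrote, which makes it easy to eyeball them against what the METS manifest expects before handing the zip to the ingester.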

Then the METS ingester barfed because the file I’d twiddled was a different size from what the METS file was claiming, logically enough. Helpfully, the ingester’s error message told me what size the file actually was, so I could pop into the METS file and fix the size in the several places it appears.

Then the METS ingester barfed because the checksums in the METS file didn’t match the checksum of the file I’d twiddled. There’s probably a quick and easy way to calculate a checksum from the command line, but CheckSumApp has a cute little GUI. Like the file size, the checksum appears several places in the METS file, so I made sure I got all of them.
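For the record, the quick and easy way does exist. A minimal sketch in Python that reports both numbers the ingester complained about, the file size in bytes and the checksum, in one pass. I am assuming MD5 here; check which algorithm your own METS manifest names before trusting that default. The function name is mine, invented for illustration.

```python
import hashlib


def size_and_checksum(path, algorithm="md5"):
    """Return (size_in_bytes, hex_digest) for the file at `path`.

    `algorithm` is any name hashlib accepts; "md5" is an assumed default,
    so match it to whatever the manifest declares.
    """
    digest = hashlib.new(algorithm)
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

Run it on the twiddled file, then paste the two values into every place the manifest records them for that file.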

Then the METS ingester actually worked. So now I have to go in and do database magic so that the item handle points to the new item, because the METS ingester doesn’t have a replace option the way the normal ingester does.

Anybody who thinks that a normal repository manager is going to go through all this to fix a link in an HTML file is as barking mad as I am. This is the ridiculousness that DSpace’s insistence on no-versioning, butterfly-pinned-to-wall “final archival” reduces me to. Yes, it’s funny—but it also cost me an entire hour to fix one link.


A courteous interface is a marvelous thing. It gets out of the way. It intuits what you want, squeezing every tiny bit of information possible out of whatever tidbits you feed it. It doesn’t bother you with its nasty little internal troubles. It’s Jeeves, there with a pick-me-up when you’ve got a drink-fueled headache.

DSpace’s administrative and item-submission interfaces are more like the temporary Jeeves replacement Bertie got stuck with once, the guy who snarled all the time and snaffled socks. They are about as courteous as a New York cabdriver in heavy traffic. As a result, they waste incredible amounts of human time: my time, my sysadmin’s time, my submitters’ time, the time of dozens of admins just like me. I promised to talk about that, so I will.

For example. Just this morning I got an unhappy email from a submitter who didn’t have access to all the collections in a given community. The said collections are two or three levels deep because of intervening subcommunities—and while I’m talking about wasted time, I’ll spend a few words on wasted cognitive capacity, because I have yet to meet anyone for whom the DSpace distinction between communities and collections is intuitive or useful. My submitters expect to be able to submit items to communities. They do not understand why some items on the sitemap (which is how they think of the communities-and-collections page) are bold and others aren’t. I hate wasting time and effort explaining this stupid and essentially otiose distinction.

Right. Back to my submitter and her problem. I had to click open every single collection in order to click again to check its submitter list. For those collections she didn’t have submit access to, adding it was a four-click process and could have been more: click to open the eperson list, click to go to the last page, click to select her address (she’s late in the alphabet), click to update the submitter group. Wasted. Time.

And don’t get me started on DSpace’s repo-rat–hostile habit of building impenetrable names for otherwise-unnamed submitter groups. COLLECTION_27_SUBMIT. Yeah, that makes all kinds of sense in my little rat brain, how about yours? (If you’re wondering, the number is the collection’s database identifier, which is almost impossible to figure out from the DSpace UI. Real friendly, DSpace.) And these names proliferate like rats, because there’s no way to tell DSpace “use the people I just told you about, plzkthx” without going through the added hassle of creating and naming an actual group, and no way to tell DSpace “use the standard access rules for this community” or “use the access rules for this other collection.”

So then I needed to set up a new collection for her. Could DSpace pick up on the submitter-selection work I’d already wasted a bunch of time doing? Could it hell. I had to go through the same clickety-clickety process all over again. There’s no access templating in DSpace; every single collection in every single community is sui generis. Just imagine how much time I get to waste when someone leaves the university and someone else takes over their DSpace deposit duties! Woo-hoo! Because obviously I don’t have anything important to do with my time.

Which brings us to the DSpace deposit interface. To be clear, I’m working from 1.4.2 here, not 1.5—but let’s be clear about something else too, namely that 1.5 doesn’t fix all of these warts, though the Configurable Submission system is indeed a step forward. So let’s waste some time, everybody!

You start your submission from a collection page, or you start from My DSpace, in which case it asks you to pick a collection. What does it do with this collection information? It determines whether you have deposit access, duh, and if your friendly neighborhood repository-rat has spent time customizing a metadata form for that collection, it uses that form. (Does DSpace ask on collection creation which metadata forms to use? It does not. That’s configured via a file called input-forms.xml on the server. Mm-hm, that’s right, I have nothing better to do with my time than seek out and edit—twice, because I keep a version in source control—bitsy little XML files DSpace leaves all over creation.) Anything else? Like surveying existing items in that collection for commonalities in order to prepopulate metadata fields? Nah. Machine learning would save a human being’s time or something. Can’t have that.

Next you run into this screen, which I loathe with a white-hot loathing neutron stars might envy:

[Screenshot: first DSpace submission screen]

The top question is just goofy. In my experience, this is true for less than one-tenth of one percent of submissions. The Québécois might have a use for that checkbox, but how many DSpace installations does Québec have exactly, and why exactly wouldn’t a Québécois installation just put in dc.title.alternative by default? So why is every submitter into every DSpace installation forced to cope with that moronic checkbox for every single submission? Because DSpace doesn’t give a tinker’s damn about anybody’s time or cognitive load, that’s why. The default is correct, at least, but that’s decidedly small comfort.

(I suspect there’s a librarian at the bottom of this interface wart somewhere. What about MARC 246, someone must have screamed. Guess what? I don’t care about MARC 246. I care about efficient use of person-hours, which that checkbox unquestionably isn’t. I love my fellow librarians, except when I hate them. I hate them when they gleefully glomp every iota of patron time and effort they can get their little mitts on.)

The middle question is difficult to understand (for my submitters, anyway; more of them get it wrong than right), and DSpace doesn’t explain why you have to answer it. I get a lot of questions from submitters about putting in publication dates and citations, because my submitters don’t mentally connect those fields with that checkbox. But that’s what that checkbox does when checked: it adds fields to the next metadata screen for dc.date.issued, dc.publisher, and dc.identifier.citation. (How many repository-rats running DSpace just learned something? Don’t be embarrassed. It was months before I figured it out, too, and I had to go in and read code before I had it sussed.)

But it gets better (for “worse” values of “better”). Imagine Ulysses Acqua for a moment, trying to be nice to Dr. Troia and the little open-access basketology journal she wants to archive. He uses the input-forms.xml file to make a custom metadata form that puts basic citation information for the basketology journal in dc.identifier.citation so Dr. Troia doesn’t have to retype it every time. When Dr. Troia submits her first article, she doesn’t think to tick the middle checkbox, and DSpace doesn’t tick it for her. What happens?

SHE GETS AN ERROR MESSAGE. I kid you not. AN ERROR MESSAGE. It reads “You’ve indicated that your submission has not been published or publicly distributed before, but you’ve already entered an issue date, publisher and/or citation. If you proceed, this information will be removed, and DSpace will assign an issue date.”

I—I—I honestly have no words. Do I need them? Maybe I do. The Jeeves interface never, ever, EVER threatens to discard information Bertie has provided it. It’s hard enough to pry useful information out of Bertie as it is! And talk about your bizarrely opaque, unhelpful, and inappropriately finger-wagging error messages! (How does Dr. Troia fix the problem, if she wants to keep her citation information or date or whatever? The message doesn’t even say.) I am just agog that this grotesque interaction exists in a production software system.

(Yes, of course I’ve triggered it. How do you think I figured out it exists? I don’t go looking for smelly garbage like this, I assure you.)

But it even gets worse than that. Weird interactions between input-forms.xml and the deposit code can make checkboxes on this page disappear when they shouldn’t. I haven’t dug into how this happens—but it bit me hard, such that I had to be unhelpful and take a date.issued out of a thesis metadata form in input-forms.xml. Because hey, troubleshooting DSpace’s sclerotic deposit system is such a productive use of my time!

Returning to our initial screen once more: there is absolutely no need whatever to ask the submitter about multiple files. None. Simply assume that submissions may have more than one file! Asking submitters to think about it up-front instead of at upload is wasted time.

So there we have it. An entire wasted screen, multiplied by untold numbers of DSpace submissions. There’s plenty more in there, the licensing system not least; Jeeves interface, not so much.

EPrints, as a rule, is a much better gentleperson’s personal gentleperson than DSpace. EPrints, for example, asks for item type up front, and configures its deposit screens to match, without the intervention of either submitter or repository-rat. Who knows, this politeness may have something to do with developer attitude. The last time I waxed profane on matters repository-interface-ish, Les Carr was in my inbox less than a day later asking eagerly, “is this what you mean? would this solution I just came up with work for you?” Whereas DSpace gets on my case for being negative. I’m just sayin’ here.

No. No, I’m not just sayin’. It runs deeper than that. I’ve occasionally seen a few nods in the DSpace developer community toward EPrints interface accomplishments. Unfortunately, the feel of the discourse I’ve seen is “look at all the shiny AJAX! we want that!”

This is not about shiny AJAX, people. It’s not about shiny at all. This is about DSpace not wasting my time. There’s a ton of work DSpace could do with the aim of removing time-wasters before anyone writes a single line of JavaScript or de-uglifies a single line of CSS. To do so, though, DSpace developers will have to learn to give a damn about my time and the amount of it DSpace has wasted and continues to waste. I see next to zero evidence of that learning taking place. (Tim gets it, which is why I say “next to zero” rather than just plain zero.)

Stop. Wasting. My. Time. That’s far and away the most important interface-development priority DSpace should adopt. For values of “me” that include “all repository-rats and willing depositors,” of course. DSpace’s interface needs to sit down at its mama’s knee and learn some courtesy.

Demeanor and community

DSpace’s market position in the IR software industry is “the out-of-the-box, one-size-fits-all solution.” It doesn’t demand the up-front coding investment that Fedora does, nor is it as narrow-focused as regards ingested material as EPrints. Since DSpace is open-source software, it attracts those who cannot afford hosted IR solutions; such adopters, owing to poverty, are not likely to be overblessed with technical staff.

This has consequences for the composition of the DSpace user community. I’d bet my entire net worth and a bit over that DSpace adopters contain many, many more non-techies and accidental techies than Fedora adopters. (I consider myself among the “accidental techies” group, incidentally. I’m not trained for DSpace sysadminning or code-monkeying and I’m far from expert at either, but I do them anyway.) I suspect, in fact, that these are the great silent majority of the DSpace community—emphasis on the “silent.”

I have some fairly direct evidence to bolster this notion. I got a considerable number of back-pats at OR ’07 over the DSpace customization guide that Tim Donohue and I wrote for JCDL ’06. A considerable number. Toss in that Tim must have gotten a lot too, that the self-selected OR ’07 crowd is in all probability more technical than the general run of DSpace administrators, and that the guide we wrote is actually pretty basic, and… look, you tell me how technical the DSpace adopter pool is.

Over at Five Weeks, we have several participants who worry over their perceived lack of technical savvy. We’re doing our best to reassure them, in part because honestly, too many librarians feeling uneasy and defensive about this look for any reason to back away from the keyboard. Confirming their perceptions about their own skills only leaves them less likely to learn, while superciliously casting aspersions on their abilities sends them fleeing headlong away. Reassure them, and be rewarded with wider adoption than you’d have thought possible—Five Weeks’s forty participants are blogging and wiki-ing great guns over there, and the course hasn’t even started yet!

Do I think this lesson extends to the DSpace community? I surely do. Do I think the core of the DSpace community—coders, mailing-list participants, documenters, et cetera—is generally friendly to the community’s less-technical members? I surely do not.

When I started at MPOW, I did as many newbie DSpace administrators do: I ran into roadblocks, problems the FAQs and mailing-list archives didn’t solve. (Heck, I still run into roadblocks. Ask me why the repository I run doesn’t have RSS feeds running. I can’t answer you, because I have tried everything I can think of and I don’t know why they still refuse to work, but feel free to ask.) More often than not, asking the dspace-tech mailing list produced no reply. Not “no helpful replies,” not “no useful replies,” but no replies whatsoever.

Frankly, I didn’t think much of the DSpace community after that, not for quite a while. Things have changed for the better in the year and a half I’ve been doing this, but I still see problems going unanswered (not just unresolved, unanswered), and I wonder how many current newbies have the same bad taste in their mouths that I had back in the day.

I also see a disturbing tendency toward non-techie-bashing by DSpace techies. I offer examples not to shine a spotlight on individual people, because I admit I’ve had my share of eye-rolling “what a maroon!” moments on reading the mailing lists, but only to establish that this is a genuine phenomenon. For that reason, I’m not attributing examples in this post; the links will have to do.

This weblog post regarding the University of Calgary’s search for an ETD solution, for example, contains the bald order “DSpace is an Open Source product, where words like ‘cannot’ should not be used unless you really have looked into it. The underlying search engine can do all of the things required for Calgary, and all it requires is the alteration of the UI to support it”.

Hey, where did “out of the box solution” go? Like it or not, ETDs are a major IR use case. If DSpace doesn’t support them out of the box, that is not the problem of DSpace administrators, it is DSpace’s problem. Getting huffy at managers who may not be able or even allowed to mess around under the hood solves nothing. It certainly doesn’t fix DSpace to work right with ETDs.

(Think “allowed” is not a real problem? Think again. I won’t be DSpace-sysadmin-in-chief at MnewPOW, which suits me fine because Tomcat gives me ulcers. At the interview, though, I received several strong hints that my coding chops, such as they are, would not only not be required, but would be actively discouraged by the actual sysadmins. Fortunately, this seems not to be entirely the case, but you’d better believe I will be walking on eggshells at MnewPOW until I sort out what I can and can’t do.)

Now consider this reply to a list of feature requests. The content of the reply is great, thought-provoking stuff. The manner of expression of the reply is unnecessarily scornful, completely failing to address the actual content of the original post in its zeal to scold the poster. Also consider that as of this writing, that reply has been the only reply.

Look, it takes considerable courage for a non-programmer to post feature requests to the mailing list or Bugzilla page for an open-source software project. I’ve been around the block some; I know how OSS programmers generally react to that. Most DSpace admins, techie or not, probably have the same awareness, so if someone speaks up, trust me, it’s about something important. Dumping on someone for politely opening a conversation about features is a sure road to never hearing from him or her again—and never hearing at all from dozens of DSpace adopters just like that one. That in turn is a recipe for DSpace to set poor development priorities.

So consider this my public call to the core of the DSpace community to watch its communication style and demeanor. We weaken the DSpace project when we turn people off. We weaken the DSpace project when we fail to offer help. Given its market position, we weaken the DSpace project when we make unwarranted assumptions about DSpace admins’ technical capacity. Let’s not, okay?

Even here

I’m fat, graying, scarred, unfashionable, generally homely as the proverbial mud fence. It bothers me less and less these days, and today I was reminded why.

At break, I met another DSpace admin, who will remain anonymous in this post for reasons that will shortly become obvious. Unlike me, she is young and conventionally quite attractive. She introduced herself to me, and we talked DSpace geekery for a bit before she said in a low voice, “I was glad to see another woman in the room. There was this guy from [locale deleted] behind me who was going on and on about taking me out, and you helped me escape him.”

In other words, some creepwad came on to her. At a PROFESSIONAL CONFERENCE.

For future reference, I am always available as a haven for folks in like case. I give off plenty enough ugly vibes (never mind “tall and hefty and imposing-looking” vibes) to make these wankers piss off.

I don’t know who the perp was. I don’t want to know (though if I find out, he should worry). Right now I just want to tell him, loudly and publicly, that he needs to cut that crap out NOW. No woman should have to “escape” people in a professional setting. EVER.

And yet it happens. Not to me, because I’m old and fat and ugly and married. But it happens. And it shouldn’t. And when the hell is it going to damned well stop?