Saturday, February 6, 2010

LibRIS: A PHP library for RIS parsing and writing

Once again I am working diligently on managing my bibliography and related resources. That means... time to create some new tools.

Today, I released the source code for an RIS parser and writer. The project, LibRIS, is hosted on GitHub.

RIS (Research Information Systems) is a file format for exchanging information about reference sources. Journal articles, art works, books... metadata for these can all be expressed in RIS format. And most common citation formats need no more information than that found within an RIS record.
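
To give a feel for how simple the format is, here is a minimal sketch of an RIS reader. It is written in Python rather than PHP, and it is not LibRIS or its API. It assumes the common RIS layout: a two-character tag, two spaces, a hyphen, a space, and a value, with each record opened by a TY tag and closed by an ER tag. The function name parse_ris is made up for illustration, and the sample record is built from the citation for my Ethics and Information Technology paper.

```python
import re

# A tagged RIS line: two-character tag, two spaces, a hyphen, then an optional value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def parse_ris(text):
    """Parse RIS text into a list of records (dicts mapping tag -> list of values)."""
    records = []
    current = None
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank lines and anything that is not a tagged line
        tag, value = match.group(1), (match.group(2) or "")
        if tag == "TY":
            current = {"TY": [value]}  # TY opens a new record
        elif current is None:
            continue  # tagged line outside of any record
        elif tag == "ER":
            records.append(current)  # ER closes the record
            current = None
        else:
            # Tags such as AU may repeat, so every tag maps to a list.
            current.setdefault(tag, []).append(value)
    return records


SAMPLE = """TY  - JOUR
AU  - Butcher, Matthew P.
TI  - At the Foundations of Information Justice
JO  - Ethics and Information Technology
VL  - 11
IS  - 1
SP  - 57
EP  - 69
PY  - 2009
ER  - 
"""

records = parse_ris(SAMPLE)
print(records[0]["TI"][0])
```

Keeping every tag as a list is a deliberate simplification: it avoids special-casing repeatable tags like AU, at the cost of a little extra indexing for single-valued ones.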

I released an "early access release" today. The format is basic enough that the reader/writer should already be relatively stable. I am using it in a new version of the fabled Lantern package, which is being written on Fortissimo.

Monday, January 4, 2010

Pilaster PHP Library in GitHub

Over the holiday break, I decided to dust off the Pilaster document database and bring it up to some acceptable level of stability.

As it stands now, Pilaster is a document database that provides capabilities similar to those of MongoDB. It is nowhere near as performant as MongoDB -- then again, it's written in pure PHP and does not require a separate application.

I figured it would be useful for smaller systems or for applications running on hosted platforms.

In the backend, it uses Lucene (or, rather, Zend's port of Lucene) to provide a high(-ish) performance datastore. You can search it using either name/value arrays or search strings. Yes, search strings are the database's query language. Weird... I know.
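
One way to picture how the two query styles relate: a set of name/value pairs can be flattened into a Lucene-style search string. The sketch below is in Python, not PHP, and it is not Pilaster's code. The function name to_query_string and the field names are hypothetical, and joining clauses with AND is an assumption about what a name/value search would mean.

```python
def to_query_string(criteria):
    """Turn {'field': 'value'} pairs into a Lucene-style field:"value" query string."""
    clauses = []
    for name, value in sorted(criteria.items()):
        # Escape backslashes and double quotes so the value stays one quoted phrase.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{name}:"{escaped}"')
    # Assumption: every pair must match, so the clauses are joined with AND.
    return " AND ".join(clauses)


print(to_query_string({"type": "note", "author": "mbutcher"}))
```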

To test it out (and contribute, if you'd like), take a look at the Git repository here:

If you want to start with a build of it, you can download 2.0-alpha1 (or whatever today's latest version is) by going here:


When I initially developed Pilaster, it was basically a form of the Sinciput and Rhizome projects. It worked, but it was very tied to the way those systems used data. That was Pilaster 1.x.

As I took a fresh look at Pilaster in December, I realized that I could make it much more generic -- in short, I could make it work with native data structures! With minimal refactoring, I changed the entire Pilaster data model.

I decided that would be Pilaster 2.x.

Pilaster is not quite stable. The unit tests for the main public API all pass. However, the low-level driver is not complete. Import and export still need to be added. I am also uncertain as to how stable the Zend index really is. Your testing can help me discover what works and what doesn't.

Tuesday, March 17, 2009

Drupal 6 JavaScript and jQuery is out

I got my copies of Drupal 6 JavaScript and jQuery today.

Honestly, writing this book was a struggle. I think I spent longer debugging JavaScript on a multitude of platforms than I did actually writing the book. That makes the entire experience less enjoyable.

Another frustration with the book was trying to use the appropriate tone. My intent at the outset was to attract a different crowd. Rather than write to computer scientists and software engineers, I wanted to write to new web developers -- perhaps those who haven't done much programming at all. I don't think I met my own expectations in that regard. It is a rare gift to be able to explain something you know well to someone who has little base knowledge.

On the reverse side, I did have a chance to write some of the flashiest book code that I have ever written. Instead of writing generic code, I had the opportunity to start from what Drupal already has and build out. A few of my favorite examples were:
  • The real-time comment notification system (like Growl for a website). I might even have a project coming up that would allow me to use this on a production site.
  • A simple but configurable HTML editor called "better editor" (because it is an extension of an earlier example in the book). I could actually imagine creating a real module version of that.
  • A sorta cool client-side theming engine. It was fun. It has some practical application. Who knows... perhaps someone else will be able to make something useful of it.
So, yes, there were parts of this book that I really enjoyed working on.

Biographically speaking, it was rather strange to write a JavaScript book. I think the first server-side web apps that I wrote (that is, beyond simple CGIs) were written in Netscape's "Server-Side" JavaScript sometime around 1996 or 1997. And my first major consulting position was as a Lawson JavaScript developer. Looking back, Lawson was an unsung pioneer in the field. They were attempting to approximate AJAX with frames (not iframes -- no, this was before iframes were widely adopted). When I jumped ship from JavaScript to Java, I didn't think I'd turn back. But as the technologies have matured, I've found reason to want to script stuff on the client again.

Finally, I think the most interesting thing to come out of the book (and out of a discussion I had with Larry Garfield while writing the book) was QueryPath. Okay, to be completely honest, QueryPath was the result of three things:
  • Wes Munsil's teaching me how to write recursive descent parsers in June. Once I knew how to do it, I really wanted to write one. And who writes CSS parsers for PHP?
  • Learning jQuery. As soon as I started working with it I became enamoured. And the JavaScript book gave me more of a chance to think about its structure.
  • Expressing my frustrations with XML technologies to Crell, who flippantly suggested I write "jQuery for PHP". Really, that was the moment when the entire thing meshed. That happened somewhere around Chapter 4 of this book.
Will I write more books? Probably. I love the process. But maybe I'll take a breather for a few.

Wait... I just came up with a good idea for a book.

Wednesday, February 25, 2009

At the Foundations of Information Justice

My paper on FOSS and information ethics has been published in the Journal of Ethics and Information Technology. In a nutshell, I present the argument that FOSS can provide an avenue for social justice. It can be used to combat what I call "information poverty," where one is perpetually trapped in the role of a consumer, but never an owner, of information. Such a position places one at the liberty of others. I argue that FOSS and similar movements (like Creative Commons) provide mechanisms that will not only provide the information poor with much-needed access to (and, in some sense, ownership of) information, but also prove to be an invaluable teaching tool.

George Thiruvathukal and Konstantin Laufer, both founding members of ETL, were instrumental in getting this paper. (In fact, Konstantin's father, a professor of economics, also provided valuable input.)

Paul Leisen, a fellow philosopher from Loyola, also helped out a lot, as did Tom Wren.

And, of course, I can't give enough credit to Samir Chopra and Scott Dexter, who were absolutely instrumental in getting this article off the ground.

At the Foundations of Information Justice
Matthew P. Butcher
Ethics and Information Technology
ISSN 1388-1957 (Print) 1572-8439 (Online)
Issue Volume 11, Number 1 / March, 2009
DOI 10.1007/s10676-009-9181-2
Pages 57-69
SpringerLink Date Wednesday, February 11, 2009

Saturday, January 31, 2009

The Drupal Quiz module

One of my key interests in Drupal these days is the Quiz module. As often as I can find time, I've been working on improving the module. So far, since I have inherited code maintainership, I have upgraded it to Drupal 6, released the 2.x branch, begun development on the 3.x branch, and added support for a new question type (long answer).

Right now I'm working on this crazy (in the Drupal world) idea: I'm architecting a generic object-oriented framework that should make it easy to add question types. Historically, the quiz module has only supported one type -- the so-called "multichoice" type. This one type is supposed to function as:
  • Multiple choice, single answer
  • Multiple choice, multi-answer
  • True/false
  • Personality-style multiple choice (no right/wrong answer)
The concept is interesting. Much of the logic between these four is shared. And the theory behind initial development was, apparently, that if they were so similar, they could simply all be covered by one set of functions.

This idea seemed to hold true... to a point. But as more and more feature requests come in (many of them not only reasonable requests, but features that really ought to have been in there for a long time -- like being able to skip a question), a choice has to be made:

Create a UI that is so complex that it will require training to use, or break out question types again.

I'm obviously not a fan of the first. I think the current (2.x) branch has a horribly complex process for creating tests, and the issue queue is too full of questions that shouldn't have to be asked.

I started down the second path by first implementing a new question type (rather than beginning by breaking down the multichoice module into several smaller pieces). I started by creating a long-answer question type.

What I discovered was this: Huge portions of the code were just copy-n-paste jobs from the multichoice module. (Actually, they were copy, paste, delete lots of conditionals jobs.) This got me thinking. The obvious solution for a situation like this (when you are working in an OO language) is to abstract and extend. That has long been looked at as un-drupalish, so I initially put the idea out of my mind.

But when I started attempting to add some other new features (in the aforementioned class of things that really ought to be there), I discovered that I was constantly running into roadblocks imposed by the lumping together of multiple choice types that should really be separate. And, really, the only way to separate them without cloning hundreds and hundreds of lines of code is to employ a decent OO architecture.

I've got the class model just about done. When I finish implementing it, I am estimating that it will take about two hours to create a True/False question type -- the simplest type. Other types will not take much longer. Reporting should be much easier, too. And perhaps even Views integration will take less time. That would be nice.
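
The abstract-and-extend idea can be sketched roughly as follows. This is Python rather than PHP, and it is not the Quiz module's actual class model. The class names Question, TrueFalse, and LongAnswer and the score/report methods are all hypothetical, chosen only to show shared logic living in one base class while each question type supplies its own scoring.

```python
from abc import ABC, abstractmethod


class Question(ABC):
    """Behavior shared by every question type (hypothetical base class)."""

    def __init__(self, prompt, max_score=1):
        self.prompt = prompt
        self.max_score = max_score

    @abstractmethod
    def score(self, response):
        """Return the points earned, or None if the response needs manual grading."""

    def report(self, response):
        # Reporting is written once here and inherited by every subclass.
        points = self.score(response)
        if points is None:
            status = "needs manual grading"
        else:
            status = f"{points}/{self.max_score}"
        return f"{self.prompt} -> {status}"


class TrueFalse(Question):
    """The simplest type: one boolean answer."""

    def __init__(self, prompt, correct, max_score=1):
        super().__init__(prompt, max_score)
        self.correct = correct

    def score(self, response):
        return self.max_score if response == self.correct else 0


class LongAnswer(Question):
    """Free-text answers are graded by a person, not by code."""

    def score(self, response):
        return None


quiz = [
    (TrueFalse("Drupal is written in PHP.", correct=True), True),
    (LongAnswer("Explain what a node is.", max_score=5), "A node is ..."),
]
for question, response in quiz:
    print(question.report(response))
```

In this shape, adding a type means writing one small subclass instead of threading another conditional through shared functions.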

Twitter... I guess it was inevitable

George Thiruvathukal is working on something Twitter-related for an IEEE blog. As part of what I presume is a social experiment, he asked some ETL people to volunteer as followers.

That was the straw that broke the camel's back.

My friends are on Twitter. My colleagues are on Twitter (caveat: the Venn for that would show a large overlap). And each time one of them asks me if I tweet, a little piece of my Twitter defense breaks down. Apparently, the last little pebble of Twitter-apathy fell from its spot, and the dam burst. I tweeted my first (and second... and third... and fourth) today at

Notice: Now that I am on Twitter, it is essential that I express myself in brief metaphor -- dams bursting, backs breaking. Why? 'cuz that's the way we roll! (That's a metaphor, right?)

Saturday, January 24, 2009

QueryPath: It's like jQuery PHP.

I have just posted "my winter project." Its name is QueryPath, and it's something like jQuery PHP (or is that PHP jQuery?).

While writing the Drupal 6 JavaScript and jQuery book, I started looking for something like jQuery in the PHP world. I found several projects that implemented small subsets of the jQuery API, but nothing approaching the complexity I had in mind.

So I went back a step, and started looking for a good CSS 3 Selector implementation. I didn't find one of those, either. At best, I found some simple regex-style tools that supported a small portion of the CSS 3 Selector standard.

There was nothing else to do but start coding.

First I wrote a recursive descent parser for CSS 3, including support for XML namespaces and other ill-defined yuckiness.

Then I wrote an event-based API similar to SAX2 -- only for CSS 3 Selectors. And I wrote an implementation of the API.
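
Here is a rough sketch of how those two pieces fit together: a recursive descent parser that walks a selector and fires SAX-style callbacks on a handler. It is Python, not PHP, and it is not QueryPath's parser. It covers only a small subset of CSS 3 Selectors (element names, *, #id, .class, the descendant and child combinators, and comma-separated groups), and every class and method name in it is hypothetical.

```python
import re

IDENT = re.compile(r"-?[A-Za-z_][A-Za-z0-9_-]*")


class EventLog:
    """Hypothetical handler that records events; a real one would search a DOM."""
    def __init__(self):
        self.events = []
    def element(self, name): self.events.append(("element", name))
    def any_element(self): self.events.append(("any_element",))
    def id(self, name): self.events.append(("id", name))
    def css_class(self, name): self.events.append(("class", name))
    def descendant(self): self.events.append(("descendant",))
    def child(self): self.events.append(("child",))
    def another_selector(self): self.events.append(("another_selector",))


class SelectorParser:
    def __init__(self, text, handler):
        self.text, self.pos, self.handler = text, 0, handler

    def parse(self):
        # selector_group := selector (',' selector)*
        self._skip_ws()
        self._selector()
        while self._peek() == ",":
            self.pos += 1
            self._skip_ws()
            self.handler.another_selector()
            self._selector()
        if self.pos != len(self.text):
            raise ValueError(f"unexpected {self._peek()!r} at offset {self.pos}")

    def _selector(self):
        # selector := compound (combinator compound)*
        self._compound()
        while True:
            had_ws = self._skip_ws()
            nxt = self._peek()
            if nxt == ">":
                self.pos += 1
                self._skip_ws()
                self.handler.child()
            elif had_ws and nxt not in ("", ","):
                self.handler.descendant()  # bare whitespace is the descendant combinator
            else:
                return
            self._compound()

    def _compound(self):
        # compound := (ident | '*')? ('#' ident | '.' ident)*, with at least one part
        start = self.pos
        if self._peek() == "*":
            self.pos += 1
            self.handler.any_element()
        elif IDENT.match(self.text, self.pos):
            self.handler.element(self._ident())
        while self._peek() in ("#", "."):
            marker = self._peek()
            self.pos += 1
            name = self._ident()
            if marker == "#":
                self.handler.id(name)
            else:
                self.handler.css_class(name)
        if self.pos == start:
            raise ValueError(f"expected a selector at offset {self.pos}")

    def _ident(self):
        match = IDENT.match(self.text, self.pos)
        if not match:
            raise ValueError(f"expected a name at offset {self.pos}")
        self.pos = match.end()
        return match.group()

    def _peek(self):
        return self.text[self.pos:self.pos + 1]  # "" at end of input

    def _skip_ws(self):
        start = self.pos
        while self._peek().isspace():
            self.pos += 1
        return self.pos > start


log = EventLog()
SelectorParser("div#main > p.note, ul li", log).parse()
for event in log.events:
    print(event)
```

Each grammar rule gets its own method, which is what makes the parser "recursive descent"; the handler never sees the selector text, only the stream of events.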

From there, I began constructing a PHP equivalent of jQuery. Not all of the jQuery API is relevant in PHP. After all, an event model glue layer is not of general interest in a single-threaded PHP app. But HTML/XML traversing and manipulating certainly are. So I borrowed as much of the jQuery API as seemed appropriate.

While I used the same function names (except for empty, which is a PHP reserved word), and tried to follow as closely as possible in parameters and return values, the internals are almost completely different. Why not? After all, JavaScript and PHP are very different languages.
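
The chaining style being borrowed from jQuery looks roughly like this. The sketch is in Python on top of the standard library's ElementTree, not PHP, and it is not QueryPath's implementation or API. The Query class, the query() helper, and the handful of methods shown are hypothetical stand-ins, and find() matches by tag name only rather than by CSS selector.

```python
import xml.etree.ElementTree as ET


class Query:
    """A tiny chainable wrapper around ElementTree elements (illustrative only)."""

    def __init__(self, root, matches=None):
        self.root = root
        self.matches = [root] if matches is None else matches

    def find(self, tag):
        # New Query holding every descendant with this tag name.
        # Simplification: nested matches could yield duplicates.
        found = [el for match in self.matches for el in match.findall(".//" + tag)]
        return Query(self.root, found)

    def attr(self, name, value):
        for el in self.matches:
            el.set(name, value)
        return self  # returning self is what makes chaining possible

    def text(self, value):
        for el in self.matches:
            el.text = value
        return self

    def append(self, tag, text=""):
        for el in self.matches:
            ET.SubElement(el, tag).text = text
        return self

    def xml(self):
        return ET.tostring(self.root, encoding="unicode")


def query(markup):
    """Parse a markup string and wrap its root element."""
    return Query(ET.fromstring(markup))


doc = query("<ul><li>one</li><li>two</li></ul>")
doc.find("li").attr("class", "item")
doc.append("li", "three")
print(doc.xml())
```

The method names mirror jQuery's, but the body of each one is ordinary tree manipulation in the host language, which is the point of the paragraph above.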

The coding process was interesting. I'm too busy to write something like this in one go. So I spread it out... in 15-30 minute increments. In fact, I developed most of QueryPath while riding Chicagoland trains to and from work. Even with this bizarre and disjointed development path, I eventually finished. QueryPath's main library has 58 public methods, almost all of which are from jQuery's API (though I added some, like an XPath query and tools to use PHP lambda and callback functions).

But I wasn't happy. I wanted some cool extensions (plugins), too. It didn't take me long to figure out the obvious: QueryPath needed a database layer. Using PHP's PDO library, I constructed a simple database library, QPDB, that allowed various ways of merging SQL results into XML/HTML. Take a query and turn it into a table or a list. Or get more detailed -- you can put database results (in whole or in part) wherever you want! There's even a simple template language (I like to call it HTML) that you can use to format results in sophisticated ways.
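
The query-to-table case can be sketched like this. It uses Python's sqlite3 and ElementTree modules in place of PDO and QueryPath, so it shows the general idea rather than QPDB's API. The rows_to_table function and the books table are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_table(cursor):
    """Build an HTML <table> element from an already-executed DB-API cursor."""
    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    for column in cursor.description:
        ET.SubElement(header, "th").text = column[0]  # column[0] is the column name
    for row in cursor:
        tr = ET.SubElement(table, "tr")
        for value in row:
            # ElementTree escapes markup characters when the tree is serialized.
            ET.SubElement(tr, "td").text = str(value)
    return table


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
conn.execute(
    "INSERT INTO books VALUES (?, ?)",
    ("Drupal 6 JavaScript and jQuery", 2009),
)
cursor = conn.execute("SELECT title, year FROM books ORDER BY title")
html = ET.tostring(rows_to_table(cursor), encoding="unicode")
print(html)
```

Building the result as a tree rather than by string concatenation means the rows can be attached anywhere in an existing document, and escaping comes for free.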

I am so happy with the library that I have released it under LGPL or MIT License (your choice). You can head over to (or skip straight to the downloads and docs at to try it out for yourself.)

N.B. A huge thanks to the Fedora Hosted folks for (a) inviting me to host there, (b) providing an unbelievable array of VCS and bug tracking tools, and (c) being more than congenial all along.