Defining openness: open source, open data, open APIs, open communities, and more

A couple of weeks ago I was in Florida giving a talk on Open Source, Open Data in which I tried to describe what open data was. In preparation for that talk, I went looking for definitions of “open” as it applied to either field, and found myself drawing on the following documents:

In the end I structured my talk around the four freedoms because, let’s face it, they’re snappier — but this is all just background.

In any case, I’ve started to collect articles that talk about openness, and in the last couple of weeks I’ve seen a burst of them. Perhaps I’m just hyper-aware at the moment, or maybe we’re going through a phase of introspection about the whole idea. In any case, I thought I’d post a round-up of recent posts on describing, defining, and measuring openness for software, data, APIs, and the communities and processes that surround them.

From the OpenGeoData blog, The Cake Test for determining whether geodata is truly open:

What is the Cake Test? Easy: A set of geodata, or a map, is libre only if somebody can give you a cake with that map on top, as a present.

Cakes are empirical proof that most the data in most SDIs cannot be used freely, because of the licensing terms of the SDIs. And they are an empirical proof that attendants to the latest spanish SDI conference could taste themselves.

Louis Gray, The blurry picture of open APIs, standards, data ownership:

Following the much-discussed news of Facebook debuting its “Open Graph API” on Wednesday, I traded a few e-mails with a few respected tech-minded developers, and found, unsurprisingly, that not everyone believes Facebook is fully “open”. In fact, it’s believed some companies are playing fast and loose with terms that should be better understood.

To quickly summarize the discussion, there are essentially three major ways to bucket “open” APIs…

  • open access
  • APIs that leverage open standards
  • open standard APIs like OpenSocial, OpenID, PubSubHubbub, AtomPub and others

In short, you have “open but we control the process”, “standing on the backs of open” and “truly open”, if this opinion is accepted. The developer adds, “In short, the first two mean nothing, the last one actually fits the dictionary definition. The Web is built on open standard APIs and protocols.”

Simon Phipps, A software freedom scorecard (video from a talk at the South Tyrol Free Software Conference last week) describes why an OSI-approved license isn’t enough to guarantee software freedom, and describes a number of indicators you can use to quantify the freedom of a given piece of software.

Matt Zimmerman, Open vs. open vs. open: a model for public collaboration describes three axes of openness for open source projects:

Open (available)

In order for your project to be open to anyone, they need to be able to find out about it and experience it for themselves.

Open (transparent)

The next flavor of openness is about transparency. This means enabling people to find out about what is happening in your project.

Open (for participation)

The third type of openness is open participation. This builds on transparency by creating a feedback loop: people observe activity in your project, react to it, and then actually change its course.

Finally, Melissa Draper posted about Open community, pointing out that external commentary and even criticism are a natural part of having an open (transparent – to use mdz’s term) community.

(Note: Some blockquoted sections above have been edited for length.)

Got any other good links — especially recent ones — on the topic? I’m sure I’ve missed some.

Warily, and with much trepidation

I used to have a Facebook account. I deleted it. Not just suspended, actually deleted. The whole system over there gave me the creeps, between the ads that oscillated wildly between knowing too much and too little about me, to the way it would send me email notifications that someone had left me a message without actually telling me what was in the message. And then there’s the fact that Facebook’s friending system is reciprocal, which means I can’t let someone follow me without following them in return and taking the risk that they’re the sort of person who spends their day throwing sheep at me.

I gather that things have got better in the last year or two, and I keep seeing reasons why I should use it for work, so the time has come to try it again. Warily, and with much trepidation.

In addition to the obvious AdBlock I’ll be making use of articles like these:

It’s not that I particularly desire privacy — hell, I spew all the minutiae of my life across Twitter without caring who reads it — but that those same settings might just help keep me sane and sheep-free. The problem is, I know a lot of people — more than most of the people I know1 — which leads to a serious imbalance of traffic. So I’m much more concerned about filtering inbound information than I am about filtering outbound information. I’m not sure that Facebook’s really set up for that.

Tips, as usual, are appreciated.

1. Somewhat-related article of interest: Why your friends have more friends than you do, via Radar.

The community spectrum: caring to combative

This is part of my “Craft of Community” series of blog posts; you can find more through my craft of community tag.

Like I said in my last post, I’ve started and participated in a pretty wide variety of communities: large and small, technical and non-technical, open and invite-only, non-profit and corporate-sponsored, focused and general. The only thing they’ve really had in common has been that they’ve all been online, to at least some degree; my life’s been pretty Internet-mediated since I first got online in 1993 and I can’t think of any communities I’ve been involved in since then that haven’t had at least an informal mailing list. So that’s just to declare (at least one aspect of) my bias up front.

Last year one of the designers at my work linked me to The Competitive Spectrum over at the Yahoo Developer Network, and it introduced me to a whole new way of thinking of the variety of communities. It’s part of a larger set of social patterns related to reputation, which is a whole nother subject, but for now I just want to talk about the spectrum itself.

The Competitive Spectrum describes communities as being:

  • Caring: members are motivated by helping each other.
  • Collaborative: members share goals and help each other to achieve them.
  • Cordial: members have their own goals which do not conflict with each other.
  • Competitive: members share the same goals, and compete against each other to achieve them.
  • Combative: members must achieve their goals by preventing others from doing so.

Yahoo gives some examples of each (mostly drawing from their own web properties), but I found it interesting to consider some technical communities I know and think about where they fit into the spectrum.

Most open source projects are collaborative, at least on the surface. Contributors come together to build a piece of software, each contributing their own time and skills to achieve the shared goal. However, you see some spread on the spectrum as well: as contributors get to know each other through online chatter and real-life meetups, they can be quite caring; some developers submitting uncontroversial patches that scratch their own itches do so in a way that’s basically cordial; and at times, developers who are keen to see their own preferred solutions make it into core, or whose egos become tied up in their contributions or their role in the community, can tend toward competitive or even combative.

It’s interesting to compare Dreamwidth, an open source project I’ve blogged about before, which seems to tend more towards the caring end of the spectrum. I’ve seen countless examples of generosity and personal support in that community, and can only think of very mild examples of competitiveness (and none of combativeness). It’s impossible to tell how much of this is due to their founding principles, the project’s relative youth, the fact that the project is centred around a journalling platform that tends to expose contributors as “real” people, or the fact that the majority of contributors are women who have (for the most part) been socialised to behave this way: any or all of those may be contributing factors.

There are a handful of open source projects that actually show a kind of dimorphism: part of the community at either end of the spectrum. One example of this is the Linux Kernel, whose mailing list is known to be one of the most combative in the field, while the Kernel Newbies group has a friendlier, more helpful, caring feel:

Kernelnewbies are a community of people that improve or update their Kernels and of aspiring Linux kernel developers and more experienced developers willing to share their knowledge. We help each other learn how the Linux kernel works and occasionally discuss other operating system kernels.

Along similar lines, the Rails community, which has a reputation for being quite rough, has Railsbridge, whose mission is “to create an inclusive and friendly Ruby on Rails community.” In both these cases, the more caring group was founded in reaction to the main group’s unwelcoming reputation. We could look at these groups as separate communities, except that the membership and activities tend to cross over and blur. (The question of what delineates a community and how you define the edges is a hairy one, and I’m not going there for now.)

You can apply the community spectrum to any kind of community, not just open source projects, nor even just online communities. It’s easy to see how it can apply to anything from sports teams to cancer support groups.

This model’s really helped me realise that what works for one community may fail dismally for another. It’s not hard to see how a community’s place on this spectrum can influence everything from appropriate leadership to rewards for participation to what kinds of online forums or real-world meetups will work best for the group.

So, how about your communities? Where do they fit on the spectrum? Anyone got any other interesting examples of dimorphism?

The Craft of Community

A surprising number of old friends seem to be asking me, lately, what exactly it is that I’m doing these days. I guess that after a decade of being known mostly as a Perl developer, it seems like I’ve gone off on a bit of a tangent. So, to make it clear: these days, my day job is as the community manager for Freebase.com, specifically for what we call the “Freebase geek community”: open source developers, data contributors, and all kinds of individuals who just think Freebase is cool and want to play with it. (I have less to do with the big companies that build stuff on Freebase — we have separate business development people who work with them.)

This is the first time I’ve had “Community” in my job title, but it’s obviously not the first time I’ve done it. One of the first Internet communities I built, back in 1994, was a mailing list called AusBDSM, which had hundreds of members, events in most major Australian cities, and a couple of spinoff groups by the time I handed it off to my successors in… was it 1996? Since then I’ve founded dozens of other communities, ranging from technical to political, from a handful of members to several hundred, and from pretty-damn-successful to thoroughly moribund. As for how many communities I’ve participated in, it would have to be in the hundreds, easily, though of course I don’t keep count.

All of which is to say: I have opinions on community management. Oh boy do I have opinions. But it occurred to me recently that although I used to blog a fair bit about programming when I was a professional software developer, I don’t often blog about community management now I’m getting paid for that. Which is funny, because I originally set up this blog to write about work-related/professional subjects.

Speaking of my opinions on community management, tonight I started reading Jono Bacon’s new(ish) book, The Art of Community. (It’s available as a free download under Creative Commons, if you’d like to read it too.) And of course I have opinions on it. Most of them are along the lines of, “But what about…?” and “Why didn’t he mention…?” and I have to admit I thought I could have done it better — which is easy to say, of course, having never written a book myself ;)

I already (regretfully) decided against doing NaNoWriMo this month because I didn’t have time. But blogging? That I can do.

So I’m planning to do a series of posts called “The Craft of Community”, because that’s how I like to think of it. Craft is something anyone can pick up. We learn crafts informally, by seeing and by doing, and our early efforts are usually pretty ugly. While there are some craftworkers who produce pieces so beautiful they’ll bring tears to your eyes, for the most part crafters do what we do because it makes us feel good, and because we like to see something we made with our own hands, even if the back of it is kind of a mess or the legs are a little bit crooked. And in most crafts (like in Perl, a craft language if ever I saw one), There’s More Than One Way To Do It, so we can learn the most when we look at a broad range of technique and experience.

Some topics I’d like to cover:

  • The variety of communities
  • Anonymity, pseudonymity, privacy
  • Status and advancement within communities (incl. meritocracy)
  • Community metrics
  • Implications of hosted community tools/forums/etc
  • Challenges for for-profit companies trying to build communities
  • Conference formats
  • Things to do at meetups

I’m not sure I’ll get around to covering all of those, and no doubt I’ll come up with topics that aren’t on that list, too, so I’ll just imagine those are rough notes for my future self. Let me know what you think.

My experience with a dawn simulator

A couple of weeks ago I posted asking if anyone had had experience with dawn simulators or opinions of what model I should get. I went with the Philips HF3480 and this is my review.

Day 0: The lamp arrived from Amazon and I plugged it in and played with it a bit. Determined that it worked as advertised. Here’s a picture of it part-way through its 30 minute “sunrise” sequence:

[Image: Dawn simulator]

So far so good. I went to bed looking forward to being woken by it.

Day 1: The light was meant to wake me for 8am. Starting around 4am, every time I rolled over or half-woke for any reason, I would think, “Is it happening yet?” and crack an eyelid to check. As you can imagine, this didn’t lead to a restful night’s sleep. And it turns out I’d screwed something up in setting the alarm, so it never lit up, and I staggered out of bed around 9:30am bleary-eyed and annoyed.

Day 2: Made sure the alarm was set properly and snuggled down with a book to read before I went to sleep. The light at its brightest (setting 20) is too bright as a bedside lamp, so I dialled it down to 14. The problem with that is that the same setting is used as the maximum light in the morning, which means I woke to a dim, cozy light rather than a bright one. Not what I was aiming for.

Day 3: Working from home with a cold, so decided not to wake myself early. However, I did take the opportunity to read the manual, in the hope that it would explain how to use setting 14 for evenings and 20 for mornings. No luck; whatever you have it set to in its normal lamp mode is what it will use for the dawn simulation. However, I did discover that you can turn up the brightness while the light is off. Weird as it sounds, it just means that instead of just flipping the light off at night, I need to flip it off then spin the dial up to 20 before sleep. As long as I remember to do that, all will be well.

Day 4: A day off, and still feeling a little cruddy, so I set the dawn simulator to go off latish (9am) so I’d get a long night’s sleep. It worked as advertised, but when I woke to it I rolled over and hit the kill switch, and went back to sleep in the dark til noon.

Day 5: Feeling better, and I’d like to get to the local market at a reasonable hour. Set the dawn sim for 8am and — wow! — it worked, I woke up, and actually got up and did things (for values of “things” meaning “sitting round in my pyjamas reading email”). Can’t complain though; I wouldn’t normally be doing that at 8am on a weekend, or indeed any day.

Day 6: The end of daylight savings. Glad to have the extra hour, of course, not to mention the extra light in the mornings going forward. Set the dawn sim for 8am (i.e. 9am in old money) and actually woke a little before it (7:10). Rolled over and dozed for a bit longer, then woke easily and cheerily when the electronic birds started chirping at 8am. Hurrah!

Day 7: Monday morning. Once again, woke at 7-something, rolled over, dozed til 8am then got up pretty easily. Not much more to say, really. Hurrah! Tomorrow I’m going to set it for an earlier time, maybe 7:30 or even 7:00. I think we can call this a success, even though it took a while to get here.

One other point to note: this device does not come with international power plugs nor any of the indicators (such as a wall wart) that usually suggest it will work well in other countries. The label underneath says 120V-60Hz with no variations suggested.

Open government and parsable data formats

I first became aware of these issues via Raymond Yee, who teaches at UC Berkeley and who I’ve worked with a bit, hosting an Open Govt meetup at Freebase’s office, and going over to speak to his class about Freebase. Anyway, Raymond has blogged on several occasions about the lack of clarity in Recovery.gov data format specifications and the difficulty in working with data that is theoretically open but impossible to query effectively.

To my mind, if you can’t readily query against the data, it’s not really open. It’s just standing a little way out of your reach, waving and taunting. The folks at the Open Government Working Group agree. Their Open Data Principles say:

5. Machine processable
Data are reasonably structured to allow automated processing.

They expand a bit on the wiki’s talk page, saying:

P Language Rule

You know you have a truly open format if you can build a parser for it in Perl, Python or PHP in an afternoon. That parser should be able to crawl through the dataset and dump the results into a SQL database. That doesn’t necessarily mean that the data is best handled with an SQL database (although most of this material will fall into that category) – just that it can be easily imported into one.

I’d take it further. If it takes more than an hour using a P-language and standard libraries (XML parser, etc) then the data’s insufficiently open. Ideally it would take around fifteen minutes. If you think that’s too stringent, keep in mind that there’s nothing stopping agencies from providing specialised scripts or libraries themselves, which would bring the time down to “run this script from the command line, giving your MySQL database details as parameters.” Even though my rocket scientist friends assure me that rocket science isn’t as hard as people think, this still isn’t rocket science.
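To give a feel for what the fifteen-minute case might look like, here is a minimal sketch in Python using only the standard library. It assumes the data arrives as CSV with a header row, and it loads into SQLite rather than MySQL purely so the example is self-contained. The function name `load_csv`, the table name `awards`, and the sample rows are all made up for illustration; they don't come from any real agency dataset.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Copy every row of a CSV file (with a header row) into a new table.

    Returns the number of rows now in the table.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote column names so headers containing spaces still work.
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()
    return conn.execute('SELECT COUNT(*) FROM "{}"'.format(table)).fetchone()[0]


# Hypothetical sample data standing in for a real download.
SAMPLE = "agency,award_amount\nExample Agency A,1000\nExample Agency B,2500\n"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_csv(conn, "awards", io.StringIO(SAMPLE)))
    for row in conn.execute("SELECT agency, award_amount FROM awards"):
        print(row)
```

Every value goes in as text here, which is good enough for a first look; a real import would probably want to declare column types and cope with ragged rows. The point is only that with a sanely structured format, the whole job fits on one screen.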

Anyway, all this came to mind when I saw two links from Simon Willison this morning, pointing to posts from the Sunlight Foundation: No PDFs! and Adobe is bad for Open Government.

Simon says:

At the Guardian (and I’m sure at other newspapers) we waste an absurd amount of time manually extracting data from PDF files and turning it in to something more useful. Even CSV is significantly more useful for many types of information.

CSV is a great format for open data, especially when that data takes a “rectangular” shape and includes numeric data. It’s easy to understand, easy to parse, and even non-programmers can load it up in Excel or Google Spreadsheets to take a look and make charts. I wish more people provided CSV data. But in the meantime, I’ll add my voice to the “Oh, God, noooo! Anything but PDF!” chorus.