How Big is Too Big?

As I have previously noted, Wikipedia’s editors have gotten into the business of assigning certain Wikipedia entries to the dustbin of history for not being “notable” enough. The implication of this guideline for entries is that there is only so much space on the servers housing Wikipedia and so those precious megabytes (terabytes?) shouldn’t be taken up with irrelevant entries. You can read my previous post to see why I think a notability test makes no sense, given the model that Wikipedia is built on.

In an interview with NEH Chairman Bruce Cole in the March/April edition of Humanities, Wikipedia founder Jimmy Wales tries, unsuccessfully, to explain his position on this particular issue. When asked to discuss the differences between Wikipedia and the Encyclopedia Britannica, one of the differences Wales points to is the question of size and scope. As he correctly points out, Wikipedia is and will remain much greater in size and scope than any print encyclopedia, both because it can grow larger with only a small incremental increase in overhead and because the community-generated nature of its content makes it much easier to generate lots more content. “Certainly, you can find articles in Wikipedia that you would never expect to find in Britannica simply because of their cost structure,” Wales says in the interview (50).

Those unexpected entries include those Wales described as “fairly trivial topics” but, he continues, “the point is, well, why not cover fairly trivial topics if we have the resources to do it?” (50). When asked how Wikipedia judges what is and isn’t trivial, Wales gives the example of the obscure pop band that never charted and says such an entry would be cut before one on Thomas Jefferson. Big surprise there.

But then he goes on to say, “On the other hand, ‘wiki’ is not paper. There’s never a reason to say there’s not enough space or something is just taking up space; it does not take up any space that matters.” (50)

Okay, now I’m confused. Articles like the one on the obscure pop band might have to be cut before those on Thomas Jefferson. If such a trade-off (the OPB vs. TJ) were being considered by Wikipedia’s editors and “notability” is the standard, what other possible reason could there be but server space? After all, Wikipedians love to cite the tiny staff of the organization and the volunteer editor model. So it isn’t staffing overhead that is the problem.

Wales does offer the example of the elementary school entry. He says that in general entries on elementary schools are rejected because verifiable information on the schools is hard to come by. Jimmy Wales (and all the Wikipedians who say such information is not available) very obviously does not have elementary school aged children in the U.S.! Here in Virginia we are overloaded with verifiable information about elementary schools–everything from the history of the schools, to the names of all the staff, to the percentage of children on subsidized meals, to the students’ scores on various standardized tests.

Okay, he picked a bad example to make his point.

I’ll accept that for an entry to stay up, it ought to be built on verifiable information. Shouldn’t that be the standard instead of notability? So, for instance, if he hadn’t founded first Nupedia and then Wikipedia, would Wales himself have been notable enough for his role as a purveyor of “guy searches”? Or would he have been just another B-level Internet boom guy who made a pile of cash in the go-go days of the boom and then rode off into the sunset in his Beemer ragtop?

Fortunately for all those B-level boom guys (and gals), there seems to still be space for them in Wikipedia.

12 thoughts on “How Big is Too Big?”

  1. Sage


    I don’t know if you got my email, but I’d like to submit the “Wikis” section of your blog to the blog aggregator. I think many Wikipedians would find your occasional wiki-related posts very useful. Do you object to this?

    I think your take on notability and verifiability is right on (something I’ve blogged about before). It seemed for a little while like there was going to be a big change in Notability policy, but the forces have won out, sadly.

  2. tkelly7 Post author

    By all means, feel free to submit my posts to the blog aggregator.


  3. Seraphimblade


    Actually, notability has to do with quite a bit more than server space. Deletion makes no difference to server space: “deleted” articles are kept on the server in case it’s decided to undelete them at some point, and the deletion log entry actually takes up a little -more- space. So deleting an article frees up no space at all, and actually uses a tiny bit more.

    What it is, is an easy guide to “what we should have articles on”. To some degree, this is a staffing issue: every article we have requires vandal patrolling, fact-checking, eventual improvement, and the like. Without a good deal of source material, this is effectively impossible. If no one provides a source to state that John Doe was convicted of murder in 1990, how can anyone possibly know if that’s true or not?

    But, it comes down to more than that, as well. We’re looking to create an encyclopedia. There are several things we’re specifically not looking to create, such as a directory, or an indiscriminate collection of information. (That policy has been around as long as I can remember, and as far as I know, for several years.)

    Yes, we can cover a tremendous amount more than Britannica. But there’s a difference between “expanded standards” and “no standards whatsoever”. Without any standards at all, we become a glorified Myspace. With standards that are too tight, we become a glorified Britannica. Yes, it is a tough balancing act to stay between those extremes. But I think it’s one we’ll pull off.

    And it is, after all, an open project. Your input is always welcome!

  4. tkelly7 Post author

    Hi Sage:

    I’ll see what I can figure out about a “wiki only” feed.


  5. tkelly7 Post author

    Hi Sb:

    I knew, of course, that server space was not the real issue, which is one of the reasons I found it interesting that Wales used that as a reason for deletion of entries in his interview with Bruce Cole.

    But I still wonder about the Notability Guidelines. Why not let the community decide instead of a smaller group of editors? For example, if an entry sees little to no traffic over a one-year period, would that be a sufficient test of “notability”? Or should the “long tail” prevail, so that anything that gets hit at least once a year stays?


  6. Seraphimblade

    Well, keep in mind, of course, that this is just my way of thinking on it. But the way I see it, it’s an issue of scope.

    Wikipedia aims to be an encyclopedia. In that sense, it can’t be everything to everyone. We certainly aim for comprehensiveness, but in an encyclopedic sense. A lot of the stuff we delete is “Jane Doe is a 10th-grade student at Somewhere High School. She gets really good grades and likes this, that, and something else…”. Privacy issues aside, there’s already Myspace for that type of thing, and they’re far better equipped to handle it. One of our core principles is verifiability, and, really, we just can’t verify that type of stuff. A good bit of it is also spam of one form or another. Given that you blog, I imagine that you’ve dealt with that particular scourge yourself.

    However, it’s also a question of scope. We actually do have quite a few sister projects that specialize in a lot of things Wikipedia specifically doesn’t do. For example, we don’t do articles which are just a dictionary definition, but we have a sister project, Wiktionary, which is there for exactly that purpose. We don’t take how-to guides, recipes, or textbooks, but our Wikibooks and Wikiversity projects are geared toward exactly that. While we sometimes do have articles on events in the news, we don’t do every news story that comes down the pipe. But we do have Wikinews, which is set up to do exactly that.

    Wikipedia, out of all those projects, is by far the largest. If we started accepting that stuff, we would cannibalize those sister projects. And I don’t see that as a good thing, because, while the whole thing is very much a work in progress, I can certainly see them all working together. For an example, let’s take an apple.

    If you just want to know what the dictionary says an apple is, you head to Wiktionary. If you’d like a more encyclopedic treatment, that touches on the history of apple cultivation, tells you what areas of the world grow the most apples, what different varieties are out there, and so on, you visit the Wikipedia article. If you want a recipe for apple pie or a guidebook to caring for apple trees, Wikibooks is your stop. If you’d like a textbook on how to set up a commercial apple operation, Wikiversity is where you want to go. And if you’d like to see the latest news on an apple farmers’ strike, you head to Wikinews.

    As the projects better integrate and interlink, and as the sister projects come up to speed, I think this picture will become a reality. But in the meantime, even though Wikipedia has grown the fastest, and even though it has a tremendous scope, we best serve that project and others by making sure we don’t step too far outside that scope. The notability guidelines are often a bit clunky, and even within Wikipedia’s community can be somewhat contentious. But for now, I see them as a pretty good way to remind us to stay on target, and to include a whole lot of things while reminding us we’re not going for including everything.

  7. David Gerard

    Seraphimblade lacks a bit of history here. The notion of “notability” on Wikipedia was invented on Votes For Deletion as pretty much an excuse for subjective distaste.

  8. tkelly7 Post author

    Hi Sb:

    I like the apple example, because it lays out the differences between the projects quite clearly. And, of course, who wants to find entries on 10th graders who wrote about themselves (or convinced their friends to do it for them)?

    But the two examples Wales cited in the interview I blogged about are instructive, I think. The first was the rock band that never became famous. Why delete them for lack of notability? Speaking as an historian, if I were writing a history of Rock and Roll, I would find such entries invaluable. We’re drowning in information about the Beatles and the Rolling Stones, but what about the “Insect Surfers”? If I were that historian, I’d be thrilled to find more information about more obscure bands who really only became well-known in their local club circuit.

    The other example Wales cited was the elementary school entry. In the interview, he said that Wikipedia’s editors had essentially drawn a line between middle schools and elementary schools (allowing for exceptions here and there), in part because verifiable information about elementary schools was too hard to come by. This statement (about the lack of verifiable information) is simply wrong, as I wrote in my post, and so it left me wondering if there was some other reason. Don’t get me wrong, I don’t think elementary school entries have much interest value, if any, but Wales was so blatantly off the mark on that one that it got my antennae twitching (once upon a time I was a reporter).

    So I’m happy to agree that the entries on the truly obscure 10th graders should be shunted to other projects (like the MemoryWiki) and that current recipes (not historical ones–foodways are of increasing interest to historians) should be in a CookWiki or whatever it might be called. But I remain unconvinced that the Long Tail should be chopped off category by category, e.g., elementary schools or rock bands that never charted.


  9. tkelly7 Post author

    Hi David:

    Thanks for pointing me to the discussion page you cited. A very interesting read to say the least. I have to say, I chuckled at the “firehose of crap” formulation. I’m familiar with that problem, albeit in more of a “garden hose” context, since we have a number of open archive projects that invite visitors to leave their thoughts, images, digital movies, etc., etc., on a particular subject. Cleaning our way through what shows up in the inbox for any one of these archives each day is quite an exercise when the traffic is high.


  10. Seraphimblade

    Actually, it looks like an article on the Insect Surfers just may be possible. With a quick look, it appears they’ve been reviewed quite a few times, even in the LA Times. I’ll have a better look later, but that article just may be workable.

    We do have a lot of articles on smaller bands, that’s not a problem. Usually, the band articles that end up getting deleted are the ones there’s -really- nothing on aside from their own Myspace and website. We’ve got articles on quite a few bands you’ve never heard of.

    Elementary schools are a bit stickier. We do have some good, well-referenced articles on those, and it might even become possible someday to write such articles on all of them. It may turn out, though, that those are something we can better arrange in a “list by district” format, splitting out separate articles on the few that have a tremendous amount of information available. Really, they tend to be more a question of arrangement than inclusion. I don’t recall anyone objecting to a “list of schools in Somewhere School District” article, with redirects from the school pages. But in that case, the information’s still there, so our hypothetical future researcher can still get at it.
