Can we trust an anonymous, collaborative, unprecedented Internet-scale encyclopedia such as Wikipedia?
Note: This entry was initially going to be a comment on the Cognitive Daily post, “Is there wisdom in crowds?” But as you can see, it is far too long to be a comment.
The precursor to Wikipedia was Nupedia. It was run by Larry Sanger, a proper epistemologist. It had a seven-point review process, as stringent as the Encyclopedia Britannica's, if not more so. Over three years, they finished around 25 articles. Relative to Wikipedia, Nupedia was a dismal failure. There are many lessons to be learned from this. Wikipedia is successful in part because it is able to tap into the wisdom of the crowds, as discussed at Cognitive Daily. It is also true that there is a sort of evolutionary process at work; memes, if you will.
A researcher from UC Santa Cruz, Luca de Alfaro, recently came up with a metric for judging text quality called Trust (http://trust.cse.ucsc.edu/). Although the algorithm is complicated, it boils down to this: If you are a piece of text that is written by an author who tends to write text that tends to stay in the encyclopedia for a long time, and you have been in the encyclopedia for a long time, then you have a high Trust score. Luca has provided an interesting demo at the above URL, which you should try.
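To make the intuition behind the Trust score concrete, here is a minimal Python sketch. It is not the actual UCSC algorithm, which is considerably more involved; it only illustrates the two ingredients described above. Everything in it is an assumption made for illustration: the names `TextFragment`, `author_reputation`, and `trust_score`, the one-year `horizon_days` cap, and the choice to multiply the two factors together.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TextFragment:
    """A piece of article text (hypothetical structure, for illustration only)."""
    author: str
    age_days: float  # how long this fragment has survived in the article


def author_reputation(survival_days, horizon_days=365.0):
    """Average survival of an author's past text, scaled to the range [0, 1].

    survival_days: how long each of the author's earlier contributions lasted.
    horizon_days:  assumed cap, so one very old fragment cannot dominate.
    """
    if not survival_days:
        return 0.0  # an unknown author starts with no reputation
    return mean(min(d, horizon_days) / horizon_days for d in survival_days)


def trust_score(fragment, history, horizon_days=365.0):
    """Toy trust score: author reputation multiplied by the fragment's own longevity.

    history maps an author name to the survival times of that author's past text.
    """
    reputation = author_reputation(history.get(fragment.author, []), horizon_days)
    longevity = min(fragment.age_days, horizon_days) / horizon_days
    return reputation * longevity


if __name__ == "__main__":
    history = {
        "alice": [400.0, 300.0, 365.0],  # text that tends to stay
        "mallory": [1.0, 2.0, 0.0],      # text that tends to be reverted quickly
    }
    for fragment in (
        TextFragment("alice", 365.0),
        TextFragment("alice", 10.0),
        TextFragment("mallory", 365.0),
    ):
        print(fragment.author, fragment.age_days, round(trust_score(fragment, history), 3))
```

In this toy version, long-lived text by a reliable author scores highest, fresh text by the same author scores lower until it has survived for a while, and text by an author whose edits are usually reverted scores low no matter how old it is. Multiplying the two factors is only one possible way to combine them.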
I have also done some work on predicting the quality of Wikipedia articles (pdf). This is a difficult machine learning task because of a particular point that I would like to add to this conversation: The encyclopedia is mostly unwritten. There is, as of yet, very little high-quality text to train on. It is a work in progress! If you click on the Random Article link, you have a roughly 75% chance of landing on a “stub,” which is a very short article that has yet to receive significant community attention. Keep in mind that the encyclopedia was started in 2001, and is fast approaching ten million articles in roughly 250 languages. The combined fortunes of Bill Gates and Warren Buffett could not have hired enough people to write as much as its volunteers have. And yet, the encyclopedia is mostly unwritten.
Nobody knows the failings of Wikipedia better than the Wikimedia Foundation (WMF), which runs the encyclopedia, as evidenced by the Wikipedia article on “Criticism of Wikipedia.” In my view, quality control in an Internet-scale collaborative authoring system with anonymous authors is an unsolved problem. No single researcher can solve it. Somewhat akin to the game Nomic, the rules of governance and quality are ever changing, albeit collaboratively determined, in much the same way that the encyclopedia is written. I see no reason to discredit such a process before it has had a reasonable amount of time to run its course. None of us knows what will emerge from it, but it is certainly exciting. New experiments in quality control methods are being conducted by the WMF, but they are just that: experiments.
Epistemologists love to sit around and critique the encyclopedia. I’m sure it’s no coincidence that it tends to have low-quality philosophy articles. But its merits seem to outweigh any significant criticisms. During midterms and finals, walk into a busy computer lab on your campus and take a brief census of the number of students scanning Wikipedia articles. We use it as a first source of information on our lab mailing list. I have caught professors giving lectures based on information in Wikipedia articles. I use it myself and find it immensely useful. While there is certainly significant difficulty in judging the veracity of the text contained within, students should not be taught to avoid the encyclopedia and go find an ‘authoritative’ primary or secondary source. They should be taught to judge critically all of the information they digest. Educators and institutions from previous generations who are not yet on board with the 21st century may never have this realization. But the next generation of students, who are growing up without a good reason to reference a printed encyclopedia such as Britannica, will recognize how valuable Wikipedia is. In fact, they won’t recognize Wikipedia as being any different from anything else, except, perhaps, more useful.
If anything, Wikipedia is disillusioning. It is removing the bias in our educational system that, when you want “facts,” you should go to the library, because everything you find written there is true. This isn’t the exact message being taught in grade school, but it’s certainly the impression I had growing up, before I knew any better. I suspect I’m not alone… As long as we continue using Wikipedia, the evidence seems to be that it will only get better. Britannica, on the other hand, is for the most part stagnant. Once a factual error has been printed, that printed copy can never be corrected. So why not continue by supporting the encyclopedia that was started with the goal of making all human knowledge free, for all humans? That’s what I plan to do…
One Comment
Ned Goudy
I share your interest and awe in the phenomenon which ‘is’ Wikipedia.
I am a voracious reader, and despite years of education could probably never have the patience or the accurate language skills to be a gatekeeper of the collective truth of humanity… which Wikipedia IMHO is shaping up to be.
I stumbled onto a way in which such gatekeepers could be studied: in terms of the pages they maintain, their credentials, and their part in page disputes and edits. My entrance to this was the page for “Theodicy.” Heated disputes from a variety of sources showed me the care that apparently enlightened, or at least knowledgeable, people put into posting to Wikipedia.
It appears that the tabs labeled “discussion” and “history” at the top of every page, for me, offer a window into the psychographic and other characteristics of the people who maintain and dispute the pages, as well as some of the ‘pirates’ who have hacked them.
This makes me more comfortable with drawing my own conclusions, particularly on those pages dealing with the social sciences. The hard science pages look plausible to me too, but many, if not all, are, as they say, “Greek to me…”
I would be interested in the cross-referencing of some of the math, set theory, game theory and statistical pages with theories in the social sciences and their application to modern problems.
All in all, I am impressed with this interesting experiment in cyberspace.
If we could approach the allocation of scarce resources on our planet the way we do free knowledge, we would go a long way toward extending the event horizon of the human species…
While I don’t consider myself to be a dummy, it would be nice if some of the hard science stuff were made more readable for smaller, or less scientifically trained, minds.
Everyday examples would be nice, too. Einstein, for example, had his metaphor of the man on a moving train, IIRC, playing catch with a friend at the side of the tracks, to describe the theory of relativity. These kinds of metaphors would be instructive throughout Wikipedia.
Just my two cents…
Ned
Oct 19th, 2007