Archive for the ‘Technology’ Category

Self-Unsatisfied

One problem with being in library school is that people assume that I know how to find things. The other day my friend Adam set me to work finding a particular image of a dog that he had seen at some point, somewhere on the internet. Things he was able to tell me about said image included the fact that the dog was sitting on a couch, may have been a terrier of some kind, and looked “self-satisfied.” I, on the other hand, did not. The search was not a success. The other problem with being in library school is that you feel like a failure when you don’t know how to find things.

I’ve only completed a semester and change of my MLS program, so I shouldn’t be too hard on myself, though I imagine there will be times much later on in my career when I will have to admit defeat. It will probably suck then too.

My inability to find a doggy picture to satisfy Adam was probably not due to a lack of knowledge on my part. I would know where to look for images of, say, Rembrandt drawings, but the resources in place for finding pictures of self-satisfied terriers are limited. Google image search is actually pretty great, but it still relies on standards-less, user-created metadata—if you can even really call it that.

The thing is, it’s a little difficult to imagine how it could be improved. Even if there were some kind of reliable image indexing or cataloging in place, one man’s “self-satisfied” is another man’s “serene.” Tagging is a possibility and works relatively well within smaller image collections, like the Brooklyn Museum’s, but I can’t see how it would work on such a large scale. The semantic web would certainly make this kind of searching much more feasible. Imagine being able to search for an image by subject, and then by attribute of that subject. Imagine a computer that knows what you mean by “some kind of terrier.” Definitely interesting to think about, but we’re not there yet.

Anyway, I did find this guy, and I think he is rad. I prefer my pooches forlorn-looking, I guess.

image

Addendum: Boyfriend contributes: “Flickr.” That too.

Working for The Man

I recently wrote a paper for an Information Technologies class on OCR, or Optical Character Recognition: software that allows a computer to “read” text. It works fine for things printed in the past fifty or so years, but is pretty useless when it comes to older stuff. Yellowed pages, faded text and old typefaces still confound technology. Enter reCAPTCHA, which uses crowdsourcing to convert the text in these documents to digital (searchable, cut/copy/pastable, etc.) text. Everyone has encountered CAPTCHAs, the tests ticketing websites and the like give us to prove we’re not spambots. Many CAPTCHAs use randomly generated jumbles of letters and numbers as challenges, but reCAPTCHA uses words from old books and newspapers that OCR can’t identify. More specifically, it uses one word that has been identified and one that hasn’t. If you type the one the computer knows correctly, it assumes you’re also right about the unknown word. The program waits until a word has been keyed in the same way by at least three people, at which point it considers the word identified.
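For the curious, here is roughly how I picture that two-word trick working. This is a toy sketch in Python, not Google’s actual code: the class name, the word IDs and the choice to ignore capitalization and stray spaces are all my own assumptions. The only parts taken from the description above are the known-word check and the three-person agreement rule.

```python
from collections import Counter, defaultdict

# "At least three people", per the description above.
AGREEMENT_THRESHOLD = 3


class WordPairChallenge:
    """Toy model of the two-word scheme (hypothetical, for illustration only)."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        # unknown word id -> tally of how often each spelling was typed
        self.votes = defaultdict(Counter)
        # unknown word id -> spelling that reached the threshold
        self.identified = {}

    def submit(self, known_answer, typed_known, unknown_id, typed_unknown):
        """Return True if the user passes the human test."""
        # Assumption: comparison ignores case and surrounding whitespace.
        if typed_known.strip().lower() != known_answer.strip().lower():
            # Wrong on the known word: fail the test and discard the guess.
            return False

        guess = typed_unknown.strip().lower()
        if unknown_id not in self.identified:
            self.votes[unknown_id][guess] += 1
            # Once enough people agree on one spelling, treat it as identified.
            if self.votes[unknown_id][guess] >= self.threshold:
                self.identified[unknown_id] = guess
        return True
```

Each call to submit stands in for one person answering one challenge. A wrong answer on the known word throws away that person’s guess at the unknown one, which is what keeps the spambots from polluting the tally.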

Pretty cool, right? Crowdsourcing works! We are preserving information and making it accessible! These ubiquitous online challenges, which are merely irritating when you get them right, and infuriating when you don’t (I am not a robot, goddamn it!!), are actually serving the greater good!

Or are they? reCAPTCHA is the brainchild of Luis von Ahn, a Carnegie Mellon professor, but since 2009 the program has been owned and controlled by Google. The words that we identify are slowly but surely contributing to the digitization of the archives of the New York Times and the Google Books project. Helping out the evil empire that is Google always made me slightly uneasy, but how am I supposed to feel about it now that they are in the business of selling e-books? As far as I can tell, the books they’re selling in their new eBookstore are not the same texts that reCAPTCHA is helping to digitize. But this is not a voluntary program, the way Wikipedia is. We are basically forced to take part in it if we want to continue our day-to-day business online, and it is serving a for-profit entity. Frankly, it feels a little sinister to me.

When I first found out about reCAPTCHA, I was surprised that Google wasn’t making more of an effort to publicize the project. Wouldn’t they want people to know that their time and effort weren’t being wasted every time they had to enter a string of letters into a textbox? Now, though, I understand why they’re not shouting about it from the rooftops. They’ve essentially turned everyone with an internet connection into an unpaid laborer without them even knowing it.

There is one thing to come out of the reCAPTCHA project that I have only good feelings about: CAPTCHArt. This is a website of comics that people have created based on the challenges. It is random, childish, often inappropriate, and delightful. See below.

 

image