Posts Tagged ‘Google’


One problem with being in library school is that people assume that I know how to find things. The other day my friend Adam set me to work finding a particular image of a dog that he had seen at some point, somewhere on the internet. Things he was able to tell me about said image included the fact that the dog was sitting on a couch, may have been a terrier of some kind, and looked “self-satisfied.” I, on the other hand, did not. The search was not a success. The other problem with being in library school is that you feel like a failure when you don’t know how to find things.

I’ve only completed a semester and change of my MLS program, so I shouldn’t be too hard on myself, though I imagine there will be times much later on in my career when I will have to admit defeat. It will probably suck then too.

My inability to find a doggy picture to satisfy Adam was probably not due to a lack of knowledge on my part. I would know where to look for images of, say, Rembrandt drawings, but the resources in place for finding pictures of self-satisfied terriers are limited. Google image search is actually pretty great, but it still relies on standards-less, user-created metadata—if you can even really call it that.

The thing is, it’s a little difficult to imagine how it could be improved. Even if there were some kind of reliable image indexing or cataloging in place, one man’s “self-satisfied” is another man’s “serene.” Tagging is a possibility and works relatively well within smaller image collections, like the Brooklyn Museum’s, but I can’t see how it would work on such a large scale. The semantic web would certainly make this kind of searching much more feasible. Imagine being able to search for an image by subject, and then by attribute of that subject. Imagine a computer that knows what you mean by “some kind of terrier.” Definitely interesting to think about, but we’re not there yet.

Anyway, I did find this guy, and I think he is rad. I prefer my pooches forlorn-looking, I guess.


Addendum: Boyfriend contributes: “Flickr.” That too.


Just Google It



Yesterday, a friend from college sent me this text: “Think I broke the spirit of the librarian at my internship because he couldn’t find something on the database and I was like can we just Google it and it worked.”

I’ve had a lot of conversations in library school about the relevance of and need for librarians in the age of search engines. Obviously, I think there is still a place for information professionals; if I didn’t, I wouldn’t be so intent on becoming one. We can and will serve as navigators, evaluators, educators in information literacy, content creators, and more. But I object to the reflexive, slightly desperate way in which some people in the field defend themselves against real or imagined accusations of obsolescence. There is some tendency to over-compensate, to insist that any information that is widely available and easily found has little value, and that the only sources that are really worthwhile are those that require us, the librarians, to act as guides to them. The fact is, though, that there are times when Google is the way to go.

Generally, however, journals and databases are enormously valuable resources. But I have a real issue with them, or at least with their current distribution model, which shuts out people without connections to an institution that can afford the ridiculously expensive subscription fees. Even if money is no object, it is frequently impossible to get an individual subscription. I find it really problematic that so much information—the very information that we as a profession insist is really valuable—is made inaccessible. There are a small number of public institutions (bless you, NYPL) that do offer access to certain subscription-only resources, and a handful of open-access journals and repositories (like Harvard’s DASH), but these aren’t the norm. I feel strongly that these models need to become the standard. Aren’t we the ones who say that information wants to be free? Open access and other solutions to this problem are already being widely discussed. What I haven’t heard anyone mention is the conflict of interest that arises when librarians, who claim to strive for “equitable access” (see the 1st statement of the ALA’s code of ethics), continue to push resources that by nature create an inequality of access.

Image: Knuckles, with design both topical and subcutaneous, of the lovely and talented Jess Versus.

Working for The Man

I recently wrote a paper for an Information Technologies class on OCR, or Optical Character Recognition: software that allows a computer to “read” text. It works fine for things printed in the past fifty or so years, but is pretty useless when it comes to older stuff. Yellowed pages, faded text and old typefaces still confound technology. Enter reCAPTCHA, which uses crowdsourcing to convert the text in these documents to digital (searchable, cut/copy/pastable, etc.) text. Everyone has encountered CAPTCHAs, the tests ticketing websites and the like give us to prove we’re not spambots. Many CAPTCHAs use randomly generated jumbles of letters and numbers as challenges, but reCAPTCHA uses words from old books and newspapers that OCR can’t identify. More specifically, it uses one word that has been identified and one that hasn’t. If you correctly type the one the computer knows, it assumes you’re also right about the unknown word. The program waits until a word has been keyed in the same way by at least three people, at which point it considers the word identified.
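For the curious, the matching logic described above can be sketched in a few lines of Python. This is only my rough illustration of the idea, not reCAPTCHA's actual code: the class name, the word IDs, and the choice to ignore case and surrounding spaces are all assumptions of mine, and the sketch simply trusts that each submission comes from a different person.

```python
from collections import Counter, defaultdict

# "At least three people", per the description above.
AGREEMENT_THRESHOLD = 3


class WordConsensus:
    """Hypothetical tracker for transcriptions of words OCR could not read."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        # unknown word id -> Counter of the transcriptions typed so far
        self.votes = defaultdict(Counter)
        # unknown word id -> transcription that reached the threshold
        self.resolved = {}

    @staticmethod
    def _normalize(text):
        # Assumption: case and surrounding whitespace do not matter.
        return text.strip().lower()

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Record one challenge attempt; return True if the user passes."""
        # The known (control) word decides whether the user is trusted.
        if self._normalize(control_typed) != self._normalize(control_answer):
            return False
        # A word that is already identified needs no further votes.
        if unknown_id in self.resolved:
            return True
        guess = self._normalize(unknown_typed)
        self.votes[unknown_id][guess] += 1
        # Once enough people have keyed the word the same way, accept it.
        if self.votes[unknown_id][guess] >= self.threshold:
            self.resolved[unknown_id] = guess
        return True
```

A failed control word is thrown out entirely, so a bot mashing keys never gets a vote on the mystery word. Only humans who pass the known half get counted on the unknown half.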

Pretty cool, right? Crowdsourcing works! We are preserving information and making it accessible! These ubiquitous online challenges, which are merely irritating when you get them right, and infuriating when you don’t (I am not a robot, goddamn it!!), are actually serving the greater good!

Or are they? reCAPTCHA is the brainchild of Luis von Ahn, a Carnegie Mellon professor, but since 2009 the program has been owned and controlled by Google. The words that we identify are slowly but surely contributing to the digitization of the archives of the New York Times and the Google Books project. Helping out the evil empire that is Google always made me slightly uneasy, but how am I supposed to feel about it now that they are in the business of selling e-books? As far as I can tell, the books they’re selling in their new eBookstore are not the same texts that reCAPTCHA is helping to digitize. But this is not a voluntary program, the way Wikipedia is: we are basically forced to take part in it if we want to continue our day-to-day business online, and it is serving a for-profit entity. Frankly, it feels a little sinister to me.

When I first found out about reCAPTCHA, I was surprised that Google wasn’t making more of an effort to publicize the project. Wouldn’t they want people to know that their time and effort weren’t being wasted every time they had to enter a string of letters into a textbox? Now, though, I understand why they’re not shouting about it from the rooftops. They’ve essentially turned everyone with an internet connection into an unpaid laborer without their even knowing it.

There is one thing to come out of the reCAPTCHA project that I have only good feelings about: CAPTCHArt. This is a website of comics that people have created based on the challenges. It is random, childish, often inappropriate, and delightful. See below.