Thursday, May 31, 2007

Undergraduate Research Engine via Google Custom Search Engine

The coolest thing to come out of Google in the past year is the Custom Search Engine. Libraries, organizations, and individuals can now give their searchers customized search results that point them to selected sites. Google’s latest Librarian Central newsletter has a couple of good articles about the Custom Search Engine: “Editorial Value Meets Algorithmic Search” and “Google Custom Search Engine: A Powerful Tool for Knowledge Experts.”
Inspired by ALA’s custom search engine, Librarian’s E-Library, I decided to create a custom search engine for my students called Undergraduate Research Engine. Its purpose is to help them find free, authoritative sources online that are appropriate for college-level work. While I designed it specifically for my students, it is generic enough to be used by lower-level students (maybe even upper-level students) at other colleges. In fact, I have been using it for my own searches because I find the results more trustworthy.


Sites Searched by the Undergraduate Research Engine
I set it up to search the entire Google index but to rank the listed sites higher. At the moment, it includes 44 sites. However, this number is misleading, because I told it to include all .gov and .edu sites. I realize some .gov and .edu sites aren’t as good as others, but a large number of the best sites out there fall into these 2 categories. Entering all the good .edu and .gov sites individually would be very time-consuming and would use up the 5,000-site limit. I trust the Google algorithm (to a degree) to keep poorer sites further down the list. I did not include .org sites because there are too many non-profits with agendas; the infamous MartinLutherKing.org site is an obvious example.
This means all the good .org sites, as well as good .com, .net, and other sites, need to be entered individually. That is why I have opened up the engine to volunteer contributors. The more librarian-selected sites are submitted, the better the search engine will be. To volunteer, go to the Undergraduate Research Engine and click “Volunteer to Contribute.”
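To make the “search the whole index, but rank listed sites higher” setup concrete, here is a minimal Python sketch. It assumes the results arrive as a list of URLs already in Google’s order and simply moves URLs on listed sites (individually listed hosts, plus anything under .edu or .gov) ahead of the rest, keeping the original order within each group. Google’s real boosting is not public and is almost certainly not a strict partition like this; the function names and the placeholder hosts are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for individually listed sites (not the engine's real 44-site list).
LISTED_HOSTS = {"www.example.org"}
# Whole domains included with a wildcard, mirroring the .edu/.gov decision.
LISTED_SUFFIXES = (".edu", ".gov")


def is_listed(url: str) -> bool:
    """Return True if the URL's host is a listed site or falls under a listed domain."""
    host = (urlparse(url).hostname or "").lower()
    return host in LISTED_HOSTS or host.endswith(LISTED_SUFFIXES)


def rerank(results: list[str]) -> list[str]:
    """Move listed sites ahead of unlisted ones, keeping the original order inside each group."""
    listed = [u for u in results if is_listed(u)]
    unlisted = [u for u in results if not is_listed(u)]
    return listed + unlisted
```

Unlisted sites are not dropped, only pushed down, which is what lets good sites that nobody has submitted yet still show up.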


Why does this engine exclude Wikipedia?
So far, I have excluded 2 sites from the search results: MartinLutherKing.org and Wikipedia. I don’t think MartinLutherKing.org needs an explanation. I excluded Wikipedia because, although I tell my students it is okay to get background information from Wikipedia, it isn’t appropriate to cite it, yet Google frequently lists it as a top site in search results. We talk about how their teachers look at their Works Cited lists to see how well they did their research. Once the list of submitted sites is built up, I may remove Wikipedia from the excluded list, depending on the feedback I get. While I trust Google’s algorithm, it might be necessary to exclude other inappropriate sites that rank highly in search results.
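Excluding a site works differently from ranking: its pages are removed from the results entirely. The Python sketch below illustrates that idea under one stated assumption, namely that excluding a domain also excludes its subdomains (so en.wikipedia.org goes along with wikipedia.org). The function names are hypothetical, and this is not how Google implements exclusions internally.

```python
from urllib.parse import urlparse

# The two domains excluded in this post; subdomains are assumed to be excluded too.
EXCLUDED_DOMAINS = {"martinlutherking.org", "wikipedia.org"}


def is_excluded(url: str) -> bool:
    """Return True if the URL's host is an excluded domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def drop_excluded(results: list[str]) -> list[str]:
    """Remove excluded sites from a result list, leaving the remaining order unchanged."""
    return [u for u in results if not is_excluded(u)]
```

Checking for a leading dot keeps a look-alike host such as notwikipedia.org from being excluded by accident.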


More Custom Search Engine Features
Limits or Refinements: You can also create customized refinements to help your users limit and refine their searches. You can either tag submitted sites with refinement labels or allow Google to search the results for the chosen refinement term. Since our students frequently have persuasive or argumentative assignments, I created 2 refinements: Ethics and Viewpoints. A student searching for “organ transplantation” will see these 2 refinements at the top of the results and can select one to narrow the search. I will add more refinements as I think of them.
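The first option, labeling sites, can be pictured as a simple filter: choosing a refinement keeps only results from sites tagged with that label. This Python sketch covers only that labeling approach (not Google’s keyword-based alternative). The site-to-label mapping uses placeholder hosts and the function name is hypothetical; only the Ethics and Viewpoints labels come from the post.

```python
from urllib.parse import urlparse

# Hypothetical labels attached to submitted sites; the hosts are placeholders.
SITE_LABELS = {
    "ethics.example.edu": {"Ethics"},
    "debates.example.org": {"Viewpoints"},
    "bioethics.example.org": {"Ethics", "Viewpoints"},
}


def refine(results: list[str], label: str) -> list[str]:
    """Keep only results from sites tagged with the chosen refinement label."""
    kept = []
    for url in results:
        host = (urlparse(url).hostname or "").lower()
        # Unlabeled sites get an empty label set, so they never survive a refinement.
        if label in SITE_LABELS.get(host, set()):
            kept.append(url)
    return kept
```

A site can carry several labels, so a good bioethics site can appear under both Ethics and Viewpoints.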


Add to your website: Google also provides a gadget to add a custom search engine to your own home page. You can get the code for Undergraduate Research Engine at gmodules.com/ig/creator?url=http%3A%2F%2Fwww.google.com%2Fcoop/api/018242335511370675646/cse/favv6aufpec/gadget. Go to amyferguson.net to see what this looks like.


Tips for Adding Sites
So, you have set up your own custom search engine, or maybe you volunteered to contribute to Undergraduate Research Engine, and you are ready to add sites. Google Documentation provides some guidance at www.google.com/coop/docs/cse/sites.html, but here are my 2 tips:
  • Use URL Patterns: I used the patterns *.edu/* and *.gov/* to add all .edu and .gov sites to the list. You can also use this kind of pattern to match subdomains: *.bbc.co.uk/* will search www.bbc.co.uk and news.bbc.co.uk. A similar pattern, www.example.com/*, searches all pages of a single website.
  • Use the Google Marker: Add the Google Marker to your bookmarks or links toolbar. Then, each time you visit a site you would like to add to the search engine, click the Google Marker to add it. There is no need to open the search engine’s control panel to add sites.
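If you want to check which pages a URL pattern will cover before adding it, the Python sketch below approximates the wildcard behavior described in the tips with shell-style matching from the standard fnmatch module. It is an assumption-laden approximation, not Google’s documented matching rules: the pattern is split at its first slash into a host part and a path part, and a bare bbc.co.uk (no subdomain) would not match *.bbc.co.uk/* here. The function names are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def matches(pattern: str, url: str) -> bool:
    """Approximate a CSE-style URL pattern such as '*.edu/*' with shell-style wildcards.

    The pattern is split at its first '/' into a host part and a path part, so a
    '.edu' buried in some other site's path cannot satisfy the host part.
    """
    host_pat, _, path_rest = pattern.partition("/")
    path_pat = "/" + path_rest
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"  # treat a bare domain as its home page
    return fnmatchcase(host, host_pat.lower()) and fnmatchcase(path, path_pat)


def matches_any(patterns: list[str], url: str) -> bool:
    """Return True if the URL matches at least one pattern in the list."""
    return any(matches(p, url) for p in patterns)
```

For example, matches("*.bbc.co.uk/*", "http://news.bbc.co.uk/2/hi/") is true, while matches("www.example.com/*", "http://shop.example.com/") is false because the host part names only the www subdomain.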
Volunteer to Contribute
If you would like to help make Undergraduate Research Engine the best engine it can be, volunteer to contribute by visiting www.google.com/coop/cse?cx=018242335511370675646%3Afavv6aufpec and clicking “Volunteer to Contribute.”

3 comments:

Tom Kaun said...

Hey, Amy.
I'm a high school librarian who, after a visit to Google this spring, started adding Google custom search engines to my web pages. Unlike your CSE, mine are specific to a particular topic. So far I have created one for the U.S. Congress (for a government mock Senate project), one on drugs and drug abuse (for a 9th grade Social Issues class), mathematics (for advanced projects), the Vietnam War (for U.S. History), a general curriculum search engine for teachers, and a propaganda CSE for a world history class project. My CSEs vary, but in most cases I chose the option to search only the selected sites. I also use a CSE to search my own website, Redwood High School's Bessie Chin Library.
All of these are public CSEs and I'd love it if folks added contributions and if there was an easy way to find other useful CSEs out there which I could link to from my site.
BTW, your blog posting came up in my Google Alert for "information literacy." I'm always finding interesting things with my Google Alerts.

Amy said...

Tom,
Will you please share the links for your search engines? I would love to try them out.
Thanks,
Amy

Tom Hoffman said...

Hi Amy,
martinlutherking dot org remains highly ranked by Google in large part because librarians like yourself keep linking to it, using your authority to increase its PageRank (in Google's eyes). Please remove the link above or use the rel="nofollow" attribute in the relevant anchor tag.