Friday, November 7, 2008

Reading Notes for Week 10: Nov. 11

Web Search Engines: Part 1 and Part 2 by David Hawking

The main point of these two articles is the nature and infrastructure of a search engine. Oddly enough, the author believes that search engines should not index every web page. From what I understood, indexing every page slows down the search and the crawler keeps fetching "low-value" pages; however, indexing has proven to be an effective strategy for finding information. I did not find his arguments convincing, and crawling sounds a lot like indexing to me, which leads me to Part 2 of his article.
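To keep the two straight in my own head: crawling is the part that fetches pages and follows their links; indexing is what happens to the text afterwards. Here is a rough sketch of the crawling half only, nothing from Hawking's actual system; the seed URL and the page limit are placeholders I made up.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    """Breadth-first crawl: fetch a page, queue its links, repeat."""
    frontier = deque([seed_url])   # URLs waiting to be fetched
    seen = {seed_url}              # avoid fetching the same URL twice
    pages = {}                     # url -> raw HTML, handed off to the indexer
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue               # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages

# pages = crawl("http://example.com/")  # placeholder seed URL
```

The crawler only decides *which* pages to download, which is where the "don't fetch low-value pages" argument lives; it hands the text off to the indexer rather than indexing anything itself.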

In Part 2 he feebly attempts to explain the indexing algorithms. I got to the second paragraph and reread it over and over again; it was difficult to comprehend.
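For my own benefit, the step that tripped me up boils down to building an "inverted index": a table that maps each word to the documents containing it, so a query only has to look up its terms instead of scanning every page. A toy version of that idea (the sample documents are mine, and a real engine adds stemming, stop words, ranking, and so on):

```python
import re
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return documents containing every query term (AND semantics)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Toy collection, purely for illustration.
docs = {
    1: "Web search engines crawl and index billions of pages",
    2: "Metadata harvesting uses Dublin Core records",
    3: "Crawling fetches pages; indexing makes them searchable",
}
index = build_inverted_index(docs)
print(search(index, "indexing pages"))   # -> {3}
```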

Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting by Sarah L. Shreeves, Thomas G. Habing, Kat Hagedorn, and Jeffrey A. Young

An interesting article that discusses current developments in the Open Archives Initiative and its projects. For example, the Protocol for Metadata Harvesting is a tool developed by the initiative to facilitate interoperability between collections that use different metadata standards, built on HTTP, XML, and Dublin Core.
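As I understand it, a harvester just sends ordinary HTTP requests to a repository's base URL with a "verb" parameter and gets back XML records, with simple Dublin Core as the common denominator. A rough sketch of a single ListRecords request; the base URL below is a placeholder, not a real repository:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(base_url):
    """Issue one OAI-PMH ListRecords request and pull out the Dublin Core titles."""
    params = urllib.parse.urlencode({
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc",   # simple Dublin Core, the format every repository must support
    })
    with urllib.request.urlopen(f"{base_url}?{params}", timeout=10) as resp:
        tree = ET.fromstring(resp.read())
    titles = []
    for record in tree.iter(OAI + "record"):
        for title in record.iter(DC + "title"):
            titles.append(title.text)
    return titles

# harvest_titles("http://example.org/oai")  # placeholder base URL
```

A full harvester would also follow the resumptionToken the repository returns when a result set is split across responses, but the single request above is the basic shape of the protocol.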

3 comments:

Justin Charles Hite said...

Algorithms aren't fun. But it is interesting because I took a math class for liberal arts majors and we learned efficiency algorithms, and computers usually use the same ones to do their work. I think that crawling could be considered selective indexing, if that makes his argument make more sense.

spk said...

yeah, getting to read about dublin core again makes me giddy with excitement. I wish that i could pay 25 grand all over again in hopes of reading "dublin core" somewhere, anywhere. this is really the best of all possible worlds.

Justin Charles Hite said...

I think that "Dublin Core" could be a new genre of Irish Hardcore.