Chapter 6
Find What You Want--Fast
(Web 101--Making the 'Net Work for You by Wendy Lehnert, Addison Wesley, 2003)

The challenge on the Web is to find what you are looking for. Using a search engine or meta search engine can be frustrating when your search query returns more than 1 million hits! With practice, you can become proficient enough that the first ten documents in the search engine's hit list are the ones you want to see. Some searches are difficult and require successive refinement, along with knowledge of how to frame your search query efficiently and effectively.

Searching
First, give thought to your search and prepare for it before you perform it. Try a variety of search engines and then pick your favorites. Search engines generally update their user interfaces--the look of the site--at least once a year to stay current and competitive. Each search engine has on-line documentation under a Help or Search hyperlink. A Search Engine Watch web page is listed at the end of this summary and at the end of the chapter. Before you perform a search, decide which kind of question you will use:

Voyager question
An open-ended, exploratory question. Use it when you are "just curious" about a subject. Examples include a cursory search to see whether your family name turns up any hits, or a search for general information about a college, a company, a city, a state, or a country. Voyager and Deep Thought questions may require longer search times before you find satisfactory information.

Deep Thought question
An open-ended question that is more specific than a Voyager question. Use this kind of question when you have a specific question in mind but the answers could cover a large number of possibilities. Examples of Deep Thought questions: "Are there lesson plans on the Web for science teachers?" "How are parents preparing to home school their children?" "How can I begin a genealogy search?"

Joe Friday question
A very specific question with a very specific, simple answer. Joe Friday questions ask about names, dates, locations, and other verifiable facts. Examples of Joe Friday questions: "What are the entrance requirements for a specific college or university?" "When was the personal computer introduced for consumer purchase?" "What will it cost me to fly from Dayton to Miami, FL, on a specific date?"

There are four types of Web resources available to help you find the answers to your types of questions:

Subject trees
"A hierarchically organized category of topics with lists of Web sites and online documents relevant to each topic. By navigating the hierarchy, you can find information sources for questions about specific topics. Subject trees are also called directories and topic hierarchies." Yahoo is the most popular subject tree, as well as the oldest and largest one. Subject trees usually also offer a search engine on the site for keyword queries. When using a subject tree, start at the root and branch out to more specific topics. Working with a subject tree requires practice, but it can often produce better hits than a search engine. If the first path does not work, try another path or broaden your search. In addition to Yahoo, About and Open Directory Project are subject trees.

Clearinghouse
"A clearinghouse is a large collection of resources, Web sites, and online documents on a given topic." On the Internet, some publicly available clearinghouses have been created by researchers subsidized by federal funding. A few are compiled by librarians, teachers, or random individuals. Some clearinghouses focus on documents available online, whereas others index documents available only in hard copy. Some popular clearinghouses include ERIC--Clearinghouse on Elementary and Early Childhood Education (http://ericps.ed.uiuc.edu/), Argus Clearinghouse (http://www.clearinghouse.net/), Environmental LawNet (http://www.environmentallawnet.com/), and Computer Security Professionals (http://www.infosyssec.com).

General search engine
Search engines use a keyword search system to look for information on the Web, and they are powerful tools for locating the resources you want. Search engines use software programs called spiders (see your textbook). When you use a search engine, you enter a search query, usually a keyword or a series of keywords. The search engine returns a list of possible Web sites, called hits, which you will need to review to find the good ones. To receive relevant hits, write good queries: five to ten keywords are better than one or two; include names of specific people if relevant; use lowercase for normal words and capitalize the first letters of proper names; use nouns relevant to your topic.

Specialized search engine
A specialized search engine is like a general search engine, except that it is limited to Web pages that feature a specific topic. An example would be a search engine designed specifically for medical research.

When you use a subject tree, results are often organized into categories. Each category is a subdivision of the topic area you selected. A subject tree also displays documents that are relevant to the selected category. Documents are always represented by URLs for Web pages. If you use a subject tree's keyword search, the search returns category matches and site matches. "A category match shows all of the places in the subject tree where you can examine a branch that has something to do with" the topic being searched. A site match is a list of relevant Web sites found on the topic. Within a Yahoo search, the resulting categories are often followed by a number in parentheses or by the @ symbol, for example Furniture(10) or Antique@. A numbered category is a branch within the current subtree, and the number tells you how many document hits are stored within that branch of the tree. The @ character tags cross-listed categories that are found somewhere outside of the current subtree. Although the About subject tree is not as large as Yahoo, all of the pages it returns have been reviewed by experts in the field for reliability and accuracy, so Web resources found at About can be trusted to contain good information.
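
The Yahoo category notation described above can be read mechanically. The following Python sketch is only an illustration: the function name parse_category_label and the sample labels are assumptions made for this example and are not part of Yahoo itself.

```python
import re

def parse_category_label(label):
    """Split a Yahoo-style category label into its parts.

    Returns a dict with the category name, the document count (or None),
    and a flag saying whether the category is cross-listed.
    """
    label = label.strip()
    # "Antique@": a trailing @ marks a category cross-listed from another subtree.
    if label.endswith("@"):
        return {"name": label[:-1].strip(), "count": None, "cross_listed": True}
    # "Furniture(10)": the number is the count of document hits in that branch.
    match = re.fullmatch(r"(.*?)\s*\((\d+)\)", label)
    if match:
        return {"name": match.group(1), "count": int(match.group(2)), "cross_listed": False}
    # A plain label carries no extra information.
    return {"name": label, "count": None, "cross_listed": False}

print(parse_category_label("Furniture(10)"))
print(parse_category_label("Antique@"))
```

A numbered label yields a count and a cross-listed label yields none, which mirrors the two cases in the paragraph above.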

When you use a clearinghouse that is relevant to your research topic, most of the work has already been done for you. To find an appropriate clearinghouse, go to a general subject tree or search engine and run a keyword search that places the word "clearinghouse" after the topic you are researching. For example, to find a clearinghouse about education, key in "education clearinghouse." You can also use Argus, which is a clearinghouse of clearinghouses.

General Search Engine Assistance

You will need to experiment with the kinds of queries you build. A query can be one word, a group of words, or a specific sentence or question. Search engines use different methods to index the documents they search. If you know how search engines create document indexes, you can design your Web pages so that they are included in searches, and you can design your search queries with the indexing in mind. Selective text indexing looks only at Web page titles, first paragraphs, and document hyperlinks, and sometimes at bulleted lists within a page. Full-text indexing scans entire documents for indexing terms. It is a time-consuming method because it not only scans the full text but also keeps some aspects of selective text indexing for key terms. When a search results in thousands or millions of hits, you probably will not have access to more than 1,000 of the ranked hits.
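
The difference between the two indexing methods can be sketched in a few lines of Python. This is a simplified illustration, not how any particular search engine works: the page fields (title, first_paragraph, body), the function names, and the sample page are all assumptions, and selective indexing is reduced here to the title and first paragraph only.

```python
def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    words = (word.strip(".,;:!?\"'()").lower() for word in text.split())
    return [word for word in words if word]

def build_index(pages, full_text):
    """Map each term to the set of page names in which it was indexed.

    full_text=False indexes only the title and first paragraph
    (a stand-in for selective text indexing);
    full_text=True also indexes the rest of the page.
    """
    index = {}
    for name, page in pages.items():
        parts = [page["title"], page["first_paragraph"]]
        if full_text:
            parts.append(page["body"])
        for term in tokenize(" ".join(parts)):
            index.setdefault(term, set()).add(name)
    return index

# Hypothetical sample page used only for this illustration.
pages = {
    "wreaths.html": {
        "title": "Holiday Wreaths",
        "first_paragraph": "How to make a wreath at home.",
        "body": "You will need wire, ribbon, and pine cones.",
    },
}

selective = build_index(pages, full_text=False)
full = build_index(pages, full_text=True)

print("ribbon" in selective)  # the word appears only in the body
print("ribbon" in full)
```

A query for a word that appears only deep in the page can find it under full-text indexing but not under selective indexing, which is why the choice of keywords matters.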

Successive query refinement is the process of moving from an initial experimental query to a final successful query. The kinds of queries you use will determine your success:

Fuzzy Queries allow users the option of typing in full sentences, questions, phrases or incomplete sentence fragments in their queries. You can improve a fuzzy query by marking required terms as follows:

+ required term
- prohibited term
"" exact phrase matching (surrounded by quotes)

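The three markers above can be sketched in Python as follows. The function names parse_query and matches are hypothetical, and the matching is a plain substring test, which is much cruder than what a real search engine does.

```python
import re

def parse_query(query):
    """Sort query terms into required, prohibited, and optional lists."""
    parsed = {"required": [], "prohibited": [], "optional": []}
    # Each match is an optional + or - sign followed by either
    # a quoted phrase or a single bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        term = (phrase or word).lower()
        if not term:
            continue
        if sign == "+":
            parsed["required"].append(term)
        elif sign == "-":
            parsed["prohibited"].append(term)
        else:
            parsed["optional"].append(term)
    return parsed

def matches(text, parsed):
    """True if the text has every required term and no prohibited term."""
    text = text.lower()
    has_required = all(term in text for term in parsed["required"])
    has_prohibited = any(term in text for term in parsed["prohibited"])
    return has_required and not has_prohibited

parsed = parse_query('+wreath -plastic "pine cone"')
print(parsed)
print(matches("A pine cone wreath for the door", parsed))
print(matches("A plastic wreath", parsed))
```

Terms without a marker are kept as optional; in this sketch they do not affect whether a document matches, although a search engine would normally use them for ranking.
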
Some search engines perform intelligent concept extraction, which automatically augments your query with synonyms and related terms. Some search engines also let you refine a search through relevance feedback: you make an initial query, review some of the resulting hits, and mark a number of good ones. The search engine then examines the marked hits, attempts to characterize them with relevant terms, and takes over the job of constructing a good query. HotBot and AltaVista offer term counts. A term count is a statistic that tells you how many times a keyword (term) has been seen throughout the entire document database. This is not the same as the number of documents that contain the keyword, because every occurrence is counted, including repeated occurrences within a single hit.
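
The difference between a term count and the number of documents that contain a term can be shown with a small Python sketch. The function name and the three sample documents are assumptions made for this illustration; HotBot and AltaVista may compute their statistics differently.

```python
from collections import Counter

def term_and_document_counts(documents):
    """Return (term counts, document counts) for a list of texts."""
    term_counts = Counter()
    document_counts = Counter()
    for text in documents:
        words = text.lower().split()
        term_counts.update(words)           # counts every occurrence
        document_counts.update(set(words))  # counts each document at most once
    return term_counts, document_counts

# Hypothetical miniature "document database".
documents = ["pie pie recipe", "apple pie", "bread recipe"]
term_counts, document_counts = term_and_document_counts(documents)

print(term_counts["pie"])      # occurrences of "pie" across all documents
print(document_counts["pie"])  # number of documents containing "pie"
```

Here the word "pie" occurs three times but in only two documents, so the two statistics differ.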

The use of logical operators by a number of search engines enables the user to complete a Boolean query.

AND Narrows a query by stating that it must include both x AND y
OR Broadens a query by stating that it could include x OR y
AND NOT A negative operator that excludes x
NEAR Similar to AND, but requires that the terms appear near each other in the resulting hits.
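
One way to picture these operators is as set operations on the documents that contain each word. The Python sketch below is an illustration under stated assumptions: the three sample documents, the function names, and the window size used for NEAR are all hypothetical, and real search engines define "near" in their own ways.

```python
def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

def near(text, first, second, window=10):
    """True if the two words occur within `window` words of each other."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= window
               for i in first_positions for j in second_positions)

# Hypothetical sample documents.
documents = {
    1: "apple pie recipe with caramel",
    2: "apple pie recipe with cinnamon",
    3: "pumpkin bread recipe",
}
index = build_index(documents)

both = index["apple"] & index["pie"]         # apple AND pie
either = index["apple"] | index["pumpkin"]   # apple OR pumpkin
without = both - index["cinnamon"]           # apple AND pie AND NOT cinnamon

print(both, either, without)
print(near(documents[1], "recipe", "caramel", window=2))
```

AND shrinks the result set, OR enlarges it, and AND NOT removes documents from it, which matches the narrowing and broadening described in the list above.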

Some search engines also offer the use of wildcards to obtain variations on a word (art, artist, arts, artwork all from art*).
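
The effect of a trailing wildcard can be imitated with Python's standard fnmatch module. The function name expand_wildcard and the sample vocabulary are assumptions for this sketch; a search engine applies the pattern to its own index, not to a short word list.

```python
from fnmatch import fnmatchcase

def expand_wildcard(pattern, vocabulary):
    """Return the words in the vocabulary that match a pattern such as art*."""
    pattern = pattern.lower()
    return sorted(word for word in vocabulary
                  if fnmatchcase(word.lower(), pattern))

# Hypothetical vocabulary.
vocabulary = ["art", "artist", "arts", "artwork", "cart", "smart"]
print(expand_wildcard("art*", vocabulary))
# → ['art', 'artist', 'arts', 'artwork']
```

Words such as "cart" and "smart" contain "art" but do not begin with it, so they are not matched by art*.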

The Help portion of a Search Engine will give you the information you need for constructing advanced queries and will give you information about whether you can use Boolean operators.

Some search engines allow domain searches or host tags, which make it possible to remove specific domains from your hits. The following examples remove .com hosts when searching for Web pages relating to wreaths:

-host:.com +wreath (for AltaVista)
-domain:com +wreath (for HotBot) NOTE: no period included before the com
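
What these queries ask the search engine to do can be sketched in Python with the standard urllib.parse module. The function name exclude_domain and the example.com, example.org, and example.edu addresses are placeholders invented for this illustration.

```python
from urllib.parse import urlparse

def exclude_domain(urls, domain):
    """Drop URLs whose host name ends with the given domain.

    The domain may be written with or without a leading period,
    like the AltaVista and HotBot forms shown above.
    """
    suffix = "." + domain.lstrip(".").lower()
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if not host.endswith(suffix):
            kept.append(url)
    return kept

# Hypothetical hit list.
hits = [
    "http://www.wreaths.example.com/",
    "http://crafts.example.org/wreath.html",
    "http://extension.example.edu/wreaths",
]
print(exclude_domain(hits, ".com"))
```

Only the .org and .edu addresses remain, which is the effect of placing a minus sign in front of the host or domain tag.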

A title search narrows the search to documents that contain a keyword or phrase in the title of the Web page. Create a title search on AltaVista by prefacing your search term with a title tag, as in title:"Notre Dame University", or on Google with intitle:"Notre Dame University" (no space after the colon).
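
A title search looks only at the text between a page's title tags. The Python sketch below shows the idea with the standard html.parser module; the class and function names and the sample page are assumptions for this example and do not describe how AltaVista or Google implement the feature.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text found inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def title_contains(html, phrase):
    """True if the phrase appears in the page title (case-insensitive)."""
    parser = TitleExtractor()
    parser.feed(html)
    return phrase.lower() in parser.title.lower()

# Hypothetical sample page.
page = ("<html><head><title>University of Notre Dame</title></head>"
        "<body>Football schedule</body></html>")
print(title_contains(page, "Notre Dame"))
print(title_contains(page, "football"))
```

The word "football" appears on the page but not in its title, so a title search for it would not return this page.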

Specialized Search Engines
Invisible Web (http://invisibleweb.com) searches for searchable sites of publications, mailing lists, and large document collections. CNET's Search.com is a search engine for search engines, reached through a Find a Search link.

Finding legitimate news articles on the Web can be difficult, but there are two places where you can search: http://www.findarticles.com/ and http://www.magportal.com.

Use caution when searching for information on the Internet. Some sites are legitimate and offer useful information; others are useless, so you must learn to evaluate sites. Information found on the Web is useless for research purposes if it does not identify the author and give additional information about the author. Look for a copyright statement on the Web page. If there is some doubt about the statement, or if no statement is present, contact the author about the material's originality. A magazine may post only partial versions of its printed articles on the Web. Be aware that some advertising on the Web appears to be legitimate information but is, in fact, advertising.

Related Web Sites:

Search Engine Watch http://searchenginewatch.com/
Tool Kit for the Expert Web Searcher http://www.lita.org/committe/toptech/toolkit.htm
Evaluating Web Resources http://www2.widener.edu/Wolfgram-Memorial-Library/webeval.htm
Search Engine Guide http://www.searchengineguide.com

Exercises:
In addition to the end-of-chapter assigned exercises, practice with search engines by completing the following:

  1. Using one of your three search topics, do a search on three different search engines. Print out the first page of hit results and submit them. Do a search on this same topic on Yahoo, using the Subject Tree. Print your final page of results and submit. Notice the difference in the number of web sites found by each of the different search techniques you use as well as the difference in the hits returned.
  2. Go to http://www.altavista.com and perform a search at the AltaVista home page for pie recipes. Try the following search parameter: apple pie recipe -cinnamon. Scroll down and print the first page of hits found.
  3. Remaining at http://www.altavista.com, click on the link for Advanced Search. Click on Search with this boolean expression and type in: apple AND pie AND recipe NEAR caramel AND NOT cinnamon. Perform the search and print the resulting list of hits.
  4. At the AltaVista home page, click on Advanced Search and perform the following under the Boolean query heading: type in
    genealogy AND [country of origin] AND [a family surname]
    Substitute the country where one of the sides of your family originated for [country of origin]; substitute a family name from that country for [a family surname]. If you do not know your country of origin or do not have a family name to research or you want to try something else, complete a substitute search as follows:
    Under Boolean query, type: music AND [a classification of music you like] AND [musical instrument you enjoy hearing]; for example, music AND classical AND guitar. Print the results of your search and submit.