STATE OF THE ART
Searches Where Less, Not More, Is Better
September 30, 1999, By PETER H. LEWIS
Imagine New York's Greenwich Village with millions of jumbled streets, add
Tokyo's chaotic street numbering system, have it grow faster than Las Vegas,
Nev., with a million new addresses a day, add all the languages of the
United Nations, and make sure that every map is hopelessly outdated. That is
today's World Wide Web.
No wonder search engines are so important. And fortunately, some intriguing
new choices have turned up.
Until recently my favorite search engines were Hotbot (www.hotbot.com) and
Alta Vista (www.altavista.com). Hotbot is useful for finding popular Web
sites, and Alta Vista is good at ferreting out obscure information. Alta
Vista in particular returns a bazillion potential hits when it is asked to
scour the Net for a word or phrase.
But the larger the World Wide Web becomes, the more important it becomes for
search engines to return fewer results, not more. Few people have time to
click through 70,482 query matches hoping that the one they want, the most
relevant one, is in there somewhere. The engines have to be not only
smarter, but also faster.
Several search engines introduced recently deserve serious consideration,
including the revamped version of MSN.com Search (msn.com), introduced by
the Microsoft Network last week, and AOL.com Search (aol.com), to be
introduced by America Online next week. But if you are searching for the
next generation in search technologies, look for Gurunet and Google.
Gurunet (www.gurunet.com) is a remarkable but still unfinished instant
information utility. It is not a classic search engine, but rather the best
implementation yet of the "information at your fingertips" promise.
Gurunet lets an Internet user click on any word in any on-screen document --
in Web pages, E-mail messages, word processing documents, product
catalogues, even chat transcripts -- and instantly call up a variety of
relevant information.
Gurunet is free Windows software that is available for downloading from the
Web site. It is relatively compact, only 700 kilobytes. Once loaded, it
resides quietly in the Windows system tray until needed. It works only when
the user has an open connection to the Internet, so it is most useful to
people who are connected through an office network, a cable modem or a
high-speed digital phone line. Conversely, it can be a pain for people
who connect to the Net with dial-up modems.
The Gurunet software is activated by holding down the Alt key and clicking
on any word, which opens a new window containing a dictionary definition of
the word. To the left of the definition are choices that include, depending
on the context, a dictionary definition, an encyclopedia entry, a list of
hyperlinks to relevant Web sites, a business profile or biography, a stock
quote or other useful factoids and infonuggets. In most cases, there is no
need to type search commands.
Gurunet is able to search in context. For example, clicking on the word
"Clinton" in a sentence that mentions Hillary Rodham Clinton calls up a
biographical mention of the First Lady, not her husband, Bill, nor George
Clinton of the P-Funk All-Stars.
It scans a half-dozen words on either side of the highlighted word for clues
to the context, and, more often than not, it hits the mark.
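How Gurunet actually weighs those clues has not been disclosed, but the
general idea of scanning a window of nearby words can be sketched in a few
lines of Python. Everything named here is an assumption made for
illustration: the SENSES table, its clue words and the disambiguate function
are hypothetical, and the scoring (a simple count of matching clue words) is
a stand-in, not Gurunet's method.

```python
import re

# Hypothetical sense table: each candidate meaning lists clue words that
# suggest it. These entries are illustrative, not Gurunet's data.
SENSES = {
    "clinton": {
        "Hillary Rodham Clinton": {"hillary", "rodham", "first", "lady"},
        "Bill Clinton": {"bill", "president"},
        "George Clinton": {"george", "p-funk", "all-stars"},
    }
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'&-]+", text.lower())


def context_window(words, index, radius=6):
    """Return up to `radius` words on either side of position `index`."""
    left = words[max(0, index - radius):index]
    right = words[index + 1:index + 1 + radius]
    return left + right


def disambiguate(text, target, radius=6):
    """Pick the sense of `target` whose clue words best match its context.

    Returns None if the word is unknown or no clue word appears nearby.
    """
    words = tokenize(text)
    target = target.lower()
    candidates = SENSES.get(target)
    if not candidates or target not in words:
        return None
    index = words.index(target)  # first occurrence of the clicked word
    window = set(context_window(words, index, radius))
    best = max(candidates, key=lambda sense: len(candidates[sense] & window))
    if not candidates[best] & window:
        return None  # no clues nearby, so decline to guess
    return best


print(disambiguate("The senator praised Hillary Rodham Clinton at the rally",
                   "Clinton"))
```

A sentence with no clue words nearby returns None in this sketch rather than
a guess, which is one simple way to avoid a confident miss.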
Clicking on "AT&T" in an E-mail message calls up a company profile, its
stock ticker symbol, stock chart and current stock price, and a selection of
recent news headlines pointing to news stories mentioning AT&T.
Gurunet is still a work in progress, and many, if not most, of its links are
bare. Its success will depend on the addition of information databases,
which could include more reference works, E-commerce partners, product
catalogues, news services and so on.
Some of the reference works are outdated or incomplete. Using Gurunet to
click on "Caruso" in a newspaper story about Carnegie Hall, for example,
inexplicably calls up information not about the famous Italian tenor, but
about the current shortstop for the Chicago White Sox. Score that one as an
error.
But Gurunet has the potential to transform the way electronic documents are
used. Someday, for example, clicking on "village" may bring up a map of
Greenwich Village, a weather report, a roster of top-rated restaurants,
movie listings and a selection of current shopping specials. Even the best
of the conventional search engines would get lost trying to do all that.
A recent Gurunet search found no matches for "Hotbot" or "Google," although
it did correctly find "googol," the number represented as the numeral 1
followed by 100 zeroes.
A googol is a big number, but a Google is an impressive search engine that
returns a small number of responses from a big number of pages.
Google (www.google.com) is not new, having made its public debut last week
after a year of trials. But it is very fast and uses clever intuitive
techniques to rank search results by relevancy.
Search for "Bill Clinton," for example, and Google returns a handful of
entries starting with the home page of the White House. Search for "sex,"
and Google once again returns a list starting with the home page of the
White House.
Just kidding.
What Google does do, however, is to come up with a list that starts with a
guide to marriage and sex, not the long string of pornographic sites that
would pop up in the search listings of most other engines. Many disreputable
Web site operators attempt to fool search engines by salting their pages
with bogus key words in an attempt to lure unsuspecting users. Google does
not ogle.
Instead, Google determines the relevance or importance of a page in part by
measuring how many other sites have links to it. That technique enables
Google to rank even those sites that it has not visited. Many Web sites do
not allow search engines to catalogue their content, but they may hold the
information a searcher wants.
Google then takes into account the importance, measured in popularity, of
the sites that are linking to the page. Links from popular sites are given
more weight than links from obscure sites. If a lot of important sites
establish links with the page, the reasoning goes, it must be important too.
It is the cyber-age variant on the common wisdom that the best roadside
diners are the ones with all the big trucks parked outside.
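The weighting described above, in which a link counts for more when it comes
from a page that is itself heavily linked, can be sketched as a short
iterative calculation in Python. This is a simplified illustration in the
style of published link-analysis methods, not Google's actual
implementation. The rank_pages function, the damping value of 0.85, the
number of iterations and the sample site names are all assumptions chosen
for the example.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting each link by its source.

    `links` maps a page name to the list of pages it links to.
    `damping` and `iterations` are illustrative defaults (assumptions).
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}

    n = len(pages)
    scores = {page: 1.0 / n for page in pages}  # start every page equal

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own score among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical sites: "diner" and "shack" each have exactly one incoming
# link, but the diner's link comes from a page that others link to.
sample_links = {
    "hub_a": ["popular"],
    "hub_b": ["popular"],
    "popular": ["diner"],
    "loner": ["shack"],
}
scores = rank_pages(sample_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In the sample data the diner and the shack each receive a single link, yet
the diner ends up with the higher score because its one link comes from a
well-linked page. That is the big-trucks-outside reasoning in numeric form.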
Google is so good at finding relevant sites that it even offers an "I'm
Feeling Lucky" button that does not even bother to return a list of search
results -- it deposits the user directly onto the site with the highest
relevancy ranking.
All this focus on relevancy does not mean that the size of the list of Web
pages visited, the traditional measure of search engine prowess, is no
longer important, said Danny Sullivan, editor of Search Engine Watch, a Web
site published in Shrewton, England.
With several hundred million Web pages out there, it can be useful to employ
a search engine that covers as much virtual ground as possible. "Especially
if you are looking for information about obscure or unusual diseases, for
example, it can be helpful to get back 30 or 40 different results," Mr.
Sullivan said. "But for more popular sites, size does not matter."