Today it is possible to access the Encyclopaedia Britannica via the Internet. The on-line version of Britannica's Pathfinder WAIS[2] search engine allows users to search the collected files of the EB organisation. Below is a search report generated by the system in response to the following question: "What is the Norwegian national anthem?"
- This is the search report for the search you ran on Sep 24 01:17:01 1995.
- It is a temporary file, and will expire about an hour after the search.
- ---------------------------------------------
- Searching Articles...
- Your query was:
- [ What is the Norwegian national anthem? ]
- The database contains 38,182,193 index terms in 109,209 documents.
- There are 767,925 different index terms. The plural stemmer was used.
- What stems to What, which is not present; trying lowercase what.
- what stems to what, which is not present OR present too often to be used.
- is stems to is, which is not present OR present too often to be used.
- the stems to the, which is not present OR present too often to be used.
- Norwegian stems to Norwegian, which occurs 1,240 times in 650 documents.
- national stems to national, which occurs 22,807 times in 13,110 documents.
- anthem stems to anthem, which occurs 227 times in 120 documents.
- The search found 13,670 documents. It took about a second.
- ---------------------------------------------
- The search was performed by a WAIS Inc server: WAIS Server 1-0-11.
- For more information send email to info@wais.com.
We can note here that the search found, in about one second, 13,670 documents out of a total of 109,209 documents containing about 38 million index terms [of which 767,925 are distinct]. The search procedure is fairly simple. The sentence or phrase entered by the searcher is analysed sequentially, i.e. from start to finish, one lexical item at a time. Function words such as "what", "is" and "the" are rejected as index terms because they occur too frequently [or, alternatively, not at all]. Lexical items beginning with an upper-case character are checked as written and then retried in lower case if that form is not present, apparently to allow for nominals which are also proper names. In the search above, three lexical items from the initial query are selected as relevant on this basis, and the search engine then looks for all documents that contain Item1 AND Item2 AND Item3 [and those that contain Item1 OR Item2 AND Item3, Item1 AND Item2 OR Item3 etc.]. The resulting list is then weighted, with the highest weight apparently given to documents in which two or three of the items are present simultaneously. The weighted list is truncated, according to a prior choice by the searcher [via a screen menu], to somewhere between 10 and 500 items.
A look at the results of a search truncated to ten items [see appendix I] reveals that this type of search produces a great deal of redundant information. Amongst other things we are given a list of people who wrote the national anthems of many countries other than Norway. The fourth item in the search list (Rikard Nordraak) contains some of the information being searched for, namely the name of the anthem of Norway:
- [*]Nordraak, Rikard [2,305 bytes]
- Nordraak also spelled NORDRAACH [b. June 12, 1842, Christiania [now
- Oslo], Nor.--d. March 20, 1866, Berlin [Germany]], Norwegian composer
- perhaps best known as the composer of the music for the Norwegian
- national anthem, " Ja, vi elsker dette landet" [1864; "Yes, We Love
- This Land"].
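The procedure described above (sequential term selection, rejection of absent or over-frequent terms, matching on any remaining term, weighting by the number of terms present, truncation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the WAIS implementation: the documents are invented placeholders, the `MAX_DOC_SHARE` cut-off is a guess at what "present too often" might mean, the weighting simply counts matched terms, and the plural stemmer is left out.

```python
import string

# Invented placeholder documents (not Britannica text).
DOCS = {
    "composer": "He is the Norwegian composer of the music for the Norwegian national anthem",
    "poet": "He is the Norwegian poet and wrote the words of the national anthem",
    "town": "It is the capital of a region and lies on a national road",
    "singer": "She is the singer and wrote a best-selling book",
    "city": "It is the largest city in the region",
}
MAX_DOC_SHARE = 0.8  # assumed cut-off for "present too often to be used"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation; case is kept."""
    words = (w.strip(string.punctuation) for w in text.split())
    return [w for w in words if w]


def build_index(docs):
    """Map each term (as written) to the set of document titles containing it."""
    index = {}
    for title, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(title)
    return index


def select_terms(query, index, n_docs):
    """Walk the query from start to finish, keeping only usable terms."""
    kept = []
    for term in tokenize(query):
        if term not in index:
            term = term.lower()  # "trying lowercase", as in the report
        postings = index.get(term, set())
        # Reject terms that are absent or occur in too many documents.
        if postings and len(postings) / n_docs <= MAX_DOC_SHARE:
            kept.append(term)
    return kept


def search(query, docs, limit=10):
    """Return the selected terms and the weighted, truncated hit list."""
    index = build_index(docs)
    terms = select_terms(query, index, len(docs))
    scores = {}
    for term in terms:
        for title in index[term]:  # a single matching term is enough for a hit
            scores[title] = scores.get(title, 0) + 1
    # Highest weight first; ties broken alphabetically by title.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return terms, ranked[:limit]


terms, ranked = search("What is the Norwegian national anthem?", DOCS)
print(terms)   # ['Norwegian', 'national', 'anthem']
print(ranked)  # [('composer', 3), ('poet', 3), ('town', 1)]
```

Even in this toy collection the "town" document is returned on the strength of the single word "national", which is the kind of redundancy seen in the real result list.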
The purpose of my search in this case was, however, to retrieve the full text of the anthem (written by Bjørnstjerne Bjørnson, number five on the search list) in order to check the lines which I included earlier in this paper. The full text turned out not to be available in the Britannica database, at least not as far as I could ascertain from a subsequent, more comprehensive search with the truncation point set at 500 items. Indeed, this search only produced a vast number of much more esoteric items, such as the following:
- [*]Lons-le-Saunier [1,250 bytes]
- town, capital of Jura département, Franche-Comté région, eastern
- France, south-southeast of Dijon. Located at 846 feet [258 m] above
- sea level in the valley of the Solvan, it is surrounded by vine-clad
- hills. It is a pleasant spa, owing its original Roman name,
- Salinarius, to the local salt mines. It . . .
- [*]Book of the Year [1995]: Biography: McEntire, Reba [3,127 bytes]
- "Everyone's going to OD on Reba," joked country music singer Reba
- McEntire near the beginning of the year. McEntire, already considered
- the reigning queen of country, did indeed spend more time in the
- limelight in 1994 than ever before. She released Read My Mind, her
- 22nd album; published Reba: My Story, a best-selling . . .
which, although interesting enough in themselves, have hardly any relevance to the simple query that generated them. This experience shows quite clearly that, at the present time, the search algorithms used in these kinds of systems are still fairly elementary, and require considerable post-search filtering on the part of the searcher to produce the required information, if it is in fact available at all. Search engine developers evidently still lack knowledge of how people actually filter and process the information available around them as they orient themselves and move around in the real world. This seems to have led to a principle whereby every information item containing at least one key-word corresponding to a word in the query is, initially at least, considered as important as any other. The situated indexical function of "what", for instance, is not taken into account at all. This is further illustrated by another search I made with an even more specific focus, asking: "Who wrote the Norwegian national anthem?". As seen above, there are really only two items that are truly relevant in this connection, namely those pointing to Rikard Nordraak and Bjørnstjerne Bjørnson. The search report below shows, however, that no real filtering occurred:
- This is the search report for the search you ran on Oct 22 12:22:31 1995.
- It is a temporary file, and will expire about an hour after the search.
- ---------------------------------------------
- Searching Articles...
- Your query was:
- [ Who wrote the Norwegian national anthem? ]
- The database contains 38,182,193 index terms in 109,209 documents.
- There are 767,925 different index terms. The plural stemmer was used.
- Who stems to Who, which is not present; trying lowercase who.
- who stems to who, which is not present OR present too often to be used.
- wrote stems to wrote, which occurs 7,917 times in 5,963 documents.
- the stems to the, which is not present OR present too often to be used.
- Norwegian stems to Norwegian, which occurs 1,240 times in 650 documents.
- national stems to national, which occurs 22,807 times in 13,110 documents.
- anthem stems to anthem, which occurs 227 times in 120 documents.
- The search found 18,726 documents. It took about 2 seconds.
- ---------------------------------------------
- The search was performed by a WAIS Inc server: WAIS Server 1-0-11.
- For more information send email to info@wais.com.
The search gave 18,726 documents, which is over 5,000 more than the previous search for the anthem itself [which gave 13,670 documents]. With regard to weighting, too, this second search was less "successful", since the most relevant items [those of Nordraak and Bjørnson] did not appear until positions six and eight respectively in the list of the first ten retrieved items [see appendix II].
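As a rough check on the figures in the two reports, one can ask whether the totals are compatible with a search that counts every document containing at least one usable term. The sketch below assumes exactly that (a plain union of the per-term document sets); the report itself does not say how the total is computed. The document frequencies are copied from the reports, and the helper name `union_bounds` is hypothetical.

```python
# Document frequencies copied from the two search reports.
DOC_FREQ = {"wrote": 5963, "Norwegian": 650, "national": 13110, "anthem": 120}


def union_bounds(terms):
    """Smallest and largest possible size of an OR (union) result."""
    counts = [DOC_FREQ[t] for t in terms]
    # At least the largest single set; at most the sum if no sets overlap.
    return max(counts), sum(counts)


first = union_bounds(["Norwegian", "national", "anthem"])
second = union_bounds(["wrote", "Norwegian", "national", "anthem"])
print(first)   # (13110, 13880); the report gives 13,670
print(second)  # (13110, 19843); the report gives 18,726
```

Both reported totals fall inside these bounds, whereas a strict AND of all terms could never return more than the 120 documents containing "anthem". Under this assumption, adding the relatively common word "wrote" can only enlarge the result set, which fits the increase of 5,056 documents between the two searches.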