Relevancy

Ordering documents

mnoGoSearch sorts results first by relevancy and second by popularity rank.

Relevancy calculation

The relevancy of each found document is calculated as 100% multiplied by the cosine of the angle between the weight vector of the query and the weight vector of the document. The number of vector coordinates equals the number of word forms in the search query multiplied by the number of sections defined in indexer.conf. Each coordinate corresponds to a query word found in one of the document's sections. The value of the coordinate depends on the weight of that section, defined by the wf parameter (see the Section called Changing different document parts weights at search time), and the matched word may be exactly the same as in the search query, one of its word forms, or a synonym. One additional coordinate equals the average distance between the searched words in the document. In the query's vector, this coordinate is 0.
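As a rough illustration of the cosine formula above, the following Python sketch computes a score from two weight vectors. The function name relevancy_percent and the sample vectors are hypothetical; this is not mnoGoSearch's actual code, and real coordinate values depend on the wf weights. It only shows how a non-zero word-distance coordinate in the document vector lowers the score.

```python
import math

def relevancy_percent(query_vec, doc_vec):
    """Return 100 * cos(angle) between two equal-length weight vectors."""
    if len(query_vec) != len(doc_vec):
        raise ValueError("vectors must have the same number of coordinates")
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    norm_d = math.sqrt(sum(d * d for d in doc_vec))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # no angle is defined for a zero vector
    return 100.0 * dot / (norm_q * norm_d)

# Hypothetical example: 2 query words x 2 sections = 4 coordinates,
# plus one final coordinate for the average word distance.
query = [1, 1, 1, 1, 0]      # the distance coordinate is 0 for the query
doc_close = [1, 1, 1, 1, 0]  # every word found in every section, no distance penalty
doc_far = [1, 1, 1, 1, 2]    # same hits, but the words are far apart

score_close = relevancy_percent(query, doc_close)
score_far = relevancy_percent(query, doc_far)
```

In this sketch, score_far is lower than score_close because the distance coordinate pulls the document vector away from the query vector.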

In the default configuration, search can produce quite small score values, because it assumes that the words may be found in up to 256 document sections at the same time. See the description of the NumSections search.htm command for how to specify the actual number of sections in use and thus increase score values.

Other commands affecting document order and/or score value are: DateFactor, DocSizeWeight, MinCoordFactor, NumDistinctWordFactor, NumWordFactor, WordDistanceWeight.

Popularity rank

The popularity rank is calculated in two stages. In the first stage, the value of the Weight parameter for each server is divided by the number of links from this server; this gives the weight of one link from this server. In the second stage, for every page, the weights of all links pointing to this page are summed. This sum is the popularity rank of the page. Self links, i.e. links from a page to itself, do not affect the popularity rank.
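The two stages can be sketched in Python as follows. This is a minimal illustration, not the indexer's code: the names popularity_rank and server_of are hypothetical, the server is taken to be the host part of the URL, and self links are assumed to be left out of the per-server link count as well as out of the sum.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def server_of(url):
    """Hypothetical helper: treat the host part of a URL as its server."""
    return urlsplit(url).netloc

def popularity_rank(links, server_weight=None):
    """links: iterable of (source_url, target_url) pairs.
    server_weight: optional dict of server -> Weight (missing servers get 1)."""
    server_weight = server_weight or {}
    # Assumption: self links are ignored entirely, including in the link count.
    links = [(src, dst) for src, dst in links if src != dst]

    # Stage 1: weight of one link = server Weight / number of links from it.
    links_from = defaultdict(int)
    for src, _ in links:
        links_from[server_of(src)] += 1
    link_weight = {
        server: server_weight.get(server, 1.0) / count
        for server, count in links_from.items()
    }

    # Stage 2: a page's rank is the sum of the weights of links pointing to it.
    rank = defaultdict(float)
    for src, dst in links:
        rank[dst] += link_weight[server_of(src)]
    return dict(rank)

links = [
    ("http://a.example/1", "http://b.example/x"),
    ("http://a.example/2", "http://b.example/x"),
    ("http://b.example/x", "http://a.example/1"),
    ("http://b.example/x", "http://b.example/x"),  # self link, ignored
]
ranks = popularity_rank(links)
```

Here server a.example has two outgoing links worth 0.5 each, and b.example has one worth 1, so both linked pages end up with a rank of 1.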

By default, the Weight parameter is equal to 1 for all indexed servers. You can change this value with the Weight command in the indexer.conf file, or directly in the server table if you load the server configuration from that table.

If you place the PopRankSkipSameSite yes command in the indexer.conf file, the indexer will take only inter-site links (i.e. links from a page on one site to a page on another site) for popularity rank calculation.

If you place the PopRankFeedBack yes command in the indexer.conf file, the indexer will calculate the site weight before the page rank calculation. To do that, the indexer sums the popularity ranks of all pages from the same site. If this sum is greater than 1, the site weight is set to this sum; otherwise, the site weight is set to 1.
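The site weight rule described for PopRankFeedBack can be sketched as below. The function name feedback_site_weights and the sample ranks are hypothetical, and a site is assumed to be identified by the host part of the URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def feedback_site_weights(page_rank):
    """page_rank: dict of page URL -> popularity rank.
    Returns a dict of site -> weight, never lower than 1."""
    totals = defaultdict(float)
    for url, rank in page_rank.items():
        # Assumption: the site is the host part of the URL.
        totals[urlsplit(url).netloc] += rank
    # Use the sum only when it is greater than 1, otherwise fall back to 1.
    return {site: (total if total > 1 else 1.0) for site, total in totals.items()}

page_rank = {
    "http://a.example/1": 0.25,
    "http://a.example/2": 0.5,
    "http://b.example/x": 1.5,
    "http://b.example/y": 1.0,
}
site_weights = feedback_site_weights(page_rank)
```

With these sample values, a.example sums to 0.75 and keeps the weight 1, while b.example gets the weight 2.5.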

If you place the PopRankUseTracking yes command in the indexer.conf file, the indexer will calculate the site weight as the number of tracked queries restricted to this site.

If you place the PopRankUseShowCnt yes command in the search.htm file, then for every result shown to the user, the corresponding url.shows value is increased by 1, provided that the relevancy of this result is greater than or equal to the value specified by the PopRankShowCntRatio command (default value is 25.0). If you place PopRankUseShowCnt yes in the indexer.conf file, the indexer will add to the URL's PopularityRank the value of url.shows multiplied by the value specified by the PopRankShowCntWeight command (default value is 0.01).
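The two halves of PopRankUseShowCnt, counting at search time and adjusting at index time, can be sketched like this. The function names and the in-memory dictionaries are hypothetical stand-ins for the url.shows column and the stored popularity rank; only the default values 25.0 and 0.01 come from the description above.

```python
def record_shows(results, shows, ratio=25.0):
    """Search side: results is a list of (url, relevancy) pairs shown to the user.
    Increment shows[url] when relevancy >= ratio (PopRankShowCntRatio)."""
    for url, relevancy in results:
        if relevancy >= ratio:
            shows[url] = shows.get(url, 0) + 1
    return shows

def apply_show_count(pop_rank, shows, weight=0.01):
    """Indexer side: add shows * weight (PopRankShowCntWeight) to each rank."""
    return {url: rank + shows.get(url, 0) * weight for url, rank in pop_rank.items()}

# Hypothetical sample data.
shows = record_shows([("u1", 80.0), ("u2", 25.0), ("u3", 10.0)], {})
adjusted = apply_show_count({"u1": 1.0, "u3": 0.5}, {"u1": 200})
```

In this sample, u3 is below the ratio and is not counted, and 200 recorded shows raise the rank of u1 by 2.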

Analyzing score values

Starting from version 3.3.7, it is possible to debug the score values calculated for the documents found. To debug a score value, follow these steps:

  1. Add this code at the bottom of the <!--restop--> section of your search template:

    <!--restop-->
    ....
    [DebugScore: $(DebugScore)]
    <!--/restop-->
            
  2. Add this code at the bottom of the <!--res--> section of your search template:

    <!--res-->
    ....
    [ID=$(ID)]
    <!--/res-->
            
  3. Open search.cgi in your browser and run a search query consisting of multiple words. You will see the document ID after the usual document information.
  4. Choose a document you want to see score debug information for. Remember its ID (let's say the ID is 100).
  5. Go to your browser's location bar, add &DebugURLID=100 at the very end of the URL and press Enter.

    Note: The URL will look approximately like this:

    http://hostname/cgi-bin/search.cgi?q=test+query&DebugURLID=100

  6. Find a line of this format in between the search form and the results:
    
DebugScore: url_id=82 RDsum=98 distance=84 (84/1) minmax=0.99091089
                density=0.00196271 numword=0.90135133 wordform=0.00000000
            
    It will give you an idea of why the score for the chosen document is too high or too low, and help you fine-tune parameters such as WordDistanceWeight or WordDensityFactor.

Note: Score debugging currently works only for queries with multiple search words. Queries with a single search word don't return debug information.
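If you prefer to script steps 5 and 6 rather than edit the URL by hand, a minimal Python sketch could look like the one below. The helper names with_debug_url_id and parse_debug_score are hypothetical, and the parser assumes only the name=value layout visible in the sample DebugScore line above; it does not interpret what the fields mean.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_debug_url_id(url, doc_id):
    """Return the search URL with DebugURLID=<doc_id> appended to the query."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "DebugURLID"]  # avoid adding the parameter twice
    query.append(("DebugURLID", str(doc_id)))
    return urlunsplit(parts._replace(query=urlencode(query)))

def parse_debug_score(text):
    """Collect every name=number pair from a DebugScore line into a dict."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+)=([0-9.]+)", text)}

debug_url = with_debug_url_id("http://hostname/cgi-bin/search.cgi?q=test+query", 100)

sample = ("DebugScore: url_id=82 RDsum=98 distance=84 (84/1) minmax=0.99091089\n"
          "    density=0.00196271 numword=0.90135133 wordform=0.00000000")
fields = parse_debug_score(sample)
```

The resulting dictionary makes it easier to compare factors such as density or numword across several documents for the same query.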

Crosswords

This feature allows the words between <a href="xxx"> and </a> to be assigned to the document that the link points to. To enable Crosswords, use the CrossWords command in indexer.conf and search.htm.