Dictionary Scoring


Description

Dictionary scoring assigns a value to a text based on the number and length of the words that appear both in the text and in the dictionary.

Words are truncated to a certain length. This compensates for the fact that the dictionary might not contain every possible suffix of a given word stem.
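
As a rough illustration of the truncation step, the sketch below cuts a word to a fixed number of leading characters, so that different suffixes of the same stem collapse onto one key. The helper name TruncateWord and the length 6 used in the comment are hypothetical and not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: keep at most maxLen leading characters of a word.
// Applied to both dictionary entries and text words, it makes
// "encrypted" and "encryption" compare equal when maxLen is 6.
std::string TruncateWord(const std::string& word, std::size_t maxLen)
{
    assert(maxLen > 0);
    return word.substr(0, maxLen);  // shorter words are returned unchanged
}
```

A shorter truncation length tolerates more suffix variation but also makes unrelated words collide more often, so the value is a trade-off.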

Using a weight parameter, longer words can be given greater impact.

Words of length less than 4 are ignored.

Scoring Algorithm

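#include <string.h>

int Score(DICTIONARY Dictionary, char Text[], int MaxWordLength, int Weight)
{
    int score = 0;
    // Text decays to a pointer here, so sizeof(Text) would not give the
    // text length. strlen assumes Text is NUL-terminated.
    int length = (int)strlen(Text);

    for (int i = 0; i < length - MaxWordLength; i++)
        // Try the longest candidate first, then shorter ones.
        for (int j = MaxWordLength; j >= 3; j--)
            if (Dictionary.FindWord(&Text[i], j))
            {
                score += Weight * j;
                i += j - 1;     // continue after the matched word
                break;
            }

    return score;
}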
This procedure might miss some characters at the end of the text, which the real code does not.
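
One way the end of the text could be covered is to shrink the search window near the end instead of stopping early. The following is only a sketch under assumptions: the dictionary is replaced by a std::unordered_set of strings, and the names ScoreText, WordSet and the minLen parameter are hypothetical. It is not the real code referred to above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>

// Stand-in for the dictionary: a set of (already truncated) words.
using WordSet = std::unordered_set<std::string>;

// Greedily matches the longest dictionary word at each position.
// minLen is an assumed parameter for the shortest word that counts.
int ScoreText(const WordSet& dict, const std::string& text,
              int maxLen, int minLen, int weight)
{
    assert(minLen > 0 && maxLen >= minLen);
    const int n = static_cast<int>(text.size());
    int score = 0;

    for (int i = 0; i < n; ) {
        // Near the end, shrink the window instead of stopping early.
        const int longest = std::min(maxLen, n - i);
        int matched = 0;
        for (int j = longest; j >= minLen; --j) {
            if (dict.count(text.substr(i, j)) != 0) {
                matched = j;
                break;
            }
        }
        if (matched > 0) {
            score += weight * matched;
            i += matched;   // continue after the matched word
        } else {
            ++i;            // no word starts at this position
        }
    }
    return score;
}
```

With a dictionary containing "attack" and "dawn", the text "attackatdawn" scored with maxLen 6, minLen 4 and weight 1 yields 10: 6 for "attack" plus 4 for "dawn", the latter being a word at the very end of the text.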

To speed up dictionary lookups, we use Bloom filters.
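
For readers unfamiliar with the technique, a Bloom filter is a bit array that answers "definitely not present" or "probably present" without storing the words themselves. The sketch below is a minimal illustration only: the class name, the bit count and the two hash functions are assumptions, since this document does not say how the filters in the real code are sized or hashed.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Minimal Bloom filter sketch: kBits bits and two hash positions per word.
class BloomFilter {
public:
    void Add(const std::string& word)
    {
        assert(!word.empty());
        bits_.set(Hash1(word));
        bits_.set(Hash2(word));
    }

    // false: the word was definitely never added.
    // true : the word was probably added (false positives are possible).
    bool MightContain(const std::string& word) const
    {
        return bits_.test(Hash1(word)) && bits_.test(Hash2(word));
    }

private:
    static constexpr std::size_t kBits = 1 << 16;  // assumed size

    static std::size_t Hash1(const std::string& w)
    {
        return std::hash<std::string>{}(w) % kBits;
    }
    static std::size_t Hash2(const std::string& w)
    {
        // Second hash: salt the input so the two positions usually differ.
        return std::hash<std::string>{}(w + '\x01') % kBits;
    }

    std::bitset<kBits> bits_;
};
```

Because a negative answer is always correct, most non-words can be rejected with a couple of bit tests. A positive answer may be a false positive, so a scorer built on such a filter will occasionally count a string that is not in the dictionary.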


Last Update: 15.04.96 (Format: DD.MM.YY)