Tuesday 25 August 2009

Prefix Match Search using Ternary Search Tree Dictionary in C#

I got a thank you email last week from someone who found my TST prefix match search code useful. I posted that code 2 and half years ago on CodeProject, very please to know someone is interested in it :-)

Ternary Search Tree is an useful data structure for certain type of project. It combines attributes of binary search trees and digital search tries. Like tries, they proceed character by character. Like binary search trees, they are space efficient, though each node has three children, rather than two. A search compares the current character in the search string with the character at the node. If the search character is less, the search goes to the left child; if the search character is greater, the search goes to the right child. When the search character is equal, though, the search goes to the middle child, and proceeds to the next character in the search string.

The idea behind ternary search trees dates back at least as far as 1964. In "Randomized Binary Searching with Tree Structures" (Communications of the ACM, March 1964), H.A. Clampett sketched a primitive version of the structure. Computer scientists have proven many theorems about the trees; for instance, searching for a string of length k in a ternary search tree with n strings will require at most O(log n+k) comparisons.

Useful ariticles:
http://www.comp.nus.edu.sg/~xujia/mirror/algorithm.myrice.com/resources/technical_artile/ternary_search_tree/terstree.htm
http://en.wikipedia.org/wiki/Ternary_search_tree

Ternary Search Tree is wildly used in computer science. There are implementation in C++, Java and .NET. I have found a C# version in http://www.codeproject.com/KB/recipes/tst.aspx.

This is a great C# implementation, although there are a couple of issues (please read the comments of the article), the main one is the partial search it provided is not what I expect, for example, provide words zymase, zyme and zymic, partial search for zym* only return zyme, partial search for zym** will return zymic. What I need is a prefix match search that will return all three words for zym. There is a guy provide a wildcard search function, but it didn't get back what I expect as well. So I wrote the prefixMatch feature based on a Java Ternary Tree example.

Feel free to change it to suite your requirements and polish it (e.g. exception handling etc ^_^). Currently it return a StringCollection. In my project I actually changed the dictionary to generic so can store and return different type of value object. I use a keysNumReturnValues to limit the number of results that returned, -1 means return all. I do this for my testing purpose.

I stored the words that in the words.txt in the TstDictionary(included in the Tst source code), when I search for zym, it give me the following results:
------------------------------------------------------------
type a prefix to do prefix match or type exit to exit
zym
-- end 19 results --
zymase
zyme
zymic
zymogen
zymogene
zymogenic
zymologic
zymological
zymologist
zymology
zymolysis
zymome
zymometer
zymophyte
zymoscope
zymose
zymosimeter
zymosis
zymotic

Hope you find the code useful.

------------------------ Source Code -------------------------------

///
/// Returns alphabetical list of all keys in the tree that begin with prefix.
///

///
The prefix.
///
public virtual ICollection PrefixMatch(string prefix)
{
return PrefixMatch(prefix, -1);
}

///
/// Returns alphabetical list of all keys in the tree that begin with prefix.
///

/// The prefix.
/// The number of values to return.
///
public virtual ICollection PrefixMatch(string prefix, int numReturnValues)
{
if (prefix==null)
throw new ArgumentNullException("prefix");
if (prefix.Length==0)
throw new ArgumentException("prefix is empty");
if (numReturnValues < -1 || numReturnValues == 0) throw new ArgumentException("invalid numReturnValues"); TstDictionaryEntry startNode = Find(prefix); if(startNode == null) return null; int keysNumReturnValues = numReturnValues; StringCollection keyList = new StringCollection(); if(startNode.IsKey) { keyList.Add(startNode.Key); DecreaseRecursionCount(ref keysNumReturnValues); } if(startNode.EqChild != null && (keysNumReturnValues == -1 || keysNumReturnValues > 0))
{
GetKeysRecursion(startNode.EqChild, ref keyList, ref keysNumReturnValues);
}

return keyList;
}


internal void GetKeysRecursion(TstDictionaryEntry currentNode, ref StringCollection keyList, ref int keysNumReturnValues)
{
if(currentNode == null) return;
if(currentNode.LowChild != null && (keysNumReturnValues == -1 || keysNumReturnValues > 0))
GetKeysRecursion(currentNode.LowChild, ref keyList, ref keysNumReturnValues);
if(currentNode.IsKey && (keysNumReturnValues == -1 || keysNumReturnValues > 0))
{
keyList.Add(currentNode.Key);
DecreaseRecursionCount(ref keysNumReturnValues);
}
if(currentNode.EqChild != null && (keysNumReturnValues == -1 || keysNumReturnValues > 0))
GetKeysRecursion(currentNode.EqChild, ref keyList, ref keysNumReturnValues);
if(currentNode.HighChild != null && (keysNumReturnValues == -1 || keysNumReturnValues > 0))
GetKeysRecursion(currentNode.HighChild, ref keyList, ref keysNumReturnValues);
}


internal void DecreaseRecursionCount(ref int keysNumReturnValues){
if(keysNumReturnValues > 0)
keysNumReturnValues--;

}

2 comments:

  1. i`m still beginner here,for this code you show it here,is it runnable?because i try to run it,i got error..Nice efford

    ReplyDelete
  2. Hi Abi,

    Sorry for late reply. Yes, the code snippet can be added to the original library provided in CodeProject and can be run from there, you can change it or polish it to suit your requirement.

    ReplyDelete