Hashing

[Home]   [Puzzles & Projects]    [Delphi Techniques]   [Math topics]   [Library]   [Utilities]

 

Search

Search WWW

Search DelphiForFun.org

As of October, 2016, Embarcadero is offering a free release of Delphi (Delphi 10.1 Berlin Starter Edition ).     There are a few restrictions, but it is a welcome step toward making more programmers aware of the joys of Delphi.  They do say "Offer may be withdrawn at any time", so don't delay if you want to check it out.  Please use the feedback link to let me know if the link stops working.

 

Support DFF - Shop

 If you shop at Amazon anyway,  consider using this link. We receive a few cents from each purchase.   Thanks.


Support DFF - Donate

 If you benefit from the website,  in terms of knowledge, entertainment value, or something otherwise useful, consider making a donation via PayPal  to help defray the costs.  (No PayPal account necessary to donate via credit card.)  Transaction is secure.

Contact

Feedback:  Send an e-mail with your comments about this program (or anything else).

Search DelphiForFun.org only

 

 

Hashing is a  mathematical technique to convert a key into an array index.  It is useful if we have keys with a limited set of unique values.  For example, the set of Delphi reserved words or the set of words in the dictionary.  It is further desirable that we  access the entries many times -  since retrieval is much faster than other types of searching, applications with many lookups are ideal candidates. 

The function that performs this conversion is called the hashing function and the array that we will index is called the hash table, the positions in the table are frequently referred to as buckets.  The ideal hashing function for a fixed size set of key values would translate every valid key into a unique bucket.    For the more general case, the best we can hope for is that the hashing function creates hashed keys that are uniformly randomly distributed across the the table.  

Clearly the hash table must be at least as large as the number of unique values that will be inserted.  For performance reasons, it is desirable to have the table considerably larger.   This is because, most hashing functions are many to one, i.e. it is possible that more than one input will produce the same output.  For valid entries, this is known as a collision, and must be handled.  One common technique is to start checking buckets sequentially until we find a matching entry or we find an empty bucket.  There are other techniques as well.   The ratio of the number of entries to the number of buckets is called the loading ratio.    For a well designed hashing algorithm, the average number of searches required when accessing the table is  1+ loading2 ratio .    In other words, if the table twice as large as its contents (loading ratio=0.5), we can expect to search 1.25.  entries on average to find an entry or find a location to insert the entry.  Still much shorter than a binary search (and much much faster than a sequential search).

Here's a link to download the source for HashWordCount, a version of the word counting program that defines and uses a first version of a THashStr object to count words in a document.  Buttons using list.find (binary search) and list.indexof (sequential search) are included  to compare speeds.   The THashStr  object defined here assumes that keys are strings and that the entries to be accessed are objects derived from TObject.  This this case the objects are TCount objects containing only a single integer (the word count) but more complex objects could be used.  This program also defines simple StartTimer and StopTimer routines using the QueryPerformanceCount API to  accurately compute run times (microsecond resolution).     

 
  [Feedback]   [Newsletters (subscribe/view)] [About me]
Copyright 2000-2017, Gary Darby    All rights reserved.