This is the first installment of a series of programs about words. I've included
two programs here. The first, DicMaint, introduces the dictionary structure
and provides code to maintain dictionaries. The second, CrosswordHelper,
is a word completion program that displays a list of dictionary words matching a mask of known letters.
Background & Techniques
The first requirement for many word manipulation
problems is a dictionary. Not the kind
with definitions, just the kind with a list of valid words. The TDic object
compresses a word list to about half of its uncompressed size. The word list is maintained as a
TStringList object. The initial letters of each word in the list are replaced by a byte holding the count of letters that match the preceding
word. Unused bits of this byte also flag foreign words and abbreviations.
To speed processing, a letter index is maintained that points to the first word in the list for each initial letter.
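The prefix-count idea can be illustrated with a small console sketch. This is not the TDic code: the names TCodedWord, SharedPrefixLen, Compress and Expand are made up for the example, the count is kept in a separate record field rather than packed into the string, and the flag bits for foreign words and abbreviations are left out.

```pascal
program FrontCodingSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Classes;

type
  // Hypothetical record for this sketch only; TDic stores its data differently.
  TCodedWord = record
    Shared: Byte;    // number of leading letters shared with the preceding word
    Suffix: string;  // the letters that differ
  end;
  TCodedList = array of TCodedWord;

// Count the leading characters that Prev and Cur have in common (at most 255).
function SharedPrefixLen(const Prev, Cur: string): Integer;
var
  MaxLen: Integer;
begin
  Result := 0;
  MaxLen := Length(Prev);
  if Length(Cur) < MaxLen then
    MaxLen := Length(Cur);
  if MaxLen > 255 then
    MaxLen := 255;
  while (Result < MaxLen) and (Prev[Result + 1] = Cur[Result + 1]) do
    Inc(Result);
end;

// Replace the shared prefix of each word (relative to its predecessor) by a count.
function Compress(Words: TStrings): TCodedList;
var
  I, N: Integer;
  Prev: string;
begin
  SetLength(Result, Words.Count);
  Prev := '';
  for I := 0 to Words.Count - 1 do
  begin
    N := SharedPrefixLen(Prev, Words[I]);
    Result[I].Shared := N;
    Result[I].Suffix := Copy(Words[I], N + 1, MaxInt);
    Prev := Words[I];
  end;
end;

// Rebuild the full words by borrowing the counted prefix from the previous word.
procedure Expand(const Coded: TCodedList; Words: TStrings);
var
  I: Integer;
  Prev, Cur: string;
begin
  Words.Clear;
  Prev := '';
  for I := 0 to High(Coded) do
  begin
    Cur := Copy(Prev, 1, Coded[I].Shared) + Coded[I].Suffix;
    Words.Add(Cur);
    Prev := Cur;
  end;
end;

var
  Words, Restored: TStringList;
  Coded: TCodedList;
  I: Integer;
begin
  Words := TStringList.Create;
  Restored := TStringList.Create;
  try
    Words.CommaText := 'an,ant,ante,anti,any';   // must already be in sorted order
    Coded := Compress(Words);
    for I := 0 to High(Coded) do
      Writeln(Coded[I].Shared, ' ', Coded[I].Suffix);
    Expand(Coded, Restored);
    Writeln(Restored.CommaText);
  finally
    Restored.Free;
    Words.Free;
  end;
end.
```

For the five sample words the stored pairs are (0, an), (2, t), (3, e), (3, i) and (2, y), which shows where the savings come from in a sorted list.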
The SetRange method defines the first and last initial letters and the
shortest and longest word lengths to be retrieved. GetNextWord retrieves words within this range and returns false when
no more words are available.
Other methods load and save dictionaries (in compressed or uncompressed form), add and remove words, look up words, etc.
A request to load a dictionary with an extension of .txt will scan a text
file and extract all unique words as a dictionary. A request to save
a dictionary with an extension of .txt will build an expanded word list with one
word per line.
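To make the letter index and the SetRange / GetNextWord pairing described above more concrete, here is a rough stand-in. The class TRangeScan and the parameter lists shown are assumptions made for this sketch; the real TDic signatures may well differ, and the list here is not compressed. It also assumes lowercase words and range letters in 'a'..'z'.

```pascal
program RangeScanSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Classes;

type
  // Hypothetical class for illustration only; it is not TDic.
  TRangeScan = class
  private
    FWords: TStringList;                 // sorted list of lowercase words
    FFirst: array['a'..'z'] of Integer;  // index of first word per letter, -1 if none
    FPos: Integer;                       // next list position to examine
    FLastLetter: Char;
    FMinLen, FMaxLen: Integer;
  public
    constructor Create(AWords: TStrings);
    destructor Destroy; override;
    procedure SetRange(FirstLetter, LastLetter: Char; MinLen, MaxLen: Integer);
    function GetNextWord(var AWord: string): Boolean;
  end;

constructor TRangeScan.Create(AWords: TStrings);
var
  I: Integer;
  C: Char;
  W: string;
begin
  inherited Create;
  FWords := TStringList.Create;
  FWords.Assign(AWords);
  FWords.Sort;
  for C := 'a' to 'z' do
    FFirst[C] := -1;
  // Walk backwards so that the lowest index for each letter is the one kept.
  for I := FWords.Count - 1 downto 0 do
  begin
    W := FWords[I];
    if (W <> '') and (W[1] >= 'a') and (W[1] <= 'z') then
      FFirst[W[1]] := I;
  end;
  FPos := FWords.Count;
end;

destructor TRangeScan.Destroy;
begin
  FWords.Free;
  inherited Destroy;
end;

// Assumes FirstLetter and LastLetter are both in 'a'..'z'.
procedure TRangeScan.SetRange(FirstLetter, LastLetter: Char; MinLen, MaxLen: Integer);
var
  C: Char;
begin
  FLastLetter := LastLetter;
  FMinLen := MinLen;
  FMaxLen := MaxLen;
  FPos := FWords.Count;   // stays here if no letter in the range has any words
  for C := FirstLetter to LastLetter do
    if FFirst[C] >= 0 then
    begin
      FPos := FFirst[C];  // jump straight to the first candidate
      Break;
    end;
end;

function TRangeScan.GetNextWord(var AWord: string): Boolean;
var
  W: string;
begin
  Result := False;
  while FPos < FWords.Count do
  begin
    W := FWords[FPos];
    Inc(FPos);
    if W[1] > FLastLetter then
    begin
      FPos := FWords.Count;   // past the last letter: nothing more to return
      Break;
    end;
    if (Length(W) >= FMinLen) and (Length(W) <= FMaxLen) then
    begin
      AWord := W;
      Result := True;
      Break;
    end;
  end;
end;

var
  Source: TStringList;
  Scan: TRangeScan;
  W: string;
begin
  Source := TStringList.Create;
  try
    Source.CommaText := 'zebra,or,once,knee,kiwi,apple,ante';
    Scan := TRangeScan.Create(Source);
    try
      Scan.SetRange('k', 'o', 4, 4);   // four-letter words starting with k..o
      while Scan.GetNextWord(W) do
        Writeln(W);                    // kiwi, knee, once
    finally
      Scan.Free;
    end;
  finally
    Source.Free;
  end;
end.
```

The point of the index is that the scan starts at the first word for the requested letter instead of at the top of the list.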
Just to get us started, I've also included CrosswordHelper, a simple program using the TDic class to find all words matching a given mask.
Unknown letters are entered as _ characters. For example, using Full.dic, the
mask "_n_e" returns "ante", "knee", and "once".
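A fixed-length mask test of this kind takes only a few lines. The function name FitsMask is invented for this sketch and is not taken from the CrosswordHelper source; it assumes the word and the mask use the same letter case.

```pascal
program MaskSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// True when AWord has the same length as Mask and agrees with it at every
// position that is not the unknown-letter marker '_'.
function FitsMask(const AWord, Mask: string): Boolean;
var
  I: Integer;
begin
  Result := Length(AWord) = Length(Mask);
  I := 1;
  while Result and (I <= Length(Mask)) do
  begin
    Result := (Mask[I] = '_') or (Mask[I] = AWord[I]);
    Inc(I);
  end;
end;

begin
  Writeln(FitsMask('knee', '_n_e'));   // matches
  Writeln(FitsMask('knot', '_n_e'));   // last letter differs, no match
  Writeln(FitsMask('kneel', '_n_e'));  // wrong length, no match
end.
```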
CrosswordHelper addendum, Jan 20, 2001: I
added the mask character "?" as a synonym for "_", and
"*" to represent any number of unknown characters. Works great
to find rhyming words for you poets out there! Implementation was
simplified when I ran across the MatchesMask function included in
Delphi's Masks unit.
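A minimal sketch of how MatchesMask might be put to work on a word list follows. The procedure ListMatches and the sample words are made up for illustration, and the "_" to "?" translation is one plausible way to support both characters rather than a description of the actual CrosswordHelper code. It needs Delphi's Masks unit, so it is Delphi-specific.

```pascal
program MatchesMaskSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes,
  Masks;   // named System.Masks in recent Delphi versions

// Print every word in Words that matches UserMask.
procedure ListMatches(Words: TStrings; const UserMask: string);
var
  I: Integer;
  M: string;
begin
  // '_' has no special meaning to MatchesMask, so translate it to '?'.
  M := StringReplace(UserMask, '_', '?', [rfReplaceAll]);
  for I := 0 to Words.Count - 1 do
    if MatchesMask(Words[I], M) then
      Writeln(Words[I]);
end;

var
  Words: TStringList;
begin
  Words := TStringList.Create;
  try
    Words.CommaText := 'ante,knee,once,nine,light,night,sight';
    ListMatches(Words, '_n_e');    // ante, knee, once
    ListMatches(Words, '*ight');   // light, night, sight
  finally
    Words.Free;
  end;
end.
```

MatchesMask also treats square brackets as set notation, which should not matter for plain word lists but is worth knowing if a mask comes straight from user input.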
I've put three dictionaries in a separate download file. Full.dic
contains about 60,000 words, General.dic about 16,000, and Small.dic about 1,500
words. All should be considered works in progress. Reports of errors or suggestions for improvements will be appreciated.
Small.dic is duplicated in each of the source and object downloads,
so that any download should be runnable, even though you'll want to use one of
the larger dictionaries for most purposes. In general, I'd say that
for checking words you'll want to use the largest dictionary, and for
programs that generate words you would be better served by one of the smaller
dictionaries.
Running/Exploring the Program
Suggestions for Further Explorations
My granddaughter's electronic Hangman game
claims to have an 8,000 word dictionary. It also has "categories". I'll have to
borrow it from her to check this out, but categories sound like a good
idea for that application. Perhaps a descendant of TDic, or a
special header word at the start of the dictionary, could specify that each
word has an added category byte. Category names would also be included in the
dictionary, and an index of category counts would be built at load time (to allow random selection of
words within a category).
Text files that are not compressed dictionaries are read with Readln.
I have encountered a problem in one case
where the entire text file is a single record. The maximum record
(line) size for Readln is 255 characters. The solution is to
convert to BlockRead logic, but I decided I didn't want to
read that file anyway. I'll just put this on the back burner
along with all the other stuff I'll probably never get around to.
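For anyone who does want to tackle it, a BlockRead version might look roughly like the sketch below. It is not part of TDic; ExtractWords is a made-up name, and treating only the plain letters A..Z and a..z as word characters is an assumption made to keep the example short.

```pascal
program BlockReadSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Classes;

// Collect the unique words of a text file, reading fixed-size blocks so that
// line length never matters. Words end up lowercase and sorted.
procedure ExtractWords(const FileName: string; Words: TStringList);
var
  F: file;
  Buf: array[0..4095] of Byte;
  Got, I: Integer;
  Cur: string;

  // Store the word collected so far (if any) and start a new one.
  procedure EndWord;
  begin
    if (Cur <> '') and (Words.IndexOf(Cur) < 0) then
      Words.Add(Cur);
    Cur := '';
  end;

begin
  Words.Sorted := True;      // keeps the list ordered and IndexOf fast
  Cur := '';
  FileMode := fmOpenRead;    // open read-only so protected files still work
  AssignFile(F, FileName);
  Reset(F, 1);               // record size 1: counts are in bytes
  try
    repeat
      BlockRead(F, Buf, SizeOf(Buf), Got);
      for I := 0 to Got - 1 do
        if AnsiChar(Buf[I]) in ['A'..'Z', 'a'..'z'] then
          Cur := Cur + Chr(Buf[I] or $20)   // setting bit 5 lowercases A..Z
        else
          EndWord;
    until Got = 0;
    EndWord;                 // the file may end in the middle of a word
  finally
    CloseFile(F);
  end;
end;

var
  Words: TStringList;
begin
  if ParamCount < 1 then
  begin
    Writeln('usage: BlockReadSketch <textfile>');
    Exit;
  end;
  Words := TStringList.Create;
  try
    ExtractWords(ParamStr(1), Words);
    Writeln(Words.Count, ' unique words');
  finally
    Words.Free;
  end;
end.
```

Because the current word is carried in Cur across iterations, a word that straddles two blocks is still assembled correctly.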