This demo for Delphi programmers extends the previously posted Parsing Strings program. It is a text parser that helps identify paragraphs, sentences, words and delimiters in plain text files. Paragraphs are separated by blank lines. Sentences in this demo are identified by a trailing period (.), exclamation point (!), or question mark (?).

It was written to help a teacher in Indonesia who teaches English as a foreign language and is working on an automated translator to provide sample material for some Computer Based Training he is developing. I wished him luck but warned him that identifying sentences and paragraphs is likely to be the easiest part of the job.

The "GetNextWord" function takes a string as input and returns a Boolean result: "True" if a word was found and "False" when there are no more words in the string. This is a non-destructive version which returns more information than the GetWord function of the Parsing Strings program. In addition to the word found, this version returns the word's trailing delimiters, the index of the next location to check, and Boolean flags indicating whether the current word is End-of-Sentence (EOS) and End-of-Paragraph (EOP). A single Carriage Return (CR) and Linefeed (LF) pair marking the end of a line is ignored. A double CR-LF pair indicates the EOP condition.

Buttons allow loading a text file to process, loading a file of abbreviations, and starting the parsing operation, which displays the parsed results in a separate display area.

Version 2 adds abbreviation checking. One of the problems with treating periods as sentence delimiters is that abbreviations containing periods will be treated as EOS. Pass a sorted TStringList containing abbreviations along with the results of GetNextWord to the "IsAbbreviation" function and it will reconstruct matching entries with the proper parameters. With the provided sample abbreviation list, such entries as "Mr.", "Dr.", "e.g.", "vs." etc. will be detected correctly. Ambiguous cases, where a sentence ends with an abbreviation that is not also the end of a paragraph, may not be handled properly.
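
To make the word scan and the abbreviation check concrete, here is a minimal console sketch. It is not the demo's actual source: the parameter lists of GetNextWord and IsAbbreviation, the helpers IsWordChar and ScanDelims, and the rule that a word consists of letters, digits and apostrophes are all assumptions made for illustration. It should compile with Delphi or with Free Pascal in Delphi mode.

```pascal
program ParseSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Classes;  // System.Classes where unit scope names are required

// Assumption: letters, digits and the apostrophe make up a word.
function IsWordChar(C: Char): Boolean;
begin
  Result := ((C >= 'A') and (C <= 'Z')) or ((C >= 'a') and (C <= 'z')) or
    ((C >= '0') and (C <= '9')) or (C = '''');
end;

// Hypothetical helper: collects the delimiters that start at Index and
// derives the EOS and EOP flags from them.
procedure ScanDelims(const S: string; var Index: Integer;
  out Delims: string; out EOS, EOP: Boolean);
var
  Start, Breaks: Integer;
begin
  Start := Index;
  Breaks := 0;
  EOS := False;
  while (Index <= Length(S)) and not IsWordChar(S[Index]) do
  begin
    if (S[Index] = '.') or (S[Index] = '!') or (S[Index] = '?') then
      EOS := True;
    if S[Index] = #10 then Inc(Breaks);  // one LF per CR-LF pair
    Inc(Index);
  end;
  Delims := Copy(S, Start, Index - Start);
  EOP := Breaks >= 2;  // a single line break is ignored
end;

// Assumed signature. S is never modified; Index must start at 1 and is
// advanced to the next location to check.
function GetNextWord(const S: string; var Index: Integer;
  out AWord, Delims: string; out EOS, EOP: Boolean): Boolean;
var
  Start: Integer;
begin
  AWord := '';
  Delims := '';
  EOS := False;
  EOP := False;
  while (Index <= Length(S)) and not IsWordChar(S[Index]) do Inc(Index);
  Result := Index <= Length(S);
  if not Result then Exit;
  Start := Index;
  while (Index <= Length(S)) and IsWordChar(S[Index]) do Inc(Index);
  AWord := Copy(S, Start, Index - Start);
  ScanDelims(S, Index, Delims, EOS, EOP);
end;

// Assumed signature. Expects the values exactly as GetNextWord returned
// them. On a match, AWord becomes the full abbreviation and the
// delimiters and flags are recomputed from the text that follows it.
function IsAbbreviation(const S: string; Abbrevs: TStringList;
  var Index: Integer; var AWord, Delims: string;
  var EOS, EOP: Boolean): Boolean;
var
  Start, Stop, P: Integer;
  Candidate: string;
begin
  Result := False;
  Start := Index - Length(Delims) - Length(AWord);  // where the word began
  Stop := Start;
  while (Stop <= Length(S)) and (S[Stop] > ' ') do Inc(Stop);
  // Back up from the whitespace to the last period of the token, if any.
  repeat Dec(Stop) until (Stop < Start) or (S[Stop] = '.');
  if Stop < Start then Exit;
  Candidate := Copy(S, Start, Stop - Start + 1);
  if not Abbrevs.Find(Candidate, P) then Exit;
  Result := True;
  AWord := Candidate;
  Index := Stop + 1;
  ScanDelims(S, Index, Delims, EOS, EOP);
end;

var
  Abbrevs: TStringList;
  Sample, W, D: string;
  I: Integer;
  EOS, EOP: Boolean;
begin
  Abbrevs := TStringList.Create;
  try
    Abbrevs.Sorted := True;  // Find needs a sorted list
    Abbrevs.Add('Mr.'); Abbrevs.Add('Dr.');
    Abbrevs.Add('e.g.'); Abbrevs.Add('vs.');
    Sample := 'Mr. Smith met Dr. Jones.' + #13#10#13#10 +
      'They agreed, e.g. on lunch!';
    I := 1;
    while GetNextWord(Sample, I, W, D, EOS, EOP) do
    begin
      IsAbbreviation(Sample, Abbrevs, I, W, D, EOS, EOP);
      WriteLn(W, '  EOS=', EOS, '  EOP=', EOP);
    end;
  finally
    Abbrevs.Free;
  end;
end.
```

With this sample text, "Mr.", "Dr." and "e.g." come back as whole words with EOS cleared, "Jones" is flagged as both EOS and EOP because a period and a double CR-LF follow it, and "lunch" is flagged as EOS only. The sketch always clears EOS when an abbreviation matches, so a sentence that really does end with an abbreviation is missed, which is the ambiguity noted above.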
Also, the TMemo trick of setting SelStart and SelLength to 0 to force the first line of the display to scroll into view does not work with TRichEdit. Instead we need to send an EM_SCROLLCARET message.
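
One way to send EM_SCROLLCARET from VCL code is sketched below. The unit name RichEditScroll and the procedure name ScrollToTop are hypothetical, and this is an illustration rather than the code used in the demo. It relies on the Perform method that TRichEdit inherits from TControl and on the EM_SCROLLCARET constant declared in the Messages unit.

```pascal
unit RichEditScroll;  // hypothetical unit name

interface

uses
  Messages, ComCtrls;  // Winapi.Messages, Vcl.ComCtrls with unit scope names

procedure ScrollToTop(Edit: TRichEdit);

implementation

procedure ScrollToTop(Edit: TRichEdit);
begin
  // Put the caret at the very start of the text with nothing selected...
  Edit.SelStart := 0;
  Edit.SelLength := 0;
  // ...then ask the control to scroll the caret into view.
  Edit.Perform(EM_SCROLLCARET, 0, 0);
end;

end.
```

Calling ScrollToTop on the results control after the parsed output has been written should bring its first line back into view.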
Copyright © 2000-2018, Gary Darby. All rights reserved.