The interface to the spelling checker RM (Relocatable Module) is defined in terms of the procedure calling standard outlined in "The ARM Procedure Call Standard" (reference APCS Issue 2/01). The entries are called via the SWI instruction (SWI numbers to be determined), but the parameter passing conventions are as for compiled language procedure calls: treat an entry just like an external procedure call, but replace the "BL addr" by "SWI number". This makes it easy to call a procedure contained in a relocatable module from any language that conforms to the calling conventions, by writing a 'stub' procedure in that language. For example, a call to a hypothetical procedure which takes two integer parameters and returns an integer result would be written in (extended) Pascal as follows:

    program fred(input, output);

    function add(i, j: integer): integer;
    begin
      *SWI_1234       {R0, R1 have the parameters in them}
    end {add};        {R0 on exit holds results}

    { (Functions with embedded machine code don't need to assign to function result) }

    begin
      a := add(i, j)
    end.

------------------------------------------------------------------------------

In the calls below, words are represented by two integers: byte pointers to the first character of the word and to the byte after the last character of the word. Similarly, store areas are bounded by two pointers, the first inclusive and the second exclusive. This convention makes it easier to operate on pointers in several ways; for instance, the length of an item is simply the end-pointer minus the start-pointer. For example, a text buffer of 1024 bytes at address 20000 would be delimited by (20000,21024). A ten character word at the start of the buffer would be described by (20000,20010), i.e. the end-pointer is the start-pointer plus the length.

------------------------------------------------------------------------------

    type checkcode = (Correct {0}, Dubious {1}, Error {2});

    function Check(WordStart, WordEnd: address): checkcode;

This function takes a word supplied as an entity and performs a simple check.
With this call, it is up to the caller to decide what constitutes a word. The caller could pass "well-wisher's" as a single unit: if the dictionary contains hyphenated words and understands the possessive case, it will return "correct" for this unit. If it does not contain hyphenated words, it is allowed to check the individual parts, i.e. "well", "wisher" and "s", and again will return "correct". However, if the unit were "well-washers", the second case above would still say "correct", but the first case would return the result code "dubious", because the hyphenated unit was not found in a dictionary which does contain hyphenated words. Finally, if the unit were "wll-wisher's", both cases would return "error". Callers who want an easy life will of course simply pass each contiguous alphabetic sequence separately.

------------------------------------------------------------------------------

    function FindError(BufferStart, BufferEnd: address;
                       var WordStart, WordEnd: address): checkcode;

This function will search within the buffer limits for anything it finds incorrect or dubious. It will make its own decisions about what constitutes a word. A status result of "Correct" means that there are no more items in the buffer to check. This function would normally be called in a loop as below. The buffer might be the whole file, a cached block, or just that part which is currently being displayed on screen.
An example of use would be:

    { BuffStart, BuffEnd are set up already }
    WHILE FindError(BuffStart, BuffEnd, WordStart, WordEnd) <> Correct DO
    BEGIN
      HighLight(WordStart, WordEnd);
      BuffStart := WordEnd;   { Skip past word ready for next search }
    END;

------------------------------------------------------------------------------

    function Correct(WordStart, WordEnd, ResultStart, ResultEnd: integer;
                     var Always: boolean): integer;
    {result is count of words returned}

The word as returned from FindError above or as passed to Check is examined and any possible alternatives are returned. No more alternatives than will fit in the buffer will be returned. The alternatives are returned in order of likelihood, as measured by the Damerau-Levenshtein spelling metric, although this ordering may be changed under instruction from the caller (see below). You can expect only one or two corrections for average length words, but more for shorter words.

"Always" will be true if the word should be corrected (to the first word in the list - there may be others too...) without asking the user first. Offering a flag leaves this decision to the utility writer or the user.

Each word in the result buffer will be newline terminated (NL = 10).

------------------------------------------------------------------------------

    type ActCode = (Only {0}, Preferred {1}, Detested {2});

    procedure Correction(WrongStart, WrongEnd, RightStart, RightEnd: integer;
                         Action: ActCode);

The Action codes have the following meanings:

    Only:       This alternative is the only one to be offered if this erroneous word is found again.
    Preferred:  This alternative should be offered at the head of any multiple alternatives.
    Detested:   This alternative should be offered at the foot of any list of alternatives; hopefully it will drop off the end...
Example:

    Correction( <"archemedes">, <"Archimedes">, Only);

(I'm cheating here: <"..."> is clearly a denotation for the two pointers.)

If a correction is only wanted once, the utility should not bother to call "Correction" at all.

------------------------------------------------------------------------------

    function AddTemp(WordStart, WordEnd: address): boolean;

This adds a new word to the store dictionary. Action will have to be taken to save the store dictionary to a file. This will be done externally by a *-command in the RM, the details of which are not needed here. The boolean result is in case it is not possible to add new words to the dictionary; I cannot foresee this being likely, but callers should be prepared to take action (a status-line message perhaps) or not as preferred.

------------------------------------------------------------------------------

    function RemoveTemp(WordStart, WordEnd: address): boolean;

This removes a word from the dictionary - possibly by adding it to a list of "bad" words rather than by actually removing it in situ. This is to allow the implementor to force permanent dictionary updates to go through some sort of validation. Again, this will be external and does not concern this document. The boolean result is in case the removal cannot be done, although I do not intend to draw the distinction between the word not having been found and it being there but not removable for implementation reasons.

------------------------------------------------------------------------------

That's all I've got in at the moment. The problems I can see are interfacing to BCPL, and space constraints on the dictionary. Please let me know once you have digested this whether you have any suggestions or comments.

Graham.

P.S. Points still under consideration are advisory messages about word usage (e.g.
license, licence); consistency of equivalent words (enquire, inquire); acronyms (O.H.M.S., OHMS); capitalisation, proper names and place names; and English and American spellings. I can also sometimes suggest when a word might be better hyphenated even though there is no hyphenated version in the dictionary: if you offer "counterinsurgency" it will not find it; however, it will find "counter" and "insurgency" and may advise that perhaps a hyphen would be better here. I have not decided whether this should be a standard part of the "Correct" procedure or should be on an option flag. I think we can get away with omitting these from A-writer, and having a stand-alone programme to handle them separately.

I am also discussing a grammar checker (like the Unix "style" program, but better) with a chap at Edinburgh who has done one which seems pretty good.
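
------------------------------------------------------------------------------

As an illustration only, the start-inclusive, end-exclusive pointer convention and the FindError calling loop described earlier can be tried out away from the module. The sketch below is in Python rather than the extended Pascal used above, since the SWI stubs cannot be run outside the target machine. Everything in it is an assumption made for the example: offsets into a bytes object stand in for addresses, TOY_DICTIONARY stands in for the real dictionary, a "word" is taken to be a run of ASCII letters, and check, find_error and all_errors are hypothetical stand-ins, not the RM entries themselves.

```python
import re

# Result codes, numbered as in the checkcode type.
CORRECT, DUBIOUS, ERROR = 0, 1, 2

# Hypothetical stand-in for the real dictionary.
TOY_DICTIONARY = {"the", "quick", "brown", "fox"}

# Assumption: a word is a contiguous run of ASCII letters.
WORD = re.compile(rb"[A-Za-z]+")


def check(memory, word_start, word_end):
    """Toy Check: the word occupies memory[word_start:word_end],
    i.e. word_start is inclusive and word_end is exclusive."""
    word = memory[word_start:word_end].decode("ascii").lower()
    return CORRECT if word in TOY_DICTIONARY else ERROR


def find_error(memory, buffer_start, buffer_end):
    """Toy FindError: return (code, word_start, word_end) for the first
    word in [buffer_start, buffer_end) that fails check(), or
    (CORRECT, buffer_end, buffer_end) when nothing is left to report."""
    for match in WORD.finditer(memory, buffer_start, buffer_end):
        code = check(memory, match.start(), match.end())
        if code != CORRECT:
            return code, match.start(), match.end()
    return CORRECT, buffer_end, buffer_end


def all_errors(memory, buffer_start, buffer_end):
    """The calling loop from the FindError example, collecting the
    (start, end) pairs instead of highlighting them."""
    found = []
    code, word_start, word_end = find_error(memory, buffer_start, buffer_end)
    while code != CORRECT:
        found.append((word_start, word_end))
        buffer_start = word_end  # skip past word ready for next search
        code, word_start, word_end = find_error(memory, buffer_start, buffer_end)
    return found


text = b"The quikc brown fxo"
errors = all_errors(text, 0, len(text))
words = [text[start:end].decode("ascii") for start, end in errors]
```

With this toy word list, errors comes out as the pairs (4, 9) and (16, 19), which delimit "quikc" and "fxo". Because the end-pointer is exclusive, the length of each word is simply end minus start, and the end-pointer of one word can be used directly as the start of the next search, which is what the loop relies on.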