€TS1 €FN1,Times-Roman,12 €FN2,Times-Bold,18 €FN3,Times-Italic,12 €FN4,Times-Bold,14 €FN5,Courier,12 €FN6,Courier,12,,0 0 1 10 boxed .3 outline €FNA,Times-Roman,10 €SC7,14 €LM2 €DH/AAcorn Arabic V2/Graham Toal/1Program Design Document/ €TM0 €HM1 €FM1 €BM2

This document is an electronic notebook to remind me of some of the issues which keep coming up, and of why I decided to choose one route and not another. Anyone else reading it should assume the standard disclaimer...

1. MOS Vector interception.
The AROM intercepts several MOS vectors. Because these vectors should remain available to user-programmed Roms, the AROM must ensure that the vectors can be re-intercepted. Normally, only 2-byte vectors (re-directing into Ram) can be intercepted twice. The three-byte vectors which point into sideways Rom cannot be immediately re-directed without MOS support. In the AROM, these vectors are re-directed by sending any calls back down one of the 2 spare MOS 3-byte vectors.

2. Ram workspace.
Any sideways Rom claiming Ram workspace for buffers and variables must do so using the MOS sideways allocation mechanisms. The prototype ROM was semi-unclean in this respect. Although it claimed 256 bytes for itself, it always assumed they would be allocated from the same address in the IO processor - &E00. This assumption would only hold if there were no other space-claiming Roms with a higher priority number installed in the machine.

3. VDU Interceptions.
There are several logical filters in the Arabic code. In the prototype implementation, all levels of filtering are thrown together in one chunk of code. The AROM will structure these as independent pieces of code, but will not go as far as implementing independent interceptions of the vectors. The style shall be: etc.

The VDU interceptions are:
Filter out redundant cursor movements (a GoTo X,Y when the cursor is already at X,Y).
State machine treatment of Arabic characters - shape modification (i.e.
code substitution)
€ST Select default VDU/Select Arabic VDU (Right -> left)
Mode 7 test - no Arabic in mode 7
Numeric sequence reversing in Arabic mode
Intercepting all printer output generated by the VDU stream €STFJ

4. Printer handling
The AROM assumes an Epson-compatible printer with a downloadable character set. It supports a 'cheap & nasty' mode Arabic printer interface: it will remember up to a line's worth (80 columns? - parameter to be stored somewhere) of text and will reverse that line before sending it to the printer. The printer must previously have been downloaded with a character set which has the same 8-bit coding as the AROM set. (See '5'.) The prototype sent the data directly to the parallel port, synchronously with the 'Put-in-buffer' call.

************* This was because a recursive call back to the MOS does not seem to work. (Passing characters to the printer buffer after collecting what was sent to the printer and reversing it.) The problem is related to the business of kicking the dormant printer, but even when you do what you are supposed to do, it still does not work. Characters get randomly lost. I'm sure it's a MOS re-entrancy problem again. PB reckons this ought to work and is probably a bug - and has offered to help out at the coding stage. A hero.

The AROM will probably filter out all VDU calls which generate printer data at the WRCH stage, thus bypassing the first put-in-buffer call and the ensuing recursion. Because the context analysis is carried out at the VDU stage, the printer interception has to be carried out early (I've just realised), i.e. before the VDU has got its hands on the data, because we may need a non-analysed printer stream for intelligent Arabic printers which do their own context analysis - or rather, the stuff sent to the screen must not go through the intercepted VDU stream. Argh. This is a can of worms...

5. Printer downloading
The 8-bit character set for the printer will be downloaded at power-up reset time, or on a CTRL-BREAK.
This is only practical for a parallel printer. An econet printer should be initialised by its local machine. A serial printer must be initialised explicitly: the delay due to auto-downloading would be unacceptable. The AROM tests to see if a printer is connected at reset by sending a couple of NULs directly to the printer hardware.

********** Using the MOS to do this test at this time doesn't seem to work. It would be much better if users downloaded a character set explicitly. The solution above is only of use to the more naive users.

6. High-resolution printing
The Epson-compatible character set built into the Rom is a straight steal from the screen characters in the AROM, stretched to an 8 by 9 matrix. These are barely readable, although some printers can double-print at a slight offset to give the impression of more solid text.

My preferred approach to better quality is to use a printer with a down-loadable high-resolution character set, say 24 by 18. If such a set were sent to the printer, the character codes were the same, and NLQ mode were selected after downloading the characters, then the mechanism in the AROM would correctly print Arabic at the higher resolution. This I feel is preferable to wiring a high-resolution font into the Rom and treating all Arabic output effectively as a high-resolution screen dump. This latter scheme would also have the disadvantage of taking up a large Rom area for the high-res character set, and (possibly, possibly not) a lot of Ram for the graphics image line buffer. Postponing the high-resolution decision to run-time will help get the product out sooner, as the high-resolution set can be supplied on disk after the Rom is finished.

7. Keyboard input/Copy key
(Footnote: find out whether there are new mechanisms in the Int'l MOS for fonts staying in Rom and not needing to be re-defined in Ram. (Have done. It doesn't.) Does the MOS have a 3-byte RomNo, FontAddr table?
If so, check the action of read-char-under-cursor when the current font is in Rom: what I'm worried about is that if I have 4 notional fonts for start-of-word/end-of-word forms etc., then the copy-cursor will only be able to recognise the currently active font. Then the only way to make it work is to have yet another font which contains the superset of characters (in arbitrary positions) and which must be selected all the time (i.e. so that it is there when the copy-key etc. is active), and to switch the other 4 fonts in and out for every character!)

The context-sensitive display of characters on the screen is carried out behind the back of the user, who only ever knows of one ASCII code value for any given letter. However, a character displayed on the screen can have (in some cases) up to four different shapes, and of course it is simplest on a BBC system to allot those four shapes to unused values in the code table. This has the unhappy side-effect that when the BBC copy-key mechanism is invoked, four different values can be returned where only one is wanted.

A simple ploy to avoid this trouble would be to filter all keyboard input while in Arabic mode through a look-up table, thus forcing the four values back to the original. However, there is a bug in this logic: we really only want to coerce values which were read from the screen - not from the keyboard. This is because the spare slots we stole to store the alternative letter forms happen to share the same code values as the cursor keys and function keys on the micro. Therefore, if one of these keys were pressed and intercepted, it would go through as an Arabic character rather than as a soft-key or whatever.

The solution to this is either to include a new hook in the MOS for intercepting the copy-cursor, or to do more selective table look-ups at the put-in-keyboard-buffer stage. The latter is perhaps possible under the new International MOS because genuine soft-key presses will now be preceded by a NUL character.
A quick keyboard scan to see if the character being inserted has actually been pressed may also be necessary. How any of this stuff interacts with user-defined keyboards is anybody's guess.

*********************** NO! In fact all that is needed is for the MOS code which copies at the copy-cursor to use the MOS 'Read character under cursor' routine. I then intercept the OSByte which does this, and tweak the results.

8. Arabic keyboard
When in Arabic mode, the values returned by the keyboard are those corresponding to the stand-alone form of the Arabic letter inscribed on the key pressed. Since these characters are in the code range 128-255, special care has to be taken not to confuse them with soft-keys. This is in fact not a problem, as simply inserting the appropriate value in the keyboard buffer is sufficient: to make a value look like a soft-key, a NUL must be inserted before the character.

The prototype implementation did the Arabic handling by intercepting the keyboard put-in-buffer vector. It also intercepted only the un-shifted characters on the keyboard - shifted keys stayed the same in Arabic as in English. This was a policy decision taken to keep the amount of information a user has to remember to a minimum: if we had some shifted Arabic keys and not others, it might get confusing. The prototype also had to cope with two-key entry for accented characters: it was intended to cope with both English-Arabic and French-Arabic hybrids. The AROM will let the International MOS handle any 'foreign' characters in its own way.

It was originally planned that the MOS keyboard scanner would be replaced by a new keyboard scanner module. This was to allow more intelligent re-definition of the keypad. This decision has been revised in the light of experience. The AROM will only intercept put-in-buffer calls and will not do any special keyboard parsing - with the one exception below.
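The two key-handling interceptions described in sections 7 and 8 can be sketched as below. This is a minimal Python model of the logic, not 6502 code, and several things in it are assumptions for illustration only: the shape-code values in `SHAPE_TO_BASE`, the key table `ARABIC_KEYMAP`, and the function names are all hypothetical; it assumes the copy routine reads the screen through OSBYTE &87 (read character at text cursor, returning the character in X and the mode in Y); and the `shifted` flag stands in for however the real interception tells shifted from un-shifted keys.

```python
# Hypothetical code assignments: each alternative letter form occupies a
# spare code, and maps back to the stand-alone code of the same letter.
SHAPE_TO_BASE = {
    0x81: 0xC1, 0x82: 0xC1, 0x83: 0xC1,   # extra forms of one letter
    0x84: 0xC2, 0x85: 0xC2, 0x86: 0xC2,   # extra forms of another letter
}

# Hypothetical map from un-shifted keys to stand-alone Arabic letter codes.
ARABIC_KEYMAP = {ord('a'): 0xC1, ord('b'): 0xC2}

OSBYTE_READ_CHAR_AT_CURSOR = 0x87


def make_osbyte_filter(original_osbyte):
    """Wrap an OSBYTE handler (modelled as f(a, x, y) -> (x, y)) so that
    'read character under cursor' returns the stand-alone letter code
    instead of whichever shaped form happens to be on the screen."""
    def osbyte(a, x, y):
        result_x, result_y = original_osbyte(a, x, y)
        if a == OSBYTE_READ_CHAR_AT_CURSOR:
            # Tweak the result only; all other OSBYTE calls are untouched.
            result_x = SHAPE_TO_BASE.get(result_x, result_x)
        return result_x, result_y
    return osbyte


def arabic_put_in_buffer(code, shifted, arabic_mode, buffer):
    """Model of the put-in-buffer interception: in Arabic mode, un-shifted
    keys are replaced by the stand-alone Arabic letter; shifted keys and
    English mode pass through unchanged."""
    if arabic_mode and not shifted:
        code = ARABIC_KEYMAP.get(code, code)
    buffer.append(code)   # inserted on its own, so not taken as a soft-key
```

Because the shape codes are coerced only in the OSBYTE result, keys that really produce those codes (cursor and function keys) still reach the buffer unchanged, which is the point of the 'NO!' note above.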
*********** I may return to the scanner if we need extra keys - in which case it would be easier to write if the MOS scanning entry - the one which depends on the four values of C and V - called itself recursively where it is possible to do so, i.e. in the keyboard scan it should call itself to test each single key in turn. At the moment, it does not adhere to its own conventions.

9. Screen flip
The change from English to Arabic working is normally brought about by an asynchronous keypress event. This is most easily trapped by a keyboard scanner - an interception of the scan-keyboard vector. Note that this is different from the put-in-buffer vector. The flip operation must have no side-effects and must be callable whilst the CPU is executing parts of the MOS, particularly in the middle of a VDU sequence.

There is a complication which is yet to be resolved: if the copy-cursor is active when the screen is flipped, a spurious inverse-video block is generated where the cursor was.

The screen flip is to be a pixel-by-pixel one; the original prototype was a character-by-character one. The screen-flip code has to be fast, so there will be separate versions for each screen mode. Each byte will be swapped with its correspondent across the screen, and the pixels within each byte will be swapped via a look-up table for the appropriate mode. Since screen-handling code has to sit at address &9000 upwards (?), note that this places a restriction on the location and size of tables. (Is this correct - or am I thinking of the bit of code which tweaks the MOS VDU variables?)

10. Right-to-left mode
The screen flip, the Arabic keyboard, and right-to-left mode are not tightly bound in principle, although they may be in practice. Consideration should be given to the future addition of other right-to-left languages, such as Hebrew, and of other scripts needing special treatment, such as Devanagari (although that is written left-to-right). Also, it may some day be desired that documents are generated with some lines right->left and others left->right.
(Say, using escape sequences.) In such a case, you would want to switch to right->left mode without flipping what is on the screen.

11. Context analysis
The VDU character output stream is filtered and the following transitions are monitored: non-alphabetic text -> start_of_word -> middle_of_word -> end_of_word. If the start_of_word and end_of_word states coincide, the character is in fact in the stand_alone state. A character cannot usually be identified as belonging to any particular state until the following character is known. The form in which a character is displayed on the screen depends on its state. Therefore, to display the correct form for a character at all times, it is usually necessary to erase that character from the screen and replace it with the correct form once that form is known.

The start_of_word position is always known before a character is displayed, so the first letter of a word can be drawn correctly. Any intermediate form for subsequent letters would do, as they will be corrected on state transitions. The end_of_word form could be used, since it would keep the display consistent at all times. This would mean that the character behind the cursor changes on every keypress, which may or may not be visually acceptable. However, there is a good implementation reason for choosing a scheme where the middle_of_word form is output by default, and the end form is only issued in a 'clean-up' operation once the end of the word has been determined. (The reason being that the state machine is simpler and doesn't need quite so complex a backtrack algorithm: because of the 2-char backtrack for the case when a vowel precedes a consonant at the end of a word, the number of states would be greatly increased if this were handled everywhere. If the work is done at the end of a word, all the paths filter through the same exit.)
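The default-middle, clean-up-at-end scheme described above, combined with the table-driven state machine the AROM is to have, can be sketched as a small Python model. It is simplified under stated assumptions: only two character classes (letter and other) are used, so the 2-character vowel/consonant backtrack is not modelled; the 'screen' is just a list of [character, form] cells; and the names `TRANSITIONS`, `END_FIXUP` and `shape` are hypothetical.

```python
# Simplified classes and states; the real AROM table is to have more classes
# (vowel, consonant, not-valid-at-end, not-valid-at-start, ...).
LETTER, OTHER = "letter", "other"
OUTSIDE, IN_WORD = "outside", "in_word"

# Transition table: (state, class) -> (next state, action).
TRANSITIONS = {
    (OUTSIDE, LETTER): (IN_WORD, "start"),
    (OUTSIDE, OTHER):  (OUTSIDE, "plain"),
    (IN_WORD, LETTER): (IN_WORD, "middle"),
    (IN_WORD, OTHER):  (OUTSIDE, "end_then_plain"),
}

# Clean-up: how the last letter of a word is re-drawn once the word ends.
END_FIXUP = {"start": "stand_alone", "middle": "end"}


def shape(text, is_letter):
    """Return the [char, form] cells as they finally appear on the 'screen'.

    Letters after the first are drawn in the middle form by default; when the
    word ends, the last letter is erased and re-drawn in its end (or
    stand-alone) form, so every path leaves a word through the same exit.
    """
    screen = []
    state = OUTSIDE

    def clean_up():
        last = screen[-1]                # the last letter of the word
        last[1] = END_FIXUP[last[1]]     # 'erase' it and draw the final form

    for ch in text:
        cls = LETTER if is_letter(ch) else OTHER
        next_state, action = TRANSITIONS[(state, cls)]
        if action == "end_then_plain":
            clean_up()
            screen.append([ch, "plain"])
        else:
            screen.append([ch, action])
        state = next_state
    if state == IN_WORD:
        clean_up()                       # end of the stream also ends a word
    return screen
```

For example, `shape("ab c", str.isalpha)` draws `a` in its start form, `b` first in its middle form and then, when the space arrives, in its end form, while the lone `c` ends up stand-alone.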
The prototype implementation started off clean (it had a table categorising the types of characters as vowels, consonants, and others) but developed into rather a lot of ad-hockery. The AROM shall not only have a table of character types (which is now extended: vowel, consonant, not-valid-at-end, not-valid-at-start, plus some others to be determined) but shall also have a table-driven state machine. This should shrink the code as well as make it easier to maintain (although possibly not to debug, in the first instance).

12. Arabic Numerals
In right -> left mode, Arabic numerals behave anomalously: they are written left -> right, as in English. This generates some problems in a portable environment. Ideally, the numeric Read and Write routines would handle the reversal of digits. However, it is not practical to change all the existing code, so a compromise has to be built (hacked) into the AROM. In principle, whenever the output stream starts to write numerals, those numbers will be filtered off into a buffer. When the data ceases to contain numerals, the contents of the buffer will be printed in reverse. In practice, the screen must be kept up to date at all times (because the numbers in question are data being read in and echoed - we cannot wait for another character to trigger output), so the reversed buffer must be continually redrawn as each character is presented.
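The 'in practice' scheme above, where the reversed buffer is redrawn on every numeral, can be sketched with a small Python model of a single right->left screen line. The class and method names are hypothetical, the line is modelled as a list of character cells rather than screen memory, and line wrapping is ignored.

```python
class RightToLeftLine:
    """Model of one screen line in right->left mode: each character is
    written at the cursor, which then moves one column to the left."""

    def __init__(self, width=40):
        self.cells = [" "] * width
        self.col = width - 1       # right->left: start at the right margin
        self.digits = []           # numerals collected so far
        self.number_col = None     # column where the current number began

    def _put(self, ch):
        if self.col < 0:
            return                 # off the left edge: no wrapping in this sketch
        self.cells[self.col] = ch
        self.col -= 1

    def write(self, ch):
        if ch.isdigit():
            if not self.digits:
                self.number_col = self.col
            self.digits.append(ch)
            # Redraw the whole buffer in reverse from where the number began,
            # so that the digits read left->right on the screen.
            self.col = self.number_col
            for d in reversed(self.digits):
                self._put(d)
        else:
            self.digits = []       # a non-numeral ends the number
            self._put(ch)

    def text(self):
        return "".join(self.cells)


line = RightToLeftLine(width=10)
for ch in "ab 123 c":
    line.write(ch)
print(repr(line.text()))   # '  c 123 ba'
```

Each new digit costs a redraw of the whole number so far; this is the price of keeping the echoed number correct on the screen at every keypress rather than only when the number ends.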