Acorn Arabic V2     Graham Toal     Program Design Document

This document is an electronic notebook to remind me of some of the issues
which keep coming up, and why I decided to choose one route and not
another.  Anyone else reading it should assume the standard disclaimer...


1. MOS Vector interception

The AROM intercepts several MOS vectors.  Because these vectors should be
available for user-programmed Roms, the AROM must ensure that the vectors
can be re-intercepted.  Normally, only 2-byte vectors (re-directing into
Ram) can be intercepted twice.  The three-byte vectors which point into
sideways Rom cannot be immediately re-directed without MOS support.  In
AROM, these vectors are re-directed by sending any calls back down one of
the 2 spare MOS 3-byte vectors.


2. Ram workspace

Any sideways Rom claiming Ram workspace for buffers and variables must do
so using the MOS sideways allocation mechanisms.  The prototype ROM was
semi-unclean in this respect.  Although it claimed 256 bytes for itself,
it always assumed they would be allocated from the same address in the IO
processor - &E00.  This assumption would only hold if there were no other
space-claiming Roms installed in the machine with a higher priority
number.


3. VDU Interceptions

There are several logical filters in the Arabic code.  In the prototype
implementation, all levels of filtering are thrown in together in one
chunk of code.  The AROM will structure these as independent pieces of
code, but will not go as far as implementing independent interceptions of
the vectors.  The style shall be:

    etc.

The VDU interceptions are:

Filter out redundant cursor movements (GoTo X,Y, where at X,Y already).
State machine treatment of Arabic characters - shape modification (i.e.
code substitution)
Select default VDU/Select Arabic VDU (Right -> left)
Mode 7 test - no Arabic in mode 7
Numeric sequence reversing in Arabic mode
Intercepting all printer output generated by VDU stream


4. Printer handling

The AROM assumes an Epson-compatible downloadable printer.  It supports a
'cheap & nasty' mode Arabic printer interface: it will remember up to a
line's worth (80 columns? - parameter to be stored somewhere) of text and
will reverse that line before sending it to the printer.  The printer must
have been previously downloaded with a character set which has the same
8-bit coding as the AROM set. (See '5'). The prototype sent the data
directly to the parallel port, synchronously to the 'Put-in-buffer' call. 

************* This was because a recursive call back to the MOS does not
seem to work. (Passing characters to the printer buffer after collecting
what was sent to the printer and reversing it) The problem is related to
the business of kicking the dormant printer, but even when you do what you
are supposed to do, it still does not work.  Characters get randomly lost. 
I'm sure it's a MOS re-entrancy problem again.  PB reckons this ought to
work and is probably a bug - and has offered to help out at the coding
stage.  A hero.

The AROM will probably filter out all VDU calls which generate printer
data at the WRCH stage, thus bypassing the first put-in-buffer call and
the ensuing recursion.

Because the context analysis is carried out at the VDU stage, the printer
interception has to be carried out early (I've just realised) because we
may need a non-analysed printer stream for intelligent Arabic printers
which do their own context analysis, i.e. before the VDU has got its hands
on it - or rather the stuff sent to the screen must not go through the
intercepted VDU stream. Argh. This is a can of worms...
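
The 'cheap & nasty' line reversal described at the start of this section
can be sketched as follows.  This is only an illustration in Python, not
the 6502 code the AROM would need.  The function name, the default width
of 80, and the choice to flush a full buffer as if the line had ended are
all assumptions; the notes above leave the width parameter undecided.

```python
def reverse_for_printer(text, width=80):
    """Buffer printer output a line at a time and emit each line reversed.

    width is an assumed maximum line length.  A full buffer is flushed as
    if the line had ended (an assumption made for this sketch only).
    """
    out = []
    line = []

    def flush():
        # Send the remembered characters in reverse order, then forget them.
        out.extend(reversed(line))
        line.clear()

    for ch in text:
        if ch in "\r\n":
            flush()
            out.append(ch)      # line terminators pass straight through
        else:
            line.append(ch)
            if len(line) == width:
                flush()
    flush()                     # whatever is left at the end of the stream
    return "".join(out)
```

Calling reverse_for_printer("abc\nde\n") gives "cba\ned\n": each line is
reversed on its own and the line terminators stay where they were.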


5. Printer downloading

The 8-bit character set for the printer will be downloaded at power-up
reset time, or on a CTRL-BREAK.  This is only practical for a parallel
printer.  An Econet printer should be initialised by its local machine. A
serial printer must be initialised explicitly: the delay due to auto-
downloading would be unacceptable. The AROM tests to see if a printer is
connected at reset by sending a couple of NULs to the printer hardware
directly.

********** Using the MOS to do this test at this time doesn't seem to
work.

It would be much preferred if users downloaded a character set
explicitly. The solution above is only of use to the more naive users.


6. High-resolution printing

The Epson-compatible character set built into the Rom is a straight steal
from the screen characters in the AROM, stretched to an 8 by 9 matrix. 
These are barely readable, although some printers can double-print at a
slight offset to give the impression of more solid text.  My preferred
approach to better quality is to use a printer with a down-loadable
high-resolution character set, say 24 by 18.  If such a set were sent to a
printer, and the character codes were the same, and NLQ mode were selected
after downloading the characters, then the mechanism in the AROM would
correctly print in Arabic at the higher resolution.  This I feel is
preferable to wiring in a high-resolution font in the Rom, and treating
all Arabic output effectively as a high-resolution screen dump.  This
latter scheme would also have the disadvantage of taking up a large Rom
area for the high-res character set, and (possibly, possibly not) a lot of
Ram for the graphics image line buffer.  Postponing the high-resolution
decision to run-time will help get the product out sooner, as this can be
supplied on disk after the Rom is finished.


7. Keyboard input/Copy key

(Footnote: find out if new mechanisms in Int'l MOS for fonts staying in
Rom and not needing to be re-defined in Ram. (Have done. It doesn't)  Does
MOS have 3-byte RomNo, FontAddr table?  If so, check action of
read-char-under-cursor in case when current font in Rom: what I'm worried
about is that if I have 4 notional fonts for start-of-word/end-of-word
forms etc., then the copy-cursor will only be able to recognise the
currently active font.  Then the only way to make it work is to have yet
another font which contains the superset of characters (in arbitrary
positions) which must be selected all the time (i.e. so that it is there
when the copy-key etc. is active) and to switch in and out the other 4
fonts for every character!)

The context-sensitive display of characters on the screen is carried out
behind the back of the user, who only ever knows of one ASCII code value
for any given letter.  However, the character display on the screen can
have (in some cases) up to four different shapes, and of course it is
simplest on a BBC system to allot those four shapes to unused values in
the code table.  This has the unhappy side-effect that when the BBC copy-
key mechanism is invoked, four different values can be returned where only
one is wanted.  A simple ploy to avoid this trouble would be to filter all
keyboard input while in Arabic mode through a look-up table, thus forcing
the four values back to the original.  However, there is a bug in this
logic:  we really only want to coerce values which were read from the
screen - not from the keyboard.  This is because the spare slots we stole
to store the alternative letter forms happen to share the same code values
as the cursor keys and function keys on the micro.  Therefore, if one of
these keys were pressed and intercepted, it would go through as an Arabic
character rather than a soft-key or whatever.

The solution to this is either to include a new hook in the MOS for
intercepting the copy-cursor, or to do more selective table look-ups at
the put-in-keyboard-buffer stage.  The latter is perhaps possible under
the new International MOS because genuine soft-key presses will now be
preceded by a NUL character.  A quick keyboard scan to see if the
character being inserted has actually been pressed may also be necessary. 
How any of this stuff interacts with user-defined keyboards is anybody's
guess.

*********************** NO! In fact all that is needed is for the MOS code
which copies at the Copy-cursor to use the MOS 'Read character under
cursor' routine. I then intercept the OSByte which does this, and tweak
the results.
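
A minimal sketch of that 'tweak', in Python for illustration only.  The
code values in the table are invented for the example and are not the
real AROM assignments, and the function names are hypothetical.  The point
is that the look-up is applied only to values read back from the screen,
so cursor keys and function keys sharing the same code values are left
alone.

```python
# Hypothetical code values: one base code per letter, plus the spare
# slots holding its alternative shapes.  None of these numbers are real.
LETTER_SHAPES = {
    0xC8: (0x88, 0x89, 0x8A),
    0xCA: (0x8B, 0x8C, 0x8D),
}

# Invert the table once: any shape code maps back to its base letter.
SHAPE_TO_BASE = {
    shape: base
    for base, shapes in LETTER_SHAPES.items()
    for shape in shapes
}


def read_char_under_cursor(raw_code):
    """Stand-in for the intercepted read: coerce shape codes to the base."""
    return SHAPE_TO_BASE.get(raw_code, raw_code)


def key_pressed(code):
    """Keyboard input is deliberately NOT passed through the table."""
    return code
```

With this table, a shape code read from the screen comes back as its base
letter, while the same value arriving from the keyboard is untouched.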


8. Arabic keyboard

When in Arabic mode, the values returned by the keyboard are those
corresponding to the stand-alone form of the Arabic letter inscribed on
the key as pressed.  Since these characters are in the ASCII range 128-
255, special care has to be taken not to confuse them with soft-keys. 
This is in fact not a problem, as simply inserting the appropriate value
in the keyboard buffer is sufficient: to make the value look like a
soft-key, a NUL must be inserted before the character.

The prototype implementation does the Arabic handling by intercepting the
keyboard put-in-buffer vector.  It also intercepted only the un-shifted
characters on the keyboard - shifted keys stayed the same in Arabic as in
English.  This was a policy decision taken to keep the amount of
information a user has to remember to a minimum.  If we had some shifted
Arabic keys and not others it might get confusing.

The prototype also had to cope with two-key entry for accented characters:
it was intended to cope with both English-Arabic and French-Arabic
hybrids.  The AROM will let the International MOS handle any 'foreign'
characters in its own way.

It was originally planned that the keyboard would be replaced by a new
keyboard scanner module.  This was to allow more intelligent re-definition
of the keypad.  This decision has been revised in light of experience. 
The AROM will only intercept put-in-buffer calls and will not do any
special keyboard parsing - with the one exception below.

*********** I may return to the scanner if we need extra keys - in which
case it would be easier to write if the MOS scanning entry - the one which
depends on the four values of C and V - called itself recursively when it
is possible to do so - i.e. in the keyboard scan it should call itself to
test each single key in turn.  At the moment, it does not adhere to its
own conventions.

9. Screen flip

The change from English to Arabic working is normally brought about by an
asynchronous keypress event.  This is most easily trapped by a keyboard
scanner - an interception of the scan-keyboard vector.  Note that this is
different from the put-in-buffer vector.  The flip operation must have no
side-effects and be callable whilst the CPU is executing parts of the MOS,
particularly in the middle of a VDU sequence.

There is a complication which is yet to be resolved: if the copy-cursor is
active when the screen is flipped, a spurious inverse-video block is
generated where the cursor was.

The screen flip is to be a pixel-by-pixel one; the original prototype was
a character-by-character one.  The screen-flip code has to be fast, so
there will be separate versions for each screen mode.  Each byte will be
swapped with its correspondent across the screen, and the pixels within
each byte will be swapped via a look-up table for the appropriate mode.
Since screen-handling code has to sit at address &9000 upwards (?), note
that this places a restriction on the location and size of tables. (Is
this correct - or am I thinking of the bit of code which tweaks the MOS
VDU variables?)
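
The per-mode look-up table for swapping the pixels within a byte might be
built as sketched below (Python for illustration; the real thing would be
a 256-byte table in Rom).  Two assumptions are made here and should be
checked against the hardware documentation: first, that a screen byte
holds one bit-plane per group of bits, with the leftmost pixel in the top
bit of each group (so 8, 4 or 2 pixels per byte); second, that a scan line
can be treated as a simple left-to-right list of bytes, which ignores the
character-cell ordering of real screen memory.

```python
def make_flip_table(pixels_per_byte):
    """Build a 256-entry table that mirrors the pixels inside one byte.

    Assumes the byte is split into groups of pixels_per_byte bits, one
    group per bit-plane, leftmost pixel in the top bit of each group.
    """
    group = pixels_per_byte
    table = []
    for value in range(256):
        flipped = 0
        for base in range(0, 8, group):
            for i in range(group):
                if value & (1 << (base + i)):
                    # Mirror the bit within its own group only.
                    flipped |= 1 << (base + group - 1 - i)
        table.append(flipped)
    return table


def flip_row(row, table):
    """Swap each byte with its correspondent across the row, mirroring
    the pixels inside every byte via the look-up table."""
    return [table[b] for b in reversed(row)]
```

For an 8-pixel-per-byte table, flip_row([0xC0, 0x00], table) gives
[0x00, 0x03]: the two leftmost pixels of the row end up as the two
rightmost.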


10. Right-to-left mode

The screen flip, Arabic keyboard, and the right-to-left mode are not
tightly bound in principle, although they may be in practice. 
Consideration should be given to the future addition of other
right-to-left languages, such as Hebrew and Devanagari.  Also, it may some
day be desired that documents are generated with some lines right->left
and others left->right. (Say, using escape sequences.)  In such a case,
you would want to switch to right->left mode without flipping what is on
the screen.


11. Context analysis

The VDU character output stream is filtered and the following transitions
are monitored:  non-alphabetic text -> start_of_word -> middle_of_word ->
end_of_word. If the start_of_word and end_of_word states overlap then it
is in fact in the stand_alone state.  A character cannot usually be
identified as belonging to any particular state until the following
character is known.  The form of a character as displayed on the screen
depends on the state of the character.  Therefore to display the
correct form for a character at all times, it is usually necessary to
erase that character from the screen and replace it with the correct form
once that is known.  The start_of_word position is always known before a
character is displayed, so the first letter of a word can be done
correctly.  Any intermediate form for subsequent letters would do, as they
will be corrected on state transitions.

The end_of_word form could be used because it would keep the display
consistent at all times.  This means that the character
behind the cursor would change on every keypress,
which may or may not be visually acceptable.  However, there is a good
implementation reason for choosing a scheme where the middle_of_word form
is output by default, and the end form only issued in a 'clean-up'
operation once the end of the word has been determined.  (The reason being
that the state machine is simpler and doesn't have quite so complex a
backtrack algorithm: because of the 2-char backtrack for the case when a
vowel precedes a consonant at the end of a word, the number of states
would be greatly increased if this were to be handled everywhere.  If the
work is done at the end of a word all the paths filter through the same
exit.)

The prototype implementation started off clean (it had a table
categorising the types of characters as vowels, consonants, and others)
but developed into rather a lot of ad-hockery.  The AROM shall not only
have a table of character types (which is now extended: vowel, consonant,
not-valid-at-end, not-valid-at-start, plus some others to be determined)
but shall also have a table-driven state machine.  This should shrink the
code as well as make it easier to maintain (although possibly not debug,
in the first instance).
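
A much-reduced sketch of a table-driven machine of this kind, in Python
for illustration.  It follows the scheme preferred above: the start form
is issued for the first letter, the middle form by default, and the last
letter is corrected in a clean-up step once the end of the word is known.
It is deliberately simplified: there are only two character classes
(letter and other, with Latin letters standing in for Arabic ones), and
the 2-char backtrack for a vowel preceding a final consonant is not
handled.  All names are hypothetical.

```python
OUTSIDE, FIRST, INSIDE = range(3)   # states: not in a word / 1 letter / 2+
LETTER, OTHER = range(2)            # character classes (reduced set)

# (state, class) -> (next state, form for new char, fix-up for previous)
TRANSITIONS = {
    (OUTSIDE, LETTER): (FIRST, "start", None),
    (OUTSIDE, OTHER): (OUTSIDE, None, None),
    (FIRST, LETTER): (INSIDE, "middle", None),
    (FIRST, OTHER): (OUTSIDE, None, "alone"),   # start and end overlap
    (INSIDE, LETTER): (INSIDE, "middle", None),
    (INSIDE, OTHER): (OUTSIDE, None, "end"),    # clean-up at end of word
}


def classify(ch):
    # Stand-in for the table of character types.
    return LETTER if ch.isalpha() else OTHER


def shape(text):
    """Return (char, form) pairs as they would finally appear on screen."""
    out = []
    state = OUTSIDE
    for ch in text + "\0":          # NUL sentinel closes a trailing word
        state, form, fix = TRANSITIONS[(state, classify(ch))]
        if fix is not None:
            # Erase the previous character and redraw it in its final form.
            out[-1] = (out[-1][0], fix)
        if ch != "\0":
            out.append((ch, form or "plain"))
    return out
```

Because every path out of a word goes through the same two table entries,
adding further character types means adding rows to the table rather than
new code.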


12. Arabic Numerals

In right -> left mode, Arabic numerals behave anomalously: they are
represented left -> right as in English.  This generates some problems in
a portable environment.  Ideally, the numeric Read and Write routines
would handle the reversal of digits.  However, it is not practical to
change all the existing code, so a compromise has to be built in (hacked
in) to the AROM.  In principle, whenever the output stream starts to
write numerals, those numbers will be filtered off into a buffer.  When the
data ceases to contain numerals, the contents of the buffer will be
printed in reverse.  In practice, the screen must be kept up to date at
all times, (because the numbers in question are data being read in and
echoed - we cannot wait for another character to trigger output) so the
reversed buffer must be continually redrawn as each character is
presented.
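
The 'in principle' version of the numeral filter can be sketched as below
(Python for illustration; the function name is hypothetical).  It works
on a complete string, reversing each run of digits in the stream, and so
does not model the continual redrawing needed to keep the screen up to
date while digits are still arriving.

```python
import re


def reverse_digit_runs(stream):
    """Reverse every run of ASCII digits in the output stream.

    When the result is laid out right to left, each number then reads
    left to right as it would in English.
    """
    return re.sub(r"[0-9]+", lambda m: m.group()[::-1], stream)
```

For example, reverse_digit_runs("ab123cd45") gives "ab321cd54"; text
without digits passes through unchanged.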