Character substitutions in text

TeX handles some character sequence substitutions by (ab)using the ligature mechanism, e.g., ``→“. This works reasonably well for Computer Modern, which defines these in its ligature table, but falls apart once we start trying to use non-TeX fonts. Furthermore, there’s the added complication that most fonts put the characters ' and ` in character positions 39 and 96, while TeX expects those positions to typeset ’ and ‘.


I’m thinking that a solution to this would be to have a character sequence substitution that’s run-time configurable as part of the text-input pipeline. This would happen after commands have been interpreted but before the text stream is processed for ligatures and line breaks. The standard route would be to import a tab-delimited table of input and output sequences of Unicode characters. The standard TeX input would look like:


!` ¡
?` ¿
~ \u00a0


Note that we no longer have an active character concept to allow using ~ for non-breaking spaces. Also, the timing of when the substitutions take place means that we cannot use this mechanism to insert commands into the input stream. A mapping like TeX→\TeX will not typeset the TeX logo; it will typeset the literal sequence \TeX, backslash included. (Actually, given the use of \ to open a Unicode hex sequence, it might produce something like a tab followed by eX, depending on what other escape sequences are employed.)
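As a rough illustration of the idea, here is a minimal Python sketch of such a substitution pass. The function names `parse_table` and `substitute` are made up for this example, and it assumes that `\uXXXX` with exactly four hex digits is the only escape form in the table. It also assumes that the longest matching input sequence should win and that output is never rescanned; none of that is settled design.

```python
import re

def parse_table(text):
    """Read tab-delimited 'input<TAB>output' lines into a dict (hypothetical format)."""
    table = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split("\t", 1)
        # Assumption: only \uXXXX escapes are decoded, and only in the output column.
        target = re.sub(r"\\u([0-9a-fA-F]{4})",
                        lambda m: chr(int(m.group(1), 16)), target)
        table[source] = target
    return table

def substitute(text, table):
    """Replace each input sequence in a single left-to-right pass."""
    if not table:
        return text
    # Try longer keys first so a long sequence beats its own prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The replacement text is emitted as-is and is not scanned again.
    return pattern.sub(lambda m: table[m.group(0)], text)

# The three-row table from above, written out with explicit tabs.
TEX_TABLE = "!`\t¡\n?`\t¿\n~\t\\u00a0\n"

table = parse_table(TEX_TABLE)
result = substitute("?`Qu\u00e9 tal~hoy?", table)
```

Because the output is plain text that never goes back through command interpretation, a row mapping TeX to \TeX in this sketch would simply yield the four characters \TeX.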


Other TeX conventions, like the AMS Cyrillic transliteration, where ligatures are used to map sequences like yu→ю, can easily be managed. Similarly, Silvio Levy’s ASCII input scheme for polytonic Greek can also be easily managed. These would allow for easy input of non-Latin alphabets for users who primarily write in Latin alphabets and work on operating systems where switching keyboard layouts to allow input of non-Latin scripts is difficult.
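To make the transliteration case concrete, here is a small Python sketch. The mapping is an illustrative subset invented for this example, not a copy of the AMS ligature tables, and `transliterate` is a hypothetical name. The point it shows is that multi-letter sequences need longest-match handling, so that shch is not consumed as sh followed by ch.

```python
import re

# Illustrative subset only; the real AMS transliteration scheme is larger
# and may differ in detail.
CYRILLIC = {
    "shch": "\u0449",  # щ
    "sh": "\u0448",    # ш
    "ch": "\u0447",    # ч
    "zh": "\u0436",    # ж
    "yu": "\u044e",    # ю
    "ya": "\u044f",    # я
    "a": "\u0430",     # а
    "b": "\u0431",     # б
    "l": "\u043b",     # л
    "o": "\u043e",     # о
    "r": "\u0440",     # р
}

def transliterate(text, table=CYRILLIC):
    """Map Latin input sequences to Cyrillic, preferring the longest match."""
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)

word = transliterate("borshch")
```

Characters with no entry in the table pass through unchanged, so Latin and transliterated text can share a line only if the mapping is switched on and off by some other mechanism that this sketch does not cover.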

Category: architecture