D-Pub for Keyboards for Agglutinative Languages and Abugidas

I just noticed that my defensive publication was officially published a month ago. It describes two main ideas for improving virtual keyboards that I have talked about previously. The first idea is that keyboards for languages written in abugida scripts should use the language's phonemes as the keys, which makes for an efficient and intuitive keyboard. The second is a higher-level idea: keyboards for agglutinative languages should offer infix/suffix morpheme suggestions in the autocomplete list.
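To make the second idea concrete, here is a minimal sketch of suffix-morpheme suggestions. It is only an illustration under my own assumptions, not the method from the publication: the slot names, the `suggest` function, and the tiny suffix table are hypothetical, and I use the Turkish stem “ev” (house) because its suffixes attach by plain concatenation. A real implementation would have to handle vowel harmony, sandhi, and a far larger morpheme inventory.

```python
# Toy suffix table for one Turkish stem ("ev" = house). Hypothetical data:
# a real keyboard would need vowel harmony and a much larger inventory.
SUFFIXES = {
    "plural": ["ler"],      # ev -> evler (houses)
    "possessive": ["im"],   # ev -> evim (my house)
    "case": ["de"],         # ev -> evde (in the house)
}

# Assumed slot order for this toy: root, then plural, possessive, case.
SLOT_ORDER = ["plural", "possessive", "case"]


def suggest(word, last_slot=None):
    """Return (candidate word, slot) pairs for every slot after last_slot."""
    start = 0 if last_slot is None else SLOT_ORDER.index(last_slot) + 1
    return [
        (word + suffix, slot)
        for slot in SLOT_ORDER[start:]
        for suffix in SUFFIXES[slot]
    ]


# Picking one suggestion at a time builds the word morpheme by morpheme.
print(suggest("ev"))
print(suggest("evler", "plural"))
print(suggest("evlerim", "possessive"))
```

The point of the design is that each tap on a suggestion adds a whole morpheme, so a long agglutinated word such as “evlerimde” takes a handful of taps after the stem instead of letter-by-letter typing.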

Defensive publications establish prior art that inherently protects future implementations, in open source or otherwise, because the prior art nullifies potential future patents, including those filed by patent trolls. So anyone interested is free to try implementing these ideas.

To reiterate, I think the phoneme-style keyboard is the optimal style for abugida scripts. I’m definitely not a fan of the flickering of Gboard, as referenced in the conference talk and defensive publication. I think applying the flick-style input method to the languages of South and Southeast Asia would be even worse than Gboard, given that they have 12+ vowels. It’s way too easy to fat-finger a mistake unless you sacrifice speed. This deficiency is in contrast to the source of inspiration, Japanese, which does okay with flick-style input since it only has 5 vowels.

For Tamil, the Tamil99 keyboard standard, used as the default for Tamil on Apple mobile devices, is closer to a phoneme keyboard style than Gboard. But Tamil99 is confusing because it approximates the phoneme-style keyboard with its own consistencies, while requiring a lot of “intelligent” inference that reduces user control and leaves room for errors in corner cases. The closest to a phoneme keyboard for Tamil is the Murasu Anjal (“phonetic” / “romanized”) style, which basically means typing in English letters using a consistent transliteration scheme to generate Tamil letters, e.g. “thamizi ezuththukaL” -> “தமிழி எழுத்துகள்”. But it requires knowing the English alphabet to type in Tamil. Hinglish keyboards for Hindi are roughly analogous to a cross between the functionality of Tamil99 and that of phonetic-style keyboards for Tamil.
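The romanized example above can be reproduced with a small sketch. This is not Murasu Anjal's actual implementation or its full mapping table: the `transliterate` function and the tables below are hypothetical and cover only the letters needed for that one example. The composition rule is the same one a phoneme keyboard would use: a consonant key yields the pure consonant (letter plus pulli), and a following vowel replaces the pulli with the matching vowel sign.

```python
PULLI = "\u0BCD"  # Tamil virama, marks a consonant with no vowel

# Hypothetical subset of a romanized scheme: only what the example needs.
CONSONANTS = {
    "th": "\u0BA4",  # த
    "k": "\u0B95",   # க
    "m": "\u0BAE",   # ம
    "z": "\u0BB4",   # ழ
    "L": "\u0BB3",   # ள
}

# Each vowel maps to (independent letter, dependent vowel sign).
# The inherent vowel "a" has no sign: it just removes the pulli.
VOWELS = {
    "a": ("\u0B85", ""),
    "i": ("\u0B87", "\u0BBF"),
    "u": ("\u0B89", "\u0BC1"),
    "e": ("\u0B8E", "\u0BC6"),
}

# Try longer tokens first so "th" is not split into "t" + "h".
TOKENS = sorted([*CONSONANTS, *VOWELS], key=len, reverse=True)


def transliterate(text):
    out = ""
    i = 0
    while i < len(text):
        token = next((t for t in TOKENS if text.startswith(t, i)), None)
        if token is None:
            out += text[i]  # pass through spaces and unknown characters
            i += 1
            continue
        if token in CONSONANTS:
            out += CONSONANTS[token] + PULLI
        else:
            independent, sign = VOWELS[token]
            if out.endswith(PULLI):
                out = out[:-1] + sign  # attach the vowel to the consonant
            else:
                out += independent     # word-initial or post-vowel vowel
        i += len(token)
    return out


print(transliterate("thamizi ezuththukaL"))  # தமிழி எழுத்துகள்
```

On a phoneme keyboard the Latin tokens would simply be replaced by keys labeled with the Tamil consonants and vowels themselves, which removes the need to know the English alphabet.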

In support of what I am saying, the presentation at the Unicode Conference by one of the main developers / maintainers of Keyman describes a framework of several principles that make a good keyboard / input method, referred to via the mnemonic abbreviation “DISCUS”. The presentation describes how these principles guided and confirmed improvements to keyboards for Thai, Lao, and Amharic. Although it only refers to Amharic as an abugida script, in fact all three are abugida scripts. Two points from the presentation are most interesting. The first, from slide 42, is “User does not have to think about backing store order but rather the word”. It says that the keyboard can construct the Unicode sequence in a way that is transparent to the user, so that the user can think about the word and is not inadvertently exposed to the technical details of how the word is represented by a computer. The second point, about Amharic, is that the multi-tap design for its keyboard “Exploits language structure for efficient input,” as stated on slide 51. This point is relevant for Ethiopic and most South Asian scripts, since they don’t have as many consonant possibilities, due to the lack of the tones that Southeast Asian languages have, and thus multi-tap is enough without resorting to gestures to supplement the available consonants.
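As an illustration of the backing-store-order point, here is a minimal sketch using Tamil rather than the scripts covered in the presentation, so treat it as my own assumption of how the principle applies and not as something taken from the slides. The vowel signs ெ, ே, and ை are drawn to the left of their consonant, but Unicode stores them after it. If the user enters keys in the order the letters are written, the keyboard can do the reordering itself. The function name `visual_to_logical` is hypothetical.

```python
# Tamil vowel signs that are drawn to the LEFT of the consonant
# but stored AFTER it in Unicode text.
PREFIX_VOWEL_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}  # ெ, ே, ை


def visual_to_logical(keys):
    """Turn keys pressed in written (visual) order into storage order."""
    out = []
    pending = None  # a left-side vowel sign waiting for its consonant
    for key in keys:
        if key in PREFIX_VOWEL_SIGNS:
            if pending is not None:
                out.append(pending)  # no consonant followed; emit as typed
            pending = key
        elif pending is not None:
            out.append(key)          # the consonant goes first in storage
            out.append(pending)      # then the vowel sign that was held
            pending = None
        else:
            out.append(key)
    if pending is not None:
        out.append(pending)
    return "".join(out)


# Written order: ே, த, ன, ்  ->  stored as த, ே, ன, ்  ("தேன்", honey)
print(visual_to_logical(["\u0BC7", "\u0BA4", "\u0BA9", "\u0BCD"]))
```

The user only ever thinks “this is how the word is written”; the stored sequence is the keyboard's problem.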

For context, Keyman is the software that has been used since the late 1990s to provide a free way on Windows to type Tamil in the various pre-Unicode Tamil fonts (Anjal’s InaiMathi, Bamini, Boopalam, TSCII-standard fonts, TAB-style fonts, etc.) and Tamil input layouts (Anjal/romanized/phonetic-style or Tamil typewriter style). Now that Unicode has obviated the need for pre-Unicode fonts, Keyman’s support for Tamil is mostly about enabling different layouts / methods for inputting Tamil Unicode text. In the early 2000s, a download bundle that included both Keyman and the Tamil keyboard configuration files needed in Keyman was packaged up and called “eKalappai” (kalappai = கலப்பை = “plow” in Tamil). And Murasu Anjal’s InaiMathi font, created back in the 1990s, is the same font that has been available for Tamil by default on macOS and iOS devices since about 2005, starting with Mac OS X 10.4 Tiger. We’re thankfully past the days when these details mattered, since Unicode has replaced the different pre-Unicode character sets (TSCII, TAB, etc.) with one universal character set for all languages. Unicode eliminates the dimension of complexity that came from having multiple encodings at every level of the i18n support stack, including input methods, fonts, and OS support for visual layout and rendering.
