Categories
general programming tamil thamil

Redesigning an Input Method for an Abugida Script

After previously talking about the problems with input methods for abugida scripts, and adding more supporting details to the point, I finally started prototyping possible implementations of the idea (try it out!).

But quite a few constraints and tradeoffs come up once you start thinking about the details. I think these issues apply generally to most abugida scripts, so I am documenting all of the details below. Also, getting a new input method adopted requires more than perfecting the technical details and user experience: it also requires overcoming user inertia (or creating awareness) and educating industry experts and those implementing changes. If you have feedback, please send it my way so that I can continue to update this post with the latest information.

Categories
general

Supporting and fixing North American Indigenous languages in technology

This recent blog post from font maker Typotheque has a video explaining the work involved to support an under-/unrepresented language in technology:

It shows the added challenges for a language when it is spoken in rural areas by dwindling numbers of people due to a history of active suppression by the government in charge (in this case, the Canadian government). The sounds of First Nations languages like this are fascinating. What’s also neat is to see how these efforts materialize via the Unicode Consortium into the official technology standards that all major hardware and software companies subsequently implement to make it real. This is the proposal to add the needed characters to the Unified Canadian Aboriginal Syllabics block, and this is the proposal to fix up the standard’s documentation so that the example display of characters appears correctly for these languages. There is a lot of other technical work that goes on behind the scenes to make this useful in a practical sense, for example a language-specific (smartphone) keyboard, or the software that helps your phone/computer know how to handle text layout and display for any language. The video above shows the real impact of this unseen work by the people around me.

Categories
general

My Wordle Strategy

Wordle is a game that is a cross between Mastermind and Scrabble. I heard about it because it is really trendy. But I tried it out because Mastermind is one of the games I introduced to my nieces in August. It seemed hard for them, so I always played on the side of the person guessing, and showed them how to reason through any game (like thinking up a series of unit tests, without saying so, haha!) in order to win reliably.

Categories
general

Musings on Effort and Passion

I just came across this post/article that contrasts 2 positions (passion results in effort, and effort increases your passion) and shows how they’re interrelated. The part about effort resulting in passion (or more likely: sustained effort -> mastery -> motivation) resonates currently, especially since sustained effort is not always easy when things feel difficult or change too often. Things changing often disrupts our ability to learn, I think, because we accrete knowledge only when we can incorporate it into the knowledge structures that we’ve already built. Still being in this pandemic means avenues for personal connection and mentorship are harder to come by, which removes opportunities to make the mental connections that speed up our learning, and in turn, successively, our mastery, our autonomy, our drive, and our passion. As insightful and timeless as Paul Graham’s essay on how to do what you love and Steve Jobs’ commencement speech are, they start with a premise of passion and don’t really mention the gritty complement of hard work. Bill Burr’s bit, where he questions Jobs as “Nerd Jesus”, is still hilarious. And the news yesterday of Roy Williams retiring as UNC basketball’s coach touched on lots of issues, one of which is the shift to an environment with less emphasis on personal relationships and player development, especially when self-development requires a high level of commitment and overcoming adversity.

Categories
general

D-Pub for Keyboards for Agglutinative Languages and Abugidas

I just noticed that my defensive publication was officially published a month ago. It describes two main ideas for improving virtual keyboards that I talked about previously. The first idea is that languages that use abugida scripts should use their phonemes as the keys to make an efficient and intuitive keyboard. The second, higher-level idea is that agglutinative languages should use infix/suffix morpheme suggestions in the autocomplete list.
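
To make the first idea concrete, here is a minimal sketch of how phoneme keys might compose into Tamil text. It is only an illustration under my own assumptions, not the method described in the publication or used in any prototype: the names `PULLI`, `VOWEL_SIGNS`, and `type_phonemes` are hypothetical, and it assumes each consonant key emits the consonant letter plus the pulli (virama), while each vowel key emits the independent vowel letter.

```python
# Hypothetical sketch: compose Tamil text from phoneme keypresses.
# Assumption: a consonant key sends "consonant + pulli" (e.g. "க்"),
# and a vowel key sends the independent vowel letter (e.g. "இ").

PULLI = "\u0BCD"  # Tamil sign virama (pulli), marks a bare consonant

# Independent vowel letter -> dependent vowel sign used after a consonant.
# The vowel "அ" is inherent in the consonant letter, so its sign is empty.
VOWEL_SIGNS = {
    "\u0B85": "",        # அ
    "\u0B86": "\u0BBE",  # ஆ
    "\u0B87": "\u0BBF",  # இ
    "\u0B88": "\u0BC0",  # ஈ
    "\u0B89": "\u0BC1",  # உ
    "\u0B8A": "\u0BC2",  # ஊ
    "\u0B8E": "\u0BC6",  # எ
    "\u0B8F": "\u0BC7",  # ஏ
    "\u0B90": "\u0BC8",  # ஐ
    "\u0B92": "\u0BCA",  # ஒ
    "\u0B93": "\u0BCB",  # ஓ
    "\u0B94": "\u0BCC",  # ஔ
}


def type_phonemes(keys):
    """Return the text produced by pressing the given phoneme keys in order."""
    out = ""
    for key in keys:
        if key in VOWEL_SIGNS and out.endswith(PULLI):
            # A vowel after a bare consonant: drop the pulli and attach
            # the dependent vowel sign to form one syllable.
            out = out[:-1] + VOWEL_SIGNS[key]
        else:
            # A consonant key, or a vowel with no consonant to attach to.
            out += key
    return out


# The five phonemes t, a, m, i, zh typed as five keys.
word = type_phonemes(["\u0BA4\u0BCD", "\u0B85", "\u0BAE\u0BCD", "\u0B87", "\u0BB4\u0BCD"])
print(word)  # தமிழ்
```

In this sketch the user presses one key per sound, and the software decides whether a vowel becomes a standalone letter or a sign attached to the preceding consonant.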

Defensive publications establish prior art that inherently protects future implementations, in open source or otherwise, because the prior art nullifies potential future patents, including by patent trolls. So anyone interested is free to try implementing.

Categories
general

LearnTamil.com is Refreshed

I’ve updated my grammar lessons for learning Tamil at https://www.learntamil.com . The site should have a cleaner look and be easier to view on mobile devices, and you can now listen to the audio files because they’ve finally been converted from RealAudio format to a modern, common format, MP3.

Categories
clojure general programming Rust

Learning Rust for Beginners

Rust is a new-ish language that is very compelling in certain contexts, but it has a really deceptive learning curve. So I wanted to provide the links that I have found most effective for slow-learning beginners like myself, especially because the “official” Rust book(s) are, to me, paradoxically hard to learn from despite being thorough.

Categories
general

ICU4X Mailing List and 0.1 Release

People who are interested in internationalization (i18n) are likely writing software using ICU, the gold standard library for internationalization functionality and performance. However, ICU is available only in C/C++ (“ICU4C”) and Java (“ICU4J”), and is quite the behemoth. In order to support other programming languages directly and to support more resource-constrained computing environments (for example, mobile), we have the ICU4X project.

The first preliminary release, v0.1, is now official, and the current code has been published as Rust crates on crates.io.

To receive future project announcements and to stay connected, sign up for the icu4x-announce@unicode.org mailing list.

Categories
clojure general programming tamil thamil

Deriving Lexical Data for Tamil from Scratch Using Morphology

I presented at the Unicode Conference 2 weeks ago, on Oct. 16, on important yet overlooked issues that concern languages that use abugida scripts and have agglutinative morphology, using the Thamil language as a case study. Although the talk was mainly about the issues around dictionary data sets, it also covered input methods and the need for phoneme-level segmentation for these use cases. See below for more details:

Slides:

https://docs.google.com/presentation/d/1EdNLgh8MyvSqDlm2I2_aXM-WgTINaZekXZWq0629ZLQ/edit

Pre-recorded talk:

The talk covered the following topics:

Categories
general

Tamil Names

I was talking with my friend about how Tamil names differ from Western names. During the conversation, we reminisced about how he was interviewed by a local radio show on how his name is “long”. I remembered feeling unimpressed by the radio segment with my friend, and it helped explain more about Tamil names.