Categories
general programming tamil thamil

Redesigning an Input Method for an Abugida Script

After previously discussing the problems with input methods for abugida scripts, and then adding more supporting details to that argument, I have finally started prototyping possible implementations of the idea (try it out!).

But quite a few constraints and tradeoffs come up once you start thinking about the details. I think these issues apply generally to most abugida scripts, so I am documenting all of the details below. Getting a new input method adopted also requires more than perfecting the technical details and user experience: it requires overcoming user inertia (or creating awareness), and it requires educating industry experts and those implementing changes. If you have feedback, please send it my way so that I can continue to update this post with the latest information.

Categories
clojure general programming tamil thamil

Deriving Lexical Data for Tamil from Scratch Using Morphology

I presented at the Unicode Conference 2 weeks ago, on Oct. 16, on important yet overlooked issues that concern languages that use abugida scripts and have agglutinative morphology, using the Thamil language as a case study. Although the talk was mainly about the issues around dictionary data sets, it also covered input methods and the need for phoneme-level segmentation for these use cases. See below for more details:

Slides:

https://docs.google.com/presentation/d/1EdNLgh8MyvSqDlm2I2_aXM-WgTINaZekXZWq0629ZLQ/edit

Pre-recorded talk:

The talk covered the following topics:

Categories
programming tamil thamil

VS Code supports abugida scripts

Text editors that properly support the input and navigation of Unicode-encoded text in various scripts are no longer as rare as they used to be. Unicode has long been well established as the standard for encoding all of the world’s languages. Ironically, however, the situation is pretty bad when it comes to text editors built specifically for programmers (IDEs). It looks like Visual Studio Code’s most recent update finally adds proper support for input and navigation of abugida scripts, or alphasyllabaries, as they are alternatively called. The animated picture on the VS Code update page shows someone typing and navigating Tamil text, but the change should actually apply to several languages across East Africa, South Asia, and Southeast Asia.

Categories
general tamil thamil

Half Gods by Akil Kumarasamy available on pre-order

A friend of mine has just finished a novel / book of interlinked stories in which the Tamil experience is an important factor. It begins with an Eelam Tamil family in Jersey grieving the end of the war, and connects stories across time and place. There’s also a chapter/story called “The Office of Missing Persons,” which deals directly with disappearances. The book has gotten really positive reviews from some well-known writers in the literary world. It’s coming out in a month and is available on pre-order (links below).

Categories
general tamil thamil

Favorite Tamil & South Indian restaurants of the SF Bay Area

By now, I have visited enough Tamil & other South Indian restaurants in the San Francisco Bay Area to have some favorites (and not-so-favorites).  Here are my favorites, “honorable mentions”, footnotes, and disclaimers.  Enjoy!

Categories
programming tamil thamil

Updates from the last 20 months

The last 18 months have been eventful even if my updates have been sparse. Here’s a quick rundown of some of the things that I’ve been up to:

Happy New Year, and hoping that 2018 is a good year!

Categories
programming tamil thamil

Tamil Internet Conference 2017 – Prefix Trees for Language Processing – slides and paper

The Tamil Internet Conference for 2017 in Toronto, Ontario, Canada just concluded. I presented a more in-depth explanation of my previous post on prefix trees along with specific examples of how I have used them.

Here is the full paper that I submitted for the conference proceedings, entitled “Prefix Trees (Tries) for Tamil Language Processing”. Here is the slide deck for the presentation I gave at the conference.

The following is the full text of the paper from the link above:

Categories
general programming tamil thamil

Using Prefix Trees for Thamil Language Processing

Thamil computing has made a lot of progress in the past 10-20 years. Much of the work that has reached the public has been in the areas of fonts/rendering and input methods. Thanks to the continuing efforts in these areas, most of those issues have been solved, Thamil text has standardized on a single character set (Unicode), and we have nice fonts and input methods for major operating systems and mobile devices. The new environment has enabled the widespread creation and consumption of digital content in Thamil.

Now, the next set of problems to solve is handling Thamil text that is written using the Unicode character set. Unicode is designed so that fonts for all languages can standardize on it, but the slight cost to Thamil language processing has been complexity. These challenges can be handled easily, however, by representing the data in a suitable data structure, which in this case is a prefix tree (or “trie”).
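To make the idea concrete, here is a minimal sketch in Python (not the code from the post or from any library) of how a prefix tree can group Unicode code points back into Thamil letters. The names `TrieNode`, `build_trie`, and `segment`, and the tiny three-consonant letter inventory, are all assumptions made up for illustration; a real inventory would cover the whole alphabet.

```python
# Hypothetical sketch: a prefix tree keyed by Unicode code points,
# used to split a string into letters by longest match.

class TrieNode:
    def __init__(self):
        self.children = {}   # code point -> TrieNode
        self.is_end = False  # True if the path to this node spells a full entry


def build_trie(entries):
    """Insert every entry (a string of code points) into a new trie."""
    root = TrieNode()
    for entry in entries:
        node = root
        for ch in entry:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def segment(text, root):
    """Split text left to right into the longest entries found in the trie.

    Code points that start no known entry are passed through one at a time.
    """
    pieces = []
    i = 0
    while i < len(text):
        node = root
        longest = 0
        j = i
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.is_end:
                longest = j - i
        step = longest or 1
        pieces.append(text[i:i + step])
        i += step
    return pieces


# Illustrative inventory only: three consonants, each bare (inherent "a"),
# with the vowel sign "i", or with the pulli (virama).
CONSONANTS = "\u0ba4\u0bae\u0bb4"   # த ம ழ
SIGNS = ["", "\u0bbf", "\u0bcd"]    # (none), ி, ்
LETTERS = [c + s for c in CONSONANTS for s in SIGNS]
TRIE = build_trie(LETTERS)

word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # தமிழ்
letters = segment(word, TRIE)
print(len(word), len(letters))  # 5 code points, 3 letters
```

The word is stored as 5 code points, but the trie lookup recovers the 3 letters a reader actually sees, which is the unit that sorting, searching, and cursor movement usually want to work with.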

Categories
clojure general programming tamil thamil

Speaking at Clojure/West 2015!

I’m excited to be selected as a speaker at the upcoming Clojure/West 2015 conference next month in Portland! I’ll be talking on how Clojure can be used to program in other human languages (other than English). There are interesting opportunities related to diversity and access. I will be drawing on my experiences with programming in/for Thamil in the clj-thamil library. And I’ll see what other interesting, related ideas I can slip in (turtles that draw?)… and put a bird on them.

Categories
clojure java programming tamil thamil

Exploring programming in Thamil (not English) through Clojure

Or: A clear example of what macros can do

Introduction

I started working on a library called clj-thamil that I envision as a general-purpose library for Thamil language computing (ex: mobile & web input method), but a slight excursion in that work has led me to some very deep, intriguing ideas — some of which are technical, and some of which are socio-cultural. But they all fit together in my mind — Clojure, macros, opportunity and diversity (in computing), and the non English-speaking world.

I think that the implications are things that we should all think about. But if nothing else, hopefully you can read this account and understand something about macros: the kind of power they uniquely provide, and at least one good use case where they are necessary.