
VS Code supports abugida scripts

Text editors that properly support the input and navigation of Unicode-encoded text in various scripts are no longer as rare as they used to be. Unicode has long been established as the standard for encoding all of the world’s languages. Ironically, though, the situation is still pretty bad for the text editors made specifically for programmers (code editors and IDEs). It looks like Visual Studio Code’s most recent update finally brings proper support for input and navigation of abugida scripts, also called alphasyllabaries. The animated picture on the VS Code update page shows someone typing and navigating Tamil text, but the change should apply to many languages across East Africa, South Asia, and Southeast Asia.

Abugida scripts are scripts in which each letter of the language (a grapheme cluster) is written as a base shape (the base grapheme) with other marks or shapes (graphemes) around it. Scripts in the abugida category include the Ethiopic script used for Amharic and related languages, the scripts of South Asian (Indic) languages, and the scripts of Southeast Asian languages.
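To see why a single letter can be more than one stored character, here is a minimal Python sketch. It is only my illustration, not how VS Code does it. The helper name `rough_clusters` is hypothetical, and its rule (attach every character in a Unicode mark category to the character before it) is a deliberate simplification of the real grapheme cluster rules in Unicode's text segmentation annex (UAX #29), so it will get conjuncts and some other cases wrong.

```python
import unicodedata


def rough_clusters(text: str) -> list[str]:
    """Group each mark (Unicode general category M*) with the character before it.

    Assumption: a simplified stand-in for UAX #29, for illustration only.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # vowel sign or pulli: attach to the base
        else:
            clusters.append(ch)  # a new base character starts a new letter
    return clusters


# The word "Tamil" in Tamil script, spelled with escapes so the code points
# are explicit: TA, MA + VOWEL SIGN I, LLLA + PULLI (virama).
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print(len(word))             # how many code points are stored
print(rough_clusters(word))  # the letters as a reader would count them
```

The word is stored as five code points, but a reader sees three letters, and those three units are what an editor should treat as the things you type, select, and delete.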

Of course, the scripts (writing systems) of Southeast Asia are related to the scripts of South Asia because of the sustained, centuries-long connection maintained by the Pallava and Chola kingdoms. Up until now, the scripts of South Asia have been described as descendants of a single “Brahmi” script that spread from North India to South India, and perhaps originally from West Asia to North India. Recent excavations at the Keezhadi site in modern-day Tamil Nadu have uncovered artifacts with writing older than the previously known oldest writing (Ashoka’s pillar edicts in the Gangetic plain of North India), as well as artifacts that mix Indus Valley Civilization graffiti (markings) with the earliest Tamil writing. In light of these findings, the full story of the Tamil script overturns previously held beliefs about the origins of scripts in South Asia. That is why Tamil scholars have now adopted “Tamizhi”, instead of “Tamil Brahmi”, as the more historically accurate name for the origin of the Tamil script. After all, Tamizhi (2600 years ago) is now older than Brahmi (2300 years ago), and it has strong connections to something twice as old (the mature phase of the IVC was 4500 years ago). If you need to watch the Tamizhi documentary linked above in English, turn on the English captions in YouTube.

The end result is that this style of writing system (abugida) is used by many people in many countries. Technology has long supported alphabets well, and even ideographic scripts (“CJK languages”), but support for abugida scripts lags behind. As I pointed out in my talk about writing code in other programming languages, I spent longer waiting for an IDE that supports writing in Tamil (about 9 months) than I spent actually writing the code that I presented (maybe 5 months). In the 5 years since then, not much has changed. Now, with VS Code, there is at least a way to do this kind of editing in Linux. Here’s what I’ve tested, which I think is still accurate as of this writing:

emacs / Aquamacs    yes    no
VS Code             yes    yes

Beyond that, which other editors have good integrated support for Clojure, the language that I was using? I don’t remember all of the IDEs that I tried. I may also have tried KDevelop / Kate, but I do remember being exhaustive within reason and coming up empty.

This proves the point, once again, that to have a language supported in technology for a particular use case, you need support all the way through the stack: a universal character set (Unicode); a shared choice of character encoding (e.g., UTF-8); fonts and/or OS support that properly render the letters (grapheme clusters) represented in Unicode; input tools to type in the language; and text editors and/or OS support that navigate text in a meaningfully correct way and work with those input tools. Most operating systems, mobile phones, and word processors finally support all of this. Now all we need is for programmers to understand this stuff well enough (beyond just the basics) to make the code-editing tools they write for themselves behave correctly.
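The layers of that stack disagree about how long the same word is, which is roughly where editors go wrong. The Python sketch below is my own illustration under simplifying assumptions: the function name `cursor_stops` is hypothetical, and it treats any character in a Unicode mark category as part of the preceding letter, which is cruder than the real segmentation rules (UAX #29) that a proper editor would follow.

```python
import unicodedata


def cursor_stops(text: str) -> list[int]:
    """Return the code point offsets where a cursor may sensibly rest.

    Assumption: a stop is allowed before any character that is not a
    combining mark (general category M*), plus at the very end of the text.
    """
    stops = [
        i
        for i, ch in enumerate(text)
        if i == 0 or not unicodedata.category(ch).startswith("M")
    ]
    stops.append(len(text))
    return stops


# The word "Tamil" in Tamil script: TA, MA + VOWEL SIGN I, LLLA + PULLI.
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print(len(word.encode("utf-8")))  # encoding layer: bytes on disk
print(len(word))                  # character set layer: code points
print(cursor_stops(word))         # editor layer: where arrow keys should land
```

For this word the encoding layer sees 15 bytes, the character set layer sees 5 code points, and the editor should offer only 4 cursor positions (before each of the 3 letters, and at the end). An editor that steps through code points instead lets the cursor land in the middle of a letter.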
