ICU4X Mailing List and 0.1 Release

People interested in internationalization (i18n) are likely writing software with ICU, the gold-standard library for internationalization functionality and performance. However, ICU is available only in C/C++ (“ICU4C”) and Java (“ICU4J”), and it is quite the behemoth. To support other programming languages directly and to support more resource-constrained computing environments (e.g., mobile), we have the ICU4X project.

The first preliminary release, v0.1, is now official, and the current code has been published in Rust.

To receive future project announcements and stay connected, sign up for the mailing list.


Deriving Lexical Data for Tamil from Scratch Using Morphology

I presented at the Unicode Conference two weeks ago, on Oct. 16, on important yet overlooked issues concerning languages that use abugida scripts and have agglutinative morphology, using the Thamil language as a case study. Although the talk was mainly about issues around dictionary data sets, it also covered input methods and the need for phoneme-level segmentation for these use cases. See below for more details:


Pre-recorded talk:

The talk covered the following topics:


Tamil Names

I was talking with my friend about how Tamil names differ from Western names. During the conversation, we reminisced about the time a local radio show interviewed him about how “long” his name is. I remembered feeling unimpressed by that radio segment, and it helped explain more about Tamil names.