How we write down languages

Writing systems, the ways in which we write down a language, are fascinating.  They may not be as diverse and surprising as all the animals and plants of the world, but you may be surprised to learn just how many different ways we have found to write down information, each with strengths and weaknesses.  And if you don’t get all the details right, then what you end up telling the world is not “follow your heart” but “coward”.  Oops.  (Enough examples exist to fill several blogs; it turns out that failures in cross-cultural appreciation & appropriation go both ways.)

It is generally believed that, in the timeline of human evolution, language evolves first as speech, and only later does writing evolve to record it.  For example, when new words (ex: slang) or reduced spoken forms (ex: “gonna”, “hafta”) get coined, if they persist long enough, they become normalized and enter the vocabulary.  Whether these new words count as “official words” depends on whether you take a prescriptivist or descriptivist view, but it is a small example from our own experience of writing changing to follow speech.  (Btw, Shakespeare was a master of introducing crazy, made-up words into a language.)

What is fascinating are some of the more visible examples in recent history in which we make choices about how to write a language, or change how we write it, and the implications those choices have.  In one case, it’s the other way around: dialects get recategorized as separate languages through how we write them.

Kazakhstan was considering, a few months ago, a new Latin-script alphabet for the Kazakh language, which would replace the Kazakh version of Cyrillic that was pushed during early Soviet times.  The decisions over details were unwittingly going to make the proposed script much more difficult than it needed to be, and potentially problematic for encoding data in computers, by overloading the semantics of the plain apostrophe.  The plan to replace Cyrillic with Latin was political: to remove the legacy of colonialism (Russian Soviets) from the country.  According to the article, the introduction of Cyrillic in the first place was itself a deliberate, political decision:

For centuries, it was written using the script of Arabic, the language of Islam, which most Kazakhs have long at least nominally practiced.

Kazakhstan switched briefly to the Latin alphabet at the start of the last century, and Russia’s Communist leaders after the 1917 revolution initially supported the use of a Latin script.

Later, growing fearful of pan-Turkic sentiment among Kazakhs, Uzbeks and other Turkic peoples in the Soviet Union, Moscow between 1938 and 1940 ordered that Kazakh and other Turkic languages be written in modified Cyrillic as part of a push to promote Russian culture. To try to ensure that different Turkic peoples could not read one another’s writings and develop a shared non-Soviet sense of common identity, it introduced nearly 20 versions of Cyrillic, Mr. Kocaoglu said.

Computers may be the last thing on people’s minds when these are the considerations factoring into their decision, but encoding such a scheme into computers can end up being a good litmus test of simplicity (or, at least, of a lack of incidental complexity).
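To make the apostrophe concern a little more concrete, here is a minimal Python sketch.  It assumes the apostrophe-based spelling “Qazaqstan Respy'bli'kasy” that was reported for the proposed alphabet (treat it as illustrative, not as an authoritative transliteration), and both function names are made up for this example.  It shows how a generic word pattern breaks such a word apart, because to the computer the plain apostrophe is punctuation rather than part of a letter.

```python
import re

# Illustrative spelling under the apostrophe-based proposal (an assumption
# for this sketch, not an authoritative transliteration).
TEXT = "Qazaqstan Respy'bli'kasy"


def naive_words(s):
    """Split into runs of letters, digits, and underscores (a common default)."""
    return re.findall(r"\w+", s)


def apostrophe_aware_words(s):
    """Hypothetical workaround: keep an apostrophe that follows a word
    character attached to that word."""
    return re.findall(r"\w+(?:'\w*)*", s)


print(naive_words(TEXT))             # ['Qazaqstan', 'Respy', 'bli', 'kasy']
print(apostrophe_aware_words(TEXT))  # ['Qazaqstan', "Respy'bli'kasy"]
```

The workaround has its own cost: once the apostrophe can be part of a letter, the same character used as a closing quotation mark or as ordinary punctuation gets glued onto the word too, which is exactly the kind of overloaded meaning that makes such a scheme awkward to process.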

This BBC article on the proposed change explains the economics, some of the linguistics, and the possible political effects.  The diagram in the article illustrates how tricky the switch is (but it doesn’t show how the sounds compare to the Turkish language alphabet, for example).


The article succinctly explains the importance of language to economics and political relationships:

Currently, up to 10% of the current trade flow between Russia, Kazakhstan and Ukraine can be explained by the convenience of a shared language, which in some ways translates to a shared culture and mentality, says Madumarov. This also means that Russian-speaking Kazakhs have more economic mobility between countries. Meanwhile, Azerbaijan and Georgia, nations that are not as fluent in Russian, have weaker trade links.

Inversely, he says that the benefits to having a Latin-script alphabet means being better integrated with most of the Western world. As an example, Turkey, which switched to a Latin-based alphabet from its former Arabic script in 1928, has managed to form alliances with the European Union and was in negotiations – up until recently, when the government moved towards a more autocratic direction – to be a member.

Here is the full map of the world according to writing system category.

In all this, we’re not even talking about the abrupt change of the Turkish alphabet from Arabic to Latin.  Despite (or because of) Turkey’s top-down political shift toward the West in the early 1900s, the quick adoption of a Latin alphabet was not painless, and we still deal today with the “Turkish I problem” in computers.  Also note that English and Dutch use the 26-letter “unadorned” Latin alphabet (no diacritics around letters).  Most of the relatively recent West-oriented alphabet changes for various languages are based on the English alphabet, and English has an especially strange relationship between letters and sounds because of changes like the Great Vowel Shift.
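For readers who haven’t run into the “Turkish I problem”: Turkish has two separate letters, dotted İ/i and dotless I/ı, while most software assumes the English pairing I/i.  Below is a small Python sketch of what goes wrong with the default, language-neutral case conversion.  The helpers `turkish_lower` and `turkish_upper` are hypothetical names written for this post, not library functions, and they assume precomposed characters (they do not handle an “I” followed by a separate combining dot).

```python
def turkish_lower(s):
    """Lowercase using the Turkish pairing: İ -> i and I -> ı."""
    # Handle the two capital I's explicitly, then let Python do the rest.
    return s.replace("İ", "i").replace("I", "ı").lower()


def turkish_upper(s):
    """Uppercase using the Turkish pairing: i -> İ and ı -> I."""
    return s.replace("i", "İ").replace("ı", "I").upper()


# Python's built-in conversion is language-neutral, so it uses the
# English pairing and loses the dotted/dotless distinction.
print("ISPARTA".lower())          # isparta  (Turkish expects ısparta)
print("istanbul".upper())         # ISTANBUL (Turkish expects İSTANBUL)
print(len("İ".lower()))           # 2: an "i" plus a combining dot above

print(turkish_lower("ISPARTA"))   # ısparta
print(turkish_upper("istanbul"))  # İSTANBUL
```

The catch is that the program has to know the text is Turkish before it can pick the right rule, which is why case-insensitive comparisons of things like usernames or file names can behave differently depending on the language settings.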

(If the standard Latin alphabet in standard ASCII has 26 letters, why does Italian, the most direct descendant of Latin, have only 21 letters?  It turns out that the old Latin alphabet grew over time: once in ancient times, about 2000 years ago, when Y and Z were added to represent Greek loanwords after Rome conquered Greece, and again in Medieval times (500-700 years ago?) to distinguish I from J and V from U, and to include W for Germanic languages.  Italian today doesn’t use J, K, W, X, and Y.)

The most surprising fact in all this is that the brief time in which Kazakh was written in Latin script in the early 1900s before its current Cyrillic form was due to the USSR promoting its Latinization project for language writing systems.  Those early Soviet days were heady times, and other languages were a part of that project, until the project was abandoned after a few years in favor of Cyrillic for less idealistic reasons.  And that attempt to Latinize some of the languages overlapped with and was helped by Turkey’s adoption of the Latin script.

What’s fascinating about the recent history of South Asia is that the 2 main languages of the Gangetic Plain spreading across Pakistan and northern India, Urdu and Hindi, are pretty much the same language, or at least dialects of the same language.  In the 1800s the language was called Hindustani, a language with word roots in Sanskrit (via Prakrit), Arabic, and Persian.  Hindi and Urdu were considered its dialects, and as such, the names were treated as synonyms for the language.  But before the arrival of the British, who coined the name Hindustani as a synonym for Urdu and Hindi, it was just called Urdu.  (The influences of Persian and Arabic were brought about during the time of the Mughal Empire, which traced its lineage to the Mongols of Genghis Khan fame a few centuries prior; the surname “Khan” is a remaining vestige.  The spread of Genghis Khan’s empire is the reason why Turkic languages are spoken from Turkey to Central Asia, including to Mongolia, but the Mughals maintained no Turkic language influences in India.)

But there, the divergence occurring since independence in 1947 is one that was & is accelerated by deliberate government actions on both lexicon and writing system.  Since independence and the Partition of India and Pakistan were only 1 day apart, the resulting 2 countries and their respective religious and ethnic identities pushed the relevant dialects apart by introducing more of either Sanskrit or Arabic vocabulary.  Pakistan standardized Urdu around the Nastaʿlīq script and India standardized Hindi around the Devanagari script, passing over the Kaithi script, which was neutral in that it was used by Muslims and Hindus alike.  Where there was very recently one language with two major dialects, there are now two languages (other regional dialects exist as before).

Serbian, by contrast, remains one language, but it can commonly be written in either Cyrillic or Latin.  But as mentioned before, the definition of Serbian as a language distinct from Bosnian or Croatian, and now Montenegrin (after the independence of Montenegro in 2006), rather than as dialects of one language, is a bit murky, as is the definition of Norwegian, Swedish and Danish as distinct languages and not dialects.  Or maybe the distinction is easy to make.

So what about Vietnamese, which is another language that switched over to a Latin-based alphabet (although less by choice and more because of the French)?  All the sounds can be represented in the alphabet, but that doesn’t mean everything is straightforward for people familiar with the Latin alphabet.  D sounds like “z” or “j” depending on the dialect, while Đ sounds like “d”, and it looks trickier from there.  Issues like this or the “Turkish I” may not be problems for native speakers of those languages, so are they just problems of the non-native speakers, then?

I continue to find some inspiration in the story of Adlam, a script made up from scratch by a couple of brothers for their native Fulani (aka Pular) on a challenge from their father to help preserve their language.  Access to education for a traditionally nomadic people is scarce, and learning to write the language, which is in either Arabic or French script, effectively meant learning those languages when going to the schools that taught the associated script.  The spontaneous changes that result when a script fits its language and region are fascinating.  The sheer belief required to invent a script by yourself for millions of people in dozens of countries, the amount of effort required to promote the script, and the ongoing work necessary to make and keep the language relevant in the digital age make for a story full of lessons for those who observe.
