About Me

Hi, I’m Elango Cheran.  I’m currently working on projects that improve the productivity of computational scientists.

I have a background in Next Generation Sequencing (NGS), analyzing sequence data from different sequencing platforms, with DNA/RNA taken from various organisms and modified according to the type of experiment.

In the process of supporting NGS data, I have dealt with the challenges of large amounts of data: storage server decisions, network utilization, cluster configuration and queuing, and more.  As a computer scientist, I’ve come to appreciate the efficiency issues that such “Big Data” raises, whether in the algorithmic complexity of an operation, the latency of network protocols, the speed of language feature implementations, or the cost and scalability of human code development.

If you, too, are interested in NGS, productivity in computational sciences, or even just computer technology in general, I’d love to hear your feedback.

Email: elango .DOT. cheran .AT. g m a i l .DOT. com

9 replies on “About Me”

May we know if you’re the same Elango Cheran who put up a Tamil website for learners? It’s a fantastic website, the best we’ve found for learners whose first language is English. We posted a thank-you, but we got a “delivery failure” message on that.

Tim and Felicia

Greetings from a fellow computing and Lisp enthusiast! I only discovered your blog recently: I started studying the Tamil language a few weeks ago and was genuinely disappointed at the lack of resources for such a thriving language. A spirited Google search led me to your website for Tamil learners, and indirectly, to this blog, for which I am very grateful. Your posts are always informative, unique, and fun to read – I look forward to reading more of your blog, so please keep it comin’!
I also have a request – feel free to ignore it if you feel like it – would it be possible to get the contents of your Tamil learner’s website in PDF format? I ask because in the unlikely event that your website goes down or you decide to discontinue hosting it, all that easy-to-read information on the grammar (that I *still* haven’t been able to find anywhere else) will be lost. I hope you consider this request!

Best,
Sam

Hey Sam, thanks! I’m glad you found the lessons after the journey of searching. If you find them useful, then maybe you can drop links on other pages that were dead ends in your searching, to help point the way for the next people who will be just like you and shorten their journeys. Ironically, changing the site’s URL a couple of times has made it less discoverable, which is where you can help. Thanks for the encouragement, and I’ll try to post again on the blog soon.

I wouldn’t worry about the lessons going anywhere. As the bottom of the left navbar and a semi-recent blog post indicate, they’ve been around for a long while, which is forever in internet time. I did maintain a PDF version in the early years, but for a different reason: in 2001, it was hard to guarantee that the Thamil text would show up correctly without all the pieces in place (e.g., font download and installation). The standardization of OSes around Unicode, via fonts, layout engine fixes, UTF-8, and so on, finally solved those problems. But you can still find old copies of that PDF that people decided to copy and upload to places like Scribd. Since then, I’ve made corrections, updates, and additions, which the static old PDF copies won’t have. Plus, there are new interactive features that a PDF won’t capture well.

Your website is the most organised website I have yet found for learning Tamil. Thank you for the meticulous work.

Hi, I watched your talk about Lexical Data for Tamil. Is there a publicly available lexical data set for Tamil? You mentioned Unilex, but I am not able to find public data for Tamil. I am interested in making various Tamil word games but need a high quality word list including verb class info and suffixes. Let me know if there are any publicly available data sets. I love your crash course on Tamil as well. Thanks!

There is not a more expansive dataset. You can create one for yourself if you want, since the code is there and the foundational work has already been done to manually inspect and record the denylist of false positives (corner cases) among new words. Part of the extra work in handling a larger dataset would be cleaning up bad data: in this case, the long tail would include a higher percentage of non-words (abbreviations, names, places, transcribed foreign words). Those non-words were already present in the Unilex input word list, which included only the top 10,000 or so most frequent words rather than all words. You would also have the problem of people typing text that is not properly encoded in Unicode even though it looks visually the same or similar. Good luck.
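On that last point, Unicode normalization catches one common class of look-alike text: canonically equivalent sequences that use different code points. Below is a minimal sketch in Python, not the project’s actual code; the function name and the denylist handling are made up for illustration, and it assumes NFC normalization is sufficient for the cases at hand.

    import unicodedata

    def clean_word_list(words, denylist):
        """Normalize words to NFC and drop known false positives.

        Visually identical Tamil strings can be encoded as different
        code point sequences; NFC normalization collapses canonically
        equivalent variants into a single form.
        """
        denylist_nfc = {unicodedata.normalize("NFC", w) for w in denylist}
        seen = set()
        cleaned = []
        for word in words:
            nfc = unicodedata.normalize("NFC", word)
            if nfc in denylist_nfc or nfc in seen:
                continue
            seen.add(nfc)
            cleaned.append(nfc)
        return cleaned

    # Two encodings of the visually identical syllable "கொ":
    # U+0BCA (TAMIL VOWEL SIGN O) vs. its canonical decomposition
    # U+0BC6 + U+0BBE (VOWEL SIGN E + VOWEL SIGN AA).
    composed = "க\u0bca"
    decomposed = "க\u0bc6\u0bbe"
    print(composed == decomposed)                       # False: different code points
    print(clean_word_list([composed, decomposed], []))  # one entry, in NFC form

Normalization only handles canonically equivalent Unicode sequences, though; other look-alike problems, such as text typed with visually similar but unrelated code points, would still need manual inspection or a confusables check.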
