I recently was tasked with performing an ETL task that should be done as efficiently and quickly as possible. The work led me to learn more about parallel and distributed processing in Clojure. In addition to having a greater appreciation for what Clojure enables (once again), I also pushed the boundaries of what I thought is possible using the available tools. I ultimately ended up writing a Spark job whose executors are each running N threads (currently, N=3). But the path to that solution taught as much by what didn’t work as much as what did work.
Continue reading Using Clojure to Create Multi-Threaded Spark Jobs
For those of you with experience with Maven, you might be wondering why anyone who is using Leiningen to build a project would then want to run that build tool from Maven, which is itself another build tool. There is a reason why I even ventured down this path. I would like to share what I have found so far, in case it benefits anyone else, but I would also like to get feedback from people who know of a better way of accomplishing the same goals.
Continue reading Compiling a Leiningen Project from Maven
Thamil computing has made a lot of progress in the past 10-20 years. Much of the work that has reached the public has been in the areas of fonts/rendering and input methods. Thanks to the continuing efforts in these areas, most of those issues have been solved, Thamil text has standardized on a single character set (Unicode), and we have nice fonts and input methods for major operating systems and mobile devices. The new environment has enabled the widespread creation and consumption of digital content in Thamil.
Now, the next set of problems to solve are handling Thamil text that is written using the Unicode character set. Unicode is designed for all languages’ fonts to standardize, but the slight cost to Thamil language processing has been its complexity. But the challenges can be handled easily by representing the data in a suitable data structure, which in this case is a prefix tree (or “trie”).
Continue reading Using Prefix Trees for Thamil Language Processing
I’m excited to be selected as a speaker at the upcoming Clojure/West 2015 conference next month in Portland! I’ll be talking on how Clojure can be used to program in other human languages (other than English). There are interesting opportunities related to diversity and access. I will be drawing on my experiences with programming in/for Thamil in the clj-thamil library. And I’ll see what other interesting, related ideas I can slip in (turtles that draw?)… and put a bird on them.
Or: A clear example of what macros can do
I started working on a library called clj-thamil that I envision as a general-purpose library for Thamil language computing (ex: mobile & web input method), but a slight excursion in that work has led me to some very deep, intriguing ideas — some of which are technical, and some of which are socio-cultural. But they all fit together in my mind — Clojure, macros, opportunity and diversity (in computing), and the non English-speaking world.
I think that the implications are things that we should all think about. But if nothing else, hopefully you can read this account and understand something about macros — the kind of power they uniquely provide and at least good one use case where they are necessary.
Continue reading Exploring programming in Thamil (not English) through Clojure
Back around the December – January time frame, I was trying to implement the Lambda Architecture as described by Nathan Marz. At that time, the early-release version of his upcoming Big Data book was just at chapter 5 or 6, but my goal was to tackle what seemed liked the harder part — real-time (Storm). The book chapters hadn’t yet caught up to it. A few slide decks mentioned their current implementations of a fully thought-out, end-to-end Lambda Architecture implementation that included Storm, but no reliable, easy-to-deploy code was readily available from the interwebs.
In installing Storm, it quickly seemed apparent that having Kafka running upstream of it was one way to support both real-time and batch processing of incoming data, and probably the one of least resistance. So I added installing Kafka to my to-do list.
Cutting edge technology means dealing with rough edges. I downloaded the latest versions of the relevant software components, but the integration of all of them didn’t work. As I found out, the reason was that versions of components that finally worked together with each other for me were not the most recent, but instead maybe a version or two behind.
The code that I ended up with to get Kafka and Storm working together on a toy example using the Twitter Dev Stream is on github here:
Continue reading Example of Storm and Kafka (in Clojure)
I have been pushing for 2 months to institute a monthly Hack Day in the company, and it has happened finally on Feb 13-14! Having a company culture that fosters creativity and engagement is important when you work with technology. Technology is all about the new, and the new is all about synthesizing new ideas to create and adapt. The environment in which you are can make a big difference. Whether you talk about the network effects of being in the city that speaks to your interests Cities and Ambitions, or you are talking about opportunities that come with family upbringing, or the general attitudes and behaviors of the workplace, it’s amazing how your surroundings can be so important and yet subtle and difficult to quantify. I’m proud of the efforts I’ve been making so far, including the monthly Hack Day, to make committed first steps in introducing tangible, bottom-up, creative spaces. Continue reading Organizing a Hack Day