October 07, 2007
Arabic and Technology: Reform or Die
The use of Arabic in technology has been one of my pet issues since I was a teenager, when young Arab kids who hadn’t yet started learning the Latin alphabet used to come to my place during the summer to play video games and couldn’t enter the commands needed to launch them. A few years later, I was trying to teach basic programming skills to Arab professionals, and they too had serious difficulty associating the mnemonics of computer languages, which are based on English roots, with their functions. While I found programming very easy even as a child (after all, a computer language is just another language to express yourself in), this made me realize that computers as they are today could never be as accessible, on average, to an Arab as they are to a European, not to mention an American.
Unless they are already equipped with a European language from birth, Arabs would have to be introduced to technology through a medium better adapted to their Arabic brain wiring for a significant number of skilled Arab geeks to emerge, the kind of teenagers who will invent the next big software or website and maybe later become rich from it. While companies like Sakhr have attempted to create an Arabic-based programming language, those attempts came too early, in the 80s, at a time when computers were an extremely rare gadget in the Arab World, so they never gained momentum.
Today, although common operating systems such as Windows support Arabic at the user level, the underlying algorithms make its use outside the field of word processing very impractical, if not impossible. Even at the user level, one can easily notice that using Arabic is not as slick or intuitive as using English, with cursor jumps, inconsistent keyboard behavior, etc.
The main algorithm used to support Arabic is the Unicode bidirectional algorithm. As its name indicates, it was created to allow writing documents that mix left-to-right and right-to-left text within complex sentences. The algorithm is also defined with the implicit assumption that one is writing on an interface whose underlying horizontal coordinates run from left to right. If bidirectional text written with this algorithm were used on a medium that does not implement it, only the Arabic portions would be messed up; the rest would remain readable and editable.
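To make the above a bit more concrete, here is a minimal Python sketch of the very first thing the algorithm looks at: the bidirectional class of each character, and the base direction of a paragraph, which it takes from the first "strong" character. The function names `bidi_classes` and `base_direction` are mine, invented for illustration. The sketch ignores explicit embedding and isolate controls and everything that happens afterwards (level resolution, reordering), so it is nowhere near a full implementation.

```python
import unicodedata

# Strong classes as defined by the Unicode Character Database:
# "L" is left-to-right, "R" is right-to-left (e.g. Hebrew),
# "AL" is right-to-left Arabic letter.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def bidi_classes(text):
    """Return the bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def base_direction(text, default="ltr"):
    """Guess the paragraph direction from the first strong character.

    Simplification (assumption of this sketch): explicit directional
    controls are not handled, and a paragraph with no strong character
    falls back to `default`.
    """
    for cls in bidi_classes(text):
        if cls in STRONG_LTR:
            return "ltr"
        if cls in STRONG_RTL:
            return "rtl"
    return default


if __name__ == "__main__":
    # A Latin command followed by an Arabic word, stored in logical order.
    sample = "ls \u0645\u0644\u0641"
    print(bidi_classes(sample))
    print(base_direction(sample))
```

Note that the text is stored in logical (typing) order; which way it runs on screen is decided only at display time, which is why a medium that does not run the algorithm shows the Arabic portions backwards.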
Whereas this algorithm definitely answers a certain need, most Arabic texts are almost as monolingual as English texts are, with at most a few Latin words thrown in here and there. The way the algorithm is designed also suggests that it was initially made with non-Arabs occasionally writing Arabic on a non-Arab medium in mind. This raises many issues. Unlike French or Spanish, for example, where all you have to do is translate a few bits at the user interface level to get a fully functional system, Arabic requires adding a whole new layer of programming at the core level, something most companies wouldn’t bother doing. Even when the OS already integrates it, it still requires some effort, and in some cases you just can’t rely on it. The minor issues include the counter-intuitive interface behavior mentioned above, and the problems of sending and receiving text between machines that are not equally aware of its being bidirectional (who has never had trouble with Arabic email?). Worse, this whole algorithm is a word processing algorithm: if you wanted to create a piece of software that’s not word and paragraph based (e.g. an Arabic-based computer language, a command line terminal, a network protocol, etc.), it’s completely useless.
Then there’s a second layer of treatment: glyph rendering, where you have to reshape Arabic letters according to their position within a word. Avoiding this step goes beyond technology, since it implies creating Arabic “print letters” (i.e. position-independent letters). There have been three or four such attempts over more than a century now, but none really caught on. That is probably less a matter of resistance to change than of no widely distributed medium ever taking the trouble (or rather, sparing itself the trouble) of using them.
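For readers who have never had to deal with it, here is a rough Python sketch of what "reshaping according to position" means. It picks the isolated, initial, medial or final form of each letter by looking at its neighbours, using the legacy presentation-form characters that Unicode keeps for compatibility. The helper names (`presentation_form`, `positional_forms`, `shape`) are hypothetical, and so is the shortcut of deciding whether a letter joins by checking whether a presentation form with the corresponding name exists. Real renderers use proper joining tables and font features, and also handle ligatures and diacritics, which this sketch ignores.

```python
import unicodedata


def presentation_form(ch, form):
    """Return the legacy presentation-form character for ch, or None.

    `form` is one of "ISOLATED", "FINAL", "INITIAL", "MEDIAL". The lookup
    goes through character names, e.g. "ARABIC LETTER BEH INITIAL FORM".
    """
    name = unicodedata.name(ch, "")
    if not name.startswith("ARABIC LETTER"):
        return None
    try:
        return unicodedata.lookup(f"{name} {form} FORM")
    except KeyError:
        return None


def joins_next(ch):
    # Heuristic (assumption): a letter connects to the following letter
    # only if an initial form is defined for it.
    return presentation_form(ch, "INITIAL") is not None


def joins_prev(ch):
    # Heuristic (assumption): a letter connects to the preceding letter
    # only if a final form is defined for it.
    return presentation_form(ch, "FINAL") is not None


def positional_forms(word):
    """Return the positional form name chosen for each letter of word."""
    forms = []
    for i, ch in enumerate(word):
        before = i > 0 and joins_next(word[i - 1]) and joins_prev(ch)
        after = i + 1 < len(word) and joins_next(ch) and joins_prev(word[i + 1])
        if before and after:
            forms.append("MEDIAL")
        elif before:
            forms.append("FINAL")
        elif after:
            forms.append("INITIAL")
        else:
            forms.append("ISOLATED")
    return forms


def shape(word):
    """Replace each letter by its positional presentation form, if any."""
    return "".join(
        presentation_form(ch, form) or ch
        for ch, form in zip(word, positional_forms(word))
    )


if __name__ == "__main__":
    word = "\u0628\u0627\u0628"  # beh, alef, beh
    print(positional_forms(word))
    print([unicodedata.name(c) for c in shape(word)])
```

In this three-letter word the same letter (beh) ends up with two different shapes, and the alef in the middle refuses to connect to what follows it. A position-independent "print" alphabet would make this whole step unnecessary.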
Yet the Unicode algorithm has spread to become the single most used way of enabling Arabic on any computer today, the fully localized monolingual Windows versions included. In fact, from my discussions with quite a few people from the IT world who had an interest in this, many even saw it as the only option, and anything else as digital blasphemy. Reverence for the Unicode Consortium, and for “If Everyone Does It That Way, Then It Must Be Right”, switches off any critical thinking, even among the very people who are supposed to innovate.
The choice of this algorithm as the de facto standard did not necessarily make the most economic sense. Other, better alternatives might have required less effort. The fact that it was already there (i.e. inertia) and the lack of Arab IT engineers with real creative thinking are probably the main drivers behind its spread. Today, even if one wanted to go back to a better solution, it would first be necessary to overcome the resistance to, and the cost of, switching away from it.
There might not be enough demand for more technology-adapted Arabic. But this could be a catch-22. Was the existence of a generation of computer geeks the result of a successful offer of personal computers, or was it the opposite? Either way, the adoption by the average user of any given tool is largely correlated with how adapted it is to their needs and comfort zones. And if Arabic, as a tool, is not adapted then it’ll just join the ranks of other liturgical languages.
Posted by Shaheen at October 7, 2007 02:38 PM
Filed Under: Society & Culture
It's worth noting that these hassles also occur for other languages which use Arabic script, and many of them occur for right-to-left languages in general. This increases the number of people interested in making things better. For example, lead developers of the GNU Fribidi library, which implements the Unicode bidirectional algorithm, have been Israeli and Iranian.
Posted by: ziz at October 8, 2007 02:03 AM
Interesting. The question that comes to mind is whether Open Source is a solution?
Posted by: The Lounsbury at October 8, 2007 03:52 PM
Short answer is, Open Source could be a solution, more so than proprietary software, but the conditions are not there.
Longer answer: Open Source is currently inferior in terms of both features and cost except in some specific areas (network, system infrastructure, smb-level databases, etc.).
Cost manifests itself not only in licensing, but also in skills to acquire, maintenance costs increased by complexity, etc. When cost is the driver behind Open Source adoption, the licensing part must be higher than the other components in most cases.
In the Arab World, de facto, cost of licensing is zero (piracy) outside of a few big businesses and governments. And the higher skills that are usually required by the use of Open Source software are too rare to make its adoption an easy option.
IOW, the incentives are clearly on the (de facto license-free) proprietary side in MENA.
Plus, the lack of creativity and the reverence described above exist even among the lesser Arab geeks. Arabic support in Open Source software has mostly been developed by a very small number of Arabist Westerners, Israelis, Iranians, and the occasional Arab (speaking of the actual underlying development, not translation): 10-20 people at most over the last 15 years. And it has been done along the lines of adding the Unicode algorithm to existing user interfaces.
One possibility is that if some Arab universities did actual research, they might come up with more interesting ideas based on Open Source. Putting aside some rare projects driven by ideological incentives (e.g. nationalism, religion), I haven't seen any movement there. When you try to contact them to propose a joint business-university research program that might make economic sense and favor some real innovation, the academic morons have unrealistic expectations about what you're ready to invest and impose proportionally unrealistic conditions to their cooperation.
At the same time, there's no reason for Microsoft or Oracle to invent new paradigms if they have no strong incentives for it.
Oddly I feel rather stupid for the Open Source comment the moment I saw your reply as I bloody well knew the answer had I given it a thought or two.
(And my painful experience in trying out recently for personal financial usage some open source accounting applications taught me the hidden cost. The Code Monkeys seem to think it quite normal to simply assume end user is a programmer okay with them forgetting some key information like one has to install all kinds of bollocks to get their damned thing to work, painfully)
De facto license free of course locks in to the slicker proprietary. I suppose some of the arguments I have heard for government acquisition to drive development of Open Source in MENA make sense on a theoretical level from this perspective.
But from another perspective, as you note, there is precious little creativity, and in the near term one is likely to get a barely usable to unusable product that the government chimpanzees can't use.... for a lose-lose situation.
This comment caught my eye as well:
haven't seen any movement there. When you try to contact them to propose a joint business-university research program that might make economic sense and favor some real innovation, the academic morons have unrealistic expectations about what you're ready to invest and impose proportionally unrealistic conditions to their cooperation
I'm afraid it is not just the academics who have unrealistic expectations. Precious little understanding of what real "venturing" (whether technology or business practice based) requires or the long road. Too little practical experience and mentoring, too many having formed their ideas off of breathless journos writing about the end of the game and the big money realized.
Well, God is with the Patient, isn't that what one is told?
Nevertheless, growing bit by bit some real innovation centres isn't entirely out of the question. Pity, however, say the big local telecoms groups are not putting their money into such seriously. Short sighted I think.
Posted by: The Lounsbury at October 8, 2007 07:47 PM
All this talk about technology in the Arab world and no one has thought to consult the expert on such matters? Thomas Friedman tells us there is no problem, being in the flat world that we're in, and after all, "Everyone in Mali uses Linux."
Problem solved, where do I pick up my check?
Posted by: Djuha at October 8, 2007 11:36 PM
Regarding the question of directionalism in terminal windows, etc., one might want the directionalism of text to be a fundamental setting at the operating system level, so that you would have to specify a setting other than the default to have text displayed in the other direction.
Questions, which might illuminate the work that has been done in this interesting field so far:
How does this work in the Linux distributions that already support the Arabic language, e.g. Ubuntu?
I have heard that several purely Iranian Linux distributions have been created - how well have they integrated the Iranian language into the system?
And finally, how do they do this in Israel, where they may have a very similar problem, i.e. computers which are programmed in languages with a different directionalism and different alphabets?
(Ah, and BTW: What about Chinese and Japanese? They must ALSO have similar problems).
The reason I ask how it's done in the Linux distributions is that even though Windows has greater uptake/diffusion, Linux is much more easily manipulated and "tinkered with" and has better support for multiple languages (in Ubuntu, you can change installation language on an installed system without reinstalling, something which to my knowledge is not supported by Windows).
Just a few thoughts :-)
Posted by: Carsten Agger at October 9, 2007 04:16 PM
Actually, open source at the government level (or any other big corporation or organization) makes a whole lot of sense.
The problem is not so much with the fact that open source is mostly "work-in-process" material (exceptions being infrastructure and commodity components), it's that it's expected to be made of finished products. So in many projects, it'll be deployed as though it was finished. With all the negative consequences of lack of user buy-in, low productivity, low ROI, etc.
Now, if you are aware that you will need to invest in polishing them and making them monkey-proof, then you can achieve vast economies of scale by using open source software. Developing software up to the point of usability is more or less a fixed cost, and its marginal cost of replication is virtually nil, unlike proprietary software, where you have to pay grosso modo proportional licensing costs. For a government, the cost of development could even be lowered by pushing it through university training programs.
Add to those benefits the flexibility and security of controlling your code. Plus the positive externalities relating to generated skills.
Reality though is that I'm not even sure there's the necessary couple of braincells needed to carry out such projects correctly in MENA (too much third-world level politics, too little management), and I'm not sure whether the driver behind the adoption of open source would be hype or economic analysis.
BTW, agreed re venturing and unrealistic expectations. But it goes beyond journos. It's the idea that somehow "outsiders" who open up their pocket must be too rich to count their money or something. Like the cases of the wasta boys.
Hebrew has exactly the same problem as Arabic (minus the glyph rendering). Unicode bidi at the GUI, and nothing at the console. I have no idea about Chinese or Japanese.
Turkey replaced the Arabic script with a Latin script. Would it make sense for Arab countries to do the same thing? I imagine there would be objections -- both of a practical nature and an ideological one.
Posted by: Cabalamat at October 19, 2007 01:42 PM
A change to Latin scripts in Arab countries is likely to meet unanimous resistance and doesn't make much sense (for what benefits - even the change in Turkey created problems at the time, for what?).
More specific to this entry, this is mostly a technological/algorithmic problem. The only part that is related to Arabic script itself is minor (the glyph rendering issue), and even that one can easily be resolved by just creating the right "fonts". After all, Arabic is extremely rich in calligraphy.
Here's an excellent article about the evolution of Arabic "print letters" and how difficult it has been: Keyboard Calligraphy.
Posted by: Ali K at October 20, 2007 11:04 AM