This is a story about four people named Everyone, Someone, Anyone and No-one. There was an important job to be done and Everyone was expected to do it. Everyone was sure Someone would do it. Anyone could have done it, but No-one did it. Someone got angry about it because it was Everyone’s job. Everyone thought Anyone could do it, but No-one realised that Everyone wouldn’t do it. It ended up that Everyone blamed Someone when No-one did what Anyone could have done.
Fixing the issues is everyone’s job. And there are many issues in GNU/Linux systems that handle both Malayalam Unicode and the so-called ASCII / ISCII fonts. Most people in Kerala who use Malayalam fonts suffer from this. The majority of publishing houses and newspapers still use the non-standard ASCII fonts. These ASCII fonts are nothing but a gimmick played on the ISO/IEC 8859-1 code table, the table that holds the characters of the Latin script, including the accented letters. What an ASCII Malayalam font does is really funny: it puts Malayalam glyphs in place of the Latin glyphs in the ISO/IEC 8859-1 code table! So the computer system sees the text as just some junk characters in Latin script. So what is the ISO/IEC 8859-1 code table? Wikipedia says:
ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94, by which name it is still sometimes known. The second edition of ECMA-94 (June 1986) also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.
In 1985 Commodore adopted ISO 8859-1 for its new AmigaOS operating system. The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga 1000, included this encoding.
In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the unassigned code values thus provides for 256 characters via every possible 8-bit value.
ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with “text/” (however the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.) It is the default encoding of the values of certain descriptive HTTP headers, and defines the repertoire of characters allowed in HTML 3.2 documents (HTML 4.0, however, is based on Unicode). It and Windows-1252 are often assumed to be the encoding of text on Unix and Microsoft Windows in the absence of locale or other information, this is only gradually being replaced with Unicode encoding such as UTF-8 or UTF-16.
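To make the problem with the ASCII fonts concrete, here is a minimal Python sketch. It only counts which code points a piece of text really contains. The helper name `count_scripts` and the sample string `aebmfw` are my own illustrative assumptions: what a legacy string displays as depends entirely on the particular font, and nothing here is taken from a real font's mapping table.

```python
# Unicode Malayalam block: U+0D00 to U+0D7F.
MALAYALAM_BLOCK = range(0x0D00, 0x0D80)


def count_scripts(text):
    """Count characters in the Malayalam block and in the Latin-1 range (hypothetical helper)."""
    return {
        "malayalam": sum(1 for ch in text if ord(ch) in MALAYALAM_BLOCK),
        "latin1": sum(1 for ch in text if ord(ch) <= 0xFF),
    }


# The word "Malayalam" written in Unicode Malayalam (six code points).
unicode_text = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"

# Assumed example of text typed with a legacy ASCII font: the font may draw
# Malayalam shapes, but the stored characters are plain Latin letters.
legacy_text = "aebmfw"

# Under ISO-8859-1 every 8-bit value decodes to the code point with the same number.
all_bytes = bytes(range(256)).decode("iso-8859-1")

print(count_scripts(unicode_text))
print(count_scripts(legacy_text))
print(len(all_bytes), all(ord(ch) == i for i, ch in enumerate(all_bytes)))
```

The legacy string contains no Malayalam code points at all, so search, sorting and spell checking treat it as Latin text, whatever the font shows on screen.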
Currently, various DTP tools, including Scribus and GIMP, do not support Unicode. People like me who regularly work with a lot of Malayalam text suffer badly because of these issues. And Unicode itself has some disputes over code points such as ZWJ, ZWNJ and the atomic chillus.
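The chillu question can be seen in a few lines of Python. This is only a rough sketch: the helper `same_after_normalization` is a hypothetical name of mine, and I am assuming the older convention of writing chillu N as NA + virama + ZWJ, next to the atomic code point U+0D7B.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Chillu N as a single atomic code point.
atomic_chillu_n = "\u0d7b"

# Chillu N as a sequence (assumed older convention): NA + VIRAMA + ZWJ.
sequence_chillu_n = "\u0d28\u0d4d" + ZWJ


def same_after_normalization(a, b, form="NFC"):
    """Return True if the two strings are equal after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(unicodedata.name(atomic_chillu_n))
print(same_after_normalization(atomic_chillu_n, sequence_chillu_n))
```

Normalization does not unify the two spellings, so a plain string comparison or search treats them as different text even though a reader sees the same letter.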
I am attaching a presentation prepared for a group discussion earlier in 2010. Here are some PNG files generated from OpenOffice.org Impress:
The presentation was planned to be delivered in Malayalam. But the fact is that I could not present it properly in the discussion, because the schedule was so tight that I did not get enough time to deliver my speech. Let “Everyone” of us join hands to fix the issues with Unicode Malayalam.