In India too, we had similar indigenous encoding schemes, the most widespread of them being ISCII (Indian Script Code for Information Interchange). The idea is very similar: byte values in the range 128-255 denote Indian-language characters. Depending on the ISCII encoding you choose, the same sequence of bytes can represent text in a different language.
The Encoding class in .NET allows you to convert between these encodings and Unicode. Recently, while having a discussion with Dr. Pavanaja, it occurred to me that you can also use the Encoding class to do the conversion from ISCII to Unicode.
Let’s see an example (I chose to write it as a web page and not a console app because the final Unicode result will not show up on the console):
<%@Page Language="C#"%>
<%@Import Namespace="System.Text"%>
<%@Import Namespace="System.Globalization"%>
<script runat="server">
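void Page_Load(Object o, EventArgs e)
{
    // Code page 1252 maps each character of the literal below back to the single byte it stands for.
    Encoding encFrom = Encoding.GetEncoding(1252);
    // Code page 57002 is ISCII Devanagari (x-iscii-de).
    Encoding encTo = Encoding.GetEncoding(57002);
    // ISCII bytes D8 DB C6 E8 C4 DC, written here as Windows-1252 characters.
    String str = "ØÛÆèÄÜ";
    // Recover the raw ISCII bytes...
    Byte[] b = encFrom.GetBytes(str);
    // ...and decode them as ISCII into a Unicode string.
    String strUnicode = encTo.GetString(b);
    Response.Write(strUnicode);
}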
</script>
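If you would rather stay on the console, one option is to print the decoded string as code points instead of glyphs. The following is only a sketch: the class and method names are made up for illustration, the byte values are the ones that the sample string stands for in code page 1252, and the `RegisterProvider` call assumes a modern .NET runtime where code-page encodings are opt-in (on the classic .NET Framework it should not be needed).

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, not part of the original post.
public static class IsciiToUnicode
{
    // Decode raw ISCII bytes using the given code page (57002 = ISCII Devanagari).
    public static string Decode(byte[] isciiBytes, int codePage)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(isciiBytes);
    }

    // Format every UTF-16 code unit as U+XXXX so any console can show the result.
    public static string ToCodePoints(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // The bytes behind the characters Ø Û Æ è Ä Ü in code page 1252.
        byte[] iscii = { 0xD8, 0xDB, 0xC6, 0xE8, 0xC4, 0xDC };
        Console.WriteLine(ToCodePoints(Decode(iscii, 57002)));
    }
}
```

Going by the ISCII code chart, these six bytes should come out as the code points of हिन्दी, that is, the word "Hindi" in Devanagari.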
57002 denotes the ISCII Devanagari (Hindi) encoding. The ISCII encodings include:
| Codepage | Name | Language |
|---|---|---|
| 57002 | x-iscii-de | Devanagari |
| 57003 | x-iscii-be | Bengali |
| 57004 | x-iscii-ta | Tamizh |
| 57005 | x-iscii-te | Telugu |
| 57006 | x-iscii-as | Assamese |
| 57007 | x-iscii-or | Oriya |
| 57008 | x-iscii-ka | Kannada |
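To see that the same bytes really do turn into different scripts depending on the ISCII encoding, you can decode one byte sequence under each name from the table. This is a rough sketch rather than a reference implementation: `IsciiScripts` and `DecodeAs` are illustrative names, and the provider registration again assumes a modern .NET runtime.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class for illustration only.
public static class IsciiScripts
{
    // Decode the bytes under a named ISCII encoding such as "x-iscii-de".
    public static string DecodeAs(string encodingName, byte[] isciiBytes)
    {
        // Assumption: .NET Core / .NET 5+, where code-page encodings are opt-in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(encodingName).GetString(isciiBytes);
    }

    public static void Main()
    {
        byte[] iscii = { 0xD8, 0xDB, 0xC6, 0xE8, 0xC4, 0xDC };
        string[] names =
        {
            "x-iscii-de", "x-iscii-be", "x-iscii-ta", "x-iscii-te",
            "x-iscii-as", "x-iscii-or", "x-iscii-ka"
        };

        foreach (string name in names)
        {
            string text = DecodeAs(name, iscii);
            // Print code units in hex; each script lives in its own Unicode block.
            string hex = string.Join(" ", text.Select(c => ((int)c).ToString("X4")));
            Console.WriteLine(name + ": " + hex);
        }
    }
}
```

Each line of output should fall into a different Unicode block (Devanagari around U+0900, Kannada around U+0C80, and so on), even though the input bytes never change.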
'ØÛÆèÄÜ' is an Indic string; with the right software/font you should be able to view it. You can also create an HTML document (thanks again to Dr. Pavanaja for the tip!) with the following meta tag to view the contents of the ISCII string without explicitly converting it to Unicode (though IE does the conversion internally before rendering the text with the system-installed Indic OpenType fonts):
<meta http-equiv="Content-Type" content="text/html; charset=x-iscii-de">
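If you want to try the meta tag, the page has to contain the raw ISCII bytes rather than Unicode text. Here is a minimal sketch of producing such a file. It relies on the lower half of ISCII being plain ASCII, so the markup can be written as ASCII bytes around the ISCII payload. The class name, method name and output file name are invented for this example, and whether a given browser honours the `x-iscii-de` charset is not something the code can guarantee.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper class for illustration only.
public static class IsciiHtml
{
    private const string Head =
        "<html><head><meta http-equiv=\"Content-Type\" " +
        "content=\"text/html; charset=x-iscii-de\"></head><body>";
    private const string Tail = "</body></html>";

    // Wrap raw ISCII bytes in ASCII-only markup that declares the ISCII charset.
    public static byte[] BuildPage(byte[] isciiBody)
    {
        return Encoding.ASCII.GetBytes(Head)
            .Concat(isciiBody)
            .Concat(Encoding.ASCII.GetBytes(Tail))
            .ToArray();
    }

    public static void Main()
    {
        // The same six ISCII bytes used in the sample above.
        byte[] iscii = { 0xD8, 0xDB, 0xC6, 0xE8, 0xC4, 0xDC };
        // "iscii-sample.html" is an arbitrary file name chosen for this sketch.
        File.WriteAllBytes("iscii-sample.html", BuildPage(iscii));
    }
}
```

Writing the bytes with `File.WriteAllBytes` matters here: going through a string and a text writer would re-encode the payload and the browser would no longer see ISCII.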
5 comments:
So what happens if you happen to have some of the CDAC fonts installed, and have used the appropriate font tags? Does IE still convert internally to Unicode and display using the supplied Indic fonts?
While on the topic, has anyone created any more Unicode Indic fonts?
Hi,
a.) If you use the meta tag, IE will still do a conversion and use the default OpenType font for the display. If you want the CDAC font to be used, you'll need to skip the meta tag altogether.
b.) I haven't seen anything from MS, and unfortunately, I haven't seen any free OpenType fonts either. The tools to create these fonts are freely available but their uptake has been slow.
Regards,
Deepak
Hi,
CDAC fonts have to be converted into ISCII first and then into Unicode. Please visit the discussion forums at www.bhashaindia.com wherein these are discussed heavily.
Regards,
Pavanaja
Hi,
Can you please tell me how Indic languages are encoded and interpreted in the ISCII coding method, given that all of them share the same numerical character set?
Hi,
How do I write that to a text file? I tried using System.IO but failed.
Any suggestions on how to properly write the Unicode text to a file?
Thanks!!