Text formats and the representation of language characters have been a focus ever since computers were invented. For obvious reasons, we wanted to interact with the computer in a language we understand rather than in binary. The initial focus was clearly on building a representation for English, the international language. But as the Internet has matured, global applications increasingly look for systems that let users work in their own locale (language, currency, date and time formats, etc.). As far as language is concerned, older formats such as ASCII and EBCDIC cannot represent the characters of languages around the world.
The Unicode Consortium, a non-profit organization, developed the Unicode Standard and its Unicode Transformation Formats (UTF), which help represent the characters of the world's languages. The Unicode Standard defines three encoding forms that allow the same data to be transmitted in a byte, word or double word oriented format (i.e. in 8, 16 or 32 bits per code unit). All three encoding forms encode the same common character repertoire and can be efficiently transformed into one another without loss of data. UTF-8 (Unicode Transformation Format 8) is the standard format used for web applications, that is, applications that use HTML for the visual representation of text. “The Unicode® Standard: A Technical Introduction” on the Unicode site gives an introduction to the technical details of UTF.
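As a rough illustration of the three encoding forms, the following Python sketch encodes one sample string as UTF-8, UTF-16 and UTF-32, counts the code units in each, and converts between the forms without loss. The sample string and the helper names (`CODECS`, `encode_all`, `count_code_units`) are assumptions made for this example, and the big-endian codec variants are chosen only so that no byte order mark is added to the output.

```python
# Codec names for the three Unicode encoding forms. The "-be" (big-endian)
# variants are an assumption of this sketch: they avoid a byte order mark.
CODECS = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}

# Size of one code unit in bytes for each encoding form (8, 16 and 32 bits).
UNIT_SIZE = {"UTF-8": 1, "UTF-16": 2, "UTF-32": 4}


def encode_all(text):
    """Return the bytes of `text` in each of the three encoding forms."""
    return {form: text.encode(codec) for form, codec in CODECS.items()}


def count_code_units(encoded):
    """Return the number of code units used by each encoded form."""
    return {form: len(data) // UNIT_SIZE[form] for form, data in encoded.items()}


sample = "héllo €"  # hypothetical sample: ASCII letters plus two non-ASCII characters
encoded = encode_all(sample)
units = count_code_units(encoded)

for form, data in encoded.items():
    print(form, "uses", units[form], "code units:", data.hex())

# Each form decodes back to the same text.
decoded = {form: data.decode(CODECS[form]) for form, data in encoded.items()}

# Transforming one form into another goes through the decoded characters.
utf16_from_utf8 = encoded["UTF-8"].decode("utf-8").encode("utf-16-be")
```

Here the same seven characters take a different number of code units depending on the form, yet every form decodes back to the identical string, which is what lossless transformation between the forms means in practice.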