Until 2003, domain names could consist only of letters of the Latin alphabet, the digits 0 to 9, and the hyphen. This restriction exists because the Domain Name System was designed around the ASCII character set, which covers English but little else, and so was poorly suited to a project as global as the Internet.


To remedy the situation, a system called Internationalized Domain Names (IDN) was introduced. The purpose of this mechanism is to define a standardized translation from Unicode to ASCII (American Standard Code for Information Interchange) so that characters from the world's writing systems can be used in Internet domain names.


How does it work?

Most of the Internet infrastructure supports only the ASCII character set. To make sure internationalized names can still be processed, every IDN written in Unicode can be converted to an ACE (ASCII-Compatible Encoding) string, which uses only ASCII characters and begins with the prefix "xn--". As a result, domain names containing accented characters or umlauts are displayed to the user as intended, while the server continues to process the address in its ASCII-compatible form. These processes are described in the IDNA2003 and IDNA2008 specifications. The conversion from Unicode to ASCII is performed on the client side and is based on the standardized Punycode encoding. Punycode is an encoding, not encryption: the conversion is fully reversible and hides nothing.
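The round trip between the Unicode form and the ACE form can be sketched in Python. This is a minimal illustration, not a production resolver: it relies on Python's built-in "idna" codec, which follows the IDNA2003 rules, and the helper names to_ace and to_unicode are made up for this example.

```python
# Minimal sketch: convert a domain name between Unicode and its ACE form.
# Assumption: Python's built-in "idna" codec (IDNA2003 rules) is good enough
# for illustration; to_ace/to_unicode are hypothetical helper names.

def to_ace(domain: str) -> str:
    """Encode each label of a Unicode domain name to its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """Decode an ACE domain name back to its Unicode form."""
    return ace_domain.encode("ascii").decode("idna")


# Raw Punycode of a single label; the ACE label is "xn--" plus this string.
raw_punycode = "münchen".encode("punycode").decode("ascii")

print(raw_punycode)                     # mnchen-3ya
print(to_ace("münchen.de"))             # xn--mnchen-3ya.de
print(to_unicode("xn--mnchen-3ya.de"))  # münchen.de
```

Labels that are already plain ASCII, such as "de", pass through unchanged; only labels containing non-ASCII characters receive the "xn--" prefix.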


The differences between IDNA2003 and IDNA2008

The original 2003 process normalizes internationalized domain names before Punycode encoding, using a method called Nameprep. Nameprep converts uppercase letters to lowercase, removes certain non-printing characters, and unifies equivalent characters. Nameprep was dropped from the process when IDNA2008 was introduced. Since then, IDNA itself no longer specifies any normalization; instead, it recommends a mapping step that converts uppercase letters to lowercase before the name is encoded.
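The three Nameprep effects described above can be observed with Python's standard library, whose encodings.idna module implements the IDNA2003 rules. The snippet is only a sketch of the 2003 behaviour; the sample strings are made up for illustration, and IDNA2008 handling would need a separate third-party library.

```python
# Minimal sketch of the IDNA2003 Nameprep step using Python's standard library.
# Assumption: the sample labels below are illustrative, not from any real registry.
from encodings.idna import nameprep

# 1. Uppercase letters are converted to lowercase.
lowercased = nameprep("MÜNCHEN")

# 2. Certain non-printing characters are removed, e.g. the soft hyphen U+00AD.
stripped = nameprep("exam\u00adple")

# 3. Equivalent characters are unified, e.g. fullwidth "ｅ" (U+FF45) becomes "e".
unified = nameprep("\uff45xample")

print(lowercased)  # münchen
print(stripped)    # example
print(unified)     # example

# Nameprep runs before Punycode encoding, so the uppercase label
# yields the same ACE string as its lowercase spelling.
print("MÜNCHEN.de".encode("idna"))  # b'xn--mnchen-3ya.de'
```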
