
Background Knowledge on Internationalized Domain Names

Domain names containing special characters and umlauts (like “ü” in “grün.info”) can be registered under more and more top-level domains.
This page gives an insight into the technical foundations of Internationalized Domain Names (IDN). All you need to use these domain names is a modern program such as the browsers Mozilla 1.4, Netscape 7.1 or Opera 7.2. They are fully preconfigured and ready to go.
Some browsers still have to evolve


To use domain names with local-language characters, your browser has to support the conversion into Punycode. (See below for an explanation of Punycode.) In contrast to the browsers listed above, Microsoft's Internet Explorer currently does not support IDN. To become “IDN-aware”, it has to be extended with a plug-in, the best known being VeriSign's I-Nav plug-in.
Use this link to test whether your system is configured to use IDN. The link takes you to our test page grün.knipp.de. If a new browser window opens with a green (German: grün) page, you are ready to use Internationalized Domain Names.
If your browser cannot display the test page, you can convert the domain name “grün.knipp.de” into Punycode yourself with our conversion tool. You can then load the page by copying the resulting character string into the address bar of your browser.
The introduction of German umlauts and other local-language characters discussed here concerns domain names only. The contents of a website have been able to handle such characters from the very beginning. For example, Knipp once created a Japanese website called “Germany Shop”, which was used to sell typical German products in Japan.
In the beginning there was ASCII

In the beginning

The Internet first spread in the United States of America. The English language uses hardly any special characters. The complete technical infrastructure and the domain name system were therefore based on the characters “a” to “z”, the digits “0” to “9” and the hyphen. Such domain names are also called LDH names (Letters, Digits, Hyphen).
Lifting this restriction would require replacing equipment and software everywhere, including all mail relays, web proxies, firewalls, etc. This would be so complex and expensive that it is virtually impossible.
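As a rough illustration of the LDH rule described above, the following Python sketch checks whether a single label consists only of letters, digits and hyphens. The function name `is_ldh_label` is made up for this example, and the check is deliberately simplified: real registries apply further rules (for instance about hyphens at the start or end of a label) that are not modelled here.

```python
import re

# Simplified assumption: an LDH label is 1 to 63 characters drawn from
# a-z, 0-9 and the hyphen. Additional registry rules are not checked.
LDH_LABEL = re.compile(r"[a-z0-9-]{1,63}")


def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only letters, digits and hyphens."""
    # Domain names are case-insensitive, so compare in lower case.
    return LDH_LABEL.fullmatch(label.lower()) is not None


print(is_ldh_label("knipp"))        # True
print(is_ldh_label("grün"))         # False: "ü" is not an LDH character
print(is_ldh_label("xn--kse-qla"))  # True: Punycode labels are plain LDH
```

The last call shows why the approach described in the next section works: a converted name looks like any other LDH name to the existing infrastructure.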
Power On for Extensions

The solution

Experts, among them engineers, computer scientists and linguists, devised an alternative to replacing the complete infrastructure. Only the end devices or, more precisely, the “end software” has to be changed. In other words, only browsers and e-mail programs have to understand special characters.
Software with this capability is called “IDN-aware”. It unambiguously maps every domain name containing a local-language character to a new name that contains only the characters “a” to “z”, the digits “0” to “9” and the hyphen.
It is the task of the Unicode Consortium to determine which characters can be mapped throughout the world. At present, about 70,000 characters are defined. Each registry can choose which subset of this huge character repertoire it allows for its top-level domain. Afilias, for example, the registry for .info domains, initially decided to allow only the umlauts “ä”, “ö” and “ü” as well as the German character “ß”.
Using the rear exit to avoid problems


The conversion used by the end software is called Punycode conversion. It is defined in RFC 3492, which serves as a kind of industry standard. One design consideration was that the converted name should still give a reasonable idea of what the original name looks like. Example:
Original name
Punycode format
You can use our conversion site to try additional conversions by yourself. You can also use it to re-convert Punycode format to the original notation.
Note that each character string separated by dots (a “label”) is converted individually. For example, the subdomain “käse.müller.info” is converted to “xn--kse-qla.xn--mller-kva.info”. This is called individual label conversion.
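The label-by-label conversion can be tried out with Python's built-in `punycode` codec. The sketch below is only an illustration: the helper names `to_ace` and `to_unicode` are made up for this example, and the additional normalization steps that IDN-aware software performs before conversion are left out.

```python
PREFIX = "xn--"


def to_ace(domain: str) -> str:
    """Convert every label that contains non-ASCII characters to Punycode."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # Plain LDH labels such as "info" are left untouched.
            labels.append(label)
        else:
            labels.append(PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def to_unicode(domain: str) -> str:
    """Convert Punycode labels back to their original notation."""
    labels = []
    for label in domain.split("."):
        if label.startswith(PREFIX):
            labels.append(label[len(PREFIX):].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(to_ace("käse.müller.info"))
# xn--kse-qla.xn--mller-kva.info
print(to_unicode("xn--kse-qla.xn--mller-kva.info"))
# käse.müller.info
```

Each label is handled on its own, so the top-level label “info” passes through unchanged while the other two labels receive the prefix.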
Every Punycode label consists of up to three parts:
  • Prefix “xn--”: The prefix always consists of this character string. It indicates that the label is in Punycode format. For this reason, many registries have disallowed the registration of ordinary domain names that begin with this character string (which would probably be of little use in everyday life anyway).
  • Root: These are all characters of the label that remain after the special characters have been removed. If the original name contains no conventional characters, this part remains empty, as shown in the example above.
  • Encoding: The encoding defines which special characters occur at which position of the original name. It is based on a very complex formula. If the root is empty, even the “-” character that usually separates root and encoding is omitted.
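The three parts listed above can be separated with a few lines of Python. This is a minimal sketch under the assumption that root and encoding are divided by the last hyphen in the label; the function name `split_punycode_label` is invented for this example.

```python
PREFIX = "xn--"


def split_punycode_label(label: str) -> tuple[str, str, str]:
    """Split a Punycode label into (prefix, root, encoding)."""
    if not label.startswith(PREFIX):
        raise ValueError(f"{label!r} is not a Punycode label")
    rest = label[len(PREFIX):]
    # The last hyphen separates root and encoding. If there is no hyphen,
    # rpartition returns an empty root and the whole rest as encoding.
    root, _, encoding = rest.rpartition("-")
    return PREFIX, root, encoding


print(split_punycode_label("xn--mller-kva"))
# ('xn--', 'mller', 'kva')

# A name made up of special characters only has an empty root.
only_umlauts = PREFIX + "äöü".encode("punycode").decode("ascii")
print(split_punycode_label(only_umlauts))
```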
Die grünen Äpfel = the green apples

Length does matter

The technical specifications of the domain name system limit each label to a maximum of 63 characters. In practice, this restriction has rarely mattered so far.
To avoid exceeding the limit of 63 characters when choosing a domain name, keep in mind that the Punycode format of a name is usually longer than the original. Depending on the number of special characters used, the conversion can easily double the length.
The resulting length is not easy to predict, as the following examples show. Even if the original names are of the same length, different Punycode lengths may result:
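Because the converted length is hard to estimate by eye, it can simply be measured. The following Python sketch uses the built-in `punycode` codec; the helper names `ace_label` and `fits_dns_limit` and the sample names are chosen for this illustration only.

```python
MAX_LABEL_LENGTH = 63  # limit per label, as stated above
PREFIX = "xn--"


def ace_label(label: str) -> str:
    """Return the Punycode form of a single label (ASCII labels unchanged)."""
    if label.isascii():
        return label
    return PREFIX + label.encode("punycode").decode("ascii")


def fits_dns_limit(label: str) -> bool:
    """Check the 63-character limit against the converted label."""
    return len(ace_label(label)) <= MAX_LABEL_LENGTH


# Print original and converted lengths side by side.
for name in ("käse", "müller", "äöü"):
    converted = ace_label(name)
    print(f"{name} ({len(name)}) -> {converted} ({len(converted)})")

# A label of 63 umlauts is within the limit as typed,
# but no longer once it has been converted.
print(fits_dns_limit("ü" * 63))  # False
```

The important point is that the limit applies to the converted label, not to the name as the user types it.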
Strange characters

Special case ß

The letter “ß” plays a special role in the conversion. It does not lead to a name that starts with the prefix “xn--”. Instead, an “ß” is converted into a double “s” (“ss”). Have a look at the example in the table above.
The reason is that, from a linguistic point of view, “ß” is not a special character but a ligature. Ligatures are characters that were created by merging two other characters. “ß” is composed of “s” and “z”. Lesser-known ligatures that are rarely used today are “ﬁ” and “ﬃ”.
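This behaviour can be observed with Python's built-in `idna` codec, which implements the original IDNA rules (RFC 3490) and therefore maps “ß” to “ss” before any Punycode conversion takes place. The sample name “straße.de” and the helper name `idna2003_ace` are chosen for illustration only. Libraries that follow the later IDNA2008 rules treat “ß” differently, so their results may differ from the ones shown here.

```python
def idna2003_ace(domain: str) -> str:
    """Convert a domain name using Python's built-in IDNA (RFC 3490) codec."""
    return domain.encode("idna").decode("ascii")


# "ß" becomes "ss", so no "xn--" prefix is needed for this label.
print(idna2003_ace("straße.de"))
# strasse.de

# Umlauts, in contrast, lead to Punycode labels.
print(idna2003_ace("käse.müller.info"))
# xn--kse-qla.xn--mller-kva.info
```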
Some fruit are closely connected: Bundles

Variants, bundles and languages

Besides the conversion of the names, another problem has to be solved: different spellings of a word can mean the same thing. If, for example, the plain character “e” is treated as a variant of “è”, then in French the domain name for the Swiss city of Geneva can be written in two different ways:
  • geneve.ch
  • genève.ch
For this reason, the respective registry has to determine how to handle these kinds of variants. In principle, there are the following possibilities:
  • If a registrant has registered one variant, then he also holds all other variants at the same time.
  • If a registrant has registered one variant, all other variants are reserved for him. He has to pay individually for each of the other variants, however, if he intends to actually use them.
  • Only the variant as exactly registered is owned by the registrant. Other variants can be registered by different registrants.
The automatic registration of further variants when registering one variant is called “bundle”.
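To make the idea of a bundle more concrete, the following Python sketch enumerates all spellings of a label under a given variant table. Everything here is hypothetical: the table `VARIANT_GROUPS` contains only the “e”/“è” pair from the example above, the function name `bundle` is made up, and real registries publish their own language-specific tables and policies.

```python
from itertools import product

# Hypothetical variant table: characters in one group count as
# variants of each other.
VARIANT_GROUPS = [("e", "è")]

# Lookup from each character to its whole group.
VARIANTS = {ch: group for group in VARIANT_GROUPS for ch in group}


def bundle(label: str) -> set[str]:
    """Return every spelling of the label under the variant table."""
    # Characters without an entry only have themselves as an option.
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


names = bundle("geneve")
print(len(names))          # 8: each of the three "e" has two spellings
print("genève" in names)   # True
print(bundle("genève") == names)  # True: same bundle from either spelling
```

The number of variants grows quickly with the length of the name and the size of the table, which is one reason why registries have to decide on a policy.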
Since the composition of a bundle depends strongly on the language, it is internationally prescribed that the language must always be specified when registering Internationalized Domain Names. There will be no bundles for .de domains, however. For that reason, the language field is automatically set to “ger” or “de”, respectively. For domains from the so-called CJK area (China, Japan, Korea), managing bundles is a rather complex subject because a large number of characters are shared between the languages but used differently in each, which regularly leads to legal conflicts.
(C) 1996-2024 Knipp Medien und Kommunikation GmbH | webmaster@knipp.de