IDN ccTLD Discussion

IDN ccTLD Discussion - Examples of possible - or inappropriate - IDN.IDN TLD strings for ccTLDs

About this Document

This document is being developed and maintained for the sole purpose of illustrating solutions and issues related to the introduction of IDN ccTLDs and, indirectly, IDN gTLDs. It does not constitute a proposed norm of any sort; it merely serves as a compilation of currently available information. If you have any knowledge that might be useful for this purpose, or if you are able to correct mistakes in this document, please contribute by making the changes directly on the Wiki, or by contacting one of the contributors.

This compilation is a work in progress and must not be used for any purpose other than facilitating discussion of IDN TLD-related issues.

Criterion of Demonstrated Need for the delegation of an IDN ccTLD

IDN ccTLD aliases should be assigned on the basis of demonstrated need only. In other words, demonstrated need should be the sole criterion, in the sense that the absence of an IDN alternative to the ASCII ccTLD hinders the development of the Internet in the country or territory.

The following practical problems can give rise to a pressing need for an IDN ccTLD. The premise, of course, is that second-level domains often need to be IDNs because transliterated spelling is imprecise, unreliable, unpleasant or even inaccessible to most users. In some cases, the TLD must be an IDN label as well, for one or more of the following reasons.

Need to switch input methods within the domain name

For most non-Latin input methods, the entire keyboard is used for the localized input. To switch over to Latin input, the meaning of every key must be switched. This is extremely impractical to do within a domain name.

Need to switch writing direction within the domain name

In right-to-left scripts such as Arabic or Hebrew, the use of an ASCII TLD label implies that the writing direction changes within the domain name. This is further complicated by the fact that the dot between labels has no strong direction of its own: it is displayed left-to-right or right-to-left depending on the surrounding characters. As a result, a user typing a mixed-script domain name has to be extremely careful even to correct a spelling mistake: deleting or inserting a character, or even just moving the cursor, may produce totally unexpected results.
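To make the direction problem concrete, the following minimal Python sketch prints the Unicode bidirectional class of each character in a mixed-script name, using the standard library's `unicodedata.bidirectional`. The sample name مصر.com is a hypothetical illustration only. Arabic letters report the strong class "AL" and Latin letters the strong class "L". The full stop reports "CS" (common separator), a weak class whose display direction is resolved from its neighbours.

```python
import unicodedata


def bidi_classes(name: str) -> list:
    """Return (character, Unicode bidirectional class) pairs for a name."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in name]


# Hypothetical mixed-script name: Arabic second-level label, ASCII TLD.
for ch, cls in bidi_classes("مصر.com"):
    print(f"U+{ord(ch):04X}  {cls}")
```

For this name the classes come out as AL, AL, AL, CS, L, L, L. The only strongly right-to-left characters are on one side of the dot, and the only strongly left-to-right characters are on the other.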

Confusability of characters within the same domain if TLD is in ASCII

In scripts containing characters whose glyphs can be identical to Latin glyphs (whether or not with the same sound value), the use of an ASCII TLD in combination with IDN lower-level labels causes user confusion. The user might not realize that a given character in the second-level label is being read or typed in the "wrong" script, while the top-level label, for its part, has to be written in a script that is "wrong" for the rest of the name.

For instance, the expression сахар.ru (xn--80aa2cbv, for "sakhar", "sugar") looks as if the second-level label were in ASCII, even though it is in Cyrillic. Conversely, a user might mistakenly type it in ASCII (caxap.ru), which of course has nothing to do with the domain. If the TLD is written in a way that makes it clear that it is Cyrillic (thanks to at least one character that has no look-alike in the Latin script), then this confusion can largely be avoided.
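The confusion is purely visual. In the DNS, the Cyrillic label and its Latin look-alike are entirely different strings. A short Python sketch using the standard library's built-in "idna" codec (which implements IDNA 2003, not IDNA 2008) makes this visible for the two strings from the example above. The helper name `to_ascii` is ours, chosen for illustration.

```python
def to_ascii(label: str) -> str:
    """Convert one label to its ASCII-compatible form with Python's IDNA 2003 codec."""
    return label.encode("idna").decode("ascii")


cyrillic = "сахар"  # Cyrillic letters
latin = "caxap"     # Latin letters that look the same

print(to_ascii(cyrillic))  # an "xn--" label
print(to_ascii(latin))     # unchanged ASCII label
print(cyrillic == latin)   # the strings are different
```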

Impossibility of graceful pronunciation or rendering

A domain name is a single expression, i.e. it must be graceful as a whole, including the TLD label, to be practical. The ability to create aesthetically pleasing domain names is not a luxury, but a necessity for the development of the Internet in the local environment. All scripts in the world have naturally developed aesthetic criteria through which written words become pleasant to see. By the same token, the phonetics of each language have naturally developed a coherent set of sounds that are pleasant to hear. The shapes of foreign characters and the sounds of foreign languages are often not only difficult to understand, but also unpleasant to see or hear when mixed too closely with the native language. Graphic designers usually create subtle workarounds in these cases, separating the local and foreign scripts if they appear in the same view, or adjusting the size, spacing or weight of the characters to create a pleasant overall look. In spoken language, people automatically separate sounds that do not go well with one another. Within a domain name, these workarounds are mostly impossible. Overall, this means that without an IDN ccTLD, a community may find words of its own language not just unclear, but ugly and unpleasant as domain names, because they must be combined with a foreign script. Any such situation severely limits the development of the local Internet.

Consequences of the demonstrated need criterion

If this criterion is applied, only a very limited number of countries require an IDN ccTLD, and very few require more than one.

For instance, one can easily conclude that there is no need for an IDN ccTLD for .ch, .de or .fr. Conversely, in the cases brought to the attention of the ICANN community, such as Arabic, Chinese or Cyrillic, the need for an IDN ccTLD is obvious. There is no demonstrated need for an IDN ccTLD for a country using an extended Latin alphabet. For instance, neither Austria nor Vietnam needs an IDN ccTLD, even though the respective country names (Österreich, Việt Nam) require characters beyond basic ASCII.

Discussion of IDN ccTLDs by Major Script Groups

CJK (Chinese, Japanese and Korean)

.cn - China - 中国

The country name has two variants:

  • .中国 xn--fiqs8s "zhōng guó" (China) in simplified ideographs
  • .中國 xn--fiqz9s "zhōng guó" (China) in traditional ideographs

The character 中 (zhōng), meaning "middle", is identical in simplified and traditional Chinese, as well as in Japanese and Korean. Both variants of the second character, 国 / 國 (guó) for "country", also exist in Japanese. In Japanese, the string is pronounced "ちゅうごく" ("chûgoku"), and 中国 can be obtained with any Japanese computer input method.
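Because the simplified and traditional forms differ in one code point, they produce unrelated ASCII-compatible encodings, which is why each variant has to be handled explicitly. A small Python check with the standard library's "idna" codec (IDNA 2003) reproduces the two encodings listed above.

```python
variants = {"simplified": "中国", "traditional": "中國"}

for script, label in variants.items():
    ace = label.encode("idna").decode("ascii")  # ASCII-compatible encoding
    print(f"{script:12} {label} -> {ace}")
```

This prints xn--fiqs8s for the simplified form and xn--fiqz9s for the traditional form. Nothing in the encoding itself relates the two, so any variant relationship has to be established by policy.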


.hk - Hong Kong - 香港

  • .香港 xn--j6w193g "xiāng gǎng" (Hong Kong)

There are no variants, as both characters are identical in traditional and simplified Chinese. For Japanese, a typical computer input method produces 香港 from the Hiragana input ほんこん ("honkon").


.jp - Japan - 日本

  • .日本 xn--wgv71a (pronounced "nihon" or "nippon" in Japanese, and "riben" in Chinese). Full country name.

The ideographs 日 and 本 are identical in Chinese traditional and simplified form, so there are no variants.


.kr - Korea - 한국

See: http://en.wikipedia.org/wiki/Names_of_Korea

  • Imaginable IDN ccTLDs for .kr
    • .한국 xn--3e0b707e "Hankuk" (Korea) in Hangul script
    • .韓國 xn--9cs902m "Hankuk" in Hanja
    • .한 xn--6q8b ("han", first syllable for "Hankuk")

Hanja is rarely used in Korea nowadays, but sometimes appears in formal names. It is available on Korean keyboards as a conversion function from Hangul script. This form is understood in Japan, even though Japan now mostly uses the simplified character 国 instead of 國. In Japanese, the name is pronounced "Kankoku" and refers to South Korea. Japanese computer input methods automatically produce the two characters 韓国, but Japanese readers also recognize the string 韓國. In Chinese, the first character is simplified as well, giving 韩国 (Hánguó), used for South Korea.


.tw - Taiwan - 臺灣

Due to the simplified and traditional variants of these characters, four variants are possible. The three in practical use are:

  • .臺灣 xn--nnx388a "taiwan" in traditional ideographs
  • .台湾 xn--kprw13d "taiwan" in simplified ideographs
  • .台灣 xn--kpry57d "taiwan" in typical usage (also used in Japan)

台灣 xn--kpry57d "taiwan" is a mixed form. It combines the character 台, often classified as simplified Chinese, with the character 灣, which is classified as traditional Chinese. However, the character 台 is commonly used for Taiwan, even in Taiwan itself (where traditional Chinese is normally used), and in Japan.

One solution could involve delegating all three versions, with two of them entering the DNS as DNAME records.

Technically, the fourth variant, 臺湾 xn--s8wp92b "taiwan" in mixed form, could also be added, but it is not generally used. Alternatively, only the first two variants (traditional-only and simplified-only) could be delegated, on the grounds that mixed-form words occur rarely in practical usage other than by error. 台灣 is a notable exception to this. It is worth noting that in Taiwan, traditional characters are the norm. The simplified variant (.台湾) may be used by Chinese users from other parts of the world.
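The variant set can be enumerated mechanically from a per-character variant table. The Python sketch below is a hypothetical illustration only. The two-position table covers just the characters discussed here. The choice of 臺灣 as the DNAME target and the zone-file lines it prints are assumptions, not a registry proposal. A policy following the text above would also drop the rare mixed form 臺湾 from the alias list.

```python
from itertools import product

# Hypothetical variant table: each position lists the forms discussed above.
VARIANTS = [["臺", "台"], ["灣", "湾"]]


def all_variants(table):
    """Every combination of per-position variants, as strings."""
    return ["".join(chars) for chars in product(*table)]


def dname_lines(target, variants):
    """Zone-file style DNAME lines aliasing each non-target variant to the target."""
    target_ace = target.encode("idna").decode("ascii")
    lines = []
    for v in variants:
        if v == target:
            continue
        v_ace = v.encode("idna").decode("ascii")
        lines.append(f"{v_ace}. IN DNAME {target_ace}.")
    return lines


candidates = all_variants(VARIANTS)
print(candidates)  # four strings, including the rare 臺湾
for line in dname_lines("臺灣", candidates):
    print(line)
```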

Cyrillic

General Remarks

Identical Characters (similar shape, similar value) between Cyrillic and Latin

The Cyrillic script has many characters that resemble their Latin equivalents in both shape and sound value. At the top level of the DNS, the fact that the Cyrillic "а" is confusable with the Latin "a" is not as problematic as it is on the lower levels, provided that a Cyrillic TLD and its identical-looking ASCII counterpart go to the same registry. This means, for instance, that .mk and .мк (for the Former Yugoslav Republic of Macedonia), although confusable, would map to the same ccTLD.

Confusable Characters (similar shape, another character's value) between Cyrillic and Latin

Issues arise at the DNS top level because several Cyrillic letters are confusable with Latin letters of a different value. The Cyrillic "с" (as in "Сахара" for "Sahara") is confusable with the Latin "c". The Cyrillic "н" (as in "Новгород" for "Novgorod") is confusable with the Latin uppercase "H". The Cyrillic "р" (as in "Россия" for "Russia") is confusable with the Latin "p". There are further examples.

This makes the Cyrillic transliterations of some ISO 3166 codes confusable with other ISO 3166 codes. For instance, Russia's ".ru" becomes ".ру" by transliteration, confusable with Paraguay's ".py". Mongolia's ".mn" becomes ".мн", confusable with the Marshall Islands' ".mh".

The way to deal with this is to ensure that no Cyrillic TLD is confusable with an existing TLD or ISO 3166 string. To minimize the need to reserve ISO 3166 codes, one can try to select Cyrillic strings that contain at least one character that does not resemble a Latin letter. This works in many cases, as in "рф" for Российская Федерация ("Russian Federation"). In some cases (as in "Україна" for "Ukraine"), all the characters of the natural two-letter abbreviations are confusable. Either a three-letter (or longer) IDN ccTLD would then have to be used, or the confusable two-letter ASCII string would have to be reserved in ISO 3166.

At the same time, the IDN ccTLD should be acceptable to the local community. In each case, many non-confusable strings are available, and the local community should select among them.
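The rule of thumb above can be checked mechanically for each candidate. The Python sketch below compares uppercase forms, since ISO 3166 codes are uppercase. It uses a small, hand-picked table of Cyrillic letters with Latin look-alikes, which is an assumption made for illustration. A real check would draw on the Unicode confusables data (UTS #39). The function name `latin_lookalike` is hypothetical.

```python
from typing import Optional

# Hypothetical, hand-picked table: uppercase Cyrillic letters whose glyphs
# are commonly indistinguishable from an uppercase Latin letter.
CYRILLIC_TO_LATIN = {
    "\u0410": "A",  # А
    "\u0412": "B",  # В
    "\u0415": "E",  # Е
    "\u041A": "K",  # К
    "\u041C": "M",  # М
    "\u041D": "H",  # Н
    "\u041E": "O",  # О
    "\u0420": "P",  # Р
    "\u0421": "C",  # С
    "\u0422": "T",  # Т
    "\u0423": "Y",  # У
    "\u0425": "X",  # Х
    "\u0405": "S",  # Ѕ
    "\u0406": "I",  # І
    "\u0408": "J",  # Ј
}


def latin_lookalike(label: str) -> Optional[str]:
    """Return the ASCII string a Cyrillic label could be mistaken for,
    or None if at least one letter has no Latin look-alike."""
    out = []
    for ch in label.upper():
        latin = CYRILLIC_TO_LATIN.get(ch)
        if latin is None:
            return None  # one distinctive letter is enough to avoid confusion
        out.append(latin)
    return "".join(out)


# Candidate strings discussed in this section.
for candidate in ["ру", "мн", "рф", "укр", "уа", "мк"]:
    print(candidate, "->", latin_lookalike(candidate))
```

A result of None means the candidate needs no ISO 3166 reservation. Any other result names the ASCII string that would have to be blocked or reserved, such as "PY" for .ру or "YKP" for .укр.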

Countries having switched from Cyrillic to Latin

Azerbaijan (.az - "Azərbaycan", Cyrillic "Азәрбајҹан") and Uzbekistan (.uz - "O‘zbekiston", Cyrillic "Ўзбекистон") switched from Cyrillic to Latin script in the early 1990s. For this reason, we have not included them in this list. It is of course up to the respective national community whether or not a Cyrillic or Arabic-script IDN ccTLD should be introduced.

.by - Беларусь - Belarus

  • .бу xn--90a0b (.БУ in uppercase) Literal transliteration of the existing ccTLD string, non-confusable with ASCII (other than .by itself), also matches the Cyrillic country name.


.ba - Босна и Херцеговина - Bosnia and Hercegovina

  • .ба xn--80ab (.БА in uppercase) Literal transliteration of existing ccTLD string, non-confusable with ASCII (other than the .ba itself), also matches the Cyrillic country name.


.bg - България - Bulgaria

  • .бг xn--90ae (.БГ in uppercase), equivalent to the current ASCII ccTLD string ".bg". It corresponds to the first letters of the first two syllables in Cyrillic and is not confusable with an ASCII string other than .bg itself. The .bg ccTLD registry expressed its intention to request this string through a letter by CENTR to ICANN in March 2007.


.kz - Қазақстан - Kazakhstan

  • .қз xn--g1a3r (.ҚЗ in uppercase) Transliteration of the ASCII ccTLD string, also represents the first letters of the first two syllables in Cyrillic.

Alternatives include .қзқ and .қаз.


.kg - Кыргызстан - Kyrgyzstan

  • .қг xn--c1a1s (.ҚГ in uppercase) Transliteration of the ASCII ccTLD string, also represents the first letters of the first two syllables in Cyrillic.


.mk - Македонија (ПЈРМ) - Macedonia (FYROM)

  • .мк xn--j1ad (.МК in uppercase) Literal transliteration of the existing ccTLD string. It is confusable only with its own ASCII counterpart .mk, hence an adequate solution.

Alternatives include: мкд xn--d1alf


.mn - Монгол улс - Mongolia

Candidate strings for the IDN variant of the .MN ccTLD include:

  • .мон xn--l1acc (first three letters of "mongol" in Cyrillic). This would imply that .moh (in ASCII) would not be available as a TLD, except possibly as an alias of .мон. In Mongolia, .мон would be familiar and more natural. The choice of .мон would also make possible entire domain names that look like plain ASCII strings even though they are in Cyrillic, such as "омо.мон".
  • .мг xn--c1ar (first letter of each syllable of "mongol" in Cyrillic). The letter "г" in the string would ensure that no confusability can arise with an ASCII ccTLD. However, there is no tradition of referring to Mongolia with the string .мг.

Alternatives include .монгол xn--c1aqbeec ("mongol" in Cyrillic characters). The transliteration of the ASCII TLD string ("mn") would be "мн" (xn--l1ac). This is confusable with the TLD ".mh" for the Marshall Islands.

.ru - Россия - Russia

  • .рф xn--p1ai (.РФ in uppercase) reads "rf" for Российская Федерация (Russian Federation)

One alternative would be ".ря" xn--p1a4a (first and last letters of the word Россия), but from a non-confusability perspective, it is inferior to .рф. An abbreviation based on the first two letters, ".ро" xn--n1ad ("ro"), would be confusable with the two Latin letters "po" (which, however, do not currently represent an ISO 3166 code). The transliteration of Russia's ccTLD string "ru" would give ".ру" xn--p1ag, visually identical to ".py", the ccTLD string for Paraguay.

Another conceivable alternative is ".рос" xn--n1ade ("ros" in Latin transliteration). However, many people might not realize that ".рос" is in Cyrillic and would mispronounce it beyond recognition. This choice would also imply blocking the Latin character string "poc" for TLD purposes.

.rs/.yu - Република Србија - Republic of Serbia

Alternatives include:

  • .сб xn--90a5a First letters of the first two syllables (if the r-sound is regarded as a syllable-forming vowel)
  • .срб xn--90a3ac First three letters of the country name.

Other alternatives: .ср xn--p1ab is confusable with "CP", the exceptional ISO 3166 reservation for Clipperton Island. .рс xn--p1ac is confusable with the ASCII string "pc", which is not currently reserved in ISO 3166. The Cyrillic string "рс" for Република Србија (reflecting the Latin characters "rs", as in "Republic of Serbia") could be used if the ISO 3166/MA decided to reserve the ASCII code "PC" for this purpose.


.tj - Тоҷикистон - Tajikistan

  • .тҷ xn--r1a8u (.ТҶ in uppercase) Transliteration of the ASCII ccTLD string, also represents the first letters of the first two syllables in Cyrillic.


.ua - Україна - Ukraine

  • .укр xn--j1amh (.УКР in capitals). Letters "ukr", the first three letters of the country name in Cyrillic. If the Cyrillic string .укр is delegated, then ICANN must make sure to treat the ASCII string "ykp" as confusingly similar to .укр.
  • .уа xn--80a1b (.УА in capitals). Transliteration of the existing ccTLD string "ua". This is confusable with the ASCII string "ya", which (like most strings beginning with "y") happens to be unassigned in ISO 3166. If the Cyrillic string .уа is delegated, then the ISO 3166/MA must make sure that the string "YA" is reserved and not used within ISO 3166.

Technically, .ук would be possible, but it is not natural in Cyrillic to abbreviate a word before the last consonant of the syllable. Therefore .укр is more natural.

Arabic

Introduction

The Arabic Domain Names Task Force (ADNTF), formed under the auspices of the UN Economic and Social Commission for Western Asia (ESCWA), issued an Internet-Draft in December 2006 recommending the use of short country names as Arabic IDN ccTLDs. See: http://www.ietf.org/internet-drafts/draft-farah-adntf-adns-guidelines-02.pdf (referenced as "Farah, et al., 2006" below). The recommended strings are indicated in the list below. The Task Force had considered other alternatives, such as the use of two-letter abbreviations. There is no tradition of using abbreviations in Arabic, and the fact that the Task Force dropped the idea of abbreviations after careful consideration shows the limits of a "one paradigm fits all" approach. Apart from the fact that some two-letter abbreviations carry inappropriate meanings, one fundamental reason cited in the document is the need for a domain name to be graceful for use in advertising by registrants. This is more important than keeping domain names short, especially as Arabic graphemes are very economical in terms of space.

The TLD string alternatives compiled here are largely taken from the July 2003 paper "Supporting the Arabic Language in Domain Names" by Abdulaziz H. Al-Zoman http://www.arabic-domains.org/docs/NIC-docs/SupportingArabicDomainNmaes.pdf . See also http://www.minc.org/data/d6570566088xpdw_Final%20MINC%20IDN%20TLD%20Report.pdf .
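The code-point lists in the entries of this section can be regenerated directly from each string. This is a quick way to catch transcription errors, such as a hamza written below the alef instead of above it. The Python sketch below uses only the standard library. `strip_al` is a hypothetical helper that assumes the definite article is always written as the two letters alef (U+0627) and lam (U+0644) at the start of the name.

```python
import unicodedata

ARTICLE = "\u0627\u0644"  # alef + lam, the Arabic definite article "al-"


def code_points(s: str) -> str:
    """Space-separated U+XXXX list for every character in s."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def strip_al(name: str) -> str:
    """Drop a leading definite article, if present (hypothetical helper)."""
    return name[len(ARTICLE):] if name.startswith(ARTICLE) else name


for name in ["البحرين", "تونس"]:
    short = strip_al(name)
    print(name, code_points(name), "| without al-:", short, code_points(short))
    print("   ", [unicodedata.name(ch) for ch in short])
```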

.jo - Jordan - الأردن

Keystroke notation: $ = hamza above.

  • الأردن (xn----igbhzh7gpa) Short country name including "al-"; recommended in Farah, et al., 2006. Code points U+0627 U+0644 U+0623 U+0631 U+062F U+0646 (ا ل أ ر د ن); keystrokes "al$ardn".
  • أردن (xn--igbyf9f) Short country name without the "al-"; keystrokes "$ardn".
  • ار (xn--mgbu) First two letters of the country name; keystrokes "$ar".


.ae - United Arab Emirates - الإمارات العربية المتحدة

Alternatives include (keystroke notation: £ = alef with hamza below):

  • الإمارات (xn--kgbdbap4b0ij) Short name Al-Imārāt; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+0625 U+0645 U+0627 U+0631 U+0627 U+062A (ا ل إ م ا ر ا ت); keystrokes "al£marat".
  • إمارات (xn--kgbeam7a8h) Short name without the "al-"; keystrokes "£marat".
  • ام (xn--mgb4d) Two-letter abbreviation; keystrokes "am".


.bh - Bahrain - البحرين

Alternatives include:

  • البحرين (xn--mgbcpq6gpa1a) Short country name with "al-"; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+0628 U+062D U+0631 U+064A U+0646 (ا ل ب ح ر ي ن); keystrokes "albhrin".
  • بحرين (xn--ngbkm8fta) Short country name without the "al-"; keystrokes "bhrin".
  • بح (xn--ngbk) Two-letter abbreviation; matches the first two letters in Arabic and the transliteration of the ISO 3166 code; keystrokes "bh".


.tn - Tunisia - تونس

  • تونس (xn--pgbs0dh) Short country name; recommended by Farah, et al., 2006. Code points U+062A U+0648 U+0646 U+0633 (ت و ن س); keystrokes "tuns".
  • تو (xn--pgb4d) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "tu".


.dz - Algeria - الجزائر

Keystroke notation: $ = hamza above.

  • الجزائر (xn--lgbbat1ad8j) Short country name with "al-"; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+062C U+0632 U+0627 U+0626 U+0631 (ا ل ج ز ا ئ ر); keystrokes "aljza$ir".
  • جزائر (xn--lgbbowc) Short country name without the "al-"; keystrokes "jza$ir".
  • جز (xn--rgbm) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "jz".


.dj - Djibouti - جيبوتي

  • جيبوتي (xn--ngbee7iid) Short country name; recommended by Farah, et al., 2006. Code points U+062C U+064A U+0628 U+0648 U+062A U+064A (ج ي ب و ت ي); keystrokes "jibuti".
  • جي (xn--rgb4d) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "ji".


.km - Comoros - جزر القمر

  • القمر (xn--mgbu4chg) Short country name with "al-"; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+0642 U+0645 U+0631 (ا ل ق م ر); keystrokes "alqmr".
  • قم (xn--ehbg) Two-letter abbreviation; keystrokes "qm".


.sa - Saudi Arabia - السعودية

Keystroke notation: G = ain, p = teh marbuta.

  • السعودية (xn--mgberp4a5d4ar) Short name as-Saʻūdiyya; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+0633 U+0639 U+0648 U+062F U+064A U+0629 (ا ل س ع و د ي ة); keystrokes "alsGudip".
  • سعودية (xn--ogblly9en) Short name without the "al-"; keystrokes "sGudip".
  • سع (xn--ygbm) Two-letter abbreviation; matches the first two letters in Arabic and the transliteration of the ISO 3166 code; keystrokes "sG".


.sd - Sudan - السودان

  • السودان (xn--mgbaxp8fpl) Short name as-Sūdān; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+0633 U+0648 U+062F U+0627 U+0646 (ا ل س و د ا ن); keystrokes "alsudan".
  • سودان (xn--mgbpl2fh) Short name without the "al-"; keystrokes "sudan".
  • سد (xn--ugbh) Two-letter abbreviation; matches the first and third letters in Arabic and the transliteration of the ISO 3166 code; keystrokes "sd".


.sy - Syria - سورية or سوريا

Keystroke notation: p = teh marbuta.

  • سورية (xn--ogbpf8fl) Short country name, spelled with teh marbuta; recommended by Farah, et al., 2006. Code points U+0633 U+0648 U+0631 U+064A U+0629 (س و ر ي ة); keystrokes "surip".
  • سوريا (xn--mgbtf8fl) Short country name, spelled with alef; keystrokes "suria".
  • سر (xn--wgbd) Two-letter abbreviation; matches the first and third letters in Arabic; keystrokes "sr".


There are two spellings of the short country name, one ending in an alef (ا) and the other in a teh marbuta (ة).

.so - Somalia - الصومال

Keystroke notation: c = sad.

  • الصومال (xn--mgba5b5cceu) Short country name with "al-"; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+0635 U+0648 U+0645 U+0627 U+0644 (ا ل ص و م ا ل); keystrokes "alcumal".
  • صومال (xn--mgb1a8bco) Short country name without the "al-"; keystrokes "cumal".
  • صو (xn--0gb2b) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "cu".


.iq - Iraq - العراق

Keystroke notation: G = ain.

  • العراق (xn--mgba3a5azci) Short name al-ʿIrāq; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+0639 U+0631 U+0627 U+0642 (ا ل ع ر ا ق); keystrokes "alGraq".
  • عراق (xn--mgbtx2b) Short name without the "al-"; keystrokes "Graq".
  • عر (xn--wgbp) Two-letter abbreviation; matches the first two letters in Arabic and the transliteration of the ISO 3166 code; keystrokes "Gr".


.om - Oman - عُمان

Keystroke notation: G = ain.

  • عمان (xn--mgb9awbf) Short name ʿUmān, written without the diacritic mark; recommended by Farah, et al., 2006. Code points U+0639 U+0645 U+0627 U+0646 (ع م ا ن); keystrokes "Gman".
  • عم (xn--4gby) Two-letter abbreviation; matches the first two letters in Arabic and the transliteration of the ISO 3166 code; keystrokes "Gm".


.ps - Palestine - فلسطين

Keystroke notation: z = tah.

  • فلسطين (xn--ygbi2ammx) Short name Falastīn; recommended by Farah, et al., 2006. Code points U+0641 U+0644 U+0633 U+0637 U+064A U+0646 (ف ل س ط ي ن); keystrokes "flszin".
  • فل (xn--dhbg) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "fl".


.qa - Qatar - قطر

Keystroke notation: z = tah.

  • قطر (xn--wgbl6a) Short name Qatar; recommended by Farah, et al., 2006. Code points U+0642 U+0637 U+0631 (ق ط ر); keystrokes "qzr".
  • قط (xn--2gbv) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "qz".


.kw - Kuwait - الكويت

  • الكويت (xn--mgbg8edvm) Short name al-Kuwayt; recommended by Farah, et al., 2006. Keystrokes "alkuit".
  • كويت (xn--pgb3cpi) Short country name without the "al-"; keystrokes "kuit".
  • كو (xn--fhbk) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "ku".


.lb - Lebanon - لبنان

  • لبنان (xn--mgbb7fjb) Short name Lubnān; recommended by Farah, et al., 2006. Code points U+0644 U+0628 U+0646 U+0627 U+0646 (ل ب ن ا ن); keystrokes "lbnan".
  • لب (xn--ngb9c) Two-letter abbreviation; matches the first two letters in Arabic and the transliteration of the ISO 3166 code; keystrokes "lb".


.ly - Libya - ليبيا

  • ليبيا (xn--mgbb7fyab) Short name Lībiyā; recommended by Farah, et al., 2006. Code points U+0644 U+064A U+0628 U+064A U+0627 (ل ي ب ي ا); keystrokes "libia".
  • لي (xn--ghbm) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "li".


.eg - Egypt - مصر

  • مصر (xn--wgbd7c) Short name Miṣr or Máṣr; recommended by Farah, et al., 2006. Code points U+0645 U+0635 U+0631 (م ص ر); keystrokes "msr".
  • مس (xn--ygb9a) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "ms".


.ma - Morocco - المغرب

Keystroke notation: g = ghain.

  • المغرب (xn--mgbc0a9azcg) Short name al-Maghrib; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+0645 U+063A U+0631 U+0628 (ا ل م غ ر ب); keystrokes "almgrb".
  • مغرب (xn--ngbr0a6b) Short name without the "al-"; keystrokes "mgrb".
  • مغ (xn--5gbv) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "mg".


.mr - Mauritania - موريتانيا

  • موريتانيا (xn--mgbah1a3hjkrd) Short name Mūrītāniyā; recommended by Farah, et al., 2006. Code points U+0645 U+0648 U+0631 U+064A U+062A U+0627 U+0646 U+064A U+0627 (م و ر ي ت ا ن ي ا); keystrokes "muritania".
  • مو (xn--hhbg) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "mu".


.ye - Yemen - اليمن

  • اليمن (xn--mgb2ddes) Short name al-Yaman; recommended by Farah, et al., 2006. Code points U+0627 U+0644 U+064A U+0645 U+0646 (ا ل ي م ن); keystrokes "alimn".
  • يمن (xn--hhbck) Short name without the "al-"; keystrokes "imn".
  • يم (xn--hhbj) Two-letter abbreviation; matches the first two letters in Arabic; keystrokes "im".


Arabic-related

.af Afghanistan افغانستان

  • افغانستان (xn--mgbaal8b0b9b2bd) Full short country name.

.ir Iran ايران

The preferred solution involves simultaneous delegation of the following two homoglyph strings (keystroke notation: y = Farsi Yeh):

  • ایران (xn--mgba3a4f16a) "Iran" in Farsi script; requested by the .ir ccTLD registry. Code points U+0627 U+06CC U+0631 U+0627 U+0646 (ا ی ر ا ن); keystrokes "ayran".
  • ايران (xn--mgba3a4fra) "Iran" as the Arabic variant. Its presentation is identical to the Farsi string, but it uses a different code point (U+064A Arabic Yeh instead of U+06CC Farsi Yeh). Code points U+0627 U+064A U+0631 U+0627 U+0646 (ا ي ر ا ن); keystrokes "airan".

Note: the two strings are visually identical, but both would be needed in the root. Variants are needed because Arabic-script glyphs change shape depending on the preceding and following characters. Farsi and Arabic therefore use different code points for some letters whose contextual forms differ, even though the letter often has the same shape and pronunciation in both languages. Farsi Yeh (U+06CC) and Arabic Yeh (U+064A), for example, look identical in initial and medial position, as in this string. They differ only in their final and isolated forms, where Farsi Yeh has no dots. The .ir ccTLD registry has already announced its intention to request the first string through a letter from CENTR to ICANN in March 2007. For technical reasons, both strings should be delegated to the .ir registry.
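The variant relation here amounts to a single code-point substitution. The minimal Python sketch below assumes a hypothetical one-entry variant table (Farsi Yeh U+06CC and Arabic Yeh U+064A) rather than a complete Arabic-script variant table. It generates the second string from the first and shows that the two encode to different ASCII-compatible labels, even though they render alike in this word. Encoding uses the standard library's IDNA 2003 codec.

```python
import unicodedata

FARSI_YEH = "\u06CC"
ARABIC_YEH = "\u064A"


def yeh_variant(label: str) -> str:
    """Swap Farsi Yeh and Arabic Yeh (hypothetical one-entry variant table)."""
    table = str.maketrans({FARSI_YEH: ARABIC_YEH, ARABIC_YEH: FARSI_YEH})
    return label.translate(table)


farsi = "\u0627\u06CC\u0631\u0627\u0646"  # ایران, as requested by the .ir registry
arabic = yeh_variant(farsi)               # ايران, the Arabic-Yeh variant

for label in (farsi, arabic):
    ace = label.encode("idna").decode("ascii")
    yeh = [unicodedata.name(ch) for ch in label if ch in (FARSI_YEH, ARABIC_YEH)]
    print(ace, yeh)
```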

.pk Pakistan پاکستان

  • پاکستان (xn--mgbai9azgqp6j) Full word of the short country name.

Greek

.gr Greece - Ελλάδα or Ελλάς

  • .ελ xn--qxam Epsilon and lambda, first two letters of country name "Ellas" (Hellas).
  • .ελλάς xn--hxarsa5b
    • .ελλας xn--mxahsa5b
  • .ελλάδα xn--hxakic4aa
    • .ελλαδα xn--mxaaic4aa

.cy Cyprus - Κύπρος - Kıbrıs

  • .κπ xn--vxam
  • .κπρ xn--vxamd
  • .κύπρος xn--vxakcel0d (also matches κύπροσ)
    • .κυπρος xn--vxakceli (also matches κυπροσ)

The Punycode encoding of κύπρος, xn--vxakcel0d, is identical to the encoding of "κύπροσ". This is because IDNA (in its 2003 version) case-folds the final sigma "ς" to the normal sigma "σ" before encoding; both are lowercase forms of the uppercase Sigma "Σ".
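Python's built-in "idna" codec applies the IDNA 2003 Nameprep mapping, so it can be used to check the sigma behaviour described above. This is specific to IDNA 2003. IDNA 2008 (RFC 5892) treats the final sigma as a distinct valid character, so the two spellings would no longer encode identically under that version. The sketch also shows that the tonos is preserved, which is why accented and unaccented forms are listed as separate strings.

```python
pairs = [
    ("κύπρος", "κύπροσ"),  # final sigma vs. normal sigma
    ("ελλάς", "ελλας"),    # with and without tonos
]

for a, b in pairs:
    ace_a = a.encode("idna").decode("ascii")  # IDNA 2003 ToASCII
    ace_b = b.encode("idna").decode("ascii")
    print(f"{a} -> {ace_a}   {b} -> {ace_b}   same: {ace_a == ace_b}")
```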

The most logical transliteration of Cyprus' ISO 3166 code ".cy" would be ".κυ" (uppercase ".ΚΥ"), but this would be confusable with ".ky" (Cayman Islands) and "ku" (not currently used in ISO 3166).

For Greek, an IDN TLD would be needed to avoid script switching within the domain name. In this case, ".κπρ" would be the simplest solution: it would avoid any confusability, the tonos problem (ύ vs. υ) and the final sigma problem (ς vs. σ). Combined with .cy, ".κπρ" would allow single-script domain names in all of Cyprus' national languages.

Indic Scripts, Indic-Related Scripts, Caucasus

.bd Bangladesh

.bt Bhutan འབྲུག་ རྒྱལ་ཁབ་

.ge Georgia საქართველო

  • საქართველო (xn--lodamdjuvtg5b) (Full country name, pronounced Sakartvelo)

.in - India - Many Scripts

There is an obvious need in India for a number of IDN.IDN ccTLDs as alternatives to .in. In addition, of course, IDN gTLDs for large communities should also be considered, such as a Tamil cultural and language TLD. Given India's enormous cultural wealth, population, economic potential and intellectual resources, one could easily expect a dozen or more IDN ccTLD variants for .in, in addition to IDN gTLDs in Indic scripts. IDN ccTLDs and IDN gTLDs are far from mutually exclusive in this respect. For instance, it seems perfectly sensible to create an IDN ccTLD for India in Tamil as well as an IDN gTLD for Tamil.

IDN ccTLD variants for .in could include:

  • .भारत (xn--h2brj9c "Bharat"; India in Hindi). As various speakers from India convincingly pointed out at the IDN workshop in Lisbon, an abbreviation would not be helpful.
  • .இந் (xn--xkc0e5e; two-letter string pronounced "ind". This is an abbreviation of the Tamil word "இந்தியா", pronounced "india", meaning India.)

.kh Cambodia

.la Laos

.lk Sri Lanka ( _________ / இலங்கை )

Sri Lanka is one of the clear cases where it is inconceivable to add just a single IDN ccTLD. There are two main language communities, Sinhalese and Tamil, each of which would need its own IDN TLD for Sri Lanka.

.np Nepal नेपाल

  • .नेपाल (xn--l2bey1c2b) Full word of the short country name.

.mm Myanmar

.th - Thailand - ประเทศไทย

In the Thai script, the vowels associated with a consonant are not always written in phonetic order. Depending on the vowel, they are written before or after the consonant, or on both sides of it.

  • .ไทย (xn--o3cw4h "thai") ?
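In Unicode, Thai preposed vowels such as ไ are stored in visual order, that is, before the consonant they follow in pronunciation. Unicode flags these vowels with the Logical_Order_Exception property. The Python sketch below uses the standard library's `unicodedata` module to list the code points of ไทย in stored order, and then encodes the label with the built-in IDNA 2003 codec.

```python
import unicodedata

label = "\u0e44\u0e17\u0e22"  # ไทย ("thai"), as stored in Unicode

# Stored order: the vowel SARA AI MAIMALAI comes first, although it is
# pronounced after the consonant THO THAHAN that follows it.
for ch in label:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print(label.encode("idna").decode("ascii"))
```

The last line prints xn--o3cw4h, the encoding given above.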

Supranational Country Codes

Currently, "EU" is the only supranational country code used as a TLD.

.eu European Union - Европа / Ευρώπη

Two official languages of the European Union use non-Latin scripts: Bulgarian (Cyrillic) and Greek (used in Greece and Cyprus). As a result, the EU can either have no IDN ccTLD or, if it has one, it requires one in Cyrillic and one in Greek. Most probably for purely practical reasons, it may be best to use an abbreviation for both scripts, namely the simple transliteration of "EU" into Greek and Bulgarian.

  • Alternatives for Greek:
    • .ευ (xn--qxa6a) Transliteration of "eu" into Greek, also corresponding to the first two letters of the Greek word for Europe.
    • .ευρώπη (xn--qxae0adt9c) Full word for Europe in Greek.
  • Alternatives for Bulgarian:
    • .евр (xn--b1af8a) First three letters of the Bulgarian word for Europe, equivalent to "eur".
    • .европа (xn--80adi1bfe) Bulgarian full word for Europe.

The Cyrillic IDN TLD for .eu should be chosen with regard not only to Bulgarian, but also to the other Cyrillic-script languages. Moreover, any form whose glyphs would be confusable with other TLDs would have to be excluded. In this context, the string "еу" (in uppercase "ЕУ"; Cyrillic letter Ye followed by Cyrillic letter U) might be considered as a phonetic transliteration of "eu". However, it would be confusable with "ey", which is not currently assigned in ISO 3166 and would then have to be reserved. The string "ев" (in uppercase "ЕВ"; Cyrillic letter Ye followed by Cyrillic letter Ve), i.e. the first two letters of the word for Europe in Bulgarian, would be confusable with the Latin string "EB".

References

ISO 3166 Codes overview: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Decoding_table
