An
alphabet is a complete standardized set of
letters—basic written symbols—each of which roughly represents a
phoneme of a spoken
language, either as it exists now or as it may have been in the past. There are other
systems of writing such as
logograms, in which each symbol represents a
morpheme, and
syllabaries, in which each symbol represents a syllable.
The word "alphabet" itself comes from
alpha and
beta, the first two symbols of the
Greek alphabet. There are dozens of alphabets in use today. Most of them are
linear, which means that they are made up of lines. Notable
exceptions are the
Braille alphabet,
Morse Code and the
cuneiform alphabet of the ancient city of
Ugarit.
Types
Among segmental scripts (that is, scripts that use a separate glyph for each phoneme, commonly called "alphabets"), one may distinguish
abjads, which only record
consonants and were first developed by the
Egyptians as part of their hieroglyphic script; true alphabets which record consonants and
vowels separately, first developed by the
Greeks; and
abugidas, in which the vowels are indicated by diacritical marks or systematic modification of the form of the consonants, first developed by the
Indians. Examples of present-day abjads are the Arabic and
Hebrew scripts; true alphabets include
Latin, Cyrillic, and Korean
Hangul; and abugidas are used to write
Ethiopic, Hindi, and
Thai. The
Canadian Aboriginal Syllabics are also an abugida, as a glyph stands for a consonant and is rotated to represent the vowel, rather than each consonant-vowel combination being represented by a separate glyph, as in a true syllabary.
The boundaries between these three types are not always clear-cut. For example, Iraqi
Kurdish is written in the Arabic script, which is normally an abjad. However, in Kurdish, writing the vowels is mandatory, and full letters are used, so the script is a true alphabet. Other languages may use a Semitic abjad with mandatory vowel diacritics, effectively making it an abugida. On the other hand, the Phagspa script of the
Mongol Empire was based closely on the
Tibetan abugida, but all vowel marks were written after the preceding consonant rather than as diacritic marks. Although short
a was not written, as in the abugidas, one could argue that the linear arrangement made this a true alphabet. Conversely, the vowel marks of the Ethiopic abugida have been so completely assimilated into their consonants that the system is learned as a
syllabary rather than as a segmental script. Even more extreme, the Pahlavi abjad became
logographic. (See below.)
Thus the primary classification of alphabets reflects how they treat vowels. Further classification can be based on tone, though there are as yet no names to distinguish the various types. Some alphabets simply disregard tone entirely, especially when it does not carry a heavy functional load, as in parts of Africa and the Americas. Such scripts are to tone what abjads are to vowels. Most commonly, tones are indicated with diacritics, the way vowels are treated in abugidas. This is the case for
Vietnamese (a true alphabet) and
Thai (an abugida). In Thai, tone is determined primarily by the choice of consonant, with diacritics for disambiguation. In the Pollard script (an abugida), vowels are indicated by diacritics, but the placement of the vowel relative to the consonant indicates the tone. More rarely, a script will have separate letters for the tones, as is the case for Hmong and
Zhuang. Regardless of whether letters or diacritics are used, the most common tone will not be marked, just as the most common vowel is not marked in Indic abugidas.
Alphabets can be quite small. The Book
Pahlavi script, an abjad, had only twelve letters at one point, and may have had even fewer later on, and the Scandinavian
Younger Futhark had just sixteen. Today the
Rotokas alphabet has only twelve letters. (The
Hawaiian alphabet is sometimes claimed to be as small, but it actually consists of 18 letters, including the
‘okina and five long vowels. Rotokas also has long vowels, but they are written doubled rather than using separate letters.) While Rotokas and the Polynesian languages have small alphabets because they have few phonemes to represent (Rotokas could actually get by with one fewer, because
s and
t represent the same speech sound), Book Pahlavi was small because many letters had been conflated (that is, the graphic distinctions had been lost over time), and diacritics were not developed to compensate, as they were in
Arabic, another script that lost many of its distinct letter shapes. For example, a comma-shaped letter represented
g, d, y, k, and
j. However, such simplifications can perversely make a script more complicated. In later Pahlavi
papyri, up to half of the remaining graphic distinctions were lost, and the script could no longer be read as a sequence of letters at all, but had to be learned as word symbols – that is, as
logograms like Egyptian
demotic.
The largest segmental script is probably an abugida,
Devanagari. When written in Devanagari, Vedic Sanskrit has an alphabet of 53 letters, including the
visarga mark for final aspiration and special letters for
kš and
jñ, though one of the long els is theoretical and not actually used. The Hindi alphabet must represent both Sanskrit and modern vocabulary, and so has been expanded to 58 with the
khutma letters (letters with a dot added to represent sounds from Persian and English).
The largest known abjad is
Sindhi, with 51 letters. The largest true alphabets include
Kabardian and
Abxaz (for Cyrillic), with 58 and 56 letters, respectively, and
Slovak (for the
Latin alphabet), with 46. However, these scripts either include di- and tri-graphs (as Spanish does with
ch), or
diacritics (like Slovak
č). The largest true alphabet where each letter is graphically independent is probably
Georgian, with 40. (The Georgian alphabet is supposed to have been extended to 52 letters to write
Aghbanian, but this probably involved diacritics.)
Syllabaries typically include 50 to 400 glyphs (though Pirahã would require only 24 if tone were not indicated, and Rotokas 30), and the glyphs of logographic systems number from the hundreds to thousands. Thus a simple count of the number of distinct symbols is an important clue to deciphering an unknown script.
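The counting heuristic can be sketched as a simple range lookup. This is only an illustration: the function name and the exact bounds are assumptions, loosely rounded from the figures quoted in this article (segmental scripts up to roughly 60 letters, syllabaries from 24 to 400 glyphs, logographic systems from the hundreds upward). The ranges overlap, so the function returns every type that fits rather than a single answer.

```python
# Assumed bounds, rounded from the inventory sizes quoted in the article.
# None means "no upper limit".
TYPICAL_RANGES = {
    "segmental": (10, 60),
    "syllabary": (24, 400),
    "logographic": (100, None),
}


def candidate_script_types(glyph_count):
    """Return the script types whose typical inventory size fits glyph_count."""
    candidates = []
    for name, (low, high) in TYPICAL_RANGES.items():
        if glyph_count >= low and (high is None or glyph_count <= high):
            candidates.append(name)
    return candidates


print(candidate_script_types(12))    # ['segmental']
print(candidate_script_types(80))    # ['syllabary']
print(candidate_script_types(3000))  # ['logographic']
```

A count in an overlapping range, such as 26 or 200, yields two candidates, which is why the count is a clue rather than a proof.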
It is not always clear what constitutes a different alphabet.
French uses the same basic alphabet as English, but many of the letters can carry
diacritic accents and other marks (for example, é, à or ô). In French, these accents are not considered to create additional letters. However, in
Icelandic, the accented letters (such as á, í and ö) are considered distinct letters of the alphabet. Some adaptations of the Latin alphabet are augmented with
ligatures, such as Æ in
Old English and
Ȣ in
Algonquin; by borrowings from other alphabets, such as the thorn þ in
Old English and
Icelandic, which came from the Futhark runes; and by modifying existing letters, such as the
eth ð of Old English and Icelandic, which came from
d. Other alphabets only use a subset of the Latin alphabet, such as Hawaiian, or
Italian, which only uses the letters
k,
x, and
w for foreign words.
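The difference between a letter that carries a diacritic and a letter that is graphically independent can be illustrated with Unicode normalization. The sketch below uses Python's standard `unicodedata` module; the helper name `decompose` is hypothetical. It shows only how Unicode encodes each character, which is a separate question from whether a given language counts the character as a distinct letter of its alphabet.

```python
import unicodedata


def decompose(letter):
    """Return the Unicode names of the code points in the canonical
    decomposition (NFD) of `letter`."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", letter)]


# An accented letter splits into a base letter plus a combining mark.
print(decompose("é"))  # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# Borrowed or modified letters have no canonical decomposition.
print(decompose("þ"))  # ['LATIN SMALL LETTER THORN']
print(decompose("ð"))  # ['LATIN SMALL LETTER ETH']
```

Icelandic á and French é decompose in the same way, even though only Icelandic treats the accented form as a separate letter.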
Spelling
Each language may establish certain general rules that govern the association between letters and phonemes, but, depending on the language, these rules may or may not be consistently followed. In a perfectly
phonological alphabet, the phonemes and letters would correspond perfectly in two directions: a writer could predict the spelling of a word given its pronunciation, and a speaker could predict the pronunciation of a word given its spelling. However, languages often evolve independently of their writing systems, and writing systems have been borrowed for languages they were not designed for, so the degree to which letters of an alphabet correspond to phonemes of a language varies greatly from one language to another and even within a single language.
Languages may fail to achieve a one-to-one correspondence between letters and sounds in any of several ways:
- A language may represent a given phoneme with a combination of letters rather than just a single letter. Two-letter combinations are called digraphs and three-letter groups are called trigraphs. Kabardian uses a tesseragraph (four letters) for one of its phonemes.
- A language may represent the same phoneme with two different letters or combinations of letters.
- A language may spell some words with unpronounced letters that exist for historical or other reasons.
- Pronunciation of individual words may change according to the presence of surrounding words in a sentence.
- Different dialects of a language may pronounce different phonemes for the same word.
- A language may use different sets of symbols or different rules for distinct sets of vocabulary items (such as the Japanese hiragana and katakana syllabaries, or the various rules in English for spelling words from Latin and Greek, or the original Germanic vocabulary).
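The two directions of correspondence described above can be checked mechanically. The following sketch assumes the input is a list of already aligned (grapheme, phoneme) observations; the function name and the toy data are hypothetical, with the data loosely modeled on Spanish, where b and v are both pronounced /b/.

```python
from collections import defaultdict


def correspondence_report(pairs):
    """Given (grapheme, phoneme) observations, return two dicts:
    graphemes with more than one reading, and phonemes with more
    than one spelling."""
    sounds_of = defaultdict(set)     # grapheme -> phonemes it can stand for
    spellings_of = defaultdict(set)  # phoneme -> graphemes that can write it
    for grapheme, phoneme in pairs:
        sounds_of[grapheme].add(phoneme)
        spellings_of[phoneme].add(grapheme)
    ambiguous_reading = {g: s for g, s in sounds_of.items() if len(s) > 1}
    ambiguous_spelling = {p: g for p, g in spellings_of.items() if len(g) > 1}
    return ambiguous_reading, ambiguous_spelling


# Toy data: pronunciation is predictable from spelling, but not vice versa.
observations = [("b", "b"), ("v", "b"), ("a", "a")]
reading, spelling = correspondence_report(observations)
print(reading)   # {}
print(spelling)  # maps 'b' to the set containing 'b' and 'v'
```

A perfectly phonological alphabet, in the sense used here, would leave both dictionaries empty.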
National languages generally elect to address the problem of dialects by simply associating the alphabet with the national standard. However, for international languages with wide variations in their dialects, such as
English, it would be impossible to represent the language in all its variations with a single phonetic alphabet.
Some national languages like
Finnish have a very regular spelling system with close to a one-to-one correspondence between letters and phonemes. The
Italian language has no verb corresponding to
spell;
scriversi ('is written') suffices, because a correct pronunciation exactly corresponds to a correct
orthography. In standard Spanish, it is possible to predict the pronunciation of a word from its spelling, but not vice versa; this is because certain phonemes can be represented in more than one way, but a given letter is consistently pronounced.
French, with its
silent letters and its heavy use of
nasal vowels and
elision, may seem to lack much correspondence between spelling and pronunciation, but its rules on pronunciation are actually consistent and predictable with a fair degree of accuracy. At the other extreme, however, are languages such as
English, where the spelling of many words simply has to be memorized because it does not correspond to sounds in a consistent way. This is because the
Great Vowel Shift in English occurred after orthography was established, and because English has acquired a large number of loanwords at different times retaining their original spelling at varying levels. However, even English has general rules that predict pronunciation from spelling, and these rules are successful most of the time.
The sounds of speech of all languages of the world can be written by a rather small universal phonetic alphabet. A standard for this is the
International Phonetic Alphabet.
Collation
An alphabet also serves to establish an
order among letters that can be used for sorting entries in lists, called collating. Note that the order does not have to be constant among different languages using this alphabet; for examples see Latin alphabet: Collating in other languages.
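A minimal sketch of language-specific collation follows. It assumes the Danish and Norwegian convention, mentioned elsewhere in this article, of placing æ, ø and å at the end of the alphabet; the helper name `make_collation_key` is hypothetical, and real collation rules (digraphs such as aa, capitalization, punctuation) are more involved than a single letter ranking.

```python
import unicodedata


def make_collation_key(alphabet):
    """Build a sort key that ranks letters by their position in `alphabet`."""
    rank = {letter: position for position, letter in enumerate(alphabet)}
    unknown = len(alphabet)  # anything outside the alphabet sorts last

    def key(word):
        # NFC keeps letters such as "å" as single code points,
        # so they match the entries in the rank table.
        normalized = unicodedata.normalize("NFC", word.lower())
        return [rank.get(ch, unknown) for ch in normalized]

    return key


DANISH_ORDER = "abcdefghijklmnopqrstuvwxyzæøå"
words = ["ål", "zebra", "æble", "abe", "ø"]

print(sorted(words))
# code-point order: ['abe', 'zebra', 'ål', 'æble', 'ø']
print(sorted(words, key=make_collation_key(DANISH_ORDER)))
# alphabet order:   ['abe', 'zebra', 'æble', 'ø', 'ål']
```

The same list sorts differently under the two orders, which is the sense in which the order need not be constant across languages sharing an alphabet.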
In recent years the
Unicode initiative has attempted to collate most of the world's known writing systems into a single
character encoding. As well as its primary purpose of standardising computer processing of non-Roman scripts, the Unicode project has provided a focus for script-related scholarship.
History and diffusion
The
oldest known alphabet consists of recently discovered graffiti, scratched onto rocks in central
Egypt around 1800 BC. It appears to have been used by Semitic workers or mercenaries partially integrated into Egyptian society. The alphabet had previously been thought to have originated some 300 years later. (See
Middle Bronze Age alphabets.)
The Egyptians already had an alphabet as part of their hieroglyphic script, but only used purely alphabetic writing when transcribing loan words or foreign names. The inventors of the Semitic alphabet, whether Semitic workers or Egyptian bureaucrats, appear to have taken Egyptian hieroglyphs (and not just the Egyptian alphabet) and given them translated
Semitic names. So, for example,
pr "house" became
bayt "house". At this point scholars are still debating whether, when these glyphs were used to write the Semitic language instead of Egyptian, they were purely alphabetic, or whether, for example, the "house" glyph stood for both the consonant
b and the sequence
byt, as it had stood for both
p and
pr in Egyptian. However, by the time it was inherited by the
Canaanites, it was purely alphabetic, standing only for
b (see
Phoenician alphabet).
All subsequent alphabets around the world have either descended from this first Semitic alphabet, or else been inspired by one of its descendants, with the possible exception of
Meroitic, a seemingly independent
3rd century BC alphabetic adaptation of hieroglyphs. The one modern-day national alphabet that cannot be traced to the Canaanite alphabet graphically is the
Maldivian script, which is unique in that, although clearly modeled after existing alphabets such as Arabic, it derives its letters from numerals. (However, there is some speculation that the ancestral
Brahmi numerals above 3 might ultimately derive from the Semitic alphabet as well.)
Among alphabets that
aren't used as national scripts today, a few are clearly independent of other alphabets in their letter forms: the
Zhuyin phonetic alphabet derives from Chinese characters, and the geometric
Cree Syllabics (which, despite its name, is an abugida) is derived from British
shorthand. The
Santali alphabet, an indigenous true alphabet of India, appears to be based on traditional symbols such as "danger" and "meeting place", as well as pictographs invented by its creator. (The names of the Santali letters are related to the sound they represent through the acrophonic principle, but it is the
final consonant or vowel that the letter represents: e.g.
le "swelling" represents
e, while
en "thresh grain" represents
n.) In the ancient world,
Ogham consisted of tally marks, and the monumental inscriptions of the Old Persian Empire were written in an essentially alphabetic cuneiform script whose letter forms seem to have been created for the occasion. All five of these appear to be graphically independent of the other alphabets of the world, but they were devised from their example.
Changes to a new medium sometimes caused a break in graphical form, or made the relationship difficult to trace. It is not immediately obvious that the cuneiform
Ugaritic alphabet derives from a prototypical Semitic abjad, for example. Although
manual alphabets are a direct continuation of the local alphabet (both the British two-handed and the
French/
American one-handed alphabets retain the forms of the Latin alphabet, as the Indian manual alphabet does Devanagari, and the Korean does Hangul),
Braille,
semaphore,
maritime signal flags, and the
Morse codes are essentially arbitrary geometric forms. The shapes of the Braille and semaphore letters, for example, are derived from the alphabetic order of the Latin alphabet, but not from the letters themselves. Modern
shorthand also appears to be graphically unrelated. If it derives from the Latin alphabet historically, the connection has been lost.
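The way Braille letter shapes follow alphabetic order rather than letter forms can be sketched in a few lines. The dot assignments below are the standard Braille values, which are not given in this article, and the function names are hypothetical. The first ten letters use only the upper four dots; the next ten add dot 3; the remaining letters add dots 3 and 6. The letter w is handled separately because it sits outside that sequence.

```python
# Dots 1-3 run down the left column of the cell, dots 4-6 down the right.
FIRST_DECADE = {
    "a": {1}, "b": {1, 2}, "c": {1, 4}, "d": {1, 4, 5}, "e": {1, 5},
    "f": {1, 2, 4}, "g": {1, 2, 4, 5}, "h": {1, 2, 5}, "i": {2, 4},
    "j": {2, 4, 5},
}
SEQUENCE = "abcdefghijklmnopqrstuvxyz"  # w is omitted: it is out of sequence
EXTRA_DOTS = [set(), {3}, {3, 6}]       # added for the 1st, 2nd, 3rd group of ten


def braille_dots(letter):
    """Return the set of raised dots for a lower-case Latin letter."""
    if letter == "w":
        return FIRST_DECADE["j"] | {6}
    index = SEQUENCE.index(letter)
    base = FIRST_DECADE[SEQUENCE[index % 10]]
    return base | EXTRA_DOTS[index // 10]


def braille_char(letter):
    """Return the Unicode Braille pattern character for `letter`."""
    # In the Unicode Braille block, dot n corresponds to bit n-1 above U+2800.
    mask = sum(1 << (dot - 1) for dot in braille_dots(letter))
    return chr(0x2800 + mask)


print("".join(braille_char(ch) for ch in "abc"))  # ⠁⠃⠉
print(sorted(braille_dots("k")))                  # [1, 3]
```

Each letter's shape is computed from its position in the sequence alone; nothing in the code refers to what the printed letter looks like.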
However, most alphabets descend directly from the original Semitic script. The
Aramaic alphabet, which evolved from Phoenician in the
7th century BC and was used by the
Persian Empire, appears to be the ancestor of nearly all of the modern alphabets of Asia. The modern
Hebrew alphabet started out as a local variant of Aramaic. (The original Hebrew alphabet has been retained by the
Samaritans.) The
Arabic alphabet descended from Aramaic via the Nabatean alphabet of what is now southern
Jordan. The
Syriac alphabet used after the
3rd century CE evolved, through
Pahlavi and
Sogdian, into the alphabets of northern Asia, such as
Orkhon (probably),
Uyghur,
Mongolian, and
Manchu. The
Georgian alphabet is of uncertain provenance, but appears to be part of the Persian-Aramaic family.
The Aramaic alphabet is also the most likely ancestor of the
Brahmic alphabets of India, which spread to
Tibet,
Southeast Asia, and
Indonesia with the
Hindu and
Buddhist religions.
China and
Japan, while absorbing
Buddhism, maintained their own
logographic and
syllabic scripts. However, the
Hangul alphabet invented in
Korea in the 15th century is based on half a dozen letters apparently derived from
Tibetan via the imperial Phagspa alphabet of the
Yuan dynasty in China.
Besides Aramaic, the Phoenician alphabet gave rise to the
Berber and
Greek alphabets. Whereas separate letters for vowels would have actually hindered the legibility of Egyptian, Berber, or Semitic, their absence was problematic for Greek, which had a very different
morphological structure. However, there was a simple solution. The alphabet was based on the
acrophonic principle, where a letter represented the first sound of its name. Thus
bayt (Greek
beta) stood for
b. All of the names of the letters of the Phoenician alphabet started with consonants. However, several of these were rather soft, and unpronounceable by the Greeks, who simply ignored them. For example, the Greeks had no glottal stop or
h, so the Phoenician letters
’alep and
het became Greek
alpha and
eta. By the acrophonic principle, these now stood for the vowels
a and
e rather than the consonants
ʔ and
h. As this didn't provide for all twelve Greek vowels, the Greeks created
digraphs and other modifications, such as
ei,
ou,
o (which became
omega), or simply ignored the deficiency, as in long
a, i, u.
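The acrophonic reinterpretation described above can be sketched as follows. This is a deliberately simplified model: it treats each character of a transliterated letter name as one sound, uses the name forms given in this article, and the function name is hypothetical. A letter's value is the first sound of its name that the borrowing language can pronounce.

```python
def acrophonic_value(letter_name, unpronounceable=frozenset()):
    """Return the first sound of `letter_name` that is not in
    `unpronounceable` (one character is treated as one sound)."""
    for sound in letter_name:
        if sound not in unpronounceable:
            return sound
    raise ValueError(f"no pronounceable sound in {letter_name!r}")


# Sounds the article says Greek lacked: the glottal stop (’) and h.
GREEK_MISSING = frozenset({"’", "h"})

for name in ["bayt", "’alep", "het"]:
    phoenician = acrophonic_value(name)
    greek = acrophonic_value(name, GREEK_MISSING)
    print(name, phoenician, greek)
# bayt b b
# ’alep ’ a
# het h e
```

With no sounds excluded, every name yields its initial consonant; once the glottal stop and h are excluded, the same rule yields the vowels a and e.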
Greek is in turn the source for all the modern scripts of Europe. Eastern Greek, where the letter
eta stood for a vowel, gave rise to
Cyrillic and probably
Armenian; the western dialects of Greek, where eta remained an
h, produced the Roman alphabet and even the
runes.
Although this description presents the evolution of scripts in a linear fashion, this is a simplification. For example, the Manchu alphabet, descended from the abjads of West Asia, was also influenced by Korean hangul, which was either independent (the traditional view) or derived from the abugidas of South Asia. Georgian apparently derives from the Aramaic family, but was strongly influenced in its conception by Greek. The Greek alphabet, itself ultimately a derivative of the Egyptian
hieratic hieroglyphs, later more directly adopted half a dozen
demotic hieroglyphs when it was used to write
Coptic Egyptian.
The most popular alphabet in use today is the 26-letter Latin alphabet used, with some modification, for most of the languages of the
European Union, the Americas, Subsaharan Africa, and the islands of the Pacific Ocean:
English,
Spanish,
Portuguese,
Indonesian,
French,
Turkish,
German,
Javanese,
Vietnamese,
Italian,
Polish,
Hausa,
Swahili,
Filipino, etc. In modern usage, the term
Latin alphabet is used for any straightforward derivation of the alphabet used by the Romans. These variants may drop letters from (
Hawaiian) or add letters to (
Czech) the classical Roman script, and of course many letter shapes have changed over the centuries, such as the lower-case letters you're reading now, which the Romans would not have recognized.
The default Latin alphabet is the Roman, supplemented with J, U, W, and lower-case variants:
A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
Additional letters may be formed as
ligatures, as W was from VV, for example
ash Æ from AE,
oethel Œ from OE,
eszett ß from SS,
engma ŋ from NG,
ou Ȣ from OU,
Ñ from NN, or
Ç from CZ; by diacritics, such as
Å, Č,
Ų; as
digraphs, such as
IJ and
Ll; by modification, as J was from I, such as
Ø,
eth Ð,
yogh Ȝ from G, and
schwa Ə from either A or E; or may even be borrowed from another alphabet entirely, as
thorn Þ and
wynn Ƿ were from Futhark.
However, these glyphs are not always considered independent letters of the alphabet. For instance, in English
æ is considered a graphic variant of
ae rather than a separate letter, while in
Danish and Norwegian it is a true letter, and is placed at the end of the alphabet along with
ø and
aa/å.