JCMC 9 (1) November 2003
"A Funky Language for Teenzz to Use": Representing Gulf Arabic in Instant Messaging
David Palfreyman and Muhamed al Khalil
Zayed University, Dubai
Editors' Note: If you have difficulty viewing the fonts in this article, you can first try the following: 1. In MSIE, Go to TOOLS menu/ INTERNET OPTIONS/ACCESSIBILITY [in the GENERAL tab]; 2. Check the box next to [Ignore font styles specified on Web pages]; 3. Press OK; 4. In the GENERAL tab again choose the FONTS tab; 5. Set the "Web Page Font" to Arial Unicode MS, and keep the "Language Script" set to "Latin Based" in the drop menu; 6. Press OK/OK; 7. Exit and Restart IE. If that strategy fails, you may wish to download and consult a PDF version of the paper which displays the fonts correctly. This PDF document is for reference only. For citation purposes the html document is the official version.
Abstract
Computer-mediated communication (CMC) users writing in Arabic often represent Arabic in 'ASCII-ized' form, using the Latin alphabet rather than the Arabic alphabet normally used in other contexts (Warschauer, El Said, & Zohry, 2002). Analyzing ASCII-ized Arabic (AA) can give insights into ways in which CMC is shaped by linguistic, technological and social factors. This paper presents a study of AA as used among female university students in the United Arab Emirates, drawing on data from a small corpus of instant messenger (IM) conversations, and from an e-mail survey of users' experience with this form of writing. The AA in the conversations was found to show influences from computer character sets, from different varieties of spoken Arabic, from Arabic script, from English orthography and from other latinized forms of Arabic used in contexts which pre-date CMC. Users have developed creative (but variable) solutions to the constraints involved, but the purposes of AA use also extend for social reasons to situations where technical constraints do not apply.
Introduction
The growth of computer-mediated communication (CMC) around the world has brought with it changes in how language is used, including faster composition and reading of texts (Baron, 2002), and diffusion of oral discourse features into written language (Werry, 1996; Yates, 1996). Research into these phenomena has focused mainly on the English language; expanding the focus to other languages highlights new issues for research. This paper presents the findings of a small-scale, exploratory study of how female Arab university students in the United Arab Emirates use the Latin alphabet to write vernacular Arabic in online communication, specifically in instant messaging (using applications such as MSN Messenger, Yahoo Messenger, or ICQ).
Instant messaging (IM) is a form of CMC whereby two or more participants can carry out a synchronous ('instant') written conversation, which they can watch developing on their respective computer screens. In this respect an IM environment resembles that of a chatroom. However, IM is more dyadic in its orientation than chat: in IM, as with e-mail, in order to communicate with another user you must know their account identifier ('nickname'), either because they give it to you or because you find it in some sort of directory, and other IM users cannot enter the conversation uninvited. IM has become a popular medium for casual online interaction between teens in countries where the technological means for this exist (Schiano et al., 2002; Rodgers & Gauntlett, 2002).
The short extract in Table 1 shows a sample of the type of discourse studied in this paper. The left-hand column is the opening of an online conversation in the corpus used for this study; the right-hand column shows an approximate English translation.
Original | Approximate English translation
D: السلام عليكم ورحمة الله وبركاته | D: Hello there.
D: مرحبا حمده،، شحالج؟ | D: Hi Hamda, how are you doing?
F: w 3laikom essalaaam asoomah ^__^ | F: Hi there Asooma ^__^
F: b'7air allah eysallemch .. sh7aalech enty?? | F: Fine, God bless you. How about you?
D: el7emdellah b'7eer w ne3meh | D: Fine, great thanks.
D: sorry kent adawwer scripts 7ag project eljava script w rasi dayer fee elcodes | D: Sorry, I was looking for scripts for the java script project and my head is swarming with code.
Table 1. Opening of a typical messenger conversation
Although some features of this extract are familiar from other types of CMC in other contexts (turns are typically short, for example, and emoticons such as ^__^ are used to represent emotive content), even a reader with no knowledge of Arabic will notice some linguistic complexity here. The first two turns of the conversation are in Arabic script, then both participants start to use the Latin alphabet instead. The latter part of this extract, although using a different alphabet, still represents Arabic, but letters are interspersed with numerals, and Arabic with English words. In this paper we discuss characteristics of this latinized Arabic, or more precisely ASCII-ized Arabic (AA), in which ASCII (American Standard Code for Information Interchange) symbols are used to represent Arabic in IM and other electronic written communication.
The IM users in this study are students in higher education in the United Arab Emirates (UAE), mainly on the Dubai campus of Zayed University (ZU), a university for female UAE nationals. ZU has a policy of using information technology in the curriculum: All students are required to have a laptop, and the campus offers almost unrestricted intra- and internet access. ZU also aims to produce students who are bilingual and biliterate in English and Arabic, and the majority of the courses in the university are conducted in English. These students' familiarity with technology and with English contrasts greatly with UAE women (and many men) of their mothers' generation, who often did not receive even primary schooling. AA is one product of such dramatic changes in the UAE, and an analysis of the form and use of AA illustrates some of the interrelations between language, literacy, technology and globalization.
The setting of this study therefore involves two writing systems (the Arabic and Latin alphabets) which are closely linked to two world languages (Arabic and English). The interaction between these four language systems takes place within a particular social context (a traditional community in the midst of rapid modernization), which shapes the linguistic and technical resources used by these young women. Before presenting the study itself, therefore, we will lay out some background which will help in making sense of it. First we will discuss writing systems: the mapping of sounds and symbols, and how orthography relates to macro and micro aspects of social context, as well as psycholinguistic and social psychological processes. We will then provide some explanation about Arabic in particular: the Arabic language and its script, latinization, and the different varieties of Arabic and their social significance. To complete the background we will discuss, in relation to social context, the technical resources involved (computer character sets) and the phenomenon of ASCII-ization. After this we will describe an exploratory study of Arabic IM discourse, and discuss its implications for the study of CMC in a global context.
Background
Linguistic Aspects of Writing Systems
The majority of the world's writing systems consist of conventions which link the sounds of spoken language with written symbols. In this paper, sounds (or more precisely phonemes—sounds which are used to distinguish meaning in a particular language) are shown as International Phonetic Alphabet (IPA) symbols between slashes (e.g. /s/), while written symbols (such as letters) are shown here as follows: <s>. The conventions which link sounds and symbols vary widely between languages, and can be complex. The symbol <c> can be pronounced in various ways in English (e.g. as /k/ in <cat> and as /s/ in <cent>), and in yet other ways in other languages (e.g. as /t∫/ in Italian, as /θ/ in Castilian Spanish and as /dʒ/ in Turkish). Conversely, the sound /k/ can be written with <c> or <k> in English (as in <cat>, <keep>), or with a completely different symbol <ك> in Arabic. In English, single sounds may also be represented by digraphs—sequences of two symbols such as <sh> or <oo> in <shoot>, which uses five letters to represent three sounds /∫u:t/.
One writing system familiar to readers of this paper is the standard English alphabet, consisting of 26 letters, which in general correspond, individually or in combination, to phonemes in the spoken language. Each letter comprises a lower case and an upper case form, as well as font variations in print or handwriting. Other languages use basically the same set of characters as English, but with some letters omitted and/or added—thus the Spanish and Turkish versions of the Latin alphabet comprise 27 and 29 letters respectively. 'Non-Latin' alphabets such as Arabic use entirely different characters to represent sounds, but Arabic script is still based on correspondences between letters and sounds, and the set of symbols used in Arabic is comparable in size to the 'English alphabet'.
Beesley (1998) states that "many language communities adopt their standard orthography more or less by historical accident." He cites the distribution of the Latin and Cyrillic alphabets in Europe, which largely reflects whether these areas were, in the early centuries AD, occupied and/or proselytized under Roman or Greek influence. In a similar way, the use of Arabic script has in the past been closely associated with the spread of Islam, being used to represent languages unrelated to Arabic (e.g., Turkish and Persian) in Muslim societies.
At a societal level, Grivelet (2001) uses the term 'digraphia' to describe societies where two different writing systems with different social functions are used to represent essentially the same language (e.g., the Cyrillic and Latin scripts for 'Serbo-Croatian' in pre-1990s Yugoslavia). This term suggests a local consensus about which writing system should be used for which purpose, but more often the relative statuses of the writing systems in question are contested to some extent. In general, any decision about orthography constitutes a social/political statement (Unger, 2001), and in the modern world such decisions are linked with access to various types of literacy and thence to technology and power (Street, 1995; Bruthiaux, 2002). These connections are apparent in choices between different orthographic systems such as Latin and Arabic: The new Turkish Republic of the 1920s, for example, pursued a very effective 'alphabet revolution,' converting from the Arabic to the Latin script with the express aim of facilitating access to Western discourse. Conversely, the Syrian Ba'ath party in the 1980s orchestrated the smashing of shop signs written in Latin script, in the name of Arab purism.
The importance of social norms in orthography is also visible at a more micro level of discourse. One pervasive feature of CMC discourse which has attracted the attention of researchers is phonological simulation—representation of spoken features in online text, as for example in the written use of English contractions such as "gonna" and "wanna." Werry (1996) documents such features, and states that "the conventions that are emerging are a direct reflection of the physical constraints on the medium combined with a desire to create a language that is as 'speech-like' as possible" (p. 48). However, this downplays the social significance of this way of writing. Research shows that accents and other aspects of language act as markers of 'in-group' and 'out-group' identity (Abrams & Hogg, 1987; Cargile & Giles, 1997), and Stevenson (2000) suggests that phonological simulation in Internet Relay Chat (IRC), rather than being motivated simply by individual "desire" to mirror spoken features, "is a result of social pressure to break conventional spelling rules and comply with IRC's nonconformist, hacker image." It is therefore important to bear in mind the sociolinguistic norms which users embrace or distance themselves from as they make decisions about writing online.
Psycholinguistic Perspectives and Language Change
Orthographic representation of language also has implications for the study of the mental processing of language. On the one hand, literacy may affect processes of perception as well as production of spoken language: Read, Zhang, Nie, and Ding (1987) found that familiarity with the Latin alphabet (as compared with the logographic Chinese script) correlated with Chinese speakers' awareness of individual sounds, while Mann (1986) and Leong (1991) found that the Japanese syllable-based writing system helped Japanese children to segment heard utterances into syllables but not phonemes. On the other hand, psychological representations may also influence orthographic ones, making orthography a valuable source of evidence in historical linguistics. For example, non-standard spellings in graffiti and other non-formal inscriptions in ancient Roman sites provide evidence as to the actual pronunciation of Vulgar Latin, which illuminates the evolution of Proto-Romance in a way that the standardized texts of Classical Latin do not (Sampson, 2002). One modern equivalent of this is online phonological simulation, which Li (2001) uses as evidence for local changes in the use of tones in Chinese. As well as reflecting pronunciation, such non-standard texts also point to the ways in which speakers understand the structure of their own language. For example, Hentschel (1998) notes that in Serbian chat people write the negative particle "ne" joined to the verb rather than, as in standard orthography, writing it as a free morpheme.
Arabic belongs to the Semitic family of languages, which also includes Hebrew. The study of Arabic e-discourse in general and Gulf Arabic in particular is of wider interest for several reasons. For one thing Arabic, like English, may be described as a world language—it is ranked fifth among the world's languages, for example, on an index based on number of speakers, geographical spread and socio-literary prestige (Weber, 1997). On the other hand, technical and economic considerations have restricted the use of Arabic on the internet in proportion to the number of users of the language. Another reason for wider interest in Arabic is that the grammar, vocabulary, sounds and writing system of Arabic are strikingly different from those of English and other Indo-European languages, and have in the past challenged and expanded assumptions of European-based linguistics.
The sound system of Arabic includes more consonant sounds and fewer vowel sounds than that of English. In addition to /t/, /d/, /s/ and /ð/ (the first sound in "this"), pronounced more or less as in English, Arabic distinguishes the 'emphatic' consonants /ð'/, /t'/, /d'/ and /s'/, which are pronounced with a tense and somewhat retracted tongue, moving any vowels adjacent to them backward in the mouth. Another set of distinctive Arabic consonants are known as the gutturals: sounds produced by constricting the throat. These include for example the sounds /x/ (similar to the final sound of Scots "loch") and /ʕ/ (an approximant resembling a light gargle). Overall, Modern Standard Arabic (MSA) distinguishes ten consonants which are not used as distinct sounds in English: the glottal stop /ʔ/ (written <ء>), /q/ (<ق>), /ɣ/ (<غ>), /ʕ/ (<ع>), /ð'/ (<ظ>), /t'/ (<ط>), /d'/ (<ض>), /s'/ (<ص>), /x/ (<خ>) and /ħ/ (<ح>). Conversely, there are six consonant sounds in English which are not seen as distinct sounds in MSA: /g/, /v/, /ŋ/ (as in "sing"), /p/, /ʒ/ (the <s> in "pleasure") and /t∫/ (the first sound in "chip"). The MSA vowel system is generally taken to include the short vowels /i/, /a/, /u/ and their long equivalents /i:/, /a:/ and /u:/, although the actual pronunciation of these phonemes may vary widely.
The sounds of Arabic are normally represented with an alphabet of 28 letters. Broadly speaking, Arabic orthography is phonemic: one letter represents one sound, and 'silent' letters (like <gh> in English <night>) and digraphs (like English <sh> or <th>) do not occur. However, there are some significant exceptions to this. In relation to vowel sounds, although a set of diacritics exists to mark short vowels, these diacritics are restricted largely to religious or literary texts, and are used only selectively in books. In Arabic newspapers, magazines and handwriting (as in Hebrew), only consonants and long vowels are written. In this respect, the Arabic writing system depends on the background knowledge of the reader to accurately pronounce the written word, much as a reader in English needs to decide on the basis of context whether <read> is to be pronounced /ri:d/ (present tense) or /red/ (past tense). Another common case in which Arabic orthography does not reflect the surface pronunciation is in the definite article. This is pronounced /əl/ before words beginning with vowels or some consonants, but assimilates before words beginning with certain consonants, so that it is pronounced as /əs/, /ər/, /əθ/ or other pronunciations depending on the initial sound of the word which follows. However, Arabic orthography represents the article consistently as <ال>, as if it were always pronounced /al/.
Only one language of the Semitic family (Maltese) uses the Latin alphabet as its standard orthography; but Arabic (like Hebrew) does have some more or less established conventions for writing Arabic words in Latin script for particular purposes. The most authoritative of these are the conventions used by the Library of Congress and by the Encyclopedia of Islam. The latter's system is quite elaborate and precise, and is the one most often adopted by books and journals of Middle Eastern studies. It uses, for example, <kh> for the sound /x/ (retained in the spelling but normally not the pronunciation of English "sheikh" and "Khartoum", for example); <q> for the guttural sound /q/ (hence the spelling <Iraq>); and capitals for the emphatic sounds described above (e.g. <S> for 'emphatic' /s'/ and <s> for 'normal' /s/). It also represents the definite article always as <al>, regardless of its pronunciation. While this system originated in academic contexts, people in the UAE are also accustomed to seeing latinized representations of Arabic names, for example on road signs or in English-medium institutions like the university where this research took place. In these contexts, a de facto latinization system is used, which resembles the academic standard, although with less effort to represent all sounds consistently. Since this is the kind of latinized Arabic most likely to be familiar to the informants in this study, we will refer to it as "Common Latinized Arabic" (CLA). In CLA the guttural sounds and vowels tend to be represented less consistently than in the academic standard. For example, /ħ/ (<ح>) and the English-type /h/ are both represented with <h>; /ʕ/ (<ع>) is represented with an apostrophe (e.g. <Mazra'a>) or not shown at all (e.g. <Al Ain> for /ɨl ʕain/). In representing vowels, <mina> and <meena>, for example, appear variably to represent the same word, as do <sheikh> and <shaikh>.
Beesley (1998) distinguishes orthographic transcription of sounds ("orthography devised and used by linguists to characterize the phonology or morphophonology of a language") from transliteration of an existing orthography, which uses "the exact same orthographical conventions [as the language's customary orthography], but using carefully substituted orthographical symbols": for example, one transliteration of the Arabic word <كتب> (= 'books', pronounced /kutob/) would be <ktb>, with each of the three Arabic letters converted into a Latin character; whereas one transcription of the same word would be <kutob>. Beesley notes that "the typical transcription of Arabic has as its purpose to convey the pronunciation of Arabic words, usually to foreigners who are not comfortable with traditional Arabic orthography." Transliterations, on the other hand, "are appropriate when one wants to use the traditional orthography (with all its strengths and weaknesses, all its distinctions and ambiguities) but where writing or displaying or storing the original characters is impossible or inconvenient". Beesley's concern is with transliterating Arabic script for computer systems where only ASCII characters are supported. In this context, each Arabic letter (rather than each sound) must be replaced with an ASCII character or string; the ASCII form may be arbitrarily chosen, provided the computer can unambiguously translate Arabic symbols into ASCII and vice versa.
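The reversibility requirement described above can be sketched in a few lines of Python. The three-letter table below is a hypothetical mapping chosen purely for illustration (it is not presented as Beesley's actual scheme), and the names `ARABIC_TO_ASCII`, `transliterate` and `restore` are assumptions of this sketch.

```python
# Hypothetical one-to-one transliteration table (illustration only):
# each Arabic LETTER, not each sound, maps to exactly one ASCII character.
ARABIC_TO_ASCII = {
    "\u0643": "k",  # Arabic letter kaf
    "\u062A": "t",  # Arabic letter ta
    "\u0628": "b",  # Arabic letter ba
}

# Because the table is one-to-one, it can be inverted without ambiguity.
ASCII_TO_ARABIC = {latin: arabic for arabic, latin in ARABIC_TO_ASCII.items()}


def transliterate(word: str) -> str:
    """Replace each Arabic letter with its ASCII counterpart."""
    return "".join(ARABIC_TO_ASCII[ch] for ch in word)


def restore(ascii_word: str) -> str:
    """Convert the ASCII form back into the original Arabic letters."""
    return "".join(ASCII_TO_ARABIC[ch] for ch in ascii_word)


word = "\u0643\u062A\u0628"  # the unvowelled word discussed in the text
ascii_form = transliterate(word)
print(ascii_form)                     # ktb
print(restore(ascii_form) == word)    # True
```

Note that the short vowels of /kutob/ never appear in the output: a transliteration of this kind carries over exactly what the Arabic script records and nothing more, whereas a transcription such as <kutob> adds the pronunciation.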
In Beesley's terms, the latinization conventions for Arabic described above are more transcriptions than transliterations, since they aim to represent pronunciation. However, the rendition of the definite article as <al> regardless of its pronunciation represents a transliteration of Arabic <ال>, which in turn reflects the underlying grammatical identity of the word, rather than its varying surface pronunciation. Both transcriptions and transliterations provide ways of relating Arabic to Latin script, but ambiguities often result. For example, the character sequence <kh> has in principle two readings: either as a digraph representing /x/, as in the name <Shaikha> (/∫eixa/), or as a sequence of two consonants /k/ and /h/, as for example in <samakha> (/səməkha/, "her fish"), formed from "samak" ("fish") and the possessive particle "-ha" ("her").
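The ambiguity of <kh> can be made concrete with a small sketch that lists every way of dividing a latinized string into units, treating <kh> either as one digraph or as two separate letters. The function name `readings` is an assumption of this illustration, and only the <kh> digraph is modelled.

```python
def readings(latin: str) -> list[list[str]]:
    """Return every segmentation of `latin` in which each 'kh' is read
    either as a single digraph or as the two letters 'k' and 'h'."""
    if not latin:
        return [[]]
    results = []
    # Reading 1: take the next character as a unit on its own.
    for rest in readings(latin[1:]):
        results.append([latin[0]] + rest)
    # Reading 2: if the string starts with 'kh', take it as one digraph.
    if latin.startswith("kh"):
        for rest in readings(latin[2:]):
            results.append(["kh"] + rest)
    return results


for segmentation in readings("samakha"):
    print("-".join(segmentation))
# s-a-m-a-k-h-a   (two consonants, as in "her fish")
# s-a-m-a-kh-a    (a digraph, which would wrongly suggest /x/)
```

Nothing in the spelling itself selects between the two outputs; a reader has to rely on knowledge of the word, just as with unwritten short vowels in Arabic script.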
In terms of the social functions of language, Arabic has been cited as a textbook case of diglossia (Ferguson, 1959): the systematic use of distinct 'High' and 'Low' forms of the same language for different purposes. The Arabic language of today exists in two forms: the formal MSA, and the several vernaculars derived from it. While all Arabs necessarily speak at least one of these vernaculars in their daily communication, an educated Arab by definition must add to that a fair mastery of MSA. MSA is today's version of Classical Arabic, the language of the Quran, Islam, and Classical Arabic literature, written forms of which date as far back as the first century A.D. MSA is mainly a written language, spoken only in very formal settings (e.g. judicial proceedings, parliamentary deliberations, religious sermons, etc.), often from a prepared script (e.g. news broadcast, dedications, etc.). MSA, as the official language of some 20 Arab countries, is also the written language that students learn in their primary education, although verbal communication in the classroom is usually conducted in the local vernacular.
Arabic vernaculars display a very wide geographic distribution (Egyptian Arabic in Egypt; Levantine Arabic in Syria, Lebanon, Jordan, Palestine and Israel; Gulf Arabic in Southern Iraq and the Gulf region; etc). In addition, linguistic patterns correlate with economic development of Arab communities from bedouin to rural to urban, resulting in the existence of varieties within each vernacular which reflect this dynamic ecolinguistic transition (Cadora, 1992). In the case of the Arabian Gulf and North Africa, however, rapid urban development following the exploitation of oil has compressed this trend, and an accelerated transition from bedouin to urban society is under way.
The UAE vernacular Arabic spoken in Dubai is part of the larger group of dialects known as Gulf Arabic. Despite rapid urbanization, most of these dialects still reflect strong bedouin characteristics which they share with other bedouin-rooted dialects across the Arab World. Consonants in particular exhibit considerable variation from MSA. The sounds /g/ or /dʒ/ are used instead of MSA /q/, except in words borrowed recently from MSA (e.g. /ħari:ga/ or /ħari:dʒa/ for MSA /ħari:qa/, "fire"); the /k/ sound is replaced with a /t∫/ sound in certain positions (compare MSA /samak/, "fish", with UAE /sɨmat∫/). Other characteristics include the pronunciation of /d'/ as /ð'/ and the occasional use of /y/ instead of /dʒ/ (e.g. /yɨlas/ for MSA /dʒalas/, "he sat") (Hoffiz, 1995).
Until approximately forty years ago the UAE vernacular was used by often nomadic communities which were almost entirely illiterate. UAE Arabic, like other Arabic vernaculars, is not normally put into writing, since doing so is often thought to undermine the stature of MSA and corrupt its image. Yet Arabic script may be used to write the vernacular in certain circumstances. In the UAE, popular poetry composed in the vernacular is written down using Arabic script. Other uses include cartoons in newspapers and magazines, and to a lesser extent TV advertisements and street billboards, in an attempt to capture a local flavour; as well as the spontaneous creativity of the occasional wall graffiti. Online communication is another area where the script is sometimes used, as computer support for Arabic script becomes more accessible and more efficient.
The AA used in the computer-mediated conversations studied here differs from traditional ways of writing Arabic in several ways. Most obviously, it uses ASCII characters rather than Arabic letters; this in turn means that AA is read from left to right (the opposite direction from normal Arabic script), and that the letters are always separate from each other, rather than joined together (in slightly varying forms) as letters in the cursive Arabic script often are. Warschauer et al. (2002) studied the use of English and Egyptian vernacular AA in electronic communication, and provide some initial observations about the prevalence of these two varieties in contrast to the use of MSA in non-electronic written communication. They also note in passing some orthographic features of the ASCII-ized variety, including the use of numerals to represent certain sounds. Their main focus is on the balance between English and Arabic, rather than on analyzing the features of AA itself; but they point out the great volume of use of AA, compared with that of regular Arabic script. A previous e-mail survey of 83 students at ZU Dubai (Palfreyman, 2001b) supports the view of AA as a significant medium for online communication: approximately 25% of respondents said they used mainly Arabic script in IM, 25% AA, and 50% English.
Computer Character Sets and ASCII-ization
In the use of electronic fonts, ASCII has been the most widely used standard since the 1960s. Like English, it is a kind of lingua franca of the internet (The Default Language, 1999). The technical motivation often cited for ASCII-ization is the difficulty or impossibility of typing symbols standard to particular languages using the ASCII character set: keyboards, computers, operating systems and servers often do not support the use of Arabic characters. In the conversation in Table 1, however, participant D is clearly able to use Arabic script, but changes from Arabic to Latin script after her first two turns in the conversation. This suggests that technical considerations should be viewed in relation to other linguistic, social and psychological factors: orthographies (such as the Latin or Arabic alphabets) and computer character sets (such as ASCII) constitute resources, used within sociocultural contexts which influence their forms and uses.
At the most basic level, computers deal not with letters but with numbers; the number of different characters which a computer can recognize depends on the size of the numbers which it is ready to handle. The ASCII character set, established in 1968, consists of 128 characters, of which 95 are printable (the others being non-printing control codes). These include the Latin letters most commonly used in European languages (each in upper and lower case), numerals, punctuation marks and some other common symbols such as @. Each of these characters is represented by a 7-digit binary number; the limit of 128 characters is inherent in this 7-bit system. Later, more extensive character sets such as Latin 1 (ISO-8859-1) use an 8-bit system, require more working memory, and allow for an 'extended' character set of 256 different characters. More recent standards include Unicode (based on a 16-bit system allowing 65,536 basic characters), which claims to offer "a unique number for every character, no matter what the platform, no matter what the program, no matter what the language" (Unicode Consortium, 2002).
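The arithmetic behind these limits can be checked with a short sketch: an n-bit code allows 2^n distinct characters, and a string can be sent over a 7-bit ASCII channel only if every character in it has a code number below 128. The helper name `fits_in_ascii` is an assumption of this illustration; the two sample strings are the ASCII-ized and Arabic-script forms of "how are you" from Table 1.

```python
# Number of distinct codes available for each code width named in the text.
for name, bits in [("ASCII", 7), ("Latin 1", 8), ("16-bit Unicode", 16)]:
    print(f"{name}: {bits} bits -> {2 ** bits} codes")


def fits_in_ascii(text: str) -> bool:
    """True if every character has a code number in the 7-bit range 0-127."""
    return all(ord(ch) < 128 for ch in text)


latin_form = "sh7aalech"                         # ASCII-ized form from Table 1
arabic_form = "\u0634\u062D\u0627\u0644\u062C"   # Arabic-script form from Table 1

print(fits_in_ascii(latin_form))    # True: letters and numerals only
print(fits_in_ascii(arabic_form))   # False: Arabic letters lie outside ASCII
```

The numeral <7> in the first string shows why AA survives on ASCII-only systems: every symbol it uses, including the numerals standing in for Arabic sounds, falls inside the 128-character range.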
It is important to note that the technical considerations outlined above interact with social factors, just as more traditional orthographic systems do. Even terms such as 'non-Latin script' and 'extended character set' indicate a particular, socially-motivated perspective on orthographic systems. The term 'extended character set,' for instance, suggests that the symbols used for writing English are the norm, from which other alphabets are derived by 'extending' the English alphabet with diacritics and so on. Note also that standards such as Unicode are for various reasons unevenly implemented throughout the world. A variety of keyboard layouts, older or competing versions of software applications or operating systems, and the dependence of online communication on a chain of servers all contribute to the persistence of ASCII as a "lowest common denominator" and de facto standard. Warschauer et al. (2002) point out that the diffusion and distribution of technologies favour some languages over others. Japan, Taiwan and Israel, for example, have implemented software standards in their own scripts by using the economic and technological resources at their disposal. However, many countries do not have these resources, and even where they are available, computer skills tend to be learned in tandem with English and other European languages. Thus, even though technical limitations may be cited as the reason for ASCII-ization, other factors reinforce this tendency. Even when indigenous fonts become available the phenomenon often lives on, as illustrated in Table 1 above.
Given that people use informal ASCII-ized representations in chat and other electronic communication, which ASCII symbols do they use to represent their language? Several factors seem to play a part in the choice of symbols. Androutsopoulos (1999) and Tseliga (2002) note that ASCII-ized 'orthographies' do not typically have the consistency characteristic of other orthographic systems: they may vary among writers, or within an individual writer's usage. However, they highlight some clearly noticeable patterns in usage, which relate indirectly to the distinction between transcription and transliteration made by Beesley (1998).
The primary principle is that of phonological similarity to a sound in the language in question. In many cases, a sound in this language resembles a sound in English and/or another familiar language using the Latin alphabet, and there is a widely accepted and fairly consistent Latin alphabet spelling for this sound. Note that this varies with the spelling conventions of particular foreign languages. For example, the sound /u/ in Arabic tends to be represented as <ou> in Moroccan Arabic, on the basis of French spelling, and as <oo> in the UAE, where English is the main foreign language. Another key principle is that of visual similarity to a character in the normal alphabet of the language in question. Androutsopoulos (1999) and Tseliga (2002) note the use in ASCII-ized Greek CMC of representations such as <h> for the Greek letter <η>, which is clearly based on visual similarity, rather than on the sound (/i/) associated with the Greek letter. Another example for ASCII-ized Greek is the use of <8> or <0> to substitute for Greek <Θ> (pronounced like <th> in English <thing>).
In general, ASCII-ization (like traditional orthographic systems) seems to produce competing alternate representations (Palfreyman, 2001a); however, the factors which affect such variation remain to be investigated. Attitudes to this lack of consistency are often ambivalent: ASCII-ized varieties (in common with other new varieties of e-language) appear to be perceived as modern, but also as somewhat sloppy and perhaps as a threat to the language (Tseliga, 2002).
Research Questions
The aim of the present study is to analyze ASCII-ized representations used in IM conversations, and the social factors which impinge on these representations. The discussion above highlights some factors which are potentially relevant to the analysis of AA. These include technical factors (notably computer character support), linguistic factors (phonological patterns and orthographic conventions linked to English and Arabic) and social-psychological factors (ways in which users orient to social values through choices of linguistic resources). The main questions for investigation are:
- How do IM users represent (or not represent) Arabic sounds in AA?
- How consistent are these representations across interactions and across users?
- What linguistic resources do users draw on in representing Arabic?
- What purposes does AA serve for those who use it?
Methodology
This study involves three sources of data: a corpus of messenger conversations (supplemented by short interviews with the core informants), responses to a short e-mail survey and informal observation. We collected the corpus by asking three student volunteers to provide sample conversations. These core informants were female UAE nationals aged 18-19, studying in their first or second year at ZU. They all had some experience of IM before entering the university, but had begun using it more extensively since starting their university studies; all three of them were also comfortably literate in both Arabic and English. Their interlocutors in the conversations were of a similar background (all female UAE nationals in higher education), except for B, a male cousin of one of the core informants. Since the focus of this study was AA, we asked the core informants to provide conversations including examples of this variety. The messenger programs used by students all include a feature for archiving or saving conversations, and students were asked to obtain consent, where feasible, from their interlocutor to save the conversation and use it anonymously for research. The students collected these conversations between November 2002 and January 2003. After this we conducted a short interview with each core informant to gather background information and clarify a few points in the conversations.
Each conversation included two interlocutors, a core informant and another acquaintance, and the conversations used added up to approximately the same size sample for each core informant. The resulting corpus included a total of approximately 2,400 words of AA (in 543 conversational turns), approximately 2,000 words of English (in 571 turns; note that many turns included both AA and English items), and a small amount of material in Arabic script. This is clearly not a large corpus, but it includes contributions from ten different interlocutors and seems sufficient for an initial exploration of phenomena in this variety of Arabic. We will refer to users throughout by the pseudonyms A, B, C, etc.
*A: interacts with B (a male cousin) and C (a female friend).
*D: interacts with E and F (female friends).
*G: interacts with H, I and J (female friends).
Table 2. Informants (* = core informants)
The corpus was analyzed initially by counting instances of particular key symbols known to be used for particular Arabic sounds (e.g., <q>, <ai> and numerals), checking that each instance did indeed represent that sound. An Arabic speaker then read through the conversations to locate instances where sounds or words were represented in unexpected ways; in a few instances particular segments of the corpus were shown anonymously to UAE vernacular speakers to check their pronunciation. A few obvious typos were discounted from the analysis.
Following this quantitative analysis of the conversations, an e-mail survey including the following four open questions was sent to all of the approximately 1000 students at ZU Dubai, in order to elicit their perceptions of the wider social context of AA:
1. Why do people sometimes write Arabic [on a computer] with English letters instead of using Arabic letters?
2. Do you remember when and how you learned to write Arabic like this?
3. Where do you think this way of writing came from—who *first* used these symbols?
4. Do you ever see or write Arabic like this in other situations (*not* in messenger)?
Seventy-nine students responded to this survey by e-mail. The responses were coded, and where necessary the respondents were asked (by e-mail) to clarify parts of their responses. Throughout the period of this research we also collected examples of AA which we observed in other settings, for example on a Student Council "friendship wall" on campus.
Findings
The conversations in this corpus share some features with English CMC studied by earlier researchers. The register used is generally informal, turns typically short (often four words or fewer), and the language stripped down and abbreviated. For example, letters are almost exclusively lower case, with capitalization used mainly for emphasis (this may be reinforced by the lack of upper case letters in Arabic script). In addition, typographical conventions are used to represent stylized verbal or attitudinal effects: vowels are reduplicated for emphasis or expressiveness (e.g., when greeting someone by name), as are punctuation marks such as <!> and <?>. Emoticons such as <^_^> occur, although ones standard in English CMC were often converted automatically by the messaging program to icons, which were subsequently lost in the text-only saved version of the conversations. Abbreviations based on English are also used, such as <lol> (= "laugh (out loud)", normally a turn on its own). There were also various typos, as one might find in English chat or messaging.
Concerning the use of Arabic vis-à-vis English, in common with Warschauer et al.'s (2002) findings, in the present corpus there was a fair amount of code-switching (changing mid-utterance or mid-sentence from one language to another) and code-mixing (using words or phrases from one language within sentences in the other language). This mixing of varieties often correlated with different functions or topics, with Arabic being used for more formulaic phrases, and English for topics such as university courses. In the extract at the beginning of this paper, for instance, the formulaic greetings are entirely in Arabic, whereas D's final utterance includes a number of English words related to the student's (English-medium) university course, e.g., <project eljava script >, which consists of English words, concatenated with Arabic word-order and the definite article <el>. It should be remembered, however, that valid quantitative generalizations cannot be made about the proportions of the different varieties on the basis of this corpus, since students were specifically asked to provide conversations including some AA.
The AA used by these students broadly follows latinization conventions used in signage in Dubai. This means that AA is in many respects a transcription of what writers would say, rather than a transliteration of Arabic script: although the initial motivation for using AA may be the difficulty of using Arabic characters themselves, the students do not simply turn each Arabic letter into a corresponding Latin one. For example, short vowels, which are not normally written in Arabic orthography, are often included in AA.
Some Arabic sounds are represented in AA by single ASCII letters, based on the usual pronunciation of these letters in English. For example, the sound /t/ is represented by <t> instead of the normal Arabic letter <ت>, and /s/ is represented by <s> instead of <س>. Other sounds are represented by digraphs clearly drawn from English: for example, <th> is used for the sounds /θ/ and /ð/, and <sh> for the sound /∫/. These may seem natural and inevitable to native speakers of English, but they are based in a particular representation system used in English. In contrast to Common Latinized Arabic, several sounds which do not exist in English are represented by numerals which are seen by the informants as having some visual resemblance to the corresponding Arabic letters. The Arabic sounds and the symbols used to represent them in normal and ASCII-ized orthography are shown in Table 3. Note that the visual resemblance is clearer in some cases than in others, and in some cases involves mirror-image reversal of all or part of the symbol.
Sound | Arabic letter | ASCII representation | Example (& English translation)
/ħ/ (a heavy /h/-type sound) | <ح> | <7> | <wa7ed> (one)
/ʕ/ (a tightening of the throat resembling a light gargle) | <ع> | <3> | <ba3ad> (after)
/t'/ (the emphatic version of /t/) | <ط> | <6> | <6arrash> (he sent)
/s'/ (the emphatic version of /s/) | <ص> | <9> | <a9lan> (actually)
/ʔ/ (glottal stop) | <ء> | <2> | <so2al> (question)
Table 3. Numerals used to represent Arabic sounds
There are several letters in Arabic which are distinguished only by the presence or absence of a dot above them. Some of the dotted letters correspond to sounds existing in English, and are replaced with the English letters for these sounds (e.g. <ز> is replaced by <z> on the basis of pronunciation, rather than by <j>, which looks similar). However, where the sounds do not occur in English, they are represented in AA, on the basis of visual resemblance to the Arabic characters, with digraphs consisting of a numeral preceded by an apostrophe. Note that in the AA segments of this corpus the apostrophe is not used alone (for example to represent a glottal stop). The conventions used are shown in Table 4:
Sound | Arabic letter | ASCII representation | Example (& English translation)
/x/ (final sound in Scots 'loch') | <خ> | '7 | <'7ebar> (news)
/ɣ/ (voiced version of above) | <غ> | '3 | <'3ada> (lunch)
/ð'/ (the emphatic version of /ð/, the first sound in English "that") | <ظ> | '6 | <'6ahry> (my back)
/d'/ (the emphatic version of /d/) | <ض> | '9 | <man3ara'9> (not shown)
Table 4. Numerals used with apostrophe to represent Arabic sounds
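The correspondences in Tables 3 and 4 can be sketched as a small lookup. This is an illustrative sketch only, not a tool used in the study: the names `AA_SYMBOLS` and `find_symbols` are hypothetical, and the sound given for <'6> is read here as the emphatic counterpart of the first sound in "that". The alternate <5> for /x/ is left out because the tables do not list it.

```python
import re

# Symbol -> (sound, Arabic letter), copied from Tables 3 and 4.
# The sound for '6 is an interpretation (emphatic version of the "th" in "that").
AA_SYMBOLS = {
    "'7": ("x", "خ"),
    "'3": ("ɣ", "غ"),
    "'6": ("ð'", "ظ"),
    "'9": ("d'", "ض"),
    "7": ("ħ", "ح"),
    "3": ("ʕ", "ع"),
    "6": ("t'", "ط"),
    "9": ("s'", "ص"),
    "2": ("ʔ", "ء"),
}

# The apostrophe digraphs are tried first so that '7 is not misread as a bare 7.
_SYMBOL = re.compile(r"'[3679]|[23679]")


def find_symbols(word):
    """Return (symbol, sound) pairs for each numeral symbol found in an AA word."""
    return [(m.group(), AA_SYMBOLS[m.group()][0]) for m in _SYMBOL.finditer(word)]


print(find_symbols("wa7ed"))
print(find_symbols("man3ara'9"))
```

A lookup like this cannot tell a symbol from a genuine number, so any automatic match would still need the kind of manual check described in the methodology.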
The numeral <5> is also used as an alternate to <'7> to represent the sound /x/. This appears to derive from the fact that the Arabic word for "five", /xamsa/, begins with this sound. Looking at the corpus as a whole, it appears that individuals consistently use either <'7> or <5>. A, B, I and J use only <5>, while D, F, G and H regularly use <'7> for the same sound; this means that in some conversations, one participant consistently uses <5> and the other <'7>. Note that this sound happened not to occur in any of the words used by C and E.
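A per-writer tally of this kind can be sketched as follows. The turns below are invented toy data with hypothetical speakers X and Y, not material from the corpus, and `variants_by_speaker` is a hypothetical helper name; the sketch also assumes that every <5> stands for /x/, which would need checking by hand in real data.

```python
import re
from collections import Counter, defaultdict

# Invented toy turns (speaker, text); NOT taken from the study's corpus.
TURNS = [
    ("X", "a5ar yom"),
    ("Y", "a'7ar yom"),
    ("X", "el5eer"),
    ("Y", "el'7eer"),
    ("Y", "sh7aalech"),
]

# Match the digraph '7 or the numeral 5; a bare 7 (used for another sound) is ignored.
X_VARIANT = re.compile(r"'7|5")


def variants_by_speaker(turns):
    """Count which spelling of /x/ each speaker uses."""
    counts = defaultdict(Counter)
    for speaker, text in turns:
        counts[speaker].update(X_VARIANT.findall(text))
    return {speaker: dict(c) for speaker, c in counts.items()}


print(variants_by_speaker(TURNS))
```

A writer who uses a single variant throughout shows up with only one key in their tally.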
Notice the overall logic of the symbols above: Arabic sounds with clear equivalents in English are represented according to English conventions; but those which have no clear equivalent in English are represented by symbols based on familiar Arabic letters and sounds. This reflects the background of the writers as native speakers of Arabic who are familiar with the writing systems of both languages (as opposed to being linguists, or native speakers of English), and it is interesting to consider possible conventions which have not been adopted by these writers. For example, if a native English speaker who does not speak Arabic were asked to represent /s'/, s/he would probably use <s> (the closest correspondence in English); but none of the conversations in the corpus used <s> to represent this sound. For an Arabic speaker /s/ and /s'/ are quite distinct phonemes, as distinct as the sounds at the beginning of "sing" and "thing" are for many English speakers. Likewise, the writers here do not necessarily use ready-made forms from Common Latinized Arabic (CLA). For example, they never represent the sound /x/ by using the digraph <kh> (which suggests some relation with the sound /k/), nor <gh> for /ɣ/, but instead use distinct symbols derived mainly from the corresponding Arabic letters (<'7>) but in at least one case from a familiar spoken Arabic word (<5>). In the process, the transcription becomes in some ways less ambiguous and less dependent on contextual cues and background knowledge than in CLA: for example, in contrast with the ambiguity of <kh> in CLA which was mentioned earlier, in AA <kh> unambiguously represents the sequence of two sounds /k h/ (rather than the sound /x/, which would be written <'7>).
The variety of Arabic used in the corpus is basically the UAE vernacular, and this affects the kinds of written symbols which are used in the conversations. As noted earlier, the sound /t∫/ (usually represented in English as <ch>, as in <chair>) is specific to Gulf vernaculars: it does not occur in MSA, and there is no standard Arabic letter to represent it. When Arabic writers wish to represent words with this sound, they try to use an Arabic letter which is in some sense 'close to' this sound. Interestingly, in this case English spelling provides a ready-made solution where MSA does not: in the corpus the writers regularly use <ch> to represent this sound. The extract in Table 1 provides interesting examples in this regard. Both D and F write the vernacular expression /∫ħa:lit∫/ (= "how are you-FEMININE"), corresponding to MSA /kaifa ħa:luki/. D, using Arabic script at this stage, writes the final /t∫/ sound as <ج>, which normally represents the similar, but voiced, sound /dʒ/. Note that by doing so she makes an effort to represent the vernacular pronunciation, deviating from the standard spelling with <ك> (/k/), which reflects the MSA pronunciation of this expression. F, on the other hand, makes use of English conventions to represent the vernacular, and writes the same expression as <sh7aalech>.
Another difference between UAE vernacular and MSA is in the pronunciation of the sound /q/, written <ق> in Arabic and <q> in CLA. For the most part the MSA sound /q/ is represented in this corpus using <g>, reflecting (in terms of the conventions of English) its local vernacular pronunciation, e.g. <gooli> (tell me). If the conversations were pure vernacular, then the letter <q> would presumably not be used at all; however, there are a few occurrences of this letter in the corpus. For example, when informant A writes <fe i qanah?> (on which [TV] channel?), she uses the CLA character <q> to reflect a pronunciation with the MSA /q/ sound. Checks with other UAE Arabic speakers confirmed that they would be likely to give this word the MSA pronunciation, reflecting the recent borrowing of this word from MSA.
Another sign of influence from 'outside' is one example of the use of <t> for the emphatic /t'/, in the word <fattoom> (a diminutive of the name 'Fatima'). This is the only example in the corpus of <t> (as opposed to <6>) being used for this sound, and is an exception to the overall tendency described above for Arabic consonant phonemes to be kept distinct from each other in the ASCII-ized version. It seems plausible that this is influenced by the writer having often seen personal names such as 'Fatima' written in this way at the University and perhaps elsewhere.
Influence from 'official' CLA spellings might also help explain the spellings used in the student-drawn cartoon shown in Figure 1, which was observed on a 'friendship wall' set up in the university cafeteria by the Student Council for graduating students to write messages to each other. As we will discuss later, AA has come to be used for stylistic effect in some offline contexts as well as in CMC. As well as many inscriptions in Arabic and some in English, the 'friendship wall' included several examples of AA, including this cartoon, drawn by one student to represent the students in her class. Note that the name 'Sheikha' (the 'official' name of the seventh student from the right, apparently nicknamed "Blossom") is written <sheikha>, using the CLA form which would be familiar from university contexts. However, the diminutive of the same name, which would not normally be seen in official contexts, is written beneath the caricature of another student (fifth from the right) as <shwee5>, using the numeral <5> to represent the /x/ sound, as in AA.
However, more idiosyncratic examples of CLA also occur in the IM corpus: for example, on one occasion D writes <9ah> for the more common <9a7> (= true): although this is not typical of AA, it is presumably not a random typo, since it reflects the English spelling for the English sound which is perceptually closest to Arabic /ħ/, as well as being the CLA representation for this sound.
Figure 1. Student cartoon (photograph by David Palfreyman)
Whereas Arabic has more consonants than English provides ready symbols for, in the case of vowels English offers a much wider range of letters and digraphs than Arabic writing, even if the little-used Arabic vowel diacritics are taken into account. Furthermore, UAE vernacular distinguishes more vowel sounds than MSA. Probably for both these reasons, the representation of vowels in the corpus is considerably less consistent than that of consonants. As in Arabic script, short vowels are often left out entirely. In some cases, this reflects reduction of syllables in UAE vernacular; in other cases a vowel which would be pronounced is left out (e.g. <97> for /s'aħ/). However, short vowels are also represented in many cases, following the convention for English and in contrast to Arabic script, where they would not normally be marked at all. When vowels are marked, each may be represented by a variety of written symbols. Table 5 summarizes the most noticeable patterns in representing vowels.
Sound | Most frequently represented as: | Less frequently represented as: | Occasionally represented as:
/a/ | <a> - e.g. <hala> (= hello) | -- | --
/a:/ | <a> - e.g. <kaif al7al?> (= How is it going?) | <aa> - e.g. <7elwah hay el aflaam?> (= are these films good?) | --
/ɨ/ | <e> - e.g. <kent malaaaaanaaaah> (= I was boooooored) | <i> - e.g. <ana kint achoof Angel> (= I was watching Angel) | <o>, <u>, <a> - e.g. <noba narged> (= I want to sleep)
/ei/ | <ai> - e.g. <enzain> (= all right) | <ei> - e.g. <nseit> (= I forgot) | <ee> - e.g. <el'7eer> (= goodness)
/i/ | <i> - e.g. <almohim> (= the important thing) | <y> - e.g. <enty> (= you) | <ee> - e.g. <fee bali> (= on my mind), <e> - e.g. <fe i qanah?>
/i:/ | <ee> - e.g. <tadreeb> (= drill, exercise) | -- | <i>, <e> - e.g. <iman> (= girl's name), <el moderah> (= headmistress)
/u/ | <u> - e.g. <shukran> (= thanks) | <o> - e.g. <sho ishtraiti?> (= What did you buy?) | --
/u:/ | <oo> - e.g. <kent ashoof> (= I was watching) | -- | <ou> - e.g. <you9al> (= delivered, done)
/o/ | <o> - e.g. <w 3laikom essalam> (= and peace upon you) | <u> - e.g. <wa 3alaikum essalam> (= and peace upon you) | --
/o:/ | <o> - e.g. <a'7ar yom> (= last day) | <oo> - e.g. <el yoom> (= today) | <ou> - e.g. <youm> (= day)
Table 5. Representation of vowels
Although there is considerable variation, as in the case of <'7> and <5> mentioned above, individual informants show some consistency in their use of symbols. For example, although the vernacular sound /ɨ/ is variably written (/yɨmkɨn/ (= could be) appears in the corpus as <yumkin>, <yemken> and <yemkin>), each writer tends to use either <i> or <e> fairly consistently to represent this sound. Similarly, <ai> and <ei> alternate, as in the word "laish" in the following exchange:
C: laish (= why?)
A: sho leish?? (= why what?)
C: laish ma 3ndch friends (= why don't you have any friends?)
The influence of English and the familiar CLA orthography is detectable not only in correspondences such as <oo> for /u:/, but also in the choice of variant. For example, <y> is used for /i/ only at the end of words of more than one syllable, reflecting a similar convention in English. Furthermore, as in the examples of <fattoom> and <sheikha> mentioned above, there are indications that personal names are more likely to be influenced by CLA conventions, since the only case of <i> being used for the vowel /i:/ is in the personal name <iman> (see Table 5).
ASCII-izing Arabic Morphophonology
We will consider here two examples where AA in this corpus represents not only the phonology (pronunciation) of Arabic, but also its morphophonology (grammatical units which underlie these sounds). The first example is the definite article, consistently written <al> in CLA and <ال> in standard Arabic orthography, and typically pronounced as /ɨl/ in UAE vernacular. As mentioned earlier, the pronunciation of the consonant in the definite article often assimilates to the sound which follows it, for example being pronounced as /s'/ before /s'/ and /t/ before /t/. In the corpus, the definite article is most frequently represented as <el>: The vowel reflects its vernacular pronunciation, but the consistent use of <l> regardless of its pronunciation (e.g., <el 9ba7> , <el jeep> rather than the more phonologically accurate *<e9 9ba7>, *<ej jeep>) reflects in this case a transliteration of Arabic orthography rather than a transcription of pronunciation. Nevertheless, there were a few examples of phonological spelling, in common phrases and vernacular words involving assimilation to /s/, e.g., <essalam 3aleikom> (= peace upon you, hello) and <essa3ah 3> (= 3 o'clock—note that the second <3> here represents the number three). In these phrases the definite article is also orthographically attached to the word which follows it, whereas in other cases a space is usually inserted between the article and the following word. Another case where AA reflects an underlying grammatical distinction is the representation of the feminine "-a" ending. This ending is generally pronounced /a/, but is realized in certain contexts as /at/, for example in /dʒa:mʕat zayed/ (= Zayed University). In CLA, words with this ending tend to be written with final <ah>, distinguishing them from non-feminine words such as /ana/ (= I) which always end in /a/. 
In AA this ending is transcribed as /et/ when the /t/ would be pronounced, but even when it is pronounced simply /a/, the underlying distinction is largely maintained, with <ah> and <eh> used for the feminine ending. Some informants (e.g., B) use <ah> throughout their contributions; others (e.g., D) mainly use <eh>, but still use <ah> for words which tend to be pronounced according to MSA. One example of this is the word <qanah> referred to earlier.
Another particularly striking example is the following: <elmoderah mb 3arfeh shai 3n essalfeh> (= the headmistress does not know anything about the story). Here D uses <ah> in <moderah>, an item of institutional vocabulary normally pronounced as in MSA, but uses <eh> in the more vernacular words <3arfeh> and <salfeh>. Both <ah> and <eh>, however, are much more common than <a>, which represents the pronunciation rather than the underlying form. We can thus see influence from three sources: CLA spelling (associated with MSA pronunciation), vernacular pronunciation, and an underlying awareness of the morphophonemic patterns which both varieties share.
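The contrast drawn in this section between transliterating the definite article (always <el>) and transcribing its assimilated pronunciation can be sketched in a few lines. This is a hedged illustration: `article_forms`, `ASSIMILATING` and `DIGRAPHS` are hypothetical names, and the set of assimilating consonants is limited to those attested in the examples above rather than being a full description of the vernacular.

```python
# Initial consonant symbols for which assimilation is attested in the examples
# discussed above; an assumption for illustration, not a complete list.
ASSIMILATING = {"9", "j", "s", "t"}

# English-derived digraphs; words starting with these are left alone in this sketch.
DIGRAPHS = {"sh", "th", "ch"}


def article_forms(word, assimilating=ASSIMILATING):
    """Return (transliterated, transcribed) AA spellings of article + word."""
    transliterated = "el " + word  # follows Arabic orthography: always <l>
    first = word[0]
    if first in assimilating and word[:2] not in DIGRAPHS:
        # follows pronunciation: the <l> is replaced by a copy of the next consonant
        transcribed = "e" + first + " " + word
    else:
        transcribed = transliterated
    return transliterated, transcribed


print(article_forms("9ba7"))
print(article_forms("jeep"))
```

In the corpus the first form of each pair is the usual one; the second corresponds to the starred, more phonologically accurate alternatives.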
The Social Context of AA
When talking about the conversations in the interviews, the core informants did not use a particular term to refer to AA. "Arabic English," "writing Arabic with English letters" and "Arabenglish" were all used to refer to it: note the equation in all these cases of the ASCII symbols with the English language. Ease of typing was mentioned by most informants as a motivation for using AA, and they pointed out that they had had little experience of using a keyboard before coming to the largely English-medium university. Privacy (particularly from parents) and the intrinsic interest of writing in an unusual script were the other motivations cited. In addition to IM, chat rooms, e-mails, URLs and mobile phone text messaging (SMS) were mentioned as contexts where AA is used extensively. Informants mentioned seeing ASCII-ized URLs such as www.6arab.com (a music site) on print advertisements, but had not otherwise encountered the use of ASCII-ized forms offline. In general, the core informants had perhaps seen AA before coming to university, but had started to use it themselves only after starting their studies. One student said that she had first encountered it in communicating with her brother who was living in Sweden and who had no Arabic script functionality on his computer there.
In order to expand the database on these topics, all students on the Dubai campus were sent an e-mail survey, to which 79 students responded. Their responses are summarized below.
1. Why do people sometimes write Arabic [on a computer] with English letters instead of using Arabic letters?
55% of respondents said that they use AA because they find it easier to type in English than in Arabic. In most cases, it was implied or stated that this was a matter of greater familiarity with English keyboard layout, "because we type most of our projects, homework etc in English." 30% of respondents mentioned technical factors, specifically lack of support for Arabic script, but several of these noted that this lack of support was true in the past rather than now. 15% of respondents mentioned using English letters for vernacular sounds not represented in Arabic script, citing <ch> and <g> as examples.
Ten percent of respondents commented explicitly on positive social connotations of AA. One, for example, felt that "people who are higher educated use this way of writing and others use original Arabic writing letters [...] in another word the one who is used to English right in this way and the one who is not used to English use the original writing way;" another described AA as "kind of a code ,, we feel that only ppl of our age could understand such symbols and such way of typing [...] i guess its kind of a funky language for teenzz to use." Both of these comments in different ways link AA to positive, in-group local values of education, competence in English and peer group prestige.
Interestingly, one informant contrasted AA with CLA: "[AA makes] the word sound more like 'Arabic' pronunciation rather than English. For example, we would type the name ('7awla) instead of (Khawla). It sounds more Arabic this way :)." Here the CLA form of a personal name seems to be associated with 'out-group' discourse, possibly through its association with formal university contexts; this is contrasted with AA, which is paradoxically seen as more 'locally authentic'—paradoxically because AA is in orthographic terms no more 'Arabic' than CLA. It is interesting to compare this with the contrast in Figure 1 above between the spelling of /x/ in the full name <sheikha> and in its diminutive <shwee5>.
2. Do you remember when and how you learned to write Arabic like this?
Seventy percent of those who mentioned when they learned to use AA said that they had encountered this variety before entering the university. All of those who mentioned how they learned AA said that they had learned it from other people with whom they interacted online—these included relatives (especially relatives studying abroad), and online acquaintances. None of them mentioned learning it from sources such as websites or print materials.
3. Where do you think this way of writing came from—who *first* used these symbols?
The most common answer to this question was "I don't know," but guesses tended to focus on Arabs abroad (citing lack of Arabic script support), and/or on positively evaluated groups including "chatters," "young people" and "creative people."
4. Do you ever see or write Arabic like this in other situations (*not* in messenger)?
None of the respondents mentioned seeing AA in more official public settings (although it appears in URLs and has been used in a fast food advertisement on posters in Dubai), focusing instead on personal communications. Most mentioned using AA in e-mails, and 40% mentioned using it in mobile phone text messaging (SMS). Note that mobile phones, like computers, are often Arabic script enabled; however, default settings can make Arabic script somewhat more complicated to set up and/or use than Latin script. The second most frequently cited genre of offline communication (25%) was notes and cards to friends. One student said that "my 15 years old sister uses it to write her friends short messages that her teachers won't understand during class in case they get caught"—this contrasts the in-group aspect of AA with privacy and exclusion of 'outsiders.' This was mentioned by several respondents, especially in relation to secret communication in class: one stated that "I have seen some girls back in high school use this sort of language to cheat in arabic tests. An arabic supervisor when reading it won't understand anything written there. And the same goes for English supervisor!" After reading this response, we actually showed samples of AA to a few non-UAE Arab teachers aged over 40, and found that they did indeed find it almost impossible to read, apparently owing to unfamiliarity both with the orthographic conventions, and with the vernacular used.
Conclusion
Warschauer et al. (2002) note that the private, relatively unregulated world of CMC may foster the development of linguistic varieties which will reflect and contribute to changes in the linguistic balance of the Arab World. The phenomenon of ASCII-ization, as noted earlier, is apparently a response to a technical constraint (lack of script support), but the ways in which users get around this constraint, and the ways in which they use AA in contexts where this constraint does not apply, reveal much about their use of linguistic and other resources. These resources include the spoken languages with which they are familiar, the orthographic symbols and conventions of these languages, and, beyond this, the social meanings which surround various kinds of literacy.
On the one hand, the interaction between English and Arabic in AA involves a combination of transcription of spoken language and mediation from the properties of the Arabic and Latin writing systems. While in many cases AA follows patterns drawn from the CLA latinizations used on road signs and in other public contexts, there is also more idiosyncratic influence from English (e.g., the use of <y> for /i/ in final position, which is not common in CLA in Dubai). On the other hand, where English does not provide a phonologically comparable and fairly consistent orthographic convention, ASCII symbols outside the English alphabet (notably the numerals) are used. In these cases numerals based on a purely visual resemblance to Arabic characters are used to maintain the distinctness of sounds. AA in fact represents these sounds more faithfully and consistently than the CLA forms found in public domains.
There is also the issue of vernacular versus Modern Standard Arabic. In effect, in the ASCII-ized Arabic of these students, we can observe firsthand the first extended written use of the UAE vernacular. This variety has previously been written down only in very brief texts, in specialized genres such as poetry, cartoons and linguistic studies. Now, however, under the combined pressure of technical and social change, it is being used routinely in written form, for everyday interactional purposes. Standardization of this form of Arabic is almost entirely informal, but it draws on other linguistic resources as outlined above.
The use of AA also highlights the use of symbolic resources current among young Gulf Arabs. Shigemoto (n.d.) notes in relation to formal language planning that "a writing system legitimates literacy efforts, which, in turn, contribute to the cultural production and vitality of a community." In the UAE context, the informal use of AA appears to be enabling (at least for the moment, in certain domains) a vernacular with local prestige. Users apparently choose to by-pass the Arabic writing system—to which UAE vernacular has the closest historical relation, and from which it has historically been excluded because of the low social status of vernaculars in the Arab world in general—and instead draw on the orthographic system of another language (English) which has a different prestige base, in the broader context of globalization.
One point to note about this study is that all but one of the informants are female, a consequence of our sampling methodology, which capitalized on online social networks that in this case are typically female. The single male informant did not appear to use AA in a way different from the others, but it would be interesting to examine the characteristics of AA in mixed or male-predominant samples. Other research possibilities related to AA include analyzing a larger corpus, perhaps drawn from e-mails and chatrooms as well as from IM. Various aspects of such a corpus could be studied, including the representation of sounds (particularly vowels) and the representation of underlying and surface forms, as well as of vernacular versus MSA. Although this study has looked at AA as a set of orthographic items, a similar corpus offers the possibility of studying from a discourse analysis perspective how AA use develops through an interaction: in the opening in Table 1 in this paper, for example, what factors influence D's change from Arabic script to AA? From a psychological perspective, it would also be interesting to examine the speed and accuracy of people's comprehension of AA as compared with Arabic script, and their attitudinal responses to these varieties (cf. Dahlbäck, Swamy, Nass, Arvidsson, & Skågeby, 2001).
References
Abrams, D., & Hogg, M. A. (1987). Language attitudes, frames of reference, and social identity: A Scottish dimension. Journal of Language and Social Psychology, 5, 202-13.
Androutsopoulos, J. (1999, April). Latin-alphabeted Greek in email communication: Uses and stances. Paper presented at the 20th Working Meeting of the Linguistics Department of the Aristoteles University of Thessaloniki. Retrieved October 29, 2002 from http://www.rzuser.uni-heidelberg.de/~iandrout/greekmail/rep_99_1.htm. Currently available at http://greekweb.archetype.de/rep_99_1.htm.
Baron, N. (2002, October). Text in the fast lane. Paper presented at AoIR 3.0: The 3rd International Conference of the Association of Internet Researchers, Maastricht, Holland.
Beesley, K. R. (1998). Romanization, transcription and transliteration. Retrieved June 1, 2003 from http://www.xrce.xerox.com/competencies/content-analysis/arabic/info/romanization.html.
Bruthiaux, P. (2002). Hold your courses: Language education, language choice, and economic development. TESOL Quarterly, 36/3, 275-96.
Cadora, F. J. (1992). Bedouin, village, & urban Arabic: An ecolinguistic study. Leiden, The Netherlands: E. J. Brill.
Cargile, A. C., & Giles, H. (1997). Understanding language attitudes: Exploring listener affect and identity. Language and Communication, 17, 195-217.
Dahlbäck, N., Swamy, S., Nass, C., Arvidsson, F., & Skågeby, J. (2001). Spoken interaction with computers in a native or non-native language: Same or different? Human Computer Interact '01, 294-301. Retrieved June 21, 2003 from http://www.stanford.edu/~nass/comm369/pdf/InGroupVs.OutGroupPrompts.pdf.
Ferguson, C. (1959). Diglossia. Word, 15, 325-340.
Grivelet, S. (2001). Introduction. International Journal of the Sociology of Language, 150. Digraphia: Writing Systems and Society, 1-10.
Hentschel, E. (1998). Communication on IRC. Linguistik Online 1/1. Retrieved February 1, 2003 from http://viadrina.euv-frankfurt-o.de/~wjournal/irc.htm.
Hoffiz, B. T. (1995). Morphology of UAE Arabic, Dubai Dialect. Ph.D. Dissertation, University of Michigan microform.
Leong, C. K. (1991). From phonemic awareness to phonological processing to language access in children developing reading proficiency. In D. J. Sawyer & B. J. Fox (Eds.), Phonological awareness in reading: The evolution of current perspectives. New York: Springer-Verlag.
Li, W.-C. (2001). Where have all the neutral tones gone? Charting neutral tone decline in Taipei Mandarin, with evidence from online phonological simulation. Paper presented at the American Oriental Society 211th Meeting, Toronto Colony Hotel, March 30, 2001.
Mann, V. A. (1986). Phonological awareness: The role of reading experience. Cognition, 24, 65-92.
Palfreyman, D. (2001a). LINGUIST List 12.2760: Informal Latinized Orthographies. Retrieved October 29, 2002 from http://www.ling.ed.ac.uk/linguist/issues/12/12-2760.html.
Palfreyman, D. (2001b). Keeping in touch: Online communication and tradition in a Gulf Arab context. Unpublished paper.
Read, C. A., Zhang, Y., Nie, H., & Ding, B. (1987). The ability to manipulate speech sounds depends on knowing alphabetic reading. Cognition, 24, 31-44.
Rodgers, J., & Gauntlett, D. (2002, October). Teenage intercultural communications online: A redeployment of the Internet Activist Model. Paper presented at AoIR 3.0: The 3rd International Conference of the Association of Internet Researchers. Maastricht, Holland.
Sampson, G. (2002). Pronunciation of Greek and Latin two thousand years ago. Retrieved July 4, 2003 from http://www.linguistlist.org/~ask-ling/archive-most-recent/msg06865.html.
Schiano, D. J., Chen, C. P., Ginsberg, J., Gretarsdottir, U., Huddleston, M., & Isaacs, E. (2002). Teen use of messaging media. Human factors in computing systems: CHI 2002. Extended abstracts. NY: ACM. Retrieved June 1, 2003 from http://hci.stanford.edu/cs377/nardi-schiano/CHI2002.Schiano.pdf.
Shigemoto, J. (n.d.). Language change and language planning and policy. Pacific Resources for Education and Learning (PREL). Retrieved June 1, 2003 from http://www.prel.org/products/Products/language-change.pdf.
Stevenson, J. (2000). The language of Internet Relay Chat. Retrieved June 15, 2003 from http://www.demo.inty.net/Units/Internet%20Relay%20Chat.htm.
Street, B. V. (1995). Social literacies. London: Longman.
The default language. (1999, May 15). Economist, p. 67.
Tseliga, T. (2002, October). Some cultural and linguistic implications of computer-mediated Greeklish. Paper presented at AoIR 3.0: The 3rd International Conference of the Association of Internet Researchers. Maastricht, Holland.
Unger, J. M. (2001). Functional digraphia in Japan as revealed in consumer product preferences. International Journal of the Sociology of Language 150: Digraphia: Writing Systems and Society, 141-52.
Unicode Consortium (2003). What is Unicode? Retrieved June 29, 2003 from http://www.unicode.org/standard/WhatIsUnicode.html.
Warschauer, M., El Said, G. R., & Zohry, A. (2002). Language choice online: Globalization and identity in Egypt. Journal of Computer-Mediated Communication, 7 (4). Retrieved August 15, 2002 from http://jcmc.indiana.edu/vol7/issue4/warschauer.html.
Weber, G. (1997, December). The world's 10 most influential languages. Language Today, 2.
Werry, C. (1996). Linguistic and interactional features of Internet Relay Chat. In S. Herring (Ed.), Computer-mediated communication: Linguistic, social and cross-cultural perspectives (pp. 47-63). Amsterdam: John Benjamins.
Yates, S. J. (1996). Oral and written linguistic aspects of computer conferencing. In S. Herring (Ed.), Computer-mediated communication: Linguistic, social and cross-cultural perspectives (pp. 9-46). Amsterdam: John Benjamins.
About the Authors
David Palfreyman works at Zayed University, Dubai, contributing to ESL-related programmes in the English Language Centre and educational development in the Centre for Teaching, Learning and Assessment. His research interests include the roles of sociocultural context in education, and the use of information and communication technology.
Muhamed al Khalil works in the Arabic Studies Department at Zayed University. He teaches Arabic composition, Arabic as a Foreign Language, and contributes to the University's Arabic Across the Curriculum program. His research interests include Arabic dialectology, sociolinguistic influences on Arabic rhetoric, and the interplay of the literary and the political in the Middle East.
Address: Zayed University, P.O. Box 19282, Dubai, United Arab Emirates.
(c)Copyright 2003 Journal of Computer-Mediated Communication