Working on Riau Indonesian
In 1992, I got a job at the National University of Singapore, and on a clear day, I could see one of the islands of the Riau archipelago, Pulau Karimun, from my office window. (If I had been lucky enough to have an office on the other side of the corridor, I would have been able to see some 20 much closer islands.) On weekends, I would go travelling in the islands, and I soon began picking up the local language.
Right away, I was struck by how different the local language was from the Standard Malay / Indonesian that I had read about in the linguistic literature. So I set out to investigate the language, by eliciting data from native speakers. But this turned out to be a virtually impossible task: the interference from the standard language was much too strong. If I asked speakers how to say something in colloquial Indonesian, they would invariably provide sentences in the formal language. If I then confronted speakers with sentences that they themselves had uttered, they would deny having produced them, and then offer to "correct" the sentences by translating them into the formal language. Similar problems occur in many or all languages; however, the extent of the phenomenon differs considerably from one speech community to another, and here it was about as difficult as it gets.
Refusing to give up, I resorted to various ruses. At the ferry terminal on Batam island, where I would get off the boat from Singapore, there was a group of shoeshine boys whom I had befriended. Whenever I passed through, I would hang out with them, and occasionally buy them drinks at one of the food stalls there. At one point I was interested to find out whether there were any constraints on "accessibility to relativization" in Riau Indonesian. (This was before I had come to the conclusion that Riau Indonesian does not have relative clauses.) Having already tried, and failed, to obtain data from the boys by direct elicitation, I thought I would trick them. Before leaving Singapore, I carefully prepared a series of drawings of fish, each in a different situation. (This was before the days of Photoshop and digital cameras.) My hope was that I could somehow engage them in a conversation about the drawings, and lure them into a situation in which they would want to distinguish between the different fish, by using "relative clauses" of various kinds. Arriving at the ferry terminal, I met up with the boys, and invited them to drinks, with the fish drawings carefully sticking out of my shoulder bag. The bait was swallowed: the boys snatched the pictures out of my bag, and began talking about them, mostly making friendly fun of my rather limited graphic skills. I waited, waited, but the "relative clauses" just wouldn't come. Gently, or so I thought, I tried to nudge the conversation along in such a way that the desired construction would appear. But I must have tried too hard. Suddenly, the boys realized that this was not a casual conversation, but rather some kind of a "test". Immediately, they switched registers, and started speaking to me as I imagine they would speak to their school teachers, with a distinctive intonation contour, and lots of verbal morphology. In other words, in formal Standard Indonesian.
I never got the data I was after, and after a few more similar attempts, I simply gave up trying to elicit data from native speakers in Riau Indonesian.
However, I was still learning the language, and using it for my daily interactions. So I started to jot down utterances that I heard people produce, which struck me at the time as being interesting from a grammatical point of view. What began haphazardly soon became much more systematized, and before long I found myself carrying a little notebook and pen around with me wherever I went. The list of utterances increased in length, and after a while began to get out of hand. So I constructed a computerized database, and entered all the utterances I had collected to that point. Since then, I have been adding continually to the database, which is growing constantly in size. Whenever I am in a Malay / Indonesian speaking environment I keep my ears peeled. Some days I don't take down any data; other days I may write down one, two, or a dozen different utterances, and later enter them into the database.
In the database, each record contains a single utterance, or spontaneous speech specimen. The first few fields provide the text, an interlinear gloss, an idiomatic translation into English, and a description of the context in as little or as much detail as seems to be relevant. The next few fields provide the name of the dialect, the location of the utterance, and, when known, the name and ethnicity of the speaker. And the next several fields provide various kinds of grammatical information on the utterance. For example, there is a field for reduplication, in which every instance of reduplication is classified in accordance with formal and semantic features; a field for interrogative forms, in which every WH expression is classified in terms of its form and function; and so on. In addition, there are a few fields for assorted information of a less structured type that happens to strike me as relevant. At present, the database contains a few thousand records of individual utterances from a variety of situations and places, ranging from departmental meetings at the university in Kuala Lumpur through shoeshine boy chatter in Riau province all the way to Irian Jaya tribesmen fussing with their penis gourds.
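The record layout just described might be sketched roughly as follows. This is only an illustration: the Python representation, the class and field names, the helper `with_feature`, and the bracketed placeholder values are all assumptions made for the sketch, not the actual schema or contents of the database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UtteranceRecord:
    """One spontaneous speech specimen (all field names are hypothetical)."""
    # Core fields: the utterance itself and what it means
    text: str
    gloss: str                      # interlinear gloss
    translation: str                # idiomatic English translation
    context: str                    # as much or as little detail as is relevant
    # Provenance
    dialect: str
    location: str
    speaker_name: Optional[str] = None       # recorded only when known
    speaker_ethnicity: Optional[str] = None  # recorded only when known
    # Grammatical classification: one entry per instance in the utterance
    reduplication: List[str] = field(default_factory=list)
    interrogatives: List[str] = field(default_factory=list)
    # Less structured remarks
    notes: str = ""


def with_feature(records, feature):
    """Return the records whose list-valued field `feature` is non-empty."""
    return [r for r in records if getattr(r, feature)]


# Placeholder records; the bracketed strings stand in for real data.
records = [
    UtteranceRecord(
        text="<utterance 1>", gloss="<gloss 1>", translation="<translation 1>",
        context="<context 1>", dialect="Riau Indonesian", location="<location 1>",
        reduplication=["<reduplication class>"],
    ),
    UtteranceRecord(
        text="<utterance 2>", gloss="<gloss 2>", translation="<translation 2>",
        context="<context 2>", dialect="Riau Indonesian", location="<location 2>",
        interrogatives=["<WH form and function>"],
    ),
]

reduplicated = with_feature(records, "reduplication")
```

With a layout along these lines, pulling out every utterance that contains, say, a reduplication or a WH expression becomes a single call to the hypothetical `with_feature` helper.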
This method of data collection has many advantages. First, dealing with naturalistic data turns up construction types that may not be obtained via elicitation. Secondly, the data are real, in the sense that they reflect speakers' spontaneous behaviour, not their possibly biased reports of what they think they say. Thirdly, it is easy to obtain data from a large number of speakers, abstracting away from idiosyncratic speech patterns of particular individuals; similarly, it is easy to gather data from a great variety of contexts and situations, which may be conducive to different kinds of lexical and grammatical patterns. However, there is also a flip side to this method. First, there is the problem of reliability: in real-life situations, and without a recorder, even the best ear may occasionally hear and record an utterance incorrectly. Secondly, you can only write down what you are capable of storing in your own short-term memory, which is one short-to-medium-length utterance at a time: longer utterances, not to mention sequences of utterances, are impossible. Thirdly, such a corpus does not lend itself readily to frequency studies: each record in the database is there because it struck me as interesting and worthy of note, not necessarily because it is typical. As a result, the corpus may be biased in numerous ways. Finally, there is the absence of negative data: a corpus can tell you what is possible, but not what is impossible, at least not without engaging in risky statistical inferences. Some constructions, particularly those of greater complexity, may be grammatical but so infrequent that it is unlikely that they would show up in any reasonably sized corpus.
In order to overcome some of these disadvantages, I have, on occasion, supplemented the database of spontaneous speech specimens with additional kinds of data. A few years ago, I was working on a US National Science Foundation project concerned with long-distance reflexives, and was trying to find out whether such constructions occur in Riau Indonesian. My database did not contain any long-distance reflexives; however, it contains so few utterances of the necessary length and complexity that the absence of any unequivocal long-distance reflexives could hardly be considered significant. Even in languages that have them, long-distance reflexives do not occur very frequently. So I resorted to another ruse. On repeated occasions and in different places, I would seat myself in an outdoor coffeeshop with an open laptop computer showing a variety of drawings, and pretend to be engrossed in working on one of the drawings. Often, a crowd of curious onlookers, children and adults, would gather around, and describe the pictures to each other: if they said anything interesting I would take notes. One of the pictures showed a man holding a gun to his head, which was great for getting simple reflexives, as people would point to it and exclaim "He's shooting himself" in a variety of ways. However, the picture I was really interested in was one in which a woman, looking in a mirror, sees a man standing behind her pointing a gun at her back. What I wanted was for people to see the picture and say "She can see he's going to shoot ..." and then see whether they would finish the sentence with an ordinary pronoun "her", as in English, or with a long-distance reflexive "herself". But in Riau Indonesian, people tend to avoid complex sentences, and nobody produced either of the two constructions. So the ruse failed.
But when one thing does not work, you try something else. One of the well-known methods of collecting naturalistic data is through recordings of conversations, narratives, and other kinds of linguistic behaviour. Recordings provide reasonably good solutions to the problems of reliability, utterance length and frequency, if not that of negative data. However, you cannot carry recording equipment around with you wherever you go, so the range of data obtainable from recordings is necessarily more limited. Another problem with recordings is that it's all too easy to switch the tape recorder on, but once you have made the recording it takes a great deal of effort to transcribe and annotate. Linguists who have worked with naturalistic data estimate that it takes between ten and one hundred hours of work to transcribe and annotate one hour of speech. An additional problem is that of context: listening to a recording being played back, it is often difficult or impossible, in the absence of extra-linguistic cues, to reconstruct what it was all about. Ideally, I try to do the transcription soon after the recording is made, while the memory of the context is still fresh in my mind; but that's not always practical. An alternative solution would be for the audio recording to be accompanied by video, which, if done well, may fill in much of the context. But data collection methods such as these require lots of time, skills, and resources; they are major research projects. This is what we're now engaged in doing at the Jakarta Field Station.
It should be kept in mind, however, that no one method of data collection can provide an adequate basis for the understanding of how a language works. The field worker should approach his or her language in an ecumenical spirit, and be willing to seek out data from a variety of sources, using whatever methods are most feasible for the case at hand.
Back to David Gil's home page