
Loanword Typology

Guidelines for Contributors (2007 version)


1. Introduction

The goal of the Loanword Typology (LWT) project is to assemble systematic information on loanword patterns in about 40 languages from around the world, as a way of assessing lexical borrowability in a controlled way. The planned result will be a volume of about 40 chapters on individual languages, contributed by authors who are specialists in these languages. In addition, there will be a volume discussing the overall results of the project. The lexical database will be made accessible in an electronic format.

Each chapter will consist of two parts: a data part and a discussion part. The first version of the data part should be submitted to the editors (Martin Haspelmath and Uri Tadmor) by the end of 2006. The final version of the database and the text are due by the end of June 2007. Revisions and editing will be conducted during the second half of 2007.

2.  The discussion part (=the prose chapter)

A. Form


The Loanword Typology contributions consist of a data part and a discussion part, here called "prose chapter". The first version of the database was due at the end of 2006. The first version of the prose chapter is due on 30 June 2007.

Both parts should be submitted in electronic form to both editors (Martin Haspelmath and Uri Tadmor).

The prose chapter should be submitted as a text file or in a common word processor format (not PDF, because the editors need to process the text further).


Only Unicode fonts should be used, font size 12.


The text should consist of not more than 7,000 prose words, excluding references and the appendix. Single spaced, this yields about 15 printed pages. There can also be up to 5 figures (maps, tables, etc.).

A4. Title and headings

The titles of the chapters are uniform: "Loanwords in X". For little-known languages, further information on affiliation and geographical location is given, e.g.

    * "Loanwords in Gawwada, a Cushitic language of Ethiopia"
    * "Loanwords in Gurindji, a Pama-Nyungan language of Australia"
    * "Loanwords in Thai"

The chapters are divided exhaustively into sections, beginning with section 1. Sections may be divided into subsections (5.1., 5.2., etc.), which must again be exhaustive (though a short introductory paragraph may precede the first subsection).


Figures (tables, maps, etc.) should be centered and numbered. Titles of figures should also be centered, and begin with the type of figure, followed by its number, a colon, and then the title. No punctuation mark is used at the end of titles. For example:

    Table 4: Retained glides in Sanskrit loanwords in Indonesian

When referring to figures in the text, use the figure number rather than expressions like ‘the following table’, since we are not sure where exactly the figures will appear in the printed version.

A6. Abbreviations and footnotes

Abbreviations and footnotes should be avoided. Language names are not abbreviated, and other abbreviations are probably not needed.

Footnotes are probably not needed due to the survey character of the chapters. If footnotes are used, they should be printed at the bottom of the page.

A7. Citing loanwords and source words

All non-English words should be italicized. Glosses are given in single quotes. If the meaning of the source word is different from that of the loanword, it too should be given in single quotes. Etymologies are placed in parentheses. The mark '<', followed by a space, may be substituted for 'from':

    A rare example of a plural form in Indonesian is muslimin 'Muslim men' (< Arabic muslimīn).
    In the word warta 'news' (< Sanskrit vṛtta 'event'), the initial glide has been retained.


In the text, works cited are referred to only by author and year. The year is not parenthesized, unless the referent is the author rather than the work:

    Arabic loanwords in Indonesian are listed in Jones 1978.
    Jones (1978) lists Arabic loanwords in Indonesian.

A complete alphabetical list of full references is given at the very end of the paper, following the appendix. (Note that this list is not counted towards the 7,000-word limit.)

The following format is used:

    * Alpher, Barry & Nash, David. 1999. "Lexical replacement and cognate equilibrium in Australia." Australian Journal of Linguistics 19.1: 5-56.
    * Baldi, Sergio. 1988. A First ethnolinguistic comparison of Arabic loanwords common to Hausa and Swahili. Naples: Istituto Universitario Orientale.
    * Betz, Werner. 1974. "Lehnwörter und Lehnprägungen im Vor- und Frühdeutschen." In: Maurer, Friedrich & Rupp, Helmut (eds.) Deutsche Wortgeschichte, Band I. Berlin: de Gruyter, 33-57.

A9. Loanword appendix

An appendix lists all the loanwords (of categories 3 and 4, i.e. certain and probable loanwords), in two columns, loanword and translation (the source words are not given). They are arranged by donor language, as seen in the example box below. Within each donor language, the words are arranged roughly according to the LWT Meaning List (if a word corresponds to several LWT meanings, it should of course be given only once).

    (Hypothetical example language: German)


* Insel 'island'
* Ozean 'ocean'
* Löwe 'lion'
* Kamel 'camel'
* Kochen 'cook'
* Kessel 'kettle'
* Pfanne 'pan'
* Schüssel 'dish'


* Lagune 'lagoon'
* Stiefel 'boot'


* Teller 'plate'

Middle Low German

* Bucht 'bay'
* Riff 'reef'
* Ebbe 'ebb'

Middle Dutch

* Klippe 'cliff or precipice'
* Küste 'shore'
* Juwel 'jewel'

B. Content

The prose chapter should give an accessible overview of the loanword situation in the language and should at the same time summarize the findings from the research done by the author for this project.

For the sake of comparability, the paper should contain each of the following sections (even if there is little to report). In addition, it may (but need not) contain up to three additional sections, in which the author can discuss any topic that s/he considers relevant.

B1. The language and its speakers

This section should contain information on

    * the speakers: how many, where
    * the language's genealogical affiliation
    * the sociolinguistic status: which domains of use (home, school, religion, official business, media, written literature)
    * the historical context

(The section should also contain a simple map showing the geographical location of the language and perhaps some neighboring donor languages. The editors will provide help with making these maps.)

B2. Sources of data

What data have been used for the database? What earlier work on loanwords in the language exists?

B3. Contact situations

This section should list all the major contact languages, as well as describe the contact situations. (For some contact languages, there are clearly distinguishable contact situations, e.g. Medieval French and 19th century French for loanwords into English.) What were the social conditions under which the loanwords arose?

B4. Numbers and kinds of loanwords

Two standardized tables should show the distribution of loanwords over semantic word classes (nouns, verbs, ...) and semantic fields (Physical world, Kinship, Animals, Body, etc.). Only equivalents of meanings in the core LWT Meaning List should be included; the equivalents of any meanings added by the author should not be counted. If the same loanword is the equivalent of more than one LWT meaning, it is counted only once.

If a word corresponds to several LWT meanings that are in different semantic categories (e.g. if a word corresponds both to a LWT noun and to a LWT verb, or if a word corresponds to both a LWT body part term and a LWT political term, perhaps a word translatable as 'head, chief'), then this word counts half in one category and half in the other category.
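The half-counting rule can be illustrated with a short Python sketch. This is not the tally script of the database template, whose internals are not described here. The function name and the placeholder words are hypothetical, and the extension to equal shares across three or more categories is an assumption (the guidelines only mention the case of two categories).

```python
from collections import defaultdict
from fractions import Fraction


def tally_by_category(loanword_categories):
    """Tally loanwords per category, splitting a word evenly across its categories.

    loanword_categories maps each loanword to the categories (semantic fields
    or semantic word classes) of the LWT meanings it corresponds to.
    """
    totals = defaultdict(Fraction)
    for word, categories in loanword_categories.items():
        distinct = set(categories)  # several meanings in one category count once
        if not distinct:
            continue  # a word without any category contributes nothing
        share = Fraction(1, len(distinct))  # two categories -> one half each
        for category in distinct:
            totals[category] += share
    return dict(totals)


# Placeholder words only; "word_a" stands in for a word translatable as 'head, chief'.
example = {
    "word_a": ["The body", "Social and political relations"],
    "word_b": ["The house"],
    "word_c": ["Food and drink", "Food and drink"],
}
counts = tally_by_category(example)
```

With this rule the category totals always add up to the number of distinct loanwords, so no word is counted more than once overall.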

The numbers are automatically generated by the layouts "Tallies:semantic field(abs)" and "Tallies:semantic word class(abs)" of the latest database template (v. 0.5.2). In the final version, there will probably be percentages, but the editors will take care of this. (Please do not use precise numbers when discussing the table in the text, but relative terms such as "very few", "the bulk", etc.)

Note that the following (abbreviated) semantic field names should be used:
1. The physical world
2. Kinship
3. Animals
4. The body
5. Food and drink
6. Clothing and grooming
7. The house
8. Agriculture and vegetation
9. Basic actions and technology
10. Motion
11. Possession
12. Spatial relations
13. Quantity
14. Time
15. Sense perception
16. Emotions and values
17. Cognition
18. Speech and language
19. Social and political relations
20. Warfare and hunting
21. Law
22. Religion and belief
23. The modern world
24. Function words

The text should mention which kinds of words are more commonly borrowed than others. For example, if nouns are borrowed more often than members of other word classes, or if content words are borrowed more often than function words, this should be discussed. Categories which are likelier to be borrowed may be more specific, e.g. kinship terms, pronouns, mammal names, cooking terminology, etc. In all such cases, try to propose reasons why these and not other words have been borrowed.

B5. Integration of loanwords

    * What are the major adaptation processes that loanwords have undergone/are undergoing? (Phonological, morphological, orthographic)
    * Do the loanwords form recognizable vocabulary strata?
    * Are there particular borrowing routines (i.e. conventionalized ways of adapting words to the recipient language)?
    * What do we know about speakers' attitudes to loanwords?

B6. Grammatical borrowing

Does grammatical borrowing also occur alongside the lexical borrowing that is the focus of this project? What are the major grammatical contact phenomena?

B7. Any other interesting material

(explanations, elaboration, speculation, ...)


A summary of your major findings.

3. The data part (=the electronic database)

The data part of the chapters will consist of a database of words of the language. These words must minimally include the counterparts (=translational equivalents) of the 1460 meanings on the Loanword Typology Meaning List (to the extent that this is possible).

For all aspects of the database, contributors should feel free to provide further information. No information that exists and is reliable should be lost. The fields below are in some sense a minimal list. The data provided by authors may also contain the following:

    * Additional meanings (especially additional loanwords)
    * Additional fields (e.g. about pronunciation, further comments fields, etc.)
    * Additional distinctions for those fields that have a controlled value list (however, in this case authors must specify how their distinctions map on the general distinctions made in the project)

Contributors must submit the data in the form of a FileMaker Pro 7 database, using a template prepared by the editors (see here for more information).

3.1. The Loanword Typology Meaning List (LWTML)

This list is based on the IDS (Intercontinental Dictionary Series) list, which itself is based on C. D. Buck's A dictionary of selected synonyms in the principal Indo-European languages (1949). It contains 1460 lexical meanings. See this page for the list.

It is important to realize that the LWT Meaning List should be thought of as a list of meanings that is designed to elicit words from the project languages. It is not a list of English words. Some LWT meanings are narrower than those of the English label; e.g. LWT 9.61, labeled 'to forge', is intended to refer to the action of making something from a piece of metal, not to the action of illegally copying something. And some LWT meanings are broader than those of corresponding English words; e.g. LWT 1.36, labeled 'the river or stream', is intended to refer to a flowing body of water of any size, a meaning for which English does not have a non-circumlocutory expression.

The words have been divided into 24 "semantic fields" (the 22 domains from Buck, plus a domain called "modern world" and a domain called "function words"). These serve as a rough organization of the words for the moment, but they play no crucial role. Any other reasonable classification would work. There is also some information on word-class-style semantic categories (noun, verb, etc.). Some words have been added for particular ecoregions (Africa, Amazonia, Arctic, etc.), to balance the (Indo-) European bias of Buck.

The LWT code uniquely identifies the corresponding LWT meaning. It consists of the LWT chapter number (for the “semantic field”), followed by sequential numbers.
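As an illustration of how the code is built up, the following Python sketch splits a LWT code into its chapter number and its sequential part and looks up the semantic field name. The function name is hypothetical. Treating codes as text rather than as decimal numbers is a choice made here for illustration, so that, for instance, 2.94 and 2.941 remain clearly distinct codes.

```python
# The 24 semantic field names, in chapter order.
SEMANTIC_FIELDS = [
    "The physical world", "Kinship", "Animals", "The body", "Food and drink",
    "Clothing and grooming", "The house", "Agriculture and vegetation",
    "Basic actions and technology", "Motion", "Possession", "Spatial relations",
    "Quantity", "Time", "Sense perception", "Emotions and values", "Cognition",
    "Speech and language", "Social and political relations",
    "Warfare and hunting", "Law", "Religion and belief", "The modern world",
    "Function words",
]


def parse_lwt_code(code):
    """Split a LWT code such as '9.61' into (chapter, sequence, field name)."""
    chapter_text, _, sequence = code.strip().partition(".")
    if not chapter_text.isdigit() or not sequence.isdigit():
        raise ValueError(f"not a valid LWT code: {code!r}")
    chapter = int(chapter_text)
    if not 1 <= chapter <= len(SEMANTIC_FIELDS):
        raise ValueError(f"unknown LWT chapter in code: {code!r}")
    return chapter, sequence, SEMANTIC_FIELDS[chapter - 1]


forge = parse_lwt_code("9.61")
```

For LWT 9.61 ('to forge'), the chapter number 9 corresponds to the field "Basic actions and technology".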

The LWT Meaning List contains the following fields (marked by “M” followed by field number), which have been filled by the editors and should not be modified by the contributors:

(M2) LWT Label. Recall that this stands for a meaning, not for a word of English.

(M3) Meaning Description. This is additional information about the meaning entered by the editors in order to clarify or disambiguate certain items. It is filled in for about 25% of the records.

(M4) Typical Context. These are phrases or sentences entered by the editors in order to assist contributors who are unsure about the meaning of the LWT label. It should be used as a guideline only. This field is filled in for almost half of the records.

(M5) Semantic Category. This field refers to ontological categories and is NOT meant to convey any syntactic information. Thus ‘noun’ can be taken as a label for ‘a thing’ or ‘an entity’; ‘verb’ is an activity; ‘adjective’ is a property; and so forth.

If you add new meanings, you will of course have to create a LWT label, and you may also fill in some of the other fields in this section.

3.2. Fields of the project-language databases

The individual databases supplied by the project members provide information on the counterparts of the LWT meanings in the project languages. There are 32 fields (marked by "W" followed by field number). White fields are obligatory and must be filled in by the contributors. Grey fields are optional. The two fields “W2” and “W9” in red font are the most important fields.

(A) Fields that are relevant to all words of the database (not just to the loanwords):

(W0) Project Language. This should appear automatically as soon as you enter data for a word.

((W1) Record #. This field uniquely identifies the word. You do not need to fill it in.)

(W2) Word Form in the project language. This should be given in the spelling or transcription/transliteration that is most commonly used by linguists for the language. Standard Unicode encoding must be used. Important: Do not fill in more than one item in this field.

In the simplest case, there is one word for a given LWT meaning, and a project-language word corresponds to just one LWT meaning (one-to-one relationship). For more complicated relationships between meanings and words, see below under (C).

Homonyms should be distinguished by indices in parentheses (e.g. German arm (1) 'poor', Arm (2) 'arm').

The words should be given in their standard citation form. (Do not include articles unless they are part of the standard citation form; the English labels only contain articles to indicate the semantic category.) For nouns, this means singular in almost all cases. However, if a noun occurs only in a different number than the English label, this is no problem. (Thus, an English plurale tantum such as oats can be rendered by a singular mass noun, and a singular can be rendered by a plurale tantum; e.g. German only has the plural form Geschwister 'siblings', but this is a perfect counterpart of LWT 2.456 (the sibling).) Similarly, if the project language has a close semantic match that belongs to a different semantic category than the LWT meaning, such a word can be a counterpart. (Thus, for LWT 5.14 'to be hungry', French only has the noun faim and English only has the adjective hungry.)

The word form can be a single word, or a phrasal expression. Phrasal expressions should only be used when they are conventionalized fixed phrases that normally express the meaning in question. We do not want descriptions or explanations of the meaning. For example, for LWT 4.393 ('the feather'), a language may have 'hair of bird' as the best equivalent, but if this is not a fixed expression, it cannot be used as the language's counterpart. (The entry should be left unfilled.) By contrast, to make love (LWT 4.67) is a fixed expression in English. Similarly, we do not want a compound phrase of two or more hyponyms (“A and/or B”). For example, in the Indonesian list, LWT 2.94 ('we') is left unfilled, because there is no single word that corresponds to this meaning, but rather two sub-counterparts that are already included in the LWT list: kita ‘we [inclusive]’ (LWT 2.941) and kami ‘we [exclusive]’ (LWT 2.942).

Only established, conventionalized loanwords that are felt to be part of the language should be given, not nonce borrowings. This distinction is often hard to make (especially when there are no monolingual speakers), but authors should try as best they can.

Missing Word Form field: Sometimes the contributor may be unable to provide any project-language form. In such cases, the "Missing word form" box should be selected. Please give the reason why a word form is missing. In general, the reason will be one of the following three:

<insufficient information> (Many authors will simply not have 100% of the information needed to fill out the list completely, so although this should be minimized, it will sometimes be unavoidable.)

<meaning irrelevant to speakers> (This applies to cultural and environmental items that speakers virtually never have occasion to talk about. If most or all speakers are bilingual and speak a dominant language that has words for such items, speakers are often happy to use nonce borrowings, but we are not really interested in them.)

<no counterpart> (Some meanings may not have a direct counterpart in the project language although they would not be irrelevant in the culture or environment. For example, Jakarta Indonesian has no word for 'do', even though people do things all the time. Or a meaning might be taboo, or expressed only lexically.).

Further details can be entered into the "Comments (on missing word form)" field.

(W3) Original Script. In the main word-form field, the word is given in the usual spelling or transcription/transliteration. When the language uses a non-Latin script, the word form as written in the language's usual writing system may be entered here, using standard Unicode encoding.

(W4) Grammatical Info. This is an optional field that some authors may want to fill in.

(W5B) Comments On Word Form. Here you can add all kinds of additional information about the word.

(W6) Meaning in the project language. This field needs to be filled out only when there is a noticeable meaning difference between the project-language form and the LWT meaning. This will be the case whenever a language only has several co-hyponyms of a LWT meaning, and also when the project-language form is a hyperonym of a LWT meaning.

(W7) Analyzability. Indicate whether the word is (1) unanalyzable (if the form cannot be analyzed into two or more constituents); (2) semi-analyzable (if you can identify a constituent structure, but not all constituents have meanings, such as cran in cranberry); (3) analyzable derived; (4) analyzable compound; (5) analyzable phrasal.

(W8) For (semi-)analyzable items, give a morpheme-by-morpheme gloss, i.e. a hyphenation and a gloss in brackets. For example, for German Zahnfleisch 'gums', the field would contain the following: "Zahn-fleisch [tooth-flesh]". As far as possible, use the Leipzig Glossing Rules. If you cannot find an abbreviation for a particular grammatical category in the Leipzig Glossing Rules, do not abbreviate. Rather, give the complete term in capitals. For example, the Indonesian counterpart of LWT 4.71 ('to beget') would be mem-per-anak-kan [ACTIVE-CAUS-child-APPL]. The abbreviations CAUS for causative and APPL for applicative can be found in the Leipzig Glossing Rules (see the list at the end of this page). However, they do not contain an abbreviation for the term active, so the full form must be used: ACTIVE.
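A simple consistency check for this format can be sketched in Python as follows. The function name is hypothetical and the check is not part of the database template. It only verifies that the hyphenated form and the bracketed gloss have the same number of segments, as the Leipzig Glossing Rules require; it does not check the abbreviations themselves.

```python
import re

# A form, optional whitespace, then a gloss in square brackets.
GLOSS_PATTERN = re.compile(r"^(?P<form>[^\[\]]+?)\s*\[(?P<gloss>[^\[\]]+)\]$")


def split_gloss(entry):
    """Pair each morph of 'Zahn-fleisch [tooth-flesh]' with its gloss."""
    match = GLOSS_PATTERN.match(entry.strip())
    if match is None:
        raise ValueError(f"expected 'form [gloss]', got: {entry!r}")
    morphs = match.group("form").split("-")
    glosses = match.group("gloss").split("-")
    if len(morphs) != len(glosses):
        raise ValueError(
            f"{len(morphs)} morphs but {len(glosses)} glosses in: {entry!r}"
        )
    return list(zip(morphs, glosses))


german = split_gloss("Zahn-fleisch [tooth-flesh]")
indonesian = split_gloss("mem-per-anak-kan [ACTIVE-CAUS-child-APPL]")
```

An entry such as "Zahn-fleisch [tooth]" would be rejected, because two morphs are matched by only one gloss.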

(W9) Borrowed. Indicate whether, to the best of your knowledge, the word is a loanword, i.e. has been borrowed from another language at some point in the language's history (or prehistory; protolanguages are also considered stages of the same language, so that a word borrowed into Proto-Uralic would count as a loanword in Hungarian). Five degrees of certainty are distinguished:

0. No evidence for borrowing
1. Very little evidence for borrowing
2. Perhaps borrowed
3. Probably borrowed
4. Clearly borrowed

(This field does not allow values like "Clearly not borrowed" or "Clearly inherited" because any word could have been borrowed at some prehistoric time, so we can never be sure that a word is not an old loanword. And even loanwords can be inherited, e.g. a word borrowed into Proto-Uralic can be inherited by Hungarian.)
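For contributors who process their data outside the database, the five values can be represented as in the following Python sketch. The class and function names are hypothetical. The helper reflects the rule that only loanwords of categories 3 and 4 are listed in the loanword appendix.

```python
from enum import IntEnum


class Borrowed(IntEnum):
    """The five degrees of certainty of field W9."""
    NO_EVIDENCE = 0
    VERY_LITTLE_EVIDENCE = 1
    PERHAPS = 2
    PROBABLY = 3
    CLEARLY = 4


def belongs_in_appendix(status):
    """True for categories 3 and 4 (probable and certain loanwords)."""
    return Borrowed(status) >= Borrowed.PROBABLY


listed = [value for value in range(5) if belongs_in_appendix(value)]
```

Any value outside the range 0 to 4 raises a ValueError, which mirrors the fact that the field has a controlled value list.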

We are dealing basically with morphs which are transferred or copied from one lect into another. Words from a substrate language are considered to be loanwords, even though some linguists may not consider that as "borrowing". The same goes for words from a superstrate. Transfer of morphs from one dialect to another, e.g. from a standard variety into a regional dialect, or from an acrolect into a basilect in general, is also treated as an instance of borrowing, even though "foreign" loans are our major focus of attention.

Excluded from the class of loanwords are neologisms (= productively created lexemes), even those which consist partly or entirely of foreign material, because they are created in the recipient language, not in the donor language (but see W12 below).

(W10) Comments On Borrowed. Here comments on the judgment given in the "borrowed" field can be entered.

(W11) Calqued. If an analyzable word is (or seems to be) a calque, this information can be included here. A calque is a complex form that was created on the model of a complex form in a donor language and whose constituents correspond semantically to the donor language constituents (e.g. French lune de miel calqued from English honeymoon). Note that information about calques is optional. (Semantic loans, i.e. words in which the contact influence concerns exclusively the polysemy patterns, cannot be taken into account systematically in this project.)

(W12) Created On Loan Basis. If an analyzable word has a loanword as at least one of its bases, this information can be included here. Note that this information is optional.

(W13) Age. For each word, indicate since when it has been part of the language.

For loanwords, give the time when the word was borrowed. For non-loanwords, give the time of earliest attestation or reconstruction.

Times may be indicated by year numbers or by period names. For exceptionally well-attested languages, it may be possible to give centuries or even decades. For example, German Kamel 'camel' is known to have been borrowed during the 16th century. In most cases, only language-particular periods such as "Middle High German", or "Tang dynasty" can be given. In languages with no earlier attestation, reconstructed proto-languages may be given as periods. Another possible type of period is "first contact with Spanish" (although in such cases precise dates are generally known).

Contributors must provide approximate dates for the periods they use in a separate list, e.g. Middle High German = 1050-1350, Tang Dynasty = 618-907, Proto-Indo-European = 5000-3000 BCE.

The age field may also contain the elements "before", "after", or "approximately" preceding the period name or year number.
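The relation between the age field and the separate period list can be sketched in Python as follows. The function name and the return format are assumptions made for illustration; the period dates are the ones given as examples above, with BCE years written as negative numbers. The sketch only handles plain year numbers and period names, not expressions such as "16th century".

```python
# Period list as a contributor might supply it (dates from the examples above).
PERIODS = {
    "Middle High German": (1050, 1350),
    "Tang Dynasty": (618, 907),
    "Proto-Indo-European": (-5000, -3000),  # 5000-3000 BCE
}
QUALIFIERS = ("before", "after", "approximately")


def parse_age(value, periods=PERIODS):
    """Return (qualifier or None, start year, end year) for an age entry."""
    lookup = {name.lower(): span for name, span in periods.items()}
    text = value.strip()
    qualifier = None
    for candidate in QUALIFIERS:
        if text.lower().startswith(candidate + " "):
            qualifier = candidate
            text = text[len(candidate):].strip()
            break
    if text.lower() in lookup:
        start, end = lookup[text.lower()]
    elif text.isdigit():
        start = end = int(text)  # a single year number
    else:
        raise ValueError(f"unknown period or year: {text!r}")
    return qualifier, start, end


mhg = parse_age("Middle High German")
early = parse_age("before 1600")
```

Period names are compared without regard to capitalization, so "Tang dynasty" and "Tang Dynasty" refer to the same entry of the period list.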

(W13A) Frequency [numeric]. This is an optional field, relevant for authors who have information on word frequency based on a reasonably representative corpus. To make the frequency data comparable, the figure entered here should be frequency of occurrence per million words (e.g. if the corpus contains 50,000 words and a word occurs 3 times, the frequency per million words is 60).
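The normalization to occurrences per million words is a simple proportion, shown here as a Python sketch with a hypothetical function name:

```python
def frequency_per_million(occurrences, corpus_size):
    """Convert a raw count into occurrences per million words of corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus size must be a positive number of words")
    return occurrences * 1_000_000 / corpus_size


# The example from the text: 3 occurrences in a 50,000-word corpus.
example_frequency = frequency_per_million(3, 50_000)
```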

(W13B) Frequency [relative]. Contributors who have no access to numeric information about frequency may still want to include some information about frequency, as this will be important when computing borrowability. In this field you can state whether, impressionistically, the word in question is very common, fairly common, or not common.

(W23) Register. This is an optional field for giving impressionistic information about the word’s register: formal, colloquial, or general. For example, when giving the English counterparts for item LWT 1.343 ('the cape'), promontory would be marked as ‘formal’, while peninsula would be marked ‘general’. For LWT 23.16 ('the airplane'), the Hebrew counterpart avirón would be marked ‘colloquial’, and the counterpart matós would be marked ‘general’.

(B) Fields to be filled in just for words identified as loanwords, calques, or created on loan basis

(Note that these fields can be filled out even for words that the author does not regard as loanwords, i.e. where s/he sees "Very little evidence for borrowing".)

The first question we ask about borrowed items is their source, ideally their immediate source. For example, LWT 23.31 ('the president') has the Indonesian counterpart presiden, which is ultimately derived from Latin praesidens. But Indonesian borrowed the word directly from Dutch president, so this should be given as the source word. However, sometimes the immediate source will not be known, in which case the earliest known source word should be given. For example, the Indonesian counterpart of LWT 14.64 ('Wednesday') is rabu. It probably entered the language via a yet unidentified Indian language; what is clear is that it ultimately derived from Arabic arba‘a, so this is given as the source word.

Sometimes it is clear that a word must be a loanword, but its source is not known. In such a case, tick "Source word is unidentifiable".

There are three pieces of information for the source word. Contributors must fill in either the three fields for the immediate source word (if this is known) (W14-W15-W16), or the three fields for the ultimate source word (W14A-W15A-W16A). The other three fields are optional.

(W14/W14A) Source Word Form. This should be given in the spelling or transcription/transliteration that is most commonly used by linguists for the language. Standard Unicode encoding must be used.

(W15/W15A) Donor Language. Please state the donor language, if known. If the source word is clear, but the donor language cannot be determined, enter ‘No information’. Sometimes the possibilities can be narrowed down to a small set of languages, e.g. a family of closely related languages, or a set of two or three languages. So entries like "Mongolic" or "Spanish or Portuguese" are also acceptable in this field. (In this case, multiple entries in W14/W14A are possible.)

(W16/W16A) Meaning. The meaning of the source word should be given even if it is identical to the meaning of the borrowed word.

(W17) Comments on Intermediate Source Words. Sometimes we have information about intermediate source words, in addition to the immediate and ultimate source words. This information can be entered here.

(W18) Effect. This field tells us whether the word replaced an earlier word (1: replacement), whether it was simply added where no earlier word existed with the same meaning (2: insertion), whether it coexists with an earlier word of roughly the same meaning (3: coexistence), or whether there is no information about its effect (0: no information).

(W19) Integration (1: highly integrated, 2: intermediate, 3: unintegrated). This is an impressionistic scale. We do not attempt to measure the degree of phonological and morphological integration in a precise way. As a rough guideline, let us say that an unintegrated loanword keeps significant phonological and/or morphological peculiarities of the donor language and is recognizable as a loanword even to speakers with no training in linguistics. A highly integrated loanword is one that has no properties that betray its foreign origin. A loanword with an intermediate degree of integration is one that has some synchronic properties of the foreign language.

(W20) Environmental Salience. This field gives information about the degree to which a word's meaning is relevant to the speakers. "Environment" refers both to the natural and to the cultural environment. The three values are:

1. Present in pre-contact environment
2. Present only since contact
3. Not present

By ‘contact’, we mean the first contact between speakers of the project language and the donor language. This contact could have been with speakers of the donor language, but it could also have been with written sources in the donor language.

(W21) Bibliographical reference for loanword status. Ideally, this field should give author-year-page number, e.g. "Johanson 1971: 33" (though giving page numbers may sometimes be difficult). If the data are the author's own data, it should say "own data". (A separate complete list of full references must be provided in addition, in conjunction with the book chapter.)

(W22) Other Comments. This field is for any additional comments about the loanword entry.

(W27) Contact situation. This field should contain the name of the contact situation in which the word was borrowed. These names can be entered into the editable drop-down menu. Normally there will be at least as many contact situations as there are donor languages. But languages can borrow words from the same language in completely different situations. For instance, English dish was borrowed from Latin discus in pre-Old English times, whereas discus was borrowed from the same Latin word in the 17th century. So we need to distinguish a contact situation "Latin to West Germanic" from a contact situation "learned modern Latin". On the other hand, for the borrowing of boomerang and kangaroo, we can assume basically the same contact situation ("Australian to invaders' English"), even if the two terms are from two different donor languages. The various contact situations will be explained in some detail in the discussion part (see the section on contact situations above).

(C) Meaning-word relationships

Relationships between LWT meanings and project-language words are often more complicated than the simple one-to-one relationship assumed so far.

For this reason, the database is designed in such a way that a single meaning can correspond to several words, and a single word can correspond to several meanings. The database essentially consists of three tables: a Words table (with word-related information), a Meanings table (with information relating to the 1460 meanings which were supplied by the editors), and a Meaning-Word Pairs table (with information concerning the relationship between meanings and words). The three tables are linked through a unique meaning identifier (the LWT Code) and a unique word identifier (the Word Record #).
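The three-table design can be pictured with the following Python sketch. It is not the FileMaker template itself. The class names, the record numbers and the default relationship value "counterpart" are assumptions made for illustration; the two German examples are taken from these guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    lwt_code: str  # unique meaning identifier (the LWT Code)
    label: str


@dataclass(frozen=True)
class Word:
    record_no: int  # unique word identifier (the Word Record #)
    form: str


@dataclass(frozen=True)
class MeaningWordPair:
    lwt_code: str
    record_no: int
    relationship: str = "counterpart"  # assumed default value


meanings = {m.lwt_code: m for m in [
    Meaning("3.94", "the snail"),
    Meaning("2.456", "the sibling"),
]}
words = {w.record_no: w for w in [
    Word(1, "Schnecke"),     # record numbers are hypothetical
    Word(2, "Geschwister"),
]}
pairs = [
    MeaningWordPair("3.94", 1, "super-counterpart"),
    MeaningWordPair("2.456", 2),
]


def words_for_meaning(lwt_code):
    """All word forms linked to one meaning (one meaning, several words)."""
    return [words[p.record_no].form for p in pairs if p.lwt_code == lwt_code]


def meanings_for_word(record_no):
    """All meaning labels linked to one word (one word, several meanings)."""
    return [meanings[p.lwt_code].label for p in pairs if p.record_no == record_no]
```

Because the links are stored in a table of their own, adding a synonym or a second meaning for a word only requires a new pair record; the Words and Meanings records themselves stay unchanged.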

(i) One meaning, several words

Often there will be several different possibilities for filling in the field W2, because languages commonly have synonyms. You may enter as many synonyms as you like, as long as you include the most frequent and/or colloquial synonym.

Synonyms are entered by going to the WORDS table, creating a new record (menu Records > New record), and then linking the new word to the old meaning in the main layout. (Create a new record in the main layout, and then select a meaning and a word from the drop-down lists next to the buttons "set to an existing meaning" and "set to an existing word".)

You may also want to enter several sub-counterparts (i.e. words that are narrower in meaning than the LWT meaning). This is done in the same way as entering several synonyms, but for sub-counterparts, please select "sub-counterpart" in the field W25 (Word to meaning relationship).

When the project language has no exact counterpart of the LWT meaning, but a super-counterpart (a word with a broader meaning), this broader word should be given, and "super-counterpart" should be selected in the field W25 (Word to meaning relationship). Thus, for LWT 3.94 ('the snail'), German Schnecke may be given, although it means 'snail or slug'.

However, a super-counterpart may not be given if its meaning is the direct counterpart of another LWT meaning. Thus, for LWT 3.42 ('the stallion'), a word meaning 'horse' may not be given, because 'horse' is another LWT meaning (LWT 3.41).

If a language has no conventionalized way of referring to a male horse specifically, this field must be left unfilled for that language.

(ii) One word, several meanings

It is probably best to fill in all the words and then (in a second step) deal with cases where a word corresponds to several LWT meanings (i.e. cases of duplicates).

Ultimately, no word duplicates (i.e. records with the same content in the Word Form field) are allowed. Polysemous or vague words corresponding to several LWT meanings should eventually appear just once in the words table. (And of course homonyms must be distinguished by numbers; see above.)

So what needs to be done now is to remove duplicate records from the words table, while keeping the relationships to the meanings table intact. This is made easy by a script on the layout "WORDS-duplicate housekeeping". You can find duplicates by clicking on the button "find all duplicates" in the first line of that layout. This will list the duplicates in two portals (=sets of up to three rows with words), one for word form duplicates, and the other for full duplicates. (One record is always displayed in each portal; eventually the portals must not have more than one record anywhere.)

Word form duplicates are records with the same content in field W2. These may have very different content in the other fields, but they could still represent the same word (the differences could be due to the fact that the content was filled in at different times, etc.). Full duplicates are records with identical content in all fields, except the match with the LWT meaning, which is of course different. Full duplicates are fairly likely to be the same word, but it could of course be that two homonyms which are not loanwords have the same age and analyzability etc. and none of the other fields are filled. (All full duplicates are also word form duplicates, by definition.)
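The difference between the two kinds of duplicates can be sketched as follows. This is a minimal illustration and not the script used in the database: the field names ("record_no", "form", "lwt_code", "age", "analyzability") and the sample forms "aaa" and "bbb" are invented for the example.

```python
from collections import defaultdict

# Fields that do not count as word content: the record identifier and the
# link to the LWT meaning. Both field names are assumptions for this sketch.
IGNORED = {"record_no", "lwt_code"}


def find_duplicates(records):
    """Return (word form duplicates, full duplicates), keyed by word form."""
    by_form = defaultdict(list)
    for rec in records:
        by_form[rec["form"]].append(rec)

    form_dups, full_dups = {}, {}
    for form, recs in by_form.items():
        if len(recs) < 2:
            continue  # a form that occurs once is not a duplicate
        form_dups[form] = [r["record_no"] for r in recs]

        # Within one form, group the records whose remaining fields all match.
        by_content = defaultdict(list)
        for r in recs:
            key = tuple(sorted((k, v) for k, v in r.items() if k not in IGNORED))
            by_content[key].append(r["record_no"])
        groups = [nos for nos in by_content.values() if len(nos) > 1]
        if groups:
            full_dups[form] = groups
    return form_dups, full_dups


# Invented sample records; "X.1" etc. are placeholder meaning codes.
records = [
    {"record_no": 1, "form": "aaa", "lwt_code": "X.1", "age": "old", "analyzability": "unanalyzable"},
    {"record_no": 2, "form": "aaa", "lwt_code": "X.2", "age": "old", "analyzability": "unanalyzable"},
    {"record_no": 3, "form": "aaa", "lwt_code": "X.3", "age": "", "analyzability": ""},
    {"record_no": 4, "form": "bbb", "lwt_code": "X.4", "age": "old", "analyzability": ""},
]
form_dups, full_dups = find_duplicates(records)
```

Here records 1, 2 and 3 are word form duplicates, but only records 1 and 2 are full duplicates, since record 3 differs in the other fields.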

It is best to start by removing full duplicates. Go into the Find mode and click on the button next to "POSSIBLE FULL DUPLICATES". This will find the full duplicates (though FileMaker only looks at the first ten characters in each field, so there is a small chance that some of the found records may not be full duplicates after all), and sort them alphabetically. Then for each duplicate set, you'll need to decide first whether the duplicates are homonyms or a single vague/polysemous word. (You can switch between the duplicates by clicking on the "view" button at the end of the portal rows.)
• If they are homonyms, please add distinguishing numbers in parentheses. (Note that as soon as you add a number, the word disappears from the portal, because it is no longer a duplicate. So please remember which words are homonyms and mark them all by numbers.)
• If they are a single vague/polysemous word, you'll need to delete all but one of the duplicates. This is done by declaring one of the duplicates the "master", by clicking on the "master" button on the right in the portals. This will mark the master record as "KEEP", and the others as "DISCARD". Please make sure not to lose information from any of the fields of the discarded word records.
• Once you are sure that all the information you want to give has been taken into the master record, make a backup copy of the database file and then click on "delete all discarded words". This will delete the non-master words and re-link the master word to the right LWT meanings.
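What the "master" and "delete all discarded words" buttons achieve can be sketched roughly as below. This is an illustration under stated assumptions, not the actual script: the function names, the field names, and the placeholder data ("polyword", "X.1", "X.2") are invented for the example.

```python
def lost_information(master, discarded):
    """List fields that are filled in a discarded record but empty in the master."""
    return {
        field: value
        for field, value in discarded.items()
        if field != "record_no" and value and not master.get(field)
    }


def merge_duplicates(words, pairs, master_no, discard_nos):
    """Delete the discarded word records and re-link their meanings to the master.

    words: dict mapping Word Record # to a record (a dict of fields)
    pairs: list of (LWT code, Word Record #) links
    """
    discard = set(discard_nos)
    kept_words = {no: rec for no, rec in words.items() if no not in discard}

    relinked = []
    for lwt_code, record_no in pairs:
        target = master_no if record_no in discard else record_no
        if (lwt_code, target) not in relinked:  # avoid double links
            relinked.append((lwt_code, target))
    return kept_words, relinked


# Invented sample data: one polysemous word entered twice.
words = {
    1: {"record_no": 1, "form": "polyword", "age": ""},
    2: {"record_no": 2, "form": "polyword", "age": "old"},
}
pairs = [("X.1", 1), ("X.2", 2)]

# Check first what would be lost, then merge.
to_copy = lost_information(words[1], words[2])
new_words, new_pairs = merge_duplicates(words, pairs, master_no=1, discard_nos=[2])
```

In this sketch the check reports that the age information exists only in the discarded record, so it would have to be copied into the master record before deleting.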

Creating a new meaning entry: Sometimes a contributor might want to add a meaning that is not included in the original 1460 meanings of the database. To do that, go to the "Records" menu and click on "New Record". A new record will open. A temporary random number will be assigned to the record. However, you will still need to enter one of the 24 existing chapter numbers before filling in any other fields.

(D) Custom fields

There are 10 custom fields that you may use for entering all kinds of other information.

3.3. Paper/electronic publishing

At most, the (presumed) loanwords will be listed in the paper version. All other words will be published only in the electronic version.

List of category abbreviations for morpheme-by-morpheme glosses

The following abbreviations are the standard ones from the Leipzig Glossing Rules. Some other abbreviations have been added specifically for the Loanword Typology project (see the end of the list).

*1 first person
*2 second person
*3 third person
*A agent-like argument of canonical transitive verb
*ABL ablative
*ABS absolutive
*ACC accusative
*ADJ adjective
*ADV adverb(ial)
*AGR agreement
*ALL allative
*ANTIP antipassive
*APPL applicative
*ART article
*AUX auxiliary
*BEN benefactive
*CAUS causative
*CLF classifier
*COM comitative
*COMP complementizer
*COMPL completive
*COND conditional
*COP copula
*CVB converb
*DAT dative
*DECL declarative
*DEF definite
*DEM demonstrative
*DET determiner
*DIST distal
*DISTR distributive
*DU dual
*DUR durative
*ERG ergative
*EXCL exclusive
*F feminine
*FOC focus
*FUT future
*GEN genitive
*IMP imperative
*INCL inclusive
*IND indicative
*INDF indefinite
*INF infinitive
*INS instrumental
*INTR intransitive
*IPFV imperfective
*IRR irrealis
*LOC locative
*M masculine
*N neuter
*N- non- (e.g. NSG nonsingular, NPST nonpast)
*NEG negation, negative
*NMLZ nominalizer/nominalization
*NOM nominative
*OBJ object
*OBL oblique
*P patient-like argument of canonical transitive verb
*PASS passive
*PFV perfective
*PL plural
*POSS possessive
*PRED predicative
*PRF perfect
*PRS present
*PROG progressive
*PROH prohibitive
*PROX proximal/proximate
*PST past
*PTCP participle
*PURP purposive
*Q question particle/marker
*QUOT quotative
*RECP reciprocal
*REFL reflexive
*REL relative
*RES resultative
*S single argument of canonical intransitive verb
*SBJ subject
*SBJV subjunctive
*SG singular
*TOP topic
*TR transitive
*VOC vocative

Abbreviations that are not in the Leipzig Glossing Rules:

*12 combined first and second person
*AGT agent, agentive
*ACT active
*CIRC circumfix (for the second part of a circumfix; see Leipzig Glossing Rules, Rule 7)
*COLL collective
*DENOM denominal
*DIMIN diminutive
*FREQ frequentative
*HON honorific
*MID middle
*NOUN noun-forming affix (including from other nouns)
*RED reduplication
*SEM semelfactive
*STAT stative
*VN verbal noun


Max Planck Institute for Evolutionary Anthropology - Department of Linguistics