The goal of the Loanword Typology (LWT) project is to
assemble systematic information on loanword patterns in about 40
languages from around the world, as a way of assessing lexical
borrowability in a controlled way. The planned result will be a volume
of about 40 chapters on individual languages, contributed by authors
who are specialists in these languages. In addition, there will be a
volume discussing the overall results of the project. The lexical
database will be made accessible in an electronic format.
Each chapter will consist of two parts: a data part and a discussion part. The first version
of the data part should be submitted to the editors (Martin Haspelmath
and Uri Tadmor) by the end of 2006. The final version of the database
and the text are due by the end of June 2007. Revisions and editing
will be conducted during the second half of 2007.
The discussion part (=the prose chapter)
The Loanword Typology contributions consist of a data part and a
discussion part, here called "prose chapter". The first version of the
database was due at the end of 2006. The first version of the prose
chapter is due on 30 June 2007.
Both parts should be submitted in electronic form to
both editors (Martin Haspelmath and Uri Tadmor).
The prose chapter should be submitted as a text file or
common word processor format (not PDF, because the editors need to
process the text further).
Only Unicode fonts should be used, font size 12.
The text should consist of not more than 7,000 words of prose,
excluding references and the appendix. Single spaced, this yields about
15 printed pages. There can also be up to 5 figures (maps, tables,
A4.Title and headings
The titles of the chapters are uniform: "Loanwords in [language name]". For
little-known languages, further information on affiliation and
geographical location is given, e.g.
* "Loanwords in Gawwada, a Cushitic language of Ethiopia"
* "Loanwords in Gurindji, a Pama-Nyungan language of Australia"
* "Loanwords in Thai"
The chapters are divided exhaustively into sections,
section 1. Sections may be divided into subsections (5.1., 5.2., etc.),
which must again be exhaustive (though a short introductory paragraph
may precede the first subsection).
Figures (tables, maps, etc.) should be centered and
of figures should also be centered, and begin with the type of figure,
followed by its number, a colon, and then the title. No punctuation
mark is used at the end of titles. For example:
Table 4: Retained glides in Sanskrit loanwords in Indonesian
When referring to figures in the text, use the figure number rather
than expressions like ‘the following table’, since we are not sure
where exactly the figures will appear in the printed version.
A6.Abbreviations and footnotes
Abbreviations and footnotes should be avoided. Language names are
not abbreviated, and other abbreviations are probably not needed.
Footnotes are probably not needed due to the survey
of the chapters. If footnotes are used, they should be printed at the
bottom of the page.
A7.Citing loanwords and source words
All non-English words should be italicized. Glosses are given in
single quotes. If the meaning of the source word is different from that
of the loanword, it too should be given in single quotes. Etymologies
are placed in parentheses. The mark '<', followed by a space, may be
substituted for 'from':
A rare example of a plural form in Indonesian is muslimin 'Muslim men' (< Arabic muslimīn).
In the word warta 'news' (< Sanskrit vṛtta 'event'), the initial glide has been retained.
In the text, works cited are referred to only by author and year.
The year is not parenthesized, unless the referent is the author rather
than the work:
Arabic loanwords in Indonesian are listed in Jones 1978.
Jones (1978) lists Arabic loanwords in Indonesian.
A complete alphabetical list of full references is given at the
very end of the paper, following the appendix. (Note that this is not
counted toward the 7,000-word limit.)
The following format is used:
* Alpher, Barry & Nash, David. 1999. "Lexical replacement and cognate equilibrium in Australia."
Australian Journal of Linguistics 19.1: 5-56.
* Baldi, Sergio. 1988. A first ethnolinguistic comparison of Arabic loanwords common to Hausa and Swahili.
Naples: Istituto Universitario Orientale.
* Betz, Werner. 1974. "Lehnwörter und Lehnprägungen im Vor- und Frühdeutschen."
In: Maurer, Friedrich & Rupp, Helmut (eds.) Deutsche Wortgeschichte, Band I. Berlin: de Gruyter, 33-57.
An appendix lists all the loanwords (of categories 3 and 4, i.e.
certain and probable loanwords), in two columns, loanword and
translation (the source words are not given). They are arranged by
donor language, as seen in the example box below. Within each donor
language, the words are arranged roughly according to the LWT Meaning
List (if a word corresponds to several LWT meanings, it should of
course be given only once).
(Hypothetical example language: German)
* Insel 'island'
* Ozean 'ocean'
* Löwe 'lion'
* Kamel 'camel'
* Kochen 'cook'
* Kessel 'kettle'
* Pfanne 'pan'
* Schüssel 'dish'
* Lagune 'lagoon'
* Stiefel 'boot'
* Teller 'plate'
Middle Low German
* Bucht 'bay'
* Riff 'reef'
* Ebbe 'ebb'
* Klippe 'cliff or precipice'
* Küste 'shore'
* Juwel 'jewel'
The prose chapter should give an accessible overview of the loanword
situation in the language and should at the same time summarize the
findings from the research done by the author for this project.
For the sake of comparability, the paper should contain
the following sections (even if there is little to report). In
addition, it may (but need not) contain up to three additional
sections, in which the author can discuss any topic that s/he considers
B1.The language and its speakers
This section should contain information on
– the speakers: how many, where
– the language's genealogical affiliation
– the sociolinguistic status: which domains of use (home, school,
religion, official business, media, written literature)
– the historical context
(The section should also contain a simple map showing the
geographical location of the language and perhaps some neighboring
donor languages. The editors will provide help with making these maps.)
B2.Sources of data
What data have been used for the database? What earlier
work on loanwords in the language exists?
This section should list all the major contact languages, as well as
describe the contact situations. (For some contact languages, there are
clearly distinguishable contact situations, e.g. Medieval French and
19th century French for loanwords into English.) What were the social
conditions under which the loanwords arose?
B4.Numbers and kinds of loanwords
Two standardized tables
should show the distribution of loanwords over semantic word classes
(nouns, verbs, ...) and semantic fields (Physical world, Kinship,
Animals, Body, etc.). Only equivalents of meanings
in the core LWT should be included; the equivalents of any meanings
added by the author should not be counted. If the same loanword is the
equivalent of more than one LWT meaning, it is only counted once.
If a word corresponds to several LWT meanings that are in different semantic
categories (e.g. if a word corresponds both to a LWT noun and to a LWT
verb, or if a word corresponds to both a LWT body part term and a LWT
political term, perhaps a word translatable as 'head, chief'), then
this word counts half in one category and half in the other category.
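The counting rule can be sketched in a few lines of Python. This is only an illustration and not the FileMaker tally layouts: the function name, the input format and the example words are assumptions, and splitting a word evenly over more than two categories is an extrapolation of the half-and-half rule stated here.

```python
from collections import defaultdict
from fractions import Fraction


def tally_loanwords(pairs):
    """Tally loanwords per category.

    pairs: iterable of (word, category) tuples, one tuple per core LWT
    meaning that the loanword is an equivalent of (hypothetical format).
    """
    categories_of = defaultdict(set)
    for word, category in pairs:
        # A set ignores repeats, so a word matching several meanings
        # in the same category is still counted only once there.
        categories_of[word].add(category)

    tallies = defaultdict(Fraction)
    for categories in categories_of.values():
        share = Fraction(1, len(categories))  # 1 for one category, 1/2 for two
        for category in categories:
            tallies[category] += share
    return dict(tallies)


# Made-up example: "word_a" matches a body-part meaning and a political
# meaning; "word_b" matches two meanings in the same field.
example = [
    ("word_a", "The body"),
    ("word_a", "Social and political relations"),
    ("word_b", "Food and drink"),
    ("word_b", "Food and drink"),
]
print(tally_loanwords(example))
```

Exact fractions are used instead of floats so that the category totals always add up to the number of distinct loanwords.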
The numbers are
automatically generated by the layouts "Tallies:semantic field(abs)" and
"Tallies:semantic word class(abs)" of the latest database template (v.
0.5.2). In the final version, there will probably be percentages, but
the editors will take care of this. (Please do not use precise numbers
when discussing the table in the text, but relative terms such as "very
few", "the bulk", etc.)
Note that the following
(abbreviated) semantic field names should be used:
1. The physical world
4. The body
5. Food and drink
6. Clothing and grooming
7. The house
8. Agriculture and vegetation
9. Basic actions and technology
12. Spatial relations
15. Sense perception
16. Emotions and values
18. Speech and language
19. Social and political relations
20. Warfare and hunting
22. Religion and belief
23. The modern world
24. Function words
The text should mention which kinds of words are more
borrowed than others. For example, if nouns are borrowed more often
than members of other word classes, or if content words are borrowed
more often than function words, this should be discussed. Categories
which are likelier to be borrowed may be more specific, e.g. kinship
terms, pronouns, mammal names, cooking terminology, etc. In all such
cases, try to propose reasons why these and not other words have been borrowed.
B5.Integration of loanwords
– What are the major adaptation processes that loanwords have
undergone/are undergoing? (Phonological, morphological, orthographic)
– Do the loanwords form recognizable vocabulary strata?
– Are there particular borrowing routines (i.e. conventionalized ways
of adapting words to the recipient language)?
– What do we know about speakers' attitudes to loanwords?
Does grammatical borrowing also occur alongside the lexical
borrowing that is the focus of this project? What are the major
grammatical contact phenomena?
B7.Any other interesting material
(explanations, elaboration, speculation,...)
A summary of your major findings.
3. The data part (=the electronic database)
The data part of the chapters will consist of a database of words of
the language. These words must minimally include the counterparts
(=translational equivalents) of the 1460 meanings on the Loanword
Typology Meaning List (to the extent that this is possible).
For all aspects of the database, contributors should feel free to
provide further information. No information that exists and is reliable
should be lost. The fields below are in some sense a minimal list. The
data provided by authors may also contain the following:
* Additional meanings (especially additional
* Additional fields (e.g. about pronunciation,
further comments fields, etc.)
* Additional distinctions for those fields that have
a controlled value list (however, in this case authors must specify how
their distinctions map on the general distinctions made in the project)
Contributors must submit the data in the form of a FileMaker Pro 7
database, using a template prepared by the editors (see here for
3.1. The Loanword Typology Meaning List (LWTML)
This list is based on the IDS (Intercontinental Dictionary Series)
list, which itself is based on C.D.Buck's A dictionary of selected
synonyms in the principal Indo-European languages (1949). It contains
1460 lexical meanings. See this page
for the list.
It is important to realize that the LWT Meaning List should be thought
of as a list of meanings that is designed to elicit words from the
project languages. It is not a list
of English words. Some LWT meanings are narrower than those of
the English label; e.g. LWT 9.61, labeled 'to forge', is intended to
refer to the action of making something from a piece of metal, not to
the action of illegally copying something. And some LWT meanings are
broader than those of corresponding English words; e.g. LWT 1.36,
labeled 'the river or stream', is intended to refer to a flowing body
of water of any size, a meaning for which English does not have a
The words have been divided into 24 "semantic fields" (the 22 domains
from Buck, plus a domain called "modern world" and a domain called
"function words"). These serve as a rough organization of the words for
the moment, but they play no crucial role. Any other reasonable
classification would work. There is also some information on
word-class-style semantic categories (noun, verb, etc.). Some words
have been added for particular ecoregions (Africa, Amazonia, Arctic,
etc.), to balance the (Indo-) European bias of Buck.
The LWT code
uniquely identifies the corresponding LWT meaning. It consists of the
LWT chapter number (for the “semantic field”), followed by sequential
The LWT Meaning List contains the following fields (marked by “M”
followed by field number), which have been filled by the editors and
should not be modified by the contributors:
(M2) LWT Label.
Recall that this stands for a meaning, not for a word of English.
(M3) Meaning Description.
This is additional information about the meaning entered by the editors
in order to clarify or disambiguate certain items. It is filled in for
about 25% of the records.
(M4) Typical Context.
These are phrases or sentences entered by the editors in order to
assist contributors who are unsure about the meaning of the LWT label.
It should be used as a guideline only. This field is filled in for
almost half of the records.
(M5) Semantic Category.
This field refers to ontological categories and is NOT meant to convey
any syntactic information. Thus ‘noun’ can be taken as a label for ‘a
thing’ or ‘an entity’; ‘verb’ is an activity; ‘adjective’ is a
property; and so forth.
If you add new meanings, you will of course have to create a LWT label,
and you may also fill in some of the other fields in this section.
3.2.Fields of the
The individual databases supplied by the project members provide
information on the counterparts of the LWT meanings in the project
languages. There are 32 fields (marked by "W" followed by field
number). White fields are obligatory and must be filled in by the
contributors. Grey fields are optional. The two fields “W2” and “W9” in
red font are the most important fields.
Fields that are relevant to all words of the database (not just to the
(W0) Project Language.
This should appear automatically as soon as you enter data for a word.
((W1) Record #. This field
uniquely identifies the word. You do not need to fill it in.)
(W2) Word Form in
the project language. This should be given in the spelling or
transcription/transliteration that is most commonly used by linguists
for the language. Standard Unicode encoding must be used. Important: Do
not fill in more than one item in this field.
In the simplest case, there is one word for a given LWT meaning, and a
project-language word corresponds to just one LWT meaning (one-to-one
relationship). For more complicated relationships between meanings and
words, see below under (C).
Homonyms should be distinguished by indices in parentheses (e.g. German
arm (1) 'poor', Arm (2) 'arm').
The words should be given in their standard citation form. (Do not
include articles unless they are part of the standard citation form;
the English labels only contain articles to indicate the semantic
category.) For nouns, this means singular in almost all cases. However,
if a noun occurs only in a different number than the English label,
this is no problem. (Thus, an English plurale tantum such as oats can be rendered by a singular
mass noun, and a singular can be rendered by a plurale tantum; e.g.
German only has the plural form Geschwister
'siblings', but this is a perfect counterpart of LWT 2.456 (the
sibling).) Similarly, if the project language has a close semantic
match that belongs to a different semantic category than the LWT
meaning, such a word can be a counterpart. (Thus, for LWT 5.14 'to be
hungry', French only has the noun faim
and English only has the adjective hungry.)
The word form can be a single word, or a phrasal expression. Phrasal
expressions should only be used when they are conventionalized fixed
phrases that normally express the meaning in question. We do not want
descriptions or explanations of the meaning. For example, for LWT 4.393
('the feather'), a language may have 'hair of bird' as the best
equivalent, but if this is not a fixed expression, it cannot be used as
the language's counterpart. (The entry should be left unfilled.) By
contrast, to make love (LWT
4.67) is a fixed expression in English. Similarly, we do not want a
compound phrase of two or more hyponyms (“A and/or B”). For example, in
the Indonesian list, LWT 2.94 ('we') is left unfilled, because there is
no single word that corresponds to this meaning, but rather two
sub-counterparts that are already included in the LWT list: kita ‘we [inclusive]’ (LWT 2.941)
and kami ‘we [exclusive]’
Only established, conventionalized loanwords that are felt to be part
of the language should be given, not nonce borrowings. This distinction
is often hard to make (especially when there are no monolingual
speakers), but authors should try as best they can.
Missing Word Form field:
Sometimes the contributor may be unable to provide any project-language
form. In such cases, the "Missing word form" box should be selected.
Please give the reason why a word form is missing. In general, the
reason will be one of the following three:
– <insufficient information>
(Many authors will simply not have 100% of the information needed to
fill out the list completely, so although this should be minimized, it
will sometimes be unavoidable.)
– <meaning irrelevant to
speakers> (This applies to cultural and environmental items
that speakers virtually never have occasion to talk about. If most or
all speakers are bilingual and speak a dominant language that has words
for such items, speakers are often happy to use nonce borrowings, but
we are not really interested in them.)
– <no counterpart>
(Some meanings may not have a direct counterpart in the project
language although they would not be irrelevant in the culture or
environment. For example, Jakarta Indonesian has no word for 'do', even
though people do things all the time. Or a meaning might be taboo, or
expressed only lexically.).
Further details can be entered into the "Comments (on missing word form)" field.
(W3) Original Script.
In the main word-form field, the word is given in the usual spelling or
transcription/transliteration. When the language uses a non-Latin
script, the word form as written in the language's usual writing system
may be entered here, using standard Unicode encoding.
(W4) Grammatical Info.
This is an optional field that some authors may want to fill in.
(W5B) Comments On Word
Form. Here you can add all kinds of additional information about
(W6) Meaning in the
project language. This field needs to be filled out only when
there is a noticeable meaning difference between the project-language
form and the LWT meaning. This will be the case whenever a language
only has several co-hyponyms of a LWT meaning, and also when the
project-language form is a hyperonym of a LWT meaning.
Indicate whether the word is (1) unanalyzable (if the form cannot be
analyzed into two or more constituents); (2) semi-analyzable (if you
can identify a constituent structure, but not all constituents have
meanings, such as cran in cranberry); (3) analyzable derived; (4)
analyzable compound; (5) analyzable phrasal.
For (semi-)analyzable items, give a morpheme-by-morpheme
gloss, i.e. a hyphenation and a gloss in brackets. For example,
for German Zahnfleisch 'gums', the field would contain the following:
"Zahn-fleisch [tooth-flesh]". As far as possible, use the Leipzig
Glossing Rules. If you cannot find an abbreviation for a particular
grammatical category in the Leipzig Glossing Rules, do not abbreviate.
Rather, give the complete term in capitals. For example, the Indonesian
counterpart of LWT 4.71 ('to beget') would be mem-per-anak-kan
[ACTIVE-CAUS-child-APPL]. The abbreviations CAUS for causative and APPL
for applicative can be found in the Leipzig Glossing Rules (see the
list at the end of this page). However,
they do not contain an abbreviation for the term active, so the full
form must be used: ACTIVE.
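As a rough aid for checking entries of this shape, the following Python sketch splits a gloss entry into morph/gloss pairs and verifies that both sides have the same number of hyphen-separated parts. The function name is hypothetical, and the sketch assumes a single-token word form followed by one bracketed gloss; phrasal entries containing spaces would need a looser pattern.

```python
import re


def split_gloss(entry):
    """Split e.g. 'Zahn-fleisch [tooth-flesh]' into (morph, gloss) pairs."""
    # One token without spaces, whitespace, then a bracketed gloss.
    match = re.fullmatch(r"(\S+)\s+\[([^\]\s]+)\]", entry.strip())
    if match is None:
        raise ValueError(f"not of the form 'a-b [x-y]': {entry!r}")
    morphs = match.group(1).split("-")
    glosses = match.group(2).split("-")
    if len(morphs) != len(glosses):
        raise ValueError(f"{len(morphs)} morphs but {len(glosses)} glosses")
    return list(zip(morphs, glosses))


print(split_gloss("Zahn-fleisch [tooth-flesh]"))
print(split_gloss("mem-per-anak-kan [ACTIVE-CAUS-child-APPL]"))
```

The check only concerns the hyphen alignment; it does not validate the category abbreviations themselves.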
Indicate whether, to the best of your knowledge, the word is a
loanword, i.e. has been borrowed from another language at some point in
the language's history (or prehistory; protolanguages are also
considered stages of the same language, so that a word borrowed into
Proto-Uralic would count as a loanword in Hungarian). Five degrees of
certainty are distinguished:
0. No evidence for borrowing
1. Very little evidence for borrowing
2. Perhaps borrowed
3. Probably borrowed
4. Clearly borrowed
(This field does not allow values like "Clearly not borrowed" or
"Clearly inherited" because any word could have been borrowed at some
prehistoric time, so we can never be sure that a word is not an old
loanword. And even loanwords can be inherited, e.g. a word borrowed
into Proto-Uralic can be inherited by Hungarian.)
We are dealing basically with morphs which are transferred or copied
from one lect into another. Words from a substrate language are
considered to be loanwords, even though some linguists may not consider
that as "borrowing". The same goes for words from a superstrate.
Transfer of morphs from one dialect to another, e.g. from a standard
variety into a regional dialect, or from an acrolect into a basilect in
general, is also treated as an instance of borrowing, even though
"foreign" loans are our major focus of attention.
Excluded from the class of loanwords are neologisms (= productively
created lexemes), even those which consist partly or entirely of
foreign material, because they are created in the recipient language,
not in the donor language (but see W12 below).
(W10) Comments On Borrowed.
Here comments on the judgment given in the "borrowed" field can be entered.
(W11) Calqued. If
an analyzable word is (or seems to be) a calque, this information can
be included here. A calque is a complex form that was created on the
model of a complex form in a donor language and whose constituents
correspond semantically to the donor language constituents (e.g. French
lune de miel calqued from English honeymoon). Note that information
about calques is optional. (Semantic loans, i.e. words in which the
contact influence concerns exclusively the polysemy patterns, cannot be
taken into account systematically in this project.)
(W12) Created On Loan
Basis. If an analyzable word has a loanword as at least one of
its bases, this information can be included here. Note that this
information is optional.
(W13) Age. For each
word, give the time at which it was first attested or reconstructed in
For loanwords, give the time when the word was borrowed. For
non-loanwords, give the time of earliest attestation or reconstruction.
Times may be indicated by year numbers or by period names. For
exceptionally well-attested languages, it may be possible to give
centuries or even decades. For example, German Kamel 'camel' is known
to have been borrowed during the 16th century. In most cases, only
language-particular periods such as "Middle High German", or "Tang
dynasty" can be given. In languages with no earlier attestation,
reconstructed proto-languages may be given as periods. Another possible
type of period is "first contact with Spanish" (although in such cases
precise dates are generally known).
Contributors must provide approximate dates for the periods they use in
a separate list, e.g. Middle High German = 1050-1350, Tang Dynasty =
618-907, Proto-Indo-European = 5000-3000 BCE.
The age field may also contain the elements "before", "after", or
"approximately" preceding the period name or year number.
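To illustrate how such a period list and the optional qualifiers fit together, here is a minimal Python sketch. The period dates are the ones given as examples in this section; the function name, the tuple returned, and the use of negative numbers for BCE years are assumptions made for the illustration.

```python
# Period list as a contributor might supply it (dates from the examples above;
# BCE years are written as negative numbers, which is an assumption).
PERIODS = {
    "Middle High German": (1050, 1350),
    "Tang Dynasty": (618, 907),
    "Proto-Indo-European": (-5000, -3000),
}
QUALIFIERS = ("before", "after", "approximately")


def parse_age(text):
    """Return (qualifier, start_year, end_year) for an Age entry."""
    text = text.strip()
    qualifier = None
    for q in QUALIFIERS:
        if text.lower().startswith(q + " "):
            qualifier = q
            text = text[len(q):].strip()
            break
    if text in PERIODS:
        start, end = PERIODS[text]
    else:
        # Otherwise expect a plain year number; int() raises ValueError
        # for a period name that is missing from the list.
        start = end = int(text)
    return qualifier, start, end


print(parse_age("before Middle High German"))
print(parse_age("approximately 1600"))
```

A lookup of this kind fails loudly when a period name is used that has not been dated in the separate list.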
(W13A) Frequency [numeric].
This is an optional field, relevant for authors who have information on
word frequency based on a reasonably representative corpus. To make the
frequency data comparable, the figure entered here should be frequency
of occurrence per million words (e.g. if the corpus contains 50,000
words and a word occurs 3 times, the frequency per million words is 60).
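The normalization is simple arithmetic; a small Python sketch (the function name is hypothetical) reproduces the example just given:

```python
def per_million(occurrences, corpus_size):
    """Convert a raw count into frequency of occurrence per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return occurrences * 1_000_000 / corpus_size


# The example from the text: 3 occurrences in a 50,000-word corpus.
print(per_million(3, 50_000))
```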
[relative]. Contributors who have no access to numeric
information about frequency may still want to include some information
about frequency, as this will be important when computing
borrowability. In this field you can state whether,
impressionistically, the word in question is very common, fairly
common, or not common.
This is an optional field for giving impressionistic information about
the word’s register: formal, colloquial, or general. For example, when
giving the English counterparts for item LWT 1.343 ('the cape'),
promontory would be marked as ‘formal’, while peninsula would be marked
‘general’. For LWT 23.16 ('the airplane'), the Hebrew counterpart
avirón would be marked ‘colloquial’, and the counterpart
matós would be marked ‘general’.
Fields to be filled in just for words identified as loanwords, calques,
or created on loan basis
(Note that these fields can be filled out even for words that the
author does not regard as loanwords, i.e. where s/he sees "Very little
evidence for borrowing".)
The first question we ask about borrowed items is their source, ideally
their immediate source. For
example, LWT 23.31 ('the president') has the Indonesian counterpart presiden, which is ultimately
derived from Latin praesidens.
But Indonesian borrowed the word directly from Dutch president, so this
should be given as the source word. However, sometimes the immediate
source will not be known, in which case the earliest known source word should be
given. For example, the Indonesian counterpart of LWT 14.64
('Wednesday') is rabu. It probably entered the language via a yet
unidentified Indian language; what is clear is that it ultimately
derived from Arabic arba‘a, so this is given as the source word.
Sometimes it is clear that a word must be a loanword, but its source is
not known. In such a case, tick "Source
word is unidentifiable".
There are three pieces of information for the source word. Contributors
must fill in either the three fields for the immediate source word (if
this is known) (W14-W15-W16), or the three fields for the ultimate
source word (W14A-W15A-W16A). The other three fields are optional.
(W14/W14A) Source Word
Form. This should be given in the spelling or
transcription/transliteration that is most commonly used by linguists
for the language. Standard Unicode encoding must be used.
(W15/W15A) Donor Language.
Please state the donor language, if known. If the source word is clear,
but the donor language cannot be determined, enter ‘No information’.
Sometimes the possibilities can be narrowed down to a small set of
languages, e.g. a family of closely related languages, or a set of two
or three languages. So entries like "Mongolic" or "Spanish or
Portuguese" are also acceptable in this field. (In this case, multiple
entries in W14/W14A are possible.)
The meaning of the source word, even if it is identical to the meaning
of the borrowed word.
(W17) Comments on
Intermediate Source Words. Sometimes we have information about
intermediate source words, in addition to the immediate and ultimate
source words. This information can be entered here.
(W18) Effect. This
field tells us whether the word replaced an earlier word (1:
replacement), whether it was simply added where no earlier word existed
with the same meaning (2: insertion), whether it coexists with an
earlier word of roughly the same meaning (3: coexistence), or whether
there is no information about its effect (0: no information).
(1: highly integrated, 2: intermediate, 3: unintegrated). This is an
impressionistic scale. We do not attempt to measure the degree of
phonological and morphological integration in a precise way. As a rough
guideline, let us say that an unintegrated loanword keeps significant
phonological and/or morphological peculiarities of the donor language
and is recognizable as a loanword even to speakers with no training in
linguistics. A highly integrated loanword is one that has no properties
that betray its foreign origin. A loanword with an intermediate degree
of integration is one that has some synchronic properties of the
Salience. This field gives information about the degree to which
a word's meaning is relevant to the speakers. "Environment" refers both
to the natural and to the cultural environment. The three values are:
1. Present in pre-contact environment
2. Present only since contact
3. Not present
By ‘contact’, we mean the first contact between speakers of the project
language and the donor language. This contact could have been with
speakers of the donor language, but it could also have been with
written sources in the donor language.
reference for loanword status. Ideally, this page should give
author-year-page number, e.g. "Johanson 1971: 33" (though giving page
numbers may sometimes be difficult). If the data are the author's own
data, it should say "own data". (A separate complete list of full
references must be provided in addition, in conjunction with the book
(W22) Other Comments.
This field is for any additional comments about the loanword entry.
(W27) Contact situation.
This field should contain the name of the contact situation in which
the word was borrowed. These names can be entered into the editable
drop-down menu. Normally there will be at least as many contact
situations as there are donor languages. But languages can borrow words
from the same language in completely different situations. For
instance, English dish was borrowed from Latin discus in pre-Old
English times, whereas discus was borrowed from the same Latin word in
the 17th century. So we need to distinguish a contact situation "Latin
to West Germanic" from a contact situation "learned modern Latin". On
the other hand, for the borrowing of boomerang and kangaroo, we can
assume basically the same contact situation ("Australian to invaders'
English"), even if the two terms are from two different donor
languages. The various contact situations will be explained in some
detail in the discussion part (see below).
Relationships between LWT meanings and project-language words are often
more complicated than the simple one-to-one relationship assumed so far.
For this reason, the database is designed in such a way that a single
meaning can correspond to several words, and a single word can
correspond to several meanings. The database essentially consists of
three tables: a Words table
(with word-related information), a Meanings
table (with information relating to the 1460 meanings which were
supplied by the editors), and a Meaning-Word
Pairs table (with information concerning the relationship
between meanings and words). The three tables are linked through a
unique meaning identifier (the LWT Code) and a unique word identifier
(the Word Record #).
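The three-table design can be pictured with a small Python sketch. It is not the FileMaker template: the class and attribute names, the record numbers, the placeholder codes "A" and "B", and the default relationship value "counterpart" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    lwt_code: str      # unique meaning identifier
    label: str


@dataclass(frozen=True)
class Word:
    record_no: int     # unique word identifier
    form: str


@dataclass(frozen=True)
class MeaningWordPair:
    lwt_code: str
    record_no: int
    relationship: str = "counterpart"  # assumed default value


def words_for_meaning(lwt_code, pairs, words):
    """All word forms linked to one meaning."""
    by_record = {w.record_no: w for w in words}
    return [by_record[p.record_no].form for p in pairs if p.lwt_code == lwt_code]


def meanings_for_word(record_no, pairs):
    """All LWT codes linked to one word."""
    return [p.lwt_code for p in pairs if p.record_no == record_no]


meanings = [Meaning("3.94", "the snail"), Meaning("A", "head"), Meaning("B", "chief")]
words = [Word(1, "Schnecke"), Word(2, "word_x")]   # "word_x" is made up
pairs = [
    MeaningWordPair("3.94", 1, "super-counterpart"),
    MeaningWordPair("A", 2),   # one word linked to two meanings
    MeaningWordPair("B", 2),
]
print(words_for_meaning("3.94", pairs, words))
print(meanings_for_word(2, pairs))
```

Because the relationship lives in the pairs table, a word is stored once no matter how many meanings it is linked to.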
(i) One meaning, several words
Often there will be several different possibilities for filling in the
field W2, because languages commonly have synonyms. You may enter as
many synonyms as you like, as long as you include the most frequent
and/or colloquial synonym.
Synonyms are entered by going to the WORDS table, creating a new record
(menu Records > New record), and then linking the new word to the
old meaning in the main layout. (Create a new record in the main
layout, and then select a meaning and a word from the drop-down lists
next to the buttons "set to an existing meaning" and "set to an
You may also want to enter several sub-counterparts (i.e. words that
are narrower in meaning than the LWT meaning). This is done in the same
way as entering several synonyms, but for sub-counterparts, please
select "sub-counterpart" in the field W25 (Word to meaning relationship).
When the project language has no exact counterpart of the LWT meaning,
but a super-counterpart (a word with a broader meaning), this broader
word should be given, and "super-counterpart" should be selected in the
field W25 (Word to meaning relationship). Thus, for LWT 3.94 ('the
snail'), German Schnecke may be given, although it means 'snail or
If the hyperonym's meaning is the direct counterpart of another LWT
meaning. But for LWT 3.42 ('the stallion'), a word meaning 'horse' may
not be given, because 'horse' is another LWT meaning (LWT 3.41).
If a language has no conventionalized way of referring to a male horse
specifically, this field must be left unfilled for that language.
(ii) One word, several meanings
It is probably best to fill in all the words and then (in a second
step) deal with cases where a word corresponds to several LWT meanings
(i.e. cases of duplicates).
Ultimately, no word duplicates (i.e. records with the same content in
the Word Form field) are allowed. Polysemous or vague words
corresponding to several LWT meanings should eventually appear just
once in the words table. (And of course homonyms must be distinguished
by numbers; see above.)
So what needs to be done now is to remove duplicate records from the
words table, while keeping the relationships to the meanings table
intact. This is made easy by a script on the layout "WORDS-duplicate
housekeeping". You can find duplicates by clicking on the button "find
all duplicates" in the first line of that layout. This will list the
duplicates in two portals (=sets of up to three rows with words), one
for word form duplicates, and the other for full duplicates. (One
record is always displayed in each portal; eventually the portals must
not have more than one record anywhere.)
Word form duplicates are records with the same content in field W2.
These may have very different content in the other fields, but they
could still represent the same word (the differences could be due to
the fact that the content was filled in at different times, etc.). Full
duplicates are records with identical content in all fields, except the
match with the LWT meaning, which is of course different. Full
duplicates are fairly likely to be the same word, but it could of
course be that two homonyms which are not loanwords have the same age
and analyzability etc. and none of the other fields are filled. (All
full duplicates are also word form duplicates, by definition.)
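The difference between the two kinds of duplicates can be illustrated with a minimal Python sketch. It does not reproduce the FileMaker script; the records and all field names other than W2 are hypothetical, and the link to the LWT meaning is left out of the comparison, as described above.

```python
from collections import defaultdict

# Hypothetical word records; only "W2" (the word form) is a real field name.
words = [
    {"id": 1, "W2": "bank", "age": "old", "analyzability": "unanalyzable"},
    {"id": 2, "W2": "bank", "age": "old", "analyzability": "unanalyzable"},
    {"id": 3, "W2": "bank", "age": "recent", "analyzability": "unanalyzable"},
    {"id": 4, "W2": "tree", "age": "old", "analyzability": "unanalyzable"},
]


def duplicate_sets(records, key_fields):
    """Group record ids by the values of key_fields and return only the
    groups that contain more than one record."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Word form duplicates: same content in W2 only.
word_form_dups = duplicate_sets(words, ["W2"])

# Full duplicates: same content in every field except the record id.
content_fields = [field for field in words[0] if field != "id"]
full_dups = duplicate_sets(words, content_fields)

print(word_form_dups)
print(full_dups)
```

Here records 1, 2 and 3 form a word form duplicate set, but only records 1 and 2 are full duplicates, which shows why every full duplicate is also a word form duplicate.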
It is best to start by removing full duplicates. Go into the Find mode
and click on the button next to "POSSIBLE FULL DUPLICATES". This will
find the full duplicates (though FileMaker only looks at the first ten
characters in each field, so there is a small chance that some of the
found records may not be full duplicates after all), and sort them
alphabetically. Then for each duplicate set, you'll need to decide
first whether the duplicates are homonyms or a single vague/polysemous
word. (You can switch between the duplicates by clicking on the "view"
button at the end of the portal rows.)
• If they are homonyms, please add distinguishing numbers in
parentheses. (Note that as soon as you add a number, the word
disappears from the portal, because it is no longer a duplicate. So
please remember which words are homonyms and mark them all by numbers.)
• If they are a single vague/polysemous word, you'll need to delete all
but one of the duplicates. This is done by declaring one of the
duplicates the "master", by clicking on the "master" button on the
right in the portals. This will mark the master record as "KEEP", and
the others as "DISCARD". Please make sure not to lose information from
any of the fields of the discarded word records.
• Once you are sure that all the information you want to give has been
taken into the master record, make a backup copy of the database file
and then click on "delete all discarded words". This will delete the
non-master words and re-link the master word to the right LWT meanings.
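What the last step does to the data can be sketched as follows. This is a simplified model under stated assumptions, not the actual FileMaker script: the tables are plain Python structures, the record numbers and the meaning codes "M1" to "M3" are hypothetical placeholders, and the function name is invented for the example.

```python
# Hypothetical stand-ins for the WORDS table and the word-to-meaning links.
words = {
    10: {"W2": "hand", "status": "KEEP"},     # the master record
    11: {"W2": "hand", "status": "DISCARD"},
    12: {"W2": "hand", "status": "DISCARD"},
}
links = [("M1", 10), ("M2", 11), ("M3", 12)]  # (meaning code, word record id)


def delete_discarded(words, links, master_id):
    """Return new (words, links): discarded records are removed and their
    meaning links are pointed at the master record instead."""
    discard = {wid for wid, rec in words.items() if rec["status"] == "DISCARD"}
    kept_words = {wid: rec for wid, rec in words.items() if wid not in discard}
    relinked = []
    for meaning, wid in links:
        target = master_id if wid in discard else wid
        if (meaning, target) not in relinked:  # avoid duplicate links
            relinked.append((meaning, target))
    return kept_words, relinked


new_words, new_links = delete_discarded(words, links, master_id=10)
print(sorted(new_words))
print(new_links)
```

The function returns new structures and leaves the originals untouched, which plays the same role as the backup copy recommended above: nothing is lost if the wrong record was chosen as master.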
Creating a new meaning entry: Sometimes a contributor might want to add
a meaning that
is not included in the original 1460 meanings of the database. To do
that, go to the "Records" menu and click on "New Record". A new record
will open. A temporary random number will be assigned to the record.
However, you will still need to enter one of the 24 existing chapter
numbers before filling in any other fields.
There are 10 custom fields that you may use for entering all kinds of
At most the (presumed) loanwords will be listed in the paper version.
All other words will only be published in the electronic version.
List of category abbreviations for morpheme-by-morpheme glosses
The following abbreviations are the standard ones of the
Leipzig Glossing Rules. Some other abbreviations have been added
specifically for the Loanword Typology project (see below at the end of
the list).
* 1 first person
* 2 second person
* 3 third person
* A agent-like argument of canonical transitive verb
* N- non- (e.g. NSG nonsingular, NPST nonpast)
* NEG negation, negative
* P patient-like argument of canonical transitive verb
* Q question particle/marker
* S single argument of canonical intransitive verb
Abbreviations that are not in the Leipzig Glossing Rules:
* 12 combined first and second person
* AGT agent, agentive
* CIRC circumfix (for the second part of a circumfix; see Leipzig
Glossing Rules, Rule 7)
* NOUN noun-forming affix (including from other nouns)
* VN verbal noun