A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The SWATWOL part-of-speech tagset is used in Swahili (also known as Kiswahili) corpora.
An example of a tag query in the CQL concordance search box: [tag="N"]
finds all nouns, e.g. watu, mwaka. (Note: use straight double quotation marks, not curly ones.)
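The query above can also be assembled programmatically. The following is a minimal sketch, assuming you only need the `[tag="..."]` form shown above; the helper names `tag_query` and `straighten_quotes` are hypothetical and are not part of any corpus tool's API. The second helper replaces curly double quotes, which often creep in when a query is copied from a web page or word processor, with the straight ones the note asks for.

```python
# Hypothetical helpers for illustration only; not part of any corpus tool API.

# Map typographic (curly) double quotes to the straight quote used in the query.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"'})


def straighten_quotes(query: str) -> str:
    """Replace curly double quotes in a pasted query with straight ones."""
    return query.translate(CURLY_TO_STRAIGHT)


def tag_query(tag: str) -> str:
    """Build a query of the form [tag="..."] for a single tag."""
    return f'[tag="{tag}"]'


noun_query = tag_query("N")
pasted_query = straighten_quotes("[tag=\u201cN\u201d]")
print(noun_query)
print(pasted_query)
```

Both calls yield the same string, which can then be pasted into the CQL search box.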
Tagset
Tag | Description | Example |
A-INFL | inflecting adjective | viti vizuri |
A-UINFL | uninflecting adjective | viti haba |
ABBR | abbreviation | n.k. |
AD-ADJ | qualifier of an adjective | kabisa, sana |
ADJ | adjective | zuri, haba |
ADJ-INFL | inflecting adjective | zingine |
ADV | adverb | tena |
AG-PART | agentive particle | ililiwa na panya |
AG-PART_PRON | agentive pronoun | nayo |
AR | Arabic origin | Kitaalam |
CC | coordinating conjunction | baba na mama |
CLB | clause boundary | . ; ? ! ili, kwamba |
COLON | colon | : |
COMMA | comma | , |
CONCORD1 | grammatical concord | kupi |
CONJ | conjunction | ili |
DEF-V:li | verb with no inflection based on the root “li” | walio, aliye |
DEF-V:na | verb with no inflection based on the root “na” | kuna |
DEF-V:ni | verb with no inflection based on the root “ni” | ni |
DEM:hV | demonstrative pronoun of the type hV(C)V | hiki, haya, hii |
DOLLAR-SIGN | dollar sign | $ |
DOUBLE-QUOTE | double quote | “ |
EMPH | emphatic form | mimi ndimi |
ENG | English origin | Apartheid |
EQUAL-MARK | equal mark | = |
EXCLAM | number | 2000 |
EXCLAMATION | exclamation mark | ! |
GEN-CON | genitive connector | kitabu cha mtoto |
HYPHEN | hyphen | - |
IDIOM | idiom | punde si punde |
IMP | imperative | twende |
INTERROG | interrogative | lini? mbona? |
LEFT-PARENTHESIS | left parenthesis | ( |
LEFT-SQUAREBRACKET | left square bracket | [ |
N | noun | kitu |
NA-POSS | possessive particle | alikuwa na mali |
NEG | negative | sisomi, hasomi, hatasoma |
NUM | numeral | ishirini |
PL1-SP | noun class 1/2 1p plural, subject prefix | tunasoma |
PL2-SP | noun class 1/2 2p plural, subject prefix | mnasoma |
PREP | preposition | katika |
PREP_PRON | preposition+pronoun | naye |
PROCENT-MARK | percent mark | % |
PRON | pronoun | mimi, hiki |
PROPNAME | proper name | Ali, Mombasa |
REL | relative marker in verb | ninayesoma |
RHET | rhetorical | Unakwenda, sio? |
RIGHT-PARENTHESIS | right parenthesis | ) |
RIGHT-SQUAREBRACKET | right square bracket | ] |
SELFSTANDING | selfstanding subject particle | yu, zi |
SG1-SP | noun class 1/2 1p singular, subject prefix | ninafikiri |
SG2-SP | noun class 1/2 2p singular, subject prefix | unasoma |
SG3-SP | noun class 1/2 3p singular, subject prefix | anasoma |
SINGLE-QUOTE | single quote | ‘ |
V | verb | kusoma |
VCAP_ | word starting with capital | Nawe |
VFIN | finite verb | anasoma |
VIMP | imperative verb | rudi, angalia |
VINF | infinitive verb | kuwa, kufanya |
Vkwisha | verb with the marker “kwisha” | kwisha |
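
For scripted work with the tagset, the table can be loaded into a lookup structure. This is a minimal sketch, assuming the rows are kept in the same pipe-delimited form as above; `parse_tagset` is a hypothetical helper name, and the sample rows are copied from the table.

```python
from typing import Dict, Tuple


def parse_tagset(table_text: str) -> Dict[str, Tuple[str, str]]:
    """Parse 'Tag | Description | Example |' rows into {tag: (description, example)}."""
    tagset: Dict[str, Tuple[str, str]] = {}
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip blank lines, malformed rows, and the header row.
        if len(cells) < 3 or cells[0] in ("", "Tag"):
            continue
        tag, description, example = cells[:3]
        tagset[tag] = (description, example)
    return tagset


sample = """Tag | Description | Example |
N | noun | kitu |
VFIN | finite verb | anasoma |
"""

tags = parse_tagset(sample)
print(tags["N"])
```

Keying the dictionary on the tag makes it easy to look up the description for any tag that appears in a corpus annotation.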
Source: http://www.aakkl.helsinki.fi/cameel/corpus/swatags.pdf