Malaysian and Indonesian tagset

A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
When creating user corpora, the recommended tagset is always preselected. Using a different tagset, if there is more than one for the language, is only recommended for advanced users. Tagsets cannot normally be changed for preloaded corpora.

Since Word Sketches, thesaurus, term extraction and trends make use of POS tagging, their respective settings (e.g. Word Sketch grammar, term grammar) have to be written using the tagset used by the corpus.

Malay

available corpora

Indonesian

available corpora

Malaysian and Indonesian corpora in Sketch Engine can have these tagsets:

(To check which tagset a corpus uses, go to its Corpus Statistics and details page.)

An example of a tag in the CQL concordance search box: [tag="" & morph=""] searches for cardinal numerals.
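The CQL pattern above combines two token attributes, tag and morph, with the & operator inside square brackets. As a minimal sketch, the same kind of query string could be assembled in Python as shown below. The helper name cql_token and the values TAG_VALUE and MORPH_VALUE are hypothetical placeholders, not real tags; the actual values must be taken from the tagset used by the corpus.

```python
def cql_token(**attrs):
    """Build a single-token CQL expression of the form [name="value" & ...].

    Values are inserted unchanged, so any characters that need special
    treatment in CQL must be handled by the caller.
    """
    conditions = [f'{name}="{value}"' for name, value in attrs.items()]
    return "[" + " & ".join(conditions) + "]"


# TAG_VALUE and MORPH_VALUE are placeholders (assumptions), not real tags;
# look up the actual values in the tagset of the corpus being searched.
query = cql_token(tag="TAG_VALUE", morph="MORPH_VALUE")
print(query)
```

The resulting string can be pasted into the CQL concordance search box once the placeholders have been replaced with values from the corpus tagset.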

Indonesian and Malaysian_Previous morphology - Apertium

Source: http://wiki.apertium.org/wiki/Indonesian_and_Malaysian/Previous_morphology
