Chinese Penn Treebank POS tagset

A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense, etc.) of each token in a text corpus.

Chinese corpora annotated by the Stanford tagger use this Chinese Penn Treebank part-of-speech tagset.

See also: Chinese tagsets used in Sketch Engine; What is a POS tag?

An example of a tag used in a CQL query in the concordance search box: [tag="OD"] finds ordinal numbers, e.g. 第一.
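To illustrate what such a query does, here is a minimal Python sketch that filters a tagged sentence by tag. It assumes, as in Sketch Engine's CQL, that the attribute value is treated as a regular expression that must match the whole tag, so [tag="V.*"] would find all verb tags. The sentence, its tagging and the function name match_tag are hypothetical and for illustration only; they are not part of Sketch Engine.

```python
import re

# Hypothetical tagged sentence as (word, tag) pairs, for illustration only.
SENTENCE = [
    ("他", "PN"),
    ("是", "VC"),
    ("第一", "OD"),
    ("个", "M"),
    ("到", "VV"),
    ("的", "DEC"),
    ("人", "NN"),
]


def match_tag(tokens, pattern):
    """Return the words whose tag fully matches the regex pattern,
    mimicking a single-token CQL query such as [tag="OD"]."""
    regex = re.compile(pattern)
    return [word for word, tag in tokens if regex.fullmatch(tag)]


print(match_tag(SENTENCE, "OD"))   # ['第一']
print(match_tag(SENTENCE, "V.*"))  # ['是', '到']
```

Because the whole tag must match, "D" alone finds nothing here, while "DE.*" would find the 的 tagged DEC.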

Tag	Description	Example
AD	adverb	也
AS	aspect marker	着
BA	把 in ba-construction	把
CC	coordinating conjunction	和
CD	cardinal number	一百
CS	subordinating conjunction	虽然
DEC	的 in a relative clause	的
DEG	associative 的	的
DER	得 in V-de construction and V-de-R	得
DEV	地 before VP	地
DT	determiner	这
ETC	for words 等, 等等	等, 等等
FW	foreign words	A
IJ	interjection	哈哈
JJ	other noun modifier	新
LB	被 in long bei-construction	被
LC	localizer	里
M	measure word	个
MSP	other particle	所
NN	common noun	工作
NR	proper noun	中国
NT	temporal noun	目前
OD	ordinal number	第一
ON	onomatopoeia
P	preposition (excluding 把 and 被)	在
PN	pronoun	我
PU	punctuation	，
SB	被 in short bei-construction	被
SP	sentence-final particle	吗
VA	predicative adjective	好
VC	copula	是
VE	有 as the main verb	有
VV	other verbs	要
X	numbers and units, mathematical signs	59mm

Source: https://repository.upenn.edu/entities/publication/82db5c98-308c-47d1-8744-dfa6d6723cb0
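The table assigns more than one tag to the same character depending on the construction: 的 is DEC or DEG, and 被 is LB or SB. A minimal Python sketch that stores the tagset as a dictionary and builds a reverse index from example words to tags makes these ambiguities explicit. The names CTB_TAGS and tags_by_example are hypothetical and not part of Sketch Engine or the Stanford tagger; ON has no example because the table gives none.

```python
from collections import defaultdict

# Tag -> (description, example words), transcribed from the table.
CTB_TAGS = {
    "AD": ("adverb", ["也"]),
    "AS": ("aspect marker", ["着"]),
    "BA": ("把 in ba-construction", ["把"]),
    "CC": ("coordinating conjunction", ["和"]),
    "CD": ("cardinal number", ["一百"]),
    "CS": ("subordinating conjunction", ["虽然"]),
    "DEC": ("的 in a relative clause", ["的"]),
    "DEG": ("associative 的", ["的"]),
    "DER": ("得 in V-de construction and V-de-R", ["得"]),
    "DEV": ("地 before VP", ["地"]),
    "DT": ("determiner", ["这"]),
    "ETC": ("for words 等, 等等", ["等", "等等"]),
    "FW": ("foreign words", ["A"]),
    "IJ": ("interjection", ["哈哈"]),
    "JJ": ("other noun modifier", ["新"]),
    "LB": ("被 in long bei-construction", ["被"]),
    "LC": ("localizer", ["里"]),
    "M": ("measure word", ["个"]),
    "MSP": ("other particle", ["所"]),
    "NN": ("common noun", ["工作"]),
    "NR": ("proper noun", ["中国"]),
    "NT": ("temporal noun", ["目前"]),
    "OD": ("ordinal number", ["第一"]),
    "ON": ("onomatopoeia", []),
    "P": ("preposition (excluding 把 and 被)", ["在"]),
    "PN": ("pronoun", ["我"]),
    "PU": ("punctuation", ["，"]),  # a punctuation mark such as the comma
    "SB": ("被 in short bei-construction", ["被"]),
    "SP": ("sentence-final particle", ["吗"]),
    "VA": ("predicative adjective", ["好"]),
    "VC": ("copula", ["是"]),
    "VE": ("有 as the main verb", ["有"]),
    "VV": ("other verbs", ["要"]),
    "X": ("numbers and units, mathematical signs", ["59mm"]),
}


def tags_by_example(tagset):
    """Map each example word to every tag that lists it as an example."""
    index = defaultdict(list)
    for tag, (_description, examples) in tagset.items():
        for word in examples:
            index[word].append(tag)
    return dict(index)


index = tags_by_example(CTB_TAGS)
ambiguous = {word: tags for word, tags in index.items() if len(tags) > 1}
print(ambiguous)  # {'的': ['DEC', 'DEG'], '被': ['LB', 'SB']}
```

A reverse index like this only reflects the examples in the table; in a tagged corpus, 的 or 被 receives a single tag per occurrence, chosen by the tagger from the surrounding construction.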

Chinese text corpora

Sketch Engine offers dozens of Chinese corpora.
