A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

Chinese corpora annotated by the Stanford tagger use this Chinese Penn Treebank part-of-speech tagset.

An example of using a tag in the CQL concordance search box: the query [tag="OD"] finds ordinal numbers, e.g. 第一.
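As a minimal sketch of how such queries can be put together, the Python snippet below builds CQL token expressions from tag names. The helper name `cql_for_tags` is hypothetical and not part of Sketch Engine; it only produces a query string to paste into the CQL search box. The pipe-separated alternation assumes that attribute values are interpreted as regular expressions, as is usual in CQL.

```python
def cql_for_tags(*tags):
    """Build a CQL token expression matching any of the given POS tags.

    Hypothetical helper for illustration; it only formats a query string.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    for tag in tags:
        # Tags in this tagset are short uppercase ASCII labels such as OD or NN.
        if not (tag.isascii() and tag.isalpha() and tag.isupper()):
            raise ValueError(f"unexpected tag: {tag!r}")
    # Join several tags with | so that any one of them matches.
    return '[tag="{}"]'.format("|".join(tags))


print(cql_for_tags("OD"))        # [tag="OD"]
print(cql_for_tags("NN", "NR"))  # [tag="NN|NR"]
```

The second call gives a query that would match either common nouns or proper nouns.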

| Tag | Description | Example |
| --- | --- | --- |
| AD | adverb | |
| AS | aspect marker | |
| BA | 把 in ba-construction | |
| CC | coordinating conjunction | |
| CD | cardinal number | 一百 |
| CS | subordinating conjunction | 虽然 |
| DEC | 的 in a relative clause | |
| DEG | associative 的 | |
| DER | 得 in V-de construction and V-de-R | |
| DEV | 地 before VP | |
| DT | determiner | |
| ETC | for the words 等, 等等 | 等, 等等 |
| FW | foreign words | A |
| IJ | interjection | 哈哈 |
| JJ | other noun modifier | |
| LB | 被 in long bei-construction | |
| LC | localizer | |
| M | measure word | |
| MSP | other particle | |
| NN | common noun | 工作 |
| NR | proper noun | 中国 |
| NT | temporal noun | 目前 |
| OD | ordinal number | 第一 |
| ON | onomatopoeia | |
| P | preposition (excluding 把 and 被) | |
| PN | pronoun | |
| PU | punctuation | 标点 |
| SB | 被 in short bei-construction | |
| SP | sentence-final particle | |
| VA | predicative adjective | |
| VC | copula | |
| VE | 有 as the main verb | |
| VV | other verbs | |
| X | numbers and units, mathematical signs | 59mm |
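For working with tagged output programmatically, the tagset can be kept as a simple lookup table. The sketch below is illustrative only: the names `CTB_TAGS` and `describe` are hypothetical, and the whitespace-separated `word/TAG` input format is an assumption, not necessarily what a given tagger or corpus export produces.

```python
# Tag descriptions taken from the table above.
CTB_TAGS = {
    "AD": "adverb",
    "AS": "aspect marker",
    "BA": "把 in ba-construction",
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
    "CS": "subordinating conjunction",
    "DEC": "的 in a relative clause",
    "DEG": "associative 的",
    "DER": "得 in V-de construction and V-de-R",
    "DEV": "地 before VP",
    "DT": "determiner",
    "ETC": "for the words 等, 等等",
    "FW": "foreign words",
    "IJ": "interjection",
    "JJ": "other noun modifier",
    "LB": "被 in long bei-construction",
    "LC": "localizer",
    "M": "measure word",
    "MSP": "other particle",
    "NN": "common noun",
    "NR": "proper noun",
    "NT": "temporal noun",
    "OD": "ordinal number",
    "ON": "onomatopoeia",
    "P": "preposition (excluding 把 and 被)",
    "PN": "pronoun",
    "PU": "punctuation",
    "SB": "被 in short bei-construction",
    "SP": "sentence-final particle",
    "VA": "predicative adjective",
    "VC": "copula",
    "VE": "有 as the main verb",
    "VV": "other verbs",
    "X": "numbers and units, mathematical signs",
}


def describe(tagged_text, sep="/"):
    """Return (word, tag, description) triples for word/TAG tokens.

    The input format (whitespace-separated tokens, word and tag joined
    by `sep`) is assumed for illustration.
    """
    triples = []
    for token in tagged_text.split():
        # Split on the last separator so words containing `sep` stay intact.
        word, _, tag = token.rpartition(sep)
        triples.append((word, tag, CTB_TAGS.get(tag, "unknown tag")))
    return triples


for word, tag, meaning in describe("中国/NR 第一/OD"):
    print(word, tag, meaning)
```

Tags that are not in the table are reported as "unknown tag" rather than raising an error, which makes it easier to spot output produced with a different tagset.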

Source: https://repository.upenn.edu/entities/publication/82db5c98-308c-47d1-8744-dfa6d6723cb0

Chinese text corpora

Sketch Engine offers dozens of Chinese corpora.
