ALC: Arabic Learner Corpus
The Arabic Learner Corpus (ALC) is a language corpus made up of written and spoken texts produced by learners of Arabic in Saudi Arabia. All texts were collected in 2012–2013 and comprise 282,732 words produced by 942 students of 67 nationalities.
See more on the project site: http://www.arabiclearnercorpus.com/about-the-corpus-en
Part-of-speech tagset
The texts were POS-tagged using the Stanford parser; the tags are explained in the POS tagset description.
Availability
This Arabic corpus is accessible to all users with a standard Sketch Engine subscription. The corpus texts are licensed under the CC BY-NC 4.0 licence. The corpus is provided in Sketch Engine with the permission of its author, Abdullah Alfaifi.
Tools to work with the Arabic ALC corpus
A complete set of Sketch Engine tools is available for working with the Arabic Learner Corpus. They can be used to generate:
- word sketch – Arabic collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of Arabic nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- text type analysis – statistics of metadata in the corpus
Use Sketch Engine in minutes
Generating collocations, frequency lists, examples in context, and n-grams, or extracting terms, is easy with Sketch Engine. Use our Quick Start Guide to learn it in minutes.