Laboratory for the Computational Studies of Language
METU Turkish Corpus Project
Modern languages today need to be represented by linguistically and
meta-linguistically preprocessed corpora consisting
of written and spoken samples. The need for such an electronic corpus of
modern Turkish open to researchers and implementors becomes evident as its
possible academic and practical uses are considered. In computer science,
such a corpus supports the development of corpus-based natural language
processing software such as machine translation, text-to-speech generation,
and text summarization systems. In the linguistic and educational sciences,
it supplies experimental evidence for cognitive and linguistic hypotheses
on syntax, semantics, discourse, or linguistic variation. This
project aims to produce a CD of tagged and at least partially parsed
written Turkish texts from various genres (samples from narratives,
argumentative texts, editorials, etc.), with suitable access software included.