Laboratory for the Computational Studies of Language


METU Turkish Corpus Project

Today it is vitally important for modern languages to be represented by corpora of written and spoken samples that have been preprocessed at both the linguistic and meta-linguistic levels. The need for such an electronic corpus of modern Turkish, open to researchers and implementors, becomes evident once its possible academic and practical uses are considered. In computer science, such a corpus would support the development of corpus-based natural language processing software, such as machine translation, text-to-speech generation, and text summarization systems. In linguistics and the educational sciences, it would provide experimental evidence for cognitive and linguistic hypotheses on syntax, semantics, discourse, and linguistic variation. This project aims to produce a CD of tagged and at least partially parsed written Turkish texts from various genres (for example, narratives, argumentative texts, and editorials), together with suitable access software.