Project: Overview

The project "Linguistic dynamics in the Greater Tunis Area: a corpus-based approach" (TUNICO) ran from August 2013 until July 2016. It was funded by the Austrian Science Fund (FWF; P-25706) and conducted as a joint endeavour of the University of Vienna and the Austrian Academy of Sciences.

The TUNICO project aimed to give a clearer picture of the current linguistic situation of Tunis. Methodologically, it was situated at the crossroads of variational linguistics and language technology, combining dialectological approaches with up-to-date text-technological methods. The data and tools developed and tested in the project were meant to benefit a wide range of research questions, both within Arabic linguistics and beyond.

The project was conducted in the spirit of open source and open access. The corpus, the lexicographical data and the documentation created during the project are all available to the scientific community through publicly accessible web interfaces.


For a long time, the dialect of the Tunisian capital has been regarded as one of the best-described urban dialects of the Arab world. The first linguistic descriptions go back to the late 19th century, i.e. to an era when scientific interest in colloquial varieties of Arabic had just begun. The majority of publications on the dialect of Tunis focus on sociolinguistic, phonological and morphological issues. In-depth studies on syntax are very scarce, and there is no up-to-date dictionary available that is based on authentic spoken data. There are also very few relevant studies dedicated to the linguistic dynamics caused by recent demographic changes in the metropolitan area of Tunis. Today, the variety of Arabic spoken by most inhabitants of the city has become a koiné that has not only spread to the vicinity of the city but is also widely used throughout Tunisia.


The TUNICO project focused on contemporary language. We therefore strove to gather data from field recordings made with young speakers. Most of them grew up in the city of Tunis but were born to parents who, for the most part, had moved to the capital from other regions. As part of the project, we created two digital language resources: (1) a corpus of unmonitored speech containing both conversations and narratives and (2) a dictionary based on this corpus and on previously published resources. A particular focus of the project was the dictionary/corpus interface, which allows researchers to navigate from the corpus to the dictionary and vice versa.

Besides serving as the primary source for the dictionary, the digital corpus was used to investigate a number of selected topics dealing with the morphology and syntax of contemporary Tunis Arabic. The dictionary, for its part, contains not only lexicographic data gleaned from the corpus but also material from two additional sources: data elicited in complementary interviews with young Tunisians and lexicographical material compiled from various older sources such as the monumental grammar published by H.-R. Singer in 1984. The diachronic nature of the dictionary (the printed sources mentioned contain material from the middle of the 20th century and earlier) allowed us to analyse the linguistic dynamics in the realm of the lexicon as well.