Corpus: Help

The search tool behind this interface is NoSketch Engine, an open-source tool built for fast indexing and querying of digital texts. It is widely used for linguistic corpora.

The TUNICO corpus interface passes users' queries to the indexer and displays the results. Through the query interface, you can search for words or groups of words in the corpus. To run a query, enter a word such as ʕaṛbi and press ENTER. Results matching your query are displayed below the input field.

Because the whole system builds on XML, queries are case-sensitive!

When typing in single words, the interface's autocomplete function will show you wordforms and lemmata contained in the corpus.

The transcription used in the corpus is for the most part DMG. If you need special characters such as ā, š or ʕ, click on the respective letters in the character table.

The preview option will show you a list of tokens that start with the characters you entered so far.

Good documentation of the query language can be found on the Sketch Engine website.

It is possible to search in particular fields of the corpus. Wildcards are applied on the token level.
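What "applied on the token level" means can be illustrated with a small Python sketch. It assumes that a pattern has to match a whole token rather than a substring, which is how regular expressions in this family of query languages generally behave. The token list and the helper name match_tokens are made up for illustration and are not part of the corpus or the interface.

```python
import re

# Toy token list; these wordforms are illustrative, not taken from the corpus.
tokens = ["yžūni", "ūni", "ktāb", "žūnik"]

def match_tokens(pattern, tokens):
    """Return the tokens that the pattern matches in full (hypothetical helper)."""
    compiled = re.compile(pattern)
    # fullmatch anchors the pattern at both ends of the token,
    # so ".*ūni" selects only tokens that end in "ūni".
    return [t for t in tokens if compiled.fullmatch(t)]

print(match_tokens(".*ūni", tokens))
```

In this sketch "žūnik" is not returned because the pattern must cover the entire token. Matching is also case-sensitive by default, in line with the note on case sensitivity above.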

Query                                   Result
.*ūni                                   All wordforms ending in ūni
[word=".*ūni"]                          The same as the previous one
[lemma="ktāb"]                          All occurrences of the word ktāb, including plural forms
[lemma="ž.*" & tag="noun"]              All nouns starting with ž
[subc="diminutive"]                     All diminutives
[tag="noun"] []{0,1} [tag="adjective"]  All nouns followed by an adjective, with at most one token in between
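If you generate such queries from a script, a few string helpers are enough to assemble them. The following is a minimal sketch, not part of the TUNICO interface: the function names token, any_token and sequence are hypothetical, and the code only builds query strings in the syntax shown above without sending them anywhere.

```python
def token(**attrs):
    """Build one token expression, e.g. [lemma="ž.*" & tag="noun"].

    Attribute names (word, lemma, tag, subc) are passed through unchecked,
    and quotes inside values are not escaped in this sketch.
    """
    if not attrs:
        return "[]"
    parts = [f'{name}="{value}"' for name, value in attrs.items()]
    return "[" + " & ".join(parts) + "]"

def any_token(min_count, max_count):
    """Build a placeholder for between min_count and max_count arbitrary tokens."""
    return f"[]{{{min_count},{max_count}}}"

def sequence(*items):
    """Join token expressions into one query, separated by spaces."""
    return " ".join(items)

# Rebuild two of the example queries.
print(token(lemma="ž.*", tag="noun"))
print(sequence(token(tag="noun"), any_token(0, 1), token(tag="adjective")))
```

Building the string from parts keeps the brackets and quotes balanced, which is the most common source of errors when such queries are typed by hand.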