Search

Corpus search - CONCOR program

Corpus selection:

Word selection (separated by space or line; all characters after # are ignored):
Consider word as lemma (except the ones preceded by _)

Concordances | Frequencies in alphabetical order | Frequencies in numerical order

Sorting (e.g. 0,1,-2):

Number of lines of context (0 results in one line per context):

Number of columns of context:     Number of columns of reference:


How to query the corpus?
The CONCOR tool, a corpus query tool, allows the user to define several aspects of the query:

Word selection
CONCOR allows searching for a lexical form or a list of lexical forms that occur in the corpus. To do so, enter the chosen forms in the "Word selection" box.

Azul
  or  
Azul
Amarelo
Verde
  or  
Azul  Amarelo  Verde
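
The accepted input layouts can be illustrated with a short Python sketch. It is not CONCOR's actual code: the function name `parse_word_selection` is hypothetical, and it assumes only what the form label states, namely that words are separated by spaces or line breaks and that everything after `#` on a line is ignored.

```python
def parse_word_selection(text):
    """Split the contents of a word-selection box into individual words.

    Assumptions (taken from the form label, not from CONCOR's source):
    words are separated by spaces or line breaks, and on each line
    everything after '#' is ignored.
    """
    words = []
    for line in text.splitlines():
        # Drop the comment part of the line, if any.
        line = line.split("#", 1)[0]
        # str.split() with no argument splits on any run of whitespace.
        words.extend(line.split())
    return words


print(parse_word_selection("Azul"))                     # ['Azul']
print(parse_word_selection("Azul\nAmarelo\nVerde"))     # ['Azul', 'Amarelo', 'Verde']
print(parse_word_selection("Azul  Amarelo  # ignored")) # ['Azul', 'Amarelo']
```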


The CONCOR tool can also search for all the inflected forms of a word (lemma) that occur in the corpus. To do so, it is necessary to:
1) enter the headword in the box (the dictionary entry form: infinitive for verbs, masculine singular for nouns and adjectives, and so on)
2) activate the "Consider word as lemma" option.
Thus, entering the word "bonito" in the box with the "Consider word as lemma" option activated extracts the concordances or frequencies of the forms "bonito", "bonita", "bonitos", "bonitas", "bonitinho", and so on. If the option is not activated, the CONCOR tool returns only the results for the form "bonito".
It is also possible to query word forms and lemmas in the same list. To do so, put the character '_' before each word that should not be treated according to the selected option. For example, with the option activated and the list "ler", "bonito" and "_escreva" in the box, the results consist of the concordances or frequencies of all the forms of the lemmas "ler" and "bonito" and of just the form "escreva".
It is also possible to copy a list of words to search for from a file and paste it into the box.
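
The role of the '_' marker can be sketched in Python as follows. This is only an illustration, assuming the "Consider word as lemma" option is activated; the helper `split_lemma_query` is hypothetical and does not reproduce CONCOR's internals or its lemmatiser.

```python
def split_lemma_query(words):
    """Separate a word list into lemma searches and exact-form searches.

    Assumes the "Consider word as lemma" option is ON: every word is a
    lemma unless it is preceded by '_', which marks an exact form.
    """
    lemmas, forms = [], []
    for word in words:
        if word.startswith("_"):
            forms.append(word[1:])  # strip the marker, search this form only
        else:
            lemmas.append(word)     # search all inflected forms of the lemma
    return lemmas, forms


lemmas, forms = split_lemma_query(["ler", "bonito", "_escreva"])
print(lemmas)  # ['ler', 'bonito']
print(forms)   # ['escreva']
```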

Note: Since the lemmatiser is still being improved, the results of a lemma search may not include some word forms (namely, verbal forms with clitics, diminutives, and so on).

Concordances or frequencies
The CONCOR tool can provide two types of results:
- concordances: the set of contexts in which the lexical forms occur. These forms are always presented in the center of the context.
- frequencies: the occurrence frequency of the lexical forms in the corpus. The frequency results can be presented in alphabetical order or in decreasing numerical order.
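
The two frequency orderings can be illustrated with a minimal Python sketch. The token list is a toy example rather than corpus data, the function name `frequencies` is hypothetical, and exact (case-sensitive) matching is an assumption.

```python
from collections import Counter


def frequencies(tokens, selected):
    """Count how often each selected form occurs in a list of tokens."""
    wanted = set(selected)
    return Counter(token for token in tokens if token in wanted)


# Toy token list, for illustration only.
tokens = "verde azul verde amarelo verde azul".split()
counts = frequencies(tokens, ["azul", "amarelo", "verde"])

# Alphabetical order of the forms.
by_alpha = sorted(counts.items())
# Decreasing numerical order; ties are broken alphabetically.
by_count = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(by_alpha)  # [('amarelo', 1), ('azul', 2), ('verde', 3)]
print(by_count)  # [('verde', 3), ('azul', 2), ('amarelo', 1)]
```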

Sorting
Concordances can be sorted alphabetically by the chosen word or by the words that occur to its left or right. The sorting options are expressed as numbers, for instance:

0  alphabetic order of the chosen word
1  alphabetic order of the first word to the right
2  alphabetic order of the second word to the right
-1  alphabetic order of the first word to the left
-2  alphabetic order of the second word to the left
and so on.
It is possible to combine these options, for instance:
1, -2 alphabetic order of the first word to the right and, on a second level, alphabetic order of the second word to the left.
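
A combined sorting option such as "1, -2" can be read as a list of position offsets relative to the chosen word. The Python sketch below illustrates that reading; it is not CONCOR's implementation, the names `parse_sort_keys` and `sort_hits` are hypothetical, and the comparison is Python's plain string order on lowercased words rather than a Portuguese collation.

```python
def parse_sort_keys(option):
    """Turn a sorting option such as '1, -2' into a list of offsets."""
    return [int(part) for part in option.split(",")]


def sort_hits(tokens, hits, keys):
    """Sort occurrence positions by the words found at the given offsets.

    tokens: the text as a list of words
    hits:   indices in `tokens` where the chosen word occurs
    keys:   offsets (0 = chosen word, 1 = first word to the right,
            -1 = first word to the left, ...), in order of priority
    """
    def word_at(index):
        # Positions outside the text sort first, as an empty string.
        return tokens[index].lower() if 0 <= index < len(tokens) else ""

    return sorted(hits, key=lambda hit: tuple(word_at(hit + k) for k in keys))


# Toy text, for illustration only.
tokens = "a casa azul e o mar azul e um carro azul claro".split()
hits = [i for i, token in enumerate(tokens) if token == "azul"]  # [2, 6, 10]

# First word to the right, then second word to the left.
print(sort_hits(tokens, hits, parse_sort_keys("1, -2")))  # [10, 2, 6]
```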

Number of lines of context
By default, the CONCOR tool provides contexts with one line before and one line after the chosen word. Nevertheless, the user can define the context length:

0  one line of context with the chosen word at its center
1  one line before and one line after the chosen word
and so on, up to the limit of 5 lines before and after the chosen word.
Except with option 0, the CONCOR tool also provides: the frequency of the form or lemma; the sequence number of the occurrence in the corpus, according to the selected sorting; and the bibliographic reference code.
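
For options 1 to 5, the context can be pictured as a window of lines around the line that contains the chosen word. The following Python sketch assumes the text is available as a list of lines; the function `context_lines` is hypothetical and covers only the line window, not the frequency, sequence number or reference code.

```python
MAX_CONTEXT_LINES = 5  # limit stated in this help text


def context_lines(lines, hit_line, n=1):
    """Return the line containing the hit plus n lines before and after it."""
    if not 1 <= n <= MAX_CONTEXT_LINES:
        # Option 0 is a single centered line and is not handled here.
        raise ValueError("n must be between 1 and 5")
    start = max(0, hit_line - n)  # do not run past the start of the text
    return lines[start:hit_line + n + 1]


# Toy text, for illustration only.
lines = [f"line {i}" for i in range(10)]
print(context_lines(lines, 4))     # ['line 3', 'line 4', 'line 5']
print(context_lines(lines, 0, 2))  # ['line 0', 'line 1', 'line 2']
```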

Number of columns (characters) of context
When option 0 is selected, the context line has 80 characters by default. However, a longer line can be defined by entering another number of characters: 100, 124, etc.
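
A one-line context with the chosen word at its center can be sketched in Python as below. This is an assumption-laden illustration: the function `kwic_line` is hypothetical, and cutting the left and right context at a fixed number of characters (possibly in the middle of a word) is a guess at the layout, not documented CONCOR behavior.

```python
def kwic_line(tokens, hit, width=80):
    """Build a single context line of at most `width` characters.

    The chosen word (tokens[hit]) is placed in the middle, with the same
    number of characters reserved for the left and right context.
    """
    keyword = tokens[hit]
    # Characters available on each side, leaving one space around the keyword.
    side = max(0, (width - len(keyword) - 2) // 2)
    left = " ".join(tokens[:hit])
    right = " ".join(tokens[hit + 1:])
    left = left[-side:] if side else ""  # keep the text nearest the keyword
    right = right[:side]
    return f"{left.rjust(side)} {keyword} {right.ljust(side)}"


# Toy text, for illustration only.
tokens = "a casa azul e o mar azul e um carro azul claro".split()
line = kwic_line(tokens, 6, width=40)
print(repr(line))  # 'casa azul e o mar azul e um carro azul c'
```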

Number of columns (characters) of reference
When option 0 is selected, the bibliographic reference code is shown only if a number of characters is entered for the number of columns of reference.




  2012  •  CLUL - Centro de Linguística da Universidade de Lisboa  •   Copyright   •  Webmaster  •   Contacts   •  Design: Plasma