The following subcorpora of the CRPC are available for online queries. They can be searched either together as a single corpus or separately, as subcorpora defined by text type.
Corpora of European Portuguese
1) ELAN Corpus: 2.840.552 words
ELAN Corpus (ELAN - European Language Activity Network)
| corpus_ELAN | Number of words |
| newspaper (jornal_ELAN) | 1.878.156 |
| technical and scientific book (livrotec_ELAN) | 510.562 |
| periodical (revista_ELAN) | 262.465 |
| miscellaneous (varia_ELAN) | 189.356 |
| Total | 2.840.552 |
2) RL Corpus: 8.670.438 words
Non-annotated RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)
| corpus | Number of words |
| spoken corpus (corpus_oral_RL) | 105.964 |
| written corpus (corpus_escrito_RL) | 8.564.474 |
| written: newspaper (jornal RL) | 4.097.868 |
| written: fiction book (livrolit RL) | 1.792.590 |
| written: technical and scientific book (livrotec RL) | 1.440.625 |
| written: periodical (revista RL) | 420.792 |
| written: miscellaneous (varia RL) | 812.599 |
| Total (spoken + written) | 8.670.438 |
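The RL figures nest: the five written text types add up to the written corpus, and the spoken and written corpora together give the RL total. The short Python sketch below checks this with the counts copied from the table; the variable names are illustrative only.

```python
# Word counts copied from the RL table; variable names are illustrative.
written_by_type = {
    "jornal RL": 4_097_868,
    "livrolit RL": 1_792_590,
    "livrotec RL": 1_440_625,
    "revista RL": 420_792,
    "varia RL": 812_599,
}
spoken = 105_964  # corpus_oral_RL

# The written subcorpora should sum to corpus_escrito_RL,
# and spoken + written should give the overall RL total.
written = sum(written_by_type.values())
total = spoken + written
print(f"written: {written:,}  total: {total:,}")
```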
3) ELAN + RL Corpora: 11.405.026 words
ELAN Corpus (ELAN - European Language Activity Network) + Non-annotated corpus RL (Language Resources for Portuguese: a corpus and tools for query and analysis)
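For each text type found in both corpora, the word count of the combined corpus is the sum of the matching ELAN and RL counts given above (for example, 1.878.156 + 4.097.868 = 5.976.024 for newspapers). The Python sketch below reproduces the per-type figures; the dictionary names and the `combine` helper are illustrative, and RL fiction (livrolit RL), which has no ELAN counterpart, drops out of the intersection.

```python
# Word counts per text type, copied from the ELAN and RL tables.
# The dictionary names and the combine() helper are illustrative only.
ELAN = {"jornal": 1_878_156, "livrotec": 510_562, "revista": 262_465, "varia": 189_356}
RL = {"jornal": 4_097_868, "livrolit": 1_792_590, "livrotec": 1_440_625,
      "revista": 420_792, "varia": 812_599}


def combine(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Sum the word counts of the text types present in both corpora."""
    return {t: a[t] + b[t] for t in sorted(a.keys() & b.keys())}


combined = combine(ELAN, RL)
for text_type, words in combined.items():
    print(f"{text_type}_RL_ELAN: {words:,}")
```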
| corpus_RL_ELAN | Number of words |
| newspaper (jornal_RL_ELAN) | 5.976.024 |
| technical and scientific book (livrotec_RL_ELAN) | 1.951.187 |
| periodical (revista_RL_ELAN) | 683.257 |
| miscellaneous (varia_RL_ELAN) | 1.001.955 |
Tagged Corpus of European Portuguese
4) RL Tagged Corpus: 501.042 words (annotation manual)
Tagged Corpus RL (Language Resources for Portuguese: a corpus and tools for query and analysis)
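As the note at the end of this section explains, a word in the tagged corpus can be queried either by lemma or by word form plus tag (e.g. conta/nc, conta/vpi). The Python sketch below only illustrates that distinction on a toy token list: the `Token` triple, the sample tokens and the `parse_query` and `matches` helpers are hypothetical and do not describe how the CRPC query interface is implemented.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    """A hypothetical tagged token: surface form, lemma and tag."""
    form: str
    lemma: str
    tag: str


def parse_query(query: str) -> tuple[str, Optional[str]]:
    """Split 'form/tag' into (form, tag); a bare word gives (word, None)."""
    word, sep, tag = query.partition("/")
    return word, (tag if sep else None)


def matches(token: Token, query: str) -> bool:
    """A bare word is matched against the lemma; 'form/tag' needs both to match."""
    word, tag = parse_query(query)
    if tag is None:
        return token.lemma == word
    return token.form == word and token.tag == tag


# Hypothetical sample: 'conta' as a noun and as a form of the verb 'contar'.
tokens = [
    Token("conta", "conta", "nc"),
    Token("conta", "contar", "vpi"),
    Token("contas", "conta", "nc"),
]

print([t.lemma for t in tokens if matches(t, "conta/vpi")])  # verb reading only
print([t.form for t in tokens if matches(t, "conta")])       # all forms of lemma 'conta'
```

Querying by form plus tag singles out one reading of an ambiguous form, while a lemma query collects every inflected form of that lemma.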
| Tagged corpus (corpus_anotado_RL) | Number of words |
| newspaper (jornal_anotado_RL) | 336.151 |
| periodical (revista_anotado_RL) | 25.908 |
| book (livro_anotado_RL) | 125.434 |
| miscellaneous (varia_anotado_RL) | 13.549 |
| Total | 501.042 |
Files that were tagged automatically, with no manual revision (e.g. jornal_anot_auto_RL), and files whose tagging was manually revised (e.g. jornal_anot_rev_man_RL) can also be queried separately:
| subcorpus | Number of words |
| newspaper (jornal_anot_auto_RL) | 184.418 |
| book (livro_anot_auto_RL) | 60.344 |
| periodical (revista_anot_auto_RL) | 18.914 |
| miscellaneous (varia_anot_auto_RL) | 8.273 |
| newspaper (jornal_anot_rev_man_RL) | 184.131 |
| book (livro_anot_rev_man_RL) | 63.264 |
| periodical (revista_anot_rev_man_RL) | 15.328 |
| miscellaneous (varia_anot_rev_man_RL) | 8.319 |
To query a word in the tagged corpus, search either for its lemma or for the word form together with its tag (e.g. conta/nc; conta/vpi).
Corpus of African Portuguese Varieties
5) AFRICA Corpus: 3.000.000 words
AFRICA Corpus (Linguistic Resources for the Study of African Varieties of Portuguese)
| Countries | Spoken corpus | Written corpus |
| Angola | 27.363 | 613.495 |
| Cape Verde | 25.413 | 612.120 |
| Guinea-Bissau | 25.016 | 615.404 |
| Mozambique | 26.166 | 615.297 |
| São Tomé and Príncipe | 25.287 | 614.563 |
| Total | 129.245 | 3.070.879 |
| Total of both corpora (spoken + written) | 3.200.124 |
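The column totals, and their combined sum, can be recomputed from the per-country figures. The Python sketch below does this with the counts copied from the table; the dictionary name is illustrative only.

```python
# Per-country word counts copied from the AFRICA table: (spoken, written).
AFRICA = {
    "Angola": (27_363, 613_495),
    "Cape Verde": (25_413, 612_120),
    "Guinea-Bissau": (25_016, 615_404),
    "Mozambique": (26_166, 615_297),
    "São Tomé and Príncipe": (25_287, 614_563),
}

# Column totals, then the total of both corpora.
spoken = sum(s for s, _ in AFRICA.values())
written = sum(w for _, w in AFRICA.values())
print(f"spoken: {spoken:,}  written: {written:,}  both: {spoken + written:,}")
```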