Online queries to CRPC subcorpora (corpus query tool CONCOR)


The following subcorpora of CRPC are available for online queries. They can be searched as a single corpus or as partial subcorpora according to text type.
To search the European Corpus click here.
To search the African Corpus click here.

Corpora of European Portuguese

1) ELAN Corpus: 2.840.552 words

ELAN Corpus (ELAN - European Language Activity Network)

corpus_ELAN                                         Number of words
newspaper (jornal_ELAN)                             1.878.156
technical and scientific book (livrotec_ELAN)       510.562
periodical (revista_ELAN)                           262.465
miscellaneous (varia_ELAN)                          189.356
Total                                               2.840.552

2) RL Corpus: 8.670.438 words

Non-annotated RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

corpus                                              Number of words
spoken corpus (corpus_oral_RL)                      105.964
written corpus (corpus_escrito_RL)                  8.564.474
  newspaper (jornal RL)                             4.097.868
  fiction book (livrolit RL)                        1.792.590
  technical and scientific book (livrotec RL)       1.440.625
  periodical (revista RL)                           420.792
  miscellaneous (varia RL)                          812.599
Total (spoken + written)                            8.670.438
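The RL figures can be cross-checked with a few lines of Python. This is only a minimal sketch: the helper names parse_count and format_count are hypothetical, and the counts are copied from the RL table, where "." is the thousands separator.

```python
def parse_count(text: str) -> int:
    """Convert a count written with '.' as thousands separator to an int."""
    return int(text.replace(".", ""))


def format_count(n: int) -> str:
    """Format an int back into the dotted style used on this page."""
    return f"{n:,}".replace(",", ".")


# Written subcorpora of the RL Corpus, as listed in the table.
written_parts = {
    "jornal RL": "4.097.868",
    "livrolit RL": "1.792.590",
    "livrotec RL": "1.440.625",
    "revista RL": "420.792",
    "varia RL": "812.599",
}

written_total = sum(parse_count(v) for v in written_parts.values())
spoken_total = parse_count("105.964")
rl_total = spoken_total + written_total

print(format_count(written_total))  # 8.564.474
print(format_count(rl_total))       # 8.670.438
```

Both printed values match the written-corpus figure and the overall total given for the RL Corpus.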

3) ELAN + RL Corpora: 11.405.026 words

ELAN Corpus (ELAN - European Language Activity Network) + Non-annotated corpus RL (Language Resources for Portuguese: a corpus and tools for query and analysis)

corpus_RL_ELAN                                      Number of words
newspaper (jornal_RL_ELAN)                          5.976.024
technical and scientific book (livrotec_RL_ELAN)    1.951.187
periodical (revista_RL_ELAN)                        683.257
miscellaneous (varia_RL_ELAN)                       1.001.955

Tagged Corpus of European Portuguese

4) RL tagged Corpus: 501.042 words (annotation manual)

Tagged Corpus RL (Language Resources for Portuguese: a corpus and tools for query and analysis)

Tagged corpus (corpus_anotado_RL)                   Number of words
newspaper (jornal_anotado_RL)                       336.151
periodical (revista_anotado_RL)                     25.908
book (livro_anotado_RL)                             125.434
miscellaneous (varia_anotado_RL)                    13.549
Total                                               501.042




Files that were tagged automatically with no manual revision (e.g. jornal_anot_auto_RL) and files that were manually revised (e.g. jornal_anot_rev_man_RL) can also be queried separately:

newspaper (jornal_anot_auto_RL)                     184.418
book (livro_anot_auto_RL)                           60.344
periodical (revista_anot_auto_RL)                   18.914
miscellaneous (varia_anot_auto_RL)                  8.273

newspaper (jornal_anot_rev_man_RL)                  184.131
book (livro_anot_rev_man_RL)                        63.264
periodical (revista_anot_rev_man_RL)                15.328
miscellaneous (varia_anot_rev_man_RL)               8.319

To query a word in the tagged corpus, enter either the lemma or the word form together with its tag (e.g. conta/nc; conta/vpi).
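The word form and tag pattern in these examples can be sketched in Python. The helpers tagged_query and split_query are hypothetical and do not call CONCOR; they only build and take apart strings of the form word/tag, assuming, as the examples suggest, that "/" is the separator. The tags nc and vpi are taken from the examples above.

```python
from typing import Optional, Tuple


def tagged_query(word_form: str, tag: str) -> str:
    """Join a word form and a tag with '/', as in conta/nc and conta/vpi."""
    if "/" in word_form or "/" in tag:
        raise ValueError("word form and tag must not contain '/'")
    return f"{word_form}/{tag}"


def split_query(query: str) -> Tuple[str, Optional[str]]:
    """Split a query into (word, tag); tag is None when no '/' is present."""
    word, sep, tag = query.partition("/")
    return (word, tag if sep else None)


queries = [tagged_query("conta", "nc"), tagged_query("conta", "vpi")]
print(queries)               # ['conta/nc', 'conta/vpi']
print(split_query("conta"))  # ('conta', None)
```

A query without a tag, such as a bare lemma, comes back with None in the tag position, so the two query styles can be told apart.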

 

Corpus of African Portuguese Varieties

5) AFRICA Corpus: 3.000.000 words

AFRICA Corpus (Linguistic Resources for the Study of African Varieties of Portuguese)

Countries                 Spoken corpus     Written corpus
Angola                    27.363            613.495
Cape Verde                25.413            612.120
Guinea-Bissau             25.016            615.404
Mozambique                26.166            615.297
Sao Tome and Principe     25.287            614.563
Total                     129.245           3.070.879
Total of both corpora                       3.200.124

 



Last Updated on Wednesday, 13 October 2010 11:58  

