DUPLEX - Doubles and Expletives in European Portuguese Dialect Syntax
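Project description: DUPLEX is a three-year project aimed at promoting the study of European Portuguese dialect syntax by means of a twofold approach: the syntactic annotation of a dialect corpus and an in-depth study of doubling and expletive constructions.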
The project extends the research developed within the projects CORDIAL-SIN (PRAXIS XXI/P/PLP/113046/1998), CORDIAL-SIN-2 (POSI/1999/PLP/33275) and Dialectal Syntax (POCTI/LIN/46980/2002) in two ways. First, it enhances the compiled dialectal corpus (CORDIAL-SIN), which will be provided with sentence-based annotation for syntactic structure and thus become a more efficient resource for studying syntax. Second, it undertakes an in-depth study of a coherent selection of topics previously identified as relevant in the domain of Portuguese dialect syntax. The investigated topics are also highly significant for cross-linguistic dialect syntax: syntactic doubling in particular is among the central concerns of the supranational ESF-funded project European Dialect Syntax (EDISYN). DUPLEX will thus substantially improve the Portuguese participation in this European project on dialect syntax (running from September 2005 until September 2010).

The compilation of the Syntax-Oriented Dialectal Corpus (CORDIAL-SIN) was completed in 2007. This 500,000-word corpus is based on a geographically representative body of selected excerpts of spontaneous and semi-directed speech, drawn from the rich collection of recorded speech gathered by the Variation Group of CLUL. The corpus is available online in three formats: verbatim transcripts, normalized transcripts, and part-of-speech tagged files. One of the aims of the proposed project is to make additionally available a parsed version of CORDIAL-SIN, which will allow searching not only for words or word sequences but also for syntactic structure.

The syntactic annotation is implemented over the part-of-speech tagged texts. The annotation system has already been set up in collaboration with other research groups engaged in building syntactically annotated corpora, namely the Tycho Brahe project and the Penn-Helsinki Parsed Corpus of Middle English (PPCME2) (cf. the preliminary version of the CORDIAL-SIN Syntactic Annotation Manual). The chosen annotation system thus profits from the tools and guidelines for syntactic annotation developed by these projects. CORDIAL-SIN part-of-speech tagged texts are run through the Multilingual Statistical Parsing Engine designed by D. Bikel. Human editing then corrects the output of the parser and adds information to it. The syntactic annotation results in a tree representation in the form of labeled brackets, marking constituent boundaries, phrase and clause dependencies, sentence types, grammatical relations and certain transformational relations. Complete and automatic searching for predefined syntactic configurations is enabled by the already available search engine used within PPCME2.

The results of the syntactic annotation support the parallel aim of DUPLEX by providing a solid empirical ground for the research on doubling and expletive constructions in Portuguese dialect syntax. ("Doubling" and "expletive" are understood here in their broad senses: the former is not strictly limited to double instances of the very same unit, and the latter refers broadly to any sort of semantically vacuous element.) Phenomena such as complementizer doubling, doubling of focus elements, clitic duplication and subject doubling, as well as the use of different expletive words (even when no doubling appears to be involved), are far more pervasive in dialects than in the standard variety. For any cross-linguistically grounded syntactic theory, such phenomena represent an important challenge: since both doubling and expletive constructions involve some semantically vacuous element, it is conceivable that they rely on strictly syntactic properties, thus providing important clues about the structure of language. Differences within this concerted area of focus should shed some light on the nature of linguistic variation.
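As an illustration of the labeled-bracket tree representation mentioned in the description of the annotation, the following minimal Python sketch reads one bracketed tree and looks up constituents by label. It is not the project's parser or the search engine used within PPCME2; the function names, the toy sentence and the labels (`IP`, `NP`, `PRO`, `VP`, `VB`) are assumptions chosen for illustration and are not taken from the CORDIAL-SIN Syntactic Annotation Manual.

```python
# Illustrative sketch only: the labels and the toy sentence are hypothetical,
# not drawn from the CORDIAL-SIN annotation scheme.

def parse_brackets(text):
    """Parse one labeled-bracket tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # terminal word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def find_label(tree, label):
    """Return every subtree (including the root) whose label matches."""
    node_label, children = tree
    found = [tree] if node_label == label else []
    for child in children:
        if isinstance(child, tuple):
            found.extend(find_label(child, label))
    return found


def words(tree):
    """Collect the terminal words of a tree, left to right."""
    out = []
    for child in tree[1]:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out


sample = "(IP (NP (PRO ele)) (VP (VB chove)))"   # hypothetical annotation
tree = parse_brackets(sample)
subjects = find_label(tree, "NP")
print([words(np) for np in subjects])
```

A search over syntactic structure, as opposed to a search over word sequences, amounts to matching on the labels and nesting of such trees; `find_label` shows only the simplest case, a lookup of all constituents carrying a given label.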
2012 • CLUL - Centro de Linguística da Universidade de Lisboa