The Banque de Données Langue Corse (Corsican Language Database, or BDLC) is a scientific tool designed to illustrate and study the variation of the Corsican language in space (including the Corsican dialect of Gallura and the alloglot Ligurian-speaking community of Bunifaziu) and in time. The data it contains were collected from native speakers during field surveys.

Scientific officers: Marie-José Dalbera-Stefanaggi & Stella Retali-Medori

Brief history

The BDLC (Corsican Language Database) was created in 1986. From the outset it has been linked to the programme for the Nouvel Atlas Linguistique et Ethnographique de la Corse (New Linguistic and Ethnographic Atlas of Corsica, or NALC), a project entrusted to the University of Corsica by the CNRS in 1981. In 2006, work also began on a collection built around the NALC-BDLC programme, entitled Detti è usi di paesi, Matériaux et analyses extraits de la Banque de Données Langue Corse (materials and analyses extracted from the BDLC). A simplified version of the BDLC was made available online in 2009 thanks to funding from UMR LISA 6240.

Methodology and goals

Participants in the NALC-BDLC programme collect linguistic data on Corsican know-how and cultural traditions throughout the island, as well as in La Maddalena and Gallura (Sardinia). Fieldworkers conduct interviews using topic-based questionnaires. The people they question are regarded as custodians of inherited (that is, naturally acquired) varieties of the language. In addition, respondents have themselves practised the activities covered by the questions.

The interviews are then processed: lexical entries are transcribed in both phonetic and written form, and texts are transcribed in written form. All transcriptions aim to reflect the recorded forms as closely as possible. The data are then entered into the database, where they undergo an initial lexical analysis.

The entries are classified by topic and by location, then made available online, where they can be viewed as lists or maps (see below). As of 2020, around 120,000 lexical entries are accessible online, along with iconographic documents, and more than a thousand ethno-texts are available to the public. Other surveys, whether completed or still under way, are gradually being processed and will be made available in due course. The ethno-texts and iconographic documents are a source of high ethnographic value and illustrate the link between language and culture.

It should also be stressed that the approach of the NALC-BDLC programme, which aims to collect, preserve and disseminate linguistic and ethnographic data, complies with UNESCO's recommendations on safeguarding the intangible cultural heritage of humanity. At a local level, this work also responds to a political and social demand for knowledge and re-appropriation of the language.

The NALC-BDLC programme therefore plays a key role in preserving memory and heritage. Above all, however, it is a research tool for the language sciences: NALC and BDLC data feed substantial scientific output, both in our laboratory and beyond. One of the most notable achievements of the NALC and the BDLC is their fundamental contribution to the description of the Corsican language. According to a well-established principle of geolinguistics, the synchronic variation of a language across space generally reflects its evolution over time. Research based on these data has thus produced descriptions of how Corsican has evolved from Latin in its phonetics, its morphology and its lexicon. Lexical variation not only sheds light on how history has left its mark on the language; it also reveals deeper lexical processes, through the images underlying the formation of vocabulary and through changes in meaning. Analyses of these phenomena shed light on how Corsican works, and thus lay the foundations for a language in the making.

International outreach

The NALC-BDLC programme contributes to several larger projects:

  • the Atlas Linguistique des Côtes de l’Arc Nord Occidental de la Méditerranée (Linguistic Atlas of the Coasts of the North-Western Arc of the Mediterranean, or ALCANOM)
  • the Atlas Linguistique Roman (Romance Linguistic Atlas, or ALiR)
  • the Atlas Linguarum Europae (Linguistic Atlas of Europe or ALE)
  • the Atlante Linguistico Mediterraneo (Mediterranean Linguistic Atlas or ALM).

Participation in these transnational atlases involves processing the data through etymological and motivational analysis, not merely transmitting raw facts.

NALC and BDLC materials are also incorporated into Italo-Romance and Romance lexicographical works, in particular the Dictionnaire Dialectal et Étymologique des Parlers Corses (Dialectal and Etymological Dictionary of Corsican Dialects), currently in preparation. They also feed into the development of Natural Language Processing (NLP) for Corsican (CPER funding, 2020-2022).