How to Build an English-Portuguese and Portuguese-English Collocations Dictionary
An English-Portuguese and Portuguese-English collocations dictionary is a valuable tool for language learners, translators, and linguists. A collocation is a combination of words that frequently occur together in a language, forming natural, native-like phrases. Understanding and using these combinations is crucial for fluency, because they reflect the patterns of speech and writing that native speakers actually use. Building such a dictionary requires a systematic approach: data collection, statistical analysis, and careful attention to linguistic principles. Here’s a detailed guide on how to build a collocations dictionary for both English and Portuguese.
Step 1: Understand What Collocations Are
Before diving into the creation of the dictionary, it is important to clarify what collocations are. Collocations are combinations of words that appear together more often than chance alone would predict. They are not random but follow established patterns in a language. For instance:
English Collocations:
Strong coffee (not "powerful coffee")
Make a decision (not "do a decision")
Fast food (not "quick food")
Portuguese Collocations:
Café forte (not "café poderoso")
Tomar uma decisão (not "fazer uma decisão")
Comida rápida (not "comida veloz")
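In lexicographic terminology, the word that carries the core meaning is usually called the "base" (or "node"), and the word that conventionally accompanies it is the "collocate." In "strong coffee," for example, "coffee" is the base and "strong" is the collocate. In English and Portuguese, collocations may involve adjectives, verbs, nouns, adverbs, and prepositions.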
Step 2: Data Collection
The first major step in building a collocations dictionary is to collect data. This can be done through various methods, including:
1. Corpus Compilation:
A corpus is a large, structured set of texts used for linguistic analysis. You can create or obtain a bilingual corpus of English and Portuguese texts that reflect a variety of registers, such as newspapers, novels, academic papers, and conversation transcripts. Many online linguistic databases offer freely available corpora (e.g., the British National Corpus for English and the Corpus Brasileiro for Portuguese).
Parallel corpora (texts available in both English and Portuguese) are particularly useful for identifying direct equivalents of collocations.
2. Using Pre-existing Dictionaries and Thesauri:
Refer to high-quality English-Portuguese and Portuguese-English dictionaries. While dictionaries may not explicitly list collocations, many include common phrases or idiomatic expressions.
Thesauri and collocation-specific dictionaries for both languages can be helpful in identifying word pairings that are frequently used together.
3. Web Scraping:
Web scraping tools can be used to gather textual data from websites, blogs, forums, and news sources in both languages. This process allows for the extraction of word combinations that occur naturally in authentic contexts.
4. Consulting Language Resources:
Online resources like Linguee and Reverso show how words are used in context, providing examples of common collocations. These can be great starting points.
5. Crowdsourcing:
If needed, you could employ language learners or native speakers to help identify common word pairings through surveys, questionnaires, or focus groups.
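As a rough illustration of the corpus side of data collection, the sketch below reads a line-aligned parallel corpus and tokenizes both sides. The tab-separated layout ("English<TAB>Portuguese" on each line), the helper names tokenize and read_parallel, and the sample sentences are assumptions made for this example, not a standard format.

```python
import re
from typing import Iterable, List, Tuple

# Runs of letters (Unicode-aware, so "decisão" stays whole), allowing
# internal hyphens or apostrophes as in "guarda-chuva" or "don't".
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def tokenize(sentence: str) -> List[str]:
    """Lowercase a sentence and split it into word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(sentence)]


def read_parallel(lines: Iterable[str]) -> List[Tuple[List[str], List[str]]]:
    """Parse 'English<TAB>Portuguese' lines into (en_tokens, pt_tokens) pairs.

    The tab-separated layout is an assumption of this sketch.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        en, pt = line.split("\t", 1)
        pairs.append((tokenize(en), tokenize(pt)))
    return pairs


# Hypothetical sample data; a real corpus would be read from a file.
sample = [
    "He made a quick decision about the meeting.\tEle tomou uma decisão rápida sobre a reunião.\n",
    "She drinks strong coffee.\tEla bebe café forte.\n",
]
corpus = read_parallel(sample)
```

For real files, the same function can be fed an open file handle, for example read_parallel(open(path, encoding="utf-8")), since it only needs an iterable of lines.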
Step 3: Collocation Extraction
Once you have a sufficient corpus, the next task is to extract collocations. This step typically involves the use of natural language processing (NLP) techniques to identify word pairings that appear together more often than by chance. Some methods to extract collocations include:
1. Frequency Analysis:
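Simple frequency analysis can help identify word pairs that frequently appear together. For example, if the word "strong" frequently pairs with "coffee" in the corpus, this suggests a collocation. Raw frequency alone tends to favor pairs of very common words (such as "of the"), so it is usually combined with the statistical measures described next.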
2. Statistical Methods:
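Use statistical association measures such as Mutual Information (MI), the t-score, or log-likelihood to quantify the strength of a collocation. These measures compare how often two words actually occur together with how often they would be expected to co-occur if they were independent, given their individual frequencies. MI tends to favor rare but exclusive pairings, whereas the t-score and log-likelihood favor pairs supported by more occurrences, so it helps to consult more than one measure.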
3. Co-occurrence and Window Analysis:
Co-occurrence refers to how often two words appear in the same context (or "window") in a sentence or text. A typical window size is defined as a certain number of words preceding and following a given word. This method helps extract combinations that occur within a narrow range of each other.
4. Contextual Analysis:
It’s crucial to verify that the identified pairs are indeed used as natural collocations. Some frequent word pairs are statistical coincidences or artifacts of the corpus rather than genuine collocations. Review the contexts in which the words appear together to confirm that they form a meaningful unit.
5. Manual Review and Refinement:
After extracting potential collocations, it is important to manually verify them. Not all statistically significant pairs will be actual collocations, and some may be domain-specific or not applicable in general use.
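The frequency, window, and statistical steps above can be sketched together in a few lines. This is a minimal illustration on a toy corpus, not a production extractor: the function names, the window size, and the min_count cutoff are assumptions, and the expected frequency is not adjusted for window size, which is a simplification.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_pairs(sentences: Iterable[List[str]], window: int = 2):
    """Count words and ordered pairs (w1, w2) where w2 follows w1
    within `window` tokens in the same sentence."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, w1 in enumerate(tokens):
            for w2 in tokens[i + 1 : i + 1 + window]:
                pair_counts[(w1, w2)] += 1
    return word_counts, pair_counts


def score_pairs(
    word_counts: Counter, pair_counts: Counter, min_count: int = 2
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Score each pair with MI and t-score.

    expected = f(w1) * f(w2) / N   (co-occurrences if independent)
    MI       = log2(observed / expected)
    t-score  = (observed - expected) / sqrt(observed)
    """
    n = sum(word_counts.values())
    scores = {}
    for (w1, w2), observed in pair_counts.items():
        if observed < min_count:
            continue  # too little evidence; MI is unreliable for rare pairs
        expected = word_counts[w1] * word_counts[w2] / n
        scores[(w1, w2)] = {
            "freq": observed,
            "mi": math.log2(observed / expected),
            "t": (observed - expected) / math.sqrt(observed),
        }
    return scores


# Toy corpus, invented for illustration only.
sentences = [
    "she drinks strong coffee every morning".split(),
    "he likes strong coffee".split(),
    "strong coffee keeps me awake".split(),
    "he made a decision".split(),
]
word_counts, pair_counts = count_pairs(sentences, window=2)
scores = score_pairs(word_counts, pair_counts, min_count=2)

# Candidates sorted by MI, ready for manual review.
for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]["mi"]):
    print(pair, s["freq"], round(s["mi"], 2), round(s["t"], 2))
```

The output of a run like this is only a list of candidates. The contextual analysis and manual review described above still decide which pairs enter the dictionary.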
Step 4: Creating the Dictionary
Once collocations have been extracted and validated, the next step is to organize and present them in a dictionary format. A useful collocations dictionary should:
1. Include both directions:
For each collocation, you’ll need to list both the English-to-Portuguese and Portuguese-to-English forms. This allows users to see how words combine in both languages.
2. Provide clear definitions:
For each collocation, provide a clear, concise definition. In the case of idiomatic or figurative expressions, explain the meaning and usage in both languages.
3. Include Contextual Examples:
Provide sample sentences or short paragraphs showing how the collocations are used in context. For example:
English: "He made a quick decision about the meeting."
Portuguese: "Ele tomou uma decisão rápida sobre a reunião."
4. Consider Variations:
Be mindful of regional variations. For example, certain collocations might be common in European Portuguese but not in Brazilian Portuguese, or vice versa. Likewise, British and American English may have different common collocations.
5. Organize by Type:
Classify the collocations by the grammatical category they belong to, such as verb-noun (e.g., “make a decision” / “tomar uma decisão”), adjective-noun (e.g., “strong coffee” / “café forte”), or adverb-verb (e.g., “highly recommend” / “recomendar vivamente”). Note that word order can differ between the two languages, as in the last two examples.
6. Include Collocational Strength:
For each pair, indicate the strength of the collocation. This could be labeled as "strong," "medium," or "weak," or based on statistical measures (like MI score).
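One possible way to hold the fields listed above is a small record type plus one index per direction. The sketch below is only an illustration: the Entry class, its field names, the MI thresholds behind the "strong"/"medium"/"weak" labels, and the MI values in the sample entries are all hypothetical and would need to be calibrated against your own corpus.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Iterable, List, Tuple


@dataclass
class Entry:
    en: str                 # English collocation
    pt: str                 # Portuguese equivalent
    pattern: str            # grammatical type, e.g. "verb-noun"
    mi: float               # association score from the extraction step
    example_en: str = ""
    example_pt: str = ""
    variety: str = ""       # e.g. "pt-BR", "pt-PT", "en-GB"; empty if general

    @property
    def strength(self) -> str:
        # Hypothetical cutoffs, chosen only for this example.
        if self.mi >= 7.0:
            return "strong"
        if self.mi >= 3.0:
            return "medium"
        return "weak"


Index = DefaultDict[str, List[Entry]]


def build_indexes(entries: Iterable[Entry]) -> Tuple[Index, Index]:
    """Build English->Portuguese and Portuguese->English lookup tables."""
    en_index: Index = defaultdict(list)
    pt_index: Index = defaultdict(list)
    for e in entries:
        en_index[e.en.lower()].append(e)
        pt_index[e.pt.lower()].append(e)
    return en_index, pt_index


# Sample entries; the MI values are made up for illustration.
entries = [
    Entry(
        "make a decision", "tomar uma decisão", "verb-noun", 7.4,
        example_en="He made a quick decision about the meeting.",
        example_pt="Ele tomou uma decisão rápida sobre a reunião.",
    ),
    Entry("strong coffee", "café forte", "adjective-noun", 5.1),
]
en_index, pt_index = build_indexes(entries)

hit = pt_index["café forte"][0]
print(hit.en, "|", hit.pattern, "|", hit.strength)
```

Keeping a single entry object in both indexes means that a correction to an example sentence or label shows up in both directions of the dictionary.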
Step 5: Integration of Language Learning Tools
To make the dictionary more user-friendly and helpful for language learners, consider integrating additional features:
1. Frequency Information:
Indicate the frequency of each collocation in real-world usage. For instance, you can rank collocations by how often they appear in the corpus, helping learners prioritize the most useful ones.
2. Phonetic Transcriptions:
Provide phonetic transcriptions for the English collocations, especially for learners who are trying to improve their pronunciation.
3. Language Learning Context:
Offer contextual notes, such as whether a collocation is formal, informal, technical, or slang. This helps learners understand where and when they can use specific phrases.
4. Interactive Features:
If the dictionary is digital, you can include interactive elements such as quizzes, exercises, and spaced repetition systems to help learners internalize the collocations.
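For the spaced repetition idea, one simple scheme is a Leitner-style box system: a correct answer moves a card to a box with a longer review interval, and a wrong answer sends it back to the first box. The sketch below assumes this scheme; the intervals, the Card class, and the function names are illustrative choices, not part of any particular learning platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List

# Days to wait before the next review, per box (illustrative values).
INTERVALS = {1: 1, 2: 3, 3: 7, 4: 14, 5: 30}


@dataclass
class Card:
    prompt: str             # e.g. the English collocation
    answer: str             # e.g. the Portuguese equivalent
    box: int = 1
    due: date = date.min    # new cards are due immediately


def review(card: Card, correct: bool, today: date) -> Card:
    """Move the card up one box if correct, back to box 1 if not,
    then schedule the next review from `today`."""
    card.box = min(card.box + 1, max(INTERVALS)) if correct else 1
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards: Iterable[Card], today: date) -> List[Card]:
    """Return the cards whose review date has arrived."""
    return [c for c in cards if c.due <= today]


cards = [
    Card("make a decision", "tomar uma decisão"),
    Card("strong coffee", "café forte"),
]
today = date(2024, 1, 1)
review(cards[0], correct=True, today=today)    # moves to box 2
review(cards[1], correct=False, today=today)   # stays in box 1
print([(c.prompt, c.box, c.due.isoformat()) for c in cards])
```

Frequency information from the corpus could be used to decide which collocations become cards first, so that learners meet the most common pairings early.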
Step 6: Testing and Updating
Once the dictionary is created, test it by asking native speakers or advanced learners to use it. They can provide feedback on whether the collocations are accurate, useful, and easy to understand. This testing phase ensures that your dictionary is as practical and accurate as possible.
Furthermore, language usage evolves over time. New collocations emerge, and old ones may fall out of use. Therefore, it’s important to periodically update the dictionary with fresh data from new corpora, online sources, or linguistic research.
Conclusion
Building an English-Portuguese and Portuguese-English collocations dictionary is a complex but rewarding process. It requires combining linguistic expertise with technology and data analysis to identify natural and frequent word pairings. By following a systematic approach to data collection, extraction, and presentation, you can create a valuable resource that will aid language learners, translators, and professionals in understanding the subtleties of both languages. Ultimately, a well-constructed collocations dictionary improves the ability to speak and write more naturally, helping learners sound more like native speakers and understand the nuances of word combinations in context.