Practical Corpus Linguistics by Martin Weisser

Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis by Martin Weisser

Practical Corpus Linguistics is a great introduction to analyzing language data, with hands-on exercises using free software and websites. For anyone interested in textual analysis, corpus linguistics, and digital humanities, this book will get you started on the basics. There are other introductions to corpus linguistics available, but this appears to be the only one designed as a how-to guide rather than a theoretical overview.

Chapters include collecting and cleaning data, concordancing, querying mega corpora online, frequency analysis, keywords in context, and part-of-speech tagging. There are even chapters on regular expressions and XML. Each chapter features several exercises for you to try out, as well as solutions to and comments on the exercises. There is also a companion website for the book with more exercises as well as updates.
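To give a rough idea of what frequency analysis and keywords in context involve, here is a minimal Python sketch. It is not taken from the book, which works with tools such as AntConc rather than code. The function names, the sample sentence, and the very simple English-only tokenizer are all my own assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Assumption: a "word" is a run of ASCII letters or apostrophes,
    which is only good enough for a quick English demo.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, top_n=5):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokens).most_common(top_n)


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text standing in for a real corpus file.
sample = "The cat sat on the mat. The dog chased the cat around the garden."
tokens = tokenize(sample)

print(word_frequencies(tokens, 3))
for left, keyword, right in kwic(tokens, "cat"):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} [{keyword}] {right}")
```

A dedicated concordancer does much more than this (sorting by context, wildcard searches, handling large files), so treat the sketch as a way to see the idea rather than a replacement for the software used in the exercises.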

The BYU website was changed shortly after the book was published, so you will need to check the companion website for instructions on using the new interface for accessing the Corpus of Contemporary American English. However, the old website is still available. You will also need to download the free software, AntConc, and create a free account at the BNCweb site to follow along with the exercises.

If you prefer to analyze languages other than English, check out a simple analysis of telenovela Spanish phrases I carried out using DownThemAll to collect transcripts, Python to clean the text files, and AntConc to find the most common phrases.
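The cleaning and phrase-counting steps in that workflow can be sketched in a few lines of Python. This is not the script I used for the telenovela analysis, and AntConc did the phrase counting there. The kinds of residue being stripped (HTML-style tags, bracketed notes), the function names, and the two sample lines are assumptions made up for this example.

```python
import re
from collections import Counter


def clean_transcript(raw):
    """Strip tag-like residue from a transcript and normalise it.

    Assumption: the downloaded files contain HTML-style tags and
    bracketed notes such as [música]; real files may need other rules.
    """
    text = re.sub(r"<[^>]+>", " ", raw)        # drop HTML-style tags
    text = re.sub(r"\[[^\]]*\]", " ", text)    # drop bracketed notes
    return re.sub(r"\s+", " ", text).strip().lower()


def ngrams(text, n=3):
    """Return all n-word sequences in the text.

    Punctuation is ignored, so phrases may span sentence boundaries.
    """
    words = re.findall(r"\w+", text)  # \w matches accented letters too
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def most_common_phrases(transcripts, n=3, top=10):
    """Count n-word phrases across all transcripts."""
    counts = Counter()
    for raw in transcripts:
        counts.update(ngrams(clean_transcript(raw), n))
    return counts.most_common(top)


# Hypothetical sample lines standing in for downloaded transcript files.
transcripts = [
    "<i>¿Qué te pasa?</i> [música] No me digas eso.",
    "No me digas que no. ¿Qué te pasa?",
]

for phrase, count in most_common_phrases(transcripts, n=3, top=2):
    print(count, phrase)
```

With real data you would read each text file (for example with `pathlib.Path.read_text`, specifying the file's encoding) instead of using an in-memory list.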

More Corpus Linguistics

The 8-week Corpus Linguistics MOOC at FutureLearn is another great introduction to the methodology. Both Lancaster and Birmingham run Corpus Linguistics summer schools in the UK every year if you’re able to travel to them.

If you’re interested in learning more programming for humanities research (usually in Python), try the lessons at Programming Historian and check out my list of Digital Humanities tools. Martin Weisser also wrote Essential Programming for Linguists (using Perl) that you might be interested in.

Non-Linguists, Please Stop Trying to Do or Talk About Linguistics Without the Help of Actual Linguists

Ben Zimmer has a wonderful article on “When physicists do linguistics” over at the Boston Globe, which can perhaps be best summarized by this comic from xkcd:

Joking aside, I am happy that other disciplines have an interest in language – however, I hate when other disciplines try to do linguistic research and fail because they do not involve any actual linguists in the research. I agree completely when Zimmer says that there is a “need for better communication between disciplines that previously had little to do with each other.” Communication among related fields could use a little boost too, because it isn’t just physicists who publish papers that contradict linguistic research. Psychologists, speech pathologists, and cognitive scientists have been doing it wrong for a while too, especially when it comes to multilingual and cultural aspects of language acquisition.

Linguistics seems to be the field that everyone thinks they can do without any special training. Most people wouldn’t think of talking about chemistry or mathematics without actually having studied those subjects. Yet everyone seems to think they are experts on language simply because they speak a language (their native language) or because they have learned another language. Sorry, but those abilities do not make you a qualified linguist, nor do they give you the right to talk about language without checking facts or to teach language as if you were an experienced teacher. I know how to drive a car, but I don’t go around pretending to be a certified mechanic or giving advice to others on how to fix their own cars.

Robert Lane Greene’s book, You Are What You Speak: Grammar Grouches, Language Laws, and the Politics of Identity, is about this phenomenon. People believe, and repeat, such ridiculous things as “this language has eleventy billion words for X” or “this language is primitive but that language is logical” all the time. Even worse, respected authors repeat these myths in their articles and books, such as Bill Bryson in The Mother Tongue, and so they are repeated again and again without anyone questioning whether they are true or not. These myths are dangerous because a lot of them are based on ethnocentrism and the perceived superiority of the way we speak compared to everyone else.

Please, do yourself a favor and study language seriously instead of repeating myths. Talk to actual linguists, and read books written by actual linguists or by authors who talked to actual linguists. In addition to You Are What You Speak, you can start with Language Myths (for a general overview), Vocabulary Myths (for language learners/teachers, which I previously posted about), and the “truth-squad” blog Language Log. But most importantly, always question what is written about language, even if it is published by best-selling authors or academic researchers, because they may not be linguists at all.

Update 26/02/13: And another one! Ugh. “Why speaking English can make you poor when you retire” about research done by a behavioural economist. Hey, that’s not linguistics! ::sigh:: At least the article quotes my hero, John McWhorter.

Update 15/03/15: So glad I’m not the only one who complains about this: If you’re not a linguist, don’t do linguistic research by @EvilJoeMcVeigh

Vocabulary Myths: Applying Second Language Research to Classroom Teaching

Vocabulary Myths: Applying Second Language Research to Classroom Teaching by Keith Folse (2004, University of Michigan Press) is a great introduction to the gap between practice and research in vocabulary learning and teaching.

I highly recommend the book, but if you’d like a shorter summary, Folse’s article “Myths about Teaching and Learning Second Language Vocabulary: What Recent Research Says” [TESL Reporter 37,2 (2004), pp. 1-13] is also available if you have access to online journals.

The eight myths are:

  1. Vocabulary is not as important in learning a foreign language as grammar or other areas.
  2. It is not good to use lists of words when learning vocabulary.
  3. Vocabulary should be presented in semantic sets.
  4. The use of translations is a poor way to learn new vocabulary.
  5. Guessing words from context is as productive for foreign language learners as it is for first language learners.
  6. The best vocabulary learners make use of only one or two effective specific vocabulary learning strategies.
  7. Foreign language learners should use a monolingual dictionary.
  8. Vocabulary is sufficiently covered in our curricula and courses.

Think about your language classes and how many of these myths were prevalent in the textbook or even encouraged by your teacher. These myths make teaching languages as well as designing textbooks much easier for the teacher or author, but they go against second language acquisition research on how learners should go about learning a language and tend to make learning even harder.

Free Corpora of Spoken French for French Language Learners or Researchers

I am always looking for corpora of spoken French for my research, so I was quite surprised to come across several freely available resources on the internet in the past week. Most of these corpora contain audio and/or video with transcripts of authentic and spontaneous spoken French, which makes them perfect for self-study or use in a language lab.

  • SACODEYL (System-aided compilation: an open distribution of European youth language) is actually available in seven EU languages (English, French, German, Italian, Spanish, Romanian, and Lithuanian) and was designed specifically for teaching purposes. Click on Resources after choosing a corpus to access the learning packages.
  • TCOF (Traitement de Corpus Oraux en Français) includes recordings from the 1980s and 1990s, available under a Creative Commons license.
  • CFPP2000 (Corpus de français parlé parisien des années 2000) contains several interviews of Parisians from the early 2000s. Audio files and transcripts are available for download.
  • CFPQ (Corpus de français parlé au Québec) is a multimodal corpus that also includes information on non-verbal aspects of communication (such as gestures, facial movements, etc.). It also dates from the 2000s; however, only PDFs of the transcripts are available.

Other corpora of spoken French or simply videos with transcripts that I’ve mentioned in the past include:

And don’t forget my French Listening Resources, with plenty of transcripts and exercises.

If you know of other freely accessible corpora of French, please let me know.