Practical Corpus Linguistics by Martin Weisser

Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis by Martin Weisser

Practical Corpus Linguistics is a great introduction to analyzing language data with hands-on exercises using free software and websites. For anyone interested in textual analysis, corpus linguistics, and digital humanities, this book will get you started on the basics. There are other introductions to corpus linguistics available, but this appears to be the only one that is designed as a how-to guide rather than a theoretical overview.

Chapters include collecting and cleaning data, concordancing, querying mega corpora online, frequency analysis, keywords in context, and part-of-speech tagging. There are even chapters on regular expressions and XML. Each chapter features several exercises for you to try out, as well as solutions to and comments on the exercises. There is also a companion website for the book with more exercises as well as updates.

The BYU website was changed shortly after the book was published, so you will need to check the companion website for instructions on using the new interface for accessing the Corpus of Contemporary American English. However, the old website is still available. You will also need to download the free software, AntConc, and create a free account at the BNCweb site to follow along with the exercises.

If you prefer to analyze languages other than English, check out a simple analysis of telenovela Spanish phrases I carried out using DownThemAll to collect transcripts, Python to clean the text files, and AntConc to find the most common phrases.
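For readers curious about what the Python cleaning and phrase-counting steps can look like, here is a minimal sketch. It is not the script from that analysis: the function names, the cleaning rules (dropping bracketed stage directions and punctuation), and the three sample lines are all made up for illustration, and AntConc's n-gram tool does the counting part for you without any code.

```python
import re
from collections import Counter

def clean_line(line):
    """Lowercase a transcript line, drop bracketed notes and punctuation, return tokens."""
    line = re.sub(r"\[.*?\]|\(.*?\)", " ", line)    # assumed convention: [música], (suspira)
    line = re.sub(r"[^\w\s']", " ", line.lower())   # keep letters (accents included), digits, apostrophes
    return line.split()

def count_ngrams(lines, n=3):
    """Count every run of n consecutive words within each line."""
    counts = Counter()
    for line in lines:
        tokens = clean_line(line)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical transcript lines, invented for this example.
sample = [
    "¿Qué te pasa? (suspira)",
    "No sé qué te pasa.",
    "[música] Dime, ¿qué te pasa?",
]

for phrase, freq in count_ngrams(sample, n=3).most_common(3):
    print(freq, phrase)
```

With real transcripts you would read each text file line by line instead of using a hard-coded list, and you would probably want to adjust the cleaning rules to whatever markup your transcripts actually contain.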

More Corpus Linguistics

The 8-week Corpus Linguistics MOOC at Futurelearn is another great introduction to the methodology. Both Lancaster and Birmingham run Corpus Linguistics summer schools in the UK every year if you’re able to travel to them.

If you’re interested in learning more programming for humanities research (usually in Python), try the lessons at Programming Historian and check out my list of Digital Humanities tools. Martin Weisser also wrote Essential Programming for Linguists (using Perl) that you might be interested in.

Non-Linguists, Please Stop Trying to Do or Talk About Linguistics Without the Help of Actual Linguists

Ben Zimmer has a wonderful article on “When physicists do linguistics” over at the Boston Globe, which can perhaps be best summarized by this comic from xkcd:

Joking aside, I am happy that other disciplines have an interest in language – however, I hate when other disciplines try to do linguistic research and fail because they do not involve any actual linguists in the research. I agree completely when Zimmer says that there is a “need for better communication between disciplines that previously had little to do with each other.” Communication among related fields could use a little boost too, because it isn’t just physicists who publish papers that contradict linguistic research. Psychologists, speech pathologists, and cognitive scientists have been doing it wrong for a while too, especially when it comes to multilingual and cultural aspects of language acquisition.

Linguistics seems to be the field that everyone thinks they can do without any special training. Most people wouldn’t think of talking about chemistry or mathematics without actually having studied those subjects. Yet everyone seems to think they are experts on language simply because they speak a language (their native language) or because they have learned another language. Sorry, but those abilities do not make you a qualified linguist, nor do they give you the right to talk about language without checking facts or to teach language as if you were an experienced teacher. I know how to drive a car, but I don’t go around pretending to be a certified mechanic or giving advice to others on how to fix their own cars.

Robert Lane Greene’s book, You Are What You Speak: Grammar Grouches, Language Laws, and the Politics of Identity, is about this phenomenon. People believe, and repeat, such ridiculous things as “this language has eleventy billion words for X” or “this language is primitive but that language is logical” all the time. Even worse, respected authors repeat these myths in their articles and books, such as Bill Bryson in The Mother Tongue, and so they are repeated again and again without anyone questioning whether they are true or not. These myths are dangerous because a lot of them are based on ethnocentrism and the perceived superiority of the way we speak compared to everyone else.

Please, do yourself a favor and study language seriously instead of repeating myths. Talk to actual linguists, read books written by actual linguists or whose authors talked to actual linguists. In addition to You Are What You Speak, you can start with Language Myths (for a general overview), Vocabulary Myths (for language learners/teachers, which I previously posted about), and the “truth-squad” blog Language Log. But most importantly, always question what is written about language even if it is published by best-selling authors or academic researchers because they may not be linguists at all.

Update 26/02/13: And another one! Ugh. “Why speaking English can make you poor when you retire” about research done by a behavioural economist. Hey, that’s not linguistics! ::sigh:: At least the article quotes my hero, John McWhorter.

Update 15/03/15: So glad I’m not the only one who complains about this: If you’re not a linguist, don’t do linguistic research by @EvilJoeMcVeigh

Topic vs. Frequency in Vocabulary Learning

Teachers and learners of languages, I am looking for your input in the topic vs. frequency debate. Almost all textbooks and coursebooks introduce vocabulary in chapter topics or themes such as food, clothing, transportation, etc. These related words are often used to fill in the slots of functional phrases, which a lot of current books are based on thanks to the popularity of the communicative approach. For example, one of the chapters in the French textbook that I use in my class combines the functions of offering, accepting and refusing with the topic of drinks. So students are expected to memorize the question Voulez-vous boire un/e ____ ? and the vocabulary list is full of nouns such as un verre de lait, une tasse de thé, un coca, un chocolat chaud, etc. (The conjugation of vouloir is not actually taught in this or any preceding chapter.)

The problem with presenting vocabulary like this, however, is that it goes against vocabulary acquisition research. Many researchers have argued that grouping vocabulary into topics (and therefore semantic sets) actually hinders acquisition and confuses the students more. The topics also tend to represent concrete concepts that can easily be illustrated in the chapters with pictures or photographs – which consequently leaves out abstract ideas. Plus, grouping words according to topic means that the words are not grouped according to frequency, which is the most important criterion for selecting vocabulary to teach/learn first. Of course, frequency is not the only criterion, but it should be the starting point for vocabulary selection.

Learn opposites together = forever confused

If frequency is supported by research and topic is not, then why do all textbooks teach vocabulary based on topic? Is it because it is easier to write textbooks in this way? Is it easier for the instructor to teach in this way? Is it considered less boring and more engaging for students to learn in this manner even if it goes against vocabulary acquisition research?

I’ve heard arguments that students should learn vocabulary in topics so they can talk about them right away, but that doesn’t make sense if the students don’t even have the basic vocabulary needed to construct sentences. Even if you learn all the articles of clothing, what exactly can you say about them? How can you have conversations about clothes if all you know is a list of nouns? In my class’s textbook, students learn to say Je porte un/e ___ and then some adjectives to describe the clothes. I really don’t see how that is going to help them communicate in the real world.

It seems to me that it’s more of a classroom vs. real world debate. We want students to be able to use the language as soon as possible, even if that means teaching things that will only ever be used inside the classroom. But isn’t it our job as educators to prepare students as much as possible for the future when they will leave our classrooms? Or are we simply just trying to make sure they don’t fall asleep in class?

I’m not saying that students should just learn the 2,000 most frequent words of a language in sequential order. That would be rather boring and frustrating. But there is a much better way of presenting vocabulary – the most frequent words among a few topics presented in story format, for example – that textbook authors keep resisting. And I want to know why! Is it because the textbook publishing industry does not want to change and try something new (for fear of losing money)? Is it because too many people think it’s more logical to learn vocabulary in semantic sets regardless of what research says? Personally I feel it is much more logical to learn the words that you are most likely to encounter, i.e. the most frequent words. Even if there are problems with frequency – such as, what texts were used in the corpus to generate the frequency data? – it is actually supported by research, and that is what is most important to me.

How many first year French students do you think really need to learn the words arc-boutant (flying buttress) or fluocompacte (energy-saving) but not tel (such), également (also), soit (either…or), mener (to lead), appartenir (to belong to), atteindre (to reach), entier (whole), moindre (least), or intérêt (interest)? These are all words that are not taught in the active vocabulary lists of ANY of the 12 first year textbooks that I am analyzing and they are all ranked among the top 500 most frequent words in French.
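If you want to run this kind of check on your own textbook, the comparison itself is simple. Here is a rough sketch, assuming you have a frequency list (most frequent word first) and the textbook’s active vocabulary as plain lists of words. The function names are hypothetical, and the toy lists below are only there to show the mechanics: their order is not a real frequency ranking and they are not data from the textbooks I am analyzing.

```python
def missing_frequent_words(frequency_list, textbook_vocab, top_n=500):
    """Return the words among the top_n of frequency_list that the textbook never teaches."""
    taught = {word.lower() for word in textbook_vocab}
    return [word for word in frequency_list[:top_n] if word.lower() not in taught]

def coverage(frequency_list, textbook_vocab, top_n=500):
    """Share of the top_n most frequent words that appear in the textbook vocabulary."""
    top = frequency_list[:top_n]
    if not top:
        return 0.0
    missing = missing_frequent_words(frequency_list, textbook_vocab, top_n)
    return 1 - len(missing) / len(top)

# Toy data for illustration only: NOT a real frequency ranking or a real textbook list.
toy_frequency_list = ["être", "avoir", "tel", "également", "mener", "lait"]
toy_textbook_vocab = ["être", "avoir", "lait", "thé", "arc-boutant"]

print(missing_frequent_words(toy_frequency_list, toy_textbook_vocab, top_n=6))
print(coverage(toy_frequency_list, toy_textbook_vocab, top_n=6))
```

A real comparison would also need to deal with lemmas versus inflected forms and with multi-word entries such as un verre de lait, which this sketch ignores.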

So what do you think?

Vocabulary Myths: Applying Second Language Research to Classroom Teaching

Vocabulary Myths: Applying Second Language Research to Classroom Teaching by Keith Folse (2004, University of Michigan Press) is a great introduction to the gap between practice and research in vocabulary learning and teaching.

I highly recommend the book, but if you’d like a shorter summary, Folse’s article “Myths about Teaching and Learning Second Language Vocabulary: What Recent Research Says” [TESL Reporter 37,2 (2004), pp. 1-13] is also available if you have access to online journals.

The eight myths are:

  1. Vocabulary is not as important in learning a foreign language as grammar or other areas.
  2. It is not good to use lists of words when learning vocabulary.
  3. Vocabulary should be presented in semantic sets.
  4. The use of translations is a poor way to learn new vocabulary.
  5. Guessing words from context is as productive for foreign language learners as it is for first language learners.
  6. The best vocabulary learners make use of only one or two effective specific vocabulary learning strategies.
  7. Foreign language learners should use a monolingual dictionary.
  8. Vocabulary is sufficiently covered in our curricula and courses.

Think about your language classes and how many of these myths were prevalent in the textbook or even encouraged by your teacher.  These myths make teaching languages as well as designing textbooks much easier for the teacher or author, but they go against second language acquisition research on how learners should go about learning a language and tend to make learning even harder.

Native Speaker Teachers and Use of the First Language in the Classroom

Around the world, there is a conventional belief that foreign languages should only be taught by native speakers and that the students’ native language should be banned from the classroom. This is especially commonplace among English as a Second or Foreign Language schools, which tend to exclusively employ native speakers of English, even if they have absolutely no experience or training in language teaching. However, this is mostly done for reasons related to money, prestige and prejudice and it is not, in fact, supported by linguistic research. Imagine any other business where you could teach someone else to do something in which you have absolutely no knowledge or success. How can you teach someone to speak a second or additional language when you do not speak a second or additional language yourself?

Or knowledge, or expertise, or degrees…

Hiring only native speakers and denying use of the native/first language (L1) serves only to undermine (and insult) multilingual local teachers, and it contradicts numerous studies showing the benefits of using the native language to learn a second or subsequent language. I certainly feel insulted when people say they will not learn languages from non-native speaking teachers because I am a non-native teacher of French. I am fluent in the language and have years of teaching experience, as well as several degrees and publications, and yet because my native language is not French that somehow makes me inferior to native speakers with no experience or education in teaching. In many ways, I actually prefer non-native speakers as teachers because then I know they have gone through the same experience as me in learning the language and they know the mistakes that I am likely to make and how to avoid them. Many people do not want to learn from non-native speakers because of their accent or the fear that the teacher will make mistakes, but most of the input in the foreign language needs to come from authentic sources of language use rather than from the teacher anyway.

This problem is more rampant among English classes since English is taught much more often across the globe, but the prejudice remains for all languages. And it leads into the second issue of banning the L1, because if the teacher is monolingual then he or she cannot resort to another language in the classroom. Yet second language acquisition research provides no reason to ban the L1 completely from the classroom, and there certainly exists research to support that using the L1 is more effective for certain aspects of language learning – such as explaining grammar or tasks, disciplining students, translations for ambiguous words, etc. Of course, there are limits to how much the L1 should be used as the amount of input in the second language (L2) is extremely important. But the L1 does indeed help in learning the L2 and creating connections between the two languages. As there is some overlap among languages in the brain, it can be impossible to “turn off” the L1 when using the L2. Code-switching and constantly moving between languages and cultures is entirely normal – it is not something to be banned or looked down upon.

The success of immersion programs has been used as the rationale to support banning the L1, and even though teaching non-language courses in a foreign language can improve language learning, many immersion programs do not ban the L1 completely. In fact, much of the research on immersion programs shows the importance of adding an L2 to an L1 instead of replacing the L1 with an L2. Unfortunately it happens all too often that the opposite of research reported in the popular press immediately becomes wrong. We are too quick to assume that evidence for an idea also means evidence against the competing idea. Yet nothing is ever that black and white. The success of a few immersion programs should in no way imply that non-immersion programs are a failure, especially when there is no evidence for it. And thanks to research on code-switching, the cognitive benefits of L1 use, and L2 language exposure (input alone does not suffice – it must become intake), many scholars have softened their position to agree that the L1 should not be banned completely.

Code-switching makes me smile

Language students should always be thought of as developing bilinguals or multilinguals, rather than two or more monolinguals. The monolingual native speaker model that is portrayed in essentially all pedagogical materials (as well as by hiring monolingual teachers) presents an unattainable and impossible goal for language learners. When you learn a second language, you are no longer monolingual and by definition, you will never be a native speaker of another language. So why is that the model that we teach to students?  I completely agree with Carl Blyth when he notes the irony of “using monolingual speakers as role models for learners striving to overcome their own monolingualism.” We need non-native and multilingual models and teachers of the language because that is exactly what the students are and what they will become: non-native and multilingual.

Students should never be denied the opportunity to use their L1 in any type of learning, especially young students who haven’t even completely acquired their native language yet. Allowing the native language in school has many benefits, yet there still exist “English Only” attitudes that only serve to harm students’ cognitive abilities. Recent reports of students being punished for speaking their native language – such as Menominee in the US or French in northern Belgium – are worrying because they bring back horrible reminders of Native American boarding schools and the Stolen Generation. Students should certainly never be made to feel as though their language is bad or wrong, because if their language is undesirable, then what about the culture linked to the language or the people themselves who speak the language? Are they undesirable as human beings as well?

Just say NO to lack of empirical evidence!

Fortunately researchers have started calling for a more bilingual or lingua franca approach to teaching English which focuses on context and learner needs, an approach that really should be applied to all languages. Ideally, teachers are multilingual and multicultural, know the language of their students, and have some knowledge of the particularities of the varieties of the language used throughout the world. When talking about world languages, we tend to think of English, Spanish, French, Arabic, etc. but every language consists of varieties depending on where and how it is used. For more information on lingua franca teaching, World Englishes: Implications for International Communication and English Language Teaching by Andy Kirkpatrick is a great introduction.

 

Other books I like to re-read on this topic include:

Australia’s Language Potential by Michael Clyne

Second Language Learning and Language Teaching by Vivian Cook

First Language Use in Second and Foreign Language Learning edited by Miles Turnbull and Jennifer Dailey-O’Cain (especially chapter 9 by Carl Blyth)

Linguistic Semantics: Language Reflects Ways of Living and Thinking

Anna Wierzbicka is a Polish-Australian linguist who has extensively researched intercultural linguistics, semantics and pragmatics. I have been reading many of her books and articles for my PhD research because she is interested in how language reflects ways of living and thinking, and more specifically, how the lexicon or words of a language can provide valuable clues to understanding culture.

Linguistic relativity, better known as the Sapir-Whorf Hypothesis, has been debated for quite a while by certain researchers who argue that human thought and language are completely separate and independent. Steven Pinker, author of The Language Instinct, is probably the most popular denier. However, Pinker was attempting to describe human thought and cognition on the basis of English alone.  Wierzbicka, among others, has rightly criticized Pinker for his views on the link between language and thought. Here are a few quotes from the introduction to her book, Understanding Cultures through their Key Words:

“To people with an intimate knowledge of two (or more) different languages and cultures, it is usually self-evident that language and patterns of thought are interlinked… Monolingual popular opinion, as well as the opinion of some cognitive scientists with little interest in languages and cultures, can be quite emphatic in their denial of the existence of such links and differences.”

“The grip of people’s native language on their thinking habits is so strong that they are no more aware of the conventions to which they are party than they are of the air they breathe; and when others try to draw attention to these conventions they may even go on with a seemingly unshakable self-assurance to deny their existence.”

“The conviction that one can understand human cognition, and human psychology in general, on the basis of English alone seems shortsighted, if not downright ethnocentric.”

The strong version of the Sapir-Whorf hypothesis – that language constrains thought and prevents users of a language from thinking about certain concepts – is indeed wrong. The weak version of the hypothesis, which Guy Deutscher attempted to explain in his popular article Does Your Language Shape How You Think? and his book, Through the Language Glass: Why the World Looks Different in Other Languages, is generally accepted by most linguists. Deutscher, however, insists on stating that language creates thought when in fact it may be more accurate to say that culture influences thought, which is then expressed through language. Personally, I believe that language reflects and describes ways of living and thinking, but it does not necessarily shape or determine how you live or think.

This is precisely John McWhorter’s criticism of Deutscher’s book, though I do have to disagree with his assertion that color perception as evidence of linguistic relativity is “dull.” If someone does not think cultural elaboration through the lexicon, such as the famous words for snow example, is interesting or relevant, then why does that person bother researching languages and cultures in the first place? Besides, as Wierzbicka explains, “once the principle of cultural elaboration has been established as valid on the basis of ‘boring’ examples, it can then be applied to areas whose patterning is less obvious to the naked eye.”

Here’s an interesting experiment you can try with color perception. It will be very easy to choose which square is a different color in the image below.

However, it will probably be a tiny bit harder to find which square is different in the second image. (If you’ve seen these circles before, beware that I did change the location of the different square!)

Yet the Himba of northern Namibia have the exact opposite problem. They are able to detect the different square quite easily in the second image, but take longer with the first image, because their culture, and therefore language, has a different way of categorizing shades of colors. Not every human being thinks in terms of ROYGBIV. Because English speakers do not normally distinguish between the slightly different shades (or at least what we perceive as slightly different shades) of green in the second image, it is harder for them to spot the odd square at first glance. But the absence of a word for that shade does not mean that English speakers cannot see it at all or do not have the ability to form the concept in their minds.

My native language does not have a word for Schadenfreude but I certainly know what it is and can understand the concept. The fact that German has one word for this concept and English does not simply means that the concept is perhaps more salient for users of German, but it does not mean that users of other languages cannot conceive of what it is. There are countless “untranslatable” words such as saudade, hyggelig, or litost that express the values and thoughts of the people who use these words. They provide insights into the life of the society and culture to which the language belongs. We cannot even begin to understand a different culture if we do not know the words because it is through language that culture and ways of living and thinking are expressed.

Another book by Wierzbicka I recommend, Translating Lives: Living with Two Languages and Cultures, includes the experiences of twelve Australians who speak more than one language. Their stories and their lives show how language, culture and identity cannot be separated and what it is like to live with, and between, multiple languages and cultures. For anyone who is a speaker of another language, the idea that you are a different person and that you interact with other human beings in a different way when using different languages seems a bit obvious. But most monolinguals are not aware that their worldview is shaped by their native, and only, culture and language. They tend to assume that every human being thinks in the same way but simply uses different words for concepts, objects, ideas, etc. Even if they know a few words in another language, they believe that translations found in dictionaries are sufficient. Dictionaries may list freedom as the translation for French liberté, but are they really the same thing? How about truth and Russian pravda? Anger and Italian rabbia?

To quote Sapir: “The fact of the matter is that the ‘real world’ is to a large extent unconsciously built up on the language habits of the group. No two languages are ever sufficiently similar to be considered as representing the same social reality. The worlds in which different societies live are distinct worlds, not merely the same world with different labels attached.”

When I speak French, I am fully aware that I am not the same person as when I speak English. I do not interact with other French speakers in the same manner as I do with English speakers while I’m speaking English. There are certain concepts that I find easier to express in French, and yet others that do not have a strong enough emphasis or connotation for me if I use French rather than English. When I hear the word milk in English, I have a different concept of what it is compared to when I hear lait in French. I’ve explored some of these cultural differences before (Cultural Differences in Photos & Culturally Relevant Photos), but they are not limited to separate languages. There are, of course, differences among dialects of the same language. Whenever Australians say the word thongs, I picture a very different article of clothing than they do!

That is not to say that all words in a language are culture-specific. If they were, cultural differences couldn’t really be explored. Linguistic relativity is actually combined with linguistic universality. Wierzbicka is also the lead researcher on Natural Semantic Metalanguage, an approach to cultural analysis that is based on the idea that there are, in fact, a few universal meanings expressed by words (semantic primes) shared by all human languages and that using these primes can help eliminate cross-cultural miscommunication. Listen to/read her interview with the Australian Broadcasting Corporation for more information.

I’d love to hear your opinions on this! Do you believe that how we speak shapes how we think OR that how we think shapes how we speak? Or are language and thought so interlinked that we cannot separate them?