A Linguistic Analysis of Telenovela Spanish, or How this Nerdy Linguist Spent her Friday Night
Ever since I discovered that Univision started including transcripts of their telenovelas online, I had been wanting to experiment with the free corpus linguistics software AntConc to analyze the most common phrases used in telenovela Spanish. I chose Pasión y Poder because it had the most transcripts still available on the website, even though I rarely watched it. It was a fairly typical telenovela, unlike El Hotel de Los Secretos or Yago, with plenty of fighting and drama and a (mostly) happy ending. Unfortunately, Telemundo does not provide transcripts of their telenovelas (which tend to be better). That is a shame, since I’d love to analyze the language of La Esclava Blanca, a Colombian telenovela set in the mid-1800s.
Here’s how I created the corpus and found the most frequent phrases, if you feel inclined to be as nerdy…
How to be a linguistics/telenovela nerd:
- Downloading the HTML files was quick and easy thanks to the DownThemAll add-on for Firefox. The URL of each episode differs only by the episode number, so I was able to use batch descriptors. (I know web scraping is possible with Python, but my programming knowledge is still pretty basic, and I knew that I could get the files with the add-on in about 20 seconds.)
- Then I needed to find a way to extract the text from all of the <p> tags (the transcript was the only text enclosed in these tags in all of the HTML code) and create a text file for each episode. After an hour of searching, I managed to find some Python/BeautifulSoup code online that did what I needed, after a couple of tweaks, a few tears, and many error messages.
- Finally, I loaded all the text files into AntConc and played around with the Clusters/N-Grams option and the N-Gram Size setting to find the most frequent phrases between five and ten words long.
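For anyone who wants to try the second step, here is a minimal sketch of the idea. It is not the BeautifulSoup code I actually used: it swaps in Python's built-in html.parser module so nothing has to be installed. The names ParagraphExtractor, extract_transcript and convert_folder, the folder arguments, and the UTF-8 encoding are all assumptions for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class ParagraphExtractor(HTMLParser):
    """Keep the text found inside <p> tags and ignore everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_p = False       # True while we are inside a <p> tag
        self.current = []       # text pieces of the paragraph being read
        self.paragraphs = []    # finished paragraphs, in page order

    def flush(self):
        # Join the collected pieces and collapse runs of whitespace.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()        # also copes with <p> tags that were never closed
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.current.append(data)


def extract_transcript(html):
    """Return the text of every <p> tag in the page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()              # pick up a last paragraph with no closing tag
    return "\n".join(parser.paragraphs)


def convert_folder(html_dir, text_dir):
    """Write one .txt file per downloaded .html file (folder names are hypothetical)."""
    out = Path(text_dir)
    out.mkdir(parents=True, exist_ok=True)
    for page in sorted(Path(html_dir).glob("*.html")):
        # Assumes the pages are UTF-8; undecodable bytes are replaced.
        html = page.read_text(encoding="utf-8", errors="replace")
        (out / (page.stem + ".txt")).write_text(extract_transcript(html), encoding="utf-8")
```

Calling convert_folder("episodes_html", "episodes_txt") would then leave a folder of plain text files ready to load into AntConc. This only works as long as the transcript really is the only text inside <p> tags, which was the case for these pages.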
Most Frequent Phrases in Pasión y Poder
So here are the most frequent phrases used in Pasión y Poder, starting with ten-word phrases and ending with five-word phrases. Keep in mind that some of the phrases are typically Mexican, and some are overly dramatic because, well, they’re from a telenovela!
- A ver, a ver, a ver, a ver, a ver. (A ver is usually translated as let’s see, but I have no idea what a good translation for this many a vers together would be in natural English.)
- No te metas en lo que no te importa. (Don’t stick your nose where it doesn’t belong./Mind your own business.)
- No sabes el gusto que me da que… (You don’t know how happy it makes me that…)
- ¿No te das cuenta? ¿No te das cuenta? (Don’t you realize? Don’t you realize?)
- Esto no se va a quedar así. (This isn’t over. [said as a threat of revenge])
- No me lo tomes a mal, pero… (Don’t take this the wrong way, but…)
- … lo que te voy a decir. (… what I’m going to tell you.)
- Lo único que quiero es que… (The only thing I want is that…)
- No, eso no va a pasar. (No, that is not going to happen.)
- No tiene nada que ver con… (It has nothing to do with…)
- Lo que pasa es que no… (What is happening is that … not)
- No te lo voy a perdonar. (I’m not going to forgive you for it.)
- No te voy a permitir que… (I won’t allow you to…)
- Eres el amor de mi vida. (You are the love of my life.)
- No tiene la culpa de nada. (S/he is not guilty of anything.)
- A pesar de todo, lo que… (In spite of everything, what…)
- Creo que lo mejor es que… (I think the best thing is that/to…)
- Lo que me preocupa es que… (What worries me is that…)
- Lo único que espero es que… (The only thing I hope is that…)
- Todo va a estar bien. (Everything will be fine.)
- Me da mucho gusto que… (I’m very happy that…)
- No voy a dejar que… (I’m not going to let…)
- No, por supuesto que no. (No, of course not.)
- ¿Qué fue lo que pasó? (What happened?)
- Sí, lo sé, lo sé. (Yes, I know, I know.)
- Ya me tengo que ir. (I have to go now.)
- No me importa lo que… (I don’t care what…)
- … lo que vas a hacer. (…what you’re going to do.)
- Te pido por favor que… (I am asking you please to…)
- Ya me di cuenta que… (I already realized that…)
- De una vez por todas. (Once and for all.)
- ¿No te das cuenta que…? (Don’t you realize that…?)
- Yo no tengo nada que… (I have nothing that…)
- Y lo peor es que… (And the worst is that…)
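If you'd rather not use AntConc, a ranking like the one above can be roughly approximated in a few lines of Python. This is only a sketch of n-gram counting, not what AntConc does internally. It assumes that words are lowercased, that punctuation is ignored, and that phrases may run across sentence boundaries. The function names are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def ngram_counts(texts, n):
    """Count every run of n consecutive words across a list of episode texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def top_phrases(texts, sizes=range(10, 4, -1), limit=10):
    """Most frequent phrases for each size, from ten words down to five."""
    return {n: ngram_counts(texts, n).most_common(limit) for n in sizes}
```

Here texts would be the list of episode transcripts read from the text files. Expect a lot of overlap in the raw output: every six-word phrase contains two five-word phrases, so the lists still need some weeding by hand.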
Telenovela Battle of Screams and Insults
I was also interested in finding out which of the words I hear yelled all the time were the most frequent:
In the battle suéltame (let go of me) vs. lárgate (get out), the winner is: ¡lárgate! (59 vs. 61)
And in the battle infeliz (fool) vs. desgraciado (bastard), the winner is: ¡infeliz! (74 vs. 69)
However, the winner of them all was ¡No puede ser! (It can’t be!) with a frequency count of 151.
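Head-to-head counts like these could also be sketched in Python. Again, this is an illustration under assumptions, not how I got the numbers above (those came from AntConc): matching is case-insensitive, punctuation is ignored, and count_expression and battle are hypothetical names.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def count_expression(texts, expression):
    """Count how often a word or multi-word expression occurs in the texts."""
    target = tokenize(expression)
    n = len(target)
    if n == 0:
        return 0
    total = 0
    for text in texts:
        tokens = tokenize(text)
        total += sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
    return total


def battle(texts, first, second):
    """Return (winner, count_first, count_second); winner is None on a tie."""
    a = count_expression(texts, first)
    b = count_expression(texts, second)
    winner = first if a > b else second if b > a else None
    return winner, a, b
```

For example, battle(texts, "suéltame", "lárgate") compares the two yells across all episodes, and count_expression(texts, "No puede ser") handles the multi-word case.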
So what have we learned?
To sum up, Telenovela Spanish is hilarious and corpus linguistics is amazing.
If you’d like to learn more about corpus linguistics, there is a great free Corpus Linguistics MOOC at FutureLearn, and the hands-on exercises in the new textbook Practical Corpus Linguistics will get you started with AntConc. There are also tutorials on YouTube on how to use the software. If you’re interested in learning Python, try Dr. Chuck’s Python for Everybody lessons.