Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis by Martin Weisser
Practical Corpus Linguistics is a great introduction to analyzing language data, with hands-on exercises that use free software and websites. For anyone interested in textual analysis, corpus linguistics, or the digital humanities, this book will get you started with the basics. Other introductory books on corpus linguistics are available, but this appears to be the only one designed as a how-to guide rather than a theoretical overview.
Chapters include collecting and cleaning data, concordancing, querying mega corpora online, frequency analysis, keywords in context, and part-of-speech tagging. There are even chapters on regular expressions and XML. Each chapter features several exercises for you to try out, as well as solutions to and comments on the exercises. There is also a companion website for the book with more exercises as well as updates.
The BYU corpus website changed shortly after the book was published, so you will need to check the companion website for instructions on using the new interface to the Corpus of Contemporary American English (COCA). However, the old website is still available. To follow along with the exercises, you will also need to download the free AntConc software and create a free account on the BNCweb site.
If you'd like to analyze a language other than English, check out my simple analysis of common phrases in Spanish telenovelas. I used DownThemAll to collect the transcripts, Python to clean the text files, and AntConc to find the most common phrases.
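The cleaning script itself isn't shown here, so the following is only a minimal Python sketch of what the cleaning and phrase-counting steps might look like. It assumes plain-text transcripts with non-speech cues in square brackets (e.g. "[música]"). The function names `clean_transcript` and `top_phrases` are hypothetical. The n-gram count is a simple stand-in for AntConc's phrase-frequency tools, not a reproduction of them.

```python
import re
from collections import Counter


def clean_transcript(text: str) -> str:
    """Hypothetical cleaning step: drop bracketed cues, punctuation and extra spaces."""
    # Assumption: non-speech cues are marked in square brackets, e.g. "[música]".
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Lowercase so that "Hola" and "hola" are counted as the same word.
    text = text.lower()
    # Replace anything that is not a word character, whitespace or apostrophe.
    # In Python 3, \w is Unicode-aware, so accented letters such as "é" and "ñ" survive,
    # while Spanish marks like "¿" and "¡" are removed.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def top_phrases(text: str, n: int = 3, k: int = 10) -> list[tuple[str, int]]:
    """Return the k most frequent n-word sequences (n-grams) in the cleaned text."""
    tokens = text.split()
    ngrams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(ngrams).most_common(k)


if __name__ == "__main__":
    sample = "¿Qué pasa? [música] ¡No lo sé! No lo sé, mi amor. No lo sé."
    cleaned = clean_transcript(sample)
    print(cleaned)
    print(top_phrases(cleaned, n=3, k=3))
```

On the sample line, the three-word phrase "no lo sé" comes out on top. For a real collection, you would clean each transcript file and combine the results before counting. Raising `n` to 4 or 5 tends to surface longer formulaic expressions.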
More Corpus Linguistics
The 8-week Corpus Linguistics MOOC on FutureLearn is another great introduction to the methodology. Both Lancaster and Birmingham run corpus linguistics summer schools in the UK every year, if you're able to travel to them.
If you're interested in learning more programming for humanities research (usually in Python), try the lessons at The Programming Historian and check out my list of Digital Humanities tools. Martin Weisser also wrote Essential Programming for Linguists, which uses Perl and may also interest you.