This paper is available for the academic year 2016-17.
This paper provides an introduction to computational linguistics, covering the fundamental techniques which can be used to model linguistic phenomena computationally at the levels of morphology, syntax, semantics and pragmatics. Students are taught how such techniques are implemented, evaluated and applied to natural language processing (NLP) tasks. An overview of the use of such techniques is provided, along with an introduction to several applications (e.g. machine translation, summarisation and dialogue systems). At the end of the course, students will understand basic computational linguistics techniques as well as their limitations and current performance levels when applied to linguistic research and to real-world tasks.
The course will follow the main textbook used for computational linguistics worldwide, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin (2008, Second Edition, Prentice-Hall). This book will be accessible to all those taking the paper. More specialised reading is listed in each chapter of the book, and the relevant readings will be introduced to students during the lectures. These materials are freely available on the Web (and will be downloadable as PDF documents).
- To introduce the fundamental techniques of natural language processing
- To develop an understanding of the limits of those techniques
- To introduce some current and potential applications
- To develop an understanding of how natural language processing can support research in theoretical and applied linguistics
- Focus on basic natural language processing techniques at the levels of morphology, syntax, semantics and pragmatics
- Focus on text (rather than speech) processing
- No prerequisite courses in computational linguistics or computer science are required. This is an entry-level course accessible to any undergraduate student in linguistics, and it does not require any programming skills.
Proposed lecture schedule/topics to be covered:
1. Introduction: Brief history of NLP research, current applications, generic NLP system architecture
2. Regular Expressions and Automata
3. Words, Transducers and N-grams
4. Part-of-Speech Tagging
5. Formal Grammars of English and Syntactic Parsing
6. Statistical Parsing
7. Features and Unification
8. Language and Complexity
9. The Representation of Meaning
10. Computational Semantics
11. Lexical Semantics
12. Computational Lexical Semantics
13. Computational Discourse
14. Dialogue and Conversational Agents
15. Applications I: Information Extraction, Summarisation
16. Applications II: Question-Answering, Machine Translation
Relevant reading lists are available from the Linguistics Resources on CamTools.
The first part of the course will provide an introduction, cover morphological and syntactic processing of language, and discuss issues of language and complexity. The second part will focus on computational semantics and pragmatics, and introduce some well-known NLP applications.
You will receive sixteen lectures in total, eight in Michaelmas Term and eight in Lent Term. You will also have eight supervisions, normally three during Michaelmas Term, four in Lent Term and one in Easter Term.
Assessment will be by a three-hour written examination.
Dr Andrew Caines