Introduction to Statistical Natural Language Processing
Before diving into the sea of probability theory and statistics, it may
be good to have a basic idea of where we are going and what we want to
achieve. In this introductory lecture, I will therefore try to give
preliminary answers to the following two questions:
- What is statistics?
- How can it be used in natural language processing?
Statistics is a vast field, but from the point of view of natural language
processing, three of its components are particularly important:
- Probability theory: Mathematical theory of uncertainty (random experiments).
- Descriptive statistics: Methods for summarizing (large) datasets.
- Inferential statistics: Methods for drawing inferences from (large) datasets.
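To make the difference between the last two components concrete, here is a minimal Python sketch. The toy corpus and the function name estimate_unigram_probability are hypothetical, chosen only for illustration. The descriptive part summarizes the sample itself; the inferential part uses the sample to estimate a parameter of an assumed probabilistic model (here, a unigram word probability estimated by relative frequency).

```python
from collections import Counter
from statistics import mean

# Hypothetical toy corpus (an assumption for illustration):
# each sentence is a list of tokens.
corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["a", "dog", "sleeps"],
]

tokens = [token for sentence in corpus for token in sentence]

# Descriptive statistics: summarize the data we actually observed.
counts = Counter(tokens)
mean_sentence_length = mean(len(sentence) for sentence in corpus)

# Inferential statistics: use the sample to estimate a parameter of a
# probabilistic model. The relative frequency of a word is the maximum
# likelihood estimate of its probability under a unigram model.
def estimate_unigram_probability(word):
    return counts[word] / len(tokens)

print(counts.most_common(3))
print(mean_sentence_length)
print(estimate_unigram_probability("dog"))
```

With a corpus this small the estimates are of course unreliable; the point is only that the same counts serve as a summary of the sample and as the basis for an inference about the language that produced it.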
The use of statistics in natural language processing falls mainly into three
categories:
- Processing: We may use probabilistic models or algorithms to process
natural language input or output.
- Learning: We may use inferential statistics to learn from examples (corpus data).
In particular, we may estimate the parameters of probabilistic models that
can be used in processing.
- Evaluation: We may use statistics to assess the performance of language
processing systems.
We can exemplify this with respect to part-of-speech tagging:
- Processing: A probabilistic part-of-speech tagger computes the most probable
part-of-speech sequence for a given word sequence, using a probabilistic model M.
- Learning: The parameters of the model M used by the tagger can be estimated
from corpus data using a variety of methods.
- Evaluation: The performance of the tagger can be evaluated by running it on
a test data set and computing statistical measures such as tagging accuracy.
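The three steps of the tagging example can be sketched in a few lines of Python. This is only one possible instantiation, under stated assumptions: the model M is taken to be a bigram hidden Markov model, learning is relative frequency estimation with add-alpha smoothing, processing is the Viterbi algorithm, and evaluation is token-level accuracy. The toy data (TRAIN, TEST), the function names and the value of alpha are all hypothetical and not part of the course material.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (an assumption for illustration): tagged sentences.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
TEST = [
    [("a", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
]

def train(sentences, alpha=0.1):
    """Learning: estimate the parameters of a bigram HMM from corpus data."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    vocab = set()
    for sentence in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)

    # Add-alpha smoothing keeps every probability above zero.
    def p_trans(prev, tag):
        total = sum(trans[prev].values())
        return (trans[prev][tag] + alpha) / (total + alpha * len(tags))

    def p_emit(tag, word):
        total = sum(emit[tag].values())
        # One extra vocabulary slot is reserved for unknown words.
        return (emit[tag][word] + alpha) / (total + alpha * (len(vocab) + 1))

    return tags, p_trans, p_emit

def tag_sentence(words, model):
    """Processing: most probable tag sequence for the words (Viterbi)."""
    if not words:
        return []
    tags, p_trans, p_emit = model
    # Each column maps a tag to (best log probability, best previous tag).
    columns = [{
        t: (math.log(p_trans("<s>", t)) + math.log(p_emit(t, words[0])), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + math.log(p_trans(p, t)), p) for p in tags
            )
            column[t] = (score + math.log(p_emit(t, word)), prev)
        columns.append(column)
    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: columns[-1][t][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]

def accuracy(model, sentences):
    """Evaluation: proportion of test tokens that receive the gold tag."""
    correct = total = 0
    for sentence in sentences:
        words = [word for word, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = tag_sentence(words, model)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total

model = train(TRAIN)
print(tag_sentence(["a", "cat", "barks"], model))
print(accuracy(model, TEST))
```

Note that the test sentence does not occur in the training data, although all of its words do. Keeping training and test data separate in this way is what makes the accuracy figure an estimate of performance on new input rather than a description of the training set.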
The rest of the course is organized as follows:
- Lectures 2-4 introduce the necessary concepts from probability
theory and statistics.
- Lectures 5-9 deal with different areas of natural language processing,
focusing on statistical methods in processing and learning.
- Lecture 10 is devoted to the use of statistics in evaluation.
Slides for lecture 1
Suggested Reading
- Manning, C. D. & Schütze, H. (1999) Foundations of Statistical
Natural Language Processing. MIT Press. Chapter 1.
- Nivre, J. (2002) On Statistical Methods in Natural Language Processing.
In Bubenko, J. & Wangler, B. (eds) Promote IT. Second Conference for the
Promotion of Research in IT at New Universities and University Colleges
in Sweden. University of Skövde, 684-694.