GSLT: Statistical Methods 1
The aim of this course is to give a research-oriented
introduction to probabilistic modeling, statistical methods
and their use within the field of language technology.
The course is aimed at students with a basic knowledge of
natural language processing and/or speech technology (at least
the equivalent of a GSLT level 1 course in one of these areas;
see NLP, Speech technology). Basic programming skills are useful,
as is a rudimentary knowledge of statistics and probability theory.
The course consists of two parts:
- The first part of the course is aimed at giving students an
overview of the field and a basic grasp of the most important
models and methods used in statistical approaches to language
technology. During this part of the course, participants
will mostly study the same material.
- The second part of the course is aimed at giving students an
in-depth understanding of a particular subfield, which is chosen
individually. During this part of the course, students will work
on individual projects which will be reported in the form of a
term paper.
Before the first part of the course, there will be a short
phase of self-study in order to build a common platform.
The main textbook for the course is Manning & Schütze (1999),
Foundations of Statistical Natural Language Processing.
NB: The official language within GSLT is English, but we can decide to hold
lectures, seminars and discussions in Swedish instead, provided of course that
all participants are comfortable with this. In any case, participants are free
to formulate their contributions to discussions, whether oral or written, in any
language that can be understood by the other participants (which in most
circumstances means Swedish or English).
Preparatory Studies
Before the first teaching period, all participants will be expected
to review basic concepts of statistics and language technology.
Recommended reading is Manning & Schütze (1999), Foundations
of Statistical Natural Language Processing, Part I (Preliminaries).
In addition, participants will be asked to read Nivre, On
Statistical Methods in Natural Language Processing, as preparation
for the first intensive week.
Part 1: Basic Course
The first part of the course starts with a series of lectures during
the first intensive week as follows (times still to be confirmed):
| Date | Time | Room | Contents |
|------|-------|------|----------|
| 28/1 | 9-11 | H527 | Tutorial on Probability Theory (optional) |
| 28/1 | 13-15 | H323 | Introduction [Slides] |
| 28/1 | 15-17 | H323 | Stochastic Models and Algorithms [Slides] |
| 29/1 | 9-11 | C444 | Statistical Learning [Slides] |
| 30/1 | 9-11 | D304 | Statistical Evaluation [Slides] |
The period between the first and second intensive weeks will
be devoted to the study of Parts II-IV of Manning & Schütze
(1999), Foundations of Statistical Natural Language Processing,
according to the following schedule:
- Week 6: Follow-up on the first intensive week
- Week 7: Part II (Chapters 5-8)
- Week 8: Part III (Chapters 9-12)
- Week 9: Part IV (Chapters 13-16)
- Week 10: Preparation for the second intensive week
Each participant will be responsible for summarizing one chapter. This
summary should be distributed to the other participants for comments
and questions by the end of the week devoted to the chapter in
question. There will be no other assignments during this period.
The first part of the course ends with two guest lectures by Christer
Samuelsson and one planning session during the second intensive week:
| Date | Time | Room | Contents |
|------|-------|------|----------|
| 15/3 | 9-11 | D326 | A Theory of Stochastic Grammars (CS) |
| 15/3 | 13-15 | D326 | Project planning (JN) |
| 15/3 | 15-17 | D326 | A Statistical Theory of Dependency Syntax (CS) |
The slides for Christer Samuelsson's lectures will be distributed as
hand-outs but can also be downloaded here.
Part 2: Individual Project
The second part of the course is an individual project, which will
be reported in the form of a term paper. Topics must be submitted
(and approved) by the end of the second intensive week, and the work
is to be carried out in the period between the second intensive week and
the closing seminar. The deadline for submission of the term
paper is 24 April. For the closing seminar, two reviewers will be
assigned to each paper.
The closing seminar will take place at Växjö University on 2-4 May 2002.