GSLT: Statistical Methods 1



The aim of this course is to give a research-oriented introduction to probabilistic modeling, statistical methods and their use within the field of language technology. The course is aimed at students with a basic knowledge of natural language processing and/or speech technology (at least the equivalent of a GSLT level 1 course in one of these areas, see NLP, Speech technology). Basic programming skills and a rudimentary knowledge of statistics and probability theory are also useful.

The course consists of two parts: a basic course (Part 1) and an individual project (Part 2).

Before the first part of the course, there will be a short phase of self-study in order to build a common platform. The main textbook for the course is Manning & Schütze (1999), Foundations of Statistical Natural Language Processing.

NB: The official language within GSLT is English but we can decide to have lectures, seminars and discussions in Swedish instead, provided of course that all participants are comfortable with this. In any case, participants are free to formulate their contributions to discussions, whether oral or written, in any language that can be understood by the other participants (which in most circumstances means Swedish or English).


Preparatory Studies

Before the first teaching period, all participants will be expected to review basic concepts of statistics and language technology. Recommended reading is Manning & Schütze (1999) Foundations of Statistical Natural Language Processing, Part I (Preliminaries). In addition, participants will be asked to read Nivre, On Statistical Methods in Natural Language Processing as preparation for the first intensive week.


Part 1: Basic Course

The first part of the course starts with a series of lectures during the first intensive week as follows (times still to be confirmed):

Date    Time     Room    Contents
28/1    9-11     H527    Tutorial on Probability Theory (optional)
28/1    13-15    H323    Introduction [Slides]
28/1    15-17    H323    Stochastic Models and Algorithms [Slides]
29/1    9-11     C444    Statistical Learning [Slides]
30/1    9-11     D304    Statistical Evaluation [Slides]

The period between the first and second intensive weeks will be devoted to the study of parts II-IV of Manning & Schütze (1999) Foundations of Statistical Natural Language Processing, according to the following schedule:

Each participant will be responsible for summarizing one chapter. This summary should be distributed to the other participants for comments and questions by the end of the week devoted to the chapter in question. There will be no other assignments during this period.

The first part of the course ends with two guest lectures by Christer Samuelsson and one planning session during the second intensive week:

Date    Time     Room    Contents
15/3    9-11     D326    A Theory of Stochastic Grammars (CS)
15/3    13-15    D326    Project planning (JN)
15/3    15-17    D326    A Statistical Theory of Dependency Syntax (CS)

The slides for Christer Samuelsson's lectures will be distributed as hand-outs but can also be downloaded here.


Part 2: Individual Project

The second part of the course is an individual project, which will be reported in the form of a term paper. Topics must be submitted and approved by the end of the second intensive week, and the work is to be carried out in the period between the second intensive week and the closing seminar. The deadline for submitting the term paper is 24 April. Two reviewers will be assigned to each paper for the closing seminar, which will take place at Växjö University on 2-4 May 2002.