MaltParser 0.3
This is the home page for MaltParser, Version 0.3, a system for data-driven dependency parsing,
which can be used to induce a parsing model from treebank data and to parse new data using an
induced model. Besides instructions for downloading the system for various platforms, this page
also contains a user guide.
Relation to older versions:
- From Version 0.1 to 0.2:
- Parsing algorithms: Nivre's algorithm (Nivre 2003) has been reimplemented
and Covington's incremental algorithm (Covington 2001) has been added.
- Learning algorithms: Support vector machines, using LIBSVM (Chang and Lin 2005),
has been added
in addition to memory-based learning, using TiMBL (Daelemans and Van den Bosch 2005).
- Feature models: A specification language for feature models has been implemented,
which allows the user to specify arbitrary combinations of lexical features, part-of-speech
features, and dependency type features. (This functionality replaces the hard-coded models
supplied with version 0.1.)
- From Version 0.2
to 0.21: More robust handling of input and output and new options
for setting maximum sentence length, maximum token length
and maximum number of features.
- From Version 0.21
to 0.22: New option for setting maximum tag length and command line
flags for all options.
- From Version 0.22
to 0.23: New options for character encoding and root label.
- From Version 0.23
to 0.3: New input and output formats compatible with the
CoNLL-X shared task,
with fields separated by a single tab (not blanks).
Note, however, that only the fields FORM, POSTAG and DEPREL can be
used as features in the current version. In a later version, we plan
to incorporate LEMMA, CPOSTAG and FEATS as well.
Download MaltParser 0.3
MaltParser 0.3 can be downloaded as binaries and is available on three platforms:
The software can be used freely for non-commercial research and educational
purposes. It comes with no warranty, but we welcome all comments, bug reports,
and suggestions for improvements.
MaltParser 0.3 uses libTimbl, part of TiMBL
(Tilburg Memory-Based Learner), Version 5.1, and
LIBSVM, Version 2.8,
in order to learn parsing models
from treebanks, and we gratefully acknowledge the use of these software packages. However,
MaltParser 0.3 is a standalone application, so there is
no need to install either TiMBL or LIBSVM separately.
Shortcut
For users who only want to have a decent robust dependency parser (and who are not
interested in experimenting with different parsing algorithms, learning algorithms
and feature models), we provide pretrained parsing models for a selection of languages:
These models can be used together with MaltParser to get a running parser without
having access to training data from a treebank. Note, however, that the parser presupposes
that the input has been segmented and tagged for parts-of-speech in accordance with the
format of the original treebank (see details for each language).
User Guide for MaltParser 0.3
This is a short user guide for MaltParser 0.3, a data-driven parser that uses
dependency-based syntactic representations and treebank-induced classifiers
to guide the parser at nondeterministic choice points. The guide includes basic
instructions for running the system, including a specification of the option
file used to control the system's behavior, and is followed by a short presentation of
the parsing methodology implemented in the system.
More information about parsing algorithms, learning algorithms
and feature models can be found in the following
publications:
- Chang, C.-C. and Lin, C.-J. (2005)
LIBSVM: A Library for Support
Vector Machines. URL:
http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf.
- Covington, M. A. (2001)
A Fundamental Algorithm for Dependency Parsing.
In Proceedings of the 39th Annual ACM Southeast Conference,
pp. 95-102.
- Daelemans, W. and Van den Bosch, A. (2005)
Memory-Based Language Processing.
Cambridge University Press.
- Nivre, J. (2003)
An Efficient
Algorithm for Projective Dependency Parsing. In Proceedings
of the 8th International Workshop on Parsing Technologies (IWPT 03),
Nancy, France, 23-25 April 2003, pp. 149-160.
- Nivre, J., Hall, J. and Nilsson, J. (2004)
Memory-Based
Dependency Parsing.
In Ng, H. T. and Riloff, E. (eds.)
Proceedings of the Eighth Conference on Computational Natural
Language Learning (CoNLL), May 6-7, 2004, Boston, Massachusetts,
pp. 49-56.
- Nivre, J. (2004)
Incrementality
in Deterministic Dependency Parsing. In Incremental Parsing:
Bringing Engineering and Cognition Together. Workshop at ACL-2004,
Barcelona, Spain, July 25, 2004.
- Nivre, J. and Scholz, M. (2004)
Deterministic
Dependency Parsing of English Text. In Proceedings of COLING 2004,
Geneva, Switzerland, August 23-27, 2004.
Running MaltParser
The parser can be run in two basic modes, learning (inducing a
parsing model from a treebank) and parsing (using the parsing
model to parse new data). In the current version of the parser, new data
must be tokenized and part-of-speech tagged in the
Malt-TAB format.
Regardless of mode, MaltParser is normally run by executing the following
command at the command line prompt:
> ./maltparser -f file
where file is the name of an option file, specifying all
the parameters needed. From version 0.22 it is also possible to
specify options using command line flags, which will override any
settings included in the option file. (Although it is possible to
run the program using only flags and no option file, it is usually
more convenient to combine the two methods.) The option file and
the flags are described in detail below. A list of all flags and
options can be obtained by running MaltParser with the -h flag:
> ./maltparser -h
Option File
The option file contains a sequence of parameter specifications with
the following simple syntax:
$PARAMETER$
VALUE
In addition, the option file may contain comment lines starting with "--".
The following table lists all the available parameters with their
permissible values. Default values are marked with "*". Parameters
that lack a default value must be specified in the option file
(if they are required by the particular configuration of modules
invoked). An example option file can be found
here.
In the following table, we describe the different options that can be specified
in the option file. Options that have no default value must be specify.
For each option we also give the flag that can be used for specifying the
option directly in the command line (overriding any specifications in the
option file).
| Global Parameters
| Flag
| Description
| Values
| Description
|
| INFILE
| -i
| Input file
| Filename
| The input (for both learning and parsing)
must be in the Malt-TAB or
CoNLL-X shared task format format,
as specified in the INFORMAT option (see below).
An example input file in Malt-TAB format
can be found here.
|
| OUTFILE
| -o
| Output file
| Filename
|
| INFORMAT
| -I
| Input data format
| MALTTAB* CONLLTAB
| Malt-TAB
CoNLL-X shared task format (tab separated)
|
| OUTFORMAT
| -O
| Output data format
| MALTTAB CONLLTAB MALTXML* CONLLXML TIGERXML
| Malt-TAB
CoNLL-X shared task format (tab separated)
Malt-XML
CoNLL-X shared task format (XML version)
TIGER-XML
|
| CHARSET
| -C
| Character set
| ISO-8859-1* UTF-8 ...
| NB: The user must verify that the character encoding
in the input file conforms to the specification of this option.
The parser does not check that the encoding is correct and only
escapes the characters ", ', &, <, > if the output format
is MALTXML or TIGERXML.
|
| VERBOSE
| -v
| Output to terminal
| YES* NO
|
| MAXSENTENCELENGTH
| -z
| Maximum number of tokens per sentence
| Integer | Default = 512
|
| MAXTOKENLENGTH
| -y
| Maximum number of characters per token
| Integer | Default = 256
|
| Tagset Parameters
| Flag
| Description
| Values
| Description
|
| MAXTAGLENGTH
| -w
| Maximum number of characters per tag name
| Integer | Default = 128
|
| POSSET
| -P
| Part-of-speech tagset
| Filename
| The part-of-speech tagset must be specified in a text file
with one tag per line (and no blank lines). An example file
can be found here.
|
| CPOSSET
| -Q
| Coarse-grained part-of-speech tagset (CoNLL format only)
| Filename
| The coarse-grained part-of-speech tagset must be specified in a text file
with one tag per line (and no blank lines). An example file
can be found here.
|
| DEPSET
| -D
| Dependency type tagset
| Filename
| The dependency type tagset must be specified in a text file
with one tag per line (and no blank lines). An example file
can be found here.
|
| ROOTLABEL
| -R
| Dependency type label used for roots
| String
| Default = ROOT
|
| Parser Parameters
| Flag
| Description
| Values
| Description
|
| MODE
| -m
| Mode (learning or parsing)
| PARSE*
| Parsing (using an induced model to parse new data)
|
| | | LEARN | Learning (inducing a model from
treebank data)
|
| ALGORITHM
| -a
| Parsing algorithm (see description below)
| NIVRE* COVINGTON
| Nivre (2003, 2004) Covington (2001) (incremental)
|
| PARSEROPTIONS
| -p
| Parser options (algorithm specific)
| -a [ES] | Arc order (NIVRE): E(ager), S(tandard)NB: In flag,
whitespace is replaced by underscore (e.g.
"-a_E" instead of "-a E").
|
| | | -g [NP] | Graph condition (COVINGTON): N(on-Projective), P(rojective)
|
| | |
|
| Guide Parameters
| Flag
| Description
| Values
| Description
|
| MAXFEATURES
| -c
| Maximum number of features of each type
| Integer | Default = 30
|
| FEATURES
| -F
| Feature model specification (see description below)
| Filename
| Model specified in Filename.par
NB: If no feature model specification can be loaded,
a default specification equivalent to m3.par
is used for parsing.
|
| Learner Parameters
| Flag
| Description
| Values
| Description
|
| LEARNER
| -l
| Learner type (see description below)
| MBL* SVM
| Memory-based learning (TiMBL) Support vector machine (LIBSVM)
|
| LEARNEROPTIONS
| -L
| Parameter settings (learner specific)
| String
| TiMBL example: "-m M -k 5 -w 0 -d ID -L 3"
(see TiMBL Documentation)
NB: In flag,
whitespace is replaced by underscore (e.g.
"-m_M_-k_5" instead of "-m M -k 5").
|
| | | | LIBSVM example: "-t 0"
(see LIBSVM Documentation)
|
Inductive Dependency Parsing
MaltParser can be characterized as a data-driven parser-generator.
While a traditional parser-generator constructs a parser given a
grammar, a data-driven parser-generator constructs a parser given
a treebank. MaltParser is an implementation of
inductive dependency parsing, where
the syntactic analysis of a sentence amounts to the derivation of
a dependency structure, and where inductive machine learning is used
to guide the parser at nondeterministic choice points. This parsing
methodology is based on three essential components:
- Deterministic parsing algorithms for building dependency
graphs.
- History-based feature models for predicting the next parser
action.
- Discriminative machine learning to map histories to parser actions
Given the restrictions imposed by these components, MaltParser has been
designed to give maximum flexibility in the way components can be
varied independently of each other. We now describe the functionality
for each of the components in turn.
Parsing Algorithms
Any deterministic parsing algorithm compatible with the MaltParser
architecture has to operate with the following set of data structures,
which also provide the interface to the feature model:
- A stack STACK of partially processed tokens, where STACK[i]
is the i+1th token from the top of the stack, with the top being
STACK[0].
- A list INPUT of remaining input tokens, where INPUT[i] is the
i+1th token in the list, with the first token being INPUT[0].
- A stack CONTEXT of unattached tokens occurring between the token
on top of the stack and the next input token, with the top CONTEXT[0]
being the token closest to STACK[0] (farthest from INPUT[0]).
- A function HEAD defining the partially built dependency structure,
where HEAD[i] is the syntactic head of the token i (with
HEAD[i] = 0 if i is not yet attached to a head).
- A function DEP labeling the partially built dependency structure,
where DEP[i] is the dependency type linking the token i to its syntactic head
(with DEP[i] = ROOT if i is not yet attached to a head).
- A function LC defining the leftmost child of a token in the partially built
dependency structure (with LC[i] = 0 if i has not left children).
- A function RC defining the rightmost child of a token in the partially built
dependency structure (with RC[i] = 0 if i has not right children).
- A function LS defining the next left sibling of a token in the partially built
dependency structure (with LS[i] = 0 if i has no left siblings).
- A function RS defining the next right sibling of a token in the partially built
dependency structure (with RS[i] = 0 if i has no right siblings).
An algorithm builds dependency structures incrementally by updating
HEAD and DEP, but it can only add a dependency arc between the top of the stack
(STACK[0]) and the next input token (INPUT[0]) in the current configuration.
(The context stack CONTEXT is therefore only used by algorithms that allow
non-projective dependency structures, since unattached tokens under a dependency
arc are ruled out in projective dependency structures.)
MaltParser provides two basic parsing algorithms, each with two options:
- Nivre's algorithm (Nivre 2003, Nivre 2004) is a linear-time algorithm
limited to projective dependency structures. It can be run in arc-eager
(-a E) or arc-standard (-a S) mode (cf. Nivre 2004).
- Covington's algorithm (Covington 2001) is a quadratic-time algorithm for
unrestricted dependency structures, which proceeds by trying to link each new token
to each preceding token. It can be run in a projective (-g P) mode, where the linking
operation is restricted to projective dependency structure, or in a non-projective
(-g N) mode, allowing non-projective (but acyclic) dependency structures. In non-projective mode,
the algorithm uses the CONTEXT stack to store unattached tokens occurring between STACK[0]
and INPUT[0] (from right to left).
NB: The projective algorithms (Nivre's algorithm and Covington's algorithm
with the -g P option) will only perform well if the training data does not contain
unattached nodes (head="0") with arcs covering them, since the projectivity constraint
will prohibit the reproduction of such covering arcs during parsing. For example, if
internal punctuation is left unattached in the treebank data, performance can usually
be improved considerably by attaching these punctuation tokens to the nearest words
(either left or right).
Feature Models
MaltParser uses history-based feature models for predicting the next action in the
deterministic derivation of a dependency structure, which means that it uses features
of the partially built dependency structure together with features of the (tagged)
input string. More precisely, features are defined in terms of the word form (LEX),
part-of-speech (POS) or dependency type (DEP) of a token defined relative to one
of the data structures STACK, INPUT and CONTEXT, using the auxiliary functions
HEAD, LC, RC, LS and RS.
A feature model is defined in an external feature specification
with the following syntax:
<fspec> ::= <feat>+
<feat> ::= <lfeat> | <nlfeat>
<lfeat> ::= LEX \t <dstruc> \t <off> \t <suff> \n
<nlfeat> ::= (POS|DEP) \t <dstruc> \t <off> \n
<dstruc> ::= (STACK|INPUT|CONTEXT)
<off> ::= <nnint> \t <int> \t <nnint> \t <int> \t <int>
<suff> ::= <nnint>
<int> ::= (...|-2|-1|0|1|2|...)
<nnint> ::= (0|1|2|...)
As syntactic sugar, any <lfeat> or <nlfeat>
can be truncated if all remaining integer values are zero. An example feature specification can
be found here.
Each feature is specified on a single line, consisting of at least two tab-separated
columns. The first column defines the feature type to be lexical (LEX),
part-of-speech (POS), or dependency (DEP).
The second column identifies one of the main data structures in the
parser configuration, usually the stack (STACK) or the list of remaining input tokens (INPUT),
as the ``base address'' of the feature. (The third alternative, CONTEXT, is relevant only
together with Covington's algorithm in non-projective mode.)
The actual address is then specified by a series of
``offsets'' with respect to the base address as follows:
- The third column defines a list offset i, which can only be positive and which
identifies the i+1th token in the list/stack specified in the second column
(i.e. STACK[i], INPUT[i] or CONTEXT[i]).
- The fourth column defines a linear offset i, which can be positive (forward/right) or
negative (backward/left) and which refers to (relative) token positions in the original
input string.
- The fifth column defines an offset i in terms of the HEAD function, which has
to be non-negative and which specifies i applications of the HEAD function to the
token identified through preceding offsets.
- The sixth column defines an offset i in terms of the LC or RC function, which can
be negative (| i | applications of LC), positive (i applications of RC), or zero
(no applications).
- The seventh column defines an offset i in terms of the LS or RS function,which can
be negative (| i | applications of LS), positive (i applications of RS), or zero
(no applications).
Let us consider a few examples:
POS STACK 0 0 0 0 0
POS INPUT 1 0 0 0 0
POS INPUT 0 -1 0 0 0
DEP STACK 0 0 1 0 0
DEP STACK 0 0 0 -1 0
The feature defined on the first line is simply the part-of-speech of the token on
top of the stack (TOP). The second feature is the part-of-speech of the token immediately
after the next input token in the input list (NEXT), while the third feature is the part-of-speech
of the token immediately before NEXT in the original input string (which may not be present either
in the INPUT list or the STACK anymore). The fourth feature is the dependency type of the head of TOP
(zero steps down the stack, zero steps forward/backward in the input string, one step up to the head).
The fifth and final feature is the dependency type of the leftmost dependent of TOP (zero steps
down the stack, zero steps forward/backward in the input string, zero steps up through heads,
one step down to the leftmost dependent). Using the syntactic sugar of truncating all remaining
zeros, these five features can also be specified more succintly:
POS STACK
POS INPUT 1
POS INPUT 0 -1
DEP STACK 0 0 1
DEP STACK 0 0 0 -1
The only difference between lexical and non-lexical features is that the specification of
lexical features may contain an eighth column specifying a suffix length n.
By convention, if n = 0, the entire word form is included; otherwise only the n
last characters are included in the feature value. Thus, the following specification defines
a feature the value of which is the four-character suffix of the word form of the next left
sibling of the rightmost dependent of the head of the token immediately below TOP.
LEX STACK 1 0 1 1 -1 4
Finally, it is worth noting that if any of the offsets is undefined in a given
configuration, the feature is automatically assigned a null value.
Feature Model Examples
Some of the features regularly used in feature models are depicted below. Red features
are lexical features (LEX); blue features are part-of-speech features
(POS); and green features are dependency features (DEP).
The following table shows three of the models provided with Version 0.1 of MaltParser
(there called MBL2, MBL3 and MBL4 because MBL was the only learner type supported in
that version). For each model we also give a link to the feature specification for that model.
| Models
| Top
| Next
| T
| N
| TH
| TL
| TR
| NL
| TH
| TL
| TR
| NL
| L1
| L2
| L3
| Feature specification
|
| M2
| | | + | + | + | + | + | + | | | | | + | | | m2.par
|
| M3
| + | + | + | + | + | + | + | + | | | | | + | | | m3.par
|
| M4
| + | + | + | + | + | + | + | + | | | | | + | + | + | m4.par
|
Learning Algorithms
Inductive dependency parsing requires a learning algorithm to induce a mapping
from parser histories, relative to a given feature model, to parser actions, relative
to a given parsing algorithm. MaltParser comes with two different learning
algorithms, each with a wide variety of parameters:
- Memory-based learning and classification (Daelemans and Van den Bosch 2005)
stores all training
instances at learning time and uses some variant of k-nearest neighbor classification
to predict the next action at parsing time. MaltParser uses the software package
TiMBL
to implement this learning algorithm,
and supports all the options provided by that package.
- Support vector machines rely on kernel functions to induce a maximum-margin
hyperplane classifier at learning time, which can be used to predict the next
action at parsing time. MaltParser uses the library LIBSVM (Chang and Lin 2005)
to implement this algorithm with all the options provided by this library.