MaltTagger


The software can be used freely for non-commercial research and educational purposes. It comes with no warranty, but we welcome all comments, bug reports, and suggestions for improvements.

Download:

malttagger0.2.tar.gz

Unpack:

% gunzip malttagger0.2.tar.gz
% tar -xvf malttagger0.2.tar

Build:

% cd tagger
% make
% cp data/option.dat ../bin/tagger/.
% cp data/iso8859-1.txt ../bin/tagger/.

Run:

Modify the option file before running.

% cd ../bin/tagger
% malttagger -f option.dat

Option file

The option file contains a sequence of parameter specifications with the following simple syntax:

$PARAMETER$
VALUE 

In addition, the option file may contain comment lines starting with "--". The following table lists all the available parameters with their permissible values. Default values are marked with "*". Parameters that lack a default value must be specified in the option file (if they are required by the particular configuration of modules invoked).
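
The syntax above is simple enough to read back with a few lines of code. The following Python sketch is one possible reader, not part of MaltTagger. It assumes that each value sits on the first non-blank, non-comment line after its $PARAMETER$ line and that blank lines are ignored. The function name parse_option_file and the file names in the sample are hypothetical.

```python
def parse_option_file(text):
    """Parse $PARAMETER$ / VALUE pairs, skipping blank lines and "--" comments."""
    options = {}
    pending = None  # parameter name still waiting for its value line
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("--"):
            continue  # assumption: blank lines and comment lines carry no data
        if pending is None:
            # A parameter line has the form $NAME$.
            if not (len(line) > 2 and line.startswith("$") and line.endswith("$")):
                raise ValueError(f"expected $PARAMETER$, got: {line!r}")
            pending = line[1:-1]
        else:
            options[pending] = line
            pending = None
    if pending is not None:
        raise ValueError(f"parameter {pending} has no value")
    return options


# Hypothetical option file; the file names are placeholders.
SAMPLE = """\
-- hypothetical learning configuration
$MODE$
LEARN
$INFILE$
train.tab
$POSSET$
tagset.txt
$MODEL$
models/demo
"""

if __name__ == "__main__":
    print(parse_option_file(SAMPLE))
```

A reader like this can be used to check an option file for a missing value before the tagger is started.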

Parameter  Description                 Values                       Description
INFORMAT   Input format                TAB                          The input (for both learning and tagging) must be in the Malt-TAB format.
OUTFORMAT  Output data format          TAB                          Malt-TAB
                                       XML*                         Malt-XML
ENCODING   Character encoding          Filename                     Character encoding file that specifies all characters. Some character encoding files can be found under the data directory.
INFILE     Input file                  Filename                     The input (for both learning and tagging) must be in the Malt-TAB format. During learning, the two columns form and postag are required; during tagging, only the first column (form) is required.
OUTFILE    Output file                 Filename
VERBOSE    Output to terminal          YES*
                                       NO
POSSET     Part-of-speech tagset       Filename                     The part-of-speech tagset must be specified in a text file with one tag per line (and no blank lines).
MODE       Mode (learning or tagging)  LEARN                        Learning (inducing a model from corpus data)
                                       TAG                          Tagging (using a model to tag new data)
MODEL      Name of the model           Path and/or prefix filename  During learning, the tagger creates two files, $MODEL$.con and $MODEL$.lex; these two files are then used during tagging.
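
As an illustration of how these parameters fit together, the Python sketch below derives a tagset file body from training data and renders an option file for learning. It is not part of MaltTagger and rests on several assumptions: that Malt-TAB columns are separated by tabs, that blank lines separate sentences, and that the parameters chosen here are sufficient for learning (the table does not say which ones each mode requires). The function names and the file names train.tab, tagset.txt and demo are hypothetical.

```python
def collect_tagset(tab_lines):
    """Return the sorted, unique tags found in the second (postag) column."""
    tags = set()
    for line in tab_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines separate sentences
        columns = line.split("\t")  # assumption: columns are tab-separated
        if len(columns) < 2:
            raise ValueError(f"missing postag column: {line!r}")
        tags.add(columns[1])
    return sorted(tags)


def render_option_file(params):
    """Render a dict as alternating $PARAMETER$ and VALUE lines."""
    lines = []
    for name, value in params.items():
        lines.append(f"${name}$")
        lines.append(str(value))
    return "\n".join(lines) + "\n"


# Tiny hypothetical training sample with form and postag columns.
TRAINING = ["The\tDT\n", "dog\tNN\n", "barks\tVBZ\n", "\n", "A\tDT\n", "cat\tNN\n"]

if __name__ == "__main__":
    # One tag per line and no blank lines, as POSSET requires.
    print("\n".join(collect_tagset(TRAINING)))
    print(render_option_file({
        "MODE": "LEARN",
        "INFORMAT": "TAB",
        "ENCODING": "iso8859-1.txt",
        "INFILE": "train.tab",    # hypothetical file name
        "POSSET": "tagset.txt",   # hypothetical file name
        "MODEL": "demo",          # hypothetical model prefix
    }))
```

Switching MODE to TAG and pointing INFILE at untagged data, with the same MODEL prefix, would give the matching tagging configuration under the same assumptions.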

Models

Language  Data                     Linux                     Solaris                     Windows/CygWin
Swedish   SUC2                     suc2granska_linux.tar.gz  suc2granska_solaris.tar.gz  suc2granska_win.tar.gz
English   The Penn Treebank (WSJ)  penn_linux.tar.gz         penn_solaris.tar.gz         penn_win.tar.gz
Danish    DDT/Parole               ddt_linux.tar.gz          ddt_solaris.tar.gz          ddt_win.tar.gz
© Växjö universitet (MSI) - Updated: 2006-05-16 (Johan Hall)