The software can be used freely for non-commercial research and educational purposes. It comes with no warranty, but we welcome all comments, bug reports, and suggestions for improvements.
% gunzip malttagger0.2.tar.gz
% tar -xvf malttagger0.2.tar
% cd tagger
% make
% cp data/option.dat ../bin/tagger/.
% cp data/iso8859-1.txt ../bin/tagger/.
Edit the option file (option.dat) to match your setup before running the tagger.
% cd ../bin/tagger
% malttagger -f option.dat
The option file contains a sequence of parameter specifications with the following simple syntax:
$PARAMETER$ VALUE
In addition, the option file may contain comment lines starting with "--". The following table lists all the available parameters with their permissible values. Default values are marked with "*". Parameters that lack a default value must be specified in the option file (if they are required by the particular configuration of modules invoked).
| Parameter | Description | Values | Value description |
|---|---|---|---|
| INFORMAT | Input format | TAB | The input (for both learning and tagging) must be in the Malt-TAB format. |
| OUTFORMAT | Output data format | TAB, XML* | TAB: Malt-TAB format; XML: Malt-XML format |
| ENCODING | Character encoding | Filename | A character encoding file that specifies the full character set. Some character encoding files (e.g. iso8859-1.txt) can be found in the data directory. |
| INFILE | Input file | Filename | The input (for both learning and tagging) must be in the Malt-TAB format. During learning, the two columns form and postag are required; during tagging, only the first column (form) is required. |
| OUTFILE | Output file | Filename | |
| VERBOSE | Output to terminal | YES*, NO | |
| POSSET | Part-of-speech tagset | Filename | The part-of-speech tagset must be specified in a text file with one tag per line (and no blank lines). |
| MODE | Mode (learning or tagging) | LEARN, TAG | LEARN: learning (inducing a model from corpus data); TAG: tagging (using a model to tag new data) |
| MODEL | Name of the model | Path and/or filename prefix | During learning, the tagger creates two files, $MODEL$.con and $MODEL$.lex; these two files are read during tagging. |
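To make the option syntax and the input requirements concrete, here is a small Python sketch. It is not part of MaltTagger. It reads an option file, fills in the defaults marked with "*", checks the enumerated values, loads a POSSET tagset and reads Malt-TAB input. It assumes that option lines literally take the form `$NAME$ VALUE`, that later settings override earlier ones, and that Malt-TAB columns are tab-separated with a blank line between sentences. The function names and example file names are hypothetical.

```python
# Sketch only, not MaltTagger code. Assumptions: option lines look literally
# like "$NAME$ VALUE"; Malt-TAB columns are tab-separated and sentences are
# separated by blank lines. All names below are hypothetical.

DEFAULTS = {"OUTFORMAT": "XML", "VERBOSE": "YES"}  # values marked "*"
ALLOWED = {  # parameters with an enumerated set of values
    "INFORMAT": {"TAB"},
    "OUTFORMAT": {"TAB", "XML"},
    "VERBOSE": {"YES", "NO"},
    "MODE": {"LEARN", "TAG"},
}


def parse_options(text):
    """Return a dict of parameter -> value, starting from the defaults."""
    options = dict(DEFAULTS)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("--"):
            continue  # blank lines and "--" comment lines are ignored
        parts = line.split(None, 1)
        name = parts[0]
        if len(name) < 3 or not (name.startswith("$") and name.endswith("$")):
            raise ValueError(f"line {lineno}: expected $PARAMETER$, got {name!r}")
        key = name[1:-1]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key in ALLOWED and value not in ALLOWED[key]:
            raise ValueError(f"line {lineno}: {key} cannot be {value!r}")
        options[key] = value  # later lines override earlier ones
    return options


def read_posset(text):
    """One tag per line; blank lines are rejected."""
    tags = text.splitlines()
    if any(not t.strip() for t in tags):
        raise ValueError("POSSET file must not contain blank lines")
    return {t.strip() for t in tags}


def read_malt_tab(text, mode, tagset=None):
    """Return sentences as lists of token rows (lists of column strings)."""
    if mode not in ("LEARN", "TAG"):
        raise ValueError(f"unknown mode {mode!r}")
    sentences, current = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        cols = raw.split("\t")
        if mode == "LEARN":  # learning needs both form and postag
            if len(cols) < 2:
                raise ValueError(f"line {lineno}: missing postag column")
            if tagset is not None and cols[1] not in tagset:
                raise ValueError(f"line {lineno}: tag {cols[1]!r} not in POSSET")
        current.append(cols)  # tagging needs only the form column
    if current:
        sentences.append(current)
    return sentences


def model_files(model):
    """MODEL is a path and/or filename prefix; learning writes both files."""
    return model + ".con", model + ".lex"


if __name__ == "__main__":
    opts = parse_options("-- hypothetical learning run\n$MODE$ LEARN\n$MODEL$ models/english\n")
    tags = read_posset("DT\nNN\nVB\n")
    train = "The\tDT\ndog\tNN\nbarks\tVB\n\nDogs\tNN\nbark\tVB\n"
    sents = read_malt_tab(train, opts["MODE"], tags)
    print(opts["VERBOSE"], len(sents), model_files(opts["MODEL"]))
    # prints: YES 2 ('models/english.con', 'models/english.lex')
```

The checks run before anything is learned, so a misspelled mode or a tag missing from the tagset file surfaces immediately. How MaltTagger itself reports such problems is not described here.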

Packages are available for the following languages and platforms:

| Language | Data | Linux | Solaris | Windows/Cygwin |
|---|---|---|---|---|
| Swedish | SUC2 | suc2granska_linux.tar.gz | suc2granska_solaris.tar.gz | suc2granska_win.tar.gz |
| English | The Penn Treebank (WSJ) | penn_linux.tar.gz | penn_solaris.tar.gz | penn_win.tar.gz |
| Danish | DDT/Parole | ddt_linux.tar.gz | ddt_solaris.tar.gz | ddt_win.tar.gz |