To run MaltEval you need a Java VM (tested with JRE 1.4.1).
Usage:
java -jar MaltEval.jar or
java -jar MaltEval.jar <parser/tagger/chunker file> <gold-standard file>
<sentence id="3" user="" date="">
  <word id="1" form="Det" lemma="den" postag="pn.neu.sin.def.sub/obj" head="2" deprel="SUB" chunk="NPB"/>
  <word id="2" form="innebär" lemma="innebära" postag="vb.prs.akt" head="0" deprel="ROOT" chunk="VCB"/>
  <word id="3" form="bl." lemma="bland" postag="pp" head="2" deprel="ADV" chunk="ADVPB"/>
  <word id="4" form="a." lemma="annat" postag="nn.neu.sin.ind.nom" head="3" deprel="ID" chunk="ADVPI"/>
  <word id="5" form="att" lemma="att" postag="sn" head="2" deprel="OBJ" chunk="O"/>
  <word id="6" form="endast" lemma="endast" postag="ab" head="7" deprel="ADV" chunk="ADVPB"/>
  <word id="7" form="en" lemma="en" postag="dt.utr.sin.ind" head="8" deprel="DET" chunk="NPB"/>
  <word id="8" form="skatteskala" lemma="skatteskala" postag="nn.utr.sin.ind.nom" head="9" deprel="SUB" chunk="NPI"/>
  <word id="9" form="kommer" lemma="komma" postag="vb.prs.akt" head="5" deprel="UK" chunk="VCB"/>
  <word id="10" form="att" lemma="att" postag="ie" head="9" deprel="VC" chunk="VCI"/>
  <word id="11" form="finnas" lemma="finna" postag="vb.inf.sfo" head="10" deprel="IM" chunk="VCB"/>
  <word id="12" form="för" lemma="för" postag="pp" head="11" deprel="ADV" chunk="PPB"/>
  <word id="13" form="beräkning" lemma="beräkning" postag="nn.utr.sin.ind.nom" head="12" deprel="PR" chunk="NPB|PPI"/>
  <word id="14" form="av" lemma="av" postag="pp" head="13" deprel="ATT" chunk="NPI|PPB|PPI"/>
  <word id="15" form="statlig" lemma="statlig" postag="jj.pos.utr.sin.ind.nom" head="16" deprel="ATT" chunk="APB|NPI|PPI|PPI"/>
  <word id="16" form="inkomstskatt" lemma="inkomstskatt" postag="nn.utr.sin.ind.nom" head="14" deprel="PR" chunk="NPB|NPI|PPI|PPI"/>
  <word id="17" form="." lemma="." postag="mad" head="2" deprel="IP" chunk="O"/>
</sentence>

The tagsets used for parts-of-speech, dependency relations and chunk relations must be specified in the header of the XML document. An example document can be found here.
Note that in Malt-XML the word element may carry additional attributes (such as the lemma attribute), but MaltEval ignores them.
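To illustrate how extra attributes can simply be skipped when reading Malt-XML, here is a minimal Python sketch. It is not MaltEval's own code. The function name read_words and the tuple USED_ATTRIBUTES are hypothetical, and the assumption that exactly these five attributes are the ones being read is inferred from the columns described for Malt-TAB. The sample is a two-word excerpt of the sentence above.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the attributes of interest; everything else
# on a <word> element (for example lemma) is silently skipped.
USED_ATTRIBUTES = ("form", "postag", "head", "deprel", "chunk")

# Two-word excerpt of the example sentence.
SAMPLE = """<sentence id="3" user="" date="">
  <word id="1" form="Det" lemma="den" postag="pn.neu.sin.def.sub/obj" head="2" deprel="SUB" chunk="NPB"/>
  <word id="2" form="innebär" lemma="innebära" postag="vb.prs.akt" head="0" deprel="ROOT" chunk="VCB"/>
</sentence>"""


def read_words(sentence_xml):
    """Return one dict per <word>, keeping only the attributes in USED_ATTRIBUTES."""
    sentence = ET.fromstring(sentence_xml)
    words = []
    for word in sentence.iter("word"):
        kept = {}
        for name in USED_ATTRIBUTES:
            value = word.get(name)
            if value is not None:
                kept[name] = value
        words.append(kept)
    return words


words = read_words(SAMPLE)
for word in words:
    print(word)
```

Each resulting dictionary contains the form, postag, head, deprel and chunk values, while id and lemma are dropped.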
Det	pn.neu.sin.def.sub/obj	2	SUB	NPB
innebär	vb.prs.akt	0	ROOT	VCB
bl.	pp	2	ADV	ADVPB
a.	nn.neu.sin.ind.nom	3	ID	ADVPI
att	sn	2	OBJ	O
endast	ab	7	ADV	ADVPB
en	dt.utr.sin.ind	8	DET	NPB
skatteskala	nn.utr.sin.ind.nom	9	SUB	NPI
kommer	vb.prs.akt	5	UK	VCB
att	ie	9	VC	VCI
finnas	vb.inf.sfo	10	IM	VCB
för	pp	11	ADV	PPB
beräkning	nn.utr.sin.ind.nom	12	PR	NPB|PPI
av	pp	13	ATT	NPI|PPB|PPI
statlig	jj.pos.utr.sin.ind.nom	16	ATT	APB|NPI|PPI|PPI
inkomstskatt	nn.utr.sin.ind.nom	14	PR	NPB|NPI|PPI|PPI
.	mad	2	IP	O

The example document can be found here. Sentence boundaries in Malt-TAB are represented by a blank line, as can be seen in the example document. As mentioned, the tagsets used for parts-of-speech, dependency relations and chunk relations are not specified in the Malt-TAB format and must therefore be specified in auxiliary files, one for each tagset. For the example file, the parts-of-speech file can be found here, the dependency relation file here and the chunk relation file here.
The only compulsory attribute for Malt-TAB is the first column, namely the word form. The other attributes are all optional, but the order among the included attributes cannot be altered. For example, if both the part-of-speech and the head attributes are included, then the part-of-speech column must be to the left of the head column. Also, if the head attribute is present, the dependency relation attribute must also be present. This is the total order:
form (required) < postag (optional) < head (optional) < deprel (optional) < chunk (optional)
Since MaltEval does not try to guess which attributes a Malt-TAB file contains, the program asks you to specify them in a dialog box when you choose to read such a file.
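The column rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not MaltEval's implementation: the names COLUMN_ORDER, check_columns and parse_malt_tab are hypothetical, columns are assumed to be separated by a single tab character, and the set passed as included stands in for what the user would select in the dialog box. The sample rows are taken from the example sentence, and the blank line is inserted only to show sentence splitting.

```python
# Fixed total order of Malt-TAB columns: form < postag < head < deprel < chunk.
COLUMN_ORDER = ("form", "postag", "head", "deprel", "chunk")


def check_columns(included):
    """Validate the chosen attributes and return them in the fixed order."""
    included = set(included)
    unknown = included - set(COLUMN_ORDER)
    if unknown:
        raise ValueError("unknown attributes: %s" % sorted(unknown))
    if "form" not in included:
        raise ValueError("the form column is required")
    if "head" in included and "deprel" not in included:
        raise ValueError("head requires deprel")
    return [name for name in COLUMN_ORDER if name in included]


def parse_malt_tab(text, included):
    """Split text into sentences (blank-line separated) of word dicts."""
    columns = check_columns(included)
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError("expected %d columns, got %d: %r"
                             % (len(columns), len(fields), line))
        current.append(dict(zip(columns, fields)))
    if current:
        sentences.append(current)
    return sentences


SAMPLE = (
    "Det\tpn.neu.sin.def.sub/obj\t2\tSUB\n"
    "innebär\tvb.prs.akt\t0\tROOT\n"
    "\n"
    ".\tmad\t2\tIP\n"
)

sentences = parse_malt_tab(SAMPLE, {"form", "postag", "head", "deprel"})
print(len(sentences), "sentences")
```

Because the caller states which attributes are present, the parser never has to guess what a column means. A row with the wrong number of fields is rejected instead of being silently misread.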
The order of the tags in the tagset files is irrelevant, but it is crucial that all tags in the input files are enumerated in the tagset file. The same is true if the input format is Malt-XML, i.e. the order of the tags in the header is not important, but all tags for all words must be enumerated in the header. Otherwise the evaluator will complain and the evaluation will be disabled.
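A check of this kind can be sketched in a few lines of Python. The function undeclared_tags is hypothetical, and the sketch assumes that sentences are already available as lists of word dictionaries and that a tagset has been read into a plain collection of strings. The layout of the tagset files themselves is not covered here.

```python
def undeclared_tags(sentences, attribute, tagset):
    """Return, sorted, the tags of one attribute that are used but not declared."""
    declared = set(tagset)  # the order of the tagset is irrelevant
    used = {
        word[attribute]
        for sentence in sentences
        for word in sentence
        if attribute in word
    }
    return sorted(used - declared)


# Two words from the example sentence.
sentences = [[
    {"form": "Det", "postag": "pn.neu.sin.def.sub/obj", "deprel": "SUB"},
    {"form": "innebär", "postag": "vb.prs.akt", "deprel": "ROOT"},
]]

postag_tagset = ["vb.prs.akt", "pn.neu.sin.def.sub/obj"]
deprel_tagset = ["ROOT"]  # deliberately incomplete for the demonstration

missing_postags = undeclared_tags(sentences, "postag", postag_tagset)
missing_deprels = undeclared_tags(sentences, "deprel", deprel_tagset)
print("undeclared postags:", missing_postags)
print("undeclared deprels:", missing_deprels)
```

An empty result means that every tag in the data is enumerated. A non-empty result corresponds to the situation in which the evaluation would be disabled.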
When two input files have been successfully imported, they must conform to each other in a number of respects before any evaluation is enabled. MaltEval therefore ensures that the files contain the same text and have the same tagsets. The two files must have the same number of sentences, and each sentence in one file must have the same number of words as the corresponding sentence in the other file. Moreover, the form attribute of each word must be identical to that of the corresponding word in the other file. It is also important that the tagsets for both files are equivalent, i.e. have the same size and contain the same tags (although the order of the tags in the input files may differ).
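The conformity conditions can be expressed as a small Python sketch. It only mirrors the rules stated above and makes no claim about how MaltEval implements them. The functions conformity_problems and tagsets_equivalent are hypothetical, and sentences are assumed to be lists of word dictionaries with a form key.

```python
def conformity_problems(parsed, gold):
    """Return a list of human-readable mismatches between two files."""
    if len(parsed) != len(gold):
        return ["sentence count differs: %d vs %d" % (len(parsed), len(gold))]
    problems = []
    for s, (p_sent, g_sent) in enumerate(zip(parsed, gold), start=1):
        if len(p_sent) != len(g_sent):
            problems.append("sentence %d: word count differs: %d vs %d"
                            % (s, len(p_sent), len(g_sent)))
            continue
        for w, (p_word, g_word) in enumerate(zip(p_sent, g_sent), start=1):
            if p_word["form"] != g_word["form"]:
                problems.append("sentence %d, word %d: form %r vs %r"
                                % (s, w, p_word["form"], g_word["form"]))
    return problems


def tagsets_equivalent(first, second):
    """Same size and same tags; the order of the tags does not matter."""
    return set(first) == set(second)


gold = [[{"form": "Det"}, {"form": "innebär"}]]
same_text = [[{"form": "Det"}, {"form": "innebär"}]]
other_text = [[{"form": "Det"}, {"form": "innebar"}]]

ok = conformity_problems(same_text, gold)
bad = conformity_problems(other_text, gold)
print(ok)
print(bad)
```

Returning a list of problems instead of a single boolean makes it possible to report where the two files diverge.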
NB: Very large input files can cause an OutOfMemoryError. If this happens, more memory can be allocated by using the flag -Xmx<N>m when starting MaltEval. (For example, java -Xmx500m -jar MaltEval.jar sets the maximum heap size to 500 MB.)
From this menu you can edit four different settings that are common to tagging, parsing and chunking. It is possible to exclude one or more parts-of-speech and/or dependency relations from the evaluation. The exclusion of parts-of-speech applies to both the tagging evaluation and the parsing evaluation, whereas the exclusion of dependency relations only applies to the parsing evaluation. No parts-of-speech or dependency relations can be excluded from the chunking evaluation.
From the Edit menu it is possible to specify the minimal and maximal sentence length. All sentences shorter than the lower limit or longer than the upper limit are discarded from the evaluation. To avoid having to specify an extremely high upper limit when you want to include all sentences, the value "0" represents infinity. By default the values are set so that all sentences are included. This feature works for all three evaluation options.
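The length filter, including the special meaning of "0" for the upper limit, can be sketched as below. The function within_length_limits is hypothetical, and the default values of 0 for both limits are an assumption chosen so that all sentences pass. The actual defaults used by MaltEval are not stated here.

```python
def within_length_limits(sentence, minimum=0, maximum=0):
    """Keep a sentence unless it is shorter than minimum or longer than maximum.

    A maximum of 0 means "no upper limit" (infinity).
    """
    length = len(sentence)
    if length < minimum:
        return False
    if maximum != 0 and length > maximum:
        return False
    return True


# Dummy sentences of length 1, 5 and 17 (only the length matters here).
sentences = [["w"] * 1, ["w"] * 5, ["w"] * 17]

kept_all = [s for s in sentences if within_length_limits(s)]
kept_mid = [s for s in sentences if within_length_limits(s, minimum=2, maximum=10)]
kept_long = [s for s in sentences if within_length_limits(s, minimum=5, maximum=0)]
print(len(kept_all), len(kept_mid), len(kept_long))
```

Sentences whose length equals one of the limits are kept, since only sentences below the lower limit or above the upper limit are discarded.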
The Edit menu also includes the possibility to change the file name of the export file.