ZENTRUM FÜR AUGUSTINUS-FORSCHUNG

AN DER JULIUS-MAXIMILIANS-UNIVERSITÄT WÜRZBURG


Fecisti nos ad te, domine, et inquietum est cor nostrum donec requiescat in te.

Confessiones 1,1

You have made us for yourself, O Lord, and our heart is restless until it rests in you.

Confessions 1,1

The significance of lemmatisation for the users of text databases

Info: Lemmatisation is a linguistic procedure that uses numerical codes to assign each word form actually occurring in a text to its grammatical base form (its lemma).

Disadvantages of non-lemmatised databases

Example: Search query for ‹lex, legis› (= law) in the Augustinian oeuvre in conventional non-lemmatised databases:

Possible strategy 1: Search for lex and for leg*

Problem: In addition to ‹lex› and its inflected forms, you will also find all forms built on the present stem and the active perfect stem of ‹legere› (= to read), as well as derivatives such as ‹legalis, -e› or ‹legitimus, -a, -um›.
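The over-matching caused by strategy 1 can be illustrated with a minimal sketch. The mini-corpus and its hand-assigned lemma labels below are hypothetical and serve only to show the effect; they are not taken from any real database.

```python
from fnmatch import fnmatchcase

# Hypothetical mini-corpus: (word form, lemma). Lemma labels are assigned by hand.
TOKENS = [
    ("lex", "lex"),
    ("legem", "lex"),
    ("legibus", "lex"),
    ("legit", "legere"),        # present stem of 'legere'
    ("legimus", "legere"),      # present stem of 'legere'
    ("legerunt", "legere"),     # perfect stem of 'legere'
    ("legitimus", "legitimus"), # derivative
    ("legalis", "legalis"),     # derivative
]


def wildcard_search(patterns, tokens):
    """Return every (form, lemma) pair whose form matches at least one pattern."""
    return [
        (form, lemma)
        for form, lemma in tokens
        if any(fnmatchcase(form, pattern) for pattern in patterns)
    ]


# Strategy 1: search for 'lex' and for 'leg*'.
hits = wildcard_search(["lex", "leg*"], TOKENS)

# Hits that do not belong to the lemma 'lex' have to be sorted out by hand.
false_hits = [form for form, lemma in hits if lemma != "lex"]

print(f"{len(hits)} hits, {len(false_hits)} of them not forms of 'lex'")
print(false_hits)
```

In this toy corpus the wildcard matches every token, so more than half of the hits are unwanted.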

Possible strategy 2: Enter all forms of ‹lex›:

Problem: You will have to search for 8 different forms and also take into account the overlap with 4 forms of the verb ‹legere›, which must be sorted out of the search results.

              Singular   Plural
Nominative    lex        leges
Genitive      legis      legum
Dative        legi       legibus
Accusative    legem      leges
Ablative      lege       legibus
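The counts named for strategy 2 can be checked with a short sketch. The paradigm is the one shown in the table above. The text does not name the 4 overlapping forms of ‹legere›, so their identification and grammatical labels here are an assumption added for illustration.

```python
# Declension of 'lex' as in the table above: case -> (singular, plural).
LEX_PARADIGM = {
    "nominative": ("lex", "leges"),
    "genitive": ("legis", "legum"),
    "dative": ("legi", "legibus"),
    "accusative": ("legem", "leges"),
    "ablative": ("lege", "legibus"),
}

# Assumption: forms of 'legere' that are spelled exactly like a form of 'lex'.
LEGERE_HOMOGRAPHS = {
    "legis": "2nd person singular present",
    "leges": "2nd person singular future",
    "legi": "1st person singular perfect, or present passive infinitive",
    "lege": "singular imperative",
}

# Ten paradigm cells collapse into fewer distinct spellings.
distinct_forms = sorted(
    {form for pair in LEX_PARADIGM.values() for form in pair}
)

# Spellings that a plain string search cannot attribute to 'lex' alone.
ambiguous = [form for form in distinct_forms if form in LEGERE_HOMOGRAPHS]
unambiguous = [form for form in distinct_forms if form not in LEGERE_HOMOGRAPHS]

print(len(distinct_forms), "distinct forms to enter:", distinct_forms)
print(len(ambiguous), "shared with 'legere':", ambiguous)
print(len(unambiguous), "unambiguous:", unambiguous)
```

Under this assumption, half of the search strings return hits that still have to be checked one by one in context.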

 

Advantages of lemmatised databases

Example: Search query for ‹lex, legis› (= law) within the Augustinian oeuvre by means of the lemmatised text database of the Corpus Augustinianum Gissense a Cornelio Mayer editum (CAG-online):

Enter l:lex

Result: Within a few seconds you will find all of the 8,000 word forms of ‹lex›, and only these; identical forms of other words are not included.
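Why a lemma search returns no false hits can be sketched as follows. Only the query syntax l:lex is taken from the example above. The lemma codes, the annotated toy text and the functions `build_index` and `search` are hypothetical and do not describe how CAG-online is actually implemented.

```python
from collections import defaultdict

# Hypothetical numerical lemma codes; the real codes are not given in the text.
LEMMA_CODES = {"lex": 1, "legere": 2}

# Toy annotated text: each word form carries the code of its lemma.
ANNOTATED = [
    ("legis", 1),  # 'of the law'
    ("legis", 2),  # 'you read'
    ("lege", 1),   # 'by the law'
    ("lege", 2),   # 'read!'
    ("legem", 1),
    ("legit", 2),
]


def build_index(tokens):
    """Map each lemma code to the (position, form) pairs annotated with it."""
    index = defaultdict(list)
    for position, (form, code) in enumerate(tokens):
        index[code].append((position, form))
    return index


def search(query, index):
    """Answer a query of the form 'l:<lemma>' from the lemma index."""
    prefix, _, lemma = query.partition(":")
    if prefix != "l" or lemma not in LEMMA_CODES:
        raise ValueError(f"unsupported query: {query!r}")
    return index.get(LEMMA_CODES[lemma], [])


INDEX = build_index(ANNOTATED)
print(search("l:lex", INDEX))
```

Because the lookup runs on the lemma code instead of the spelling, the identically spelled forms of ‹legere› at positions 1 and 3 never enter the result.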