Searching strategies for the Hungarian language

Journal Articles

Savoy, Jacques - Personal Name;

This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach can be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce a better mean average precision (MAP). When compared to an IR scheme without stemming or one based only on a light stemmer, we find the differences to be statistically significant. When comparing probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is about 35% better than that of the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure to both queries and documents significantly improves IR performance (+10%) compared to word-based indexing strategies.
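The abstract compares the Okapi model against a classical tf-idf baseline. The following minimal Python sketch shows roughly what Okapi (BM25) scoring computes. It assumes documents are already tokenized, with any stemming or decompounding done upstream. The function name `bm25_scores`, the parameter values k1 = 1.2 and b = 0.75, and the non-negative idf variant (the one used by Lucene) are illustrative assumptions. They are not the exact configuration reported in the paper.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a query with Okapi BM25.

    Hypothetical helper for illustration. `docs` is a list of token lists,
    assumed to be already stemmed and/or decompounded. k1 and b are common
    default values, not necessarily those used in the paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        score = 0.0
        # Query term frequency is ignored (reasonable for very short queries).
        for t in set(query_terms):
            if tf[t] == 0:
                continue
            # Non-negative idf variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1.0 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Saturating term frequency, normalized by relative document length.
            norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    ["számítógép", "hálózat"],
    ["gép", "számítógép", "számítógép"],
    ["könyv"],
]
print(bm25_scores(["számítógép"], docs))
```

Unlike a raw tf-idf product, the k1 term caps how much repeated occurrences of a term can add to a score, and b controls how strongly longer documents are penalized. In this toy run, the second document ranks above the first, and the third scores zero.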

Detail Information

Series Title: Information Processing & Management
Publisher: s.l. : Elsevier, 2008
Language: English
ISBN/ISSN: 0306-4573
Edition: Vol. 44, No. 1, Pages 310-324
Subject(s): Evaluation
Hungarian information retrieval
Hungarian language
CLEF
Decompounding
n-gram indexing