Morphological compression of Arabic texts.

number: 
347
English
department: 
Degree: 
Imprint: 
Computer Science
Author: 
Wasan Ghazi Ibrahim
Supervisor: 
Dr. Mohammed A. Shallal
Dr. Ali F. Saleh
year: 
1999

Abstract:

A morphological compression system is constructed for Arabic text files that exploits the morphological structure of the Arabic language. This approach reduces the size of the data by replacing some words in the text with their morphological representation, which in Arabic consists of a root and pattern combination. A word is handled morphologically by first isolating it from all its affixes (prefixes and suffixes) and then reducing the remaining stem to its root and pattern form. A cascaded arrangement of word-based and character-based techniques is used to improve the compression ratio: one of the word-based techniques compresses each word it can reduce, and the character-based technique handles the words that could not be reduced (i.e., compressed) by any of the word-based techniques.
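
The root and pattern idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the affix, root, and pattern tables are tiny hypothetical examples, words are written in a simplified Latin transliteration rather than Arabic script, and the function names (match_pattern, encode_word, decode_word) are invented for illustration. Words that cannot be reduced are passed through as literals, which stands in for the character-based stage of the cascade.

```python
# Toy tables, all hypothetical; a real system would need full Arabic lexicons.
PREFIXES = ["", "Al", "wa"]                # "" means "no prefix"
SUFFIXES = ["", "uwn", "At"]               # "" means "no suffix"
ROOTS = ["ktb", "drs", "Elm"]              # three-consonant roots
PATTERNS = ["1a2a3", "1A2i3", "ma12uw3"]   # digits mark root-consonant slots


def match_pattern(stem, pattern):
    """Return the root letters if the stem fits the pattern, else None."""
    if len(stem) != len(pattern):
        return None
    slots = {}
    for s, p in zip(stem, pattern):
        if p in "123":
            slots[p] = s          # slot position: record the root consonant
        elif s != p:
            return None           # fixed pattern letter does not match
    return slots["1"] + slots["2"] + slots["3"]


def encode_word(word):
    """Word-based stage: try every prefix/suffix/pattern combination.

    Returns ("M", prefix_id, root_id, pattern_id, suffix_id) on success,
    or ("L", word) so a character-based coder can handle the word instead.
    """
    for pi, pre in enumerate(PREFIXES):
        if not word.startswith(pre):
            continue
        for si, suf in enumerate(SUFFIXES):
            if not word.endswith(suf):
                continue
            stem = word[len(pre):len(word) - len(suf)]
            for ti, pat in enumerate(PATTERNS):
                root = match_pattern(stem, pat)
                if root in ROOTS:
                    return ("M", pi, ROOTS.index(root), ti, si)
    return ("L", word)


def decode_word(token):
    """Rebuild the original word from a token produced by encode_word."""
    if token[0] == "L":
        return token[1]
    _, pi, ri, ti, si = token
    root = ROOTS[ri]
    stem = "".join(root[int(c) - 1] if c in "123" else c
                   for c in PATTERNS[ti])
    return PREFIXES[pi] + stem + SUFFIXES[si]


if __name__ == "__main__":
    for w in ["AlkAtibuwn", "maktuwb", "London"]:
        token = encode_word(w)
        print(w, token, decode_word(token))
```

In this sketch a reducible word becomes four small table indices instead of a character string, which is where the size reduction would come from once the indices are packed into a few bits each; the packing step and the character-based coder are left out.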