Problem definition:
Identifying proper names is particularly difficult in Arabic, for the following reasons:
- Non-vocalization: Ordinary texts are written without short vowels, which results in a high degree of ambiguity. In theory, only the Koran and children’s books are fully vowelled.
- Lack of capitalization: Names in Arabic do not start with capital letters, so they cannot be marked in the text by looking at the first letter of a word.
- Delimitation problems: These stem from a lack of information about unknown words in NEs, from antonomastic usage, in which a proper name is replaced by a phrase or vice versa, and from the presence of homonyms, which increases ambiguity when marking NE constituents.
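The first two points can be illustrated with a minimal Python sketch. It is only an illustration, not part of any NER system described here: the helper names `strip_diacritics` and `has_case` and the example words are assumptions chosen for this sketch, and removing every combining mark is a simplification of how unvowelled text is actually produced.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which include the
    Arabic short-vowel signs such as fatha and damma."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def has_case(text: str) -> bool:
    """Return True if any character in the text has an upper/lower case form."""
    return text.upper() != text or text.lower() != text


# Two fully vowelled words, written with explicit code points:
# kaf+fatha, teh+fatha, beh+fatha -> "kataba" (he wrote)
# kaf+damma, teh+damma, beh       -> "kutub"  (books)
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"

# Without short vowels, both readings share one written form.
unvowelled_a = strip_diacritics(KATABA)
unvowelled_b = strip_diacritics(KUTUB)
print(unvowelled_a == unvowelled_b)

# Arabic script has no letter case, so a capitalization cue is unavailable.
print(has_case(unvowelled_a), has_case("Cairo"))
```

Once the short vowels are removed, the two distinct words become the same string, and the case check finds nothing to exploit in the Arabic form, whereas the Latin-script name does carry a capital letter.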