Friday, November 19, 2010

Named Entity Recognition (NER)

Problem definition:
Identifying proper names is particularly difficult in Arabic, for the following reasons:
  • Non-vocalization: Ordinary Arabic texts omit short vowels, which leads to a high degree of ambiguity. As a rule, only the Koran and children’s books are fully vowelled.
  • Lack of capitalization: Names in Arabic do not start with capital letters, so we cannot mark them in the text by looking at the first letter of the word.
  • Delimitation problems: These stem from the lack of information about unknown words within named entities (NEs), from antonomasia (a proper name replaced by a descriptive phrase, or the converse), and from homonyms, which increase ambiguity when trying to mark NE constituents.
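
To make the first two points concrete, here is a small sketch using only Python's standard library. The helper names (strip_diacritics, has_case) are our own, chosen for illustration. It shows two differently vowelled words collapsing to the same unvowelled spelling, and that Arabic letters carry no upper/lower case signal.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which include the Arabic short-vowel signs."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

def has_case(text: str) -> bool:
    """Return True if upper- or lower-casing changes the text at all."""
    return text.upper() != text or text.lower() != text

# Two different vowelled words (written with escapes to keep the marks explicit).
ILM = "\u0639\u0650\u0644\u0652\u0645"    # 'ilm, "knowledge"
ALAM = "\u0639\u064E\u0644\u064E\u0645"   # 'alam, "flag"
BARE = "\u0639\u0644\u0645"               # the same three letters without vowel marks

# Non-vocalization: both readings share one spelling in ordinary text.
same_spelling = strip_diacritics(ILM) == strip_diacritics(ALAM) == BARE

# Lack of capitalization: casing gives no hint for an Arabic name.
arabic_name = "\u0645\u062D\u0645\u062F"  # the name Muhammad, unvowelled
arabic_has_case = has_case(arabic_name)
latin_has_case = has_case("Cairo")
```

Here same_spelling is true while arabic_has_case is false, so an NER system for Arabic has to rely on context, gazetteers, or trigger words rather than on the shape of the word.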

Monday, November 1, 2010

Question Answering System Phases

In all the papers we read, we found that every Question Answering system (QAS) has three common phases:
  1. Question Analysis.
  2. Passage Retrieval.
  3. Answer Extraction.
One paper also describes an additional phase, "Answer Validation".
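
As a rough sketch of how these phases fit together, the code below chains them as plain functions, with validation as an optional last step. Everything here is an assumption for illustration: the function names, the toy English documents, and the keyword-overlap ranking are not taken from any of the papers.

```python
from typing import Callable, List, Optional

def run_qa(
    question: str,
    analyze: Callable[[str], List[str]],
    retrieve: Callable[[List[str]], List[str]],
    extract: Callable[[List[str], List[str]], Optional[str]],
    validate: Optional[Callable[[str, str], bool]] = None,
) -> Optional[str]:
    """Run the three common phases, plus the optional validation phase."""
    keywords = analyze(question)           # 1. Question Analysis
    passages = retrieve(keywords)          # 2. Passage Retrieval
    answer = extract(keywords, passages)   # 3. Answer Extraction
    if answer is not None and validate is not None:
        if not validate(question, answer):  # optional Answer Validation
            return None
    return answer

# Toy stages, only to show the data flowing through the pipeline.
DOCS = [
    "Cairo is the capital of Egypt.",
    "The Nile flows through Sudan and Egypt.",
]

def toy_analyze(question: str) -> List[str]:
    # Keep words longer than three characters as crude keywords.
    return [w.strip("?.,").lower() for w in question.split() if len(w) > 3]

def toy_retrieve(keywords: List[str]) -> List[str]:
    # Rank documents by how many keywords they contain, best first.
    return sorted(DOCS, key=lambda d: -sum(k in d.lower() for k in keywords))

def toy_extract(keywords: List[str], passages: List[str]) -> Optional[str]:
    # A real system would pick a span; here we return the best passage.
    return passages[0] if passages else None

answer = run_qa("What is the capital of Egypt?", toy_analyze, toy_retrieve, toy_extract)
```

Passing a validate function that rejects the candidate makes run_qa return None, which is one simple way to model a system that would rather give no answer than a wrong one.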

Question Analysis:
      The Question Analysis module processes the question to obtain useful information about the type of answer we are looking for, and extracts the question's keywords and named entities.
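
A minimal sketch of such a module might look like the following. It is only an illustration under stated assumptions: the question-word table, the stopword list, and the AnalyzedQuestion type are made up for this example, and it uses English questions for readability. Note that the named-entity step relies on capital letters, which is exactly the cue that is missing in Arabic.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical mapping from the opening question phrase to an expected answer type.
ANSWER_TYPES = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "how many": "NUMBER",
}

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {
    "the", "is", "of", "a", "an", "in", "was", "did", "do", "does",
    "who", "where", "when", "what", "how", "many",
}

@dataclass
class AnalyzedQuestion:
    answer_type: str
    keywords: List[str]
    named_entities: List[str]

def analyze_question(question: str) -> AnalyzedQuestion:
    lowered = question.lower()
    # Answer type: first cue phrase that the question starts with, else OTHER.
    answer_type = next(
        (atype for cue, atype in ANSWER_TYPES.items() if lowered.startswith(cue)),
        "OTHER",
    )
    tokens = re.findall(r"[A-Za-z]+", question)
    # Keywords: every token that is not a stopword, lowercased.
    keywords = [t.lower() for t in tokens if t.lower() not in STOPWORDS]
    # Named entities: capitalized tokens, skipping the sentence-initial word.
    named_entities = [t for t in tokens[1:] if t[0].isupper()]
    return AnalyzedQuestion(answer_type, keywords, named_entities)

result = analyze_question("Where was Naguib Mahfouz born?")
```

For this question the sketch assigns the LOCATION answer type, keeps "naguib", "mahfouz" and "born" as keywords, and marks "Naguib" and "Mahfouz" as named entities. For Arabic input the capitalization test would have to be replaced by a proper NER component.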