PolyU Library
Journal Call no.: TA4.C3578
Author: Chen, Keh-Jiann.
Article Title: Lexical analysis for Chinese : difficulties and possible solutions / Keh-Jiann Chen.
Is Part Of: Journal of the Chinese Institute of Engineers ; v.22, no.5, Sep 1999, p.561-571, illus.
Abstract: Chinese sentences are composed of strings of characters, with no blanks to mark word boundaries. However, the basic processing unit for sentence processing is the word, the smallest meaningful unit that can be used freely in any natural language. Lexical analysis is therefore the first step in processing Chinese sentences. A lexicon is usually used during lexical analysis to match words and to supply their syntactic and semantic information. The word-matching process raises two problems: segmentation ambiguity and the occurrence of unknown words (words not listed in the lexicon). The article discusses the advantages and disadvantages of both statistical and rule-based methods for resolving segmentation ambiguities. For unknown word identification, it surveys off-line word extraction methods and on-line unknown word identification strategies, which complement each other in solving the problem. The strategies and knowledge sources needed to implement a practical system are also discussed.
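Illustration (not from the article): lexicon-based word matching, and how segmentation ambiguity can surface during it, can be sketched with forward and backward maximum matching. These are common rule-based baselines. This record does not say they are the methods the article uses. The toy lexicon, the sample sentence and the function names `forward_max_match` / `backward_max_match` are hypothetical. When the two directions disagree, the string contains a segmentation ambiguity. Single characters that match no longer word are kept as fallbacks, which is where unknown words would show up.

```python
def forward_max_match(text, lexicon, max_len):
    """Greedy left-to-right longest match against a (hypothetical) lexicon."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until it is in the lexicon;
        # a single character is accepted as a fallback (a possible unknown word).
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len):
    """Greedy right-to-left longest match against the same lexicon."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                j -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


if __name__ == "__main__":
    # Toy lexicon and sentence, chosen only to expose an overlapping ambiguity.
    LEXICON = {"研究", "研究生", "生命", "命", "起源"}
    MAX_LEN = max(len(w) for w in LEXICON)
    sentence = "研究生命起源"
    fmm = forward_max_match(sentence, LEXICON, MAX_LEN)
    bmm = backward_max_match(sentence, LEXICON, MAX_LEN)
    print(fmm)  # ['研究生', '命', '起源']
    print(bmm)  # ['研究', '生命', '起源']
    if fmm != bmm:
        print("segmentation ambiguity detected")
```

Disagreement between the two greedy passes only flags an ambiguity. Choosing between the candidate segmentations is the job of the statistical or rule-based disambiguation methods the abstract refers to.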