Lexical analysis for Chinese: difficulties and possible solutions
Author: Keh-Jiann Chen
Journal: Journal of the Chinese Institute of Engineers (available online 1999)
Volume/issue: Volume 22, Issue 5
Pages: 561-571
ISSN: 0253-3839
Year: 1999
DOI:10.1080/02533839.1999.9670494
Publisher: Taylor & Francis Group
Keywords: lexical analysis; word segmentation; unknown word identification
Data source: Taylor
Abstract:
Chinese sentences are composed of strings of characters, with no blanks to mark word boundaries. However, the basic unit of sentence processing is the word, the smallest meaningful unit that can be used freely in any natural language. Lexical analysis is therefore the first step in processing Chinese sentences. It usually relies on a lexicon to match words and to supply their syntactic and semantic information. Two problems arise during word matching: segmentation ambiguity and the occurrence of unknown words. In this paper, statistical and rule-based methods for resolving segmentation ambiguities are compared in terms of their advantages and disadvantages. For unknown word identification, off-line word extraction methods and on-line unknown word identification strategies are surveyed; the two approaches complement each other in solving the problem. The strategies and knowledge sources needed to implement a practical system are also discussed.
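To make the word-matching step and the two problems named above concrete, here is a minimal Python sketch. It is not the paper's method: the lexicon and its frequency counts are invented for illustration, and all function names are hypothetical. Forward and backward maximum matching stand in for a simple rule-based ambiguity detector (a disagreement between them flags an ambiguous span), a unigram model with an assumed pseudo count of 1 for out-of-lexicon characters stands in for a statistical resolver, and grouping runs of out-of-lexicon single characters is a deliberately crude stand-in for on-line unknown word detection.

```python
import math

# Hypothetical toy lexicon: word -> invented corpus frequency (illustration only).
LEXICON = {
    "研究": 50, "研究生": 20, "生命": 40, "命": 5,
    "起源": 30, "生": 10, "起": 8, "源": 3,
}
MAX_LEN = max(len(w) for w in LEXICON)
TOTAL = sum(LEXICON.values())


def forward_max_match(text):
    """Greedy left-to-right longest lexicon match; an unmatched character
    becomes a single-character token."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + n] in LEXICON or n == 1:
                words.append(text[i:i + n])
                i += n
                break
    return words


def backward_max_match(text):
    """Greedy right-to-left longest lexicon match."""
    words, j = [], len(text)
    while j > 0:
        for n in range(min(MAX_LEN, j), 0, -1):
            if text[j - n:j] in LEXICON or n == 1:
                words.insert(0, text[j - n:j])
                j -= n
                break
    return words


def best_unigram_segmentation(text):
    """Dynamic programming over all lexicon-based splits, maximizing the sum
    of log unigram probabilities. Out-of-lexicon single characters get an
    assumed pseudo count of 1 (not a tuned unknown-word model)."""
    best = [(-math.inf, None)] * (len(text) + 1)
    best[0] = (0.0, [])
    for j in range(1, len(text) + 1):
        for n in range(1, min(MAX_LEN, j) + 1):
            word = text[j - n:j]
            count = LEXICON.get(word, 1 if n == 1 else 0)
            prev_score, prev_words = best[j - n]
            if count == 0 or prev_words is None:
                continue
            score = prev_score + math.log(count / TOTAL)
            if score > best[j][0]:
                best[j] = (score, prev_words + [word])
    return best[-1][1]


def segment(text):
    """Use maximum matching when both directions agree; otherwise flag the
    sentence as ambiguous and fall back to the unigram scores."""
    fmm, bmm = forward_max_match(text), backward_max_match(text)
    if fmm == bmm:
        return fmm, False
    return best_unigram_segmentation(text), True


def unknown_word_candidates(words):
    """Collect maximal runs (length >= 2) of out-of-lexicon single
    characters as candidate unknown words."""
    candidates, run = [], ""
    for w in words + [""]:  # empty sentinel flushes the final run
        if len(w) == 1 and w not in LEXICON:
            run += w
        else:
            if len(run) >= 2:
                candidates.append(run)
            run = ""
    return candidates


if __name__ == "__main__":
    for sentence in ["研究生命起源", "王小明研究生命起源"]:
        words, ambiguous = segment(sentence)
        print("/".join(words), ambiguous, unknown_word_candidates(words))
```

For 研究生命起源, forward matching yields 研究生/命/起源 while backward matching yields 研究/生命/起源; the disagreement flags an ambiguity, and the toy unigram scores select 研究/生命/起源. In 王小明研究生命起源, the name 王小明 is missing from the lexicon, so it falls apart into single characters, and the run heuristic proposes it as a candidate unknown word. A real system would need far richer knowledge sources than this sketch, which is the trade-off the abstract points to.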