测绘学报 (Acta Geodaetica et Cartographica Sinica)


中文文本的地理命名实体标注

张雪英,朱少楠,张春菊   

  1. Nanjing Normal University
  • Received: 2011-04-13 Revised: 2011-06-03 Online: 2012-02-25 Published: 2012-02-25
  • Corresponding author: 张春菊

Annotation of Geographical Named Entities in Chinese Text


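Summary: Semantic parsing of the geographic information in text helps people understand spatial cognition and the regularities of spatial language, resolves the semantic barrier between natural language and geographic information systems (GIS), and makes GIS spatial query, spatial reasoning, geographic information retrieval, and geographic information services more intelligent. Defining an annotation scheme and building an annotated corpus make it possible to discover the linguistic structures that natural language uses to describe geographic information and to establish metadata for them. Based on an analysis of how the description and representation of geographic entities differ between Chinese text and GIS, and taking into account the linguistic characteristics of geographical named entity descriptions, this paper defines an annotation scheme and an annotation specification for geographical named entities in Chinese text. Using GATE (General Architecture for Text Engineering) as the annotation platform, it builds a large-scale annotated corpus (GeoCorpus for short) based on 《中国大百科全书中国地理》 (the "Encyclopedia of China Geography"), which goes a considerable way toward addressing the current shortage of relevant standards and of large-scale standardized data.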

Abstract: Semantic interpretation of geographic information in natural language text can help people understand the mechanisms of geospatial cognition and spatial language in greater depth, and can make spatial query in geographic information systems (GIS), spatial reasoning, and geographical information retrieval more intelligent. Corpus annotation is the task of analyzing specific language information, identifying the linguistic structure of domain information found in text, and establishing metadata that describes it. First, this paper analyzes how the representation of geographical entities differs between Chinese text and GIS. Second, based on the linguistic characteristics of how geographical named entities are described in Chinese text, an annotation scheme is presented and the annotation specification is given in detail. Finally, GATE (General Architecture for Text Engineering) is introduced as an annotation platform, and a large-scale annotated corpus (GeoCorpus) based on the "Encyclopedia of China Geography" (2,130,000 bytes of Chinese text) is established and evaluated. This study effectively addresses the current lack of related standards and standardized data. Future work will focus on: 1) building a general-purpose annotated corpus from web pages to offset the imbalance of GeoCorpus; 2) developing a visual annotation tool that integrates a GIS database with GATE to further improve annotation performance; 3) annotating spatial relations in Chinese text based on the theory of spatial semantic rules.
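
As an illustration of what annotating geographical named entities in Chinese text can look like in practice, the following Python sketch records place names as stand-off annotations, that is, as character offsets plus a type and a feature map, in the spirit of the offset-based annotations that GATE stores. It is only a minimal sketch: the `Annotation` class, the `GeoEntity` type label, the `category` feature, and the tiny gazetteer are all hypothetical and are not taken from the GeoCorpus annotation specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: character offsets plus a type and features."""
    start: int   # offset of the first character (inclusive)
    end: int     # offset just past the last character (exclusive)
    type: str    # hypothetical label, here always "GeoEntity"
    features: dict = field(default_factory=dict)


# Hypothetical toy gazetteer: surface form -> illustrative category.
GAZETTEER = {
    "南京": "city",
    "长江": "river",
    "江苏省": "province",
}


def annotate(text: str) -> list[Annotation]:
    """Greedy longest-match lookup of gazetteer entries in `text`."""
    names = sorted(GAZETTEER, key=len, reverse=True)  # try longer names first
    annotations = []
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                annotations.append(
                    Annotation(i, i + len(name), "GeoEntity",
                               {"category": GAZETTEER[name]})
                )
                i += len(name)  # skip past the matched name
                break
        else:
            i += 1  # no entry starts here; move one character forward
    return annotations


if __name__ == "__main__":
    sentence = "南京位于江苏省西南部,长江下游。"
    for a in annotate(sentence):
        # Print offsets, the covered text, and the illustrative category.
        print(a.start, a.end, sentence[a.start:a.end], a.features["category"])
```

Keeping the annotations separate from the text leaves the source sentence unmodified, so several annotation layers can refer to the same characters. Dictionary lookup is used here only to produce some offsets to annotate; it is not a claim about how GeoCorpus was annotated.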