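Supervised by: Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by: Northwestern Polytechnical University; Chinese Society of Aeronautics and Astronautics
Address: Aviation Building, Youyi Campus, Northwestern Polytechnical University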
Research on Civil Aircraft Verification Test Data Retrieval Technology Based on Word Segmentation
Affiliations: 1. Northwestern Polytechnical University; 2. AVIC XAC Commercial Aircraft Co., Ltd.; 3. Tongfang Knowledge Network Digital Publishing Technology Co., Ltd.
CLC number: V19


    Abstract:

    Civil aircraft verification tests generate a massive amount of unstructured data, which makes content-based data retrieval and application difficult, and traditional word segmentation retrieval methods cannot meet the segmentation requirements of the verification test domain. This article therefore proposes a segmentation retrieval method for the verification test domain based on statistics and a terminology dictionary. First, a conditional random field (CRF) model performs an initial segmentation of the text. Next, a terminology dictionary is built from domain files, and the reverse maximum matching (RMM) algorithm is applied to the initially segmented text, using that dictionary, to obtain a domain-specific segmentation. Finally, the unstructured files are split and indexed according to the domain-specific segmentation results to support content-based retrieval. To verify the method, a case study segments the text of a test outline and compares the result with traditional statistical segmentation methods, namely CRF, N-gram, and the hidden Markov model (HMM). The results show that the proposed method achieves the best accuracy in domain-specific segmentation. It can help build a civil aircraft verification test database and enable rapid retrieval of the unstructured files in that database.
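
    The dictionary-based refinement and indexing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the initial tokens have already been produced by some CRF segmenter (not shown here), and it assumes that the RMM step merges adjacent initial tokens whose concatenation appears in the terminology dictionary, scanning from right to left and preferring the longest match. The function names, the `max_span` parameter, and the sample terms and documents are all hypothetical.

```python
from collections import defaultdict


def merge_terms_rmm(tokens, terms, max_span=4):
    """Merge adjacent initial tokens into dictionary terms, right to left.

    Assumption: `tokens` is the output of an initial (e.g. CRF) segmenter.
    At each position the longest span of up to `max_span` tokens whose
    concatenation is in `terms` wins; otherwise the single token is kept.
    """
    merged = []
    end = len(tokens)
    while end > 0:
        # Candidate spans end at `end`; the smallest `start` is the longest span.
        for start in range(max(0, end - max_span), end):
            candidate = "".join(tokens[start:end])
            if start == end - 1 or candidate in terms:
                merged.append(candidate)
                end = start
                break
    merged.reverse()
    return merged


def build_inverted_index(segmented_docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in segmented_docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents whose segmentation contains `term`."""
    return sorted(index.get(term, set()))


# Hypothetical terminology dictionary and initial segmentations.
terms = {"静力试验", "试验大纲"}
initial = {
    "doc1": ["机翼", "静力", "试验", "大纲"],
    "doc2": ["起落架", "疲劳", "试验"],
}

# Refine each document with the dictionary, then index the refined tokens.
refined = {doc_id: merge_terms_rmm(toks, terms) for doc_id, toks in initial.items()}
index = build_inverted_index(refined)
```

    Because the scan runs from the end of the token list, an ambiguous sequence such as 静力 / 试验 / 大纲 resolves in favor of the term that ends last (试验大纲) when both candidate terms are in the dictionary. Whether the paper resolves such overlaps the same way is not stated in the abstract.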

History
  • Received: 2024-03-28
  • Last revised: 2024-04-22
  • Accepted: 2024-05-24
  • Published online: 2025-04-11