Abstract: Civil aircraft validation tests generate a massive amount of unstructured data, which makes it difficult to retrieve and apply data based on file content, and traditional segmentation retrieval methods cannot meet the requirements of the validation test domain. This article therefore proposes a segmentation retrieval method for validation tests that combines statistical segmentation with a terminology dictionary. First, a conditional random field (CRF) model produces an initial segmentation of the text. Then, a terminology dictionary is constructed from domain files, and the Reverse Maximum Matching (RMM) algorithm is applied to the initially segmented text using this dictionary to obtain a professional (domain-specific) segmentation. Finally, the unstructured files are divided and indexed according to the professional segmentation results to support data retrieval based on file content. To verify the proposed method, a case study was conducted on the text of a test outline, and the results were compared with those of traditional statistical segmentation methods such as CRF, N-gram, and the Hidden Markov Model (HMM). The results show that the proposed method achieved the highest accuracy in professional segmentation. The proposed method can help build a civil aircraft validation test database and enable rapid retrieval of unstructured files in that database.
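The dictionary-based refinement and indexing steps can be illustrated with a minimal Python sketch. It is one plausible reading of the pipeline, not the authors' implementation: the initial tokens are assumed to come from an already-trained CRF segmenter (not shown), and the names `rmm_merge`, `build_index`, `max_len`, and `joiner`, as well as the sample terms and the file name, are hypothetical.

```python
from collections import defaultdict


def rmm_merge(tokens, terms, max_len=4, joiner=""):
    """Merge initial tokens into dictionary terms, scanning right to left.

    tokens  : initial segmentation (assumed to be CRF output)
    terms   : set of domain terms (the terminology dictionary)
    max_len : assumed upper bound on how many tokens one term may span
    joiner  : "" for unspaced scripts, " " for space-delimited text
    """
    result = []
    end = len(tokens)
    while end > 0:
        # Try the longest window ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = joiner.join(tokens[end - size:end])
            # A single token is always accepted as the fallback.
            if size == 1 or candidate in terms:
                result.append(candidate)
                end -= size
                break
    result.reverse()  # segments were collected from right to left
    return result


def build_index(segmented_files):
    """Build an inverted index: segment -> set of file names containing it."""
    index = defaultdict(set)
    for name, segments in segmented_files.items():
        for segment in segments:
            index[segment].add(name)
    return index


# Hypothetical example data, for illustration only.
initial_tokens = ["landing", "gear", "retraction", "test", "outline"]
term_dictionary = {"landing gear", "retraction test"}

segments = rmm_merge(initial_tokens, term_dictionary, joiner=" ")
index = build_index({"outline_a.txt": segments})
print(segments)
print(sorted(index["landing gear"]))
```

Scanning from the right means that, when candidate terms overlap, the match ending latest in the text is chosen first, which is the defining behavior of reverse maximum matching. Tokens that match no dictionary term pass through unchanged, so the initial statistical segmentation is preserved wherever the dictionary has nothing to add.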