DRIP: Segmenting individual requirements from software requirement documents

Bibliographic Details
Published in:	Software, practice & experience, 2024-05, Vol.54 (5), p.842-874
Main Authors:	Zhao, Ziyan; Zhang, Li; Lian, Xiaoli; Lv, Heyang
Format:	Article
Language:	English
Subjects:	Algorithms; Boundaries; deep learning; Documents; requirement items; Research projects; Segmentation; Segments; Semantics; Sentences; Software engineering; software requirements; text segmentation

Description
Summary:	Numerous academic research projects and industrial tasks related to software engineering require individual requirements as input. Unfortunately, according to our observation, several requirements may be packed in one paragraph without explicit boundaries in specification documents. To understand this problem's prevalence, we performed a preliminary study on the open requirement documents widely used in the academic community over the last 10 years, and found that 26% of them include this phenomenon. Several text segmentation approaches have been reported; however, they tend to identify topically coherent units which may contain more than one requirement. What is more, they do not take the constitutions of semantic units of requirements into consideration. Here we report a two‐phase learning‐based approach named DRIP to segment individual requirements from paragraphs. To be specific, we first propose a Requirement Segmentation Siamese framework, which models the similarity of sentences and their conjunction relations, and then detects the initial boundaries between individual requirements. Then, we optimize the boundaries heuristically based on the semantic completeness validation of the segments. Experiments with 1132 paragraphs and 6826 sentences show that DRIP outperforms the popular unsupervised and supervised text segmentation algorithms with respect to processing different documents (with accuracy gains of 57.65%–187.53%) and processing paragraphs of different complexity (with average accuracy gains of 54.46%–158.68%). We also show the importance of each component of DRIP to the segmentation.
ISSN:	0038-0644; 1097-024X
DOI:	10.1002/spe.3303
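
Illustration: the summary describes detecting boundaries between individual requirements by modeling the similarity of sentences in a paragraph. The following is a minimal sketch of that general idea only, not DRIP itself. It assumes a bag-of-words cosine similarity as a crude stand-in for the learned Requirement Segmentation Siamese framework, and it omits both the conjunction relations and the second-phase heuristic boundary optimization. The function names and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased token counts; a crude stand-in for a learned sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.2):
    """Group consecutive sentences; start a new segment when the similarity
    between adjacent sentences falls below `threshold` (an assumed value)."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])  # low similarity: treat as a boundary
        else:
            segments[-1].append(cur)  # similar enough: same requirement
    return segments


paragraph = [
    "The system shall log every login attempt.",
    "Each login attempt log entry shall include a timestamp.",
    "Reports must be exported as PDF files.",
]
for i, seg in enumerate(segment(paragraph), start=1):
    print(i, seg)
```

In this toy paragraph the first two sentences share several tokens and stay together, while the third shares none with its predecessor and starts a new segment. A lexical measure like this would miss requirements that are related semantically but worded differently, which is the kind of case a learned similarity model is meant to handle.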