Text categorization requires selecting a set of highly discriminative features (terms) through feature selection. In text feature selection, Accuracy2 (ACC2) treats terms that have the same absolute document rate difference but different discriminative power equally, which is unreasonable. Existing improvements on ACC2, namely the normalized difference measure (NDM), max-min ratio (MMR), and trigonometric comparison measure (TCM), may misjudge the importance of rare and sparse terms because their parameters are difficult to select.
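The ACC2 limitation can be illustrated with a minimal sketch. It assumes the commonly cited definitions ACC2 = |tpr - fpr| and NDM = |tpr - fpr| / min(tpr, fpr), where tpr and fpr are the fractions of positive-class and negative-class documents containing the term. These formulas, the function names, the document counts, and the `eps` fallback for a zero denominator are illustrative assumptions, not details taken from the paper.

```python
import math


def acc2(tp, fp, n_pos, n_neg):
    """ACC2 as commonly defined: absolute difference of document rates."""
    tpr = tp / n_pos  # fraction of positive-class documents containing the term
    fpr = fp / n_neg  # fraction of negative-class documents containing the term
    return abs(tpr - fpr)


def ndm(tp, fp, n_pos, n_neg, eps=0.01):
    """NDM as commonly defined: ACC2 divided by the smaller document rate.

    eps is a hypothetical stand-in used when the smaller rate is zero;
    choosing it is the kind of parameter decision the text refers to.
    """
    tpr = tp / n_pos
    fpr = fp / n_neg
    denom = min(tpr, fpr)
    if denom == 0:
        denom = eps
    return abs(tpr - fpr) / denom


# Two hypothetical terms in a corpus with 100 documents per class.
# Term A occurs in 30 positive documents and no negative ones.
# Term B occurs in 80 positive documents and 50 negative ones.
term_a = (30, 0, 100, 100)
term_b = (80, 50, 100, 100)

# Both terms have a rate difference of about 0.3, so ACC2 cannot separate them.
print(math.isclose(acc2(*term_a), acc2(*term_b)))

# NDM ranks term A higher, but its score depends directly on eps.
print(ndm(*term_a) > ndm(*term_b))
```

Term A appears in only one class, while term B is common in both, yet ACC2 assigns them the same score. NDM separates them, but the score for term A scales with the chosen `eps`, which shows how a parameter can distort the ranking of sparse terms.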
To solve these problems, a research team led by Li Zhang published their new research on 15 February 2023 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed the max-difference maximization criterion (MDMC), which introduces a new weight based on class information occupacity and combines it with ACC2 to estimate the importance of terms. As a result, MDMC can avoid overestimating sparse terms.
In the research, the team analyzes the weight distributions of the methods (ACC2, NDM, MMR, TCM, and MDMC) and intuitively shows how MDMC estimates the importance of terms; this analysis is available in the online resources. Experiments demonstrate that MDMC, which requires no parameters, captures more discriminative terms than other filter methods regardless of the classifier, and that it outperforms other dimensionality reduction methods: the improved sine cosine algorithm (ISCA), principal component analysis (PCA), and non-negative matrix factorization (NMF).
LETTER, Published: 15 February 2023
Lingbin JIN, Li ZHANG, Lei ZHAO. Max-difference maximization criterion: a feature selection method for text categorization. Front. Comput. Sci., 2023, 17(1): 171337, https://doi.org/10.1007/s11704-022-2154-x
About Frontiers of Computer Science (FCS)
FCS was launched in 2007. It is published bimonthly both online and in print by HEP and Springer. Prof. Zhi-Hua Zhou from Nanjing University serves as the Editor-in-Chief. It aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. FCS covers all major branches of computer science, including: architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, interdisciplinary, etc. The readers may be interested in the special columns “Perspective” and “Excellent Young Scholars Forum”.
FCS is indexed by SCI(E), EI, DBLP, Scopus, etc. The latest IF is 2.669. FCS solicits the following article types: Review, Research Article, Letter.