Background:
The Naive Bayes (NB) classifier is a powerful supervised algorithm
widely used in Machine Learning (ML). However, its effectiveness relies
on a strict assumption: conditional independence of the attributes given
the class, which is often violated in real-world scenarios. To address
this limitation, various
studies have explored extensions of NB that address conditional
dependence in the data. These approaches fall into two broad
categories: feature selection and structure extension. In this study,
we propose a novel
approach to enhancing NB by introducing a latent variable as the parent
of the attributes. We define this latent variable using a flexible
technique called Bayesian Latent Class Analysis (BLCA). As a result, our
final model combines the strengths of NB and BLCA, giving rise to what
we refer to as NB-BLCA. By incorporating the latent variable, we aim to
capture complex dependencies among the attributes and improve the
overall performance of the classifier.
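As a sketch of the structural difference, one plausible factorization is given below. This is our reading, not a formula quoted from the paper: the latent variable Z, its placement between the class C and the attributes X_1, ..., X_p, and all notation are assumptions for illustration.

```latex
% Ordinary NB: attributes are independent given the class C
P(C, X_1, \dots, X_p) = P(C) \prod_{j=1}^{p} P(X_j \mid C)

% NB-BLCA sketch: a latent class Z added as a parent of the attributes,
% so residual dependencies among attributes can be absorbed by Z
P(C, Z, X_1, \dots, X_p) = P(C)\, P(Z \mid C) \prod_{j=1}^{p} P(X_j \mid C, Z)
```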
Methods:
Both the Expectation-Maximization (EM) algorithm and a Gibbs
sampling approach were developed for parameter learning. A simulation
study was conducted to evaluate the classification performance of the
model in comparison with the ordinary NB model. In addition, real-world data
related to 976 Gastric Cancer (GC) and 1189 Non-ulcer dyspepsia (NUD)
patients was used to demonstrate the model's performance in an actual
application. The validity of the models was evaluated using 10-fold
cross-validation.
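As a rough illustration of the EM side of such parameter learning, the sketch below runs EM for a plain latent class model on binary attributes. It is not the authors' NB-BLCA implementation; the function `lca_em`, the Bernoulli parameterization, and the smoothing constant are all our assumptions.

```python
import numpy as np

def lca_em(X, n_classes, n_iter=100, seed=0):
    """EM for a latent class model on binary data X of shape (n_samples, n_features).
    Model: P(x) = sum_z pi[z] * prod_j theta[z, j]^x_j * (1 - theta[z, j])^(1 - x_j)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)              # latent class priors
    theta = rng.uniform(0.25, 0.75, size=(n_classes, d))  # Bernoulli parameters
    for _ in range(n_iter):
        # E-step: log responsibilities log r[i, z] ∝ log pi[z] + log-likelihood of row i
        log_r = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_r -= log_r.max(axis=1, keepdims=True)         # stabilize before exp
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update priors and smoothed Bernoulli parameters
        nk = r.sum(axis=0)
        pi = nk / n
        theta = (r.T @ X + 1e-6) / (nk[:, None] + 2e-6)
    return pi, theta, r
```

In a full NB-BLCA-style classifier, a step like this would be run per class label (or jointly with the class node), whereas the Gibbs sampling alternative would draw the latent memberships and parameters from their conditional posteriors instead of maximizing.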
Results:
The proposed model was superior to ordinary NB in all
simulation scenarios, achieving higher classification sensitivity and
specificity on the test data. The accuracy of the NB-BLCA model using
Gibbs sampling was 87.77% (95% CI: 84.87-90.29). This index was
estimated at 77.22% (95% CI: 73.64-80.53) and 74.71% (95% CI:
71.02-78.15) for the NB-BLCA model using the EM algorithm and the
ordinary NB classifier, respectively.
Conclusions:
When modifying the NB classifier, incorporating a latent
component into the model offers numerous advantages, particularly in
medical and health-related contexts. By doing so, researchers can
bypass the extensive search and structure learning required by the
local learning and structure extension approaches. The inclusion of a
latent class variable allows all attributes to be integrated during
model construction.
Consequently, the NB-BLCA model is a suitable alternative to the
conventional NB classifier when the independence assumption is
violated, especially in domains pertaining to health and medicine.