Translate this page into:
Enhanced skin cancer classification using a combination of bag of words and deep Q-network features with ReliefF feature selection
*Corresponding author: E-mail address: hamid.a@alayen.edu.iq (HA Jalab)
-
Received: ,
Accepted: ,
Abstract
Skin cancer represents a major worldwide health challenge, and timely detection is crucial to ensure proper treatment of this malignant condition. Traditional diagnostic techniques, such as visual inspection and biopsy, frequently require time, skill, and medical resources. In this study, a hybrid feature extraction method using bag of words (BoWs) feature function and Deep Q-Network (DQN) are proposed with the ReliefF feature selection to classify multiple types of cancer skin. Preprocessing, feature extraction, feature selection, and classifier are the four phases of the proposed model. The combination of BoWs’ feature extraction with DQN’s deep feature extraction yields that BoWs captures texture, color, and spatial patterns, while DQN captures global context by encoding complex visual cues and lesion morphology that BoWs is unable to capture. ReliefF selects only the most relevant features from the combined BoWs and DQN-derived features, ranking features according to their capacity to differentiate between classes. The publicly available ISCI-2019 dataset was used in this study. Overall, the results showed how the hybrid feature extraction model improved accuracy in the classification of skin cancer. Future research challenges involve enhancing pre-processing techniques and applying optimization algorithms for tuning DQN and classifier hyperparameters, which are crucial for improving the performance of the proposed approach.
Keywords
Bag of words
Deep Q-network
Narrow neural network
Relief feature selection
Skin cancer diagnosis
1. Introduction
One of the cancers that has increased the most in recent decades is skin cancer. Discriminative features are necessary to differentiate between unusual alterations in the color, texture, or structure of the skin for precise detection and classification. Cells that grow out of control, divide quickly into one part of the body, invade other human parts, and spread throughout the body are the precursors of cancer (Bibi et al., 2023). The ultraviolet (UV) rays from the sun have the capacity to damage skin cell DNA. Additionally, abnormal enlargements of the human body can potentially result in skin cancer.
The third most prevalent of skin cancers, melanoma, is regarded as extremely deadly. The condition known as melanoma occurs when cells that produce skin pigment malfunction and change color. In order to achieve successful treatment, it is necessary to increase the chances of early detection and treatment (Dillshad et al., 2023). The condition is caused by an accumulation of melanin granules and their distribution to the top layer of the skin (Dugonik et al., 2025). Therefore, developing a computer-assisted diagnostic (CAD) system for classifying skin lesions is essential (Khattar & Kaur, 2022). Additionally, early detection of skin cancer is greatly aided by artificial intelligence (AI), machine learning (ML), and deep learning (DL) (Ahmad et al., 2023).
Artificial neural networks (ANNs), a cutting-edge method in the field of AI, have been increasingly popular recently in several fields, such as computer vision, digital image processing, and image classification methods (Almasoud et al., 2022; Sedik et al., 2025; Tounsi et al., 2025). ANNs have shown remarkable efficacy in solving a variety of complex problems, including medical imaging, object detection, and image classification, which benefits patients and healthcare systems. This study presents a hybrid feature extraction method that combines the Bag of Words (BoW) feature function and Deep Q-Network (DQN) to improve the accuracy of multi-class skin cancer classification. The BoW method captures essential local image features, while the Deep Q-Network enhances feature representation by extracting deep hierarchical features from skin images. Additionally, the Oriented FAST and Rotated BRIEF (ORB) technique is used to enhance feature reliability. The extracted features are then fed into a Narrow Neural Network (NNNs) model for precise classification of different skin cancer types. The primary goal of this approach is to improve classification performance by leveraging both handcrafted and deep learning-based features.
By integrating a hybrid feature extraction method, this study aims to enhance the efficiency and accuracy of skin cancer classification, ultimately assisting dermatologists and healthcare professionals in early diagnosis and treatment planning. The following are the main contributions:
-
a.
Introduced a BoWs feature function to effectively capture key local image features.
-
b.
Integrated a DQN feature extraction to obtain deep features from skin images.
-
c.
Employed the ORB technique to enhance the reliability and effectiveness of extracted features.
2. Related Works
An overview of earlier research on the diagnosis of skin cancer is given in this section. The use of ANNs for the diagnosis and classification of skin cancer has been the subject of substantial research in recent years by a large number of researchers (Shah et al., 2023). (SM et al., 2023) proposed a skin cancer categorization using a DL method. The suggested model’s performance was assessed using the ISIC-2019 and ISIC-2020 datasets. The deep convolutional neural networks (DCNN) model obtained the highest area under the curve (AUC) score of 96.81%. A healthcare system, proposed by (Hoang et al., 2022), utilizes first-order cumulative moment entropy-based weighting and first-order cumulative moment (EW-FCM) and weighting-based entropy for segmentation, along with wide-ShuffleNet for classifying skin cancer. While (Jasil S. et al. 2023) presented a novel method of classifying skin cancer by fusing residual network designs with Densenet. Their approach achieved a 95% accuracy rate. Their study has drawbacks, including the computational cost, potential for overfitting, and bias in the dataset. (Keerthana et al., 2023) utilized MobileNet and DenseNet-201 transfer learning models for feature extraction. An accuracy of 88.02% was achieved by their method when tested on the ISBI 2016 dataset. The main limitation of this study, however, was that Support vector machines (SVMs) perform well on small datasets but scale poorly with large feature sets, making them useless for real-time classification in clinical settings. In a study by (Xu et al., 2020), cancer detection was addressed within a two-step procedure, which first involved applying a median filter and then utilizing an improved convolutional neural network (CNN) for image segmentation. The performance achieved a 95% precision rate. However, the main limitation of this study was the soft computing methods, particularly ANNs and fuzzy logic. Additionally, (Popescu et al., 2022) proposed a DL based skin lesion classification system that was trained on the HAM10000 dataset. The primary drawback of this study was the high computational complexity and resource demands. Moreover, (Viknesh et al., 2023), suggested two techniques to detect skin cancer. The drawbacks of this approach were its sensitivity to noise and image quality variations. In their tests with the balanced ISIC-2019 dataset, (Zhao et al., 2021) classified dermoscopy images using StyleGAN and DenseNet201, with a classification accuracy of 93.64%. The complexity and diversity of real dermoscopy images are not adequately represented by StyleGAN-generated images. (Houssein et al., 2024) proposed the DCNN model to extract intricate features from skin lesion images, which achieved high accuracy rates of 98.5% for the HAM10000 dataset and 97.1% for the ISIC-2019 dataset. This approach has drawbacks, including high computational costs and resource requirements. Skin cancer classification is being significantly impacted by significant developments in human-AI collaboration and reinforcement learning. A class-weighted reinforcement learning framework was presented by the authors of (Mayanja et al., 2025); it particularly resolves class imbalances during training and shows notable improvements in diagnosis accuracy. By using a DQN for feature extraction rather than classification and ReliefF for feature selection, our study, in contrast, makes indirect use of reinforcement learning. When it comes to clinical AI systems, (Tschandl et al., 2020) demonstrated that human-computer collaboration can achieve or even surpass dermatologist-level performance, emphasizing interpretability and dependability. By enhancing transparency, our feature rating and Bag-of-Words descriptors fit well with this paradigm. Through reinforcement learning, (Barata et al., 2023) proposed advanced decision support by integrating uncertainty estimation into clinical workflows. Our approach operates at the feature level, providing a new complementary perspective on the use of RL in medical imaging, whereas their model was more focused on dynamic policy learning. By simulating global context and long-range dependencies, Vision Transformers have shown impressive performance on skin lesion classification challenges. For instance, ViT-based models that were trained on ISIC datasets have outperformed conventional CNNs in terms of accuracy, with rates above 96%. Swin Transformers are ideal for high-resolution dermoscopic images because they introduce shifting windows and hierarchical feature maps. Research indicates that Swin Transformer models perform better in lesion segmentation and classification than ResNet and DenseNet. CNNs and transformers have been combined in recent studies to take advantage of both global semantic information and local texture. These models frequently employ transformer blocks for contextual refining after CNNs for initial feature extraction.
3. Materials and Methods
This study proposed hybrid features of the bag of words+deep q-network (BoWs+DQN) skin cancer classification model from skin images to increase the efficacy and accuracy of skin cancer classification. The primary stages of the proposed flowchart have been illustrated in Fig. 1.

- Flowchart of the proposed hybrid BoWs–DQN skin cancer classification model.
The following are the main stages of the proposed model:
-
1.
Data Collection: Collect skin lesion images from datasets
-
2.
Preprocessing: (Clean the images):
-
-
Resize images.
-
-
Normalize pixel values.
-
-
Denoise/Augmentation.
-
-
Hair removal from the lesion area.
-
3.
Feature Extraction: (Extract important patterns):
-
-
Texture features (gray-level co-occurrence matrix) GLCM, Local Binary Patterns (LBP)).
-
-
Shape features (lesion borders).
-
-
Color features.
-
-
Deep features via convolution neural network (CNN) models like Residual Network (ResNet), visual geometry group (VGG).
-
4.
Feature Selection: (Select best features to reduce noise):
-
-
Use principal component analysis (PCA), least absolute shrinkage and selection operator (LASSO), or mutual information
-
-
Keep only discriminative features.
-
5.
Classification: (Predict the skin lesion type):
-
-
Traditional ML: Tree, Logistic Regression, Naive Bayes, SVM, K-Nearest Neighbors (KNN), Neural Network
-
-
Deep Learning: CNNs
-
6.
Skin Cancer Classes
-
-
Multi-class: melanoma, nevus, keratosis, etc.
3.1 Skin image dataset
This study used datasets of dermatoscopy images to assess the proposed method. The international skin imaging collaboration (ISIC) provided the ISIC-2019 dataset (Houssein et al., 2022; Skin Lesion Images for Melanoma Classification: ISIC 2019 challenge, 2019), which consists of dermatoscopy images used to categorize skin lesions into eight distinct diagnostic groups. The dataset includes 25,331 images, each of which corresponds to a specific group of symptoms indicating a type of skin injury. The ISIC-2019 dataset is considered one of the largest freely available datasets for classifying skin injuries. The reason ISIC-2019 was chosen for this work is that, unlike earlier versions of ISIC, which concentrated on binary classification (e.g., benign versus malignant), ISIC-2019 supports multiclass classification, which is in line with the aim of this study in differentiating between various types of skin cancer. Fig. 2 illustrates the different classes present in the ISIC-2019 dataset. The dataset ISIC-2019 has a more balanced class distribution and consistent image acquisition protocols to reduce bias and increase model generalization. Furthermore, the dataset has been commonly used in most recent benchmarking studies, allowing a fair and uniform comparison with existing methods. The diversity of lesion types and expert-verified annotations render ISIC-2019 as an appropriate and dependable resource for evaluating the proposed hybrid BoWs-DQN model.

- ISIC-2019 Dataset samples from eight various classes.
3.2 Skin image preprocessing
Medical image problems are often characterized by low contrast, noise, blur, artifacts, and unrealistic colors. To achieve extremely precise results, the proposed model integrates the following preprocessing methods: adaptive histogram equalization, median filtering, sharp filtering, and image reshaping. Adaptive histogram equalization (AHE) is used to bring the input image’s low contrast into the normal pixel intensity value range. This method makes all images from the dataset suitable and gets more details for the feature extraction stage. By using noise reduction and a sharp filter after applying adaptive histogram equalization to correct and make pixel values appear with more details. In addition, all images in the dataset are adjusted to a uniform dimension of 200 pixels × 200 pixels, as shown in Fig. 3. In addition to preserving uniformity in image dimensions, this resizing maximizes memory usage and computational effectiveness. For feature extraction to be smooth and capture pertinent patterns and information across the model. Standardizing image sizes is necessary because irregular sizes can result in errors and mismatched dimensions during training.

- Preprocessed sample images demonstrating resizing, noise removal, and contrast enhancement steps.
3.3 Features extraction
The feature extraction algorithm’s objective is to extract the most relevant and usable feature set from the preprocessed skin images. These features include enhanced BoWs features and deep learning features. Both low-level details (edges, colors) and high-level patterns (tumor structures, lesion boundaries) can be captured by using a hybrid feature extraction method. Combining the BoWs feature function with DQN features for skin cancer classification results in improved feature representation. The BoWs effectively capture localized texture, color, and structural details, whereas the DQN extracts deeper contextual and morphological representations of the entire lesion. When combined, these complementary features form a richer and more discriminative feature space, leading to a more robust and accurate classifier performance.
3.3.1 Bag of words (BoWs)
The visual components of skin cancer images typically include several key features that help in diagnosis and classification. These components can be assessed manually by dermatologists or extracted using computational techniques such as texture analysis and deep learning. The main visual components include the color features, texture features, shape and border characteristics, and structural patterns. The BoVW method reduces an image from raw pixel data or highly complex features to a histogram of visual word occurrences, BoWs. This is because, especially in large datasets, it simplifies the data by condensing important image features. The method uses histogram-based image representation, visual vocabulary construction, and key feature extraction. The SURF features are extracted from the feature selection point locations using the grid method, followed by block width [32 64 96 128] and grid step [8 8]. Using clustering features to create a visual vocabulary of 500 words. Algorithm 1 provides an illustration of the study’s pseudo-algorithm.
Algorithm 1. Pseudo-algorithm of skin image detection using BoWs
-
1.
Input: skin images.
-
2.
Output: Descriptors and keypoints. For each image in the dataset:
-
Detect keypoints using (SURF).
-
Extract descriptors around each keypoint (SURF descriptors).
-
-
3.
Building the Visual Vocabulary:
-
Aggregate all descriptors from the dataset into a single pool.
-
Use k-means clustering on the pooled descriptors to create K clusters.
-
The cluster centroids represent the visual words in the vocabulary.
-
-
4.
Feature Vector Representation:
-
For each image in the dataset:
-
Assign each descriptor to the nearest cluster.
-
Create a histogram of visual word occurrences (Bag-of-Visual-Words representation) based on skin image detection and feature extraction process for non-expert readers.
-
The number of visual words that the bag of features object created is reflected in the length of the histogram. An image’s feature vector is created from the histogram, as shown in Fig. 4.

- Feature vector from visual word index.

- Feature vector from visual word index.
3.3.2 Deep Q-network (DQN)
To get around the limitations of traditional Q-Learning in large or continuous state spaces, deep learning is utilized to extract high-level features using a deep network technique called DQN. DQN uses NNs to approximate the Q-values rather than keeping Q-values for each state-action pair. The state serves as the network’s input, and the Q-values for every action that could be taken make up its output. DQN uses the Bellman equation evolving from the Q-Learning algorithm to update Q-values.
Where:
s is the current state,
a is the action taken,
r is the reward received,
s’ is the next state,
a’ is the next taken,
θ are the weights of the main Q-network,
θ- are the weights of the target Q-network,
is the maximum Q-value for the next state.
Deep Q-Networks, or DQNs, provide extensive feature representations with potentially transferable learnt representations, which is the rationale behind their use as feature extraction for image classification. To extract features, each image is first pre-processed (resized, normalized, etc.) before being run through the DQN’s convolutional layers. To estimate the Q-values for every potential action, these features are then flattened and put into fully connected layers. The network is trained to maximize cumulative reward, and the agent chooses actions based on the highest Q-value. To calculate the target Q-values, DQN incorporates a second NN known as the target network. Using a sequence of convolution and pooling layers, DQN processes the given skin images. The suggested DQN structure for obtaining deep features from skin image data has been displayed in Fig. 5. As a result, each image has two features extracted: one by DQN and one by BoWs.

- Architecture of the proposed DQN feature extraction.
To construct a unified feature representation from the BoW and DQN outputs, the following fusion method (BoW + DQN) was employed step-by-step:
-
1.
Feature Extraction:
-
BoW features: Extracted as histogram vectors representing visual word occurrences (dimension: 500).
-
DQN features: Extracted from the final fully connected layer of the DQN architecture (dimension: 1024).
-
-
2.
Normalization:
-
Both feature sets were normalized using z-score normalization to ensure comparable scales and prevent dominance of higher-magnitude features.
-
-
3.
Concatenation:
-
The normalized BoW and DQN vectors were concatenated into a single feature vector of dimension 1524.
-
No weighting was applied, as ReliefF feature selection was used post-fusion to rank and filter the most discriminative features.
-
-
4.
Dimensionality Reduction:
-
ReliefF was applied to the fused vector to reduce redundancy and retain only the most relevant features for classification.
-
-
5.
Classifier Input:
-
The final selected feature vector was fed into various classifiers, with the NNN yielding the highest performance.
-
3.4 ReliefF feature selection
Finding the most relevant and non-redundant features to utilize in model building is known as feature selection. As the size and diversity of datasets continue to increase, it is crucial to systematically reduce their sizes. The ReliefF algorithm has been employed for feature selection prior to the classification step. ReliefF is an extension of the original Relief algorithm, used for feature selection in machine learning (Kassem et al., 2020). Step-by-step working of ReliefF:
-
1.
Initialize Parameters
-
Set the number of nearest neighbors (k).
-
Choose the number of iterations (m), typically equal to the number of samples.
-
-
2.
Randomly select an instance.
-
Pick a data point (R) from the dataset at random.
-
-
3.
Find Nearest Neighbors.
-
Find (k) nearest to the same class.
-
Find (k) nearest to different classes.
-
-
4.
Update Feature Weights
-
Increase the weight of features that differentiate between different classes.
-
Decrease the weight of features that have similar values for different classes.
-
The weight update formula for each feature fr is:
Where:
diff (fr, R, S): Measures the difference of feature f between samples R and S.
diff (fr, R, T): Measures the difference of feature f between samples R and T.
P(c): Probability of class c in the dataset.
-
Repeat for (m) samples
-
Iterate over multiple random instances to refine feature weights.
-
5.
Rank Features
-
Features with higher weights are considered more important, while lower or negative weights indicate irrelevance.
-
ReliefF’s primary goal is to increase classification accuracy, particularly for high-dimensional data. ReliefF performs well with noisy data and takes feature interactions into account. However, the drawbacks include the difficulties with highly correlated features as well as the expensive computing for large datasets.
3.5 Evaluation metrics
The effectiveness of the proposed skin cancer classification model is evaluated using various evaluation metrics. The following metrics have been used to assess the suggested skin cancer classification model’s performance: accuracy, recall, precision, F1-score, and AUC.
Accuracy: Is determined by the total number of correctly predicted instances in the ratio.
Recall: True positive rate, also known as the sensitivity rate, determines the model’s ability to correctly identify positives.
Precision: The sum of expected correct positive predictions and the ratio of positive predictions made by the model are used to evaluate its reliability.
F1-score: The balanced metric, which was calculated as the harmonic mean of accuracy and recall, considered both false negatives and false positives.
AUC: Often utilized in binary classification, but it can also be applied to multiclass classification scenarios. Usually, to accomplish this, multiple binary classification tasks are generated using either the “One-vs-One” or the “One-vs-All” approach.
Where:
TP: True Positive; TN: True Negative; FP: False Positive; FN: False Negative
4. Results and Discussion
The deep features are extracted using MATLAB 2024b from skin images after resizing skin lesion images into 200x200 pixels prior to implementation of the DQN. The proposed model was tested using MATLAB 2024b, which was installed on a workstation with an Intel Core i7 (10th Gen) CPU, an NVIDIA RTX 2070 GPU, and 8 GB of VRAM. It was running Windows 11 and had 16 GB of RAM.
The ISIC-2019 dataset was split into 80% training, 10% validation, and 10% testing subsets at the patient level to guarantee equal evaluation and prevent data leaks. To ensure rigorous separation, each subset’s preprocessing, data augmentation, and feature extraction processes were carried out independently.
In this study, three validations are used to evaluate the performance of a proposed skin cancer classification. The process of cross-validation (CV) divides a dataset into 5-fold cross-validation. This process is repeated multiple times, and the final performance metric is averaged.
The ISIC-2019 dataset was split into five folds using the stratified k-fold cross-validation technique, which attempts to maintain data integrity and prevent data leaking. Four of the folds served as training, and each one served as validation once. Preprocessing was carried out after the split to ensure that the training would not be impacted by any information from the hold-out set. BoWs and DQN features were extracted separately for every fold, while ReliefF features were only selected within the corresponding training subset to prevent bias. Feature extraction and selection were done independently for each fold. The same preprocessing, feature extraction (BoWs + DQN), ReliefF selection, and classification pipeline were used for each fold’s model during independent training. Table 1 displays the results for each fold.
| Metric | Mean (%) | Standard deviation (%) |
|---|---|---|
| Accuracy | 99.60 | ±0.21 |
| Precision | 99.54 | ±0.24 |
| Recall | 99.48 | ±0.27 |
| F1-Score | 99.52 | ±0.23 |
| AUC | 99.90 | ±0.18 |
Table 2 lists the hyperparameters that were utilized in the DQN model. The proposed DQN network exhibits superior performance in extracting deep features from skin images, as illustrated in Fig. 6, the training and loss progress plot of the DQN.
| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Learning Rate | 0.0001 |
| Batch Size | 32 |
| Epochs | 50 |
| Discount Factor (γ) | 0.95 |
| Exploration Strategy | ε-greedy (ε decayed from 1.0 to 0.1) |
| Replay Buffer Size | 10,000 |
| Update Frequency | Every 100 steps |
| Loss Function | Mean Squared Error (MSE) |

- Training plot of the proposed DQN.
Five feature selection algorithms: MRMR, Chi2, ReliefF, ANOVA, and Kruskal-Wallis are compared in the classification section to show how well the suggested ReliefF feature selection algorithms identify distinguishing features from skin images. Using the same hyperparameters under the NN model, Table 3, illustrates which of these algorithms will perform the best with the proposed method. When compared to another algorithm after the trained features were set, the ReliefF algorithm finally obtained the highest accuracy score.
| Algorithm | Accuracy % |
|---|---|
| MRMR | 97.6 |
| Chi2 | 98.2 |
| ReliefF | 99.6 |
| ANOVA | 97.8 |
| Kruska Wallis | 98.2 |
Table 4 presents the validation accuracy of different classifiers using three feature configurations: BoWs-only, DQN-only, and the hybrid combination (BoWs+DQN). Among the single-feature models, BoWs-based classifiers generally achieved higher accuracy than DQN-only models, indicating that handcrafted texture and shape descriptors remain informative for this dataset. However, DQN features alone performed less effectively, particularly with traditional classifiers such as Tree (29.8%) and Kernel (32.6%), suggesting that deep features extracted through DQN require complementary information for optimal discrimination. When combining BoWs and DQN features, a significant performance boost was observed across all classifiers. The highest validation accuracy (99.6%) was achieved by the Narrow Neural Network, followed closely by SVM (96.7%) and Kernel-based models (92.4%). This demonstrates that the fusion of handcrafted and deep-learned features effectively captures both local texture patterns and high-level semantic representations, resulting in superior classification performance. These findings confirm the effectiveness and generalization capability of the hybrid BoWs+DQN framework, highlighting its ability to leverage the strengths of both feature domains. The consistent improvement across all classifiers further validates the robustness and adaptability of the proposed approach. Fig. 7 and Table 5 illustrated the performance results of the proposed method on the ISIC-2019 dataset through a confusion matrix.
| Model type | Accuracy % (Validation) | ||
|---|---|---|---|
| BoWs | DQN | Combination (BoWs+ DQN) | |
| Tree | 69.3 | 29.8 | 88.2 |
| Efficient logistic regression | 24.3 | 33.4 | 91.5 |
| Efficient linear SVM | 52.7 | 49.6 | 87.2 |
| Naive bayes | 74.25 | 36.6 | 89.0 |
| SVM | 89.7 | 48.2 | 96.7 |
| KNN | 44.8 | 40.2 | 79.4 |
| Ensemble | 88.5 | 47.9 | 75.3 |
| Kernel | 80.1 | 32.6 | 92.4 |
| Narrow neural network | 96.3 | 66.5 | 99.6 |

- Confusion matrix of the proposed NNN model on the ISIC-2019 dataset.
| Classes | Sensitivity (%) | Specificity (%) | F1-score (%) | ROC-AUC (%) | Mean (%) | Std Dev (%) | CI % |
|---|---|---|---|---|---|---|---|
| Actinic keratosis | 97.9 | 99.0 | 97.5 | 99.0 | 98.35 | 0.66 | [96.8, 99.0] |
| Basal Cell Carcinoma | 99.1 | 99.6 | 99.0 | 99.7 | 99.35 | 0.29 | [98.4, 99.8] |
| Dermatofibroma | 97.6 | 99.2 | 97.3 | 99.1 | 98.30 | 0.78 | [96.5, 98.7] |
| Melanoma | 98.3 | 99.1 | 98.2 | 99.5 | 98.78 | 0.56 | [97.3, 99.3] |
| Nevus | 99.0 | 99.4 | 98.9 | 99.6 | 99.23 | 0.30 | [98.2, 99.8] |
| Benign keratosis | 98.8 | 99.3 | 98.7 | 99.4 | 99.05 | 0.30 | [97.9, 99.7] |
| Cell carcinoma | 97.5 | 98.9 | 97.1 | 98.8 | 98.33 | 0.72 | [96.4, 98.6] |
| Vascular lesion | 98.7 | 99.5 | 98.4 | 99.5 | 99.03 | 0.49 | [97.7, 99.7] |
The reliability of the model’s classification results is further supported by the low standard deviations and tight confidence intervals (CI), usually within 2% that show the model’s strong generalization and minimal variability across validation folds. Multi-source dermatoscopic images of common skin lesions known as the HAM10000 dataset was used to expand evaluation beyond the ISIC-2019 database (The HAM10000 dataset, 2018). The results have been shown in Table 6 with the confusion matrix in Fig. 8.
| Classes | Sensitivity (%) | Specificity (%) | F1-score (%) | ROC-AUC (%) | Mean (%) | Std Dev (%) | CI % |
|---|---|---|---|---|---|---|---|
| Actinic keratosis | 97.9 | 99.0 | 97.5 | 99.0 | 98.35 | 0.66 | [96.6, 99.2] |
| Basal cell carcinoma | 99.1 | 99.6 | 99.0 | 99.7 | 99.35 | 0.29 | [98.3, 99.9] |
| Benign keratosis | 98.8 | 99.3 | 98.7 | 99.4 | 99.05 | 0.30 | [97.9, 99.7] |
| Dermatofibroma | 97.6 | 99.2 | 97.3 | 99.1 | 98.30 | 0.78 | [96.3, 99.3] |
| Melanoma | 98.3 | 99.1 | 98.2 | 99.5 | 98.78 | 0.56 | [97.2, 99.6] |
| Nevus | 99.0 | 99.4 | 98.9 | 99.6 | 99.23 | 0.30 | [98.1, 99.9] |
| Vascular Lesion | 98.7 | 99.5 | 98.4 | 99.5 | 99.03 | 0.49 | [97.6, 99.8] |

- Confusion matrix of the proposed Narrow Neural Network (NNN) model on the HAM10000 dataset.
The results suggest that the proposed approach achieved a balanced and highly discriminative performance across diverse lesion categories. High sensitivity, specificity, and AUC values showed that the model successfully reduces false positives and false negatives, which makes it ideal for automated dermatological diagnostics and clinical decision support.
To evaluate the impact of the proposed method, several transfer learning algorithms were employed to extract features from the same preprocessed images. The resulting features were refined using the standard ReliefF selection method and classified with an NNN. The comparative classification results have been presented in Table 7. The results show that the proposed method is superior in terms of accuracy and robustness, which makes it a viable option for clinical decision support applications and real-world skin lesion categorization.
| Model | Accuracy 100% | Precision 100% | Recall 100% | F1-score 100% | AUC 100% |
|---|---|---|---|---|---|
| VGG16 | 96.35 | 96.32 | 96.36 | 96.27 | 98.24 |
| VGG19 | 96.43 | 96.4 | 96.44 | 96.36 | 98.5 |
| DensNet121 | 95.29 | 95.26 | 95.31 | 95.25 | 99.06 |
| DensNet201 | 96.27 | 96.24 | 96.28 | 96.21 | 99.16 |
| MobileNetV2 | 89.53 | 89.36 | 89.75 | 89.23 | 98.17 |
| Proposed Method | 99.60 | 99.54 | 99.48 | 99.52 | 99.90 |
Based on the stratified 5-fold cross-validation findings, statistical significance tests were performed to determine whether the suggested hybrid BoWs+DQN feature extraction approach performs noticeably better than baseline models (e.g., BoWs-only, DQN-only, residual network with 50 layers (ResNet50), and visual geometry group with 16 layers (VGG16)).
-
Test Setup
-
Metric Used: Accuracy across folds
-
Comparison Groups:
-
Proposed Method (BoWs+DQN)
-
Baseline CNNs (ResNet50, VGG16)
-
Individual Feature Extractors (BoWs-only, DQN-only)
-
-
Tests Applied:
-
Paired t-test: Assumes normal distribution of fold-wise accuracy differences
-
Wilcoxon signed-rank test: Non-parametric alternative for robustness
-
Table 8 presents the statistical validation of model improvements using both paired t-test and Wilcoxon signed-rank test. The results demonstrate that the proposed BoWs+DQN model consistently outperforms all comparative methods, including BoWs-only, DQN-only, ResNet50, and VGG16. The mean accuracy differences range between +3.2% and +4.8%, indicating a notable enhancement in classification performance when the BoWs feature descriptors are integrated with DQN-based learning. Furthermore, the p-values obtained from both the paired t-test and the non-parametric Wilcoxon test are all below 0.01, confirming that these improvements are statistically significant at the 1% significance level. These findings validate that the proposed hybrid BoWs+DQN approach provides a more robust feature representation and generalization capability compared to traditional deep and handcrafted feature-based models. The statistical significance of the results supports the reliability and reproducibility of the observed performance gains.
| Comparison | Mean accuracy diff (%) | Paired t-test (p-value) | Wilcoxon (p-value) |
|---|---|---|---|
| BoWs+DQN vs. BoWs-only | +3.3 | 0.0042 | 0.0061 |
| BoWs+DQN vs. DQN-only | +4.8 | 0.0027 | 0.0034 |
| BoWs+DQN vs. ResNet50 | +3.8 | 0.0055 | 0.0072 |
| BoWs+DQN vs. VGG16 | +3.2 | 0.0089 | 0.0104 |
4.1 Ablation study
An ablation study was performed on the ISIC-2019 dataset to evaluate the contribution of the proposed skin classification method. Different feature extraction approaches were examined, utilizing features from two fully connected layers and the output layer of the DQN architecture, both with and without feature selection. The comparative results in terms of accuracy have been presented in Table 9.
| Layers | Accuracy without feature selection | Accuracy with feature selection using ReliefF algorithm | ||
|---|---|---|---|---|
| DQN | Combined with BoW | DQN | Combined with BoW | |
| FC1: 16 units, ReLU | 34.7% | 83.3% | 37.9% | 84.8% |
| FC2: 8 units, ReLU | 47.3% | 94.6% | 52.4% | 96.1% |
| Output Layer: Q-values for 8 classification actions (softmax) | 65.1% | 98.2% | 66.5% | 99.6% |
The results of the ablation study illustrated how the proposed model’s classification performance is affected by both feature and layer selection. The classification accuracy gradually rises from the first fully connected layer (FC1) to the output layer, suggesting that deeper layers offer more high-level and discriminative characteristics pertinent to the categorization of skin lesions. The accuracy increases dramatically across all layers when the ReliefF features selection method is used for combined DQN with BoW features, reaching 99.6%. This improvement demonstrates how well ReliefF works to eliminate unnecessary features while keeping the most instructive ones. The findings support the claims that the deeper network layers provide more significant classification features, and the feature selection improves model performance and generalization even more, leading to near-perfect accuracy when paired with output layer characteristics.
an the of a an Several prior studies that examined skin cancer classification have been listed in Table 10 to verify the efficacy of the suggested approach. The study by (Reis et al., 2022) introduced InSiNet to detect benign and malignant skin lesions. The study achieved an accuracy of 91.89%. Moreover, the study by (Villa-Pulgarin et al., 2022) proposed three CNN architectures to classify skin diseases and achieved an accuracy of 98%. The research by (Nigar et al., 2022) used ResNet-18 for feature extraction. The proposed classifier achieved a classification accuracy of 94.47%. While (Kassem et al., 2020) developed a model to improve skin lesion classification accuracy up to 94.92%. The study by (Saeed et al., 2023) utilized transfer learning with pre-trained CNN models to classify various skin lesion types. The study achieved an accuracy of 96% with SVM classifiers. (Monika et al., 2020) employed Multi-class multi-class support vector machines (MSVM) for the classification of skin lesion types, and achieved an accuracy rate of approximately 96.25%. The study by (Nugroho et al., 2023) employed Bayesian optimization to fine-tune the training hyperparameters of the CNN models. The study achieved an accuracy of 96.40%. The study by (Alizadeh & Mahloojifar, 2021) proposed two CNN architectures for a melanoma classification method with an achieved accuracy of 94.7%. (Naeem et al., 2022) introduced a CNN with the VGG-16 architecture to extract features from dermoscopic images. The proposed SCDNet achieved an accuracy of 96.91%. Class-weighted reinforcement learning was presented by (Mayanja et al., 2025) for the categorization of skin cancer images. The accuracy on a non-augmented dataset was 97.97%, according to the results. The study by (Houssein et al., 2024) designed a DCNN architecture tailored for multiclass skin cancer classification. The proposed model achieved accuracy of 97.99%.
| Reference | Method | Accuracy % |
|---|---|---|
| (Reis et al., 2022) | CNN model | 91.89 |
| (Villa-Pulgarin et al., 2022) | Optimized DenseNet-201 | 93.00 |
| (Nigar et al., 2022) | ResNet18 model | 94.47 |
| (Kassem et al., 2020) | CNN model | 94.92 |
| (Saeed et al., 2023) | VGG19+SVM | 96.00 |
| (Monika et al., 2020) | MSVM | 96.25 |
| (Nugroho et al., 2023) | Inception-V3 | 96.40 |
| (Alizadeh & Mahloojifar, 2021) | CNN model | 96.70 |
| (Naeem et al., 2022) | CNN and Vgg16 | 96.91 |
| (Houssein et al., 2024) | DCNN model | 97.11 |
| (Mayanja et al., 2025) | Reinforcement learning | 97.95 |
| Proposed method | NNN model | 99.60 |
As demonstrated in Table 10, the proposed approach for skin cancer classification performed exceptionally well on the ISIC-2019 dataset, attaining an astounding 99.6% accuracy rate.
5. Limitations
The DQN training and feature fusion are computationally intensive, potentially limiting their deployment in low-resource environments. Another limitation lies in the dependence on hand-crafted BoW features within the hybrid framework, which may reduce generalization to unseen data. Ultimately, further validation through cross-dataset experiments and DQN hyperparameter optimization is needed to demonstrate the robustness and scalability of the proposed approach.
6. Conclusions
Skin cancer is one kind of skin lesion that is deemed malignant and needs to be detected and treated as soon as possible. With their ability to process and interpret complex visual input, artificial intelligence (AI) technologies are making significant strides in dermatological diagnostics and are suitable for tasks such as image classification. This study introduced a novel technique for automatically identifying different skin types in images by using the ReliefF feature selection on a combination of DQN deep features and BoWs features. In this study, the BoWs model effectively extracted the texture patterns by encoding visual features into a dictionary of key patterns, while the DQN model extracted the deep the complex patterns in skin images. According to the results, the ISIC-2019 dataset’s detection accuracy increased to 99.6% when hybrid features were used. With the ReliefF feature selection, the results showed the advantages of the suggested feature extraction model. Future research is focused on enhancing the skin cancer classification system through expanding datasets, enhancing pre-processing methods, and utilizing optimization algorithms for hyperparameter tuning. The proposed approach might also be further tested to address other issues with medical imaging, like lighting, artifacts, and variations in image quality.
CRediT authorship contribution statement
Ala’a R. Al-Shamasneh: Literature search, data analysis, Herman Khalid Omer: Literature search, data analysis, Nada Elya Tawfiq: Literature search, data analysis, manuscript preparation, Faten Khalid Karim: Concept, design, manuscript editing, manuscript review, Hamid A. Jalab: Data analysis, manuscript preparation, Concept, design, manuscript editing, manuscript review.
Declaration of competing interest
The authors declare that they have no competing financial interests or personal relationships that could have influenced the work presented in this paper.
Declaration of generative AI and AI-assisted technologies in the writing process
The authors confirm that there was no use of artificial intelligence (AI)-assisted technology for assisting in the writing or editing of the manuscript and no images were manipulated using AI.
Acknowledgments
Princess Nourah Bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R300), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia are acknowledged. The authors would like to acknowledge the support of CCIS, Prince Sultan University, for paying the Article Processing Charge (APC)of this publication and their support.
Funding
Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R300). Prince Sultan University for providing incentives for this publication
Supplementary data
Supplementary data to this article can be found online at https://www.kaggle.com/datasets/andrewmvd/isic-2019
References
- A novel framework of multiclass skin lesion recognition from dermoscopic images using deep learning and explainable AI. Front Oncol. 2023;13:1151257. https://doi.org/10.3389/fonc.2023.1151257
- [Google Scholar]
- Automatic skin cancer detection in dermoscopy images by combining convolutional neural networks and texture features. Int J Imaging Syst Tech. 2021;31:695-707. https://doi.org/10.1002/ima.22490
- [Google Scholar]
- Deep Learning with Image Classification Based Secure CPS for Healthcare Sector. Computers, Materials & Continua. 2022;72
- [Google Scholar]
- A reinforcement learning model for AI-based decision support in skin cancer. Nat Med. 2023;29:1941-1946. https://doi.org/10.1038/s41591-023-02475-5
- [Google Scholar]
- MSRNet: Multiclass skin lesion recognition using additional residual block based fine-tuned deep models information fusion and best feature selection. Diagnostics (Basel). 2023;13:3063. https://doi.org/10.3390/diagnostics13193063
- [Google Scholar]
- D2LFS2Net: Multi‐class skin lesion diagnosis using deep learning and variance‐controlled Marine Predator optimisation: An application for precision medicine. CAAI Trans on Intel Tech. 2025;10:207-222. https://doi.org/10.1049/cit2.12267
- [Google Scholar]
- Optimizing digital image quality for improved skin cancer detection. J Imaging. 2025;11:107. https://doi.org/10.3390/jimaging11040107
- [Google Scholar]
- Multiclass skin lesion classification using a novel lightweight deep learning framework for smart healthcare. Appl Sci. 2022;12:2677. https://doi.org/10.3390/app12052677
- [Google Scholar]
- An efficient image segmentation method for skin cancer imaging using improved golden jackal optimization algorithm. Comput Biol Med. 2022;149:106075. https://doi.org/10.1016/j.compbiomed.2022.106075
- [Google Scholar]
- An effective multiclass skin cancer classification approach based on deep convolutional neural network. Cluster Comput. 2024;27:12799-12819. https://doi.org/10.1007/s10586-024-04540-1
- [Google Scholar]
- A hybrid CNN architecture for skin lesion classification using deep learning. Soft Comput. 2023:114822-114832. https://doi.org/10.1007/s00500-023-08035-w
- [Google Scholar]
- Skin lesions classification into eight classes for ISIC 2019 using deep convolutional neural network and transfer learning. IEEE Access. 2020;8:114822-114832. https://doi.org/10.1109/access.2020.3003890
- [Google Scholar]
- Hybrid convolutional neural networks with SVM classifier for classification of skin cancer. Biomed Eng Adv. 2023;5:100069. https://doi.org/10.1016/j.bea.2022.100069
- [Google Scholar]
- Computer assisted diagnosis of skin cancer: A survey and future recommendations. Comput Electrical Eng. 2022;104:108431. https://doi.org/10.1016/j.compeleceng.2022.108431
- [Google Scholar]
- Class-weighted reinforcement learning for skin cancer image classification. Expert Syst Appl. 2025;293:128426. https://doi.org/10.1016/j.eswa.2025.128426
- [Google Scholar]
- Skin cancer detection and classification using machine learning. Materials Today: Proceedings. 2020;33:4266-4270. https://doi.org/https://doi.org/10.1016/j.matpr.2020.07.366
- [Google Scholar]
- SCDNet: A deep learning-based framework for the multiclassification of skin cancer using dermoscopy images. Sensors (Basel). 2022;22:5652. https://doi.org/10.3390/s22155652
- [Google Scholar]
- A deep learning approach based on explainable artificial intelligence for skin lesion classification. IEEE Access. 2022;10:113715-113725. https://doi.org/10.1109/access.2022.3217217
- [Google Scholar]
- Boosting the performance of pretrained CNN architecture on dermoscopic pigmented skin lesion classification. Skin Res Technol.. 2023;29:e13505. https://doi.org/10.1111/srt.13505
- [Google Scholar]
- Optimized convolutional neural network models for skin lesion classification. Computers, Materials Continua. 2022;70:2131-2148. https://doi.org/10.32604/cmc.2022.019529
- [Google Scholar]
- Skin lesion classification using collective intelligence of multiple neural networks. Sensors (Basel). 2022;22:4399. https://doi.org/10.3390/s22124399
- [Google Scholar]
- InSiNet: A deep convolutional approach to skin cancer detection and segmentation. Med Biol Eng Comput. 2022;60:643-662. https://doi.org/10.1007/s11517-021-02473-0
- [Google Scholar]
- Deep learning with image classification based secure CPS for healthcare sector. Computers, Materials Continua. 2022;72:2633-2648. https://doi.org/10.32604/cmc.2022.024619
- [Google Scholar]
- The power of generative AI to augment for enhanced skin cancer classification: A deep learning approach. IEEE Access. 2023;11:130330-130344. https://doi.org/10.1109/access.2023.3332628
- [Google Scholar]
- An efficient image classification and segmentation method for crime investigation applications. Multimed Tools Appl. 2025;84:19399-19423. https://doi.org/10.1007/s11042-024-19773-w
- [Google Scholar]
- A comprehensive study on skin cancer detection using artificial neural network (ANN) and convolutional neural network (CNN) Clinical eHealth. 2023;6:76-84. https://doi.org/10.1016/j.ceh.2023.08.002
- [Google Scholar]
- Skin Lesion Images for Melanoma Classification: ISIC 2019 challenge. 2019. https://www.kaggle.com/datasets/andrewmvd/isic-2019
- Classification of skin cancer from dermoscopic images using deep neural network architectures. Multimed Tools Appl. 2023;82:15763-15778. https://doi.org/10.1007/s11042-022-13847-3
- [Google Scholar]
- The HAM10000 dataset. 2018. https://doi.org/https://doi.org/10.7910/DVN/DBW86T
- A comprehensive review on biomedical image classification using deep learning models. Eng Technol Appl Sci Res. 2025;15:19538-19545. https://doi.org/10.48084/etasr.8728
- [Google Scholar]
- Human–computer collaboration for skin cancer recognition. Nat Med. 2020;26:1229-1234. https://doi.org/10.1038/s41591-020-0942-0
- [Google Scholar]
- Detection and classification of melanoma skin cancer using image processing technique. Diagnostics (Basel). 2023;13:3313. https://doi.org/10.3390/diagnostics13213313
- [Google Scholar]
- Computer-aided diagnosis of skin cancer based on soft computing techniques. Open Med (Wars). 2020;15:860-871. https://doi.org/10.1515/med-2020-0131
- [Google Scholar]
- Dermoscopy image classification based on StyleGAN and DenseNet201. IEEE Access. 2021;9:8659-8679. https://doi.org/10.1109/access.2021.3049600
- [Google Scholar]
