Open access peer-reviewed article
This Article is part of Communications and Security Section
Article Type: Research Paper
Date of acceptance: November 2024
Date of publication: November 2024
DOI: 10.5772/acrt.20240042
Copyright: © 2024 The Author(s). Licensee: IntechOpen. License: CC BY 4.0
Detecting forgeries in diverse datasets involves identifying alterations or manipulations of various digital media, such as images, videos, and documents. There are unique challenges associated with each type of media for efficient forgery detection. This study investigated the effectiveness of both traditional and AI-based feature extraction techniques, combined with machine learning classifiers, in detecting forgeries across three distinct datasets: document images, the CASIA 2.0 standard image forgery dataset, and a video dataset containing real and forged videos. Traditional feature extraction methods such as Local Binary Patterns (LBP), Gray-Level Co-occurrence Matrix (GLCM), color histograms, and RGB to HSV conversion were utilized alongside machine learning classifiers like Support Vector Machines (SVM), Random Forest (RF), and Gradient Boosting Machine (GBM). Additionally, AI-based techniques leveraging pretrained models like ResNet50, VGG16, EfficientNetB0 and ensemble methods with Principal Component Analysis (PCA) were employed. This study evaluated the performance of these methods using metrics such as precision, recall, F1 score, and accuracy across different datasets. The results demonstrate the effectiveness of using ensembled features with PCA in improving the detection capabilities of the models. This study underscores the importance of leveraging traditional and advanced AI methodologies to effectively combat digital forgeries in various media types.
ensemble methods
forgery detection
image classification
pretrained models
video forgery
The digital era has made information a vital tool for communication and dissemination in a variety of fields, including business, science, law, education, politics, media, military, medical imaging and diagnosis, art, digital forensics, intelligence, sports and scientific publications. The authenticity of this digital information is paramount, as it often influences significant decisions and public perception. Images can convey far more information than text, and people are more likely to believe what they see, significantly influencing their judgment. This visual credibility can lead to a series of unintended reactions and consequences. Image forgery is primarily carried out for malicious purposes, such as distorting information, spreading immoral and fake news, fraudulently obtaining money from unsuspecting audiences, damaging the reputations of celebrities or public figures, and exerting adverse political influence on digital platform users. Therefore, clear authentication of images and videos before they are used or shared on digital media platforms is essential. This precaution makes it more difficult for users to unwillingly spread falsified information. However, the advent of sophisticated editing tools has led to an increase in digital forgeries, posing a serious threat to the integrity of information [1–3].
Forgery is the act of deliberately altering, creating, or imitating documents, signatures, artworks, or other items with the intent to deceive, mislead, or defraud. The rise of powerful editing tools and Artificial Intelligence (AI)-generated content has made it easier than ever to create sophisticated forgeries that can deceive even trained professionals. Among the most concerning types of digital forgeries are those involving identity documents, general images, and deepfake videos, as shown in Figure 1. These forgeries pose significant risks, including identity theft, misinformation, financial fraud, and erosion of trust in digital media.
Identity Document Forgeries: Forged identity documents can facilitate a wide range of criminal activities, from illegal immigration to financial fraud. The ability to detect these forgeries accurately and swiftly is crucial for maintaining security and trust in digital and physical identification processes.
General Forgery Images: Manipulated images are prevalent in digital platforms, often used to spread false information, manipulate public opinion, or tarnish reputations. Detecting these forgeries is essential to uphold the integrity of visual information and prevent the harmful consequences of misinformation.
Deepfake Videos: Deepfakes represent one of the most challenging types of digital forgeries. These AI-generated videos convincingly alter the appearance and speech of individuals, creating realistic but entirely fake content. Deepfakes have been used for malicious purposes, including political manipulation, blackmail, and misinformation campaigns. As deepfake technology continues to advance, the need for effective detection methods becomes increasingly urgent.
Given the diverse nature and serious implications of these forgeries, there is an immediate need to develop advanced AI techniques capable of detecting forgeries across different types of data. Traditional methods are often inadequate against the sophistication of modern forgeries. Thus, leveraging state-of-the-art AI models offers a promising solution.
A structured workflow for detecting and managing digital forgeries, enhancing the reliability of digital media, is shown in Figure 2. The major steps involved are:
Collecting digital media (images or videos) from various sources such as standard databases, social media platforms, news websites, or personal archives.
Converting data to a consistent format to carry out experiments, i.e., normalization and standardization of the data. Common preprocessing techniques involve removing noise and artifacts, resizing images, and standardizing video frames.
Identifying and extracting relevant features from the media.
Applying machine learning or deep learning models to the extracted features to detect anomalies or inconsistencies that indicate forgery.
Cross-validating the results using multiple models or techniques to ensure robustness.
The motivation for this research is rooted in the desire to leverage advanced AI techniques to protect against the multifaceted threats of digital forgeries. By addressing the challenges posed by identity document forgeries, general forgery images, and deepfake videos through a robust and comprehensive forgery detection framework, we aim to:
Enhance security measures for identity verification processes by accurately detecting forged identity documents.
Combat the spread of misinformation by identifying manipulated images.
Protect individuals and institutions from the threats posed by deepfake videos.
Morphing attacks present a significant threat to face recognition systems, particularly in border control scenarios. These attacks exploit the vulnerabilities of facial recognition systems by altering facial features, making it difficult to distinguish between legitimate and morphed images. To combat this, various morphing attack detection (MAD) algorithms and performance metrics have been developed, offering innovative solutions to detect and reject morphed images during enrollment or verification [4]. One approach utilizes patch-level features and lightweight networks, which significantly improve detection accuracy while maintaining computational efficiency and robustness [5]. Another method enhances detection by analyzing RGB color channels and using error-level analysis combined with selective kernel networks. This approach captures critical feature differences and improves classification accuracy without adding excessive parameters [6].
Despite advances in supervised detection methods, they often struggle with diverse attacks due to limited dataset diversity. To address this, an unsupervised solution using self-paced anomaly detection and convolutional autoencoders has been proposed [7]. This method analyzes reconstruction behaviors in attack and bona fide samples, allowing better generalization to unknown attacks. Additionally, new neural network training schemes enhance robustness by modifying training data to improve generality and resistance to morphing attacks [8]. By employing layer-wise relevance propagation, these training schemes ensure more reliable models across all image regions, essential for security applications. Overall, these advancements in detection techniques offer promising solutions for enhancing the security of face recognition systems.
Face morphing attacks compromise face recognition systems (FRS), hence the need for development of MAD techniques. The fairness of six single image-based MAD (S-MAD) methods across four ethnic groups was evaluated in [9] and bias was revealed through fairness discrepancy rate (FDR) underscoring the need for more equitable MAD approaches. A survey [10] reviewed progress in face morphing, covering image generation techniques and state-of-the-art MAD algorithms in the context of public databases, vulnerability assessments, benchmarking, and future challenges in biometrics.
Traditional approaches to tampered image detection often involve extracting handcrafted features that capture specific properties of images, such as texture and color. These features are then used to train machine learning classifiers to distinguish between authentic and tampered images. In a review of recent deep learning (DL) techniques [11] for detecting image forgeries, particularly copy-move and splicing attacks, as well as deepfake content, the authors highlighted the superior performance of DL methods on benchmark datasets and possible future research directions in DL architectures, evaluation methods, and dataset development.
Image splicing is detected by analyzing and characterizing texture content within different regions of an image [12]. Detection of copy-move and splicing forgeries has been described in [13] by analyzing texture changes in the Cb and Cr components of YCbCr images using a standard deviation filter and higher-order texture descriptors, followed by SVM classification. A study [14] explored feature extraction and classification techniques for detecting multimedia forgery, using a combination of GLCM, GLDM, GLRLM, and Local Binary Pattern (LBP) features. Transform domain techniques combined with LBP for tampered image detection using machine learning techniques were proposed in [15]. Utilization of Haralick-based textural descriptors and Extreme Learning Machine (ELM) to detect image tampering by applying LBP on YCbCr color bands and classifying features with ELM has been described in [16]. Segmentation-based fractal texture analysis (SFTA) with K-nearest neighbors (KNN) was proposed for copy-move forgery detection in [17]. Extracting and combining SIFT and KAZE features from the HSV color channels for detecting copy-move forgery (CMF) was proposed in [18]. Spliced image forgery detection using adaptive over-segmentation combined with AKAZE, ORB, and SIFT feature descriptors was proposed in [19]. Detecting and localizing CMF by applying SIFT key points on CLAHE-enhanced sub-images in RGB and Lab∗ color spaces was presented in [20]. A CMF detection method using color features and hierarchical feature point matching was presented in [21].
A survey [22] analyzes both traditional and DL approaches for detecting digital image forgeries, focusing on the challenges posed by advanced editing tools and DL techniques like GANs, and elucidates their strengths and limitations. A comprehensive review [23] of deepfake detection techniques using DL algorithms categorizes the methods of video, image, audio, and hybrid multimedia tamper detection. It also posits how deepfakes are created and detected, recent advancements, weaknesses in current security methods, and areas for further research. The results highlight convolutional neural networks (CNN) as the most commonly used DL approach, with video deepfake detection being the primary focus, particularly on improving accuracy. The use of DL in creating and detecting deepfakes, which are often difficult to distinguish from real images, has been explored in [24]. With growing concerns about privacy and security, various deepfake detection methods have emerged, and a DL-based image enhancement technique to improve the quality of generated deepfakes has also been proposed.
A novel framework for detecting fake videos using transfer learning with autoencoders and a hybrid model of CNN and recurrent neural networks (RNN) is worth noting [25]. Convolutional vision transformer for detecting deepfakes, combining CNN for feature extraction with vision transformers (ViT) for classification using an attention mechanism has been proposed [26]. Robust methods for detecting deepfake videos by analyzing teeth and mouth movements, which are challenging to replicate accurately are presented in [27] by incorporating multi-transfer learning techniques with models such as DenseNet121, DenseNet169, EfficientNetB0, EfficientNetB7, InceptionV3, MobileNet, ResNet50, VGG16, VGG19, and Xception. Detecting deepfake videos using CNNs with transfer learning has been explored in [28].
Using the recent advancements in ViT for face forgery detection, a parameter-efficient ViT-based detection model that includes lightweight forgery feature extraction modules, enabling the model to extract global and local forgery clues simultaneously was proposed in [29]. Generalized multi-scenario deepfake detection framework (GM-DF) presented in [30] showed strong performance across five datasets on unseen scenarios. GM-DF uses hybrid expert modelling for domain-specific features, CLIP for cross-domain alignment, and a masked reconstruction mechanism to capture detailed forgeries. Latent space data augmentation model proposed in [31], expanded the forgery feature space by simulating diverse forgeries within the latent space. This approach enhanced domain-specific features and smoothed transitions across forgery types, resulting in a robust, generalizable deepfake detector that outperformed state-of-the-art models on multiple benchmarks. A framework that captures broader forgery clues by combining diverse local representations into a global feature, guided by local and global information integrity principles to optimize feature relevance was proposed in [32]. Deepfake detection network combining spatial and noise domains proposed in [33], used a dual feature fusion module and a local enhancement attention module, which captured complementary forgery features, achieving superior performance across mainstream datasets.
Traditional approaches for forgery detection such as texture-based and color-based, often rely on analyzing the intrinsic properties of digital media, leveraging various statistical and structural characteristics. Texture-based techniques are widely used due to their effectiveness in capturing fine details and patterns that can indicate tampering, whereas color-based techniques are essential for detecting image forgeries as they analyze the color properties and distributions within an image.
Two prominent texture-based techniques used in the proposed work are local binary patterns (LBP) and gray level co-occurrence matrix (GLCM).
LBP is a simple yet powerful texture descriptor used for image analysis, which works by comparing each pixel in an image to its neighboring pixels typically in a 3 × 3 grid. A binary value (0 or 1) is assigned to each neighboring pixel based on whether its value is less than or greater than the central pixel’s value. The binary values are concatenated to form a binary number representing the local texture pattern. A histogram of the LBP values is constructed over the entire image or a specific region, capturing the distribution of local textures. LBP is an effective feature for texture categorization and is frequently used for forgery detection in digital images due to its effectiveness in capturing local texture patterns [41, 42]. LBP calculation is expressed as
For a given central pixel $(x_c, y_c)$ with gray value $g_c$ and $P$ neighboring pixels with gray values $g_p$ at radius $R$:

$$\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$$

The central pixel is compared against each neighbor in turn; the result is a $P$-bit binary code that describes the local texture around that pixel.
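The LBP computation above can be sketched in a few lines of NumPy. This is a minimal illustration of the basic 3 × 3 variant (helper names such as `lbp_3x3` are ours, not from the paper); in practice a library routine such as scikit-image's `local_binary_pattern` would typically be used.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 3x3 LBP code for each interior pixel.
    Neighbors are weighted 2^0..2^7, clockwise from the top-left."""
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]  # central pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the central pixels
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (n >= c).astype(np.int32) << bit
    return code

def lbp_histogram(img, bins=256):
    """Normalized histogram of LBP codes, used as the texture feature."""
    codes = lbp_3x3(img)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()
```

The normalized histogram returned by `lbp_histogram` is the feature vector fed to the classifiers.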
GLCM is a powerful tool used in texture analysis to describe the spatial relationship between pixel intensities in an image. GLCM is created by counting how often pairs of pixels with specific values and spatial relationships (defined by distance and angle) occur in an image. Various statistical properties are extracted from GLCM to quantify the texture of an image [43]. Calculation of the GLCM is as follows.

For an image $I$ with $N_g$ gray levels and a displacement $(\Delta x, \Delta y)$ defined by a distance $d$ and angle $\theta$, the GLCM is an $N_g \times N_g$ matrix whose element $(i, j)$ counts the co-occurring gray-level pairs:

$$P_{\Delta x, \Delta y}(i, j) = \sum_{x, y} \mathbb{1}\big[\, I(x, y) = i \ \wedge\ I(x + \Delta x,\ y + \Delta y) = j \,\big]$$

The matrix is typically normalized so that its entries sum to one before texture features are computed.
Texture features that are extracted from GLCM in our study are:
Contrast: Measures the intensity difference between a pixel and its neighbour.
Homogeneity: Measures how close the distribution of elements in the GLCM is to the diagonal, indicating uniform texture.
Energy (ASM): Measures textural uniformity by summing the squared elements of GLCM.
Correlation: Assesses how correlated a pixel is to its neighbor, indicating linear dependencies.
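The GLCM and the four features above can be sketched directly from their definitions. This is an illustrative NumPy implementation (function names are ours); scikit-image's `graycomatrix`/`graycoprops` provide an equivalent, optimized version.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Normalized gray-level co-occurrence matrix for offset (dy, dx)."""
    img = np.asarray(img)
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P / P.sum()

def glcm_features(P):
    """Contrast, homogeneity, energy (ASM) and correlation from a GLCM."""
    i, j = np.indices(P.shape)
    contrast = ((i - j) ** 2 * P).sum()
    homogeneity = (P / (1.0 + (i - j) ** 2)).sum()
    energy = (P ** 2).sum()
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    si = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sj = np.sqrt(((j - mu_j) ** 2 * P).sum())
    correlation = ((i - mu_i) * (j - mu_j) * P).sum() / (si * sj)
    return dict(contrast=contrast, homogeneity=homogeneity,
                energy=energy, correlation=correlation)
```

Computed over several distances and angles, these four statistics form the GLCM part of the traditional feature vector.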
Color-based techniques are valuable tools for detecting image forgeries by analyzing various color properties and distributions. These methods can identify inconsistencies and anomalies in color, lighting, and patterns that are indicative of tampering, thus enhancing the reliability and accuracy of forgery detection systems.
Forgeries may introduce sharp edges or unnatural transitions in color, which can be detected by analyzing the histograms of different image regions. The method calculates color histograms for different regions of an image and compares them to identify inconsistencies in color distribution; significant differences in color histograms between regions indicate possible forgeries [44]. For a channel $c$ of an image region, the histogram is

$$H_c(k) = \sum_{x, y} \mathbb{1}\big[\, I_c(x, y) = k \,\big], \qquad k = 0, \ldots, 255,$$

typically normalized so its bins sum to one before regions are compared.
Converting an image from RGB (Red, Green, Blue) color space to HSV (Hue, Saturation, Value) color space helps to separate color information (hue) from intensity information (value). This is useful in detecting color anomalies that may not be apparent in the RGB space. It works by analysing hue, saturation, and value channels separately to detect inconsistencies in any of these channels that might indicate tampering [45]. RGB to HSV conversion is a color space transformation often used in image processing and, for $R, G, B \in [0, 1]$ with $M = \max(R, G, B)$, $m = \min(R, G, B)$, and $\Delta = M - m$, is computed using

$$V = M, \qquad S = \begin{cases} 0, & M = 0 \\ \Delta / M, & M > 0 \end{cases}$$

$$H = \begin{cases} 60^\circ \times \dfrac{G - B}{\Delta} \bmod 360^\circ, & M = R \\[4pt] 60^\circ \times \left( \dfrac{B - R}{\Delta} + 2 \right), & M = G \\[4pt] 60^\circ \times \left( \dfrac{R - G}{\Delta} + 4 \right), & M = B \end{cases}$$
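Both color-based features can be combined into a single vector. The sketch below (our illustrative `color_histogram_features`, not a function from the paper) concatenates normalized per-channel histograms in RGB and, after conversion with the standard-library `colorsys` module, in HSV.

```python
import colorsys
import numpy as np

def color_histogram_features(img, bins=16):
    """Concatenated, normalized per-channel histograms (RGB then HSV).
    `img` is an (H, W, 3) uint8 RGB array; returns a vector of length 6*bins."""
    feats = []
    # RGB histograms
    for c in range(3):
        h, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())
    # Convert to HSV (colorsys operates on [0, 1] floats, one pixel at a time)
    flat = img.reshape(-1, 3) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in flat])
    for c in range(3):
        h, _ = np.histogram(hsv[:, c], bins=bins, range=(0.0, 1.0))
        feats.append(h / h.sum())
    return np.concatenate(feats)
```

Computing this vector per region and comparing regions (e.g., with a chi-square distance) exposes the color inconsistencies described above.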
In the field of forgery detection, machine learning classifiers play a crucial role in distinguishing between authentic and manipulated media. Among the various classifiers, random forest (RF), support vector machine (SVM), and gradient boosting machine (GBM) are widely used due to their robust performance and ability to handle complex patterns [46, 47].
∙ Support Vector Machine (SVM): finds the maximum-margin hyperplane separating the classes, using kernel functions to handle nonlinearly separable feature spaces.
∙ Random Forest (RF): an ensemble of decision trees trained on bootstrap samples with random feature subsets, whose majority vote reduces variance and overfitting.
∙ Gradient Boosting Machine (GBM): builds an ensemble sequentially, with each new tree fitted to the residual errors of the current ensemble, yielding strong accuracy on structured feature vectors.
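Training these three classifiers on extracted feature vectors follows the same scikit-learn pattern in each case. The snippet below is a minimal sketch using synthetic data as a stand-in for real LBP/GLCM features; dataset shapes and parameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for extracted feature vectors:
# class 1 ("tampered") is shifted relative to class 0 ("authentic").
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 16)),
               rng.normal(1.5, 1.0, (200, 16))])
y = np.array([0] * 200 + [1] * 200)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

accuracies = {}
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("RF", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("GBM", GradientBoostingClassifier(random_state=0))]:
    accuracies[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: accuracy = {accuracies[name]:.2f}")
```

Swapping in the real feature matrices from the previous sections reproduces the experimental setup evaluated in the results tables.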
AI techniques have proven to be highly effective in identifying tampered images. This approach typically involves two primary steps: feature extraction and classification. Pretrained models such as ResNet50, VGG16, and EfficientNetB0 are employed for feature extraction, and CNNs are utilized for classification.
Pretrained models like ResNet-50, VGG16, and EfficientNetB0 are CNNs trained on large datasets (e.g., ImageNet). When using these models for feature extraction, we remove the final classification layers and use the output of the last convolutional layer or a fully connected layer (before softmax layer) as features. Computations at convolutional layer, max pooling layer and then feature concatenation are expressed as
$$z^{(l)}_{i,j,k} = \sigma\!\left( \sum_{m} \sum_{u,v} w^{(l)}_{u,v,m,k}\, z^{(l-1)}_{i+u,\, j+v,\, m} + b^{(l)}_{k} \right), \qquad p^{(l)}_{i,j,k} = \max_{(u,v) \in \mathcal{R}_{i,j}} z^{(l)}_{u,v,k},$$

$$f = \big[\, f_{\text{ResNet50}} \,\|\, f_{\text{VGG16}} \,\|\, f_{\text{EfficientNetB0}} \,\big]$$

where $z^{(l)}$ is the activation map of layer $l$, $w^{(l)}$ and $b^{(l)}$ are its filter weights and biases, $\sigma$ is the activation function, $\mathcal{R}_{i,j}$ is the pooling window at position $(i, j)$, and $\|$ denotes concatenation of the individual models' feature vectors.
After the convolutional and pooling layers, the feature maps are flattened into a 1D feature vector.
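The convolution, max-pooling and flattening steps above can be illustrated in plain NumPy for a single channel (a toy sketch of the operations, not the pretrained networks themselves, which in practice come from a DL framework such as PyTorch or Keras).

```python
import numpy as np

def conv2d(x, w, b):
    """Valid 2-D convolution (single channel in/out) followed by ReLU."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * w).sum() + b
    return np.maximum(out, 0.0)  # ReLU activation

def maxpool2d(x, k=2):
    """Non-overlapping k x k max pooling."""
    H, W = (x.shape[0] // k) * k, (x.shape[1] // k) * k
    return x[:H, :W].reshape(H // k, k, W // k, k).max(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.random((8, 8))
fmap = maxpool2d(conv2d(img, rng.random((3, 3)), 0.1))
features = fmap.flatten()  # the 1-D feature vector described above
print(features.shape)
```

An 8 × 8 input with a 3 × 3 filter yields a 6 × 6 map; 2 × 2 pooling reduces it to 3 × 3, which flattens to a length-9 vector.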
ResNet50 has a deep CNN architecture, as shown in Figure 3, and is widely used for image classification tasks for its depth and efficiency in learning complex patterns [48]. Some of the key components are given below.
VGG16 is a deep CNN [50] known for its high accuracy in image classification. The architecture of VGG16 is given in Figure 4 and the key components are as follows.
Convolution layer applies filters that perform a dot product with areas of the input image to produce an output.
Pooling layer reduces dimensionality and complexity by down-sampling the activation map, which helps to speed up processing and prevent overfitting.
Max Pooling selects the maximum value from each region in the input image, similar to the convolution layer, but focusing on reducing the size of the data.
Softmax layer outputs a probability distribution over the labels, representing the likelihood of each label being correct.
The EfficientNetB0 architecture is designed to achieve state-of-the-art performance while maintaining computational efficiency [52, 53]. The key components of EfficientNetB0 are as follows.
Compound Scaling to balance network depth (number of layers), width (number of channels), and resolution (input image size). This scaling strategy optimizes both accuracy and efficiency across different computational constraints.
Convolutional Blocks use a combination of depthwise separable convolutions and standard convolutions to extract features efficiently. Depthwise separable convolutions reduce the number of parameters and computational cost compared to traditional convolutions.
Squeeze-and-Excitation (SE) Blocks improve feature representation by adaptively recalibrating the responses of channel-wise features. This enhances the model’s focus on informative features and boosts overall performance.
Efficient Scaling refers to scaling the network architecture uniformly in all dimensions (depth, width, and resolution) based on a predefined compound scaling method, which ensures a balanced trade-off between accuracy and computational efficiency.
PCA is a powerful technique used for dimensionality reduction in machine learning. After extracting features from images or other types of data, the dimensionality of the feature vectors can be quite high, leading to challenges such as increased computational complexity, risk of overfitting, and difficulties in visualization. PCA helps address these issues by transforming the high-dimensional data into a lower dimensional space while preserving as much variance as possible [54]. Following are the steps to perform PCA:
Consider a dataset $X \in \mathbb{R}^{n \times d}$ of $n$ samples with $d$ highly correlated features.
Standardize: subtract the mean and divide by the standard deviation for each feature.
Computing the mean of the data and subtracting it from the data is as follows:

$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \qquad \tilde{x}_{ij} = x_{ij} - \bar{x}_j.$$

Compute Covariance Matrix: calculate the covariance between the features, $C = \frac{1}{n-1} \tilde{X}^{\top} \tilde{X}$.
Eigenvectors and Eigenvalues: solve for the eigenvalues and eigenvectors of the covariance matrix, $C v_k = \lambda_k v_k$.
Select the top $k$ eigenvectors corresponding to the largest eigenvalues; these are the principal components.
Transform Data: project the original data onto the selected principal components, $Z = \tilde{X} V_k$, reducing it to $k$ dimensions with most of the variance retained.
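The PCA steps above map directly onto a short NumPy routine (an illustrative sketch; in practice `sklearn.decomposition.PCA` offers the same functionality):

```python
import numpy as np

def pca(X, k):
    """Project X (n samples x d features) onto its top-k principal components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize
    C = np.cov(Xs, rowvar=False)                     # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)             # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k eigenvectors
    return Xs @ top                                  # n x k projected data

# Four correlated features built from two underlying signals
rng = np.random.default_rng(1)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base + 0.01 * rng.normal(size=(100, 2))])
Z = pca(X, 2)
print(Z.shape)
```

Because the four features carry only two underlying signals, two components retain nearly all of the variance, which is exactly the redundancy PCA removes from the ensembled deep features.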
Once the features are extracted, the next step is to classify the image as either authentic or tampered. This is achieved using a CNN designed for binary classification [55]. The architecture of a CNN for binary classification is shown in Figure 5.
The CNN consists of multiple convolutional layers, pooling layers, and fully connected layers that further process the feature vectors to learn distinguishing patterns.
The final layer of the CNN is a sigmoid layer, which outputs the probability of the image being authentic or tampered.
The CNN is trained using a labelled dataset of authentic and tampered images. The network learns to minimize the classification error by adjusting its weights through backpropagation.
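A minimal binary-classification CNN of this shape can be sketched in PyTorch. The architecture below (class name `ForgeryCNN`, the 64 × 64 input size, and the layer widths are our illustrative choices, not the paper's exact configuration) shows the conv/pool feature stage followed by a fully connected head ending in a sigmoid.

```python
import torch
import torch.nn as nn

class ForgeryCNN(nn.Module):
    """Minimal binary classifier: conv/pool blocks, then a
    fully connected head ending in a sigmoid probability."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # P(tampered)
        )

    def forward(self, x):
        return self.head(self.features(x))

model = ForgeryCNN()
prob = model(torch.randn(4, 3, 64, 64))  # batch of 4 RGB 64x64 images
print(prob.shape)
```

Training would pair this with a binary cross-entropy loss (`nn.BCELoss`) and backpropagation over the labelled authentic/tampered dataset, as described above.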
This process typically includes texture-based and color-based feature extraction methods, followed by classification using machine learning algorithms as shown in Figure 6 and the steps involved are as follows:
LBP: Compute the LBP for each pixel and generate histogram of LBP values.
GLCM: Compute GLCM for specified distances and angles, extract properties such as contrast, dissimilarity, homogeneity, energy and correlation.
Color histograms: Compute the histogram for each color channel (RGB) and normalize the histograms.
RGB to HSV Conversion: Convert the image from RGB to HSV color space. Extract and normalize histograms for each channel (H, S, V).
This process typically includes feature extraction from pretrained models, followed by classification using DL algorithms as shown in Figure 7, and the steps are as follows: extract deep features with the pretrained ResNet50, VGG16, and EfficientNetB0 models; concatenate (ensemble) the extracted feature vectors; reduce their dimensionality with PCA; and classify the reduced features with a CNN.
Accuracy, recall, precision, and F1 score are the assessment metrics that are used to gauge how well the suggested algorithm performs.
∙ Accuracy: the proportion of images that are correctly identified, computed as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
∙ Precision: measures the degree to which the algorithm's positive predictions are accurate, i.e., the proportion of predicted positives that are true positives:
$$\text{Precision} = \frac{TP}{TP + FP}.$$
∙ Recall: also known as sensitivity or true positive rate, assesses the algorithm's ability to find all positive cases, taking false negatives into account:
$$\text{Recall} = \frac{TP}{TP + FN}.$$
∙ F1 Score: the harmonic mean of precision and recall; it ranges from 0 to 1, with values near 1 indicating that the model is performing well:
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
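The four metrics follow directly from the confusion-matrix counts; a small sketch (counts here are made-up example numbers):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: 90 true positives, 85 true negatives, 10 false positives, 15 false negatives
acc, p, r, f1 = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print(f"acc={acc:.3f} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In practice `sklearn.metrics.classification_report` produces the same numbers per class.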
Detailed results and analysis of our forgery detection experiments using traditional feature extraction techniques combined with machine learning classifiers are summarized in Tables 1, 2 and 3. The tables report Precision, Recall, F1 score and Accuracy for the SVM, RF and GBM classifiers applied to the LBP, GLCM, Color Histogram, and RGB to HSV conversion features, evaluated on the three datasets respectively; results for the advanced AI-based methods leveraging pretrained convolutional neural networks (CNNs) are presented separately below.
| Classifiers | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| **LBP** | | | | |
| SVM | 0.92 | 0.90 | 0.91 | 0.89 |
| RF | 0.94 | 0.91 | 0.92 | 0.90 |
| GBM | 0.93 | 0.92 | 0.92 | 0.91 |
| **GLCM** | | | | |
| SVM | 0.88 | 0.87 | 0.87 | 0.86 |
| RF | 0.90 | 0.88 | 0.89 | 0.87 |
| GBM | 0.91 | 0.89 | 0.90 | 0.88 |
| **Color Histogram** | | | | |
| SVM | 0.85 | 0.83 | 0.84 | 0.82 |
| RF | 0.86 | 0.84 | 0.85 | 0.83 |
| GBM | 0.87 | 0.85 | 0.86 | 0.84 |
| **RGB to HSV conversion** | | | | |
| SVM | 0.80 | 0.78 | 0.79 | 0.77 |
| RF | 0.82 | 0.80 | 0.81 | 0.79 |
| GBM | 0.83 | 0.81 | 0.82 | 0.80 |
| Classifiers | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| **LBP** | | | | |
| SVM | 0.89 | 0.88 | 0.88 | 0.86 |
| RF | 0.91 | 0.89 | 0.90 | 0.87 |
| GBM | 0.90 | 0.89 | 0.89 | 0.88 |
| **GLCM** | | | | |
| SVM | 0.85 | 0.83 | 0.84 | 0.82 |
| RF | 0.87 | 0.85 | 0.86 | 0.84 |
| GBM | 0.88 | 0.86 | 0.87 | 0.85 |
| **Color Histogram** | | | | |
| SVM | 0.81 | 0.79 | 0.80 | 0.78 |
| RF | 0.83 | 0.81 | 0.82 | 0.80 |
| GBM | 0.84 | 0.82 | 0.83 | 0.81 |
| **RGB to HSV conversion** | | | | |
| SVM | 0.78 | 0.76 | 0.77 | 0.75 |
| RF | 0.80 | 0.78 | 0.79 | 0.77 |
| GBM | 0.81 | 0.79 | 0.80 | 0.78 |
| Classifiers | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| **LBP** | | | | |
| SVM | 0.87 | 0.85 | 0.86 | 0.84 |
| RF | 0.88 | 0.86 | 0.87 | 0.85 |
| GBM | 0.89 | 0.87 | 0.88 | 0.86 |
| **GLCM** | | | | |
| SVM | 0.83 | 0.81 | 0.82 | 0.80 |
| RF | 0.85 | 0.83 | 0.84 | 0.82 |
| GBM | 0.86 | 0.84 | 0.85 | 0.83 |
| **Color Histogram** | | | | |
| SVM | 0.79 | 0.77 | 0.78 | 0.76 |
| RF | 0.81 | 0.79 | 0.80 | 0.78 |
| GBM | 0.82 | 0.80 | 0.81 | 0.79 |
| **RGB to HSV conversion** | | | | |
| SVM | 0.75 | 0.73 | 0.74 | 0.72 |
| RF | 0.77 | 0.75 | 0.76 | 0.74 |
| GBM | 0.78 | 0.76 | 0.77 | 0.75 |
Figure 8 illustrates the performance of the GBM classifier when applied to various feature extraction techniques on Dataset1. The comparative analysis highlights the effectiveness of each technique in detecting forgeries and shows that LBP provides good results. Figure 9 presents a comprehensive comparison of the accuracy of the different classifiers (SVM, RF, and GBM) across the three datasets. It demonstrates how each classifier performs with traditional feature extraction techniques, providing insights into their relative strengths and weaknesses.
Tables 4, 5 and 6 detail the performance metrics of pretrained CNN models ResNet50, VGG16, EfficientNetB0 and custom CNN architectures, along with ensembled features with PCA, on dataset1, dataset2 and dataset3 respectively. Performance metrics include Precision, Recall, F1 score, and Accuracy. Comparative analysis of these techniques is represented in Figures 10, 11 and 12 respectively.
| Classifiers | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| **ResNet50** | | | | |
| CNN | 0.92 | 0.90 | 0.91 | 0.90 |
| Custom CNN | 0.94 | 0.92 | 0.93 | 0.92 |
| **VGG16** | | | | |
| CNN | 0.91 | 0.89 | 0.90 | 0.89 |
| Custom CNN | 0.93 | 0.91 | 0.92 | 0.91 |
| **EfficientNetB0** | | | | |
| CNN | 0.93 | 0.91 | 0.92 | 0.91 |
| Custom CNN | 0.95 | 0.93 | 0.94 | 0.93 |
| **Ensembled features with PCA** | | | | |
| CNN | 0.94 | 0.92 | 0.93 | 0.92 |
| Custom CNN | 0.96 | 0.94 | 0.95 | 0.94 |
| Classifiers | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| **ResNet50** | | | | |
| CNN | 0.93 | 0.91 | 0.92 | 0.91 |
| Custom CNN | 0.95 | 0.93 | 0.94 | 0.93 |
| **VGG16** | | | | |
| CNN | 0.92 | 0.90 | 0.91 | 0.90 |
| Custom CNN | 0.94 | 0.92 | 0.93 | 0.92 |
| **EfficientNetB0** | | | | |
| CNN | 0.94 | 0.92 | 0.93 | 0.92 |
| Custom CNN | 0.96 | 0.94 | 0.95 | 0.94 |
| **Ensembled features with PCA** | | | | |
| CNN | 0.95 | 0.93 | 0.94 | 0.93 |
| Custom CNN | 0.97 | 0.95 | 0.96 | 0.95 |
| Classifiers | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| **ResNet50** | | | | |
| CNN | 0.91 | 0.89 | 0.90 | 0.89 |
| Custom CNN | 0.93 | 0.91 | 0.92 | 0.91 |
| **VGG16** | | | | |
| CNN | 0.90 | 0.88 | 0.89 | 0.88 |
| Custom CNN | 0.92 | 0.90 | 0.91 | 0.90 |
| **EfficientNetB0** | | | | |
| CNN | 0.92 | 0.90 | 0.91 | 0.90 |
| Custom CNN | 0.94 | 0.92 | 0.93 | 0.92 |
| **Ensembled features with PCA** | | | | |
| CNN | 0.93 | 0.91 | 0.92 | 0.91 |
| Custom CNN | 0.95 | 0.93 | 0.94 | 0.93 |
Table 7 presents a comparative analysis of the proposed system for forgery detection against existing methods described in the literature. The comparison covers different datasets, types of data such as images or videos, methods employed, and their corresponding detection results in terms of accuracy. The proposed system demonstrates superior performance compared to existing methods across different datasets and types of data. Its higher accuracy rates (95.23% for image datasets and 93.34% for video datasets) highlight the effectiveness of combining ensembled features with PCA for forgery detection. This approach not only improves detection accuracy but also maintains computational efficiency, making it a robust solution for both image and video forgery detection tasks.
| Article | Dataset | Type of data | Methods | Results |
|---|---|---|---|---|
| Doegar | MICC-F220 | Images | AlexNet model with SVM | 93.94% |
| Saikia | FaceForensics++ | Videos | Hybrid CNN-LSTM model with optical flow features | 91.21% |
| Pokroy and Egorov [59] | DFDC | Videos | Pretrained EfficientNet B5 | 74.4% |
| Malolan | FaceForensics | Images | LRP, LIME | 94.3% |
| Patel | DFDC | Videos | MobileNet and Random Forest | 90.2% |
| Proposed system | CASIA 2.0 and Aadhaar card image dataset | Images | Ensembled features (pretrained NN) with PCA | 95.23% |
| Proposed system | SDFVD (Small-scale Deepfake Forgery Video Dataset) | Videos | Ensembled features (pretrained NN) with PCA | 93.34% |
∙ Pretrained Models (ResNet-50, VGG16, EfficientNetB0) for Deep Feature Extraction: Pretrained models, originally trained on large-scale datasets like ImageNet, can extract deep, high-level features from images. Unlike traditional handcrafted features, these deep features capture complex patterns, textures, and structures within the images, which are critical for identifying subtle manipulations. For images and videos, pretrained models can distinguish between genuine and forged images by recognizing anomalies in the texture or layout that simpler techniques might overlook.
∙ Ensembling of Models for Enhanced Feature Representation: Ensembling different pretrained models leverages the strengths of each: ResNet-50 may capture detailed spatial hierarchies, VGG16 may focus on edge and texture information, and EfficientNetB0 balances performance and computational efficiency. Combining these features provides a richer and more comprehensive representation of the image. In fraud detection involving ordinary images, such as detecting tampered photos on social media, an ensemble of these models provides a robust detection system capable of identifying a wide range of manipulations, from minor to major alterations.
∙ CNN Classifier: After feature extraction, a custom CNN is employed to classify images as “fake” or “authentic.” The CNN architecture is specifically tailored to process the combined feature vector and make accurate predictions. For videos, the custom CNN can detect deepfakes by analyzing subtle inconsistencies in facial expressions, lighting, or shadows that may not align perfectly due to the manipulation.
∙ Higher Accuracy and Robustness: AI tools, particularly DL models, have demonstrated superior accuracy in tasks like image recognition and classification compared to traditional methods (LBP, GLCM). They are better equipped to handle the complexity and variety of real-world data. Traditional methods might fail to detect sophisticated forgeries involving small, nuanced changes, but a DL-based approach can pick up on these subtleties due to its ability to learn complex patterns from the data.
∙ Adaptability: DL models can be fine-tuned on specific datasets, making them highly adaptable to different types of fraud detection tasks. This adaptability ensures that the models remain effective even as the nature of forgeries evolves. In a scenario where new types of document forgeries emerge, the AI models can be retrained or fine-tuned on updated datasets to quickly adapt to new fraud patterns.
∙ Automation and Scalability: AI tools automate the process of image identification and fraud detection, which is crucial for large-scale applications like scanning millions of images or videos on social media platforms. This automation reduces the need for manual inspection, making the process faster and more scalable. In the financial sector, AI tools can automatically scan thousands of transaction documents daily to detect any signs of forgery, significantly improving efficiency and accuracy.
∙ Consider a scenario where government-issued IDs are being forged. The AI tools analyze the texture, layout, and fonts in the document images and detect inconsistencies, such as slight deviations in font style or spacing, that are indicative of forgery. This enables early detection of forged documents, preventing potential identity theft or fraud.
∙ On social media platforms, AI tools are used to detect fake images that have been manipulated for malicious purposes (e.g., spreading misinformation). Such AI applications can identify signs of tampering, including unnatural lighting or shadows, that may not be detectable by the naked eye. The platform can automatically flag or remove these images, maintaining the integrity of the content shared with users.
∙ Deepfakes represent one of the most challenging forms of video manipulation. The AI tools analyze facial movements, expressions, and other subtle cues in the video. This helps in preventing the spread of malicious deepfake videos, which could be used for blackmail or misinformation.
By leveraging advanced AI tools like pretrained DL models and custom CNNs, the proposed system provides a powerful and scalable solution for fraud detection. These tools are well-suited to handle the complexities and evolving nature of digital forgeries, offering significant advantages in terms of accuracy, robustness, and adaptability. The examples provided illustrate how these tools can be effectively applied across different scenarios.
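To make the classification step concrete, the sketch below trains a classifier on PCA-reduced synthetic "deep" features. A linear SVM stands in for the custom CNN head, whose exact architecture is not reproduced here, and the two class means are shifted so the toy data is separable; everything in this snippet is illustrative rather than the paper's implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 200
# Synthetic stand-ins for ensembled deep features of authentic vs forged
# images; the forged class is given a shifted mean so the toy task is solvable.
X_real = rng.normal(0.0, 1.0, size=(n, 512))
X_fake = rng.normal(0.6, 1.0, size=(n, 512))
X = np.vstack([X_real, X_fake])
y = np.array([0] * n + [1] * n)  # 0 = authentic, 1 = fake

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
pca = PCA(n_components=32).fit(X_tr)  # fit PCA on training data only
clf = SVC(kernel="linear").fit(pca.transform(X_tr), y_tr)
acc = clf.score(pca.transform(X_te), y_te)
print(f"hold-out accuracy: {acc:.2f}")
```

Fitting PCA on the training split alone, then applying it to the test split, mirrors how the reduction step should be deployed to avoid information leaking from evaluation data.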
This study comprehensively evaluated the performance of traditional and AI-based feature extraction techniques, along with various machine learning classifiers, in detecting digital forgeries across three diverse datasets. Traditional feature extraction methods such as LBP, GLCM, color histograms, and RGB to HSV conversion, while effective to a certain extent, showed limitations in handling the complexity and diversity of forgery types present in the datasets. The results demonstrate that AI-based feature extraction methods, particularly those leveraging pretrained models like ResNet50, VGG16, and EfficientNetB0, significantly outperform traditional methods in terms of precision, recall, F1 score, and accuracy. AI-based methods, especially when enhanced with ensemble techniques and dimensionality reduction through PCA, demonstrated superior adaptability and accuracy with advanced classifiers such as CNNs and custom CNN architectures. This study underscores the critical role of advanced AI techniques in addressing the sophisticated challenges posed by digital forgeries and offers a robust framework for future developments in this field.
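As a point of reference for the traditional pipeline, the LBP descriptor mentioned above can be computed with plain NumPy. This is a minimal 3×3, non-rotation-invariant variant written for illustration; library implementations (e.g. scikit-image) additionally offer uniform and rotation-invariant versions.

```python
import numpy as np

def lbp_histogram(img):
    """Normalised 256-bin histogram of 3x3 Local Binary Pattern codes.

    Each interior pixel is compared with its 8 neighbours; a neighbour
    greater than or equal to the centre sets the corresponding bit.
    """
    c = img[1:-1, 1:-1].astype(np.int16)
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
                  img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
                  img[2:, :-2], img[1:-1, :-2]]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, nb in enumerate(neighbours):
        codes |= (nb.astype(np.int16) >= c).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

# A flat image yields only the all-ones code 0b11111111 (255).
flat = np.full((16, 16), 128, dtype=np.uint8)
print(lbp_histogram(flat)[255])  # 1.0
```

The resulting 256-dimensional histogram is the kind of handcrafted feature vector that was paired with SVM, RF, and GBM classifiers in the traditional experiments.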
The authors acknowledge the use of QuillBot for language polishing of the manuscript.
This research did not receive external funding from any agencies.
Not applicable.
Source data is not available for this article.
The authors declare no conflict of interest.
© The Author(s) 2024. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.