Network parameters and training conditions of transfer learning-based CNN (VGG19-based CNN).
Open access peer-reviewed article
This Article is part of Robotics Section
Article Type: Research Paper
Date of acceptance: December 2024
Date of publication: December 2024
DoI: 10.5772/acrt.20240043
copyright: ©2024 The Author(s), Licensee IntechOpen, License: CC BY 4.0
Recently, machine learning models such as CNNs (Convolutional Neural Networks) are implemented on various frameworks, such as PyTorch, TensorFlow, MATLAB, and others. However, it is not easy to ensure interoperability of a CNN model while maintaining complete equivalence on different frameworks. We developed a MATLAB application to efficiently design, train, and test prediction models for various kinds of defect detection tasks in CNN, SVM (Support Vector Machine), CAE (Convolutional Autoencoder), FCN (Fully Convolutional Network), VAE (Variational Autoencoder), YOLO (You Only Look Once), and FCDD (Fully Convolutional Data Description). In this study, a VGG19-based transfer learning CNN model built on MATLAB was exported to an ONNX (Open Neural Network eXchange) model and applied to a picking robot running on Python to detect defects. Two user interfaces for MATLAB and Python were developed to ensure pixel-level equivalence and ascertain interoperability on both frameworks. Experimental data show that the achievement of equivalence is dependent on the method used to interpolate images for downsizing. The validity and effectiveness are shown through classification experiments by an ONNX model and a peg-in-hole task by a small-sized industrial robot incorporated with the ONNX model.
CNN
interoperability
nearest-neighbor interpolation
ONNX
transfer learning
Recently, machine learning models such as CNNs (Convolutional Neural Networks) are implemented on various frameworks, such as PyTorch, TensorFlow, MATLAB, and others. However, it is not easy to ensure the interoperability of a CNN model while maintaining complete equivalence on different frameworks. ONNX (Open Neural Network eXchange) is an open data format built to represent interoperable machine learning models. ONNX defines a common set of operators for the building blocks of machine learning and deep learning models, and a common file format to allow AI developers to use these models with a mix of frameworks, tools, runtimes, and compilers [1].
Mishra
Duque
It is evident from these studies that VGG19-based CNN models possess a high classification ability. This tendency was also observed in earlier studies [11]. However, the interoperability among different frameworks expected from ONNX has not been elucidated.
As for recent developments of robots for peg-in-hole tasks, Huang
We developed a MATLAB application to efficiently design, train, and test prediction models for various defect and feature detection tasks involving industrial products, industrial materials, cultivated cells, steel spark tests, and others [11, 14–16], utilizing CNN, 3D CNN, SVM, CAE, FCN, VAE, YOLO, and FCDD models. An ONNX model import and export function was also implemented in the application.
In this study, a VGG19-based transfer learning CNN model built on MATLAB was exported to an ONNX model in order to be used for the defect detection by a picking robot running on Python. Two user interfaces were developed for MATLAB and Python to ensure pixel-level compatibility and achieve satisfactory interoperability on both frameworks. Generally, when designing and testing a transfer learning-based CNN, the resolution of input images has to be downsized to match the input layer of the transferred powerful CNN model such as VGG19. Furthermore, when images are downsized, an interpolation parameter such as nearest neighbor, bilinear or bicubic should be suitably selected. Billa
The evaluation results of three types of interpolation methods were shown through classification experiments of an industrial product using the two user interfaces created. Moreover, the practicality of a robot incorporated with an ONNX model for defect detection is demonstrated through an application to a peg-in-hole task of small workpieces.
Since VGG19-based transfer learning CNN models have demonstrated high classification accuracy in previous experiments, the same design method is applied in this study to the tasks of classifying photographs of industrial products, industrial materials, cultivated cells, and steel spark tests. The original 156 non-defective and 196 defective images of an industrial product, provided by a collaborative manufacturer, were used to train a VGG19-based CNN, as shown in Figure 1. Test images were generated by flipping the original images horizontally and were used to test the CNN after training, i.e., to evaluate its generalization performance.
Figure 1 shows the overall network structure and activations of the CNN model for binary classification, i.e., OK or NG, which is transferred from VGG19. Each convolution block has a ReLU (Rectified Linear Unit) layer, and each fully-connected block has a dropout layer or a softmax layer. Network parameters, training conditions, training time, and processing time after training are shown in Table 1. The specifications of the computing environment were as follows: CPU: Intel(R) Core(TM) i9-10850K 3.60 GHz; GPU: NVIDIA GeForce RTX 3090; main memory: 64 GB. The forward calculation time needed to process one image in this environment was 70 ms, which includes the nearest neighbor interpolation time and the prediction time of the VGG19-based CNN or its ONNX model. The forward calculation times of the VGG19-based CNN on MATLAB and its ONNX model on Python were similar.
Item | Value or setting |
---|---|
Model name | VGG19-based CNN |
Number of total layers | 47 |
Number of total weights | 139,578,434 |
Training images | 156 (OK), 196 (NG) |
Test images | 156 (OK), 196 (NG) |
Resolution of input images | 224 × 224 × 3 |
Epoch size | 50 |
Mini batch size | 32 |
Initial learning rate | 0.0001 |
Optimizer | SGDM |
Loss function | Cross entropy loss |
Convergence time [s] | 871 (average) |
Forward calculation time [s] | 0.07 (average) |
The original images used in this study had a resolution of 2590 × 1942 × 3. To additionally train the VGG19-based CNN on the target product’s images, they had to be downsized to 224 × 224 × 3 to fit the input layer of VGG19. Nearest neighbor, bilinear, and bicubic methods were available for downsizing the original images in the MATLAB environment. Since nearest neighbor downsizing simply selects the single closest source pixel for each output pixel, it is the simplest method and requires minimal processing time. Bilinear downsizing considers the closest 2 by 2 neighborhood of known pixels and outputs a weighted average of these 4 pixels, which results in visually smoother images than nearest neighbor. Bicubic downsizing considers the closest 4 by 4 neighborhood of known pixels, i.e., a total of 16 pixels, and generates noticeably sharper images than the previous two methods. When the bilinear or bicubic method is used for downsizing, the downsized images inevitably contain pixel values that differ from the original ones.
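The difference between copying and averaging pixels can be illustrated with a minimal pure-Python sketch. It is a simplified illustration, not the MATLAB, OpenCV, or Pillow implementation: the function names, the 4 × 4 checkerboard, and the center-aligned sampling positions are all assumptions made for this example, and real libraries may additionally apply antialiasing when downsizing.

```python
def downsize_nearest(img, out_h, out_w):
    """Nearest neighbor: each output pixel copies exactly one source pixel."""
    in_h, in_w = len(img), len(img[0])
    sy, sx = in_h / out_h, in_w / out_w
    return [
        [img[min(int((r + 0.5) * sy), in_h - 1)][min(int((c + 0.5) * sx), in_w - 1)]
         for c in range(out_w)]
        for r in range(out_h)
    ]


def downsize_bilinear(img, out_h, out_w):
    """Bilinear: distance-weighted average of the surrounding 2 x 2 pixels."""
    in_h, in_w = len(img), len(img[0])
    sy, sx = in_h / out_h, in_w / out_w
    out = []
    for r in range(out_h):
        # Sample position in source coordinates, clamped to the image.
        y = min(max((r + 0.5) * sy - 0.5, 0.0), in_h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for c in range(out_w):
            x = min(max((c + 0.5) * sx - 0.5, 0.0), in_w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Hypothetical 4 x 4 grayscale checkerboard containing only the values 0 and 255.
image = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
]

nearest = downsize_nearest(image, 2, 2)
bilinear = downsize_bilinear(image, 2, 2)
print("nearest :", nearest)    # contains only values present in the source
print("bilinear:", bilinear)   # contains 127.5, which is not in the source
```

In this toy case the nearest neighbor result holds only values that already exist in the source image, whereas the bilinear result consists entirely of a new intermediate value.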
Figure 2 shows the differences of pixel values between nearest neighbor and bilinear, nearest neighbor and bicubic, and bilinear and bicubic methods.
In this study, images downsized by nearest neighbor were used for training the CNN and testing its generalization ability, because an image downsized by nearest neighbor contains only pixel values taken from the original image. Moreover, it was confirmed that pixel values computed by the bilinear or bicubic method differed marginally between MATLAB and Python. Such changes in pixel values tend to alter the scores predicted by machine learning models. Thus, nearest neighbor was selected to ensure the reliability and reproducibility of predicted scores for defect detection between the MATLAB and Python environments.
After training the CNN on MATLAB, the generalization ability was evaluated using the test images. Table 2 shows the confusion matrix, which gives a classification accuracy of about 99% (349 of 352 test images classified correctly).
True \ Predicted | Anomaly (NG) | Normal (OK) |
---|---|---|
Anomaly (NG) | 196 | 0 |
Normal (OK) | 3 | 153 |
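As a worked check of the reported accuracy, the short sketch below recomputes it from the four counts in the confusion matrix. The per-class recall and precision values are additional quantities derived here from the same counts; they are not reported in the source.

```python
# Counts from the confusion matrix, keyed as (true label, predicted label).
confusion = {
    ("NG", "NG"): 196,
    ("NG", "OK"): 0,
    ("OK", "NG"): 3,
    ("OK", "OK"): 153,
}

total = sum(confusion.values())
correct = confusion[("NG", "NG")] + confusion[("OK", "OK")]
accuracy = correct / total

# Recall: fraction of each true class that was predicted correctly.
ng_recall = confusion[("NG", "NG")] / (confusion[("NG", "NG")] + confusion[("NG", "OK")])
ok_recall = confusion[("OK", "OK")] / (confusion[("OK", "OK")] + confusion[("OK", "NG")])

# Precision of the NG prediction: fraction of NG predictions that were truly NG.
ng_precision = confusion[("NG", "NG")] / (confusion[("NG", "NG")] + confusion[("OK", "NG")])

print(f"accuracy     = {accuracy:.4f}")
print(f"NG recall    = {ng_recall:.4f}")
print(f"OK recall    = {ok_recall:.4f}")
print(f"NG precision = {ng_precision:.4f}")
```

All three misclassifications are non-defective images predicted as NG, so no defective test image was passed as OK.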
ONNX was developed by Microsoft and Facebook [1]. Prior to the development of ONNX, neural network models were trained on, and customized for, individual frameworks. This made it difficult to use a model on other frameworks, and the concept of interoperability of a CNN model among different frameworks had not been sufficiently established. With the development of ONNX, however, it became possible for engineers to convert trained CNN models to ONNX format and run each model on different frameworks, regardless of the framework on which it was trained.
The VGG19-based transfer learning CNN model built on MATLAB can be converted to an ONNX model by calling the exportONNXNetwork( ) function, provided by MATLAB's Deep Learning Toolbox Converter for ONNX Model Format, via the ONNX export button, as shown in Figure 3. In MATLAB, ONNX operator set (opset) versions 6 to 14 are supported. In this study, the default version of 8 was used to generate an ONNX model file (file extension: onnx) that could be imported in Python.
The application developed in Python for controlling the MG400 can import ONNX models using the ONNX model load function shown in Figure 4, which enables image classification using ONNX Runtime. In order to obtain the same classification scores on MATLAB and Python, the downsizing of images in Python had to be equivalent to that in MATLAB.
In the Python application, the resolution of the test images was first reduced to 224 × 224 by specifying nearest neighbor interpolation (‘NEAREST’) in the cv2.resize( ) function of the OpenCV computer vision library to fit the input layer of the transferred VGG19. The downsized images were fed to the input layer of the ONNX model imported in Python for classification. However, the classification scores observed were different from those predicted in MATLAB.
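The source does not identify why the scores differed. One possible factor, stated here only as an assumption and not verified in this study, is that nearest neighbor implementations can map each output pixel to a source pixel using different coordinate conventions. The sketch below compares two hypothetical conventions, a corner-aligned one and a center-aligned one, for the width reduction from 2590 to 224 pixels. Which convention a given library follows should be checked against its documentation.

```python
import math


def nearest_index_corner(dst: int, src_size: int, dst_size: int) -> int:
    """Corner-aligned mapping: source index = floor(dst * scale)."""
    return min(math.floor(dst * src_size / dst_size), src_size - 1)


def nearest_index_center(dst: int, src_size: int, dst_size: int) -> int:
    """Center-aligned mapping: source index = floor((dst + 0.5) * scale)."""
    return min(math.floor((dst + 0.5) * src_size / dst_size), src_size - 1)


SRC_W, DST_W = 2590, 224  # image widths before and after downsizing

corner = [nearest_index_corner(i, SRC_W, DST_W) for i in range(DST_W)]
center = [nearest_index_center(i, SRC_W, DST_W) for i in range(DST_W)]

# Count the output columns that would be copied from different source columns.
n_diff = sum(a != b for a, b in zip(corner, center))

print("first source columns (corner):", corner[:4])
print("first source columns (center):", center[:4])
print(f"columns sampled differently: {n_diff} of {DST_W}")
```

Under these two conventions every output column is copied from a different source column, so two resizers that are both labeled nearest neighbor can still produce different downsized images.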
Next, the resize( ) function provided by the image processing library Pillow was used in Python instead. The test images were downsized with the resampling parameter set to ‘NEAREST’, and classification experiments with the ONNX model were conducted. Classification scores matching those of the MATLAB environment were obtained.
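The Python-side procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' application code: the function names and file paths are hypothetical, the model is assumed to accept a float32 tensor with unnormalized 0 to 255 pixel values (that is, any normalization is assumed to be part of the exported graph), and the channel layout is inferred from the input shape reported by ONNX Runtime.

```python
from typing import List, Sequence


def is_channel_first(input_shape: Sequence) -> bool:
    """Return True for an NCHW-style shape such as [N, 3, 224, 224]."""
    return len(input_shape) == 4 and input_shape[1] == 3


def classify_with_onnx(model_path: str, image_path: str) -> List[float]:
    """Downsize one image with Pillow nearest neighbor and run the ONNX model."""
    # Imported here so the shape helper above works without these packages.
    import numpy as np
    import onnxruntime as ort
    from PIL import Image

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    model_input = session.get_inputs()[0]

    image = Image.open(image_path).convert("RGB")
    image = image.resize((224, 224), resample=Image.NEAREST)

    x = np.asarray(image, dtype=np.float32)   # shape (224, 224, 3)
    if is_channel_first(model_input.shape):   # assumed layout check
        x = np.transpose(x, (2, 0, 1))        # shape (3, 224, 224)
    x = x[np.newaxis, ...]                    # add the batch axis

    outputs = session.run(None, {model_input.name: x})
    return outputs[0].ravel().tolist()        # class scores


# Hypothetical usage; both file names are placeholders:
# scores = classify_with_onnx("vgg19_cnn.onnx", "workpiece.png")
```

Reading the input name and shape from the session avoids hard-coding details of the exported graph, which may vary with the export settings.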
Figures 5 and 6 show the tips of non-defective and defective samples, respectively. The defective samples displayed undesirable deformations around the tip. Although these materials are generally inspected by experienced workers, fatigue sets in after a period of time. Tables 3 and 4 show the classification scores of 224 × 224 resized images without and with defects, respectively, where the nearest neighbor interpolation method was applied to downsize the images. It is evident that, with the Pillow library, the desired agreement of scores between MATLAB and Python was achieved.
Trial | MATLAB | Pillow (Python) | CV2 (Python) |
---|---|---|---|
Trial 1 | 0.9024 | 0.9028 | 0.9865 |
Trial 2 | 0.9535 | 0.9537 | 0.9766 |
Trial 3 | 0.9870 | 0.9871 | 0.9834 |
Trial 4 | 0.8390 | 0.8393 | 0.6999 |
Trial 5 | 0.9493 | 0.9492 | 0.8123 |
Average | 0.9262 | 0.9264 | 0.8917 |
Trial | MATLAB | Pillow (Python) | CV2 (Python) |
---|---|---|---|
Trial 1 | 1.0000 | 1.0000 | 0.9997 |
Trial 2 | 0.9893 | 0.9894 | 0.9997 |
Trial 3 | 0.9914 | 0.9914 | 0.3918 |
Trial 4 | 0.9998 | 0.9998 | 0.9998 |
Trial 5 | 0.9728 | 0.9727 | 0.9997 |
Average | 0.9906 | 0.9906 | 0.8765 |
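The agreement between environments can be quantified directly from the per-trial scores in the two tables. The sketch below computes the largest absolute score difference from MATLAB for each Python library. The variable and function names are illustrative, and the scores are copied from the tables as printed, so the differences are limited to four decimal places.

```python
# Per-trial scores (Trials 1 to 5) copied from the two tables.
scores_without_defects = {
    "MATLAB": [0.9024, 0.9535, 0.9870, 0.8390, 0.9493],
    "Pillow": [0.9028, 0.9537, 0.9871, 0.8393, 0.9492],
    "CV2":    [0.9865, 0.9766, 0.9834, 0.6999, 0.8123],
}
scores_with_defects = {
    "MATLAB": [1.0000, 0.9893, 0.9914, 0.9998, 0.9728],
    "Pillow": [1.0000, 0.9894, 0.9914, 0.9998, 0.9727],
    "CV2":    [0.9997, 0.9997, 0.3918, 0.9998, 0.9997],
}


def max_abs_diff(a, b):
    """Largest absolute per-trial difference, rounded to the table precision."""
    return round(max(abs(x - y) for x, y in zip(a, b)), 4)


for name, table in [("without defects", scores_without_defects),
                    ("with defects", scores_with_defects)]:
    for library in ("Pillow", "CV2"):
        diff = max_abs_diff(table["MATLAB"], table[library])
        print(f"{name:16s} MATLAB vs {library:6s}: max |diff| = {diff:.4f}")
```

With Pillow the scores stay within 0.0004 of the MATLAB values in every trial, whereas with CV2 the largest deviation reaches 0.5996 (Trial 3 of the images with defects).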
As established in the previous section, the combination of the ONNX converter and nearest neighbor interpolation for downsizing images achieved reliable interoperability of the CNN model between two different platforms, i.e., MATLAB and Python. In this section, a robot incorporating the imported ONNX model for defect detection is applied to a peg-in-hole task. Figure 7 shows an overview of the robot system for the peg-in-hole task running on Python. Although the robots shown in the right and left panels handle red and black workpieces, respectively, the tips of the workpieces are manufactured to be identical in shape. The experiment verified that the robots were able to repeat the task continuously: picking a workpiece, moving it to the position in front of the camera and taking a photo, classifying the photo into the OK or NG category, and finally placing the workpiece into the designated hole prepared for OK or NG workpieces.
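The repeated pick, photograph, classify, and place cycle can be sketched as a simple loop. Everything in this sketch is hypothetical: the four callables stand in for the robot, camera, and ONNX model interfaces, which the source does not describe, and the decision threshold of 0.5 on an assumed "OK" score is an illustrative choice rather than a value from the study.

```python
from typing import Callable, List


def run_inspection_cycle(
    pick: Callable[[], None],
    capture: Callable[[], object],
    classify: Callable[[object], float],
    place: Callable[[str], None],
    n_workpieces: int,
    ok_threshold: float = 0.5,  # assumed threshold, not from the source
) -> List[str]:
    """Repeat pick -> photograph -> classify -> place for each workpiece."""
    labels = []
    for _ in range(n_workpieces):
        pick()                       # grasp the next workpiece
        image = capture()            # move to the camera and take a photo
        ok_score = classify(image)   # assumed: score of the OK class
        label = "OK" if ok_score >= ok_threshold else "NG"
        place(label)                 # insert into the hole for this label
        labels.append(label)
    return labels


# Demonstration with stand-in functions instead of real hardware.
fake_scores = iter([0.99, 0.12, 0.75])
placed: List[str] = []

result = run_inspection_cycle(
    pick=lambda: None,
    capture=lambda: "photo",
    classify=lambda image: next(fake_scores),
    place=placed.append,
    n_workpieces=3,
)
print(result)
```

Passing the hardware operations in as callables keeps the decision logic testable without a robot or camera attached.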
The accuracy of defect detection using the ONNX model was 99%, as shown in Table 2. Since the positioning repeatability of these robots is ±0.05 mm and the clearance between a hole and a workpiece is 0.8 mm, the peg-in-hole task was completed with a success rate of 100%.
Converting the model to ONNX was not an end in itself. If the target picking robot could run in the MATLAB environment, the VGG19-based CNN model built on MATLAB could be applied directly. However, it is likely that different systems and/or models will need to be integrated in the future, and the groundwork of interoperability and qualitative equivalence will then be fundamental.
In this study, a VGG19-based transfer-learning CNN model built on MATLAB was exported to ONNX (Open Neural Network eXchange) format for use in defect detection by a picking robot running in a Python environment. Two user interfaces, i.e., control applications, were developed on MATLAB and Python so that pixel-level equality in downsizing images and interoperability of CNN models could be achieved on both frameworks. Generally, when designing and testing a transfer learning-based CNN model, the resolution of the downsized input images should match the input layer of the transferred powerful CNN, such as VGG19. Moreover, when images are downsized, a suitable interpolation method such as nearest neighbor, bilinear, or bicubic has to be selected. The results of classification experiments showed that compatibility and interoperability of CNN models were achieved between the MATLAB and Python interfaces created. Finally, it was demonstrated that a small-sized industrial robot incorporating an imported ONNX model can successfully perform a peg-in-hole task while classifying each workpiece as OK (without a defect) or NG (with a defect).
Presently, industrial robots are deployed for various automation tasks, mainly in the manufacturing sector. Of late, open architecture industrial robots are available, so that application developments on the user side are feasible. Applications are developed in different software languages such as C++, Python and MATLAB according to the specification of SDK (Software Development Kit) provided by each robot manufacturer. Similarly, machine learning models for product quality inspection are also built on various frameworks. Consequently, the combination of the software codebase used for robot application development and frameworks on which machine learning models are built will be varied and numerous. Thus, it is important to evaluate the interoperability and qualitative equivalence of various combinations.
This research did not receive external funding from any agencies.
Not Applicable.
Source data is not available for this article.
The authors declare no conflict of interest.
© The Author(s) 2024. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.