Open access peer-reviewed article

Evaluation of Interoperability of CNN Models between MATLAB and Python Environments Using ONNX Runtime Model

Fusaomi Nagata

Shingo Sakata

Ryoma Abe

Keigo Watanabe

Maki K. Habib

This article is part of the Robotics section.


Article Type: Research Paper

Date of acceptance: December 2024

Date of publication: December 2024

DOI: 10.5772/acrt.20240043

Copyright: ©2024 The Author(s), Licensee IntechOpen, License: CC BY 4.0



Abstract

Recently, machine learning models such as CNNs (Convolutional Neural Networks) have been implemented on various frameworks, such as PyTorch, TensorFlow, and MATLAB. However, it is not easy to ensure the interoperability of a CNN model while maintaining complete equivalence across different frameworks. We developed a MATLAB application to efficiently design, train, and test prediction models for various kinds of defect detection tasks using CNN, SVM (Support Vector Machine), CAE (Convolutional Autoencoder), FCN (Fully Convolutional Network), VAE (Variational Autoencoder), YOLO (You Only Look Once), and FCDD (Fully Convolutional Data Description) models. In this study, a VGG19-based transfer learning CNN model built on MATLAB was exported to an ONNX (Open Neural Network eXchange) model and applied to a picking robot running on Python to detect defects. Two user interfaces, for MATLAB and Python, were developed to ensure pixel-level equivalence and ascertain interoperability between the two frameworks. Experimental data show that the achievement of equivalence depends on the interpolation method used to downsize images. The validity and effectiveness of the approach are shown through classification experiments with the ONNX model and a peg-in-hole task performed by a small-sized industrial robot incorporating the ONNX model.

Keywords

  • CNN

  • interoperability

  • nearest-neighbor interpolation

  • ONNX

  • transfer learning


Introduction

Recently, machine learning models such as CNNs (Convolutional Neural Networks) have been implemented on various frameworks, such as PyTorch, TensorFlow, and MATLAB. However, it is not easy to ensure the interoperability of a CNN model while maintaining complete equivalence across different frameworks. ONNX (Open Neural Network eXchange) is an open data format built to represent interoperable machine learning models. ONNX defines a common set of operators, the building blocks of machine learning and deep learning models, and a common file format, allowing AI developers to use models with a mix of frameworks, tools, runtimes, and compilers [1].

Literature review

Mishra et al. developed an automated robot car using artificial intelligence (AI), trained its neural network using the AlexNet model, and used YOLO in the object detection phase, where the ONNX format was used for inference and judging components [2]. Namala et al. proposed a system using the U-Net CNN architecture to solve the problem of hair segmentation in real-world scenarios. In human–robot interactions, where real-time interaction is essential, a hair segmentation model based on a standard U-Net faces difficulty with real-time processing. Their proposed method involved Keras and PyTorch for segmentation, made interoperable through ONNX, which resolved the difficulty with real-time processing [3]. Vicente et al. [4] provided a review of machine learning tools that may be suited for deployment in embedded systems, from which they selected two representative tools: the well-established Python-based Scikit-Learn and the interoperability-oriented ONNX Runtime. A comparison of their response times showed the considerable superiority of ONNX Runtime. As for actual applications using ONNX, Lin et al. [5] developed a fluorescent sensor array combined with deep learning techniques for real-time monitoring of meat freshness, in which SqueezeNet was applied to automatically identify the freshness level of meat from fluorescent sensor array images with high accuracy (98.17%); the model was further deployed in various production environments, such as personal computers, mobile devices, and websites, using the ONNX format. Deep learning model converters such as ONNX can thus move designed models between frameworks and to runtime environments. However, Jajal et al. pointed out that conversion errors tend to compromise model quality and disrupt deployment, and that the failure characteristics of deep learning model converters are unknown, adding to the risk when deep learning interoperability technologies are used [6].

Duque et al. addressed the challenge of optimizing a VGG16 CNN for efficient weapon detection on computationally and memory-constrained devices. They utilized a strategic blend of techniques, including transfer learning, pruning, and quantization. The VGG16 architecture was chosen for its relevance in rapid and accurate weapon detection in crowded settings [7]. Wan et al. proposed improved VGG19-based transfer learning models for strip steel surface defect recognition, in which it was reported that the improved model performed significantly better in detecting surface seam defects, despite few samples and imbalanced datasets [8]. Al-khuzaie et al. reported that identifying acute lymphocytic leukemia is time-consuming and labor-intensive, and that the results are not always accurate. The authors investigated methods to improve efficiency and accuracy and to automate the diagnosis by analyzing images of leukemia cells, and showed that using a VGG19-based CNN, they were able to obtain an accuracy as high as 99.49%, surpassing other models in terms of simplicity and performance [9]. In addition, Gill et al. investigated a model that could discriminate between normal and cataractous eyes using a VGG19-based CNN, and were able to identify cataracts with more than 90% accuracy [10].

It is evident from these studies that VGG19-based CNN models possess a high classification ability. This tendency was also observed in earlier studies [11]. However, the interoperability among different frameworks expected from ONNX has not been elucidated.

As for recent developments of robots for peg-in-hole tasks, Huang et al. proposed a method in which a compliant dual-arm robot completes a peg-in-hole assembly task, using a position control model during the phase when the arms are not in contact with the part, and a torque control model during the phase when they are in contact with the part. As a result, the system successfully completed the assembly task for different assembled parts with a maximum gap between the peg and the hole of 0.5 mm [12]. Jiang et al. proposed a measurement method for robot peg-in-hole prealignment using a combined two-level vision system. The assembly system and a global coordinate system calibration method based on a dynamic coordinate system were developed to execute an accurate transformation between the coordinate systems. The hole pose was predicted with the proposed image processing method and the hole edge matching method. Experimental data and analysis results demonstrated that the hybrid measurement system showed high precision in both the local hole pose and global robot pose measurement accuracy [13]. However, defect detection for the workpieces was not considered in either system.

Proposed method and evaluation

We developed a MATLAB application to efficiently design, train, and test prediction models for various kinds of defect and feature detection tasks on industrial products, industrial materials, cultivated cells, steel spark tests, and others [11, 14–16], utilizing CNN, 3D CNN, SVM, CAE, FCN, VAE, YOLO, and FCDD models. An ONNX model import and export function was implemented in the application.

In this study, a VGG19-based transfer learning CNN model built on MATLAB was exported to an ONNX model in order to be used for defect detection by a picking robot running on Python. Two user interfaces were developed for MATLAB and Python to ensure pixel-level compatibility and achieve satisfactory interoperability on both frameworks. Generally, when designing and testing a transfer learning-based CNN, the resolution of input images has to be downsized to match the input layer of the transferred base model, such as VGG19. Furthermore, when images are downsized, an interpolation method such as nearest neighbor, bilinear, or bicubic must be suitably selected. Billa et al. proposed a novel CNN-based architecture for image resizing forensics, covering both resizing detection and factor determination in the presence of double-JPEG compression, in which image scaling utilizing bicubic interpolation was used [17].

The evaluation results of the three interpolation methods are shown through classification experiments on an industrial product using the two user interfaces created. Moreover, the practicality of a robot incorporating an ONNX model for defect detection is demonstrated through an application to a peg-in-hole task with small workpieces.

Building CNN models in MATLAB environment

Design and training of VGG19-based transfer learning CNN models

Since VGG19-based transfer learning CNN models have demonstrated high classification accuracy in previous experiments, the same design method is applied in this study to the tasks of classifying photographs of industrial products, industrial materials, cultivated cells, and steel spark tests. A set of 156 non-defective and 196 defective original images of an industrial product provided by a collaborating manufacturer was used for training a VGG19-based CNN, as shown in Figure 1. Test images were generated by flipping the original images horizontally; these were used to test the CNN after training, i.e., to evaluate its generalization performance.

Figure 1.

The network structure of a CNN model for binary classification, i.e., OK or NG, which is transferred based on VGG19.

Figure 1 shows the overall network structure and activations of the CNN model for binary classification, i.e., OK or NG, which is transferred based on VGG19. Each convolution block has a ReLU (Rectified Linear Unit) layer, and each fully-connected block has a dropout or softmax layer. The network parameters, training conditions, training time, and post-training processing time are shown in Table 1. The computing environment was as follows: Intel(R) Core(TM) i9-10850K CPU at 3.60 GHz, NVIDIA GeForce RTX 3090 GPU, and 64 GB of main memory. The forward calculation time needed to process one image in this environment was 70 ms; this includes the nearest-neighbor interpolation time and the prediction time of the VGG19-based CNN or its ONNX model. The forward calculation times of the VGG19-based CNN on MATLAB and of its ONNX model on Python were similar.
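As a side note, average forward times like the 70 ms figure above are usually measured with warm-up runs excluded, since GPU initialization can dominate the first calls. A minimal sketch of such a measurement in Python (the helper name is illustrative, not taken from the paper's application) is:

```python
import time

def average_forward_time(forward, n_warmup=3, n_runs=20):
    """Average wall-clock time of one forward pass, in seconds.

    Warm-up calls are excluded because GPU initialization and
    just-in-time optimization can dominate the first invocations.
    """
    for _ in range(n_warmup):
        forward()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        forward()
    return (time.perf_counter() - t0) / n_runs

# With an ONNX Runtime session this would be called as, e.g.:
#   average_forward_time(lambda: session.run(None, {input_name: x}))
t = average_forward_time(lambda: sum(range(1000)))
assert t >= 0.0
```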

Item                            Value or setting
Model name                      VGG19-based CNN
Number of total layers          47
Number of total weights         139,578,434
Training images                 156 (OK), 196 (NG)
Test images                     156 (OK), 196 (NG)
Resolution of input images      224 × 224 × 3
Epoch size                      50
Mini batch size                 32
Initial learning rate           0.0001
Optimizer                       SGDM
Loss function                   Cross entropy loss
Convergence time [s]            871 (average)
Forward calculation time [s]    0.07 (average)

Table 1

Network parameters and training conditions of transfer learning-based CNN (VGG19-based CNN).

The original images used in this study had a resolution of 2590 × 1942 × 3; however, to additionally train the VGG19-based CNN on the target product's images, they had to be downsized to 224 × 224 × 3 to fit the input layer of VGG19. Nearest neighbor, bilinear, and bicubic methods are available for downsizing in the MATLAB environment. Nearest-neighbor downsizing simply selects the closest source pixel for each output pixel, so it is the simplest method and requires minimal processing time. Bilinear downsizing considers the closest 2 × 2 neighborhood of known pixels and outputs a weighted average of these 4 pixels, which yields visually smoother images than nearest neighbor. Bicubic downsizing considers the closest 4 × 4 neighborhood of known pixels, i.e., a total of 16 pixels, generating noticeably sharper images than the other two methods. When the bilinear or bicubic method is used for downsizing, however, pixel values that differ from the original ones are necessarily introduced into the downsized images.
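To make the distinction concrete, the following sketch (pure NumPy, not the MATLAB implementation) shows why nearest neighbor introduces no new pixel values: every output pixel is a copy of one source pixel. The pixel-center rounding convention used here is one common choice; libraries differ in this detail.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Downsize an H x W (x C) image by nearest-neighbor sampling.

    Each output pixel copies exactly one source pixel, so no new
    values are created -- unlike bilinear (2x2 weighted average) or
    bicubic (4x4 weighted sum), which synthesize intermediate values.
    This sketch uses the pixel-center mapping floor((dst + 0.5) * scale).
    """
    in_h, in_w = img.shape[:2]
    rows = np.minimum(((np.arange(out_h) + 0.5) * in_h / out_h).astype(int), in_h - 1)
    cols = np.minimum(((np.arange(out_w) + 0.5) * in_w / out_w).astype(int), in_w - 1)
    return img[rows[:, None], cols]

# Downsizing a 4x4 gradient to 2x2 just picks 4 of the original pixels.
src = np.arange(16, dtype=np.uint8).reshape(4, 4)
small = resize_nearest(src, 2, 2)
assert set(small.ravel()).issubset(set(src.ravel()))
```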

Figure 2 shows the differences of pixel values between nearest neighbor and bilinear, nearest neighbor and bicubic, and bilinear and bicubic methods.

Figure 2.

Comparison examples of three types of image downsizing methods.

In this study, images downsized by nearest neighbor were used for training the CNN and testing its generalization ability, because nearest-neighbor downsizing does not introduce pixel values that are absent from the original image. Moreover, it was confirmed that pixel values computed by the bilinear or bicubic method differed marginally between MATLAB and Python. Such changes in pixel values tend to alter the scores predicted by machine learning models. Thus, nearest neighbor was selected to ensure the reliability and reproducibility of the predicted defect detection scores between the MATLAB and Python environments.

After training the CNN on MATLAB, the generalization ability was evaluated using the test images. Table 2 shows the confusion matrix, which confirmed a classification accuracy of 99%.

True \ Predicted    Anomaly (NG)    Normal (OK)
Anomaly (NG)        196             0
Normal (OK)         3               153

Table 2

Classification results of VGG19-based CNN built on MATLAB.

ONNX (Open Neural Network Exchange)

ONNX was developed by Microsoft and Facebook [1]. Prior to its development, neural network models were trained on various frameworks and customized for each, making it difficult to use a model trained on one framework on another; the concept of interoperability of a CNN model among different frameworks had not been sufficiently established. With the development of ONNX, it became possible for engineers to convert trained CNN models to the ONNX format and use each model on different frameworks, regardless of which framework it was trained on.

Export to ONNX model

The VGG19-based transfer learning CNN model built on MATLAB can be converted to an ONNX model via the ONNX export button, which calls the exportONNXNetwork( ) function provided by MATLAB's converter support package for the ONNX model format, as shown in Figure 3. In MATLAB, ONNX operator set versions 6 to 14 are supported. In this study, the default value of 8 was used, and an ONNX model file (file extension: onnx) was generated, which could then be imported in Python.

Figure 3.

The ONNX model export function included in the MATLAB application created.

Classification and evaluation using ONNX model in Python

The application developed in Python for controlling the MG400 robot can import ONNX models using the ONNX model load function shown in Figure 4, which enables image classification using ONNX Runtime. To obtain the same classification scores on MATLAB and Python, the downsizing of images in Python had to be made equivalent to that in MATLAB.

Figure 4.

The ONNX model import function included in the Python application created for picking robot control.
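A minimal sketch of such an import-and-classify path with ONNX Runtime is shown below. The input layout, normalization, and output interpretation are assumptions that must match the exported model; the actual application's code differs.

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Convert an HWC uint8 image to an NCHW float32 tensor.

    The layout and the absence of mean subtraction here are
    assumptions; they must match what the exported model expects.
    """
    x = img.astype(np.float32)
    x = np.transpose(x, (2, 0, 1))   # HWC -> CHW
    return x[np.newaxis, ...]        # add batch dimension -> NCHW

def classify(model_path: str, img: np.ndarray) -> np.ndarray:
    """Run one image through an exported ONNX model with ONNX Runtime."""
    import onnxruntime as ort        # imported lazily; requires onnxruntime
    sess = ort.InferenceSession(model_path)
    input_name = sess.get_inputs()[0].name
    (scores,) = sess.run(None, {input_name: preprocess(img)})
    return scores                    # e.g. class scores for [NG, OK]

# preprocess() alone can be checked without a model file:
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
assert preprocess(dummy).shape == (1, 3, 224, 224)
```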

In the Python application, the resolution of the test images was first reduced to 224 × 224 by specifying cv2.INTER_NEAREST in the cv2.resize( ) function of the OpenCV computer vision library, to fit the input layer of the transferred VGG19. The downsized images were fed to the input layer of the ONNX model imported in Python for classification. However, the classification scores observed differed from those predicted in MATLAB.

Next, the resize( ) function provided by the Pillow image processing library was used instead. Downsizing of the test images was done by setting the resampling parameter to NEAREST, and the classification experiments with the ONNX model were repeated. This time, classification scores equal to those in the MATLAB environment were obtained.
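A plausible source of this discrepancy is the rounding convention used to map each output pixel back to a source pixel when resampling with nearest neighbor. The sketch below contrasts two such conventions in pure NumPy; attributing one to MATLAB/Pillow and the other to a given OpenCV version is an assumption that should be verified against the library versions actually installed.

```python
import numpy as np

def nn_indices_center(out_n: int, in_n: int) -> np.ndarray:
    """Pixel-center mapping: src = floor((dst + 0.5) * in/out).
    Assumed here to match Pillow's NEAREST (and MATLAB's imresize)."""
    idx = ((np.arange(out_n) + 0.5) * in_n / out_n).astype(int)
    return np.minimum(idx, in_n - 1)

def nn_indices_corner(out_n: int, in_n: int) -> np.ndarray:
    """Corner mapping: src = floor(dst * in/out).
    Assumed here to approximate some OpenCV INTER_NEAREST behavior."""
    idx = (np.arange(out_n) * in_n / out_n).astype(int)
    return np.minimum(idx, in_n - 1)

# For a 10 -> 4 downsizing, the two conventions pick different source
# pixels, so the downsized images (and hence the CNN scores) differ.
assert nn_indices_center(4, 10).tolist() == [1, 3, 6, 8]
assert nn_indices_corner(4, 10).tolist() == [0, 2, 5, 7]
```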

Figures 5 and 6 show the tips of non-defective and defective samples, respectively. The defective samples display undesirable deformations around the tip. Although these materials are generally inspected by experienced workers, fatigue sets in after a period of time. Tables 3 and 4 show the classification accuracies for 224 × 224 resized images without and with defects, respectively, in which the nearest-neighbor interpolation method is applied to downsize the images. It is evident that with the Pillow library, the desired score equality between MATLAB and Python was achieved.

Figure 5.

Samples of enlarged tip images of non-defective products.

Figure 6.

Samples of enlarged tip images of defective products.

           MATLAB    Pillow (Python)    CV2 (Python)
Trial 1    0.9024    0.9028             0.9865
Trial 2    0.9535    0.9537             0.9766
Trial 3    0.9870    0.9871             0.9834
Trial 4    0.8390    0.8393             0.6999
Trial 5    0.9493    0.9492             0.8123
Average    0.9262    0.9264             0.8917

Table 3

Classification accuracies of 224 × 224 downsized images of non-defective materials.

           MATLAB    Pillow (Python)    CV2 (Python)
Trial 1    1.0000    1.0000             0.9997
Trial 2    0.9893    0.9894             0.9997
Trial 3    0.9914    0.9914             0.3918
Trial 4    0.9998    0.9998             0.9998
Trial 5    0.9728    0.9727             0.9997
Average    0.9906    0.9906             0.8765

Table 4

Classification accuracies of 224 × 224 downsized images of defective materials.

Peg-in-hole experiment by a robot incorporated with ONNX model

As established in the previous section, the combination of the ONNX converter and nearest-neighbor interpolation for downsizing images achieved reliable interoperability of the CNN model between the two platforms, i.e., MATLAB and Python. In this section, a robot incorporating the imported ONNX model for defect detection is applied to a peg-in-hole task. Figure 7 shows the overview of the robot system for the peg-in-hole task running on Python. Although the robots shown in the right and left figures process red and black workpieces, respectively, their tip shapes are manufactured to be identical. The experiment verified that the robots were able to repeatedly continue the task of picking a workpiece, moving to the position in front of the camera and taking a photo, classifying the photo into the OK or NG category, and finally placing the workpiece into the designated hole prepared for OK or NG workpieces.

Figure 7.

Overview of the robot system for the peg-in-hole task running on Python.
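The pick, photograph, classify, and place cycle described above can be sketched as a small routing loop. The function and parameter names below are hypothetical, not the MG400 SDK's API, and the 0.5 threshold on the model score is an assumption.

```python
from typing import Callable

def run_sorting_cycle(pick: Callable, capture: Callable,
                      classify: Callable, place: Callable) -> str:
    """One pick -> photograph -> classify -> place cycle.

    The workpiece is routed to the hole prepared for OK or NG parts
    according to the classifier's score (threshold is an assumption).
    All four callables stand in for robot/camera/model operations.
    """
    pick()
    image = capture()
    label = "OK" if classify(image) >= 0.5 else "NG"
    place(label)
    return label

# Stubbed demonstration of the routing logic:
events = []
label = run_sorting_cycle(
    pick=lambda: events.append("pick"),
    capture=lambda: "photo",
    classify=lambda img: 0.99,        # stand-in for the ONNX model score
    place=lambda hole: events.append(f"place->{hole}"),
)
assert label == "OK" and events == ["pick", "place->OK"]
```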

The accuracy of defect detection using the ONNX model was 99%, as shown in Table 2. Since the repetitive positioning accuracy of these robots is ±0.05 mm and the clearance between a hole and a workpiece is 0.8 mm, the peg-in-hole task was completed with a success rate of 100%.

Note that conversion to an ONNX model is not always required: if the target picking robot could run in the MATLAB environment, the VGG19-based CNN model built on MATLAB could be applied directly. However, different systems and/or models will likely need to be integrated in the future, and this groundwork on interoperability and qualitative equivalence will be fundamental.

Conclusion

In this study, a VGG19-based transfer-learning CNN model built on MATLAB was exported to the ONNX (Open Neural Network eXchange) format for use in defect detection by a picking robot running in a Python environment. Two user interfaces, i.e., control applications, were developed on MATLAB and Python so that pixel-level equality in downsizing images and interoperability of CNN models could be achieved on both frameworks. Generally, when designing and testing a transfer learning-based CNN model, the resolution of downsized input images must match the input layer of the transferred base model, such as VGG19. Moreover, when images are downsized, a suitable interpolation method, such as nearest neighbor, bilinear, or bicubic, must be selected. Results of classification experiments showed that compatibility and interoperability of CNN models were achieved between the MATLAB and Python interfaces created. Finally, it was demonstrated that a small-sized industrial robot incorporating an imported ONNX model can successfully perform a peg-in-hole task while classifying each workpiece as OK (without a defect) or NG (with a defect).

Presently, industrial robots are deployed for various automation tasks, mainly in the manufacturing sector. Recently, open-architecture industrial robots have become available, making application development on the user side feasible. Applications are developed in different software languages, such as C++, Python, and MATLAB, according to the specification of the SDK (Software Development Kit) provided by each robot manufacturer. Similarly, machine learning models for product quality inspection are built on various frameworks. Consequently, the combinations of the software codebase used for robot application development and the framework on which a machine learning model is built will be varied and numerous. Thus, it is important to evaluate the interoperability and qualitative equivalence of such combinations.

Author contributions

Nagata, Fusaomi: Conceptualization, Methodology, Software, Writing – original draft; Sakata, Shingo and Abe, Ryoma: Investigation, Validation, Visualization; Watanabe, Keigo: Writing – review & editing; Habib, Maki K.: Writing – review & editing.

Funding

This research did not receive external funding from any agencies.

Ethical statement

Not Applicable.

Data availability

Source data is not available for this article.

Conflicts of interest

The authors declare no conflict of interest.

References

  1. Open Neural Network Exchange. The open standard for machine learning interoperability [Internet]; Accessed 19 November 2024. Available from: https://onnx.ai/.
  2. Mishra S, Minh CS, Chuc HT, Long TV, Nguyen TT. Automated robot (car) using artificial intelligence. In: Proc. 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE). Piscataway, NJ: IEEE; 2022. p. 319–324. doi:10.1109/ISMODE53584.2022.9743130.
  3. Namala S, Avva VP, Mangalraj M. An intelligent system for hair coloring using UNET and ONNX. In: Proc. 2022 3rd International Conference for Emerging Technology (INCET2022). Piscataway, NJ: IEEE; 2022. p. 1–5. doi:10.1109/INCET54531.2022.9825256.
  4. Vicente P, Santos PM, Asulba B, Martins N, Sousa J, Almeida L. Comparing performance of machine learning tools across computing platforms. In: Proc. 2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS). Piscataway, NJ: IEEE; 2023. p. 1185–1189. doi:10.15439/2023F3594.
  5. Lin Y, Ma J, Sun DW, Cheng JH, Zhou C. Fast real-time monitoring of meat freshness based on fluorescent sensing array and deep learning: from development to deployment. Food Chem. 2024;448:139078. doi:10.1016/j.foodchem.2024.139078.
  6. Jajal P, Jiang W, Tewari A, Kocinare E, Woo J, Sarraf A, et al. Interoperability in deep learning: a user survey and failure analysis of ONNX model converters. In: Proc. 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM; 2024. p. 1466–1478. doi:10.1145/3650212.3680374.
  7. Duque F, Rivera F, Velasquez R. Optimizing convolutional neural networks for efficient weapon detection on edge devices. In: Proc. 2023 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON). Piscataway, NJ: IEEE; 2023. p. 1–6. doi:10.1109/CHILECON60335.2023.10418646.
  8. Wan X, Zhang X, Liu L. An improved VGG19 transfer learning strip steel surface defect recognition deep neural network based on few samples and imbalanced datasets. Appl Sci. 2021;11(6):2606. doi:10.3390/app11062606.
  9. Al-khuzaie MY, Zearah SA, Mohammed NJ. Developing an efficient VGG19-based model and transfer learning for detecting acute lymphoblastic leukemia (ALL). In: Proc. 2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA). Piscataway, NJ: IEEE; 2023. p. 1–5. doi:10.1109/HORA58378.2023.10156679.
  10. Gill KS, Anand V, Gupta R. Cataract detection using optimized VGG19 model by transfer learning perspective and its social benefits. In: Proc. 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS). Piscataway, NJ: IEEE; 2023. p. 593–596. doi:10.1109/ICAISS58487.2023.10250513.
  11. Nagata F, Habib MK, Watanabe K. Transfer learning-based and originally-designed CNNs for robotic pick and place operation. Int J Mech Automat. 2021;8(3):142–150. doi:10.1504/IJMA.2021.118430.
  12. Huang Y, Zheng Y, Wang N, Ota J, Zhang X. Peg-in-hole assembly based on master-slave coordination for a compliant dual-arm robot. Assem Autom. 2020;40(2):189–198. doi:10.1108/AA-10-2018-0164.
  13. Jiang T, Cui H, Cheng X, Tian W. A measurement method for robot peg-in-hole prealignment based on combined two-level visual sensors. IEEE Trans Instrum Meas. 2021;70:1–12. doi:10.1109/TIM.2020.3026802.
  14. Nagata F, Tokuno K, Ochi H, Otsuka A, Ikeda T, Watanabe K, et al. A design and training application for deep convolutional neural networks and support vector machines developed on MATLAB. In: Proc. 6th International Conference on Robot Intelligence Technology and Applications (RiTA2018), Lecture Notes in Mechanical Engineering (LNME). Berlin: Springer; 2018. p. 27–33. doi:10.1007/978-981-13-8323-6_3.
  15. Shimizu T, Nagata F, Arima K, Miki K, Kato H, Otsuka A, et al. Enhancing defective region visualization in industrial products using Grad-CAM and random masking data augmentation. Artif Life Robot. 2024;29(1):62–69. doi:10.1007/s10015-023-00913-8.
  16. Nagata F, Nakashima K, Miki K, Arima K, Shimizu T, Watanabe K, et al. Design and evaluation support system for convolutional neural network, support vector machine and convolutional autoencoder. In: Measurements and Instrumentation for Machine Vision. Boca Raton, FL: CRC Press, Taylor & Francis Group; 2024. p. 66–82.
  17. Billa NR, Das BP, Biswal M, Okade M. CNN based image resizing forensics for double compressed JPEG images. J Inf Secur Appl. 2024;81:103693. doi:10.1016/j.jisa.2023.103693.


© The Author(s) 2024. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

