Abstract The human brain is the most complex organ in the human body, and simulating its functionality, particularly its multi-modal sensory processing, is an exceedingly challenging task. Results from biological experiments show that object instances can be identified from tactile signals alone. Building on this concept, this research models a multi-modal sensory input processing system for tactile inputs. VRSS is a novel touch-to-vision-to-text-to-audio system that simulates the multi-modal sensory behavior of the brain by converting tactile inputs into visual images, which are in turn converted into text and audio. The main aim of this research is to classify object instances based on tactile signals. Tactile inputs are captured with the DIGIT sensor simulated in the TACTO simulator, implicitly converted into visual images, and classified using Convolutional Neural Networks (CNNs). The classification output is then converted into audio, thereby simulating three modalities: touch, vision, and sound. To construct VRSS, multiple pretrained CNNs were tested with different hyperparameter configurations; the pretrained ConvNeXt-Tiny model achieved the best accuracy among them, 91%. After further modification, the resulting custom VRSS CNN model reached an accuracy of 95.83%. These results will help expand the applicability of different CNNs, facilitate a deeper understanding of the human multi-modal sensory system, and have wide scope in artificial intelligence and robotics, particularly in the navigation of uncharted territories.
Field: Engineering
Journal Type: International