ACOUSTIC GARBAGE CLASSIFICATION METHOD USING ONE-DIMENSIONAL CONVOLUTIONAL NEURAL NETWORK (1D-CNN)

20220349861 · 2022-11-03

Abstract

An acoustic garbage classification method using a one-dimensional convolutional neural network (1D-CNN) is provided. The method includes: acquiring sound signals generated by falling garbage; preprocessing the sound signals; acquiring and preprocessing the sound signals of different types of garbage, building a sound database for garbage classification, and establishing and training a 1D-CNN model; acquiring a sound signal of garbage to be classified, and inputting the sound signal into the trained 1D-CNN for garbage classification to obtain a classification result. The present disclosure helps people classify garbage accurately, improves the accuracy of garbage classification and recycling, and has high practical and popularization value.

Claims

1. An acoustic garbage classification method using a one-dimensional convolutional neural network (1D-CNN), comprising the following steps: (A) acquiring sound signals generated by falling garbage; (B) preprocessing the sound signals; (C) acquiring and preprocessing the sound signals of different types of garbage, building a sound database for a garbage classification, and establishing and training a 1D-CNN model; and (D) acquiring a sound signal of garbage to be classified, and inputting the sound signal of the garbage to be classified into a trained 1D-CNN model for the garbage classification to obtain a classification result.

2. The acoustic garbage classification method using the 1D-CNN according to claim 1, wherein in step (A), the sound signals produced by the falling garbage are generated by an impact of the falling garbage freely falling to a plate, and are recorded by a single-channel microphone; the sound signals are sampled at a frequency of 44,100 Hz; and the above process is repeated a plurality of times for each of different types of garbage to acquire multiple sound signals.

3. The acoustic garbage classification method using the 1D-CNN according to claim 1, wherein in step (B), the preprocessing comprises: intercepting each of the sound signals for an effective duration of 120 ms, wherein the effective duration of 120 ms starts from a peak of each of the sound signals and ends 120 ms thereafter.
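The interception in claim 3 can be sketched in plain Python: locate the peak sample and keep the 120 ms window that follows it. At the 44,100 Hz sampling rate of claim 2, 120 ms corresponds to 5,292 samples. The zero-padding of short recordings is an assumption for illustration, not stated in the claims.

```python
SAMPLE_RATE = 44_100  # Hz, per claim 2
WINDOW_MS = 120       # effective duration, per claim 3
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_MS // 1000  # 5292 samples

def intercept_effective_window(signal):
    """Slice the 120 ms effective window starting at the signal peak.

    `signal` is a sequence of amplitude samples at 44,100 Hz.  The window
    begins at the sample of maximum absolute amplitude and runs 120 ms
    forward; it is zero-padded if the recording ends early (an assumed
    convention for recordings shorter than the window).
    """
    peak = max(range(len(signal)), key=lambda i: abs(signal[i]))
    window = list(signal[peak:peak + WINDOW_SAMPLES])
    window += [0.0] * (WINDOW_SAMPLES - len(window))  # pad short tails
    return window
```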

4. The acoustic garbage classification method using the 1D-CNN according to claim 1, wherein in step (C), the 1D-CNN model comprises an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer; during a training of the 1D-CNN model, the input layer is used to input preprocessed sound signals labeled with a garbage type; the convolutional layer performs a convolution operation and a feature extraction on output data of the input layer; a rectified linear unit (ReLU) activation function is used to enhance a nonlinear performance of the 1D-CNN model; a max pooling layer performs a feature dimensionality reduction, a network parameter reduction and an overfitting reduction; and the fully connected layer and the output layer respectively perform the garbage classification and output the classification result.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] FIG. 1 is a flowchart of a method of the present disclosure;

[0022] FIG. 2 is a schematic view of a CNN model for acoustic garbage classification; and

[0023] FIG. 3 is a schematic view of a 1D-CNN according to the present disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0024] The present disclosure will be further described below with reference to the drawings and embodiments. The implementations of the present disclosure include but are not limited to the following embodiments.

[0025] The specific embodiments and implementations of the present disclosure are as follows.

[0026] In a specific implementation, a 1D-CNN model uses TensorFlow 1.9 as a deep learning framework. All algorithms are written in Python 3.6. All programs are executed on a laptop with a GTX 1050 Ti graphics card, an 8th-generation Intel Core i5 processor and a 512 GB solid-state drive (SSD).

[0027] As shown in FIG. 2, based on a sound acquisition device, various types of garbage materials are prepared. Garbage sound data is acquired, labeled and preprocessed. A simple and efficient network structure is designed to analyze the sound data. The network includes a large number of parameters, and an orthogonal experiment method is used to search for the optimal parameter combination. Orthogonal experiments are implemented based on the following parameters: network depth: 1, 2, 3, 4 and 5; kernel size: 3, 5, 7, 9 and 11; and learning rate: 1e-5, 5e-5, 1e-4, 5e-4 and 1e-3. According to an L25 orthogonal design table (five levels per factor, accommodating up to six factors, of which three are used here), 25 experiments are carried out, and the optimal parameter combination is finally obtained: network depth: 5; kernel size: 9; and learning rate: 5e-4.
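The orthogonal experiment above can be sketched by generating 25 trials in which every pair of factor levels occurs exactly once. The Latin-square construction below is one standard way to build such a strength-2 array for three five-level factors; the actual L25 table used in the disclosure is not reproduced here.

```python
# Factor levels from the orthogonal experiment described in [0027].
DEPTHS = [1, 2, 3, 4, 5]
KERNEL_SIZES = [3, 5, 7, 9, 11]
LEARNING_RATES = [1e-5, 5e-5, 1e-4, 5e-4, 1e-3]

def orthogonal_trials():
    """Return 25 trials covering every pair of factor levels exactly once.

    Built from a Latin square: the third factor's level index is the sum
    of the first two modulo 5, which yields a strength-2 orthogonal array
    (an illustrative construction, not the disclosure's exact table).
    """
    trials = []
    for a in range(5):
        for b in range(5):
            c = (a + b) % 5
            trials.append({"depth": DEPTHS[a],
                           "kernel_size": KERNEL_SIZES[b],
                           "learning_rate": LEARNING_RATES[c]})
    return trials
```

Each trial dictionary would then parameterize one training run, and the best validation accuracy selects the reported combination (depth 5, kernel 9, learning rate 5e-4).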

[0028] As shown in FIG. 3, in an embodiment, the 1D-CNN model includes an input layer, five convolutional layers, five pooling layers, a fully connected layer and an output layer. 1D sound signals are input to the input layer of the 1D-CNN. A convolution operation is performed between the input signals and a corresponding convolution kernel to generate an input feature map. The input feature map is transmitted through an activation function to generate an output feature map of the convolutional layers. The pooling layers, which typically follow the convolutional layers, reduce the computational cost by reducing the dimensionality of the features extracted by the convolutional layers, and provide basic translation invariance to the features.
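The per-channel operation each convolutional layer applies can be illustrated with a minimal pure-Python sketch: a 1D sliding-window product with SAME zero padding, followed by the ReLU activation named in claim 4 (single channel only; the real layers stack many kernels).

```python
def conv1d_same_relu(signal, kernel):
    """1D convolution (cross-correlation form) with SAME zero padding,
    followed by ReLU.  Illustrates one channel of a convolutional layer;
    the output has the same length as the input."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    out = []
    for i in range(len(signal)):
        s = sum(padded[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))  # ReLU: negative responses are zeroed
    return out
```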

[0029] In the embodiment, the 1D-CNN model is specifically designed as follows. The convolutional layers 1 to 5 are defined by the following parameters: number of convolution kernels: 8, 16, 24, 32 and 40; kernel size: 9; stride: 1; border padding: SAME; and activation function: ReLU. The max pooling layers 1 to 5 are defined by the following parameters: number of channels: 8, 16, 24, 32 and 40; kernel size: 9; and stride: 2. Finally, the fully connected layer and the output layer are defined. A softmax function is used for classification, and a type label 0 to n is output.
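The layer parameters above determine the tensor shapes and weight counts throughout the network, which the following sketch traces without any deep learning framework. The 5,292-sample input length follows from the 120 ms window at 44,100 Hz; the four-class output is an assumption for illustration, since the disclosure only says "a type label 0 to n".

```python
KERNEL = 9                     # kernel size from [0029]
FILTERS = [8, 16, 24, 32, 40]  # kernels per convolutional layer

def trace_model(input_len=5292, n_classes=4):
    """Trace feature length and trainable-parameter count through the five
    conv (SAME padding, stride 1) + max-pool (stride 2) stages and the
    final fully connected layer.  `n_classes=4` is an assumed class count."""
    length, channels, params = input_len, 1, 0
    for out_ch in FILTERS:
        params += KERNEL * channels * out_ch + out_ch  # conv weights + biases
        channels = out_ch
        # SAME convolution at stride 1 keeps the length; SAME max pooling
        # at stride 2 halves it, rounding up.
        length = -(-length // 2)
    params += length * channels * n_classes + n_classes  # fully connected
    return length, channels, params
```

With these settings the feature map shrinks 5292 → 2646 → 1323 → 662 → 331 → 166 samples while the channel count grows to 40, a typical trade of temporal resolution for feature depth.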

[0030] The model is trained and evaluated. The training process is mainly divided into two stages: forward propagation and backpropagation. Forward propagation is a calculation process of building the model and establishing a mapping relationship between input and output. Backpropagation is to use gradient descent to train model parameters to minimize a loss function. After a final model is obtained, the classification effect of the model is evaluated through a test set in a sound database.
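The two training stages in [0030] can be shown concretely on a toy stand-in: a single logistic unit trained by gradient descent. The forward pass maps input to prediction and loss; the backward pass updates the parameters along the negative gradient. This is an illustrative miniature, not the full 1D-CNN optimizer.

```python
import math

def train_step(w, b, x, y, lr=5e-4):
    """One forward + backward pass of gradient descent on a single
    logistic unit (toy illustration of the two training stages).

    Returns the updated weight, bias, and the pre-update loss.
    """
    # Forward propagation: build the input -> output mapping and the loss.
    z = w * x + b
    p = 1.0 / (1.0 + math.exp(-z))                      # sigmoid prediction
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))  # cross-entropy
    # Backpropagation: gradient descent on the loss.
    dz = p - y            # d(loss)/dz for sigmoid + cross-entropy
    w -= lr * dz * x
    b -= lr * dz
    return w, b, loss
```

Iterating this step drives the loss toward its minimum, which is exactly what the full model's training loop does over the labeled sound database before evaluation on the test set.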

[0031] (D) A sound signal of garbage to be classified is acquired, and the sound signal is input into the trained 1D-CNN for garbage classification to obtain a classification result.
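The final classification step reduces to applying softmax to the network's output and taking the most probable label. The label names below are purely illustrative assumptions; the disclosure only specifies "a type label 0 to n".

```python
import math

def classify(logits, labels=("recyclable", "kitchen", "hazardous", "other")):
    """Map output-layer logits to a garbage-type label via softmax + argmax.

    `labels` is an assumed, illustrative label set.  Returns the chosen
    label and its softmax probability.
    """
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```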

[0032] The above described are merely preferred implementations of the present disclosure, and the present disclosure is not limited thereto. All technical solutions that achieve the objective of the present disclosure by substantially the same means fall within the protection scope of the present disclosure.