Electroencephalogram-based Multiclass Auditory Attention Decoding of
Attended Speaker Direction
Abstract
Decoding the directional focus of an attended speaker from
listeners' electroencephalogram (EEG) signals is an important part of
a practical brain-computer interface device aimed at improving the
quality of life for individuals with hearing impairment. Existing works
focus on binary directional focus decoding, i.e., determining whether
the attended speaker is on the left or right side of the listener.
However, binary decoding provides only limited information to the
subsequent speech processing algorithms in practical applications, and
more precise decoding of the exact direction of the attended speaker is
desired. In this paper, we first present a new
dataset with 15 alternative speaker directions, and then demonstrate the
feasibility of multiclass directional focus decoding of attended
speakers by applying our recently proposed learnable spatial mapping
(LSM) module, the benefit of which has already been proven in binary
decoding scenarios. Apart from combining the LSM module with the
convolutional neural network (CNN), we further validate its benefit by
combining it with the spectro-spatial-temporal convolutional recurrent
network (CRN), a recently proposed state-of-the-art model for binary
directional focus decoding. The proposed LSM-CNN and LSM-CRN models
achieve noteworthy decoding accuracies of 85.7% and 87.5%,
respectively, on the presented dataset under subject-independent
conditions with a decision window length of 1 second. Comprehensive experiments not only
substantiate the advantages of the LSM module, but also examine the
influence of the decision window length, the distraction caused by
interfering speakers, and the contribution of different EEG subbands.