Electroencephalogram-based Multiclass Auditory Attention Decoding of
Attended Speaker Direction
Abstract
Decoding the directional focus of an attended speaker from
listeners' electroencephalogram (EEG) signals is an important part of
a practical brain-computer interface device aimed at improving the
quality of life for individuals with hearing impairment. Existing works
focus on binary directional focus decoding, i.e., determining whether
the attended speaker is on the left or right side of the listener.
However, binary decoding provides only limited information to the
subsequent speech processing algorithms in practical applications, and
more precise decoding of the exact direction of the attended speaker is
desired. In this paper, we first present a new
dataset with 15 alternative speaker directions, and then demonstrate the
feasibility of multiclass directional focus decoding of attended
speakers by applying our recently proposed learnable spatial mapping
(LSM) module, the benefit of which has already been proven in binary
decoding scenarios. Apart from combining the LSM module with the
convolutional neural network (CNN), we further validate its benefit by
combining it with the spectro-spatial-temporal convolutional recurrent
network (CRN), a recently proposed state-of-the-art model for binary
directional focus decoding. The proposed LSM-CNN and LSM-CRN models
achieve noteworthy decoding accuracies of 85.7% and 87.5%,
respectively, on the presented dataset under subject-independent
conditions with a decision window length of 1 second. Comprehensive experiments not only
substantiate the advantages of the LSM module, but also examine the
influence of the decision window length, the distraction caused by
interfering speakers, and the contribution of different EEG subbands.