
IIST BCI Dataset-1 for Selected Common Malayalam Words
  • Parvathi Nair,
  • Parvathy S S,
  • Nancy Sunil,
  • Anurag Mukati,
  • Charu Chauhan,
  • S Sumitra,
  • B S Manoj
Parvathi Nair
Indian Institute of Technology Roorkee

Corresponding Author

Parvathy S S
A J College of Science and Technology, Thonnakkal

Corresponding Author

Nancy Sunil
A J College of Science and Technology, Thonnakkal
Anurag Mukati
Indore Institute of Science and Technology, Indore
Charu Chauhan
Indian Institute of Space Science and Technology, Trivandrum
S Sumitra
Indian Institute of Space Science and Technology, Trivandrum
B S Manoj
Indian Institute of Space Science and Technology, Trivandrum

Abstract

Designing brain-computer interfaces (BCIs) to help patients requires datasets relevant to the patients' language. There is a significant shortage of datasets for Indian languages that can be used for BCI research. Malayalam is a prominent south Indian language spoken by more than 34 million people, yet no BCI datasets exist for it. We address this gap by creating a dataset of electroencephalography (EEG) signal samples for selected Malayalam words. The EEG samples were recorded with the OpenBCI Cyton device while a volunteer spoke commonly used Malayalam words. The dataset consists of three major types of data: (i) EEG data for spoken Malayalam words, (ii) EEG data for spoken English words closest to the English translation of the corresponding Malayalam words, and (iii) EEG data for sub-vocal (silent) pronunciation of the Malayalam words. The dataset covers 26 words, each recorded for all three types. For each word, 10 EEG samples over 8 channels were recorded. Given the scarcity of datasets in Indian languages, this dataset is useful for developing machine learning (ML) classifiers that translate EEG signals into Malayalam words, vocal or sub-vocal, and thereby for building BCI solutions for patients suffering from neurodegenerative diseases.
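As a rough illustration of the dataset's size, the counts in the abstract can be enumerated as below. This is a minimal sketch, not the authors' file layout: it assumes the 10 samples are recorded per word for each of the three types, and the condition labels and the build_index helper are hypothetical names introduced only for this example.

```python
from itertools import product

# Counts stated in the abstract.
N_WORDS = 26
N_SAMPLES_PER_WORD = 10
N_CHANNELS = 8

# Hypothetical labels for the three recording types (not the dataset's own names).
CONDITIONS = ("malayalam_spoken", "english_spoken", "malayalam_subvocal")


def build_index():
    """Enumerate one (condition, word_id, sample_id) key per recording.

    Assumes 10 samples per word per recording type.
    """
    return list(product(CONDITIONS, range(N_WORDS), range(N_SAMPLES_PER_WORD)))


index = build_index()
n_recordings = len(index)                      # 3 * 26 * 10
n_channel_traces = n_recordings * N_CHANNELS   # one trace per channel per recording
print(n_recordings, n_channel_traces)          # prints: 780 6240
```

Under that assumption the dataset would hold 780 recordings, or 6240 single-channel traces. If the 10 samples were instead shared across the three types, the totals would be correspondingly smaller.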
29 Dec 2023: Submitted to TechRxiv
08 Jan 2024: Published in TechRxiv