Common Voice
Common Voice’s multi-language dataset is the largest publicly available voice dataset of its kind. Each entry consists of a unique MP3 and a corresponding text file. The dataset currently comprises 32,585 recorded hours, of which 21,594 hours in 131 languages are validated. Many of the recorded hours also include demographic metadata.
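To put the quoted figures in proportion, here is a minimal sketch that works out the share of recorded hours that are validated and the simple average of validated hours per language. The constant and function names are illustrative assumptions, not part of any Common Voice tooling, and the average says nothing about how hours are actually distributed across languages.

```python
# Figures quoted in the description above.
RECORDED_HOURS = 32_585
VALIDATED_HOURS = 21_594
LANGUAGES = 131


def validated_share(validated: int, recorded: int) -> float:
    """Return the fraction of recorded hours that are validated."""
    return validated / recorded


def mean_hours_per_language(hours: int, languages: int) -> float:
    """Return the plain arithmetic mean of hours per language."""
    return hours / languages


if __name__ == "__main__":
    share = validated_share(VALIDATED_HOURS, RECORDED_HOURS)
    mean = mean_hours_per_language(VALIDATED_HOURS, LANGUAGES)
    print(f"Validated share: {share:.1%}")
    print(f"Mean validated hours per language: {mean:.1f}")
```

With these numbers, roughly two thirds of the recorded hours are validated, and the mean comes to about 165 validated hours per language.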