Paper title

Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages

Authors

Daniel J. Wu, Andrew C. Yang, Vinay U. Prabhu

Abstract

We present Afro-MNIST, a set of synthetic MNIST-style datasets for four orthographies used in Afro-Asiatic and Niger-Congo languages: Geʽez (Ethiopic), Vai, Osmanya, and N'Ko. These datasets serve as "drop-in" replacements for MNIST. We also describe and open-source a method for generating synthetic MNIST-style datasets from a single example of each digit. These datasets can be found at https://github.com/Daniel-Wu/AfroMNIST. We hope that MNIST-style datasets will be developed for other numeral systems, and that these datasets will invigorate machine learning education in nations that are underrepresented in the research community.
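The abstract does not spell out how one example per digit is expanded into a full dataset. As a rough illustration only, the sketch below assumes a common augmentation recipe: apply a random affine transform (rotation, scaling, translation) followed by an elastic distortion (a Gaussian-smoothed random displacement field) to each 28x28 template, then store the results as MNIST-style uint8 images with uint8 labels. This is not necessarily the authors' procedure; their actual code is in the repository linked above. The function names (`random_affine`, `elastic_distort`, `make_mnist_style`) and all parameter defaults are hypothetical choices made for this example.

```python
import numpy as np


def _bilinear_sample(img, rows, cols):
    """Sample img at fractional (rows, cols); pixels outside the image read as 0."""
    h, w = img.shape
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    dr, dc = rows - r0, cols - c0
    out = np.zeros(rows.shape, dtype=np.float64)
    # Accumulate the four neighbouring pixels, each weighted by its bilinear weight.
    for ri, wr in ((r0, 1 - dr), (r0 + 1, dr)):
        for ci, wc in ((c0, 1 - dc), (c0 + 1, dc)):
            valid = (ri >= 0) & (ri < h) & (ci >= 0) & (ci < w)
            vals = np.zeros(rows.shape)
            vals[valid] = img[ri[valid], ci[valid]]
            out += wr * wc * vals
    return out


def random_affine(img, rng, max_rot_deg=15.0, scale_range=(0.85, 1.15), max_shift=2.0):
    """Randomly rotate, scale and shift img about its centre (hypothetical defaults)."""
    h, w = img.shape
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    s = rng.uniform(*scale_range)
    ty, tx = rng.uniform(-max_shift, max_shift, size=2)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Inverse mapping: forward is p' = s * R(theta) p + t, so p = R(theta)^T (p' - t) / s.
    y, x = ys - cy - ty, xs - cx - tx
    c, sn = np.cos(theta), np.sin(theta)
    src_y = (c * y + sn * x) / s + cy
    src_x = (-sn * y + c * x) / s + cx
    return _bilinear_sample(img, src_y, src_x)


def _gaussian_blur(field, sigma):
    """Separable Gaussian blur with edge padding."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (offsets / sigma) ** 2)
    k /= k.sum()
    h, w = field.shape
    p = np.pad(field, ((radius, radius), (0, 0)), mode="edge")
    field = sum(k[i] * p[i:i + h, :] for i in range(k.size))
    p = np.pad(field, ((0, 0), (radius, radius)), mode="edge")
    return sum(k[i] * p[:, i:i + w] for i in range(k.size))


def elastic_distort(img, rng, alpha=34.0, sigma=4.0):
    """Warp img by a smoothed random displacement field scaled by alpha."""
    h, w = img.shape
    dy = _gaussian_blur(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    dx = _gaussian_blur(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    return _bilinear_sample(img, ys + dy, xs + dx)


def make_mnist_style(templates, n_per_class, seed=0):
    """templates: float array (10, 28, 28) in [0, 1], one image per digit 0-9.

    Returns (images uint8 [N, 28, 28], labels uint8 [N]) in shuffled order.
    """
    rng = np.random.default_rng(seed)
    images, labels = [], []
    for label, tmpl in enumerate(templates):
        for _ in range(n_per_class):
            img = elastic_distort(random_affine(tmpl, rng), rng)
            images.append(np.clip(img, 0.0, 1.0))
            labels.append(label)
    order = rng.permutation(len(labels))
    x = (np.stack(images)[order] * 255).round().astype(np.uint8)
    y = np.array(labels, dtype=np.uint8)[order]
    return x, y
```

A caller would pass ten rendered digit glyphs (for example, Vai or N'Ko numerals drawn onto 28x28 canvases) and a per-class count; the returned arrays have the same shapes and dtypes as MNIST's, which is what makes a "drop-in" replacement plausible. The distortion strengths control how varied the synthetic "handwriting" looks and would need tuning per script.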