Containerized text-to-speech engine based on Facebook's Massively Multilingual Speech model with web API
Jacek Kowalski
2024-08-01 71e49cf0d406c0c7fc6abd8d5b682dc10114e9b0
# Text-to-speech API based on Facebook's MMS

A simple Python-based container with everything needed
to run a self-hosted, web-based text-to-speech API.

## Using the container

Just run:

```
docker run -d -p 8000:8000 ghcr.io/jacekkow/docker-text-to-speech-api-mms:master
```

and then visit http://localhost:8000/docs

There is a simple `/synthesize` endpoint that expects a JSON body and returns a WAV file:

```
curl -o result.wav -X 'POST' \
  'http://localhost:8000/synthesize' \
  -H 'Content-Type: application/json' \
  -d '{"language": "en", "text": "Sample text."}'
```
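
The same request can also be made from Python. The following is a minimal sketch using only the standard library; it assumes the container is reachable at `http://localhost:8000`, and the helper names `build_request` and `synthesize_to_file` are made up for this example and are not part of the project.

```python
import json
import urllib.request


def build_request(text, language="en", base_url="http://localhost:8000"):
    """Build the POST request for /synthesize (same payload as the curl call)."""
    payload = json.dumps({"language": language, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path, language="en", base_url="http://localhost:8000"):
    """Send the request and write the returned audio bytes to `path`."""
    request = build_request(text, language, base_url)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as wav_file:
        wav_file.write(audio)
    return len(audio)  # number of bytes written
```

With the container running, `synthesize_to_file("Sample text.", "result.wav")` should produce the same file as the curl command.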

## Adding languages

Currently, only the English and Polish models are included in the image.

To add more languages, add extra entries
to the `src/config.py` file and rebuild the container.

Supported language model codes can be found on Hugging Face:
https://huggingface.co/models?sort=downloads&search=facebook%2Fmms-tts-
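
As an illustration only, a language table might look like the sketch below. The actual layout of `src/config.py` is not shown here, so the `MODELS` dictionary, the `model_for` helper and the `pl` and `de` API codes are assumptions; check the real file before editing. The model IDs follow the `facebook/mms-tts-<code>` naming used on Hugging Face.

```python
# Hypothetical structure; the real src/config.py may be organized differently.
MODELS = {
    "en": "facebook/mms-tts-eng",  # English
    "pl": "facebook/mms-tts-pol",  # Polish
    "de": "facebook/mms-tts-deu",  # German, shown here as a newly added entry
}


def model_for(language):
    """Return the Hugging Face model ID for an API language code."""
    try:
        return MODELS[language]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None
```

Whatever the real structure is, each new entry needs a matching `facebook/mms-tts-` model from the list linked above.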