Creating a Live Translation API for Audio Streams With OpenAPI and Python

Using OpenAPI (formerly the Swagger specification) to create a live translation audio service involves defining API endpoints that accept audio, transcribe it, translate the transcript, and convert the translated text back to speech. Here's a high-level outline and example implementation using Python with Flask for the API server, and Google Cloud services for speech recognition, translation, and text-to-speech.

Step-by-Step Guide:

  1. Set Up Google Cloud Services:
  • Ensure you have a Google Cloud account and enable the necessary APIs: Speech-to-Text, Translate, and Text-to-Speech.
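  • Obtain credentials for these services. The Python client libraries used below authenticate with Application Default Credentials, typically a service-account key file whose path is stored in the GOOGLE_APPLICATION_CREDENTIALS environment variable, rather than with a plain API key.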
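
Before starting the server, it can help to confirm that credentials are actually visible to the process. The sketch below assumes the service-account key file approach, where GOOGLE_APPLICATION_CREDENTIALS points at a JSON key. The helper name credentials_problem is hypothetical and not part of any Google library, and it only checks that the file exists, not that the key is valid. If you authenticate another way (for example through gcloud on your workstation), an unset variable is not necessarily an error.

```python
import os
from pathlib import Path


def credentials_problem(env=None):
    """Return a description of the problem, or None if the setup looks fine.

    Hypothetical helper: it checks only that the key file exists on disk.
    """
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not Path(key_path).is_file():
        return f"credentials file not found: {key_path}"
    return None


if __name__ == "__main__":
    print(credentials_problem() or "credentials file found")
```

Passing a dictionary instead of reading os.environ directly keeps the check easy to exercise in a test.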
  2. Install Required Python Packages:
  • Flask: Web framework for creating the API server.
  • Google Cloud libraries: For interacting with Google Cloud services.
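  • Flask-RESTX: For OpenAPI support. It is the maintained fork of Flask-RESTPlus, which is no longer developed and does not work with recent Werkzeug releases. The API is the same; only the module name changes from flask_restplus to flask_restx.
pip install Flask flask-restx google-cloud-speech google-cloud-translate google-cloud-texttospeech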
  3. Define the OpenAPI Specification:
  • Create a YAML or JSON file to define your API endpoints.

Example OpenAPI Specification (openapi.yaml):

openapi: 3.0.0
info:
  title: Live Translation API
  version: 1.0.0
paths:
  /translate:
    post:
      summary: Translate live audio stream
      requestBody:
        required: true
        content:
          application/json:
            schema:
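              type: object
              required:
                - audio
                - target_language
              properties:
                audio:
                  type: string
                  format: byte
                  description: Base64-encoded audio
                target_language:
                  type: string
                  description: Target language code, for example ru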
      responses:
        '200':
          description: Translated audio stream
          content:
            audio/mpeg:
              schema:
                type: string
                format: binary
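
To make the request contract concrete, here is a minimal sketch of a check that a JSON body matches the shape described by the specification: an object with a base64 string in audio and a non-empty target_language. The function name validate_translate_request is hypothetical and only illustrates the contract; it is not generated from the YAML file and does not check that the language code is one the translation service supports.

```python
import base64
import binascii


def validate_translate_request(body):
    """Return a list of problems with a /translate request body (empty if valid)."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    problems = []

    audio = body.get("audio")
    if not isinstance(audio, str) or not audio:
        problems.append("audio must be a non-empty base64 string")
    else:
        try:
            # validate=True rejects characters outside the base64 alphabet
            base64.b64decode(audio, validate=True)
        except (binascii.Error, ValueError):
            problems.append("audio is not valid base64")

    target = body.get("target_language")
    if not isinstance(target, str) or not target:
        problems.append("target_language must be a non-empty string")
    return problems
```

Returning a list of problems rather than raising on the first one lets a client see everything that is wrong with a request in a single response.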
  4. Implement the API Server in Python:

Example Implementation (app.py):

import base64
import binascii
import io

from flask import Flask, request, send_file
# Flask-RESTX is the maintained fork of Flask-RESTPlus (pip install flask-restx)
from flask_restx import Api, Resource, fields
from google.cloud import speech, translate_v2 as translate, texttospeech

app = Flask(__name__)
api = Api(app, version='1.0', title='Live Translation API',
          description='API for live translation of audio streams')

ns = api.namespace('translate', description='Translation operations')

translate_model = api.model('Translate', {
    'audio': fields.String(required=True, description='Base64-encoded audio (LINEAR16, 16 kHz)'),
    'target_language': fields.String(required=True, description='Target language code')
})

# The clients pick up credentials from the environment when they are created
client_speech = speech.SpeechClient()
client_translate = translate.Client()
client_text_to_speech = texttospeech.TextToSpeechClient()

@ns.route('/')
class Translate(Resource):
    @api.expect(translate_model, validate=True)
    def post(self):
        data = request.json
        target_language = data['target_language']

        # JSON cannot carry raw bytes, so the audio arrives base64 encoded
        try:
            audio_content = base64.b64decode(data['audio'], validate=True)
        except (binascii.Error, ValueError):
            return {'message': 'audio must be base64 encoded'}, 400

        # Speech-to-Text (the source language is fixed to US English here)
        audio = speech.RecognitionAudio(content=audio_content)
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",
        )
        stt_response = client_speech.recognize(config=config, audio=audio)
        if not stt_response.results:
            return {'message': 'No speech recognized in the audio'}, 422
        # Join every recognized segment; results[0] alone holds only the first one
        transcript = ' '.join(
            result.alternatives[0].transcript
            for result in stt_response.results if result.alternatives
        )

        # Translate (format_='text' keeps HTML entities out of the result)
        translation = client_translate.translate(
            transcript, target_language=target_language, format_='text'
        )
        translated_text = translation['translatedText']

        # Text-to-Speech
        synthesis_input = texttospeech.SynthesisInput(text=translated_text)
        voice = texttospeech.VoiceSelectionParams(
            language_code=target_language,
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
        )
        audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)
        tts_response = client_text_to_speech.synthesize_speech(
            input=synthesis_input, voice=voice, audio_config=audio_config
        )

        # Serve the MP3 from memory; a fixed filename on disk would be
        # overwritten when two requests run at the same time
        return send_file(io.BytesIO(tts_response.audio_content), mimetype='audio/mpeg')

if __name__ == '__main__':
    # debug=True is meant for local development only
    app.run(debug=True)
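
The handler above chains three stages: speech-to-text, translation, and text-to-speech. One way to exercise that flow without credentials or network access is to pass each stage in as a plain function. The sketch below is only an illustration of that idea: translate_audio and the three fake_* functions are hypothetical names, and the fakes stand in for the Google Cloud clients rather than reproducing their behavior.

```python
def translate_audio(audio_bytes, target_language, recognize, translate, synthesize):
    """Run speech-to-text, translation, and text-to-speech in sequence.

    The three stage callables are hypothetical stand-ins for the cloud clients.
    """
    transcript = recognize(audio_bytes)
    if not transcript:
        raise ValueError("no speech recognized")
    translated_text = translate(transcript, target_language)
    return synthesize(translated_text, target_language)


# Fake stages: they only shuffle text around so the flow can be tested offline.
def fake_recognize(audio_bytes):
    return audio_bytes.decode("utf-8")


def fake_translate(text, target_language):
    return f"[{target_language}] {text}"


def fake_synthesize(text, language_code):
    return text.encode("utf-8")


if __name__ == "__main__":
    print(translate_audio(b"hello", "ru", fake_recognize, fake_translate, fake_synthesize))
```

In a real server, each fake would be replaced by a thin wrapper around the corresponding client call, which keeps the Flask handler itself short.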
  5. Run the API Server:
  • Start the Flask server to host your API.
python app.py
  6. Test the API:
  • Use tools like Postman or curl to send requests to your API and verify it works as expected.

Example curl Request:

curl -X POST "http://127.0.0.1:5000/translate/" -H "accept: audio/mpeg" -H "Content-Type: application/json" -d '{"audio": "<base64_encoded_audio>", "target_language": "ru"}' --output translated_audio.mp3
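
The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server is running locally at the URL used in the curl example. The function names build_translate_request and translate_file are hypothetical, and the file paths are whatever you pass in.

```python
import base64
import json
import urllib.request


def build_translate_request(audio_bytes, target_language,
                            url="http://127.0.0.1:5000/translate/"):
    """Build (but do not send) the POST request for the translate endpoint."""
    payload = {
        # The raw audio bytes are base64 encoded so they fit in a JSON string
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "target_language": target_language,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "audio/mpeg"},
        method="POST",
    )


def translate_file(in_path, out_path, target_language):
    """Send the audio file at in_path and save the returned MP3 to out_path."""
    with open(in_path, "rb") as f:
        request = build_translate_request(f.read(), target_language)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Splitting request construction from sending makes it possible to inspect the payload without a running server.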

Note:

  • The audio in the request body should be base64 encoded.
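  • The above code assumes that the input audio is LINEAR16, sampled at 16000 Hz, and spoken in English (en-US). Adjust the encoding, sample_rate_hertz, and language_code values in RecognitionConfig to match your input audio.
  • The recognize call is synchronous and intended for short clips (about one minute of audio at most). For longer recordings or a continuously arriving stream, look at the streaming_recognize method of the Speech-to-Text client instead.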
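
If your input is a WAV file, the standard-library wave module can confirm that it matches what the server is configured for before you send it. The sketch below assumes 16-bit samples at 16000 Hz and a single channel, since the example configuration does not set a channel count. Both function names are hypothetical, and make_silent_wav exists only to produce test input.

```python
import io
import wave


def wav_format_problems(wav_bytes, expected_rate=16000):
    """Return a list of mismatches between a WAV file and the expected format."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:  # LINEAR16 means 2 bytes per sample
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits, expected 16")
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {expected_rate}")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
    return problems


def make_silent_wav(rate, channels=1, seconds=0.1):
    """Create a short silent 16-bit WAV in memory, for trying out the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)
    return buffer.getvalue()


if __name__ == "__main__":
    print(wav_format_problems(make_silent_wav(44100)))
```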

By following these steps, you can create an API for live translation of audio streams using OpenAPI and Google Cloud services.
