Exploring Speech Recognition and Translation APIs

Author: Mohammad Sayem Chowdhury

Welcome! In this notebook, I'll share my personal journey experimenting with audio-to-text and language translation APIs. The goal is to convert an audio file into text, then translate it into other languages, reflecting on the process and sharing my own insights.

In this project, I wanted to see how well I could turn spoken English into text and then translate that text into Spanish using public APIs. You'll need your own API keys and endpoints to follow along, but I'll walk you through my approach and thoughts at each step.

What's Inside

  • Speech to Text: Converting audio to written words
  • Language Translation: Turning English into Spanish (and more)
  • My Reflections & Experiments

Estimated time: about 25 minutes, depending on your curiosity!

In [ ]:
# I use these libraries for working with the APIs, downloading files, and tabulating results
!pip install ibm_watson wget pandas

Speech to Text: My Approach

First, I import the SpeechToTextV1 class from the ibm_watson package. If you're curious about the details, you can check out the official API documentation.

In [ ]:
from ibm_watson import SpeechToTextV1 
import json
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Setting up the speech-to-text service with my credentials

The API endpoint depends on the region where your service instance is hosted, so the URL below is the one I used and yours may differ. I keep my endpoint URL in a variable for easy access.

In [ ]:
speech_to_text_url = "https://stream.watsonplatform.net/speech-to-text/api"

You'll need your own API key, which you can get from your cloud provider's dashboard.

In [ ]:
my_s2t_api_key = "YOUR_API_KEY_HERE"  # Replace with your actual API key
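
Since I don't like leaving real keys in a notebook, here is a minimal sketch of reading the key from an environment variable instead. The helper name load_api_key and the variable name WATSON_S2T_API_KEY are my own placeholders, not anything the service requires.

```python
import os
from typing import Optional


def load_api_key(env_var: str, default: Optional[str] = None) -> str:
    """Return the API key stored in env_var, or raise if nothing is set."""
    key = os.environ.get(env_var, default)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Hypothetical variable name; use whatever you exported in your shell.
# my_s2t_api_key = load_api_key("WATSON_S2T_API_KEY")
```

The same helper would work for the translation key later on, with a different variable name.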

Now, I create a speech-to-text adapter object using my endpoint and API key.

In [ ]:
my_authenticator = IAMAuthenticator(my_s2t_api_key)  # Initialize the IAM authenticator with my API key
my_s2t_service = SpeechToTextV1(authenticator=my_authenticator)  # Create a Speech to Text service instance
my_s2t_service.set_service_url(speech_to_text_url)  # Set the service URL for the Speech to Text service
my_s2t_service  # Display the service instance

Next, I download the audio file that I want to transcribe.

In [ ]:
# Downloading the sample audio file for testing
!wget -O my_sample_audio.mp3 https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/PolynomialRegressionandPipelines.mp3

Here's the path to the audio file I want to convert to text.

In [ ]:
audio_file_path = 'my_sample_audio.mp3'

I open the audio file in binary mode and use the recognize method to get the transcribed text. The content_type parameter tells the API what kind of audio file I'm using.

In [ ]:
with open(audio_file_path, mode="rb") as audio_file:
    my_response = my_s2t_service.recognize(audio=audio_file, content_type='audio/mp3')
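
If I wanted to try other audio files, I would rather derive content_type from the file extension than hard-code it. The sketch below assumes a small mapping that I wrote by hand; it covers only a few formats, and the helper name guess_content_type is hypothetical.

```python
from pathlib import Path

# Hand-written subset of audio formats (an assumption, not the full list
# the service accepts).
CONTENT_TYPES = {
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
}


def guess_content_type(path: str) -> str:
    """Map a file extension to a content_type string for recognize()."""
    suffix = Path(path).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"No content type known for {suffix!r}") from None


print(guess_content_type("my_sample_audio.mp3"))
```

The returned string could then be passed as content_type in the recognize call above.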

The result is a dictionary with the transcription and other details.

In [ ]:
my_response.result
In [ ]:
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated

# Normalize the JSON response to flatten the data
json_normalize(my_response.result['results'], "alternatives")
In [ ]:
my_response

To get just the recognized text, I extract it from the response and store it in a variable.

In [ ]:
recognized_text = my_response.result['results'][0]["alternatives"][0]["transcript"]
type(recognized_text)
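
The line above only reads the first entry in results. Longer recordings can come back as several entries, so here is a sketch of joining the top alternative from each one. The sample dictionary is hand-written to mimic the response shape and is not real API output; join_transcripts is my own helper name.

```python
def join_transcripts(result: dict) -> str:
    """Concatenate the first alternative's transcript from every result chunk."""
    pieces = []
    for chunk in result.get("results", []):
        alternatives = chunk.get("alternatives", [])
        if alternatives:
            pieces.append(alternatives[0]["transcript"].strip())
    return " ".join(pieces)


# Made-up data in the same shape as my_response.result
sample_result = {
    "results": [
        {"alternatives": [{"transcript": "in this video ", "confidence": 0.94}]},
        {"alternatives": [{"transcript": "we will cover polynomial regression ", "confidence": 0.9}]},
    ],
    "result_index": 0,
}

print(join_transcripts(sample_result))
```

With the real response, I would call join_transcripts(my_response.result) instead.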

Language Translation: My Experience

Now, I import the LanguageTranslatorV3 class to handle translation. More details are in the API docs.

In [ ]:
from ibm_watson import LanguageTranslatorV3  # For translating text

Just like before, I set the endpoint URL for the translation service.

In [ ]:
translator_url = 'https://gateway.watsonplatform.net/language-translator/api'

You'll need your own API key for the translation service as well.

In [ ]:
my_translator_api_key = 'YOUR_TRANSLATOR_API_KEY'  # Replace with your actual API key

The API requires a version parameter, given as a date string in YYYY-MM-DD form. I use the current version as of this writing.

In [ ]:
translator_version = '2018-05-01'

Now, I create the language translator object.

In [ ]:
translator_authenticator = IAMAuthenticator(my_translator_api_key)  # Initialize the IAM authenticator with my translator API key
language_translator = LanguageTranslatorV3(version=translator_version, authenticator=translator_authenticator)  # Create a Language Translator service instance
language_translator.set_service_url(translator_url)  # Set the service URL for the Language Translator service
language_translator

I can list all the languages the service can identify. This helps me know which language codes to use for translation.

In [ ]:
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated

# Retrieve and normalize the list of identifiable languages
languages_data = language_translator.list_identifiable_languages().get_result()
normalized_languages = json_normalize(languages_data, "languages")

normalized_languages
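
To go from a language name to the model string used for translation, I sketched a small lookup. It assumes each entry in the languages list has "language" (the code) and "name" fields, and that a model id is simply the source and target codes joined by a hyphen, as in 'en-es'. The sample list is hand-written, and both helper names are hypothetical.

```python
def language_code(languages_data: dict, name: str) -> str:
    """Find the language code for a language name (case-insensitive)."""
    for entry in languages_data["languages"]:
        if entry["name"].lower() == name.lower():
            return entry["language"]
    raise KeyError(name)


def build_model_id(source_code: str, target_code: str) -> str:
    """Join two language codes into a model id such as 'en-es'."""
    return f"{source_code}-{target_code}"


# Made-up excerpt in the assumed shape of languages_data
sample_languages = {
    "languages": [
        {"language": "en", "name": "English"},
        {"language": "es", "name": "Spanish"},
        {"language": "fr", "name": "French"},
    ]
}

source = language_code(sample_languages, "English")
target = language_code(sample_languages, "Spanish")
print(build_model_id(source, target))
```

Not every pair of codes necessarily has a translation model behind it, so I treat the result as a guess to try rather than a guarantee.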

To translate the text, I use the translate method. I set model_id to 'en-es' for English to Spanish. The call returns a DetailedResponse object that wraps the translation.

In [ ]:
translation_response = language_translator.translate(
    text=recognized_text, model_id='en-es')
translation_response

The translation result is a dictionary.

In [ ]:
translation = translation_response.get_result()
translation

To get the translated string, I extract it from the dictionary.

In [ ]:
spanish_translation = translation['translations'][0]['translation']  # Get the Spanish translation from the API response
spanish_translation  # Display the Spanish translation

I can also translate the Spanish text back to English to check the accuracy.

In [ ]:
translation_back = language_translator.translate(text=spanish_translation, model_id='es-en').get_result()

Here's how I get the English translation from the response.

In [ ]:
english_translation = translation_back['translations'][0]['translation']
english_translation
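
Eyeballing the two English strings works for short clips, but a rough number is handy too. This sketch scores word-level overlap with difflib from the standard library. It is only one crude way to compare texts, not a proper translation-quality metric, and round_trip_similarity is my own helper name.

```python
from difflib import SequenceMatcher


def round_trip_similarity(original: str, back_translated: str) -> float:
    """Return a 0..1 word-overlap score between two texts, ignoring case."""
    original_words = original.lower().split()
    back_words = back_translated.lower().split()
    return SequenceMatcher(None, original_words, back_words).ratio()


# Made-up sentences standing in for recognized_text and english_translation
score = round_trip_similarity(
    "the cat sat on the mat",
    "the cat sat on the rug",
)
print(round(score, 2))
```

In the notebook I would call round_trip_similarity(recognized_text, english_translation); a score near 1 means the round trip kept most of the wording.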

Just for fun, I also tried translating the back-translated English text into French.

In [ ]:
french_translation_response = language_translator.translate(
    text=english_translation, model_id='en-fr').get_result()
In [ ]:
french_translation_response['translations'][0]['translation']
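
Once I had two target languages, repeating the call by hand felt clumsy, so here is a sketch of looping over several model ids. The function takes the translator as a plain callable, which lets it run without credentials; fake_translate is a stand-in I made up, and the commented-out wrapper shows how the notebook's language_translator might be plugged in.

```python
from typing import Callable, Dict, Iterable


def translate_to_many(
    translate: Callable[[str, str], str],
    text: str,
    model_ids: Iterable[str],
) -> Dict[str, str]:
    """Run one translation per model id and collect the results by id."""
    return {model_id: translate(text, model_id) for model_id in model_ids}


# With the notebook's objects, the callable could look like this:
# def watson_translate(text, model_id):
#     result = language_translator.translate(text=text, model_id=model_id).get_result()
#     return result['translations'][0]['translation']


def fake_translate(text: str, model_id: str) -> str:
    """Stand-in translator that only tags the text with the model id."""
    return f"[{model_id}] {text}"


print(translate_to_many(fake_translate, "hello", ["en-es", "en-fr"]))
```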

Reflections on Language Translation

This notebook was created and narrated by Mohammad Sayem Chowdhury. All code and explanations reflect my own learning and experimentation.

Copyright © 2025 Mohammad Sayem Chowdhury. This notebook and its content are shared for personal learning and inspiration.


My Takeaways and Next Steps

Working with speech-to-text and translation APIs was a fascinating experience. I learned how to process audio, extract text, and translate it between languages, all with a few lines of code. There are many ways to expand on this, such as building a simple translation app or visualizing the results. If you have ideas or want to collaborate, feel free to reach out!

— Mohammad Sayem Chowdhury