George Toumbas

Podlingo

Podcast Search
Library Page
Translations Page
Podcast Page

Translation Demos

The Man Who Couldn't Stop Going to College

The Daily

153- Adrianople

The History of Rome

Project Overview

Podlingo is an AI-powered platform designed to translate podcast episodes into multiple languages, making them accessible to a global audience.

Technologies Used

  • Frontend: React, Next.js, TailwindCSS
  • Backend:
    • PostgreSQL
    • Supabase for authentication and real-time database
  • AI Services:
    • OpenAI Whisper for speech-to-text
    • ElevenLabs for voice cloning and speech synthesis
    • DeepL API for translation
    • Custom pipeline for identifying and isolating music/effects
  • Cloud Infrastructure:
    • AWS EC2 for translation workers
    • AWS S3 for storage
  • Programming Languages: TypeScript, Python
  • Version Control: Git, GitHub
  • CI/CD: GitHub Actions

Key Features

Full-Stack Development

Built the frontend with Next.js and the backend on Supabase, which provides authentication and the real-time database. Python workers handle the translation tasks in the backend, so podcast translations can be processed and scaled separately from the web application.
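As an illustration of the worker pattern described above, here is a minimal sketch of workers pulling translation jobs and recording their status. It is not Podlingo's actual implementation: the `TranslationJob` fields, the status names, and the in-memory `queue.Queue` are all assumptions made for the example, and a real deployment would presumably read jobs from the database instead.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TranslationJob:
    """Hypothetical job record; field names are illustrative only."""
    episode_id: str
    target_lang: str
    status: str = "queued"  # queued -> processing -> done | failed
    error: Optional[str] = None


def run_worker(jobs: queue.Queue, handler: Callable[[TranslationJob], None]) -> None:
    """Process jobs until a None sentinel is received."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            jobs.task_done()
            break
        job.status = "processing"
        try:
            handler(job)
            job.status = "done"
        except Exception as exc:
            # A failed job is recorded, not allowed to kill the worker.
            job.status = "failed"
            job.error = str(exc)
        finally:
            jobs.task_done()


def process_all(job_list: List[TranslationJob],
                handler: Callable[[TranslationJob], None],
                num_workers: int = 2) -> List[TranslationJob]:
    """Run every job through `handler` using a small pool of worker threads."""
    jobs: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=run_worker, args=(jobs, handler))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return job_list
```

Recording the failure on the job itself means one bad episode does not stop the remaining jobs, and the status can be surfaced to the user.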

User Interface

Designed a clean and intuitive interface that allows users to:

  • Search for podcasts easily
  • Browse available episodes
  • Select episodes for translation with minimal effort

Custom Translation Pipeline

Developed a translation pipeline that includes:

  • Transcription: Used OpenAI Whisper to convert the original speech to text.
  • Translation: Used the DeepL API to translate the transcript into the target language.
  • Voice Cloning & Synthesis: Used ElevenLabs to clone the original speaker’s voice, so translated episodes sound like the same speaker.
  • Audio Processing: Separated music and sound effects from the spoken content so they can be preserved unchanged in the translated episode.
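The order of the stages can be sketched as a small orchestration function. This is an assumption-laden outline, not the project's code: `Segment`, `PipelineStages`, and `translate_episode` are hypothetical names, and each stage is passed in as a plain callable so that no real Whisper, DeepL, or ElevenLabs API call is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One transcribed stretch of speech (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class PipelineStages:
    """Hypothetical stage interfaces; real services would sit behind these."""
    separate: Callable[[bytes], Tuple[bytes, bytes]]   # audio -> (speech, background)
    transcribe: Callable[[bytes], List[Segment]]       # speech -> timed segments
    translate: Callable[[str, str], str]               # (text, target_lang) -> text
    synthesize: Callable[[str], bytes]                 # text -> cloned-voice audio
    mix: Callable[[List[Tuple[float, bytes]], bytes], bytes]  # clips + background


def translate_episode(audio: bytes, target_lang: str, stages: PipelineStages) -> bytes:
    # 1. Split speech from music and sound effects so the latter are untouched.
    speech, background = stages.separate(audio)
    # 2. Transcribe the isolated speech into timed segments.
    segments = stages.transcribe(speech)
    # 3. Translate and re-voice each segment, remembering where it started.
    clips: List[Tuple[float, bytes]] = []
    for seg in segments:
        translated = stages.translate(seg.text, target_lang)
        clips.append((seg.start, stages.synthesize(translated)))
    # 4. Lay the new speech back over the original background track.
    return stages.mix(clips, background)
```

Keeping each stage behind a callable makes it possible to test the flow with stubs and to swap a provider without touching the orchestration.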

Challenges and Solutions

  • Maintaining Audio Quality: Translations must not degrade the original audio elements. Addressed this by integrating specialized models that isolate background music and sound effects so they can be retained in the translated episode.
  • Efficient Processing: The translation workload needed a backend that could keep up with demand. Python workers process translation requests, allowing the workload to scale and requests to complete in a timely manner.
  • Language Alignment: The same sentence takes a different amount of time to say in different languages, which can push translated speech out of sync with the audio effects. Developed an algorithm that aligns translated speech with the original audio timings, so sound effects and music stay synchronized across languages.
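The source does not describe the alignment algorithm itself, so the following is only one plausible approach, sketched under stated assumptions: each translated clip is anchored at its original start time, sped up by at most `max_tempo` when it overruns its slot, and any remaining overlap with the next segment is reported so the caller can react (for example by shortening the translation). `Slot`, `Placement`, `align`, and the 1.25 cap are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    """Where a segment was spoken in the original audio (seconds)."""
    start: float
    end: float


@dataclass
class Placement:
    start: float      # where the translated clip begins (original start time)
    tempo: float      # playback speed factor applied to the clip (1.0 = unchanged)
    duration: float   # clip length after the tempo change
    overflow: float   # seconds by which the clip still runs into the next slot


def align(slots: List[Slot], clip_durations: List[float],
          max_tempo: float = 1.25) -> List[Placement]:
    if len(slots) != len(clip_durations):
        raise ValueError("need exactly one clip duration per slot")
    placements = []
    for i, (slot, clip) in enumerate(zip(slots, clip_durations)):
        slot_len = slot.end - slot.start
        if slot_len <= 0:
            raise ValueError(f"slot {i} has non-positive length")
        # Speed up only when the clip overruns its slot, and never beyond the cap.
        tempo = min(max(clip / slot_len, 1.0), max_tempo)
        duration = clip / tempo
        # The next segment's start is the point speech must not cross.
        next_start = slots[i + 1].start if i + 1 < len(slots) else float("inf")
        overflow = max(0.0, slot.start + duration - next_start)
        placements.append(Placement(slot.start, tempo, duration, overflow))
    return placements
```

Anchoring every clip at its original start time is what keeps music and effects in place: only the speech is stretched, never the background track.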