Podlingo

Project Overview

Podlingo is an AI-powered platform designed to translate podcast episodes into multiple languages, making them accessible to a global audience.

Technologies Used

Frontend: React, Next.js, TailwindCSS
Backend:
- PostgreSQL
- Supabase for authentication and real-time database
AI Services:
- OpenAI Whisper for speech-to-text
- ElevenLabs for voice cloning and speech synthesis
- DeepL API for translation
- Custom pipeline for identifying and isolating music/effects
Cloud Infrastructure:
- AWS EC2 for translation workers
- AWS S3 for storage
Programming Languages: TypeScript, Python
Version Control: Git, GitHub
CI/CD: GitHub Actions

Key Features

Full-Stack Development

Built the frontend with Next.js and the backend on Supabase, which provides authentication and a real-time PostgreSQL database. Translation jobs run on separate Python workers hosted on AWS EC2, so podcast translation can scale independently of the web application.
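As an illustration of the worker idea, here is a minimal sketch of a pool that drains a set of translation jobs. How jobs actually reach the workers is not described here, so the in-process queue, the job dictionary shape (`id`, `lang`), and the `handler` callable are all assumptions made for the example.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple


def run_workers(jobs: List[dict], handler: Callable[[dict], str],
                num_workers: int = 2) -> Dict[int, Tuple[str, str]]:
    """Process jobs concurrently and return {job id: (status, detail)}."""
    pending: "queue.Queue[dict]" = queue.Queue()
    for job in jobs:
        pending.put(job)

    results: Dict[int, Tuple[str, str]] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # no work left, let the thread exit
            try:
                outcome = ("done", handler(job))
            except Exception as exc:
                # A failed job is recorded instead of stopping the worker.
                outcome = ("failed", str(exc))
            with lock:
                results[job["id"]] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In a deployed setup the queue would more plausibly be a jobs table or a message queue that the workers poll, but the loop has the same shape: take a job, process it, record success or failure.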

User Interface

Designed a clean and intuitive interface that allows users to:

- Search for podcasts easily
- Browse available episodes
- Select episodes for translation with minimal effort

Custom Translation Pipeline

Developed a robust translation pipeline that includes:

- Voice Cloning & Synthesis: Used ElevenLabs to reproduce the original speaker’s voice, so translated episodes keep a consistent voice.
- Audio Processing: Separated music and sound effects from spoken content so they can be carried over unchanged, preserving the quality and integrity of the original audio.
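The overall flow can be sketched as a chain of stages. This is an outline under assumptions, not the project's actual code: the stage order is inferred from the services listed under Technologies Used, and every name below (`Stages`, `Line`, `translate_episode`) is hypothetical. The stages are passed in as plain callables so the sketch makes no claims about any vendor API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Line:
    start: float  # start time of the spoken line, in seconds
    end: float    # end time of the spoken line, in seconds
    text: str     # transcribed text


@dataclass
class Stages:
    # Each stage is a hypothetical wrapper around one service or model.
    separate: Callable[[bytes], Tuple[bytes, bytes]]        # audio -> (speech, background)
    transcribe: Callable[[bytes], List[Line]]               # speech -> timed lines
    translate: Callable[[str, str], str]                    # (text, target language) -> text
    synthesize: Callable[[str], bytes]                      # text -> cloned-voice audio
    mix: Callable[[List[Tuple[float, bytes]], bytes], bytes]  # (timed clips, background) -> audio


def translate_episode(audio: bytes, target_lang: str, stages: Stages) -> bytes:
    """Run one episode through the stages and return the translated audio."""
    speech, background = stages.separate(audio)
    clips: List[Tuple[float, bytes]] = []
    for line in stages.transcribe(speech):
        translated = stages.translate(line.text, target_lang)
        # Keep the original start time so the clip can be placed against the background.
        clips.append((line.start, stages.synthesize(translated)))
    return stages.mix(clips, background)
```

Injecting the stages keeps each one replaceable and lets the flow be exercised with stubs before any external service is wired in.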

Challenges and Solutions

- Maintaining Audio Quality: Translated episodes had to keep the original audio elements intact. Addressed this by integrating specialized models that isolate the background music and sound effects so they can be retained in the output.
- Efficient Processing: Managing the translation workload required an efficient backend. Running translations on dedicated Python workers allowed requests to be processed in a scalable and timely way.
- Language Alignment: The same sentence takes a different amount of time to say in different languages, which can push speech out of sync with audio effects. Developed an algorithm that aligns translated speech with the original audio timings, so sound effects and music stay synchronized across languages.
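One simple way to approach the alignment problem is sketched below. It is an assumption-laden illustration rather than the algorithm used in Podlingo: each translated clip is placed at the original start time, sped up just enough to fit the original slot, and capped at a maximum tempo so the speech stays natural. The names `Segment`, `Placement`, `plan_alignment`, and the default cap of 1.25 are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float                # start of the original speech, in seconds
    end: float                  # end of the original speech, in seconds
    translated_duration: float  # length of the synthesized translation, in seconds


@dataclass
class Placement:
    start: float  # where the translated clip begins
    end: float    # where it ends after the tempo change
    tempo: float  # playback-rate multiplier (1.0 means unchanged)


def plan_alignment(segments: List[Segment], max_tempo: float = 1.25) -> List[Placement]:
    """Place each translated clip in, or as close as possible to, its original slot."""
    placements: List[Placement] = []
    cursor = 0.0  # end of the previously placed clip
    for seg in segments:
        # Never start before the previous clip has finished.
        start = max(seg.start, cursor)
        slot = seg.end - start
        if slot > 0 and seg.translated_duration <= slot:
            tempo = 1.0  # fits as is
        elif slot > 0:
            # Speed up just enough to fit, but not beyond the cap.
            tempo = min(seg.translated_duration / slot, max_tempo)
        else:
            tempo = max_tempo  # slot already consumed by an earlier overrun
        end = start + seg.translated_duration / tempo
        placements.append(Placement(start, end, tempo))
        cursor = end
    return placements
```

The music and effects track is left on its original timeline; only the speech clips move. A clip that still overruns at the capped tempo delays the next one, which is the case a real implementation would need to handle more carefully, for example by shortening the translation or borrowing time from silent gaps.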

George Toumbas

Translation Demos

The Man Who Couldn't Stop Going to College

153- Adrianople