TranscribeX

WhisperX-Powered Transcription Service

Contact admin to obtain an access token


Upload Audio File

Supported: MP3, M4A, WAV, FLAC, OGG • Max: 2 hours
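
If you want to check a file before uploading, the limits above can be expressed as a small script. This is only a sketch: `check_upload` is a hypothetical helper, not part of TranscribeX, and it assumes you already know the recording's duration in seconds.

```python
from pathlib import Path

# Values taken from the upload notice above.
SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac", ".ogg"}
MAX_DURATION_SECONDS = 2 * 60 * 60  # 2 hours


def check_upload(filename: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable.

    Hypothetical helper: the service's own validation is not documented here.
    """
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("recording is longer than 2 hours")
    return problems


print(check_upload("board-meeting.m4a", duration_seconds=5400))
```

An empty list means the file passes both checks. The server may apply further limits that are not listed on this page.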


Speaker Identification Tips

Better transcripts with simple habits

When WhisperX transcribes a meeting, it can tell that different people spoke, but not always who they are. The simple practices below make it much easier to attach names to speakers in your transcripts.

The 30-Second Introduction Protocol

The single most effective thing you can do.

At the start of any recorded meeting, have everyone briefly introduce themselves:

"Let's go around quickly—I'm Sarah."
"Angela here."
"This is Rich."
"Hi, I'm John."

That's it. Takes 30 seconds. The system can now often match voices to names throughout the rest of the meeting.

Works especially well for:

  • Board meetings
  • First-time group calls
  • Podcast recordings
  • Classroom sessions (instructor intro at minimum)

Name-Drop When Directing Questions

Instead of:

"What do you think about that?"

Try:

"John, what do you think about that?"

Or when responding:

"Building on Sarah's point about the budget..."

These natural name mentions create anchors the system uses to identify speakers.

Meeting Types That Work Best

Meeting Type     Expected Accuracy   Tips
Podcasts         Excellent           Hosts usually introduce guests by name
Interviews       Very Good           Typically 2 speakers with clear roles
Board Meetings   Good                Use intro protocol, formal structure helps
Team Standups    Good                Recurring voices improve over time
Classroom        Moderate            Instructor easily identified, students harder
Casual Calls     Lower               Fewer name mentions, less structure

Taking Notes During Meetings (Optional)

If you want maximum accuracy, simple timestamped notes help enormously:

9:03 - John (wearing blue)
9:05 - Sarah joined late
9:15 - New person, asked about budget
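
Notes like these use clock time, while transcripts are usually timed from the start of the recording. The sketch below shows one way to convert between the two. It assumes the notes follow the "H:MM - text" pattern shown above, that you know when the recording started, and that the meeting does not cross midnight. `notes_to_offsets` is a hypothetical helper, not part of TranscribeX.

```python
import re

# Matches lines such as "9:03 - John (wearing blue)".
NOTE_PATTERN = re.compile(r"^(\d{1,2}):(\d{2})\s*-\s*(.+)$")


def notes_to_offsets(notes: str, recording_start: str) -> list[tuple[int, str]]:
    """Convert 'H:MM - text' notes into (seconds from recording start, text)."""
    start_hour, start_minute = map(int, recording_start.split(":"))
    start_total = start_hour * 60 + start_minute
    offsets = []
    for line in notes.splitlines():
        match = NOTE_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that are not timestamped notes
        hour, minute = int(match.group(1)), int(match.group(2))
        offsets.append(((hour * 60 + minute - start_total) * 60, match.group(3)))
    return offsets


notes = """9:03 - John (wearing blue)
9:05 - Sarah joined late
9:15 - New person, asked about budget"""

for seconds, text in notes_to_offsets(notes, recording_start="9:00"):
    print(f"{seconds // 60:02d}:{seconds % 60:02d}  {text}")
```

With a 9:00 start, the first note lands three minutes into the recording, which is where to look for John's voice in the transcript.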

Tools that timestamp automatically:

  • Notion (timestamp option available)
  • Obsidian (with timestamp plugin)
  • iPhone: Timenotes, Timestamp & Memo
  • Android: Timestamper, NoteTime, Lognote
  • macOS: Timenotes, Noted, Type
  • Windows: OneNote

Even rough timestamps help. Don't stress about precision—within a minute or two is fine.

What Doesn't Help (Don't Worry About These)

  • Audio quality, as far as identification goes (diarization tolerates moderate background noise)
  • Accent differences (actually helps distinguish speakers!)
  • Speaking speed (system handles fast and slow talkers)
  • Crosstalk (brief interruptions don't confuse it much)

Quick Reference Card

Situation                   What To Do
Starting a meeting          30-second intro round
Asking someone a question   Use their name
Want perfect accuracy       Take simple timestamped notes
Recurring meetings          Just keep using it; it learns

Common Questions

Q: What if someone never says their name?
A: They'll appear as "SPEAKER_03" or similar. You can edit the transcript afterward and replace the label with their name.
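
Replacing a generic label can also be scripted. The sketch below assumes the JSON result contains a "segments" list in which each entry has "speaker" and "text" fields, similar to WhisperX's own output. The exact layout of TranscribeX result files is not documented here, so treat the field names and the `rename_speakers` helper as assumptions.

```python
import copy


def rename_speakers(transcript: dict, names: dict[str, str]) -> dict:
    """Return a copy of the transcript with generic speaker labels replaced."""
    renamed = copy.deepcopy(transcript)  # leave the original untouched
    for segment in renamed.get("segments", []):
        label = segment.get("speaker")
        if label in names:
            segment["speaker"] = names[label]
    return renamed


# Hypothetical transcript in the assumed layout.
transcript = {
    "segments": [
        {"start": 0.0, "end": 2.1, "speaker": "SPEAKER_00", "text": "Let's begin."},
        {"start": 2.4, "end": 5.0, "speaker": "SPEAKER_03", "text": "Sounds good."},
    ]
}

updated = rename_speakers(transcript, {"SPEAKER_03": "John"})
for segment in updated["segments"]:
    print(f'{segment["speaker"]}: {segment["text"]}')
```

Labels that are not in the mapping stay as they are, so you can name only the speakers you recognize.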

Q: Does everyone need to introduce themselves?
A: No, but it helps. Even identifying 3 of 5 speakers makes the transcript more useful.

Q: What about very large meetings (20+ people)?
A: Focus on identifying key speakers (presenters, leads). Minor contributors can stay as "SPEAKER_XX" if needed.

Q: Can the system identify me automatically?
A: Future versions may support voice enrollment—save your voice profile once, get recognized forever. Not yet implemented.

The Bottom Line

One habit to adopt: Start meetings with quick introductions.

One habit when speaking: Use people's names when addressing them.

That's 80% of the value with almost no effort.

About TranscribeX

TranscribeX is a WhisperX-powered transcription service that provides accurate audio-to-text conversion with speaker identification.

Features

  • High-accuracy transcription using OpenAI's Whisper models
  • Automatic speaker diarization (who spoke when)
  • Multiple GPU options for different processing speeds
  • Support for various audio formats (MP3, M4A, WAV, FLAC, OGG)
  • Secure file handling and processing

Technology Stack

  • WhisperX: Enhanced Whisper with speaker diarization
  • Modal: Serverless GPU compute platform
  • Flask: Python web framework

This is a development instance. More information will be added as the service evolves.

Privacy & Data Control

This is a development instance, and you are in control of your data.

How Your Data Flows

  1. Upload: Your audio file is uploaded to the TranscribeX server
    • Stored temporarily in the local filesystem
    • Access controlled by your authentication token
  2. Processing: File is sent to Modal's GPU infrastructure
    • Processed in isolated, ephemeral containers
    • No data persistence on Modal after processing completes
  3. Storage: Transcription results return to TranscribeX server
    • Results stored as JSON and Markdown files
    • Original audio files can be deleted after processing
  4. Access: You download your results
    • Only accessible with your authentication token
    • Files remain on server until manually deleted or cleaned up

Data Control

You control:

  • What files you upload
  • When you download results
  • How long your data remains (contact admin for deletion)

We provide:

  • Secure token-based authentication
  • No sharing of your data with third parties
  • Transparent processing pipeline

Development Status

This is a development instance. Privacy policies and data handling procedures may evolve as the service matures. Current focus is on functionality and user control.

Questions about your data? Contact the system administrator.

TranscribeX • Powered by WhisperX on Modal