What this API does
AssemblyAI provides a speech-to-text API that turns audio and video content into accurate, searchable text. Key features include speaker diarization, which identifies who is speaking in each segment; sentiment analysis, which gauges emotional tone; and LLM-powered summaries that condense lengthy transcripts.
How it works
Developers can upload pre-recorded audio and video files or stream live audio for transcription. The API exposes RESTful endpoints that return JSON, so it can be called from most programming environments. It is suited to applications in media, customer service, and healthcare.
Authentication
To access the AssemblyAI API, developers need to sign up for an API key. The key must be sent in the headers of every API request, where it is used for authentication and usage tracking.
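As a minimal sketch of attaching the key to a request, the Python helper below builds the headers once so they can be reused for every call. It assumes the key is passed verbatim in an `authorization` header; the helper name `build_auth_headers` and the environment variable `ASSEMBLYAI_API_KEY` are hypothetical names chosen for this example, so check the official documentation for the exact header format.

```python
import os


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the headers to send with every request.

    Assumption: the key goes verbatim into an `authorization` header.
    """
    if not api_key:
        raise ValueError("an API key is required")
    return {"authorization": api_key, "content-type": "application/json"}


# ASSEMBLYAI_API_KEY is a hypothetical variable name used only in this example.
headers = build_auth_headers(os.environ.get("ASSEMBLYAI_API_KEY", "your-api-key"))
```

Reading the key from the environment rather than hard-coding it keeps it out of source control.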
Example usage
- POST /v2/transcript: Submits audio or video for transcription.
- GET /v2/transcript/{id}: Retrieves the transcription result based on the transcript ID.
- POST /v2/diarize: Requests speaker diarization on an audio file.
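The first two endpoints above can be combined into a submit-then-poll flow. The sketch below uses only the Python standard library and rests on several assumptions not stated in this listing: the base URL `https://api.assemblyai.com`, an `audio_url` field in the request body, an `authorization` header carrying the key, and a `status` field in the response that ends at `completed` or `error`. All function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.assemblyai.com"  # assumed base URL


def build_submit_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build a POST /v2/transcript request for a publicly reachable audio URL."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        f"{BASE_URL}/v2/transcript",
        data=body,
        method="POST",
        headers={"authorization": api_key, "content-type": "application/json"},
    )


def build_status_request(api_key: str, transcript_id: str) -> urllib.request.Request:
    """Build a GET /v2/transcript/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/transcript/{transcript_id}",
        method="GET",
        headers={"authorization": api_key},
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(
    api_key: str,
    transcript_id: str,
    fetch: Callable[[urllib.request.Request], dict] = send,
    interval: float = 3.0,
    max_polls: int = 200,
) -> dict:
    """Poll the transcript until it reaches an assumed terminal status."""
    for _ in range(max_polls):
        result = fetch(build_status_request(api_key, transcript_id))
        if result.get("status") in ("completed", "error"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"transcript {transcript_id} did not finish in time")
```

In use, one would call `send(build_submit_request(key, url))`, read the transcript ID from the response (assumed here to be under an `id` key), and pass it to `wait_for_transcript`. The `fetch` parameter exists so the polling loop can be exercised with a stub instead of a live network call.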
Limits
Limits on transcription duration and request frequency may apply, but specific values are not documented. Monitor usage so that requests stay within whatever limits are enforced.
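Because the limits are unspecified, one defensive option is to retry with exponential backoff when a request is rejected for rate limiting. The sketch below assumes the service signals this with HTTP 429, the conventional status code, which this listing does not confirm; `call_with_backoff` and its parameters are hypothetical names.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on HTTP 429 with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Assumption: 429 means "rate limited". Other errors, and the
            # final failed attempt, are re-raised to the caller unchanged.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Any zero-argument callable that performs a request with `urllib` can be wrapped this way, and the `sleep` parameter can be replaced to test the retry schedule without waiting.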
Ideal use cases
- Building applications for media transcription and analysis.
- Creating customer support tools that analyze call sentiment.
- Developing educational platforms with transcribed lectures.
- Integrating voice-to-text features in healthcare applications.