The Azure AI Speech Transcription client library provides easy access to Azure's speech-to-text transcription service, enabling you to convert audio to text with high accuracy.
Use the client library to:
- Transcribe audio files to text
- Support multiple languages and locales
- Enable speaker diarization to identify different speakers
- Apply profanity filtering
- Use custom speech models
- Process both local files and remote URLs
Source code | Package (NuGet) | API reference documentation | Product documentation
Getting started
Prerequisites
- .NET 8.0 SDK or later
- Azure Subscription
- An Azure AI Speech resource or an Azure AI Foundry resource
Install the package
Install the client library for .NET with NuGet:
dotnet add package Azure.AI.Speech.Transcription --prerelease
Authenticate the client
Azure Speech Transcription supports two authentication methods:
Option 1: Entra ID OAuth2 Authentication (Recommended for Production)
For production scenarios, it's recommended to use Entra ID authentication with managed identities or service principals. This provides better security and easier credential management.
using Azure.Identity;
using Azure.AI.Speech.Transcription;
// Use DefaultAzureCredential which works with managed identities, service principals, Azure CLI, etc.
DefaultAzureCredential credential = new DefaultAzureCredential();
Uri endpoint = new Uri("https://<your-region>.api.cognitive.microsoft.com");
TranscriptionClient client = new TranscriptionClient(endpoint, credential);
Note: To use Azure Identity authentication, you need to:
- Add the Azure.Identity package to your project
- Assign the appropriate role (e.g., "Cognitive Services User") to your managed identity or service principal
- Ensure your Speech resource has Entra ID authentication enabled
For more information on Entra ID authentication, see the Azure Identity library documentation.
Option 2: API Key Authentication (Subscription Key)
You can find your Speech resource's API key in the Azure Portal or by using the Azure CLI:
az cognitiveservices account keys list --name <your-resource-name> --resource-group <your-resource-group>
Once you have an API key, you can authenticate using ApiKeyCredential:
using System;
using System.ClientModel;
using Azure.AI.Speech.Transcription;
Uri endpoint = new Uri("https://<your-region>.api.cognitive.microsoft.com/");
ApiKeyCredential credential = new ApiKeyCredential("<your-api-key>");
TranscriptionClient client = new TranscriptionClient(endpoint, credential);
Service API versions
The client library targets the latest service API version by default. A client instance accepts an optional service API version in its options to specify which API version to use when communicating with the service.
Select a service API version
You can explicitly select a supported service API version when instantiating a client by configuring its associated options. This ensures that the client communicates with the service using the specified API version.
For example,
Uri endpoint = new Uri("https://myaccount.api.cognitive.microsoft.com/");
ApiKeyCredential credential = new("your apikey");
TranscriptionClientOptions options = new TranscriptionClientOptions(TranscriptionClientOptions.ServiceVersion.V20251015);
TranscriptionClient client = new TranscriptionClient(endpoint, credential, options);
When selecting an API version, verify that there are no breaking changes relative to the latest version; significant differences can cause API calls to fail due to incompatibility.
Also confirm that the chosen API version is supported for your specific use case and aligns with the service's versioning policy.
Key concepts
TranscriptionClient
The TranscriptionClient is the primary interface for interacting with the Speech Transcription service. It provides methods to transcribe audio to text.
Audio Formats
The service supports various audio formats including WAV, MP3, OGG, and more. Audio must be:
- Shorter than 2 hours in duration
- Smaller than 250 MB in size
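The size limit can be checked locally before uploading. A minimal sketch (the helper name is illustrative, and duration cannot be inferred from file size alone, so only the size limit is validated):

```csharp
using System;
using System.IO;

static class AudioPreflight
{
    // 250 MB service limit on audio payload size
    public const long MaxBytes = 250L * 1024 * 1024;

    public static bool IsWithinSizeLimit(long lengthInBytes) =>
        lengthInBytes <= MaxBytes;

    public static void EnsureWithinSizeLimit(string path)
    {
        long length = new FileInfo(path).Length;
        if (!IsWithinSizeLimit(length))
        {
            throw new ArgumentException(
                $"Audio file is {length} bytes; the service limit is {MaxBytes} bytes.");
        }
    }
}
```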
Transcription Options
You can customize transcription with options like:
- Profanity filtering: Control how profanity is handled in transcriptions
- Speaker diarization: Identify different speakers in multi-speaker audio
- Phrase lists: Provide domain-specific phrases to improve accuracy
- Language detection: Automatically detect the spoken language
- Enhanced mode: Improve transcription quality with custom prompts, translation, and task-specific configurations
Thread safety
We guarantee that all client instance methods are thread-safe and independent of each other (guideline). This ensures that the recommendation of reusing client instances is always safe, even across threads.
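Because the methods are thread-safe, a single TranscriptionClient can be shared across concurrent transcriptions. A minimal sketch, where the file paths, endpoint, and key are placeholders for your own values:

```csharp
using System;
using System.ClientModel;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Azure.AI.Speech.Transcription;

TranscriptionClient client = new TranscriptionClient(
    new Uri("https://<your-region>.api.cognitive.microsoft.com/"),
    new ApiKeyCredential("<your-api-key>"));

string[] files = { "audio1.wav", "audio2.wav", "audio3.wav" };

// One shared client instance, many concurrent calls.
TranscriptionResult[] results = await Task.WhenAll(files.Select(async path =>
{
    using FileStream stream = File.OpenRead(path);
    ClientResult<TranscriptionResult> response =
        await client.TranscribeAsync(new TranscriptionOptions(stream));
    return response.Value;
}));
```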
Additional concepts
Client options | Accessing the response | Long-running operations | Handling failures | Diagnostics | Mocking | Client lifetime
Examples
- Create a TranscriptionClient
- Transcribe a local audio file
- Transcribe audio from a URL
- Access individual transcribed words
- Identify speakers with diarization
- Filter profanity
- Improve accuracy with custom phrases
- Transcribe with a known language
- Use Enhanced Mode for highest accuracy
- Combine multiple options
Create a TranscriptionClient
Create a TranscriptionClient using your Speech service endpoint and API key:
using System;
using System.ClientModel;
using Azure.AI.Speech.Transcription;
Uri endpoint = new Uri("https://myaccount.api.cognitive.microsoft.com/");
ApiKeyCredential credential = new ApiKeyCredential("your-api-key");
TranscriptionClient client = new TranscriptionClient(endpoint, credential);
Transcribe a local audio file
The most basic operation is to transcribe an audio file from your local filesystem:
string audioFilePath = "path/to/audio.wav";
using FileStream audioStream = File.OpenRead(audioFilePath);
TranscriptionOptions options = new TranscriptionOptions(audioStream);
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
// Get the transcribed text
var channelPhrases = response.Value.PhrasesByChannel.First();
Console.WriteLine(channelPhrases.Text);
For synchronous transcription, use the Transcribe method instead of TranscribeAsync.
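As a sketch, the blocking variant mirrors the async example above (assumes the client from the earlier snippet):

```csharp
using System;
using System.ClientModel;
using System.IO;
using System.Linq;
using Azure.AI.Speech.Transcription;

using FileStream audioStream = File.OpenRead("path/to/audio.wav");
TranscriptionOptions options = new TranscriptionOptions(audioStream);

// Blocks until the service responds; otherwise identical to TranscribeAsync
ClientResult<TranscriptionResult> response = client.Transcribe(options);
Console.WriteLine(response.Value.PhrasesByChannel.First().Text);
```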
Transcribe audio from a URL
You can transcribe audio directly from a publicly accessible URL without downloading the file first:
Uri audioUrl = new Uri("https://example.com/audio/sample.wav");
TranscriptionOptions options = new TranscriptionOptions(audioUrl);
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
TranscriptionResult result = response.Value;
Console.WriteLine($"Transcribed audio from URL: {audioUrl}");
var channelPhrases = result.PhrasesByChannel.First();
Console.WriteLine($"\nTranscription:\n{channelPhrases.Text}");
Access individual transcribed words
To access word-level details including timestamps, confidence scores, and individual words:
string audioFilePath = "path/to/audio.wav";
using FileStream audioStream = File.OpenRead(audioFilePath);
TranscriptionOptions options = new TranscriptionOptions(audioStream);
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
// Access the first channel's phrases
var channelPhrases = response.Value.PhrasesByChannel.First();
// Iterate through each phrase (typically sentences or segments)
foreach (TranscribedPhrase phrase in channelPhrases.Phrases)
{
Console.WriteLine($"\nPhrase: {phrase.Text}");
Console.WriteLine($" Offset: {phrase.Offset} | Duration: {phrase.Duration}");
Console.WriteLine($" Confidence: {phrase.Confidence:F2}");
// Access individual words in the phrase
foreach (TranscribedWord word in phrase.Words)
{
Console.WriteLine($" Word: '{word.Text}' | Confidence: {word.Confidence:F2} | Offset: {word.Offset}");
}
}
Identify speakers with diarization
Speaker diarization identifies who spoke when in multi-speaker conversations:
string audioFilePath = "path/to/conversation.wav";
using FileStream audioStream = File.OpenRead(audioFilePath);
TranscriptionOptions options = new TranscriptionOptions(audioStream)
{
DiarizationOptions = new TranscriptionDiarizationOptions
{
MaxSpeakers = 4 // Expect up to 4 speakers in the conversation
}
};
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
TranscriptionResult result = response.Value;
Console.WriteLine("Transcription with speaker diarization:");
var channelPhrases = result.PhrasesByChannel.First();
foreach (TranscribedPhrase phrase in channelPhrases.Phrases)
{
Console.WriteLine($"Speaker {phrase.Speaker}: {phrase.Text}");
}
Filter profanity
Control how profanity appears in your transcriptions using different filter modes:
string audioFilePath = "path/to/audio-with-profanity.wav";
using FileStream audioStream = File.OpenRead(audioFilePath);
TranscriptionOptions options = new TranscriptionOptions(audioStream)
{
ProfanityFilterMode = ProfanityFilterMode.Masked // Default - profanity replaced with asterisks
};
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
TranscriptionResult result = response.Value;
var channelPhrases = result.PhrasesByChannel.First();
Console.WriteLine(channelPhrases.Text); // Profanity will appear as "f***"
Available modes:
- None: No filtering; profanity appears as spoken
- Masked: Profanity replaced with asterisks (e.g., "f***")
- Removed: Profanity completely removed from the text
- Tags: Profanity wrapped in XML tags (e.g., "<profanity>word</profanity>")
Improve accuracy with custom phrases
Add custom phrases to help the service correctly recognize domain-specific terms, names, and acronyms:
string audioFilePath = "path/to/audio.wav";
using FileStream audioStream = File.OpenRead(audioFilePath);
TranscriptionOptions options = new TranscriptionOptions(audioStream)
{
PhraseList = new PhraseListProperties()
};
// Add names, locations, and terms that might be misrecognized
options.PhraseList.Phrases.Add("Contoso");
options.PhraseList.Phrases.Add("Jessie");
options.PhraseList.Phrases.Add("Rehaan");
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
TranscriptionResult result = response.Value;
var channelPhrases = result.PhrasesByChannel.First();
Console.WriteLine(channelPhrases.Text);
Transcribe with a known language
When you know the language of the audio, specifying a single locale improves accuracy and reduces latency:
string audioFilePath = "path/to/english-audio.mp3";
using FileStream audioStream = File.OpenRead(audioFilePath);
TranscriptionOptions options = new TranscriptionOptions(audioStream);
options.Locales.Add("en-US");
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
TranscriptionResult result = response.Value;
var channelPhrases = result.PhrasesByChannel.First();
Console.WriteLine(channelPhrases.Text);
For language identification when you're unsure of the language, specify multiple candidate locales and the service will automatically detect the language. See Sample08_TranscribeWithLocales.cs for details.
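As a sketch, candidate locales for automatic detection are added the same way as a single locale (the candidates here are illustrative):

```csharp
using System.IO;
using Azure.AI.Speech.Transcription;

using FileStream audioStream = File.OpenRead("path/to/unknown-language.wav");
TranscriptionOptions options = new TranscriptionOptions(audioStream);

// Provide candidates; the service detects which one is spoken.
options.Locales.Add("en-US");
options.Locales.Add("es-ES");
options.Locales.Add("fr-FR");
```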
Use Enhanced Mode for highest accuracy
Enhanced Mode uses LLM-powered processing for the highest accuracy transcription and translation:
string audioFilePath = "path/to/audio.wav";
using FileStream audioStream = File.OpenRead(audioFilePath);
EnhancedModeProperties enhancedMode = new EnhancedModeProperties
{
Task = "transcribe"
};
TranscriptionOptions options = new TranscriptionOptions(audioStream)
{
EnhancedMode = enhancedMode
};
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
TranscriptionResult result = response.Value;
var channelPhrases = result.PhrasesByChannel.First();
Console.WriteLine(channelPhrases.Text);
Enhanced Mode also supports translation. See Sample04_EnhancedMode.cs for translation examples.
Combine multiple options
You can combine multiple transcription features for complex scenarios:
string audioFilePath = "path/to/meeting.wav";
using FileStream audioStream = File.OpenRead(audioFilePath);
TranscriptionOptions options = new TranscriptionOptions(audioStream);
// Enable speaker diarization to identify different speakers
options.DiarizationOptions = new TranscriptionDiarizationOptions
{
MaxSpeakers = 5
};
// Mask profanity in the transcription
options.ProfanityFilterMode = ProfanityFilterMode.Masked;
// Add custom phrases to improve recognition of domain-specific terms
options.PhraseList = new PhraseListProperties();
options.PhraseList.Phrases.Add("action items");
options.PhraseList.Phrases.Add("Q4");
options.PhraseList.Phrases.Add("KPIs");
ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
TranscriptionResult result = response.Value;
// Display results
var channelPhrases = result.PhrasesByChannel.First();
Console.WriteLine("Full Transcript:");
Console.WriteLine(channelPhrases.Text);
Troubleshooting
Common issues
- Authentication failures: Verify your API key or Entra ID credentials are correct and that your Speech resource is active.
- Unsupported audio format: Ensure your audio is in a supported format (WAV, MP3, OGG, FLAC, etc.). The service automatically handles format detection.
- Slow transcription: For large files, consider using asynchronous transcription or ensure your network connection is stable.
- Poor accuracy: Try specifying the correct locale, adding custom phrases for domain-specific terms, or using Enhanced Mode.
Exceptions
The library throws exceptions for various error conditions:
- RequestFailedException: The service returned an error response (check Status and ErrorCode for details)
- ArgumentException: Invalid parameters were provided to a method
- InvalidOperationException: The operation cannot be performed in the current state
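A minimal handling sketch based on the exceptions above (assumes RequestFailedException comes from the Azure namespace, as in other Azure SDK libraries):

```csharp
using System;
using Azure;

try
{
    ClientResult<TranscriptionResult> response = await client.TranscribeAsync(options);
    Console.WriteLine(response.Value.PhrasesByChannel.First().Text);
}
catch (RequestFailedException ex)
{
    // Status is the HTTP status code; ErrorCode is the service-specific error code.
    Console.Error.WriteLine($"Transcription failed: {ex.Status} ({ex.ErrorCode})");
}
catch (ArgumentException ex)
{
    Console.Error.WriteLine($"Invalid input: {ex.Message}");
}
```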
Enable client logging
You can enable logging to debug issues with the client library. For more information, see the diagnostics documentation.
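One way to surface client logs on the console, assuming the library follows the Azure SDK's standard event-source logging (Azure.Core.Diagnostics):

```csharp
using System.Diagnostics.Tracing;
using Azure.Core.Diagnostics;

// Routes Azure SDK log events to the console while the listener is alive.
using AzureEventSourceListener listener =
    AzureEventSourceListener.CreateConsoleLogger(EventLevel.Informational);
```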
Next steps
Explore additional samples to learn more about advanced features:
- Sample01_BasicTranscription.cs - Create clients and basic transcription
- Sample02_TranscriptionOptions.cs - Combine multiple transcription features
- Sample03_TranscribeFromUrl.cs - Transcribe from remote URLs
- Sample04_EnhancedMode.cs - LLM-powered transcription and translation
- Sample05_TranscribeWithProfanityFilter.cs - All profanity filtering modes
- Sample06_TranscribeWithDiarization.cs - Speaker identification
- Sample07_TranscribeWithPhraseList.cs - Custom vocabulary
- Sample08_TranscribeWithLocales.cs - Language specification and detection
- Sample09_MultilingualTranscription.cs - Multilingual content
Contributing
For details on contributing to this repository, see the contributing guide.
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Azure SDK for .NET