Get audio effects detection insights

Applies to: Cloud-based Azure AI Video Indexer

Audio effects detection detects acoustic events and classifies them into categories like laughter, crowd reactions, alarms, or sirens.

Audio effects use cases

Improve accessibility for a hearing-impaired audience by transcribing nonspeech effects, which gives more context.
Improve efficiency when creating raw data for content creators. For example, in media and entertainment, important moments in promos and trailers such as laughter, crowd reactions, gunshots, or explosions can be identified.
Detect and classify gunshots, explosions, and glass shattering in a smart-city system or in other public environments that include cameras and microphones.

Supported audio categories

Audio effects detection can detect and classify effects into standard and advanced categories. For more information, see pricing.

The following table shows which categories are supported depending on the preset name (Audio Only / Video + Audio vs. Advanced Audio / Advanced Video + Audio). When you use advanced indexing, the categories appear in the Insights pane of the web portal.

Class	Standard indexing	Advanced indexing
Crowd Reactions		✔️
Silence	✔️	✔️
Gunshot or explosion		✔️
Breaking glass		✔️
Alarm or siren		✔️
Laughter		✔️
Dog		✔️
Bell ringing		✔️
Bird		✔️
Car		✔️
Engine		✔️
Crying		✔️
Music playing		✔️
Screaming		✔️
Thunderstorm		✔️

View the insight JSON with the web portal

After you upload and index a video, download insights in JSON format from the web portal.

Select the Library tab.
Select the media you want.
Select Download, and then select Insights (JSON). The JSON file opens in a new browser tab.
Find the key-value pairs described in the example response.


Use the API

Use a Get Video Index request. Pass &includeSummarizedInsights=false.
Find the key-value pairs described in the example response.
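
As a minimal sketch of the request described above, the following Python builds a Get Video Index URL with includeSummarizedInsights=false. The base URL, the path layout, the accessToken query parameter, and the placeholder values are assumptions for illustration and aren't taken from this article. Check them against the API reference for your account before use.

```python
from urllib.parse import urlencode

# Assumed base URL for the Video Indexer REST API; verify it for your account.
API_BASE = "https://api.videoindexer.ai"


def build_video_index_url(location, account_id, video_id, access_token):
    """Build a Get Video Index URL that asks for full (non-summarized) insights."""
    query = urlencode({
        "accessToken": access_token,
        "includeSummarizedInsights": "false",
    })
    # The path layout below is an assumption, not something this article states.
    return f"{API_BASE}/{location}/Accounts/{account_id}/Videos/{video_id}/Index?{query}"


if __name__ == "__main__":
    # Placeholder values only; replace them with your own identifiers.
    print(build_video_index_url("trial", "my-account-id", "my-video-id", "my-token"))
```

Send the resulting URL with an HTTP GET from any client, and then look for the audioEffects key in the returned JSON.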

Example response

    "audioEffects": [
      {
        "id": 1,
        "type": "Silence",
        "instances": [
          {
            "confidence": 0,
            "adjustedStart": "0:01:46.243",
            "adjustedEnd": "0:01:50.434",
            "start": "0:01:46.243",
            "end": "0:01:50.434"
          }
        ]
      },
      {
        "id": 2,
        "type": "Speech",
        "instances": [
          {
            "confidence": 0,
            "adjustedStart": "0:00:00",
            "adjustedEnd": "0:01:43.06",
            "start": "0:00:00",
            "end": "0:01:43.06"
          }
        ]
      }
    ]
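
The example response can be processed with a few lines of code. The following sketch, which assumes you already extracted the audioEffects list from the downloaded JSON, converts each instance's start and end timestamps to seconds and reports the duration. The function names to_seconds and summarize_audio_effects are hypothetical helpers and aren't part of any SDK.

```python
def to_seconds(timestamp):
    """Convert an 'H:MM:SS' or 'H:MM:SS.fff' timestamp string to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize_audio_effects(audio_effects):
    """Return a (type, start_seconds, duration_seconds) tuple for every instance."""
    rows = []
    for effect in audio_effects:
        for instance in effect.get("instances", []):
            start = to_seconds(instance["start"])
            end = to_seconds(instance["end"])
            # Round the duration to milliseconds to hide floating-point noise.
            rows.append((effect["type"], start, round(end - start, 3)))
    return rows


if __name__ == "__main__":
    # Values copied from the example response above.
    sample = [
        {"id": 1, "type": "Silence", "instances": [
            {"confidence": 0, "start": "0:01:46.243", "end": "0:01:50.434"}]},
        {"id": 2, "type": "Speech", "instances": [
            {"confidence": 0, "start": "0:00:00", "end": "0:01:43.06"}]},
    ]
    for effect_type, start, duration in summarize_audio_effects(sample):
        print(f"{effect_type}: starts at {start}s, lasts {duration}s")
```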

Important

Read the transparency note overview for VI features.

Sample code

See all samples for VI

Closed captions

Audio effects in closed caption files appear in square brackets:

Type	Example
SRT	00:00:00,000 00:00:03,671 [Gunshot or explosion]
VTT	00:00:00.000 00:00:03.671 [Gunshot or explosion]
TTML	Confidence: 0.9047 `<p begin="00:00:00.000" end="00:00:03.671">[Gunshot or explosion]</p>`
TXT	[Gunshot or explosion]
CSV	0.9047,00:00:00.000,00:00:03.671, [Gunshot or explosion]

Note

The Silence event type isn't added to closed captions.
An event must last at least 700 milliseconds to be shown.
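
Because audio effects are written in square brackets, they can be pulled back out of a caption file with a simple pattern match. The following is a minimal sketch under that assumption. The helper name extract_audio_effects is hypothetical, and any other bracketed text in the transcript would also match, so treat the result as a starting point rather than a reliable parser.

```python
import re

# Matches a label wrapped in square brackets, such as "[Gunshot or explosion]".
EFFECT_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_audio_effects(caption_text):
    """Return the bracketed labels found in caption text, in order of appearance."""
    return [label.strip() for label in EFFECT_PATTERN.findall(caption_text)]


if __name__ == "__main__":
    # The CSV example row from the table above.
    csv_row = "0.9047,00:00:00.000,00:00:03.671, [Gunshot or explosion]"
    print(extract_audio_effects(csv_row))
```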

Add audio effects to closed caption files

To include audio effects in closed caption files, use the API or the web portal.

API

To add audio effects to closed caption files, use the Get video captions request and set the includeAudioEffects parameter to true.
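
The following sketch shows one way such a request URL might be assembled. Only the includeAudioEffects parameter comes from this article. The base URL, the path layout, and the accessToken and format parameters (including the Vtt value) are assumptions for illustration, so confirm them in the API reference.

```python
from urllib.parse import urlencode

# Assumed base URL for the Video Indexer REST API; verify it for your account.
API_BASE = "https://api.videoindexer.ai"


def build_captions_url(location, account_id, video_id, access_token, caption_format="Vtt"):
    """Build a Get video captions URL that asks for audio effects to be included."""
    query = urlencode({
        "accessToken": access_token,
        "format": caption_format,        # Assumed parameter name and value.
        "includeAudioEffects": "true",   # Parameter named in this article.
    })
    # The path layout below is an assumption, not something this article states.
    return f"{API_BASE}/{location}/Accounts/{account_id}/Videos/{video_id}/Captions?{query}"


if __name__ == "__main__":
    # Placeholder values only; replace them with your own identifiers.
    print(build_captions_url("trial", "my-account-id", "my-video-id", "my-token"))
```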

Note

When you update a transcript or a custom language model from closed caption files, audio effects included in those files are ignored.

Web portal

You can also use the web portal by selecting Download -> Closed Captions -> Include Audio Effects.

Last updated on 2026-01-06