SonixAudioUtils

Utility functions for manipulating audio data: concatenation, peak normalization, offline mixing, and frame/time conversions.

Quick Start

Kotlin

// Concatenate audio clips
val combined = SonixAudioUtils.concatenate(listOf(intro, verse, chorus))

// Peak-normalize audio
val normalized = SonixAudioUtils.normalize(decoded)

// Mix two tracks with custom gains
val mixed = SonixAudioUtils.mix(
    audioList = listOf(vocal, backing),
    gains = floatArrayOf(1.0f, 0.5f)
)

Swift

let combined = SonixAudioUtils.concatenate([intro, verse, chorus])
let normalized = SonixAudioUtils.normalize(audio: decoded)
let mixed = SonixAudioUtils.mix(audioList: [vocal, backing], gains: [1.0, 0.5])

Methods

concatenate

Join multiple audio clips end-to-end. All inputs must have the same sample rate and channel count.

val combined = SonixAudioUtils.concatenate(listOf(audio1, audio2, audio3))
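
To make the format requirement concrete, here is a minimal sketch of end-to-end concatenation. `Clip` and `concatenateClips` are hypothetical stand-ins invented for illustration, not the library's types, and the samples are assumed to be stored as one flat `FloatArray`. The sketch throws on mismatched formats; how `SonixAudioUtils.concatenate` reacts to a mismatch is not stated on this page.

```kotlin
// Hypothetical stand-in for the library's audio type; the field names are assumptions.
class Clip(val samples: FloatArray, val sampleRate: Int, val channels: Int)

// Illustrative only: joins clips end-to-end after checking that formats match.
fun concatenateClips(clips: List<Clip>): Clip {
    require(clips.isNotEmpty()) { "need at least one clip" }
    val first = clips.first()
    // Mirror the documented requirement: same sample rate and channel count.
    require(clips.all { it.sampleRate == first.sampleRate && it.channels == first.channels }) {
        "all clips must share sample rate and channel count"
    }
    val out = FloatArray(clips.sumOf { it.samples.size })
    var offset = 0
    for (clip in clips) {
        // Copy each clip's samples directly after the previous clip's samples.
        clip.samples.copyInto(out, destinationOffset = offset)
        offset += clip.samples.size
    }
    return Clip(out, first.sampleRate, first.channels)
}
```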

normalize

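Peak-normalize audio by scaling every sample by the same gain so that the maximum absolute sample value reaches 1.0. Because the gain is uniform, the relative dynamics of the signal are preserved while the overall level is maximized. Silent audio is returned unchanged.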

val normalized = SonixAudioUtils.normalize(audio)
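
The arithmetic behind peak normalization can be sketched on a plain `FloatArray`. `peakNormalize` is a hypothetical helper written for this illustration, not the library's implementation.

```kotlin
import kotlin.math.abs

// Illustrative only: scales samples so the largest absolute value becomes 1.0.
fun peakNormalize(samples: FloatArray): FloatArray {
    var peak = 0f
    for (s in samples) peak = maxOf(peak, abs(s))
    // Silent input has no peak to scale to, so it is returned as-is.
    if (peak == 0f) return samples
    val gain = 1f / peak
    // One gain for every sample keeps the ratios between samples intact.
    return FloatArray(samples.size) { samples[it] * gain }
}
```

For example, input samples `0.25` and `-0.5` have a peak of 0.5, so the gain is 2 and the result is `0.5` and `-1.0`.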

mix

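Mix multiple mono audio tracks into a single output. Tracks are resampled to a common sample rate if needed, shorter tracks are zero-padded to the length of the longest track, and the summed output is soft-clipped so that peaks above full scale are compressed smoothly rather than hard-clipped.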

For real-time playback mixing, use SonixMixer instead.

// Equal gain
val mixedEqual = SonixAudioUtils.mix(listOf(vocal, backing))

// Custom gains and target sample rate
val mixedCustom = SonixAudioUtils.mix(
    audioList = listOf(vocal, backing),
    gains = floatArrayOf(1.0f, 0.5f),
    targetSampleRate = 44100
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audioList | List<AudioRawData> | required | Mono tracks to mix |
| gains | FloatArray? | null (1.0 for all) | Per-track gain factors |
| targetSampleRate | Int | First track's rate | Output sample rate in Hz |
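
The gain, zero-padding, and soft-clip steps can be sketched as follows. `mixTracks` is a hypothetical helper, not the library's implementation. It assumes all tracks already share one sample rate, so resampling is omitted, and it uses `tanh` as the soft-clip curve, which is an assumption since the curve used by `SonixAudioUtils.mix` is not documented here.

```kotlin
import kotlin.math.tanh

// Illustrative only: sums mono tracks with per-track gains, then soft-clips.
// Assumes every track is already at the same sample rate (no resampling).
fun mixTracks(tracks: List<FloatArray>, gains: FloatArray? = null): FloatArray {
    require(gains == null || gains.size == tracks.size) { "one gain per track" }
    // The output is as long as the longest track.
    val length = tracks.maxOfOrNull { it.size } ?: 0
    val out = FloatArray(length)
    tracks.forEachIndexed { t, track ->
        val gain = gains?.get(t) ?: 1f // null gains means 1.0 for every track
        // Positions past the end of a shorter track receive nothing (zero padding).
        for (i in track.indices) out[i] += track[i] * gain
    }
    // tanh keeps every sample strictly inside (-1, 1); the curve choice is an assumption.
    for (i in out.indices) out[i] = tanh(out[i])
    return out
}
```

Note that a `tanh` curve also slightly reduces samples well below full scale; other soft-clip curves leave low-level samples untouched and only bend the signal near the limit.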

framesToTime / timeToFrames

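Convert between frame indices and time in seconds, given the sample rate and the hop length (the number of samples between the starts of consecutive frames).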

val times = SonixAudioUtils.framesToTime(intArrayOf(0, 10, 20), sampleRate = 16000, hopLength = 512)
val frames = SonixAudioUtils.timeToFrames(floatArrayOf(0.0f, 0.32f), sampleRate = 16000, hopLength = 512)
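
The conversion is assumed here to follow the usual relation time = frame × hopLength / sampleRate, which agrees with the values in the examples on this page (frame 10 at 16000 Hz with a hop of 512 is 0.32 s). The helpers below are hypothetical single-value versions written for illustration. Rounding to the nearest frame is a choice made in this sketch to stay robust against floating-point error; the library may truncate instead.

```kotlin
import kotlin.math.roundToInt

// Illustrative only: time in seconds at which a frame starts.
fun frameToSeconds(frame: Int, sampleRate: Int, hopLength: Int): Float =
    frame.toFloat() * hopLength / sampleRate

// Illustrative only: nearest frame index for a time in seconds.
// Rounding to nearest is an assumption of this sketch, not documented behavior.
fun secondsToFrame(seconds: Float, sampleRate: Int, hopLength: Int): Int =
    (seconds * sampleRate / hopLength).roundToInt()
```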

framesToSegments

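Convert a boolean frame mask to a list of (start, end) time segments in seconds, one per run of consecutive true frames. As the example shows, a segment starts at the first true frame of a run and ends at the start of the first false frame after it. Useful for extracting voiced regions from VAD output.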

val mask = booleanArrayOf(false, true, true, true, false)
val segments = SonixAudioUtils.framesToSegments(mask, sampleRate = 16000, hopLength = 512)
// segments: [(0.032, 0.128)]
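
The run-detection logic can be sketched as below. `maskToSegments` is a hypothetical helper, not the library's implementation, and returning `Pair<Float, Float>` is an assumption about the result type. The end time is taken as the start of the first inactive frame after a run, which matches the example output above.

```kotlin
// Illustrative only: turns runs of true frames into (start, end) times in seconds.
fun maskToSegments(mask: BooleanArray, sampleRate: Int, hopLength: Int): List<Pair<Float, Float>> {
    val secondsPerFrame = hopLength.toFloat() / sampleRate
    val segments = mutableListOf<Pair<Float, Float>>()
    var start = -1 // index of the first frame of the current run, or -1 if none is open
    // Iterate one step past the end so a run that reaches the last frame is closed too.
    for (i in 0..mask.size) {
        val active = i < mask.size && mask[i]
        if (active && start < 0) {
            start = i
        } else if (!active && start >= 0) {
            segments += (start * secondsPerFrame) to (i * secondsPerFrame)
            start = -1
        }
    }
    return segments
}
```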

Method Summary

| Method | Description |
| --- | --- |
| concatenate | Join audio clips sequentially |
| normalize | Peak-normalize to amplitude 1.0 |
| mix | Offline mixing with gains and resampling |
| framesToTime | Frame indices to seconds |
| timeToFrames | Seconds to frame indices |
| framesToSegments | Boolean mask to time segments |