
PitchDetection

Realtime pitch detection and batch contour extraction. PitchDetection is the public facade for both workflows; it returns a PitchDetector (realtime) or PitchContourExtractor (batch).

Quick Start

Kotlin

// Realtime detection
val detector = PitchDetection.createDetector()
val point = detector.detect(audioBuffer, sampleRate = 16000)
detector.close()

// Batch extraction
val extractor = PitchDetection.createContourExtractor(
    ContourExtractorConfig.SCORING,
    modelProvider = { ModelLoader.loadSwiftF0() }
)
val contour = extractor.extract(audioSamples, sampleRate = 16000)
extractor.release()

Swift

let detector = PitchDetection.createDetector()
let point = detector.detect(samples: audioBuffer, sampleRate: 48000)
detector.close()

let extractor = PitchDetection.createContourExtractor(config: .scoring) {
    ModelLoader.shared.loadSwiftF0()
}
let contour = extractor.extract(audio: audioSamples, sampleRate: 48000)
extractor.release()

Factory Methods

| Method | Returns | Use for |
|---|---|---|
| createDetector(config, modelProvider) | PitchDetector | Realtime, frame-by-frame |
| createContourExtractor(config, modelProvider) | PitchContourExtractor | Batch, whole recording |

modelProvider is required when the algorithm is PitchAlgorithm.SWIFT_F0 and no global provider has been registered via AIModelRegistry.registerSwiftF0 { ... }. If neither is available, both factories throw IllegalArgumentException.

PitchDetectorConfig

Presets

| Preset | Kotlin | Swift | bufferSize | tolerance | confidenceThreshold |
|---|---|---|---|---|---|
| Balanced (default) | PitchDetectorConfig.BALANCED | .balanced | 1024 | 0.15 | 0.75 |
| Relaxed | PitchDetectorConfig.RELAXED | .relaxed | 1024 | 0.20 | 0.65 |
| Precise (offline only) | PitchDetectorConfig.PRECISE | .precise | 4096 | 0.10 | 0.85 |

PRECISE uses a 4096-sample buffer, which is too expensive per frame for realtime use (per ADR-020). Use it only for offline batch analysis.

Properties

| Property | Type | Default | Description |
|---|---|---|---|
| algorithm | PitchAlgorithm | YIN | YIN or SWIFT_F0 |
| bufferSize | Int | 1024 | Audio buffer size (YIN-specific) |
| hopSize | Int | 160 | Hop size between frames, in samples |
| tolerance | Float | 0.15 | YIN tolerance (lower = more accurate) |
| minFreq | Float | 80 | Minimum detectable frequency (Hz) |
| maxFreq | Float | 1000 | Maximum detectable frequency (Hz) |
| amplitudeGateDb | Float | -40 | RMS gate threshold (dB); frames below it are unvoiced |
| confidenceThreshold | Float | 0.75 | Minimum confidence to accept a pitch (0.0–1.0) |
| enableSmoothing | Boolean | false | Inline smoothing filter |
| enableOctaveCorrection | Boolean | false | Inline octave correction |
| smoothingWindowSize | Int | 5 | Smoothing window (must be odd) |
| octaveThresholdCents | Float | 150 | Snap-back threshold for octave correction |
| swiftF0BatchSize | Int | 2560 | SwiftF0 streaming buffer size |
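bufferSize and hopSize are sample counts, so their cost in time depends on the analysis rate. The sketch below converts them to milliseconds under an assumption the reference does not state: that the detector analyses audio at 16 kHz internally (suggested by getAmplitude resampling to 16 kHz, and by the default 160-sample hop matching the 10 ms hopMs default of ContourExtractorConfig). samplesToMs and msToSamples are hypothetical helpers, not library calls.

```kotlin
// ASSUMPTION: the detector works at 16 kHz internally; the reference does not
// state the rate at which bufferSize and hopSize are counted.
val ASSUMED_INTERNAL_RATE_HZ = 16_000

// Hypothetical helper: duration of a sample count, in milliseconds.
fun samplesToMs(samples: Int, sampleRate: Int = ASSUMED_INTERNAL_RATE_HZ): Double =
    samples * 1000.0 / sampleRate

// Hypothetical helper: sample count for a duration in whole milliseconds.
fun msToSamples(ms: Int, sampleRate: Int = ASSUMED_INTERNAL_RATE_HZ): Int =
    ms * sampleRate / 1000
```

At the assumed 16 kHz, a 1024-sample buffer spans 64 ms and PRECISE's 4096-sample buffer spans 256 ms, while the default 160-sample hop is 10 ms.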

Builder

val config = PitchDetectorConfig.Builder()
    .preset(PitchDetectorConfig.BALANCED)
    .algorithm(PitchAlgorithm.SWIFT_F0)
    .voiceType(VoiceType.carnaticMale)
    .quietHandling(QuietHandling.SENSITIVE)
    .strictness(DetectionStrictness.LENIENT)
    .enableProcessing() // smoothing + octave correction
    .bufferSize(1024)
    .hopSize(160)
    .tolerance(0.15f)
    .swiftF0BatchSize(2560)
    .build()

VoiceType

VoiceType is a sealed class. In Kotlin, both the object form (VoiceType.WesternSoprano) and the lowercase companion getter (VoiceType.westernSoprano) work; Swift uses the lowercase form (.westernSoprano).

| VoiceType | Range (Hz) |
|---|---|
| Auto | 65 – 1500 |
| WesternSoprano | 200 – 1500 |
| WesternAlto | 130 – 1000 |
| WesternTenor | 100 – 700 |
| WesternBass | 65 – 450 |
| WesternChild | 180 – 1500 |
| CarnaticMale | 75 – 600 |
| CarnaticFemale | 120 – 1100 |
| CarnaticChild | 180 – 1300 |
| HindustaniMale | 75 – 600 |
| HindustaniFemale | 120 – 1100 |
| HindustaniChild | 180 – 1300 |
| PopMale | 75 – 600 |
| PopFemale | 120 – 1100 |
| PopChild | 180 – 1300 |
| IndianFilmMale | 75 – 600 |
| IndianFilmFemale | 120 – 1100 |
| IndianFilmChild | 180 – 1300 |
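The ranges above can guide which VoiceType to pick for a given singer. The following is a hypothetical caller-side helper, not part of the library and not using its VoiceType class: given a singer's measured lowest and highest pitch, it lists the voice types whose table range covers both, narrowest range first (ordering by width is a heuristic choice made here, not documented behavior).

```kotlin
// Ranges copied from the VoiceType table (Hz); keys are plain strings, not library types.
val voiceRangesHz: Map<String, ClosedFloatingPointRange<Float>> = mapOf(
    "Auto" to 65f..1500f,
    "WesternSoprano" to 200f..1500f,
    "WesternAlto" to 130f..1000f,
    "WesternTenor" to 100f..700f,
    "WesternBass" to 65f..450f,
    "WesternChild" to 180f..1500f,
    "CarnaticMale" to 75f..600f,
    "CarnaticFemale" to 120f..1100f,
    "CarnaticChild" to 180f..1300f,
    "HindustaniMale" to 75f..600f,
    "HindustaniFemale" to 120f..1100f,
    "HindustaniChild" to 180f..1300f,
    "PopMale" to 75f..600f,
    "PopFemale" to 120f..1100f,
    "PopChild" to 180f..1300f,
    "IndianFilmMale" to 75f..600f,
    "IndianFilmFemale" to 120f..1100f,
    "IndianFilmChild" to 180f..1300f,
)

// Names of voice types whose range covers [lowHz, highHz], narrowest range first.
// sortedBy is stable, so equal-width entries keep the table order.
fun voiceTypesCovering(lowHz: Float, highHz: Float): List<String> =
    voiceRangesHz
        .filter { (_, r) -> lowHz >= r.start && highHz <= r.endInclusive }
        .entries
        .sortedBy { it.value.endInclusive - it.value.start }
        .map { it.key }
```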

QuietHandling

Maps to amplitudeGateDb. Frames below the gate are returned as unvoiced.

| Level | Gate (dB) | Use for |
|---|---|---|
| SENSITIVE | -50 | Quiet rooms, soft singing |
| NORMAL (default) | -40 | Typical environments |
| NOISY | -30 | Loud environments |
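A minimal sketch of the gate check, assuming amplitudeGateDb is compared against the frame's RMS level in dBFS (20 * log10(RMS), with full scale at 1.0f). QuietLevel, rms, rmsDb and passesGate are hypothetical names; the library's exact computation (including any resampling) may differ. A frame exactly at the gate passes, since only frames below it are unvoiced.

```kotlin
import kotlin.math.log10
import kotlin.math.sqrt

// Hypothetical mirror of QuietHandling with the gate levels from the table above.
enum class QuietLevel(val gateDb: Float) { SENSITIVE(-50f), NORMAL(-40f), NOISY(-30f) }

// Root-mean-square of a mono frame; accumulates in Double to limit rounding error.
fun rms(samples: FloatArray): Float {
    if (samples.isEmpty()) return 0f
    var sumSq = 0.0
    for (s in samples) sumSq += s.toDouble() * s
    return sqrt(sumSq / samples.size).toFloat()
}

// ASSUMPTION: dB relative to full scale (1.0f). Digital silence maps to -infinity.
fun rmsDb(samples: FloatArray): Float {
    val r = rms(samples)
    return if (r <= 0f) Float.NEGATIVE_INFINITY else 20f * log10(r)
}

// True if the frame is loud enough to be analysed; below the gate it would be unvoiced.
fun passesGate(samples: FloatArray, gateDb: Float): Boolean = rmsDb(samples) >= gateDb
```

With these definitions a steady 0.005 amplitude (about -46 dB) passes SENSITIVE but is gated out under NORMAL.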

DetectionStrictness

Maps to confidenceThreshold.

| Level | Threshold | Use for |
|---|---|---|
| STRICT | 0.85 | Fewer false positives |
| BALANCED (default) | 0.75 | Balanced |
| LENIENT | 0.65 | Catches more notes |
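A sketch of the acceptance rule implied by confidenceThreshold ("minimum confidence to accept a pitch"), using the thresholds above. Strictness, Estimate and gateByConfidence are hypothetical names; rejected frames are reported as -1f, the PitchPoint convention for unvoiced.

```kotlin
// Hypothetical mirror of DetectionStrictness with the thresholds from the table above.
enum class Strictness(val threshold: Float) { STRICT(0.85f), BALANCED(0.75f), LENIENT(0.65f) }

// A raw estimate before gating: pitch in Hz and confidence in 0.0–1.0.
data class Estimate(val pitchHz: Float, val confidence: Float)

// Keep the pitch when confidence reaches the threshold; otherwise report unvoiced (-1f).
fun gateByConfidence(e: Estimate, s: Strictness): Float =
    if (e.confidence >= s.threshold) e.pitchHz else -1f
```

The same 0.7-confidence estimate survives LENIENT but is dropped under BALANCED and STRICT, which is the trade-off the table describes.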

ContourExtractorConfig

Used by createContourExtractor.

Presets

| Preset | Kotlin | Swift | preset | cleanup |
|---|---|---|---|---|
| Default | ContourExtractorConfig.DEFAULT | .default | BALANCED | SCORING |
| Scoring | ContourExtractorConfig.SCORING | .scoring | PRECISE | SCORING |
| Display | ContourExtractorConfig.DISPLAY | .display | BALANCED | DISPLAY |
| Raw | ContourExtractorConfig.RAW | .raw | BALANCED | RAW |

Properties

| Property | Type | Default | Description |
|---|---|---|---|
| preset | PitchPreset | BALANCED | Resolution / accuracy trade-off |
| algorithm | PitchAlgorithm | SWIFT_F0 | YIN or SWIFT_F0 |
| sampleRate | Int | 16000 | Input audio sample rate (Hz) |
| hopMs | Int | 10 | Hop between pitch samples (ms) |
| cleanup | PitchProcessingConfig | SCORING | Post-processing applied after extraction |
| voiceType | VoiceType | Auto | Frequency range optimization |
| quietHandling | QuietHandling | NORMAL | Amplitude gate level |
| strictness | DetectionStrictness | BALANCED | Confidence threshold |

Builder

val config = ContourExtractorConfig.Builder()
    .preset(ContourExtractorConfig.SCORING)
    .pitchPreset(PitchPreset.PRECISE)
    .algorithm(PitchAlgorithm.SWIFT_F0)
    .sampleRate(16000)
    .hopMs(10)
    .cleanup(PitchProcessingConfig.SCORING)
    .voiceType(VoiceType.carnaticMale)
    .quietHandling(QuietHandling.SENSITIVE)
    .strictness(DetectionStrictness.BALANCED)
    .build()

PitchDetector

abstract class PitchDetector : AutoCloseable. Construct via PitchDetection.createDetector(...).

Methods

| Method | Description |
|---|---|
| detect(samples, sampleRate) | Single-shot detection. Returns the latest PitchPoint. Does NOT write to pitchContour. |
| feedContour(samples, sampleRate, anchorTime) | Streams audio into pitchContour and livePitch. Each emitted point's timestamp is spread backward from anchorTime in steps of the detector's hop. |
| pitchAt(timeSeconds) | Returns the contour point closest to timeSeconds, or null if the contour is empty. |
| getAmplitude(samples, sampleRate) | RMS of the input. Resamples to 16 kHz internally. |
| clearPitchContour() | Wipes the entire contour. |
| clearPitchContourFrom(timeSeconds) | Drops points at or after timeSeconds and keeps earlier ones. Used for segment-aware retry / seek-back. |
| reset() | Resets internal state and the audio buffer. |
| release() / close() | Releases native resources. |
| duplicate() | Creates a new detector with the same config and an independent contour. |
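One reading of the feedContour timestamp rule: when a single call emits several points, the newest is stamped anchorTime and each earlier one sits one hop further back. The sketch encodes that reading, which is an interpretation rather than documented behavior; backSpreadTimestamps is a hypothetical helper, and hopSeconds would be hopSize divided by the analysis rate (10 ms for a 160-sample hop if that rate is 16 kHz, also an assumption).

```kotlin
// ASSUMPTION: the last emission of a call gets anchorTime; earlier emissions from the
// same call are spaced backward by one hop each. Returns oldest-to-newest timestamps.
fun backSpreadTimestamps(emissionCount: Int, anchorTime: Float, hopSeconds: Float): FloatArray =
    FloatArray(emissionCount) { i -> anchorTime - (emissionCount - 1 - i) * hopSeconds }
```

Under this reading, three emissions anchored at 1.0 s with a 10 ms hop land at roughly 0.98, 0.99 and 1.0 s.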

Properties

| Property | Type | Description |
|---|---|---|
| config | PitchDetectorConfig | Configuration used to create this detector |
| latencyMs | Float | Detection latency in milliseconds |
| hasProcessing | Boolean | Whether post-processing is available |
| processingEnabled | Boolean (var) | Toggles smoothing / octave correction at runtime |
| pitchContour | PitchContourRecorder | Lossless, append-only session contour; read it whole via snapshot() or windowed via recent(seconds) |
| livePitch | SharedFlow<PitchPoint> | Per-emission pitch stream; same source and rate as pitchContour, but event-shaped |

feedContour fills pitchContour and livePitch in lock-step. For scrolling trails, read pitchContour.recent(seconds) once per render frame (the caller chooses the span at read time); for the whole session, read pitchContour.snapshot(). Use the livePitch SharedFlow for live tuners or telemetry. PitchContourRecorder lives in com.musicmuni.voxatrace.common.streaming and is read-only to the caller; it also exposes pitchAt(timeSeconds), size, and durationSeconds.
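The read API above can be pictured with a small append-only recorder. This is a hypothetical sketch, not the library's PitchContourRecorder: Pt and ContourRecorderSketch are illustrative names, timestamps are assumed non-decreasing, and recent(seconds) is assumed to measure its window back from the newest point's timestamp. clearFrom mirrors what clearPitchContourFrom is documented to do.

```kotlin
import kotlin.math.abs

// Minimal point type for the sketch; pitch -1f means unvoiced.
data class Pt(val timeSeconds: Float, val pitch: Float)

class ContourRecorderSketch {
    private val points = ArrayList<Pt>()

    val size: Int get() = points.size

    // Producer side only (the detector); callers treat the recorder as read-only.
    fun append(p: Pt) { points.add(p) }

    // Defensive copy of the whole session.
    fun snapshot(): List<Pt> = points.toList()

    // ASSUMPTION: the window is measured back from the newest point's timestamp.
    fun recent(seconds: Float): List<Pt> {
        if (points.isEmpty()) return emptyList()
        val cutoff = points.last().timeSeconds - seconds
        return points.filter { it.timeSeconds >= cutoff }
    }

    // Closest point by |t - timeSeconds|; binary search on sorted timestamps.
    fun pitchAt(timeSeconds: Float): Pt? {
        if (points.isEmpty()) return null
        val idx = points.binarySearchBy(timeSeconds) { it.timeSeconds }
        if (idx >= 0) return points[idx]
        val ins = -idx - 1 // first index whose timestamp is greater than timeSeconds
        val candidates = listOfNotNull(points.getOrNull(ins - 1), points.getOrNull(ins))
        return candidates.minByOrNull { abs(it.timeSeconds - timeSeconds) }
    }

    // Mirrors clearPitchContourFrom: drop points at or after timeSeconds, keep earlier ones.
    fun clearFrom(timeSeconds: Float) {
        points.removeAll { it.timeSeconds >= timeSeconds }
    }
}
```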

PitchContourExtractor

Construct via PitchDetection.createContourExtractor(...). When the algorithm is SWIFT_F0, the extractor holds a native ONNX session; call release() when done.

| Method | Description |
|---|---|
| extract(samples, sampleRate) | Runs the configured pipeline; returns a PitchContour |
| release() | Frees native resources |

PitchPoint

data class PitchPoint(
    val pitch: Float,            // Hz, or -1f if unvoiced
    val confidence: Float,       // 0.0 – 1.0
    val timeSeconds: Float = 0f
)
| Computed property | Type | Description |
|---|---|---|
| isSinging | Boolean | pitch > 0 |
| midiNote | Int | MIDI number, or -1 if unvoiced |
| note | String? | e.g., "A4", "C#5"; null if unvoiced |
| centsOff | Int | -50…+50 cents from the nearest 12-TET note |
| tuning | PitchPoint.Tuning | SILENT, FLAT, IN_TUNE, or SHARP (±10 cent thresholds) |
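The computed properties follow standard 12-TET arithmetic: with A4 = 440 Hz as MIDI 69, midi = 69 + 12 * log2(f / 440), the note is the rounded value, and cents are the remainder times 100. The sketch below illustrates that arithmetic and is not the library's implementation; the 440 Hz reference, treating exactly ±10 cents as in tune, and returning 0 cents for unvoiced input are assumptions.

```kotlin
import kotlin.math.log2
import kotlin.math.roundToInt

val NOTE_NAMES = arrayOf("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

enum class Tuning { SILENT, FLAT, IN_TUNE, SHARP }

// Fractional MIDI number, ASSUMING A4 = 440 Hz = MIDI 69.
fun midiFloat(hz: Float): Float = 69f + 12f * log2(hz / 440f)

// Nearest MIDI note, or -1 for unvoiced (pitch <= 0).
fun midiNoteOf(hz: Float): Int = if (hz > 0f) midiFloat(hz).roundToInt() else -1

// Scientific pitch name such as "A4"; MIDI 60 is C4.
fun noteNameOf(hz: Float): String? {
    val midi = midiNoteOf(hz)
    if (midi < 0) return null
    return NOTE_NAMES[midi % 12] + (midi / 12 - 1)
}

// Offset from the nearest note in cents, within -50..+50. Unvoiced -> 0 (an arbitrary choice here).
fun centsOffOf(hz: Float): Int =
    if (hz > 0f) ((midiFloat(hz) - midiNoteOf(hz)) * 100f).roundToInt() else 0

// ASSUMPTION: |cents| <= 10 counts as in tune.
fun tuningOf(hz: Float): Tuning {
    if (hz <= 0f) return Tuning.SILENT
    val cents = centsOffOf(hz)
    return when {
        cents < -10 -> Tuning.FLAT
        cents > 10 -> Tuning.SHARP
        else -> Tuning.IN_TUNE
    }
}
```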

PitchContour

data class PitchContour(
    val samples: List<PitchPoint>,
    val sampleRate: Int = 16000,
    val hopSize: Int = 0
)
| Property | Type | Description |
|---|---|---|
| duration | Float | Seconds (timestamp of the last sample) |
| voicedRatio | Float | Ratio of voiced to total samples |
| size | Int | Number of samples |
| isEmpty | Boolean | True if there are no samples |
| times | FloatArray | Timestamps |
| pitchesHz | FloatArray | Pitch values in Hz (-1 = unvoiced) |
| pitchesMidi | FloatArray | Pitch values in MIDI (-1 = unvoiced) |
| Method / factory | Description |
|---|---|
| slice(startTime, endTime, relativeTimes = true) | Time-range slice |
| toTimesArray() / toPitchesArray() | Parallel arrays for native-side APIs |
| PitchContour.fromArrays(times, pitches, …) | Build from parallel arrays |
| PitchContour.fromPoints(points, …) | Build from a list |
| PitchContour.fromPitchData(data, …) | Build from parsed PitchData (see SonixParser) |
| PitchContour.EMPTY | Empty constant |
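To make the derived values concrete, here is a hypothetical stand-in for a few PitchContour members; Sample and ContourSketch are illustrative names, not library types. duration and voicedRatio follow the table (voiced meaning pitch > 0, as for isSinging). The slice boundaries are an assumption: the sketch keeps the half-open range [startTime, endTime) and, with relativeTimes = true, re-bases timestamps so they are measured from startTime.

```kotlin
// Minimal sample type for the sketch; pitch -1f means unvoiced.
data class Sample(val timeSeconds: Float, val pitch: Float)

class ContourSketch(val samples: List<Sample>) {
    // Per the table: the timestamp of the last sample (0f for an empty contour, an assumption).
    val duration: Float get() = samples.lastOrNull()?.timeSeconds ?: 0f

    // Voiced samples over total samples; 0f for an empty contour.
    val voicedRatio: Float get() =
        if (samples.isEmpty()) 0f else samples.count { it.pitch > 0f }.toFloat() / samples.size

    // ASSUMPTION: half-open [startTime, endTime); relativeTimes shifts times by -startTime.
    fun slice(startTime: Float, endTime: Float, relativeTimes: Boolean = true): ContourSketch {
        val kept = samples.filter { it.timeSeconds >= startTime && it.timeSeconds < endTime }
        val shifted = if (relativeTimes) kept.map { it.copy(timeSeconds = it.timeSeconds - startTime) } else kept
        return ContourSketch(shifted)
    }
}
```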

Common Pitfalls

  1. PRECISE is not for realtime. Its 4096-sample buffer breaks the 40 ms per-buffer budget. Use BALANCED or RELAXED for realtime and reserve PRECISE for offline analysis (per ADR-020).
  2. SwiftF0 needs a model. Either register globally (AIModelRegistry.registerSwiftF0 { … }) or pass modelProvider explicitly. createDetector/createContourExtractor throw IllegalArgumentException otherwise.
  3. Mono input only. SonixDecoder.decode() averages stereo channels automatically (per ADR-017). Custom audio paths must convert to mono.
  4. detect() does not write the contour. Use feedContour(samples, sampleRate, anchorTime) if you want pitchContour populated.
  5. Always release() / close(). Detectors and extractors hold native resources.
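For pitfall 3, a custom audio path can downmix before calling detect() or extract(). A minimal sketch, assuming interleaved float PCM (L, R, L, R, ...); it averages channels in the same spirit as SonixDecoder.decode(), but downmixToMono is a hypothetical helper, not a library call.

```kotlin
// Average interleaved multichannel PCM into one mono frame per sample frame.
fun downmixToMono(interleaved: FloatArray, channels: Int): FloatArray {
    require(channels >= 1) { "channels must be >= 1" }
    require(interleaved.size % channels == 0) { "sample count must be a multiple of channels" }
    if (channels == 1) return interleaved.copyOf()
    val frames = interleaved.size / channels
    return FloatArray(frames) { f ->
        var sum = 0f
        for (c in 0 until channels) sum += interleaved[f * channels + c]
        sum / channels
    }
}
```

Averaging rather than summing keeps the result in the same -1.0..1.0 range as the input, so level-based settings such as the amplitude gate are not pushed up by the downmix.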

See also