Skip to main content

Pitch Detection

Understanding how pitch detection works helps you choose the right settings for your app.

What is Pitch?

Pitch is the perceptual quality that allows us to order sounds from "low" to "high." When you sing an A4 note, your vocal cords vibrate 440 times per second, producing a fundamental frequency (F0) of 440 Hz.

Pitch detection algorithms analyze audio to find this fundamental frequency.

Frequency vs. Note

ConceptExampleWhat It Is
Frequency440 HzHow many times per second the waveform repeats
NoteA4Musical name (letter + octave number)
MIDI Number69Integer representation (A4 = 69)
Cents+5 centsHow many hundredths of a semitone off from perfect

VoxaTrace returns frequency in Hz. Convert to notes using:

// Frequency to MIDI number
val midi = 12 * log2(frequency / 440.0) + 69

// MIDI to note name
val noteNames = arrayOf("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")
val noteName = noteNames[midi.toInt() % 12]
val octave = (midi.toInt() / 12) - 1

Detection Algorithms

VoxaTrace offers two pitch detection algorithms:

YIN (Default)

A classic DSP algorithm that works well for most use cases.

AspectDetails
AccuracyGood for clean audio
Latency~50ms
DependenciesNone (pure Kotlin)
Best ForSimple tuners, low-resource devices

SwiftF0 (Neural Network)

A deep learning model trained on singing and speech data.

AspectDetails
AccuracyExcellent, handles noise well
Latency~50ms
DependenciesONNX Runtime
Best ForSinging apps, noisy environments

Choosing an Algorithm

// YIN - no dependencies, good for simple apps
val detector = CalibraPitch.createDetector(
PitchDetectorConfig.BALANCED
)

// SwiftF0 - better accuracy, requires model
val detector = CalibraPitch.createDetector(
PitchDetectorConfig.Builder()
.algorithm(PitchAlgorithm.SWIFT_F0)
.build(),
modelProvider = { ModelLoader.loadSwiftF0() }
)

Performance Comparison

MetricYINSwiftF0
Latency~50ms~50ms
CPU usageLowMedium
MemoryMinimal~10MB (model)
Accuracy (clean audio)92–95%96–98%
Accuracy (noisy audio)70–80%85–92%
Vibrato handlingFairExcellent
DependenciesNoneONNX Runtime

Decision Tree

What are you building?

├─► Simple tuner app
│ └─► Use YIN (minimal dependencies, good enough accuracy)

├─► Karaoke / singing app
│ └─► Use SwiftF0 (better handling of vibrato and background noise)

├─► Low-end device support required?
│ └─► Use YIN (lower memory and CPU)

└─► Accuracy is critical?
└─► Use SwiftF0 (neural network handles edge cases better)

Key Concepts

Confidence

Each pitch detection returns a confidence value (0.0 to 1.0) indicating how certain the algorithm is about the result.

  • > 0.8: High confidence, reliable pitch
  • 0.5 - 0.8: Moderate confidence, probably correct
  • < 0.5: Low confidence, might be noise or wrong octave

Use confidence to filter unreliable detections:

val point = detector.detect(samples, sampleRate)
if (point.confidence > 0.7f) {
// Trust this pitch
updateDisplay(point.pitch)
}

Octave Errors

Sometimes the algorithm detects the wrong octave - reporting 880 Hz instead of 440 Hz (one octave too high) or 220 Hz (one octave too low).

VoxaTrace provides octave correction:

// Enable in detector config
val config = PitchDetectorConfig.Builder()
.enableProcessing() // Enables smoothing + octave correction
.build()

// Or fix in post-processing
val corrected = CalibraPitch.PostProcess.correctOctaveErrors(pitches)

Voiced vs. Unvoiced

Not all audio contains pitch. Consonants, noise, and silence are unvoiced.

VoxaTrace returns -1 for unvoiced frames:

val point = detector.detect(samples, sampleRate)
if (point.pitch > 0) {
// Voiced - show pitch
} else {
// Unvoiced - show silence indicator
}

Real-time vs. Batch

Real-time (Detector)

For live audio streams - process one buffer at a time:

val detector = CalibraPitch.createDetector()
recorder.audioBuffers.collect { buffer ->
val point = detector.detect(buffer.toFloatArray(), buffer.sampleRate)
updateUI(point)
}
detector.close()

Batch (ContourExtractor)

For recorded audio files - process the entire file at once:

val extractor = CalibraPitch.createContourExtractor(ContourExtractorConfig.SCORING)
val contour = extractor.extract(audioSamples, sampleRate = 16000)
// contour.samples contains all pitch points with timestamps
extractor.release()

Frequency Range

Human singing typically spans:

Voice TypeLowHigh
BassE2 (82 Hz)E4 (330 Hz)
BaritoneA2 (110 Hz)A4 (440 Hz)
TenorC3 (131 Hz)C5 (523 Hz)
AltoF3 (175 Hz)F5 (698 Hz)
SopranoC4 (262 Hz)C6 (1047 Hz)

VoxaTrace detects 50 Hz to 2000 Hz by default. Customize for your use case:

val config = PitchDetectorConfig.Builder()
.voiceType(VoiceType.WesternSoprano) // Optimizes for soprano range
.build()

Sample Rate

Pitch detection works best at 16 kHz. VoxaTrace resamples automatically:

// Pass any sample rate - VoxaTrace handles resampling
val point = detector.detect(samples, sampleRate = 48000) // Resampled internally

Next Steps