CalibraSpeakingPitch

Natural speaking pitch detection for voice profiling.

What is Speaking Pitch?

Speaking pitch is the median fundamental frequency of a person's voice when speaking naturally. It represents their "home base" vocal frequency.

Use it for:

  • Voice profiling: Establish user's natural pitch range

  • Shruti suggestion: Recommend musical tonic based on voice

  • Voice type classification: Soprano, tenor, bass, etc.

  • Voice health tracking: Monitor changes over time

Note: Speaking pitch is different from:

  • Singing range: Full high-to-low capability (use CalibraVocalRange)

  • Shruti/tonic: Musical reference note (calculated from range)

When to Use

Scenario                     Use This?   Why
Detect natural voice pitch   Yes         Core use case
Classify voice type          Yes         Based on frequency range
Detect singing range         No          Use CalibraVocalRange
Real-time pitch display      No          Use CalibraPitch

Quick Start

Kotlin

// From audio samples (16kHz mono)
val speakingPitch = CalibraSpeakingPitch.detectFromAudio(audioSamples)

// A result of -1 means detection failed, so only use positive values
if (speakingPitch > 0) {
    println("Speaking pitch: $speakingPitch Hz")
    val note = CalibraMusic.hzToNoteLabel(speakingPitch)
    println("Closest note: $note")
}

// Or from an existing pitch contour
val contour = pitchExtractor.extract(audio, 16000)
val pitchFromContour = CalibraSpeakingPitch.detectFromPitch(contour.toPitchesArray())

Swift

// From audio samples (16kHz mono)
let speakingPitch = CalibraSpeakingPitch.detectFromAudio(audioMono: audioSamples)

// A result of -1 means detection failed, so only use positive values
if speakingPitch > 0 {
    print("Speaking pitch: \(speakingPitch) Hz")
    let note = CalibraMusic.hzToNoteLabel(speakingPitch)
    print("Closest note: \(note)")
}

// Or from an existing pitch contour
let contour = pitchExtractor.extract(audio: audio, sampleRate: 16000)
let pitchFromContour = CalibraSpeakingPitch.detectFromPitch(pitchesHz: contour.toPitchesArray())

Typical Speaking Pitches

Voice Type   Typical Range
Bass         85-155 Hz
Baritone     110-165 Hz
Tenor        130-200 Hz
Alto         175-255 Hz
Soprano      220-330 Hz
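
The typical ranges overlap, so a single speaking pitch can be consistent with more than one voice type. The Kotlin sketch below shows one way to look up candidates from the table. The names typicalRanges and candidateVoiceTypes are hypothetical helpers for illustration and are not part of the Calibra API.

```kotlin
// Typical speaking-pitch ranges in Hz, copied from the table above.
// The ranges overlap, so one pitch can match several voice types.
val typicalRanges: List<Pair<String, ClosedFloatingPointRange<Float>>> = listOf(
    "Bass" to 85f..155f,
    "Baritone" to 110f..165f,
    "Tenor" to 130f..200f,
    "Alto" to 175f..255f,
    "Soprano" to 220f..330f,
)

// Hypothetical helper (not a Calibra function): returns every voice type
// whose typical range contains the given speaking pitch.
// A failed detection (-1) falls outside all ranges and yields an empty list.
fun candidateVoiceTypes(speakingPitchHz: Float): List<String> =
    typicalRanges
        .filter { (_, range) -> speakingPitchHz in range }
        .map { (name, _) -> name }
```

For example, candidateVoiceTypes(120f) returns Bass and Baritone, because 120 Hz lies in both ranges. Returning all candidates rather than a single label keeps the ambiguity visible to the caller.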

Platform Notes

iOS/Android

  • Accepts any sample rate; internally resamples to 16kHz if needed (ADR-017)

  • Uses median-based detection for robustness against outliers

  • Returns -1 if detection fails (not enough voiced audio)
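
To illustrate why a median is robust against outliers, here is a minimal Kotlin sketch of a median over voiced frames. It is not the library's implementation. It assumes unvoiced frames are encoded as values less than or equal to zero, and the function name medianVoicedPitch and the minVoicedFrames threshold are hypothetical.

```kotlin
// Hypothetical sketch, not the Calibra implementation.
// Assumption: unvoiced frames appear in the contour as values <= 0.
fun medianVoicedPitch(pitchesHz: FloatArray, minVoicedFrames: Int = 10): Float {
    // Keep only voiced frames and sort them so the middle value can be read off.
    val voiced = pitchesHz.filter { it > 0f }.sorted()

    // Mirror the documented failure value when there is too little voiced audio.
    if (voiced.size < minVoicedFrames) return -1f

    val mid = voiced.size / 2
    return if (voiced.size % 2 == 1) {
        voiced[mid]
    } else {
        (voiced[mid - 1] + voiced[mid]) / 2f
    }
}
```

With the contour 118, 120, 122, 119, 121, 480, -1, 0 and minVoicedFrames set to 3, the median is 120.5 Hz, while the mean of the voiced frames would be 180 Hz. The single 480 Hz outlier (for example an octave error) barely moves the median.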

Common Pitfalls

  1. Singing instead of speaking: This detects speaking pitch, not singing

  2. Not enough audio: Need several seconds of natural speech

  3. Background noise: High noise levels affect detection
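
Callers can guard against the second pitfall and against failed detection with a small wrapper. The Kotlin sketch below is a hypothetical helper, not part of the Calibra API. The 3-second minimum is an assumed value, since this page only says "several seconds". The detector is passed in as a function so the sketch compiles without the library; in an app it could be a lambda that calls CalibraSpeakingPitch.detectFromAudio.

```kotlin
// Hypothetical wrapper, not a Calibra function.
// Returns null instead of -1 so callers must handle the failure case.
fun detectSpeakingPitchOrNull(
    audioMono: FloatArray,
    sampleRate: Int,
    minSeconds: Float = 3f, // assumed threshold; the docs only say "several seconds"
    detect: (FloatArray, Int) -> Float,
): Float? {
    // Reject clips that are too short before running detection at all.
    val seconds = audioMono.size.toFloat() / sampleRate
    if (seconds < minSeconds) return null

    // A non-positive result (the documented -1) means detection failed.
    val pitchHz = detect(audioMono, sampleRate)
    return if (pitchHz > 0f) pitchHz else null
}
```

A nullable return type lets the UI prompt the user to record again rather than displaying a negative frequency.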

See also

CalibraVocalRange: For detecting singing range

CalibraMusic: For frequency-to-note conversions

Functions

fun detectFromAudio(audioMono: FloatArray, sampleRate: Int = 16000): Float

Detect natural speaking pitch from audio samples.


Detect natural speaking pitch from pitch contour.