Calibra Speaking Pitch
Natural speaking pitch detection for voice profiling.
What is Speaking Pitch?
Speaking pitch is the median fundamental frequency of a person's voice when speaking naturally. It represents their "home base" vocal frequency.
Use it for:
Voice profiling: Establish user's natural pitch range
Shruti suggestion: Recommend musical tonic based on voice
Voice type classification: Soprano, tenor, bass, etc.
Voice health tracking: Monitor changes over time
Note: Speaking pitch is different from:
Singing range: Full high-to-low capability (use CalibraVocalRange)
Shruti/tonic: Musical reference note (calculated from range)
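For the voice health tracking use case, comparing two speaking pitch readings in semitones is often easier to interpret than a raw difference in Hz. The sketch below uses the standard 12 * log2(f2 / f1) interval formula. `semitoneShift` is a hypothetical helper, not part of Calibra, and it assumes pitches are handled as `Double` values where anything at or below zero (such as the -1 failure value) is invalid.

```kotlin
import kotlin.math.log2

/**
 * Signed distance in semitones from a baseline speaking pitch to a later reading.
 * Hypothetical helper, not a Calibra API.
 * Returns null when either value is not a positive frequency (e.g. the -1 failure value).
 */
fun semitoneShift(baselineHz: Double, currentHz: Double): Double? {
    if (baselineHz <= 0.0 || currentHz <= 0.0) return null
    return 12.0 * log2(currentHz / baselineHz)
}

fun main() {
    // Illustrative readings only; a negative result means the voice has dropped.
    val shift = semitoneShift(baselineHz = 120.0, currentHz = 110.0)
    println("Shift since baseline: $shift semitones")
}
```

What counts as a meaningful shift is application-specific and is not defined by this document.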
When to Use
| Scenario | Use This? | Why |
|---|---|---|
| Detect natural voice pitch | Yes | Core use case |
| Classify voice type | Yes | Based on frequency range |
| Detect singing range | No | Use CalibraVocalRange |
| Real-time pitch display | No | Use CalibraPitch |
Quick Start
Kotlin
// From audio samples (16kHz mono)
val speakingPitch = CalibraSpeakingPitch.detectFromAudio(audioSamples)
if (speakingPitch > 0) {
    println("Speaking pitch: $speakingPitch Hz")
    val note = CalibraMusic.hzToNoteLabel(speakingPitch)
    println("Closest note: $note")
}
// Or from an existing pitch contour
val contour = pitchExtractor.extract(audio, 16000)
val pitchFromContour = CalibraSpeakingPitch.detectFromPitch(contour.toPitchesArray())
Swift
// From audio samples (16kHz mono)
let speakingPitch = CalibraSpeakingPitch.detectFromAudio(audioMono: audioSamples)
if speakingPitch > 0 {
    print("Speaking pitch: \(speakingPitch) Hz")
    let note = CalibraMusic.hzToNoteLabel(speakingPitch)
    print("Closest note: \(note)")
}
// Or from an existing pitch contour
let contour = pitchExtractor.extract(audio: audio, sampleRate: 16000)
let pitchFromContour = CalibraSpeakingPitch.detectFromPitch(pitchesHz: contour.toPitchesArray())
Typical Speaking Pitches
| Voice Type | Typical Range |
|---|---|
| Bass | 85-155 Hz |
| Baritone | 110-165 Hz |
| Tenor | 130-200 Hz |
| Alto | 175-255 Hz |
| Soprano | 220-330 Hz |
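The ranges in this table overlap, so a single speaking pitch can be consistent with more than one voice type. The sketch below is one way to look up every matching type. The `VoiceType` enum and `candidateVoiceTypes` function are hypothetical names introduced for illustration, not Calibra's classifier, and the ranges are copied directly from the table.

```kotlin
// Ranges copied from the table above. They overlap, so one pitch can match several types.
// VoiceType and candidateVoiceTypes are illustrative names, not Calibra APIs.
enum class VoiceType(val rangeHz: ClosedFloatingPointRange<Double>) {
    BASS(85.0..155.0),
    BARITONE(110.0..165.0),
    TENOR(130.0..200.0),
    ALTO(175.0..255.0),
    SOPRANO(220.0..330.0)
}

/** Returns every voice type whose typical range contains the given pitch. */
fun candidateVoiceTypes(speakingPitchHz: Double): List<VoiceType> =
    VoiceType.values().filter { speakingPitchHz in it.rangeHz }

fun main() {
    // 140 Hz falls inside the bass, baritone, and tenor ranges.
    println(candidateVoiceTypes(140.0))
    // The -1 failure value matches nothing, so the list is empty.
    println(candidateVoiceTypes(-1.0))
}
```

Returning all candidates rather than a single label keeps the overlap visible; picking one type would need an extra rule that the table alone does not provide.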
Platform Notes
iOS/Android
Accepts any sample rate; internally resamples to 16kHz if needed (ADR-017)
Uses median-based detection for robustness against outliers
Returns -1 if detection fails (not enough voiced audio)
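To illustrate why a median is robust against outliers, here is a minimal sketch of median-based detection over a pitch contour. It is not Calibra's implementation. `medianVoicedPitch` and `minVoicedFrames` are hypothetical names, and the sketch assumes unvoiced frames are encoded as values at or below zero. It mirrors the documented behavior of returning -1 when there is not enough voiced audio.

```kotlin
/**
 * Median of the voiced frames in a pitch contour.
 * Assumption: unvoiced frames are encoded as values <= 0.
 * Returns -1.0 when fewer than minVoicedFrames frames are voiced.
 * Hypothetical helper, not a Calibra API.
 */
fun medianVoicedPitch(pitchesHz: DoubleArray, minVoicedFrames: Int = 1): Double {
    val voiced = pitchesHz.filter { it > 0.0 }.sorted()
    if (voiced.isEmpty() || voiced.size < minVoicedFrames) return -1.0
    val mid = voiced.size / 2
    // Odd count: middle element. Even count: mean of the two middle elements.
    return if (voiced.size % 2 == 1) voiced[mid] else (voiced[mid - 1] + voiced[mid]) / 2.0
}

fun main() {
    // One octave-error outlier (240 Hz) among frames near 120 Hz.
    val contour = doubleArrayOf(0.0, 118.0, 120.0, 122.0, 240.0, 0.0, 121.0)
    println(medianVoicedPitch(contour)) // prints 121.0
}
```

The mean of the same voiced frames would be 144.2 Hz, pulled upward by the single outlier, while the median stays at 121 Hz.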
Common Pitfalls
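Singing instead of speaking: The detector expects natural speech, not singing; for sung input use CalibraVocalRange
Not enough audio: Detection needs several seconds of natural speech; with too little voiced audio it returns -1
Background noise: High noise levels can degrade detection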
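One way to guard against the "not enough audio" pitfall is to check the buffer length before running detection. The sketch below is only an assumption about how an app might do this. `durationSeconds` and `hasEnoughAudio` are hypothetical helpers, and the 3-second default is an illustrative guess at "several seconds", not a documented Calibra threshold.

```kotlin
/** Duration of a mono sample buffer in seconds. Hypothetical helper. */
fun durationSeconds(sampleCount: Int, sampleRateHz: Int): Double {
    require(sampleRateHz > 0) { "sampleRateHz must be positive" }
    return sampleCount.toDouble() / sampleRateHz
}

/**
 * True when the buffer is at least minSeconds long.
 * The 3.0 default is an illustrative guess, not a documented Calibra threshold.
 */
fun hasEnoughAudio(sampleCount: Int, sampleRateHz: Int, minSeconds: Double = 3.0): Boolean =
    durationSeconds(sampleCount, sampleRateHz) >= minSeconds

fun main() {
    // 48,000 samples at 16 kHz is 3 seconds of audio.
    println(hasEnoughAudio(sampleCount = 48_000, sampleRateHz = 16_000))
    // 16,000 samples at 16 kHz is only 1 second.
    println(hasEnoughAudio(sampleCount = 16_000, sampleRateHz = 16_000))
}
```

A length check only covers total duration. A long recording that is mostly silence can still lack enough voiced audio, so the -1 return value still needs handling.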
See also
CalibraVocalRange: For detecting singing range
CalibraMusic: For frequency-to-note conversions