
TesseraBreath

Breath control score, phrase-level structure, and reference-vs-student alignment from a PitchContour.

Quick Start

Kotlin

// One-shot analysis: control score + phrase summary
val metrics = TesseraBreath.analyze(contour, config = BreathConfig.PRACTICE)
println("Control: ${metrics.controlScore}")
println("Longest phrase: ${metrics.phrases?.longestDuration}s")
println("Comfortable range: ${metrics.phrases?.comfortableRange}")

// With a reference recording: also populates alignmentScore
val metricsVsRef = TesseraBreath.analyze(studentContour, reference = referenceContour)
println("Alignment: ${metricsVsRef.alignmentScore}")

// Composable: reuse the breath function for analysis + comparison
val bf = TesseraBreath.computeBreathFunction(contour)
val metricsFromBf = TesseraBreath.analyze(bf) // renamed: `metrics` is already declared above
val refBf = TesseraBreath.computeBreathFunction(referenceContour)
val alignment = TesseraBreath.compare(refBf, bf)

Swift

let metrics = TesseraBreath.analyze(contour: contour, config: .practice)
print("Control: \(metrics.controlScore ?? 0), Longest: \(metrics.phrases?.longestDuration ?? 0)")

Methods

| Method | Description |
| --- | --- |
| computeBreathFunction(contour, config = DEFAULT): BreathFunction | Build the shared intermediate (values, times, equivalent sustain time) |
| analyze(contour, reference = null, config = DEFAULT): BreathMetrics | Control score + phrase summary; pass reference to also populate alignmentScore |
| analyze(breathFunction, reference = null, config = DEFAULT): BreathMetrics | Same, but reusing pre-computed breath functions |
| compare(reference, student, config = DEFAULT): Float? | FFT cross-correlation peak-matching of two breath functions; returns alignment score alone. Null when too short to align, or when the reference has no detectable breath peaks. |
| compare(refContour, studentContour, config = DEFAULT): Float? | One-shot comparison from contours |

Result types

BreathMetrics

data class BreathMetrics(
    val controlScore: Float?,           // sigmoid-scaled control score in [0, 1); null when no breath signal
    val phrases: PhraseSummary?,        // phrase-level structure; null when audio has no detectable phrase boundaries
    val alignmentScore: Float? = null,  // populated only when `analyze(..., reference = ..., ...)` is used
)

All three fields are nullable. controlScore is null when equivalent sustain time is zero (no voicing detected) — the sigmoid output for that case is mathematically valid but semantically meaningless. phrases is null when the recording has fewer than two pause boundaries (e.g., very short audio or unbroken voicing). alignmentScore is null on no-reference calls, when either recording is shorter than BreathConfig.minAlignmentDuration, or when the reference has no detectable breath peaks to align against.

PhraseSummary

data class PhraseSummary(
    val totalPhrases: Int,
    val phrases: List<Phrase>,              // each (startTime, duration) in seconds
    val comfortableRange: PhraseRange?,     // middle two bins of the 5-bin phrase-duration histogram
    val avgDuration: Float,                 // mean phrase duration (s)
    val shortestDuration: Float,            // (s)
    val longestDuration: Float,             // (s); LOF-filtered peak phrase, headline value for UI
    val longestDurationUnfiltered: Float,   // (s); raw maximum (no outlier filtering)
    val phraseToBreathRatios: FloatArray,   // index-aligned with `phrases`; phrase ÷ preceding pause
    val avgPhraseToBreathRatio: Float,      // headline efficiency value
)

data class Phrase(val startTime: Float, val duration: Float)
data class PhraseRange(val lower: Float, val upper: Float)

longestDuration excludes phrases flagged as statistical outliers by Local Outlier Factor (LOF) on the phrase-to-breath ratios, so a single anomalous phrase does not inflate it. Use longestDurationUnfiltered for the raw maximum.
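To make the LOF step concrete, here is a minimal sketch of textbook Local Outlier Factor on 1-D values such as phraseToBreathRatios. It is not the library's implementation: localOutlierFactors is a hypothetical helper, it takes exactly k neighbours (ignoring distance ties), and the cutoff the library uses to flag a phrase is not documented on this page.

```kotlin
import kotlin.math.abs
import kotlin.math.max

/**
 * Hypothetical helper: Local Outlier Factor for 1-D values.
 * Scores near 1 mean "as dense as the neighbours"; scores well above 1 suggest an outlier.
 */
fun localOutlierFactors(values: FloatArray, k: Int): DoubleArray {
    val n = values.size
    require(k in 1 until n) { "k must be between 1 and n - 1" }

    // Indices of the k nearest neighbours of each point (itself excluded), nearest first.
    // Simplification: exactly k neighbours are kept, even when distances tie.
    val neighbours = Array(n) { i ->
        (0 until n).filter { it != i }
            .sortedBy { abs(values[it] - values[i]) }
            .take(k)
    }

    // k-distance: distance from each point to its k-th nearest neighbour.
    val kDistance = DoubleArray(n) { i ->
        abs(values[neighbours[i].last()] - values[i]).toDouble()
    }

    // Local reachability density: inverse of the mean reachability distance.
    val lrd = DoubleArray(n) { i ->
        val meanReach = neighbours[i].map { o ->
            max(kDistance[o], abs(values[i] - values[o]).toDouble())
        }.average()
        1.0 / max(meanReach, 1e-9) // guard against division by zero on duplicates
    }

    // LOF: mean ratio of each neighbour's density to the point's own density.
    return DoubleArray(n) { i -> neighbours[i].map { o -> lrd[o] / lrd[i] }.average() }
}
```

For ratios like 3.0, 3.2, 2.9, 3.1, 3.05, 12.0 with k = 2, the last value gets by far the largest score. The default lofNeighbors is 25, which this sketch would reject for recordings with 25 or fewer phrases; how the library handles that case is not stated here.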

BreathFunction

data class BreathFunction(
    val values: FloatArray,            // exponential growth on voiced, decay on unvoiced
    val times: FloatArray,             // same length as values
    val equivalentSustainTime: Float,  // input to the control sigmoid
)
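As a rough illustration of "growth on voiced, decay on unvoiced", the sketch below builds a first-order envelope from a per-frame voicing mask. The update rule is an assumption (a saturating rise toward 1 with time constant tauRise and a decay toward 0 with tauFall); breathEnvelopeSketch is a hypothetical name, and the library's actual recurrence may differ.

```kotlin
import kotlin.math.exp

/**
 * Hypothetical sketch of a breath-function envelope.
 * Assumption: saturating rise toward 1 while voiced, exponential decay toward 0 while unvoiced.
 */
fun breathEnvelopeSketch(
    voiced: BooleanArray,
    featureRate: Float = 30f, // frames per second, matching the documented default
    tauRise: Float = 8.0f,    // seconds
    tauFall: Float = 0.4f,    // seconds
): FloatArray {
    val dt = 1f / featureRate
    val riseStep = 1f - exp(-dt / tauRise) // fraction of the remaining headroom gained per frame
    val fallFactor = exp(-dt / tauFall)    // fraction of the level retained per unvoiced frame
    val out = FloatArray(voiced.size)
    var level = 0f
    for (i in voiced.indices) {
        level = if (voiced[i]) level + (1f - level) * riseStep else level * fallFactor
        out[i] = level
    }
    return out
}
```

With the default time constants, two seconds of voicing followed by a one-second pause gives a slow climb and a much faster drop, which is why a small tauFall penalises frequent breaks.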

BreathConfig

Presets

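| Preset | tauRise | tauFall | sigmoidK | sigmoidM | minUnvoiced |
| --- | --- | --- | --- | --- | --- |
| DEFAULT / SINGING | 8.0 | 0.4 | 0.3 | 10 | 0.10 |
| PRACTICE | 8.0 | 0.15 | 0.3 | 15 | 0.05 |
| SPEECH | 5.0 | 0.4 | 0.3 | 6 | 0.15 |
| CLINICAL | 8.0 | 0.1 | 0.25 | 20 | 0.05 |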

Use PRACTICE for sustained alankaar/scales, SPEECH for spoken word, CLINICAL for sustained-phonation tests.
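To see how sigmoidK and sigmoidM shift the score between presets, here is a minimal sketch that assumes the control score is a standard logistic function of equivalentSustainTime. The exact formula is not given on this page, so controlScoreSketch is a hypothetical stand-in rather than the library's implementation.

```kotlin
import kotlin.math.exp

/**
 * Hypothetical sketch: logistic control score.
 * Assumption: score = 1 / (1 + exp(-k * (t - m))), with null for zero sustain time.
 */
fun controlScoreSketch(
    equivalentSustainTime: Float,
    sigmoidK: Float = 0.3f, // steepness (DEFAULT preset)
    sigmoidM: Float = 10f,  // midpoint in seconds (DEFAULT preset)
): Float? {
    // Mirrors the documented behaviour: no voicing means no meaningful score.
    if (equivalentSustainTime <= 0f) return null
    return 1f / (1f + exp(-sigmoidK * (equivalentSustainTime - sigmoidM)))
}
```

Under this assumption, a 10 s equivalent sustain time scores 0.5 with the DEFAULT midpoint (sigmoidM = 10) and about 0.18 with the PRACTICE midpoint (sigmoidM = 15), so the same audio reads as weaker control under a stricter preset.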

Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| featureRate | Float | 30 | Resampling rate for analysis (Hz) |
| tauRise | Float | 8.0 | Time constant for growth during voicing (s) |
| tauFall | Float | 0.4 | Time constant for decay during pauses (s) |
| sigmoidK | Float | 0.3 | Sigmoid steepness for control score |
| sigmoidM | Float | 10 | Sigmoid midpoint (s) |
| minUnvoicedDuration | Float | 0.1 | Min gap (s) treated as a real pause |
| controlThreshold | Float | 0.55 | Peak detection amplitude threshold |
| lofNeighbors | Int | 25 | Neighbors for LOF outlier detection |
| minAlignmentDuration | Float | 6.0 | Min length (s) for cross-correlation comparison |
| peakTimeTolerance | Float | 0.5 | Max time offset for matching peaks (s) |
| peakAmplitudeTolerance | Float | 0.3 | Max amplitude ratio difference (30%) |
| alignmentSnippets | Int | 6 | Random snippets for cross-correlation estimation |
| alignmentSnippetDuration | Int | 5 | Duration of each snippet (s) |
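The two peak tolerances can be pictured with a small matching sketch. This is an illustration only: Peak and matchedPeakFraction are hypothetical names, the cross-correlation lag estimation is skipped (peaks are assumed to be time-aligned already), and scoring as "fraction of reference peaks matched" is an assumption, since the actual alignment formula is not given on this page.

```kotlin
import kotlin.math.abs

// Hypothetical peak representation: time in seconds, amplitude of the breath function.
data class Peak(val time: Float, val amplitude: Float)

/**
 * Hypothetical sketch: fraction of reference peaks that have a student peak
 * within the time and amplitude tolerances. Returns null when there is nothing to match.
 */
fun matchedPeakFraction(
    reference: List<Peak>,
    student: List<Peak>,
    peakTimeTolerance: Float = 0.5f,      // seconds
    peakAmplitudeTolerance: Float = 0.3f, // relative to the reference amplitude
): Float? {
    if (reference.isEmpty()) return null
    val used = BooleanArray(student.size) // each student peak may match at most once
    var matched = 0
    for (r in reference) {
        val j = student.indices.firstOrNull { i ->
            !used[i] &&
                abs(student[i].time - r.time) <= peakTimeTolerance &&
                abs(student[i].amplitude - r.amplitude) <= peakAmplitudeTolerance * r.amplitude
        }
        if (j != null) {
            used[j] = true
            matched++
        }
    }
    return matched.toFloat() / reference.size
}
```

Returning null for an empty reference mirrors the documented behaviour of compare when the reference has no detectable breath peaks.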

Builder

val config = BreathConfig.Builder()
    .preset(BreathConfig.PRACTICE)
    .sigmoidM(12f)
    .minUnvoicedDuration(0.08f)
    .build()

Common Pitfalls

  1. Contour must have ≥ 2 samples. A shorter contour throws IllegalArgumentException (per ADR-022).
  2. controlScore is nullable. Null when no voicing was detected. Always guard before scaling for display.
  3. phrases is nullable. Always check before using longestDuration, comfortableRange, etc.
  4. alignmentScore is nullable. Populated only when you pass reference = ..., and only when both recordings exceed minAlignmentDuration and the reference has detectable breath peaks.
  5. Match the preset to the audio. SINGING for songs, PRACTICE for alankaar, SPEECH for spoken word, CLINICAL for sustained-tone tests.
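The null guards in pitfalls 2 to 4 might look like the sketch below. BreathDisplayInput is a hypothetical stand-in for the three nullable values a UI typically reads from BreathMetrics (controlScore, phrases?.longestDuration, alignmentScore), so the example compiles without the library; the fallback strings are illustrative.

```kotlin
import kotlin.math.roundToInt

// Hypothetical stand-in for the nullable values read from BreathMetrics.
data class BreathDisplayInput(
    val controlScore: Float?,    // null when no voicing was detected
    val longestDuration: Float?, // i.e. metrics.phrases?.longestDuration
    val alignmentScore: Float?,  // null without a usable reference
)

/** Builds display strings, substituting a fallback wherever a value is null. */
fun displayLines(m: BreathDisplayInput): List<String> = listOf(
    // Guard before scaling: multiplying a null score is a compile error, and 0% would mislead.
    m.controlScore?.let { "Control: ${(it * 100).roundToInt()}%" }
        ?: "Control: no voicing detected",
    m.longestDuration?.let { "Longest phrase: ${it}s" }
        ?: "Longest phrase: not enough pauses",
    m.alignmentScore?.let { "Alignment: $it" }
        ?: "Alignment: unavailable",
)
```

Showing an explicit fallback instead of defaulting to 0 keeps "no data" distinguishable from "poor result" in the UI.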
