Roadmap: multi-format support (WAV / ALAC / APE)

Design note for the v0.15+ work of widening FLAC Detective beyond .flac. Status: WAV shipped in v0.15.0; ALAC + APE shipped in v0.16.0 (decode-façade + bitrate-from-original wiring). Kept as the design record. See the commit history around v0.14 for the original scoping context.

The key insight: detection is codec-agnostic

The transcode signals FLAC Detective looks for (the MP3 spectral cliff, the cutoff vs. sample rate, compression artefacts such as pre-echo and aliasing, and the CNN mel-spectrogram) all operate on the decoded PCM. They don’t care what container delivered the samples. So widening to other lossless containers is overwhelmingly an input/output problem, not a detection-science problem.

What is actually coupled to FLAC:

| Concern | Location | Notes |
|---|---|---|
| File discovery | main.py (suffix == ".flac") | trivial: widen the accepted extensions |
| Audio decoding | analysis/new_scoring/audio_loader.py (soundfile) | WAV is free (libsndfile); ALAC/APE need another decoder |
| Metadata | analysis/metadata.py (mutagen.flac.FLAC) | needs a per-format reader or mutagen.File |
| Container-bitrate rules | analysis/new_scoring/bitrate.py + Rules 1/3 | format-dependent semantics (see below) |

The worldview gotcha: non-FLAC is currently treated as fake

The bigger coupling isn’t technical, it’s philosophical. Today the tool does not merely skip non-FLAC files — main._create_non_flac_result() reports them with score 100, verdict NON_FLAC, “must be replaced with an authentic FLAC.” The tool’s worldview is “a lossless collection is made of FLACs; anything else is suspect.”

So supporting WAV means a deliberate product shift: WAV moves from “rejected, replace it” to “a first-class lossless format we analyse on its own merits” (is this WAV a genuine recording, or an MP3→WAV fake?). Concretely, the scanner must route .wav into the analysis list instead of the non-FLAC reject list, while ALAC/APE (until their decoders land) stay in the reject list. This is a decision to make explicitly, not a silent glob change.
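The routing decision can be made explicit in code rather than hidden in a glob. The following is a minimal sketch, not the scanner's actual implementation: `route_files`, `ANALYSABLE_EXTS`, and `KNOWN_AUDIO_EXTS` are hypothetical names, and the exact set of extensions that land in the reject list is an assumption.

```python
from pathlib import Path

# Hypothetical extension sets; the real scanner's names and contents may differ.
ANALYSABLE_EXTS = {".flac", ".wav"}  # analysed on their own merits
KNOWN_AUDIO_EXTS = {".flac", ".wav", ".m4a", ".ape", ".mp3", ".ogg"}


def route_files(paths):
    """Split paths into (to_analyse, to_reject); non-audio files are ignored."""
    to_analyse, to_reject = [], []
    for path in map(Path, paths):
        ext = path.suffix.lower()
        if ext in ANALYSABLE_EXTS:
            to_analyse.append(path)
        elif ext in KNOWN_AUDIO_EXTS:
            # Still reported as non-FLAC until a decoder for it lands.
            to_reject.append(path)
    return to_analyse, to_reject


analyse, reject = route_files(["a.flac", "b.WAV", "c.m4a", "d.ape", "notes.txt"])
print([p.name for p in analyse])  # files sent to the analysis list
print([p.name for p in reject])   # files sent to the non-FLAC reject list
```

Keeping the two sets as named constants means that promoting a format from "reject" to "analyse" is a visible one-line change that can be reviewed as a product decision.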

The design gotcha: container-bitrate rules

Rules 1 (MP3-bitrate signature) and 3 (source-vs-container) reason about a lossless-compressed container: a real FLAC of clean audio compresses to a genuinely lossless size, while an MP3-sourced fake compresses smaller / sits in a recognisable bitrate band. This logic is meaningful for FLAC and for other lossless-compressed formats (ALAC, APE) — but not for uncompressed WAV: a WAV transcoded from an MP3 still has the full uncompressed bitrate (~1411 kbps), so the “compressible → suspect” signal disappears.

Decision: for uncompressed formats, gate Rules 1 and 3 off and rely on the spectral rules (cutoff / artefacts / CNN), which still see the MP3 cliff. There is already a precedent for conditionally disabling Rule 1 — Rule 11 (cassette) does it for legitimate analogue sources — so the mechanism exists.
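To make the reasoning concrete, here is a small sketch of the uncompressed-bitrate arithmetic and of the gate. The function names and the representation of rules as integers are illustrative assumptions; the real rule engine is not shown in this note.

```python
def pcm_bitrate_kbps(sample_rate, bit_depth, channels):
    """Bitrate of raw PCM in kbps (1 kbps = 1000 bits per second)."""
    return sample_rate * bit_depth * channels / 1000


def active_rules(all_rules, is_uncompressed):
    """Drop the container-bitrate rules (1 and 3) for uncompressed input."""
    if is_uncompressed:
        return [rule for rule in all_rules if rule not in (1, 3)]
    return list(all_rules)


# CD-quality stereo: 44100 Hz x 16 bit x 2 channels = 1411.2 kbps,
# regardless of whether the samples originally came from an MP3.
print(pcm_bitrate_kbps(44100, 16, 2))

# Hypothetical rule numbering, used only to show the gate.
print(active_rules([1, 2, 3, 4], is_uncompressed=True))
print(active_rules([1, 2, 3, 4], is_uncompressed=False))
```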

Effort and sequencing

| Format | Decoder | Effort | Value | When |
|---|---|---|---|---|
| WAV | soundfile (already) | low (~½ day) | high (common) | ✅ v0.15.0 |
| ALAC (.m4a) | ffmpeg decode-façade | medium (~1–2 d) | medium (Apple) | ✅ v0.16.0 |
| APE (.ape) | ffmpeg decode-façade | medium-high | low (niche) | ✅ v0.16.0 |

WAV (v0.15) — concretely

  1. Widen file discovery to .wav (+ keep .flac).

  2. Metadata: read WAV header (soundfile sf.info gives sample rate / channels / subtype → bit depth); duration from frames.

  3. Gate Rules 1 & 3 when the input is uncompressed (no lossless-compression signal).

  4. Tests: a synthetic clean WAV (authentic) and an MP3→WAV fake (flagged by the cliff), plus a regression check that FLAC behaviour is unchanged.
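Step 2 can be sketched as a pure function over the fields that `sf.info` exposes. This is an illustration under assumptions: `WavMetadata` and `wav_metadata` are hypothetical names, and the subtype table covers only the common WAV sample formats.

```python
from dataclasses import dataclass
from typing import Optional

# Common soundfile subtype strings mapped to bits per sample.
# Unknown subtypes map to None rather than a guessed value.
SUBTYPE_BIT_DEPTH = {
    "PCM_U8": 8,
    "PCM_16": 16,
    "PCM_24": 24,
    "PCM_32": 32,
    "FLOAT": 32,
    "DOUBLE": 64,
}


@dataclass
class WavMetadata:
    sample_rate: int
    channels: int
    bit_depth: Optional[int]
    duration_s: float


def wav_metadata(samplerate, channels, subtype, frames):
    """Build metadata from header fields; duration is frames / sample rate."""
    return WavMetadata(
        sample_rate=samplerate,
        channels=channels,
        bit_depth=SUBTYPE_BIT_DEPTH.get(subtype),
        duration_s=frames / samplerate,
    )


# With soundfile installed, the inputs would come from the header:
#   info = soundfile.info(path)
#   meta = wav_metadata(info.samplerate, info.channels, info.subtype, info.frames)
meta = wav_metadata(44100, 2, "PCM_16", 441000)
print(meta)
```

Separating the header read from the mapping keeps the mapping testable without any audio file on disk.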

The structural investment (unlocks ALAC/APE cleanly)

Refactor audio_loader to be format-agnostic: try soundfile first, fall back to an ffmpeg-decode path for containers libsndfile can’t read. Once that exists, ALAC and APE are mostly “add the extension + a metadata reader”.
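The "try native first, fall back to ffmpeg" shape might look like the sketch below. The loaders are injected as callables so the control flow can be shown without depending on soundfile or ffmpeg; `load_pcm` and its parameters are hypothetical names, and catching a broad `Exception` is a simplification of whatever error the native reader actually raises.

```python
def load_pcm(path, native_loader, fallback_loader):
    """Decode `path` with native_loader, falling back to fallback_loader.

    In a real loader, native_loader would wrap the soundfile read and
    fallback_loader would decode through ffmpeg to a temporary WAV first.
    """
    try:
        return native_loader(path)
    except Exception:
        # Native read failed (for example, a container libsndfile can't
        # read), so try the slower fallback path.
        try:
            return fallback_loader(path)
        except Exception as fallback_error:
            raise RuntimeError(f"could not decode {path}") from fallback_error


def refuse(path):
    raise ValueError(f"unsupported container: {path}")


print(load_pcm("track.wav", lambda p: ("native", p), refuse))
print(load_pcm("track.m4a", refuse, lambda p: ("ffmpeg", p)))
```

With this shape, formats that the native reader handles never reach the fallback, which matches the goal of keeping FLAC and WAV independent of ffmpeg.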

Landed (v0.16 foundation): analysis/audio_formats.py, the isolated, tested decode-façade. It provides ffmpeg_available(), probe_codec() (ffprobe codec_name), is_analysable_lossless() (FLAC/WAV native; ALAC/APE/etc. by probe; an AAC .m4a correctly returns False and so stays a reject), needs_ffmpeg_decode(), and decode_to_wav() (ffmpeg -i <input> -vn temp.wav). ffmpeg is a hard requirement for non-native formats only; FLAC/WAV never touch it. Tests live in tests/test_audio_formats.py (skipped if ffmpeg is absent).
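As a rough illustration of what such a façade involves, the sketch below builds the ffprobe and ffmpeg command lines and classifies a file from its extension and probed codec. It is not the code in analysis/audio_formats.py: every function name here is hypothetical, and the set of lossless codec names is an assumption limited to the two formats this note covers.

```python
import shutil

NATIVE_EXTS = {".flac", ".wav"}     # read directly, never touch ffmpeg
LOSSLESS_CODECS = {"alac", "ape"}   # ffprobe codec_name values (assumed list)


def tools_on_path():
    """True if both ffmpeg and ffprobe executables can be found."""
    return shutil.which("ffmpeg") is not None and shutil.which("ffprobe") is not None


def ffprobe_codec_cmd(path):
    """Command that prints only the codec_name of the first audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(path),
    ]


def ffmpeg_decode_cmd(src, dst_wav):
    """Command that decodes the audio of src to a WAV file, dropping video."""
    return ["ffmpeg", "-y", "-i", str(src), "-vn", str(dst_wav)]


def classify(extension, codec_name=None):
    """Return 'native', 'ffmpeg', or 'reject' for a candidate file."""
    if extension.lower() in NATIVE_EXTS:
        return "native"
    if codec_name in LOSSLESS_CODECS:
        return "ffmpeg"
    return "reject"  # e.g. an AAC .m4a: same container, lossy codec


print(classify(".m4a", "alac"))
print(classify(".m4a", "aac"))
print(ffmpeg_decode_cmd("album/track.ape", "tmp/track.wav"))
```

Classifying by probed codec rather than by extension is what lets an AAC .m4a stay in the reject list while an ALAC .m4a is analysed.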

ALAC/APE wiring — the bitrate-from-original subtlety (must get right)

The decode-façade lets the pipeline treat ALAC/APE as a plain WAV for the spectral rules. But there’s a trap in the bitrate path:

  • For a lossless-compressed source (FLAC/ALAC/APE), real_bitrate = original_compressed_size × 8 / duration, and the real/apparent < 0.92 ratio is what keeps Rules 1 & 3 on (the “compressible → could be a fake” signal).

  • If we naively feed the decoded WAV to the calculator, it computes real_bitrate from the uncompressed WAV → ratio ≈ 1.0 → the file is mistaken for uncompressed → R1/R3 gate off → we’d miss ALAC-wrapped fakes.

So the calculator must derive real_bitrate from the original .m4a/.ape size, while the rules read audio from the decoded WAV. Today new_calculate_score(filepath, …) derives bitrate from the same filepath the rules load audio from — one path can’t be both (the original isn’t soundfile-readable). The wiring therefore needs the original size/bitrate threaded in separately (e.g. an explicit source_size/pre-computed bitrate-metrics argument), not just a temp-path swap. This is the careful core-path change that distinguishes ALAC from the trivial WAV case.
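A worked example shows why the original file size has to be threaded in. This sketch assumes that the "apparent" bitrate is the uncompressed PCM rate of the decoded audio and that the gate is a simple ratio test against 0.92. The function names are hypothetical and the file sizes are made-up illustrative values.

```python
def real_bitrate_kbps(source_size_bytes, duration_s):
    """Bitrate implied by a file's size on disk, in kbps."""
    return source_size_bytes * 8 / duration_s / 1000


def container_rules_enabled(source_size_bytes, duration_s, apparent_kbps,
                            threshold=0.92):
    """Keep Rules 1 and 3 on only when the source is clearly compressed."""
    ratio = real_bitrate_kbps(source_size_bytes, duration_s) / apparent_kbps
    return ratio < threshold


duration_s = 240.0       # a 4-minute track
apparent_kbps = 1411.2   # 44.1 kHz, 16-bit, stereo PCM

alac_size = 25_000_000         # size of the original .m4a (illustrative)
decoded_wav_size = 42_336_044  # PCM payload plus a 44-byte WAV header

# Correct wiring: bitrate comes from the original compressed file.
print(container_rules_enabled(alac_size, duration_s, apparent_kbps))

# Naive wiring: bitrate comes from the decoded temp WAV, so the ratio is
# about 1.0 and the container rules switch off.
print(container_rules_enabled(decoded_wav_size, duration_s, apparent_kbps))
```

The two calls differ only in which size is passed, which is the argument for an explicit source-size (or pre-computed bitrate) parameter instead of swapping in the temp path.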

Metadata likewise: .m4a → mutagen.mp4 (or ffprobe), .ape → ffprobe, since mutagen.flac.FLAC / soundfile.info can’t read them.

Out of scope (and why)

  • Detecting AAC/Opus/Vorbis → lossless transcodes is a detection limit, not a format-input one, and is near-impossible at high bitrate (see ml/README.md — the tool’s measured blind spot). Supporting a container ≠ being able to judge it.