# Roadmap: multi-format support (WAV / ALAC / APE)

> Design note for the v0.15+ work of widening FLAC Detective beyond `.flac`.
> Status: **WAV shipped in v0.15.0; ALAC + APE shipped in v0.16.0** (decode-façade
> + bitrate-from-original wiring). Kept as the design record. See the commit history
> around v0.14 for the original scoping context.

## The key insight: detection is codec-agnostic

The transcode signals FLAC Detective looks for (the MP3 **spectral cliff**, the
cutoff relative to the sample rate, compression artefacts such as pre-echo and
aliasing, and the CNN mel-spectrogram) all operate on the **decoded PCM**. They
don't care which container delivered the samples. So widening to other *lossless*
containers is overwhelmingly an **input/output** problem, not a detection-science
problem.

What is actually coupled to FLAC:

| Concern | Location | Notes |
|---|---|---|
| File discovery | `main.py` (`suffix == ".flac"`) | trivial: widen the accepted extensions |
| Audio decoding | `analysis/new_scoring/audio_loader.py` (soundfile) | WAV is free (libsndfile); ALAC/APE need another decoder |
| Metadata | `analysis/metadata.py` (`mutagen.flac.FLAC`) | needs a per-format reader or `mutagen.File` |
| Container-bitrate rules | `analysis/new_scoring/bitrate.py` + Rules 1/3 | **format-dependent semantics — see below** |

## The worldview gotcha: non-FLAC is currently treated as *fake*

The bigger coupling isn't technical, it's philosophical. Today the tool does not
merely skip non-FLAC files — `main._create_non_flac_result()` reports them with
**score 100, verdict `NON_FLAC`, "must be replaced with an authentic FLAC."** The
tool's worldview is *"a lossless collection is made of FLACs; anything else is
suspect."*

So supporting WAV means a deliberate **product shift**: WAV moves from *"rejected,
replace it"* to *"a first-class lossless format we analyse on its own merits"* (is
this WAV a genuine recording, or an MP3→WAV fake?). Concretely, the scanner must
route `.wav` into the **analysis** list instead of the non-FLAC reject list, while
ALAC/APE (until their decoders land) stay in the reject list. This is a decision
to make explicitly, not a silent glob change.
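A minimal sketch of that routing decision, assuming the scanner already holds a list of candidate audio paths. `partition_for_analysis` and its `is_analysable` hook are hypothetical names, not the actual `main.py` code. The hook stands in for whatever later decides that an ALAC/APE file can be decoded; by default it rejects everything that is not FLAC or WAV.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Formats libsndfile reads directly; these always go to analysis.
NATIVE_LOSSLESS_SUFFIXES = {".flac", ".wav"}


def partition_for_analysis(
    paths: Iterable[Path],
    is_analysable: Callable[[Path], bool] = lambda p: False,
) -> Tuple[List[Path], List[Path]]:
    """Split candidate audio files into (analyse, reject) lists.

    Hypothetical helper: FLAC and WAV are analysed on their own merits;
    any other suffix is rejected unless `is_analysable` says a decoder
    can handle it (e.g. an ALAC .m4a once the decode path exists).
    """
    analyse: List[Path] = []
    reject: List[Path] = []
    for path in paths:
        if path.suffix.lower() in NATIVE_LOSSLESS_SUFFIXES or is_analysable(path):
            analyse.append(path)
        else:
            reject.append(path)  # reported via the NON_FLAC-style result
    return analyse, reject
```

Making the routing an explicit function (rather than a widened glob) keeps the product decision visible and gives a single place to flip ALAC/APE from "reject" to "analyse" once decoding exists.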

## The design gotcha: container-bitrate rules

Rules **1** (MP3-bitrate signature) and **3** (source-vs-container) reason about a
*lossless-compressed* container: a real FLAC of clean audio compresses to the size
typical of genuine full-band material, while an MP3-sourced fake (whose high
frequencies were already discarded) compresses smaller or sits in a recognisable
bitrate band. This logic is meaningful for FLAC and for other
**lossless-compressed** formats (ALAC, APE), but **not for uncompressed WAV**: a
WAV transcoded from an MP3 still has the full uncompressed bitrate (~1411 kbps for
16-bit/44.1 kHz stereo), so the "compressible → suspect" signal disappears.
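As a worked example, assuming CD-format PCM (sample rate $f_s = 44.1$ kHz, bit depth $b = 16$, $c = 2$ channels), the uncompressed bitrate depends only on the sample format, never on where the samples came from:

```math
\text{bitrate}_{\text{PCM}} = f_s \times b \times c
  = 44\,100\ \text{Hz} \times 16\ \text{bit} \times 2
  = 1\,411\,200\ \text{bit/s} \approx 1411\ \text{kbps}
```

An MP3→WAV fake in the same format lands on exactly the same number, which is why container bitrate carries no transcode signal for WAV.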

**Decision:** for uncompressed formats, **gate Rules 1 and 3 off** and rely on the
spectral rules (cutoff / artefacts / CNN), which still see the MP3 cliff. There is
already a precedent for conditionally disabling Rule 1 — Rule 11 (cassette) does it
for legitimate analogue sources — so the mechanism exists.
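A sketch of how the gating could compose with the existing cassette exception, assuming rules are identified by number and that the caller already knows whether the input is uncompressed and whether Rule 11 fired. `active_rules` is a hypothetical helper, not the scorer's real interface.

```python
from typing import FrozenSet, Iterable


def active_rules(
    all_rules: Iterable[int],
    *,
    uncompressed: bool,
    cassette: bool = False,
) -> FrozenSet[int]:
    """Return the rule numbers that should run for this file (hypothetical).

    - Uncompressed input (e.g. PCM WAV): Rules 1 and 3 are gated off,
      because container bitrate says nothing about the source.
    - Cassette source (Rule 11): Rule 1 is disabled, mirroring the
      existing precedent for legitimate analogue material.
    Spectral rules are never removed here.
    """
    rules = set(all_rules)
    if uncompressed:
        rules -= {1, 3}
    if cassette:
        rules.discard(1)
    return frozenset(rules)
```

Expressing both exceptions as set subtraction keeps them independent: a cassette-sourced WAV simply loses Rules 1 and 3, with no special-case ordering.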

## Effort and sequencing

| Format | Decoder | Effort | Value | When |
|---|---|---|---|---|
| **WAV** | soundfile (already) | **low** (~½ day) | high — common | ✅ v0.15.0 |
| **ALAC** (`.m4a`) | ffmpeg decode-façade | medium (~1–2 d) | medium — Apple | ✅ v0.16.0 |
| **APE** (`.ape`) | ffmpeg decode-façade | medium-high | low — niche | ✅ v0.16.0 |

### WAV (v0.15) — concretely
1. Widen file discovery to `.wav` (+ keep `.flac`).
2. Metadata: read WAV header (soundfile `sf.info` gives sample rate / channels /
   subtype → bit depth); duration from frames.
3. Gate Rules 1 & 3 when the input is uncompressed (no lossless-compression signal).
4. Tests: a synthetic clean WAV (authentic) and an MP3→WAV fake (flagged by the
   cliff), plus a regression check that FLAC behaviour is unchanged.
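Step 2 can be sketched as a small mapping over the object `soundfile.info()` returns, which exposes `samplerate`, `channels`, `frames` and `subtype`. `wav_metadata` is a hypothetical name, and the subtype→bit-depth table is an assumption covering the common WAV subtypes. The function takes the info object as an argument so it can be exercised without a real file.

```python
from typing import Any, Dict, Optional

# Common libsndfile subtypes for WAV and their sample widths in bits.
# FLOAT/DOUBLE are IEEE float storage widths, not integer bit depth.
SUBTYPE_BITS = {
    "PCM_U8": 8,
    "PCM_S8": 8,
    "PCM_16": 16,
    "PCM_24": 24,
    "PCM_32": 32,
    "FLOAT": 32,
    "DOUBLE": 64,
}


def wav_metadata(info: Any) -> Dict[str, Optional[float]]:
    """Build a metadata dict from a soundfile.info()-style object (hypothetical)."""
    return {
        "sample_rate": info.samplerate,
        "channels": info.channels,
        "bit_depth": SUBTYPE_BITS.get(info.subtype),  # None for unknown subtypes
        "duration": info.frames / info.samplerate,     # seconds, from frame count
    }
```

In use this would be called as `wav_metadata(soundfile.info(path))`.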

### The structural investment (unlocks ALAC/APE cleanly)
Refactor `audio_loader` to be **format-agnostic**: try soundfile first, fall back
to an ffmpeg-decode path for containers libsndfile can't read. Once that exists,
ALAC and APE are mostly "add the extension + a metadata reader".
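A minimal sketch of that fallback, assuming `soundfile.read` raises `RuntimeError` (or a subclass, as recent releases do) for containers libsndfile can't open, and that `ffmpeg` is on `PATH`. `load_audio` and its injectable `read`/`decode` parameters are hypothetical; injecting them keeps the sketch testable without either dependency installed.

```python
import os
import subprocess
import tempfile
from typing import Any, Callable


def _soundfile_read(path: str) -> Any:
    import soundfile as sf  # lazy import: only needed when actually decoding
    return sf.read(path, always_2d=True)  # -> (samples, sample_rate)


def _ffmpeg_to_temp_wav(path: str) -> str:
    """Decode any ffmpeg-readable input to a temporary WAV and return its path."""
    fd, tmp = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        # -y: overwrite the empty temp file; -vn: drop cover-art/video streams.
        subprocess.run(["ffmpeg", "-y", "-v", "error", "-i", path, "-vn", tmp], check=True)
    except BaseException:
        os.remove(tmp)  # don't leak the temp file if ffmpeg fails
        raise
    return tmp


def load_audio(
    path: str,
    read: Callable[[str], Any] = _soundfile_read,
    decode: Callable[[str], str] = _ffmpeg_to_temp_wav,
) -> Any:
    """Try libsndfile first; fall back to an ffmpeg decode for other containers."""
    try:
        return read(path)  # FLAC/WAV take this path and never touch ffmpeg
    except RuntimeError:
        tmp = decode(path)
        try:
            return read(tmp)
        finally:
            os.remove(tmp)  # never leave decoded temp files behind
```

Trying libsndfile first preserves the property that FLAC/WAV never depend on ffmpeg; only a failed native read pays for a subprocess and a temp file.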

**Landed (v0.16 foundation):** `analysis/audio_formats.py` — the isolated,
tested decode-façade. `ffmpeg_available()`, `probe_codec()` (ffprobe
`codec_name`), `is_analysable_lossless()` (FLAC/WAV native; ALAC/APE/etc. by
probe; an **AAC** `.m4a` correctly returns False → stays a reject),
`needs_ffmpeg_decode()`, and `decode_to_wav()` (ffmpeg `-i … -vn temp.wav`).
ffmpeg is a **hard requirement for non-native formats only** — FLAC/WAV never
touch it. Tests in `tests/test_audio_formats.py` (skip if ffmpeg absent).
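For illustration only, the probe-then-classify idea might look like the sketch below. It is not the code in `analysis/audio_formats.py`; the helper names are hypothetical, and the set of codec names treated as lossless is an assumption (ffmpeg reports ALAC as `alac`, Monkey's Audio as `ape`, and uncompressed PCM as `pcm_*`, e.g. `pcm_s16le`).

```python
import subprocess
from typing import Optional

# Assumed allow-list of ffmpeg codec names considered lossless.
LOSSLESS_CODECS = {"flac", "alac", "ape", "wavpack", "tta"}


def probe_first_audio_codec(path: str) -> Optional[str]:
    """Return ffprobe's codec_name for the first audio stream, or None."""
    try:
        result = subprocess.run(
            [
                "ffprobe", "-v", "error",
                "-select_streams", "a:0",
                "-show_entries", "stream=codec_name",
                "-of", "default=noprint_wrappers=1:nokey=1",
                path,
            ],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return None  # ffprobe not installed
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def is_lossless_codec(codec_name: Optional[str]) -> bool:
    """True for lossless-compressed codecs and raw PCM; False for AAC, MP3, etc."""
    if not codec_name:
        return False
    return codec_name in LOSSLESS_CODECS or codec_name.startswith("pcm_")
```

Classifying by probed codec rather than by suffix is what separates an ALAC `.m4a` (analysed) from an AAC `.m4a` (still a reject).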

### ALAC/APE wiring — the bitrate-from-original subtlety (must get right)
The decode-façade lets the pipeline treat ALAC/APE as a plain WAV for the
**spectral** rules. But there's a trap in the **bitrate** path:

- For a *lossless-compressed* source (FLAC/ALAC/APE), `real_bitrate` =
  `original_compressed_size × 8 / duration`, and the
  `real/apparent < 0.92` ratio is what keeps Rules 1 & 3 **on** (the
  "compressible → could be a fake" signal).
- If we naively feed the **decoded WAV** to the calculator, it computes
  `real_bitrate` from the *uncompressed* WAV → ratio ≈ 1.0 → the file is
  mistaken for uncompressed → **R1/R3 gate off** → we'd miss ALAC-wrapped fakes.
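The trap is easiest to see with numbers. The sketch below assumes `apparent_bitrate` is the PCM rate (sample rate × bit depth × channels) and uses an illustrative 180 s, 16-bit/44.1 kHz stereo ALAC file of 25 MB. The file sizes are made up for the example; only the 0.92 threshold comes from the rules above.

```python
GATE_RATIO = 0.92  # real/apparent below this keeps Rules 1 & 3 on


def bitrate_ratio(size_bytes: int, duration_s: float,
                  sample_rate: int, bit_depth: int, channels: int) -> float:
    real = size_bytes * 8 / duration_s               # bits per second on disk
    apparent = sample_rate * bit_depth * channels    # PCM bits per second
    return real / apparent


duration = 180.0
alac_size = 25_000_000                              # original .m4a (illustrative)
wav_size = 44 + int(duration * 44_100 * 2 * 2)      # decoded PCM + minimal 44-byte header

from_original = bitrate_ratio(alac_size, duration, 44_100, 16, 2)
from_decoded = bitrate_ratio(wav_size, duration, 44_100, 16, 2)

print(f"original .m4a: ratio={from_original:.3f} rules_1_3_on={from_original < GATE_RATIO}")
# original .m4a: ratio=0.787 rules_1_3_on=True
print(f"decoded WAV:   ratio={from_decoded:.3f} rules_1_3_on={from_decoded < GATE_RATIO}")
# decoded WAV:   ratio=1.000 rules_1_3_on=False
```

Same audio, same format: measured from the original the file keeps Rules 1 & 3; measured from the decoded WAV it silently loses them.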

So the calculator must derive `real_bitrate` from the **original `.m4a`/`.ape`
size**, while the rules read audio from the **decoded WAV**. Today
`new_calculate_score(filepath, …)` derives bitrate from the *same* `filepath`
the rules load audio from — one path can't be both (the original isn't
soundfile-readable). The wiring therefore needs the original size/bitrate
threaded in separately (e.g. an explicit `source_size`/pre-computed
bitrate-metrics argument), not just a temp-path swap. This is the careful
core-path change that distinguishes ALAC from the trivial WAV case.
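One way to thread the original size through, sketched with hypothetical names (`BitrateMetrics`, `bitrate_metrics_from_source`); the real `new_calculate_score` signature may differ. The metrics are computed once from the original file on disk, and the rules would receive them alongside the path of the decoded WAV they read audio from.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BitrateMetrics:
    real_kbps: float      # from the ORIGINAL compressed file size
    apparent_kbps: float  # PCM rate implied by the stream format

    @property
    def ratio(self) -> float:
        return self.real_kbps / self.apparent_kbps


def bitrate_metrics_from_source(source_path: str, duration_s: float,
                                sample_rate: int, bit_depth: int,
                                channels: int) -> BitrateMetrics:
    """Derive bitrate metrics from the original .m4a/.ape, never the decoded temp WAV."""
    size = os.path.getsize(source_path)
    real = size * 8 / duration_s / 1000
    apparent = sample_rate * bit_depth * channels / 1000
    return BitrateMetrics(real_kbps=real, apparent_kbps=apparent)

# Hypothetical call shape: audio comes from the decoded WAV, bitrate from the original.
# score = new_calculate_score(decoded_wav_path, ..., bitrate_metrics=metrics)
```

Passing pre-computed metrics (rather than a second path) means the scorer never needs to know whether the audio it reads was decoded from another container.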

Metadata likewise: `.m4a` → `mutagen.mp4` (or ffprobe), `.ape` → ffprobe, since
`mutagen.flac.FLAC` / `soundfile.info` can't read them.
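A sketch of the ffprobe route, split so the parsing can be tested without a real file. The field names (`codec_name`, `sample_rate`, `channels`, `bits_per_raw_sample`, `format.duration`) are standard ffprobe output; treating a missing or `N/A` value as `None` is an assumption, since not every decoder reports a bit depth. `parse_ffprobe_json` and `ffprobe_metadata` are hypothetical names.

```python
import json
import subprocess
from typing import Any, Dict, Optional


def _to_int(value: Any) -> Optional[int]:
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # missing or "N/A"


def _to_float(value: Any) -> Optional[float]:
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def parse_ffprobe_json(data: Dict[str, Any]) -> Dict[str, Any]:
    """Map ffprobe's JSON (first audio stream + format) to a metadata dict."""
    stream = (data.get("streams") or [{}])[0]
    return {
        "codec": stream.get("codec_name"),
        "sample_rate": _to_int(stream.get("sample_rate")),  # ffprobe emits a string
        "channels": _to_int(stream.get("channels")),
        "bit_depth": _to_int(stream.get("bits_per_raw_sample")),
        "duration": _to_float(data.get("format", {}).get("duration")),
    }


def ffprobe_metadata(path: str) -> Dict[str, Any]:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries",
         "stream=codec_name,sample_rate,channels,bits_per_raw_sample:format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ffprobe_json(json.loads(out))
```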

## Out of scope (and why)
- **Detecting AAC/Opus/Vorbis → lossless transcodes** is a *detection* limit, not a
  format-input one, and is near-impossible at high bitrate (see `ml/README.md` —
  the tool's measured blind spot). Supporting a container ≠ being able to judge it.