# Technical Details

Deep dive into FLAC Detective's architecture, detection algorithms, and rule system.

## Table of Contents

- [System Architecture](#system-architecture)
- [Supported Formats](#supported-formats)
- [Repair: lossless reconstruction, only when needed](#repair-lossless-reconstruction-only-when-needed)
- [Detection Rules](#detection-rules) (Rules 1–11 + optional ML Rule 12)
- [Scoring System](#scoring-system)
- [Spectral Analysis](#spectral-analysis)
- [Performance Optimizations](#performance-optimizations)
- [Technical Limitations](#technical-limitations)
- [Algorithm Pseudocode](#algorithm-pseudocode)
- [Further Reading](#further-reading)

## System Architecture

### High-Level Overview

```
 ┌──────────────────────────────────────────────────────────────┐
 │  Input: files / folders (scanned recursively)                │
 │  .flac   .wav   .m4a   .ape   (+ any other audio it finds)   │
 └────────────────────────────────┬─────────────────────────────┘
                                   ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  Scanner / Router          (main.scan_files)                 │
 │   • .flac / .wav            → analyse (read natively)        │
 │   • .m4a / .ape → ffprobe ──┬─ ALAC / APE → analyse          │
 │                             └─ AAC / lossy → reject          │
 │   • .mp3 / .ogg / .opus / … → reject ("not lossless,         │
 │                                        replace with a FLAC") │
 └────────────────────────────────┬─────────────────────────────┘
                                   ▼   one analysable file
 ┌──────────────────────────────────────────────────────────────┐
 │  Decode to local temp      (analyzer.analyze_file)           │
 │   • FLAC / WAV → copy-to-temp, read by libsndfile            │
 │   • ALAC / APE → ffmpeg decode → temporary WAV               │
 │   ↻ on read failure: auto-repair via `flac` CLI, then retry  │
 └────────────────────────────────┬─────────────────────────────┘
                                   ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  Feature extraction  (one shared AudioCache — the temp file  │
 │  is read once and reused by every step below)                │
 │   • Metadata     sample rate, bit depth, channels, duration  │
 │   • Spectral     FFT → cutoff freq, energy ratio, stability  │
 │   • Quality      clipping, DC offset, silence, fake hi-res,  │
 │                  upsampling, corruption                      │
 │   • Duration     metadata vs decoded (consistency check)     │
 └────────────────────────────────┬─────────────────────────────┘
                                   ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  Scoring engine     (new_scoring/calculator.py)              │
 │  11 heuristic rules + optional CNN (Rule 12) → 0–150 pts     │
 │  phased execution with gates & short-circuits — see below    │
 └────────────────────────────────┬─────────────────────────────┘
                                   ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  Verdict        (single source of truth: constants.py)       │
 │  ≤30 AUTHENTIC · 31–54 WARNING · 55–85 SUSPICIOUS · ≥86 FAKE │
 └────────────────────────────────┬─────────────────────────────┘
                                   ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  Reporting:  Rich console  ·  text report  ·  JSON           │
 │  (all derive the verdict from the thresholds above)          │
 └──────────────────────────────────────────────────────────────┘
```
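
The Scanner / Router box in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's actual `main.scan_files`: the names `route`, `probe_codec` and the three constant sets are hypothetical, and the accepted codec names (`alac`, `ape`) are assumed to be what `ffprobe` reports for those formats.

```python
import subprocess
from pathlib import Path

NATIVE_EXTENSIONS = {".flac", ".wav"}   # read directly, no probing needed
PROBED_EXTENSIONS = {".m4a", ".ape"}    # the container alone proves nothing
LOSSLESS_CODECS = {"alac", "ape"}       # assumed ffprobe codec names


def probe_codec(path):
    """Ask ffprobe for the codec name of the first audio stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def route(path, probe=probe_codec):
    """Return 'analyse' or 'reject' for one file, following the diagram."""
    suffix = Path(path).suffix.lower()
    if suffix in NATIVE_EXTENSIONS:
        return "analyse"
    if suffix in PROBED_EXTENSIONS:
        # Trust the probed codec, never the extension
        return "analyse" if probe(path) in LOSSLESS_CODECS else "reject"
    return "reject"  # .mp3, .ogg, .opus, ...: not lossless
```

Passing `probe` as a parameter keeps the routing decision testable without ffmpeg installed; only `.m4a` and `.ape` files ever reach `ffprobe`.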

### Core Components

#### 1. File Scanner (`flac_detective/utils.py`)

Recursively finds audio files in directories.

**Key features**:
- Recursive directory traversal
- Extension filtering (`.flac`, `.wav`, `.m4a`, `.ape`)
- Symbolic link handling
- Error recovery for inaccessible files

#### 2. Metadata Reader (`flac_detective/analysis/metadata.py`)

Extracts FLAC metadata using the Mutagen library.

**Extracted information**:
- Sample rate (Hz): 44100, 48000, 96000, etc.
- Bit depth: 16, 24, 32
- Channels: 1 (mono), 2 (stereo)
- Duration (seconds)
- Encoder information

#### 3. Audio Loader (`flac_detective/analysis/audio_cache.py`)

Loads audio data with intelligent caching.

**Features**:
- Configurable sample duration (default: 30s)
- Memory-efficient caching
- Multiple backend support (soundfile, ffmpeg fallback)
- Automatic retry on corruption

#### 4. Spectral Analyzer (`flac_detective/analysis/spectrum.py`)

Performs FFT (Fast Fourier Transform) analysis.

**Computed metrics**:
- Cutoff frequency (Hz)
- Energy distribution
- Frequency variance
- Spectral density patterns

**Algorithm**:
```python
# Simplified spectral analysis flow
audio_data = load_audio(file, duration=30.0)
fft_result = np.fft.rfft(audio_data)
magnitude = np.abs(fft_result)
frequencies = np.fft.rfftfreq(len(audio_data), 1/sample_rate)

# Find cutoff frequency (where energy drops significantly)
cutoff_freq = detect_cutoff(magnitude, frequencies)
```

#### 5. Scoring Engine (`flac_detective/analysis/new_scoring/`)

Strategy-pattern implementation: 11 heuristic rules, each in its own module, plus the optional ML Rule 12.

**Structure**:
```
new_scoring/
├── calculator.py      # Orchestrates rule execution
├── verdict.py         # Maps score to verdict
└── rules/            # Individual rule implementations
    ├── rule_01.py    # MP3 Spectral Signature
    ├── rule_02.py    # Cutoff vs Nyquist
    ├── ...
    └── rule_11.py    # Cassette Detection
```

#### 6. Report Generator (`flac_detective/reporting/`)

Creates formatted output for users.

**Output formats**:
- Console (Rich library, colored, progress bars)
- Text file (detailed analysis)
- JSON (for automation)

### Data Flow

```
FLAC File
   │
   ├─► Extract Metadata
   │   ├─ Sample rate: 44100 Hz
   │   ├─ Bit depth: 16 bits
   │   └─ Duration: 245.3 seconds
   │
   ├─► Load Audio (30 seconds)
   │   └─ Audio array: [samples x channels]
   │
   ├─► Compute FFT
   │   ├─ Magnitude spectrum
   │   ├─ Frequency bins
   │   └─ Cutoff detection
   │
   ├─► Apply Rules 1-11
   │   ├─ Rule 1: +50 pts (MP3 signature detected)
   │   ├─ Rule 2: +15 pts (cutoff at 19.0 kHz)
   │   ├─ Rule 5: -10 pts (high variance protection)
   │   └─ Total: 55 pts
   │
   └─► Generate Verdict
       └─ Score 55 → SUSPICIOUS ⚠️
```

## Supported Formats

Detection is **codec-agnostic**: every rule operates on the decoded PCM samples, so the
container only decides *how the samples are read in*.

| Format | Extension | How it's read | ffmpeg needed? |
|---|---|---|---|
| FLAC | `.flac` | libsndfile (native) | no |
| WAV | `.wav` | libsndfile (native) | no |
| ALAC (Apple Lossless) | `.m4a` | decoded to PCM via ffmpeg | **yes** |
| APE (Monkey's Audio) | `.ape` | decoded to PCM via ffmpeg | **yes** |

The real codec is probed with `ffprobe` — the extension is never trusted. A `.m4a` that
turns out to hold **lossy AAC** is not analysed; it's reported as a non-lossless file to
replace, exactly like an `.mp3`. ffmpeg is a hard dependency **only** for ALAC/APE; a
FLAC/WAV-only workflow never invokes it. For lossless-*compressed* sources decoded to a
temporary WAV (ALAC/APE), the "real bitrate" used by Rules 1 & 3 is sized from the
**original compressed file**, not the decoded WAV — otherwise the file would look
uncompressed and those rules would wrongly switch off.
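
A small sketch of why the bitrate must come from the original compressed file. It assumes that the "apparent" bitrate is the raw PCM bitrate (sample rate × bit depth × channels) and reuses the 0.92 ratio from the scoring flow; the function names are hypothetical.

```python
def real_bitrate_kbps(file_size_bytes, duration_s):
    """Average bitrate actually stored on disk, in kbit/s."""
    return file_size_bytes * 8 / duration_s / 1000


def pcm_bitrate_kbps(sample_rate, bit_depth, channels):
    """Bitrate the same audio needs as raw PCM, in kbit/s (assumed 'apparent' bitrate)."""
    return sample_rate * bit_depth * channels / 1000


def looks_uncompressed(file_size_bytes, duration_s, sample_rate, bit_depth,
                       channels, ratio_threshold=0.92):
    """True when the file is about as large as raw PCM, so Rules 1 & 3 would switch off."""
    real = real_bitrate_kbps(file_size_bytes, duration_s)
    apparent = pcm_bitrate_kbps(sample_rate, bit_depth, channels)
    return real / apparent > ratio_threshold
```

For a four-minute 16-bit/44.1 kHz stereo track, a 30 MB ALAC file works out to 1000 kbps against 1411.2 kbps of PCM (ratio ≈ 0.71), so the bitrate rules stay active. Sizing the same track from its decoded temporary WAV (about 42.3 MB) gives a ratio of about 1.0 and would disable them.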

## Repair: lossless reconstruction, only when needed

Analysis is **read-only**. There is exactly one case where FLAC Detective writes: when a
FLAC is **so corrupted it cannot be decoded at all**, even after the loader's retry/backoff.
A file that won't decode can't be analysed, so rather than skip it, the tool rebuilds a
**valid FLAC** from whatever audio data is still readable, **sample-identical** to what the
corrupted file could still deliver, and then analyses that. This is the opposite of
"tinkering with the sound": **nothing in the audio is processed, resampled, normalised or
'enhanced'.**

### Why it's lossless (the part that matters for hi-fi)

FLAC is a *lossless* codec: decoding a FLAC and re-encoding it yields the **exact same PCM
samples**, bit for bit. Repair uses Xiph's **reference `flac` tool** for both halves of the
round-trip, so the repaired file's audio is sample-identical to what the corrupted file
could still deliver. The corruption is in the FLAC *framing/container*, not in the PCM you
can still read; repair rebuilds correct framing around those exact samples. No psychoacoustic
processing, no dithering, no gain — none of the things a "repair" might scarily imply.

### The procedure (each step is verifiable)

```
corrupted .flac  ── can't be decoded after retries
   │
   1. extract metadata        (mutagen: all tags + embedded album art)
   2. decode → WAV            (flac --decode-through-errors: recover every
   │                            sample the corruption didn't destroy)
   3. re-encode WAV → FLAC     (flac --best: lossless, exact same samples)
   4. restore metadata         (tags + pictures put back, untouched)
   5. verify                   (flac --test: refuse to proceed unless the
   │                            rebuilt file is provably valid)
   6. replace original         (only after a .corrupted.bak backup is written)
   ▼
 valid .flac  ── now analysable; backup of the original kept beside it
```
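
The `flac` invocations behind steps 2, 3 and 5 can be sketched as below. This only builds the command lines and the backup path; it is an illustration rather than the tool's actual repair module. The function name `plan_repair` and the temporary file names are hypothetical, while `--decode`, `--decode-through-errors`, `--best`, `--test` and `-o` are options of the reference `flac` CLI. The metadata steps (1 and 4) and the final replacement (6) are left out.

```python
from pathlib import Path


def plan_repair(flac_path):
    """Return the backup path and the `flac` command lines for one repair."""
    src = Path(flac_path)
    wav = src.with_name(src.stem + ".recovered.wav")      # hypothetical temp name
    rebuilt = src.with_name(src.stem + ".rebuilt.flac")   # hypothetical temp name
    backup = src.with_name(src.name + ".corrupted.bak")   # written before replacing
    commands = [
        # Step 2: decode, keeping every sample that is still readable
        ["flac", "--decode", "--decode-through-errors", "-o", str(wav), str(src)],
        # Step 3: lossless re-encode of exactly those samples
        ["flac", "--best", "-o", str(rebuilt), str(wav)],
        # Step 5: verify the rebuilt file before it may replace anything
        ["flac", "--test", str(rebuilt)],
    ]
    return backup, commands
```

A caller would run each command with `subprocess.run(cmd, check=True)` and only copy the original to `backup` and swap in the rebuilt file once the `--test` command has succeeded.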

### Safety guarantees

- **Only broken files.** A file that decodes normally is *never* rewritten. Healthy music is
  read and left exactly as it is.
- **A backup is always kept.** The original is copied to `<name>.flac.corrupted.bak` *before*
  anything replaces it — you can always go back.
- **Verified before trusted.** If the rebuilt file fails `flac --test`, repair aborts and the
  original is left untouched.
- **Metadata preserved.** Tags and embedded artwork are carried across verbatim.
- **Honest limit.** Samples that corruption genuinely destroyed can't be invented back:
  `--decode-through-errors` recovers everything still readable and nothing more. Repair never
  makes a file *worse* than the corruption already did; it makes a broken file *usable* again.

There are two entry points to the same lossless machinery:

- **Automatic**, during analysis — triggered only by the undecodable-file case above, so a
  scan of a healthy library never writes anything.
- **Standalone**, `python -m flac_detective.repair /path` — a duration-header fixer for FLACs
  whose declared length disagrees with their actual decoded length (also a lossless re-encode,
  also with a `.bak` backup).

## Detection Rules

FLAC Detective uses **11 heuristic rules** with **additive scoring** (0–150 points), plus
an **optional 12th rule** (a CNN, enabled with the `[ml]` extra — see Rule 12 below).

### Scoring engine flow

**Order matters.** The rules don't just sum — the engine runs them in a deliberate order
with *gates* (that switch rules off when they'd misfire) and *short-circuits* (that stop
early once the answer is certain, skipping the expensive rules). This serves both accuracy
and speed.

```
 cutoff freq · bitrate · metadata · audio ─►  ScoringContext  (mutable, shared)

 1. Rule 8   Nyquist exception        ── always first (refined later if MP3 found)
 2. Rule 11  Cassette detection       ── EARLY, only if cutoff < 19 kHz (protect rips)

    ┌─ Gates — these DISABLE the container-bitrate rules (1 & 3) ────────────────┐
    │   cassette detected (R11 ≥ 30)        → drop Rule 1, apply −40 protection  │
    │   uncompressed input  (real/apparent  → drop Rules 1 & 3                   │
    │     bitrate ratio > 0.92, e.g. WAV)     (no lossless-compression signal)   │
    └────────────────────────────────────────────────────────────────────────────┘

 3. PHASE 1 — fast rules, always run:   R1  R2  R3  R4  R5  R6
       │
       ├─►  score ≥ 86               →  FAKE_CERTAIN   (stop — skip costly rules)
       └─►  score < 10 and no MP3    →  AUTHENTIC      (stop)

 4. PHASE 2 — expensive rules, only when relevant (need the full decoded audio):
       • R7  silence / vinyl     if 19 kHz ≤ cutoff ≤ 21.5 kHz
       • R9  compression artefacts if cutoff < 21 kHz  OR  an MP3 signature was seen
       • R11 cassette            if cutoff < 19 kHz and not already run early
       └─ Rule 8 re-refined now that MP3 context is known
       └─►  score ≥ 86            →  FAKE_CERTAIN   (stop)

 5. Rule 10  multi-segment consistency   ── only if score > 30 (already suspect)
 6. Rule 12  CNN classifier (optional)   ── abstains if rolloff < 7 kHz;
                                            no-op unless installed with [ml]
       │
       ▼
   total score (0–150)  ─►  verdict
```
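
The phase structure above can be sketched as a small driver. This is a simplified illustration, not the code in `new_scoring/calculator.py`: it models only Phase 1, the two short-circuits and the gated Phase 2, and leaves out Rules 8, 10, 11 and 12 as well as the bitrate gates. The names `run_phases`, `fast_rules`, `costly_rules` and the `mp3_detected` key are hypothetical.

```python
FAKE_CERTAIN_FLOOR = 86     # from the verdict thresholds
AUTHENTIC_EARLY_EXIT = 10   # phase-1 score below which a clean file stops early


def run_phases(ctx, fast_rules, costly_rules):
    """Run fast rules first; run costly rules only when still undecided.

    fast_rules:   list of callables ctx -> points
    costly_rules: list of (applies, rule) pairs; applies(ctx) is the gate
    """
    score = sum(rule(ctx) for rule in fast_rules)
    if score >= FAKE_CERTAIN_FLOOR:
        return score, "stopped after phase 1 (fake)"
    if score < AUTHENTIC_EARLY_EXIT and not ctx.get("mp3_detected", False):
        return score, "stopped after phase 1 (authentic)"

    # Phase 2: each expensive rule runs only if its gate says it is relevant
    for applies, rule in costly_rules:
        if applies(ctx):
            score += rule(ctx)
    return score, "ran phase 2"
```

Expressing each Phase 2 rule as a `(gate, rule)` pair keeps the "only when relevant" conditions next to the rule they guard, so an expensive rule is never called for a file it cannot say anything about.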

The rules themselves, in detail:

### Rule 1: MP3 Spectral Signature Detection

**Purpose**: Detect CBR (Constant Bitrate) MP3 patterns

**Detection method**:
- Analyzes cutoff frequency
- Matches against known MP3 bitrate signatures

**MP3 Bitrate Signatures**:
```
128 kbps MP3 → 16000-16500 Hz cutoff
160 kbps MP3 → 17000-17500 Hz cutoff
192 kbps MP3 → 19000-19500 Hz cutoff
256 kbps MP3 → 20000-20500 Hz cutoff
320 kbps MP3 → 20000-20500 Hz cutoff (with exceptions)
Authentic    → 22050 Hz (full spectrum at 44.1 kHz)
```

**Scoring**:
- MP3 signature detected: **+50 points**
- Exception for high-quality MP3 320k: Some protection
- No signature: **0 points**

**Example**:
```
File with 19200 Hz cutoff:
→ Matches 192 kbps MP3 signature
→ +50 points
```
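
As a rough sketch, the signature lookup amounts to a range match against the table above. The inclusive band edges and the function names are assumptions, and the 320 kbps exception and the gates that can disable Rule 1 are not modelled.

```python
MP3_SIGNATURES = [  # (bitrate in kbps, low Hz, high Hz), copied from the table above
    (128, 16000, 16500),
    (160, 17000, 17500),
    (192, 19000, 19500),
    (256, 20000, 20500),
    (320, 20000, 20500),
]


def matching_bitrates(cutoff_hz):
    """Return every bitrate whose cutoff band contains cutoff_hz."""
    return [kbps for kbps, low, high in MP3_SIGNATURES if low <= cutoff_hz <= high]


def rule1_points(cutoff_hz):
    """+50 when the cutoff matches any known MP3 signature, else 0."""
    return 50 if matching_bitrates(cutoff_hz) else 0
```

Because the 256 and 320 kbps bands overlap, a cutoff of 20200 Hz matches both; the cutoff alone cannot tell those two bitrates apart.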

---

### Rule 2: Cutoff Frequency vs Nyquist Threshold

**Purpose**: Penalize files with suspiciously low frequency content

**Detection method**:
1. **Slice-based cutoff detection** (primary)
   - Detects sharp magnitude drops in FFT
2. **Energy-based cutoff detection** (fallback)
   - Finds where 90% of energy is concentrated
   - **Critical**: Only 15-22 kHz range is suspicious
   - Bass concentration (< 15 kHz) = authentic

**Why 15 kHz minimum?**
```
Bass-heavy music example:
  Energy distribution:
  │████████  ← 80% energy at 2-3 kHz (bass)
  │██        ← 15% energy at 5-10 kHz (mids)
  │▓         ← 5% energy at 10-22 kHz (highs)
  └──────────→
   0    22kHz

  This is AUTHENTIC music, not MP3 artifact!
  Without 15 kHz threshold → False positive
```

**Scoring**:
- Per 200 Hz below threshold: **+1 point** (max +30)
- Formula: `min((threshold - cutoff) / 200, 30)`
- Bass concentration (< 15 kHz): **0 points** (protected)

**Example**:
```
Cutoff at 19000 Hz, threshold 22000 Hz:
→ Deficit: 3000 Hz
→ Score: 3000 / 200 = 15 points
```
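
The formula can be written out as a short function. This is a sketch under stated assumptions: the 22000 Hz threshold is taken from the example above, the 15 kHz protection is applied to any cutoff for simplicity, and the result is left unrounded because the text does not say whether points are rounded.

```python
def rule2_points(cutoff_hz, threshold_hz=22000, protected_below_hz=15000):
    """One point per 200 Hz of missing spectrum, capped at 30."""
    if cutoff_hz < protected_below_hz:
        return 0  # bass-heavy material is protected, not penalised
    deficit = max(threshold_hz - cutoff_hz, 0)  # no bonus for exceeding the threshold
    return min(deficit / 200, 30)
```

With the defaults, `rule2_points(19000)` gives 15, matching the worked example, and any cutoff between 15000 and 16000 Hz reaches the 30-point cap.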

---

### Rule 3: Source vs Container Bitrate

**Purpose**: Detect "inflated" files (low-quality source in heavy container)

**Detection method**:
- Calculate effective source bitrate from spectral analysis
- Compare with FLAC container bitrate
- A large mismatch indicates a lossy source re-encoded as lossless

**Scoring**:
- MP3 source + container > 600 kbps: **+50 points**
- Moderate mismatch: **+20-30 points**
- No mismatch: **0 points**

**Example**:
```
MP3 128 kbps source → FLAC 900 kbps container
→ Inflation ratio: 7x
→ +50 points (suspicious)
```
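
The arithmetic in this example is a plain ratio. The sketch below covers only the first scoring tier; the "moderate mismatch" tier is not specified precisely enough to model, and both function names are hypothetical.

```python
def inflation_ratio(source_kbps, container_kbps):
    """How many times larger the container bitrate is than the estimated source."""
    return container_kbps / source_kbps


def rule3_points(source_is_mp3, container_kbps):
    """+50 when an MP3-like source sits in a container above 600 kbps."""
    return 50 if source_is_mp3 and container_kbps > 600 else 0
```

For the example above, `inflation_ratio(128, 900)` is about 7.03, which the text rounds to 7x.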

---

### Rule 4: Suspicious 24-bit Detection

**Purpose**: Identify fake high-resolution files

**Detection method**:
- Check bit depth metadata
- 16-bit = CD quality (standard)
- 24-bit = high-resolution (rare for MP3 transcodes)
- Combined with other indicators → fake high-res

**Scoring**:
- 24-bit + suspicious patterns: **+30 points**
- 16-bit: **0 points**

---

### Rule 5: High Variance Protection (VBR)

**Purpose**: Protect legitimate Variable Bitrate files

**Detection method**:
- Analyze bitrate variance across audio segments
- VBR MP3s have natural variance
- CBR transcodes have uniform patterns

**Scoring**:
- High variance detected: **-40 points** (protection)
- Low variance: **0 points**

---

### Rule 6: High Quality Protection

**Purpose**: Protect high-quality legitimate files

**Detection method**:
- Check container bitrate
- > 700 kbps indicates quality encoding

**Scoring**:
- Bitrate > 700 kbps: **-30 points** (protection)
- Lower bitrate: **0 points**

---

### Rule 7: Silence & Vinyl Analysis

**Purpose**: Detect and protect vinyl/analog sources

**Detection phases**:
1. **Dither detection**: Analyze silence for noise shaping
2. **Surface noise**: Low-frequency rumble (< 100 Hz)
3. **Clicks & pops**: Vinyl surface artifacts

**Scoring**:
- Vinyl characteristics detected: **-100 points** (strong protection)
- No vinyl signatures: **0 points**

**Why protection?**
```
Vinyl rips legitimately have:
- Surface noise throughout
- Frequency content that may look "limited"
- These are NOT indicators of transcoding
```

---

### Rule 8: Nyquist Exception

**Purpose**: Protect files with cutoff near theoretical maximum

**Detection method**:
- Cutoff ≥ 95% Nyquist (e.g., ≥ 20947 Hz for 44.1 kHz)
- Likely anti-aliasing filter, not MP3 cutoff

**Scoring**:
- Near Nyquist: **-50 points** (protection)
- Far from Nyquist: checked by Rule 2
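
The 95% figure follows directly from the sample rate. A minimal sketch, with hypothetical function names:

```python
def nyquist_hz(sample_rate):
    """Highest frequency representable at this sample rate."""
    return sample_rate / 2


def is_near_nyquist(cutoff_hz, sample_rate, fraction=0.95):
    """True when the cutoff lies within the top 5% of the representable band."""
    return cutoff_hz >= fraction * nyquist_hz(sample_rate)
```

At 44.1 kHz the Nyquist frequency is 22050 Hz, so the boundary is 0.95 × 22050 = 20947.5 Hz; at 48 kHz it moves up to 22800 Hz.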

---

### Rule 9: Compression Artifacts

**Purpose**: Detect MP3 compression artifacts

**Sub-tests**:
- **Pre-echo**: MDCT temporal masking artifacts
- **Aliasing**: High-frequency aliasing patterns
- **Quantization noise**: MP3 quantization patterns

**Scoring**:
- One artifact: **+15 points**
- Two artifacts: **+30 points**
- Three artifacts: **+50 points**

---

### Rule 10: Multi-Segment Consistency

**Purpose**: Validate patterns across entire file

**Detection method**:
- Analyze 3+ segments of the file
- MP3s show consistent compression throughout
- Authentic files have variable spectral content

**Scoring**:
- Consistent MP3 patterns: **+20 points**
- Variable patterns: **0 points**

---

### Rule 11: Cassette Detection

**Purpose**: Identify and protect cassette tape sources

**Detection method**:
- Wow & flutter (speed variations)
- Age-related noise floor elevation
- Dropout patterns

**Scoring**:
- Cassette characteristics: **-60 points** (protection)
- No cassette signatures: **0 points**

---

### Rule 12: ML Classifier (CNN) — *optional*

**Purpose**: An independent, learned second opinion that *sharpens* borderline verdicts.
It is the only non-heuristic rule and is **off unless** the ML extra is installed
(`pip install "flac-detective[ml]"`); without it, Rule 12 is a no-op and rules 1–11 stand alone.

**Model**: a small **EfficientNet-B0** CNN bundled with the package. Input is a
**2-channel mid/side mel-spectrogram** (mid = L+R, side = L−R) rather than mono — MP3
quantises the side channel aggressively, so its fingerprints survive even on band-limited
material where the high-frequency cliff is faint. This stereo move is what lifted real-world
specificity from 80 % (mono, v0.12) to 95 % (v0.14).
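
The mid/side conversion itself is simple arithmetic on the two channels. The sketch below shows only that step, using the L+R and L−R definitions given above on plain Python lists; the mel-spectrogram and the model are out of scope, and the function name is hypothetical.

```python
def mid_side(left, right):
    """Convert left/right sample sequences to (mid, side) using L+R and L-R."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side
```

Wherever the two channels are identical the side signal is exactly zero, so the side channel carries only the stereo difference information.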

**Reliability gate (key design choice)**: a false-positive audit on 11 234 certified-authentic
FLACs showed the CNN is unreliable on sources that roll off below **~7 kHz** (genuinely
band-limited masters look like transcodes to it). Below that 95 % spectral-rolloff threshold
the model **abstains** (contributes 0) and lets the heuristic rules decide — faithful to the
"protect authentic files first" philosophy. The rolloff is computed from the same decode used
for the mel-spectrogram, so the gate is essentially free.

**Scoring**: adds a bounded boost on already-suspect files; it is tuned to *raise confidence*
on borderline cases far more than to catch fakes the heuristics miss outright. It cannot, by
itself, flip a clean file to FAKE.

> The full R&D story — the false-positive audit, four dead-ends, a debunked "AUC 0.99", and
> the mono→stereo breakthrough — is written up as a learning resource in
> [`ml/README.md`](https://github.com/Guillain-RDCDE/FLAC_Detective/blob/main/ml/README.md).

## Scoring System

### Additive Scoring

All rules contribute to a **total score** (0-150 points):

```
Total Score = Σ(all rule contributions)

Example calculation:
  Rule 1 (MP3 Spectral):      +50 pts
  Rule 2 (Cutoff):            +15 pts
  Rule 5 (VBR Protection):    -10 pts
  Rule 9 (Compression):       +7 pts
  ────────────────────────────────────
  Total:                      62 pts → SUSPICIOUS ⚠️
```

### Verdict Mapping

```
Score ≤ 30   → AUTHENTIC ✅      (no evidence of transcoding)
Score 31-54  → WARNING ❓        (borderline — manual review)
Score 55-85  → SUSPICIOUS ⚠️     (likely a transcode)
Score ≥ 86   → FAKE_CERTAIN ❌   (multiple strong indicators)
```

The thresholds live in `new_scoring/constants.py` (`SCORE_AUTHENTIC=30`,
`SCORE_WARNING=31`, `SCORE_SUSPICIOUS=55`, `SCORE_FAKE_CERTAIN=86`) and are the **single
source of truth** for the console, the text/JSON reports and the Python API — none of them
re-derive a verdict from a private cutoff.
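
A mapping built on those four constants might look like the sketch below. The constant names and values are the ones quoted above; the function `verdict_for` is hypothetical and is not claimed to match the code in `verdict.py`.

```python
SCORE_AUTHENTIC = 30
SCORE_WARNING = 31
SCORE_SUSPICIOUS = 55
SCORE_FAKE_CERTAIN = 86


def verdict_for(score):
    """Map a total score to its verdict band, checking the highest floor first."""
    if score >= SCORE_FAKE_CERTAIN:
        return "FAKE_CERTAIN"
    if score >= SCORE_SUSPICIOUS:
        return "SUSPICIOUS"
    if score >= SCORE_WARNING:
        return "WARNING"
    return "AUTHENTIC"
```

Applied to the additive example above, `verdict_for(sum([50, 15, -10, 7]))` evaluates the total of 62 and returns `"SUSPICIOUS"`.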

### Score Interpretation

**Philosophy**: Higher score = More evidence of transcoding

- **Positive contributions** (+points): Indicators of MP3 transcode
- **Negative contributions** (-points): Protection for authentic sources

**Thresholds explained**:
- **≤ 30**: All protection mechanisms considered, minimal suspicious indicators
- **31-54**: Some suspicious indicators but with protective factors
- **55-85**: Multiple strong indicators, few protective factors
- **≥ 86**: Overwhelming evidence, definitive fake

> **On "confidence".** Verdicts are *evidence levels*, not probabilities. A `FAKE_CERTAIN`
> means several independent indicators agree — in practice very reliable — but `AUTHENTIC`
> means *"no evidence of transcoding found"*, **not** a guarantee: high-bitrate AAC/Opus
> transcodes and genuinely band-limited masters can score low (measured specificity is
> ~80–87 %, see [`ml/README.md`](https://github.com/Guillain-RDCDE/FLAC_Detective/blob/main/ml/README.md)). For critical decisions, confirm with a
> visual tool such as Spek.

### Threshold Calibration

The bands aren't arbitrary — the SUSPICIOUS floor was **moved from 61 to 55 in v0.15.1**
after a score-distribution study. The study scored a large set of *known* MP3 transcodes
and found their scores cluster around a **median of ~58** — i.e. inside the old WARNING
band (31–60), so genuine fakes were being under-called as "borderline". Lowering the floor
to 55 reclaimed roughly **+5 percentage points** of transcodes as actionable SUSPICIOUS,
while authentic false positives stayed at ~1 %. The `FAKE_CERTAIN` floor (86) and the
AUTHENTIC ceiling (30) were left untouched. This is the concrete trade-off the
"protect authentic files first" philosophy makes: the boundary is placed where it catches
the most real fakes without pushing the authentic false-positive rate up.

## Spectral Analysis

### FFT (Fast Fourier Transform)

FLAC Detective uses FFT to analyze frequency content:

```python
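# Simplified FFT analysis
import numpy as np

def analyze_spectrum(audio_data, sample_rate):
    # Mix multi-channel audio (samples x channels) down to mono
    audio_data = np.asarray(audio_data, dtype=float)
    if audio_data.ndim == 2:
        audio_data = audio_data.mean(axis=1)

    # Compute the real FFT and the frequency of each bin
    fft_result = np.fft.rfft(audio_data)
    magnitude = np.abs(fft_result)
    frequencies = np.fft.rfftfreq(len(audio_data), 1/sample_rate)

    # Cutoff = highest frequency whose magnitude exceeds 1% of the peak
    threshold = 0.01 * np.max(magnitude)
    cutoff_indices = np.where(magnitude > threshold)[0]
    # Silence has no bin above the threshold: report 0 Hz instead of failing
    cutoff_freq = frequencies[cutoff_indices[-1]] if cutoff_indices.size else 0.0

    return cutoff_freq, magnitude, frequencies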
```

### Cutoff Detection Methods

#### Method 1: Slice-Based (Primary)

Detects sharp magnitude drops:

```
Magnitude
    │
100%│████████████████
    │████████████████
 50%│████████████████
    │████████████████
  1%│████████████████ ← Sharp drop here
  0%│
    └────────────────────→ Frequency
           ↑
      Cutoff point (MP3 signature)
```

#### Method 2: Energy-Based (Fallback)

Finds 90% cumulative energy point:

```
Cumulative Energy
    │
100%│          ┌─────
    │         /
 90%│        / ← 90% threshold
    │       /
 50%│      /
    │     /
  0%│────/
    └────────────────→ Frequency
           ↑
    90% energy point
```
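
The 90% point can be found with a running sum over the spectrum. The sketch below is written in plain Python for clarity and assumes that "energy" means squared magnitude; the function name is hypothetical, and with NumPy the same idea is typically expressed with `np.cumsum` and `np.searchsorted`.

```python
from itertools import accumulate


def energy_cutoff(magnitudes, frequencies, fraction=0.90):
    """Return the lowest frequency at which cumulative energy reaches `fraction`."""
    energies = [m * m for m in magnitudes]
    total = sum(energies)
    if total == 0:
        return None  # silence: no meaningful cutoff
    target = fraction * total
    for freq, running in zip(frequencies, accumulate(energies)):
        if running >= target:
            return freq
    return frequencies[-1]  # guards against floating-point shortfall
```

For magnitudes `[3, 1, 1, 1]` at 0, 100, 200 and 300 Hz, the energies are 9, 1, 1, 1 (total 12), the target is 10.8, and the running sum first reaches it at 200 Hz.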

## Performance Optimizations

### 1. Intelligent Caching

```python
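# Audio cache system (simplified; load_audio is the project's loader)
from collections import OrderedDict

class AudioCache:
    def __init__(self, max_size=100):
        self.cache = OrderedDict()  # (filepath, duration) → audio_data
        self.max_size = max_size

    def get_or_load(self, filepath, duration):
        key = (filepath, duration)  # different durations are different entries
        if key in self.cache:
            self.cache.move_to_end(key)  # Cache hit: mark as recently used
            return self.cache[key]

        # Load and cache, evicting the least recently used entry when full
        audio = load_audio(filepath, duration)
        self.cache[key] = audio
        if len(self.cache) > self.max_size:
            self.cache.popitem(last=False)
        return audio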
```

**Impact**: 80% faster on repeated analyses

### 2. Sample Duration Optimization

Default: 30 seconds (balance of speed vs accuracy)

```
Duration    Accuracy    Speed
15s         85%         Fast
30s         95%         Balanced ← Default
60s         98%         Slow
```

### 3. Parallel Processing

Multiple files can be analyzed in parallel:

```python
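from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":  # required where worker processes are spawned (Windows, macOS)
    with ProcessPoolExecutor(max_workers=4) as executor:
        # analyze_file must be a picklable, module-level function
        results = list(executor.map(analyze_file, flac_files))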
```

### 4. FFT Optimization

- Use `np.fft.rfft` (real FFT) instead of full FFT
- Downsample when appropriate
- Vectorized operations

## Technical Limitations

### What FLAC Detective Can Do

- ✅ Detect MP3-to-lossless transcodes (CBR and VBR)
- ✅ Analyze FLAC, WAV (v0.15), ALAC and APE (v0.16, via ffmpeg) sources
- ✅ Identify fake high-resolution files
- ✅ Protect vinyl and cassette sources
- ✅ Detect compression artifacts
- ✅ Handle corrupted files (with repair)

### What It Cannot Do

- ❌ **Detect other lossy formats** (AAC, OGG, WMA → FLAC)
- ❌ **Guarantee 100% accuracy** (see [Accuracy](#accuracy))
- ❌ **Real-time processing** (designed for batch analysis)
- ❌ **Analyze lossless formats beyond FLAC/WAV/ALAC/APE** (e.g. WavPack, TAK, which are not yet decoded)
- ❌ **Subjective quality assessment** (only transcode detection)

### Accuracy

Based on testing with diverse audio samples:

```
True Authentic Files:
  Correctly identified: 95.2%
  False positives: 4.8%

True Transcoded Files:
  Correctly identified: 97.8%
  False negatives: 2.2%

Overall Accuracy: 96.5%
```

**False positive causes**:
- Aggressive mastering or limiting
- Unusual frequency content (e.g., sine wave tests)
- Rare analog sources not covered by protection rules

**False negative causes**:
- Very high-quality MP3 320 kbps VBR
- MP3s with unusual encoding settings
- Heavily processed audio (e.g., extreme normalization)

### Edge Cases

**1. MP3 320 kbps VBR**
- May pass as AUTHENTIC due to Rule 6 protection
- Intentional: prioritize avoiding false positives

**2. Vinyl rips**
- Protected by Rule 7
- Should score AUTHENTIC despite frequency limitations

**3. Streaming sources**
- May have legitimate frequency cutoffs (platform processing)
- May trigger WARNING (manual review recommended)

**4. Remastered albums**
- Heavy processing can create unusual patterns
- Use multiple tools for confirmation

## Algorithm Pseudocode

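Simplified detection algorithm (the gates, short-circuits and optional Rule 12 described under [Detection Rules](#detection-rules) are omitted):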

```
function analyze_flac(filepath):
    # Step 1: Load metadata
    metadata = read_metadata(filepath)
    sample_rate = metadata.sample_rate
    bit_depth = metadata.bit_depth

    # Step 2: Load audio
    audio = load_audio(filepath, duration=30.0)

    # Step 3: Spectral analysis
    fft_result = compute_fft(audio)
    cutoff_freq = detect_cutoff(fft_result, sample_rate)
    energy_dist = compute_energy_distribution(fft_result)

    # Step 4: Apply rules
    score = 0
    score += rule_01(cutoff_freq, sample_rate)     # MP3 signature
    score += rule_02(cutoff_freq, sample_rate)     # Cutoff vs Nyquist
    score += rule_03(metadata, energy_dist)        # Bitrate mismatch
    score += rule_04(bit_depth, cutoff_freq)       # Suspicious 24-bit
    score += rule_05(audio, sample_rate)           # VBR protection
    score += rule_06(metadata)                     # High quality
    score += rule_07(audio)                        # Vinyl/silence
    score += rule_08(cutoff_freq, sample_rate)     # Nyquist exception
    score += rule_09(audio, fft_result)            # Compression artifacts
    score += rule_10(filepath, sample_rate)        # Multi-segment
    score += rule_11(audio)                        # Cassette

    # Step 5: Determine verdict
    if score <= 30:
        verdict = "AUTHENTIC"
    elif score <= 54:
        verdict = "WARNING"
    elif score <= 85:
        verdict = "SUSPICIOUS"
    else:
        verdict = "FAKE_CERTAIN"

    return {score, verdict, reasons}
```

## Further Reading

- **User documentation**: [User Guide](user-guide.md)
- **Python API**: [API Reference](api-reference.md)
- **Development**: [Contributing](https://github.com/Guillain-RDCDE/FLAC_Detective/blob/main/.github/CONTRIBUTING.md)
- **Quick start**: [Getting Started](getting-started.md)

---

For technical questions, visit [GitHub Discussions](https://github.com/Guillain-RDCDE/FLAC_Detective/discussions).