Audio File Analysis With Sox

Wednesday, February 7, 2024

Sox is a cool program, a “Swiss Army knife of sound processing,” and a useful tool for checking audio files that belongs in anyone’s audio processing workflow. I thought it might be useful for detecting improperly encoded audio files, or files that have decayed due to bit rot, cosmic rays, or other acoustic calamities, and it is.

Sox has two statistical output command line options, “stat” and “stats,” which output different but useful data. What’s useful about sox here is that, unlike some metadata checking programs (such as the very useful MP3Diags-unstable), it actually decodes the file and computes stats from the audio data itself. This takes some time, about 0.7 sec for a typical (5 min) audio file. That may seem fast, and it is certainly way faster than real time, but if you want to process 22,000 files, it will take 4-5 hours.
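As a back-of-the-envelope check on that estimate, total run time for a serial pass is just the file count times the per-file decode time. A minimal sketch (the function name batch_hours is made up for illustration, it is not part of sox):

```shell
#!/bin/bash
# batch_hours FILES SECS_PER_FILE: estimated hours for a serial batch run.
# Hypothetical helper for illustration only.
batch_hours() {
    local files="$1" secs_per_file="$2"
    awk -v n="$files" -v s="$secs_per_file" 'BEGIN { printf "%.1f\n", n * s / 3600 }'
}

batch_hours 22000 0.7
```

Swapping in a measured per-file time from a small test run gives a quick sanity check before committing to the full collection.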

Some of the calculated values seem to mean something obvious: “Flat factor,” for example, is related to the maximum number of identical samples in a row, which would make the waveform “flat.” But the computation isn’t linear and there is a maximum value (>30 is a bad sign, usually).

So I wrote a little program to parse out the results and generate a csv file of all of the results in tabular form for analysis in LibreOffice Calc. I focused on a few variables I thought might be indicative of problems, rather than all of them:

DC offset—which you’d hope was always close to zero.
Min-Max level difference—min and max should be close to symmetric and usually are, but not always.
RMS pk dB—which is normally set for -3 or -6 dB, but shouldn’t peak at nearly silent, -35 dB.
Flat factor—which is most often 0, but frequently not.
Pk count—the number of samples at peak, which is most often 2
Length s—the length of the file in seconds, which might indicate a play problem
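Because sox prints one labelled row per statistic, each of these values can be pulled out by its label. A minimal sketch, assuming the stereo “stats” layout (label, then Overall, Left, and Right columns); the function name stat_value is hypothetical and the sample numbers are made up to show the shape of the output, they are not measurements from this data set:

```shell
#!/bin/bash
# stat_value LABEL: read sox "stats" text on stdin and print the first
# value after LABEL (the "Overall" column when there are several channels).
stat_value() {
    awk -v label="$1" '
        index($0, label) == 1 {
            rest = substr($0, length(label) + 1)
            split(rest, f, " ")
            print f[1]
            exit
        }'
}

# Illustrative sample in the shape sox prints (values are made up).
sample='             Overall     Left      Right
DC offset   0.000803 -0.000391  0.000803
Min level  -0.750977 -0.750977 -0.653412
Max level   0.708801  0.708801  0.653534
RMS Pk dB     -13.82    -13.82    -14.38
Flat factor     0.00      0.00      0.00
Pk count           2         2         2
Length s      61.776'

stat_value "DC offset" <<< "$sample"
stat_value "RMS Pk dB" <<< "$sample"
stat_value "Length s"  <<< "$sample"
```

With a real file the same function could be fed from sox directly, for example sox file.mp3 -n stats 2>&1 | stat_value "Pk count", since sox writes its statistics to stderr.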

After processing 22,000 files, I gathered some statistics on what is “normal” (ish, for this set of files), which may be of some use in interpreting sox results. The source code for my little bash script is at the bottom of the post.

DC Bias

DC Bias really should be very close to zero, and most files are fairly close to zero, but some in the sample had a bias greater than 0.1, which even so has no perceptible audio impact.
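For intuition, DC offset is the mean of the sample values, so a waveform that sits symmetrically around zero averages out to zero and one that is shifted up or down does not. A toy sketch with made-up samples (the helper name dc_offset_of is hypothetical):

```shell
#!/bin/bash
# dc_offset_of: average the sample values given one per line on stdin.
dc_offset_of() {
    awk '{ sum += $1; n++ } END { if (n) printf "%.4f\n", sum / n }'
}

printf '%s\n' 0.5 -0.5 0.25 -0.25 | dc_offset_of   # symmetric around zero
printf '%s\n' 0.6 -0.4 0.35 -0.15 | dc_offset_of   # same shape shifted up by 0.1
```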

Min Level – Max Level

Min level is most often normalized to -1 and max level most often normalized to +1, which would yield a difference of 2, or a difference of absolute values of 0 (as measured); this is the most common result (31.13%). A few files, 0.05% or so, have a difference greater than 0.34, which is likely to be a problem and is worth a listen.
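The “difference of absolute values” used here can be written out as | |max| - |min| |. A small sketch of that arithmetic, with a hypothetical helper name and made-up input levels:

```shell
#!/bin/bash
# min_max_delta MIN MAX: absolute difference between |MAX| and |MIN|.
min_max_delta() {
    awk -v lo="$1" -v hi="$2" 'BEGIN {
        a = (lo < 0) ? -lo : lo      # |min|
        b = (hi < 0) ? -hi : hi      # |max|
        d = b - a
        if (d < 0) d = -d
        printf "%.2f\n", d
    }'
}

min_max_delta -1 1          # symmetric, fully normalized file
min_max_delta -0.50 0.85    # lopsided file
```

The first call prints 0.00 and the second 0.35, which would land just over the 0.34 cutoff mentioned above.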

RMS pk dB

Peak dB is a pretty important parameter to optimize as an audio engineer, and common settings are -6 dB and -3 dB for various types of music. However, if a set of files is set as a group, individual files can be quite a bit lower or, sometimes, a bit higher. Some types of music, psychobilly for example, might be set even a little over -3 dB. A file much above -3 dB might have sound quality problems or might be corrupted to be just noise; 0.05% of files have a peak dB over -2.2 dB. A file with peak amplitudes much below -30 dB may be silent and certainly will be molto pianissimo; 0.05% of files have a peak dB below -31.2 dB.

A very quiet sample, with a Pk dB of -31.58, would likely have a lot of quantization noise due to the entire program using only about 10% of the total headroom.
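Decibel figures are easier to judge once converted to a fraction of full scale. For amplitude the usual relation is ratio = 10^(dB/20); a minimal sketch of that conversion (db_to_ratio is a made-up helper name):

```shell
#!/bin/bash
# db_to_ratio DB: convert a level in dB relative to full scale into a
# linear amplitude ratio, using ratio = 10^(DB/20).
db_to_ratio() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(db / 20 * log(10)) }'
}

db_to_ratio -6
db_to_ratio -20
db_to_ratio -31.58
```

Note that this converts whatever dB figure it is given; an RMS level and a sample peak for the same file will give different fractions.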

Flat factor

Flat factor is a complicated measure, but is roughly (not exactly) the maximum number of consecutive identical samples. @AkselA offered a useful one-liner (sox -n -p synth 10 square 1 norm -3 | sox - -n stats) to verify that it is not simply a run of identical values; just what it actually is isn’t that well documented. Whatever it is exactly, 0 is the right answer and 68% of files get it right. Only 0.05% of files have a flat factor greater than 27.

Pk count

Peak count is a good way to measure clipping. Only 0.16% of files have a pk count > 1,000, and the most common value, at 65.5%, is 2, meaning most files are normalized to peak at 100%… exactly twice (log scale chart, the peak is at 2).
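One practical wrinkle when comparing peak counts: sox abbreviates large counts with a k or M suffix, so they need expanding to plain integers before any numeric comparison. A minimal sketch, assuming only those two suffixes appear (expand_count is a hypothetical helper name):

```shell
#!/bin/bash
# expand_count VALUE: turn an abbreviated count such as 306k or 1.23M
# into a plain integer; unsuffixed numbers pass through unchanged.
expand_count() {
    case "$1" in
        *k) awk -v n="${1%k}" 'BEGIN { printf "%.0f\n", n * 1000 }' ;;
        *M) awk -v n="${1%M}" 'BEGIN { printf "%.0f\n", n * 1000000 }' ;;
        *)  printf '%s\n' "$1" ;;
    esac
}

expand_count 2
expand_count 306k
expand_count 1.23M
```

Multiplying rather than appending zeros keeps values with a decimal point, such as 1.23M, from being misread.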

As an example, a file with levels set to -2.31 and a flat factor of only 14.31 but with a Pk count of 306,000 looks like this in Audacity with “Show Clipping” on, and yet sounds kinda like you’d think it is supposed to. Go figure.

Statistics

What’s life without statistics? Sample population: 22,096 files. Run time: 205 minutes, or 0.56 seconds per file.

Stats	DC bias	min amp	max amp	min-max	avg pk dB	flat factor	pk count	length s
Mode	0.000015	-1	1	0	-10.05	0.00	2	160
Count at Mode	473	7,604	7,630	6,879	39	14,940	14,472	14
% at mode	2.14%	34.41%	34.53%	31.13%	0.18%	67.61%	65.50%	0.06%
Average	0.00105	-0.80	0.80	0.03	-10.70	2.03	288.51	226.61
Min	0	-1	0.0480	0	-34.61	0	1	4.44
Max	0.12523	-0.0478	1	0.497	-1.25	129.15	306,000	7,176
Threshold	0.1	-0.085	0.085	0.25	-2.2	27	1,000	1,200
Count @ Thld	3	11	10	68	12	12	35	45
% @ Thld	0.01%	0.05%	0.05%	0.31%	0.05%	0.05%	0.16%	0.20%
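The mode and threshold rows above were worked out in a spreadsheet, but the same kind of summary can be sketched on the command line. A minimal sketch, assuming a tab-separated file with one row per audio file; the helper names and the sample rows are invented for illustration and are not taken from the real data:

```shell
#!/bin/bash
# column_mode COL: print the most common value in tab-separated column COL
# and the percentage of rows that share it.
column_mode() {
    awk -F '\t' -v col="$1" '
        { count[$col]++; total++ }
        END {
            for (v in count) if (count[v] > best) { best = count[v]; mode = v }
            if (total) printf "%s\t%.1f%%\n", mode, 100 * best / total
        }'
}

# pct_over COL THRESHOLD: percentage of rows whose COL exceeds THRESHOLD.
pct_over() {
    awk -F '\t' -v col="$1" -v t="$2" '
        ($col + 0) > (t + 0) { over++ }
        { total++ }
        END { if (total) printf "%.1f%%\n", 100 * over / total }'
}

# Made-up rows: file name, flat factor, pk count.
rows=$(printf '%s\t%s\t%s\n' a.mp3 0.00 2  b.mp3 0.00 2  c.mp3 14.31 306000  d.mp3 0.00 17)

column_mode 3 <<< "$rows"
pct_over 3 1000 <<< "$rows"
```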

Bash Script

#!/bin/bash

###############################################################
# This program uses sox to analyze an audio file for some
# common indicators that the actual file data may have issues
# such as corruption or have been badly prepared or modified
# It takes a file path as an input and outputs to stdout the results
# of tests if that file exceeds the threshold values set below
# or, if the last conditional is commented out, all files.
# a typical invocation might be something like:
# find . -depth -type f -name "*.mp3" -exec soxverify.sh {} \; > stats.csv
# The field positions parsed below assume stereo output from sox, so
# the code does not handle mono or multi-channel files and will
# throw an error. If sox can't read the file it will throw an error
# to the csv file. Flagged files probably warrant a sound check.

##############################################
### Set reasonable threshold values ##########
# DC offset should be close to zero, but is almost never exactly
# The program uses the absolute value of DC offset (which can be
# neg or positive) as a test and is normalized to 1.0
# If the value is high, total fidelity might be improved by
# using audacity to remove the bias and recompressing.
# files that exceed the dc_offset_threshold will be output with
# Error Code "O"
dc_offset_threshold=0.1

# Most files have fairly symmetric min_level and max_level
# values.  If the min and max aren't symmetric, there may
# be something wrong, so we compute and test. 99.95% of files have
# a delta below 0.34, files with a min_max_delta above 
# min_max_delta_threshold will be flagged EC "D"
min_max_delta_threshold=0.34

# Average peak dB is a standard target for normalization and
# replay gain is commonly used to adjust files or albums that weren't
# normalized to hit that value. 99.95% of files have a
# RMS_pk_dB of < -2.2, higher than that is weird, check the sound.
# Exceeding this threshold generates EC "H"
RMS_pk_dB_threshold=-2.2

# Extremely quiet files might also be indicative of a problem
# though some are simply molto pianissimo. 99.95% of files have
# a minimum RMS_pk_dB > -31.2 . Files with a RMS pk dB < 
# RMS_min_dB_threshold will be flagged with EC "Q"
RMS_min_dB_threshold=-31.2

# Flat_factor is a non-linear measure of sequential samples at the
# same level. 68% of files have a flat factor of 0, but this could
# be intentional for a track with moments of absolute silence
# 99.95% of files have a flat factor < 27. Exceeding this threshold
# generates EC "F"
flat_factor_threshold=27

# peak_count is the number of samples at maximum volume and any value > 2
# is a strong indicator of clipping. 65% of files are mixed so that 2 samples
# peak at max. However, a lot of "loud" music is engineered to clip
# 8% of files have >100 "clipped" samples and 0.16% > 10,000 samples
# In the data set, 0.16% > 1000 samples. Exceeding this threshold
# generates EC "C"
pk_count_threshold=1000

# Zero length (in seconds) or extremely long files may be, depending on
# one's data set, indicative of some error. A file that plays back
# in less time than length_s_threshold will generate EC "S"
# file playing back longer than length_l_threshold: EC "L"
length_s_threshold=4
length_l_threshold=1200



# Check if a file path is provided as an argument
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <audio_file_path>"
    exit 1
fi

audio_file="$1"

# Check if the file exists
if [ ! -f "$audio_file" ]; then
    echo "Error: File not found - $audio_file"
    exit 1
fi

# Run sox with the stats effect, replace newlines with spaces, and capture the output
sox_stats=$(sox "$audio_file" --replay-gain off -n stats 2>&1 | tr '\n' ' ' )

# clean up the output: collapse runs of spaces, then strip the leading space
sox_stats=$(  sed 's/[ ]\+/ /g' <<< "$sox_stats" )
sox_stats=$(  sed 's/^ //g' <<< "$sox_stats" )


# Check if the output contains "Overall" as a substring
if [[ ! "$sox_stats" =~ Overall ]]; then
    echo "Error: Unexpected output from sox: $1"
    echo "$sox_stats"
    echo ""
    exit 1
fi


# Extract and set variables
dc_offset=$(echo "$sox_stats" | cut -d ' ' -f 6)
min_level=$(echo "$sox_stats" | cut -d ' ' -f 11)
max_level=$(echo "$sox_stats" | cut -d ' ' -f 16)
RMS_pk_dB=$(echo "$sox_stats" | cut -d ' ' -f 34)
flat_factor=$(echo "$sox_stats" | cut -d ' ' -f 50)
pk_count=$(echo "$sox_stats" | cut -d ' ' -f 55)
length_s=$(echo "$sox_stats" | cut -d ' ' -f 67)

# convert DC offset to absolute value
dc_offset=$(echo "$dc_offset" | tr -d '-')

# convert min and max_level to absolute values:
abs_min_lev=$(echo "$min_level" | tr -d '-')
abs_max_lev=$(echo "$max_level" | tr -d '-')

# compute delta and convert to abs value
min_max_delta_int=$(echo "$abs_max_lev - $abs_min_lev" | bc -l)
min_max_delta=$(echo "$min_max_delta_int" | tr -d '-')

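# parse pk_count: sox abbreviates large counts (e.g. 306k, 1.23M), so
# scale by the suffix rather than appending zeros, which would misread
# values that contain a decimal point
case "$pk_count" in
    *k) pk_count=$(awk -v n="${pk_count%k}" 'BEGIN { printf "%.0f", n * 1000 }') ;;
    *M) pk_count=$(awk -v n="${pk_count%M}" 'BEGIN { printf "%.0f", n * 1000000 }') ;;
esac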


# Compare values against thresholds
threshold_failed=false
err_code="ERR: "

# Offset bad check
if (( $(echo "$dc_offset > $dc_offset_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="O"
fi

# Large delta check
if (( $(echo "$min_max_delta >= $min_max_delta_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="D"
fi

# Mix set too high check
if (( $(echo "$RMS_pk_dB > $RMS_pk_dB_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="H"
fi

# Very quiet file check
if (( $(echo "$RMS_pk_dB < $RMS_min_dB_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="Q"
fi

# Flat factor check
if (( $(echo "$flat_factor > $flat_factor_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="F"
fi

# Clipping check - peak is max and many samples are at peak
if (( $(echo "$max_level >= 1" | bc -l) )); then
    if (( $(echo "$pk_count > $pk_count_threshold" | bc -l) )); then
        threshold_failed=true
        err_code+="C"
    fi
fi

# Short file check
if (( $(echo "$length_s < $length_s_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="S"
fi

# Long file check
if (( $(echo "$length_s > $length_l_threshold" | bc -l) )); then
    threshold_failed=true
    err_code+="L"
fi

# for data collection purposes, comment out the conditional and the values
# for all found files will be output.
if [ "$threshold_failed" = true ]; then
    # printf keeps the fields cleanly tab-separated, with no stray spaces around the tabs
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$1" "$err_code" "$dc_offset" "$min_level" "$max_level" "$min_max_delta" "$RMS_pk_dB" "$flat_factor" "$pk_count" "$length_s"
fi
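
When reading the flagged rows later, the error-code letters are compact but easy to forget. A minimal sketch of a decoder for the code string the script emits; the function name describe_codes and the wording of each description are my own shorthand for the comments in the script:

```shell
#!/bin/bash
# describe_codes "ERR: OC": expand each flag letter into a short description.
describe_codes() {
    local codes="${1#ERR: }" out="" i letter text
    for (( i = 0; i < ${#codes}; i++ )); do
        letter="${codes:i:1}"
        case "$letter" in
            O) text="DC offset above threshold" ;;
            D) text="min/max levels asymmetric" ;;
            H) text="RMS pk dB unusually high" ;;
            Q) text="RMS pk dB unusually low (very quiet)" ;;
            F) text="flat factor above threshold" ;;
            C) text="many samples at peak (possible clipping)" ;;
            S) text="shorter than length_s_threshold" ;;
            L) text="longer than length_l_threshold" ;;
            *) text="unknown code $letter" ;;
        esac
        out+="${out:+; }$text"
    done
    printf '%s\n' "$out"
}

describe_codes "ERR: OC"
```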
