Smol bash script for finding oversize media files

Friday, September 2, 2022 

Sometimes you want to know whether you have media files that are taking up more than their fair share of space. Maybe you compressed a file some time ago in an old, inefficient format, or you just need to archive the oversize stuff; either way, this can help you find 'em. It differs from plain file size detection in that it uses mediainfo to determine the media file's duration and wc -c to get its size, and from those computes the overall effective data rate. All math is done with bc, which is usually installed. Files are found recursively from the starting point (passed as the first argument) using find.
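
To make the arithmetic concrete, here is a minimal sketch of the size-over-duration calculation. It uses bash integer arithmetic in place of bc, so the results are truncated instead of carrying a decimal place, and the function name effective_rate and the sample numbers are made up for illustration. The script below divides kibibytes by seconds, so the figure it labels kbps is in kilobytes per second; the second number printed here is the same rate in kilobits per second.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the script): print the effective data rate
# as "<KiB per second> <kilobits per second>", both truncated to integers.
# $1 = size in bytes, $2 = duration in milliseconds.
effective_rate() {
    local size_bytes=$1 duration_ms=$2
    # bytes/1024 is KiB and ms/1000 is seconds; multiply first to limit truncation
    local kib_per_s=$(( size_bytes * 1000 / (1024 * duration_ms) ))
    # bytes*8 is bits, and bits per millisecond equals kilobits per second
    local kbit_per_s=$(( size_bytes * 8 / duration_ms ))
    printf '%s %s\n' "$kib_per_s" "$kbit_per_s"
}

# A 100 MiB file that runs for 100 seconds
effective_rate 104857600 100000
```

For that sample the sketch prints 1024 8388, that is, 1024 KiB/s or roughly 8.4 Mbit/s.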

Basic usage would be:

./find-high-rate-media.sh /search/path/tostart/ [min rate] [min size]

The script then reports media with a rate higher than the minimum and a size larger than the minimum as a tab-delimited list of file names, calculated rates, and calculated sizes. Redirecting the output to a file, such as output.csv, makes it easy to sort and otherwise manipulate in LibreOffice Calc.
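
If you would rather stay in the terminal than open a spreadsheet, the same tab-delimited report can be sorted with sort. This is only a sketch: the function name sort_by_rate is hypothetical, and it assumes the first line of the input is the header and the second tab-separated column is the rate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort the report on stdin by the rate column (field 2),
# highest first, keeping the header line on top.
sort_by_rate() {
    local header
    IFS= read -r header          # pull off the header line
    printf '%s\n' "$header"
    # -t sets tab as the separator, -k2,2nr sorts field 2 numerically, descending
    LC_ALL=C sort -t $'\t' -k2,2nr
}

# Demo with inline sample data; in practice: sort_by_rate < output.csv
printf 'file path\trate kbps\tsize MB\na.mkv\t460.1\t2858.9\nb.mkv\t1166.0\t5802.6\n' | sort_by_rate
```

Changing -k2,2nr to -k3,3nr would sort by size instead.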

Save the file under a name you like (such as find-high-rate-media.sh), run chmod +x find-high-rate-media.sh, and off you go.

The code (also available here):

#!/usr/bin/bash

# check arguments passed and set defaults if needed
# No argument given?
if [ -z "$1" ]; then
  printf "\nUsage:\n\n  pass a starting point and min data rate in kbps and min size like /media/gessel/datas/Downloads/ 100 10 \n\n" 
  exit 1
fi

# status messages go to stderr so they stay out of redirected output
if [ -z "$2" ]; then
  printf "\n  returning files with data rate greater than default minimum of 100 kbps  \n\n" >&2
  maxr=100
else
  maxr=$2
  printf "\n  returning files with data rate greater than %s kbps  \n\n" "$maxr" >&2
fi

if [ -z "$3" ]; then
  printf "\n  returning files with file size greater than default minimum of 10 MB  \n\n" >&2
  maxs=10
else
  maxs=$3
  printf "\n  returning files with file size greater than %s MB  \n\n" "$maxs" >&2
fi

# multipliers to get to human readable values
msec="1000"
kilo="1024"

echo -e "file path \t rate kbps \t size MB"

# search for files with the extensions enumerated below
# edit this list to your needs (e.g. -iname \*.mp3 or whatever)
# the -o means "or", -iname (vs -name) means case insensitive so
# it will find .MKV and .mkv.
# then check for each file found whether the data rate is
# above the min rate of concern and whether the file's size is
# above the min size of concern, and if so, print the result
 
find "$1" -type f \( -iname \*.avi -o -iname \*.mkv -o -iname \*.mp4 -o -iname \*.wmv \) -print0 | while IFS= read -r -d '' file
do
    size="$(wc -c  "$file" |  awk '{print $1}')"
    duration="$(mediainfo --Inform="Video;%Duration%" "$file")"
    # no video duration reported: skip the file rather than divide by nothing
    [ -z "$duration" ] && continue
    seconds=$(bc -l <<<"${duration}/${msec}")
    sizek=$(bc -l <<<"scale=1; ${size}/${kilo}")
    sizem=$(bc -l <<<"scale=1; ${sizek}/${kilo}")
    # kilobytes divided by seconds: strictly KiB/s, reported under the kbps label
    rate=$(bc -l <<<"scale=1; ${sizek}/${seconds}")
    if (( $(bc  <<<"$rate > $maxr") )) && (( $(bc  <<<"$sizem > $maxs") )); then
        # quote the values so file names with spaces or wildcards print intact
        printf '%s \t %s \t %s\n' "$file" "$rate" "$sizem"
    fi
done

Results might look like

file path 	 rate kbps 	 size MB
/media/my kitties playing.mkv 	 1166.0 	 5802.6
/media/cats jumping.mkv 	 460.1 	 2858.9
/media/fuzzy kitties.AVI 	 1092.7 	 7422.0

Another common task is renaming video files with some key stats on the contents so they’re easier to find and compare. Linux file managers have limited integration with media information (Dolphin is somewhat capable, but Thunar not so much). This little script also leans on the mediainfo command line, appending the following to the file names of media files found recursively below a starting directory path:

  • Width x height in pixels (1920x1080)
  • Runtime in HH-MM-SS.msec (02-38-15.111); colons aren’t a good thing in file names, and yes, the result looks confusingly like a date
  • Codec name (AVC)
  • Data rate (1323kbps)

For example

kittyplay.mp4 -> kittyplay_1280x682_02-38-15.111_AVC_154.3kbps.mp4
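
The name is assembled with plain bash parameter expansion. A minimal sketch of just that step, with the stats passed in as arguments instead of read from mediainfo (the function name tagged_name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the tagged file name from already-known stats.
# Arguments: file, width, height, runtime (HH:MM:SS.msec), codec, rate
tagged_name() {
    local file=$1 width=$2 height=$3 runtime=$4 codec=$5 rate=$6
    local base="${file%.*}"      # everything before the last dot
    local ext="${file##*.}"      # everything after the last dot
    runtime="${runtime//:/-}"    # replace every colon with a dash
    printf '%s_%sx%s_%s_%s_%skbps.%s\n' \
        "$base" "$width" "$height" "$runtime" "$codec" "$rate" "$ext"
}

tagged_name "kittyplay.mp4" 1280 682 "02:38:15.111" AVC 154.3
```

This prints kittyplay_1280x682_02-38-15.111_AVC_154.3kbps.mp4. The split on the last dot assumes the file has an extension, which holds here because find only matches names ending in one of the listed extensions.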

The code is also available here.

#!/usr/bin/bash
PATH="/home/gessel/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

############################# USE #######################################################
# find_media.sh /starting/path/ (quote path names with spaces)
########################################################################################

# No argument given?
if [ -z "$1" ]; then
  printf "\nUsage:\n  pass a starting point like \"/Downloads/Media files/\" \n" 
  exit 1
fi

msec="1000"
kilo="1024"
s="_"
x="x"
kbps="kbps"
dot="."

find "$1" -type f \( -iname \*.avi -o -iname \*.mkv -o -iname \*.mp4 -o -iname \*.wmv \) -print0 | while IFS= read -r -d '' file
do
  if [[ -f "$file" ]]; then
    size="$(wc -c  "$file" |  awk '{print $1}')"
    duration="$(mediainfo --Inform="Video;%Duration%" "$file")"
    # no video duration reported: leave the file alone rather than divide by nothing
    [[ -z "$duration" ]] && continue
    seconds=$(bc -l <<<"${duration}/${msec}")
    sizek=$(bc -l <<<"scale=1; ${size}/${kilo}")
    rate=$(bc -l <<<"scale=1; ${sizek}/${seconds}")
    codec="$(mediainfo --Inform="Video;%Format%" "$file")"
    rtime="$(mediainfo --Inform="General;%Duration/String3%" "$file")"
    # swap colons for dashes so the runtime is safe in a file name
    runtime="${rtime//:/-}"
    width="$(mediainfo --Inform="Video;%Width%" "$file")"
    height="$(mediainfo --Inform="Video;%Height%" "$file")"
    # split the path at the last dot into name and extension
    fname="${file%.*}"
    ext="${file##*.}"
    # plain mv: wrapping it in $( ) would try to execute mv's output as a command
    mv "$file" "$fname$s$width$x$height$s$runtime$s$codec$s$rate$kbps$dot$ext"
  fi
done

If you don’t have mediainfo installed, on a Debian-based system you can get it with:

sudo apt update
sudo apt install mediainfo
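
Both scripts assume mediainfo, bc, find, wc, and awk are on the PATH. A small pre-flight check along these lines could be pasted near the top of either one; it is a sketch, and the function name missing_tools is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every listed command not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools mediainfo bc find wc awk)"
if [ -n "$missing" ]; then
    # join the names onto one line for the message
    printf 'Missing tools: %s\n' "${missing//$'\n'/ }" >&2
fi
```

In a real script you would probably follow the message with exit 1 so nothing runs with a tool missing.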
Posted at 10:18:58 GMT-0700

Category: Audio, HowTo, Linux, video
