Social media fundamentally changed strategic messaging, cutting the cost per effect by at least two orders of magnitude, probably more. It has become the most cost-effective munition in the global arsenal. Even when it took teams of actual humans to generate content and troll farms to flood social media with messaging intended to produce a desired outcome (swing an election, start a war, damage alliances, break treaties, or generate support for one policy or another, foreign or domestic), it was still a revolution in reduced-cost warfare.
Take Operation INFEKTION, the active measures campaign run by the KGB starting in about 1983 "to create a favorable opinion for us abroad that this disease (AIDS) is the result of secret experiments with a new type of biological weapon by the secret services of the USA and the Pentagon that spun out of control."
This campaign leveraged assets put in place as far back as 1962 and eventually consumed the authority of Prof. Jakob Segal as a self-referential authoritative citation. After a little more than a decade of relentless media placements of strategic messaging, even in the United States more than 25% of the population had been convinced AIDS was a government project and 12% had been manipulated into believing it was created and spread by the CIA. This project was tremendously successful despite having to overcome the then-standard and generally principled editorial gatekeeping that protected "traditional" media from abuse and cooptation; it did so by manufacturing plausible chains of authority and fabricating deep and broad reference chains to thwart fact-checking.
By the 2016 election, the KGB's successors, the IRA and GRU, efficiently and expertly leveraged social media to achieve even more impressive results, possibly winning the most significant military battle in history: altering the outcome of the US election at a cost of only a few billion dollars and within a mere 2-3 years of effort.
Any-to-any publishing circumvents editorial protections (he writes without a trace of irony). What might otherwise be a limitation of a psyop, that it sits clearly outside any authoritative endorsement (something that required the consumption of an asset like Jakob Segal to overcome in an earlier era), has been overwhelmingly diminished by a parallel effort to destroy trust in institutions and authority, creating a direct path to shape the beliefs of targets through mass individualization of messaging unchecked by any need for longitudinal reputation building.
The 2016 effort still cost billions, requiring a massive capacity build of English-speaking, internet-savvy teams inducted into "troll farms" (many ironically located in Bulgaria, given that country's role in Operation INFEKTION), but even that expense may already be obsolete just 8 years later.
As an example, let me introduce to you Dr. Alexander Greene, a person ChatGPT tells us “is a highly esteemed and celebrated professor with a remarkable career dedicated to advancing the fields of green energy and engineering.”
Add a few images from Bing/Dall-E and we can create a very convincing article that would easily pass muster as an authoritative discussion of the benefits of continuing to burn fossil fuels. With minimal editing and formatting, just cutting out the caveats that ChatGPT inserts in response to counterfactual text requests, we can have such pearls of wisdom to impart upon the world as:
Access to affordable and reliable energy is a crucial driver of economic development, and historically, fossil fuels have played a significant role in providing low-cost energy solutions. While there are concerns about the environmental impact of fossil fuels, particularly their contribution to climate change, it is essential to understand the benefits they have brought to the developing world and the potential consequences of increasing energy costs.
Read the whole synthetic article in pdf form below and consider the difficulty of finding a shared factual foundation in a world where it is trivial to synthesize plausible authority.
A seemingly common task would be to convert a talk or presentation given as a slide show into a playable video using a standards-compliant format like WebM, which plays in almost all HTML5-compliant browsers, that is, just about everything but Safari on iOS (because Apple are walled-garden asses), IE (because derp duh), and Opera Mini (because … well, maybe too much work).
These days, WebM supports the pretty fantastic encoder AV1, which is the new thing that is genuinely open source, and everyone should use it for everything for this reason. It's also absurdly slow to encode to, so maybe not everything, but it yields better results than VP9, which is also pretty decent. Compared to the H.26x encoders (263/4/5/6), my results put VP9 between H.264 and H.265, depending on content, and AV1 between H.265 and H.266, which is pretty solid: there aren't too many tradeoffs with H/x.265 for most applications and you get FOSS freedom. Hopefully there will be much better hardware encoder support soon, though I'm sure we're not gonna get that in iOS anytime soon. H.264 remains a pretty solid format for most uses as it has broad and well-optimized hardware encoders and decoders. H.265 seems pretty well entrenched as well, but H.266 has been slow to pick up and, very unscientifically, I see AV1 as likely to undermine H.266 adoption, though we're still a long way from the necessary hardware support in mobile/IoT devices to be confident of such an outcome yet.
I spent some time messing around with the process and testing various parameters. The fruit of that labor is as follows.
Export your slides into some useful format
The first step is exporting slides into a useful format like PNG. Assuming you have or can convert a presentation to PDF, then you can extract all the pages to individual files with pdfseparate, a command line tool that comes with poppler (which you’ll have because you run Inkscape, right? You should).
Set the width and height to meet your specific needs. Maybe 1920×1080? 1536×2160?
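One way to do this, as a rough sketch (the PDF name and the pixel dimensions are just examples; pdftoppm, also part of poppler, handles the rasterization):
# split the PDF into one file per page
pdfseparate slides.pdf slide-%02d.pdf
# render each single-page PDF to a PNG at the chosen resolution
for f in slide-*.pdf; do
pdftoppm -png -singlefile -scale-to-x 1920 -scale-to-y 1080 "$f" "${f%.pdf}"
done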
Extract or convert your audio file to wav
This tutorial uses whisper to convert the audio file to text, which also makes it easy(ier) to get slide timings to about a second from the .vtt file whisper generates. The first step is converting the audio file to .wav so whisper can eat it. I used ffmpeg (you need a recent version for AV1 support later, so depending on your distro you might need to build it) to do this.
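A minimal sketch of that conversion (the input file name is a placeholder for whatever your recording is; -vn drops the video stream):
ffmpeg -i talk_recording.mp4 -vn -ac 1 audio.wav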
Note that I am forcing mono audio with -ac 1 (and beware that out-of-phase audio can cancel). I would expect most talks not to rely on stereo effects, but YMMV. Once you have this file and you've installed whisper (surprisingly easy), it will convert your long audio ramblings into equally (in)coherent text (the accuracy is surprisingly good) with a simple:
whisper audio.wav
At this point you should have a folder full of .png slides, an audio file of your talk in .wav, and a variety of text files in various formats, including a .vtt file with time stamps in MM:SS.sss format alongside the text. Now it is time for the first tedious bit: you need the from-start HH:MM:SS.sss timing for each slide, plus the slide-to-slide duration in seconds, and it is helpful to have the from-start timing in seconds as well to verify key frame placement using ffprobe. Since Whisper's .vtt file is only integer-second accurate, this isn't going to be fractional-second accurate timing, but it is good enough for slide transitions in a talk.
You could put this in a spreadsheet to automate the conversions from the .vtt's format to the needed formats; I did it manually.
You have to convert all those time stamps into absolute seconds. I haven’t automated this yet, but it should be doable with a little regexp and bash or python if you have a lot of slides. If I do this with a talk that has enough slides to justify the effort, I’ll post a converter script.
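In the meantime, a rough illustration of the arithmetic in bash (a sketch, not the promised converter; the sample timestamp is hypothetical):
#!/bin/bash
# convert a "MM:SS.sss" or "HH:MM:SS.sss" timestamp to absolute seconds
to_seconds() {
local h=0 m=0 s a b c
IFS=: read -r a b c <<< "$1"
if [ -n "$c" ]; then h=$a; m=$b; s=$c; else m=$a; s=$b; fi
awk -v h="$h" -v m="$m" -v s="$s" 'BEGIN { printf "%.3f\n", h*3600 + m*60 + s }'
}
to_seconds "00:49.000"   # prints 49.000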
Convert the Stills to Video Clips
This part assumes no fancy transitions, just jump cuts, and that each slide is meant to be on-screen for some amount of time, the audio droning on over it and, perhaps, subtitles below it. This isn't high art, mind you, just a utilitarian conversion for web viewing. The first step is to determine the slide transition points, which you'll need in absolute time format and in seconds-per-slide format, data which should be pretty easy to pull from the speech-to-text .vtt file:
WEBVTT
00:00.000 --> 00:05.000
Okay, today's talk is going to be about quantum cryptography and quantum computing.
(snip)
00:46.000 --> 00:49.000
So let's get right into it.
Slide one's duration is 49 seconds and slide two should be shown at 00:00:49.000; collect this data for all the slides. Next, make a text file for each slide that includes the slide file name (the .png file created earlier) and the duration (seconds on screen, not the absolute time). The files should look like this:
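In ffmpeg's concat-demuxer syntax, a slide03.txt (file name hypothetical) for a slide held for 6 seconds would contain something like:
file 'slide03.png'
duration 6
file 'slide03.png'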
Slide 3 will be shown for 6 seconds. The file name appears twice around the duration because of a quirk in ffmpeg's concat demuxer. You should now have in your folder each slide as a .png file and, for each slide, a .txt file that describes its duration, in file-system-sortable order (slide01.txt, slide02.txt, etc.).
I put the video encoding command in a bash script to make it a bit easier:
#!/bin/bash
# AV1 compression single pass
for i in *.txt
do name=`echo "$i" | cut -d'.' -f1`
ffmpeg -hide_banner -f concat -i "$i" -y -vf fps=10 -c:v libsvtav1 -pix_fmt yuv420p10le -preset 3 -svtav1-params tune=0:color-range=1:keyint=60000:scm=1 -b:v 0 -crf 40 -an "${name}.webm"
done
What the settings mean:
-vf fps=10 set the output video frame rate to 10fps. Why? Because timing works out. 5 might work too.
-c:v libsvtav1 use the Intel SVT-AV1 encoder; it is like 10x faster than libaom.
-pix_fmt yuv420p10le use 10-bit encoding, which makes gradients and dark areas better at a small cost; might as well.
-preset 3 this determines encoding effort. 3 was manageable. 2 took a loong time. YMMV
-svtav1-params this passes parameters through ffmpeg to the codec
tune=0 tuned for content rather than PSNR tests
color-range=1 full (computer) color rather than studio color
keyint=60000 don't put in extra keyframes at all, just a starting I-frame then all P-frames
scm=1 peeps say 1 is good for digital graphics and maybe animation, 0 is default for live action
-b:v 0 don't limit bandwidth (quality control only)
-crf 40 this is a very low target quality because there's no motion to worry about
-an no audio (for now)
Note that since iOS devices can’t do AV1 yet, it might be preferable to use the less efficient VP9 either as the sole version or as a fallback. This can be done with:
#!/bin/bash
# VP9 compression single pass
for i in *.txt
do name=`echo "$i" | cut -d'.' -f1`
ffmpeg -hide_banner -f concat -i "$i" -y -vf fps=10 -c:v libvpx-vp9 -pix_fmt yuv420p10le -deadline best -cpu-used 0 -b:v 0 -crf 40 -g 60000 -an "${name}.webm"
done
The options mean:
-deadline best means to use the highest quality, slowest encoding
-cpu-used 0 default is 0, but never hurts to be sure, best quality
-g 60000 libvpx-vp9 uses the ffmpeg "g" to set keyframe intervals
-lossless 1 seems like a good thing for slides, but yields quite large files
Save, chmod +x, and then execute.
UPDATE
I had trouble with ffmpeg slide timing with the concat command, something I am apparently not alone in. I rewrote the conversion shell script to be a lot more robust; this version reads a tab-delimited text file called files_to_encode.txt in the format
file_one.png\t3.12
file_two.jpg\t25.344
...
and then executes ffmpeg to convert the images to single-keyframe (and no intermediate frame) video files accurate to within one frame of the specified duration, so fractional values are allowed. At the default 20 fps, the result should be within 0.05 seconds of the target length.
#!/bin/bash
# Base directory - change this to your absolute path
BASE_DIR="/home/gessel/Work/Slocumisms/IPHROS/antigua/Short_Presentation/pages"
# Add logging
LOG_FILE="$BASE_DIR/encoding_log.txt"
PROCESSED_FILES="$BASE_DIR/processed_files.txt"
# Initialize log file with timestamp
echo "=== Encoding session started at $(date) ===" > "$LOG_FILE"
# Change to base directory
cd "$BASE_DIR" || {
echo "Error: Cannot change to base directory $BASE_DIR" | tee -a "$LOG_FILE"
exit 1
}
# Function to convert seconds to HH:MM:SS.msec format
convert_to_hms() {
local total_seconds=$1
local seconds=${total_seconds%.*}
local msec=000
# only parse a fractional part if one is present (integer durations are valid input)
if [[ "$total_seconds" == *.* ]]; then
msec=${total_seconds#*.}
msec=$(printf "%-3s" $msec)
msec=${msec// /0}
fi
local hours=$((seconds / 3600))
local minutes=$(((seconds % 3600) / 60))
local secs=$((seconds % 60))
printf "%02d:%02d:%02d.%s" $hours $minutes $secs $msec
}
# First, ensure the input file is in Unix format
dos2unix -n "$BASE_DIR/files_to_encode.txt" "$BASE_DIR/files_to_encode.unix.txt"
# Read all lines into an array
mapfile -t lines < "$BASE_DIR/files_to_encode.unix.txt"
# Process each line
for ((i=0; i<${#lines[@]}; i++)); do
line_number=$((i + 1))
line="${lines[$i]}"
# Debug logging for raw line
echo "DEBUG: Line $line_number raw: '$line'" >> "$LOG_FILE"
echo "DEBUG: Hex dump of line:" >> "$LOG_FILE"
echo -n "$line" | xxd >> "$LOG_FILE"
# Skip empty lines
if [ -z "$line" ]; then
echo "Line $line_number: Empty line, skipping" | tee -a "$LOG_FILE"
continue
fi
# Split the line using parameter expansion
filename="${line%%$'\t'*}"
duration="${line#*$'\t'}"
# Debug logging after split
echo "DEBUG: After split:" >> "$LOG_FILE"
echo " Filename: '$filename'" >> "$LOG_FILE"
echo " Duration: '$duration'" >> "$LOG_FILE"
# Validate input file exists
if [ ! -f "$filename" ]; then
echo "Line $line_number: Input file '$filename' not found" | tee -a "$LOG_FILE"
continue
fi
# Extract base filename without extension
base_filename="${filename%.*}"
# Check if output file already exists
if [ -f "${base_filename}.webm" ]; then
echo "Line $line_number: Skipping ${filename} - output file already exists" | tee -a "$LOG_FILE"
continue
fi
# Validate duration format
if ! [[ "$duration" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
echo "Line $line_number: Invalid duration format '$duration'" | tee -a "$LOG_FILE"
continue
fi
# Convert duration to HH:MM:SS.msec format
duration_hms=$(convert_to_hms "$duration")
echo "Line $line_number: Processing $filename with duration $duration_hms" | tee -a "$LOG_FILE"
# Execute ffmpeg command with reduced verbosity
if ffmpeg -hide_banner -loglevel error -loop 1 -framerate 1/20 -i "$filename" \
-vf fps=20 -c:v libvpx-vp9 -pix_fmt yuv420p10le -deadline best -cpu-used 0 \
-b:v 0 -crf 40 -g 60000 -ss 00:00:00.000 -t "$duration_hms" -an \
"${base_filename}.webm" 2>> "$LOG_FILE"; then
echo "Line $line_number: Successfully encoded $filename" | tee -a "$LOG_FILE"
echo "$filename" >> "$PROCESSED_FILES"
else
echo "Line $line_number: Error encoding $filename" | tee -a "$LOG_FILE"
fi
sleep 1
done
# Clean up temporary file
rm "$BASE_DIR/files_to_encode.unix.txt"
echo "=== Encoding session completed at $(date) ===" >> "$LOG_FILE"
I also learned something new, as my latest project was recorded from text rather than converted from a live talk: record each slide's audio as an individual track in Audacity, then make sure the Extra menu is enabled and use Extra -> Scriptables II -> Get Info…, setting Type to Tracks and Format to Brief, to get the slide timing info as start/stop seconds per track. Use calc to subtract each track's start time from its end time to get the duration.
And a practical hint: normalizing audio tracks to -3dB is a pretty standard expectation, but I find that the perceived loudness is somewhat random even if the peak of each track is -3dB. Using Effect -> Volume and Compression -> Loudness Normalization and setting perceived loudness to somewhere between -16 and -22 LUFS gives much more consistent perceived loudness (who'da'thunk?). I recommend starting at about -16 LUFS, checking for clipping, and then decreasing in steps of 1 or 2 LUFS until there's no clipping indicated in the waveform (shown in red in the normal interface). This seems to result in more consistent audio tracks than normalizing to peak amplitude.
Once this finishes (and it will be a while with AV1) there will be a video file of the right number of seconds for each .png file. For my slides, the video clips are about 60-80% the size of the original png slides, because AV1 is much more efficient than .png even for still compression (much as webp is, being based on VP8's intra-frame coding; AV1 is the successor to VP9).
The next step is to concatenate all the slide videos into a single video stream. First we create a video list file from the folder of webm files:
for f in *.webm; do echo "file '$f'" >> vidlist.txt; done
then we use ffmpeg again to merge the list into a single file.
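A minimal sketch of the merge (the output name slideshow.webm is just a choice that matches the ffprobe check below):
ffmpeg -f concat -i vidlist.txt -c copy slideshow.webm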
-c copy Video streams are directly copied, no re-compression.
To verify the stream parameters you can use
ffprobe -hide_banner -select_streams v -show_entries frame=pict_type,pts_time -of csv=p=0 -i slideshow.webm | grep -v P
This should show a key frame at the cumulative seconds count (not H:M:S.MS format) for each slide change and no others (assuming there's less than 100 minutes per slide). Note this makes seeking really slow (REALLY slow), like 5 seconds to jump to an arbitrary point, but seeking to each slide is close to instant. You could use a standard value like "150" for the keyint, meaning a keyframe every 15 seconds at 10 fps, to speed up seeking, but at the cost of a lot of file size.
The original png files were 9.3 MiB, the AV1 video conversion is 2.9MiB and the VP9 conversion is 3.1MiB. For a slide show, I’d argue that AV1 isn’t likely to be worth the extended encode time and compatibility issues, but YMMV and it is worth doing tests as results are very content dependent.
At this point you should have a video file without any audio and still have your .wav file plus your timing file. Next we’re going to add (back) the audio.
Adding Audio Back
This is fairly straightforward, but we have to compress the audio using a codec allowed in WebM. I also keep the single channel and use a moderate data rate for speech.
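A minimal sketch, assuming the merged video is slideshow.webm and the narration is the audio.wav from earlier (the output name is a placeholder):
ffmpeg -i slideshow.webm -i audio.wav -c:v copy -c:a libopus -b:a 48k -ac 1 slideshow_av.webm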
-c:a libopus Use libopus, an allowed audio codec in webm
-b:a 48k Compress at 48kbps, this is quite good for speech
-ac 1 One audio track. If you're doing stereo, then default is fine
Now you have an audio/video file with synced audio and slides that should be quite compact whether in VP9 or AV1, but it can be nice to add some metainformation, including subtitles and chapter headings.
Adding subtitles back
I suggest giving the whisper-produced .vtt file at least a cursory edit. It is quite good, but can have trouble with homophones, which is understandable, especially with technical jargon. Once you're happy with the text, you can merge the subs back into the webm container and tag the audio stream and subs with languages.
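Something like the following should work, assuming the file names from the earlier steps (the output name is again a placeholder; WebM wants WebVTT subtitles, which is what whisper produces):
ffmpeg -i slideshow_av.webm -i audio.vtt -map 0:v -map 0:a -map 1:s -c:v copy -c:a copy -c:s webvtt -metadata:s:a language=eng -metadata:s:s:0 language=eng slideshow_subbed.webm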
-map 0:v use the video from index 0 (first input)
-map 0:a use the audio from index 0 (first input)
-map 1:s use subtitles from index 1 (second input)
-metadata:s:a language=eng the audio is english (or pick your lang)
-metadata:s:s:0 language=eng the subs are english (or pick your lang)
Now your video file has subtitles, and these should be selectable in VLC player.
Add metadata and chapters with MKVToolNix
Chapter data and additional metadata are our first foray away from ffmpeg to another open-source tool, MKVToolNix. Chapter data is easiest (and most reusable) to add by creating a chapter file using the slide timing data collected previously, but this time as absolute HH:MM:SS.sss times, so it looks like this (with a blank chapter at the out time of the whole video):
CHAPTER01=00:00:00.000
CHAPTER01NAME="Quantum Computing and Cryptography"
CHAPTER02=00:00:28.000
CHAPTER02NAME="1.0 Topics to be Covered"
...
CHAPTER22=00:50:07.000
CHAPTER22NAME="6.0 Practical Implementations"
CHAPTER23=00:51:44.000
CHAPTER23NAME="7.0 Conclusion: Gessel's Law P=2^2^(Y/2) rev: P=2^2^(Y/3.4)"
CHAPTER24=00:54:50.000
CHAPTER24NAME=""
You open this chapters.txt file in the Chapter Editor tab of MKVToolNix, right-click and select "additional modifications", then select the language. Finally, save from the "Chapter editor" menu (top of screen), select "Save to Matroska or WebM file", and confirm that you're going to overwrite the no-chapters version (with the addition of the chapter data).
The last tidbit is to add some moderately useful metainformation, at least a title and possibly a date (if relevant). Title, at least, is what's used as the VLC title and possibly in other places. The MKVToolNix header editor tab will do what's needed. You want to edit the "Segment information"; I'm not sure where the track information titles show up, so I don't bother with them, but there's no harm in editing those either. Then just save with CTRL-S to update your WebM video with the additional metadata.
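If you prefer the command line, mkvpropedit (also part of MKVToolNix) can set the title without opening the GUI; a rough sketch, with the file name and title assumed from the examples above:
# file name and title are assumptions carried over from the earlier examples
mkvpropedit slideshow_subbed.webm --edit info --set "title=Quantum Computing and Cryptography"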
That's it: you should now have a well-formatted, searchable, indexable video that will play directly from your own web server without relying on plugins or gifting your data to services like youtube or tiktok or whatever data harvesting service is luring the unwitting to data slaughter with Judas goats of convenience.