From my distant location overseas, listening to the news via podcasts is a great way to keep up: something I’m quite grateful to have access to on demand and via the internets. Until the end of Net Neutrality means that only “Verizon Insights” and “Life at AT&T” are still accessible, I enjoy a range of news sources on a daily basis using a podcatcher called BeyondPod. One of the essential features it has is the ability to speed up the tempo of podcasts, some of which are a bit slow as recorded. But one… one is like dripping molasses on a winter day: The Daily from the NYT by Michael Barbaro. I’m pretty sure silences are inserted in editing to draw out the drama to infuriating lengths and the tempo of the audio is selectively slowed to about half normal speed. Nobody can actually talk that slowly. I mean listen and try – like actually try to draw out a word that might take 1 second to pronounce to two full seconds. It is a pretty good news summary and has some useful information, but there’s no way I’d suffer through it without setting the tempo to 2x.
Every time I accidentally stream the podcast rather than downloading and playing it, the tempo control is disabled, and while I scramble to skip to the next podcast before I start questioning reality, I often wonder for a moment just how bad the pauses are. Here’s my analysis:
| Podcast | # of Silences | % Silence | I Playback @ |
| BBC Global News | 102 | 0.3% | 1.5x |
| NYT The Daily | 282 | 8.4% | 2.0x |
I consider the BBC Global News to be a very professional, truly “broadcast quality” podcast. The announcers are clear, comprehensible, and speak at a pace that is appropriate for a news broadcast. I still speed it up because daily life is like that now, but if I listen at normal pace, it isn’t even slightly annoying.
The Economist Radio is fairly typical of a print publication’s efforts at presenting print journalists as audio stars. It doesn’t always work out perfectly, and the pacing varies a lot depending on who is speaking in the rather eclectic line-up. In general it is annoyingly slow, but not interminably so. It comes across as a bit amateur by broadcast standards, but well done and very informative.
But then there’s The Daily from the NYT. This podcast was the reason I took the time to figure out how to speed up playback. There was no other choice: either unsubscribe or speed it up to something not aneurysm inducing. Looking at the waveforms, I suspect they might actually be inserting silences of around 500msec between words, perhaps for dramatic effect (there’s way too much dramatic effect in a lot of the stories, which speeding up only hastens rather than fully alleviating—never have you heard so many interviewees break into uncomfortable tears as they’re overwhelmed by whatever the day’s tragedy is, an artifice only slightly less annoying than broadcasting the sound of someone eating. OMG, that’s real. Rule 34.)
The first step was to normalize to provide a fairly constant ratio of spoken amplitude to the amplitude of the spaces between the words.
To count the silences, I ran “Silence Finder,” which generates a tag track with all the discovered silences. Exporting that tag track and counting the lines gives the number of silences (moments quieter than -30 dB and longer than 300 msec).
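The rule in parentheses (quieter than -30 dB for longer than 300 msec) can be sketched in a few lines of Python. This is only an illustration of the idea, not Audacity’s actual Silence Finder code: the function names are made up, samples are assumed to be floats where 1.0 is full scale, and the test signal is synthetic.

```python
def db_to_amplitude(db):
    """Convert a level in dB (relative to full scale) to a linear amplitude."""
    return 10 ** (db / 20)


def find_silences(samples, sample_rate, threshold_db=-30.0, min_duration=0.3):
    """Return (start_sec, end_sec) for every run of samples quieter than
    threshold_db that lasts at least min_duration seconds.

    Hypothetical helper: a plain per-sample comparison, assumed here for
    illustration only.
    """
    threshold = db_to_amplitude(threshold_db)
    silences = []
    run_start = None  # index where the current quiet run began, if any
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / sample_rate >= min_duration:
                silences.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # A quiet run may extend to the very end of the clip.
    if run_start is not None and (len(samples) - run_start) / sample_rate >= min_duration:
        silences.append((run_start / sample_rate, len(samples) / sample_rate))
    return silences


# Synthetic example at 1000 samples per second: speech, an 874 msec gap,
# speech, a 50 msec gap, speech. Only the long gap counts as a silence.
rate = 1000
signal = [0.5] * 500 + [0.001] * 874 + [0.5] * 500 + [0.001] * 50 + [0.5] * 500
print(find_silences(signal, rate))  # -> [(0.5, 1.374)]
```

Counting the entries in the returned list corresponds to counting the lines of the exported tag track.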
This is a typical BBC article break: 1.13 seconds long. This would get truncated as a silence.
This is a typical BBC announcer word break, about 50 msec.
By comparison, here’s a typical “The Daily” word break: 874 msec. Shorter than an article break on the BBC but about 17x longer than a typical BBC word break. No wonder The Daily sounds like they’re on Quaaludes.
The next step was to truncate silence with the built-in filter. It takes any silence consistently lower than -30 dB and longer than 300 msec and cuts it down to 300 msec. This makes the pacing on The Daily sound almost normal, though the tempo is still bizarrely slow.
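The “% Silence” figures appear to be the share of running time that this truncation removes. Here is a rough Python sketch of that arithmetic, using the before and after lengths from my notes; the function names are invented for illustration, and the truncation helper assumes silences are given as (start, end) pairs in seconds.

```python
def truncated_length(total_sec, silences, max_silence=0.3):
    """Program length after every silence is cut down to max_silence seconds.

    silences is a list of (start_sec, end_sec) pairs (assumed format).
    """
    removed = sum(max(0.0, (end - start) - max_silence) for start, end in silences)
    return total_sec - removed


def percent_removed(before_sec, after_sec):
    """Share of the original running time that truncation took out."""
    return 100.0 * (before_sec - after_sec) / before_sec


# Lengths in seconds before and after truncation, taken from the notes.
measurements = [
    ("BBC Global News", 1950.72, 1945.41),
    ("The Economist Radio", 918.544, 876.581),
    ("NYT The Daily", 1793.04, 1642.53),
]
for name, before, after in measurements:
    print(f"{name}: {percent_removed(before, after):.1f}% removed")
```

Note that this only counts the part of each silence beyond 300 msec, so the real amount of dead air is higher than the percentage suggests.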
There are some podcast tools that have automatic silence skipping, but BeyondPod isn’t one of them. It does allow speedup, which also speeds up those annoying silences, but they are still odd. The result is a very staccato speech style with normal or slightly fast pronunciation of the words with weird gaps between them.
This is the intro to The Daily as distributed (this is “normal” playback speed, really):
And sped up 75% so it sounds kind of normal. Listen to the first sample again after playing this one.
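For the record, a 75% tempo increase means playing at 1.75x, so the running time is divided by 1.75. A small Python sketch of that arithmetic, starting from the truncated length of The Daily in my notes (the helper names are made up for illustration):

```python
def duration_after_speedup(duration_sec, percent_faster):
    """Running time after a tempo increase of percent_faster percent."""
    return duration_sec / (1 + percent_faster / 100)


def as_minutes(duration_sec):
    """Format seconds as M:SS.mmm."""
    minutes, seconds = divmod(duration_sec, 60)
    return f"{int(minutes)}:{seconds:06.3f}"


truncated = 1642.53  # The Daily after truncating silences, in seconds
sped_up = duration_after_speedup(truncated, 75)
print(as_minutes(truncated), "->", as_minutes(sped_up))
```

That comes out at a bit under 939 seconds, a little over half the 1793 seconds of the episode as distributed.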
https://www.bbc.co.uk/programmes/p02nq0gn/episodes/downloads
http://aod-pod-ww-live.akamaized.net/mpg_mp3_med/modav/bUnknown-5bca4af6-4cb3-4482-a450-bc715a6c57a7_p05s9k03_1514563626168.mp3?__gda__=1514652930_a32452856567e60f510127ca7b5eec92
GlobalNewsPodcast-20171229-ChinaAccusedOfOilTransfersToNorthKorea.mp3
Short break 50 msec, long break between words 263 msec
Stereo track properties
Left channel:
Peak level: 0.530338 (-5.5 dB) .. Passes ACX
RMS level: 0.054236 (-25.3 dB) << Less than ACX -23 dB min
NoiseFloor: 0.000221 (-73.1 dB) .. Passes ACX
RMS (A): 0.036387 (-28.8 dB)
NoiseFloor (A): 0.000116 (-78.7 dB)
DC offset: -0.002238%
Clip fails to meet ACX requirements
RMS level is outside the ACX specification of -18 to -23 dBFS
Right channel:
Peak level: 0.530353 (-5.5 dB) .. Passes ACX
RMS level: 0.054397 (-25.3 dB) << Less than ACX -23 dB min
NoiseFloor: 0.000221 (-73.1 dB) .. Passes ACX
RMS (A): 0.036511 (-28.8 dB)
NoiseFloor (A): 0.000116 (-78.7 dB)
DC offset: -0.002232%
Clip fails to meet ACX requirements
RMS level is outside the ACX specification of -18 to -23 dBFS
Length of selection: 1950.72 seconds. 86026752 samples at 44100 Hz.
Duration: 32:30.720 pre truncate
Duration: 32:24.410 post truncate
102 silences
Length of selection: 1945.41 seconds. 85792581 samples at 44100 Hz.
0.3% silence
economoist-encoded-98994fbf71d7a093f914af49374f2f8e-128.mp3
https://web.archive.org/web/20190321221112/http://radio.economist.com/
Money Talks: Once bitcoined, twice... Dec 12, 2017
Stereo track properties
Left channel:
Peak level: 0.777844 (-2.2 dB) << Exceeds ACX -3 dB max
RMS level: 0.057140 (-24.9 dB) << Less than ACX -23 dB min
NoiseFloor: 0.000353 (-69.0 dB) .. Passes ACX
RMS (A): 0.035065 (-29.1 dB)
NoiseFloor (A): 0.000145 (-76.8 dB)
DC offset: -0.002515%
Clip fails to meet ACX requirements
Peak exceeds ACX specification of -3 dBFS
RMS level is outside the ACX specification of -18 to -23 dBFS
Right channel:
Peak level: 0.777867 (-2.2 dB) << Exceeds ACX -3 dB max
RMS level: 0.057060 (-24.9 dB) << Less than ACX -23 dB min
NoiseFloor: 0.000353 (-69.0 dB) .. Passes ACX
RMS (A): 0.035057 (-29.1 dB)
NoiseFloor (A): 0.000145 (-76.8 dB)
DC offset: -0.002464%
Clip fails to meet ACX requirements
Peak exceeds ACX specification of -3 dBFS
RMS level is outside the ACX specification of -18 to -23 dBFS
Length of selection: 918.544 seconds. 40507776 samples at 44100 Hz.
Typical word-word break 70 msec, topic break 900 msec
Track duration: 15:18.544 pre truncate
Track duration: 14:36.581 after truncate
Length of selection: 876.581 seconds.
119 silences
4.6% silence
The Daily 21 Dec Fix Final
http://feeds-origin.podtrac.com:8007/nyt-thedaily
https://web.archive.org/web/20180516195227/https://dfkfj8j276wwv.cloudfront.net/episodes/ce243027-afdc-4442-954f-04e9b06904df/d54d1d1590e6b48e3613ede3e3d3742b6b92e51122b64336882ac67ac2be3eee21f97175313b8d25d2e1b361bbc3bb25126d7a42da564fac37f6810a2a678f78/TD%20Dec%2021%20FINAL%20no%20ad%20break.mp3
Stereo track properties
Left channel:
Peak level: 1.001390 (0.0 dB) << Exceeds ACX -3 dB max
RMS level: 0.080325 (-21.9 dB) .. Passes ACX
NoiseFloor: 0.000000 (-290.9 dB) .. Passes ACX
RMS (A): 0.051846 (-25.7 dB)
NoiseFloor (A): 0.000000 (-290.9 dB)
DC offset: -0.001172%
Clip fails to meet ACX requirements
Peak exceeds ACX specification of -3 dBFS
Right channel:
Peak level: 1.003223 (0.0 dB) << Exceeds ACX -3 dB max
RMS level: 0.079702 (-22.0 dB) .. Passes ACX
NoiseFloor: 0.000000 (-288.3 dB) .. Passes ACX
RMS (A): 0.051543 (-25.8 dB)
NoiseFloor (A): 0.000000 (-297.5 dB)
DC offset: -0.000963%
Clip fails to meet ACX requirements
Peak exceeds ACX specification of -3 dBFS
Length of selection: 1793.04 seconds. 79073280 samples at 44100 Hz.
Word break 875 msec - looks like silence is INSERTED
Unfixed program length: 29:53.045
Truncated: 27:22.529
282 silences
Length of selection: 1642.53 seconds. 72435516 samples at 44100 Hz.
8.4% silence
After 75% tempo boost to normal cadence:
Length of selection: 938.588 seconds.