When Upgrading FFmpeg Backfires: A Tale of Defaults

It all began when we had to upgrade our ffmpeg binaries from 4.4.x to the latest 7.1, because a new image with security patches no longer contained the old version. At first things went fine, but once testing began we discovered a puzzling performance degradation in rather simple use cases.

The Setup

In one of our services, we aggregate the mp4 chunks of a render result and concatenate them, using the concat demuxer, into an intermediary Transport Stream (TS) container. We then merge that with an audio stream, transcoding only the audio into AAC.

# First: concatenate video chunks into TS
ffmpeg -f concat -safe 0 -i video_chunks.txt -c copy output.ts

# Second: merge with audio, transcode audio to AAC
ffmpeg -i output.ts -i audio.mp3 -c:v copy -c:a aac -movflags +faststart output.mp4

Splitting this process into two commands while using TS has benefits. First, it helps reduce the risk of audio/video sync issues, because TS carries continuous presentation timestamps across segments, whereas with mp4, slight shifts or mismatches in PTS, or non-monotonic DTS, could result in sync issues. Second, ffmpeg doesn’t have to keep track of the MOOV atom in memory until the result is finalized, which reduces peak memory consumption when remuxing the result with audio.
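
To make the timestamp concern concrete, here is a minimal sketch (not part of our pipeline) that checks whether the decode timestamps of a file increase strictly. The check_monotonic_dts helper is hypothetical, and the ffprobe invocation in the trailing comment assumes that the video is the first video stream and that the CSV output carries the DTS in the first field of each packet line.

```shell
# Hypothetical helper: reads one DTS value per line on stdin and fails on
# the first value that is not strictly greater than the previous one.
check_monotonic_dts() {
  awk -F, '
    $1 == "" || $1 == "N/A" { next }   # skip packets without a DTS
    seen && $1 + 0 <= prev {
      printf "non-monotonic DTS: %s after %s\n", $1, prev
      bad = 1
      exit                             # jumps to END, which sets the exit code
    }
    { prev = $1 + 0; seen = 1 }
    END { exit bad }
  '
}

# Possible usage against the intermediary file:
# ffprobe -v error -select_streams v:0 -show_entries packet=dts -of csv=p=0 output.ts | check_monotonic_dts
```

A non-zero exit status means at least one packet went backwards or repeated a timestamp, which is the kind of input that tends to cause trouble when muxing.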

Finding the Culprit

Intuitively, concatenating with the concat demuxer and remuxing audio and video are simple, fast operations, since no transcoding takes place, so they are unlikely to cause the difference in performance. Transcoding the audio stream, on the other hand, has far more potential to make an impact.

Since the process is split into two commands, we could easily benchmark each one separately and confirm that the source of trouble does indeed lie in the second command.
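
A per-stage benchmark along these lines can be sketched as follows. This is an illustration rather than the exact script we ran: time_stage is a hypothetical helper, and it measures wall-clock time at one-second resolution via date, which is coarse but enough to spot a large regression.

```shell
# Hypothetical helper: run a command, discard its output, and print how many
# wall-clock seconds it took together with its exit status.
time_stage() {
  local label=$1 start end rc
  shift
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  rc=$?
  end=$(date +%s)
  echo "$label: $((end - start))s (exit $rc)"
  return "$rc"
}

# Possible usage with the two commands above, repeated once per ffmpeg build:
# time_stage concat ffmpeg -y -f concat -safe 0 -i video_chunks.txt -c copy output.ts
# time_stage merge  ffmpeg -y -i output.ts -i audio.mp3 -c:v copy -c:a aac -movflags +faststart output.mp4
```

Running the same pair against both binaries on the same input gives four numbers to compare, one per stage per version.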

However, given the many improvements ffmpeg has seen since major version 4, and the major role it plays in the video industry, it seemed less likely that such a basic issue existed in the 7 release than that something had simply changed in the way we use it. Still, comparing performance between the versions on different platforms did suggest that a basic difference exists.

Investigating the AAC Encoder

The next step was to research the immediate suspect and see whether anything notable had changed in the AAC encoding options between versions. ChatGPT suggested: “Starting in FFmpeg 5.0 (Jan 2022), the defaults were changed: the encoder switches to variable‐bitrate (VBR) quality mode (with -q:a ≈0.8 by default) instead of fixed 128 kbps”. Fortunately, a trivial check showed that this was not the case; it was a hallucination.

What was interesting, though, is that the source it pointed to did confirm that “The default AAC encoder settings were also changed to improve quality”. Looks like we are getting close!

Since the changelog was pretty opaque, I grabbed the release/5.0 branch of the official git repository and produced the HTML version of the docs:

brew install texi2html
texi2html ffmpeg-codecs.texi

Comparing the compiled ffmpeg-codecs.html to the current aac docs made it clear that the default aac_coder method had changed from twoloop to fast.
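
When both binaries are at hand, the same comparison can probably be made without building the docs, because ffmpeg -h encoder=aac prints the encoder's private options. The sketch below is an assumption-laden shortcut, not what I actually ran: aac_coder_default is a hypothetical helper, the binary paths in the comment are placeholders, and the sed expression assumes the help line for -aac_coder ends with a "(default NAME)" suffix.

```shell
# Hypothetical helper: print the default aac_coder reported by the ffmpeg
# binary given as $1, parsed from its built-in encoder help.
aac_coder_default() {
  "$1" -hide_banner -h encoder=aac 2>/dev/null |
    sed -n 's/.*-aac_coder.*(default \([a-z0-9_]*\)).*/\1/p'
}

# Possible usage (paths are placeholders for wherever each build lives):
# echo "old: $(aac_coder_default /path/to/ffmpeg-4.4/ffmpeg)"
# echo "new: $(aac_coder_default /path/to/ffmpeg-7.1/ffmpeg)"
```

If the two lines differ, the default has changed between the builds, regardless of what the changelog says.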

Bingo.

Understanding the Difference

But what exactly is the difference?

The “quantizer” mentioned in the FFmpeg AAC encoder “twoloop” method refers to a parameter used to control the level of quantization applied to frequency bands of the audio signal. Quantization is the process of approximating a continuous range of values with a finite set of levels, reducing the precision of the audio data to achieve compression.

The twoloop method sets initial quantizers based on band thresholds and then iteratively improves the encoding by adjusting these quantizers.

In contrast, the fast method applies a constant quantizer with the same quantization scale across all bands or frames, without this iterative optimization - which is simpler (and faster) but less efficient for achieving high audio quality at a target bitrate.

It is now clear why the changelog mentioned: “The default AAC encoder settings were also changed to improve quality”, at the expense of speed, that is :)

Lessons Learned

I think this makes a wonderful lesson: when dealing with ffmpeg, it is safer to get acquainted with the command options that are mission critical and to set them explicitly instead of relying on defaults. If a step somewhere in the pipeline is highly performance sensitive, we should map the critical defaults that might change in a future version.

It’s not perfect, but this way we reduce the risk when we have to make the next major version step.
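
As a sketch of what "being verbose" could look like for the merge step, the wrapper below pins the AAC coder and bitrate instead of inheriting them from whichever ffmpeg version is installed. The merge_audio function and the FFMPEG, AAC_CODER and AAC_BITRATE variables are hypothetical names, and the fallback values (twoloop, 128k) are placeholders rather than a recommendation; the right choice depends on your own speed and quality measurements.

```shell
# Hypothetical wrapper: merge a video TS ($1) with an audio file ($2) into an
# mp4 ($3), stating the AAC settings explicitly rather than using defaults.
merge_audio() {
  "${FFMPEG:-ffmpeg}" -i "$1" -i "$2" \
    -c:v copy \
    -c:a aac -aac_coder "${AAC_CODER:-twoloop}" -b:a "${AAC_BITRATE:-128k}" \
    -movflags +faststart "$3"
}

# Possible usage:
# AAC_CODER=fast merge_audio output.ts audio.mp3 output.mp4
```

With the settings spelled out, a future upgrade that changes a default leaves this step's behaviour unchanged, and the values become something to review deliberately.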
