A question that keeps coming up from readers of this website, and on some of the forums I participate in, is about editing modern video file formats on a general-purpose computer.
The questions take various forms but in a nutshell they boil down to this: why is it so resource intensive and problematic compared to other activities on a computer?
The reason people have this trouble comes down to a number of factors, but at the bottom of it all is the fact that these current, highly compressed video formats were never designed with editing in mind.
The only requirements considered as they were being designed were the ability to capture footage at a very high level of quality, the smallest possible file size and the ability to play back those files at acceptable bitrates.
As a result we now have the H.264 and the H.265 video codecs as the predominant solutions to both the creation and playback of most of our videos.
These two codecs rely on a system of capturing frames within their structure in a way that satisfies the original requirements of quality, file size and acceptable bitrates.
Long GOPs
To do this they use a system called the Long GOP (Group of Pictures) structure.
What that means is that in any sequence of captured frames only a few of those frames are actually full recordings of the scene.
All the others are partial recordings plus instructions on where to find the remaining information in other frames, either before or after the current frame. This is called interframe dependency.
This compression technique, while highly efficient for storage and playback, introduces complexities due to those interframe dependencies.
In fact it is slightly inaccurate to describe it as “compression” as that suggests they are taking something complete and “compressing” it into a smaller package.
In actuality what they are doing is throwing a bunch of data away and leaving tiny notations as to where the software can find what was thrown away from other frame data.
Understanding Long GOP Structures
A long GOP structure is a sequence of video frames consisting of three types:
I-frames (Intra-coded frames): These are complete, standalone images and serve as key reference points.
They are probably best thought of as being like a .jpg file. Slightly compressed but complete in themselves.
P-frames (Predicted frames): These reference previous frames for data, reducing redundancy.
As an example let’s say you shoot a video of a wall with a ball moving through the frame.
The P-frame only contains the necessary information of the ball’s new position since the previous frame.
All other information is not re-recorded but there is simply a note as part of the frame information to refer to the previous frame for the wall and everything else that has been repeated.
B-frames (Bidirectional predicted frames): These rely on both preceding and succeeding frames for data compression.
These frames are similar to the P-frames in using data from previous frames, but they also pull data from later frames, which usually makes them the smallest frames in the file.
For example, a common Long GOP pattern might look like: I-B-B-P-B-B-P-B-B-P-B-B-P-B-B-P-B-B-P-B-B-P-B-B-I
The structure starts with an I-frame (a complete frame) followed by a mix of P-frames and B-frames. The I-frame at the end of the pattern is where the next GOP begins.
This arrangement creates a highly efficient file at both the capture and playback points but becomes a nightmare for editing software because of the interdependencies that exist.
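If you want to see those proportions for yourself, here is a minimal Python sketch that simply counts the frame types in the example pattern above. The name summarize_gop is my own invention for illustration and is not part of any editing software.

```python
from collections import Counter

# The example pattern from the text. The final I-frame is the start of the next GOP.
PATTERN = "I-B-B-P-B-B-P-B-B-P-B-B-P-B-B-P-B-B-P-B-B-P-B-B-I"


def summarize_gop(pattern: str) -> dict:
    """Count complete (I) and partial (P, B) frames in a dash-separated pattern."""
    frames = pattern.split("-")
    counts = Counter(frames)
    return {
        "total": len(frames),
        "complete": counts["I"],
        "partial": counts["P"] + counts["B"],
    }


summary = summarize_gop(PATTERN)
print(f"{summary['complete']} complete frames out of {summary['total']}")
print(f"{summary['partial']} frames exist only as partial data plus references")
```

The point of running it is just to see how few of the frames are complete pictures compared with how many are partial.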
I am going to cover how your editing software deals with all of this but bear in mind that everything that has to be done to “reconstitute” all of those thousands of “incomplete” frames is going to be done by your computer… on the fly!
How Editing Software Handles Long GOP Footage
Identifying and Managing GOP Structures
Editing software first has to identify the GOP pattern within the video.
Since I-frames can be decoded independently the software relies on them as reference points for both navigation and editing.
When edits occur somewhere within a GOP the entire sequence will need adjustment to maintain structural integrity.
Rebuilding the GOP
When an edit is made within a GOP (e.g., cutting or applying an effect), the software needs to rebuild the GOP entirely.
This process involves creating a new I-frame or re-encoding the affected frames.
Rebuilding ensures that the resulting video maintains proper compression and playback.
For example, if you make a cut at a point between two B-frames, the software has to re-encode the frames leading up to the cut so that none of them refers to a frame that is no longer there, and it has to create a new I-frame at the “new” beginning of the next clip where you made the cut.
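To make that a little more concrete, here is a rough Python sketch of the first thing the software has to work out when you cut: which GOP does the cut land in? It assumes a simplified stream written as one letter per frame in display order, and the function name enclosing_gop is hypothetical. Real software works on the encoded file, where the stored frame order differs from display order.

```python
def enclosing_gop(frame_types: str, cut_index: int) -> tuple[int, int]:
    """Return (start, end) of the GOP containing cut_index.

    start is the index of the I-frame at or before the cut.
    end is the index of the next I-frame, or the stream length if there is none.
    """
    start = frame_types.rfind("I", 0, cut_index + 1)
    if start == -1:
        raise ValueError("no I-frame at or before the cut point")
    end = frame_types.find("I", cut_index + 1)
    if end == -1:
        end = len(frame_types)
    return start, end


# Assumed example: three 12-frame GOPs, with a cut at frame 17 (a B-frame).
stream = "IBBPBBPBBPBB" * 3
start, end = enclosing_gop(stream, 17)
print(f"cut at 17 lands in the GOP covering frames {start} to {end - 1}")
print(f"frames that may need rebuilding: {end - start}")
```

In this toy case a single cut puts a whole twelve-frame GOP up for rebuilding, even though you only touched one point on the timeline.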
Handling Frame Dependencies
The interdependence of P-frames and B-frames requires special handling.
If a P-frame or B-frame is modified, deleted, or moved, adjustments are necessary for all dependent frames to ensure accurate decoding.
Given that a typical Long GOP can be around twenty-five frames, a change near the middle of one means the software may have to recalculate ten or more frames in both directions.
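Here is a small Python sketch of how that knock-on effect spreads. It uses a deliberately simplified model that is my own assumption, not how any particular codec is specified: each P-frame refers to the nearest earlier I- or P-frame, and each B-frame refers to the nearest I- or P-frame on either side. The function names are hypothetical.

```python
def direct_references(frame_types: str) -> dict[int, list[int]]:
    """Map each frame index to the frames it reads from (simplified model)."""
    anchors = [i for i, t in enumerate(frame_types) if t in "IP"]
    refs = {}
    for i, t in enumerate(frame_types):
        if t == "I":
            refs[i] = []  # I-frames stand alone
            continue
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        refs[i] = earlier[-1:]  # nearest earlier I/P frame, if any
        if t == "B":
            refs[i] = refs[i] + later[:1]  # plus nearest later I/P frame
    return refs


def affected_by(frame_types: str, changed: int) -> list[int]:
    """List every frame that depends, directly or indirectly, on `changed`."""
    refs = direct_references(frame_types)
    affected = {changed}
    grew = True
    while grew:  # keep sweeping until no new dependent frames turn up
        grew = False
        for frame, sources in refs.items():
            if frame not in affected and any(s in affected for s in sources):
                affected.add(frame)
                grew = True
    affected.discard(changed)
    return sorted(affected)


# Assumed example: one 12-frame GOP plus the I-frame that starts the next one.
gop = "IBBPBBPBBPBBI"
hit = affected_by(gop, 3)  # frame 3 is the first P-frame
print(f"changing frame 3 touches {len(hit)} other frames: {hit}")
```

Changing a B-frame in this model touches nothing else, while changing an early P-frame drags almost the whole GOP along with it.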
Computational Challenges
Editing long GOP video can strain even high-performance systems.
Tasks such as scrubbing through the timeline, reversing playback, or applying effects demand significant computational power.
This is why you may suffer from lags, stuttering or delays during editing: everything has to be pieced together “on the fly” as you work.
Remember that, technically, no frame other than the I-frames actually exists as a complete image.
Exporting and Rendering
Exporting long GOP footage can be time-intensive.
Any edits that disrupt the original GOP structure require re-encoding which increases export times.
Advanced editing systems may employ techniques like smart GOP splicing to optimize this process but at the end of the day this is still computational work your computer has to complete.
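As a rough illustration of the idea behind that kind of splicing, the Python sketch below marks each GOP as either safe to copy untouched or in need of re-encoding, based on where the cuts fall. This is only my simplified reading of the technique, not a description of how any specific editor implements it, and plan_export is a made-up name.

```python
def plan_export(frame_types: str, cuts: list[int]) -> dict[int, str]:
    """Label each GOP (keyed by its I-frame index) as 'copy' or 're-encode'.

    Assumes the stream starts with an I-frame and that a cut at index c
    means a new clip begins at frame c.
    """
    starts = [i for i, t in enumerate(frame_types) if t == "I"]
    plan = {s: "copy" for s in starts}
    for c in cuts:
        if frame_types[c] == "I":
            continue  # a cut exactly on an I-frame leaves every GOP intact
        owner = max(s for s in starts if s <= c)  # GOP that contains the cut
        plan[owner] = "re-encode"
    return plan


# Assumed example: four 12-frame GOPs, one cut on an I-frame and one mid-GOP.
stream = "IBBPBBPBBPBB" * 4
for gop_start, action in plan_export(stream, [12, 30]).items():
    print(f"GOP starting at frame {gop_start}: {action}")
```

Only the GOP with a cut through the middle of it needs the heavy lifting here, which is why this approach can shorten an export without removing the work altogether.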
Proxy Workflows and Optimizations
To improve real-time performance many editing systems allow the use of proxies, which are low-resolution versions of the original footage.
This can certainly reduce the workload on your computer during the editing process given that the sheer amount of data being moved around is greatly reduced… BUT!
At the end of the day it is still only the amount of data being recalculated that has been reduced, not the number of frames that have to be rebuilt.
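A bit of back-of-the-envelope arithmetic in Python shows what I mean. The frame sizes and the GOP length are assumed figures for illustration only, and the sketch assumes the proxy keeps the same Long GOP structure as the original, which is not true of every proxy format.

```python
def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels in one frame of the given size."""
    return width * height


# Assumed example sizes: a UHD original and a proxy at a quarter of the
# width and a quarter of the height.
original = pixels_per_frame(3840, 2160)
proxy = pixels_per_frame(960, 540)

# Assumed GOP: one I-frame plus 24 dependent frames, identical in both files.
dependent_frames_per_gop = 24

print(f"pixels per frame shrink by a factor of {original / proxy:.0f}")
print(f"frames to rebuild per GOP, original: {dependent_frames_per_gop}")
print(f"frames to rebuild per GOP, proxy:    {dependent_frames_per_gop}")
```

Each frame is far lighter to work with, but under these assumptions the software still has to rebuild exactly the same number of them.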
Inherent Problems Editing Long GOP Video
Disruption of GOP Structure
Edits within a GOP disrupt its original structure necessitating re-encoding and hence the pressure on your computer.
This process is not only resource-intensive but can also introduce compression artifacts, particularly if the footage undergoes multiple rounds of editing.
Quality Degradation
Repeated re-encoding leads to progressive quality loss due to the introduction of compression artifacts.
This is especially problematic when applying effects or transitions, as these segments need to be re-encoded in their entirety.
Random Access Limitations
The dependency of P-frames and B-frames on I-frames limits the software’s ability to provide random access to individual frames.
Editing software relies on I-frames as starting points for decoding, which can slow down precise edits.
For example, if you choose to make a cut at a particular point the software has to find the nearest earlier I-frame (and, for a B-frame, the next reference frame after it as well) and decode forward from there to reconstruct that point as a “whole” image before it can make the cut.
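The cost of landing on an arbitrary frame can be sketched in Python like this. It uses a simplified display-order model of my own, where P-frames refer to the previous I- or P-frame and B-frames refer to the I- or P-frame on each side. The function name frames_to_decode is hypothetical.

```python
def frames_to_decode(frame_types: str, target: int) -> int:
    """Count the frames that must be decoded to show frame `target`."""
    start = frame_types.rfind("I", 0, target + 1)
    if start == -1:
        raise ValueError("no I-frame at or before the target")
    # The chain of I/P frames from the GOP start up to the target.
    needed = {i for i in range(start, target + 1) if frame_types[i] in "IP"}
    needed.add(target)
    if frame_types[target] == "B":
        # A B-frame also needs the next I/P frame after it.
        for i in range(target + 1, len(frame_types)):
            if frame_types[i] in "IP":
                needed.add(i)
                break
    return len(needed)


# Assumed example: one 12-frame GOP plus the I-frame that starts the next one.
gop = "IBBPBBPBBPBBI"
for target in (0, 9, 10):
    print(f"frame {target} ({gop[target]}): decode {frames_to_decode(gop, target)} frame(s)")
```

Landing on an I-frame costs a single decode, while landing late in the GOP means working through a chain of other frames first, and that is the delay you feel when you scrub.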
Performance Bottlenecks
Editing high-definition or high-bitrate footage encoded with long GOP structures can overwhelm even powerful systems.
The decoding and re-encoding demands increase with the complexity of the project leading to slower workflows.
Error Propagation
Errors or glitches in one part of a GOP can affect multiple frames until the next I-frame.
This cascading effect complicates error correction and may require rebuilding significant portions of the GOP.
To clarify this it has to be understood that as a video file is being recorded there may be pieces of information that are missed, written incorrectly to the recording media or just skipped by the recording device in order to keep up.
Those tiny glitches are no problem for playing back the footage because they are minute in their nature.
However if you decide to edit at one of those points then the editing software may run into trouble reconstructing that part of the file due to the missing data.
Worse still, the errors may get carried into the new file, with all the new dependent P-frames and B-frames being built from that erroneous data in a sort of snowballing effect.
The Bottom Line
Editing long GOP video involves navigating a complex interplay of frame dependencies, computational demands and workflow constraints.
While long GOP compression offers remarkable efficiency for storage and playback its intricacies demand careful handling in the editing process.
With continued advancements in NLE technology these challenges are becoming more manageable, especially now that GPU processing has been added to the mix, but even that has not completely solved the problem.
In the meantime the best way to deal with the situation is to have as powerful a computer as you can manage and get into the habit of slowing down a little to give it time to finish what it is doing.