VOID Explained: Netflix Research on Video Object & Interaction Deletion | BGBlur

Yash Thakker

Author


When editors talk about “removing something from a clip,” they usually mean inpainting: hide the object and fill plausible pixels. VOID (Video Object and Interaction Deletion)—from Netflix-affiliated researchers and collaborators—extends that to cases where pixels alone are not enough: if a removed object pushed, blocked, or deflected something else, the whole timeline may need to change (project site).

For BGBlur readers who polish interviews, product shots, or social cuts, VOID is a good overview of where academic video ML is headed: counterfactual video that respects simple physics, not only texture.

Demo: the VOID-style clip we attached to this post

The MP4 below is hosted locally as /videos/void-demo.mp4 so playback stays reliable (signed GitHub attachment URLs expire). It is a good sanity check for smudge-free motion compared with interaction-aware removal.

How VOID works (high level)

Per the VOID site and paper (arXiv:2604.02296):

  1. User selection highlights an object to remove.
  2. A vision-language model (VLM) estimates which other regions are causally affected (things that should fall, ricochet, or reroute).
  3. That guidance conditions a video diffusion backbone; the overall stack is described as using CogVideoX-5B for generation and SAM 2 for segmentation.
  4. An optional refinement pass uses flow-warped noise when the first synthesis morphs objects, a failure mode the authors associate with smaller video diffusion models.
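
To make the flow of the pipeline concrete, here is a minimal sketch of the four stages. Every function body is a stand-in of our own (a mask dilation for the VLM stage, a mean-fill for the diffusion stage); only the stage ordering comes from the VOID description.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RemovalRequest:
    frames: np.ndarray       # (T, H, W, 3) input clip
    object_mask: np.ndarray  # (T, H, W) user-selected object to delete


def estimate_affected_regions(req: RemovalRequest) -> np.ndarray:
    """Stand-in for the VLM stage: dilate the object mask by one pixel
    as a crude proxy for 'causally affected' neighboring regions."""
    m = req.object_mask.astype(bool)
    grown = m.copy()
    grown[:, 1:, :] |= m[:, :-1, :]
    grown[:, :-1, :] |= m[:, 1:, :]
    grown[:, :, 1:] |= m[:, :, :-1]
    grown[:, :, :-1] |= m[:, :, 1:]
    return grown


def synthesize(req: RemovalRequest, affected: np.ndarray) -> np.ndarray:
    """Stand-in for the diffusion backbone: fill every affected pixel
    with the per-frame mean of the unaffected background."""
    out = req.frames.astype(float).copy()
    for t in range(out.shape[0]):
        bg = out[t][~affected[t]]
        out[t][affected[t]] = bg.mean(axis=0) if bg.size else 0.0
    return out


def delete_object(req: RemovalRequest) -> np.ndarray:
    """Stages 1-3: selection -> affected-region estimate -> synthesis.
    (The optional stage-4 refinement pass is omitted here.)"""
    affected = estimate_affected_regions(req)
    return synthesize(req, affected)
```

The point of the sketch is the data flow: the affected-region estimate is computed before synthesis, so the generator is told not just which pixels to fill but which coupled regions may need to move.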

Training leans on synthetic / motion-rich paired data (including Kubric and HUMOTO, as summarized on their page) so the network sees examples where “delete object A” really means “change the whole interaction.”
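
As an illustration of what "paired" means here (this record layout and the file names are hypothetical, not from the paper), each training sample couples a render with the object present and a counterfactual render without it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedSample:
    """Hypothetical record for one synthetic training pair."""
    before_clip: str  # clip with the object present and interacting
    after_clip: str   # counterfactual render of the same scene without it
    object_id: str    # which object was deleted in the 'after' render


sample = PairedSample(
    before_clip="kubric/scene_0007_full.mp4",
    after_clip="kubric/scene_0007_no_ball.mp4",
    object_id="ball",
)
```

Because the two clips differ in physics, not just pixels, supervision on such pairs teaches the model that deleting an object can change the whole interaction.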

Runway, ProPainter, and evaluating quality

VOID positions itself against strong baselines in video object removal; its materials include comparisons against Runway-class commercial tools and ProPainter-style methods from the inpainting literature. Treat those numbers as paper-level guidance: they reflect specific datasets and metrics, not every real-world brief.

Across tools, creators still judge the same things: temporal consistency, lack of smears, and whether background motion looks intentional.
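
A quick way to sanity-check temporal consistency yourself is to compare frame-to-frame jitter before and after an edit. The metric below is our own crude proxy (real evaluations typically warp frame t to t+1 with optical flow before differencing, so camera motion is not penalized):

```python
import numpy as np


def temporal_jitter(frames: np.ndarray) -> float:
    """Mean absolute difference between consecutive frames.

    On a static-camera clip, flicker and smearing introduced by a
    removal pass raise this number, so comparing the edited clip's
    score against the original flags unstable regions.

    frames: (T, H, W, C) array in [0, 1] or [0, 255].
    """
    if frames.shape[0] < 2:
        return 0.0
    diffs = np.abs(np.diff(frames.astype(float), axis=0))
    return float(diffs.mean())
```

A lower score than the unedited clip does not prove the removal is good (a frozen smear also scores low), which is why creators still eyeball smears and background motion alongside any number.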

BGB (BgRemover) integration and what already works

BgRemover (BGB) at BgRemover.video already delivers the kind of clean, artifact-aware video object and background removal teams ship today—the baseline VOID builds on for harder physics cases.

Our roadmap: treat VOID as a blueprint for interaction-aware masking and training signals we can merge into BGB once they are robust enough for production SLAs. BGBlur stays focused on cinematic background blur and privacy-style effects, while BGB remains the home for removal—so integration work channels through the same product family you already use.

FAQ

What does “interaction deletion” mean?

Removing an object and updating how other objects move when they were physically coupled to it—per VOID’s framing on void-model.github.io.

Is VOID available as a consumer app?

The public artifacts today are research-grade; production tools like BgRemover continue to offer the practical path for removals right now.

Where is the official write-up?

The project page at void-model.github.io and the arXiv entry listed in the References below.

References

  • Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, Ta-Ying Cheng, VOID: Video Object and Interaction Deletion, 2026. https://arxiv.org/abs/2604.02296
Published on April 4, 2026