MuZero does YouTube
Tue 15 February 2022
If you've been watching YouTube lately, the encoding settings for the video you watched might have been selected by MuZero - using less of your bandwidth for the same quality. The task here is rate control: selecting the quantization parameters within the VP9 codec to maximize the quality at a specified bitrate.
This is a constrained RL problem, requiring us to optimize two conflicting objectives of variable difficulty at the same time. To deal with this challenge we introduce a self-competition based reward mechanism, where the reward depends on how successful other recent episodes were at maximizing the quality while staying under the bitrate concern. This allows the agent to improve smoothly at any stage of learning, no matter whether it has not yet learned to stay within the constraints, or whether it is in the final stages of maximizing quality.
See our official blog post as well as our paper for the full details.
italics | surround text with *asterisks* |
bold | surround text with **two asterisks** |
hyperlink | [hyperlink](https://example.com)or just a bare URL |
code | surround text with `backticks` |
surround text with ~~two tilde characters~~ | |
quote | prefix with > |
Loading comments...