AVSync15 Demos
For each of the 15 categories in the AVSync15 test set, we selected one example video to show the results of different generation methods.
We recommend using earphones, raising the volume, and zooming in on the videos when viewing the demos.
Greatesthits Demos
We selected 5 cases from the Greatesthits test set to show the results of different generation methods.
We recommend using earphones, raising the volume, and zooming in on the videos when viewing the demos.
Landscape Demos
For each of the 9 categories in the Landscape test set, we selected one example video to show the results of different generation methods.
We recommend using earphones, raising the volume, and zooming in on the videos when viewing the demos.
Comparison
The cases below further compare Vanilla CFG with Enhanced Joint-CFG. They demonstrate the latter's ability to generate more varied visuals (case 1, the forging hammer), maintain image clarity (case 2, the rooster's head movement), create more coherent scenes (case 3, the undisturbed bowling pins), and produce superior sound quality (case 4, the purer sound).