“With three pictures, you have summarized the whole story – no need for text anymore,” wrote Christele Bouchat, Co-Director for the Broadband Forum’s Wireless-Wireline Convergence Work Area. That statement inspired the idea of putting those images into a sequence to promote the article to which Bouchat referred. It seemed like a perfect opportunity to create a 20-second video that would appear on LinkedIn as a post linking to the article.
This would be a simple “15-minute project”, as long as the scope remained limited to:
- Reuse of the existing images. No additional animations, even though the temptation would be there.
- Text, based on the captions from the article, to augment the photos.
- No voiceover, for two reasons:
  - It adds time to the project, and text-to-speech still isn’t that good unless a paid option is used.
  - The story must work without audio, as, according to this research from 2017, the majority of people watch LinkedIn videos with the sound muted.
The first cut took about 15 minutes and, again, much to my surprise, my colleagues liked the result. They had a couple of changes that should have been minor, but given how the images were generated (see part 1 of this article, which describes the image generation process), they took longer than desired. The outcome, however, was a video that would serve the purpose.
As Mrs. Fields would say, “Good Enough Isn’t”[1]
Still, the video could be better, but the point of diminishing returns was clearly in view. Then I made the mistake of sharing the pre-publication cut with a close friend, Roger, who has a sharp eye; in his capable hands, this rough video could become a masterpiece. He made valid comments about it being too fast, with too much text.
His suggestions were incorporated as much as possible without the wholesale surgery that would take it to the next level. Still, time was wasted on quick fixes that Roger could have done in a minute but that took this editor hours (animating the radio waves, for instance).
It also seemed like the video needed a voice, even though this ran counter to LinkedIn’s advice. Not wanting to use my own voice, I explored several generative AI voice options, including:
- Adobe Audition – it uses the voices built into the PC’s operating system, which still sound synthesized.
- Paid options, such as Natural Reader and ElevenLabs, which sound natural but were outside the budget. OpenVoice V2 also looks promising.
- Google Cloud Text-to-Speech
Google Cloud Text-to-Speech was chosen, as its terms seem to allow commercial use. The British voice sounded realistic enough, though it required a bit of editing in Adobe Premiere, as the closing words were gibberish.
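For anyone curious how that step works in practice, here is a minimal sketch of calling Google Cloud Text-to-Speech from Python. It assumes the google-cloud-texttospeech client library and a service-account credential are set up; the specific voice name (“en-GB-Neural2-B”) is illustrative, not necessarily the voice used for this video.

```python
# Minimal sketch: synthesize a narration clip with Google Cloud
# Text-to-Speech. Assumes `pip install google-cloud-texttospeech` and
# that GOOGLE_APPLICATION_CREDENTIALS points at a service-account key.
from google.cloud import texttospeech


def synthesize_voiceover(script: str, out_path: str = "voiceover.mp3") -> None:
    client = texttospeech.TextToSpeechClient()

    # The narration text; SSML is also accepted via SynthesisInput(ssml=...).
    synthesis_input = texttospeech.SynthesisInput(text=script)

    # Request a British English voice. "en-GB-Neural2-B" is an assumption;
    # the available options can be listed with client.list_voices().
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-GB",
        name="en-GB-Neural2-B",
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Write the MP3, ready to drop onto an audio track in Premiere.
    with open(out_path, "wb") as f:
        f.write(response.audio_content)


synthesize_voiceover("With three pictures, you have summarized the whole story.")
```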
A Few Lessons and an A/B Test Possibility
One of the good things is that a 16×9 aspect ratio was chosen when creating the original images (see part 1 of this article). Unfortunately, the Photoshop layers were not well organized, creating a more complex workflow than necessary.
The path of least resistance was taken by using a familiar tool, Adobe Premiere Pro. That worked, but one must wonder whether generative AI tools from companies such as Canva, Runway, or Augie would have created a more engaging video in less time.
Lastly, here are the two versions: a 26-second silent version and a 35-second version with a synthesized voiceover. Which one performs better could be an interesting A/B test to explore in the future.
26 seconds – no voiceover
35 seconds – synthesized voiceover
[Final note – the text in this article was 100% human-generated, save for some grammar and spelling help.]
[1] From Debbie Fields’ book, One Smart Cookie.