CaptionHub Voiceover

Requirements: Enterprise subscription

CaptionHub Voiceover is a text-to-speech service that lets you create synthetic voiceovers inside CaptionHub. It ‘reads out’ the text that’s attached to a set of captions.


To use CaptionHub Voiceover to export audio:

  1. In the main caption track, ensure that your original captions and translations are correct and synchronised to the audio.
  1. If you have multiple speakers, make sure that they’re labelled correctly via caption metadata.
  1. Navigate to the Export tab, and click on the “CaptionHub Voiceover” button
  1. A modal will pop up, allowing you to map your speakers to different voices. You can preview the voices at any time.
  1. Once you’ve mapped your voices to speakers, click the “Create audio” button. This will start rendering the audio.
  1. Once the audio has rendered, you’ll be able to download it from the Export tab. Simply click the Download button and you’ll have the option to “Download CaptionHub Voiceover (audio)”.

To use CaptionHub Voiceover to export video with embedded audio:

  1. Follow the steps above until the audio is ready for download.
  1. Click on the “CaptionHub Render Voiceover” button.
  1. Click the Download button and you’ll have the option to “Download CaptionHub Voiceover (Video)”. This will download the original video file that you uploaded to CaptionHub, with its audio tracks replaced by the synthetically generated audio.


  • Voices with an asterisk next to them are neurally generated and are higher quality. If you can, we’d recommend prioritising these.
  • The full list of voices is available here.
  • Our algorithm aligns the start of each sentence with the generated audio. If you need the voiceover to follow the caption timings more closely, tick the “Enforce strict synchronisation with captions” checkbox. Where a sentence of voiceover is ‘longer’ than the captions allow for, CaptionHub will increase the speed of the voiceover to make it fit.
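
To give a feel for how much a sentence might be sped up, the arithmetic can be sketched as below. This is only an illustration: CaptionHub’s actual algorithm is not documented here, and the `Caption` class, the `speed_factor` function and the rule that speech is never slowed down are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """Hypothetical caption timing, in seconds (not a CaptionHub type)."""
    start: float
    end: float


def speed_factor(caption: Caption, speech_duration: float) -> float:
    """Return the playback-rate multiplier needed to fit the generated
    speech into the caption's time window."""
    window = caption.end - caption.start
    if window <= 0:
        raise ValueError("caption must have a positive duration")
    # Assumption: speech that already fits is left at normal speed
    # and is never slowed down to fill the window.
    return max(1.0, speech_duration / window)


caption = Caption(start=10.0, end=14.0)  # a 4-second caption
print(speed_factor(caption, 3.0))  # 1.0  (fits, so no change)
print(speed_factor(caption, 5.0))  # 1.25 (5 s of speech in a 4 s window)
```

In practice, a large factor suggests the translation is too long for its caption, so shortening the text or extending the caption timing may sound more natural than relying on the speed-up.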