One problem I see a lot is content creators uploading videos to YouTube or similar platforms with audio that fluctuates in volume and quality, since many of them record with an iPhone or a laptop. It would be a huge advantage and selling point for the video converter if it could tackle the following issues:
Multi-pass Adaptive Leveling: leveling only the vocal content while leaving music, road and wind noise, etc. alone. Aside from machine learning, possible solutions could include dividing the audio spectrum into frequency bins and processing each bin separately to hold a consistent volume level, weighted by a Fletcher-Munson equal-loudness curve. Frequencies falling outside the typical human speech range could be leveled to a much lower volume, retaining a natural sound while automatically filtering out high- and low-frequency noise components.
Multi-pass Adaptive Dynamic Range Compression: dividing the leveled result into short windows of time and bringing down the peaks of the vocal frequencies, while also pushing down the frequencies containing high- and low-frequency noise by the same amount. Using multiple passes could prevent dramatic pumping effects, especially if the window length varied from pass to pass.
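The multi-pass windowed compression could look something like the following sketch. The threshold, ratio, and window sizes are invented for illustration, and the per-band split described above is omitted for brevity:

```python
import numpy as np

def compress_pass(x, win, threshold=0.5, ratio=4.0):
    """One pass: scale down any window whose peak exceeds the threshold,
    compressing the excess above the threshold by `ratio`."""
    y = x.copy()
    for start in range(0, len(y), win):
        seg = y[start:start + win]  # view into y; in-place scaling below
        peak = np.max(np.abs(seg)) + 1e-12
        if peak > threshold:
            target = threshold + (peak - threshold) / ratio
            seg *= target / peak
    return y

def multipass_compress(x, wins=(2048, 1024, 512)):
    """Varying the window size per pass smears the gain-change boundaries,
    which is what reduces audible pumping."""
    for w in wins:
        x = compress_pass(x, w)
    return x
```

A production compressor would also use attack/release smoothing rather than hard per-window gains, but the multi-pass, varied-window idea carries over directly.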
Adaptive Noise Gate: if sections of time contain no vocal frequencies, silence those sections or lower their volume by 12 dB.
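A minimal version of that gate could key off speech-band energy per frame. The frame size and both thresholds below are illustrative guesses, and a real gate would add hysteresis and gain ramps to avoid clicks at frame boundaries:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def noise_gate(x, sr, frame=1024, thresh_db=-40.0, atten_db=-12.0):
    """Attenuate frames whose speech-band (300-3400 Hz) energy falls below
    a threshold. Parameters are placeholders for illustration only."""
    sos = butter(4, [300, 3400], btype="bandpass", fs=sr, output="sos")
    speech = sosfilt(sos, x)
    y = x.copy()
    gain = 10 ** (atten_db / 20)  # -12 dB -> ~0.251
    for start in range(0, len(y), frame):
        seg = speech[start:start + frame]
        rms_db = 20 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-12)
        if rms_db < thresh_db:
            y[start:start + frame] *= gain
    return y
```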
Adaptive Phase Correction: for each wave sample (or a given number of samples), if the positive and negative excursions differ from each other, rebalance the waveform so each section is properly centered. This also frees up headroom for leveling (auto fader) and dynamic range compression.
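One reading of this is per-block DC-offset removal: if the positive and negative halves of the waveform are lopsided, subtracting each block's mean recenters it and recovers peak headroom. A sketch (the block size is an arbitrary illustrative choice):

```python
import numpy as np

def balance_blocks(x, block=4096):
    """Recenter each block by subtracting its mean (DC offset), balancing
    positive and negative excursions and recovering peak headroom."""
    y = np.asarray(x, dtype=float).copy()
    for start in range(0, len(y), block):
        seg = y[start:start + block]  # view into y
        seg -= seg.mean()
    return y
```

A smoother real-world equivalent is a DC-blocking high-pass filter, which avoids steps at block boundaries.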
Final pass for target loudness: normalize to -14 LUFS (a.k.a. LKFS) with true-peak limiting at -2 dB. This allows for consistency across vocal content and eliminates the possibility of clipping once the audio is encoded and played back across the range of possible equipment.
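For what it's worth, FFmpeg's loudnorm filter already implements this kind of normalization (EBU R128, with an integrated-loudness target and a true-peak ceiling), so the building block exists. As a crude standalone sketch, here is the same idea with plain RMS standing in for true LUFS, which properly requires K-weighting and gating per ITU-R BS.1770:

```python
import numpy as np

def normalize_loudness(x, target_db=-14.0, tp_db=-2.0):
    """Scale the signal to a target RMS level (a crude stand-in for LUFS),
    then cap the result so peaks stay below the true-peak ceiling.
    Illustrative only; real loudness metering uses ITU-R BS.1770."""
    rms_db = 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)
    y = x * 10 ** ((target_db - rms_db) / 20)
    peak_cap = 10 ** (tp_db / 20)  # -2 dBFS -> ~0.794
    peak = np.max(np.abs(y))
    if peak > peak_cap:
        y = y * (peak_cap / peak)
    return y
```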
Just spitballing here and throwing out some ideas. I suspect most of the functionality needed is already in the software; it just hasn't been combined into a streamlined process targeted at video podcasts yet.
Thanks again for all you do!
-Dan