Releases: Huanshere/VideoLingo
v2.1.2
v2.1.1
Release Notes
🐛 Bug Fixes:
- Use empty audio as a fallback for failed TTS tasks and handle empty lines in TTS tasks generation.
- Handle multiple spaces when merging words and allow multiple splits in second segmentation.
- Add support for Grok beta and resolve compatibility issues in the `askgpt` function.
- Pin ctranslate2 to version 4.5.0 to avoid errors.
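The empty-audio fallback mentioned above could look roughly like the following. This is a minimal sketch, not VideoLingo's actual code: `silent_wav_bytes`, `synthesize_with_fallback`, and the `tts_fn` callback are hypothetical names, and the 0.1 s / 16 kHz defaults are assumptions chosen for illustration.

```python
import io
import wave


def silent_wav_bytes(duration_s: float = 0.1, sample_rate: int = 16000) -> bytes:
    """Return a mono 16-bit PCM WAV that contains only silence."""
    n_frames = int(duration_s * sample_rate)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def synthesize_with_fallback(text: str, tts_fn) -> bytes:
    """Call tts_fn(text); use silence for empty lines or when TTS fails.

    tts_fn is a hypothetical callable standing in for whichever TTS
    backend is configured.
    """
    if not text.strip():
        return silent_wav_bytes()
    try:
        return tts_fn(text)
    except Exception:
        # A failed TTS task should not abort the whole dubbing run.
        return silent_wav_bytes()
```

Returning silence keeps the number of audio segments equal to the number of subtitle lines, so later alignment steps do not have to special-case missing files.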
v2.1.0
Release Notes
🚀 New Features:
- Added support for custom terms.
- Added a custom TTS setting.
- Added support for Deepseekcoder.
- Added support for pure local operation using Ollama and Edge-TTS.
- Unified TTS methods with 302ai integration, now requiring only one 302ai key to experience the full functionality.
🐛 Bug Fixes:
- Fixed the NVIDIA GPU check.
- Added a check for `[br]` in the response splitting step.
- Avoided errors by not checking the source audio bit rate.
🔧 Improvements:
- Removed automatic FFmpeg installation, now requiring a system-level install.
v2.0.4
Release Notes
🚀 New Features:
- Added Demucs configuration in the Streamlit sidebar and increased human voice volume after Demucs processing, recommended for videos with loud background music.
- Added memory cleanup after Demucs and improved Demucs audio quality.
🐛 Bug Fixes:
- Reduced subtitle font size to prevent double-line subtitles.
- Fixed the "All same length" error. Note: This issue may still occur with smaller models or reverse-engineered APIs.
- Added handling for exceeding dubbing length limits, now truncating instead of throwing an error.
🔧 Improvements:
- Optimized prompt splitting to handle repeated text.
- Improved language style and removed the concise requirement from translation prompts.
- Updated `install.py` to roll back to the v2.0 installation method, removing local third-party installation to avoid errors.
- Installation now directly installs whisperX and Demucs from Git, avoiding dependency conflicts.
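The "truncate instead of throwing an error" behavior for over-long dubbing can be illustrated with a short sketch. It assumes the limit is expressed as a maximum duration and that the audio is available as a flat sequence of samples. The function name and parameters are hypothetical, not taken from the project.

```python
def truncate_to_limit(samples, sample_rate: int, max_duration_s: float):
    """Cut a sample sequence to at most max_duration_s seconds.

    Audio that already fits is returned unchanged; longer audio is
    trimmed at the limit rather than rejected.
    """
    if max_duration_s < 0:
        raise ValueError("max_duration_s must be non-negative")
    max_len = int(max_duration_s * sample_rate)
    return samples[:max_len]
```

Trimming loses the tail of an over-long line, but it lets the rest of the pipeline finish, which matches the trade-off described in the note above.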
v2.0.3
Release Notes
December 2, 2024
🐛 Bug Fixes:
- Fixed an alignment error in the voice generation task.
- Resolved an audio reading issue during the whisper process.
- Loosened the audio speed change error tolerance to reduce errors.
- Applied stricter JSON constraints.
🔧 Improvements:
- Added memory cleanup in batch mode to improve performance.
v2.0.2-deprecated
Release Notes
🐛 Bug Fixes:
- Removed the one-click installation script due to instability on some computers.
- Removed conda installation of FFmpeg due to issues on Windows.
- Fixed issues with gptsovits English support.
- Reverted the local installation of FFmpeg.
🔧 Improvements:
- Set the audio bitrate to 32000 to improve recognition accuracy.
v2.0.1-deprecated
🔧 Improvements:
- Simplified the installation process: Windows users no longer need to install CUDA manually, and the installation of FFmpeg and whisperX has been streamlined.
- Fixed various minor issues to improve performance and user experience.
v2.0
V 2.0
🚀 New Features:
- The default LLM has been switched to SiliconFlow's Qwen-72B, which is exceptionally useful!
- Integrated SiliconFlow's Fish TTS cloning voiceover feature, making VideoLingo accessible with just one key!
- Solved the inconsistency issue between dubbing and subtitles by using on-screen subtitles for dubbing tasks!
- Introduced OneKeyInstall to simplify the installation process, so there's no need to understand coding anymore!
🐛 Bug Fixes:
- Fixed a critical error causing subtitles to become misaligned after multiple splits.
- Resolved the issue with Rich box import.
- Fixed FFmpeg error.
- Replaced MoviePy due to an error it caused.
🔧 Improvements:
- Enhanced code readability.
- Added checks for Hugging Face's mirror sites.
- Modified the audio extraction method to retain the original audio, compressing only during Whisper processing to improve audio quality.
- Batch execution of TTS.
v1.8.0
Release Notes
🔧 Improvements:
- Refined and optimized the overall code structure for better performance and maintainability.
- Enhanced the prompt to be more concise and applicable to a wider range of models.
- Improved fuzzy and precise matching in translation processes.
🐛 Bug Fixes:
- Fixed several errors occurring during the FFmpeg compression process.
- Resolved an issue with phrase errors caused by lack of initialization and null returns from model translations.
📝 Updates:
- Demucs vocal separation is no longer performed by default before transcription, addressing the issue of missing sentences and improving processing speed.
- Removed support for the whisperX replicate API to simplify the project as an open-source initiative.
- Adjusted the translation process to handle smaller segments, reducing the likelihood of errors.
v1.7.1
Release Notes
🚀 New Features:
- Added MPS support for Demucs to improve performance.
- Implemented an error retry mechanism for batch processing.
- Set the default to use the Gemini model and updated documentation accordingly.
- Added an auto-update feature for `ytdlp`.
🐛 Bug Fixes:
- Corrected a long video segmentation error.
- Fixed loading issues with the local Chinese Whisper model.
- Improved audio splitting robustness and encoding handling.
- Resolved issues with handling reference audio prerequisites for GPT-SoVITS batch processing.
- Correctly implemented retry on translation failures.
🔧 Improvements:
- Updated the CPU-specific torch version in the installation process.
- Simplified the prompt reasoning chain, since the longer chain yielded only minimal improvement.
📝 Updates:
- Removed the install option in the OneKey batch script.
- Added a section for the SaaS website in the documentation.
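The retry behavior mentioned for batch processing and translation failures can be sketched as a small helper. This is an illustrative pattern under assumed defaults, not the project's implementation: the `retry` name, the `attempts` count, and the fixed `delay_s` pause are all hypothetical.

```python
import time


def retry(fn, attempts: int = 3, delay_s: float = 0.0):
    """Call fn() up to `attempts` times and return its first successful result.

    If every attempt raises, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            # Pause between attempts, but not after the final one.
            if attempt < attempts and delay_s > 0:
                time.sleep(delay_s)
    raise last_error
```

A call such as `retry(lambda: translate(chunk), attempts=3)` would wrap a single translation request, where `translate` stands for whatever function performs the request.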