PODCAST · technology
Kana & Mari’s SoundRepos
by Kana & Mari
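Kana and Mari introduce, by voice, notable repositories they found on GitHub that relate to "sound," such as TTS, MIDI, and audio. They offer a light-footed guide to the open-source world where sound and code intersect. Kana and Mari's profiles: Kana – Newbie Esports Caster; Mari – Newbie Esports Analyst. Note: The scripts for this show are automatically generated using generative AI. They may contain errors, so please enjoy them as reference information.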
-
101
wildminder/awesome-ai-voice
List of open-source TTS, voice cloning, and music generation models
-
100
mahimairaja/voiceai
Set of with to help those building Voice AI agents
-
99
PowerBeef/QwenVoice
Vocello is a local-first voice generation app for Apple Silicon Macs. Public beta for macOS 26; QwenVoice v1.2.3 remains the stable macOS 15 fallback.
-
98
r9y9/ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
-
97
livekit-examples/kitt
Talk to ChatGPT in real time using LiveKit
-
96
yl4579/PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
-
95
ElmTran/praises
Praises is a text-to-speech tool that can help you read text easily.
-
94
Elleo/pied
Pied makes it simple to install and manage text-to-speech Piper voices for use with Speech Dispatcher.
-
93
1neReality/MITSUHA
World's First Multilingual Inexpensive Therapeutic Sophisticated Ultra-responsive Holographic Agent. In simple terms, an AI you can talk to and it'll talk back with a body using VTube Studio.
-
92
rishikksh20/iSTFTNet-pytorch
iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
-
91
frostming/tetos
A unified interface for multiple Text-to-Speech (TTS) providers.
-
90
atomicoo/FCH-TTS
A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian, and Tibetan (tested so far).
-
89
Executedone/Chinese-FastSpeech2
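Continues training on the Biaobei (标贝) dataset and improves on the original FastSpeech2 model by introducing a prosody representation and a prosody prediction module, making Chinese pronunciation more vivid and rhythmic.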
-
88
JackismyShephard/ultimate-rvc
An app for creating audio-based content such as song covers and speech using Retrieval-based Voice Conversion.
-
87
mathigatti/midi2voice
Singing synthesis from MIDI file
-
86
trymirai/uzu
A high-performance inference engine for AI models
-
85
maum-ai/univnet
Unofficial PyTorch Implementation of UnivNet Vocoder (https://arxiv.org/abs/2106.07889)
-
84
LSimon95/megatts2
Unofficial implementation of Megatts2
-
83
makerjackie/MTTS
A Demo of Mandarin/Chinese TTS frontend
-
82
haoheliu/voicefixer_main
General Speech Restoration
-
81
ManimCommunity/manim-voiceover
Manim plugin for all things voiceover
-
80
travisvn/obsidian-edge-tts
Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.
-
79
zlargon/google-tts
Google TTS (Text-To-Speech) for node.js
-
78
developersdigest/ai-devices
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
-
77
zarazhangrui/personalized-podcast
Turn any content into a personalized AI podcast. NotebookLM-style, except you control the script, voices, and hosts. Listen in Apple Podcasts, Spotify, or any podcast app.
-
76
debpalash/OmniVoice-Studio
A cinematic audio dubbing, cloning, and voice generation studio
-
75
akdeb/ElatoAI
Realtime Voice AI with 100+ Models on Arduino ESP32 for AI Toys, Companions, and Devices
-
74
izwi-ai/izwi
Voice AI runtime. Local first transcription, speaker diarization, TTS, and voice cloning with an OpenAI compatible API.
-
73
moonshine-ai/moonshine
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
-
72
Saganaki22/ComfyUI-OmniVoice-TTS
OmniVoice TTS nodes for ComfyUI - Zero-shot multilingual text-to-speech with voice cloning, voice design, and multi-speaker dialogue
-
71
OpenMOSS/MOSS-TTS-Nano
MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run directly on CPU without a GPU, and keeps the deployment stack simple enough for local demos, web serving, and lightweight product integration.
-
70
Aratako/T5Gemma-TTS
Multilingual TTS model with voice cloning and duration control, based on T5Gemma encoder-decoder LLM
-
69
lmnt-com/wavegrad
A fast, high-quality neural vocoder.
-
68
mbzuai-oryx/LLMVoX
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
-
67
Adri6336/gpt-voice-conversation-chatbot
Allows you to have an engaging and safely emotive spoken / CLI conversation with the AI ChatGPT / GPT-4 while giving you the option to let it remember things discussed.
-
66
richardr1126/openreader
An open-source read-along document reader server with high-quality TTS options, synchronized highlighting, and audiobook export for EPUB, PDF, DOCX, TXT, and MD.
-
65
Aratako/Irodori-TTS
A Flow Matching-based Text-to-Speech Model with Emoji-driven Style Control
-
64
LlmKira/fast-langdetect
⚡️ 80x faster Fasttext language detection out of the box | Split text by language
-
63
Sharrnah/whispering-ui
Native UI for the Whispering Tiger project - https://github.com/Sharrnah/whispering (live transcription / translation)
-
62
keonlee9420/Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
-
61
Agents365-ai/video-podcast-maker
AI-powered video podcast creation skill for coding agents. Supports Bilibili & YouTube, multi-language (zh-CN/en-US), 6 TTS engines (Edge/Azure/ElevenLabs/OpenAI/Doubao/CosyVoice), 4K Remotion rendering.
-
60
funnyzak/tts-now
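A cross-platform text-to-speech assistant built on the speech synthesis APIs of cloud platforms (Alibaba Cloud, iFlytek, and others). Supports quick single-text synthesis and batch synthesis. Runs on Windows, macOS, and Linux.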
-
59
yandexdataschool/speech_course
YSDA course in Speech Processing.
-
58
FlorianEagox/WeeaBlind
A program to dub non-English media with modern AI speech synthesis, diarization, and voice cloning!
-
57
sipeter/CloneTTS
A lightweight, offline Android Text-to-Speech (TTS) engine enabling seamless system-wide voice cloning and high-fidelity text reading. It runs locally on Android and supports offline speaker extraction, zero-barrier voice cloning, and dual-engine system-wide read-aloud.
-
56
TrevorS/voxtral-mini-realtime-rs
Voxtral ASR & TTS running natively and in the browser. A Rust implementation of Mistral's Voxtral mini realtime ASR / TTS using the Burn ML framework
-
55
Poeschl/Hassio-Addons
The repository for my Home Assistant Supervisor Add-ons.
-
54
Migushthe2nd/MsEdgeTTS
A simple Azure Speech Service module that uses the Microsoft Edge Read Aloud API. https://www.npmjs.com/package/msedge-tts
-
53
keonlee9420/Comprehensive-Transformer-TTS
A Non-Autoregressive Transformer-based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.
-
52
Rongjiehuang/GenerSpeech
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.