PODCAST · technology

Kana & Mari’s SoundRepos

by Kana & Mari

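Kana and Mari introduce, in audio form, notable repositories they have found on GitHub that deal with sound, such as TTS, MIDI, and audio. They offer a light-footed guide to the open-source world where sound and code intersect. Profiles of Kana and Mari: Kana – Newbie Esports Caster; Mari – Newbie Esports Analyst. Note: The scripts for this show are generated automatically using generative AI. They may contain errors, so please enjoy them as reference information.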


101

gpustack/vox-box

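Vox Box is an OpenAI API-compatible server for speech recognition (speech-to-text) and speech synthesis (text-to-speech). It lets you switch among backend models such as Whisper, FunASR, Bark, Dia, and CosyVoice, and provides APIs including /v1/audio/speech, /v1/audio/transcriptions, /v1/models, /v1/voices, and /health.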

Jun 25, 2026

2m
100

zhenye234/CoMoSpeech

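CoMoSpeech is a speech synthesis repository based on diffusion models and consistency models for generating speech and singing voice from text. It aims for fast inference through one-step generation, and includes code for training a student model by distilling a teacher model, for inference, and for training on LJSpeech. It uses a HiFi-GAN vocoder to generate waveforms from mel spectrograms.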

Jun 24, 2026

2m
99

jscrane/TTS

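A Text-to-Speech (TTS) library for Arduino. It stores English vocabulary and phoneme conversion rules, along with voice data, in PROGMEM, and performs speech synthesis on Arduino-family boards using PWM or DAC output.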

Jun 23, 2026

1m
98

hegedustibor/htgo-tts

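A Text-to-Speech (TTS) library for Go. It uses the Google Translate speech generation API to convert text to MP3, and can save the result to a file and play it back. Playback is supported both through mplayer and natively through go-mp3 + oto/v2.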

Jun 22, 2026

1m
97

mtkresearch/BreezeApp

BreezeAPP is a purely on-phone AI application developed for Android and iOS. Download it from the App Store to enjoy a range of AI features without a network connection. The source code is provided by MediaTek Research. We aim to promote two ideas: everyone is free to choose and run their own LLM on their own phone, and any app developer can easily create purely phone-based AI apps.

Jun 21, 2026

2m
96

netease-youdao/Confucius4-TTS

Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine

Jun 20, 2026

2m
95

Lyrcaxis/KokoroSharp

Fast local TTS inference engine in C# with ONNX runtime. Multi-speaker, multi-platform and multilingual. Integrate on your .NET projects using a plug-and-play NuGet package, complete with all voices.

Jun 19, 2026

2m
94

OEvortex/llm4free

LLM4Free — All-in-one Python toolkit for web search, AI interaction (40+ free providers), digital utilities, and more. Formerly WebScout.

Jun 18, 2026

2m
93

p0p4k/pflowtts_pytorch

Unofficial implementation of NVIDIA P-Flow TTS paper

Jun 17, 2026

1m
92

OpenMOSS/MOSS-Audio-Tokenizer

MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivering SOTA reconstruction and strong performance in generation and understanding—serving as a unified interface for next-generation native audio language models.

Jun 16, 2026

1m
91

worldwonderer/video-recap-skills

Turn any video into a narrated recap with a Claude Code skill. It edits any video into a Chinese-language narration video and supports export to Jianying (剪映).

Jun 15, 2026

1m
90

thuhcsi/Crystal

Crystal - C++ implementation of a unified framework for multilingual TTS synthesis engine with SSML specification as interface.

Jun 14, 2026

1m
89

dunky11/voicesmith

[WIP] VoiceSmith makes training text to speech models easy.

Jun 13, 2026

1m
88

ekwek1/soprano-factory

Soprano-Factory: Train your own 2000x realtime text-to-speech model

Jun 12, 2026

1m
87

small-cactus/M.I.L.E.S

M.I.L.E.S, a GPT-4-Turbo voice assistant, self-adapts its prompts and AI model, can play any Spotify song, adjusts system and Spotify volume, performs calculations, browses the web and internet, searches global weather, delivers date and time, autonomously chooses and retains long-term memories. Available for macOS and Windows.

Jun 11, 2026

2m
86

Yazdi9/Talking_Face_Avatar

Avatar Generation For Characters and Game Assets Using Deep Fakes

Jun 10, 2026

1m
85

XilinJia/Podcini

Open-source podcast instrument for Android supporting content from YouTube and YT Music as well as normal podcasts.

Jun 9, 2026

1m
84

LonePheasantWarrior/TalkifyTTS

An Android text-to-speech (TTS) engine powered by cloud-based large language models. Supports models such as Doubao, Tencent, Microsoft, and Qwen.

Jun 8, 2026

1m
83

rishikksh20/FastSpeech2

PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Jun 7, 2026

1m
82

herimor/voxtream

VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency and Speaking rate Control

Jun 6, 2026

2m
81

Hagsten/Talkify

JavaScript text-to-speech library

Jun 5, 2026

1m
80

robinhad/ukrainian-tts

Ukrainian TTS (text-to-speech) using ESPNET

Jun 4, 2026

1m
79

foyoux/pygtrans

Google Translate client that supports an API key and can translate 100,000 entries in one go

Jun 3, 2026

1m
78

CMsmartvoice/One-Shot-Voice-Cloning

One Shot Voice Cloning based on Unet-TTS

Jun 2, 2026

1m
77

keonlee9420/DiffSinger

PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech)

Jun 1, 2026

1m
76

AIFSH/ComfyUI-GPT_SoVITS

A ComfyUI custom node for GPT-SoVITS. You can now do voice cloning and TTS in ComfyUI.

May 31, 2026

1m
75

yl4579/HiFTNet

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

May 30, 2026

1m
74

asiff00/On-Device-Speech-to-Speech-Conversational-AI

This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.

May 29, 2026

1m
73

BinWang28/audio-ai-hub

The hub for audio AI research: papers, open models, benchmarks & datasets across audio LLMs, speech recognition, TTS, music & audio generation.

May 28, 2026

1m
72

Xerophayze/TTS-Story

TTS-Story is a web-based multi‑voice TTS studio for turning tagged scripts into audiobooks—featuring full speaker management, chunk review/regeneration, a job queue and library system, and local GPU or API backends including Kokoro, Chatterbox, VOX CPM, Pocket-TTS, Kitten-TTS, IndexTTS-2, QWEN3 TTS and Omnivoice engines

May 27, 2026

2m
71

AutoArk/GPA

[AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion with one tiny model!

May 26, 2026

1m
70

elevenlabs/skills

Collections of skills for building with ElevenLabs

May 25, 2026

1m
69

KevinMIN95/StyleSpeech

Official implementation of Meta-StyleSpeech and StyleSpeech

May 24, 2026

1m
68

ddxfish/sapphire

She's the AI agent you come home to.

May 23, 2026

1m
67

shell-nlp/gpt_server

gpt_server is an open-source framework for production-grade deployment of LLMs, embedding, reranker, ASR, TTS, text-to-image, image editing, and text-to-video models.

May 22, 2026

1m
66

mlalma/kokoro-ios

Kokoro TTS for iOS and macOSX

May 21, 2026

1m
65

keonlee9420/DailyTalk

Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023

May 20, 2026

1m
64

DBJD-CR/astrbot_plugin_proactive_chat

An AstrBot plugin that enables the bot to send proactive messages in private and group chats, featuring context awareness, persistent data, dynamic emotions, do-not-disturb periods, and TTS integration. It also includes a standalone WebUI for personalized configuration.

May 19, 2026

1m
63

devnen/Kitten-TTS-Server

Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiobooks, and GPU acceleration.

May 18, 2026

1m
62

leaonline/easy-speech

Cross browser Speech Synthesis also known as Text to speech or TTS; no dependencies; uses Web Speech API

May 17, 2026

1m
61

rendchevi/nix-tts

Nix-TTS: Lightweight and End-to-end Text-to-Speech via Module-wise Distillation

May 16, 2026

1m
60

HITsz-TMG/VideoClaw

A fully automated AI video-generation employee | Your First AIGC Coworker. Chat an Idea. Get a Film.

May 15, 2026

2m
59

ChaituRajSagar/gemini-youtube-automation

A fully autonomous AI Agent/Python pipeline that utilizes Large Language Models (LLMs) like Gemini to generate content, produce videos, and automatically upload educational videos to YouTube.

May 14, 2026

1m
58

wildminder/awesome-ai-voice

List of open-source TTS, voice cloning, and music generation models

May 13, 2026

1m
57

mahimairaja/voiceai

Set of with to help those building Voice AI agents

May 12, 2026

1m
56

PowerBeef/QwenVoice

Vocello is a local-first voice generation app for Apple Silicon Macs. Public beta for macOS 26; QwenVoice v1.2.3 remains the stable macOS 15 fallback.

May 11, 2026

1m
55

r9y9/ttslearn

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

May 10, 2026

1m
54

livekit-examples/kitt

Talk to ChatGPT in real time using LiveKit

May 9, 2026

1m
53

yl4579/PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

May 8, 2026

1m
52

ElmTran/praises

Praises is a text-to-speech tool that can help you read text easily.

May 7, 2026

1m



ABOUT THIS SHOW

HOSTED BY

Kana & Mari

Produced by Aquariumy Studio Inc.

Frequently Asked Questions

How many episodes does Kana & Mari’s SoundRepos have?

Kana & Mari’s SoundRepos currently has 50 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is Kana & Mari’s SoundRepos about?

How often does Kana & Mari’s SoundRepos release new episodes?

Kana & Mari’s SoundRepos has 50 episodes. Check the episode list to see recent publication dates and frequency.

Where can I listen to Kana & Mari’s SoundRepos?

You can listen to Kana & Mari’s SoundRepos on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts Kana & Mari’s SoundRepos?

Kana & Mari’s SoundRepos is created and hosted by Kana & Mari.
