51 Ways to Spell the Image Giraffe:  (39c3) episode artwork

EPISODE · Dec 28, 2025 · 38 MIN

51 Ways to Spell the Image Giraffe: (39c3)

from Chaos Computer Club - recent events feed (high quality) · host Ting-Chun Liu, Leon-Etienne Kühr

Generative AI models don't operate on human languages – they speak in **tokens**. Tokens are computational fragments that deconstruct language into subword units, stored in large dictionaries. These tokens encode not only language but also political ideologies, corporate interests, and cultural biases even before model training begins. Social media handles like *realdonaldtrump*, brand names like *louisvuitton*, or even *!!!!!!!!!!!!!!!!* exist as single tokens, while other words remain fragmented. Through various artistic and adversarial experiments, we demonstrate that tokenization is a political act that determines what can be represented and how images become computable through language. Tokens are the fragments of words that generative models use to process language, the step that breaks text into subword units before any neural networks are involved. There are 51 ways to combine tokens to spell the word giraffe using existing vocabulary: from a single token **giraffe** to splits using multiple tokens like *gi|ra|ffe*, *gira|f|fe*, or even *g|i|r|af|fe*. In one experiment, we hijacked the prompting process and fed token combinations directly to text-to-image models. With variations like *g|iraffe* or *gir|affe* still generating recognizable results, our experiments show that the beginning and end of tokens hold particular semantic weight in forming giraffe-like images. This reveals that certain images cannot be generated through prompting alone, as the tokenization process sanitizes most combinations, suggesting that English, or any human language, is merely a subset of token languages. The talk features experiments using genetic algorithms to reverse-engineer prompts from images, respelling words in token language to change their generative outcomes, and critically examining token dictionaries to investigate edge cases where the vocabulary breaks down entirely, producing somewhat *speculative languages* that include strange words formed at the edge of chaos where English meets token (non-)sense. These experiments show that even before generation occurs, token dictionaries already encode a stochastic worldview, shaped by the statistical frequencies of their training data – dominated by popular culture, brands, platform-speak, and *non-words*. Tokenization is, therefore, a political act: it defines what can be represented and how the world becomes computationally representable. We will look at specific tokens and ask: Which models use which vocabularies? What *non-word* tokens are shared among models? And how do language models make sense of a world using a language we do not understand? Licensed to the public under http://creativecommons.org/licenses/by/4.0 about this event: https://events.ccc.de/congress/2025/hub/event/detail/51-ways-to-spell-the-image-giraffe-the-hidden-politics-of-token-languages-in-generative-ai

Generative AI models don't operate on human languages – they speak in **tokens**. Tokens are computational fragments that deconstruct language into subword units, stored in large dictionaries. These tokens encode not only language but also political ideologies, corporate interests, and cultural biases even before model training begins. Social media handles like *realdonaldtrump*, brand names like *louisvuitton*, or even *!!!!!!!!!!!!!!!!* exist as single tokens, while other words remain fragmented. Through various artistic and adversarial experiments, we demonstrate that tokenization is a political act that determines what can be represented and how images become computable through language. Tokens are the fragments of words that generative models use to process language, the step that breaks text into subword units before any neural networks are involved. There are 51 ways to combine tokens to spell the word giraffe using existing vocabulary: from a single token **giraffe** to splits using multiple tokens like *gi|ra|ffe*, *gira|f|fe*, or even *g|i|r|af|fe*. In one experiment, we hijacked the prompting process and fed token combinations directly to text-to-image models. With variations like *g|iraffe* or *gir|affe* still generating recognizable results, our experiments show that the beginning and end of tokens hold particular semantic weight in forming giraffe-like images. This reveals that certain images cannot be generated through prompting alone, as the tokenization process sanitizes most combinations, suggesting that English, or any human language, is merely a subset of token languages. The talk features experiments using genetic algorithms to reverse-engineer prompts from images, respelling words in token language to change their generative outcomes, and critically examining token dictionaries to investigate edge cases where the vocabulary breaks down entirely, producing somewhat *speculative languages* that include strange words formed at the edge of chaos where English meets token (non-)sense. These experiments show that even before generation occurs, token dictionaries already encode a stochastic worldview, shaped by the statistical frequencies of their training data – dominated by popular culture, brands, platform-speak, and *non-words*. Tokenization is, therefore, a political act: it defines what can be represented and how the world becomes computationally representable. We will look at specific tokens and ask: Which models use which vocabularies? What *non-word* tokens are shared among models? And how do language models make sense of a world using a language we do not understand? Licensed to the public under http://creativecommons.org/licenses/by/4.0 about this event: https://events.ccc.de/congress/2025/hub/event/detail/51-ways-to-spell-the-image-giraffe-the-hidden-politics-of-token-languages-in-generative-ai

NOW PLAYING

51 Ways to Spell the Image Giraffe: (39c3)

0:00 38:28

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

No similar episodes found.

LIGHTS, CAMERA, SMILE! Creatives Club Media Lights, Camera, Smile, is a podcast for anyone with a dream to share something with the world, out of the overflow of themselves - be it their mind, their heart, their personalities, and much more. Each of us are alive in this moment in time, with an innate ability to have ideas and create various things to benefit both ourselves and the people around us for a reason, and here, you will find the encouragement, the inspiration, and the motivation to do just that. Hosted by Cicily, founder of Creatives Club, she dives into various topics surrounding creativity and business. Exploring entrepreneurship for creatives in a corporate reality, sharing tips and tricks in a media centered company, answering questions regarding what a creative actually is are just a few of the things discussed on this podcast. Be encouraged to create for yourself as Cicily gets vulnerable by pivoting the camera to herself for the first time.To submit questions for Cicily to answer, or have her address certain t Chewing the Fat with WorkForge WorkForge Bite-Sized Conversations for Building a Stronger Workforce Welcome to Chewing the Fat, a podcast delving deep into the world of food manufacturing. Dive into real conversations around critical topics like staffing, retention, onboarding, and career development in this essential industry. Subscribe now to gain insights from your peers, subject matter experts and more on the biggest issues facing food manufacturers today: -Hiring and retaining employees -Addressing the challenges of the Silver Tsunami -Improving time to productivity of new employees -Engaging employees from hire to retire And more... Tune in to Chewing the Fat, a WorkForge podcast, and join the conversation on how to build and sustain a resilient, high-performing workforce in food manufacturing. Sermons | Countryside Bible Church Countryside Bible Church At Countryside Bible Church, we equip believers to joyfully live holy lives, to serve one another, and to share the gospel of Jesus Christ, all to the glory of God. We are committed to a high view of God, and a high view of Scripture. The PFN Cincinnati Bengals Podcast Pro Football Network The PFN Cincinnati Bengals Podcast is where you can stay up-to-date with the latest news and analysis on the Cincinnati Bengals! Our hosts, industry experts Jay Morrison and Dallas Robinson, provide weekly coverage of all the latest rumors and updates about the Bengals. Don’t forget to follow the show to receive new episodes directly in your podcast feed and leave a rating and review to let us know your thoughts.

Frequently Asked Questions

How long is this episode of Chaos Computer Club - recent events feed (high quality)?

This episode is 38 minutes long.

When was this Chaos Computer Club - recent events feed (high quality) episode published?

This episode was published on December 28, 2025.

What is this episode about?

Generative AI models don't operate on human languages – they speak in **tokens**. Tokens are computational fragments that deconstruct language into subword units, stored in large dictionaries. These tokens encode not only language but also political...

Can I download this Chaos Computer Club - recent events feed (high quality) episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!