EPISODE · Dec 12, 2017 · 54 MIN

RR 340: Strings and Encodings in Ruby with Aaron Lasseigne

from Ruby Rogues · host Charles M Wood

Panel: Charles Max Wood, Dave Kimura, Eric Berry, David Richards

In this episode, the Ruby Rogues panel discusses strings and encodings in Ruby with Aaron Lasseigne. Aaron has been a Ruby developer for over a decade and is the author of Mastering Ruby: Strings and Encodings. Aaron also talks about his recent work on Active Interaction, a service object gem. This is a great episode for learning about strings and encodings.

In particular, we dive pretty deep on these discussion points (contributed by guests and hosts):
• Why is it so important to understand strings?
  ◦ “The internet is powered by multimillion-dollar string manipulation machines. We put strings in a box, and get new strings out. While there’s plenty of mathy things that can happen in the middle, there is no denying the importance of strings in today’s world.” - Schneems
  ◦ They’re the only data structure that lies to you. You can see the exact contents of an array or hash, but strings mask what’s happening. That’s why a single character can have a length of 2.
• What are character sets?
  ◦ A character set defines a group of characters and their order, and assigns each one an identifier (a code point).
    ▪ Unicode is a character set.
  ◦ What are code points?
    ▪ Unique identifiers within the character set (for example, U+00E9 is “é” in Unicode).
  ◦ UTF-8, UTF-16, and UTF-32 are encodings of the Unicode character set: each defines how code points are stored as bytes.
    ▪ Each has its own benefits.
• Normalization forms
  ◦ The same character can have different representations. We can represent “é” as a single character or as an “e” followed by a combining mark (2 characters). Normalization forms let us convert between these representations.
  ◦ There are four forms, NFC, NFD, NFKC, and NFKD, and each does something slightly different: the C forms compose characters, the D forms decompose them, and the K forms also replace compatibility characters (such as the “ﬁ” ligature) with their plain equivalents.
    ▪ You can convert between them with `String#unicode_normalize`.
• Sorting
  ◦ Easy for English alone, but it can be quite difficult with other languages. Even sorting “e” and “é” can be tricky.
• Security
  ◦ Identical characters, similar-looking characters, and invisible characters can all be used to spoof user names.
    ▪ https://www.huffingtonpost.com/entry/how-to-avoid-downloading-a-fake-app_us_5a147d40e4b0f401dfa7eafb
    ▪ https://www.reddit.com/r/Android/comments/7ahujw/psa_two_different_developers_under_the_same_name/
• The current state of Unicode support in Ruby, which improved in 2.4 when methods like `upcase` started working with Unicode characters instead of only ASCII.
• The addition of grapheme cluster support in Ruby 2.5.
• Freezing strings with `String#freeze` and with the `# frozen_string_literal: true` magic comment at the top of a file.
  ◦ Immutable strings may still make it into Ruby 3 as the default.
• Character set expressions (a term I made up) for use with methods like `String#count` and `String#delete`.
  ◦ They’re like the inside of a regular expression character set (e.g. `[a-z]`).
• Tofu and mojibake
  ◦ Tofu are the white boxes you see when your computer has no font that can display a character.
  ◦ Mojibake is when the characters show up but don’t make sense, because you’re using the wrong encoding or they were misencoded somewhere along the way.
• Fixing bad characters
  ◦ Strings can be checked with `String#valid_encoding?`.
  ◦ `String#scrub` replaces invalid bytes with a replacement string, which by default is the Unicode replacement character (that black diamond with a question mark in it).
  ◦ `String#encode` also does replacement work and lets you swap out characters when converting from something like UTF-8 to ASCII.
    ▪ You can even change newline styles with it.
  ◦ `Encoding::Converter` is an even more powerful way to convert, but it’s a tool for when things go seriously wrong.

Links:
• https://aaronlasseigne.com
• https://github.com/AaronLasseigne
• @AaronLasseigne
• Mastering Ruby: Strings and Encodings
• Active Interaction

Picks:
• Eric
  ◦ The Secret of Luck
  ◦ Do Things That Don’t Scale
  ◦ Girls
• Dave
  ◦ Firefox Quantum
• David
  ◦ chris.com
  ◦ https://juliasilge.com/blog/tidy-word-vectors/
• Charles
  ◦ slack.com
  ◦ Visual Studio Code Sharing
  ◦ Podcast for React And View
• Aaron
  ◦ devdoc.io
  ◦ Rose Mountain
  ◦ The Dollop

Special Guest: Aaron Lasseigne.
Advertising Inquiries: https://redcircle.com/brands
Privacy & Opt-Out: https://redcircle.com/privacy
Become a supporter of this podcast: https://www.spreaker.com/podcast/ruby-rogues--6102073/support
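The point that strings “lie” about their length can be seen directly in irb. This is a minimal sketch assuming Ruby 2.5 or later (for `String#grapheme_clusters`); the variable names are only illustrative:

```ruby
precomposed = "\u00E9"   # "é" as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed  = "e\u0301"  # "é" as "e" followed by COMBINING ACUTE ACCENT

p precomposed.length                    # => 1
p decomposed.length                     # => 2 (two code points, one visible character)
p decomposed.codepoints                 # => [101, 769]
p decomposed.grapheme_clusters.length   # => 1 (Ruby 2.5+: counts user-perceived characters)
p precomposed == decomposed             # => false, even though both render as "é"
```

`length` counts code points, not what a reader would call characters; grapheme clusters are the closer match to the latter.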
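To make the difference between a code point and its encoded bytes concrete, the sketch below measures the same characters in UTF-8, UTF-16, and UTF-32. `byte_sizes` is a hypothetical helper, and the little-endian variants are used only so that no byte-order mark is counted:

```ruby
# Hypothetical helper: bytes needed to store a string in each Unicode encoding.
def byte_sizes(str)
  %w[UTF-8 UTF-16LE UTF-32LE].map { |enc| [enc, str.encode(enc).bytesize] }.to_h
end

["a", "\u00E9", "\u{1F600}"].each do |char|
  # Prints the code point (e.g. U+00E9) followed by the byte counts per encoding.
  puts format("U+%04X %p", char.ord, byte_sizes(char))
end
```

ASCII text is smallest in UTF-8 (1 byte vs. 2 and 4), “é” takes 2 bytes in both UTF-8 and UTF-16, and an emoji outside the Basic Multilingual Plane takes 4 bytes in all three. UTF-32 trades space for a fixed width per code point.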
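The normalization forms can be tried with `String#unicode_normalize` (Ruby 2.2+). A short sketch; `same_text?` is a hypothetical helper showing one common use, comparing input by meaning rather than by raw bytes:

```ruby
precomposed = "\u00E9"   # "é" as a single code point
decomposed  = "e\u0301"  # "e" + combining acute accent

p decomposed.unicode_normalize(:nfc) == precomposed  # => true (NFC composes)
p precomposed.unicode_normalize(:nfd) == decomposed  # => true (NFD decomposes)

ligature = "\uFB01"                                  # the "ﬁ" ligature
p ligature.unicode_normalize(:nfc) == ligature       # => true (no canonical equivalent)
p ligature.unicode_normalize(:nfkc)                  # => "fi" (compatibility mapping)

# Hypothetical helper: treat canonically equivalent strings as equal.
def same_text?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

p same_text?(precomposed, decomposed)                # => true
```

The K forms are lossy (the ligature cannot be recovered from “fi”), so they suit search and comparison keys better than stored text.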
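Why sorting “e” and “é” is tricky shows up as soon as Ruby’s default byte-wise comparison meets accented text. The sketch below uses a crude accent- and case-insensitive key (`loose_sort_key` is hypothetical). It is not real locale-aware collation: languages disagree on the order (Swedish, for example, sorts “ä” after “z”), which is what the Unicode Collation Algorithm and locale data exist for.

```ruby
words = ["éclair", "ecole", "zebra", "Eagle"]

# String#<=> compares bytes, so uppercase sorts before lowercase and
# "é" (0xC3 0xA9 in UTF-8) sorts after every ASCII letter.
p words.sort  # => ["Eagle", "ecole", "zebra", "éclair"]

# Hypothetical helper: NFD splits "é" into "e" plus a combining mark
# (general category Mn); dropping the marks and downcasing gives a loose key.
def loose_sort_key(word)
  word.unicode_normalize(:nfd).gsub(/\p{Mn}/, "").downcase
end

# The original string breaks ties so equal keys still sort deterministically.
p words.sort_by { |w| [loose_sort_key(w), w] }  # => ["Eagle", "éclair", "ecole", "zebra"]
```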
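For the spoofing point, here is a rough heuristic for user names, assuming a hypothetical `suspicious_username?` check that flags invisible format characters and mixed scripts, plus a hypothetical `canonical_username` that folds compatibility lookalikes (such as fullwidth letters) before uniqueness checks. It is a sketch only; a production system would follow something like Unicode Technical Standard #39 (Unicode Security Mechanisms) and its confusables data.

```ruby
# Hypothetical helper: flag names with invisible characters or mixed scripts.
def suspicious_username?(name)
  # Zero-width space, zero-width joiner, etc. are general category Cf (format).
  return true if name.match?(/\p{Cf}/)

  # Mixing scripts, like a Cyrillic "а" inside Latin text, is a classic spoof.
  scripts = [/\p{Latin}/, /\p{Cyrillic}/, /\p{Greek}/].count { |re| name.match?(re) }
  scripts > 1
end

# Hypothetical helper: a comparison key that folds fullwidth forms and case.
def canonical_username(name)
  name.unicode_normalize(:nfkc).downcase
end

p suspicious_username?("paypal")        # => false
p suspicious_username?("p\u0430ypal")   # => true (Cyrillic "а")
p suspicious_username?("pay\u200Bpal")  # => true (zero-width space)
p canonical_username("\uFF30ayPal")     # => "paypal" (fullwidth "Ｐ" folded)
```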
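The Ruby 2.4 change to case mapping is easy to check. A small sketch, assuming Ruby 2.4 or later; the `:ascii` option restores the older ASCII-only behavior:

```ruby
p "résumé".upcase          # => "RÉSUMÉ" (before 2.4, "é" was left unchanged)
p "RÉSUMÉ".downcase        # => "résumé"
p "straße".upcase          # => "STRASSE" (full case mapping can change the length)
p "résumé".upcase(:ascii)  # => "RéSUMé" (opt back into ASCII-only mapping)
```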
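Both ways of freezing strings mentioned above can be sketched in a few lines. The magic comment is shown only as a comment, since it must be the first line of its own file. This assumes Ruby 2.5+ for `FrozenError` (older versions raise a plain `RuntimeError`):

```ruby
# Putting this magic comment on the first line of a file freezes every
# string literal in that file:
#   # frozen_string_literal: true

name = String.new("Aaron")   # String.new returns a mutable string
name.freeze
p name.frozen?               # => true

begin
  name << " Lasseigne"
rescue FrozenError => e      # FrozenError was added in Ruby 2.5
  puts "#{e.class}: #{e.message}"
end

copy = +name                 # unary plus returns an unfrozen copy of a frozen string
copy << " Lasseigne"
p copy                       # => "Aaron Lasseigne"
p copy.frozen?               # => false
```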
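The “character set expressions” accepted by `String#count`, `String#delete`, and `String#squeeze` behave like the inside of a regex bracket: ranges with `-`, negation with a leading `^`, and multiple arguments intersecting. A quick sketch:

```ruby
text = "hello world"

p text.count("lo")         # => 5 (every "l" or "o")
p text.count("a-m")        # => 6 (ranges work like [a-m] in a regex)
p text.count("a-z", "^l")  # => 7 (multiple sets intersect; "^" negates)
p text.delete("l")         # => "heo word"
p text.delete("^a-z")      # => "helloworld" (delete everything that is NOT a-z)
p text.squeeze("l")        # => "helo world" (squeeze accepts the same sets)
```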
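Mojibake can be reproduced on purpose by reading UTF-8 bytes as if they were ISO-8859-1 (Latin-1), a common real-world mistake. This sketch also shows that, when no bytes were lost, the mistake can be reversed:

```ruby
original = "café"

# Reinterpret the UTF-8 bytes (the "é" is 0xC3 0xA9) as Latin-1, then convert to UTF-8.
# String#b returns a binary copy, so force_encoding never touches the original.
mojibake = original.b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
p mojibake   # => "cafÃ©"

# Undo it: turn the characters back into the same bytes and label them UTF-8.
repaired = mojibake.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
p repaired == original  # => true
```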
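The repair tools for bad characters can be combined as below. This is a sketch of the standard `String` methods; the block form of `scrub` and the `fallback:` option of `encode` are both part of core Ruby:

```ruby
bad = "abc\xFFdef"                 # 0xFF can never appear in valid UTF-8

p bad.valid_encoding?              # => false
p bad.scrub == "abc\uFFFDdef"      # => true (default: U+FFFD replacement character)
p bad.scrub("?")                   # => "abc?def"
p bad.scrub { |bytes| "<#{bytes.unpack1('H*')}>" }  # => "abc<ff>def" (block sees the bad bytes)

# Converting to ASCII: characters with no ASCII equivalent are "undefined".
p "café".encode("US-ASCII", undef: :replace)             # => "caf?"
p "café".encode("US-ASCII", fallback: { "é" => "e" })   # => "cafe"

# Newline conversion: CRLF (and lone CR) become LF.
p "one\r\ntwo".encode(Encoding::UTF_8, universal_newline: true)  # => "one\ntwo"
```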
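When things go seriously wrong, `Encoding::Converter#primitive_convert` lets you step through a conversion and decide what to do at each error, with `#primitive_errinfo` reporting the offending bytes. The sketch below wraps that loop in a hypothetical `convert_with_report` helper that substitutes `"?"` and records every problem; the replacement policy is an assumption chosen for illustration:

```ruby
# Hypothetical helper: convert `input`, replacing each error with "?" and
# collecting a description of what went wrong.
def convert_with_report(input, from:, to:)
  ec = Encoding::Converter.new(from, to)
  src = input.dup       # primitive_convert consumes (mutates) its source buffer
  dst = String.new      # receives output; its encoding is set to `to`
  problems = []

  loop do
    result = ec.primitive_convert(src, dst)
    case result
    when :finished
      break
    when :invalid_byte_sequence
      problems << "invalid bytes #{ec.primitive_errinfo[3].unpack1('H*')}"
      dst << "?"
    when :undefined_conversion
      problems << "no #{to} equivalent for bytes #{ec.primitive_errinfo[3].unpack1('H*')}"
      dst << "?"
    else
      raise ArgumentError, "unexpected converter state: #{result}"
    end
  end

  [dst, problems]
end

sample = "abc" + "\xFF" + "d€é"   # an invalid byte, plus "€", which Latin-1 lacks
output, problems = convert_with_report(sample, from: "UTF-8", to: "ISO-8859-1")
p output.encode("UTF-8")          # => "abc?d?é"
p problems.size                   # => 2
```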

Frequently Asked Questions

How long is this episode of Ruby Rogues?

This episode is 54 minutes long.

When was this Ruby Rogues episode published?

This episode was published on December 12, 2017.

What is this episode about?

Panel: Charles Max Wood, Dave Kimura, Eric Berry, David Richards. In this episode, the Ruby Rogues panel discuss Strings and Encodings in Ruby with Aaron Lasseigne. Aaron has been a Ruby developer for over a decade and is the author of Mastering Ruby:...

Can I download this Ruby Rogues episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.