Ryan Bigg

Two Amazing Things: Thing #1

30 Jun 2009

Today two amazing things happened and I would like to share them with you in a two-part series (woah, look at me going all high-tech on my reader). Here’s the first:

Hampton Catlin

Today Hampton Catlin was talking with Dr Nic about Ruby 1.9 issues he was having with his wikimedia-mobile project. Specifically, he was getting “incompatible character encodings: ASCII-8BIT and UTF-8”. This is a guy I admire and look up to, and I think he’s “the shit”. He came to my company looking for help and it was my (and Bo’s) task to help him figure out what was going on. Honoured.

The search box on his site was a bit wrong for fanciful languages like that German:

Wikipedia

and some pages threw some more interesting errors:

Argument Error

I’d seen this error before in my Ruby 1.9 testing, but that was so long ago that I had forgotten what context or even if I fixed it. Probably not.

I remembered someone linking to this post by Dave Thomas a while ago, but I had forgotten the link. Thanks to Google I was able to enter “Ruby 1.9 encodings” and it knew exactly what I was after. I followed the “instructions” and put # encoding: utf-8 at the top of the merb executable and the buffer.rb file in HAML (which, it turns out, had no bearing on the final result). No luck. Then Hampton mentioned he had passed -KU to the ruby interpreter, which fixed some things and broke others, seemingly at random. So I tried that, and got a couple of degrees of success.

I opened up irb1.9 -KU (yeah, I’m so cool I have two versions of Ruby installed, at the same time). I knew of the encoding method you can call on a string to get that string’s encoding, so I tried something simple: "Ryan".encoding, which gave UTF-8. Then I tried the German text, and I wasn’t surprised when that also returned UTF-8. So what’s going on?
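If you want to poke at this yourself, here is a minimal sketch of that irb session. The German word is just a stand-in I picked (the actual search text isn't reproduced here), and it is written with \u escapes so the snippet does not depend on the source file's encoding.

```ruby
# A plain ASCII string and a string containing a non-ASCII character.
name   = "Ryan"
german = "M\u00FCnchen" # "München"; a made-up stand-in for the real search text

# String#encoding reports how Ruby has tagged the bytes of each string.
puts name.encoding    # UTF-8
puts german.encoding  # UTF-8

# The same bytes can carry a different tag. This relabels a copy as binary
# without changing a single byte.
binary = german.dup.force_encoding("ASCII-8BIT")
puts binary.encoding == Encoding::ASCII_8BIT
puts binary.bytes == german.bytes
```

The point of the last two lines is that an encoding in Ruby 1.9 and later is a label attached to the string, so two strings with identical bytes can still disagree about what those bytes mean.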

Well, turns out that even though we specified # encoding: utf-8 in the merb executable and even in a meta tag in the HTML, the HAML that was getting sent to the parser was being sent in ASCII-8BIT! Around this point Bo came in and we discovered the lovely force_encoding method for strings in order to… well, I’m sure you can figure it out.
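Here's a rough reproduction of the clash, assuming a UTF-8 string meets the same kind of text tagged as ASCII-8BIT. The variable names are mine, not HAML's, and the exact exception class and message wording may differ between Ruby versions (current Rubies raise Encoding::CompatibilityError).

```ruby
# Text as the template author intended it: tagged UTF-8.
utf8 = "M\u00FCnchen" # "München"

# The same bytes tagged as binary, standing in for what reached the parser.
binary = utf8.dup.force_encoding("ASCII-8BIT")

error = nil
begin
  # Both sides contain non-ASCII bytes with different tags, so Ruby
  # refuses to guess how to combine them.
  utf8 + binary
rescue Encoding::CompatibilityError => e
  error = e
end

puts error.class
puts error.message
```

Note that two pure-ASCII strings with these same tags concatenate without complaint, which is probably why only some pages blew up.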

This is the misbehaving line in haml 2.0.9, and to fix it we just do result.force_encoding("UTF-8"), which forces whatever’s being appended to the buffer to always be UTF-8!
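A minimal sketch of the idea, not the actual HAML source: buffer and result here are hypothetical stand-ins for whatever HAML's buffer.rb holds at that point.

```ruby
# Hypothetical output buffer, already UTF-8.
buffer = "<h1>M\u00FCnchen</h1>".dup

# Hypothetical chunk about to be appended. Its bytes are valid UTF-8,
# but it arrives tagged as ASCII-8BIT.
result = "Gr\u00FC\u00DFe".dup.force_encoding("ASCII-8BIT") # "Grüße"

# The fix: relabel the chunk as UTF-8 before appending.
# force_encoding changes only the tag; it does not transcode any bytes.
result.force_encoding("UTF-8")
buffer << result

puts buffer.encoding
puts buffer.valid_encoding?
```

Because force_encoding only relabels, this is safe only when the bytes really are UTF-8 to begin with. If they were in some other encoding you would end up with a string whose valid_encoding? is false, and String#encode would be the tool for an actual conversion.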

Hampton was happy, we were happy, and karma rewarded me with a delicious steak sandwich + ice cream with banana slices and maple syrup on top.