Encoding of non-ASCII characters in URLs
Today I've been subjecting the PeopleAggregator API implementation to the 'Sam Ruby Iñtërnâtiônàlizætiøn test'. It went in and out just fine through XML-RPC, but the REST methods caused a bit more trouble. All sorted out now, but...
It turns out that Firefox, at least on my dev machine, encodes URLs as ISO-8859-1 (or perhaps Windows-1252), whereas Internet Explorer encodes them as UTF-8. I was trying to use PHP's mb_convert_encoding function to convert this, but it was just ignoring any non-ASCII chars.
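To make the difference concrete, here is a small Python sketch (not the PHP code PeopleAggregator actually uses) that percent-encodes the same test string both ways. The variable names are just for illustration; the point is that a character like ñ becomes one escaped byte in ISO-8859-1 but two in UTF-8, so the server sees different bytes depending on the browser.

```python
from urllib.parse import quote

# The test string, as a browser might place it in a query string.
text = "Iñtërnâtiônàlizætiøn"

# Percent-encode the bytes of the string under each charset.
as_utf8 = quote(text, encoding="utf-8")
as_latin1 = quote(text, encoding="iso-8859-1")

print(as_utf8)    # each accented character becomes two %XX escapes
print(as_latin1)  # each accented character becomes one %XX escape
```

Nothing in either encoded string says which charset produced it, which is what makes the server side awkward.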
The interesting thing about non-ASCII chars in URLs and POSTDATA is that the browsers don't seem to send any indication of the charset used. Whether the content is UTF-8 or ISO-8859-1, all I get is "Content-Type: application/x-www-form-urlencoded". It would be nice to have "; charset=UTF-8" at the end, but it doesn't seem like I'm that lucky!
As a result of this, I've reduced the scope - PeopleAggregator will support UTF-8 and ISO-8859-1, with UTF-8 strongly preferred.
For Frontier's benefit, it will handle XML-RPC requests that pretend to be UTF-8 but are actually ISO-8859-1.
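One way to get "UTF-8 preferred, ISO-8859-1 tolerated" is to try a strict UTF-8 decode first and fall back to ISO-8859-1 only if that fails. The following is a rough Python sketch of that idea, not the actual PeopleAggregator code; decode_param is a made-up name, and it assumes the input is a still-percent-encoded parameter value.

```python
from urllib.parse import unquote_to_bytes

def decode_param(raw: str) -> str:
    """Decode a percent-encoded value, preferring UTF-8 (hypothetical helper)."""
    # Undo the %XX escapes first, so we are looking at the raw bytes.
    data = unquote_to_bytes(raw)
    try:
        # Strict decoding: invalid UTF-8 raises instead of being silently dropped.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return data.decode("iso-8859-1")

print(decode_param("I%C3%B1t%C3%ABrn"))  # sent as UTF-8
print(decode_param("I%F1t%EBrn"))        # sent as ISO-8859-1
```

This works because accented ISO-8859-1 text is very rarely valid UTF-8 by accident, but it is a heuristic: a byte sequence that happens to be valid in both will always be read as UTF-8. The same fallback would apply to an XML-RPC body that claims UTF-8 but fails to decode as such. Note that unquote_to_bytes does not turn "+" into a space, so form data would need that handled separately.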