Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).

2003-7-1

Funky


Google ads

I just put some Google AdSense ads up on the Blogging Ecosystem. I wonder if anyone will click on them ...

Invalid XML-RPC from Radio / Frontier

Georg Bauer:

Actually there is a problem with the charset stuff in the XML-RPC spec. Or not in the spec, but in the implementations of the company whose former CEO created the spec (was that diplomatic enough? ;-)

This is very true. XML-RPC as implemented in most cases (e.g. Python, Java) sends all requests and responses as well-formed XML.

Most XML documents start with this:

    <?xml version="1.0"?>

Others start with this:

    <?xml version="1.0" encoding="UTF-8"?>

According to the XML spec, these two are equivalent: the default character encoding for XML is UTF-8. So if you don't include an encoding declaration, the text must be encoded in UTF-8 (or in UTF-16 with a Byte Order Mark):

In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.
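A quick way to see this rule in action, as a minimal sketch using Python's standard xml.etree.ElementTree (the element name and sample text are made up for illustration): the same UTF-8 bytes parse identically with either header, while ISO-8859-1 bytes behind a bare header are rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one element containing a non-ASCII character.
text = "<name>caf\u00e9</name>"
bare = b'<?xml version="1.0"?>'
declared = b'<?xml version="1.0" encoding="UTF-8"?>'

# UTF-8 bytes: both headers give the same result.
utf8_body = text.encode("utf-8")
same = (ET.fromstring(bare + utf8_body).text
        == ET.fromstring(declared + utf8_body).text
        == "caf\u00e9")

# ISO-8859-1 bytes with no encoding declaration: the parser assumes UTF-8,
# finds an invalid byte sequence, and refuses the document.
latin1_body = text.encode("iso-8859-1")
try:
    ET.fromstring(bare + latin1_body)
    rejected = False
except ET.ParseError:
    rejected = True
```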

However, UserLand's Frontier web server (which also runs Radio) sends XML-RPC requests that start with this, but contain content encoded in ISO-8859-1:

    <?xml version="1.0"?>

This chokes the Python XML-RPC parser whenever the content includes non-ASCII characters, and should (according to the spec) also fail with other implementations. Georg put in a nice hack for the Python Community Server that rewrites the XML declaration to look like:

    <?xml version="1.0" encoding="ISO-8859-1"?>

... before it reaches the parser. This lets us interoperate fine with Radio, which is nice, but technically we're accepting non-well-formed XML.
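Roughly, a workaround of this kind could look like the sketch below. This is not Georg's actual code: the function name declare_latin1, the regular expression, and the choice to leave bodies alone when they already decode as UTF-8 are all assumptions made for illustration.

```python
import re
import xmlrpc.client

# An XML declaration at the very start of the body with no encoding given.
_BARE_DECL = re.compile(rb"""^<\?xml\s+version\s*=\s*["']1\.0["']\s*\?>""")

def declare_latin1(body: bytes) -> bytes:
    """Hypothetical helper: label undeclared non-UTF-8 bodies as ISO-8859-1."""
    try:
        body.decode("utf-8")
        return body  # valid UTF-8 (including plain ASCII): nothing to fix
    except UnicodeDecodeError:
        pass
    # Swap the bare declaration for one that names the real encoding.
    # Bodies with no declaration, or one that already names an encoding,
    # do not match and are returned unchanged.
    return _BARE_DECL.sub(
        b'<?xml version="1.0" encoding="ISO-8859-1"?>', body, count=1)

# A made-up request shaped like the problem case: bare header, Latin-1 content.
raw = (b'<?xml version="1.0"?><methodCall><methodName>demo.echo</methodName>'
       b'<params><param><value><string>caf\xe9</string></value></param>'
       b'</params></methodCall>')

params, method = xmlrpc.client.loads(declare_latin1(raw))
```

Checking for valid UTF-8 first keeps well-behaved clients untouched, at the cost of occasionally misreading Latin-1 text that happens to form valid UTF-8 sequences.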

There's a really simple solution here: the UserLand XML-RPC formatter needs to generate the above XML header (specifying ISO-8859-1) rather than the default one (implying UTF-8) when generating XML-RPC requests. Alternatively, it could encode its XML as UTF-8, but changing the header is much, much easier.
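For comparison, here is what a sender that declares its encoding looks like, sketched with Python's standard xmlrpc.client rather than anything from UserLand (the method name demo.echo and the sample string are invented for the example). Passing encoding to dumps() puts the encoding in the XML header, and the text is then encoded to bytes with the same charset.

```python
import xmlrpc.client

# Build a request whose header names the encoding the body really uses.
request_text = xmlrpc.client.dumps(
    ("caf\u00e9",), methodname="demo.echo", encoding="iso-8859-1")
wire = request_text.encode("iso-8859-1")

# The first line of the request is the XML header.
header = wire.split(b"\n", 1)[0]

# A spec-following receiver can now parse it with no special handling.
params, method = xmlrpc.client.loads(wire)
```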

Jake - could you do this? It would make interop with Frontier much easier for the rest of us!

See also: an effbot.org note: Unofficial XML-RPC Errata:

For maximum interoperability, make sure you check the encoding attribute of the <?XML> header, if present. If the encoding attribute is not present, you must treat the request/response as UTF-8.
