Actually there is a problem with the charset stuff in the XML-RPC spec. Or not in the spec, but in the implementations of the company whose former CEO created the spec (was that diplomatic enough? ;-)
This is very true. XML-RPC as implemented in most cases (e.g. Python, Java) sends all requests and responses as well-formed XML.
Most XML documents start with this:
<?xml version="1.0"?>
Others start with this:
<?xml version="1.0" encoding="UTF-8"?>
According to the XML spec, these two are equivalent: the default character encoding for XML is UTF-8. So if you don't specify an encoding attribute, the text should be encoded in UTF-8:
In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.
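To see that rule in action, here's a minimal sketch using Python's standard xml.etree.ElementTree (which sits on top of expat). The sample documents and the parses() helper are made up for illustration; they aren't taken from any real traffic.

```python
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0"?>'  # no encoding declaration, so UTF-8 is assumed

# Three illustrative bodies that all carry the same bare declaration.
ascii_doc = DECL + b"<value>cafe</value>"  # plain ASCII is valid UTF-8
utf8_doc = DECL + "<value>café</value>".encode("utf-8")
latin1_doc = DECL + "<value>café</value>".encode("iso-8859-1")


def parses(body: bytes) -> bool:
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


for name, doc in [("ascii", ascii_doc), ("utf-8", utf8_doc), ("latin-1", latin1_doc)]:
    print(name, "accepted" if parses(doc) else "rejected")
```

The first two bodies should be accepted and the ISO-8859-1 one rejected, because a lone 0xE9 byte followed by "<" is not a valid UTF-8 sequence.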
However, UserLand's Frontier web server (which also runs Radio) sends XML-RPC requests that start with this, but contain content encoded in ISO-8859-1:
<?xml version="1.0"?>
This chokes the Python XML-RPC parser, and should (according to the spec) also fail with other implementations. Georg put in a nice hack for the Python Community Server that rewrites the XML declaration to look like:
<?xml version="1.0" encoding="ISO-8859-1"?>
... before it reaches the parser. This lets us interoperate fine with Radio, which is nice, but technically we're accepting non-well-formed XML.
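Roughly, a rewrite like that could look like the sketch below. This is not Georg's actual code; the function name and the regular expression are my own assumptions, and the sketch only touches a leading declaration that has no encoding attribute at all.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration with a version but no encoding attribute.
_BARE_DECL = re.compile(rb'^<\?xml\s+version\s*=\s*(["\'])1\.0\1\s*\?>')

_LATIN1_DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>'


def force_latin1_declaration(body: bytes) -> bytes:
    """Hypothetical helper: relabel an undeclared body as ISO-8859-1.

    Bodies that already declare an encoding are returned unchanged.
    """
    return _BARE_DECL.sub(_LATIN1_DECL, body, count=1)


# An illustrative request body: bare declaration, ISO-8859-1 content.
raw = '<?xml version="1.0"?><value>café</value>'.encode("iso-8859-1")
fixed = force_latin1_declaration(raw)
print(ET.fromstring(fixed).text)
```

A real server would presumably apply this only to requests it knows come from Frontier or Radio, since relabelling a correct UTF-8 request as ISO-8859-1 would garble it. How the Python Community Server makes that decision isn't covered here.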
There's a really simple solution here: the UserLand XML-RPC formatter needs to generate the above XML header (specifying ISO-8859-1) rather than the default one (implying UTF-8) when generating XML-RPC requests. Alternatively, it could encode its XML as UTF-8, but changing the header is much easier.
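For comparison, this is roughly what the sender-side fix looks like in Python. It's a sketch using xmlrpc.client from the Python 3 standard library (the descendant of xmlrpclib), not anything from Frontier; build_request and the weblog.ping method name are made up for the example.

```python
import xmlrpc.client


def build_request(method: str, params: tuple, encoding: str = "iso-8859-1") -> bytes:
    """Hypothetical helper: serialize a call so the header matches the bytes."""
    # dumps() writes the encoding into the XML declaration for non-UTF-8 encodings.
    text = xmlrpc.client.dumps(params, methodname=method, encoding=encoding)
    # Encode with the same charset; anything it can't represent becomes &#NNN;.
    return text.encode(encoding, "xmlcharrefreplace")


body = build_request("weblog.ping", ("café",))
print(body.splitlines()[0])

# A conforming parser can now read the request without any rewriting.
params, method = xmlrpc.client.loads(body)
print(method, params)
```

The point is simply that the declaration and the byte encoding are chosen in one place, so they can't disagree.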
Jake - could you do this? It would make interop with Frontier much easier for the rest of us!
See also: an effbot.org note: Unofficial XML-RPC Errata:
For maximum interoperability, make sure you check the encoding attribute of the <?XML> header, if present. If the encoding attribute is not present, you must treat the request/response as UTF-8.
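If you need the declared encoding outside of an XML parser (say, for logging or for decoding the raw body yourself), a check along these lines would follow that advice. It's only a sketch: declared_encoding is a hypothetical name, and the regular expression assumes an ASCII-compatible encoding, so it won't cope with UTF-16 input.

```python
import re

# Captures the encoding name from a leading XML declaration, if there is one.
_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(body: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: return the declared encoding, or UTF-8 if absent."""
    match = _ENCODING.match(body)
    return match.group(1).decode("ascii").lower() if match else default


print(declared_encoding(b'<?xml version="1.0"?><methodCall/>'))
print(declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><methodCall/>'))
```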