Phillip Pearson - web + electronics notes: Decentralised social networking | ElementTree returns normal strings if given a 7-bit document

2007-8-10

Decentralised social networking

There's a lot of chatter at the moment about portable social networks, but IMHO more important will be decentralised social networks.

Compare social networking to blogging. Blogging started out native to the web, with individual blogs on individuals' own sites. Services such as editThisPage.com, BlogSpot, and Radio eventually started hosting lots of blogs together under one domain, but the blogs themselves stayed more or less independent. It wasn't any harder to link to a blog hosted elsewhere than it was to link to one on the same site. Even within sites, syndication was via RSS rather than by something internal. A Radio user wanting to keep track of another Radio user would enter an RSS feed URL into their aggregator, in exactly the same manner as if they wanted to follow a blog hosted elsewhere. This pattern has generally remained.

Social networks, on the other hand, have always provided special means (friend links, groups, private messages, and more recent developments like the Facebook newsfeed) to connect with other users on the same site, while limiting external connections to HTML links and perhaps e-mail. RSS or Atom might be supported, but only as a second-class citizen: you wouldn't use RSS to read your friend's blog.

This makes sense as an internal optimization, but it's time that we had a global social network that functioned the same way as the global network of blogs.

Way back when social networking on the web was in its infancy, FOAF enabled more or less exactly this. It has a critical limitation, though: privacy. FOAF is fine for distributing something like a blogroll or a list of Twitter friends (followees?) in a machine-readable form, but how about for presenting profile information which you want to keep private?

The solution there is to provide multiple views of your FOAF: unauthenticated users would see just the public details, while friends or family members could get more detail after authenticating themselves to your homepage.
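As a rough sketch of the multiple-views idea (written for a current Python 3, with every name, field, and URL below made up for illustration and nothing taken from the FOAF spec itself), each profile field could carry the minimum trust level needed to see it:

```python
# Hypothetical trust levels; a higher number means a closer relationship.
PUBLIC, FRIEND, FAMILY = 0, 1, 2

# Hypothetical profile: field name -> (minimum level required, value).
PROFILE = {
    "name": (PUBLIC, "Alice Example"),
    "homepage": (PUBLIC, "http://alice.example.com/"),
    "email": (FRIEND, "alice@example.com"),
    "phone": (FAMILY, "+00 000 0000"),
}

# Authenticated URL -> trust level the page owner has granted it.
TRUSTED = {
    "http://bob.example.com/": FRIEND,
    "http://mum.example.com/": FAMILY,
}


def visible_profile(viewer_url=None):
    """Return only the fields this viewer may see.

    viewer_url is None for unauthenticated visitors; otherwise it is the
    URL the visitor has proven they own (for example via OpenID).
    """
    level = TRUSTED.get(viewer_url, PUBLIC)
    return {
        key: value
        for key, (needed, value) in PROFILE.items()
        if needed <= level
    }
```

The same filtered dictionary could then be serialised as FOAF or rendered as HTML, so the public and private views stay in sync.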

So far this has been a dead end, as it's much easier to sign up for Facebook and use its built-in privacy controls than to create an account on each of your friends' sites.

Now it starts to get interesting. The basic OpenID protocol has been around and working fine for ages now. I'm not in the loop on how attribute exchange is going, but that's not required here.

The bootstrap starts with a simple profile page, an editor for the profile page, an OpenID identity provider just for the page owner, and an OpenID identity consumer so other people can log in from their own profile pages.

First you install the software on your server, create an account for yourself, fill in your details, upload your photos, and all that. Then you give your profile URL to all your friends. They install the software on their own servers, create their own pages, and head on back to yours, where they log in. If they want to mark you as a friend, your software asks you about it, and a friend relationship is established between your two URLs (as represented by FOAF and XFN).
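Here is a minimal in-memory sketch of that friend-request flow, for one side of the relationship only. The class and method names are hypothetical, and it assumes the visitor's URL has already been verified by an OpenID login. The only thing borrowed from a real spec is the XFN convention of marking a link with rel="friend".

```python
from html import escape


class Homesite:
    """Toy model of one person's profile site (hypothetical, no storage)."""

    def __init__(self, url):
        self.url = url
        self.pending = set()  # URLs that have asked to be friends
        self.friends = set()  # URLs the owner has approved

    def request_friendship(self, visitor_url):
        # Called once visitor_url has logged in and asked to be a friend.
        if visitor_url not in self.friends:
            self.pending.add(visitor_url)

    def approve(self, visitor_url):
        # The owner said yes, so record the relationship.
        if visitor_url not in self.pending:
            raise ValueError("no pending request from %s" % visitor_url)
        self.pending.discard(visitor_url)
        self.friends.add(visitor_url)

    def xfn_links(self):
        # XFN expresses a relationship as a rel value on an ordinary link.
        return [
            '<a href="%s" rel="friend">%s</a>' % (escape(url), escape(url))
            for url in sorted(self.friends)
        ]
```

The friend's site would run the same steps in the other direction, so each homesite ends up publishing a link to the other.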

You could make the login process smoother with a central authentication server that could proxy home URLs (http://site2.example.com/ redirects to http://authproxy.example.com/log-me-in, which sends you back to your own homesite to authenticate yourself to site2) or you could enable it with a bookmarklet that would send you straight back to the login page on your homesite.
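Either way, the visitor ends up being sent to a login URL on their own homesite that remembers where to return them afterwards. A small sketch of building that URL follows; the /login path and the return_to parameter name are assumptions made for illustration, not something the post or the OpenID spec prescribes here.

```python
from urllib.parse import urlencode


def login_redirect(homesite_url, target_site_url):
    """Build the URL a bookmarklet or auth proxy would send the visitor to.

    homesite_url is the visitor's own site; target_site_url is the site
    they are trying to log in to and should be returned to afterwards.
    """
    query = urlencode({"return_to": target_site_url})
    return homesite_url.rstrip("/") + "/login?" + query
```

A bookmarklet would call the equivalent of login_redirect with the current page as target_site_url, while the proxy variant would look up homesite_url from a cookie or similar.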

Upon successful OpenID login, the two sites would exchange identity tokens, which could be used for private messaging or other communication in future.
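One way the token exchange might look, sketched under my own assumptions: each site hands the other a random token after login, and later messages are signed with an HMAC keyed by that token. None of this is part of OpenID, and the class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets


class TokenStore:
    """Tokens one homesite has swapped with its peers (in-memory sketch)."""

    def __init__(self):
        self.issued = {}    # peer URL -> token we gave that peer
        self.received = {}  # peer URL -> token that peer gave us

    def issue(self, peer_url):
        # Generate a fresh random token for this peer and remember it.
        token = secrets.token_urlsafe(32)
        self.issued[peer_url] = token
        return token

    def accept(self, peer_url, token):
        # Store the token a peer handed to us after a successful login.
        self.received[peer_url] = token

    def sign(self, peer_url, message):
        # Sign an outgoing message with the token the peer issued to us.
        key = self.received[peer_url].encode("ascii")
        return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()

    def verify(self, peer_url, message, signature):
        # Check an incoming message against the token we issued to the peer.
        token = self.issued.get(peer_url)
        if token is None:
            return False
        expected = hmac.new(
            token.encode("ascii"), message.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        return hmac.compare_digest(expected, signature)
```

A private message from site B to site A would then travel with B's signature, and A would accept it only if verify returns True for B's URL.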

Groups would presumably live on a separate server, or within the homesite of the group owner. Newsfeeds could be built by individual homesites by aggregating feeds of recent updates (authenticated, of course) from friends' homesites.
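The aggregation step is the easy part. A sketch, assuming each friend's feed has already been fetched (with authentication) and parsed into (timestamp, text) pairs, both of which are left out here; the function name and data shape are made up for illustration:

```python
def build_newsfeed(feeds, limit=20):
    """Merge per-friend update lists into one newest-first newsfeed.

    feeds maps a friend's homesite URL to a list of (timestamp, text)
    tuples. Returns up to `limit` (timestamp, url, text) tuples.
    """
    merged = [
        (when, url, text)
        for url, entries in feeds.items()
        for when, text in entries
    ]
    # Newest first; only the timestamp takes part in the ordering.
    merged.sort(key=lambda item: item[0], reverse=True)
    return merged[:limit]
```

In practice the homesite would cache each friend's entries and only re-fetch feeds that have changed, exactly as a blog aggregator does with RSS.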

No magic is required here; it's just a matter of someone having the time to sit down and build all this. The difficulty is that after building all the software, your users own their own social networks, so you don't really have anything to monetize! If the progress of the blogging world is anything to go by, the money will be made by the first few big centralised social networks that also support this process for interoperability with other networks.

A related thing that I haven't thought about deeply yet is how it would work when individuals want to have multiple profiles. My thoughts above are all about making it so you can have one profile that lives on your own server, but what about when you want to have your own one, but you want to separate out some of your content (part of yourself?) elsewhere? More on this later ...


ElementTree returns normal strings if given a 7-bit document

I'm parsing some XML with ElementTree and trying to handle character encodings properly, but was confused as ET was giving me plain strings (types.StringType) rather than unicode strings (types.UnicodeType), which is what I'm used to.

Finally figured out that ET returns plain strings if given 7-bit input, so it should be safe to pass anything from ET through unicode() if you want to make sure input to something else is in unicode string format.

Test script:

#!/usr/bin/python2.5 -u

import sys
import traceback
from xml.etree import ElementTree as ET
from xml.etree import cElementTree as cET

def main():
    for eclair in (u'\xe9clair', u'plain text'):
        utf = eclair.encode("utf-8")
        iso = eclair.encode("iso-8859-1")
        for xml in (
            # expected to fail when the text is not 7-bit (plain ASCII text parses either way):
            u"""<?xml version="1.0"?><test>%s</test>""" % eclair, # unicode source - will fail
            """<?xml version="1.0"?><test>%s</test>""" % iso, # iso input specified as utf-8, will fail

            # expected to succeed:
            """<?xml version="1.0"?><test>%s</test>""" % utf, # correct utf-8 input, default encoding
            """<?xml version="1.0" encoding="utf-8"?><test>%s</test>""" % utf, # utf-8 specified as such
            """<?xml version="1.0" encoding="iso-8859-1"?><test>%s</test>""" % iso, # iso-8859-1 specified as such
            ):
            print "------ parsing %s" % `xml`
            try:
                tree = ET.fromstring(xml)
            except Exception, e:
                print "FAIL:",e
                continue
            print "to tree:",tree
            ctree = cET.fromstring(xml)
            print "    cET:",ctree
            print "string:",`tree.text`
            print "   cET:",`ctree.text`

main()

And the output:

$ ./et_utf8.py
------ parsing u'<?xml version="1.0"?><test>\xe9clair</test>'
FAIL: 'ascii' codec can't encode character u'\xe9' in position 27: ordinal not in range(128)
------ parsing '<?xml version="1.0"?><test>\xe9clair</test>'
FAIL: not well-formed (invalid token): line 1, column 27
------ parsing '<?xml version="1.0"?><test>\xc3\xa9clair</test>'
to tree: <Element test at b7d99a8c>
    cET: <Element 'test' at 0xb7d9c908>
string: u'\xe9clair'
   cET: u'\xe9clair'
------ parsing '<?xml version="1.0" encoding="utf-8"?><test>\xc3\xa9clair</test>'
to tree: <Element test at b7d99a0c>
    cET: <Element 'test' at 0xb7d9c950>
string: u'\xe9clair'
   cET: u'\xe9clair'
------ parsing '<?xml version="1.0" encoding="iso-8859-1"?><test>\xe9clair</test>'
to tree: <Element test at b7d9972c>
    cET: <Element 'test' at 0xb7d9c938>
string: u'\xe9clair'
   cET: u'\xe9clair'
------ parsing u'<?xml version="1.0"?><test>plain text</test>'
to tree: <Element test at b7d9992c>
    cET: <Element 'test' at 0xb7d9c908>
string: 'plain text'
   cET: 'plain text'
------ parsing '<?xml version="1.0"?><test>plain text</test>'
to tree: <Element test at b7d996cc>
    cET: <Element 'test' at 0xb7d9c950>
string: 'plain text'
   cET: 'plain text'
------ parsing '<?xml version="1.0"?><test>plain text</test>'
to tree: <Element test at b7d999ec>
    cET: <Element 'test' at 0xb7d9c938>
string: 'plain text'
   cET: 'plain text'
------ parsing '<?xml version="1.0" encoding="utf-8"?><test>plain text</test>'
to tree: <Element test at b7d9990c>
    cET: <Element 'test' at 0xb7d9c908>
string: 'plain text'
   cET: 'plain text'
------ parsing '<?xml version="1.0" encoding="iso-8859-1"?><test>plain text</test>'
to tree: <Element test at b7d99a0c>
    cET: <Element 'test' at 0xb7d9c950>
string: 'plain text'
   cET: 'plain text'

Some notes from the above:

Input to ET should be of type types.StringType, with a proper encoding specification in the XML header.
cElementTree and ElementTree return consistent results.
