Phillip Pearson - web + electronics notes: Notes from DAS2006

2006-2-16

Notes from DAS2006

I just got back today from the 7th IAPR International Workshop on Document Analysis Systems (proceedings), held in Nelson from 13-15 Feb.

The presentations were all about document or image analysis, but the heavy use of AI techniques could make some of it relevant to what I work on these days.

Some of the interesting people I met or caught up with:

Adam Behringer (Exbiblio)
Abdel Belaid (Loria)
Jim Fruchterman (Benetech)
Koichi Kise (Osaka Prefecture University)
Bertin Klein (DFKI)
Marcus Liwicki (University of Bern)
Larry Spitz (DocRec Ltd)
Noorazrin Zakaria (Université de la Rochelle)

Projects I should take a look at:

GroupLens (recommendation system)
Semantic Wikis (workshop)
PRImA Research document database
OpenCV
Digital Library of India
IUPR camera-captured document archive
Tohoku University's OCR web service
IAM-OnDB pen-captured writing database

Techniques I should learn (or re-learn):

Gabor filters
Hidden Markov models
Standard classifiers: NNC (nearest-neighbour classifier), LDC (linear discriminant classifier)
Analytical segmentation
Dynamic programming
Viterbi algorithm
Dynamic time warping
RAST algorithm for alignment
X-tree spatial indexing algorithm
Affine invariants
Gaussian mixture models
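Two of the items above, dynamic programming and dynamic time warping, go together. DTW aligns two sequences that run at different speeds, such as pen trajectories in online handwriting, by filling a cumulative cost table with dynamic programming. Here is a minimal sketch in Python. It assumes 1-D numeric sequences and uses absolute difference as the local cost. Both are choices made for illustration, not taken from any particular paper, and `dtw_distance` is a hypothetical helper name.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping distance between two 1-D sequences.

    Assumption for this sketch: the local cost is the absolute difference
    between samples, and no band/window constraint is applied.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # step in a only (b[j-1] is reused)
                cost[i][j - 1],      # step in b only (a[i-1] is reused)
                cost[i - 1][j - 1],  # step in both
            )
    return cost[n][m]


if __name__ == "__main__":
    # The repeated 0 in the second sequence is absorbed by the warping.
    print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3]))
    print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

The first call gives 0.0 and the second gives 2.0. For real handwriting you would compare 2-D points and usually restrict the warping to a band around the diagonal, which cuts the quadratic cost.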

Things that should exist:

A better browser for mailing lists: one that pays more attention to the message content, tries to figure out what's going on, and shows more statistics in the list view to help you find interesting messages.
A browser for academic papers with tagging so you can collect together papers on a very specific subject without prejudicing the normal categorisation.
Real-time image stitching: building a panorama from a video. (This already exists for traffic monitoring.)
Connected component analysis on colour images.
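The usual connected component analysis works on binary images. One naive way to extend it to colour is to flood-fill 4-connected regions whose neighbouring pixels have similar colours. The sketch below is a minimal Python version of that idea. It assumes the image is a list of rows of (R, G, B) tuples and that "similar" means every channel differs by at most a hypothetical `tol` parameter. `colour_components` is an illustrative name, not an existing library function.

```python
from collections import deque
from typing import List, Tuple

RGB = Tuple[int, int, int]


def colour_components(img: List[List[RGB]], tol: int = 0) -> Tuple[List[List[int]], int]:
    """Label 4-connected regions of a colour image.

    Two neighbouring pixels join the same component when every channel
    differs by at most `tol` (an assumption made for this sketch).
    Returns (label image, number of components).
    """
    h = len(img)
    w = len(img[0]) if h else 0
    labels = [[-1] * w for _ in range(h)]  # -1 means "not yet labelled"
    count = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue
            # Start a new component here and flood-fill it breadth-first.
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1:
                        here, there = img[cy][cx], img[ny][nx]
                        if all(abs(p - q) <= tol for p, q in zip(here, there)):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
            count += 1
    return labels, count


if __name__ == "__main__":
    R, B = (255, 0, 0), (0, 0, 255)
    image = [[R, R, B],
             [B, R, B],
             [B, B, B]]
    labels, n = colour_components(image)
    print(n)  # one red region, one blue region
```

This compares each pixel with its neighbour rather than with a region mean. With a non-zero `tol`, a smooth gradient can therefore chain into a single component. A more robust approach would probably quantise colours first or compare against a running region average.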


Free OCR

Just so I don't forget: I found this free OCR engine on SourceForge, a "commercial quality OCR engine originally developed at HP between 1985 and 1995".

Rumour has it that Google will be developing open source OCR soon...