Here's how to connect the ht://Dig search engine to PyCS, and get a decent search function for your weblogs that respects any access controls you may have set up.
NOTE: I ran into stability problems running this on a FreeBSD server. It seemed to work fine on my development Linux box, though. You might want to test it for a while before running it on a public system.
Note that ht://Dig is covered by the GPL, whereas PyCS is covered by the MIT license. The two are compatible, but any combination ends up covered by the GPL, so you can't make a closed-source version of PyCS that includes the ht://Dig connection.
First, get the modified version of ht://Dig (more info).
Unpack and apply.
tar -vzxf htdig-pycs-snapshot-*.tar.gz
Configure, make, make install (change /usr/local to /usr if your Python libraries are in /usr/lib/python2.2 instead of /usr/local/lib/python2.2).
./configure --prefix=/usr/local --with-python=yes
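If you're not sure which prefix applies, Python itself can tell you where it was installed. Here is a minimal sketch; the helper name guess_prefix is mine, not part of ht://Dig or PyCS, and it assumes you run it with the same interpreter PyCS uses, outside any virtual environment.

```python
import sys


def guess_prefix():
    """Return the directory prefix this Python was installed under."""
    # sys.prefix is typically /usr or /usr/local on Unix-like systems.
    return sys.prefix


if __name__ == "__main__":
    # Print a configure line with the detected prefix filled in.
    print("./configure --prefix=%s --with-python=yes" % guess_prefix())
```

If the printed prefix is neither /usr nor /usr/local, check by hand where the Python library directory really is before configuring.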
Now install the files (you need to be root to do this).
Now patch Medusa; edit src/pycs/medusa/http_server.py and go to line 498. Change the following:
        # r.handler = h # CYCLE
        h.handle_request (r)
    except:
        self.server.exceptions.increment()
        (file, fun, line), t, v, tbinfo = asyncore.compact_traceback()

to:

        # r.handler = h # CYCLE
        h.handle_request (r)
    except SystemExit:
        raise
    except:
        self.server.exceptions.increment()
        (file, fun, line), t, v, tbinfo = asyncore.compact_traceback()
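The effect of the extra clause is easy to see in isolation: a bare except: also catches SystemExit, so without the patch an exit request raised inside a handler is silently absorbed. The stand-alone sketch below shows the difference. The names handler, dispatch_unpatched and dispatch_patched are illustrative only and are not Medusa code.

```python
import sys


def handler():
    # Stand-in for a request handler that asks the process to exit.
    sys.exit(3)


def dispatch_unpatched():
    try:
        handler()
    except:
        # A bare except also catches SystemExit, so the exit request is lost.
        return "swallowed"


def dispatch_patched():
    try:
        handler()
    except SystemExit:
        # Let the exit request propagate, as in the patched code.
        raise
    except:
        return "swallowed"


if __name__ == "__main__":
    print(dispatch_unpatched())
    try:
        dispatch_patched()
    except SystemExit as exc:
        print("exit propagated with code %s" % exc.code)
```

The order of the clauses matters: except SystemExit has to come before the bare except, otherwise it is never reached.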
Now PyCS needs to know where you have installed the _htsearch module (it is in /usr/local/cgi-bin/ in this example). Edit etc/pycs/pycs.conf and add the following lines:
enablehtdig = yes
htsearchpath = /usr/local/cgi-bin
htsearchconf = /usr/local/etc/htdig.pycs.conf
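Before restarting, it can save time to check that all three settings are present and that the paths they name exist. The following is only a rough sketch: it treats the file as plain "key = value" lines and skips lines starting with # or ; and [section] headers, which are my assumptions about the format. PyCS's own config parsing may well differ, and the function names are hypothetical.

```python
import os

# The three settings named above.
REQUIRED = ("enablehtdig", "htsearchpath", "htsearchconf")


def read_settings(text):
    """Collect 'key = value' pairs from config text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, assumed comment styles, and section headers.
        if not line or line[0] in "#;[" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(settings):
    """Return the required setting names that are absent."""
    return [name for name in REQUIRED if name not in settings]


def missing_paths(settings):
    """Return configured ht://Dig paths that do not exist on disk."""
    return [settings[key] for key in ("htsearchpath", "htsearchconf")
            if key in settings and not os.path.exists(settings[key])]


if __name__ == "__main__":
    with open("etc/pycs/pycs.conf") as conf:
        found = read_settings(conf.read())
    print("missing settings:", missing_settings(found))
    print("missing paths:", missing_paths(found))
```

Two empty lists mean the settings are at least present and point at something; it doesn't prove they are in the right place in the file.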
Now you should be able to restart PyCS and the /modules/search.py page will work. If you get an error about not having the search path or config path set up, check to make sure the lines are in the right place in pycs.conf. Look in the etc.log files for messages that might help.
Also try running the /usr/local/cgi-bin/qtest program to make sure your ht://Dig installation is working properly. It should return some matches - "No matches" is a sign that your database isn't working or you haven't run the crawler yet. Don't forget to edit the config file and run rundig to initialise the database ... you might need to mkdir -p /usr/local/var/htdig before it'll work.
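The last two checks can be scripted if you find yourself repeating them. This is a sketch under assumptions: ensure_db_dir does the same job as mkdir -p, and has_matches just looks for the "No matches" text in output you have already captured from qtest. Both names are mine, and I haven't assumed anything about qtest's arguments, so capturing its output is left to you.

```python
import os


def ensure_db_dir(path="/usr/local/var/htdig"):
    """Create the database directory and any missing parents (like mkdir -p)."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


def has_matches(qtest_output):
    """Return False if the captured qtest output reports "No matches"."""
    return "No matches" not in qtest_output


if __name__ == "__main__":
    # Creating /usr/local/var/htdig will normally need root.
    print(ensure_db_dir())
```

If has_matches comes back False, run rundig again and check the config file before looking at PyCS.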
See also pysearch, by Paul Erickson. It goes further than I have with the output, and returns it as a collection of Python objects. However, it doesn't run with newer versions of ht://Dig and Python.