Installing ht://Dig on FreeBSD

2002-12-18

After installing mnoGoSearch on FreeBSD yesterday and realising that the context provided in search results only shows the top of the page, it's time to try ht://Dig today.

This one is a little easier to set up than mnoGoSearch, because it doesn't use MySQL; it has its own built-in database. First, compile and install:

cd /usr/ports/textproc/htdig
make build install clean


Now you need to create the config file and customise it to fit your site:

cd /usr/local/etc/htdig
cp htdig.conf.sample htdig.conf
emacs htdig.conf


Change database_dir from (...)/database to (...)/database_foo, where foo is some identifier that will remind you what you are searching. I'm assuming you are at least considering installing many ht://Dig instances on your server, and you need a different database_dir for each one. If you only want one search engine, you don't need to change this.

Change start_url to point to the root of your website.

Change maintainer so you don't get the silly default e-mail address when you do log analysis.

Change exclude_urls to include everything you don't want to index. If you're indexing a Python Community Server, I recommend adding the patterns referers.py and format=rss, which will keep referrer ranking pages and comment XML feeds out of the index.

Change bad_extensions. I added .xml, because I don't want rss.xml files showing up in the search results, and .fttb, so Radio templates don't show up either. Put in anything that you have on the server but that would look silly in the results.
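To make the changes above a bit more concrete, here is a rough sketch of how the edited attributes might look. It writes them to a scratch file instead of touching htdig.conf, so you can compare and copy by hand. Assumptions: the site URL, the e-mail address and the scratch file name are made up; the database_dir path is inferred from the mkdir step in these notes; and the first few exclude_urls and bad_extensions entries are only stand-ins for whatever your htdig.conf.sample already lists.

```shell
#!/bin/sh
# Scratch file for review; the name is hypothetical. Nothing here edits htdig.conf.
FRAGMENT="${FRAGMENT:-./htdig-foo-fragment.conf}"

cat > "$FRAGMENT" <<'EOF'
# One database directory per search engine ("foo" identifies the site).
# Path inferred from the mkdir step, adjust to match your sample file.
database_dir:   /usr/local/share/htdig/database_foo

# Root of the site to index (example.com is a placeholder).
start_url:      http://www.example.com/

# Replaces the default maintainer address (placeholder).
maintainer:     webmaster@example.com

# Keep the patterns from the sample file, then append your own.
exclude_urls:   /cgi-bin/ .cgi referers.py format=rss

# Keep the extensions from the sample file, then append your own.
bad_extensions: .gif .jpg .zip .xml .fttb
EOF

echo "Wrote $FRAGMENT"
```

Open the scratch file next to htdig.conf and carry the values over one attribute at a time, so you don't lose anything the sample file already had.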

Now you can make the database directory and build the database:

cd /usr/local/share/htdig/
mkdir database_foo
rundig
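Once rundig has finished, a quick sanity check can tell you whether anything was written at all. This is a minimal sketch and not part of ht://Dig: check_db_dir is my own name for the function, and all it does is confirm that the directory exists and holds at least one non-empty file. It has no idea which files ht://Dig is supposed to create.

```shell
#!/bin/sh
# check_db_dir DIR
# Succeeds if DIR exists and holds at least one non-empty regular file.
check_db_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        # -f: regular file, -s: size greater than zero
        if [ -f "$f" ] && [ -s "$f" ]; then
            return 0
        fi
    done
    echo "no non-empty files in: $dir" >&2
    return 1
}

# The path matches the mkdir step above; change "foo" to your identifier.
if check_db_dir /usr/local/share/htdig/database_foo; then
    echo "database directory looks populated"
else
    echo "nothing there yet, check the rundig output"
fi
```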


Now put the CGI in your webspace, make sure it's executable, and try it out:

cp /usr/local/share/apache/cgi-bin/htsearch /path/to/cgi-bin/htsearch.cgi
chmod 755 /path/to/cgi-bin/htsearch.cgi

See the mnoGoSearch install notes (linked above) for details on how to get a .cgi file to run.
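To try the CGI without building a search form first, you can hit it with a query string directly. The sketch below only builds the URL. It assumes htsearch reads the query from a parameter called words and, as far as I remember, picks a config file by name (without the .conf) from an optional config parameter; check the htsearch documentation if either doesn't work for you. The search_url function and the example.com host are made up for illustration.

```shell
#!/bin/sh
# search_url BASE WORDS [CONFIG]
# Prints a query URL for the htsearch CGI. Spaces in WORDS become "+";
# other special characters are NOT escaped, so keep test queries simple.
search_url() {
    base="$1"
    words=$(printf '%s' "$2" | tr ' ' '+')
    config="${3:-}"
    if [ -n "$config" ]; then
        printf '%s?config=%s&words=%s\n' "$base" "$config" "$words"
    else
        printf '%s?words=%s\n' "$base" "$words"
    fi
}

# Hypothetical host and path; substitute wherever you copied htsearch.cgi.
search_url "http://www.example.com/cgi-bin/htsearch.cgi" "python community server"
```

Paste the printed URL into a browser, or hand it to fetch -o - or wget -O - to see the raw HTML. If you get the CGI's source or a server error back instead of results, the problem is in the web server setup rather than in ht://Dig.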

Enjoy!

Did I get anything wrong? Drop me a line and I'll correct it.

Update: The version in the ports tree doesn't do phrase matching. Here's how to install from source:

# log in as root
wget http://www.htdig.org/files/htdig-3.2.0b3.tar.gz
tar -vzxf htdig-3.2.0b3.tar.gz
cd htdig-3.2.0b3
./configure --prefix=/usr/local && gmake && gmake install && gmake clean

# edit /usr/local/conf/htdig.conf as before, then cross your fingers and run:
rundig
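The source build leans on wget and gmake, which as far as I know come from ports rather than the FreeBSD base system, so it's worth checking they are there before you start. A minimal sketch; missing_tools is just a helper name I made up:

```shell
#!/bin/sh
# missing_tools NAME...
# Prints each named command that is not on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools wget gmake tar)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Install these first:" $missing
else
    echo "wget, gmake and tar are all available."
fi
```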

More detail here later.