Django Text Search with Haystack and Whoosh
May 19, 2010
I decided to improve the simple search on this blog using Haystack and Whoosh. Haystack is a Django plugin to allow text search, whilst Whoosh is a pure Python search backend.
settings.py
I copied haystack and whoosh as suppliers to my project, and then modified settings.py:
HAYSTACK_SITECONF = 'drumcoder.search_sites' HAYSTACK_SEARCH_ENGINE = 'whoosh' HAYSTACK_WHOOSH_PATH = '/home/user/web/drumcoder/index.whoosh' INSTALLED_APPS = ( #... 'haystack', #... )
The HAYSTACK_WHOOSH_PATH
is the location at which the index files will be stored.
search_sites.py
Next, inside my Django site, at the same level as settings.py
, I created a new search_sites.py
:
import haystack haystack.autodiscover()
This causes haystack to look for search_indexes.py
inside each app in the Django site.
blog/search_indexes.py
This fill maps the Django model object to the search engine.
import datetime from haystack.indexes import * from haystack import site from drumcoder.blog.models import BlogEntry class BlogEntryIndex(SearchIndex): text = CharField(document=True, use_template=True) def get_queryset(self): """ This is used when the entire index for model is updated, and should only include public entries """ return BlogEntry.objects.filter(publication_date__lte=datetime.datetime.now(), status=BlogEntry.LIVE_STATUS) site.register(BlogEntry, BlogEntryIndex)
This is a similar mechanism to the admin.py
setup. The key thing here is the text attribute, which will contain the text to index. We have use_template=True
here, so now we'll create a template that includes the text for a BlogEntry
that we wish to index.
templates/search/indexes/blog/blogentry_text.txt
This file can be placed anywhere in the template directories for your site, but it needs to be inside a folder named after the application (we're dealing with BlogEntry objects, which are in the Blog app), and the model object (BlogEntry). So in our case, we will need a blogentry_text.txt
template inside blog
, inside indexes
, inside search
.
{{ object.title }} {{ object.body }}
Here we're going to create the text index to include the title and body attributes from the BlogEntry.
templates/search/search.html
Next, we need a template for the search page. Create the search.html template inside a search directory.
{% extends 'template.html' %} {% block title %}Search{% endblock %} {% block content %} <h2>Search</h2> <form method="get" action="."> <table> {{ form.as_table }} <tr> <td> </td> <td><input type="submit" value="Search"></td> </tr> </table> {% if query %} <h3>Results</h3> {% for result in page.object_list %} <p><a href="{{ result.object.get_absolute_url }}">{{ result.object.title }}</a></p> {% empty %} <p>No results found.</p> {% endfor %} {% endif %} </form> {% endblock %}
urls.py
Finally in the code, we need to add the urls. At the top level urls.py
add:
(r'^search/', include('haystack.urls')),
Create the index
The final thing to do is to create the index. Run ./manage.py rebuild_index
to create the new search index.
Once that is done, go to /search/ on your server and have a play.
Indexing Links
Once you have the above working, it is trivial to add more fields to the index. For example, to add an index for the links application, we need two more files:
links/search_indexes.py
This maps the Link
object in the same way as we did for BlogEntry
above. As there is no status field or publication date on links, we don't have to worry about limiting the queryset.
import datetime from haystack.indexes import * from haystack import site from drumcoder.links.models import Link class LinkIndex(SearchIndex): text = CharField(document=True, use_template=True) site.register(Link, LinkIndex)
templates/search/indexes/links/link_text.txt
Here we define the fields from Link
we want to include in the search index.
{{ object.title }} {{ object.description }} {{ object.url }}
That's it - save, reindex, and you will now be able to search on Link
objects.
Updating the Index
There are a couple of ways to update the index. ./manage.py update_index
will add new entries to the index. ./manage.py rebuild_index
will recreate the index from scratch. An alternative approach that works fine for low volumes of change, like I have here, is to change the superclass of each Index to RealTimeSearchIndex
instead of just SearchIndex
:
import datetime from haystack.indexes import * from haystack import site from drumcoder.blog.models import BlogEntry class BlogEntryIndex(RealTimeSearchIndex): text = CharField(document=True, use_template=True) def get_queryset(self): """Used when the entire index for model is updated.""" return BlogEntry.objects.filter(publication_date__lte=datetime.datetime.now(), status=BlogEntry.LIVE_STATUS) site.register(BlogEntry, BlogEntryIndex)
This will automatically update the index when a BlogEntry is created or changed.