Django ORM Extra

May 12, 2010

Given a two model structure, BlogEntry and BlogComment, where there is one BlogEntry to many BlogComments, then there are three possible ways of getting hold of the comment count for a given entry, when displaying a list of entries.

Simple

The simple approach is to use the reverse links on the model, to get hold of the blogcomment_set

{{entry.blogcomment_set.count}} comments

The disadvantage of this approach is that is requires an extra select per entry in the list.

Django ORM

The django ORM approach is to use extra:

lBlogEntries = BlogEntry.objects.all().extra(select={'comment_count': 
    'SELECT count(*) FROM blog_blogcomment WHERE blog_blogcomment.blog_entry_id=blog_blogentry.id'},)

This adds extra fields into the select. You can now use:

{{entry.comment_count}} comments

from the template.

Raw SQL

The final (and not recommended) route is to use raw SQL:

from django.db import connection

lCursor = connection.cursor()
lCursor.execute("""SELECT (SELECT count(*) 
                           FROM blog_blogcomment WHERE blog_blogcomment.blog_entry_id=blog_blogentry.id) 
                           AS "comment_count", 
                           "blog_blogentry"."publication_date", 
                           "blog_blogentry"."title", 
                           "blog_blogentry"."slug"
                           FROM "blog_blogentry" ORDER BY "blog_blogentry"."publication_date" DESC""")
lRows = lCursor.fetchall()
lBlogEntries = [ ]
for row in lRows:
    lEachBlogEntry = BlogEntry()
    lEachBlogEntry.comment_count = row[0]
    lEachBlogEntry.publication_date = row[1]
    lEachBlogEntry.title = row[2]
    lEachBlogEntry.slug = row[3]
    lBlogEntries.append(lEachBlogEntry)

This makes maintenance difficult, but it does allow finegrained control over which fields are fetched.

Performance Improvements

I've just revisited some forum code I wrote when I first started with Django. I always knew it was slow, but never realised quite how slow until I fired it up using the Django debug toolbar. To show a list of forums, along with the number of posts, last time that forum was posted to, and whether there were unread posts or not: 1068 queries, 81958ms. Gulp. Maybe I should have started using the debug toolbar sooner!

Anyway, replacing the complex set of Django ORM selects with this:

lForums = Forum.on_site.all().extra(select={
   'topic_count': 'SELECT count(*) FROM forum_forumtopic WHERE forum_forumtopic.forum_id = forum_forum.id',
   'post_count' : """SELECT count(*) FROM forum_topicpost WHERE forum_topicpost."forumTopic_id" IN 
      (SELECT id FROM forum_forumtopic WHERE forum_forumtopic.forum_id = forum_forum.id)""",
   'last_post_date' : """SELECT max(forum_topicpost.created) FROM forum_topicpost WHERE forum_topicpost."forumTopic_id" IN 
      (SELECT id FROM forum_forumtopic WHERE forum_forumtopic.forum_id = forum_forum.id)""",
   'unread_count' : """SELECT count(*) FROM forum_topicpost WHERE forum_topicpost."forumTopic_id" IN 
      (SELECT id FROM forum_forumtopic WHERE forum_forumtopic.forum_id = forum_forum.id) AND forum_topicpost.id NOT IN 
      (SELECT post_id FROM forum_postread WHERE forum_postread.user_id=%s)""" % request.user.id,
                                            })

brought the total down to eight queries, and 66ms. That's quite an improvement!

Tags: django orm extra