
Customising Django Simple Captcha

I used django-simple-captcha to protect the comments form from spam. Some time after the site went live, spam comments started to appear anyway.

The site gets little traffic, yet the comment spam grew steadily until dozens of entries were appearing each day. I took a closer look at the captcha to find out what was happening.

The default captcha is a string of random characters, but django-simple-captcha can be customised extensively through Django settings.

First I tried switching to a maths-based captcha, in which the user has to solve a simple arithmetic problem.

This also failed to protect the form, with the same number of spam comments appearing.
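For reference, django-simple-captcha ships its own maths challenge, `captcha.helpers.math_challenge`, which is selected through the `CAPTCHA_CHALLENGE_FUNCT` setting. The function below is a simplified, hypothetical sketch of how such a challenge works, not the library's code: like any challenge function, it returns a pair of the text drawn in the image and the answer the user must type.

```python
import random

# In settings.py, the bundled maths challenge is selected with:
# CAPTCHA_CHALLENGE_FUNCT = 'captcha.helpers.math_challenge'


def sketch_math_challenge():
    """Hypothetical stand-in for a maths challenge.

    Returns (challenge, response), e.g. ('7-3=', '4').
    """
    a = random.randint(1, 10)
    b = random.randint(1, 10)
    op = random.choice('+-*')
    if op == '-' and a < b:
        # Swap the operands so subtraction never gives a negative answer
        a, b = b, a
    if op == '+':
        answer = a + b
    elif op == '-':
        answer = a - b
    else:
        answer = a * b
    return '%d%s%d=' % (a, op, b), str(answer)
```

Arithmetic this simple is trivial for a bot to solve once it has read the digits, which fits what happened here.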

I wanted to check the contents of the POST requests to see whether the captcha was somehow being circumvented. nginx sits in front of the site and handles requests first, so I added the following to my nginx config:

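# log_format is only valid in the http context
log_format postdata $request_body;

server {

    location / {
        # $request_body is only filled in where nginx reads the body,
        # e.g. locations using proxy_pass or uwsgi_pass
        access_log  /var/log/nginx/postdata.log postdata;
        # ... existing proxy_pass / uwsgi_pass directives ...
    }
}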

This writes the body of every request handled by that location to a dedicated log file. The log showed that the spam submissions were answering the captcha correctly (or very nearly) for both the random-character and the maths challenges, as in this entry:

 - csrfmiddlewaretoken=gTcnDmbR2Ve8IgbDUVOFMkFfcJFqMRA5fh6ewAbtz94BKLjrX82FqK3vES01RS1p&name=Robertthuse&<>cialis+cheap<%2Fa><>viagra+online<%2Fa><>viagra+online<%2Fa>&captcha_0=9fb0cfcf30d6af14ae77e122251ec894a2a419bc&captcha_1=WFDM&entry=&entry=4&submit= - 
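To pull the captcha fields out of a logged body like the one above, Python's standard `urllib.parse.parse_qs` is enough. This is a small sketch, assuming each line of `postdata.log` holds one raw form-encoded body; the helper name `captcha_fields` is hypothetical. In django-simple-captcha's form widget, `captcha_0` carries the hidden hash key and `captcha_1` the text the user typed.

```python
from urllib.parse import parse_qs


def captcha_fields(body):
    """Return (hashkey, answer) from one logged form-encoded POST body.

    Missing fields come back as empty strings.
    """
    fields = parse_qs(body.strip())
    # captcha_0 is the hidden hash key; captcha_1 is what the client typed
    hashkey = fields.get('captcha_0', [''])[0]
    answer = fields.get('captcha_1', [''])[0]
    return hashkey, answer
```

Running it over each line of the log makes it easy to see at a glance which answers the bots were submitting.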

I decided against obfuscating the captcha image any further. The bots' character recognition already seemed at least as good as a human's, and heavier distortion would only make the captcha more unpleasant for genuine visitors.

My solution was to use django-simple-captcha's custom challenge feature. You supply a function that returns two strings: the challenge shown in the image and the correct response. This makes it possible to add a twist that a human can easily follow but a bot cannot easily parse, because it sits outside the standard behaviour the bots expect.

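import random

def captcha_challenge():
    """Return a (challenge, response) pair for django-simple-captcha.

    The challenge is four random digits; the expected response is the
    same digits with one added to each, wrapping 9 round to 0.
    """
    challenge = ''
    response = ''
    for _ in range(4):
        digit = random.randint(0, 9)
        challenge += str(digit)
        # Add one to the digit, modulo 10, so 9 becomes 0
        response += str((digit + 1) % 10)
    return challenge, response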

Here the challenge is a random string of four digits, and the user has to add one to each. I updated the form to show the user the instruction: "Add one to each digit. 9 becomes 0!"
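The rule is easiest to get wrong at the wrap-around, so a helper that derives the expected answer from a challenge string can be handy, for instance when unit-testing the challenge function. `add_one_to_each_digit` is a hypothetical name, not part of django-simple-captcha:

```python
def add_one_to_each_digit(challenge):
    """Compute the expected answer for a digit challenge.

    Each digit is incremented by one and wraps round, so 9 becomes 0.
    """
    return ''.join(str((int(d) + 1) % 10) for d in challenge)


# Worked example: 3 -> 4, 9 -> 0, 0 -> 1, 7 -> 8
print(add_one_to_each_digit('3907'))  # prints 4018
```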

Since the bots read the image regardless of the noise settings, I removed the noise as well, to make the captcha easier for humans to read.

# settings.py: replace the default random-character challenge with the custom one
# (captcha_challenge must be defined in, or imported into, this module)
CAPTCHA_CHALLENGE_FUNCT = captcha_challenge
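The setting above only selects the challenge; the noise itself is controlled separately. Assuming the `CAPTCHA_NOISE_FUNCTIONS` and `CAPTCHA_LETTER_ROTATION` options described in django-simple-captcha's documentation (worth checking against the installed version), a plain, unobfuscated image could be configured along these lines. This is a sketch, not necessarily the exact configuration used on this site:

```python
# settings.py (sketch): switch off django-simple-captcha's image obfuscation

# Replace the default arc and dot noise with the library's no-op noise function
CAPTCHA_NOISE_FUNCTIONS = ('captcha.helpers.noise_null',)

# Disable the random rotation applied to each character
CAPTCHA_LETTER_ROTATION = None
```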

Of course, this scheme would be easy to defeat with a bot written specifically for it, but spam bots trawl the web for easy targets; on a small site like this one they will simply fail and move on.
