spam

A 3-post collection

Defeating manual spam, or damn dastardly conniving commenters!

I'm not especially keen on meta-blog posts, but the issue came up in email recently and I've this penchant for expounding at some length on interesting subjects, even in the least suitable medium, target audience: one. Fortunately I have a blog.

Not so fortunately, manually entered spam has been an issue. When you optimize for humans you regrettably include manual spammers.

Such spam is surprisingly devious, but here are some common characteristics:

  • Complimentary. "Wow this is a great post!"
  • Relevant. "I don't like spam."
  • Unconstructive. Adds nothing of value.
  • Disingenuous. "I appreciate it because…"
  • Erroneous. "…this should keep spam out of my email inbox."

See! Manual spam is recognizably crude.

ego spam

Of course the payload is the link. They aren't all blatant advertisements, but even sites that appear legit may advertise themselves unscrupulously. As expected, they lack real content.

Here with BlogEngine.NET the payload is put in the website field and not the body (the name field is the link text). Neither asking for your name nor pointing out that the website field is ignored by search engines made a difference.

Akismet did, however, and thus far I've had zero false positives, only false negatives. Some flagged comments were crafted so cleverly that they came very close to passing as genuine, but after investigating I've concurred with Akismet each time. If Akismet marks a unique (but crude) comment as spam, I expect the link to be unsatisfactory, given that the link is the defining constant.

My recent addition of reCAPTCHA seems to have made the largest difference, most likely because there's now some difficulty involved. I actually feel pretty good about this because the duality of distinguishing computers from humans while simultaneously solving complex problems that computers do poorly absolutely fascinates me. Given that solving my captcha is no longer technically a waste of time, I know some readers will begin to feel that way as well.

Akismet support for BlogEngine.NET 1.5

4 Oct 2009: Not sure why I didn't notice before, but the Commentor extension has been around for some time! It solves what I still needed: a place to manage all of the comments from a centralized location. The code below still adds spam management to individual comments, and the two may well work together without incident.

Since I previously mentioned comment spam, I've experimentally bolstered the defenses of this blog in my update to BlogEngine.NET 1.5, and… received the same spam. It's manually entered and highly deceptive: frequently a "thanks, X helped me with Y" or something ever so subtly off-topic, often with only the spam URL giving it away. At least this has absolved the naive captcha of blame (still, a little randomization in field names to thwart playback bots might be a good idea).
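For the curious, here is roughly what I mean by randomizing field names. It's a minimal sketch in Python rather than anything from BlogEngine.NET, and every name in it (field_name, read_field, the secret, the session tokens) is made up for illustration. The idea is to derive each form field's name from a per-session token, so a bot replaying a recorded submission posts to names the server no longer expects.

```python
import hashlib
import hmac


def field_name(base: str, session_token: str, secret: bytes) -> str:
    """Derive a per-session form field name from a fixed logical name."""
    message = f"{session_token}:{base}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # A letter prefix keeps the result usable as an HTML name or id.
    return f"f_{digest[:12]}"


def read_field(form: dict, base: str, session_token: str, secret: bytes):
    """Look up a submitted value under the name expected for this session."""
    return form.get(field_name(base, session_token, secret))


if __name__ == "__main__":
    secret = b"hypothetical-server-secret"  # assumption: kept private on the server
    # A bot records the form as rendered for one session...
    recorded = {field_name("website", "session-1", secret): "http://spam.example/"}
    # ...which is readable in that session, but not when replayed in another.
    print(read_field(recorded, "website", "session-1", secret))
    print(read_field(recorded, "website", "session-2", secret))
```

This does nothing against a human typing into the live form, of course, which is the real problem here.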

So, BlogEngine.NET community, let's do something about the manual spam problem and integrate Akismet with a spam moderation queue like the WordPress plugin.

I've personally tackled it in my usual hackish fashion and converted BlogEngine's entire moderation queue into a spam queue (essentially, moderation is enabled, but any comment not flagged as spam is approved immediately). Who wants to have to moderate everything anyway?
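The spam-queue idea boils down to very little logic. The following is a rough Python sketch of it, not the actual BlogEngine.NET change; the Comment class, the moderate function, and the toy check are all hypothetical stand-ins, and a real check would ask Akismet instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Comment:
    author: str
    website: str
    content: str
    is_approved: bool = False


def moderate(comment: Comment, is_spam: Callable[[Comment], bool]) -> Comment:
    """Approve the comment right away unless the spam check flags it.

    Flagged comments stay unapproved, so the moderation queue only ever
    contains suspected spam.
    """
    comment.is_approved = not is_spam(comment)
    return comment


def toy_check(comment: Comment) -> bool:
    """Stand-in for a real spam service; flags one hard-coded domain."""
    return "spam.example" in comment.website


if __name__ == "__main__":
    good = moderate(Comment("Ann", "http://ann.example/", "Nice trick."), toy_check)
    bad = moderate(Comment("Bob", "http://spam.example/", "Great post!"), toy_check)
    print(good.is_approved, bad.is_approved)
```

Passing the check in as a function keeps the queue logic separate from whichever service does the flagging.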

The actual Akismet comment checking is done with Joel Thom's ASP.NET API, which you may need to download.

Following that, here are the relevant comparison reports:

BlogEngine.Core 1.5 with Akismet

BlogEngine.Web 1.5 with Akismet

Joel.Net.Akismet.1.0.1 for BlogEngine.NET

Don't expect perfection; in particular, my error handling is probably unduly sparse. I'm sure I'll notice it when it breaks painfully.

You will need a WordPress API key to use Akismet in the first place. Get one and specify it, along with your blog URL, at the top of BlogEngine.Web\User controls\CommentView.ascx.cs.
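To show where the key and blog URL end up, here is a small Python sketch of a comment-check call as I understand Akismet's public REST interface: the key forms the host name, the blog URL goes in the "blog" field, and the response body is the text "true" for spam. This is not the Joel.Net.Akismet API; the function names and the User-Agent string are my own inventions, and you should check the endpoint and field names against Akismet's documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_comment_check(api_key: str, blog_url: str, user_ip: str,
                        user_agent: str, author: str, author_url: str,
                        content: str) -> Request:
    """Build (but do not send) a comment-check request."""
    params = {
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_author": author,
        "comment_author_url": author_url,
        "comment_content": content,
    }
    # Assumption: the API key is used as a subdomain of rest.akismet.com.
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    return Request(
        url,
        data=urlencode(params).encode("utf-8"),  # supplying data makes it a POST
        headers={"User-Agent": "ExampleBlog/1.0 | akismet-sketch/0.1"},
    )


def is_spam(request: Request) -> bool:
    """Send the request; the service answers with the text 'true' or 'false'."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip() == "true"
```

Splitting the building from the sending means the request can be inspected without touching the network, which also makes the sparse error handling I confessed to easier to improve.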

Further, you'll have to compile the changes to Core, update the DLL in Web's bin, add a reference to Core in Joel.Net.Akismet, compile that, copy the resulting assembly, and reference it from Web as well. But you already knew that because you're a programmer and the using references give it away anyway. Right? ;)

Or you could use this tidy package I've provided, keeping in mind to replace only the files/assemblies you haven't modified from the stock download and to merge the rest. (If you've no interest in source code you may ignore the BlogEngine.Core and Joel.Net.Akismet.1.0.1 folders; the compiled assemblies are already in BlogEngine.Web.)

You must also enable comment moderation.

Additional Details

  • Commenters are notified immediately if their comment requires moderation (the JavaScript now provides, very hackishly, an "isModerated" case).
  • Moderated comments have a new administrative Ham link for submitting false positives back to Akismet (this also approves the comment).
  • Approved (visible) comments have a new administrative Spam link for submitting false negatives back to Akismet (this also deletes the comment).
  • These links have been added to the corresponding admin comment notification emails as well.
  • Comment moderation has been toggled on in settings.xml (for XML data source blogs).
  • The Joel.Net.Akismet API has been modified to take the HttpRequest and BlogEngine.Core.Post directly.
  • Only labels.resx has been updated; this package is not localized (and could be a little cleaner for that). I'm but one English-speaking man.
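The behavior of the Ham and Spam links in the list above can be sketched as follows. This is Python for illustration only, not the code in the package; handle_admin_link, the dictionary of comments, and the report callback are hypothetical. The endpoint names submit-ham and submit-spam are, as far as I know, what Akismet calls its two feedback calls.

```python
from typing import Callable, Dict

# Feedback endpoint used for each administrative link.
AKISMET_ENDPOINTS = {"ham": "submit-ham", "spam": "submit-spam"}


def handle_admin_link(action: str, comment_id: str,
                      comments: Dict[str, dict],
                      report: Callable[[str, dict], None]) -> None:
    """Report the correction to the spam service, then apply it locally."""
    if action not in AKISMET_ENDPOINTS:
        raise ValueError(f"unknown action: {action}")
    comment = comments[comment_id]
    report(AKISMET_ENDPOINTS[action], comment)
    if action == "ham":
        comment["approved"] = True   # false positive: approve the comment
    else:
        del comments[comment_id]     # false negative: delete the comment


if __name__ == "__main__":
    sent = []
    comments = {"1": {"approved": False}, "2": {"approved": True}}
    handle_admin_link("ham", "1", comments, lambda ep, c: sent.append(ep))
    handle_admin_link("spam", "2", comments, lambda ep, c: sent.append(ep))
    print(sent, comments)
```

Reporting before changing anything locally means a spam comment is still available to send when it's about to be deleted.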

Please improve upon this, make it an extension (I don't think it can be done 100% as one), or otherwise incorporate it into an official version.

Implementing a naive captcha in BlogEngine.NET

4 Oct 2009: Keith Ratliff did the very involved work of converting BlogEngine's comment submission process from JavaScript-centric to postback and standard ASP.NET validation, thereby enabling a more or less drag-and-drop installation of reCAPTCHA. Hooray Keith! Fantastic work. See that post instead.

A couple of years ago Mads Kristensen implemented an invisible captcha in BlogEngine.NET, but as my blog can attest, this is not enough.

Instead of inconveniencing readers with a captcha, you can use your own clever validation trick. The more unique it is, the less likely it will be automatically discovered and circumvented. When it is, you need a new trick.

A naive captcha is basically a captcha that always shows the same image. It works on the principle that your site isn't important enough for spammers to target manually (how cheerful!), but if it's good enough for Coding Horror it's good enough for me.

Of course, being an image in itself resists automated discovery of this particular trick, and if it is discovered, manually or otherwise, it's easy to change the image (it doesn't even have to be of text).
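The server-side half of a naive captcha is about as small as validation gets. Here's a minimal Python sketch of the check, not my actual BlogEngine.NET patch: the function name is hypothetical, and the word "chicken" is simply the one mentioned further down this post. Change it whenever you change the image.

```python
# The word shown in the (always identical) captcha image.
EXPECTED_WORD = "chicken"


def passes_naive_captcha(typed: str, expected: str = EXPECTED_WORD) -> bool:
    """Accept the comment only if the visitor typed the word from the image.

    Surrounding whitespace and letter case are ignored, so an honest reader
    isn't rejected over a stray space or capital letter.
    """
    return typed.strip().casefold() == expected.casefold()


if __name__ == "__main__":
    print(passes_naive_captcha(" Chicken "))
    print(passes_naive_captcha(""))
```

A bot that fills in every field blindly, or leaves the unknown one empty, fails this check without the reader ever seeing a distorted puzzle.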

Implementing my own naive captcha here has been quite effective so far. My next step may be Akismet for manually entered spam.

Implement your own

The patched files (against vanilla BlogEngine.NET 1.4.5) are available. To make the change to your existing, customized blog, take a look at this comparison courtesy of Beyond Compare 3, or view the compact version below; this post needed some color.

You'll want to change the paths and formatting in CommentView.ascx to suit your liking, as well as the word "chicken".

Oh, and don't forget that my code sucks. Someone please be my guest and make this a properly coded BlogEngine.NET extension. My first attempt was with the strictly-server-side RegularExpressionValidator control you see commented out below, which I couldn't get to work, so I used existing mechanisms instead.

Modified (check margin) lines are in red. Unimportant differences are in blue (mostly, the JavaScript isn't truly commented). The rest is context.
