Wiki registration spam

Deiz · April 25, 2015, 4:40pm

A number of Open Knowledge-administered wikis, such as wiki.okfn.org and wiki.opendataday.org, are receiving a continuous stream of ~10 spam user registrations per day. These can't be filtered out of the wikis' Special:RecentChanges pages without using the RecentChangesLogFilter extension, so it's difficult to see what legitimate users are working on.

It seems this topic last came up in March 2014 on the okfn-discuss list, and the end result was to use the Questy CAPTCHA in combination with Akismet. While Akismet seems to do an excellent job of stopping the actual spam edits, it's clear that many spammers have implemented a solver for the CAPTCHA. It's been nearly two years since that CAPTCHA was publicly posted. It seems to have worked reasonably well on wiki.okfn.org from March until August 2014, after which the spam registrations resumed and have continued since (to the tune of nearly 2000 users).

The various Open Knowledge wikis are all fairly low-volume and probably of little interest to spammers, so I think the CAPTCHA's security-through-obscurity model is fine; it's just that Open Knowledge can't rely on an "off the shelf" solution remaining unbroken for more than a few months. I think making some minor changes to the current CAPTCHA (its phrasing and HTML) would probably keep the Open Knowledge wikis free of spam registrations for the foreseeable future without negatively impacting legitimate users.

mikechelen · April 25, 2015, 7:39pm

There is an updated version of the Questy CAPTCHA, but I'm not sure whether it would be too burdensome for humans:
http://thingelstad.com/updated-dynamic-questy-captchas/

Deiz · April 25, 2015, 8:01pm

I’d say the original implementation is better from a usability standpoint. The newer one looks to be just as machine-solvable as the original, but much harder for legitimate users. I suspect that like the original, it’ll be solved and added to various spam scripts within a few months.

That said, regardless of which Questy CAPTCHA is used, I'd bet that most of the spammers' solutions are fragile and can probably be outsmarted via trivial changes.
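To illustrate why trivial changes might be enough, here is a minimal sketch assuming a spam script's solver is just a lookup table keyed on the exact question text it has previously scraped. That is an assumption about how such solvers work, not something observed on these wikis; the table, the questions, the accepted answers, and the function names are all hypothetical.

```python
from typing import Optional

# Hypothetical solver table a spam script might ship with:
# exact question text -> answer (assumed, for illustration only).
SOLVER_TABLE = {
    "What is the name of this wiki?": "OKFN",
}


def bot_answer(question: str) -> Optional[str]:
    # Exact-match lookup: any change to the wording yields no answer at all.
    return SOLVER_TABLE.get(question)


def check_answer(given: Optional[str], accepted: list[str]) -> bool:
    # Hypothetical server-side check: case-insensitive, ignores surrounding whitespace.
    if given is None:
        return False
    return given.strip().lower() in (a.lower() for a in accepted)


original_q = "What is the name of this wiki?"
reworded_q = "Which organisation runs this wiki?"
accepted = ["OKFN", "Open Knowledge"]

print(check_answer(bot_answer(original_q), accepted))  # True: the bot knows this wording
print(check_answer(bot_answer(reworded_q), accepted))  # False: same idea, new wording
```

Under that assumption, a human who reads the reworded question can still answer it, while the lookup-based bot gets nothing back until someone updates its table.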