Anatomy of a Digg | Infotropism

Yesterday morning I woke up to find that one of the posts on my Geek Etiquette blog had been Dugg.

Within half an hour, my site was offline.

I spent a good chunk of the day dealing with my first “digg effect”, and thought it might be interesting to write up.

Timeline

All times are in Australian east coast time, and approximate.

* Wednesday afternoon, I posted Dresscodes: Geek vs Non-Geek to my blog.
* Thursday, during the day, I checked my website stats and noticed links coming in from Reddit. These increased my blog traffic from the usual 400 a day to about 1100.
* Around the same time, the same person (a commenter on GE) also submitted the post to Digg. It was slower to take off there, though, and I didn’t initially see any traffic coming through from Digg.

* Friday morning, a bit after 8am, I checked my email. There was a slew of comments on GE, so I checked the stats. Both the GE comment moderation panel and my stats site (on a different domain, but hosted on the same server) were slow to load, but I managed to see that I was being dugg. I found the entry on Digg’s site, and saw that I had around 750 diggs (“thumbs up” votes) and rising.

* 8:30am: While moderating comments, I was suddenly locked out of my site. I was getting HTTP 403 (“Forbidden”) errors. I ssh’d into my server and took a look around. I saw that at least one file in my blog installation had been modified that morning, and not by me. My first thought was that getting dugg had attracted attention to my site and made it the target of malicious intruders. I called work to let them know I’d be late while I dealt with the problem.

* 8:35am: Copied the file tree of my GE site to a safe spot, just in case. Took a closer look at what had been changed. Just the `.htaccess`, which now had “Order deny, allow… allow from…” and two IP addresses. Ran `nslookup`, and determined that those IPs were a) something at qwest.net, and b) my work’s gateway. Huh?

* 8:36am: email from Dreamhost support, telling me that my site was bringing the server to its knees, and that they’d cut off access for now. Ah! Stopped worrying about script kiddies. Support suggested enabling the wp-cache plugin. I thought I had already, but no, for some reason it wasn’t activated. Duh! Activated it, and checked that it was active on my other blogs too (it was). Just to be on the safe side, logged in to the Dreamhost panel and checked that my WordPress install was at the latest version, with no outstanding security patches. All was well.

* 8:45am: headed off to work, phoning my boss to let him know I wouldn’t be so late after all.

* 9:15am: arrived at work, checked email. DH support thanked me for enabling cache, but said the load was still high, and could I do anything else — perhaps disabling unnecessary plugins — to help with that. I tweaked a bunch of stuff, then remembered I’d seen a potentially useful plugin the other day. I found Digg Defender, installed and activated it, and the load on DH’s server dropped back to normal.

* Saturday morning, at time of posting, the number of Diggs is approaching 1200.
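
The 8:35am step in the timeline (working out who the “allow from” addresses in `.htaccess` belong to) can also be scripted. This is only a rough sketch: the sample file contents and both IP addresses are made-up placeholders from the reserved documentation ranges, not what was actually in my file, and the function names are hypothetical.

```python
import re
import socket
from typing import List, Optional

# Matches Apache 2.2-style "Allow from ..." lines, case-insensitively.
ALLOW_RE = re.compile(r"^\s*allow\s+from\s+(.+)$", re.IGNORECASE)


def allowed_hosts(htaccess_text: str) -> List[str]:
    """Collect every host or IP named on an 'Allow from' line."""
    hosts: List[str] = []
    for line in htaccess_text.splitlines():
        match = ALLOW_RE.match(line)
        if match:
            hosts.extend(match.group(1).split())
    return hosts


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse DNS for one address (roughly what nslookup does); None if it fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


# Hypothetical file contents, for illustration only.
SAMPLE = """\
Order deny,allow
Deny from all
Allow from 192.0.2.10
Allow from 203.0.113.25
"""

print(allowed_hosts(SAMPLE))
```

Calling `reverse_lookup()` on each address returned by `allowed_hosts()` then gives the owning hostnames, where reverse DNS is set up for them.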

Effects

Unfortunately my usual web stats package was busted by the digg experience (see below), but I can still see a number of first and second level effects:

* My Reinvigorate web stats, for which I signed up for a beta program, claim I had about 30,000 page views (from 28,000 unique visitors) on the day of the Digg and around 3000/2800 so far today. My usual traffic is 300-400 visitors a day. The following graph only shows the last few days, since I activated Reinvigorate stats on the GE site, but it gives you some idea of the traffic spike:

* My Feedburner stats show RSS subscriptions rising from around 370 to 622 on the day of the Digg.
* My Technorati authority score rose from 27 to 38, based on a number of other blogs linking to my post. It looks like they’re about 50% “real” posts and 50% blogs that automatically link anything that comes through digg.
* The post was subsequently picked up by StumbleUpon; right now, a day after the peak of the Digg traffic, I’m seeing most referrals coming through there, rather than Digg itself.
* There have been more than 20 comments on the post; posts on GE more often get single-digit comments. The increased comments definitely aren’t linearly related to the number of visitors, though.

I don’t have any Google ads on GE, so no stats from there, sorry.

Technical issues and solutions

Server load: WordPress configuration solutions

Apparently the load average on my server peaked around 150. Yow! WordPress runs on PHP and MySQL, and that makes for a pretty resource-hungry site if it’s getting hit hard.

If you run a WordPress blog and have even the slightest expectations of any traffic at all, you *must* install and activate wp-cache. As far as I can tell there is no excuse for not doing this. In fact, I thought I *had* done it; perhaps I just mis-clicked somewhere at some point and didn’t notice.

Unfortunately wp-cache only managed to bring the server load down to 20ish: still too high. Next step was to deactivate any unnecessary plugins. I believe any active plugin (i.e. green in the “Plugins” admin screen) will be loaded into the PHP interpreter each time you get a page hit on the site, so it’s generally good practice not to leave anything activated if you’re not using it. I also deactivated a handful of things like the comment preview plugin, which I could live without for a little while.

I also simplified my sidebar. The main thing I removed was the “related posts” section, which wasn’t really working for me anyway. I suspect this may have been generated dynamically and not cached, but I’m not sure; either way, no harm in getting rid of it. If I’d had any other dynamic sidebar cruft, I also would’ve removed it at this point.

By now the load was hovering between 10 and 20 and I’d soaked up some caffeine. Finally I recalled the Digg Defender plugin I’d seen while cruising around the other day. This plugin uses the Coral Cache distributed caching network to protect against sudden spikes in load.

I grabbed it and installed it, and the load dropped back to 3 or 4. Support tells me this is normal for that server at that time of day, so we’re all happy now. I can’t recommend this plugin highly enough; I reckon you should install and activate it “just in case” on any blog that has the slightest chance of being dugg, slashdotted, or otherwise hit by a social networking DDoS.
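
For the curious, the general technique (send visitors arriving from a high-traffic referrer to a Coral-cached copy of the page) can be sketched roughly as below. This is a guess at the idea, not the plugin's actual code: the `.nyud.net` hostname suffix is my assumption about how Coral URLs are formed, and the referrer list and function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"   # assumption: suffix Coral appends to the hostname
BUSY_REFERRERS = ("digg.com", "reddit.com", "slashdot.org")  # hypothetical list


def coralize(url: str) -> str:
    """Rewrite a URL so that it points at the Coral-cached copy.

    Any explicit port in the original URL is dropped in this sketch.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host or host.endswith(CORAL_SUFFIX):
        return url                       # nothing to do, or already rewritten
    return urlunsplit(
        (parts.scheme, host + CORAL_SUFFIX, parts.path, parts.query, parts.fragment)
    )


def redirect_target(url: str, referrer: Optional[str]) -> Optional[str]:
    """Return the cached URL to redirect to, or None to serve the page normally."""
    if not referrer:
        return None
    ref_host = urlsplit(referrer).hostname or ""
    if any(ref_host == d or ref_host.endswith("." + d) for d in BUSY_REFERRERS):
        return coralize(url)
    return None


print(redirect_target("http://example.com/post/", "http://digg.com/story"))
```

A real implementation also has to let the cache network's own fetches through to the origin server untouched, otherwise the redirect loops.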

My stats problems

Dreamhost’s web stats are kind of manky, so for a long time I’ve supplemented them with awstats. I run the awstats update script regularly from cron, and can then view the reports via a website I’ve set up for the purpose.

Unfortunately, getting dugg meant my httpd logs were so huge that the awstats script took a long time to run. That wouldn’t be a problem except that Dreamhost kills any long-running scripts. And the longer I wait to run this thing, the more backlog there is in my access log and the longer it’s *going* to take. It’s currently sitting at half a million lines.

I’ve contacted DH support to see whether there’s any way round the script-killer. I suspect that running it at a quiet time of day might help. I’ve tried `nice` and it does nothing; I had hoped the script-killer might be smart enough to look for that, but no luck.

If I can’t get it sorted out by the time the logs are rolled, I’ll probably save off a copy of the enormous access.log, cut it up into chunks, and pass them to awstats one by one. At least that will make sure I don’t lose the history of this event.
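
The chopping-up step might look something like this in Python. It is a sketch under my own assumptions: the function name, chunk size and output naming scheme are made up, and it only splits the file; feeding each chunk to awstats is left out.

```python
import itertools
from pathlib import Path
from typing import List


def split_log(source: Path, out_dir: Path, lines_per_chunk: int = 50_000) -> List[Path]:
    """Split a log file into numbered chunks of at most `lines_per_chunk` lines.

    Lines are copied byte-for-byte and kept in their original order, so the
    chunks can be processed oldest-first.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    with source.open("rb") as src:
        for index in itertools.count():
            # Read the next batch of lines without loading the whole file.
            block = list(itertools.islice(src, lines_per_chunk))
            if not block:
                break
            chunk = out_dir / f"{source.name}.{index:04d}"
            chunk.write_bytes(b"".join(block))
            chunks.append(chunk)
    return chunks
```

Something like `split_log(Path("access.log"), Path("chunks"))` would turn a half-million-line log into ten files, each small enough (hopefully) to get through awstats before the script-killer notices. The chunks need to go through in order, since the log entries are chronological.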

Finally, I’m actually finding that my Reinvigorate stats are pretty good, and I might just increase my use of, and reliance on, those external stats, rather than relying on my own awstats installation. The stats report images in this post are all taken from my Reinvigorate stats.
