
Comment Spam Deluge – Did our Captcha get Hacked?

Have you been having trouble reading Unsought Input lately? You're in good company – I've been having trouble writing for it.

We've been having issues with MySQL to the point of hanging connections and pleasant, but not very helpful, WordPress error messages. It's nice that user-friendly errors are built into WordPress, since you never want to give users cryptic, blue-screen-of-death style errors. But I needed to get to the root of the problem.

I quickly put on my detective cap and tried to log in with phpMyAdmin – no luck, but this time the error message was a little more useful:

#1040 - Too many connections

Normally you encounter this error for one of two reasons: either you are being Slashdotted, or you are opening up persistent connections (with PHP's mysql_pconnect(), for example) and they are not being closed properly. In the first case, there are just too many queries at once and it fills up the connection limit, and in the second case they build up over time.

I didn't think possibility number 1 was very likely, since we don't write anything cool and geeky enough to get on Slashdot. The story about the Canadian geologist was probably our best bet. I knew I hadn't written any code to use persistent connections, but what about the rest of WordPress?

No such luck. Not a single pconnect in any of the WordPress or plugin code. Back to the first possibility – is it possible we were being hit by a distributed denial of service attack (DDoS)? More specifically (and more likely), we were being effectively DDoS'ed by comment spammers.

How did I figure it out? The connection limit for MySQL is set in the config file, my.cnf on Linux/Unix (or my.ini on Windows):

[mysqld]
set-variable=max_connections=100

The default is 100 and that should be enough for most sites. I needed to see what was actually being run, so I connected as a user with administrative rights and sent MySQL this command:

SHOW FULL PROCESSLIST

I got back a list of 200 locked queries, all dealing with selecting or deleting comments!
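
With a couple of hundred rows it helps to tally the list rather than read it. Here is a minimal sketch, assuming the rows have already been fetched as associative arrays (with mysqli, for example). The function name summarize_processlist and the sample rows are made up for illustration. State and Info are columns MySQL returns for this command, and wp_comments is the comments table under the default WordPress prefix.

```php
<?php
// Hypothetical helper: tally process-list rows by their State column and
// count how many running statements mention a given table name.
function summarize_processlist(array $rows, string $table = 'wp_comments'): array
{
    $summary = ['total' => count($rows), 'by_state' => [], 'mentions_table' => 0];

    foreach ($rows as $row) {
        // State can be NULL or empty for idle connections.
        $state = ($row['State'] ?? '') !== '' ? $row['State'] : '(none)';
        $summary['by_state'][$state] = ($summary['by_state'][$state] ?? 0) + 1;

        // Info holds the statement text, or NULL when nothing is running.
        if (stripos((string) ($row['Info'] ?? ''), $table) !== false) {
            $summary['mentions_table']++;
        }
    }

    arsort($summary['by_state']); // most common state first
    return $summary;
}

// Made-up sample rows, shaped like SHOW FULL PROCESSLIST output.
$sample = [
    ['Id' => 11, 'Command' => 'Query', 'State' => 'Locked', 'Info' => 'SELECT * FROM wp_comments WHERE comment_post_ID = 5'],
    ['Id' => 12, 'Command' => 'Query', 'State' => 'Locked', 'Info' => 'DELETE FROM wp_comments WHERE comment_ID = 9'],
    ['Id' => 13, 'Command' => 'Sleep', 'State' => '', 'Info' => null],
];
print_r(summarize_processlist($sample));
```

In real use the rows would come from the database connection instead of $sample, fetched by whichever MySQL extension your host provides.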

We have two measures in place to combat comment spam. One is Akismet, which is a standard plugin for WordPress. I have no hard data, but I would guess almost everyone uses it. The other is a captcha plugin called Did You Pass Math?

The idea behind captchas is to give visitors a small task that is easy for humans but harder for machines. That's where those fancy images with the wavy letters and numbers come from. I wanted to use something a little simpler, so I went with Did You Pass Math. From what I've read, a big part of the power of captchas is just having something there at all to make your submit form non-standard and break the really naïve spamming scripts (see Jeff Atwood's story about his captcha in Coding Horror). It worked really well for a while.

But not any more. Akismet now reports an order of magnitude more spam blocked than ever before.

Is Did You Pass Math officially broken? It seems like I'll need to upgrade or find something different. Maybe I can hack it a bit to ask about more than just addition.

Jess B was kind enough to look through our logs and she found a ton of hits from the same IP range, and the IPs all went to spammy sites filled with more spam. Ugh.
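
If you want to run the same kind of check on your own logs, here is a rough sketch. It assumes an Apache-style access log where the client IPv4 address is the first field on each line, and it groups hits by the first three octets. The function name and the sample lines are invented for illustration (the addresses come from the reserved documentation ranges), and this is not necessarily how Jess did it.

```php
<?php
// Hypothetical helper: count requests per /24 range (first three octets)
// from access-log lines whose first field is the client IPv4 address.
function count_hits_by_range(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Take the first whitespace-delimited field and check it is IPv4.
        $ip = strtok(trim($line), " \t");
        if ($ip === false || filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
            continue; // skip blank or unparseable lines
        }
        $range = implode('.', array_slice(explode('.', $ip), 0, 3)) . '.0/24';
        $counts[$range] = ($counts[$range] ?? 0) + 1;
    }
    arsort($counts); // busiest range first
    return $counts;
}

// Made-up sample lines in common log format.
$sample = [
    '192.0.2.10 - - [01/Jan/2008:00:00:01 +0000] "POST /wp-comments-post.php HTTP/1.1" 200 512',
    '192.0.2.77 - - [01/Jan/2008:00:00:02 +0000] "POST /wp-comments-post.php HTTP/1.1" 200 512',
    '203.0.113.5 - - [01/Jan/2008:00:00:03 +0000] "GET / HTTP/1.1" 200 4096',
];
print_r(count_hits_by_range($sample));
```

For a real log you could pass in file('access.log', FILE_IGNORE_NEW_LINES), where the path is whatever your host uses.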

Has anyone else noticed this with Did You Pass Math, or any other captcha plugin?

Weird Errors – Fix Timeout Issues in CURL, PHP, and Apache.

Hitting strange errors when trying to execute long-running PHP processes, like large file reads, generating static HTML pages, file uploads, or CURL calls? It might not be just bugs in your code.

Are you getting pages that seem to load, but then nothing shows up in the browser? When you go to a page, does your browser sometimes ask, "You have chosen to open something.php which is a: PHP file. What should Firefox do with this file?" or possibly "File name: something.php File type: PHP File Would you like to open the file or save it to your computer?" Do you get internal server errors at random intervals?

Depending on what you are trying to do, you could be running into timeout issues, either in PHP, in a particular library, in Apache (or IIS or whatever web server you use), or even in the browser. Timeout issues can be a real pain because you don't run into them very often and they don't result in clear error messages.

Let's take a PHP script that does a number of CURL calls as an example. PHP gives you access to libcurl, a really powerful tool for calling up other web pages, web services, RSS feeds, and whatever else you can dream up, right in your PHP code. This article is not a general introduction to CURL, so I won't go into detail, but basically the CURL functions allow your code to make requests and get responses from web sites just like a browser. You can then parse the results and use the data on your site.

Let's say you have a page on your site where you would like to display the latest posts from a few of your friends' websites, and they don't have RSS feeds set up. When a user comes to your site, you can make a series of CURL calls to get the data:

$curl_session = curl_init();
curl_setopt($curl_session, CURLOPT_HEADER, false);
curl_setopt($curl_session, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_session, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_session, CURLOPT_HTTPGET, true);
curl_setopt($curl_session, CURLOPT_URL, 'http://www.example.com');
$string = curl_exec($curl_session);

You can now parse the results in $string and hack out the most recent post. You would repeat these calls for each of your friends' web sites.

You try running the page and everything seems to work at first, but then you hit reload and get some strange behavior, like the problems listed above. In the worst cases, you won't get the exact same error each time: sometimes the page will load, sometimes you'll get an empty $string or errors from CURL, sometimes a blank page will appear, and sometimes you will be asked to download the PHP file, which includes all your source code!

In this situation you could be timing out. CURL is going out to another web server and your code will have to wait for it to finish before moving on to something else. In addition, your web server may be waiting on PHP to finish its work before sending something to the browser.

Luckily, there are a few ways to control how long the CURL functions, PHP, and Apache wait, and you can do a little bit to ensure that the user's browser doesn't just give up either.

CURL has two options worth looking at, CURLOPT_TIMEOUT and CURLOPT_CONNECTTIMEOUT. The former sets how long CURL will run before it gives up and the latter sets how long CURL will wait to even connect to the site you want to pull data from. If you wanted to wait at most 4 seconds to connect and 8 seconds total, you would set it like this:

curl_setopt($curl_session, CURLOPT_CONNECTTIMEOUT, 4);
curl_setopt($curl_session, CURLOPT_TIMEOUT, 8);
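
If you make several of these calls, it may be worth wrapping the setup in one function so every request gets the same limits and a failed call is easy to spot. This is only a sketch: the function name fetch_with_timeouts and its default values are my own choices for illustration, and logging with error_log() is just one way to record the failure.

```php
<?php
// Hypothetical wrapper around the cURL calls above: returns the response
// body as a string, or false if the request fails or times out.
function fetch_with_timeouts(string $url, int $connectTimeout = 4, int $totalTimeout = 8)
{
    $session = curl_init();
    curl_setopt($session, CURLOPT_URL, $url);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($session, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($session, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // seconds allowed to connect
    curl_setopt($session, CURLOPT_TIMEOUT, $totalTimeout);          // seconds allowed overall

    $body = curl_exec($session);
    if ($body === false) {
        // curl_errno() is 28 when the transfer timed out.
        error_log('cURL failed (' . curl_errno($session) . '): ' . curl_error($session));
    }
    return $body;
}

// Usage (not run here, since it needs network access):
// $string = fetch_with_timeouts('http://www.example.com');
// if ($string === false) { /* show a fallback instead of an empty section */ }
```

Checking for false before parsing means a slow friend's site costs you at most the total timeout and leaves you with a clean fallback instead of a half-built page.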

This can be very helpful if you are connecting to a large number of different web sites or connecting to sites that are not always available or are on slow hosts. You may wish to set the timeouts much higher, if you really need to get that data, or fairly low, if you have a lot of CURL calls and don't want PHP to time out. You can get an idea how long things are taking by using curl_getinfo():

echo '<pre>';
print_r(curl_getinfo($curl_session));
echo '</pre>';
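
The full dump is long, so a one-line summary per request can be easier to scan. A small sketch, assuming you only care about a few fields: url, http_code, connect_time and total_time are keys that curl_getinfo() returns, while the function name and the sample values are made up.

```php
<?php
// Hypothetical helper: pick the timing fields out of a curl_getinfo() array.
function describe_curl_timing(array $info): string
{
    return sprintf(
        '%s: HTTP %d, connect %.2fs, total %.2fs',
        $info['url'] ?? '(unknown)',
        $info['http_code'] ?? 0,
        $info['connect_time'] ?? 0.0,
        $info['total_time'] ?? 0.0
    );
}

// Made-up values, shaped like the array curl_getinfo() returns.
echo describe_curl_timing([
    'url' => 'http://www.example.com/',
    'http_code' => 200,
    'connect_time' => 0.25,
    'total_time' => 1.5,
]), "\n";
```

In the page itself you would call describe_curl_timing(curl_getinfo($curl_session)) after each curl_exec().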

PHP may also time out if it is running for too long. Luckily, you can control this to some extent by changing a setting in your php.ini or using the set_time_limit() function. If you can make changes to php.ini, it might be worth adding or adjusting the following lines:

max_execution_time = 300 ; Maximum execution time of each script, in seconds
max_input_time = 60 ; Maximum amount of time each script may spend parsing request data
memory_limit = 8M ; Maximum amount of memory a script may consume (8MB)

If you don't have access to php.ini, you may be able to use set_time_limit() to change max_execution_time on each page where it is needed. If you are in a shared hosting environment, don't monkey with these values too much or you might impact other users. If you raise the time limit too high, you may get an angry email from your admin. Some hosts have programs set up to look out for long-running processes and kill them - check with your admin if you raise the time limit and the script still dies an early death.
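
For the set_time_limit() route, here is a minimal sketch of raising the limit only when the current one is too low. The helper name raise_time_limit is hypothetical, and the 300 seconds in the usage line simply mirrors the php.ini example above.

```php
<?php
// Hypothetical helper: raise the execution time limit for this request only,
// and only if the current limit is lower than what the script needs.
function raise_time_limit(int $seconds): bool
{
    $current = (int) ini_get('max_execution_time');

    // 0 means "no limit", so there is nothing to raise.
    if ($current === 0 || $current >= $seconds) {
        return true;
    }

    // Some hosts disable set_time_limit(); report failure instead of crashing.
    if (!function_exists('set_time_limit')) {
        return false;
    }

    // set_time_limit() restarts the timer from zero when it is called.
    return (bool) set_time_limit($seconds);
}

if (!raise_time_limit(300)) {
    error_log('Could not raise the time limit; long requests may be cut off.');
}
```

Calling it at the top of just the slow pages keeps the rest of the site on the host's normal limit.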

Your web server (Apache is used for this example) may also be running into timeout issues. If you have access to your httpd.conf, changing the timeout is pretty easy:

Timeout 300

Unfortunately, not everyone will be able to edit their httpd.conf and this is not something you can add to an .htaccess file to change for just the scripts in a particular directory. Luckily we can work around this limitation, so long as we are sending the webpage to the user in parts, rather than waiting for the entire PHP script to execute and then sending the response.

How do we do it? First, make sure mod_gzip is turned off in an .htaccess file:

mod_gzip_on no
mod_gzip_item_include mime ^text/.*
mod_gzip_item_exclude mime ^image/.*$

Mod_gzip is a great way to reduce bandwidth use and increase site performance, but it waits until PHP has completed executing before zipping and sending the web page to the user.

Second, take a look at your PHP code and make sure you are not output buffering the whole page, including output buffering to send gz-encoded (gzipped) output. Output buffering can give you a lot of control, but in this case it can cause problems. Look for something like this:

ob_start();
// ...
// a whole ton of time-consuming code here
// ...
ob_flush(); // or possibly ob_end_flush();

Finally, if you have a number of time-intensive sections in your code, you can force some data out to the browser to keep Apache going and help make sure the browser doesn't lose interest either. It might look something like this:

echo "Loading Steve's page ...";
// ...
// a time-consuming CURL call
// ...
// do a flush to keep browser interested...
echo str_pad(" Loaded. ", 8);
sleep(1);
flush();
echo "Loading Jill's page ...";
// ...
// a time-consuming CURL call
// ...
echo str_pad(" Loaded ... ", 8);
sleep(1);
flush();

The flush() function is the main trick - it tells PHP to send out what it has generated so far. The str_pad() and sleep() calls might not be necessary in this case, but the general idea is that some browsers need a minimum of 8 bytes to start displaying and the delay from the sleep(1) call seems to make IE happy.
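
If you end up repeating that pattern a lot, it can be folded into a pair of small helpers. This is a sketch only: the names progress_chunk and send_progress are invented, the 8-byte default just mirrors the snippet above, and I have left out the sleep(1) call. It also calls ob_flush() first when an output buffer is active, since flush() on its own does not empty PHP's output buffers.

```php
<?php
// Hypothetical helper: pad a progress message to a minimum length so that
// browsers which wait for a few bytes have something to render.
function progress_chunk(string $message, int $minBytes = 8): string
{
    return str_pad($message, $minBytes);
}

// Hypothetical helper: send a progress message to the browser right away.
function send_progress(string $message): void
{
    echo progress_chunk($message);

    // If an output buffer is active, flush it too; flush() by itself
    // does not touch PHP's own output buffers.
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}

// Usage, mirroring the example above:
// send_progress("Loading Steve's page ...");
// ... a time-consuming CURL call ...
// send_progress(" Loaded. ");
```

Keeping the padding in its own function makes it easy to bump the minimum size if a particular browser turns out to need more.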

This technique is not just useful in getting around timeout problems; it can also be used on long pages to give the user something to start looking at while the rest of the data loads. Also, some browsers might not handle content served as XML incrementally – in that case you might want to serve it as text/html:

header("Content-Type: text/html");

Hopefully this will help you track down those nasty timeout-related bugs. Have questions or some other tips? Post in the comments below.

WordPress Tutorial: Using WP-Cache on Windows / IIS

Is your blog starting to bog down? Getting nasty emails from your ISP about overloading the database server? Since most blogs are read far more often than they are updated, caching your pages can result in a real performance improvement.

WordPress has some very basic object caching, but you really need to be able to cache whole pages to see a big benefit. Luckily there is a very good page-caching plugin, WP-Cache.

If you are on a Linux or Unix host, installation is pretty straightforward.

Now, what if you are on a Windows / IIS host and using 'date and name based', almost-pretty permalinks? No sweat. Okay, a little bit of sweat.

The code for WP-Cache makes a few assumptions about the environment it's running in which don't work out so well in Windows. My first major step in getting it to work was a great blog post on CPUIdle. Since that blog seems to be down, I'll quote their steps here:

"1. Download WP-Cache zip file (current version as of writing is 2.0.17) and unzip into wp-content/plugins folder.

2. Copy wp-content/plugins/wp-cache/wp-cache-phase1.php to wp-content/advanced-cache.php (not really sure why this isn’t simplified by the author).

3. Open the standard wp-config.php file and add define('WP_CACHE', true);

4. Now comes the tricky part:

open wp-content/plugins/wp-cache/wp-cache.php in your favourite text editor. Search for the wp_cache_add_pages function and change the function code like this:

add_options_page('WP-Cache Manager', 'WP-Cache', 5, 'wp_cache/wp_cache.php', 'wp_cache_manager');

Reason the original code doesn’t work is that the original __FILE__ resolves to wp_cache\wp_cache.php which some browsers eat and convert to wp_cachewp_cache.php- which doesn’t exist.

5. Second problem is that WP-Cache checks for installation step 2) in a windows-incompatible manner. Search for the wp_cache_check_link function. Change the first three lines after the variable declaration in this way:

# if ( basename(@readlink($wp_cache_link)) != basename($wp_cache_file)) {

# @unlink($wp_cache_link);

# if (!@symlink ($wp_cache_file, $wp_cache_link)) {

if (!file_exists($wp_cache_link)) { {

6. Finally, open wp-content/plugins/wp-cache/wp-cache-phase2.php and search for ob_end_clean(); and replace with ob_end_flush();. Without this change the cached page contents are not written back when the page is initially cached. It’s unclear to me if that works under *nix, I assume it couldn’t.

7. That’s it- you’re done. Now go to Options/WP-Cache and turn caching on."
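
Steps 2 and 5 boil down to "a copy of the file has to exist, since we can't count on a symlink". Here is a rough sketch of that idea as a standalone helper. It is not part of WP-Cache or the CPUIdle post, the name ensure_cache_file is invented, and the paths in the usage comment assume you run it from the WordPress root.

```php
<?php
// Hypothetical helper: make sure the target file exists by copying the
// source over when it is missing, instead of relying on symlink().
function ensure_cache_file(string $source, string $target): bool
{
    if (file_exists($target)) {
        return true; // already in place (step 2 was done by hand)
    }
    // Only attempt the copy when the source is really there.
    return is_file($source) && copy($source, $target);
}

// Usage (paths taken from step 2 above):
// ensure_cache_file(
//     'wp-content/plugins/wp-cache/wp-cache-phase1.php',
//     'wp-content/advanced-cache.php'
// );
```

Because this makes a copy and not a link, any later edit to wp-cache-phase1.php has to be repeated in advanced-cache.php.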

Unfortunately, if you are set up like we are, using the “index.php” style permalinks, there's one last step you're going to have to do. In Windows / IIS, $_SERVER['REQUEST_URI'] is blank. You need to use $_SERVER['SCRIPT_NAME'].$_SERVER['PATH_INFO'] instead. If you don't, WP-Cache will happily cache your index.php file, but it will also think your /index.php/category/cheese/ page and your /index.php/2006/01/01/I-am-very-interesting/ page are the same as index.php.

In wp-cache-phase1.php (and also advanced-cache.php) look for this line:

$key = md5(preg_replace('/#.*$/', '', $_SERVER['REQUEST_URI']) . wp_cache_get_cookies_values());

and change it to this:

$key = md5(preg_replace('/#.*$/', '', $_SERVER['SCRIPT_NAME'].$_SERVER['PATH_INFO']) . wp_cache_get_cookies_values());
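
If the same code has to run on both Linux and Windows hosts, one option is a small fallback instead of a hard-coded swap. This is a sketch under my own assumptions, not something WP-Cache ships: the helper name request_path_for_cache is invented, and it takes the $_SERVER array as a parameter so it can be tried out with sample values.

```php
<?php
// Hypothetical helper: return the path of the current request, falling back
// to SCRIPT_NAME + PATH_INFO when REQUEST_URI is missing or empty.
function request_path_for_cache(array $server): string
{
    if (!empty($server['REQUEST_URI'])) {
        return $server['REQUEST_URI'];
    }
    return ($server['SCRIPT_NAME'] ?? '') . ($server['PATH_INFO'] ?? '');
}

// Made-up values: what a Linux host and an IIS host might report.
echo request_path_for_cache(['REQUEST_URI' => '/index.php/category/cheese/']), "\n";
echo request_path_for_cache(['SCRIPT_NAME' => '/index.php', 'PATH_INFO' => '/category/cheese/']), "\n";
```

The key line would then use request_path_for_cache($_SERVER) in place of the $_SERVER lookup. One thing to keep in mind is that SCRIPT_NAME plus PATH_INFO does not include the query string, so on the fallback path two URLs that differ only in their query string end up with the same key.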

By the way, one nice thing about step 6 above is that it also fixes a blank-page bug that some people have run into.

Finally, what if you want to use both WP-Cache and gzip? Here's how.