Web Design


Technorati Hates Me

Every day, five or six thousand social-networking, blogosphere-trotting, long-tailing web sites are created, every one of them with a really great new idea that combines RSS with AJAX and plans to stay in beta forever.  Out of all of those, a few are really cool and useful - and Technorati is definitely one of them.

Technorati tracks blogs and the discussions, reactions, and responses that bounce from blog to blog via the simple mechanism of who is linking to whom.  It also collects tags and lets you search the mass of blogs for posts that might be relevant to your query.  Bloggers can "claim" their own blog and use some surprisingly fun tools to see who is talking about them.  Some people have even been abandoning the whole trackback system in favor of Technorati.

And apparently, Technorati hates me.  Now, Technorati hasn't outright said it hates me (or us, since this is a group blog), but it won't let us claim Unsought Input.  Every time we try to make the claim, we get this:
There was a problem claiming your blog. Please try again in a few minutes. You can also go to Technorati Help for help claiming your blog.
Trying again is of no use, whether we wait a few minutes or a few weeks.  Using the customer service form to send an email has been fruitless as well.  Each time an acknowledgment email is promptly returned, but no answer - even when we send them a reminder with our ticket number.

For a while I thought I knew the problem: some of our authors had claimed their author archive pages as their own blogs.  This doesn't really work, though, since virtually no one links to our author pages and posts on Unsought Input don't fall under the same URL pattern.  After we cleared out those old claims, I had a small glimmer of hope - but alas, we still cannot complete the claim.

I know they are busy.  I know it is a free service (though to tell the truth I would be willing to pay a reasonable price, like I have with StumbleUpon and Last.fm - it really is a cool service).  But at this point I feel like a freshman in high school with no date for the winter formal.

But why doesn't Technorati like us?  There was a post about some technical difficulties on the Technorati blog last month, but judging by the example at blogs.marketwatch.com, it turned out to be more about indexing times than problems with claims.  I've seen the same issue mentioned on other blogs like Bark Bark Woof Woof, and a few commenters have mentioned the possibility that Unsought Input has been identified as spam.  I hope that isn't the case, because it has become clear recently that if a powerful gateway site like Google thinks you are spam, you are in big trouble.

...

To be fair, this post is a bit tongue-in-cheek.  The folks at Technorati are remarkably accessible, and many of them have blogs of their own (or even make their email addresses available to the public).  I just haven't worked up the gumption to pester them more directly - I would much rather go through the support page, since I know they are busy.

Fighting Spam on a Diet – How to fix Akismet Performance Problems

Running into strange WordPress performance problems and database errors?  Akismet could be the culprit, but we're in luck - it's an easy fix.

Earlier I wrote a bit about our encounter with vicious, robotic Chinese comment spammers.  Since then we've had a few further issues, and I think I've found the culprit: Akismet, the plugin we've been using to fight the spam.

First off, let me say that I think Akismet is a great plugin.  While we had hundreds of spam comments come in for a few days in a row, not one made it out to the public.  Very nice.  But it is a bit too aggressive in one spot, and that can slow down your blog or lock up the comments table, filling up your max_connections.

The problem is in akismet.php, specifically the akismet_delete_old() function.  Look for the following lines:
$n = mt_rand(1, 5);
if ( $n % 5 )
    $wpdb->query("OPTIMIZE TABLE $wpdb->comments");
Those of you with PHP / MySQL experience will recognize the problem immediately.  For the less code-literate: this creates a random number between 1 and 5, and if that number has a remainder after being divided by 5, it runs an OPTIMIZE TABLE on the comments table.  In other words, at random, it will lock the entire table and compute statistics after 80% of all deletes.

Now, it's a good idea to optimize your tables after a large number of deletes.  But it is a pretty expensive operation, because it may be rearranging things on disk to free up space.

Now, imagine you get hit by a spam bot and end up with a couple hundred spam comments.  Akismet catches them all, and 15 days later tries to delete them all in one big loop.  One big loop filled with a couple hundred table-locking, disk-intensive database operations.

But it's easy to fix.  Replace the lines above with this:
$n = mt_rand(1, 100);
if ( $n == 42 )
    $wpdb->query("OPTIMIZE TABLE $wpdb->comments");
That will only optimize the table on average once out of 100 comments deleted.  Why 100?  It's an educated guess.  According to the MySQL documentation, at most you will need to optimize a table once a month or so, maybe once a week if you have a large number of deletes or edits on varchar fields. Why did I pick 42 for the one value out of a hundred that triggers an optimization?  You're asking the wrong question.
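
If you want a sanity check before deciding how often to optimize, MySQL will tell you how much reclaimable overhead a table is carrying via SHOW TABLE STATUS.  Here is a rough sketch using WordPress's $wpdb object - the 512 KB threshold is an arbitrary number I picked for illustration, not anything from Akismet or the MySQL docs:

// Rough sketch: only optimize when the comments table actually has
// a meaningful amount of reclaimable space. The threshold is arbitrary.
global $wpdb;
$status = $wpdb->get_row("SHOW TABLE STATUS LIKE '{$wpdb->comments}'");
if ( $status && $status->Data_free > 512 * 1024 ) {
    $wpdb->query("OPTIMIZE TABLE {$wpdb->comments}");
}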

Comment Spam Deluge – Did our Captcha get Hacked?

Have you been having trouble reading Unsought Input lately? You're in good company – I've been having trouble writing for it.

We've been having issues with MySQL, to the point of hanging connections and pleasant but not very helpful WordPress error messages. It's nice that user-friendly errors are built in to WordPress, since you never want to give users cryptic, blue-screen-of-death style errors. But I needed to get to the root of the problem.

I quickly put on my detective cap and tried to log in with phpMyAdmin – no luck, but this time the error message was a little more useful:

#1040 - Too many connections

Normally you encounter this error for one of two reasons: either you are being Slashdotted, or you are opening persistent connections (with PHP's mysql_pconnect(), for example) and they are not being closed properly. In the first case, there are just too many queries at once and they fill up the connection limit; in the second, idle connections build up over time.
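
For anyone who hasn't run into persistent connections before, the difference is a single function call. This is just an illustration with placeholder credentials, using the old mysql_* functions that WordPress relied on at the time:

// Regular connection: closed when the script finishes (or with mysql_close()).
$link = mysql_connect('localhost', 'db_user', 'db_password');

// Persistent connection: handed back to the web server process and reused,
// so leaked or idle ones can slowly eat up max_connections.
$plink = mysql_pconnect('localhost', 'db_user', 'db_password');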

I didn't think possibility number 1 was very likely, since we don't write anything cool and geeky enough to get on Slashdot. The story about the Canadian geologist was probably our best bet. I knew I hadn't written any code to use persistent connections, but what about the rest of WordPress?

No such luck. Not a single pconnect in any of the WordPress or plugin code. Back to the first possibility - is it possible we were being hit by a distributed denial of service attack (DDoS)? More specifically (and more likely), we were being effectively DDoS'ed by comment spammers.

How did I figure it out? The connection limit for MySQL is set in its config file - my.cnf on Linux/Unix (or my.ini on Windows):

[mysqld]
set-variable=max_connections=100

The default is 100 and that should be enough for most sites. I needed to see what was actually being run, so I connected as a user with administrative rights and sent MySQL this command:

SHOW FULL PROCESSLIST

I got back a list of 200 locked queries, all dealing with selecting or deleting comments!
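
If you want to keep an eye on this without firing up phpMyAdmin every time, a few lines of PHP will compare the current thread count against the limit. A quick diagnostic sketch, again with placeholder credentials and the old mysql_* functions:

// Diagnostic sketch: how close are we to max_connections?
$link = mysql_connect('localhost', 'admin_user', 'admin_password');

$threads = mysql_fetch_row(mysql_query("SHOW STATUS LIKE 'Threads_connected'", $link));
$limit   = mysql_fetch_row(mysql_query("SHOW VARIABLES LIKE 'max_connections'", $link));

echo "Using {$threads[1]} of {$limit[1]} allowed connections\n";
mysql_close($link);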

We have two measures in place to combat comment spam. One is Akismet, which is a standard plugin for WordPress. I have no hard data, but I would guess almost everyone uses it. The other is a captcha plugin called Did You Pass Math?

The idea behind captchas is to give visitors a small task that is easy for humans but harder for machines. That's where those fancy images with the wavy letters and numbers come from. I wanted to use something a little simpler, so I went with Did You Pass Math. From what I've read, a big part of the power of captchas is just having something there at all to make your submit form non-standard and break the really naïve spamming scripts (see Jeff Atwood's story about his captcha in Coding Horror). It worked really well for a while.

But not any more. Akismet now reports an order of magnitude more spam blocked than ever before.

Is Did You Pass Math officially broken? It seems like I'll need to upgrade or find something different. Maybe I can hack it a bit to ask about more than just addition.
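
If I do end up hacking on it, the change shouldn't be complicated. Here is a rough sketch of what a math captcha that mixes operations might look like - the function names and session handling below are my own invention, not code from the actual Did You Pass Math plugin:

// Sketch of a math captcha with more than just addition.
// Not the real Did You Pass Math code - names and storage are made up.
function math_captcha_question() {
    $a = mt_rand(2, 9);
    $b = mt_rand(2, 9);
    $ops = array('+', '-', 'x');
    $op = $ops[array_rand($ops)];

    if ($op == '+') {
        $answer = $a + $b;
    } elseif ($op == '-') {
        $answer = $a - $b;
    } else {
        $answer = $a * $b;
    }

    $_SESSION['math_captcha_answer'] = $answer; // requires session_start() earlier
    return "What is $a $op $b?";
}

function math_captcha_check($response) {
    return isset($_SESSION['math_captcha_answer'])
        && (int) trim($response) === $_SESSION['math_captcha_answer'];
}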

Jess B was kind enough to look through our logs and she found a ton of hits from the same IP range, and the IPs all went to spammy sites filled with more spam. Ugh.

Has anyone else noticed this with Did You Pass Math, or any other captcha plugin?

Usability Begins at Home – 3 Challenges in Usability Testing with Older Users

Have you ever gone to a web site and had a hard time navigating around the site? Ever try to purchase something online only to find the steps so confusing and unintuitive you give up and buy somewhere else?

Web sites that suffer from poor usability almost invariably also suffer from poor readership and sales. That's why a small, but growing number of companies are starting to put some time and money into usability testing. They are, quite shockingly, actually watching their users try to use their web site.

People age 60 and up are the fastest-growing user group on the web, and a large number of sites will want them as customers. In this post, I want to talk about a test I ran with an older user where the web site was actually not at fault – at least not primarily at fault – for a severe lack of usability. We will cover the three major challenges you need to address when doing usability testing with older adults.

The web site was targeted at a wide range of users, specifically including the elderly. I was running an informal usability test with two middle-aged users and one elderly user. The middle-aged users were experienced users of office software and occasional web users. They were able to walk through the tasks reasonably well, although it was apparent some of the labels were a little unclear and could use tweaking.

The first challenge of testing with older users and computer novices is use of the mouse. If you are not yet used to using a mouse, moving the pointer accurately can seem unnatural and disconnected. The mouse buttons can also be a challenge - in my experience, novice users will often either click the wrong button or stop and ask at every click: “should I right-click or left-click?”

How do you teach grandma, grandpa, or the nice lady who volunteers for the church how to use a mouse? Luckily, there is a software package almost guaranteed to work - it's free, and you already have it. It's called solitaire. The only way to improve is through practice, and just about everyone seems to be able to pick it up after enough solitaire. I became comfortable with a mouse in a couple of days by playing GeoWorks solitaire on a blazingly fast 286; it worked for both of my grandmothers, and it worked for this particular user.

The second challenge is visual. It can be very tempting for designers to make web sites “best viewed at 1024x768 resolution,” and it can be very tempting for usability researchers to make everyone use the same font and resolution settings, to eliminate independent variables. Visually impaired users tend not to care about your pixel-perfect design or your variables, and they will change the font and resolution settings. Your site (and your experimental design) had better be able to cope with it, or risk losing older people as customers altogether.

And, of course, some of your most important users may be blind.

I was aware of both of these challenges when I began my test with my third user. She had little experience with the web but had used terminals in the 70s and 80s and was an avid player of solitaire, free cell, and other computer card games. In addition, her resolution was set to 800x600 and her fonts were set larger than normal, just as she would normally use.

The first task was to sign up for a user account. The user was able to quickly find the link and click to the sign up page.

The third challenge became apparent very quickly once she got to the form. She clicked in the first text box, and then began a slow, painful process of searching the keyboard for letters. The keyboard itself can be a usability problem!

Many of the older users you will encounter learned to touch-type on terminals, word processors, or typewriters and can transfer that skill to using the web. On the other hand, many, many people who were able to hunt-and-peck their way through their entire career find they are unable to do so as their vision deteriorates.

This was an informal usability study, so I was not keeping time - but if I had been, the site would have failed miserably. My participant worked for a full half hour on the sign-up page, filling in just 10 form fields. The effort required was obviously way too much, especially since use of this web application would require small amounts of typing on a daily basis.

How do you address the third challenge? You have two options:

  1. Eliminate as much typing as possible. Many users ignore your site navigation and immediately start searching, but you must continue to make all points in your site accessible by clicking down a hierarchy or other organizational scheme. Take a look at every text input in every form on your site – how much of that information is really needed? Make it very clear when some items are optional and others are required, and try to keep the number of required fields to a minimum. Will the world come to an end if not every user gives you their zip code?
  2. Address the problem at the root - replace the user's keyboard with a large-print keyboard. Now obviously, if you have a general-interest web site you will not have access to each user's home, and it would be expensive to distribute free keyboards through the neighborhood like Halloween candy. In some cases, however, giving a user a large-print keyboard that costs less than $10 may very well be worth a $20/month subscription fee, or thousands of dollars in direct sales. At the very least, if you expect to have a large elderly user population, offer large-print keyboards on your site or link to someone who does.

If you are looking for a large-print keyboard, I have found them at Amazon.

Search for large-print keyboards.

Weird Errors – Fix Timeout Issues in CURL, PHP, and Apache.

Hitting strange errors when trying to execute long-running PHP processes, like large file reads, generating static HTML pages, file uploads, or CURL calls? It might not be just bugs in your code.

Are you getting pages that seem to load, but then nothing shows up in the browser? When you go to a page, does your browser sometimes ask, "You have chosen to open something.php, which is a: PHP file. What should Firefox do with this file?" or possibly "File name: something.php, File type: PHP File. Would you like to open the file or save it to your computer?" Do you get internal server errors at random intervals?

Depending on what you are trying to do, you could be running into timeout issues - in PHP, in a particular library, in Apache (or IIS or whatever web server you use), or even in the browser. Timeout issues can be a real pain because you don't run into them very often and they don't result in clear error messages.

Let's take a PHP script that makes a number of CURL calls as an example. PHP gives you access to libcurl, a really powerful tool for calling up other web pages, web services, RSS feeds, and whatever else you can dream up, right in your PHP code. This article is not a general introduction to CURL, so I won't go into detail, but basically the CURL functions allow your code to make requests and get responses from web sites just like a browser. You can then parse the results and use the data on your site.

Let's say you have a page on your site where you would like to display the latest posts from a few of your friends' websites, and they don't have RSS feeds set up. When a user comes to your site, you can make a series of CURL calls to get the data:

$curl_session = curl_init();
curl_setopt($curl_session, CURLOPT_HEADER, false);
curl_setopt($curl_session, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_session, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_session, CURLOPT_HTTPGET, true);
curl_setopt($curl_session, CURLOPT_URL, 'http://www.example.com');
$string = curl_exec($curl_session);

 

You can now parse the results in $string and hack out the most recent post. You would repeat these calls for each of your friends' web sites.
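
Exactly how you hack out the most recent post depends entirely on your friends' markup, so the pattern below is only a stand-in - but whatever you do, it is worth checking for a CURL error before trying to parse anything:

// $string is false if the request failed, so check before parsing.
if ($string === false) {
    echo 'CURL error: ' . curl_error($curl_session);
} else {
    // Placeholder pattern: grab the first <h2> as the latest post title.
    // Adjust the regular expression to match the markup you are scraping.
    if (preg_match('/<h2[^>]*>(.*?)<\/h2>/is', $string, $matches)) {
        echo 'Latest post: ' . strip_tags($matches[1]);
    }
}
curl_close($curl_session); // close the session once you are done making requests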

You try running the page and everything seems to work at first, but then you hit reload and get some strange behavior, like the problems listed above. In the worst cases, you won't get the same exact error each time - sometimes the page will load, sometimes you'll get an empty $string or errors from CURL, sometimes a blank page will appear, and sometimes you will be asked to download the PHP file - which includes all your source code!

In this situation you could be timing out. CURL is going out to another web server, and your code has to wait for it to finish before moving on to something else. In addition, your web server may be waiting on PHP to finish its work before sending something to the browser.

Luckily, there are a few ways to control how long the CURL functions, PHP, and Apache wait and you can do a little bit to ensure that the user's browser doesn't just give up either.

CURL has two options worth looking at, CURLOPT_TIMEOUT and CURLOPT_CONNECTTIMEOUT. The former sets how long CURL will run before it gives up and the latter sets how long CURL will wait to even connect to the site you want to pull data from. If you wanted to wait at most 4 seconds to connect and 8 seconds total, you would set it like this:

curl_setopt($curl_session, CURLOPT_CONNECTTIMEOUT, 4);
curl_setopt($curl_session, CURLOPT_TIMEOUT, 8);

This can be very helpful if you are connecting to a large number of different web sites or connecting to sites that are not always available or are on slow hosts. You may wish to set the timeouts much higher, if you really need to get that data, or fairly low, if you have a lot of CURL calls and don't want PHP to time out. You can get an idea of how long things are taking by using curl_getinfo():

echo '<pre>';
print_r(curl_getinfo($curl_session));
echo '</pre>';

PHP may also time out if it is running for too long. Luckily, you can control this to some extent by changing a setting in your php.ini or using the set_time_limit() function. If you can make changes to php.ini, it might be worth adding or adjusting the following lines:

max_execution_time = 300   ; Maximum execution time of each script, in seconds
max_input_time = 60        ; Maximum amount of time each script may spend parsing request data
memory_limit = 8M          ; Maximum amount of memory a script may consume (8MB)

If you don't have access to php.ini, you may be able to use set_time_limit() to change max_execution_time on each page where it is needed. If you are in a shared hosting environment, don't monkey with these values too much or you might impact other users. If you raise the time limit too high, you may get an angry email from your admin. Some hosts have programs set up to look for long-running processes and kill them - check with your admin if you raise the time limit and the script still dies an early death.
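
If php.ini is off limits, a call at the top of the script does the same job for that one page. A minimal example - the 300 seconds is just a number to illustrate, not a recommendation:

// Raise the limit for this script only. Note that set_time_limit()
// has no effect when PHP is running in safe mode.
set_time_limit(300);

// ... the long-running CURL calls or file processing go here ...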

Your web server (Apache is used for this example) may also be running into timeout issues. If you have access to your httpd.conf, changing the timeout is pretty easy:

Timeout 300

Unfortunately, not everyone will be able to edit their httpd.conf and this is not something you can add to an .htaccess file to change for just the scripts in a particular directory. Luckily we can work around this limitation, so long as we are sending the webpage to the user in parts, rather than waiting for the entire PHP script to execute and then sending the response.

How do we do it? First, make sure mod_gzip is turned off in an .htaccess file:

mod_gzip_on no
mod_gzip_item_include mime ^text/.*
mod_gzip_item_exclude mime ^image/.*$

Mod_gzip is a great way to reduce bandwidth use and increase site performance, but it waits until PHP has completed executing before zipping and sending the web page to the user.

Second, take a look at your PHP code and make sure you are not output buffering the whole page, including output buffering to send gz-encoded (gzipped) output. Output buffering can give you a lot of control, but in this case it can cause problems. Look for something like this:

ob_start();
// ...
// a whole ton of time-consuming code here
// ...
ob_flush(); // or possibly ob_end_flush();

Finally, if you have a number of time-intensive sections in your code, you can force some data out to the browser to keep Apache going and help make sure the browser doesn't lose interest either. It might look something like this:

echo "Loading Steve's page ..."; // ... // a time-consuming CURL call // ... //do a flush to keep browser interested... echo str_pad(" Loaded. ",8); sleep(1); flush(); echo "Loading Jill's page ..."; // ... // a time-consuming CURL call // ... echo str_pad(" Loaded ... ",8); sleep(1); flush();

The flush() function is the main trick - it tells PHP to send out what it has generated so far. The str_pad() and sleep() calls might not be necessary in this case, but the general idea is that some browsers need a minimum of 8 bytes to start displaying and the delay from the sleep(1) call seems to make IE happy.

This technique is not just useful for getting around timeout problems; it can also be used on long pages to give the user something to start looking at while the rest of the data loads. Also, some browsers might not handle content served as XML incrementally - in that case you might want to serve it as text/html:

header("Content-Type: text/html");

Hopefully this will help you track down those nasty timeout-related bugs. Have questions or some other tips? Post in the comments below.