02 May 2006

More Google(W)hacking

I was thinking some more about googlewhacking, specifically whether it would be possible to write a whack-checking program that looks at a URL and works out the whacks it contains. My theory is that many pages will contain at least one googlewhack, and 'CheckWhack' would prove popular with web developers.
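Roughly, CheckWhack would pair up every unique word on the page and keep the pairs whose two-word search returns exactly one hit. Below is a minimal sketch of that loop. It does not talk to Google. `check_whack` and the `hit_count` callable are hypothetical names, and `hit_count` stands in for whatever search query you would actually plug in.

```python
from itertools import combinations
from typing import Callable, Iterable


def check_whack(words: Iterable[str],
                hit_count: Callable[[str, str], int]) -> list[tuple[str, str]]:
    """Return every unordered word pair whose search gives exactly one hit.

    hit_count is a hypothetical stand-in for a search-engine query;
    this sketch makes no real network or API calls.
    """
    unique = sorted({w.lower() for w in words})
    whacks = []
    for a, b in combinations(unique, 2):
        # A googlewhack is a two-word query with exactly one result.
        if hit_count(a, b) == 1:
            whacks.append((a, b))
    return whacks


# Fake hit counts standing in for real search results (made up for illustration).
fake_counts = {("banana", "pterodactyl"): 1, ("apple", "banana"): 5000}


def fake_hit_count(a: str, b: str) -> int:
    return fake_counts.get((a, b), 0)


print(check_whack(["Banana", "pterodactyl", "apple"], fake_hit_count))
# [('banana', 'pterodactyl')]
```

The trouble is that `hit_count` gets called once per pair, and the number of pairs is what sinks the idea.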

A phriend worked out that for any number of unique words on a web page, n, there are n! / ((n-2)! * 2!) two-word combinations. For a web page containing 1000 unique words, that works out at 499,500 combinations - way too many to run through Google.
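That formula is just "n choose 2", which simplifies to n(n-1)/2. Here is a quick way to check the numbers. `pair_count` is a made-up helper name, and `math.comb` needs Python 3.8 or later.

```python
import math


def pair_count(n: int) -> int:
    """Unordered two-word pairs from n unique words: n! / ((n-2)! * 2!)."""
    # Equivalent to n * (n - 1) // 2; math.comb computes it exactly.
    return math.comb(n, 2)


print(pair_count(1000))  # 499500
```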

I thought I'd try working out how many unique words there were in this blog. It came out at 757, giving 286,146 combinations - still far too many to check. Knocking out the 500 most common English words brings the counts down to 549 unique words and 150,426 combinations.
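Counting unique words and knocking out common ones can be sketched in a few lines. I'm assuming a simple letters-and-apostrophes tokeniser, and `COMMON_WORDS` is a tiny made-up stop list, not the actual 500-word list used for the figures above.

```python
import math
import re

# Tiny illustrative stop list; NOT the 500 most common English words.
COMMON_WORDS = frozenset({"the", "a", "and", "of", "to", "in", "is", "it"})


def unique_words(text: str, exclude: frozenset[str] = frozenset()) -> set[str]:
    """Lower-case the text, split it into words, drop any in `exclude`."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in exclude}


text = "The cat and the hat sat in the hat of the cat."
all_words = unique_words(text)
filtered = unique_words(text, COMMON_WORDS)
print(len(all_words), math.comb(len(all_words), 2))  # 7 21
print(len(filtered), math.comb(len(filtered), 2))    # 3 3
```

With 757 and 549 unique words, `math.comb` gives the same 286,146 and 150,426 quoted above.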

Even limiting the words to those with 7 or more letters, you end up with 159 unique words and 12,561 combinations.
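The length cut is one more filter on the word set. This sketch uses a hypothetical `long_words` helper on a made-up sample, and it counts characters rather than strictly letters, which works once the words have already been tokenised as above.

```python
import math


def long_words(words: set[str], min_len: int = 7) -> set[str]:
    """Keep only words with at least min_len characters."""
    return {w for w in words if len(w) >= min_len}


sample = {"googlewhack", "checkwhack", "blog", "combinations", "words"}
kept = long_words(sample)
print(sorted(kept), math.comb(len(kept), 2))
# ['checkwhack', 'combinations', 'googlewhack'] 3
print(math.comb(159, 2))  # 12561
```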

In short, it doesn't look like it'll work! The CheckWhack project is on permanent hiatus (903,000 hits at the time of writing; wait, 903,001).
