Google-style keyword highlighting using htaccess and PHP

 
Searching for pages on Google brings up a fairly standard list of results for most queries. Under the vast majority of those results is a link to the cached copy of the page - the copy Google saved to its servers when it viewed the page. The cached copy is basically a snapshot of what Google saw last time it visited each page.

Google search result for ILoveJackDaniels with cached link highlighted

Clicking the "Cached" link above takes you to this cached copy of the ILoveJackDaniels.com front page.

You will see, on that page, that the keyword I used to find the page ("ilovejackdaniels") is highlighted in the cached copy of the page. This is one of Google's best features. When searching, many people frequently click through to Google's cache rather than visit the original page, simply because it is easier to pick out information when what you are looking for is highlighted for you.

To take this a step further, it would be great to enable this kind of functionality on a site directly. For that reason, I have written the tutorial below. It's a simple technique that lets you serve a page to users visiting from Google, MSN, AllTheWeb, Yahoo or LookSmart, with the keyword or keywords they used to find the page highlighted for them.

Most developers are quite capable of writing a highlighting tool for their own site. However, most developers do not have the time to write something along these lines. Sometimes, there are just too many files to edit. For that reason, the solution below provides highlighting for keywords without requiring any editing to the current PHP scripts behind a site.

To begin with, we need to create (or edit) an htaccess file with the following:

php_value auto_prepend_file /full/path/to/start_google_highlight.php
php_value auto_append_file /full/path/to/end_google_highlight.php

The two lines above operate like includes: the first adds a file to the beginning of each PHP script run on the server and the second adds a file to the end. Both work much like the include() function in PHP (strictly speaking, each file is pulled in as if it had been called with require).
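If the directives do not seem to take effect, one quick check is to ask PHP what values it actually sees. The sketch below is a hypothetical diagnostic page, not part of the two tutorial files; the function name describe_directive is my own. It uses ini_get() to read each setting and is_readable() to confirm the value is a file path PHP can open, not a web address.

```php
<?php
// Hypothetical helper: report the value of a PHP setting and whether it
// names a file that PHP can actually read.
function describe_directive($name)
{
    $value = ini_get($name);
    if ($value === false || $value === '') {
        return $name . ' is not set';
    }
    $state = is_readable($value) ? 'readable' : 'NOT readable';
    return $name . ' = ' . $value . ' (' . $state . ')';
}

echo describe_directive('auto_prepend_file'), "\n";
echo describe_directive('auto_append_file'), "\n";
```

Keep in mind that php_value lines in an htaccess file are only understood when PHP runs as an Apache module. On other setups the same two settings usually have to be made in PHP's own configuration instead.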

The first of these files, below, tells the PHP parser to buffer the output from PHP. That means that it will not send anything to the browser until we tell it to. The idea is that, before anything is sent out, we can grab the page and perform some magic on it. At that stage, the page will be prepared and ready to be viewed - we just need to add our highlighting. The code below should be copied and pasted to a file called "start_google_highlight.php".

Note: Please be aware that if your site is already using output buffering, these scripts may not have the desired effect. In addition, output buffering is not available in versions of PHP earlier than 4.

<?php
ob_start();
?>
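To see what buffering does in isolation, here is a minimal sketch that is separate from the two files in this tutorial. The function name capture_output is hypothetical. It also prints ob_get_level(), which is one way to check whether some other buffer is already active on your site.

```php
<?php
// Hypothetical helper: run a callback, capture everything it echoes,
// and return the captured text instead of sending it to the browser.
function capture_output($callback)
{
    ob_start();                 // start holding output back
    call_user_func($callback);  // anything echoed here goes into the buffer
    return ob_get_clean();      // fetch the buffer contents and discard the buffer
}

$page = capture_output(function () {
    echo '<p>Hello, world</p>';
});

// The page has not been sent yet, so it can still be changed.
echo str_replace('world', '<b>world</b>', $page), "\n";

// 0 means no buffer is active at this point; a higher number on your own
// site suggests output buffering is already in use.
echo 'Buffer level: ', ob_get_level(), "\n";
```

The tutorial files do the same thing, except that the buffer is opened in one file and read back in the other.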

The second file, below, is included at the end of each PHP file (and this code should be copied and pasted to a file called "end_google_highlight.php").

<?php

// 10 colours for highlighting
$colours[0] = '#FFFF99';
$colours[1] = '#99FFFF';
$colours[2] = '#99FF99';
$colours[3] = '#FF9999';
$colours[4] = '#FF99FF';
$colours[5] = '#9999FF';
$colours[6] = '#999999';
$colours[7] = '#886800';
$colours[8] = '#004699';
$colours[9] = '#990099';

// Note: eregi(), eregi_replace() and the /e pattern modifier were removed in PHP 7,
// so this version uses preg_* functions and a closure instead (PHP 5.3 or later).
if ((isset($_SERVER['HTTP_REFERER'])) and ($_SERVER['HTTP_REFERER'] != '')) {
    $keywords = array();
    $keyword_array = array();
    $search_engine = '';
    $url = urldecode($_SERVER['HTTP_REFERER']);
    // Google
    if (preg_match('/www\.google/i', $url)) {
        preg_match("'(\?|&)q=(.*?)(&|$)'si", $url, $keywords);
        $search_engine = 'Google';
    }
    // AllTheWeb
    elseif (preg_match('/www\.alltheweb/i', $url)) {
        preg_match("'(\?|&)q=(.*?)(&|$)'si", $url, $keywords);
        $search_engine = 'AllTheWeb';
    }
    // MSN
    elseif (preg_match('/search\.msn/i', $url)) {
        preg_match("'(\?|&)q=(.*?)(&|$)'si", $url, $keywords);
        $search_engine = 'MSN';
    }
    // Yahoo
    elseif (preg_match('/yahoo\.com|search\.yahoo/i', $url)) {
        preg_match("'(\?|&)p=(.*?)(&|$)'si", $url, $keywords);
        $search_engine = 'Yahoo';
    }
    // Looksmart
    elseif (preg_match('/looksmart\.com/i', $url)) {
        preg_match("'(\?|&)qt=(.*?)(&|$)'si", $url, $keywords);
        $search_engine = 'Looksmart';
    }
    if (isset($keywords[2]) and (trim($keywords[2]) != '')) {
        $keywords = preg_replace('/"|\'/', '', $keywords[2]); // Remove quotes
        // Create keyword array; empty entries are dropped because an empty
        // keyword would match at every word boundary and break entities like &nbsp;
        $keyword_array = preg_split("/[\s,\+\.]+/", $keywords, -1, PREG_SPLIT_NO_EMPTY);
    }

    $j = min(count($keyword_array), 10);

    if ($j > 0) {

        $page_contents = ob_get_contents();
        ob_end_clean();

        // Split the page at the start of <body> so the <head> is never touched
        $body_start = stripos($page_contents, '<body');

        if ($body_start === false) {
            echo $page_contents; // No <body> tag found: send the page unchanged
        } else {
            $page_head = substr($page_contents, 0, $body_start);
            $page_body = substr($page_contents, $body_start);

            $keywords_list = '';

            for ($i = 0; $i < $j; $i++) {
                $pattern = '#\b(' . preg_quote($keyword_array[$i], '#') . ')\b#i';
                $replacement = '<span style="font-weight: bold; background-color: ' . $colours[$i] . ';"><b>$1</b></span>';
                // Separate tags (odd indexes) from the text between them (even indexes)
                $pieces = preg_split('/(<[^>]*>)/', $page_body, -1, PREG_SPLIT_DELIM_CAPTURE);
                foreach ($pieces as $k => $piece) {
                    if ($k % 2 == 0 and $piece !== '') {
                        $highlighted = preg_replace($pattern, $replacement, $piece);
                        if ($highlighted !== null) {
                            $pieces[$k] = $highlighted;
                        }
                    }
                }
                $page_body = implode('', $pieces);
                // Escape the keyword before it is printed in the notice
                $keywords_list .= htmlspecialchars($keyword_array[$i]) . ', ';
            }

            // $_SERVER replaces the old $HTTP_HOST and $REQUEST_URI globals
            $link = htmlspecialchars('http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']);
            $notice = '<div style="border-bottom: 1px solid #000; font-size: 80%; padding: 3px;">Welcome, ' . $search_engine . ' user. The following search terms have been highlighted: ' . substr($keywords_list, 0, -2) . '<br><a href="' . $link . '">Click here to remove highlighting</a></div>';

            // Insert the notice straight after the opening <body> tag
            $page_body = preg_replace_callback('/<body[^>]*>/i', function ($matches) use ($notice) {
                return $matches[0] . $notice;
            }, $page_body, 1);

            echo $page_head . $page_body;
        }
    }
}

?>

It starts out simply enough. The first part of the script allows you to define the 10 colours you want to use to highlight keywords. The first keyword the user entered will be highlighted with the first colour in the list, and so on.

Next, the script checks to see if the user's browser sent us the address that referred them to this page, in the "HTTP_REFERER" header. If so, then we check to see if they were referred from one of the 5 search engines we are checking for. If they were, then the next stage is to strip the keywords they searched for out of the referral URL.
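The same step can be sketched as a standalone function. This is an illustration under my own assumptions, not a copy of the listing: the function name and the lookup table are hypothetical, the host fragments are only approximations of the checks in the script, and it uses parse_url() and parse_str() to pull out the query parameter instead of the regular expressions used there.

```php
<?php
// Hypothetical helper: return array(engine name, keywords) for a referrer
// URL, or null when the URL is not a recognised search.
function search_keywords_from_referrer($referrer)
{
    // Host fragment => array(display name, query-string parameter)
    $engines = array(
        'www.google'    => array('Google', 'q'),
        'www.alltheweb' => array('AllTheWeb', 'q'),
        'search.msn'    => array('MSN', 'q'),
        'search.yahoo'  => array('Yahoo', 'p'),
        'looksmart.com' => array('Looksmart', 'qt'),
    );

    $host  = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null;
    }

    parse_str($query, $params); // decodes %xx escapes and turns + into spaces

    foreach ($engines as $fragment => $engine) {
        list($name, $param) = $engine;
        if (stripos($host, $fragment) !== false
            && isset($params[$param]) && is_string($params[$param])) {
            // Remove quotes, then split on whitespace, commas, plus signs and dots
            $phrase = str_replace(array('"', "'"), '', $params[$param]);
            $words  = preg_split('/[\s,+.]+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);
            return $words ? array($name, $words) : null;
        }
    }
    return null;
}

print_r(search_keywords_from_referrer('http://www.google.com/search?hl=en&q=jack+daniels'));
```

Adding another search engine in this form is a matter of adding one row to the table: a fragment of its host name and the name of the parameter that carries the search phrase.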

Once we have that, we process the keywords, and turn them into an array. Then, one item at a time, we wrap a <span> around each keyword with the highlight colour in it. The script is careful not to replace anything in the <head> portion of the page, and not to replace keywords that may appear inside HTML tags.
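To make the "text only, never inside tags" idea concrete, here is a simplified, standalone version of that step. It is a sketch and not a line-for-line copy of the listing: the function name highlight_terms is hypothetical, and it relies on preg_split() with PREG_SPLIT_DELIM_CAPTURE to separate tags from the text between them.

```php
<?php
// Hypothetical helper: wrap each term in a coloured <span>, touching only
// the text between tags and never the tags themselves.
function highlight_terms($html, array $terms, array $colours)
{
    foreach (array_values($terms) as $i => $term) {
        if ($term === '' || !isset($colours[$i])) {
            continue; // an empty term would match at every word boundary
        }
        $pattern = '/\b(' . preg_quote($term, '/') . ')\b/i';
        $span    = '<span style="background-color: ' . $colours[$i] . ';">$1</span>';

        // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
        // even indexes are text, odd indexes are tags.
        $pieces = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($pieces as $k => $piece) {
            if ($k % 2 === 0 && $piece !== '') {
                $pieces[$k] = preg_replace($pattern, $span, $piece);
            }
        }
        $html = implode('', $pieces);
    }
    return $html;
}

echo highlight_terms(
    '<p class="jack">Jack and <a href="/jack">jack</a></p>',
    array('jack'),
    array('#FFFF99')
), "\n";
```

In the example, both visible occurrences of the word are wrapped, while the class and href attributes that contain the same word are left alone. Because the page is split again for every term, the spans added for one term are treated as tags when the next term is processed. Swapping the inline style for a class attribute would be one way to move the colours into external CSS.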

Last, we add a note to the top of the page explaining what is going on, and add a quick link so that the user can remove the highlighting if they like.

That is literally it. You can, if you like, expand the script quite easily to include other search engines. You can change the method of highlighting, use external CSS, underline text or use a different font colour - whatever suits you. And best of all, no editing of your site is required.
 

21 comments

Daniel O'Connor
Australia #1: June 2, 2004
I saw this today in co-wiki, and instead of trawling the CVS I just googled for this. Very very nice :)
 Republic Of Moldova #2: November 5, 2004
Hi.
This is very important for me.
I did everything you wrote about, but I have real problems with php_value auto_prepend_file. I can't understand how to use it. I have written the path to my start and end files: php_value auto_prepend_file /var/www/http://vintage-reprints.com/catalog/start.php
and
php_value auto_apend_file /var/www/http://vintage-reprints.com/catalog/finish.php and it doesn't work :(

Please help me. I need your help! Please tell me how to use it and how to write the path to my files correctly.

With best regards.
Tom.
Hi Mamont. The problem looks like one of location. You need to change "/var/www/http://vintage-reprints.com/catalog/start.php" to point to the file. Not the web address, the file address. Ask your host for the file path if you don't know what it is.
Ray
Netherlands #4: December 21, 2004
It really works great, thank you so much m8!
Very cool, but does it work with news.google.x? I can't understand preg_replace. Thank you!
It doesn't work with news.google as it currently stands, but could easily be modified to do so with a quick addition.
 United Kingdom #7: March 30, 2005
This is great, my only problem was that the search terms in the intro at the top were not colour coded - was not a major issue to move it up though, so that the search and replace included the added message. Thanks a lot for a great little script!
bug or not bug?

anybody else notice problem with &nbsp;
getting transformed to:
<span class="spacer">&<span style="font-weight: bold; background-color: #FF9999;"><b></b></span>nbsp<span style="font-weight: bold; background-color: #FF9999;"><b></b></span>;</span>

Ends up breaking the design as it then no longer gets interpreted as &nbsp; and gets printed to screen.

click on first link to see what i mean:
http://www.google.com/search?hl=en&lr=&q=site%3Arobusthaven.com+robust
oops look like i &amp;nsb; got hidden in my last quote

(suggestion)
might be helpful to explain this in the article as it does all the work:
$page_body = str_replace('\"', '"', substr(preg_replace('#(\>(((?>([^><]+|(?R)))*)\<))#se', "preg_replace('#\b(" . $keyword_array[$i] . ")\b#i', '<span style=\"font-weight: bold; background-color: " . $this->colours[$i] . ";\"><b>\\\\1</b></span>', '\\0')", '>' . $page_body . '<'), 1, -1));

forgot to say thanks for the great script.
I noticed that common words were getting highlighted.

An easy way is to delete words such as "and", "or" and "is" from keyword_array:

$this->stop_words=array('to','and','the','The','in','is','are','on','or','');

if(is_array($keyword_array) && count($keyword_array)>=1){
foreach($keyword_array as $key => $value){
if(in_array($value,$this->stop_words)){
unset($keyword_array[$key]);
}else{$temp[]=$value;}
}
}
$keyword_array=$temp;

hope it helps someone
sorry dave for all the posts!!!

before i posted that &amp;nsb; was causing break in the designs.. well guess what adding the stop word filter helped in fixing that.. i'm guessing it translated &amp;nsb; to be ''

and therefore took it out the keyword_array

neways thanks for the kewl script

$this->stop_words=array('but','if','it','for','be','of','a','to','and','the','The','in','is','are','on','or','');
ntg
Greece #12: June 10, 2005
I have a site full of articles, with its own search engine, and I want to highlight the keywords in an article when someone performs a search.

Could you help me a little? I 'm confused on how to filter the text, the db sends me back, and enclose in spans whatever word is similar to keywords.
carlos
United States #13: June 10, 2005
i've been to many sites that show how to install this text highlighter. but i've never used HTML codes before. I don't even know what HTaccess and PHP mean. is there any remote chance that anyone can help me?
 Russian Federation #14: May 21, 2006
This is a really excellent guide - the only one I've found anywhere! I'm using it on my site to create a live glossary plus linking keywords to my friends' sites (just for fun) in the style of phpfreak's glossary tooltips.

I'm not very good with regular expressions - could anybody possibly help explain how to alter the preg_replace() call used here not to replace anything it finds in anchor text (i.e. between <a... and </a> or inside <form></form> tags? Would be much obliged.

Links to this site and the others that helped me with this project will be going up from my blog in the next few days!
Robert
Russian Federation #15: May 21, 2006
Okay, I've figured out a simpler solution with forms - just switch it off if there's a <textarea or <form occurrence in the ob :)

However, I'd still really like to figure out how to recognise if a keyword is within <a href...> and </a> - anybody? Please?!
mihai
Romania #16: December 22, 2006
Excellent script. I was always wondering how some sites are able to highlight my search phrase. Nice to see this script can be used on any type of application.

Thank you!
Great stuff.. I just needed a part of the code but still excellent
Your code is so cool!
But it does not work for multibyte code. Do you have any suggestion for the UTF-8?
I did all this, but it doesn't seem to do anything at all?

Some help please.
Vincent
Unknown #20: June 25, 2007
Hi,

That's a good PHP script, however I keep trying to insert it in my website and I have the same problem as Leblanc Meneses.
A navigation bar on the left has disappeared and some &nbsp; appear on my page.
So could you tell me how to fix it? I tried to do what Leblanc Meneses did but I don't know where exactly he added that in the script.
Phil
Unknown #21: June 25, 2007
How can i prevent some parts of my html code to be replaced?
For example, if I have some url text that I don't want to be changed.

 
