by Mike Banks Valentine, copyright © 2003
As a search engine optimization specialist, I see
a steady stream of business-oriented web sites belonging
to clients and potential clients that unknowingly
HIDE their web sites from the search engines. Hide
and seek! Peekaboo!
HIDING? Yes, you heard me right, I said hiding from
search engines! Let's take a look at a few of the
ways you might do that without meaning to do so.
Secure Server.
Search engines do have the ability to spider secure
server hosted pages, but often these pages require
either that a visitor fill out a form or log-in with
a password and user name before being allowed past
a certain point. If reaching a page requires filling
out a form or entering a password, search engine robots
will simply leave. They can't log in because they
can't fill out forms, leave email addresses or enter
passwords.
I was contacted by a webmaster for a 4500 page ecommerce
web site. He wondered why search engines were ignoring
such a large site. I asked the URL of the site and
visited the home page. I noted that upon loading,
the home page at http://anybusiness.com immediately
forwarded to a secure httpS://anybusiness.com page. This
raises two potential problems: the forwarding method
and the different server. If the instant forward is
done with JavaScript, bad news.
First, search engines often either penalize or downgrade
sites that use immediate URL forwarding, especially
from a home page. URL forwarding suggests doorway
pages (a search engine no-no) or affiliate URLs forwarding
to an affiliate program site, or the worst of all
scenarios, cloaking software on your server. You may
not be doing any of these things, but the robots don't
know, don't care, and don't index your site, plain
and simple.
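If you want a quick look at whether a home page forwards
instantly on the client side, here is a minimal Python
sketch. It only scans page source you hand it for a meta
refresh tag or a JavaScript location change. The function
name and the two patterns are my own illustrative
assumptions, not an exhaustive test, and they say nothing
about how any particular search engine detects forwarding.

```python
import re

# Illustrative patterns only; real pages can forward in many other ways.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']?refresh', re.IGNORECASE
)
JS_REDIRECT = re.compile(
    r'(?:window\.|document\.)?location(?:\.href)?\s*=|location\.replace\s*\(',
    re.IGNORECASE,
)


def find_instant_forwards(html):
    """Return labels for the client-side forwarding methods seen in html."""
    found = []
    if META_REFRESH.search(html):
        found.append("meta refresh")
    if JS_REDIRECT.search(html):
        found.append("javascript")
    return found


# Hypothetical home page source that forwards with JavaScript.
sample = (
    '<html><head><script>'
    'window.location = "https://anybusiness.com/";'
    '</script></head></html>'
)
print(find_instant_forwards(sample))
```

An empty list means neither pattern was found in the
source, not that the page is free of forwarding. A
server-side redirect never shows up in the HTML at all.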
Secondly, secure servers are very often a separate
web site. The secure server is actually a different
machine and an entirely different site from the
non-secure server site, unless your site is hosted
on a dedicated server with its own IP address and a
security certificate at the same domain. This can
happen when secure shopping carts are hosted by a
third party host so that a small ecommerce site needn't
purchase a security certificate or set up complex
shopping carts.
For example, if your shopping cart is hosted by Yahoo
stores or other application service providers (ASPs),
pages hosted in the shopping cart don't reside on
your domain and can't be recognized as pages on YOUR
site unless you also host your domain with the same
company. Unfortunately, many shopping cart ASPs use
dynamic IP addresses (the IP address is different each
time you visit) and database-generated dynamic
pages.
So even if you do host your domain with that provider,
search engine spiders reach another roadblock, one
shared by any dynamic web site. Dynamic pages
are created "on-the-fly": information is pulled
from a database and inserted into a page template
before being served to the visitor as an HTML
document.
The process of serving dynamic pages is not the problem.
The problem is simply that the URLs of those pages
contain several characters that either stop or severely
curtail search engine spiders. Question marks (?)
are the biggest culprit, followed by ampersands (&),
equal signs (=), percent symbols (%) and plus signs
(+).
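To check your own URLs for these characters, a few lines
of Python will do. This is only a sketch: the function
name is made up for illustration, and the character list
is just the five symbols named above, not a rule published
by any search engine.

```python
# The five characters named in this article, in the order given.
SPIDER_STOPPERS = "?&=%+"


def stopper_characters(url):
    """Return the trouble characters that appear in url, each listed once."""
    return [ch for ch in SPIDER_STOPPERS if ch in url]


dynamic_url = "www.domain.com/category.asp?ct=this+28that+other%29&l=thing"
static_url = "www.domain.com/category/thing/"

print(stopper_characters(dynamic_url))  # every symbol found in the dynamic URL
print(stopper_characters(static_url))   # empty list for the slash-only URL
```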
These symbols serve as alarm bells to the spiders
and either turn them away entirely or dramatically
slow the indexing of your pages. This is stated simply
in the Google "Information for Webmasters"
page http://www.google.com/webmasters/2.html
"Reasons your site may not be included."
"Your pages are dynamically generated. We are
able to index dynamically generated pages. However,
because our web crawler can easily overwhelm and crash
sites serving dynamic content, we limit the amount
of dynamic pages we index."
Just because your site is dynamically generated, creating
long URLs full of question marks, equal signs and ampersands
like
www.domain.com/category.asp?ct=this+28that+other%29&l=thing
doesn't mean you are in search engine limbo. There are
simple solutions available for your webmaster. Here
are a couple of articles explaining an elegant solution
called "mod_rewrite".
You can read about that technique if technically inclined:
http://alistapart.com/articles/urls/
http://alistapart.com/articles/succeed/
This technique simply creates a set of instructions
for your web server to present URLs in a different
form, replacing those "bad" question marks
and ampersands with slash marks (/). The method
requires that your webmaster be a bit more technically
savvy than most home business CEOs who created their
own web site. Some hosts will help here by simply turning
on "mod_rewrite" for shared hosting clients.
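To show the idea behind the technique, here is a small
Python sketch of the translation a rewrite rule performs.
It is NOT mod_rewrite configuration syntax, which lives in
your Apache server settings. The /category/ path layout,
the category.asp script and the ct and l parameter names
are assumptions borrowed from the example URL above.

```python
import re

# Hypothetical rule: the public address /category/<ct>/<l>/ is answered
# by the real script /category.asp?ct=<ct>&l=<l> behind the scenes.
FRIENDLY = re.compile(r"^/category/([^/]+)/([^/]+)/?$")


def rewrite(path):
    """Translate a slash-style path to the dynamic URL behind it.

    Paths that do not match the rule are returned unchanged.
    """
    match = FRIENDLY.match(path)
    if match is None:
        return path
    return "/category.asp?ct={}&l={}".format(match.group(1), match.group(2))


print(rewrite("/category/widgets/thing/"))  # maps to the dynamic script
print(rewrite("/about.html"))               # left alone
```

The spider only ever sees the slash version. The question
marks and ampersands stay on the server side.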
Don't play hide and seek with the search engines! Tell
them EXACTLY where to find every page on your site and
if there's any question that they will find every page
on your site, give them a map.
A site map.
Hard code the dynamic URLs for most categories and
subcategories in the different sections of your web site
into a comprehensive site map. As long as those dynamic
links (even those that include ?=+%& symbols) are
hard coded into a site map, the spiders will follow
them. Clearly those 4500 pages mentioned earlier would
be too much for a site map listing. But the main category
pages could be provided for the engines. I visited the
site map page of the webmaster mentioned above and saw
fourteen pages listed on the site map. That explains
why they have fourteen pages, not 4500, indexed by Google.
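If your category list lives in a database or a spreadsheet,
a short script can write out the hard-coded links for you.
The sketch below is one way to do it in Python. The
function name, the category titles and the URLs are all
hypothetical examples, and the output is a plain HTML list
that you would paste into your own site map page.

```python
from html import escape


def build_site_map(links):
    """Return an HTML list with one hard-coded link per (title, url) pair."""
    items = [
        '<li><a href="{}">{}</a></li>'.format(
            escape(url, quote=True), escape(title)
        )
        for title, url in links
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical main category pages with dynamic URLs.
categories = [
    ("Widgets", "/category.asp?ct=widgets&l=thing"),
    ("Gadgets", "/category.asp?ct=gadgets&l=thing"),
]
print(build_site_map(categories))
```

The escape call turns each ampersand into &amp; inside the
link, which is the correct way to write it in HTML.
Browsers and spiders read it back as a plain ampersand.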
How do you find out how many pages of your site are indexed?
Go to Google search and type
"allinurl:www.domain.com"
without the quotes. Replace "domain" in the
above example with your own domain name. This query
operator will return a list of the indexed pages whose
URLs contain your domain name.
Look in the blue bar across the top of the Google results
page and you'll see the number of pages indexed at your
site!
Finally, the biggest hide-and-seek game is played by
"framed" web sites. Again,
we'll turn to the Google page for the authoritative
word.
"Your page uses frames. Google supports
frames to the extent that it can. Frames . . . cause
problems with search engines, bookmarks, emailing
links and so on, because frames don't fit the conceptual
model of the web (every page corresponds to a
single URL)."
Owners of framed sites needn't be in search limbo; they
just need to adapt to the search engines. Here is a
tutorial from Search Engine Watch that explains some
workarounds.
http://www.searchenginewatch.com/webmasters/article.php/2167901
That should do it. Get indexed and stop playing hide-and-seek!