Preventing Google from indexing empty folder

Viewing 10 posts - 1 through 10 (of 10 total)
  • #3290 Reply
    kderow
    Guest

    How can I make it so that Google does not index and spider empty folders on my server? When I say empty, I mean folders without any index.html or index.php files; instead, the folders are used to store images or other files.

    The Google spider reports error 402 or something similar because there is no index.html or index.php. How can I prevent Google from looking for those files, or is there a certain way to configure the server to handle requests for index pages that don't exist?

    It's nothing major, but I hate having errors in my sitemaps or on Google's pages.

    #3291 Reply
    ebbie
    Guest

    Did you try a robots.txt rule telling it not to spider those directories?
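
    For reference, a robots.txt rule along the lines ebbie suggests would look like this (the folder names here are placeholders — substitute your own):

    ```
    # robots.txt — must live in the web root (http://example.com/robots.txt)
    User-agent: *
    Disallow: /images/
    Disallow: /downloads/
    ```

    Note robson's caveat below applies: blocked URLs may still show up as "blocked by robots.txt" notices in Webmaster Tools.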

    #3292 Reply
    robson
    Guest

    A 402 is a weird response code (if that is what it is). That’s a “payment required” response code. 403 (forbidden) and 404 (not found) would be the standard responses.

    Responses in that range are the correct way to tell Google there’s nothing there. If you’re getting errors in sitemaps because of it you need to correct the source of the problem – the sitemap. If it’s not a sitemap, then you need to find and fix the link to the non-existent document.

    If you use robots.txt you’ll still get an “error” of sorts in that it will report it as a URL blocked by robots.txt… Much better to fix the source of the problem.

    #3293 Reply
    maxaff
    Guest

    Right. Use cPanel or .htaccess to deny directory listings, or add an index.html that does a meta refresh to the homepage with a 5-second delay to avoid any SEO penalty.
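
    Assuming Apache with .htaccess overrides enabled, denying directory listings is a one-liner:

    ```
    # .htaccess in the folder (or the site root, to cover all folders)
    # Stops Apache from generating a listing when no index file exists;
    # requests for the bare folder then return 403 Forbidden instead.
    Options -Indexes
    ```

    The meta-refresh alternative maxaff describes would be an index.html containing something like `<meta http-equiv="refresh" content="5; url=/">`, though as noted below that still won't return a 200 for the folder in every setup.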

    #3294 Reply
    kderow
    Guest

    Sorry, yes, it might be a 403 error.

    It's the errors in Google Sitemaps (Webmaster Tools) that I want to get rid of.

    What you say kind of makes sense. So I need to write some sort of exclusion in my sitemap config file for Google so that those folders are not indexed, rather than trying to do something on the server with the folders themselves?

    I just can't quite figure out a wildcard exclusion for the config file that would cover any folder without an index file, without also excluding other files like images etc.

    For example, I would use this to exclude any folder called Z-A from being indexed.. but how do I do folders without index files?

    #3295 Reply
    kderow
    Guest

    If I do a redirect or meta refresh it will still come up as an error on Google Sitemaps. Unless I have a file there which doesn't redirect, it will give some sort of error. Anything other than status 200 will be reported as an error.

    It seems a bit anal to do this, but it helps keep everything optimized.

    #3296 Reply
    robson
    Guest

    Given everything I've read, my advice is not to use a sitemap file at all unless it was custom written for your site and you know the logic behind everything it does. A lot of people have seen problems with off-the-shelf sitemap generators, and there are many documented cases of them causing harm to sites' rankings.

    The search engines know how to crawl pretty well. Don't worry that you're not going to do as well if you don't submit a sitemap for old pages. Unless you're telling them the priority of each page, there's really no point to an XML sitemap; they can get everything else by spidering the pages. Specifying priorities IMHO is the real benefit of an XML sitemap, and you can't do that with an off-the-shelf piece of software – you can only do that with something that hits your db and knows all the attributes of each page.

    I have a few sites that have proper XML sitemaps (custom built for the site). But usually I just submit RSS feeds as “sitemaps” to get content into the index quickly (and into Google Blog Search).
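
    For anyone unfamiliar with the format, a minimal sitemap entry carrying the per-page priority robson is talking about looks like this (the URL and values are placeholders):

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/products/widget.html</loc>
        <lastmod>2008-01-15</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>
    ```

    Priority ranges from 0.0 to 1.0 and is relative to your own site's pages, which is exactly why only something that knows your data can set it meaningfully.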

    #3297 Reply
    kderow
    Guest

    Sitemaps have worked very well for me for some time. But I use Google's own script, which also automatically submits the sitemap each time it's rebuilt (daily).

    Without it, all kinds of stuff would be indexed that I don't want listed. And with over 200,000 pages you've got to tell Google what to index, otherwise it just skips things that might be important and indexes others that are not.

    #3298 Reply
    Northstar
    Guest

    Wow, do you create and manage all those pages yourself? That has to keep you busy.

    #3299 Reply
    robson
    Guest

    I'd still urge you to build your own sitemap generator that builds the sitemap off your database. You know more about your pages than Google ever will, and just 'cause it's Google's software doesn't mean you can't do better.
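
    As a sketch of what robson is suggesting — assuming your page records live in a database (the table and column names here are made up for illustration) — a custom generator can be only a few dozen lines:

    ```python
    import sqlite3
    import xml.etree.ElementTree as ET

    SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

    def build_sitemap(rows):
        """Build sitemap XML from (url, lastmod, priority) tuples."""
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for loc, lastmod, priority in rows:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = loc
            ET.SubElement(url, "lastmod").text = lastmod
            # Priority reflects what *you* know about the page's importance;
            # this is the part an off-the-shelf generator can't do for you.
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
        return ET.tostring(urlset, encoding="unicode")

    def pages_from_db(db_path):
        # Hypothetical schema: pages(url, last_modified, importance)
        con = sqlite3.connect(db_path)
        try:
            return con.execute(
                "SELECT url, last_modified, importance FROM pages"
            ).fetchall()
        finally:
            con.close()
    ```

    You'd write the returned string to sitemap.xml on your daily rebuild and ping Google with it, the same way the stock script does.
    
    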

    But yes, with a site that large you do have to actively help googlebot understand the site. I've got a corporate site that has 97,000 pages in the Google index, and sitemaps are really important on that site, as are robots.txt and meta robots... It's really sort of a three-pronged approach: you have to use all of them, especially meta robots noindex, 'cause some pages you don't want in the SERPs but you do want passing link juice.
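
    For the record, the meta robots tag goes in the page's head; this is the variant that keeps a page out of the SERPs while still letting its links pass value, which is what robson describes:

    ```html
    <!-- noindex: keep this page out of search results;
         follow: still crawl it and pass value through its links -->
    <meta name="robots" content="noindex, follow">
    ```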
