Google Sitemap Generator using log files


Argyll
10-04-2005, 12:02 PM
Google has a Python script (http://www.google.com/webmasters/sitemaps/docs/en/sitemap-generator.html) that generates a Sitemap file for your site, for use with the new-ish Sitemaps feature. I'm trying to get it to work using the access log option ("accesslog") in the config file.

<!--
"accesslog" nodes tell the script to scan webserver log files to
extract URLs on your site. Both Common Logfile Format (Apache's default
logfile) and Extended Logfile Format (IIS's default logfile) can be read.

Required attributes:
path - path to the file

Optional attributes:
encoding - encoding of the file if not US-ASCII
-->
<accesslog path="/usr/local/apache/domlogs/blah" encoding="UTF-8" />

I think the problem is with the "blah" part of the path. After starting the script, I get an error message saying the file cannot be found.

Here's the relevant documentation:

Ensure that the path value is the complete path and filename on your web server. If the log files are not encoded as US-ASCII or UTF-8, then use the optional encoding attribute to specify the encoding. Rather than list each log file, you can use wildcards. For instance, in the above example, you could include the following entry that would include all three log files:

<accesslog path="/etc/httpd/logs/access.log*" encoding="UTF-8" />
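A quick way to rule out a bad path is to check, outside the generator, which files a given path or wildcard actually matches on the server. The following is only a minimal sketch: it assumes the generator expands wildcards roughly the way Python's standard glob module does (the documentation quoted here does not say how it does it), and matching_logs is a hypothetical helper name, not part of Google's script.

```python
import glob
import os


def matching_logs(path_pattern):
    """Return the readable regular files matched by a path or wildcard pattern.

    Hypothetical helper for debugging an <accesslog path="..."> value;
    it is not part of the Sitemap Generator itself.
    """
    matches = sorted(glob.glob(path_pattern))
    # Keep only regular files the current user can read.
    return [p for p in matches if os.path.isfile(p) and os.access(p, os.R_OK)]


if __name__ == "__main__":
    # Path taken from the config above; "blah" is the placeholder from the post.
    pattern = "/usr/local/apache/domlogs/blah"
    found = matching_logs(pattern)
    if found:
        for path in found:
            print("readable log file:", path)
    else:
        print("nothing readable matches:", pattern)
```

If this reports no match, the path in the config is wrong or the user running the script lacks read permission on the file, and the generator is unlikely to find it either.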

The Sitemap Generator assigns priority to URLs it finds in the logs based on how often each URL is accessed. For instance, a URL that has been accessed 100 times will be given a higher priority than a URL that has been accessed twice. The actual priority assignment is relative and depends on each URL as compared to other URLs in the site.
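To make the idea of relative priority concrete, here is a rough sketch of how access counts from a Common Logfile Format log could be turned into priorities. The documentation does not give the actual formula, so scaling each count by the most-accessed URL's count is purely an assumption for illustration. The names REQUEST_RE, count_urls and relative_priorities are hypothetical and do not come from Google's script.

```python
import re
from collections import Counter

# Pulls the URL out of the quoted request field, e.g. "GET /page.html HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*"')


def count_urls(log_lines):
    """Count how many times each URL appears in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def relative_priorities(counts):
    """Scale each count by the largest count (assumed formula, not Google's)."""
    if not counts:
        return {}
    top = max(counts.values())
    return {url: round(n / top, 2) for url, n in counts.items()}


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [10/Apr/2005:12:02:00 +0000] "GET /a.html HTTP/1.1" 200 512',
        '127.0.0.1 - - [10/Apr/2005:12:02:05 +0000] "GET /a.html HTTP/1.1" 200 512',
        '127.0.0.1 - - [10/Apr/2005:12:02:09 +0000] "GET /b.html HTTP/1.1" 200 128',
    ]
    for url, priority in sorted(relative_priorities(count_urls(sample)).items()):
        print(url, priority)
```

Under this assumed scaling, the most-accessed URL always gets the top value and every other URL is ranked against it, which matches the "relative" behaviour the documentation describes without claiming to reproduce it.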