The Panix Help System -- Web Page Help

Interpreting Webalizer Reports

The first page

At the top of this page, you'll see a graph that gives a summary view of the past 12 months, for comparison purposes. It lets you see long-term trends in your web traffic at a quick glance.

The next (and last) item of interest on the first page of your Webalizer report is a table with one row for each month for which you have run a report. It gives both daily averages and monthly totals for Hits, Files, Pages and Visits, as well as monthly totals for Sites and Kilobytes*.


Summary by Month

Month	Daily Avg				Monthly Totals
Month	Hits	Files	Pages	Visits	Sites	KBytes	Visits	Pages	Files	Hits

Jul 2004	55	13	2	1	97	7627	42	64	423	1718
Jun 2004	629	263	9	3	114	157292	105	274	7896	18876
May 2004	49	22	2	1	112	40322	51	76	665	1498
Apr 2004	5	3	1	0	51	4714	24	34	118	160
Mar 2004	6	2	1	1	56	98	33	37	78	210
Feb 2004	13	4	1	1	73	1434	35	40	126	384
Jan 2004	12	9	1	1	55	466	35	49	294	380
Dec 2003	24	4	2	1	50	1192	44	62	148	732
Nov 2003	24	3	1	1	63	284	42	50	92	728
Oct 2003	89	55	3	1	58	28154	56	112	1708	2788
Sep 2003	17	14	1	1	19	450	37	47	422	516
Aug 2003	8	6	1	1	24	2797	34	42	208	272

Totals						244830	538	887	12178	28262

Each month in the left-hand column is a link to a more detailed breakdown of that month's traffic.

The Details

The data presented to you takes the form of Hits, Files, Visits, Sites, Pages, Kilobytes, URLs, Referrers, User Agents and Response Codes. All of these figures are derived from the web transfers logged by our Apache web servers. For each request, Apache writes a line such as this:
66.196.90.216 - - [12/Aug/2004:04:02:03 -0400] "GET http://www.arrgh.net/music/data.php?composer_name=Gorecki HTTP/1.0" 200 659 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

Each such line breaks down to the following data:

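Internet address of the machine visiting your site (the two hyphens that follow it are placeholders for identity and login-name fields that are not recorded)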
The date, time, and time zone as an offset from GMT
The request method. In most cases, this will be GET, but you may also see POST and HEAD requests.
The URL being requested
The HTTP protocol used for the request
The response code
The number of bytes transferred
The referring page
The user agent
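The breakdown above can be sketched in a few lines of Python. This is only an illustration of how such a line splits into fields, not how webalizer itself is implemented; the names LOG_PATTERN and parse_line are made up for this example, and the pattern assumes the log line has exactly the layout of the sample shown above.

```python
import re
from datetime import datetime

# Assumed layout: the sample line above (Apache "combined" style).
# LOG_PATTERN and parse_line are illustrative names, not part of webalizer.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '      # address, two placeholder fields
    r'\[(?P<time>[^\]]+)\] '                            # date, time and GMT offset
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '  # request, URL, protocol
    r'(?P<status>\d{3}) (?P<size>\d+|-) '               # response code, bytes sent
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'         # referring page, user agent
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the text fields that are really dates or numbers.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

sample = (
    '66.196.90.216 - - [12/Aug/2004:04:02:03 -0400] '
    '"GET http://www.arrgh.net/music/data.php?composer_name=Gorecki HTTP/1.0" '
    '200 659 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"'
)
record = parse_line(sample)
print(record["host"], record["method"], record["status"], record["size"])
```

A referrer of "-" means the request arrived without a referring page, as in the sample line.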

From this data, webalizer crafts various views of your web traffic.

Definitions

Hits measure the total number of requests made to the server during the given time period (month, day, hour, etc.). Each separate item on a single page will produce a hit when the page is requested. For example, if you have a page with 3 graphics and some text, a request for that page will (usually) result in 4 hits: one for the page itself and one for each graphic.
Files measure the total number of hits (requests) that actually resulted in something being sent back to the user. Not all hits send data; examples are requests that result in 404 Not Found and requests for pages that are already in the browser's cache.
A Site is a unique IP address (or hostname, if you are doing name resolution) that made requests to the server. This is less useful than it may appear, because many different computers can share a single address, and the same visitor can also visit from many addresses. Treat it only as a rough gauge of the number of visitors to your server.
Pages are those URLs that would be considered the actual page being requested, as opposed to the individual items that make it up (such as images and audio clips). Webalizer's default is to consider any URL that has an extension of .htm, .html or .cgi as a Page. If you use PHP at Panix, you might wish to add lines to your webalizer.conf file to add .php to this list:
PageType .htm*
PageType .cgi
PageType .php
A Visit is recorded when some remote site makes a request for a page on your server for the first time. As long as the same site keeps making requests within a given timeout period, those requests are all considered part of the same visit. If the site makes a request to your server and the length of time since its last request is greater than the specified timeout (default is 30 minutes), a new visit is counted. Since only pages will trigger a visit, remote sites that link directly to graphics and other non-page URLs will not be counted in the visit totals.
A KByte* (KB) is 1024 bytes (1 Kilobyte). It is used to show the amount of data that was transferred between the server and remote machines, based on the data found in the server log. Note that at Panix, the logs you should be using for Webalizer do not accurately reflect the total number of Kilobytes sent, so Webalizer's Kilobyte count should not be used for accounting purposes, such as double-checking transfers on your bill. See below for more information on this.
URL - Uniform Resource Locator. Every request made to a web server has to ask for something. A URL is that something: it represents an object somewhere on your server that is accessible to the remote user, or else results in an error (e.g., 404 Not Found). URLs can be of any file type.
Referrers are those URLs that lead a user to your site or caused the browser to request something from your server. The vast majority of requests are made from your own URLs, since most HTML pages contain links to other objects such as graphics files. If one of your HTML pages contains links to 5 images, then each request for the HTML page will produce 5 more hits with your page as referrer.
Search Strings are obtained by examining the referrer string and looking for known patterns from various search engines. The search engines and the patterns to look for can be specified by the user within a configuration file. The default will catch most of the major ones.
User Agents are software programs that connect to the web server and make requests. Most User Agents are browsers, such as IE, Mozilla or Netscape. Each user agent reports itself in a unique way to your server. Keep in mind, however, that many browsers allow the user to change the reported name, so you might see some obviously fake names in the listing.
Entry and Exit pages are those pages that were the first requested in a visit (Entry), and the last requested (Exit). These pages are calculated using the visits logic above. When a visit is first triggered, the requested page is counted as an Entry page, and the last URL requested in that visit is counted as an Exit page.
Countries are determined from the top-level domain of the requesting site. This is unreliable, however, because domains are no longer as strictly enforced as they were in the past. A .COM domain may reside in the US or somewhere else. An .IL domain may actually be in Israel, but it may also be located in the US or elsewhere. A large percentage may also be shown as Unresolved/Unknown, because many dialup and other customer access points do not resolve IP addresses to a name and so are left as an IP address. If you are not doing name resolution in your reports, all hits will be recorded as Unresolved/Unknown here.
Response Codes are defined as part of the HTTP/1.1 protocol (RFC 2068; see Section 10). These codes are generated by the web server and indicate the completion status of each request made to it.
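To make the definitions of Hits, Files, Pages, Sites and Visits concrete, here is a rough Python sketch that counts them for a handful of invented requests. It is not webalizer's actual code. It assumes that only requests with response code 200 count as Files, that the default page extensions are in effect, and that the visit timeout is measured from the same site's previous page request. The function names, addresses and URLs are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumptions for this sketch (see the lead-in): default page extensions,
# default 30 minute timeout, and "File" meaning a response code of 200.
PAGE_EXTENSIONS = (".htm", ".html", ".cgi")
VISIT_TIMEOUT = timedelta(minutes=30)

def is_page(url):
    """True if the URL, ignoring any query string, has a page extension."""
    path = url.split("?", 1)[0].lower()
    return path.endswith(PAGE_EXTENSIONS)

def summarize(records):
    """Count traffic for (site, time, url, status) tuples sorted by time."""
    hits = files = pages = visits = 0
    sites = set()
    last_page_time = {}  # site -> time of that site's most recent page request
    for site, when, url, status in records:
        hits += 1                      # every request is a hit
        sites.add(site)                # unique addresses
        if status == 200:
            files += 1                 # something was actually sent back
        if is_page(url):
            pages += 1
            previous = last_page_time.get(site)
            # First page from this site, or the timeout has passed: new visit.
            if previous is None or when - previous > VISIT_TIMEOUT:
                visits += 1
            last_page_time[site] = when
    return {"hits": hits, "files": files, "pages": pages,
            "visits": visits, "sites": len(sites)}

start = datetime(2004, 8, 12, 4, 0, 0)
records = [
    ("192.0.2.1", start, "/index.html", 200),
    ("192.0.2.1", start + timedelta(seconds=1), "/logo.gif", 200),
    ("192.0.2.1", start + timedelta(seconds=1), "/missing.gif", 404),
    ("192.0.2.2", start + timedelta(minutes=5), "/logo.gif", 200),    # image only
    ("192.0.2.1", start + timedelta(minutes=10), "/about.html", 200),
    ("192.0.2.1", start + timedelta(minutes=45), "/index.html", 304),  # cached copy
]
summary = summarize(records)
print(summary)
```

In this sample the second address never requests a page, so it adds a Site and a Hit but no Visit, while the first address is counted for two Visits because its last page request comes 35 minutes after the previous one.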

* You cannot use the KBytes as reported by Webalizer to check your billed transfers. Panix uses web accelerators known as Squids. The Squids cache pages and, if the actual page has not changed, serve the page from their cache rather than from the web server. In the process, duplicate log entries are created for a File: one for the Squids and one for the web server. The web server log entry will not show any bytes transferred, however, so you need to get Squid logs as well as web logs to check on bytes transferred. This can be done using the '-a' switch to getlogs. We do not do this by default for Webalizer processing because the duplicate log entries would render the rest of Webalizer's statistics grossly inaccurate.