optimized counting of web statistics - Mailing list pgsql-performance

From Billy extyeightysix
Subject optimized counting of web statistics
Date
Msg-id 22455e7050628133978b17cd4@mail.gmail.com
Whole thread Raw
In response to Re: tricky query  ("Merlin Moncure" <merlin.moncure@rcsonline.com>)
Responses Re: optimized counting of web statistics  (Billy extyeightysix <exty86@gmail.com>)
Re: optimized counting of web statistics  (Matthew Nuzum <mattnuzum@gmail.com>)
List pgsql-performance
Hola folks,

I have a web statistics Pg database (user agent, urls, referrer, etc)
that is part of an online web survey system. All of the data derived
from analyzing web server logs is stored in one large table with each
record representing an analyzed webserver log entry.

Currently all reports are generated when the logs are being analyzed
and before the data ever goes into the large table I mention above.
Well, the time has come to build an interface that will allow a user
to make ad-hoc queries against the stats and that is why I am emailing
the performance list.

I need to allow the user to specify any fields and values in a query.
For example,

"I want to see a report about all users from Germany that have flash
installed"  or
"I want to see a report about all users from Africa that are using FireFox 1"

I do not believe that storing all of the data in one big table is the
correct way to go about this. So, I am asking for suggestions,
pointers and any kind of info that anyone can share on how to store
this data set in an optimized manner.

Also, I have created a prototype and done some testing using the
colossal table. Actually finding all of the rows that satisfy the
query is pretty fast and is not a problem.  The bottleneck in the
whole process is actually counting each data point (how many times a
url was visited, or how many times a url referred the user to the
website). So more specifically I am wondering if there is way to store
and retrieve the data such that it speeds up the counting of the
statistics.

Lastly, this will become an open source tool that is akin to urchin,
awstats, etc. The difference is that this software is part of a suite
of tools for doing online web surveys and it maps web stats to the
survey respondent data.  This can give web site managers a very clear
view of what type of people come to the site and how those types use
the site.

Thanks in advance,

exty

pgsql-performance by date:

Previous
From: Sebastian Hennebrueder
Date:
Subject: Re: tricky query
Next
From: Billy extyeightysix
Date:
Subject: Re: optimized counting of web statistics