Re: how many record versions - Mailing list pgsql-general

From Greg Stark
Subject Re: how many record versions
Date
Msg-id 87vfimq114.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: how many record versions  (David Garamond <lists@zara.6.isreserved.com>)
Responses Re: how many record versions  (David Garamond <lists@zara.6.isreserved.com>)
List pgsql-general
David Garamond <lists@zara.6.isreserved.com> writes:

> Actually, each record will be incremented probably only thousands of times a
> day. But there are many banners. Each record has a (bannerid, campaignid,
> websiteid, date, countrycode) "dimensions" and (impression, click) "measures".

In the past when I had a very similar situation we kept the raw impression and
click event data. Ie, one record per impression in the impression table and
one record per click in the click data.

That makes the tables insert-only which is efficient and not prone to locking
contention. They would never have to be vacuumed except after purging old data.

Then to accelerate queries we had denormalized aggregate tables with a cron
job that did the equivalent of

insert into agg_clicks (
  select count(*),bannerid
    from clicks
   where date between ? and ?
   group by bannerid
  )

Where the ?s were actually hourly periods. Ie, at 12:15 it ran this query for
the 11-12 period.

This meant we didn't have immediate up-to-date stats on banners but it meant
we did have stats on every single impression and click including time and
information about the users.

This worked out very well for reporting needs. If your system is using the
data to handle serving the ads, though, it's a different kettle of fish. For
that I think you'll want something that avoids having to do a database query
for every single impression.


--
greg

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: how many record versions
Next
From: Paul Thomas
Date:
Subject: Re: pg_dump error