On Mon, Sep 2, 2013 at 7:24 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2013-08-22 at 19:26 +0300, pablo platt wrote:
> I'll use a bit varying field with unlimited length. > > To record unique visitors per day, I'll flip the bit corresponding to > a user's id. > > Each event type will have a separate record. > It is possible to get useful information using union(&) and > intersection(|) of several fields. > > A field for 1 M users will required 1M bits = 125KB.
Even though you're modifying only a single bit at a time, every update will consume significant overhead. It will also be fairly hard to query and require procedural functions like you have.
One thing I would consider is using a "raw events" table for the incoming data, and then periodically summarize it into another table and delete the raw data after you summarize it.
insert into event_summary(day, unique_users) select day, count(*) from (select distinct ts::date as day, user_id from event_raw) r where day < current_date group by day; delete from event_raw where ts::date < current_date;
That will make it easier to query. Remember that if you need to include the current data in a report, you need to do a UNION (and probably make a view so that it's easier).
If you want to be a little more efficient, you can use a common table expression to do the delete and insert in one step:
with d as ( delete from event_raw where ts::date < current_date returning ts, user_id ) insert into event_summary(day, unique_users) select day, count(*) from (select distinct ts::date as day, user_id from d) r where day < current_date group by day;
Also, if you need to do hourly summaries instead of daily, then use date_trunc() rather than just casting to date.