Thread: Bit count
I'm trying to convert this method, which uses Redis bitmaps for real-time metrics, to PostgreSQL:
http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/
I'll use a bit varying field with unlimited length.
To record unique visitors per day, I'll flip the bit corresponding to a user's id.
Each event type will have a separate record.
It is possible to get useful information using the union (|) and intersection (&) of several fields.
A field for 1M users will require 1M bits = 125KB.
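For concreteness, a minimal sketch of the scheme I have in mind (table and column names are only illustrative):

create table daily_visitors (
    day      date,
    event    text,
    visitors bit varying,
    primary key (day, event)
);

-- Flip the bit for user id 42 (position 43, since bit positions are 1-based).
-- This assumes the bit string has already been padded to at least 43 bits.
update daily_visitors
   set visitors = overlay(visitors placing B'1' from 43)
 where day = current_date and event = 'login';

-- Users who triggered both events on the same day: intersect (&) the two
-- bitmaps. Note that & requires both strings to have the same length.
select a.visitors & b.visitors as both_events
  from daily_visitors a
  join daily_visitors b using (day)
 where a.event = 'login' and b.event = 'purchase';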
Is there a built-in function to count bits, similar to BIT_COUNT in MySQL and BITCOUNT in Redis?
I've found several plpgsql functions on the mailing list.
What is the best function to use for large bitstrings (>100KB)?
Thanks
CREATE OR REPLACE FUNCTION bitcount(i integer) RETURNS integer AS $$
DECLARE
    n integer;
    amount integer := 0;
BEGIN
    -- Test each of the 32 bits of the integer in turn.
    FOR n IN 1..32 LOOP
        amount := amount + ((i >> (n-1)) & 1);
    END LOOP;
    RETURN amount;
END;
$$ LANGUAGE plpgsql;
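For example:

select bitcount(7);   -- 3
select bitcount(-1);  -- 32 (all bits of a 32-bit integer are set)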
create or replace function bitsetlen(bit) returns int as $$
declare
    i int;
    c int := 0;
begin
    -- Walk the bit string and count the positions that are set.
    for i in 1..length($1) loop
        if substring($1, i, 1) = B'1' then
            c := c + 1;
        end if;
    end loop;
    return c;
end;
$$ language plpgsql;
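For example:

select bitsetlen(B'101101');  -- 4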
-- For an integer column, the bits can be tested manually; the parentheses
-- are needed because & and >> bind more loosely than + in PostgreSQL.
SELECT ((myBit & 1) + ((myBit >> 1) & 1) + ((myBit >> 2) & 1)) AS bitCount FROM myBitTable;
SELECT LENGTH( REPLACE( CAST( B'101000000000000000000010' AS TEXT ), '0', ''));
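That last query counts the set bits by deleting the zeros and measuring what is left (3 for the example above). It can be wrapped in a plain SQL function (the name bitcount_text is only a suggestion), which avoids the plpgsql loop entirely and should be worth benchmarking against the loop-based functions for long strings:

create or replace function bitcount_text(b bit varying) returns int as $$
    select length(replace(b::text, '0', ''));
$$ language sql immutable;

select bitcount_text(B'101000000000000000000010');  -- 3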
I'll try it.
Thank you.
On Mon, Sep 2, 2013 at 7:24 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2013-08-22 at 19:26 +0300, pablo platt wrote:
> I'll use a bit varying field with unlimited length.
>
> To record unique visitors per day, I'll flip the bit corresponding to
> a user's id.
>
> Each event type will have a separate record.
> It is possible to get useful information using the union (|) and
> intersection (&) of several fields.
>
> A field for 1M users will require 1M bits = 125KB.
Even though you're modifying only a single bit at a time, every update
will consume significant overhead. It will also be fairly hard to query
and will require procedural functions like the ones you have.
One thing I would consider is using a "raw events" table for the
incoming data, then periodically summarizing it into another table and
deleting the raw data once it has been summarized.
Something like:
create table event_raw(ts timestamptz, user_id integer);
create table event_summary(day date, unique_users bigint);
And after a day has passed do:
insert into event_summary(day, unique_users)
select day, count(*)
from
  (select distinct ts::date as day, user_id
   from event_raw) r
where day < current_date
group by day;
delete from event_raw where ts::date < current_date;
That will make it easier to query. Remember that if you need to include
the current data in a report, you need to do a UNION (and probably make
a view so that it's easier).
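For example, a view along these lines (a sketch; event_summary_all is
just a suggested name):

create view event_summary_all as
  select day, unique_users from event_summary
  union all
  select ts::date as day, count(distinct user_id) as unique_users
    from event_raw
   group by ts::date;

Because event_raw only holds rows that haven't been summarized yet, the
two halves of the UNION never overlap, as long as the summarize-and-delete
step is atomic (the CTE version below makes it so).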
If you want to be a little more efficient, you can use a common table
expression to do the delete and insert in one step:
with d as (
  delete from event_raw where ts::date < current_date
  returning ts, user_id
)
insert into event_summary(day, unique_users)
select day, count(*)
from
  (select distinct ts::date as day, user_id from d) r
where day < current_date
group by day;
Also, if you need to do hourly summaries instead of daily, then use
date_trunc() rather than just casting to date.
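For example, the inner select would become something like this (and
event_summary would then need a timestamptz column instead of a date):

select distinct date_trunc('hour', ts) as hour, user_id
  from event_raw;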
I hope this helps,
Jeff Davis