Re: number of rows estimation for bit-AND operation - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: number of rows estimation for bit-AND operation
Date
Msg-id dcc563d10908180008g6e53e53do7aa16c928fa996b3@mail.gmail.com
Whole thread Raw
In response to number of rows estimation for bit-AND operation  (Slava Moudry <smoudry@4info.net>)
Responses Re: number of rows estimation for bit-AND operation  (Slava Moudry <smoudry@4info.net>)
List pgsql-performance
On Mon, Aug 17, 2009 at 2:07 PM, Slava Moudry<smoudry@4info.net> wrote:
> Hi,
>
> I am using int8 field to pack a number of error flags. This is very common
> technique for large tables to pack multiple flags in one integer field.
>
> For most records – the mt_flags field is 0. Here is the statistics (taken
> from pgAdmin Statistics tab for mt_flags column):
>
> Most common Values: {0,128,2,4,8)
>
> Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029)
>
> What I notice that when bit-AND function is used – Postgres significantly
> underestimates the amount of rows:
>
> explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;
>
>                               QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------
>
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual
> time=2883.154..2883.154 rows=1 loops=1)
>
>    ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0)
> (actual time=0.008..2100.390 rows=2439435 loops=1)
>
>          Filter: ((mt_flags & 8) = 0)
>
>  Total runtime: 2883.191 ms
>
> (4 rows)
>
> This is not an issue for the particular query above, but I noticed that due
> to that miscalculation in many cases Postgres chooses plan with Nested Loops
> for other queries. I can fix it by setting enable_nest_loops to off, but
> it's not something I should set for all queries.
>
> Is there any way to help Postgres make a better estimation for number of
> rows returned by bit function?

You can index on the function.  For instance:

create table t (mt_flags int);
create index t_mtflags_bit on t ((mt_flags&8));
insert into t select case when random() > 0.95 then case when random()
>0.5 then 8 else 12 end else 0 end from generate_series(1,10000);
analyze t;
explain select * from t where mt_flags&8=8;
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using t_mtflags_bit on t  (cost=0.00..52.17 rows=467 width=4)
   Index Cond: ((mt_flags & 8) = 8)
(2 rows)

Hope that helps a little.

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: Memory reporting on CentOS Linux
Next
From: Pierre Frédéric Caillaud
Date:
Subject: Re: Getting time of a postgresql-request