Re: number of rows estimation for bit-AND operation - Mailing list pgsql-performance

From Slava Moudry
Subject Re: number of rows estimation for bit-AND operation
Date
Msg-id 622F69662CFE9F4182958973F99F3F15151033CBD0@EXVMBX017-12.exch017.msoutlookonline.net
Whole thread Raw
In response to Re: number of rows estimation for bit-AND operation  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-performance
> increase default stats target, analyze, try again.
This field has only 5 values. I had put values/frequencies in my first post.
Based on the values (see below) - there is no reason for planner to think that mt_flags&134=0 should return 12200 rows.
select mt_flags, count(*) from staging.tmp_t group by 1;
 mt_flags |  count
----------+---------
      128 |   57362
        4 |    1371
        8 |     627
        2 |   19072
        0 | 2361630
(5 rows)

In fact, if I rewrite the query using value matching - the estimations are right on:
explain analyze select count(*) from staging.tmp_t where mt_flags not in (128,2,4);
                                                       QUERY PLAN


------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85878.63..85878.64 rows=1 width=0) (actual time=2904.005..2904.005 rows=1 loops=1)
   ->  Seq Scan on tmp_t  (cost=0.00..79973.85 rows=**2361910** width=0) (actual time=0.008..2263.983 rows=2362257
loops=1)
         Filter: (mt_flags <> ALL ('{128,2,4}'::integer[]))
 Total runtime: 2904.038 ms
(4 rows)


Anyways, I've been using statistics target of 100 in 8.3 and in 8.4 100 is default. I am currently using
default_statistics_target=1000.

Do you think that bit-and function might be skewing the statistics for execution plan somehow?
Thanks,
-Slava.

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Tuesday, August 18, 2009 2:58 PM
To: Slava Moudry
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

2009/8/18 Slava Moudry <smoudry@4info.net>:
> Hi Scott,
> Thank you for reply.
> I am using Postgres 8.4.0 (btw - great release --very happy about it) and I got a different plan after following your
advice:

Yeah, you're returning most of the rows, so a seq scan makes sense.
Try indexing / matching on something more uncommon and you should get
an index scan.



> The seq scan is OK, since I don't expect Postgres to use index scan for such low-selective condition.
> It would be tough for me to support indexes for each bit flag value and their combinations. E.g. in the query below
itis again 200x off on number of rows. 

increase default stats target, analyze, try again.


> explain analyze select count(*) from staging.tmp_t where  mt_flags&134=0;
>                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2964.960..2964.960 rows=1 loops=1)
>   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.014..2152.031 rows=2362257 loops=1)
>         Filter: ((mt_flags & 134) = 0)
>  Total runtime: 2965.009 ms
> (4 rows)
>
> I still wonder if it's something I could/should report as a bug? I've been struggling with this issue in 8.2, 8.3.x
 (nowusing 8.4.0). 
> We can more or less work around this by disabling nestloop in our analytics queries but I have problems enforcing
thisin reporting applications. 

Looks more like a low stats target.  Try increasing that first.

pgsql-performance by date:

Previous
From: Slava Moudry
Date:
Subject: Re: number of rows estimation for bit-AND operation
Next
From: Karl Denninger
Date:
Subject: Re: SQL Query Performance - what gives?