RE: Bad estimates - Mailing list pgsql-performance

From Artur Zając
Subject RE: Bad estimates
Msg-id 03e801d363b2$4953bef0$dbfb3cd0$@ang.com.pl
In response to Re: Bad estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Thank you for your response,

The particular clause I used is not important (I used the binary & operator
only as an example); I was trying to illustrate a general class of problems.

Now I did another test:

alter table xyz add x int;
alter table xyz add y int;
alter table xyz add z int;
update xyz set x=gs,y=gs,z=gs;

create index xyza_i1 on xyz ((x%200));
create index xyza_i2 on xyz ((y%200));
create index xyza_i3 on xyz ((z%200));

vacuum full verbose xyz;

And now:

explain analyze select gs from xyz where (x%200)=1 and (y%200)=1 and
(z%200)=1;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on xyz  (cost=2782.81..2786.83 rows=1 width=4) (actual time=134.827..505.642 rows=50000 loops=1)
   Recheck Cond: (((z % 200) = 1) AND ((y % 200) = 1) AND ((x % 200) = 1))
   Heap Blocks: exact=50000
   ->  BitmapAnd  (cost=2782.81..2782.81 rows=1 width=0) (actual time=108.712..108.712 rows=0 loops=1)
         ->  Bitmap Index Scan on xyza_i3  (cost=0.00..927.43 rows=50000 width=0) (actual time=22.857..22.857 rows=50000 loops=1)
               Index Cond: ((z % 200) = 1)
         ->  Bitmap Index Scan on xyza_i2  (cost=0.00..927.43 rows=50000 width=0) (actual time=26.058..26.058 rows=50000 loops=1)
               Index Cond: ((y % 200) = 1)
         ->  Bitmap Index Scan on xyza_i1  (cost=0.00..927.43 rows=50000 width=0) (actual time=23.079..23.079 rows=50000 loops=1)
               Index Cond: ((x % 200) = 1)
 Planning time: 0.340 ms
 Execution time: 513.171 ms
(12 rows)

The estimates come out exactly the same because the planner assumes that if
the first clause reduces the row count by a factor of n, the second by m, and
the third by o, then all of them together reduce it by n*m*o. That
multiplication of selectivities is the general behaviour, regardless of
whether any statistics exist.
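To spell out the arithmetic: assuming the table holds 10,000,000 rows (a
guess consistent with each clause matching 50,000 rows, i.e. a selectivity
of 1/200), the rows=1 estimate above falls out of multiplying the per-clause
selectivities as if they were independent:

```sql
-- Assumed table size: 10,000,000 rows, each (col % 200) = 1 clause
-- matching 50,000 rows, i.e. selectivity 1/200 per clause.
--
-- Planner's combined estimate (independence assumed):
--   10,000,000 * (1/200) * (1/200) * (1/200) = 1.25  ->  rows=1
--
-- Actual result: x, y and z are perfectly correlated (all set to gs),
-- so the true count is 50,000 -- off by four orders of magnitude.
```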

You suggest:

> If you're not wedded to this data representation, consider replacing that
> integer flags column with a bunch of boolean columns.  You might or might
> not want indexes on the booleans, but in any case ANALYZE would create
> stats that would allow decent estimates for "WHERE boolval".

But have you ever thought about something like this?

CREATE STATISTICS ON (x&1) FROM xyz;

(using syntax similar to CREATE STATISTICS from PostgreSQL 10).

Sometimes it is not possible to split one column into many, and, as far as I
know, it is not worth creating an index on a column that has only a few
distinct values.
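A sketch of what I have in mind (hypothetical syntax, modeled on PostgreSQL
10's CREATE STATISTICS; neither statement exists in PostgreSQL 10):

```sql
-- Hypothetical: statistics on a single expression, so ANALYZE could
-- gather an MCV list / histogram for it without needing an index.
CREATE STATISTICS xyz_x_mod ON (x % 200) FROM xyz;

-- Hypothetical: multivariate statistics over several expressions, so
-- the planner could notice that the three clauses are fully correlated
-- instead of multiplying their selectivities.
CREATE STATISTICS xyz_mod_deps (dependencies)
    ON (x % 200), (y % 200), (z % 200) FROM xyz;

ANALYZE xyz;
```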


Artur Zajac


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, November 22, 2017 4:02 PM
To: Artur Zając <azajac@ang.com.pl>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: Bad estimates

Artur Zając <azajac@ang.com.pl> writes:
[ poor estimates for WHERE clauses like "(gs & 1) = 1" ]

Don't hold your breath waiting for that to get better on its own.
You need to work with the planner, not expect it to perform magic.
It has no stats that would help it discover what the behavior of that sort
of WHERE clause is; nor is there a good reason for it to think that the
selectivity of such a clause is only 0.5 rather than something more in line
with the usual behavior of an equality constraint on an integer value.

One way you could attack the problem, if you're wedded to this data
representation, is to create expression indexes on the terms "(gs & x)"
for all the values of x you use.  Not only would that result in better
estimates (after an ANALYZE) but it would also open the door to satisfying
this type of query through an index search.  A downside is that updating all
those indexes could make DML on the table pretty expensive.

If you're not wedded to this data representation, consider replacing that
integer flags column with a bunch of boolean columns.  You might or might
not want indexes on the booleans, but in any case ANALYZE would create stats
that would allow decent estimates for "WHERE boolval".
        regards, tom lane



