Thread: bad planning with 75% effective_cache_size
Hi,
I've run into a planning problem.
Dedicated PostgreSQL Server:
"PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit"
Memory: 8 GB
CPUs: 4
The problem reduces to the following. There are 2 tables:
- product (3 million rows, 1 GB)
- product_parent (3000 rows, 0.5 MB)
If effective_cache_size has a large value (6GB), this select gets a bad plan and a long query time (2000ms):
select distinct product_code from product p_
inner join product_parent par_ on p_.parent_id=par_.id
where par_.parent_name like 'aa%' limit 2
If effective_cache_size is smaller (32MB), the plan is good and the query is fast (10ms).
In the bad case (effective_cache_size=6GB) the speed also depends on the value of 'limit' in the select: the smaller it is, the slower the query (with a larger limit it drops back to ~12ms).
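Both cases can be reproduced in a single session by toggling the setting, along these lines (a minimal sketch of the test, not verbatim from the original runs):

set effective_cache_size = '6GB';
explain analyze
select distinct product_code from product p_
inner join product_parent par_ on p_.parent_id=par_.id
where par_.parent_name like 'aa%' limit 2;   -- bad plan, ~2000ms

set effective_cache_size = '32MB';
explain analyze
select distinct product_code from product p_
inner join product_parent par_ on p_.parent_id=par_.id
where par_.parent_name like 'aa%' limit 2;   -- good plan, ~10ms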
Good planning: http://explain.depesz.com/s/0FD
"Limit (cost=3704.00..3704.02 rows=2 width=5) (actual time=0.215..0.217 rows=1 loops=1)"
" -> HashAggregate (cost=3704.00..3712.85 rows=885 width=5) (actual time=0.213..0.215 rows=1 loops=1)"
" -> Nested Loop (cost=41.08..3701.79 rows=885 width=5) (actual time=0.053..0.175 rows=53 loops=1)"
" -> Index Scan using telepulesbugreport_nev_idx on product_parent par_ (cost=0.00..8.27 rows=1 width=4) (actual time=0.016..0.018 rows=1 loops=1)"
" Index Cond: (((parent_name)::text ~>=~ 'aa'::text) AND ((parent_name)::text ~<~ 'ab'::text))"
" Filter: ((parent_name)::text ~~ 'aa%'::text)"
" -> Bitmap Heap Scan on product p_ (cost=41.08..3680.59 rows=1034 width=9) (actual time=0.033..0.125 rows=53 loops=1)"
" Recheck Cond: (parent_id = par_.id)"
" -> Bitmap Index Scan on kapubugreport_telepules_id_idx (cost=0.00..40.82 rows=1034 width=0) (actual time=0.024..0.024 rows=53 loops=1)"
" Index Cond: (parent_id = par_.id)"
"Total runtime: 0.289 ms"
Bad planning: http://explain.depesz.com/s/yBh
"Limit (cost=0.00..854.37 rows=2 width=5) (actual time=1799.209..4344.041 rows=1 loops=1)"
" -> Unique (cost=0.00..378059.84 rows=885 width=5) (actual time=1799.207..4344.038 rows=1 loops=1)"
" -> Nested Loop (cost=0.00..378057.63 rows=885 width=5) (actual time=1799.204..4344.020 rows=53 loops=1)"
" Join Filter: (p_.parent_id = par_.id)"
" -> Index Scan using kapubugreport_irsz_telepules_id_idx on product p_ (cost=0.00..334761.59 rows=2885851 width=9) (actual time=0.015..1660.449 rows=2884172 loops=1)"
" -> Materialize (cost=0.00..8.27 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=2884172)"
" -> Index Scan using telepulesbugreport_nev_idx on product_parent par_ (cost=0.00..8.27 rows=1 width=4) (actual time=0.013..0.014 rows=1 loops=1)"
" Index Cond: (((parent_name)::text ~>=~ 'aa'::text) AND ((parent_name)::text ~<~ 'ab'::text))"
" Filter: ((parent_name)::text ~~ 'aa%'::text)"
"Total runtime: 4344.083 ms"
schema:
CREATE TABLE product
(
  id serial NOT NULL,
  parent_id integer NOT NULL,
  product_code character varying COLLATE pg_catalog."C" NOT NULL,
  product_name character varying NOT NULL
)
WITH (
  OIDS=FALSE
);
ALTER TABLE product OWNER TO aa;

CREATE INDEX product_code_parent_id_idx
  ON product USING btree (product_code COLLATE pg_catalog."C", parent_id);
CREATE INDEX product_name_idx
  ON product USING btree (product_name COLLATE pg_catalog."default");
CREATE INDEX product_parent_id_idx
  ON product USING btree (parent_id);
CREATE INDEX product_parent_id_ocde_idx
  ON product USING btree (parent_id, product_code COLLATE pg_catalog."C");

CREATE TABLE product_parent
(
  id serial NOT NULL,
  parent_name character varying NOT NULL,
  CONSTRAINT telepulesbugreport_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE product_parent OWNER TO aa;

CREATE INDEX product_parent_name_idx
  ON product_parent USING btree (parent_name COLLATE pg_catalog."default" varchar_pattern_ops);
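For anyone who wants to try a rough repro without the dump, something in about the reported proportions should do (a sketch with entirely made-up values; the real data's name distribution differs, which matters for the estimates):

insert into product_parent (parent_name)
select md5(g::text) from generate_series(1, 3000) g;       -- placeholder names

insert into product (parent_id, product_code, product_name)
select (random() * 2999)::int + 1,                         -- random parent 1..3000
       'code_' || g,
       'product_' || g
from generate_series(1, 3000000) g;

analyze product_parent;
analyze product;

The md5() names happen to make 'aa%' match only about a dozen parents, loosely similar to the selective filter seen in the plans above.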
I hope you can help me... :)
Best regards,
Istvan
Istvan Endredy <istvan.endredy@gmail.com> wrote:

> i've ran into a planning problem.
> If effective_cache_size has a greater value (6GB), this select has
> a bad planning and long query time (2000ms):

Could you try that configuration with one change and let us know how it goes?:

set cpu_tuple_cost = '0.05';

I've seen an awful lot of queries benefit from a higher value for that setting, and I'm starting to think a change to that default is in order.

-Kevin
Hi Kevin,
Thanks for the suggestion. It was my first task to try after Easter. :)
Sorry to say, this parameter doesn't help:
bad planning:
set cpu_tuple_cost = '0.05';
set effective_cache_size to '6GB';
1622ms
http://explain.depesz.com/s/vuO
or
set cpu_tuple_cost = '0.01';
set effective_cache_size to '6GB';
1634ms
http://explain.depesz.com/s/YqS
good planning:
set effective_cache_size to '32MB';
set cpu_tuple_cost = '0.05';
22ms
http://explain.depesz.com/s/521
or
set effective_cache_size to '32MB';
set cpu_tuple_cost = '0.01';
12ms
http://explain.depesz.com/s/Ypc
this was the query:
select distinct product_code from product p_
inner join product_parent par_ on p_.parent_id=par_.id
where par_.parent_name like 'aa%' limit 2
Any idea?
Thanks in advance,
Istvan
Could you try to set the statistics parameter to 1000 (ALTER TABLE ... SET STATISTICS) for these tables, then run ANALYZE and try again?
Hi,

thanks for the suggestion, but it didn't help. We have tried it earlier.

7500ms
http://explain.depesz.com/s/ctn

ALTER TABLE product_parent ALTER COLUMN parent_name SET STATISTICS 1000;
ALTER TABLE product ALTER COLUMN parent_id SET STATISTICS 1000;
ANALYZE product_parent;
ANALYZE product;

The query was:

select distinct product_code from product p_
inner join product_parent par_ on p_.parent_id=par_.id
where par_.parent_name like 'aa%' limit 2

I've played with the query and found an interesting behaviour: its speed depends on the value of the limit:

select ... limit 2;  => 1500ms
select ... limit 20; => 14ms (http://explain.depesz.com/s/4iL)
select ... limit 50; => 17ms

These were with a high effective_cache_size (6GB). Somehow it uses the good plan in these cases.

If it helps, I can send the db to reproduce (53MB).

Any tips to try?

Thanks in advance,
Istvan
How about

with par_ as (select * from product_parent where parent_name like 'aa%' )
select distinct product_code from product p_
inner join par_ on p_.parent_id=par_.id
limit 2

?
2012/4/3 Istvan Endredy <istvan.endredy@gmail.com>

> Hi,
>
> i've ran into a planning problem.
>
> select distinct product_code from product p_
> inner join product_parent par_ on p_.parent_id=par_.id
> where par_.parent_name like 'aa%' limit 2
>
> If effective_cache_size is smaller (32MB), planning is ok and query is fast. (10ms)
> In the worst case (effective_cache_size=6GB) the speed depends on the value of 'limit' (in select): if it is smaller, query is slower. (12ms)
--
Best regards,
Vitalii Tymchyshyn
On 4/17/12 2:49 AM, Istvan Endredy wrote:
> Hi,
>
> thanks for the suggestion, but it didn't help. We have tried it earlier.
>
> 7500ms
> http://explain.depesz.com/s/ctn

This plan seems very odd -- doing individual index lookups on 2.8m rows is not standard planner behavior.

Can you confirm that all of your other query cost parameters are the defaults?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Thu, Apr 19, 2012 at 3:44 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> 7500ms
>> http://explain.depesz.com/s/
> This plan seems very odd -- doing individual index lookups on 2.8m rows
> is not standard planner behavior. Can you confirm that all of your
> other query cost parameters are the defaults?

This is similar to the issue with limit that Simon was complaining about a few weeks ago [1]. A lot of the estimation in the planner is biased to give overestimations for the number of rows returned in the face of uncertainty. This works well for joins but interacts really badly with limits.

The two issues here are the join cardinality being overestimated by a factor of 15x, and then the unique being off by another 50x. The result is that the planner thinks it needs to scan 0.25% of the input, while actually it needs to scan the whole of it, underestimating the real cost by a factor of 400.

I'm not sure what to do about unique node overestimation, but I think it could be coaxed to be less optimistic about the limit by adding an optimization barrier and some selectivity-decreasing clauses between the limit and the rest of the query:

select * from (
    select distinct product_code from product p_
    inner join product_parent par_ on p_.parent_id=par_.id
    where par_.parent_name like 'aa%'
    offset 0 -- optimization barrier
) as x
where product_code = product_code -- reduce selectivity estimate by 200x
limit 2;

[1] http://archives.postgresql.org/message-id/CA+U5nMLbXfUT9cWDHJ3tpxjC3bTWqizBKqTwDgzebCB5bAGCgg@mail.gmail.com

Cheers,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
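To make those factors concrete, here is how the cost figures from the bad plan above work out (a worked check of Ants' numbers, not from the original thread):

-- planner's expectation: 885 distinct rows, so LIMIT 2 needs 2/885 ≈ 0.23% of the input
-- cost charged to the Limit node: 378059.84 * (2/885) ≈ 854, matching the 854.37 estimate
-- reality: only 1 distinct value exists, so the limit is never satisfied and the whole
-- ~2.88M-row index scan runs; the true cost is roughly 400x what the planner assumed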
Hi everybody,

thanks for so many responses. :)

> On Thu, Apr 19, 2012 at 3:44 AM, Josh Berkus <josh@agliodbs.com>
>>> 7500ms
>>> http://explain.depesz.com/s/
>> This plan seems very odd -- doing individual index lookups on 2.8m rows
>> is not standard planner behavior. Can you confirm that all of your
>> other query cost parameters are the defaults?

Josh: I confirm the non-default values. I ran the query from http://wiki.postgresql.org/wiki/Server_Configuration; its result:

"version";"PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit"
"bytea_output";"escape"
"client_encoding";"UNICODE"
"client_min_messages";"log"
"effective_cache_size";"6GB"
"lc_collate";"en_US.UTF-8"
"lc_ctype";"en_US.UTF-8"
"listen_addresses";"*"
"log_directory";"/var/log/postgres"
"log_duration";"on"
"log_error_verbosity";"default"
"log_filename";"postgresql-%Y-%m-%d.log"
"log_line_prefix";"%t %u@%h %d %p %i "
"log_lock_waits";"on"
"log_min_duration_statement";"0"
"log_min_error_statement";"warning"
"log_min_messages";"warning"
"log_rotation_age";"15d"
"log_statement";"all"
"logging_collector";"on"
"max_connections";"100"
"max_stack_depth";"2MB"
"port";"5432"
"server_encoding";"UTF8"
"shared_buffers";"6024MB"
"TimeZone";"Europe/Budapest"
"wal_buffers";"16MB"

Ants:

> I'm not sure what to do about unique node overestimation, but I think
> it could be coaxed to be less optimistic about the limit by adding an
> optimization barrier and some selectivity decreasing clauses between
> the limit and the rest of the query:
>
> select * from (
>     select distinct product_code from product p_
>     inner join product_parent par_ on p_.parent_id=par_.id
>     where par_.parent_name like 'aa%'
>     offset 0 -- optimization barrier
> ) as x
> where product_code = product_code -- reduce selectivity estimate by 200x
> limit 2;

Its plan: http://explain.depesz.com/s/eF3h (1700ms)

Віталій:

> How about
>
> with par_ as (select * from product_parent where parent_name like 'aa%' )
> select distinct product_code from product p_
> inner join par_ on p_.parent_id=par_.id
> limit 2

Its plan: http://explain.depesz.com/s/YIS

All suggestions are welcome,
Istvan