Thread: Performance regressions in PG 9.3 vs PG 9.0

Performance regressions in PG 9.3 vs PG 9.0

From
uher dslij
Date:
Hi,

We recently upgraded from pg 9.0.5 to 9.3.2 and we are observing much higher load on our hot standbys (we have 3).  As you can see from the query plans below, we have some queries that are running 4-5 times slower now, many due to what looks like a bad plan in 9.3.  Are there any known issues with query plan regressions in 9.3?  Any ideas about how I can get back the old planning behavior with 9.3.2?

Thanks in advance for any help!

On Production System
----------------------
Postgres 9.3.2
Intel(R) Xeon(R) CPU E5649 (2.53 Ghz 6-core)
12 GB RAM
Intel 710 SSD
---------------------

explain analyze select distinct on (t1.id) t1.id, t1.hostname as name, t1.active, t1.domain_id, t1.base, t1.port, t1.inter_domain_flag from location t1, host t2, container t3, resource_location t4 where t2.id = 34725278 and t3.id = t2.container_id and t4.location_id = t1.id and t4.parent_id in (select * from parentContainers(t3.id)) and t1.license is not null and (t1.license_end_date is null or t1.license_end_date >= current_date) and t1.active <> 0 and t3.active <> 0 and t4.active <> 0 and t1.domain_id = t2.domain_id and t2.domain_id = t3.domain_id and t3.domain_id = t4.domain_id and (0 = 0 or t1.active <> 0);                                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------Unique  (cost=313.44..313.45 rows=1 width=35) (actual time=989.836..989.837 rows=1 loops=1)  ->  Sort  (cost=313.44..313.44 rows=1 width=35) (actual time=989.836..989.837 rows=1 loops=1)        Sort Key: t1.id        Sort Method: quicksort  Memory: 25kB        ->  Nested Loop  (cost=1.27..313.43 rows=1 width=35) (actual time=922.484..989.791 rows=1 loops=1)              Join Filter: (SubPlan 1)              Rows Removed by Join Filter: 742              ->  Nested Loop  (cost=0.99..33.80 rows=1 width=53) (actual time=0.174..5.168 rows=934 loops=1)                    Join Filter: (t2.domain_id = t1.domain_id)                    ->  Nested Loop  (cost=0.71..11.23 rows=1 width=18) (actual time=0.101..0.103 rows=1 loops=1)                          ->  Index Scan using host_pkey on host t2  (cost=0.29..5.29 rows=1 width=12) (actual time=0.041..0.042 rows=1 loops=1)                                Index Cond: (id = 34725278::numeric)                          ->  Index Scan using container_pkey on container t3  (cost=0.42..5.43 rows=1 width=12) (actual time=0.057..0.058 rows=1 loops=1)                                Index Cond: (id = t2.container_id)                                Filter: ((active <> 0::numeric) AND (t2.domain_id = domain_id))                    ->  Index Scan using idx_location_domain_id on location t1  (cost=0.28..18.55 rows=8 width=35) (actual time=0.065..3.768 rows=934 loops=1)                          Index Cond: (domain_id = t3.domain_id)                          Filter: ((license IS NOT NULL) AND (active <> 0::numeric) AND ((license_end_date IS NULL) OR (license_end_date >= ('now'::cstring)::date)))                          Rows Removed by Filter: 297              ->  Index Scan using idx_resource_location_domain_id on resource_location t4  (cost=0.28..27.63 rows=1 width=21) (actual time=0.532..0.849 rows=1 loops=934)                    Index Cond: (domain_id = t1.domain_id)                    Filter: ((active <> 0::numeric) AND (t1.id = location_id))                    Rows Removed by Filter: 1003              SubPlan 1                ->  Function Scan on parentcontainers  (cost=0.25..500.25 rows=1000 width=32) (actual time=0.253..0.253 rows=2 loops=743)Total runtime: 990.045 ms
(26 rows)


On test box:
----------------------
Postgres 9.0.2
Intel(R) Xeon(R) CPU E5345 (2.33 Ghz 4-core)
8 GB RAM
6 x SAS 10K RAID 10
----------------------

explain analyze select distinct on (t1.id) t1.id, t1.hostname as name, t1.active, t1.domain_id, t1.base, t1.port, t1.inter_domain_flag from location t1, host t2, container t3, resource_location t4 where t2.id = 34725278 and t3.id = t2.container_id and t4.location_id = t1.id and t4.parent_id in (select * from parentContainers(t3.id)) and t1.license is not null and (t1.license_end_date is null or t1.license_end_date >= current_date) and t1.active <> 0 and t3.active <> 0 and t4.active <> 0 and t1.domain_id = t2.domain_id and t2.domain_id = t3.domain_id and t3.domain_id = t4.domain_id and (0 = 0 or t1.active <> 0);                                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Unique  (cost=389.96..389.97 rows=1 width=1192) (actual time=217.479..217.480 rows=1 loops=1)  ->  Sort  (cost=389.96..389.97 rows=1 width=1192) (actual time=217.477..217.477 rows=1 loops=1)        Sort Key: t1.id        Sort Method:  quicksort  Memory: 25kB        ->  Nested Loop  (cost=9.28..389.95 rows=1 width=1192) (actual time=103.359..217.437 rows=1 loops=1)              Join Filter: ((t1.domain_id = t3.domain_id) AND (SubPlan 1))              ->  Nested Loop  (cost=9.28..130.66 rows=1 width=1320) (actual time=18.494..29.577 rows=744 loops=1)                    Join Filter: (t2.domain_id = t1.domain_id)                    ->  Nested Loop  (cost=9.28..49.44 rows=12 width=160) (actual time=18.434..21.279 rows=1000 loops=1)                          ->  Index Scan using host_pkey on host t2  (cost=0.00..7.26 rows=1 width=64) (actual time=0.054..0.055 rows=1 loops=1)                                Index Cond: (id = 34725278::numeric)                          ->  Bitmap Heap Scan on resource_location t4  (cost=9.28..36.15 rows=12 width=96) (actual time=18.370..20.638 rows=1000 loops=1)                                Recheck Cond: (t4.domain_id = t2.domain_id)                                Filter: (t4.active <> 0::numeric)                                ->  Bitmap Index Scan on idx_resource_location_domain_id  (cost=0.00..9.28 rows=12 width=0) (actual time=10.377..10.377 rows=1004 loops=1)                                      Index Cond: (t4.domain_id = t2.domain_id)                    ->  Index Scan using location_pkey on location t1  (cost=0.00..6.26 rows=1 width=1192) (actual time=0.006..0.007 rows=1 loops=1000)                          Index Cond: (t1.id = t4.location_id)                          Filter: ((t1.license IS NOT NULL) AND (t1.active <> 0::numeric) AND ((t1.license_end_date IS NULL) OR (t1.license_end_date >= ('now'::text)::date)))              ->  Index Scan using container_pkey on container t3  (cost=0.00..7.29 rows=1 width=64) (actual time=0.005..0.006 rows=1 loops=744)                    Index Cond: (t3.id = t2.container_id)                    Filter: (t3.active <> 0::numeric)              SubPlan 1                ->  Function Scan on parentcontainers  (cost=0.25..500.25 rows=1000 width=32) (actual time=0.243..0.243 rows=2 loops=744)Total runtime: 217.735 ms

Re: Performance regressions in PG 9.3 vs PG 9.0

From
uher dslij
Date:
As a follow up to this issue on Graeme's suggestion in a private email,

I checked the statistics in both databases, and they were the same (these values as seen from pg_stat_user_tables to not seem to propagate to slave databases however).  I even ran a manual analyze on the master database in the 9.3.2 cluster, it did not affect query performance in the least bit.

We've installed all versions of postgres and tested the same query on the same data:

PG 9.0.x : 196 ms
PG 9.1.13 : 181 ms
PG 9.2.8 : 861 ms
PG 9.3.4 : 861 ms

The EXPLAINs all pretty much look like my original post.  The planner in 9.2 and above is simply not using bitmap heap scans or bitmap index scans?  What could be the reason for this?

Thanks in advance,


On Mon, Apr 7, 2014 at 2:34 PM, uher dslij <codon3@gmail.com> wrote:
Hi,

We recently upgraded from pg 9.0.5 to 9.3.2 and we are observing much higher load on our hot standbys (we have 3).  As you can see from the query plans below, we have some queries that are running 4-5 times slower now, many due to what looks like a bad plan in 9.3.  Are there any known issues with query plan regressions in 9.3?  Any ideas about how I can get back the old planning behavior with 9.3.2?

Thanks in advance for any help!

On Production System
----------------------
Postgres 9.3.2
Intel(R) Xeon(R) CPU E5649 (2.53 Ghz 6-core)
12 GB RAM
Intel 710 SSD
---------------------

explain analyze select distinct on (t1.id) t1.id, t1.hostname as name, t1.active, t1.domain_id, t1.base, t1.port, t1.inter_domain_flag from location t1, host t2, container t3, resource_location t4 where t2.id = 34725278 and t3.id = t2.container_id and t4.location_id = t1.id and t4.parent_id in (select * from parentContainers(t3.id)) and t1.license is not null and (t1.license_end_date is null or t1.license_end_date >= current_date) and t1.active <> 0 and t3.active <> 0 and t4.active <> 0 and t1.domain_id = t2.domain_id and t2.domain_id = t3.domain_id and t3.domain_id = t4.domain_id and (0 = 0 or t1.active <> 0);                                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------Unique  (cost=313.44..313.45 rows=1 width=35) (actual time=989.836..989.837 rows=1 loops=1)  ->  Sort  (cost=313.44..313.44 rows=1 width=35) (actual time=989.836..989.837 rows=1 loops=1)        Sort Key: t1.id        Sort Method: quicksort  Memory: 25kB        ->  Nested Loop  (cost=1.27..313.43 rows=1 width=35) (actual time=922.484..989.791 rows=1 loops=1)              Join Filter: (SubPlan 1)              Rows Removed by Join Filter: 742              ->  Nested Loop  (cost=0.99..33.80 rows=1 width=53) (actual time=0.174..5.168 rows=934 loops=1)                    Join Filter: (t2.domain_id = t1.domain_id)                    ->  Nested Loop  (cost=0.71..11.23 rows=1 width=18) (actual time=0.101..0.103 rows=1 loops=1)                          ->  Index Scan using host_pkey on host t2  (cost=0.29..5.29 rows=1 width=12) (actual time=0.041..0.042 rows=1 loops=1)                                Index Cond: (id = 34725278::numeric)                          ->  Index Scan using container_pkey on container t3  (cost=0.42..5.43 rows=1 width=12) (actual time=0.057..0.058 rows=1 loops=1)                                Index Cond: (id = t2.container_id)                                Filter: ((active <> 0::numeric) AND (t2.domain_id = domain_id))                    ->  Index Scan using idx_location_domain_id on location t1  (cost=0.28..18.55 rows=8 width=35) (actual time=0.065..3.768 rows=934 loops=1)                          Index Cond: (domain_id = t3.domain_id)                          Filter: ((license IS NOT NULL) AND (active <> 0::numeric) AND ((license_end_date IS NULL) OR (license_end_date >= ('now'::cstring)::date)))                          Rows Removed by Filter: 297              ->  Index Scan using idx_resource_location_domain_id on resource_location t4  (cost=0.28..27.63 rows=1 width=21) (actual time=0.532..0.849 rows=1 loops=934)                    Index Cond: (domain_id = t1.domain_id)                    Filter: ((active <> 0::numeric) AND (t1.id = location_id))                    Rows Removed by Filter: 1003              SubPlan 1                ->  Function Scan on parentcontainers  (cost=0.25..500.25 rows=1000 width=32) (actual time=0.253..0.253 rows=2 loops=743)Total runtime: 990.045 ms
(26 rows)


On test box:
----------------------
Postgres 9.0.2
Intel(R) Xeon(R) CPU E5345 (2.33 Ghz 4-core)
8 GB RAM
6 x SAS 10K RAID 10
----------------------

explain analyze select distinct on (t1.id) t1.id, t1.hostname as name, t1.active, t1.domain_id, t1.base, t1.port, t1.inter_domain_flag from location t1, host t2, container t3, resource_location t4 where t2.id = 34725278 and t3.id = t2.container_id and t4.location_id = t1.id and t4.parent_id in (select * from parentContainers(t3.id)) and t1.license is not null and (t1.license_end_date is null or t1.license_end_date >= current_date) and t1.active <> 0 and t3.active <> 0 and t4.active <> 0 and t1.domain_id = t2.domain_id and t2.domain_id = t3.domain_id and t3.domain_id = t4.domain_id and (0 = 0 or t1.active <> 0);                                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Unique  (cost=389.96..389.97 rows=1 width=1192) (actual time=217.479..217.480 rows=1 loops=1)  ->  Sort  (cost=389.96..389.97 rows=1 width=1192) (actual time=217.477..217.477 rows=1 loops=1)        Sort Key: t1.id        Sort Method:  quicksort  Memory: 25kB        ->  Nested Loop  (cost=9.28..389.95 rows=1 width=1192) (actual time=103.359..217.437 rows=1 loops=1)              Join Filter: ((t1.domain_id = t3.domain_id) AND (SubPlan 1))              ->  Nested Loop  (cost=9.28..130.66 rows=1 width=1320) (actual time=18.494..29.577 rows=744 loops=1)                    Join Filter: (t2.domain_id = t1.domain_id)                    ->  Nested Loop  (cost=9.28..49.44 rows=12 width=160) (actual time=18.434..21.279 rows=1000 loops=1)                          ->  Index Scan using host_pkey on host t2  (cost=0.00..7.26 rows=1 width=64) (actual time=0.054..0.055 rows=1 loops=1)                                Index Cond: (id = 34725278::numeric)                          ->  Bitmap Heap Scan on resource_location t4  (cost=9.28..36.15 rows=12 width=96) (actual time=18.370..20.638 rows=1000 loops=1)                                Recheck Cond: (t4.domain_id = t2.domain_id)                                Filter: (t4.active <> 0::numeric)                                ->  Bitmap Index Scan on idx_resource_location_domain_id  (cost=0.00..9.28 rows=12 width=0) (actual time=10.377..10.377 rows=1004 loops=1)                                      Index Cond: (t4.domain_id = t2.domain_id)                    ->  Index Scan using location_pkey on location t1  (cost=0.00..6.26 rows=1 width=1192) (actual time=0.006..0.007 rows=1 loops=1000)                          Index Cond: (t1.id = t4.location_id)                          Filter: ((t1.license IS NOT NULL) AND (t1.active <> 0::numeric) AND ((t1.license_end_date IS NULL) OR (t1.license_end_date >= ('now'::text)::date)))              ->  Index Scan using container_pkey on container t3  (cost=0.00..7.29 rows=1 width=64) (actual time=0.005..0.006 rows=1 loops=744)                    Index Cond: (t3.id = t2.container_id)                    Filter: (t3.active <> 0::numeric)              SubPlan 1                ->  Function Scan on parentcontainers  (cost=0.25..500.25 rows=1000 width=32) (actual time=0.243..0.243 rows=2 loops=744)Total runtime: 217.735 ms

Re: Performance regressions in PG 9.3 vs PG 9.0

From
Tom Lane
Date:
uher dslij <codon3@gmail.com> writes:
> The EXPLAINs all pretty much look like my original post.  The planner in
> 9.2 and above is simply not using bitmap heap scans or bitmap index scans?
>  What could be the reason for this?

I don't see any reason to think this is a planner regression.  The
rowcount estimates are pretty far off in both versions; so it's just a
matter of luck that 9.0 is choosing a better join order than 9.3.

I'd try cranking up the statistics targets for the join columns
(particularly domain_id) and see if that leads to better estimates.

            regards, tom lane


Re: Performance regressions in PG 9.3 vs PG 9.0

From
uher dslij
Date:
Thanks for your reply Tom.  I've found that the culprit is the function parentContainers(), which recurses in a folder structure and looks like this:

create function parentContainers(numeric) returns setof numeric
as '
    select parentContainers( (select container_id from container where id = $1 ) )
union
select id from container where id = $1
        ' language sql stable returns null on null input;


Is is declared stable, but I know that is just planner hint, so it doesn't guarantee that it will only get called once.  If I replace the function call with the two values this function returns, 


On Tue, Apr 8, 2014 at 5:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
uher dslij <codon3@gmail.com> writes:
> The EXPLAINs all pretty much look like my original post.  The planner in
> 9.2 and above is simply not using bitmap heap scans or bitmap index scans?
>  What could be the reason for this?

I don't see any reason to think this is a planner regression.  The
rowcount estimates are pretty far off in both versions; so it's just a
matter of luck that 9.0 is choosing a better join order than 9.3.

I'd try cranking up the statistics targets for the join columns
(particularly domain_id) and see if that leads to better estimates.

                        regards, tom lane

Re: Performance regressions in PG 9.3 vs PG 9.0

From
uher dslij
Date:
Sorry for the premature send on that last email.  Here is the full one:

Thanks for your reply Tom.  I've found that the culprit is the function parentContainers(), which recurses up a folder structure and looks like this:

create function parentContainers(numeric) returns setof numeric
as '
    select parentContainers( (select container_id from container where id = $1 ) )
union
    select id from container where id = $1
' language sql stable returns null on null input;

It is declared stable, but I know that STABLE is just planner hint, so it doesn't guarantee that it will only get called once.  If I replace the function call with the two values this function returns, I get < 1 ms runtime on all versions of pg.  So there is data to support the statement that we were relying on planner luck before and that luck has run out.

What is the best practice to ensure a stable function only gets called once?  Should I use a CTE to cache the result?  Is there a better way?

Thanks in advance,


On Tue, Apr 8, 2014 at 5:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
uher dslij <codon3@gmail.com> writes:
> The EXPLAINs all pretty much look like my original post.  The planner in
> 9.2 and above is simply not using bitmap heap scans or bitmap index scans?
>  What could be the reason for this?

I don't see any reason to think this is a planner regression.  The
rowcount estimates are pretty far off in both versions; so it's just a
matter of luck that 9.0 is choosing a better join order than 9.3.

I'd try cranking up the statistics targets for the join columns
(particularly domain_id) and see if that leads to better estimates.

                        regards, tom lane

Re: Performance regressions in PG 9.3 vs PG 9.0

From
Tom Lane
Date:
uher dslij <codon3@gmail.com> writes:
> Thanks for your reply Tom.  I've found that the culprit is the function
> parentContainers(), which recurses up a folder structure and looks like
> this:

Hmm ... I had noticed the execution of that in a subplan, but it appeared
that the subplan was being done the same number of times and took about
the same amount of time in both 9.0 and 9.3, so I'd discounted it as the
source of trouble.  Still, it's hard to argue with experimental evidence.

> create function parentContainers(numeric) returns setof numeric
> as '
>     select parentContainers( (select container_id from container where id =
> $1 ) )
> union
>     select id from container where id = $1
> ' language sql stable returns null on null input;

Yeah, that looks like performance trouble waiting to happen --- it's not
clear what would bound the recursion, for one thing.  Have you considered
replacing this with a RECURSIVE UNION construct?  Wasn't there in 9.0
of course, but 9.3 can do that.

            regards, tom lane