Thread: Performance regressions found using sqlfuzz

Performance regressions found using sqlfuzz

From
"Jung, Jinho"
Date:


Hello,

We are developing a tool called sqlfuzz for automatically finding performance regressions in PostgreSQL. sqlfuzz performs mutational fuzzing to generate SQL queries that take more time to execute on the latest version of PostgreSQL compared to prior versions. We hope that these queries would help further increase the utility of the regression test suite.

We would greatly appreciate feedback from the community regarding the queries found by the tool so far. We have already incorporated prior feedback from the community in the latest version of sqlfuzz.

In this report, we are sharing four SQL queries that exhibit regressions. The queries have an average size of 245 bytes. Here’s an illustrative query:

EXAMPLE:

select distinct   
  ref_0.i_im_id as c0,
  ref_1.ol_dist_info as c1
from
  public.item as ref_0
    right join public.order_line as ref_1
    on (cast(null as "numeric") <> 1)

Time taken on PostgreSQL v9.5: 15.1 (seconds)
Time taken on PostgreSQL v11: 64.8 (seconds)
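
For reference, timings like these can be reproduced by prefixing the query with EXPLAIN ANALYZE on each server (a minimal sketch; not necessarily the exact measurement harness sqlfuzz uses):

    EXPLAIN (ANALYZE, BUFFERS)
    select distinct
      ref_0.i_im_id as c0,
      ref_1.ol_dist_info as c1
    from
      public.item as ref_0
        right join public.order_line as ref_1
        on (cast(null as "numeric") <> 1);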

Here are the steps for reproducing our observations:

[Test environment]
* Ubuntu 16.04 machine "Linux sludge 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux"
* Postgres installed via APT package manager
* Database: TPC-C benchmark (with three scale factors)

[Setup Test Environment]

1. Install PostgreSQL v11 and v9.5

    $ wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -        
    $ sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main" > /etc/apt/sources.list.d/pgdg_xenial.list'
    $ sudo apt update
    $ sudo apt-get install postgresql-11
    $ sudo apt-get install postgresql-9.5

* set the password of the postgres user (pick your own)
    $ sudo passwd postgres

* change the port numbers of the two database versions
    $ sudo vim /etc/postgresql/11/main/postgresql.conf 
     => change "port = ????" ==> "port = 5435"
    $ sudo vim /etc/postgresql/9.5/main/postgresql.conf 
     => change "port = ????" ==> "port = 5432"

* restart DB
    $ sudo pg_ctlcluster 9.5 main restart
    $ sudo pg_ctlcluster 11 main restart

    => check that ports 5432 and 5435 are now open
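
    Once you can connect, each cluster's port can also be confirmed from SQL (expected values shown as comments):
        # SHOW port;   -- 5432 on the old cluster, 5435 on the new one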

* set up privileges
    $ sudo -i -u postgres
    $ psql -p 5432 (then run the query below to set the password)
        # ALTER USER postgres PASSWORD 'mysecretpassword';
        # \q

    $ psql -p 5435 (then run the query below to set the password)
        # ALTER USER postgres PASSWORD 'mysecretpassword';
        # \q

    $ exit (return to the original user)
    $ sudo -u postgres createuser -s $(whoami); createdb $(whoami)

* test your setup by printing the DB version
    < old version >
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -c "select version();"

    < new version >
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -c "select version();"


2. Set up TPC-C test benchmark

* Download the TPC-C dumps (scale factors 1, 10, and 50) and extract them
    $ wget https://gts3.org/~/jjung/tpcc/tpcc1.tar.gz
    $ wget https://gts3.org/~/jjung/tpcc/tpcc10.tar.gz
    $ wget https://gts3.org/~/jjung/tpcc/tpcc50.tar.gz

    $ tar xzvf tpcc1.tar.gz
    $ tar xzvf tpcc10.tar.gz
    $ tar xzvf tpcc50.tar.gz

* Create DB (example of TPC-C scale factor 1)
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -c "create database test_bd;"
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -c "create database test_bd;"

* Import benchmark (example of TPC-C scale factor 1)
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -d test_bd -f ./tpcc_host.pgsql
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -d test_bd -f ./tpcc_host.pgsql
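
* (Optional) Sanity-check the import by counting rows in one of the TPC-C tables; at scale factor 1, order_line should hold roughly 300,000 rows:
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -d test_bd -c "select count(*) from order_line;"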

* (Optional) Deleting databases
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -c "drop database test_bd;"
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -c "drop database test_bd;"

3. Test SQL queries that exhibit performance regressions

We are sharing four queries in this report. We vary the scale factor of the TPC-C benchmark from 1 through 50 to demonstrate that the performance regressions are more prominent on larger databases.

* Download queries    
    $ wget https://gts3.org/~/jjung/tpcc/case.tar.gz 
    $ tar xzvf case.tar.gz 

* Execute the queries 
    $ PGPASSWORD=mysecretpassword psql -t -A -F"," -h 127.0.0.1 -p 5432 -U postgres -d test_bd -f case-1.sql
    $ PGPASSWORD=mysecretpassword psql -t -A -F"," -h 127.0.0.1 -p 5435 -U postgres -d test_bd -f case-1.sql

Here’s the time taken to execute the four SQL queries on the old (v9.5) and new (v11) versions of PostgreSQL (in milliseconds):

+----------------------+--------+---------+---------+
|                      | scale1 | scale10 | scale50 |
+----------------------+--------+---------+---------+
| Case-1 (v9.5)        |     28 |     273 |    1459 |
| Case-1 (v11)         |     90 |     854 |    4818 |
+----------------------+--------+---------+---------+
| Case-2 (v9.5)        |    229 |    2793 |   15096 |
| Case-2 (v11)         |    838 |   11276 |   64808 |
+----------------------+--------+---------+---------+
| Case-3 (v9.5)        |     28 |     248 |    1231 |
| Case-3 (v11)         |     74 |     677 |    3345 |
+----------------------+--------+---------+---------+
| Case-4 (v9.5)        |   0.03 |    0.03 |    0.04 |
| Case-4 (v11)         |   0.04 |    0.04 |     632 |
+----------------------+--------+---------+---------+

1) CASE-1 produces the same plan on both versions but shows different execution times. The regression is more pronounced on larger databases.

2) CASE-2 shows different cost estimates, which lead to the performance regression. The regression is more pronounced on larger databases.

3) CASE-3 uses a different executor strategy. The newer version (PG11.1) uses a parallel sequential scan yet runs slower. The regression is more pronounced on larger databases.

4) CASE-4 shows a performance regression only on TPC-C with scale factor 50. Instead of using an index scan, the newer version (PG11.1) applies a filter over a sequential scan, thereby increasing the time taken to execute the query.
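
To compare how the two versions execute each case, the same file can be run under EXPLAIN ANALYZE on both servers (one convenient way, assuming each .sql file holds a single statement):

    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5432 -U postgres -d test_bd -c "EXPLAIN ANALYZE $(cat case-1.sql)"
    $ PGPASSWORD=mysecretpassword psql -h 127.0.0.1 -p 5435 -U postgres -d test_bd -c "EXPLAIN ANALYZE $(cat case-1.sql)"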

We would greatly appreciate feedback from the community regarding these queries and are looking forward to improving the tool based on the community’s feedback. 

Thanks.

Jinho Jung

Re: Performance regressions found using sqlfuzz

From
Christoph Berg
Date:
Re: Jung, Jinho 2019-02-11 <BN6PR07MB3409922471073F2B619A8CA4EE640@BN6PR07MB3409.namprd07.prod.outlook.com>
> We are developing a tool called sqlfuzz for automatically finding performance regressions in PostgreSQL. sqlfuzz
> performs mutational fuzzing to generate SQL queries that take more time to execute on the latest version of PostgreSQL
> compared to prior versions. We hope that these queries would help further increase the utility of the regression test
> suite.

Hi,

is sqlfuzz available anywhere? Does it have anything to do with
sqlsmith? The query formatting looks very much like sqlsmith's.

Christoph


Re: Performance regressions found using sqlfuzz

From
Jeff Janes
Date:
On Tue, Feb 12, 2019 at 4:23 AM Jung, Jinho <jinho.jung@gatech.edu> wrote:


Hello,

We are developing a tool called sqlfuzz for automatically finding performance regressions in PostgreSQL. sqlfuzz performs mutational fuzzing to generate SQL queries that take more time to execute on the latest version of PostgreSQL compared to prior versions. We hope that these queries would help further increase the utility of the regression test suite.

We would greatly appreciate feedback from the community regarding the queries found by the tool so far. We have already incorporated prior feedback from the community in the latest version of sqlfuzz.

This approach doesn't seem very exciting to me as-is, because optimization is a very pragmatic endeavor.  We make decisions all the time that might make some queries better and others worse.  If the queries that get better are natural/common ones, and the ones that get worse are weird/uncommon ones (like generated by a fuzzer), then making that change is an improvement even if there are some performance (as opposed to correctness) regressions. 

I would be more interested in investigating some of these if the report would:

1) include the exact commit in which the regression was introduced (i.e. automate "git bisect").
2) verify that the regression still exists in the dev HEAD and report which commit it was verified in (since HEAD changes frequently).
3) report which queries (if any) in your corpus were made better by the same commit which made the victim query worse.

Cheers,

Jeff

Re: Performance regressions found using sqlfuzz

From
"Jung, Jinho"
Date:

Hi Jeff, 


Thanks for the feedback! The git bisect idea was particularly helpful.
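
For reference, "git bisect run" can automate this end-to-end with a wrapper script that builds the server, times the victim query, and exits non-zero on a slowdown (a sketch; check_regression.sh is a hypothetical script):

    $ git bisect start <bad-commit> <good-commit>
    $ git bisect run ./check_regression.sh   # build, time the query, exit 1 if slower than a threshold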


We use query complexity constraints in sqlfuzz to ensure that the constructed queries are realistic (e.g., we limit the query size, automatically minimize the query, avoid esoteric expressions and functions, restrict the number of joins, etc.).


Our goal is to augment the test suite with queries that will assist developers in more comprehensively evaluating the impact of new optimization heuristics, query processing strategies, etc. We are working on improving the utility of the tool, and your feedback on these reports will be super helpful. Thanks.


For each regression, we share:


1) the associated query,

2) the commit that activated it,

3) our high-level analysis, and

4) query execution plans in old and new versions of PostgreSQL.


All these regressions are observed on the latest version (dev HEAD).


####### QUERY 2: 


  select distinct   
    ref_0.i_im_id as c0,
    ref_1.ol_dist_info as c1
  from 
      public.item as ref_0 right join 
      public.order_line as ref_1
      on (ref_0.i_id = 5)

- Commit: 84f9a35 (Improve estimate of distinct values in estimate_num_groups())

- Our analysis: We believe that this regression is related to the new logic for estimating the number of distinct values in the optimizer. This is affecting even queries with point lookups (ref_0.i_id = 5) in the TPC-C benchmark. 

- Query Execution Plans

  [OLD version]
  HashAggregate  (cost=11972.38..12266.20 rows=29382 width=29) (actual time=233.543..324.973 rows=300144 loops=1)
    Group Key: ref_0.i_im_id, ref_1.ol_dist_info
    ->  Nested Loop Left Join  (cost=0.29..10471.64 rows=300148 width=29) (actual time=0.012..114.955 rows=300148 loops=1)
          ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=25) (actual time=0.004..25.061 rows=300148 loops=1)
          ->  Materialize  (cost=0.29..8.32 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=300148)
                ->  Index Scan using item_pkey on item ref_0  (cost=0.29..8.31 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
                      Index Cond: (i_id = 10)
  Planning time: 0.267 ms
  Execution time: 338.027 ms


  [NEW version]
  Unique  (cost=44960.08..47211.19 rows=300148 width=29) (actual time=646.545..885.502 rows=300144 loops=1)
    ->  Sort  (cost=44960.08..45710.45 rows=300148 width=29) (actual time=646.544..838.933 rows=300148 loops=1)
          Sort Key: ref_0.i_im_id, ref_1.ol_dist_info
          Sort Method: external merge  Disk: 11480kB
          ->  Nested Loop Left Join  (cost=0.29..10471.64 rows=300148 width=29) (actual time=0.016..111.889 rows=300148 loops=1)
                ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=25) (actual time=0.004..24.612 rows=300148 loops=1)
                ->  Materialize  (cost=0.29..8.32 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=300148)
                      ->  Index Scan using item_pkey on item ref_0  (cost=0.29..8.31 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
                            Index Cond: (i_id = 10)
  Planning Time: 0.341 ms
  Execution Time: 896.662 ms

####### QUERY 3: 


  select  
    cast(ref_1.ol_i_id as int4) as c0
  from 
    public.stock as ref_0
      left join public.order_line as ref_1
      on (ref_1.ol_number is not null)
  where ref_1.ol_number is null

- Commit: 77cd477 (Enable parallel query by default.)

- Our analysis: We believe that this regression is due to parallel queries being enabled by default. Surprisingly, we found that even on a larger TPC-C database (scale factor of 50, roughly 4 GB), the parallel scan is still slower than the non-parallel scan of the old version when the query returns no tuples.

- Query Execution Plans

  [OLD version]
  Nested Loop Anti Join  (cost=0.00..18006.81 rows=1 width=4) (actual time=28.689..28.689 rows=0 loops=1)
    ->  Seq Scan on stock ref_0  (cost=0.00..5340.00 rows=100000 width=0) (actual time=0.028..15.722 rows=100000 loops=1)
    ->  Materialize  (cost=0.00..9385.22 rows=300148 width=4) (actual time=0.000..0.000 rows=1 loops=100000)
          ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=4) (actual time=0.004..0.004 rows=1 loops=1)
                Filter: (ol_number IS NOT NULL)
  Planning time: 0.198 ms
  Execution time: 28.745 ms

  [NEW version]
  Gather  (cost=1000.00..15164.93 rows=1 width=4) (actual time=91.022..92.634 rows=0 loops=1)
    Workers Planned: 2
    Workers Launched: 2
    ->  Nested Loop Anti Join  (cost=0.00..14164.83 rows=1 width=4) (actual time=88.889..88.889 rows=0 loops=3)
          ->  Parallel Seq Scan on stock ref_0  (cost=0.00..4756.67 rows=41667 width=0) (actual time=0.025..7.331 rows=33333 loops=3)
          ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=4) (actual time=0.002..0.002 rows=1 loops=100000)
                Filter: (ol_number IS NOT NULL)
  Planning Time: 0.258 ms
  Execution Time: 92.699 ms
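
A quick way to test this attribution is to disable parallelism for the session and re-run the query; if the commit is responsible, the serial plan from the old version should come back (a sketch, assuming otherwise-default planner settings):

    set max_parallel_workers_per_gather = 0;
    explain analyze
    select cast(ref_1.ol_i_id as int4) as c0
    from public.stock as ref_0
      left join public.order_line as ref_1
      on (ref_1.ol_number is not null)
    where ref_1.ol_number is null;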

####### QUERY 4: 


  select
    ref_0.s_dist_06 as c0
  from
    public.stock as ref_0
  where (ref_0.s_w_id < cast(least(0, 1) as int8))

- Commit: 5edc63b (Account for the effect of lossy pages when costing bitmap scans)

- Our analysis: We believe that this regression has to do with two factors: 1) conditional expressions (e.g., LEAST or NULLIF) are not reduced to constants, unlike string functions (e.g., CHAR_LENGTH); and 2) a change in the cost estimation function for bitmap scans. Execution time grows by three orders of magnitude. We note that this regression is only observed on large databases (e.g., scale factor of 50).

- Query Execution Plans

  [OLD version]
  Bitmap Heap Scan on stock ref_0  (cost=31201.11..273173.13 rows=1666668 width=25) (actual time=0.005..0.005 rows=0 loops=1)
    Recheck Cond: (s_w_id < (LEAST(0, 1))::bigint)
    ->  Bitmap Index Scan on stock_pkey  (cost=0.00..30784.44 rows=1666668 width=0) (actual time=0.005..0.005 rows=0 loops=1)
          Index Cond: (s_w_id < (LEAST(0, 1))::bigint)
  Planning time: 0.228 ms
  Execution time: 0.107 ms

  [NEW version]
  Seq Scan on stock ref_0  (cost=0.00..304469.17 rows=1666613 width=25) (actual time=716.397..716.397 rows=0 loops=1)
    Filter: (s_w_id < (LEAST(0, 1))::bigint)
    Rows Removed by Filter: 5000000
  Planning Time: 0.221 ms
  Execution Time: 716.513 ms
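
The const-folding half of this analysis can be checked by folding LEAST(0, 1) by hand; with a plain constant the planner gets an accurate selectivity estimate and should again pick the index-based plan (a sketch):

    explain analyze
    select ref_0.s_dist_06 as c0
    from public.stock as ref_0
    where ref_0.s_w_id < cast(0 as int8);  -- least(0, 1) replaced by its value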

####### QUERY 1: 


  select  
    ref_0.o_d_id as c0
  from 
    public.oorder as ref_0
  where EXISTS (
    select              
      1
    from 
      (select distinct 
          ref_0.o_entry_d as c0, 
          ref_1.c_credit as c1
       from 
          public.customer as ref_1
       where (false)
      ) as subq_1
  );

- Commit: bf6c614 (Do execGrouping.c via expression eval machinery, take two)

- Our analysis: We are not sure about the root cause of this regression. This might have to do with grouping logic. 

- Query Execution Plans

  [OLD version]
  Seq Scan on oorder ref_0  (cost=0.00..77184338.54 rows=15022 width=4) (actual time=34.173..34.173 rows=0 loops=1)
    Filter: (SubPlan 1)
    Rows Removed by Filter: 30044
    SubPlan 1
      ->  Subquery Scan on subq_1  (cost=2569.01..2569.03 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=30044)
            ->  HashAggregate  (cost=2569.01..2569.02 rows=1 width=3) (actual time=0.000..0.000 rows=0 loops=30044)
                  Group Key: ref_0.o_entry_d, ref_1.c_credit
                  ->  Result  (cost=0.00..2569.00 rows=1 width=3) (actual time=0.000..0.000 rows=0 loops=30044)
                        One-Time Filter: false
                        ->  Seq Scan on customer ref_1  (cost=0.00..2569.00 rows=1 width=3) (never executed)
  Planning time: 0.325 ms
  Execution time: 34.234 ms

  [NEW version]
  Seq Scan on oorder ref_0  (cost=0.00..1152.32 rows=15022 width=4) (actual time=74.799..74.799 rows=0 loops=1)
    Filter: (SubPlan 1)
    Rows Removed by Filter: 30044
    SubPlan 1
      ->  Subquery Scan on subq_1  (cost=0.00..0.02 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=30044)
            ->  HashAggregate  (cost=0.00..0.01 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=30044)
                  Group Key: ref_0.o_entry_d, c_credit
                  ->  Result  (cost=0.00..0.00 rows=0 width=20) (actual time=0.000..0.000 rows=0 loops=30044)
                        One-Time Filter: false
  Planning Time: 0.350 ms
  Execution Time: 79.237 ms


From: Jeff Janes <jeff.janes@gmail.com>
Sent: Tuesday, February 12, 2019 1:03 PM
To: Jung, Jinho
Cc: pgsql-performance@postgresql.org
Subject: Re: Performance regressions found using sqlfuzz
 
On Tue, Feb 12, 2019 at 4:23 AM Jung, Jinho <jinho.jung@gatech.edu> wrote:


Hello,

We are developing a tool called sqlfuzz for automatically finding performance regressions in PostgreSQL. sqlfuzz performs mutational fuzzing to generate SQL queries that take more time to execute on the latest version of PostgreSQL compared to prior versions. We hope that these queries would help further increase the utility of the regression test suite.

We would greatly appreciate feedback from the community regarding the queries found by the tool so far. We have already incorporated prior feedback from the community in the latest version of sqlfuzz.

This approach doesn't seem very exciting to me as-is, because optimization is a very pragmatic endeavor.  We make decisions all the time that might make some queries better and others worse.  If the queries that get better are natural/common ones, and the ones that get worse are weird/uncommon ones (like generated by a fuzzer), then making that change is an improvement even if there are some performance (as opposed to correctness) regressions. 

I would be more interested in investigating some of these if the report would:

1) include the exact commit in which the regression was introduced (i.e. automate "git bisect").
2) verify that the regression still exists in the dev HEAD and report which commit it was verified in (since HEAD changes frequently).
3) report which queries (if any) in your corpus were made better by the same commit which made the victim query worse.

Cheers,

Jeff

Re: Performance regressions found using sqlfuzz

From
Andrew Gierth
Date:
>>>>> "Jung" == Jung, Jinho <jinho.jung@gatech.edu> writes:

 Jung>   select distinct
 Jung>     ref_0.i_im_id as c0,
 Jung>     ref_1.ol_dist_info as c1
 Jung>   from
 Jung>       public.item as ref_0 right join
 Jung>       public.order_line as ref_1
 Jung>       on (ref_0.i_id = 5)

 Jung> - Commit: 84f9a35 (Improve estimate of distinct values in estimate_num_groups())

 Jung> - Our analysis: We believe that this regression is related to the
 Jung> new logic for estimating the number of distinct values in the
 Jung> optimizer. This is affecting even queries with point lookups
 Jung> (ref_0.i_id = 5) in the TPC-C benchmark.

So what's happening here is that the old plan was mis-estimating the
result, believed incorrectly that it would fit into work_mem, and
generated a hashaggregate plan accordingly; it ran fast because
hashaggregate doesn't spill to disk but silently overflows work_mem.

The new plan correctly estimates the result size, and therefore is
forbidden from generating the hashaggregate plan at the default work_mem
setting; it generates a sort plan, and the sort of course spills to disk
since work_mem is exceeded.

Had the value of work_mem been set to something appropriate for the
workload, then the query plan would not have changed.
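
For example, raising work_mem for the session past the size of the distinct result lets the planner choose the hashaggregate plan again (the value below is illustrative):

    set work_mem = '256MB';
    -- re-running the reported query should now produce a HashAggregate plan
    explain analyze
    select distinct ref_0.i_im_id as c0, ref_1.ol_dist_info as c1
    from public.item as ref_0
      right join public.order_line as ref_1 on (ref_0.i_id = 5);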

So the problem (from an automated testing perspective) is that an actual
_improvement_ in the code is being reported as a regression.

 Jung> ####### QUERY 3:

 Jung>   select
 Jung>     cast(ref_1.ol_i_id as int4) as c0
 Jung>   from
 Jung>     public.stock as ref_0
 Jung>       left join public.order_line as ref_1
 Jung>       on (ref_1.ol_number is not null)
 Jung>   where ref_1.ol_number is null

 Jung> - Commit: 77cd477 (Enable parallel query by default.)

 Jung> - Our analysis: We believe that this regression is due to
 Jung> parallel queries being enabled by default. Surprisingly, we found
 Jung> that even on a larger TPC-C database (scale factor of 50, roughly
 Jung> 4GB), parallel scan is still slower than the non-parallel one in
 Jung> the old version, when the query is not returning any tuples.

The problem here is not actually with parallel scans as such, but rather
the omission of a Materialize node in the parallel plan, and what looks
like some rather serious mis-costing of the nestloop antijoin.

 Jung> ####### QUERY 4:

 Jung>   select
 Jung>     ref_0.s_dist_06 as c0
 Jung>   from
 Jung>     public.stock as ref_0
 Jung>   where (ref_0.s_w_id < cast(least(0, 1) as int8))

 Jung> - Commit: 5edc63b (Account for the effect of lossy pages when costing bitmap scans)

 Jung> - Our analysis: We believe that this regression has to do with
 Jung> two factors: 1) conditional expression (e.g., LEAST or NULLIF)
 Jung> are not reduced to constants unlike string functions (e.g.,
 Jung> CHAR_LENGTH) 2) change in the cost estimation function for bitmap
 Jung> scan. Execution time grows by 3 orders of magnitude. We note that
 Jung> this regression is only observed on large databases (e.g., scale
 Jung> factor of 50).

Again, this is showing up because of a large database and a small
work_mem. The bitmap scan on stock only becomes lossy if the number of
rows matched in the index is very large relative to work_mem; the lack
of plan-time evaluation of LEAST means that the planner doesn't have any
good way to estimate the selectivity, so it's taking a default estimate.
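
The default is visible in the row estimates: the un-folded comparison is guessed to match roughly a third of the 5,000,000-row table (the rows=1666668 above), while a bare constant is estimated from the column statistics:

    explain select s_dist_06 from public.stock
      where s_w_id < least(0, 1);  -- default inequality selectivity, ~1/3 of the rows
    explain select s_dist_06 from public.stock
      where s_w_id < 0;            -- estimated from statistics, ~0 rows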

 Jung> ####### QUERY 1:

 Jung>   select
 Jung>     ref_0.o_d_id as c0
 Jung>   from
 Jung>     public.oorder as ref_0
 Jung>   where EXISTS (
 Jung>     select
 Jung>       1
 Jung>     from
 Jung>       (select distinct
 Jung>           ref_0.o_entry_d as c0,
 Jung>           ref_1.c_credit as c1
 Jung>        from
 Jung>           public.customer as ref_1
 Jung>        where (false)
 Jung>       ) as subq_1
 Jung>   );

 Jung> - Commit: bf6c614 (Do execGrouping.c via expression eval machinery, take two)

 Jung> - Our analysis: We are not sure about the root cause of this
 Jung> regression. This might have to do with grouping logic.

What this query is basically exercising is how fast one can do
ExecReScan on a DISTINCT query, without also considering the performance
effects of actually doing the grouping (the constant-false qual here
means that the grouping comparison is never actually performed). An
optimization tradeoff that speeds up comparisons within a scan at the
cost of a fixed overhead for the scan will therefore make this query
slower, but it still seems a good tradeoff to make (of course it would
be even better to make the overhead per-query rather than per-scan, and
there were other issues with this commit that should have been caught at
the time).

-- 
Andrew (irc:RhodiumToad)


Re: Performance regressions found using sqlfuzz

From
Andres Freund
Date:
Hi,

On 2019-02-14 17:27:40 +0000, Jung, Jinho wrote:
> ####### QUERY 2:
> 
>   select distinct
>     ref_0.i_im_id as c0,
>     ref_1.ol_dist_info as c1
>   from
>       public.item as ref_0 right join
>       public.order_line as ref_1
>       on (ref_0.i_id = 5)
> 
> - Commit: 84f9a35 (Improve estimate of distinct values in estimate_num_groups())
> 
> - Our analysis: We believe that this regression is related to the new logic for estimating the number of distinct
> values in the optimizer. This is affecting even queries with point lookups (ref_0.i_id = 5) in the TPC-C benchmark.
> 
> - Query Execution Plans
> 
>   [OLD version]
>   HashAggregate  (cost=11972.38..12266.20 rows=29382 width=29) (actual time=233.543..324.973 rows=300144 loops=1)
>     Group Key: ref_0.i_im_id, ref_1.ol_dist_info
>     ->  Nested Loop Left Join  (cost=0.29..10471.64 rows=300148 width=29) (actual time=0.012..114.955 rows=300148 loops=1)
>           ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=25) (actual time=0.004..25.061 rows=300148 loops=1)
>           ->  Materialize  (cost=0.29..8.32 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=300148)
>                 ->  Index Scan using item_pkey on item ref_0  (cost=0.29..8.31 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
>                       Index Cond: (i_id = 10)
>   Planning time: 0.267 ms
>   Execution time: 338.027 ms
> 
> 
>   [NEW version]
>   Unique  (cost=44960.08..47211.19 rows=300148 width=29) (actual time=646.545..885.502 rows=300144 loops=1)
>     ->  Sort  (cost=44960.08..45710.45 rows=300148 width=29) (actual time=646.544..838.933 rows=300148 loops=1)
>           Sort Key: ref_0.i_im_id, ref_1.ol_dist_info
>           Sort Method: external merge  Disk: 11480kB
>           ->  Nested Loop Left Join  (cost=0.29..10471.64 rows=300148 width=29) (actual time=0.016..111.889 rows=300148 loops=1)
>                 ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=25) (actual time=0.004..24.612 rows=300148 loops=1)
>                 ->  Materialize  (cost=0.29..8.32 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=300148)
>                       ->  Index Scan using item_pkey on item ref_0  (cost=0.29..8.31 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
>                             Index Cond: (i_id = 10)
>   Planning Time: 0.341 ms
>   Execution Time: 896.662 ms

This seems perfectly alright - the old version used more memory than
work_mem actually should have allowed. I'd bet you get the performance
back if you set work_mem to a bigger value.


> ####### QUERY 3:
> 
>   select
>     cast(ref_1.ol_i_id as int4) as c0
>   from
>     public.stock as ref_0
>       left join public.order_line as ref_1
>       on (ref_1.ol_number is not null)
>   where ref_1.ol_number is null
> 
> - Commit: 77cd477 (Enable parallel query by default.)
> 
> - Our analysis: We believe that this regression is due to parallel queries being enabled by default. Surprisingly, we
> found that even on a larger TPC-C database (scale factor of 50, roughly 4GB), parallel scan is still slower than the
> non-parallel one in the old version, when the query is not returning any tuples.
> 
> - Query Execution Plans
> 
>   [OLD version]
>   Nested Loop Anti Join  (cost=0.00..18006.81 rows=1 width=4) (actual time=28.689..28.689 rows=0 loops=1)
>     ->  Seq Scan on stock ref_0  (cost=0.00..5340.00 rows=100000 width=0) (actual time=0.028..15.722 rows=100000 loops=1)
>     ->  Materialize  (cost=0.00..9385.22 rows=300148 width=4) (actual time=0.000..0.000 rows=1 loops=100000)
>           ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=4) (actual time=0.004..0.004 rows=1 loops=1)
>                 Filter: (ol_number IS NOT NULL)
>   Planning time: 0.198 ms
>   Execution time: 28.745 ms
> 
>   [NEW version]
>   Gather  (cost=1000.00..15164.93 rows=1 width=4) (actual time=91.022..92.634 rows=0 loops=1)
>     Workers Planned: 2
>     Workers Launched: 2
>     ->  Nested Loop Anti Join  (cost=0.00..14164.83 rows=1 width=4) (actual time=88.889..88.889 rows=0 loops=3)
>           ->  Parallel Seq Scan on stock ref_0  (cost=0.00..4756.67 rows=41667 width=0) (actual time=0.025..7.331 rows=33333 loops=3)
>           ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=4) (actual time=0.002..0.002 rows=1 loops=100000)
>                 Filter: (ol_number IS NOT NULL)
>   Planning Time: 0.258 ms
>   Execution Time: 92.699 ms

I'm not particularly bothered - this is a pretty small difference. Most
of the time here is likely spent starting the workers, the cost of which
is hard to predict/model accurately.


> ####### QUERY 4:
> 
>   select
>     ref_0.s_dist_06 as c0
>   from
>     public.stock as ref_0
>   where (ref_0.s_w_id < cast(least(0, 1) as int8))
> 
> - Commit: 5edc63b (Account for the effect of lossy pages when costing bitmap scans)
> 
> - Our analysis: We believe that this regression has to do with two factors: 1) conditional expression (e.g., LEAST or
> NULLIF) are not reduced to constants unlike string functions (e.g., CHAR_LENGTH)  2) change in the cost estimation
> function for bitmap scan. Execution time grows by 3 orders of magnitude.  We note that this regression is only observed
> on large databases (e.g., scale factor of 50).
> 
> - Query Execution Plans
> 
>   [OLD version]
>   Bitmap Heap Scan on stock ref_0  (cost=31201.11..273173.13 rows=1666668 width=25) (actual time=0.005..0.005 rows=0 loops=1)
>     Recheck Cond: (s_w_id < (LEAST(0, 1))::bigint)
>     ->  Bitmap Index Scan on stock_pkey  (cost=0.00..30784.44 rows=1666668 width=0) (actual time=0.005..0.005 rows=0 loops=1)
>           Index Cond: (s_w_id < (LEAST(0, 1))::bigint)
>   Planning time: 0.228 ms
>   Execution time: 0.107 ms
> 
>   [NEW version]
>   Seq Scan on stock ref_0  (cost=0.00..304469.17 rows=1666613 width=25) (actual time=716.397..716.397 rows=0 loops=1)
>     Filter: (s_w_id < (LEAST(0, 1))::bigint)
>     Rows Removed by Filter: 5000000
>   Planning Time: 0.221 ms
>   Execution Time: 716.513 ms


Hm. The primary problem here is that the estimation both before and
after are really bad. So I don't think the commit you point out here is
really to blame.  I'm not that bothered by the query not being great,
given the weird construction with LEAST(), but we probably could fix
that pretty easily.


> ####### QUERY 1:
> 
>   select
>     ref_0.o_d_id as c0
>   from
>     public.oorder as ref_0
>   where EXISTS (
>     select
>       1
>     from
>       (select distinct
>           ref_0.o_entry_d as c0,
>           ref_1.c_credit as c1
>        from
>           public.customer as ref_1
>        where (false)
>       ) as subq_1
>   );
> 
> - Commit: bf6c614 (Do execGrouping.c via expression eval machinery, take two)
> 
> - Our analysis: We are not sure about the root cause of this regression. This might have to do with grouping logic.
> 
> - Query Execution Plans
> 
>   [OLD version]
>   Seq Scan on oorder ref_0  (cost=0.00..77184338.54 rows=15022 width=4) (actual time=34.173..34.173 rows=0 loops=1)
>     Filter: (SubPlan 1)
>     Rows Removed by Filter: 30044
>     SubPlan 1
>       ->  Subquery Scan on subq_1  (cost=2569.01..2569.03 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=30044)
>             ->  HashAggregate  (cost=2569.01..2569.02 rows=1 width=3) (actual time=0.000..0.000 rows=0 loops=30044)
>                   Group Key: ref_0.o_entry_d, ref_1.c_credit
>                   ->  Result  (cost=0.00..2569.00 rows=1 width=3) (actual time=0.000..0.000 rows=0 loops=30044)
>                         One-Time Filter: false
>                         ->  Seq Scan on customer ref_1  (cost=0.00..2569.00 rows=1 width=3) (never executed)
>   Planning time: 0.325 ms
>   Execution time: 34.234 ms
> 
>   [NEW version]
>   Seq Scan on oorder ref_0  (cost=0.00..1152.32 rows=15022 width=4) (actual time=74.799..74.799 rows=0 loops=1)
>     Filter: (SubPlan 1)
>     Rows Removed by Filter: 30044
>     SubPlan 1
>       ->  Subquery Scan on subq_1  (cost=0.00..0.02 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=30044)
>             ->  HashAggregate  (cost=0.00..0.01 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=30044)
>                   Group Key: ref_0.o_entry_d, c_credit
>                   ->  Result  (cost=0.00..0.00 rows=0 width=20) (actual time=0.000..0.000 rows=0 loops=30044)
>                         One-Time Filter: false
>   Planning Time: 0.350 ms
>   Execution Time: 79.237 ms

I think that might be fixed in the latest point release. I screwed up
and made resets of tuple hash tables (and there's 30044 of those here)
more expensive. It's fixed in the latest minor release however.


Greetings,

Andres Freund


Re: Performance regressions found using sqlfuzz

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-02-14 17:27:40 +0000, Jung, Jinho wrote:
>> - Our analysis: We believe that this regression has to do with two factors: 1) conditional expression (e.g., LEAST
>> or NULLIF) are not reduced to constants unlike string functions (e.g., CHAR_LENGTH)  2) change in the cost estimation
>> function for bitmap scan. Execution time grows by 3 orders of magnitude.  We note that this regression is only observed
>> on large databases (e.g., scale factor of 50).

> Hm. The primary problem here is that the estimation both before and
> after are really bad. So I don't think the commit you point out here is
> really to blame.  I'm not that bothered by the query not being great,
> given the weird construction with LEAST(), but we probably could fix
> that pretty easily.

We already did:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [6f19a8c41] 2018-12-30 13:42:04 -0500

    Teach eval_const_expressions to constant-fold LEAST/GREATEST expressions.

    Doing this requires an assumption that the invoked btree comparison
    function is immutable.  We could check that explicitly, but in other
    places such as contain_mutable_functions we just assume that it's true,
    so we may as well do likewise here.  (If the comparison function's
    behavior isn't immutable, the sort order in indexes built with it would
    be unstable, so it seems certainly wrong for it not to be so.)

    Vik Fearing

    Discussion: https://postgr.es/m/c6e8504c-4c43-35fa-6c8f-3c0b80a912cc@2ndquadrant.com

BTW, const-folding NULLIF would not be a similarly tiny fix, because
it would need to check for immutability of the underlying operator
(since it is possibly a cross-type comparison, we can't get
away with just assuming it's immutable).  I'm not convinced that
case is worth carrying extra code for.

            regards, tom lane


Re: Performance regressions found using sqlfuzz

From
"Jung, Jinho"
Date:



Andres, Andrew, and Tom:

Thanks for your insightful comments! We conducted additional analysis based on them and would like to share the results. We would also like to get your feedback on a few design decisions to increase the utility of our performance regression reports.

####### QUERY 2:

As Andres and Andrew correctly pointed out, this regression is not observed after we increase the work_mem parameter.

  * work_mem = 256MB
  * shared_buffers = 1024MB
  * temp_buffers = 64MB

We wanted to check if these settings look good to you. If so, we will use them for validating the regressions identified by sqlfuzz.
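
For per-session validation, work_mem and temp_buffers can be set on the fly, while shared_buffers must be set in postgresql.conf and requires a server restart:

    set work_mem = '256MB';
    set temp_buffers = '64MB';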

####### QUERY 3:

Surprisingly, the performance impact of this regression is more prominent on larger databases. We concur with Andrew that this might be related to the lack of a Materialize node and the mis-costing of the Nested Loop Anti-Join.

When we increased the scale factor of TPC-C to 300 (~30 GB), this query ran three times slower on v11 (24 seconds) than on v9.5 (7 seconds). We also found more than 15 regressions related to the same commit and share a couple of them below.

Commit: 77cd477 (Enable parallel query by default.)

Summary: Execution Time (milliseconds)

+----------------------+--------+---------+---------+-----------+
|                      | scale1 | scale10 | scale50 | scale 300 |
+----------------------+--------+---------+---------+-----------+
| Case-3 (v9.5)        |     28 |     248 |    1231 |      7265 |
| Case-3 (v11)         |     74 |     677 |    3345 |     24581 |
+----------------------+--------+---------+---------+-----------+
| Case-3A (v9.5)       |     88 |     937 |    4721 |     27241 |
| Case-3A (v11)        |    288 |    2822 |   13838 |     85081 |
+----------------------+--------+---------+---------+-----------+
| Case-3B (v9.5)       |    101 |     934 |    4824 |     29363 |
| Case-3B (v11)        |    200 |    2331 |   12327 |     74110 |
+----------------------+--------+---------+---------+-----------+


###### QUERY 3A:

select  
  ref_0.ol_delivery_d as c1
from
  public.order_line as ref_0
where EXISTS (
  select  
    ref_1.i_im_id as c0      
    from
    public.item as ref_1
    where ref_0.ol_d_id <=  ref_1.i_im_id
  )

 Execution plan:

  [old]
Nested Loop Semi Join  (cost=0.00..90020417940.08 rows=30005835 width=8) (actual time=0.034..24981.895 rows=90017507 loops=1)
  Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504 width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2771.00 rows=100000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.006..0.006 rows=1 loops=1)
Planning time: 0.290 ms
Execution time: 27241.239 ms

  [new]
Gather  (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual time=0.265..82355.289 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..88044485915.32 rows=12502431 width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)
        Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93 rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.001..0.001 rows=1 loops=90017507)
Planning Time: 0.319 ms
Execution Time: 85081.158 ms

###### QUERY 3B:

select  
  ref_0.ol_i_id as c0
from
  public.order_line as ref_0
where EXISTS (
  select  
    ref_0.ol_delivery_d as c0
    from
    public.order_line as ref_1
    where ref_1.ol_d_id <= cast(nullif(ref_1.ol_o_id, ref_0.ol_i_id) as int4))

 Execution plan:

[old]
Nested Loop Semi Join  (cost=0.00..115638730740936.53 rows=30005835 width=4) (actual time=0.017..27009.302 rows=90017507 loops=1)
  Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
  Rows Removed by Join Filter: 11557
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504 width=4) (actual time=0.009..7199.540 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2813221.56 rows=90017504 width=8) (actual time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.04 rows=90017504 width=8) (actual time=0.001..0.002 rows=14 loops=1)
Planning time: 0.252 ms
Execution time: 29363.737 ms

[new]
Gather  (cost=1000.00..84060490326155.39 rows=30005835 width=4) (actual time=0.272..71712.491 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..84060487324571.89 rows=12502431 width=4) (actual time=0.046..60153.472 rows=30005836 loops=3)
        Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
        Rows Removed by Join Filter: 1717
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93 rows=37507293 width=4) (actual time=0.023..2819.361 rows=30005836 loops=3)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.04 rows=90017504 width=8) (actual time=0.001..0.001 rows=1 loops=90017507)
Planning Time: 0.334 ms
Execution Time: 74110.942 ms

####### QUERY 4:

As Tom pointed out, this regression is no longer present in the dev HEAD. We will not report NULLIF-related regressions in the future.

####### QUERY 1:

This regression is also not present in the dev HEAD. We will validate our regressions against the dev HEAD in future reports.

Best, 

Jinho Jung



From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Saturday, February 16, 2019 5:37:49 PM
To: Andres Freund
Cc: Jung, Jinho; Jeff Janes; pgsql-performance@postgresql.org
Subject: Re: Performance regressions found using sqlfuzz
 
Andres Freund <andres@anarazel.de> writes:
> On 2019-02-14 17:27:40 +0000, Jung, Jinho wrote:
>> - Our analysis: We believe that this regression has to do with two factors: 1) conditional expression (e.g., LEAST or NULLIF) are not reduced to constants unlike string functions (e.g., CHAR_LENGTH)  2) change in the cost estimation function for bitmap scan. Execution time grows by 3 orders of magnitude.  We note that this regression is only observed on large databases (e.g., scale factor of 50).

> Hm. The primary problem here is that the estimation both before and
> after are really bad. So I don't think the commit you point out here is
> really to blame.  I'm not that bothered by the query not being great,
> given the weird construction with LEAST(), but we probably could fix
> that pretty easily.

We already did:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [6f19a8c41] 2018-12-30 13:42:04 -0500

    Teach eval_const_expressions to constant-fold LEAST/GREATEST expressions.
   
    Doing this requires an assumption that the invoked btree comparison
    function is immutable.  We could check that explicitly, but in other
    places such as contain_mutable_functions we just assume that it's true,
    so we may as well do likewise here.  (If the comparison function's
    behavior isn't immutable, the sort order in indexes built with it would
    be unstable, so it seems certainly wrong for it not to be so.)
   
    Vik Fearing
   
    Discussion: https://postgr.es/m/c6e8504c-4c43-35fa-6c8f-3c0b80a912cc@2ndquadrant.com

BTW, const-folding NULLIF would not be a similarly tiny fix, because
it would need to check for immutability of the underlying operator
(since it is possibly a cross-type comparison, we can't get
away with just assuming it's immutable).  I'm not convinced that
case is worth carrying extra code for.

                        regards, tom lane

Re: Performance regressions found using sqlfuzz

From
"Jung, Jinho"
Date:


Hi Andres:

Could you please share your thoughts on QUERY 3?

The performance impact of this regression increases *linearly* on larger databases. We concur with Andrew that this is related to the lack of a Materialize node and the mis-costing of the Nested Loop Anti-Join.

We found more than 20 regressions related to this commit. We have shared two illustrative examples (QUERIES 3A and 3B) below.

- Commit: 77cd477 (Enable parallel query by default.)

- Summary: Execution Time (milliseconds)

When we increased the scale factor of TPC-C to 300 (~30 GB), this query ran three times slower on v11 (24 seconds) than on v9.5 (7 seconds). We also found more than 15 regressions related to the same commit and share a couple of them below.

+-----------------------+--------+---------+---------+-----------+
|                       | scale1 | scale10 | scale50 | scale 300 |
+-----------------------+--------+---------+---------+-----------+
| Query 3 (v9.5)        |     28 |     248 |    1231 |      7265 |
| Query 3 (v11)         |     74 |     677 |    3345 |     24581 |
+-----------------------+--------+---------+---------+-----------+
| Query 3A (v9.5)       |     88 |     937 |    4721 |     27241 |
| Query 3A (v11)        |    288 |    2822 |   13838 |     85081 |
+-----------------------+--------+---------+---------+-----------+
| Query 3B (v9.5)       |    101 |     934 |    4824 |     29363 |
| Query 3B (v11)        |    200 |    2331 |   12327 |     74110 |
+-----------------------+--------+---------+---------+-----------+


###### QUERY 3:

select
     cast(ref_1.ol_i_id as int4) as c0
from
     public.stock as ref_0
       left join public.order_line as ref_1
       on (ref_1.ol_number is not null)
where ref_1.ol_number is null


###### QUERY 3A:

select
  ref_0.ol_delivery_d as c1
from
  public.order_line as ref_0
where EXISTS (
  select
    ref_1.i_im_id as c0
    from
    public.item as ref_1
    where ref_0.ol_d_id <=  ref_1.i_im_id
  )

 Execution plan:

[OLD version]
Nested Loop Semi Join  (cost=0.00..90020417940.08 rows=30005835 width=8) (actual time=0.034..24981.895 rows=90017507 loops=1)
  Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504 width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2771.00 rows=100000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.006..0.006 rows=1 loops=1)

Planning time: 0.290 ms
Execution time: 27241.239 ms

[NEW version]
Gather  (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual time=0.265..82355.289 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..88044485915.32 rows=12502431 width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)
        Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93 rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.319 ms
Execution Time: 85081.158 ms


###### QUERY 3B:


select
  ref_0.ol_i_id as c0
from
  public.order_line as ref_0
where EXISTS (
  select
    ref_0.ol_delivery_d as c0
    from
    public.order_line as ref_1
    where ref_1.ol_d_id <= cast(nullif(ref_1.ol_o_id, ref_0.ol_i_id) as int4))

Execution plan:

[OLD version]
Nested Loop Semi Join  (cost=0.00..115638730740936.53 rows=30005835 width=4) (actual time=0.017..27009.302 rows=90017507 loops=1)
  Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
  Rows Removed by Join Filter: 11557
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504 width=4) (actual time=0.009..7199.540 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2813221.56 rows=90017504 width=8) (actual time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.04 rows=90017504 width=8) (actual time=0.001..0.002 rows=14 loops=1)

Planning time: 0.252 ms
Execution time: 29363.737 ms

[NEW version]
Gather  (cost=1000.00..84060490326155.39 rows=30005835 width=4) (actual time=0.272..71712.491 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..84060487324571.89 rows=12502431 width=4) (actual time=0.046..60153.472 rows=30005836 loops=3)
        Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
        Rows Removed by Join Filter: 1717
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93 rows=37507293 width=4) (actual time=0.023..2819.361 rows=30005836 loops=3)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.04 rows=90017504 width=8) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.334 ms
Execution Time: 74110.942 ms