Home > mailing lists

Thread: Different execution plan between PostgreSQL 8.4 and 12.11

Different execution plan between PostgreSQL 8.4 and 12.11

From

gzh

Date:

09 October 2022, 09:11:19

Hi,

I have had a Perl Website working for 7 years and have had no problems

until a few weeks ago I replaced my database server with a newer one.

Database server (old): PostgreSQL 8.4 32bit

Database server (new): PostgreSQL 12.11 64bit

I run following sql in PostgreSQL 8.4 and PostgreSQL 12.11, it returns different execution plan.

--PostgreSQL 8.4

---------------

old=# select count(1) from analyze_word_reports;

count

---------

9164136

(1 row)

old=# select indexdef from pg_indexes where tablename='analyze_word_reports';

indexdef

-------------------------------------------------------------------------------------------

CREATE INDEX analyze_word_reports_index_cseid ON analyze_word_reports USING btree (cseid)

CREATE UNIQUE INDEX analyze_word_reports_pkey ON analyze_word_reports USING btree (seq)

(2 rows)

old=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=0.00..0.63 rows=1 width=0) (actual time=0.893..0.893 rows=0 loops=1)

-> Index Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.00..18621.98 rows=29707 width=0) (actual time=0.892..0.892 rows=0 loops=1)

Index Cond: (cseid = 94)

Total runtime: 0.941 ms

(4 rows)

--PostgreSQL 12.11

---------------

new=# select count(1) from analyze_word_reports;

count

---------

20131947

(1 row)

new=# select indexdef from pg_indexes where tablename='analyze_word_reports';

indexdef

-------------------------------------------------------------------------------------------

CREATE INDEX analyze_word_reports_index_cseid ON analyze_word_reports USING btree (cseid)

CREATE UNIQUE INDEX analyze_word_reports_pkey ON analyze_word_reports USING btree (seq)

(2 rows)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=0.00..0.41 rows=1 width=4) (actual time=4908.459..4908.462 rows=1 loops=1)

-> Seq Scan on analyze_word_reports (cost=0.00..528610.15 rows=1278561 width=4) (actual time=4908.455..4908.456 rows=1 loops=1)

Filter: (cseid = 94)

Rows Removed by Filter: 15477750

Planning Time: 0.411 ms

Execution Time: 4908.498 ms

(6 行)

Although PostgreSQL 8.4 has more than 10 million rows of data less than PostgreSQL 12.11,

PostgreSQL 12.11 is too slow and analyze_word_reports_index_cseid does not seem to work.

I'm guessing that the limit keyword of PostgreSQL 12.11 causes the index not to work.

But I don't understand why PostgreSQL 8.4 is normal.

What is the reason for this and is there any easy way to maintain compatibility?

Regards,

gzh

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Ron

Date:

09 October 2022, 09:56:15

Did you analyze and vacuum all of the tables in the new database?

On 10/9/22 04:11, gzh wrote:

Hi,

I have had a Perl Website working for 7 years and have had no problems
until a few weeks ago I replaced my database server with a newer one.

Database server (old): PostgreSQL 8.4 32bit
Database server (new): PostgreSQL 12.11 64bit

I run following sql in PostgreSQL 8.4 and PostgreSQL 12.11, it returns different execution plan.

--PostgreSQL 8.4
---------------

old=# select count(1) from analyze_word_reports;
count
---------
9164136
(1 row)

old=# select indexdef from pg_indexes where tablename='analyze_word_reports';
indexdef
-------------------------------------------------------------------------------------------
CREATE INDEX analyze_word_reports_index_cseid ON analyze_word_reports USING btree (cseid)
CREATE UNIQUE INDEX analyze_word_reports_pkey ON analyze_word_reports USING btree (seq)
(2 rows)

old=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.63 rows=1 width=0) (actual time=0.893..0.893 rows=0 loops=1)
-> Index Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.00..18621.98 rows=29707 width=0) (actual time=0.892..0.892 rows=0 loops=1)
Index Cond: (cseid = 94)
Total runtime: 0.941 ms
(4 rows)

--PostgreSQL 12.11
---------------

new=# select count(1) from analyze_word_reports;
count
---------
20131947
(1 row)

new=# select indexdef from pg_indexes where tablename='analyze_word_reports';
indexdef
-------------------------------------------------------------------------------------------
CREATE INDEX analyze_word_reports_index_cseid ON analyze_word_reports USING btree (cseid)
CREATE UNIQUE INDEX analyze_word_reports_pkey ON analyze_word_reports USING btree (seq)
(2 rows)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.41 rows=1 width=4) (actual time=4908.459..4908.462 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528610.15 rows=1278561 width=4) (actual time=4908.455..4908.456 rows=1 loops=1)
Filter: (cseid = 94)
Rows Removed by Filter: 15477750
Planning Time: 0.411 ms
Execution Time: 4908.498 ms
(6 行)

Although PostgreSQL 8.4 has more than 10 million rows of data less than PostgreSQL 12.11,
PostgreSQL 12.11 is too slow and analyze_word_reports_index_cseid does not seem to work.
I'm guessing that the limit keyword of PostgreSQL 12.11 causes the index not to work.
But I don't understand why PostgreSQL 8.4 is normal.

What is the reason for this and is there any easy way to maintain compatibility?

Regards,

--

gzh

--
Angular momentum makes the world go 'round.

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Tom Lane

Date:

09 October 2022, 16:02:09

gzh <gzhcoder@126.com> writes:
> I run following sql in PostgreSQL 8.4 and PostgreSQL 12.11, it returns different execution plan.

Have you done an ANALYZE in the new database?  The rowcount estimates
seem a lot different, leading me to guess that the newer installation
doesn't have good statistics yet.

            regards, tom lane

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Mladen Gogala

Date:

09 October 2022, 16:05:17

On 10/9/22 05:11, gzh wrote:

Hi,

I have had a Perl Website working for 7 years and have had no problems
until a few weeks ago I replaced my database server with a newer one.

Database server (old): PostgreSQL 8.4 32bit
Database server (new): PostgreSQL 12.11 64bit

I run following sql in PostgreSQL 8.4 and PostgreSQL 12.11, it returns different execution plan.

--PostgreSQL 8.4
---------------

old=# select count(1) from analyze_word_reports;
count
---------
9164136
(1 row)

old=# select indexdef from pg_indexes where tablename='analyze_word_reports';
indexdef
-------------------------------------------------------------------------------------------
CREATE INDEX analyze_word_reports_index_cseid ON analyze_word_reports USING btree (cseid)
CREATE UNIQUE INDEX analyze_word_reports_pkey ON analyze_word_reports USING btree (seq)
(2 rows)

old=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.63 rows=1 width=0) (actual time=0.893..0.893 rows=0 loops=1)
-> Index Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.00..18621.98 rows=29707 width=0) (actual time=0.892..0.892 rows=0 loops=1)
Index Cond: (cseid = 94)
Total runtime: 0.941 ms
(4 rows)

--PostgreSQL 12.11
---------------

new=# select count(1) from analyze_word_reports;
count
---------
20131947
(1 row)

new=# select indexdef from pg_indexes where tablename='analyze_word_reports';
indexdef
-------------------------------------------------------------------------------------------
CREATE INDEX analyze_word_reports_index_cseid ON analyze_word_reports USING btree (cseid)
CREATE UNIQUE INDEX analyze_word_reports_pkey ON analyze_word_reports USING btree (seq)
(2 rows)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.41 rows=1 width=4) (actual time=4908.459..4908.462 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528610.15 rows=1278561 width=4) (actual time=4908.455..4908.456 rows=1 loops=1)
Filter: (cseid = 94)
Rows Removed by Filter: 15477750
Planning Time: 0.411 ms
Execution Time: 4908.498 ms
(6 行)

Although PostgreSQL 8.4 has more than 10 million rows of data less than PostgreSQL 12.11,
PostgreSQL 12.11 is too slow and analyze_word_reports_index_cseid does not seem to work.
I'm guessing that the limit keyword of PostgreSQL 12.11 causes the index not to work.
But I don't understand why PostgreSQL 8.4 is normal.

What is the reason for this and is there any easy way to maintain compatibility?

Regards,

--

gzh

Well, as someone who has worked on more than one database, I can tell you that new version always means new plans. Most of the time, the new plans are better but sometimes they're not. Your problem is probably caused by one or two SQL statements that have changed plans. I would advise installing pg_hint_plan extension and fixing those few SQL queries manually. PostgreSQL would probably benefit from something like Oracle baselines, which are a good mechanism for carrying plans over to the new version.

If you don't want to install the new extension, you can try by running vacuum analyze on the database. Also, upgrading to PgSQL 12 doesn't make much sense given the fact that PgSQL 15 will be released in a few weeks. Can you upgrade to PgSQL 14?

Regards

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re:Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

gzh

Date:

10 October 2022, 02:32:56

Hi Tom,

Thank you for your prompt response.

I've run analyze(not vacuum analyze), but it doesn't seem to work.

Is there any other way to optimize the database?


At 2022-10-10 00:02:09, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh <gzhcoder@126.com> writes:
>> I run following sql in PostgreSQL 8.4 and PostgreSQL 12.11, it returns different execution plan.
>
>Have you done an ANALYZE in the new database?  The rowcount estimates
>seem a lot different, leading me to guess that the newer installation
>doesn't have good statistics yet.
>
>			regards, tom lane

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Tom Lane

Date:

10 October 2022, 02:45:54

gzh  <gzhcoder@126.com> writes:
> I've run analyze(not vacuum analyze), but it doesn't seem to work.

When you're asking for help, please don't give us vague statements
like "doesn't seem to work".  Did the plan (including rowcount
estimates) change at all?  To what?  How far off is that rowcount
estimate, anyway --- that is, how many rows actually have cseid = 94?

If the estimate is far off, then increasing the table's statistics
target might help.

Another thing that would be worth checking is whether
"set enable_seqscan = off" prods it to choose the plan you want.
If not, then there's something else going on besides poor estimates.

            regards, tom lane

Re:Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

gzh

Date:

11 October 2022, 03:12:55

Hi, Tom

Thank you for your reply.

> When you're asking for help, please don't give us vague statements

> like "doesn't seem to work".

I understand.

> Did the plan (including rowcount

> estimates) change at all? To what? How far off is that rowcount

> estimate, anyway --- that is, how many rows actually have cseid = 94?

Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;

enable_seqscan

----------------

(1 行)

new=# select count(*) from analyze_word_reports;

count

----------

21331980

(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);

count

---------

1287156

(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------

Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)

-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)

-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)

-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290

9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)

Index Cond: (cseid = 94)

Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms

(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------

---

Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)

-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=

Filter: (cseid = 94)

Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms

(6 行)

> If the estimate is far off, then increasing the table's statistics

> target might help.

Thank you for your advice.

Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';

oid

-------

16429

(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';

attrelid | attname | attstattarget

----------+---------+---------------

16429 | cseid | -1

(1 行)

> Another thing that would be worth checking is whether

> "set enable_seqscan = off" prods it to choose the plan you want.

> If not, then there's something else going on besides poor estimates.

"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.

The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?


At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Pavel Stehule

Date:

11 October 2022, 03:32:48

út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:

Hi, Tom
Thank you for your reply.

> When you're asking for help, please don't give us vague statements
> like "doesn't seem to work".
I understand.

> Did the plan (including rowcount
> estimates) change at all? To what? How far off is that rowcount
> estimate, anyway --- that is, how many rows actually have cseid = 94?
Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;
enable_seqscan
----------------
on
(1 行)

new=# select count(*) from analyze_word_reports;
count
----------
21331980
(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);
count
---------
1287156
(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer

Regards

Pavel

----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290
9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
Index Cond: (cseid = 94)
Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms
(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
1)
Filter: (cseid = 94)
Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms
(6 行)

> If the estimate is far off, then increasing the table's statistics
> target might help.
Thank you for your advice.
Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';
oid
-------
16429
(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
attrelid | attname | attstattarget
----------+---------+---------------
16429 | cseid | -1
(1 行)

> Another thing that would be worth checking is whether
> "set enable_seqscan = off" prods it to choose the plan you want.
> If not, then there's something else going on besides poor estimates.
"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.
The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re:Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

gzh

Date:

11 October 2022, 04:05:16

Hi, Pavel

Thank you for your reply.

> the LIMIT clause is in this case totally useless and messy, and maybe can

> negative impacts optimizer

Yes. After removing the LIMIT clause, the performance is improved.

The execution plan shows that the index worked.

We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.

At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:

út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Tom
Thank you for your reply.

> When you're asking for help, please don't give us vague statements
> like "doesn't seem to work".
I understand.

> Did the plan (including rowcount
> estimates) change at all? To what? How far off is that rowcount
> estimate, anyway --- that is, how many rows actually have cseid = 94?
Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;
enable_seqscan
----------------
on
(1 行)

new=# select count(*) from analyze_word_reports;
count
----------
21331980
(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);
count
---------
1287156
(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer

Regards

Pavel
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290
9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
Index Cond: (cseid = 94)
Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms
(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
1)
Filter: (cseid = 94)
Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms
(6 行)

> If the estimate is far off, then increasing the table's statistics
> target might help.
Thank you for your advice.
Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';
oid
-------
16429
(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
attrelid | attname | attstattarget
----------+---------+---------------
16429 | cseid | -1
(1 行)

> Another thing that would be worth checking is whether
> "set enable_seqscan = off" prods it to choose the plan you want.
> If not, then there's something else going on besides poor estimates.
"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.
The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Pavel Stehule

Date:

11 October 2022, 04:13:47

út 11. 10. 2022 v 6:05 odesílatel gzh <gzhcoder@126.com> napsal:

Hi, Pavel
Thank you for your reply.

> the LIMIT clause is in this case totally useless and messy, and maybe can
> negative impacts optimizer
Yes. After removing the LIMIT clause, the performance is improved.
The execution plan shows that the index worked.
We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.

The LIMIT clause changes total cost. This is a very aggressive clause. And although it is absolutely useless in this case, Postgres does not have any logic for removing it. Postgres doesn't try to fix developer's mistakes.

At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Tom
Thank you for your reply.

> When you're asking for help, please don't give us vague statements
> like "doesn't seem to work".
I understand.

> Did the plan (including rowcount
> estimates) change at all? To what? How far off is that rowcount
> estimate, anyway --- that is, how many rows actually have cseid = 94?
Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;
enable_seqscan
----------------
on
(1 行)

new=# select count(*) from analyze_word_reports;
count
----------
21331980
(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);
count
---------
1287156
(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer

Regards

Pavel
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290
9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
Index Cond: (cseid = 94)
Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms
(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
1)
Filter: (cseid = 94)
Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms
(6 行)

> If the estimate is far off, then increasing the table's statistics
> target might help.
Thank you for your advice.
Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';
oid
-------
16429
(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
attrelid | attname | attstattarget
----------+---------+---------------
16429 | cseid | -1
(1 行)

> Another thing that would be worth checking is whether
> "set enable_seqscan = off" prods it to choose the plan you want.
> If not, then there's something else going on besides poor estimates.
"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.
The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re:Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

gzh

Date:

11 October 2022, 05:07:45

Hi, Pavel

> The LIMIT clause changes total cost. This is a very aggressive clause. And

> although it is absolutely useless in this case, Postgres does not have any

> logic for removing it. Postgres doesn't try to fix developer's mistakes.

Sorry,I didn't understand what you mean.

Couldn't the LIMIT clause be used like the SQL statement below?

>> new=# explain analyze select 2 from analyze_word_reports where (cseid =

>> 94) limit 1;

This SQL statement is no problem under PostgreSQL 8.4, the index works well.

At 2022-10-11 12:13:47, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:

út 11. 10. 2022 v 6:05 odesílatel gzh <gzhcoder@126.com> napsal:

Hi, Pavel
Thank you for your reply.

> the LIMIT clause is in this case totally useless and messy, and maybe can
> negative impacts optimizer
Yes. After removing the LIMIT clause, the performance is improved.
The execution plan shows that the index worked.
We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.

The LIMIT clause changes total cost. This is a very aggressive clause. And although it is absolutely useless in this case, Postgres does not have any logic for removing it. Postgres doesn't try to fix developer's mistakes.
At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Tom
Thank you for your reply.

> When you're asking for help, please don't give us vague statements
> like "doesn't seem to work".
I understand.

> Did the plan (including rowcount
> estimates) change at all? To what? How far off is that rowcount
> estimate, anyway --- that is, how many rows actually have cseid = 94?
Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;
enable_seqscan
----------------
on
(1 行)

new=# select count(*) from analyze_word_reports;
count
----------
21331980
(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);
count
---------
1287156
(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer

Regards

Pavel
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290
9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
Index Cond: (cseid = 94)
Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms
(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
1)
Filter: (cseid = 94)
Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms
(6 行)

> If the estimate is far off, then increasing the table's statistics
> target might help.
Thank you for your advice.
Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';
oid
-------
16429
(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
attrelid | attname | attstattarget
----------+---------+---------------
16429 | cseid | -1
(1 行)

> Another thing that would be worth checking is whether
> "set enable_seqscan = off" prods it to choose the plan you want.
> If not, then there's something else going on besides poor estimates.
"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.
The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re: Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Pavel Stehule

Date:

11 October 2022, 05:24:12

út 11. 10. 2022 v 7:08 odesílatel gzh <gzhcoder@126.com> napsal:

Hi, Pavel

> The LIMIT clause changes total cost. This is a very aggressive clause. And
> although it is absolutely useless in this case, Postgres does not have any
> logic for removing it. Postgres doesn't try to fix developer's mistakes.
Sorry,I didn't understand what you mean.
Couldn't the LIMIT clause be used like the SQL statement below?

>> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>> 94) limit 1;

there was query

SELECT aggregate() FROM xx LIMIT 1

This SQL statement is no problem under PostgreSQL 8.4, the index works well.

The optimizer is under nonstop change. And you can expect from any new release

75% queries are without change, 20% queries are faster, and 5% queries are slower

The optimization is based on statistics and estimations, and searching for the optimal solution in space of all solutions. In any version there are smaller or bigger changes of estimation methods, and between old 8.4 and 12 there are big changes in possibilities of how the query can be executed. So there is a higher possibility to find some really fast queries, but there is a higher possibility to find some local optimum or slow query too. Usually the optimizer is smarter (what is the benefit), but more sensitive too (what is the cost). You cannot expect the same result, when the data and algorithm is changed in any version. Postgres doesn't garant the immutability of execution plans.

The clause LIMIT with low LIMIT value can be problematic in more cases. The model in Postgres expects data are uniformly stored in the table (heap), but the reality can be different. The common trick in these cases is using OFFSET 0 clause like

SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.

At 2022-10-11 12:13:47, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 6:05 odesílatel gzh <gzhcoder@126.com> napsal:

Hi, Pavel
Thank you for your reply.

> the LIMIT clause is in this case totally useless and messy, and maybe can
> negative impacts optimizer
Yes. After removing the LIMIT clause, the performance is improved.
The execution plan shows that the index worked.
We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.

The LIMIT clause changes total cost. This is a very aggressive clause. And although it is absolutely useless in this case, Postgres does not have any logic for removing it. Postgres doesn't try to fix developer's mistakes.
At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Tom
Thank you for your reply.

> When you're asking for help, please don't give us vague statements
> like "doesn't seem to work".
I understand.

> Did the plan (including rowcount
> estimates) change at all? To what? How far off is that rowcount
> estimate, anyway --- that is, how many rows actually have cseid = 94?
Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;
enable_seqscan
----------------
on
(1 行)

new=# select count(*) from analyze_word_reports;
count
----------
21331980
(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);
count
---------
1287156
(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer

Regards

Pavel
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290
9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
Index Cond: (cseid = 94)
Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms
(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
1)
Filter: (cseid = 94)
Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms
(6 行)

> If the estimate is far off, then increasing the table's statistics
> target might help.
Thank you for your advice.
Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';
oid
-------
16429
(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
attrelid | attname | attstattarget
----------+---------+---------------
16429 | cseid | -1
(1 行)

> Another thing that would be worth checking is whether
> "set enable_seqscan = off" prods it to choose the plan you want.
> If not, then there's something else going on besides poor estimates.
"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.
The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re:Re: Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

gzh

Date:

11 October 2022, 08:01:41

Thank you for providing the requested information.

The WebSite has been used for many years, and this upgrade is only a version upgrade of the PostgreSQL database.

My customer does not want to modify the SQL because it will increase the cost of the project(All SQL that contains a LIMIT clause needs to be analyzed and checked).

Is there no other way to solve the problem?

At 2022-10-11 13:24:12, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:

út 11. 10. 2022 v 7:08 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Pavel

> The LIMIT clause changes total cost. This is a very aggressive clause. And
> although it is absolutely useless in this case, Postgres does not have any
> logic for removing it. Postgres doesn't try to fix developer's mistakes.
Sorry,I didn't understand what you mean.
Couldn't the LIMIT clause be used like the SQL statement below?

>> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>> 94) limit 1;

there was query

SELECT aggregate() FROM xx LIMIT 1

This SQL statement is no problem under PostgreSQL 8.4, the index works well.

The optimizer is under nonstop change. And you can expect from any new release

75% queries are without change, 20% queries are faster, and 5% queries are slower

The optimization is based on statistics and estimations, and searching for the optimal solution in space of all solutions. In any version there are smaller or bigger changes of estimation methods, and between old 8.4 and 12 there are big changes in possibilities of how the query can be executed. So there is a higher possibility to find some really fast queries, but there is a higher possibility to find some local optimum or slow query too. Usually the optimizer is smarter (what is the benefit), but more sensitive too (what is the cost). You cannot expect the same result, when the data and algorithm is changed in any version. Postgres doesn't garant the immutability of execution plans.

The clause LIMIT with low LIMIT value can be problematic in more cases. The model in Postgres expects data are uniformly stored in the table (heap), but the reality can be different. The common trick in these cases is using OFFSET 0 clause like

SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
At 2022-10-11 12:13:47, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 6:05 odesílatel gzh <gzhcoder@126.com> napsal:

Hi, Pavel
Thank you for your reply.

> the LIMIT clause is in this case totally useless and messy, and maybe can
> negative impacts optimizer
Yes. After removing the LIMIT clause, the performance is improved.
The execution plan shows that the index worked.
We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.

The LIMIT clause changes total cost. This is a very aggressive clause. And although it is absolutely useless in this case, Postgres does not have any logic for removing it. Postgres doesn't try to fix developer's mistakes.
At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Tom
Thank you for your reply.

> When you're asking for help, please don't give us vague statements
> like "doesn't seem to work".
I understand.

> Did the plan (including rowcount
> estimates) change at all? To what? How far off is that rowcount
> estimate, anyway --- that is, how many rows actually have cseid = 94?
Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;
enable_seqscan
----------------
on
(1 行)

new=# select count(*) from analyze_word_reports;
count
----------
21331980
(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);
count
---------
1287156
(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer

Regards

Pavel
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290
9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
Index Cond: (cseid = 94)
Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms
(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
1)
Filter: (cseid = 94)
Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms
(6 行)

> If the estimate is far off, then increasing the table's statistics
> target might help.
Thank you for your advice.
Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';
oid
-------
16429
(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
attrelid | attname | attstattarget
----------+---------+---------------
16429 | cseid | -1
(1 行)

> Another thing that would be worth checking is whether
> "set enable_seqscan = off" prods it to choose the plan you want.
> If not, then there's something else going on besides poor estimates.
"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.
The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re: Re: Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Pavel Stehule

Date:

11 October 2022, 08:21:42

út 11. 10. 2022 v 10:01 odesílatel gzh <gzhcoder@126.com> napsal:

Thank you for providing the requested information.

The WebSite has been used for many years, and this upgrade is only a version upgrade of the PostgreSQL database.
My customer does not want to modify the SQL because it will increase the cost of the project(All SQL that contains a LIMIT clause needs to be analyzed and checked).

Is there no other way to solve the problem?

I don't know about any alternative

Regards

Pavel

At 2022-10-11 13:24:12, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 7:08 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Pavel

> The LIMIT clause changes total cost. This is a very aggressive clause. And
> although it is absolutely useless in this case, Postgres does not have any
> logic for removing it. Postgres doesn't try to fix developer's mistakes.
Sorry,I didn't understand what you mean.
Couldn't the LIMIT clause be used like the SQL statement below?

>> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>> 94) limit 1;

there was query

SELECT aggregate() FROM xx LIMIT 1

This SQL statement is no problem under PostgreSQL 8.4, the index works well.

The optimizer is under nonstop change. And you can expect from any new release

75% queries are without change, 20% queries are faster, and 5% queries are slower

The optimization is based on statistics and estimations, and searching for the optimal solution in space of all solutions. In any version there are smaller or bigger changes of estimation methods, and between old 8.4 and 12 there are big changes in possibilities of how the query can be executed. So there is a higher possibility to find some really fast queries, but there is a higher possibility to find some local optimum or slow query too. Usually the optimizer is smarter (what is the benefit), but more sensitive too (what is the cost). You cannot expect the same result, when the data and algorithm is changed in any version. Postgres doesn't garant the immutability of execution plans.

The clause LIMIT with low LIMIT value can be problematic in more cases. The model in Postgres expects data are uniformly stored in the table (heap), but the reality can be different. The common trick in these cases is using OFFSET 0 clause like

SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
At 2022-10-11 12:13:47, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 6:05 odesílatel gzh <gzhcoder@126.com> napsal:

Hi, Pavel
Thank you for your reply.

> the LIMIT clause is in this case totally useless and messy, and maybe can
> negative impacts optimizer
Yes. After removing the LIMIT clause, the performance is improved.
The execution plan shows that the index worked.
We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.

The LIMIT clause changes total cost. This is a very aggressive clause. And although it is absolutely useless in this case, Postgres does not have any logic for removing it. Postgres doesn't try to fix developer's mistakes.
At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Tom
Thank you for your reply.

> When you're asking for help, please don't give us vague statements
> like "doesn't seem to work".
I understand.

> Did the plan (including rowcount
> estimates) change at all? To what? How far off is that rowcount
> estimate, anyway --- that is, how many rows actually have cseid = 94?
Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;
enable_seqscan
----------------
on
(1 行)

new=# select count(*) from analyze_word_reports;
count
----------
21331980
(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);
count
---------
1287156
(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer

Regards

Pavel
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290
9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
Index Cond: (cseid = 94)
Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms
(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
1)
Filter: (cseid = 94)
Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms
(6 行)

> If the estimate is far off, then increasing the table's statistics
> target might help.
Thank you for your advice.
Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';
oid
-------
16429
(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
attrelid | attname | attstattarget
----------+---------+---------------
16429 | cseid | -1
(1 行)

> Another thing that would be worth checking is whether
> "set enable_seqscan = off" prods it to choose the plan you want.
> If not, then there's something else going on besides poor estimates.
"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.
The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

gzh

Date:

11 October 2022, 09:17:22

Thank you for all your assistance.

By communicating with my customer, we have adopted the following solution to fix the problem.

set enable_seqscan = off

At 2022-10-11 16:21:42, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:

út 11. 10. 2022 v 10:01 odesílatel gzh <gzhcoder@126.com> napsal:
Thank you for providing the requested information.

The WebSite has been used for many years, and this upgrade is only a version upgrade of the PostgreSQL database.
My customer does not want to modify the SQL because it will increase the cost of the project(All SQL that contains a LIMIT clause needs to be analyzed and checked).

Is there no other way to solve the problem?

I don't know about any alternative

Regards

Pavel
At 2022-10-11 13:24:12, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 7:08 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Pavel

> The LIMIT clause changes total cost. This is a very aggressive clause. And
> although it is absolutely useless in this case, Postgres does not have any
> logic for removing it. Postgres doesn't try to fix developer's mistakes.
Sorry,I didn't understand what you mean.
Couldn't the LIMIT clause be used like the SQL statement below?

>> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>> 94) limit 1;

there was query

SELECT aggregate() FROM xx LIMIT 1

This SQL statement is no problem under PostgreSQL 8.4, the index works well.

The optimizer is under nonstop change. And you can expect from any new release

75% queries are without change, 20% queries are faster, and 5% queries are slower

The optimization is based on statistics and estimations, and searching for the optimal solution in space of all solutions. In any version there are smaller or bigger changes of estimation methods, and between old 8.4 and 12 there are big changes in possibilities of how the query can be executed. So there is a higher possibility to find some really fast queries, but there is a higher possibility to find some local optimum or slow query too. Usually the optimizer is smarter (what is the benefit), but more sensitive too (what is the cost). You cannot expect the same result, when the data and algorithm is changed in any version. Postgres doesn't garant the immutability of execution plans.

The clause LIMIT with low LIMIT value can be problematic in more cases. The model in Postgres expects data are uniformly stored in the table (heap), but the reality can be different. The common trick in these cases is using OFFSET 0 clause like

SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
At 2022-10-11 12:13:47, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 6:05 odesílatel gzh <gzhcoder@126.com> napsal:

Hi, Pavel
Thank you for your reply.

> the LIMIT clause is in this case totally useless and messy, and maybe can
> negative impacts optimizer
Yes. After removing the LIMIT clause, the performance is improved.
The execution plan shows that the index worked.
We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.

The LIMIT clause changes total cost. This is a very aggressive clause. And although it is absolutely useless in this case, Postgres does not have any logic for removing it. Postgres doesn't try to fix developer's mistakes.
At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
Hi, Tom
Thank you for your reply.

> When you're asking for help, please don't give us vague statements
> like "doesn't seem to work".
I understand.

> Did the plan (including rowcount
> estimates) change at all? To what? How far off is that rowcount
> estimate, anyway --- that is, how many rows actually have cseid = 94?
Please refer to the new execution plan (PostgreSQL 12.11) below.

new=# show enable_seqscan;
enable_seqscan
----------------
on
(1 行)

new=# select count(*) from analyze_word_reports;
count
----------
21331980
(1 行)

new=# select count(*) from analyze_word_reports where (cseid = 94);
count
---------
1287156
(1 行)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer

Regards

Pavel
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
Limit (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
-> Finalize Aggregate (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
-> Gather (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
-> Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports (cost=0.56..6290
9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
Index Cond: (cseid = 94)
Heap Fetches: 1287156 Planning Time: 0.122 ms Execution Time: 133.069 ms
(11 行)

new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
-> Seq Scan on analyze_word_reports (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
1)
Filter: (cseid = 94)
Rows Removed by Filter: 18320180 Planning Time: 0.086 ms Execution Time: 2156.985 ms
(6 行)

> If the estimate is far off, then increasing the table's statistics
> target might help.
Thank you for your advice.
Please tell me how to set the table's statistics up to improve performance.

new=# select oid from pg_class where relname = 'analyze_word_reports';
oid
-------
16429
(1 行)

new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
attrelid | attname | attstattarget
----------+---------+---------------
16429 | cseid | -1
(1 行)

> Another thing that would be worth checking is whether
> "set enable_seqscan = off" prods it to choose the plan you want.
> If not, then there's something else going on besides poor estimates.
"set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL 8.4.
The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>gzh  <gzhcoder@126.com> writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>			regards, tom lane

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Josef Šimánek

Date:

11 October 2022, 09:52:34

út 11. 10. 2022 v 11:17 odesílatel gzh <gzhcoder@126.com> napsal:
>
> Thank you for all your assistance.
>
>
> By communicating with my customer, we have adopted the following solution to fix the problem.
>
>
> set enable_seqscan = off
>
This can make some queries fail since there will be no way to gather
data without seqscan.
>
>
>
>
> At 2022-10-11 16:21:42, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>
>
>
> út 11. 10. 2022 v 10:01 odesílatel gzh <gzhcoder@126.com> napsal:
>>
>> Thank you for providing the requested information.
>>
>>
>> The WebSite has been used for many years, and this upgrade is only a version upgrade of the PostgreSQL database.
>>
>> My customer does not want to modify the SQL because it will increase the cost of the project(All SQL that contains a
LIMITclause needs to be analyzed and checked). 
>>
>> Is there no other way to solve the problem?
>
>
> I don't know about any alternative
>
> Regards
>
> Pavel
>
>>
>>
>>
>>
>>
>> At 2022-10-11 13:24:12, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>>
>>
>>
>> út 11. 10. 2022 v 7:08 odesílatel gzh <gzhcoder@126.com> napsal:
>>>
>>> Hi, Pavel
>>>
>>>
>>> > The LIMIT clause changes total cost.  This is a very aggressive clause. And
>>>
>>> > although it is absolutely useless in this case, Postgres does not have any
>>>
>>> > logic for removing it. Postgres doesn't try to fix developer's mistakes.
>>>
>>> Sorry,I didn't understand what you mean.
>>>
>>> Couldn't the LIMIT clause be used like the SQL statement below?
>>>
>>>
>>> >> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>>>
>>> >> 94) limit 1;
>>
>>
>> there was query
>>
>> SELECT aggregate() FROM xx LIMIT 1
>>
>>>
>>> This SQL statement is no problem under PostgreSQL 8.4, the index works well.
>>>
>>>
>>
>> The optimizer is under nonstop change. And you can expect from any new release
>>
>> 75% queries are without change, 20% queries are faster, and 5% queries are slower
>>
>> The optimization is based on statistics and estimations, and searching for the optimal solution in space of all
solutions.In any version there are smaller or bigger changes of estimation methods, and between old 8.4 and 12 there
arebig changes in possibilities of how the query can be executed. So there is a higher possibility to find some really
fastqueries, but there is a higher possibility to find some local optimum or slow query too.  Usually the optimizer is
smarter(what is the benefit), but more sensitive too (what is the cost). You cannot expect the same result, when the
dataand algorithm is changed in any version. Postgres doesn't garant the immutability of execution plans. 
>>
>> The clause LIMIT with low LIMIT value can be problematic in more cases. The model in Postgres expects data are
uniformlystored in the table (heap), but the reality can be different. The common trick in these cases is using OFFSET
0clause like 
>>
>> SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
>>
>>
>>
>>
>>
>>
>>>
>>>
>>>
>>>
>>>
>>> At 2022-10-11 12:13:47, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>>>
>>>
>>>
>>> út 11. 10. 2022 v 6:05 odesílatel gzh <gzhcoder@126.com> napsal:
>>>>
>>>>
>>>> Hi, Pavel
>>>>
>>>> Thank you for your reply.
>>>>
>>>>
>>>> > the LIMIT clause is in this case totally useless and messy, and maybe can
>>>>
>>>> > negative impacts optimizer
>>>>
>>>> Yes. After removing the LIMIT clause, the performance is improved.
>>>>
>>>> The execution plan shows that the index worked.
>>>>
>>>> We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.
>>>
>>>
>>> The LIMIT clause changes total cost.  This is a very aggressive clause. And although it is absolutely useless in
thiscase, Postgres does not have any logic for removing it. Postgres doesn't try to fix developer's mistakes. 
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
>>>>>
>>>>> Hi, Tom
>>>>> Thank you for your reply.
>>>>>
>>>>> > When you're asking for help, please don't give us vague statements
>>>>>
>>>>> > like "doesn't seem to work".
>>>>>
>>>>> I understand.
>>>>>
>>>>>
>>>>> > Did the plan (including rowcount
>>>>>
>>>>> > estimates) change at all?  To what?  How far off is that rowcount
>>>>>
>>>>> > estimate, anyway --- that is, how many rows actually have cseid = 94?
>>>>>
>>>>> Please refer to the new execution plan (PostgreSQL 12.11) below.
>>>>>
>>>>>
>>>>> new=# show enable_seqscan;
>>>>>
>>>>>  enable_seqscan
>>>>>
>>>>> ----------------
>>>>>
>>>>>  on
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select count(*) from analyze_word_reports;
>>>>>
>>>>>   count
>>>>>
>>>>> ----------
>>>>>
>>>>>  21331980
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select count(*) from analyze_word_reports where (cseid = 94);
>>>>>
>>>>>   count
>>>>>
>>>>> ---------
>>>>>
>>>>>  1287156
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
>>>>>
>>>>>                                                                                                  QUERY PLAN
>>>>>
>>>>>
>>>>
>>>>
>>>> the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
>>>>
>>>>>
>>>>>
----------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>>  Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
>>>>>
>>>>>    ->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
>>>>>
>>>>>          ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
>>>>>
>>>>>                Workers Planned: 2
>>>>>
>>>>>                Workers Launched: 2
>>>>>
>>>>>                ->  Partial Aggregate  (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496
rows=1loops=3) 
>>>>>
>>>>>                      ->  Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports
(cost=0.56..6290
>>>>>
>>>>> 9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
>>>>>
>>>>>                            Index Cond: (cseid = 94)
>>>>>
>>>>>                            Heap Fetches: 1287156  Planning Time: 0.122 ms  Execution Time: 133.069 ms
>>>>>
>>>>> (11 行)
>>>>>
>>>>>
>>>>> new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
>>>>>
>>>>>                                                              QUERY PLAN
>>>>>
>>>>>
>>>>>
>>>>>
----------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> ---
>>>>>
>>>>>  Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
>>>>>
>>>>>    ->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75 rows=1223533 width=4) (actual
time=2156.962..2156.964rows=1 loops= 
>>>>>
>>>>> 1)
>>>>>
>>>>>          Filter: (cseid = 94)
>>>>>
>>>>>          Rows Removed by Filter: 18320180  Planning Time: 0.086 ms  Execution Time: 2156.985 ms
>>>>>
>>>>> (6 行)
>>>>>
>>>>>
>>>>>
>>>>> > If the estimate is far off, then increasing the table's statistics
>>>>>
>>>>> > target might help.
>>>>>
>>>>> Thank you for your advice.
>>>>>
>>>>> Please tell me how to set the table's statistics up to improve performance.
>>>>>
>>>>>
>>>>> new=#  select oid from pg_class where relname = 'analyze_word_reports';
>>>>>
>>>>>   oid
>>>>>
>>>>> -------
>>>>>
>>>>> 16429
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
>>>>>
>>>>> attrelid | attname | attstattarget
>>>>>
>>>>> ----------+---------+---------------
>>>>>
>>>>>     16429 | cseid   |            -1
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> > Another thing that would be worth checking is whether
>>>>>
>>>>> > "set enable_seqscan = off" prods it to choose the plan you want.
>>>>>
>>>>> > If not, then there's something else going on besides poor estimates.
>>>>>
>>>>> "set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL
8.4.
>>>>>
>>>>> The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>>>> >gzh  <gzhcoder@126.com> writes:
>>>>> >> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>>>>> >
>>>>> >When you're asking for help, please don't give us vague statements
>>>>> >like "doesn't seem to work".  Did the plan (including rowcount
>>>>> >estimates) change at all?  To what?  How far off is that rowcount
>>>>> >estimate, anyway --- that is, how many rows actually have cseid = 94?
>>>>> >
>>>>> >If the estimate is far off, then increasing the table's statistics
>>>>> >target might help.
>>>>> >
>>>>> >Another thing that would be worth checking is whether
>>>>> >"set enable_seqscan = off" prods it to choose the plan you want.
>>>>> >If not, then there's something else going on besides poor estimates.
>>>>> >
>>>>> > regards, tom lane

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

David Rowley

Date:

11 October 2022, 09:59:45

On Tue, 11 Oct 2022 at 22:52, Josef Šimánek <josef.simanek@gmail.com> wrote:
> This can make some queries fail since there will be no way to gather
> data without seqscan.

Disabling enable_seqscan only adds a const penalty to Seq Scans. It
does not outright disallow them altogether.

Having said that, having Paths with the disabled high cost penalty can
cause other issues like the planner thinking two Paths are "fuzzily"
similar enough in costs and rejecting better Paths when in fact the
better Path is really quite a bit better when you subtract the
disabling cost penalty.

David

Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

David Rowley

Date:

11 October 2022, 10:53:36

On Tue, 11 Oct 2022 at 16:13, gzh <gzhcoder@126.com> wrote:
> new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
>  Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
>    ->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964
rows=1loops=

> 1)
>          Filter: (cseid = 94)
>          Rows Removed by Filter: 18320180  Planning Time: 0.086 ms  Execution Time: 2156.985 ms

It's a bit unfortunate that the planner picked this plan.  I can
recreate the problem on the master branch with:

create table t1 (a int, b int);
insert into t1 select x,x from generate_Series(1,10000000)x;
insert into t1 select 0,0 from generate_Series(1,10000000)x;
analyze t1;
create index on t1(a);
set synchronize_seqscans=off;
explain analyze select * from t1 where a=0 limit 1;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=8) (actual
time=1865.838..1865.840 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..338496.00 rows=10076667 width=8)
(actual time=1865.831..1865.831 rows=1 loops=1)
         Filter: (a = 0)
         Rows Removed by Filter: 10000000
 Planning Time: 1.507 ms
 Execution Time: 1866.326 ms
(6 rows)

What seems to be going on is that the index path is considered on the
base relation, but it's rejected by add_path() due to the costs being
higher than the seq scan costs.

I see even after dropping random_page_cost right down to 0.0 that we
do start to keep the Index path as a base relation path, but then the
LimitPath with the Seqscan subpath wins out over the LimitPath with
the index scan due to the Index scan having a higher startup cost.

It feels like something is a bit lacking in our cost model here. I'm
just not sure what that is.

David

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Tom Lane

Date:

11 October 2022, 13:59:43

David Rowley <dgrowleyml@gmail.com> writes:
> It feels like something is a bit lacking in our cost model here. I'm
> just not sure what that is.

The example you show is the same old problem that we've understood for
decades: for cost-estimation purposes, we assume that matching rows
are more or less evenly distributed in the table.  Their actual
location doesn't matter that much if you're scanning the whole table;
but if you're hoping that a LIMIT will be able to stop after scanning
just a few rows, it does matter.

While it'd be pretty easy to insert some ad-hoc penalty into the
LIMIT estimation to reduce the chance of being fooled this way,
that would also discourage us from using fast-start plans when
they *do* help.  So I don't see any easy fix.

            regards, tom lane

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Bruce Momjian

Date:

11 October 2022, 14:30:53

On Tue, Oct 11, 2022 at 09:59:43AM -0400, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > It feels like something is a bit lacking in our cost model here. I'm
> > just not sure what that is.
> 
> The example you show is the same old problem that we've understood for
> decades: for cost-estimation purposes, we assume that matching rows
> are more or less evenly distributed in the table.  Their actual
> location doesn't matter that much if you're scanning the whole table;
> but if you're hoping that a LIMIT will be able to stop after scanning
> just a few rows, it does matter.

We do have a correlation statistics value for each column but I am
unclear if that would help here.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Klint Gore

Date:

12 October 2022, 00:06:12

From: Bruce Momjian <bruce@momjian.us> Sent: Wednesday, 12 October 2022 1:30 AM

>On Tue, Oct 11, 2022 at 09:59:43AM -0400, Tom Lane wrote:
>> David Rowley <dgrowleyml@gmail.com> writes:
>> > It feels like something is a bit lacking in our cost model here. I'm
>> > just not sure what that is.
>>
>> The example you show is the same old problem that we've understood for
>> decades: for cost-estimation purposes, we assume that matching rows
>> are more or less evenly distributed in the table. Their actual
>> location doesn't matter that much if you're scanning the whole table;
>> but if you're hoping that a LIMIT will be able to stop after scanning
>> just a few rows, it does matter.
>
> We do have a correlation statistics value for each column but I am
> unclear if that would help here.

This might give someone an idea - the best query I come up with was

explain analyze select distinct 2 from tbl where (fld = 230) limit 1;

Limit (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)

-> Unique (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.039 rows=1 loops=1)

-> Index Only Scan using idx on tbl (cost=0.56..28349.28 rows=995241 width=4) (actual time=0.038..0.038 rows=1 loops=1)

Index Cond: (fld = 230)

Heap Fetches: 0

Planning Time: 0.066 ms

Execution Time: 0.047 ms

With the distinct and the limit, the planner somehow knows to push the either the distinct or the limit into the index only scan so the unique for distinct only had 1 row and the outer limit only had 1 row. Without the limit, the distinct still does the index only scan but has to do the unique on the million rows and execution time goes to about 100ms.

fld is mostly ordered - it's a serial primary key in another table. The cardinality of the 131 distinct values is an exponential distribution. Of the 20m rows, the fld values ordered by count is 8m, 5m 2m, 1m, 1m, .... down to about 10k. index is btree with stats target of 1000. table is analyzed and vacuum frozen. there is a "create statistics" on this table for n:1 relationship between another field and this one.

Without the distinct, choosing a different value with lower number of rows changed the plan to index only scan with limit somewhere between 3.7% and 4.7% of the table. With a brin index on a similar size/distributed table that is in fld order, that changed to somewhere between 0.6% and 0.7%.

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

David Rowley

Date:

12 October 2022, 01:47:17

On Wed, 12 Oct 2022 at 13:06, Klint Gore <kgore4@une.edu.au> wrote:
> Limit  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)
>   ->  Unique  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.039 rows=1 loops=1)
>         ->  Index Only Scan using idx on tbl  (cost=0.56..28349.28 rows=995241 width=4) (actual time=0.038..0.038
rows=1loops=1) 
>               Index Cond: (fld = 230)
>               Heap Fetches: 0
> Planning Time: 0.066 ms
> Execution Time: 0.047 ms
>
> With the distinct and the limit, the planner somehow knows to push the either the distinct or the limit into the
indexonly scan so the unique for distinct only had 1 row and the outer limit only had 1 row.  Without the limit, the
distinctstill does the index only scan but has to do the unique on the million rows and execution time goes to about
100ms.

I think that would be very simple to fix. I believe I've done that
locally but just detecting if needed_pathkeys == NULL in
create_final_distinct_paths().

i.e.

-                       if (pathkeys_contained_in(needed_pathkeys,
path->pathkeys))
+                       if (needed_pathkeys == NIL)
+                       {
+                               Node *limitCount = makeConst(INT8OID,
-1, InvalidOid,
+
                  sizeof(int64),
+
                  Int64GetDatum(1), false,
+
                  FLOAT8PASSBYVAL);
+                               add_path(distinct_rel, (Path *)
+
create_limit_path(root, distinct_rel, path, NULL,
+
            limitCount, LIMIT_OPTION_COUNT, 0,
+
            1));
+                       }
+                       else if
(pathkeys_contained_in(needed_pathkeys, path->pathkeys))

That just adds a Limit Path instead of the Unique Path. i.e:

postgres=# explain (analyze, costs off) select distinct a from t1 where a = 0;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Limit (actual time=0.074..0.075 rows=1 loops=1)
   ->  Index Only Scan using t1_a_idx on t1 (actual time=0.072..0.073
rows=1 loops=1)
         Index Cond: (a = 0)
         Heap Fetches: 1
 Planning Time: 0.146 ms
 Execution Time: 0.100 ms
(6 rows)

However, I might be wrong about that. I've not given it too much thought.

David

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

gzh

Date:

12 October 2022, 10:30:17

Hi everyone,

Who can tell me which solution is better below：

Solution 1: Change the configuration parameters

set enable_seqscan = off

Solution 2: Add DISTINCT clause to SQL

explain analyze select DISTINCT 2 from analyze_word_reports where (cseid = 94) limit 1;

If I don't want to change SQL, is Solution 1 OK?


At 2022-10-12 09:47:17, "David Rowley" <dgrowleyml@gmail.com> wrote:
>On Wed, 12 Oct 2022 at 13:06, Klint Gore <kgore4@une.edu.au> wrote:
>> Limit  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)
>>   ->  Unique  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.039 rows=1 loops=1)
>>         ->  Index Only Scan using idx on tbl  (cost=0.56..28349.28 rows=995241 width=4) (actual time=0.038..0.038 rows=1 loops=1)
>>               Index Cond: (fld = 230)
>>               Heap Fetches: 0
>> Planning Time: 0.066 ms
>> Execution Time: 0.047 ms
>>
>> With the distinct and the limit, the planner somehow knows to push the either the distinct or the limit into the index only scan so the unique for distinct only had 1 row and the outer limit only had 1 row.  Without the limit, the distinct still does the index only scan but has to do the unique on the million rows and execution time goes to about 100ms.
>
>I think that would be very simple to fix. I believe I've done that
>locally but just detecting if needed_pathkeys == NULL in
>create_final_distinct_paths().
>
>i.e.
>
>-                       if (pathkeys_contained_in(needed_pathkeys,
>path->pathkeys))
>+                       if (needed_pathkeys == NIL)
>+                       {
>+                               Node *limitCount = makeConst(INT8OID,
>-1, InvalidOid,
>+
>                  sizeof(int64),
>+
>                  Int64GetDatum(1), false,
>+
>                  FLOAT8PASSBYVAL);
>+                               add_path(distinct_rel, (Path *)
>+
>create_limit_path(root, distinct_rel, path, NULL,
>+
>            limitCount, LIMIT_OPTION_COUNT, 0,
>+
>            1));
>+                       }
>+                       else if
>(pathkeys_contained_in(needed_pathkeys, path->pathkeys))
>
>That just adds a Limit Path instead of the Unique Path. i.e:
>
>postgres=# explain (analyze, costs off) select distinct a from t1 where a = 0;
>                                      QUERY PLAN
>--------------------------------------------------------------------------------------
> Limit (actual time=0.074..0.075 rows=1 loops=1)
>   ->  Index Only Scan using t1_a_idx on t1 (actual time=0.072..0.073
>rows=1 loops=1)
>         Index Cond: (a = 0)
>         Heap Fetches: 1
> Planning Time: 0.146 ms
> Execution Time: 0.100 ms
>(6 rows)
>
>However, I might be wrong about that. I've not given it too much thought.
>
>David

Re: Different execution plan between PostgreSQL 8.4 and 12.11

From

Klint Gore

Date:

13 October 2022, 02:42:09

From: gzh <gzhcoder@126.com> Sent: Wednesday, 12 October 2022 9:30 PM

> Who can tell me which solution is better below：

> Solution 1: Change the configuration parameters

> set enable_seqscan = off

> Solution 2: Add DISTINCT clause to SQL

> explain analyze select DISTINCT 2 from analyze_word_reports where (cseid = 94) limit 1;

> If I don't want to change SQL, is Solution 1 OK?

Both solutions are ugly

enable_seqscan is a really blunt instrument and may affect the rest of your system as well as just this query. Queries that boil down to "select * from partition" are now encouraged to use the index in a useless manor. A small table (e.g. to hold application settings) now has to do a primary key lookup when all rows fit on the first page anyway.

distinct+limit is really just trying to convince the v12 planner that it can bail out after the first row found at all levels. Having both is superfluous as they individually end up at the same result. it may not work in v13/14/15/... or even be needed. Have you tried it on your v12? My data may be different enough to your data that it doesn't work anyway. What it does in the old postgres version is anyone's guess.

Solution 1 I'd treat as an emergency stop gap to buy time to find a better solution. The patient is no longer bleeding out and the path forward can be considered. If you're not going to change the app, then the only other choice is play with other system wide settings (like random_page_cost). Not as blunt as enable_seqscan but still affects all queries, not just this one.