Re: Different execution plan between PostgreSQL 8.4 and 12.11 - Mailing list pgsql-general

From Josef Šimánek
Subject Re: Different execution plan between PostgreSQL 8.4 and 12.11
Date
Msg-id CAFp7QwrAp4j_T+F240Qgdt60yWMWNz1OsaudFRmtQJnCF67daw@mail.gmail.com
Whole thread Raw
In response to Re: Different execution plan between PostgreSQL 8.4 and 12.11  (gzh <gzhcoder@126.com>)
List pgsql-general
út 11. 10. 2022 v 11:17 odesílatel gzh <gzhcoder@126.com> napsal:
>
> Thank you for all your assistance.
>
>
> By communicating with my customer, we have adopted the following solution to fix the problem.
>
>
> set enable_seqscan = off
>
This can make some queries fail since there will be no way to gather
data without seqscan.
>
>
>
>
> At 2022-10-11 16:21:42, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>
>
>
> út 11. 10. 2022 v 10:01 odesílatel gzh <gzhcoder@126.com> napsal:
>>
>> Thank you for providing the requested information.
>>
>>
>> The WebSite has been used for many years, and this upgrade is only a version upgrade of the PostgreSQL database.
>>
>> My customer does not want to modify the SQL because it will increase the cost of the project(All SQL that contains a
LIMITclause needs to be analyzed and checked). 
>>
>> Is there no other way to solve the problem?
>
>
> I don't know about any alternative
>
> Regards
>
> Pavel
>
>>
>>
>>
>>
>>
>> At 2022-10-11 13:24:12, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>>
>>
>>
>> út 11. 10. 2022 v 7:08 odesílatel gzh <gzhcoder@126.com> napsal:
>>>
>>> Hi, Pavel
>>>
>>>
>>> > The LIMIT clause changes total cost.  This is a very aggressive clause. And
>>>
>>> > although it is absolutely useless in this case, Postgres does not have any
>>>
>>> > logic for removing it. Postgres doesn't try to fix developer's mistakes.
>>>
>>> Sorry,I didn't understand what you mean.
>>>
>>> Couldn't the LIMIT clause be used like the SQL statement below?
>>>
>>>
>>> >> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>>>
>>> >> 94) limit 1;
>>
>>
>> there was query
>>
>> SELECT aggregate() FROM xx LIMIT 1
>>
>>>
>>> This SQL statement is no problem under PostgreSQL 8.4, the index works well.
>>>
>>>
>>
>> The optimizer is under nonstop change. And you can expect from any new release
>>
>> 75% queries are without change, 20% queries are faster, and 5% queries are slower
>>
>> The optimization is based on statistics and estimations, and searching for the optimal solution in space of all
solutions.In any version there are smaller or bigger changes of estimation methods, and between old 8.4 and 12 there
arebig changes in possibilities of how the query can be executed. So there is a higher possibility to find some really
fastqueries, but there is a higher possibility to find some local optimum or slow query too.  Usually the optimizer is
smarter(what is the benefit), but more sensitive too (what is the cost). You cannot expect the same result, when the
dataand algorithm is changed in any version. Postgres doesn't garant the immutability of execution plans. 
>>
>> The clause LIMIT with low LIMIT value can be problematic in more cases. The model in Postgres expects data are
uniformlystored in the table (heap), but the reality can be different. The common trick in these cases is using OFFSET
0clause like 
>>
>> SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
>>
>>
>>
>>
>>
>>
>>>
>>>
>>>
>>>
>>>
>>> At 2022-10-11 12:13:47, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>>>
>>>
>>>
>>> út 11. 10. 2022 v 6:05 odesílatel gzh <gzhcoder@126.com> napsal:
>>>>
>>>>
>>>> Hi, Pavel
>>>>
>>>> Thank you for your reply.
>>>>
>>>>
>>>> > the LIMIT clause is in this case totally useless and messy, and maybe can
>>>>
>>>> > negative impacts optimizer
>>>>
>>>> Yes. After removing the LIMIT clause, the performance is improved.
>>>>
>>>> The execution plan shows that the index worked.
>>>>
>>>> We've noticed it, but I don't want to fix the problem by modifying the SQL until I find the cause.
>>>
>>>
>>> The LIMIT clause changes total cost.  This is a very aggressive clause. And although it is absolutely useless in
thiscase, Postgres does not have any logic for removing it. Postgres doesn't try to fix developer's mistakes. 
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2022-10-11 11:32:48, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> út 11. 10. 2022 v 5:13 odesílatel gzh <gzhcoder@126.com> napsal:
>>>>>
>>>>> Hi, Tom
>>>>> Thank you for your reply.
>>>>>
>>>>> > When you're asking for help, please don't give us vague statements
>>>>>
>>>>> > like "doesn't seem to work".
>>>>>
>>>>> I understand.
>>>>>
>>>>>
>>>>> > Did the plan (including rowcount
>>>>>
>>>>> > estimates) change at all?  To what?  How far off is that rowcount
>>>>>
>>>>> > estimate, anyway --- that is, how many rows actually have cseid = 94?
>>>>>
>>>>> Please refer to the new execution plan (PostgreSQL 12.11) below.
>>>>>
>>>>>
>>>>> new=# show enable_seqscan;
>>>>>
>>>>>  enable_seqscan
>>>>>
>>>>> ----------------
>>>>>
>>>>>  on
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select count(*) from analyze_word_reports;
>>>>>
>>>>>   count
>>>>>
>>>>> ----------
>>>>>
>>>>>  21331980
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select count(*) from analyze_word_reports where (cseid = 94);
>>>>>
>>>>>   count
>>>>>
>>>>> ---------
>>>>>
>>>>>  1287156
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;
>>>>>
>>>>>                                                                                                  QUERY PLAN
>>>>>
>>>>>
>>>>
>>>>
>>>> the LIMIT clause is in this case totally useless and messy, and maybe can negative impacts optimizer
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
>>>>
>>>>>
>>>>>
----------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>>  Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
>>>>>
>>>>>    ->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
>>>>>
>>>>>          ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
>>>>>
>>>>>                Workers Planned: 2
>>>>>
>>>>>                Workers Launched: 2
>>>>>
>>>>>                ->  Partial Aggregate  (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496
rows=1loops=3) 
>>>>>
>>>>>                      ->  Parallel Index Only Scan using analyze_word_reports_index_cseid on analyze_word_reports
(cost=0.56..6290
>>>>>
>>>>> 9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
>>>>>
>>>>>                            Index Cond: (cseid = 94)
>>>>>
>>>>>                            Heap Fetches: 1287156  Planning Time: 0.122 ms  Execution Time: 133.069 ms
>>>>>
>>>>> (11 行)
>>>>>
>>>>>
>>>>> new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
>>>>>
>>>>>                                                              QUERY PLAN
>>>>>
>>>>>
>>>>>
>>>>>
----------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> ---
>>>>>
>>>>>  Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
>>>>>
>>>>>    ->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75 rows=1223533 width=4) (actual
time=2156.962..2156.964rows=1 loops= 
>>>>>
>>>>> 1)
>>>>>
>>>>>          Filter: (cseid = 94)
>>>>>
>>>>>          Rows Removed by Filter: 18320180  Planning Time: 0.086 ms  Execution Time: 2156.985 ms
>>>>>
>>>>> (6 行)
>>>>>
>>>>>
>>>>>
>>>>> > If the estimate is far off, then increasing the table's statistics
>>>>>
>>>>> > target might help.
>>>>>
>>>>> Thank you for your advice.
>>>>>
>>>>> Please tell me how to set the table's statistics up to improve performance.
>>>>>
>>>>>
>>>>> new=#  select oid from pg_class where relname = 'analyze_word_reports';
>>>>>
>>>>>   oid
>>>>>
>>>>> -------
>>>>>
>>>>> 16429
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select attrelid,attname,attstattarget from pg_attribute where attrelid=16429 and attname='cseid';
>>>>>
>>>>> attrelid | attname | attstattarget
>>>>>
>>>>> ----------+---------+---------------
>>>>>
>>>>>     16429 | cseid   |            -1
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> > Another thing that would be worth checking is whether
>>>>>
>>>>> > "set enable_seqscan = off" prods it to choose the plan you want.
>>>>>
>>>>> > If not, then there's something else going on besides poor estimates.
>>>>>
>>>>> "set enable_seqscan = off" works, and the performance is greatly improved, which is almost the same as PostgreSQL
8.4.
>>>>>
>>>>> The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown effect on other queries?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> At 2022-10-10 10:45:54, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>>>> >gzh  <gzhcoder@126.com> writes:
>>>>> >> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>>>>> >
>>>>> >When you're asking for help, please don't give us vague statements
>>>>> >like "doesn't seem to work".  Did the plan (including rowcount
>>>>> >estimates) change at all?  To what?  How far off is that rowcount
>>>>> >estimate, anyway --- that is, how many rows actually have cseid = 94?
>>>>> >
>>>>> >If the estimate is far off, then increasing the table's statistics
>>>>> >target might help.
>>>>> >
>>>>> >Another thing that would be worth checking is whether
>>>>> >"set enable_seqscan = off" prods it to choose the plan you want.
>>>>> >If not, then there's something else going on besides poor estimates.
>>>>> >
>>>>> > regards, tom lane



pgsql-general by date:

Previous
From: gzh
Date:
Subject: Re: Different execution plan between PostgreSQL 8.4 and 12.11
Next
From: David Rowley
Date:
Subject: Re: Different execution plan between PostgreSQL 8.4 and 12.11