RE: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE - Mailing list pgsql-bugs

From Jan Kort
Subject RE: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
Date
Msg-id AM0PR0502MB3620FB9E1A9B76AD313A1DBF9B2B9@AM0PR0502MB3620.eurprd05.prod.outlook.com
Whole thread Raw
In response to Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-bugs
Hi David,

Ok so the problem is that the MCV list is created, because there are groups of common values, but then this has no
effect,because there are not enough single values to create a histogram?
 

That is not such a rare case as I would have hoped, for example this would then also cause the same behavior:

values(1,1),(2,1),(3,1000000),(4,2),(5,2),(6,3),(7,3);

And it seems to do.

Ranges can then be longer too:

values(1,1),(2,1),(4,1),(3,1000000);

for some reason I can't reverse the ranges:

values(1,1000000),(2,1000000),(3,1);

I don’t understand why that doesn't create a bad plan, there seems to be order involved too.

I admit that this is still somewhat rare, but much less rare than I was hoping for.

A few more things:

Using text as primary key seems to perform good too, even if it keeps the same "merge" plan:

create table million (id text primary key); create table three (id text primary key, million_id text not null);

What the high value is and how close it is to the low value seems to matter:

values(1,1),(2,1),(3,100000);
values(1,1),(2,1),(3,10000);
values(1,1),(2,1),(3,1000);

Gets progressively less bad, again the plan is not changed, but the execution of it becomes more efficient.

It seems like it goes through the records from 1 till say 10000 one by one.

Regards,

Jan

-----Oorspronkelijk bericht-----
Van: David Rowley <dgrowleyml@gmail.com> 
Verzonden: woensdag 19 mei 2021 09:52
Aan: Jan Kort <jan.kort@genetics.nl>
CC: pgsql-bugs@lists.postgresql.org
Onderwerp: Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE

On Wed, 19 May 2021 at 14:13, David Rowley <dgrowleyml@gmail.com> wrote:
> That makes me think the best fix would be to do something better 
> during ANALYZE and maybe try and include some more upper bound MCVs.
> I'm not yet too sure what drawbacks there might be from doing that.

I had a quick look at the ANALYZE code and see that we only consider items for the MCV list when they appear more than
oncein the analyzed set.  When it comes to the histogram, we only consider making the histogram if there are at least 2
itemsthat are not covered in the MCV list. The comment mentions:
 

/*
* Generate a histogram slot entry if there are at least two distinct
* values not accounted for in the MCV list.  (This ensures the
* histogram won't collapse to empty or a singleton.) */

So given we only have 2 distinct values in the "three" table and one of those is tracked in the MCV list, there are not
enoughvalues remaining to build a histogram.
 

It seems you've hit about the worst-case here. If you'd had 1 more value in the gfo_zaken_kosten table then that would
havebeen enough to build a histogram. Or if the 98 value was not duplicated then we'd not have built an MCV list and
builta histogram with the two values instead.
 

If you want a workaround, you could do:

alter table trial.gfo_zaken_kosten alter column gfo_zaken_id set statistics 0; delete from pg_statistic where
starelid='trial.gfo_zaken_kosten'::regclassand staattnum=2; analyze trial.gfo_zaken_kosten;
 

that's a bit dirty though. You'd need to do:

alter table trial.gfo_zaken_kosten alter column gfo_zaken_id set statistics -1; analyze trial.gfo_zaken_kosten;

if that table was to ever change, else you might get bad plans due to lack of stats.

David

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #17024: ERROR: column c.relhasoids does not exist at character 245
Next
From: Daniel Gustafsson
Date:
Subject: Re: BUG #17024: ERROR: column c.relhasoids does not exist at character 245