Home > mailing lists

Re: Analyze results in more expensive query plan - Mailing list pgsql-performance

From	Jeremy Altavilla
Subject	Re: Analyze results in more expensive query plan
Date	May 20, 2019 20:19:48
Msg-id	CAG3Nut8Xaak-qy3cLT0WGDRBtbNPgGxVEM_GN9vCmNQiAUNu+g@mail.gmail.com Whole thread
In response to	Re: Analyze results in more expensive query plan (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Analyze results in more expensive query plan
List	pgsql-performance

Tree view

Thanks for the suggestion. I created extended statistics objects for the two tables in question. Unfortunately the resulting plan was the same (and had the same estimates). It looks like the extended stats discovered a potentially useful correlation on bag: "2, 3 => 1" (owner_id, bag_type_id => id). I'm guessing this wasn't usable because the docs state "They are not used to improve estimates for equality conditions comparing two columns".

I created functional dependency extended stats (none of our queries use group by), and ran analyze. The resulting objects are below. For correlations of 1, the results seemed logically correct (I'm not sure how to interpret the .966 values). The limitations section said that extended stats are only applied for simple equality conditions, so I modified the query to use equality instead of any. That still resulted in the same plan and estimate. Just to be thorough, I tried with all permutations of zero, one or both stats objects. In all cases the resulting plan and estimates didn't change from the slow hash join.

create statistics bag_type_stats (dependencies) on id, name, game from bag_type;
analyze bag_type;
create statistics bag_stats (dependencies) on id, owner_id, bag_type_id from bag;
analyze bag;

select * from pg_statistic_ext;
-[ RECORD 1 ]---+------------------------------------------------------------------------------------------------------------------------------------------------------
stxrelid    | 16411
stxname | bag_stats
stxnamespace    | 2200
stxowner    | 10
stxkeys | 1 2 3
stxkind | {f}
stxndistinct |
stxdependencies | {"1 => 2": 1.000000, "1 => 3": 1.000000, "2 => 1": 0.966567, "2 => 3": 0.966567, "1, 2 => 3": 1.000000, "1, 3 => 2": 1.000000, "2, 3 => 1": 1.000000}
-[ RECORD 2 ]---+------------------------------------------------------------------------------------------------------------------------------------------------------
stxrelid | 16398
stxname | bag_type_stats
stxnamespace | 2200
stxowner | 10
stxkeys | 1 2 4
stxkind | {f}
stxndistinct |
stxdependencies | {"1 => 2": 1.000000, "1 => 4": 1.000000, "2 => 1": 1.000000, "2 => 4": 1.000000, "1, 2 => 4": 1.000000, "1, 4 => 2": 1.000000, "2, 4 => 1": 1.000000}

For bag keys 1, 2, 3 are id, owner_id and bag_type_id. For bag_type 1, 2, 4 are id, name and game.

--Thanks

--Jeremy

On Fri, May 17, 2019 at 9:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jeremy Altavilla <jeremyaltavilla@gmail.com> writes:
> > We have several select statements whose performance is greatly improved by
> > deleting some stats from pg_statistic.
>
> You might have better results by setting up some "extended stats" for
> the combination of bag_type columns that this query depends on. Per your
> description, there's a fair amount of cross-column correlation, which
> the planner will not expect without some extended stats to tell it so.
>
> https://www.postgresql.org/docs/10/planner-stats.html#PLANNER-STATS-EXTENDED
>
> regards, tom lane

pgsql-performance by date:

From: Mariel Cherkassky
Date: 20 May 2019, 17:20:33
Subject: Re: Trying to handle db corruption 9.6

From: Tomas Vondra
Date: 20 May 2019, 20:55:53
Subject: Re: Trying to handle db corruption 9.6

Re: Analyze results in more expensive query plan - Mailing list pgsql-performance

Previous

Next