Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction - Mailing list pgsql-general
From | Martin Kováčik |
---|---|
Subject | Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction |
Date | |
Msg-id | D5E147D0-991B-43BC-8428-312EE55B2962@redbyte.eu Whole thread Raw |
In response to | Re: analyze causes query planner to choose suboptimal plan for aselect query in separate transaction (Adrian Klaver <adrian.klaver@aklaver.com>) |
List | pgsql-general |
I’ve got it now, thank you for the clarification. You are right. Martin Kováčik +421904236791 > On 26 Apr 2019, at 00:25, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> On 4/25/19 1:28 PM, Martin Kováčik wrote: >> I'm not sure if I understand what you mean. My initial thought was that stats are fixed per transaction, i.e. analyzefrom comitted transaction doesn't interfere with another running transaction. Maybe I was confused by this becauseanalyze can be run inside a transaction, so my assumption was it is isolated like other statements. > > To make it clearer I would take a look at: > > https://www.postgresql.org/docs/11/sql-analyze.html > https://www.postgresql.org/docs/11/catalog-pg-statistic.html > > > The gist is currently there is one entry(with exception noted in second link) per column in each table. This is the mostrecent information available subject to the conditions Tom pointed out in his post. To make it MVCC aware like you suggestwould mean tracking the state of all transactions currently open and recording that information, so each transactioncould find its stats. This means the planning problem could then move to the stats table as it would need to beanalyzed itself to work efficiently. This would probably also need to be MVCC aware to be relevant, which would add tothe overhead. I could see this turn into a 'hall of mirrors' problem quickly. > >> *Martin Kováčik* >> /CEO/ >> *redByte*, s.r.o. >> +421 904 236 791 >> kovacik@redbyte.eu <mailto:kovacik@redbyte.eu>, www.redbyte.eu <http://redbyte.eu> >> On Thu, Apr 25, 2019 at 9:58 PM Adrian Klaver <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: >> On 4/25/19 12:47 PM, Martin Kováčik wrote: >> > As my example shows you don't have to import a lot of rows - 1000 is >> > enough to make a difference - it all depends on the query. When a >> > cartesian product is involved only a few records is enough. >> > I think that stats should be MVCC versioned otherwise the planner is >> > using wrong statistics and chooses wrong plans. >> Then you are looking at moving the choke point to looking up the >> correct >> stats across possibly hundreds/thousands of transactions in flight. >> > *Martin Kováčik* >> > /CEO/ >> > *redByte*, s.r.o. >> > +421 904 236 791 >> > kovacik@redbyte.eu <mailto:kovacik@redbyte.eu> >> <mailto:kovacik@redbyte.eu <mailto:kovacik@redbyte.eu>>, >> www.redbyte.eu <http://www.redbyte.eu> >> > <http://redbyte.eu> >> > >> > >> > On Thu, Apr 25, 2019 at 9:28 PM Michael Lewis <mlewis@entrata.com >> <mailto:mlewis@entrata.com> >> > <mailto:mlewis@entrata.com <mailto:mlewis@entrata.com>>> wrote: >> > >> > >> > >> > On Thu, Apr 25, 2019, 11:34 AM Martin Kováčik >> <kovacik@redbyte.eu <mailto:kovacik@redbyte.eu> >> > <mailto:kovacik@redbyte.eu <mailto:kovacik@redbyte.eu>>> wrote: >> > >> > Turning off autovacuum for the tests is a valid option and I >> > will definitely do this as a workaround. Each test pretty >> much >> > starts with empty schema and data for it is generated >> during the >> > run and rolled back at the end. I have a lot of tests and >> at the >> > moment it is not feasible to modify them. >> > >> > The real workload for the application is different, but there >> > are some cases, when we import data from remote web >> service in a >> > transaction do some work with it and then we do a commit. If >> > there is an autovacuum during this process I assume there >> will >> > be similar problem regarding planner statistics. >> > >> > >> > Unless you are importing a huge amount of data relative to >> what is >> > already there, it seems likely to be significantly less impactful >> > than adding data to a completely empty table. The stats on a >> table >> > with 0 rows and then 5000 rows is going to be night and day, >> while >> > the difference between stats on 100,000 rows and 105,000 is >> not as >> > impactful. Musing here. I expect others will chime in. >> > >> > Stats are not versioned with MVCC so it would expected that a >> commit >> > in another transaction that is updating stats would influence the >> > query plan for another transaction that is active. >> > >> -- Adrian Klaver >> adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > > > -- > Adrian Klaver > adrian.klaver@aklaver.com
pgsql-general by date: