Re: Strange query behaviour between 9.4 and 12beta1 - Mailing list pgsql-performance
From | Tomas Vondra |
---|---|
Subject | Re: Strange query behaviour between 9.4 and 12beta1 |
Date | |
Msg-id | 20190604230118.obhk44tautptakmb@development Whole thread Raw |
In response to | Strange query behaviour between 9.4 and 12beta1 (Vasilis Ventirozos <v.ventirozos@gmail.com>) |
List | pgsql-performance |
On Tue, Jun 04, 2019 at 05:34:07PM +0300, Vasilis Ventirozos wrote: >Hello everyone, >I started comparing performance between postgres 9.4 and 12beta1 more specifically comparing the new (materialized) CTE. Are you saying the CTE is specified as MATERIALIZED in the query on 12? Because I don't see it in the explain plan (it's mentioned in the 9.4 plan, though). >The statements i use are application statements that i have little control over, >Hardware is identical as both clusters are running on the same server, on the same disks, with the same data. >Also, cluster settings are almost identical and both clusters have been analyzed. >In all my tests 12 is faster , sometimes much faster, apart from one single query that takes ~12 seconds on 9.4 and nearly300 seconds on 12. > >Plans for both : >Plan for 12 <https://explain.depesz.com/s/wRXO> >plan for 9.4 <https://explain.depesz.com/s/njtH> > >The plans are obfuscated , apologies for that but what stands out is the following : > Meh. > >Hash Left Join <http://www.depesz.com/2013/05/09/explaining-the-unexplainable-part-3/#join-modifiers> (cost=200,673.150..203,301.940rows=153,121 width=64) (actual time=1,485.883..284,536.440 rows=467,123 loops=1) >Hash Cond: (lima_sierra(six_lima_november2.victor_romeo, 1) = foxtrot_hotel.victor_romeo) >Join Filter: (whiskey_uniform1.six_sierra = foxtrot_hotel.uniform_juliet) >Rows Removed by Join Filter: 4549925366 > You have two equality conditions for the join. The first one is used to match rows by the hash join itself - it's used to compute the hash value and lookups. But there may be multiple rows that match on either side, generating additional "virtual rows". Those are then removed by the second condition. Consider for example simple cross-join on this table: a | b ------------- 1 | a 1 | b 2 | a 2 | b and the query is SELECT * FROM t t1 JOIN t t2 ON (t1.a = t2.a AND t1.b = t2.b) Now, in the first phase, the hash join might only do (t1.a = t2.a), which will generate 8 rows a | t1.b | t2.b ---------------- 1 | a | a 1 | a | b 1 | b | a 1 | b | b 2 | a | a 2 | a | b 2 | b | a 2 | b | b And then it will apply the second condition (t1.b = t2.b) as a "filter" which will remove some of the rows. In your case the first step generates 4.5B rows the second step discards. I'm not sure why we don't use both conditions as a hash condition. Perhaps it's a data type that does not support hashing, and on 9.4 that does not matter because we end up using merge join. > >I really can't understand what these 4.5bil rows have been removed from, there is nothing suggesting that this dataset wasever created (eg. temp) >and these numbers definitely don't match what i was expecting, which is more or less what i'm seeing in 9.4 plan. > >Obviously i've tested this more than once and this behaviour consists. > > >Best Regards, >Vasilis Ventirozos > regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-performance by date: