Re: BUG #16241: Degraded hash join performance - Mailing list pgsql-bugs
From | Andres Freund |
---|---|
Subject | Re: BUG #16241: Degraded hash join performance |
Date | |
Msg-id | 20200204134408.oing4ypvshgr5ykx@alap3.anarazel.de Whole thread Raw |
In response to | BUG #16241: Degraded hash join performance (PG Bug reporting form <noreply@postgresql.org>) |
Responses |
Re: BUG #16241: Degraded hash join performance
(Thomas Butz <tbutz@optitool.de>)
Re: BUG #16241: Degraded hash join performance (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
Hi, On 2020-02-04 10:02:45 +0100, Thomas Butz wrote: > I've executed the query twice and attached the generated reports. Interesting! The no-children one clearly shows that a lot of the the time is spent evaluating regular expressions (there's other regex functions in the profile too): 23.36% postgres postgres [.] subcolor and that this is inside a stack of plpgsql functions: | | |--18.95%--exec_stmt_block | | | exec_stmt | | | plpgsql_exec_function | | | plpgsql_call_handler | | | ExecInterpExpr | | | ExecEvalExpr (inlined) | | | exec_eval_simple_expr | | | (inlined) [...] | | | plpgsql_exec_function | | | plpgsql_call_handler | | | ExecInterpExpr | | | ExecEvalExpr (inlined) | | | exec_eval_simple_expr (inlined) [...] | | | plpgsql_exec_function | | | plpgsql_call_handler | | | ExecInterpExpr | | | ExecEvalExpr (inlined) [...] I'm not aware of any relevant regular expression evaluation changes between 11 and 12. Tom, does this trigger anything? Thomas, could you provide a EXPLAIN (ANALYZE, VERBOSE) for both versions? Clearly the hash join is where the time is spent: -> Hash Join (cost=1.000..6436.030 rows=44 width=517) (actual time=5.532..3383.213 rows=2779 loops=1) Buffers: shared hit=9077 -> Index Scan using planet_osm_hstore_line_way_idx on planet_osm_hstore_line planet_osm_hstore_line (cost=0.550..6419.380rows=491 width=352) (actual time=0.423..19.147 rows=2779 loops=1) Index Cond: (planet_osm_hstore_line.way && '01030000A0110F00000100000005000000AF78D2F4ECA632411CD92E58CD2E58410000000000000000AF78D2F4ECA63241E4EED5138D3958410000000000000000D3CF6EE3EBD13241E4EED5138D3958410000000000000000D3CF6EE3EBD132411CD92E58CD2E58410000000000000000AF78D2F4ECA632411CD92E58CD2E58410000000000000000'::geometry) Filter: ((((planet_osm_hstore_line.tags -> 'name'::text) IS NOT NULL) OR ((planet_osm_hstore_line.tags-> 'oneway'::text) = ANY ('{yes,-1}'::text[])) OR ((planet_osm_hstore_line.tags -> 'junction'::text)= 'roundabout'::text)) AND ((planet_osm_hstore_line.tags -> 'highway'::text) = ANY ('{motorway,motorway_link,trunk,trunk_link,primary,primary_link,secondary,secondary_link,tertiary,tertiary_link,residential,unclassified,road,service,pedestrian,raceway,living_street,construction}'::text[]))) Buffers: shared hit=9077 -> Hash (cost=0.230..0.230 rows=18 width=32) (actual time=0.029..0.030 rows=18 loops=1) -> Values Scan (cost=0.000..0.230 rows=18 width=32) (actual time=0.003..0.017 rows=18 loops=1) but I don't think it's the hash condition itself - there's no plpgsql functions necessary for the join as far as I can tell. A VERBOSE EXPLAIN might help up with knowing what the function being called actually is. It seems quite possible that all that time is spent inside plpgsql functions that are part of postgis. And that the postgis version difference between 2.5 and 3.0 is the relevant part here, not the postgres version difference. Or it's just in the application code: The sort step shows among others the following column to be sorted on: osml10n_get_streetname_from_tags(planet_osm_hstore_line.tags, true, false, ' - '::text, 'de'::text, planet_osm_hstore_line.way,NULL::text)) DESC which is actually likely computed as the output of the HashJoin node. And is the only obvious candidate for a plpgsql function potentially taking a lot of the time. Is that function defined exactly the same between the version? Is the underlying dataset roughly the same size? I suggest running the query without it, and checking how long it takes. Regards, Andres
pgsql-bugs by date: