Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it. - Mailing list pgsql-bugs
From | Andres Freund |
---|---|
Subject | Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it. |
Date | |
Msg-id | 20210530212639.67te76g2btc6bdzz@alap3.anarazel.de Whole thread Raw |
In response to | Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it. (Yuri Astrakhan <yuriastrakhan@gmail.com>) |
List | pgsql-bugs |
Hi, I'm afraid not :(. I really could use some help around this... Regards, Andres On 2021-05-21 17:54:16 -0400, Yuri Astrakhan wrote: > Hi Andres, I just heard v14 beta is out (congrats!). Do you know if the > very large query optimizations you mentioned (see last paragraph) have made > it into v13 or v14b1? Thanks! > > On Thu, Oct 24, 2019 at 9:04 PM Andres Freund <andres@anarazel.de> wrote: > > > Hi, > > > > On 2019-10-24 15:12:16 +0000, PG Bug reporting form wrote: > > > The following bug has been logged on the website: > > > > > > Bug reference: 16076 > > > Logged by: Yuri Astrakhan > > > Email address: yuriastrakhan@gmail.com > > > PostgreSQL version: 12.0 > > > Operating system: Ubuntu 19.10 > > > Description: > > > > > > (huge thanks to @Komzpa - Darafei Praliaskouski who figured out the > > > culprit) > > > > > > PostgreSQL 12.0 (Ubuntu 12.0-2.pgdg19.10+1) on x86_64-pc-linux-gnu, > > compiled > > > by gcc (Ubuntu 9.2.1-9ubuntu1) 9.2.1 20191008, 64-bit > > > POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " > > > PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.13.1" > > > LIBPROTOBUF="1.3.1" WAGYU="0.4.3 (Internal)" > > > > > > I have a full OpenStreetMap loaded using OpenMapTiles configuration into > > > multiple machine configurations. A single query (wrapped as a UDF) > > generates > > > the whole MVT tile blob, and uses many layers, sub-queries, and UDFs. > > When > > > JIT is enabled, the query takes 5-10 seconds to run. Disabling JIT makes > > the > > > query run in milliseconds. Same queries on PostgreSQL 10 + Postgis > > > 2.5.2/GEOS 3.6 have no issues (as there was no JIT). > > > > Ugh. > > > > > > > Details: > > > The query is a concatenation of multiple map layers (binary blobs in MVT > > > format from ST_AsMVT()). All layers are "UNION ALL-ed" from subqueries, > > one > > > query per layer. The "explain analyze" shows huge cost of the first layer > > > query, regardless of the order of those layers. Every layer added about > > > 200-300ms to the execution (not planning!) time, totalling about 5-7 > > > seconds. Yet, I noticed that when I was testing a portion of layers, > > > removing layer_waterway made query run in milliseconds (I suspect there > > are > > > more than one layer that causes this bug). > > > > > I will try to reproduce this error using OpenMapTiles's quickstart setup, > > > but that will involve some heavy customization. Is this a known bug, or > > > should I try to get an easy to set up repo? Thanks! > > > > A repro would be very helpful. > > > > > > > * "perf top" results - > > > https://gist.github.com/nyurik/fa0cb0dd935d50e796cd5d58d8d41067 > > > > Hm. That's a flat looking profile, making it hard to glean information > > about what's going wrong. It might just be the size of the emitted > > code... > > > > > > If it's too hard to generate a quick repro, it might already help if you > > sent me the bitcode for such a slow query. You can do so by setting > > 'jit_dump_bitcode' and looking for the *.bc files in the data > > directory. > > > > > > > * A few explore analyze runs with different layer orderings: > > > * https://explain.depesz.com/s/fecC > > > * https://explain.depesz.com/s/MpMo > > > * https://explain.depesz.com/s/d7RU > > > * main query - > > > https://gist.github.com/nyurik/ed80c5e6d20f67fd14d37ea2503a9cbd > > > * helpers - > > https://gist.github.com/nyurik/00c4c9d69b7d4fa67c86159b9ce83027 > > > > This looks like it creates pretty darn huge queries. There's a *lot* of > > table references here, and each of them has their own targetlist etc. > > > > I want to figure out why it's *this* slow, but this many expressions are > > going to have overhead to JIT. There's 330 generated functions in that > > last explain. A lot of them never actually executed, because there's > > basically no content. Presumably there's a lot of redundancy, with the > > expressions being basically the same - but we're unable to take > > advantage of that in 12, due to the pointer constants making the > > expressions not actually be the same, c.f. patch at[1] (which also is > > why there's no caching). > > > > Unfortunately I don't think there's going to be full blown fix for this > > in 12, although we might be able to improve things a bit. So you'll > > probably end up having to just disable jit. But if you can try to get me > > enough data to work with, the situation in 13 might be better. > > > > Greetings, > > > > Andres Freund > > > > [1] > > https://www.postgresql.org/message-id/20191023163849.sosqbfs5yenocez3%40alap3.anarazel.de > > Greetings, Andres Freund
pgsql-bugs by date: