Thread: BUG #18831: Particular queries using gin-indexes are not interruptible, resulting in resource usage concerns.
BUG #18831: Particular queries using gin-indexes are not interruptible, resulting in resource usage concerns.
From: PG Bug reporting form <noreply@postgresql.org>
Date:
The following bug has been logged on the website:

Bug reference:      18831
Logged by:          Niek
Email address:      niek.brasa@hitachienergy.com
PostgreSQL version: 17.4
Operating system:   Windows/Linux
Description:

While checking why some of our queries were not interruptible (even though statement_timeout was set), I was able to reduce the problem to something generic: the use of a GIN index together with the ?| operator. I have tested this in PostgreSQL 13.20, 16.4 and 17.4.

Although performance is certainly a real issue as well, that is not the main reason for filing this bug report. As you can imagine, it is relatively easy to run several of these statements at once and bring the server to a halt, which is what happened in our case. We have a statement_timeout of 120 seconds, and it was not being honored, causing our services to time out on their side, reconnect, create another connection, and execute the same query again.

Output for the 1000-row table:

NOTICE: Query for 1000/ 10 took 0.001004 s
NOTICE: Query for 1000/ 100 took 0.000983 s
NOTICE: Query for 1000/ 1000 took 0.029361 s
NOTICE: Query for 1000/ 10000 took 3.364018 s
NOTICE: Query for 1000/ 50000 took 84.101790 s
NOTICE: Query for 1000/ 100000 took 373.761158 s
NOTICE: Query for 1000/ 150000 took 846.428113 s
NOTICE: Query for 1000/ 200000 took 1486.085538 s
NOTICE: Query for 1000/ 300000 took 3382.042434 s
NOTICE: Query for 1000/ 400000 took 6249.934149 s
NOTICE: Query for 1000/ 500000 took 9731.607982 s
NOTICE: Query for 1000/ 600000 took 14113.382917 s
NOTICE: Query for 1000/ 700000 took 19276.934712 s
NOTICE: Query for 1000/ 800000 took 24991.138558 s

And one run for the 1000-row table, interrupted at the 50000 point (note that it only returns, with an exception, after the query has finished executing):

NOTICE: Query for 1000/ 50000 took 93.362066 s, before being canceled.

Below you find the reproduction block. Let it run, and when you see that you have to wait a long time for a result, use pg_terminate_backend or pg_cancel_backend to stop the statement (a sketch of such a call from a second session follows after the DO block). You will notice that nothing happens: the statement keeps executing, and only after it finishes do you get a query_canceled exception.

DO $$
DECLARE
  _dummy int;
  _start timestamp;
  _a_t   int[] := ARRAY[1,10,100,1000,10000,100000];
  _a_l   int[] := ARRAY[10,100,1000,10000,50000,100000,150000,200000,300000,400000,500000,600000,700000,800000,900000,1000000];
  _i_l   int;
  _i_t   int;
BEGIN
  FOREACH _i_t IN ARRAY _a_t LOOP
    -- Create a table with _i_t rows containing a jsonb column 'doc', with
    -- one attribute 'a' holding an array of two random uuids as text.
    -- Adding the GIN index here makes the statement non-interruptible.
    DROP TABLE IF EXISTS public.t1;
    CREATE TABLE public.t1 AS
      (SELECT jsonb_build_object('a', jsonb_build_array(gen_random_uuid()::text, gen_random_uuid()::text)) doc
         FROM generate_series(1, _i_t));
    CREATE INDEX t1_idx ON public.t1 USING gin ((doc->'a') jsonb_ops);
    ANALYZE public.t1;

    FOREACH _i_l IN ARRAY _a_l LOOP
      BEGIN
        _start := clock_timestamp();
        -- The following query cannot be cancelled or terminated; the postgres process
        -- keeps running and can only be hard-killed, which sends the database into
        -- recovery mode. If the statement takes longer than the statement timeout, it
        -- returns with a query_canceled exception only after it has finished.
        -- Although performance is horrendous with larger table contents, the fact that
        -- the query is not interruptible is the main concern raised here.
        --
        -- - Using an in-memory table makes the statement interruptible:
        --   ,t1 AS (SELECT jsonb_build_object('a', jsonb_build_array(gen_random_uuid()::text, gen_random_uuid()::text)) doc FROM generate_series(1, _i_t))
        -- - Using an actual table without the GIN index makes the statement
        --   interruptible.
        -- - Using a GIN index makes the statement non-interruptible.
        --
        WITH l1 AS (SELECT gen_random_uuid()::text uuid FROM generate_series(1, _i_l))
        SELECT 1 INTO _dummy
          FROM t1
         WHERE t1.doc->'a' ?| (SELECT array_agg(uuid) FROM l1);
      EXCEPTION WHEN query_canceled THEN
        RAISE NOTICE 'Query for %/% took % s, before being canceled.'
          ,LPAD(_i_t::text, 5, ' ')
          ,LPAD(_i_l::text, 7, ' ')
          ,LPAD(EXTRACT(epoch FROM (clock_timestamp() - _start))::text, 13, ' ');
        RETURN;
      END;
      RAISE NOTICE 'Query for %/% took % s'
        ,LPAD(_i_t::text, 5, ' ')
        ,LPAD(_i_l::text, 7, ' ')
        ,LPAD(EXTRACT(epoch FROM (clock_timestamp() - _start))::text, 13, ' ');
    END LOOP;
  END LOOP;
END;
$$;
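A minimal sketch of issuing that cancel from a second session, assuming the DO block above is the only other active backend whose query text contains the ?| operator:

-- Run from a separate connection. Find the backend stuck in the GIN search:
SELECT pid, state, query_start, left(query, 60) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND query LIKE '%?|%'
  AND pid <> pg_backend_pid();

-- Ask it to cancel (or use pg_terminate_backend for a hard disconnect);
-- per the report above, the request is only honored once the GIN search finishes.
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND query LIKE '%?|%'
  AND pid <> pg_backend_pid();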
Re: BUG #18831: Particular queries using gin-indexes are not interruptible, resulting in resource usage concerns.
From: Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> As part of checking why some of the queries were not interruptible (even
> though statement_timeout was set), I was able to reduce it to something
> generic, as the usage of a gin index and the ?| operator.

Thanks for the report.  It looks like the issue boils down to trying to do a GIN index search with a very large array on the RHS of the ?| operator.  extractQuery extracts all the array elements as search keys, and then we spend O(N^2) time in a very stupid de-duplication loop in ginFillScanEntry.  Also, if we get past that, there's O(N^2) work in startScanKey.

It's not hard to remove the O(N^2) behavior in ginFillScanEntry: the de-duplication is just an optimization AFAICS, and we can stop doing it once there are too many keys.  A more principled approach would be to sort the keys and then just compare adjacent ones to de-dup, but I seriously doubt it's worth the extra code that would be needed.

Half of the problem in startScanKey is just wasted work: it re-initializes the entire entryRes[] array for each required-key probe, when it actually would take less code to incrementally update the array.  However, the other half is that we are doing up to nentries calls of the triConsistentFn, which may be doing O(nentries) work itself per call --- and indeed is doing so in this specific case.  I don't see any solution for that that doesn't involve changes in the API spec for GIN opclasses.  It does seem like a pretty bad way to be doing things for operators that have simple AND or OR semantics, where we could know the answer in constant time if we only had an API for it.  So maybe it's worth trying to do something about that, but don't hold your breath.

For now I propose the attached, which bounds ginFillScanEntry's de-dup efforts at 1000 keys, fixes the silly coding in startScanKey, and adds a CHECK_FOR_INTERRUPTS in the startScanKey loop to cover the fact that it's still O(N^2) in the worst case.  With this I don't notice any objectionable delay in response to cancel interrupts.  Also, while it's still O(N^2) in the end, there's a pretty nice speedup for mid-size problems:

NOTICE: Query for 10000/ 200000 took 6.957140 s

vs

NOTICE: Query for 10000/ 200000 took 649.409895 s

			regards, tom lane

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index f3b19d280c3..4a56f19390d 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -558,16 +558,18 @@ startScanKey(GinState *ginstate, GinScanOpaque so, GinScanKey key)
 		qsort_arg(entryIndexes, key->nentries, sizeof(int),
 				  entryIndexByFrequencyCmp, key);
 
+		for (i = 1; i < key->nentries; i++)
+			key->entryRes[entryIndexes[i]] = GIN_MAYBE;
 		for (i = 0; i < key->nentries - 1; i++)
 		{
 			/* Pass all entries <= i as FALSE, and the rest as MAYBE */
-			for (j = 0; j <= i; j++)
-				key->entryRes[entryIndexes[j]] = GIN_FALSE;
-			for (j = i + 1; j < key->nentries; j++)
-				key->entryRes[entryIndexes[j]] = GIN_MAYBE;
+			key->entryRes[entryIndexes[i]] = GIN_FALSE;
 
 			if (key->triConsistentFn(key) == GIN_FALSE)
 				break;
+
+			/* Make this loop interruptible in case there are many keys */
+			CHECK_FOR_INTERRUPTS();
 		}
 
 		/* i is now the last required entry. */
diff --git a/src/backend/access/gin/ginscan.c b/src/backend/access/gin/ginscan.c
index 63ded6301e2..a50694f6776 100644
--- a/src/backend/access/gin/ginscan.c
+++ b/src/backend/access/gin/ginscan.c
@@ -68,8 +68,12 @@ ginFillScanEntry(GinScanOpaque so, OffsetNumber attnum,
 	 *
 	 * Entries with non-null extra_data are never considered identical, since
 	 * we can't know exactly what the opclass might be doing with that.
+	 *
+	 * Also, give up de-duplication once we have 1000 entries.  That avoids
+	 * spending O(N^2) time on probably-fruitless de-duplication of large
+	 * search-key sets.
 	 */
-	if (extra_data == NULL)
+	if (extra_data == NULL && so->totalentries < 1000)
 	{
 		for (i = 0; i < so->totalentries; i++)
 		{
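A quick way to confirm that a given query actually reaches the GIN search path discussed above, rather than the interruptible no-index plan the reporter mentions, is to inspect the plan. This is a sketch only; it reuses the t1 table and t1_idx index from the reproducer, and the array size shown is illustrative:

-- Plan-only check (EXPLAIN without ANALYZE does not execute the query).
EXPLAIN
WITH l1 AS (SELECT gen_random_uuid()::text AS uuid
              FROM generate_series(1, 200000))
SELECT 1
  FROM public.t1
 WHERE t1.doc->'a' ?| (SELECT array_agg(uuid) FROM l1);
-- If the plan shows a Bitmap Index Scan on t1_idx, the GIN search-key code
-- discussed in this thread is exercised; a plain Seq Scan would be the
-- interruptible, non-indexed case instead.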