Thread: [PERFORM]
Hello,
We have a query that is run almost every second and it's very important to squeeze every last ms out of it. The query is:
SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2))))
OR ((c0."status" = $3) AND (c0."failed_at" > $4))
OR ((c0."status" = $5) AND (c0."started_at" < $6))
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT $7
FOR UPDATE SKIP LOCKED
I added the following index:
CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at, priority DESC, times_failed);
And it didn't help at all - quite the opposite: planning time grew from ~2 ms to ~40 ms, leaving execution time intact:
Limit  (cost=29780.02..29781.27 rows=100 width=18) (actual time=827.753..828.113 rows=100 loops=1)
  ->  LockRows  (cost=29780.02..32279.42 rows=199952 width=18) (actual time=827.752..828.096 rows=100 loops=1)
        ->  Sort  (cost=29780.02..30279.90 rows=199952 width=18) (actual time=827.623..827.653 rows=100 loops=1)
              Sort Key: priority DESC, times_failed
              Sort Method: external sort  Disk: 5472kB
              ->  Seq Scan on campaign_jobs c0  (cost=0.00..22138.00 rows=199952 width=18) (actual time=1.072..321.410 rows=200000 loops=1)
                    Filter: (((status = 0) AND (id <> ALL ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48}'::integer[]))) OR ((status = 2) AND (failed_at > '2017-06-22 03:18:09'::timestamp without time zone)) OR ((status = 1) AND (started_at < '2017-06-23 03:11:09'::timestamp without time zone)))
Planning time: 40.734 ms
Execution time: 913.638 ms
(9 rows)
I see that the query still went through a Seq Scan instead of an Index Scan. Is it due to a poorly crafted index or because of the query structure? Is it possible to make this query faster?
Thanks
There are a few issues:
a) parametrized LIMIT
b) complex predicate with lots of ORs
c) slow external sort
b) may signal something strange in the design .. try to replace the "OR"s with a "UNION" query
c) if you can, and you have enough memory, try increasing work_mem .. maybe to 20MB
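For illustration, a minimal sketch of that work_mem change (the role name below is hypothetical; the right value depends on available RAM and how many backends sort at once):

SET work_mem = '20MB';                          -- current session only
ALTER ROLE queue_worker SET work_mem = '20MB';  -- persist it for a dedicated worker role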
if you change the query to a union of queries, then you can use conditional (partial) indexes, e.g.:
CREATE INDEX ON campaign_jobs(id) WHERE status = 0;
CREATE INDEX ON campaign_jobs(failed_at) WHERE status = 2;
CREATE INDEX ON campaign_jobs(started_at) WHERE status = 1;
Regards
Pavel
Hello Pavel,
Can you please give a tip on how to rewrite the query with a UNION clause? I haven't actually used it before and I'm afraid I won't get it right on the first try :)
SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2))))
UNION SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE ((c0."status" = $3) AND (c0."failed_at" > $4))
UNION SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT $7
FOR UPDATE SKIP LOCKED
WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT $7
FOR UPDATE SKIP LOCKED
Something like this
Pavel
Well, most of the time is spent ordering, and it is doing a (slow) disk sort. Try increasing work_mem for an in-memory sort.
How many rows are in campaign_jobs? If the query is returning most of the rows in the table, it is not going to use any index anyway.
HTH
Gerardo
Hello folks,
Thank you very much for the analysis and suggestions - there is a lot to learn here. I just tried UNION queries and got the following error:
ERROR: FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
I made a table dump for anyone who wants to give it a spin https://app.box.com/s/464b12glmlk5o4gvzz7krc4c8s2fxlwr
and here is the gist for the original commands https://gist.github.com/lessless/33215d0c147645db721e74e07498ac53
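One possible way around that error, sketched here without having been run against this schema (parameter numbers are placeholders), is to keep the UNION in a sub-select and attach FOR UPDATE only to the outer query, which is not itself a set operation:

SELECT id FROM campaign_jobs
WHERE id IN (
      SELECT id FROM campaign_jobs WHERE status = 0 AND NOT id = ANY($1)
      UNION ALL
      SELECT id FROM campaign_jobs WHERE status = 2 AND failed_at > $2
      UNION ALL
      SELECT id FROM campaign_jobs WHERE status = 1 AND started_at < $3)
ORDER BY priority DESC, times_failed
LIMIT $4
FOR UPDATE SKIP LOCKED;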
On Wed, Jun 28, 2017 at 8:10 PM, Brad DeJong <Brad.Dejong@infor.com> wrote:
Normally (at least for developers I've worked with), that kind of query structure is used when the "status" values don't overlap and don't change from query to query. Judging from Pavel's suggested conditional indexes (i.e. "where status = <constant>"), he also thinks that is likely.
Give the optimizer that information so that it can use it. Assuming $1 = 0 and $3 = 2 and $5 = 1, substitute literals. Substitute literal for $7 in limit. Push order by and limit to each branch of the union all (or does Postgres figure that out automatically?) Replace union with union all (not sure about Postgres, but allows other dbms to avoid sorting and merging result sets to eliminate duplicates). (Use of UNION ALL assumes that "id" is unique across rows as implied by only "id" being selected with FOR UPDATE. If multiple rows can have the same "id", then use UNION to eliminate the duplicates.)
SELECT "id" FROM "campaign_jobs" WHERE "status" = 0 AND NOT "id" = ANY($1)
UNION ALL
SELECT "id" FROM "campaign_jobs" WHERE "status" = 2 AND "failed_at" > $2
UNION ALL
SELECT "id" FROM "campaign_jobs" WHERE "status" = 1 AND "started_at" < $3
ORDER BY "priority" DESC, "times_failed"
LIMIT 100
FOR UPDATE SKIP LOCKED
Another thing that you could try is to push the ORDER BY and LIMIT to the branches of the UNION (or does Postgres figure that out automatically?) and use slightly different indexes. This may not make sense for all the branches but one nice thing about UNION is that each branch can be tweaked independently. Also, there are probably unmentioned functional dependencies that you can use to reduce the index size and/or improve your match rate. Example - if status = 1 means that the campaign_job has started but not failed or completed, then you may know that started_at is set, but failed_at and ended_at are null. The < comparison in and of itself implies that only rows where "started_at" is not null will match the condition.
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE (((c0."status" = 0) AND NOT (c0."id" = ANY($1)))) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
UNION ALL
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 2) AND (c0."failed_at" > $2)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
UNION ALL
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 1) AND (c0."started_at" < $3)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT 100
FOR UPDATE SKIP LOCKED
Including the "priority", "times_failed" and "id" columns in the indexes along with "failed_at"/"started_at" allows the optimizer to do index only scans. (May still have to do random I/O to the data page to determine tuple version visibility but I don't think that can be eliminated.)
create index ... ("priority" desc, "times_failed", "id") where "status" = 0;
create index ... ("priority" desc, "times_failed", "id", "failed_at") where "status" = 2 and "failed_at" is not null;
create index ... ("priority" desc, "times_failed", "id", "started_at") where "status" = 1 and "started_at" is not null; -- and ended_at is null and ...
I'm assuming that the optimizer knows that "where status = 1 and started_at < $3" implies "and started_at is not null" and will consider the conditional index. If not, then the "and started_at is not null" needs to be explicit.
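One way to test that assumption, as a sketch (the timestamp literal is illustrative only), is to EXPLAIN a single branch with and without the explicit IS NOT NULL and see whether the conditional index shows up in the plan:

EXPLAIN SELECT id FROM campaign_jobs
WHERE status = 1 AND started_at < '2017-06-23 03:11:09';

EXPLAIN SELECT id FROM campaign_jobs
WHERE status = 1 AND started_at IS NOT NULL
  AND started_at < '2017-06-23 03:11:09';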
2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov <yevhenii.kurtov@gmail.com>:
ERROR: FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
it is sad :(
maybe a bitmap index scan can work:
postgres=# create table test(id int, started date, failed date, status int);
CREATE TABLE
postgres=# create index on test(id) where status = 0;
CREATE INDEX
postgres=# create index on test(started) where status = 1;
CREATE INDEX
postgres=# create index on test(failed ) where status = 2;
CREATE INDEX
postgres=# explain select id from test where (status = 0 and id in (1,2,3,4,5)) or (status = 1 and started < current_date) or (status = 2 and failed > current_date);
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=12.93..22.50 rows=6 width=4)
   Recheck Cond: (((id = ANY ('{1,2,3,4,5}'::integer[])) AND (status = 0)) OR ((started < CURRENT_DATE) AND (status = 1)) OR ((failed > CURRENT_DATE) AND (status = 2)))
   Filter: (((status = 0) AND (id = ANY ('{1,2,3,4,5}'::integer[]))) OR ((status = 1) AND (started < CURRENT_DATE)) OR ((status = 2) AND (failed > CURRENT_DATE)))
   ->  BitmapOr  (cost=12.93..12.93 rows=6 width=0)
         ->  Bitmap Index Scan on test_id_idx  (cost=0.00..4.66 rows=1 width=0)
               Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
         ->  Bitmap Index Scan on test_started_idx  (cost=0.00..4.13 rows=3 width=0)
               Index Cond: (started < CURRENT_DATE)
         ->  Bitmap Index Scan on test_failed_idx  (cost=0.00..4.13 rows=3 width=0)
               Index Cond: (failed > CURRENT_DATE)
(10 rows)
On Tue, Jun 27, 2017 at 11:47 PM, Yevhenii Kurtov <yevhenii.kurtov@gmail.com> wrote:
I see that query still went through the Seq Scan instead of Index Scan. Is it due to poorly crafted index or because of query structure? Is it possible to make this query faster?
An index on (priority desc, times_failed) should speed this up massively. Might want to include status at the end as well. However, your example data is not terribly realistic.
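Spelled out in SQL, that suggestion might look like this (a sketch, not something taken verbatim from the thread):

CREATE INDEX ON campaign_jobs (priority DESC, times_failed);
-- or, with status included at the end:
CREATE INDEX ON campaign_jobs (priority DESC, times_failed, status);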
What version of PostgreSQL are you using?
Cheers,
Jeff
Hi Jeff,
That is just sample data - we are going live in June and I don't have anything real so far. Right now it's 9.6, and it will be the latest stable release available on the date that we go live.
Pavel Stehule wrote:
> 2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov <yevhenii.kurtov@gmail.com>:
> > I just tried UNION queries and got following error:
> >
> > ERROR: FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
>
> it is sad :(

I think we could lift this restriction for UNION ALL, but UNION sounds difficult.

BTW I wonder how much of the original problem is caused by using a prepared query. I understand the desire to avoid repeated planning work, but I think in this case it may be working against you.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
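One way to check whether a generic plan is hurting here, as a rough sketch (parameter types and values below are assumptions, not taken from the thread), is to prepare the statement by hand and compare EXPLAIN ANALYZE of the prepared form - repeated a handful of times, since the server may switch to a generic plan after about five executions - against EXPLAIN ANALYZE with literal values:

PREPARE q (int, int[], int, timestamp, int, timestamp, int) AS
SELECT id FROM campaign_jobs
WHERE (status = $1 AND NOT id = ANY($2))
   OR (status = $3 AND failed_at > $4)
   OR (status = $5 AND started_at < $6)
ORDER BY priority DESC, times_failed
LIMIT $7
FOR UPDATE SKIP LOCKED;

EXPLAIN ANALYZE EXECUTE q(0, '{1,2,3}', 2, '2017-06-22 03:18:09', 1, '2017-06-23 03:11:09', 100);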
On Thu, Jun 29, 2017 at 1:11 PM, Yevhenii Kurtov <yevhenii.kurtov@gmail.com> wrote:

Trust me on this one: you want to get some realistic fake data in there, and in realistic quantities, before you go live to test. PostgreSQL's planner makes decisions based on the size of the data it has to trundle through and on statistical analysis of the data in the tables, etc. You don't wanna go from 500 rows in a test table with the same values to 10,000,000 rows with wildly varying data in production without having some clue where that db is gonna be headed performance-wise.
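For example, a roughly realistic data set could be faked with generate_series (a sketch only - the column types and value distributions here are guesses, not taken from the thread, and should be tuned to match the expected workload):

INSERT INTO campaign_jobs (id, status, priority, times_failed, started_at, failed_at)
SELECT g,
       (random() * 2)::int,                               -- status 0..2
       (random() * 10)::int,                              -- priority
       (random() * 5)::int,                               -- times_failed
       now() - random() * interval '7 days',              -- started_at
       CASE WHEN random() < 0.3
            THEN now() - random() * interval '7 days'
       END                                                -- failed_at, mostly NULL
FROM generate_series(1, 200000) AS g;

ANALYZE campaign_jobs;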
On Thu, Jun 29, 2017 at 12:11 PM, Yevhenii Kurtov <yevhenii.kurtov@gmail.com> wrote:
You need to use your knowledge of the application to come up with some plausible sample data.
What happens when something succeeds? Does it get deleted from the table, or does it get retained but with a certain value of the status column? If it is retained, what happens to the priority and times_failed fields?
The performance of your queuing table will critically depend on that.
If you need to keep it once it succeeds, you should probably do that by deleting it from the queuing table and inserting it into a history table. It is much easier to keep performance up with that kind of design.
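As a sketch of that pattern (campaign_jobs_history is a hypothetical table with the same columns), the move can be done in a single statement so the finished row never lingers in the queue:

WITH done AS (
    DELETE FROM campaign_jobs
    WHERE id = ANY($1)
    RETURNING *
)
INSERT INTO campaign_jobs_history
SELECT * FROM done;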
Cheers,
Jeff