Duplicate WHERE condition changes performance and plan - Mailing list pgsql-performance

From singh400@gmail.com
Subject Duplicate WHERE condition changes performance and plan
Date
Msg-id CAOtbvRKSSg5VOY1tE7h4tM09J=etjNWyM8cMm=G-cRk7=KyeEw@mail.gmail.com
Whole thread Raw
Responses Re: Duplicate WHERE condition changes performance and plan
Re: Duplicate WHERE condition changes performance and plan
List pgsql-performance
Hi,

We have an odd issue where specifying the same where clause twice causes PG to pick a much more efficent plan. We would like to know why.

Query A (this is the 'slow' query):
UPDATE problem_instance SET processed = false
FROM problem
WHERE problem.id = problem_instance.problem_id
AND problem.status != 2
AND processed = true;

Query B (this is the 'fast' query):
UPDATE problem_instance SET processed = false
FROM problem
WHERE problem.id = problem_instance.problem_id
AND problem.status != 2
AND problem.status != 2
AND processed = true;

The EXPLAIN ANALYZE for both queries can be found here:-
Query A: https://explain.depesz.com/s/lFuy
Query B: https://explain.depesz.com/s/Jqmv

The table definitions (including the indexes) can be found here:-
public.problem: https://gist.github.com/indy-singh/e90ee6d23d053d32c2564501720353df
public.problem_instance: https://gist.github.com/indy-singh/3c77096b91c89428752cf314d8e20286

Data stats:-
public.problem has around 10,000 rows and once the condition status != 2 is applied there are around 800 rows left.
public.problem_instance has around 592,000 rows and once the condition processed = true is applied there are around 370,000 rows left.

PG version:
PostgreSQL 9.5.19 on x86_64-pc-linux-gnu (Debian 9.5.19-1.pgdg90+1), compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

-- SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='TABLE_NAME'
Table metadata:-
public.problem: https://gist.github.com/indy-singh/ff34a3b6e45432ea4be2bf0b5038e0be
public.problem_instance: https://gist.github.com/indy-singh/a09fe66c8a8840b7661ce9726ebcab71

Last Vacuum:-
public.problem: 2020-04-14 23:11:47.51056+01
public.problem_instance: 2020-04-14 20:11:04.187138+01

Last Analyze:
public.problem: 2020-04-14 23:11:47.592878+01
public.problem_instance: 2020-04-14 20:11:04.508432+01

Server Configuration: https://gist.github.com/indy-singh/8386d59206af042d365e5cd49fbae68f

I tried my best getting all the information up front, please let me know if I missed anything.

Thanks,
Indy

pgsql-performance by date:

Previous
From: Don Seiler
Date:
Subject: Re: High kswapd
Next
From: Stephen Carboni
Date:
Subject: Using unlogged tables for web sessions