I'm struggling with a query that seems to use a suboptimal query plan.
Schema: units reference a subjob reference a job. In other words: a job contains multiple subjobs. A subjob contains
multipleunits. (full schema below)
We're trying to select all subjobs that need to be reviewed and that contain units that aren't reviewed yet (either
becausevalidated is NULL or validated is 'N')
Notice the EXISTS with subquery which will turn out to be the problem:
(SELECT s0_m0_msubJobs."__id"
AS
s0_msubJobs_mid,
s0_m0_msubJobs."document_mflow"
AS s0_msubJobs_mdocument_mflow,
s0_m0_msubJobs."status"
AS s0_msubJobs_mstatus,
s0_m0_msubJobs."error_mmessage"
AS s0_msubJobs_merror_mmessage,
s0_m0_msubJobs."validation_mrequired"
AS s0_msubJobs_mvalidation_mrequired,
s0_m0_msubJobs."completion_mdate"
AS s0_msubJobs_mcompletion_mdate,
s0_m0_msubJobs."creation_mdate"
AS s0_msubJobs_mcreation_mdate,
s0_m0_msubJobs."file_mlocation"
AS s0_msubJobs_mfile_mlocation,
s0_m1_mjob."__id"
AS s0_mjob_mid,
s0_m1_mjob."xml_mname"
AS s0_mjob_mxml_mname,
( s0_m0_msubJobs."creation_mdate" )
AS e0_m4
FROM "subJobs" s0_m0_msubJobs,
"job" s0_m1_mjob
WHERE ( ( ( ( s0_m0_msubJobs."status" ) = ( 'IN_PROGRESS' ) )
AND ( ( s0_m0_msubJobs."validation_mrequired" ) = ( 'Y' ) ) )
AND ( EXISTS (((SELECT s1_m1_munit."__id" AS s1_munit_mid
FROM "subJobs" s1_m0_msubJobs,
"unit" s1_m1_munit
WHERE ( ( ( s0_m0_msubJobs."__id" ) =
( s1_m0_msubJobs."__id" ) )
AND
( s1_m0_msubJobs."__id" = s1_m1_munit."subJobs_mid" ) )
AND ( ( NOT ( s1_m1_munit."validated" IS NOT NULL ) )
OR ( ( s1_m1_munit."validated" ) = ( 'N'
) ) )))
)
) )
AND ( s0_m0_msubJobs."job_mid" = s0_m1_mjob."__id" ))
ORDER BY e0_m4 DESC,
s0_mjob_mid nulls first,
s0_msubjobs_mid nulls first
This generates the following query plan
Sort (cost=63242.75..63242.83 rows=30 width=503) (actual time=804.180..804.182 rows=49 loops=1)
Sort Key: s0_m0_msubjobs.creation_mdate, s0_m1_mjob.__id, s0_m0_msubjobs.__id
Sort Method: quicksort Memory: 31kB
Buffers: shared hit=3855 read=13852
-> Hash Join (cost=63087.27..63242.02 rows=30 width=503) (actual time=803.045..804.144 rows=49 loops=1)
Hash Cond: (s0_m0_msubjobs.job_mid = s0_m1_mjob.__id)
Buffers: shared hit=3855 read=13852
-> Hash Join (cost=63069.02..63223.35 rows=30 width=484) (actual time=802.875..803.953 rows=49 loops=1)
Hash Cond: (s1_m0_msubjobs.__id = s0_m0_msubjobs.__id)
Buffers: shared hit=3848 read=13852
-> HashAggregate (cost=63014.58..63060.13 rows=4555 width=16) (actual time=802.733..803.452 rows=4555
loops=1)
Buffers: shared hit=3808 read=13852
-> Hash Join (cost=149.49..59533.65 rows=1392372 width=16) (actual time=1.157..620.181
rows=1392372loops=1)
Hash Cond: (s1_m1_munit."subJobs_mid" = s1_m0_msubjobs.__id)
Buffers: shared hit=3808 read=13852
-> Seq Scan on unit s1_m1_munit (cost=0.00..35017.65 rows=1392372 width=8) (actual
time=0.004..211.780rows=1392372 loops=1)
Filter: ((validated IS NULL) OR ((validated)::text = 'N'::text))
Buffers: shared hit=3761 read=13852
-> Hash (cost=92.55..92.55 rows=4555 width=8) (actual time=1.140..1.140 rows=4555 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 178kB
Buffers: shared hit=47
-> Seq Scan on "subJobs" s1_m0_msubjobs (cost=0.00..92.55 rows=4555 width=8) (actual
time=0.004..0.551rows=4555 loops=1)
Buffers: shared hit=47
-> Hash (cost=54.07..54.07 rows=30 width=484) (actual time=0.122..0.122 rows=49 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 5kB
Buffers: shared hit=40
-> Bitmap Heap Scan on "subJobs" s0_m0_msubjobs (cost=5.20..54.07 rows=30 width=484) (actual
time=0.046..0.110rows=49 loops=1)
Recheck Cond: ((status)::text = 'IN_PROGRESS'::text)
Filter: ((validation_mrequired)::text = 'Y'::text)
Buffers: shared hit=40
-> Bitmap Index Scan on subjob_status (cost=0.00..5.19 rows=125 width=0) (actual
time=0.034..0.034rows=125 loops=1)
Index Cond: ((status)::text = 'IN_PROGRESS'::text)
Buffers: shared hit=2
-> Hash (cost=12.00..12.00 rows=500 width=27) (actual time=0.165..0.165 rows=500 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 30kB
Buffers: shared hit=7
-> Seq Scan on job s0_m1_mjob (cost=0.00..12.00 rows=500 width=27) (actual time=0.005..0.085 rows=500
loops=1)
Buffers: shared hit=7
Total runtime: 804.382 ms
Now, if we add OFFSET 0 to the EXISTS subquery (which shouldn't alter the query's meaning - correct?)
EXPLAIN (ANALYZE, BUFFERS) (SELECT s0_m0_msubJobs."__id"
AS
s0_msubJobs_mid,
s0_m0_msubJobs."document_mflow"
AS s0_msubJobs_mdocument_mflow,
s0_m0_msubJobs."status"
AS s0_msubJobs_mstatus,
s0_m0_msubJobs."error_mmessage"
AS s0_msubJobs_merror_mmessage,
s0_m0_msubJobs."validation_mrequired"
AS s0_msubJobs_mvalidation_mrequired,
s0_m0_msubJobs."completion_mdate"
AS s0_msubJobs_mcompletion_mdate,
s0_m0_msubJobs."creation_mdate"
AS s0_msubJobs_mcreation_mdate,
s0_m0_msubJobs."file_mlocation"
AS s0_msubJobs_mfile_mlocation,
s0_m1_mjob."__id"
AS s0_mjob_mid,
s0_m1_mjob."xml_mname"
AS s0_mjob_mxml_mname,
( s0_m0_msubJobs."creation_mdate" )
AS e0_m4
FROM "subJobs" s0_m0_msubJobs,
"job" s0_m1_mjob
WHERE ( ( ( ( s0_m0_msubJobs."status" ) = ( 'IN_PROGRESS' ) )
AND ( ( s0_m0_msubJobs."validation_mrequired" ) = ( 'Y' ) ) )
AND ( EXISTS (((SELECT s1_m1_munit."__id" AS s1_munit_mid
FROM "subJobs" s1_m0_msubJobs,
"unit" s1_m1_munit
WHERE ( ( ( s0_m0_msubJobs."__id" ) =
( s1_m0_msubJobs."__id" ) )
AND
( s1_m0_msubJobs."__id" = s1_m1_munit."subJobs_mid" ) )
AND ( ( NOT ( s1_m1_munit."validated" IS NOT NULL ) )
OR ( ( s1_m1_munit."validated" ) = ( 'N'
) ) )
OFFSET 0))
)
) )
AND ( s0_m0_msubJobs."job_mid" = s0_m1_mjob."__id" ))
ORDER BY e0_m4 DESC,
s0_mjob_mid nulls first,
s0_msubjobs_mid nulls first
we get the following query plan
Sort (cost=556.27..556.30 rows=15 width=503) (actual time=0.828..0.829 rows=49 loops=1)
Sort Key: s0_m0_msubjobs.creation_mdate, s0_m1_mjob.__id, s0_m0_msubjobs.__id
Sort Method: quicksort Memory: 31kB
Buffers: shared hit=390
-> Hash Join (cost=23.44..555.97 rows=15 width=503) (actual time=0.229..0.788 rows=49 loops=1)
Hash Cond: (s0_m0_msubjobs.job_mid = s0_m1_mjob.__id)
Buffers: shared hit=390
-> Bitmap Heap Scan on "subJobs" s0_m0_msubjobs (cost=5.19..537.52 rows=15 width=484) (actual
time=0.057..0.591rows=49 loops=1)
Recheck Cond: ((status)::text = 'IN_PROGRESS'::text)
Filter: (((validation_mrequired)::text = 'Y'::text) AND (SubPlan 1))
Buffers: shared hit=383
-> Bitmap Index Scan on subjob_status (cost=0.00..5.19 rows=125 width=0) (actual time=0.031..0.031
rows=125loops=1)
Index Cond: ((status)::text = 'IN_PROGRESS'::text)
Buffers: shared hit=2
SubPlan 1
-> Limit (cost=0.00..1187.36 rows=307 width=8) (actual time=0.009..0.009 rows=1 loops=49)
Buffers: shared hit=343
-> Nested Loop (cost=0.00..1187.36 rows=307 width=8) (actual time=0.009..0.009 rows=1 loops=49)
Buffers: shared hit=343
-> Index Scan using "subJobs_mid_mindex" on "subJobs" s1_m0_msubjobs (cost=0.00..8.27
rows=1width=8) (actual time=0.002..0.002 rows=1 loops=49)
Index Cond: (__id = s0_m0_msubjobs.__id)
Buffers: shared hit=147
-> Index Scan using "unit_msubJobs_mid_mindex" on unit s1_m1_munit (cost=0.00..1176.02
rows=307width=16) (actual time=0.006..0.006 rows=1 loops=49)
Index Cond: ("subJobs_mid" = s0_m0_msubjobs.__id)
Filter: ((validated IS NULL) OR ((validated)::text = 'N'::text))
Buffers: shared hit=196
-> Hash (cost=12.00..12.00 rows=500 width=27) (actual time=0.164..0.164 rows=500 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 30kB
Buffers: shared hit=7
-> Seq Scan on job s0_m1_mjob (cost=0.00..12.00 rows=500 width=27) (actual time=0.003..0.082 rows=500
loops=1)
Buffers: shared hit=7
Total runtime: 0.899 ms
which is a few orders of magnitude faster.
Is there a reason why the more optimal query plan isn't chosen without the OFFSET 0 clause?
Shouldn't the optimizer evaluate the option where the EXISTS query is JOINED as well as the option where the EXISTS
queryisn't and choose the plan with the lowest cost?
Any light you could shed on this is appreciated.
Potentially useful information:
Version: PostgreSQL 9.1.1, compiled by Visual C++ build 1500, 64-bit
Data: Most units have validated set to NULL, 500 jobs, 4555 subJobs, 1392372 units.
Schema:
-- Table: job
CREATE TABLE job
(
__id bigint NOT NULL,
parent_mjob bigint,
status character varying(32),
priority integer,
creation_mdate timestamp without time zone,
completion_mdate timestamp without time zone,
description character varying(200),
xml_mname character varying(200),
__source character varying(200),
__label character varying(200),
error_mmessage character varying(200),
last_mchange_mdate timestamp without time zone,
__size numeric(19,0),
CONSTRAINT job_pkey PRIMARY KEY (__id ),
CONSTRAINT job_parent_mjob_fkey FOREIGN KEY (parent_mjob)
REFERENCES job (__id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE SET NULL
)
WITH (
OIDS=FALSE
);
CREATE INDEX job_mdescription_mindex
ON job
USING gin
(to_tsvector('english'::regconfig, description::text) );
CREATE INDEX job_mid_mindex
ON job
USING btree
(__id );
-- Table: "subJobs"
CREATE TABLE "subJobs"
(
__id bigint NOT NULL,
document_mflow character varying(200),
status character varying(200),
error_mmessage character varying(200),
validation_mrequired character varying(200),
completion_mdate timestamp without time zone,
creation_mdate timestamp without time zone,
job_mid bigint NOT NULL,
file_mlocation character varying(200),
CONSTRAINT "subJobs_pkey" PRIMARY KEY (__id ),
CONSTRAINT "subJobs_job_mid_fkey" FOREIGN KEY (job_mid)
REFERENCES job (__id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);
CREATE INDEX "subJobs_mid_mindex"
ON "subJobs"
USING btree
(__id );
CREATE INDEX "subJobs_mjob_mid_mindex"
ON "subJobs"
USING btree
(job_mid );
CREATE INDEX subjob_status
ON "subJobs"
USING btree
(status COLLATE pg_catalog."default" );
-- Table: unit
CREATE TABLE unit
(
__id bigint NOT NULL,
client_mnumber character varying(200) NOT NULL,
source_mid character varying(200),
delivery_mformat character varying(200),
delivery_mtype character varying(200),
client_memailaddress character varying(200),
client_mcollectivity character varying(200),
client_mcommunication_mpreference character varying(200),
validated character varying(200),
status character varying(200),
error_mmessage character varying(200),
completion_mdate timestamp without time zone,
creation_mdate timestamp without time zone,
file_mlocation character varying(200),
delivery_mfeedback character varying(200),
"subJobs_mid" bigint NOT NULL,
__type character varying(200),
CONSTRAINT unit_pkey PRIMARY KEY (__id ),
CONSTRAINT "unit_subJobs_mid_fkey" FOREIGN KEY ("subJobs_mid")
REFERENCES "subJobs" (__id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);
CREATE INDEX unit_mid_mindex
ON unit
USING btree
(__id );
CREATE INDEX "unit_msubJobs_mid_mindex"
ON unit
USING btree
("subJobs_mid" );
CREATE INDEX unit_validated
ON unit
USING btree
(validated COLLATE pg_catalog."default" );
With kind regards,
Nick Hofstede
________________________________
Inventive Designers' Email Disclaimer:
http://www.inventivedesigners.com/email-disclaimer