Thread: Oddity with view
Why is this view 9x slower than the base table? cnuapp_prod@postgres06.nut=# explain analyze select count(*) from loan_tasks_committed; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------ Aggregate (cost=994625.69..994625.70 rows=1 width=0) (actual time=7432.306..7432.306 rows=1 loops=1) -> Seq Scan on loan_tasks_committed (cost=0.00..929345.35 rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689 loops=1) Total runtime: 7432.360 ms (3 rows) Time: 7432.858 ms loan_tasks effectively does SELECT * FROM loan_tasks_committed UNION ALL SELECT * FROM loan_tasks_pending;. There's some lookup tables for _pending, but as this explain shows there's no actual data there right now. cnuapp_prod@postgres06.nut=# explain analyze select count(*) from loan_tasks; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ --------- Aggregate (cost=1516929.75..1516929.76 rows=1 width=0) (actual time=60396.081..60396.082 rows=1 loops=1) -> Append (cost=0.00..1190523.94 rows=26112465 width=240) (actual time=0.023..57902.470 rows=26115689 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..1190466.70 rows=26112135 width=162) (actual time=0.023..54776.335 rows=26115689 loops=1) -> Seq Scan on loan_tasks_committed (cost=0.00..929345.35 rows=26112135 width=162) (actual time=0.014..22531.902 rows=26115689 loops=1) -> Subquery Scan "*SELECT* 2" (cost=36.10..57.24 rows=330 width=240) (actual time=0.003..0.003 rows=0 loops=1) -> Hash Join (cost=36.10..53.94 rows=330 width=240) (actual time=0.002..0.002 rows=0 loops=1) Hash Cond: (ltp.loan_task_code_id = ltc.id) -> Seq Scan on loan_tasks_pending ltp (cost=0.00..13.30 rows=330 width=208) (actual time=0.001..0.001 rows=0 loops=1) -> Hash (cost=21.60..21.60 rows=1160 width=36) (never executed) -> Seq Scan on loan_task_codes ltc (cost=0.00..21.60 rows=1160 width=36) (never executed) Total runtime: 60396.174 ms (11 rows) Time: 60397.046 ms SELECT true AS "committed", loan_tasks_committed.id, ..., loan_tasks_committed.task_amount FROM loan_tasks_committed UNION ALL SELECT false AS "committed", ltp.id, ..., NULL::"unknown" AS task_amount FROM loan_tasks_pending ltp JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id; The stuff I omitted is just some fields and a few other NULLs. This is 8.2.9. -- Decibel! jnasby@cashnetusa.com (512) 569-9461
"Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: > loan_tasks effectively does SELECT * FROM loan_tasks_committed UNION > ALL SELECT * FROM loan_tasks_pending;. You seem to have neglected to mention a join or two. regards, tom lane
On Nov 10, 2008, at 7:06 AM, Tom Lane wrote: > "Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: >> loan_tasks effectively does SELECT * FROM loan_tasks_committed UNION >> ALL SELECT * FROM loan_tasks_pending;. > > You seem to have neglected to mention a join or two. Yeah, though I did show them at the end of the message... SELECT true AS "committed", loan_tasks_committed.id, ..., loan_tasks_committed.task_amount FROM loan_tasks_committed UNION ALL SELECT false AS "committed", ltp.id, ..., NULL::"unknown" AS task_amount FROM loan_tasks_pending ltp JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id; Thing is, there's no data to be had on that side. All of the time is going into the seqscan of loan_tasks_committed. But here's what's really disturbing... Aggregate (cost=994625.69..994625.70 rows=1 width=0) (actual time=7432.306..7432.306 rows=1 loops=1) -> Seq Scan on loan_tasks_committed (cost=0.00..929345.35 rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689 loops=1) vs Aggregate (cost=1516929.75..1516929.76 rows=1 width=0) (actual time=60396.081..60396.082 rows=1 loops=1) -> Append (cost=0.00..1190523.94 rows=26112465 width=240) (actual time=0.023..57902.470 rows=26115689 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..1190466.70 rows=26112135 width=162) (actual time=0.023..54776.335 rows=26115689 loops=1) -> Seq Scan on loan_tasks_committed (cost=0.00..929345.35 rows=26112135 width=162) (actual time=0.014..22531.902 rows=26115689 loops=1) -> Subquery Scan "*SELECT* 2" (cost=36.10..57.24 rows=330 width=240) (actual time=0.003..0.003 rows=0 loops=1) How on earth did the seqscan suddenly take 4x longer? And why is the subquery scan then doubling the amount of time again? -- Decibel! jnasby@cashnetusa.com (512) 569-9461
Jim 'Decibel!' Nasby wrote: > On Nov 10, 2008, at 7:06 AM, Tom Lane wrote: >> "Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: >>> loan_tasks effectively does SELECT * FROM loan_tasks_committed UNION >>> ALL SELECT * FROM loan_tasks_pending;. >> >> You seem to have neglected to mention a join or two. > > > Yeah, though I did show them at the end of the message... > > SELECT true AS "committed", loan_tasks_committed.id, ..., > loan_tasks_committed.task_amount > FROM loan_tasks_committed > UNION ALL > SELECT false AS "committed", ltp.id, ..., NULL::"unknown" AS task_amount > FROM loan_tasks_pending ltp > JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id; > > Thing is, there's no data to be had on that side. All of the time is > going into the seqscan of loan_tasks_committed. But here's what's really > disturbing... > -> Seq Scan on loan_tasks_committed (cost=0.00..929345.35 > rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689 loops=1) > -> Seq Scan on loan_tasks_committed > (cost=0.00..929345.35 rows=26112135 width=162) (actual > time=0.014..22531.902 rows=26115689 loops=1) It's the width - the view is fetching all the rows. Is the "true as committed" bit confusing it? -- Richard Huxton Archonet Ltd
"Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: > How on earth did the seqscan suddenly take 4x longer? And why is the > subquery scan then doubling the amount of time again? Maybe the disk access is less sequential because of the need to fetch the other table too? regards, tom lane
On Nov 10, 2008, at 12:21 PM, Richard Huxton wrote: > Jim 'Decibel!' Nasby wrote: >> On Nov 10, 2008, at 7:06 AM, Tom Lane wrote: >>> "Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: >>>> loan_tasks effectively does SELECT * FROM loan_tasks_committed >>>> UNION >>>> ALL SELECT * FROM loan_tasks_pending;. >>> >>> You seem to have neglected to mention a join or two. >> >> >> Yeah, though I did show them at the end of the message... >> >> SELECT true AS "committed", loan_tasks_committed.id, ..., >> loan_tasks_committed.task_amount >> FROM loan_tasks_committed >> UNION ALL >> SELECT false AS "committed", ltp.id, ..., NULL::"unknown" AS >> task_amount >> FROM loan_tasks_pending ltp >> JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id; >> >> Thing is, there's no data to be had on that side. All of the time is >> going into the seqscan of loan_tasks_committed. But here's what's >> really >> disturbing... > >> -> Seq Scan on loan_tasks_committed (cost=0.00..929345.35 >> rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689 >> loops=1) > >> -> Seq Scan on loan_tasks_committed >> (cost=0.00..929345.35 rows=26112135 width=162) (actual >> time=0.014..22531.902 rows=26115689 loops=1) > > It's the width - the view is fetching all the rows. Is the "true as > committed" bit confusing it? Turns out, no. I was just writing up a stand-alone test case and forgot to include that, but there's still a big difference (note what I'm pasting is now from HEAD as of a bit ago, but I see the effect on 8.2 as well): decibel@platter.local=# explain analyze select count(*) from a; QUERY PLAN ------------------------------------------------------------------------ --------------------------------------------- Aggregate (cost=137164.57..137164.58 rows=1 width=0) (actual time=4320.986..4320.986 rows=1 loops=1) -> Seq Scan on a (cost=0.00..120542.65 rows=6648765 width=0) (actual time=0.188..2707.433 rows=9999999 loops=1) Total runtime: 4321.039 ms (3 rows) Time: 4344.158 ms decibel@platter.local=# explain analyze select count(*) from v; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------ Aggregate (cost=270286.52..270286.53 rows=1 width=0) (actual time=14766.630..14766.630 rows=1 loops=1) -> Append (cost=0.00..187150.20 rows=6650905 width=36) (actual time=0.039..12810.073 rows=9999999 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..187030.30 rows=6648765 width=36) (actual time=0.039..10581.367 rows=9999999 loops=1) -> Seq Scan on a (cost=0.00..120542.65 rows=6648765 width=36) (actual time=0.038..5731.748 rows=9999999 loops=1) -> Subquery Scan "*SELECT* 2" (cost=37.67..119.90 rows=2140 width=40) (actual time=0.002..0.002 rows=0 loops=1) -> Hash Join (cost=37.67..98.50 rows=2140 width=40) (actual time=0.002..0.002 rows=0 loops=1) Hash Cond: (b.c_id = c.c_id) -> Seq Scan on b (cost=0.00..31.40 rows=2140 width=8) (actual time=0.000..0.000 rows=0 loops=1) -> Hash (cost=22.30..22.30 rows=1230 width=36) (never executed) -> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) (never executed) Total runtime: 14766.784 ms (11 rows) Time: 14767.550 ms In 8.2, it took 20 seconds to go through the view: QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------ Aggregate (cost=303960.98..303960.99 rows=1 width=0) (actual time=20268.877..20268.877 rows=1 loops=1) -> Append (cost=0.00..211578.98 rows=7390560 width=40) (actual time=0.038..17112.190 rows=9999999 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..211467.40 rows=7388620 width=36) (actual time=0.038..13973.782 rows=9999999 loops=1) -> Seq Scan on a (cost=0.00..137581.20 rows=7388620 width=36) (actual time=0.037..8280.204 rows=9999999 loops=1) -> Subquery Scan "*SELECT* 2" (cost=36.10..111.58 rows=1940 width=40) (actual time=0.003..0.003 rows=0 loops=1) -> Hash Join (cost=36.10..92.18 rows=1940 width=40) (actual time=0.002..0.002 rows=0 loops=1) Hash Cond: (b.c_id = c.c_id) -> Seq Scan on b (cost=0.00..29.40 rows=1940 width=8) (actual time=0.000..0.000 rows=0 loops=1) -> Hash (cost=21.60..21.60 rows=1160 width=36) (never executed) -> Seq Scan on c (cost=0.00..21.60 rows=1160 width=36) (never executed) Total runtime: 20269.333 ms (11 rows) The results for 8.3 are similar to HEAD. Here's the commands to generate the test case: create table a(a int, b text default 'test text'); create table c(c_id serial primary key, c_text text); insert into c(c_text) values('a'),('b'),('c'); create table b(a int, c_id int references c(c_id)); create view v as select a, b, null as c_id, null as c_text from a union all select a, null, b.c_id, c_text from b join c on (b.c_id= c.c_id); \timing insert into a(a) select generate_series(1,9999999); select count(*) from a; select count(*) from v; explain analyze select count(*) from a; explain analyze select count(*) from v; -- Decibel! jnasby@cashnetusa.com (512) 569-9461
"Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: > Here's the commands to generate the test case: > create table a(a int, b text default 'test text'); > create table c(c_id serial primary key, c_text text); > insert into c(c_text) values('a'),('b'),('c'); > create table b(a int, c_id int references c(c_id)); > create view v as select a, b, null as c_id, null as c_text from a > union all select a, null, b.c_id, c_text from b join c on (b.c_id= > c.c_id); > \timing > insert into a(a) select generate_series(1,9999999); > select count(*) from a; > select count(*) from v; > explain analyze select count(*) from a; > explain analyze select count(*) from v; I think what you're looking at is projection overhead and per-plan-node overhead (the EXPLAIN ANALYZE in itself contributes quite a lot of the latter). One thing you could do is be more careful about making the union input types match up so that no subquery scan nodes are required: create view v2 as select a, b, null::int as c_id, null::text as c_text from a union all select a, null::text, b.c_id, c_text from b join c on (b.c_id=c.c_id); On my machine this runs about twice as fast as the original view. regards, tom lane
On Nov 10, 2008, at 1:31 PM, Tom Lane wrote: > "Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: >> Here's the commands to generate the test case: > >> create table a(a int, b text default 'test text'); >> create table c(c_id serial primary key, c_text text); >> insert into c(c_text) values('a'),('b'),('c'); >> create table b(a int, c_id int references c(c_id)); >> create view v as select a, b, null as c_id, null as c_text from a >> union all select a, null, b.c_id, c_text from b join c on (b.c_id= >> c.c_id); >> \timing >> insert into a(a) select generate_series(1,9999999); >> select count(*) from a; >> select count(*) from v; >> explain analyze select count(*) from a; >> explain analyze select count(*) from v; > > I think what you're looking at is projection overhead and per-plan- > node > overhead (the EXPLAIN ANALYZE in itself contributes quite a lot of the > latter). True... under HEAD explain took 13 seconds while a plain count took 10. Still not very good considering the count from the raw table took about 4 seconds (with or without explain). > One thing you could do is be more careful about making the > union input types match up so that no subquery scan nodes are > required: > > create view v2 as select a, b, null::int as c_id, null::text as > c_text from a > union all select a, null::text, b.c_id, c_text from b join c on > (b.c_id=c.c_id); > > On my machine this runs about twice as fast as the original view. Am I missing some magic? I'm still getting the subquery scan. decibel@platter.local=# explain select count(*) from v2; QUERY PLAN ------------------------------------------------------------------------ -------------- Aggregate (cost=279184.19..279184.20 rows=1 width=0) -> Append (cost=0.00..254178.40 rows=10002315 width=0) -> Subquery Scan "*SELECT* 1" (cost=0.00..254058.50 rows=10000175 width=0) -> Seq Scan on a (cost=0.00..154056.75 rows=10000175 width=14) -> Subquery Scan "*SELECT* 2" (cost=37.67..119.90 rows=2140 width=0) -> Hash Join (cost=37.67..98.50 rows=2140 width=40) Hash Cond: (b.c_id = c.c_id) -> Seq Scan on b (cost=0.00..31.40 rows=2140 width=8) -> Hash (cost=22.30..22.30 rows=1230 width=36) -> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) (10 rows) Time: 0.735 ms decibel@platter.local=# \d v2 View "public.v2" Column | Type | Modifiers --------+---------+----------- a | integer | b | text | c_id | integer | c_text | text | View definition: SELECT a.a, a.b, NULL::integer AS c_id, NULL::text AS c_text FROM a UNION ALL SELECT b.a, NULL::text AS b, b.c_id, c.c_text FROM b JOIN c ON b.c_id = c.c_id; That's on HEAD, btw. -- Decibel! jnasby@cashnetusa.com (512) 569-9461
"Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: > On Nov 10, 2008, at 1:31 PM, Tom Lane wrote: >> On my machine this runs about twice as fast as the original view. > Am I missing some magic? I'm still getting the subquery scan. Hmm, I'm getting a core dump :-( ... this seems to be busted in HEAD. 8.3 gets it right though. regards, tom lane
On Nov 10, 2008, at 9:20 PM, Tom Lane wrote: > "Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: >> On Nov 10, 2008, at 1:31 PM, Tom Lane wrote: >>> On my machine this runs about twice as fast as the original view. > >> Am I missing some magic? I'm still getting the subquery scan. > > Hmm, I'm getting a core dump :-( ... this seems to be busted in HEAD. > 8.3 gets it right though. Doesn't seem to for me... :/ decibel@platter.local=# select version(); version ------------------------------------------------------------------------ ----------------------------------------------------------------- PostgreSQL 8.3.5 on i386-apple-darwin8.11.1, compiled by GCC i686- apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5370) (1 row) Time: 0.250 ms decibel@platter.local=# explain select count(*) from v2; QUERY PLAN ------------------------------------------------------------------------ -------------- Aggregate (cost=279184.19..279184.20 rows=1 width=0) -> Append (cost=0.00..254178.40 rows=10002315 width=0) -> Subquery Scan "*SELECT* 1" (cost=0.00..254058.50 rows=10000175 width=0) -> Seq Scan on a (cost=0.00..154056.75 rows=10000175 width=14) -> Subquery Scan "*SELECT* 2" (cost=37.67..119.90 rows=2140 width=0) -> Hash Join (cost=37.67..98.50 rows=2140 width=40) Hash Cond: (b.c_id = c.c_id) -> Seq Scan on b (cost=0.00..31.40 rows=2140 width=8) -> Hash (cost=22.30..22.30 rows=1230 width=36) -> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) (10 rows) Time: 0.923 ms decibel@platter.local=# \d v2 View "public.v2" Column | Type | Modifiers --------+---------+----------- a | integer | b | text | c_id | integer | c_text | text | View definition: SELECT a.a, a.b, NULL::integer AS c_id, NULL::text AS c_text FROM a UNION ALL SELECT b.a, NULL::text AS b, b.c_id, c.c_text FROM b JOIN c ON b.c_id = c.c_id; decibel@platter.local=# -- Decibel! jnasby@cashnetusa.com (512) 569-9461
"Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: > On Nov 10, 2008, at 9:20 PM, Tom Lane wrote: >> 8.3 gets it right though. > Doesn't seem to for me... :/ Oh, I was looking at "select * from v2" not "select count(*) from v2". HEAD is a bit smarter about the latter than 8.3 is. regards, tom lane
On Nov 11, 2008, at 1:15 PM, Tom Lane wrote: > "Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: >> On Nov 10, 2008, at 9:20 PM, Tom Lane wrote: >>> 8.3 gets it right though. > >> Doesn't seem to for me... :/ > > Oh, I was looking at "select * from v2" not "select count(*) from v2". > HEAD is a bit smarter about the latter than 8.3 is. So here's something odd... in both 8.3 and HEAD from a while ago it gives a better plan for SELECT * than for SELECT count(*): decibel@platter.local=# explain analyze select * from v2; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------- Result (cost=0.00..254178.40 rows=10002315 width=72) (actual time=0.049..8452.152 rows=9999999 loops=1) -> Append (cost=0.00..254178.40 rows=10002315 width=72) (actual time=0.048..5887.025 rows=9999999 loops=1) -> Seq Scan on a (cost=0.00..154056.75 rows=10000175 width=14) (actual time=0.048..4207.482 rows=9999999 loops=1) -> Hash Join (cost=37.67..98.50 rows=2140 width=40) (actual time=0.002..0.002 rows=0 loops=1) Hash Cond: (b.c_id = c.c_id) -> Seq Scan on b (cost=0.00..31.40 rows=2140 width=8) (actual time=0.000..0.000 rows=0 loops=1) -> Hash (cost=22.30..22.30 rows=1230 width=36) (never executed) -> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) (never executed) Total runtime: 9494.162 ms (9 rows) decibel@platter.local=# explain analyze select count(*) from v2; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------------- Aggregate (cost=279184.19..279184.20 rows=1 width=0) (actual time=13155.524..13155.524 rows=1 loops=1) -> Append (cost=0.00..254178.40 rows=10002315 width=0) (actual time=0.045..11042.562 rows=9999999 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..254058.50 rows=10000175 width=0) (actual time=0.045..8976.352 rows=9999999 loops=1) -> Seq Scan on a (cost=0.00..154056.75 rows=10000175 width=14) (actual time=0.045..5936.930 rows=9999999 loops=1) -> Subquery Scan "*SELECT* 2" (cost=37.67..119.90 rows=2140 width=0) (actual time=0.002..0.002 rows=0 loops=1) -> Hash Join (cost=37.67..98.50 rows=2140 width=40) (actual time=0.002..0.002 rows=0 loops=1) Hash Cond: (b.c_id = c.c_id) -> Seq Scan on b (cost=0.00..31.40 rows=2140 width=8) (actual time=0.001..0.001 rows=0 loops=1) -> Hash (cost=22.30..22.30 rows=1230 width=36) (never executed) -> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) (never executed) Total runtime: 13155.642 ms (11 rows) decibel@platter.local=# explain analyze select count(*) from (select * from v2 offset 0) a; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------------- Aggregate (cost=379207.34..379207.35 rows=1 width=0) (actual time=12592.273..12592.274 rows=1 loops=1) -> Limit (cost=0.00..254178.40 rows=10002315 width=72) (actual time=0.173..11057.717 rows=9999999 loops=1) -> Result (cost=0.00..254178.40 rows=10002315 width=72) (actual time=0.172..9213.524 rows=9999999 loops=1) -> Append (cost=0.00..254178.40 rows=10002315 width=72) (actual time=0.172..6608.656 rows=9999999 loops=1) -> Seq Scan on a (cost=0.00..154056.75 rows=10000175 width=14) (actual time=0.171..4793.116 rows=9999999 loops=1) -> Hash Join (cost=37.67..98.50 rows=2140 width=40) (actual time=0.002..0.002 rows=0 loops=1) Hash Cond: (b.c_id = c.c_id) -> Seq Scan on b (cost=0.00..31.40 rows=2140 width=8) (actual time=0.001..0.001 rows=0 loops=1) -> Hash (cost=22.30..22.30 rows=1230 width=36) (never executed) -> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) (never executed) Total runtime: 12592.442 ms (11 rows) And yes, explain overhead is huge... decibel@platter.local=# \timing Timing is on. decibel@platter.local=# select count(*) from v2; count --------- 9999999 (1 row) Time: 6217.624 ms decibel@platter.local=# -- Decibel! jnasby@cashnetusa.com (512) 569-9461
"Jim 'Decibel!' Nasby" <jnasby@cashnetusa.com> writes: > So here's something odd... in both 8.3 and HEAD from a while ago it > gives a better plan for SELECT * than for SELECT count(*): The short answer is that the Subquery Scan nodes can be dropped out when they are no-ops, which is to say producing the same set of columns their input produces (and not testing any filter conditions, but that's not relevant here). SELECT count(*) doesn't want to know about any columns so the output of the UNION arm doesn't match ... regards, tom lane