Thread: UNION ALL and sequential scans
I was glad to find in 8.3.7 that pg was now smart enough to use an index for a simple UNION ALL. But unfortunately, it's not quite there yet for our use case. Consider the following dummy tables: create table foo (id serial primary key, val int not null); create table bar (id serial primary key, val int not null); create table baz (id1 serial primary key, id2 int not null); insert into foo (val) select x from generate_series(0,10000) as x; insert into bar (val) select x from generate_series(0,10000) as x; insert into baz (id2) select x from generate_series(0,10000) as x; This query correctly uses the primary key indexes on foo and bar: explain analyze select * from baz join ( select id, val from foo union all select id, val from bar ) as foobar on(baz.id2=foobar.id) where baz.id1=42; But if I add a constant-valued column to indicate which branch of the union each result came from: explain analyze select * from baz join ( select id, val, 'foo'::text as source from foo union all select id, val, 'bar'::text as source from bar ) as foobar on(baz.id2=foobar.id) where baz.id1=42; All of a sudden it insists on a sequential scan (and takes 800 times as long to run) even when enable_seqscan is set false. Is there a good reason for this, or is it just a missed opportunity in the optimizer?
"Brad Jorsch" <programmer@protech1inc.com> writes: > But if I add a constant-valued column to indicate which branch of the > union each result came from: > explain analyze select * from baz join ( > select id, val, 'foo'::text as source from foo > union all > select id, val, 'bar'::text as source from bar > ) as foobar on(baz.id2=foobar.id) where baz.id1=42; > All of a sudden it insists on a sequential scan (and takes 800 times as > long to run) even when enable_seqscan is set false. Is there a good > reason for this, or is it just a missed opportunity in the optimizer? It's an ancient and fundamental limitation that is fixed in 8.4. Do not expect to see it fixed in 8.3.x. regards, tom lane
On Thu, May 14, 2009 at 4:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Brad Jorsch" <programmer@protech1inc.com> writes: >> But if I add a constant-valued column to indicate which branch of the >> union each result came from: > >> explain analyze select * from baz join ( >> select id, val, 'foo'::text as source from foo >> union all >> select id, val, 'bar'::text as source from bar >> ) as foobar on(baz.id2=foobar.id) where baz.id1=42; > >> All of a sudden it insists on a sequential scan (and takes 800 times as >> long to run) even when enable_seqscan is set false. Is there a good >> reason for this, or is it just a missed opportunity in the optimizer? > > It's an ancient and fundamental limitation that is fixed in 8.4. > Do not expect to see it fixed in 8.3.x. Does this also apply to the case of a join on an inherited table ? example: http://archives.postgresql.org/pgsql-performance/2003-10/msg00018.php Kind regards, Mathieu
Mathieu De Zutter <mathieu@dezutter.org> writes: > On Thu, May 14, 2009 at 4:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's an ancient and fundamental limitation that is fixed in 8.4. >> Do not expect to see it fixed in 8.3.x. > Does this also apply to the case of a join on an inherited table ? > example: http://archives.postgresql.org/pgsql-performance/2003-10/msg00018.php Well, the particular issue described in that message is long gone. What Brad is complaining about is non-strict expressions in the outputs of append-relation members. An inheritance tree also forms an append-relation, but AFAIK there is no way to have anything but simple Vars in its outputs so the case wouldn't arise. Do you have a specific problem example in mind? regards, tom lane