On Postgres 9.6 (config below), I have a case I don't understand: three tables that can be separately queried in milliseconds, but when put together into one view using UNION, take 150 seconds to query. Here's the rough idea (actual details below):
create view thesaurus as
(select id, name from A)
union (select id, name from B)
union (select id, name from C);
create table h(i integer);
insert into h values(12345);
On the other hand, if you do this, it's a millisecond plan:
select * from thesaurus where id in (12345);
Notice that it's effectively the same query since h above contains just this one value.
Here are the actual details. The view being queried:
create view thesaurus2 as
select
rt.thesaurus_id,
rt.version_id,
rt.normalized,
rt.identifier,
rt.typecode
from local_sample s
join thesaurus_master rt using (sample_id)
union
select c.id as thesaurus_id, c.cas_number as normalized,
c.cas_number as identifier,
3 as typecode
from cas_number c
join sample s on c.id = s.version_id union
select m.id as thesaurus_id, lower(m.mfcd) as normalized,
m.mfcd as identifier,
4 as typecode
from mfcd m
join sample s on m.id = s.version_id;
The bad sort (147 seconds to execute). Note that the "hitlist" table contains exactly one row.
explain analyze select c.version_id
from thesaurus2 c
join hitlist_rows_103710241 h on (c.thesaurus_id = h.objectid);
If I instead just query directly for that value, the answer is almost instant (1.2 msec):
explain analyze select c.version_id
from thesaurus2 c
where c.version_id in (1324511991);
Now if I take any one of the three tables in the UNION view, the query is really fast on each one. For example:
select distinct c.version_id
from (
select distinct c.id as thesaurus_id, c.cas_number as normalized,
c.cas_number as identifier,
3 as typecode
from cas_number c
join sample s on c.id = s.version_id ) c
join hitlist_rows_103710241 h on (c.thesaurus_id = h.objectid);
The other two subqueries are similarly fast.
This is Postgres9.6 running on Ubuntu 16.04, 64GB memory 16 CPUs. Non-default config values:
max_connections = 2000
shared_buffers = 12073MB
work_mem = 256MB
maintenance_work_mem = 512MB
synchronous_commit = off
effective_cache_size = 32GB
wal_level = logical
wal_keep_segments = 1000
max_wal_senders = 10
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
Thanks!
Craig
--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------