Thread: Query plan excluding index on view
I'm trying to fine tune this query to return in a reasonable amount of time and am having difficulties getting the query to run the way I'd like. I have a couple of semi-related entities that are stored in individual tables, say, A and B. There is then a view created that pulls together the common fields from these 2 tables. These are then related through a m:m relationship to a classification. Quick definitions of all of this follows:
Table: ItemA
id <- primary key
name
description
<addtl fields for A>
Table: ItemB
id <- primary key
name
description
<addtl fields for B>
View: Combined
SELECT id, name, description from ItemA
UNION ALL
SELECT id, name, description from ItemB
Table: xref
id <- primary key
item_id <- indexed, points to either ItemA.id or ItemB.id
classifcation_id <- indexed, points to classification.id
Table: classifcation
id <- primiary key
name
I'm trying to query from the classification, through the xref, and to the view to get a list of Items (either A or B) that are tied to a specific classification. My query is rather simple, baiscally as follows:
SELECT id, name, description
FROM combination c
INNER JOIN xref on c.id = xref.item_id
WHERE xref.classifcation_id = 1
This query runs in about 2-3 minutes (I should mention that ItemA has ~18M records and xref has ~26M records - and both will continue to grow). The explain text shows a disregard for the indexes on ItemA and ItemB and a sequence scan is done on both of them. However, if I rewrite this query to join directly to ItemA rather to the view it runs in ~50ms because it now uses the proper index.
I know it's generally requested to include the EXPLAIN text when submitting a specific question, but I thought perhaps this was generic enough that someone might at least have some suggestions. If required I can certainly work up a simpler example, or I could include my actual explain (though it doesn't exactly match everything defined above as I tried to keep things rather generic).
Any links would be nice as well, from all my searching the past few days, most of the performance tuning resources I could find where about tuning the server itself, not really a specific query - at least not one that dealt with this issue. If you've read this far - thank you much!
Table: ItemA
id <- primary key
name
description
<addtl fields for A>
Table: ItemB
id <- primary key
name
description
<addtl fields for B>
View: Combined
SELECT id, name, description from ItemA
UNION ALL
SELECT id, name, description from ItemB
Table: xref
id <- primary key
item_id <- indexed, points to either ItemA.id or ItemB.id
classifcation_id <- indexed, points to classification.id
Table: classifcation
id <- primiary key
name
I'm trying to query from the classification, through the xref, and to the view to get a list of Items (either A or B) that are tied to a specific classification. My query is rather simple, baiscally as follows:
SELECT id, name, description
FROM combination c
INNER JOIN xref on c.id = xref.item_id
WHERE xref.classifcation_id = 1
This query runs in about 2-3 minutes (I should mention that ItemA has ~18M records and xref has ~26M records - and both will continue to grow). The explain text shows a disregard for the indexes on ItemA and ItemB and a sequence scan is done on both of them. However, if I rewrite this query to join directly to ItemA rather to the view it runs in ~50ms because it now uses the proper index.
I know it's generally requested to include the EXPLAIN text when submitting a specific question, but I thought perhaps this was generic enough that someone might at least have some suggestions. If required I can certainly work up a simpler example, or I could include my actual explain (though it doesn't exactly match everything defined above as I tried to keep things rather generic).
Any links would be nice as well, from all my searching the past few days, most of the performance tuning resources I could find where about tuning the server itself, not really a specific query - at least not one that dealt with this issue. If you've read this far - thank you much!
"Matt Klinker" <mklinker@gmail.com> writes: > I know it's generally requested to include the EXPLAIN text when submitting > a specific question, but I thought perhaps this was generic enough that > someone might at least have some suggestions. You're usually only going to get generic suggestions from a generic explanation. One thought here though is that it's only been since PG 8.2 that you had any hope of getting an indexscan on a join condition pushed down through a UNION, which it looks like is what you're hoping for. What version are you running? regards, tom lane
Sorry for not including this extra bit originally. Below is the explain detail from both the query to the view that takes longer and then the query directly to the single table that performs quickly.
Hash Join (cost=49082.96..1940745.80 rows=11412 width=76)
Hash Cond: (outer.?column1? = inner.listing_fid)
-> Append (cost=0.00..1290709.94 rows=18487347 width=252)
-> Subquery Scan *SELECT* 1 (cost=0.00..1285922.80 rows=18384890 width=251)
-> Seq Scan on company (cost=0.00..1102073.90 rows=18384890 width=251)
-> Subquery Scan *SELECT* 2 (cost=0.00..4787.14 rows=102457 width=252)
-> Seq Scan on school (cost=0.00..3762.57 rows=102457 width=252)
-> Hash (cost=49042.64..49042.64 rows=16130 width=8)
-> Bitmap Heap Scan on listing_node_xref xref (cost=102.45..49042.64 rows=16130 width=8)
Recheck Cond: (node_fid = 173204537)
-> Bitmap Index Scan on idx_listing_node_xref_node_fid (cost=0.00..102.45 rows=16130 width=0)
Index Cond: (node_fid = 173204537)
Nested Loop (cost=102.45..98564.97 rows=11349 width=517)
-> Bitmap Heap Scan on listing_node_xref xref (cost=102.45..49042.64 rows=16130 width=8)
Recheck Cond: (node_fid = 173204537)
-> Bitmap Index Scan on idx_listing_node_xref_node_fid (cost=0.00..102.45 rows=16130 width=0)
Index Cond: (node_fid = 173204537)
-> Index Scan using idx_pki_company_id on company c (cost=0.00..3.06 rows=1 width=517)
Index Cond: (c.id = outer.listing_fid)
Hash Join (cost=49082.96..1940745.80 rows=11412 width=76)
Hash Cond: (outer.?column1? = inner.listing_fid)
-> Append (cost=0.00..1290709.94 rows=18487347 width=252)
-> Subquery Scan *SELECT* 1 (cost=0.00..1285922.80 rows=18384890 width=251)
-> Seq Scan on company (cost=0.00..1102073.90 rows=18384890 width=251)
-> Subquery Scan *SELECT* 2 (cost=0.00..4787.14 rows=102457 width=252)
-> Seq Scan on school (cost=0.00..3762.57 rows=102457 width=252)
-> Hash (cost=49042.64..49042.64 rows=16130 width=8)
-> Bitmap Heap Scan on listing_node_xref xref (cost=102.45..49042.64 rows=16130 width=8)
Recheck Cond: (node_fid = 173204537)
-> Bitmap Index Scan on idx_listing_node_xref_node_fid (cost=0.00..102.45 rows=16130 width=0)
Index Cond: (node_fid = 173204537)
Nested Loop (cost=102.45..98564.97 rows=11349 width=517)
-> Bitmap Heap Scan on listing_node_xref xref (cost=102.45..49042.64 rows=16130 width=8)
Recheck Cond: (node_fid = 173204537)
-> Bitmap Index Scan on idx_listing_node_xref_node_fid (cost=0.00..102.45 rows=16130 width=0)
Index Cond: (node_fid = 173204537)
-> Index Scan using idx_pki_company_id on company c (cost=0.00..3.06 rows=1 width=517)
Index Cond: (c.id = outer.listing_fid)
On Thu, Apr 3, 2008 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Matt Klinker" <mklinker@gmail.com> writes:> I new I'd forget something! I've tried this on both 8.2 and 8.3 with theThen you're going to have to provide more details ...
> same results.
regards, tom lane
"Matt Klinker" <mklinker@gmail.com> writes: > Sorry for not including this extra bit originally. Below is the explain > detail from both the query to the view that takes longer and then the query > directly to the single table that performs quickly. ... > -> Subquery Scan *SELECT* 1 (cost=0.00..1285922.80 rows=18384890 > width=251) > -> Seq Scan on company (cost=0.00..1102073.90 rows=18384890 The presence of a Subquery Scan node tells me that either this is a much older PG version than you stated, or there are some interesting details to the query that you omitted. Please drop the fan-dance routine and show us a complete reproducible test case. regards, tom lane
I'm sorry for the "fan-dance", it was not my intention to make it difficult but actually simpler in leaving out the finer details - lesson learned. Below you'll find create scripts for all tables and views invlolved. Also I've included the explain text for both queries when ran on the 8.3 database where what was included before was from 8.1 (I was incorrect in stating I had tried version 8.2, as I thought the 8.1 install was 8.2 - my apologies).
--Table 1 - (Item A) ~18M records
CREATE TABLE company
(
id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
"name" character varying(65) NOT NULL,
description character varying(100),
recordid character varying(10),
full_address character varying(45),
street_number character varying(10),
street_directional character(2),
street_name character varying(20),
unit_designator character varying(4),
unit_number character varying(8),
city_name character varying(20),
state_code character(2),
zip character(5),
zip_extension character(4),
phone character varying(10),
phone_code character(1),
publish_date character varying(6),
solicitation_restrictions character(1),
business_flag character(1),
latitude character varying(11),
longitude character varying(11),
precision_code character(1),
fips character varying(16),
is_telco_unique boolean,
vanity_city_name character varying(20),
book_number character varying(6),
web_address character varying(50),
primary_bdc_flag character(1),
msa character varying(4),
is_amex_accepted boolean,
is_mastercard_accepted boolean,
is_visa_accepted boolean,
is_discover_accepted boolean,
is_diners_accepted boolean,
is_other_cc_accepted boolean,
fax character varying(10),
free_eac character(1),
hours_of_operation character(1),
is_spanish_spoken boolean,
is_french_spoken boolean,
is_german_spoken boolean,
is_japanese_spoken boolean,
is_italian_spoken boolean,
is_korean_spoken boolean,
is_chinese_spoken boolean,
senior_discount_key character(1),
listing_type_fid bigint,
CONSTRAINT pk_company_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
--Table 2 - (Item B) ~100k records
CREATE TABLE school
(
id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
"name" character varying(65) NOT NULL,
description character varying(100),
address1 character varying(100),
address2 character varying(100),
city character varying(50),
state character(2),
CONSTRAINT pk_school_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
--Joined View:
CREATE OR REPLACE VIEW directory_listing AS
SELECT school.id, school.name, school.description, 119075291 AS listing_type_fid
FROM school
UNION ALL
SELECT company.id, company.name, company.description, 119074833 AS listing_type_fid
FROM company;
--Listing-Classification Xref: ~26M records
CREATE TABLE listing_node_xref
(
id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
listing_fid bigint NOT NULL,
node_fid bigint NOT NULL,
CONSTRAINT pk_listing_node_xref PRIMARY KEY (id),
CONSTRAINT fk_listing_node_xref_node_fid FOREIGN KEY (node_fid)
REFERENCES node (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT uqe_listing_node_xref_listing_fid_node_fid UNIQUE (listing_fid, node_fid)
)
WITH (OIDS=FALSE);
ALTER TABLE listing_node_xref OWNER TO vml;
CREATE INDEX idx_listing_node_xref_listing_fid
ON listing_node_xref
USING btree
(listing_fid);
CREATE INDEX idx_listing_node_xref_node_fid
ON listing_node_xref
USING btree
(node_fid);
Here is the version of Postgres: PostgreSQL 8.3.1
Query:
SELECT l.id, l.name, l.description, l.listing_type_fid
FROM directory_listing l
INNER JOIN listing_node_xref xref ON l.id = xref.listing_fid
WHERE xref.node_fid = 173204537
Explain:
Hash Join (cost=48449.22..1223695.46 rows=11472 width=378)
Hash Cond: (school.id = xref.listing_fid)
-> Append (cost=0.00..945319.40 rows=18384970 width=378)
-> Seq Scan on school (cost=0.00..10.80 rows=80 width=374)
-> Seq Scan on company (cost=0.00..761458.90 rows=18384890 width=247)
-> Hash (cost=48246.22..48246.22 rows=16240 width=8)
-> Bitmap Heap Scan on listing_node_xref xref (cost=308.96..48246.22 rows=16240 width=8)
Recheck Cond: (node_fid = 173204537)
-> Bitmap Index Scan on idx_listing_node_xref_node_fid (cost=0.00..304.90 rows=16240 width=0)
Index Cond: (node_fid = 173204537)
Query:
select c.*
from company c
inner join listing_node_xref xref on c.id = xref.listing_fid
where xref.node_fid = 173204537
Explain:
Nested Loop (cost=308.96..205552.40 rows=11471 width=424)
-> Bitmap Heap Scan on listing_node_xref xref (cost=308.96..48246.22 rows=16240 width=8)
Recheck Cond: (node_fid = 173204537)
-> Bitmap Index Scan on idx_listing_node_xref_node_fid (cost=0.00..304.90 rows=16240 width=0)
Index Cond: (node_fid = 173204537)
-> Index Scan using pk_company_id on company c (cost=0.00..9.67 rows=1 width=424)
Index Cond: (c.id = xref.listing_fid)
--Table 1 - (Item A) ~18M records
CREATE TABLE company
(
id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
"name" character varying(65) NOT NULL,
description character varying(100),
recordid character varying(10),
full_address character varying(45),
street_number character varying(10),
street_directional character(2),
street_name character varying(20),
unit_designator character varying(4),
unit_number character varying(8),
city_name character varying(20),
state_code character(2),
zip character(5),
zip_extension character(4),
phone character varying(10),
phone_code character(1),
publish_date character varying(6),
solicitation_restrictions character(1),
business_flag character(1),
latitude character varying(11),
longitude character varying(11),
precision_code character(1),
fips character varying(16),
is_telco_unique boolean,
vanity_city_name character varying(20),
book_number character varying(6),
web_address character varying(50),
primary_bdc_flag character(1),
msa character varying(4),
is_amex_accepted boolean,
is_mastercard_accepted boolean,
is_visa_accepted boolean,
is_discover_accepted boolean,
is_diners_accepted boolean,
is_other_cc_accepted boolean,
fax character varying(10),
free_eac character(1),
hours_of_operation character(1),
is_spanish_spoken boolean,
is_french_spoken boolean,
is_german_spoken boolean,
is_japanese_spoken boolean,
is_italian_spoken boolean,
is_korean_spoken boolean,
is_chinese_spoken boolean,
senior_discount_key character(1),
listing_type_fid bigint,
CONSTRAINT pk_company_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
--Table 2 - (Item B) ~100k records
CREATE TABLE school
(
id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
"name" character varying(65) NOT NULL,
description character varying(100),
address1 character varying(100),
address2 character varying(100),
city character varying(50),
state character(2),
CONSTRAINT pk_school_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
--Joined View:
CREATE OR REPLACE VIEW directory_listing AS
SELECT school.id, school.name, school.description, 119075291 AS listing_type_fid
FROM school
UNION ALL
SELECT company.id, company.name, company.description, 119074833 AS listing_type_fid
FROM company;
--Listing-Classification Xref: ~26M records
CREATE TABLE listing_node_xref
(
id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
listing_fid bigint NOT NULL,
node_fid bigint NOT NULL,
CONSTRAINT pk_listing_node_xref PRIMARY KEY (id),
CONSTRAINT fk_listing_node_xref_node_fid FOREIGN KEY (node_fid)
REFERENCES node (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT uqe_listing_node_xref_listing_fid_node_fid UNIQUE (listing_fid, node_fid)
)
WITH (OIDS=FALSE);
ALTER TABLE listing_node_xref OWNER TO vml;
CREATE INDEX idx_listing_node_xref_listing_fid
ON listing_node_xref
USING btree
(listing_fid);
CREATE INDEX idx_listing_node_xref_node_fid
ON listing_node_xref
USING btree
(node_fid);
Here is the version of Postgres: PostgreSQL 8.3.1
Query:
SELECT l.id, l.name, l.description, l.listing_type_fid
FROM directory_listing l
INNER JOIN listing_node_xref xref ON l.id = xref.listing_fid
WHERE xref.node_fid = 173204537
Explain:
Hash Join (cost=48449.22..1223695.46 rows=11472 width=378)
Hash Cond: (school.id = xref.listing_fid)
-> Append (cost=0.00..945319.40 rows=18384970 width=378)
-> Seq Scan on school (cost=0.00..10.80 rows=80 width=374)
-> Seq Scan on company (cost=0.00..761458.90 rows=18384890 width=247)
-> Hash (cost=48246.22..48246.22 rows=16240 width=8)
-> Bitmap Heap Scan on listing_node_xref xref (cost=308.96..48246.22 rows=16240 width=8)
Recheck Cond: (node_fid = 173204537)
-> Bitmap Index Scan on idx_listing_node_xref_node_fid (cost=0.00..304.90 rows=16240 width=0)
Index Cond: (node_fid = 173204537)
Query:
select c.*
from company c
inner join listing_node_xref xref on c.id = xref.listing_fid
where xref.node_fid = 173204537
Explain:
Nested Loop (cost=308.96..205552.40 rows=11471 width=424)
-> Bitmap Heap Scan on listing_node_xref xref (cost=308.96..48246.22 rows=16240 width=8)
Recheck Cond: (node_fid = 173204537)
-> Bitmap Index Scan on idx_listing_node_xref_node_fid (cost=0.00..304.90 rows=16240 width=0)
Index Cond: (node_fid = 173204537)
-> Index Scan using pk_company_id on company c (cost=0.00..9.67 rows=1 width=424)
Index Cond: (c.id = xref.listing_fid)
On Thu, Apr 3, 2008 at 11:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Matt Klinker" <mklinker@gmail.com> writes:> Sorry for not including this extra bit originally. Below is the explain...
> detail from both the query to the view that takes longer and then the query
> directly to the single table that performs quickly.> -> Subquery Scan *SELECT* 1 (cost=0.00..1285922.80 rows=18384890The presence of a Subquery Scan node tells me that either this is a much
> width=251)
> -> Seq Scan on company (cost=0.00..1102073.90 rows=18384890
older PG version than you stated, or there are some interesting details
to the query that you omitted. Please drop the fan-dance routine and
show us a complete reproducible test case.
regards, tom lane
"Matt Klinker" <mklinker@gmail.com> writes: > --Joined View: > CREATE OR REPLACE VIEW directory_listing AS > SELECT school.id, school.name, school.description, 119075291 AS > listing_type_fid > FROM school > UNION ALL > SELECT company.id, company.name, company.description, 119074833 AS > listing_type_fid > FROM company; Ah, there's the problem :-(. Can you get rid of the constants here? The planner's currently not smart about UNION ALL subqueries unless their SELECT lists contain just simple column references. (Yes, fixing that is on the todo list, but don't hold your breath... it'll be 8.4 material at the earliest.) regards, tom lane
Removing the constants definitely did take care of the issue on 8.3 (still same query plan on 8.1). Thanks for your help in getting this resolved, and sorry again for not including all relevant information on my initial request
On Fri, Apr 4, 2008 at 10:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Matt Klinker" <mklinker@gmail.com> writes:> --Joined View:Ah, there's the problem :-(. Can you get rid of the constants here?
> CREATE OR REPLACE VIEW directory_listing AS
> SELECT school.id, school.name, school.description, 119075291 AS
> listing_type_fid
> FROM school
> UNION ALL
> SELECT company.id, company.name, company.description, 119074833 AS
> listing_type_fid
> FROM company;
The planner's currently not smart about UNION ALL subqueries unless
their SELECT lists contain just simple column references.
(Yes, fixing that is on the todo list, but don't hold your breath...
it'll be 8.4 material at the earliest.)
regards, tom lane