
Inefficient query plan - Mailing list pgsql-performance

From	Jann Röder
Subject	Inefficient query plan
Date	August 23, 2010 01:25:15
Msg-id	i4st4a$e8v$1@dough.gmane.org
Responses	Re: Inefficient query plan Re: Inefficient query plan Re: Inefficient query plan
List	pgsql-performance

I have two tables:
A: ItemID (PK), IssueID (indexed)
B: ItemID (FK), IndexNumber; PK(ItemID, IndexNumber)

Both tables have several million rows, but B has many more than A.

Now if I run

SELECT A.ItemID FROM A, B WHERE A.ItemID = B.ItemID AND A.IssueID = <some id>

The query takes extremely long (several hours). I ran EXPLAIN and got:

Hash Join  (cost=516.66..17710110.47 rows=8358225 width=16)
  Hash Cond: ((b.itemid)::bpchar = a.itemid)
  ->  Seq Scan on b  (cost=0.00..15110856.68 rows=670707968 width=16)
  ->  Hash  (cost=504.12..504.12 rows=1003 width=16)
        ->  Index Scan using idx_issueid on a  (cost=0.00..504.12 rows=1003 width=16)
              Index Cond: (issueid = 'A1983PW823'::bpchar)

The problem is clearly the sequential scan on B. However, only a handful
of rows in A match a given IssueID. So my question is: why doesn't
PostgreSQL simply take each ItemID in A that has the requested IssueID
and check whether a matching ItemID exists in B? That should be very
fast, because ItemID in B is indexed as the leading column of B's
primary key.
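The plan being asked for is an index nested-loop join: for each matching row of A, probe B's primary-key index. The planner instead chose a hash join, which hashes the matching A rows and then reads every row of B. The Python sketch below is only an illustration on hypothetical toy data. The function names, the data, and the sorted list standing in for B's B-tree index are all assumptions. It counts how many rows of B each strategy reads and does not model PostgreSQL's actual cost estimates.

```python
import bisect


def hash_join(a_rows, b_rows, issue_id):
    # Build side: ItemIDs from A with the requested IssueID.
    # A.ItemID is A's primary key, so a set is enough.
    wanted = {item_id for item_id, iss in a_rows if iss == issue_id}
    result, b_rows_read = [], 0
    # Probe side: every row of B is read, like a sequential scan.
    for item_id, _index_number in b_rows:
        b_rows_read += 1
        if item_id in wanted:
            result.append(item_id)
    return result, b_rows_read


def index_nested_loop(a_rows, b_index, issue_id):
    # b_index holds B's (ItemID, IndexNumber) pairs in sorted order,
    # standing in for the primary-key B-tree (hypothetical stand-in).
    result, b_rows_read = [], 0
    # A is filtered linearly here for brevity; the real plan uses idx_issueid.
    for item_id, iss in a_rows:
        if iss != issue_id:
            continue
        # (item_id,) sorts before every (item_id, n), so this lands on
        # the first index entry for item_id, if any.
        pos = bisect.bisect_left(b_index, (item_id,))
        while pos < len(b_index) and b_index[pos][0] == item_id:
            b_rows_read += 1
            result.append(item_id)
            pos += 1
    return result, b_rows_read


# Hypothetical toy data: 10,000 A rows (10 per IssueID), 5 B rows per ItemID.
a_rows = [(f"I{i}", f"ISS{i % 1000}") for i in range(10_000)]
b_rows = [(f"I{i}", n) for i in range(10_000) for n in range(5)]
b_index = sorted(b_rows)

hj, hj_read = hash_join(a_rows, b_rows, "ISS7")
nl, nl_read = index_nested_loop(a_rows, b_index, "ISS7")
print(len(hj), hj_read)  # 50 50000
print(len(nl), nl_read)  # 50 50
```

Both strategies return the same 50 rows. The nested loop touches only the B entries it needs, while the hash join reads all 50,000 rows of B. By the plan's own estimates, the gap on the real tables is roughly 1,003 index probes versus a scan of about 670 million rows.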

Why isn't PostgreSQL doing this, and is there a way to make it do so?
I'm using PostgreSQL 8.4.4, and yes, I did run ANALYZE on the entire
database.

My settings are:
work_mem = 10MB
shared_buffers = 256MB
effective_cache_size = 768MB

The database is used by essentially a single user.

Thanks a lot,
Jann
