Re: Poor performance when joining against inherited tables - Mailing list pgsql-performance

From Robert Haas
Subject Re: Poor performance when joining against inherited tables
Date
Msg-id BANLkTikkxy322Ws60X+nrLN5+su+DAsS7Q@mail.gmail.com
Whole thread Raw
In response to Re: Poor performance when joining against inherited tables  (Lucas Madar <madar@samsix.com>)
List pgsql-performance
On Wed, May 11, 2011 at 4:47 PM, Lucas Madar <madar@samsix.com> wrote:
> On 05/11/2011 09:38 AM, Robert Haas wrote:
>>>
>>> However, if I disable seqscan (set enable_seqscan=false), I get the
>>> following plan:
>>>
>>>  QUERY PLAN
>>> ------------
>>>  Hash Join  (cost=10001298843.53..290002337961.71 rows=8643757 width=506)
>>>   Hash Cond: (f.id = objects.id)
>>>   ->    Append  (cost=10000000000.00..290000536334.43 rows=8643757
>>> width=20)
>>>         ->    Seq Scan on item f  (cost=10000000000.00..10000000026.30
>>> rows=1630 width=20)
>>>         ->    Index Scan using xxx_pkey on itemXX f  (cost=0.00..10.60
>>> rows=90
>>> width=20)
>>>         ->    Index Scan using yyy_pkey on itemYY f  (cost=0.00..25.24
>>> rows=266 width=20)
>>>         ->    Index Scan using zzz_pkey on itemZZ f  (cost=0.00..9.28
>>> rows=2
>>> width=20)
>>>         ...
>>>   ->    Hash  (cost=999347.17..999347.17 rows=3941949 width=490)
>>>         ->    Index Scan using objects_pkey on objects
>>> (cost=0.00..999347.17
>>> rows=3941949 width=490)
>>>
>>> This seems like a much more sensible query plan.
>>
>> I don't think so.  Scanning the index to extract all the rows in a
>> table is typically going to be a lot slower than a sequential scan.
>>
>
> Compared to the previous query plan (omitted in this e-mail, in which the
> planner was scanning all the item tables sequentially), the second query is
> much more desirable. It takes about 12 seconds to complete, versus the other
> query which I canceled after six hours. However, what you propose seems to
> make even more sense.

I was just looking at this email again, and had another thought:
perhaps the tables in question are badly bloated.  In your situation,
it seems that the plan didn't change much when you set
enable_seqscan=off: it just replaced full-table seq-scans with
full-table index-scans, which should be slower.  But if you have a
giant table that's mostly empty space, then following the index
pointers to the limited number of blocks that contain any useful data
might be faster than scanning all the empty space.  If you still have
these tables around somewhere, you could test this hypothesis by
running CLUSTER on all the tables and see whether the seq-scan gets
faster.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-performance by date:

Previous
From: Greg Spiegelberg
Date:
Subject: Re: is parallel union all possible over dblink?
Next
From: Samuel Gendler
Date:
Subject: Re: near identical queries have vastly different plans