Re: Slow query with joins - Mailing list pgsql-performance

From Bendik Rognlien Johansen
Subject Re: Slow query with joins
Msg-id 850AD6F9-08BD-4EF2-A549-407FD87C7974@gmail.com
In response to Re: Slow query with joins  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: Slow query with joins  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-performance
The sort is definitely the culprit. When I removed it, the query was
instant. I tried setting work_mem = 131072, but it did not seem to
help. I really don't understand this :-( Any other ideas?
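
For the record, what I tried was along these lines (a sketch; a plain
integer for work_mem is interpreted as kilobytes, so 131072 = 128MB, and
it only affects the query if set in the same session, or in
postgresql.conf followed by a server reload):

SET work_mem = 131072;  -- 128MB for this session
SHOW work_mem;          -- verify it took effect: should report 131072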

Thanks!


On Jan 11, 2006, at 9:23 PM, Jim C. Nasby wrote:

> I'd try figuring out if the join is the culprit or the sort is (by
> dropping the ORDER BY). work_mem is probably forcing the sort to spill
> to disk, and if your drives are rather busy...
>
> You might also get a win if you re-order the joins to people, contacts,
> addresses, if you know it will have the same result.
>
> In this case LIMIT won't have any real effect, because you have to go
> all the way through with the ORDER BY anyway.
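
For illustration, the re-ordering suggested here would read roughly like
this (a sketch with an abridged select list; it assumes, as in the
original query, that addresses and contacts each join independently on
r.id, so swapping the two left joins should give the same rows):

SELECT r.id AS r_id, co.id AS co_id, ad.id AS ad_id  -- abridged list
FROM people r
LEFT OUTER JOIN contacts co ON (r.id = co.record)
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
WHERE NOT r.deleted AND r.original IS NULL
ORDER BY r.id;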
>
> On Wed, Jan 11, 2006 at 08:55:32PM +0100, Bendik Rognlien Johansen wrote:
>> Yes, the rowcount estimates are realistic; however, it has been a long
>> time since the last VACUUM FULL (there is never a good time).
>>
>> I have clustered the tables, reindexed, analyzed, vacuumed and the
>> plan now looks like this:
>>
>>
>> no_people=# explain SELECT
>>     r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
>>     ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
>>     ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
>>     ad.priority AS ad_priority, ad.position[0] AS ad_lat,
>>     ad.position[1] AS ad_lon, ad.uncertainty AS ad_uncertainty,
>>     ad.extra AS ad_extra, ad.deleted AS ad_deleted,
>>     co.id AS co_id, co.type AS co_type, co.value AS co_value,
>>     co.description AS co_description, co.priority AS co_priority,
>>     co.visible AS co_visible, co.searchable AS co_searchable,
>>     co.deleted AS co_deleted
>> FROM people r
>> LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
>> LEFT OUTER JOIN contacts co ON (r.id = co.record)
>> WHERE NOT r.deleted AND r.original IS NULL
>> ORDER BY r.id;
>>                                                         QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------
>> Sort  (cost=182866.49..182943.12 rows=30655 width=587)
>>    Sort Key: r.id
>>    ->  Nested Loop Left Join  (cost=0.00..170552.10 rows=30655 width=587)
>>          ->  Nested Loop Left Join  (cost=0.00..75054.96 rows=26325 width=160)
>>                ->  Index Scan using people_deleted_original_is_null on people r  (cost=0.00..1045.47 rows=23861 width=27)
>>                      Filter: ((NOT deleted) AND (original IS NULL))
>>                ->  Index Scan using addresses_record_idx on addresses ad  (cost=0.00..3.05 rows=4 width=137)
>>                      Index Cond: ("outer".id = ad.record)
>>          ->  Index Scan using contacts_record_idx on contacts co  (cost=0.00..3.32 rows=24 width=431)
>>                Index Cond: ("outer".id = co.record)
>> (10 rows)
>>
>>
>>
>>
>>
>>
>> Looks faster, but still very slow. I added LIMIT 1000, and it has been
>> running for about 25 minutes now with no output. top shows:
>>
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 29994 postgres  18   0 95768  78m  68m R 17.0  7.7   0:53.27 postmaster
>>
>>
>>
>> which is unusual; I usually see 99.9% CPU for just about any query,
>> which leads me to believe this is disk-related.
>>
>>
>>
>> postgresql.conf:
>> shared_buffers = 8192
>> work_mem = 8192
>> maintenance_work_mem = 524288
>>
>>
>>
>>
>> Hardware 2x2.8GHz cpu
>> 1GB ram
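
For context, those settings work out to roughly shared_buffers = 8192
pages x 8kB = 64MB and work_mem = 8192kB = 8MB per sort. The top-level
sort in the plan above is estimated at 30655 rows x 587 bytes, about
17MB of raw data before per-row overhead, so with work_mem at 8MB that
sort would have to spill to disk.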
>>
>> Could this be an issue related to lack of VACUUM FULL? The tables get
>> a lot of updates.
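
One way to gauge whether update bloat is the problem (a sketch; plain
VACUUM VERBOSE only reports, it does not compact the table the way
VACUUM FULL does):

VACUUM VERBOSE people;
-- The per-table output lists removable and nonremovable row versions
-- and the total page count; far more pages than the live rows need
-- points to bloat.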
>>
>>
>> Thank you very much so far!
>>
>>
>>
>>
>> On Jan 11, 2006, at 4:45 PM, Tom Lane wrote:
>>
>>> Bendik Rognlien Johansen <bendik.johansen@gmail.com> writes:
>>>> Has anyone got any tips for speeding up this query? It currently
>>>> takes hours to start.
>>>
>>> Are the rowcount estimates close to reality?  The plan doesn't look
>>> unreasonable to me if they are.  It might help to increase work_mem
>>> to ensure that the hash tables don't spill to disk.
>>>
>>> Indexes:
>>>     "people_original_is_null" btree (original) WHERE original IS NULL
>>>
>>> This index seems poorly designed: the actual index entries are dead
>>> weight since all of them are necessarily NULL.  You might as well make
>>> the index carry something that you frequently test in conjunction with
>>> "original IS NULL".  For instance, if this particular query is a common
>>> case, you could replace this index with
>>>
>>> CREATE INDEX people_deleted_original_is_null ON people(deleted)
>>>     WHERE original IS NULL;
>>>
>>> This index is still perfectly usable for queries that only say
>>> "original IS NULL", but it can also filter out rows with the wrong
>>> value of deleted.  Now, if there are hardly any rows with deleted =
>>> true, maybe this won't help much for your problem.  But in any case
>>> you ought to consider whether you can make the index entries do
>>> something useful.
>>>
>>>             regards, tom lane
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings
>>
>
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

