Re: Query does not return rows unless a LIMIT statement is used. - Mailing list pgsql-novice
From | Richard Kut |
---|---|
Subject | Re: Query does not return rows unless a LIMIT statement is used. |
Date | |
Msg-id | 200602151111.37078.rkut@intelerad.com Whole thread Raw |
In response to | Re: Query does not return rows unless a LIMIT statement is used. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Query does not return rows unless a LIMIT statement is used.
|
List | pgsql-novice |
Hi Tom! Thanks for the quick response! Here are the query plans for both queries. EXPLAIN ANALYZE SELECT * FROM pid, pv1 WHERE pid.patient_id_internal_id = pv1.patient_id_internal_id LIMIT 5324; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..33408.19 rows=5324 width=691) (actual time=0.054..106.972 rows=5324 loops=1) -> Nested Loop (cost=0.00..7277902.27 rows=1159822 width=691) (actual time=0.051..99.230 rows=5324 loops=1) -> Index Scan using patient_id_internal_id_idx on pv1 (cost=0.00..290598.21 rows=1159822 width=407) (actual time=0.020..14.995 rows=5324 loops=1) -> Index Scan using kdx_pid on pid (cost=0.00..6.01 rows=1 width=284) (actual time=0.009..0.010 rows=1 loops=5324) Index Cond: ((pid.patient_id_internal_id)::text = ("outer".patient_id_internal_id)::text) Total runtime: 111.137 ms (6 rows) Time: 112.405 ms EXPLAIN ANALYZE SELECT * FROM pid, pv1 WHERE pid.patient_id_internal_id = pv1.patient_id_internal_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=62558.39..509419.48 rows=1159822 width=691) (actual time=3348.848..163051.064 rows=1159805 loops=1) Hash Cond: (("outer".patient_id_internal_id)::text = ("inner".patient_id_internal_id)::text) -> Index Scan using patient_id_internal_id_idx on pv1 (cost=0.00..290598.21 rows=1159822 width=407) (actual time=0.020..56378.131 rows=1159822 loops=1) -> Hash (cost=49229.38..49229.38 rows=328403 width=284) (actual time=3342.874..3342.874 rows=328403 loops=1) -> Index Scan using kdx_pid on pid (cost=0.00..49229.38 rows=328403 width=284) (actual time=0.010..2110.172 rows=328403 loops=1) Total runtime: 163917.488 ms (6 rows) Time: 164006.053 ms I hope that this helps. On Wednesday 15 February 2006 10:53, Tom Lane wrote: > Richard Kut <rkut@intelerad.com> writes: > > Note the LIMIT 5324 statement. I empirically determined that this magic > > number is the maximum row count before the query performance degrades > > completely. It appears to be a sudden performance degradation. If I > > increase the value in LIMIT beyond 5324, or if I remove the LIMIT > > altogether, then the query performance is horrible. > > This is really, really hard to believe considering that you're showing > the exact same query plan in both cases. Are you certain you didn't > get confused about which EXPLAIN output was which? It's entirely > possible for the planner to pick a different underlying plan depending > on the LIMIT setting --- small LIMIT values will make it tend to prefer > fast-start query plans, since a slow-overall plan with nil startup cost > can beat a fast-overall plan with high startup cost if you're not > intending to run it to completion. Up until I got to the query plans > I thought you were describing a situation where the planner had switched > over to a high-startup-cost plan much sooner than it should do, but your > EXPLAIN printouts don't reflect any such thing. > > If it is a question of bad choice of query plan, the answer is probably > that you need to VACUUM and/or ANALYZE the tables involved so that the > planner has more accurate statistics to work with. Do the estimated > row counts shown by EXPLAIN look reasonable at all? > > > From observation using the TOP utility, I have noticed that when the > > query is running without the LIMIT statement, that the postmaster process > > resident memory size is increasing at a steady rate. It would seem that > > the query results are being gathered in memory first before being > > displayed in PSQL. Is there a way around this? > > The postmaster doesn't gather up query results, but psql itself will. > If you want incremental fetch of results, use a cursor and FETCH some > appropriate number of rows at a time. > > > commit_delay = 100000 > > This is unlikely to be a good idea. Your other parameter settings look > reasonable though. > > regards, tom lane -- Regards, Richard Kut Database Administrator Research & Development Intelerad Medical Systems Inc. 460 Ste-Catherine West, Suite 210 Montreal, Quebec, Canada H3B 1A7 Tel: 514.931.6222 x7733 Fax: 514.931.4653 rkut@intelerad.com www.intelerad.com This email or any attachments may contain confidential or legally privileged information intended for the sole use of the addressees. Any use, redistribution, disclosure, or reproduction of this information, except as intended, is prohibited. If you received this email in error, please notify the sender and remove all copies of the message, including any attachments.
pgsql-novice by date: