Query does not return rows unless a LIMIT statement is used. - Mailing list pgsql-novice
From | Richard Kut |
---|---|
Subject | Query does not return rows unless a LIMIT statement is used. |
Date | |
Msg-id | 200602151006.50419.rkut@intelerad.com Whole thread Raw |
Responses |
Re: Query does not return rows unless a LIMIT statement
Re: Query does not return rows unless a LIMIT statement is used. |
List | pgsql-novice |
Hello! This is my first time using Postgres, and my first posting to this mailing list. I am evaluating the performance of Postgres version 8.1.2 as a possible replacement for our aging Sybase ASE version 11 database. However, I have run into what could be a show-stopper problem. I am having a problem with a query taking a very long time to run. So long, in fact, that I have yet to see that query return any rows. I will apologize now for what may become a very long message, but I want to be sure that I pass along all of the pertinent details. Here we go. I am using Red Hat Enterprise Linux 3.0 on a box with 1.5 GB of RAM. The CPU is an Intel Xeon 2.4 GHz hyper-threaded chip with 512 KB cache. I am trying to do a simple join on indexed columns between two tables. Table PID has 328,403 records, and table PV1 has 1,159,822 records. The query which I am using is: SELECT * FROM pid, pv1 WHERE pid.patient_id_internal_id = pv1.patient_id_internal_id LIMIT 5324; For table PID, the column patient_id_internal_id is defined as character varying(20). The index on this column is defined as: "kdx_pid" btree (patient_id_internal_id), tablespace "hl7_segments_ihe_idx" For table PV1, the column patient_id_internal_id is defined as character varying(20). The index on this column is defined as: "patient_id_internal_id_idx" btree (patient_id_internal_id), tablespace "hl7_segments_ihe_idx" Both tables live in the tablespace hl7_segments_ihe_data. Note the LIMIT 5324 statement. I empirically determined that this magic number is the maximum row count before the query performance degrades completely. It appears to be a sudden performance degradation. If I increase the value in LIMIT beyond 5324, or if I remove the LIMIT altogether, then the query performance is horrible. Here is the query plan with the LIMIT statement: QUERY PLAN --------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..1607.27 rows=5324 width=691) -> Merge Join (cost=0.00..350141.38 rows=1159822 width=691) Merge Cond: (("outer".patient_id_internal_id)::text = ("inner".patient_id_internal_id)::text) -> Index Scan using kdx_pid on pid (cost=0.00..49229.38 rows=328403 width=284) -> Index Scan using patient_id_internal_id_idx on pv1 (cost=0.00..290598.21 rows=1159822 width=407) (5 rows) Time: 1.472 ms Now here is the query plan without the LIMIT statement: QUERY PLAN --------------------------------------------------------------------------------------------------------- Merge Join (cost=0.00..350141.38 rows=1159822 width=691) Merge Cond: (("outer".patient_id_internal_id)::text = ("inner".patient_id_internal_id)::text) -> Index Scan using kdx_pid on pid (cost=0.00..49229.38 rows=328403 width=284) -> Index Scan using patient_id_internal_id_idx on pv1 (cost=0.00..290598.21 rows=1159822 width=407) (4 rows) Time: 1.432 ms I have tried various combinations of SET enable_seqscan = off;, SET enable_mergejoin = off;, and so forth. All to no avail. The bottom line is still that LIMIT statement. Without it the query performance is awful. From observation using the TOP utility, I have noticed that when the query is running without the LIMIT statement, that the postmaster process resident memory size is increasing at a steady rate. It would seem that the query results are being gathered in memory first before being displayed in PSQL. Is there a way around this? After loading the tables with data I did perform a VACUUM ANALYZE and a database restart prior to any querying.. I have also modified the following parameters in the postgres.conf file, and then restarted the database: shared_buffers = 16384 commit_delay = 100000 commit_siblings = 5 enable_seqscan = off effective_cache_size = 100000 log_destination = 'stderr' redirect_stderr = on log_directory = 'pg_log' log_truncate_on_rotation = on log_rotation_age = 1440 log_rotation_size = 200 lc_messages = 'en_US' lc_monetary = 'en_US' lc_numeric = 'en_US' lc_time = 'en_US' What am I doing wrong? Has anyone run into something similar? Any help would be greatly appreciated. Thank you in advance. -- Regards, Richard Kut Database Administrator Research & Development Intelerad Medical Systems Inc. 460 Ste-Catherine West, Suite 210 Montreal, Quebec, Canada H3B 1A7 Tel: 514.931.6222 x7733 Fax: 514.931.4653 rkut@intelerad.com www.intelerad.com This email or any attachments may contain confidential or legally privileged information intended for the sole use of the addressees. Any use, redistribution, disclosure, or reproduction of this information, except as intended, is prohibited. If you received this email in error, please notify the sender and remove all copies of the message, including any attachments.
pgsql-novice by date: