optimize self-join query - Mailing list pgsql-sql
From | Ty Busby |
---|---|
Subject | optimize self-join query |
Date | |
Msg-id | BFC71945-8821-4BC9-8430-A8CACF8F3794@gmail.com Whole thread Raw |
Responses |
Re: optimize self-join query
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: optimize self-join query (Lee Hachadoorian <lee.hachadoorian@gmail.com>) Re: optimize self-join query (Ty Busby <tybusby@yahoo.com>) |
List | pgsql-sql |
I have a table that stores a very large starting number called epc_start_numeric and a quantity. I've apparently built themost inefficient query possible for doing the job I need: find out if any records overlap. Imagine the epc_start_numeric+ quantity representing a block of numbers. I need to find out if any of these blocks overlap. Here is the table: CREATE TABLE ftp_epc_audit (record_id serial NOT NULL,sent_filename text NOT NULL,pcid text NOT NULL,tsid text NOT NULL,user_sec_id text NOT NULL,format_settext NOT NULL,format text NOT NULL,epc_start text NOT NULL,quantity integer,epc_decimal_start numeric(29) ) WITH OIDS; ALTER TABLE ftp_epc_audit OWNER TO postgres; And the query is currently this: SELECT '', 0, a.*, '', 0, b.* FROM ftp_epc_audit_staging a, ftp_epc_audit_staging b WHERE a.sent_filename <> b.sent_filename AND a.quantity > 0 AND b.quantity > 0 AND a.locked = 1 AND b.locked = 1 AND(( a.epc_decimal_start BETWEEN b.epc_decimal_start AND b.epc_decimal_start + b.quantity - 1 ) OR ( a.epc_decimal_start + a.quantity - 1 BETWEEN b.epc_decimal_start AND b.quantity - 1 ) OR ( a.epc_decimal_start + a.quantity - 1 < b.epc_decimal_start AND a.epc_decimal_start + a.quantity -1 > b.epc_decimal_start+ b.quantity - 1 )) The column sent_filename is the unique value used so that a record does not find itself. I've got an index on sent_filename,quantity, locked and epc_decimal_start. The query runs fine with a very small number of records. However, I need to process 60,000 records and this is taking hours. There must be a fundemental flaw in either the query or my overall design because it doesn't seem like a challengingtask. I've also tried a variant of this query which also takes several hours to run through 60,000 records. SELECT * FROM ftp_epc_audit_staging a WHERE a.quantity > 0 AND a.locked = 1 AND EXISTS ( SELECT TRUE FROM ftp_epc_audit_staging b WHERE b.locked = 1 AND b.quantity > 0 AND a.sent_filename <> b.sent_filename AND(( a.epc_decimal_start BETWEEN b.epc_decimal_start AND b.epc_decimal_start + b.quantity - 1 ) OR ( a.epc_decimal_start + a.quantity - 1 BETWEEN b.epc_decimal_start AND b.quantity - 1 ) OR ( a.epc_decimal_start + a.quantity - 1 < b.epc_decimal_start AND a.epc_decimal_start + a.quantity -1 > b.epc_decimal_start+ b.quantity - 1 ))) I would really appreciate any thoughts on this.