Thread: indexing for left join
I have two tables: TABLE ITEM ( ITEM_PK serial, RETAIL_PRICE numeric (7,2) NOT NULL, ... PRIMARY KEY (ITEM_PK) ) TABLE SERIAL_NO ( SERIAL_NO_PK serial, NO varchar (20) NOT NULL, NAME varchar (20), ITEM_FK integer NOT NULL, PRIMARY KEY (SERIAL_NO_PK) ); common query: SELECT ITEM.ITEM_PK FROM ITEM LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK WHERE SERIAL_NO.NO ='WX1234' GROUP BY ITEM.ITEM_PK Table ITEM will eventually grow very big and SERIAL_NO will grow with it. There will normally be zero or one SERIAL_NO per ITEM; few ITEMs will have more than one SERIAL_NO. I have created an index for SERIAL_NO.NO and one for SERIAL_NO.ITEM_FK for the above query. I ran an EXPLAIN: HashAggregate (cost=1.06..1.06 rows=1 width=4) -> Nested Loop (cost=0.00..1.06 rows=1 width=4) Join Filter: ("inner".item_fk = "outer".item_pk) -> Seq Scan onitem (cost=0.00..0.00 rows=1 width=4) -> Seq Scan on serial_no (cost=0.00..1.05 rows=1 width=4) Filter: (("no")::text= 'WX1234'::text) Sequential despite the indices? Or is this because the tables of my test DB are virtually empty? Many thanks in advance. -- Regards, Tarlika Elisabeth Schmitz
> Sequential despite the indices? Or is this because the tables of my test > DB are virtually empty? This is it. PostgreSQL changes strategies with data load. Performance testing must be done on an approximation of the real data (both values and size). --
T E Schmitz wrote: > > Sequential despite the indices? Or is this because the tables of my test > DB are virtually empty? Yes - read up on analyse and column statistics for details. Oh, you've probably missed about vacuuming too. -- Richard Huxton Archonet Ltd
Milorad Poluga wrote: > Try to execute this modification of your query : > > SELECT ITEM.ITEM_PK FROM ITEM > LEFT JOIN SERIAL_NO > ON ( SERIAL_NO.ITEM_FK = ITEM.ITEM_PK > AND SERIAL_NO.NO ='WX1234' ) > GROUP BY ITEM.ITEM_PK > >>SELECT ITEM.ITEM_PK FROM ITEM >>LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK >>WHERE SERIAL_NO.NO ='WX1234' >>GROUP BY ITEM.ITEM_PK For my small test DB both queries result in the same strategy. The query will be generated by an object relational interface depending on the user's search criteria. It will definitely be of the form I specified. I wanted to make sure that I have chosen the indices correctly. I am presuming, if the tables are big, that the index on SERIAL_NO.NO will be used for the WHERE clause and the one on SERIAL_NO.ITEM_FK for the join. -- Regards, Tarlika Elisabeth Schmitz
Rod Taylor wrote: >>Sequential despite the indices? Or is this because the tables of my test >>DB are virtually empty? > > > This is it. PostgreSQL changes strategies with data load. Performance > testing must be done on an approximation of the real data (both values > and size). Thanks for your responses. -- Regards, Tarlika Elisabeth Schmitz
I am new to PostgreSQL but isn't this query the same as doing an INNER JOIN? For a true LEFT JOIN should it not be as follows? SELECT ITEM.ITEM_PK FROM ITEM LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK AND SERIAL_NO.NO ='WX1234' GROUP BY ITEM.ITEM_PK Using an AND instead of WHERE for the predicate on SERIAL_NO.NO results in very different plans despite the immature statistics. The following plan is for the true LEFT JOIN. QUERY PLAN ---------------------------------------------------------------------------HashAggregate (cost=2.10..2.13 rows=3 width=4) -> Hash Left Join (cost=1.04..2.10 rows=3 width=4) Hash Cond: ("outer".item_pk = "inner".item_fk) -> Seq Scan on item (cost=0.00..1.03 rows=3 width=4) -> Hash (cost=1.04..1.04 rows=1 width=4) -> Seq Scan on serial_no (cost=0.00..1.04 rows=1 width=4) Filter: (("no")::text = 'WX1234'::text) (7 rows) The next plan, which is very similary to your original plan, is for the INNER JOIN you described. QUERY PLAN ---------------------------------------------------------------------HashAggregate (cost=2.11..2.12 rows=1 width=4) -> Nested Loop (cost=0.00..2.11 rows=1 width=4) Join Filter: ("outer".item_fk = "inner".item_pk) -> Seq Scanon serial_no (cost=0.00..1.04 rows=1 width=4) Filter: (("no")::text = 'WX1234'::text) -> Seq Scanon item (cost=0.00..1.03 rows=3 width=4) (6 rows) I wont speculate on how these plans would converge or diverge as the tables grew and the statistics matured. - Zulq Alam T E Schmitz wrote: > SELECT ITEM.ITEM_PK FROM ITEM > LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK > WHERE SERIAL_NO.NO ='WX1234' > GROUP BY ITEM.ITEM_PK > I ran an EXPLAIN: > HashAggregate (cost=1.06..1.06 rows=1 width=4) > -> Nested Loop (cost=0.00..1.06 rows=1 width=4) > Join Filter: ("inner".item_fk = "outer".item_pk) > -> Seq Scan on item (cost=0.00..0.00 rows=1 width=4) > -> Seq Scan on serial_no (cost=0.00..1.05 rows=1 width=4) > Filter: (("no")::text = 'WX1234'::text)