Thread: indexing for left join

indexing for left join

From
T E Schmitz
Date:
I have two tables:

TABLE ITEM
(
ITEM_PK serial,
RETAIL_PRICE numeric (7,2) NOT NULL,
...
PRIMARY KEY (ITEM_PK)
)

TABLE SERIAL_NO
(
SERIAL_NO_PK serial,
NO varchar (20) NOT NULL,
NAME varchar (20),
ITEM_FK integer NOT NULL,
PRIMARY KEY (SERIAL_NO_PK)
);

common query:

SELECT ITEM.ITEM_PK FROM ITEM
LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
WHERE SERIAL_NO.NO ='WX1234'
GROUP BY ITEM.ITEM_PK

Table ITEM will eventually grow very big and SERIAL_NO will grow with 
it. There will normally be zero or one SERIAL_NO per ITEM; few ITEMs 
will have more than one SERIAL_NO.

I have created an index for SERIAL_NO.NO and one for SERIAL_NO.ITEM_FK 
for the above query.

I ran an EXPLAIN:
HashAggregate  (cost=1.06..1.06 rows=1 width=4)
->  Nested Loop  (cost=0.00..1.06 rows=1 width=4)    Join Filter: ("inner".item_fk = "outer".item_pk)    ->  Seq Scan
onitem  (cost=0.00..0.00 rows=1 width=4)    ->  Seq Scan on serial_no  (cost=0.00..1.05 rows=1 width=4)        Filter:
(("no")::text= 'WX1234'::text)
 


Sequential despite the indices? Or is this because the tables of my test 
DB are virtually empty?

Many thanks in advance.
-- 


Regards,

Tarlika Elisabeth Schmitz



Re: indexing for left join

From
Rod Taylor
Date:
> Sequential despite the indices? Or is this because the tables of my test 
> DB are virtually empty?

This is it. PostgreSQL changes strategies with data load. Performance
testing must be done on an approximation of the real data (both values
and size).
-- 



Re: indexing for left join

From
Richard Huxton
Date:
T E Schmitz wrote:
> 
> Sequential despite the indices? Or is this because the tables of my test 
> DB are virtually empty?

Yes - read up on analyse and column statistics for details. Oh, you've 
probably missed about vacuuming too.

--   Richard Huxton  Archonet Ltd


Re: indexing for left join

From
T E Schmitz
Date:
Milorad Poluga wrote:
>  Try to execute this modification of your query : 
> 
>  SELECT ITEM.ITEM_PK FROM ITEM
>  LEFT JOIN SERIAL_NO 
>     ON ( SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
>                        AND SERIAL_NO.NO ='WX1234' )
>  GROUP BY ITEM.ITEM_PK
> 

>>SELECT ITEM.ITEM_PK FROM ITEM
>>LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
>>WHERE SERIAL_NO.NO ='WX1234'
>>GROUP BY ITEM.ITEM_PK

For my small test DB both queries result in the same strategy.
The query will be generated by an object relational interface depending
on the user's search criteria. It will definitely be of the form I
specified.

I wanted to make sure that I have chosen the indices correctly. I am
presuming, if the tables are big, that the index on SERIAL_NO.NO will be
used for the WHERE clause and the one on SERIAL_NO.ITEM_FK for the join.

-- 


Regards,

Tarlika Elisabeth Schmitz



Re: indexing for left join

From
T E Schmitz
Date:
Rod Taylor wrote:
>>Sequential despite the indices? Or is this because the tables of my test 
>>DB are virtually empty?
> 
> 
> This is it. PostgreSQL changes strategies with data load. Performance
> testing must be done on an approximation of the real data (both values
> and size).

Thanks for your responses.

-- 


Regards,

Tarlika Elisabeth Schmitz



Re: indexing for left join

From
Zulq Alam
Date:
I am new to PostgreSQL but isn't this query the same as doing an INNER 
JOIN?

For a true LEFT JOIN should it not be as follows?

SELECT ITEM.ITEM_PK
FROM ITEM
LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
AND SERIAL_NO.NO ='WX1234'
GROUP BY ITEM.ITEM_PK

Using an AND instead of WHERE for the predicate on SERIAL_NO.NO results 
in very different plans despite the immature statistics. The following 
plan is for the true LEFT JOIN.
                               QUERY PLAN
---------------------------------------------------------------------------HashAggregate  (cost=2.10..2.13 rows=3
width=4) ->  Hash Left Join  (cost=1.04..2.10 rows=3 width=4)        Hash Cond: ("outer".item_pk = "inner".item_fk)
  ->  Seq Scan on item  (cost=0.00..1.03 rows=3 width=4)        ->  Hash  (cost=1.04..1.04 rows=1 width=4)
-> Seq Scan on serial_no  (cost=0.00..1.04 rows=1 width=4)                    Filter: (("no")::text = 'WX1234'::text)
 
(7 rows)

The next plan, which is very similary to your original plan, is for the 
INNER JOIN you described.                            QUERY PLAN
---------------------------------------------------------------------HashAggregate  (cost=2.11..2.12 rows=1 width=4)
-> Nested Loop  (cost=0.00..2.11 rows=1 width=4)        Join Filter: ("outer".item_fk = "inner".item_pk)        ->  Seq
Scanon serial_no  (cost=0.00..1.04 rows=1 width=4)              Filter: (("no")::text = 'WX1234'::text)        ->  Seq
Scanon item  (cost=0.00..1.03 rows=3 width=4)
 
(6 rows)

I wont speculate on how these plans would converge or diverge as the 
tables grew and the statistics matured.

- Zulq Alam

T E Schmitz wrote:
> SELECT ITEM.ITEM_PK FROM ITEM
> LEFT JOIN SERIAL_NO ON SERIAL_NO.ITEM_FK = ITEM.ITEM_PK
> WHERE SERIAL_NO.NO ='WX1234'
> GROUP BY ITEM.ITEM_PK

> I ran an EXPLAIN:
> HashAggregate  (cost=1.06..1.06 rows=1 width=4)
> ->  Nested Loop  (cost=0.00..1.06 rows=1 width=4)
>     Join Filter: ("inner".item_fk = "outer".item_pk)
>     ->  Seq Scan on item  (cost=0.00..0.00 rows=1 width=4)
>     ->  Seq Scan on serial_no  (cost=0.00..1.05 rows=1 width=4)
>         Filter: (("no")::text = 'WX1234'::text)