> select s.*, ss.*
> from shipment s, shipment_status ss, release_code r
> where s.current_status_id = ss.id
> and ss.release_code_id = r.id
> and r.filtered_column = '5'
> order by ss.date desc
> limit 100;
> Release code is just a very small table of 8 rows, judging by the
> production data, hence the 0.125 filter ratio. However, the data
> distribution is far from uniform: the filtered value actually matches
> about 54% of the rows in shipment_status when it joins. Postgres seems
> to be doing a sequential scan to pull out all of these rows. Next, it
> joins approx 17550 rows to shipment. Since this query has a limit, it
> only returns the first 100, which seems like a waste.
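You can confirm that skew with a quick count per release code (table and
column names taken from your query):

SELECT release_code_id, count(*)
FROM shipment_status
GROUP BY release_code_id;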
Well, Postgres is doing what you asked. It will be slow because you have
a full-table join, and LIMIT does not change that: all the rows have to
be sorted before the first 100 can be returned.
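Part of the problem is the estimate itself: the planner assumes the 8
release codes are spread evenly (hence the 0.125 ratio), while one code
covers ~54% of shipment_status. Raising the statistics target on that
column and re-analyzing may give it better numbers; a sketch (the target
of 100 is an arbitrary pick):

ALTER TABLE shipment_status ALTER COLUMN release_code_id SET STATISTICS 100;
ANALYZE shipment_status;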
The date is in shipment_status, so you should first grab the
shipment_status rows you need and only then join them to shipment. This
avoids the big join:
SELECT s.*, ss.*
FROM (SELECT *                -- grab the 100 newest matching status rows first
      FROM shipment_status
      WHERE release_code_id IN
            (SELECT r.id FROM release_code r WHERE r.filtered_column = '5')
      ORDER BY date DESC
      LIMIT 100
     ) AS ss, shipment s      -- then join only those 100 to shipment
WHERE s.current_status_id = ss.id
ORDER BY ss.date DESC
LIMIT 100;
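With the inner LIMIT, only 100 shipment_status rows ever reach the join.
If you also want that subquery to avoid the sequential scan, an index
covering the filter and the sort should help; a sketch (the index name
is mine, and DESC inside an index needs a reasonably recent Postgres):

CREATE INDEX shipment_status_release_date_idx
    ON shipment_status (release_code_id, date DESC);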
Is this better?