Thread: Join performance

Join performance

From
Pepe Barbe
Date:
Hello,

I am having an issue on PostgreSQL 8.0.12. In the past we had
performance issues with the query planner for queries on some tables
where we knew we had indexes and it was doing a sequential scan, and
for this reason we issue "SET enable_seqscan = FALSE" for some queries.

Recently we have stumbled upon one of these kind of queries that is
giving terrible performance, because seqscan is disabled. I've reduced
the problem to a a command like this one:

SELECT * from gsm_sector_metrics NATURAL JOIN gsm_amr_metrics INNER
JOIN temp_busy_hr USING(start_time,bsc_id,sect_id);

Where temp_busy_hr is a temporary table.

If the previous is issued with seqscan TRUE, it runs within reasonable
time, else it runs for ever. The query plan for the previous query
with enable_seqscan = TRUE:

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.00..384555.98 rows=1 width=3092)
   ->  Nested Loop  (cost=0.00..384555.98 rows=1 width=3092)
         Join Filter: (("inner".bsc_id = "outer".bsc_id) AND
("inner".site_id = "outer".site_id) AND ("inner".sect_id =
"outer".sect_id))
         ->  Nested Loop  (cost=0.00..368645.64 rows=28 width=1192)
               Join Filter: (("outer".sect_id = "inner".sect_id) AND
("outer".bsc_id = "inner".bsc_id))
               ->  Seq Scan on temp_busy_hr  (cost=0.00..24.00
rows=1400 width=24)
               ->  Index Scan using gsm_amr_start_time_idx on
gsm_amr_metrics  (cost=0.00..226.66 rows=2094 width=1168)
                     Index Cond: ("outer".start_time =
gsm_amr_metrics.start_time)
         ->  Index Scan using gsm_sector_start_time_idx on
gsm_sector_metrics t1  (cost=0.00..528.77 rows=1973 width=1936)
               Index Cond: (t1.start_time = "outer".start_time)
(10 rows)

and the plan for enable_seqscan = FALSE:

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=100000097.16.. 100720844.011111 rows=1 width=3092)
   ->  Nested Loop  (cost=100000097.16..100720844.01 rows=1 width=3092)
         Join Filter: (("inner".bsc_id = "outer".bsc_id) AND
("inner".site_id = "outer".site_id) AND ("inner".sect_id =
"outer".sect_id))
         ->  Merge Join  (cost=100000097.16..100704933.67 rows=28
width=1192)
               Merge Cond: ("outer".start_time = "inner".start_time)
               Join Filter: (("inner".sect_id = "outer".sect_id) AND
("inner".bsc_id = "outer".bsc_id))
               ->  Index Scan using gsm_amr_start_time_idx on
gsm_amr_metrics  (cost=0.00..631211.45 rows=6005551 width=1168)
               ->  Sort  (cost=100000097.16..100000100.66 rows=1400
width=24)
                     Sort Key: temp_busy_hr.start_time
                     ->  Seq Scan on temp_busy_hr
(cost=100000000.00..100000024.00 rows=1400 width=24)
         ->  Index Scan using gsm_sector_start_time_idx on
gsm_sector_metrics t1  (cost=0.00..528.77 rows=1973 width=1936)
               Index Cond: (t1.start_time = "outer".start_time)
(12 rows)

Any ideas what could I try to fix this problem?

Thanks,
Pepe

Re: Join performance

From
"Steinar H. Gunderson"
Date:
On Thu, Nov 08, 2007 at 04:47:09PM -0600, Pepe Barbe wrote:
> I am having an issue on PostgreSQL 8.0.12. In the past we had performance
> issues with the query planner for queries on some tables where we knew we
> had indexes and it was doing a sequential scan, and for this reason we
> issue "SET enable_seqscan = FALSE" for some queries.

This is a bad idea in general. Did you really measure that this made queries
run faster? Generally, using an index is not always a win, and the planner
tries to figure out when it isn't. Setting it globally is seldom a good idea
anyway; if it really _is_ a win for a given query, you could always set it
locally in that session.

> Any ideas what could I try to fix this problem?

Re-enable seqscan?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Join performance

From
Tom Lane
Date:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes:
> On Thu, Nov 08, 2007 at 04:47:09PM -0600, Pepe Barbe wrote:
>> I am having an issue on PostgreSQL 8.0.12. In the past we had performance
>> issues with the query planner for queries on some tables where we knew we
>> had indexes and it was doing a sequential scan, and for this reason we
>> issue "SET enable_seqscan = FALSE" for some queries.

> This is a bad idea in general.

Indeed.  A less brute-force way of getting the planner to favor
indexscans is to reduce random_page_cost ... have you experimented
with that?

Also, consider updating to 8.2.x, which has an improved cost model
for indexscans and will more often make the correct choice without
such shenanigans.

            regards, tom lane

Re: Join performance

From
Tom Lane
Date:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes:
> On Thu, Nov 08, 2007 at 04:47:09PM -0600, Pepe Barbe wrote:
>> I am having an issue on PostgreSQL 8.0.12. In the past we had performance
>> issues with the query planner for queries on some tables where we knew we
>> had indexes and it was doing a sequential scan, and for this reason we
>> issue "SET enable_seqscan = FALSE" for some queries.

> This is a bad idea in general.

Indeed.  A less brute-force way of getting the planner to favor
indexscans is to reduce random_page_cost ... have you experimented
with that?

Also, consider updating to 8.2.x, which has an improved cost model
for indexscans and will more often make the correct choice without
such shenanigans.

            regards, tom lane

Re: Join performance

From
Tom Lane
Date:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes:
> On Thu, Nov 08, 2007 at 04:47:09PM -0600, Pepe Barbe wrote:
>> I am having an issue on PostgreSQL 8.0.12. In the past we had performance
>> issues with the query planner for queries on some tables where we knew we
>> had indexes and it was doing a sequential scan, and for this reason we
>> issue "SET enable_seqscan = FALSE" for some queries.

> This is a bad idea in general.

Indeed.  A less brute-force way of getting the planner to favor
indexscans is to reduce random_page_cost ... have you experimented
with that?

Also, consider updating to 8.2.x, which has an improved cost model
for indexscans and will more often make the correct choice without
such shenanigans.

            regards, tom lane

Re: Join performance

From
Tom Lane
Date:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes:
> On Thu, Nov 08, 2007 at 04:47:09PM -0600, Pepe Barbe wrote:
>> I am having an issue on PostgreSQL 8.0.12. In the past we had performance
>> issues with the query planner for queries on some tables where we knew we
>> had indexes and it was doing a sequential scan, and for this reason we
>> issue "SET enable_seqscan = FALSE" for some queries.

> This is a bad idea in general.

Indeed.  A less brute-force way of getting the planner to favor
indexscans is to reduce random_page_cost ... have you experimented
with that?

Also, consider updating to 8.2.x, which has an improved cost model
for indexscans and will more often make the correct choice without
such shenanigans.

            regards, tom lane

Re: Join performance

From
Tom Lane
Date:
Ooops, sorry about the multiple copies there --- not sure what happened.

            regards, tom lane

Re: Join performance

From
Russell Smith
Date:
Pepe Barbe wrote:
> Hello,
>
> I am having an issue on PostgreSQL 8.0.12. In the past we had
> performance issues with the query planner for queries on some tables
> where we knew we had indexes and it was doing a sequential scan, and for
> this reason we issue "SET enable_seqscan = FALSE" for some queries.
>
> Recently we have stumbled upon one of these kind of queries that is
> giving terrible performance, because seqscan is disabled. I've reduced
> the problem to a a command like this one:
>
> SELECT * from gsm_sector_metrics NATURAL JOIN gsm_amr_metrics INNER JOIN
> temp_busy_hr USING(start_time,bsc_id,sect_id);
>
> Where temp_busy_hr is a temporary table.

Have you tried analyzing the temp_busy_hr table?
Possibly adding an index to the temp table can help if you are doing lots of queries.

>
> If the previous is issued with seqscan TRUE, it runs within reasonable
> time, else it runs for ever. The query plan for the previous query with
> enable_seqscan = TRUE:

It would be worth know how far the estimates are out.  Also, have you tried altering the statistics target
for relevant columns to increase the accuracy?

>
> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------

>
> Limit  (cost=0.00..384555.98 rows=1 width=3092)
>   ->  Nested Loop  (cost=0.00..384555.98 rows=1 width=3092)
>         Join Filter: (("inner".bsc_id = "outer".bsc_id) AND ("inner".site_id = "outer".site_id) AND ("inner".sect_id
="outer".sect_id)) 
>         ->  Nested Loop  (cost=0.00..368645.64 rows=28 width=1192)
>               Join Filter: (("outer".sect_id = "inner".sect_id) AND ("outer".bsc_id = "inner".bsc_id))
>               ->  Seq Scan on temp_busy_hr  (cost=0.00..24.00 rows=1400 width=24)
>               ->  Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics  (cost=0.00..226.66 rows=2094
width=1168)
>                     Index Cond: ("outer".start_time = gsm_amr_metrics.start_time)
>         ->  Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1  (cost=0.00..528.77 rows=1973
width=1936)
>               Index Cond: (t1.start_time = "outer".start_time)
> (10 rows)
>
> and the plan for enable_seqscan = FALSE:
>
> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------

>
> Limit  (cost=100000097.16.. 100720844.011111 rows=1 width=3092)
>   ->  Nested Loop  (cost=100000097.16..100720844.01 rows=1 width=3092)
>         Join Filter: (("inner".bsc_id = "outer".bsc_id) AND ("inner".site_id = "outer".site_id) AND ("inner".sect_id
="outer".sect_id)) 
>         ->  Merge Join  (cost=100000097.16..100704933.67 rows=28 width=1192)
>               Merge Cond: ("outer".start_time = "inner".start_time)
>               Join Filter: (("inner".sect_id = "outer".sect_id) AND ("inner".bsc_id = "outer".bsc_id))
>               ->  Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics  (cost=0.00..631211.45 rows=6005551
width=1168)
>               ->  Sort  (cost=100000097.16..100000100.66 rows=1400 width=24)
>                     Sort Key: temp_busy_hr.start_time
>                     ->  Seq Scan on temp_busy_hr  (cost=100000000.00..100000024.00 rows=1400 width=24)
>         ->  Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1  (cost=0.00..528.77 rows=1973
width=1936)
>               Index Cond: (t1.start_time = "outer".start_time)
> (12 rows)
>
> Any ideas what could I try to fix this problem?
>
> Thanks,
> Pepe
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>