Thread: Query performance

Query performance

From

S Arvind

Date:

12 October 2009, 11:21:57

In the below query both table has less than 1 million data. Can u tell me the reason of this plan?
why its takin extensive cost , seq scan and sorting?? wat is Materialize?

select 1 from service_detail
left join non_service_detail on non_service_detail_service_id = service_detail.service_detail_id

Merge Left Join (cost=62451.86..67379.08 rows=286789 width=0)
Merge Cond: (service_detail.service_detail_id = non_service_detail.non_service_detail_service_id)
-> Sort (cost=18610.57..18923.27 rows=125077 width=8)
        Sort Key: service_detail.service_detail_id
        -> Seq Scan on service_detail (cost=0.00..6309.77 rows=125077 width=8)
-> Materialize (cost=43841.28..47426.15 rows=286789 width=8)
        -> Sort (cost=43841.28..44558.26 rows=286789 width=8)
              Sort Key: non_service_detail.non_service_detail_service_id
              -> Seq Scan on non_service_detail (cost=0.00..13920.89 rows=286789 width=8)

Thanks,
Arvind S

"Many of lifes failure are people who did not realize how close they were to success when they gave up."
-Thomas Edison

Re: Query performance

From

Grzegorz Jaśkiewicz

Date:

12 October 2009, 12:30:22

On Mon, Oct 12, 2009 at 12:21 PM, S Arvind <arvindwill@gmail.com> wrote:

In the below query both table has less than 1 million data. Can u tell me the reason of this plan?
why its takin extensive cost , seq scan and sorting?? wat is Materialize?

select 1 from service_detail
left join non_service_detail on non_service_detail_service_id = service_detail.service_detail_id

Merge Left Join (cost=62451.86..67379.08 rows=286789 width=0)
Merge Cond: (service_detail.service_detail_id = non_service_detail.non_service_detail_service_id)
-> Sort (cost=18610.57..18923.27 rows=125077 width=8)
        Sort Key: service_detail.service_detail_id
        -> Seq Scan on service_detail (cost=0.00..6309.77 rows=125077 width=8)
-> Materialize (cost=43841.28..47426.15 rows=286789 width=8)
        -> Sort (cost=43841.28..44558.26 rows=286789 width=8)
              Sort Key: non_service_detail.non_service_detail_service_id
              -> Seq Scan on non_service_detail (cost=0.00..13920.89 rows=286789 width=8)

A) it is a left join, meaning - everything is pulled from left side,
B) there are no conditions, so ... ... everything is pulled again from left side.

--
GJ

Re: Query performance

From

S Arvind

Date:

12 October 2009, 12:40:12

I can understand left join, actually can any one tell me why sort operation is carried out and wat Materialize means...
Can anyone explain me the mentioned plan with reason(s)?

-Arvind S

2009/10/12 Grzegorz Jaśkiewicz <gryzman@gmail.com>

On Mon, Oct 12, 2009 at 12:21 PM, S Arvind <arvindwill@gmail.com> wrote:
In the below query both table has less than 1 million data. Can u tell me the reason of this plan?
why its takin extensive cost , seq scan and sorting?? wat is Materialize?

select 1 from service_detail
left join non_service_detail on non_service_detail_service_id = service_detail.service_detail_id

Merge Left Join (cost=62451.86..67379.08 rows=286789 width=0)
Merge Cond: (service_detail.service_detail_id = non_service_detail.non_service_detail_service_id)
-> Sort (cost=18610.57..18923.27 rows=125077 width=8)
        Sort Key: service_detail.service_detail_id
        -> Seq Scan on service_detail (cost=0.00..6309.77 rows=125077 width=8)
-> Materialize (cost=43841.28..47426.15 rows=286789 width=8)
        -> Sort (cost=43841.28..44558.26 rows=286789 width=8)
              Sort Key: non_service_detail.non_service_detail_service_id
              -> Seq Scan on non_service_detail (cost=0.00..13920.89 rows=286789 width=8)

A) it is a left join, meaning - everything is pulled from left side,
B) there are no conditions, so ... ... everything is pulled again from left side.

--
GJ

Re: Query performance

From

Matthew Wakeling

Date:

12 October 2009, 13:02:08

On Mon, 12 Oct 2009, S Arvind wrote:
> I can understand left join, actually can any one tell me why sort operation is carried
> out and wat Materialize means...
> Can anyone explain me the mentioned plan with reason(s)?

>  Merge Left Join  (cost=62451.86..67379.08 rows=286789 width=0)
>      Merge Cond: (a.id = b.id)
>      ->  Sort  (cost=18610.57..18923.27 rows=125077 width=8)
>          Sort Key: a.id
>          ->  Seq Scan on a  (cost=0.00..6309.77 rows=125077 width=8)
>      ->  Materialize  (cost=43841.28..47426.15 rows=286789 width=8)
>          ->  Sort  (cost=43841.28..44558.26 rows=286789 width=8)
>              Sort Key: b.id
>              ->  Seq Scan on b (cost=0.00..13920.89 rows=286789 width=8)

This is a merge join. A merge join joins together two streams of data,
where both streams are sorted, by placing the two streams side by side and
advancing through both streams finding matching rows. The algorithm can
use a pointer to a position in both of the streams, and advance the
pointer of the stream that has the earlier value according to the sort
order, and therefore get all the matches.

You are performing a query over the whole of both of the tables, so the
cheapest way to obtain a sorted stream of data is to do a full sequential
scan of the whole table, bring it into memory, and sort it. An alternative
would be to follow a B-tree index if one was available on the correct
column, but that is usually more expensive unless the table is clustered
on the index or only a small portion of the table is to be read. If you
had put a "LIMIT 10" clause on the end of the query and had such an index,
it would probably switch to that strategy instead.

The materialise step is effectively a buffer that allows one of the
streams to be rewound cheaply, which will be necessary if there are
multiple rows with the same value.

Does that answer your question?

Matthew

--
 The only secure computer is one that's unplugged, locked in a safe,
 and buried 20 feet under the ground in a secret location...and i'm not
 even too sure about that one.                         --Dennis Huges, FBI

Re: Query performance

From

S Arvind

Date:

12 October 2009, 13:24:00

Thanks very much Matthew , its more then my expectation...

Without changing the query is there any way to optimize it, like by changing the pg configuration for handling these kind queries?

-Arvind S

On Mon, Oct 12, 2009 at 6:31 PM, Matthew Wakeling <matthew@flymine.org> wrote:

On Mon, 12 Oct 2009, S Arvind wrote:
I can understand left join, actually can any one tell me why sort operation is carried
out and wat Materialize means...
Can anyone explain me the mentioned plan with reason(s)?

Merge Left Join (cost=62451.86..67379.08 rows=286789 width=0)
Merge Cond: (a.id = b.id)

-> Sort (cost=18610.57..18923.27 rows=125077 width=8)
         Sort Key: a.id
         -> Seq Scan on a (cost=0.00..6309.77 rows=125077 width=8)

-> Materialize (cost=43841.28..47426.15 rows=286789 width=8)
         -> Sort (cost=43841.28..44558.26 rows=286789 width=8)
             Sort Key: b.id
            -> Seq Scan on b (cost=0.00..13920.89 rows=286789 width=8)

This is a merge join. A merge join joins together two streams of data, where both streams are sorted, by placing the two streams side by side and advancing through both streams finding matching rows. The algorithm can use a pointer to a position in both of the streams, and advance the pointer of the stream that has the earlier value according to the sort order, and therefore get all the matches.

You are performing a query over the whole of both of the tables, so the cheapest way to obtain a sorted stream of data is to do a full sequential scan of the whole table, bring it into memory, and sort it. An alternative would be to follow a B-tree index if one was available on the correct column, but that is usually more expensive unless the table is clustered on the index or only a small portion of the table is to be read. If you had put a "LIMIT 10" clause on the end of the query and had such an index, it would probably switch to that strategy instead.

The materialise step is effectively a buffer that allows one of the streams to be rewound cheaply, which will be necessary if there are multiple rows with the same value.

Does that answer your question?

Matthew

--
The only secure computer is one that's unplugged, locked in a safe,
and buried 20 feet under the ground in a secret location...and i'm not
even too sure about that one. --Dennis Huges, FBI

Re: Query performance

From

Grzegorz Jaśkiewicz

Date:

12 October 2009, 13:30:16

btw, what's the version of db ?
what's the work_mem setting ?

try setting work_mem to higher value. As postgresql will fallback to disc sorting if the content doesn't fit in work_mem, which it probably doesn't (8.4+ show the memory usage for sorting, which your explain doesn't have).

Re: Query performance

From

Matthew Wakeling

Date:

12 October 2009, 13:37:15

On Mon, 12 Oct 2009, Grzegorz Jaśkiewicz wrote:
> try setting work_mem to higher value. As postgresql will fallback to disc sorting if the
> content doesn't fit in work_mem, which it probably doesn't (8.4+ show the memory usage
> for sorting, which your explain doesn't have).

For reference, here's the EXPLAIN:

>  Merge Left Join  (cost=62451.86..67379.08 rows=286789 width=0)
>      Merge Cond: (a.id = b.id)
>      ->  Sort  (cost=18610.57..18923.27 rows=125077 width=8)
>          Sort Key: a.id
>          ->  Seq Scan on a  (cost=0.00..6309.77 rows=125077 width=8)
>      ->  Materialize  (cost=43841.28..47426.15 rows=286789 width=8)
>          ->  Sort  (cost=43841.28..44558.26 rows=286789 width=8)
>              Sort Key: b.id
>              ->  Seq Scan on b (cost=0.00..13920.89 rows=286789 width=8)

This is an EXPLAIN, not an EXPLAIN ANALYSE. If it was an EXPLAIN ANALYSE,
it would show how much memory was used, and whether it was a disc sort or
an in-memory sort. As it is only an EXPLAIN, the query hasn't actually
been run, and we have no information about whether the sort would be
performed on disc or not.

Matthew

--
 Hi! You have reached 555-0129. None of us are here to answer the phone and
 the cat doesn't have opposing thumbs, so his messages are illegible. Please
 leave your name and message after the beep ...

Re: Query performance

From

Grzegorz Jaśkiewicz

Date:

12 October 2009, 13:40:33

2009/10/12 Matthew Wakeling <matthew@flymine.org>

This is an EXPLAIN, not an EXPLAIN ANALYSE. If it was an EXPLAIN ANALYSE, it would show how much memory was used, and whether it was a disc sort or an in-memory sort. As it is only an EXPLAIN, the query hasn't actually been run, and we have no information about whether the sort would be performed on disc or not.

true, I was looking at it as if it was explain analyze output :)
sorry.

--
GJ

Re: Query performance

From

S Arvind

Date:

12 October 2009, 14:46:19

Sorry guys, i sent the required plan....

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Merge Left Join (cost=62422.81..67345.85 rows=286487 width=0) (actual time=1459.355..2538.538 rows=325998 loops=1)
   Merge Cond: (service_detail.service_detail_id = non_service_detail.non_service_detail_service_id)
   -> Sort (cost=18617.60..18930.47 rows=125146 width=8) (actual time=425.115..560.807 rows=125146 loops=1)
         Sort Key: service_detail.service_detail_id
         Sort Method: external merge Disk: 2912kB
         -> Seq Scan on service_detail (cost=0.00..6310.46 rows=125146 width=8) (actual time=0.056..114.925 rows=125146 loops=1)
   -> Materialize (cost=43805.21..47386.30 rows=286487 width=8) (actual time=1034.220..1617.313 rows=286491 loops=1)
         -> Sort (cost=43805.21..44521.43 rows=286487 width=8) (actual time=1034.204..1337.708 rows=286491 loops=1)
               Sort Key: non_service_detail.non_service_detail_service_id
               Sort Method: external merge Disk: 6720kB
               -> Seq Scan on non_service_detail (cost=0.00..13917.87 rows=286487 width=8) (actual time=0.063..248.950 rows=286491 loops=1)
Total runtime: 2650.763 ms
(12 rows)

2009/10/12 Matthew Wakeling <matthew@flymine.org>

On Mon, 12 Oct 2009, Grzegorz Jaśkiewicz wrote:
try setting work_mem to higher value. As postgresql will fallback to disc sorting if the
content doesn't fit in work_mem, which it probably doesn't (8.4+ show the memory usage
for sorting, which your explain doesn't have).

For reference, here's the EXPLAIN:

Merge Left Join (cost=62451.86..67379.08 rows=286789 width=0)
Merge Cond: (a.id = b.id)
-> Sort (cost=18610.57..18923.27 rows=125077 width=8)
         Sort Key: a.id
         -> Seq Scan on a (cost=0.00..6309.77 rows=125077 width=8)
-> Materialize (cost=43841.28..47426.15 rows=286789 width=8)
         -> Sort (cost=43841.28..44558.26 rows=286789 width=8)
             Sort Key: b.id
            -> Seq Scan on b (cost=0.00..13920.89 rows=286789 width=8)

This is an EXPLAIN, not an EXPLAIN ANALYSE. If it was an EXPLAIN ANALYSE, it would show how much memory was used, and whether it was a disc sort or an in-memory sort. As it is only an EXPLAIN, the query hasn't actually been run, and we have no information about whether the sort would be performed on disc or not.

Matthew

--
Hi! You have reached 555-0129. None of us are here to answer the phone and the cat doesn't have opposing thumbs, so his messages are illegible. Please leave your name and message after the beep ...

Re: Query performance

From

S Arvind

Date:

12 October 2009, 14:53:23

Thanks Grzegorz,
But work memory is for each process (connection) rt? so if i keep more then 10MB will not affect the overall performance ?

Arvind S

2009/10/12 Grzegorz Jaśkiewicz <gryzman@gmail.com>

btw, what's the version of db ?
what's the work_mem setting ?

try setting work_mem to higher value. As postgresql will fallback to disc sorting if the content doesn't fit in work_mem, which it probably doesn't (8.4+ show the memory usage for sorting, which your explain doesn't have).

Re: Query performance

From

Grzegorz Jaśkiewicz

Date:

12 October 2009, 15:11:02

2009/10/12 S Arvind <arvindwill@gmail.com>

Thanks Grzegorz,
But work memory is for each process (connection) rt? so if i keep more then 10MB will not affect the overall performance ?

it will. But the memory is only allocated when needed.
You can always set it before running that particular query, and than put it back to default value.
just use SET work_mem=64MB

Mind you , postgresql requires more memory to sort same set of data on disc than on memory. Your explain analyze indicates, that it used 2912kB , which means your work_mem value is set to some ridiculously low value. Put it up to 8MB or something, and retry.

--
GJ