Thread: OUTER JOIN IS SLOW

OUTER JOIN IS SLOW

From
Benjamin Arai
Date:
I have been running pieces of my PL function by hand and I have found
that the following queries work by themselves taking less than a second
to execute.

getDateRange"('12/1/2005','12/1/2006')  <- simply generates a date
list.  Doesn't even access a table

SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
'12/15/2005' AND pkey <= '12/15/2006';

But when combined as below it takes 10 seconds to execute.

SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;

Do I need to increase the work_mem or is this possible still a ANALYZE
issue?

Benjamin

Re: [ADMIN] OUTER JOIN IS SLOW

From
"Shoaib Mir"
Date:
Try increasing the work_mem first to see the change, that might help.

-------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:
I have been running pieces of my PL function by hand and I have found
that the following queries work by themselves taking less than a second
to execute.

getDateRange"('12/1/2005','12/1/2006')  <- simply generates a date
list.  Doesn't even access a table

SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
'12/15/2005' AND pkey <= '12/15/2006';

But when combined as below it takes 10 seconds to execute.

SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;

Do I need to increase the work_mem or is this possible still a ANALYZE
issue?

Benjamin

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

                http://www.postgresql.org/about/donate

Re: [ADMIN] OUTER JOIN IS SLOW

From
"Shoaib Mir"
Date:
....adding to the last email, for now try the work_mem but you should be adding ANALYZE along with the VACUUM (with a cron job I guess) you do regularly.

------------
Shoaib Mir
EntperpriseDB ( www.enterprisedb.com)

On 12/24/06, Shoaib Mir <shoaibmir@gmail.com> wrote:
Try increasing the work_mem first to see the change, that might help.

-------------
Shoaib Mir
EnterpriseDB ( www.enterprisedb.com)

On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:
I have been running pieces of my PL function by hand and I have found
that the following queries work by themselves taking less than a second
to execute.

getDateRange"('12/1/2005','12/1/2006')  <- simply generates a date
list.  Doesn't even access a table

SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
'12/15/2005' AND pkey <= '12/15/2006';

But when combined as below it takes 10 seconds to execute.

SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;

Do I need to increase the work_mem or is this possible still a ANALYZE
issue?

Benjamin

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

                http://www.postgresql.org/about/donate


Re: [ADMIN] OUTER JOIN IS SLOW

From
Benjamin Arai
Date:
Just to make things more clear I ran EXPLAIN ANALYZE on the slow query.
I got

 Merge Full Join  (cost=62.33..73.36 rows=1000 width=19) (actual
time=39.205..8521.644 rows=272 loops=1)
   Merge Cond: ("outer".pkey = "inner".d1)
   ->  Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid  (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)
         Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
'2006-12-15'::date))
         Filter: (cusip = '92193920'::text)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.604..5.674 rows=262 loops=1)
         Sort Key: d1.d1
         ->  Function Scan on "getDateRange" d1  (cost=0.00..12.50
rows=1000 width=4) (actual time=2.591..3.591 rows=262 loops=1)
 Total runtime: 8522.894 ms
(9 rows)

It looks like this line:

 "->  Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid  (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)"

is the problem.  As I understand it is using the index but it is low as
dirt.  Hopefully, the -z will fix this.  I also ran the same query but
with earlier dates in the table and the query ran much faster.

 Merge Full Join  (cost=3492.48..3505.60 rows=1034 width=19) (actual
time=7.605..12.851 rows=273 loops=1)
   Merge Cond: ("outer".d1 = "inner".pkey)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.586..5.530 rows=263 loops=1)
         Sort Key: d1.d1
         ->  Function Scan on "getDateRange" d1  (cost=0.00..12.50
rows=1000 width=4) (actual time=2.493..3.515 rows=263 loops=1)
   ->  Sort  (cost=3430.15..3432.74 rows=1034 width=19) (actual
time=2.998..3.971 rows=263 loops=1)
         Sort Key: mutualfd_weekday_qbid.pkey
         ->  Index Scan using mutualfd_weekday_qbid_pkey on
mutualfd_weekday_qbid  (cost=0.00..3378.38 rows=1034 width=19) (actual
time=0.075..1.843 rows=263 loops=1)
               Index Cond: ((cusip = '92193920'::text) AND (pkey >=
'1999-12-15'::date) AND (pkey <= '2000-12-15'::date))
 Total runtime: 13.935 ms
(10 rows)

In this case it only took 13.935ms as compared to 8522.894 ms for the
newer data.

Benjamin

Shoaib Mir wrote:
> ....adding to the last email, for now try the work_mem but you should
> be adding ANALYZE along with the VACUUM (with a cron job I guess) you
> do regularly.
>
> ------------
> Shoaib Mir
> EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)
>
> On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com
> <mailto:shoaibmir@gmail.com>> wrote:
>
>     Try increasing the work_mem first to see the change, that might help.
>
>     -------------
>     Shoaib Mir
>     EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)
>
>     On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com
>     <mailto:benjamin@araisoft.com>> wrote:
>
>         I have been running pieces of my PL function by hand and I
>         have found
>         that the following queries work by themselves taking less than
>         a second
>         to execute.
>
>         getDateRange"('12/1/2005','12/1/2006')  <- simply generates a
>         date
>         list.  Doesn't even access a table
>
>         SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920'
>         AND  pkey >=
>         '12/15/2005' AND pkey <= '12/15/2006';
>
>         But when combined as below it takes 10 seconds to execute.
>
>         SELECT d1 as date, d2.data as data FROM
>         "getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN
>         (SELECT *
>         FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
>         '12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;
>
>         Do I need to increase the work_mem or is this possible still a
>         ANALYZE
>         issue?
>
>         Benjamin
>
>         ---------------------------(end of
>         broadcast)---------------------------
>         TIP 7: You can help support the PostgreSQL project by donating at
>
>                         http://www.postgresql.org/about/donate
>         <http://www.postgresql.org/about/donate>
>
>
>

Re: [ADMIN] OUTER JOIN IS SLOW

From
"Shoaib Mir"
Date:
Yes, ANALYZE should definitely improve the performance for query...

--------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:
Just to make things more clear I ran EXPLAIN ANALYZE on the slow query.
I got

Merge Full Join  (cost=62.33..73.36 rows=1000 width=19) (actual
time=39.205..8521.644 rows=272 loops=1)
   Merge Cond: ("outer".pkey = "inner".d1)
   ->  Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid  (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)
         Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
'2006-12-15'::date))
         Filter: (cusip = '92193920'::text)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.604..5.674 rows=262 loops=1)
         Sort Key: d1.d1
         ->  Function Scan on "getDateRange" d1  (cost=0.00..12.50
rows=1000 width=4) (actual time=2.591..3.591 rows=262 loops=1)
Total runtime: 8522.894 ms
(9 rows)

It looks like this line:

"->  Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid  (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)"

is the problem.  As I understand it is using the index but it is low as
dirt.  Hopefully, the -z will fix this.  I also ran the same query but
with earlier dates in the table and the query ran much faster.

Merge Full Join  (cost=3492.48..3505.60 rows=1034 width=19) (actual
time= 7.605..12.851 rows=273 loops=1)
   Merge Cond: ("outer".d1 = "inner".pkey)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.586..5.530 rows=263 loops=1)
         Sort Key: d1.d1
         ->  Function Scan on "getDateRange" d1  (cost=0.00..12.50
rows=1000 width=4) (actual time=2.493..3.515 rows=263 loops=1)
   ->  Sort  (cost=3430.15..3432.74 rows=1034 width=19) (actual
time=2.998..3.971 rows=263 loops=1)
         Sort Key: mutualfd_weekday_qbid.pkey
         ->  Index Scan using mutualfd_weekday_qbid_pkey on
mutualfd_weekday_qbid  (cost=0.00..3378.38 rows=1034 width=19) (actual
time=0.075..1.843 rows=263 loops=1)
               Index Cond: ((cusip = '92193920'::text) AND (pkey >=
'1999-12-15'::date) AND (pkey <= '2000-12-15'::date))
Total runtime: 13.935 ms
(10 rows)

In this case it only took 13.935ms as compared to 8522.894 ms for the
newer data.

Benjamin

Shoaib Mir wrote:
> ....adding to the last email, for now try the work_mem but you should
> be adding ANALYZE along with the VACUUM (with a cron job I guess) you
> do regularly.
>
> ------------
> Shoaib Mir
> EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)
>
> On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com
> <mailto: shoaibmir@gmail.com>> wrote:
>
>     Try increasing the work_mem first to see the change, that might help.
>
>     -------------
>     Shoaib Mir
>     EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)
>
>     On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com
>     <mailto:benjamin@araisoft.com>> wrote:
>
>         I have been running pieces of my PL function by hand and I
>         have found
>         that the following queries work by themselves taking less than
>         a second
>         to execute.
>
>         getDateRange"('12/1/2005','12/1/2006')  <- simply generates a
>         date
>         list.  Doesn't even access a table
>
>         SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920'
>         AND  pkey >=
>         '12/15/2005' AND pkey <= '12/15/2006';
>
>         But when combined as below it takes 10 seconds to execute.
>
>         SELECT d1 as date, d2.data as data FROM
>         "getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN
>         (SELECT *
>         FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
>         '12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;
>
>         Do I need to increase the work_mem or is this possible still a
>         ANALYZE
>         issue?
>
>         Benjamin
>
>         ---------------------------(end of
>         broadcast)---------------------------
>         TIP 7: You can help support the PostgreSQL project by donating at
>
>                         http://www.postgresql.org/about/donate
>         <http://www.postgresql.org/about/donate >
>
>
>

Re: [ADMIN] OUTER JOIN IS SLOW

From
Jeff Frost
Date:
On Sat, 23 Dec 2006, Benjamin Arai wrote:

> "->  Index Scan using mutualfd_weekday_qbid_pkey_idx on mutualfd_weekday_qbid
> (cost=0.00..6.01 rows=1 width=19) (actual time=34.579..8510.801 rows=253
> loops=1)"

You're right that this is the problem and show that the planner was expecting
a very low cost on the index scan, but it turned out to be much higher.  This
is because of old statistics.  VACUUM ANALYZE should remedy this.

Just run:

ANALYZE mytablename;

in psql and see if that gets you going.

Also of note, you can set work_mem per session, so it's reasonable to
benchmark various values until you find one that helps your query.

Note that you might not want to set work_mem too high, because it's per
operation, so you could easily run your machine out of RAM if you set this too
high and have many concurrent queries running.

Also, to answer your question regarding FSM settings, you should run a
"vacuumdb -av" and look at the last 8 lines to see if you have your FSM
settings high enough.

>
> is the problem.  As I understand it is using the index but it is low as dirt.
> Hopefully, the -z will fix this.  I also ran the same query but with earlier
> dates in the table and the query ran much faster.
>
> Merge Full Join  (cost=3492.48..3505.60 rows=1034 width=19) (actual
> time=7.605..12.851 rows=273 loops=1)
>  Merge Cond: ("outer".d1 = "inner".pkey)
>  ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=4.586..5.530
> rows=263 loops=1)
>        Sort Key: d1.d1
>        ->  Function Scan on "getDateRange" d1  (cost=0.00..12.50 rows=1000
> width=4) (actual time=2.493..3.515 rows=263 loops=1)
>  ->  Sort  (cost=3430.15..3432.74 rows=1034 width=19) (actual
> time=2.998..3.971 rows=263 loops=1)
>        Sort Key: mutualfd_weekday_qbid.pkey
>        ->  Index Scan using mutualfd_weekday_qbid_pkey on
> mutualfd_weekday_qbid  (cost=0.00..3378.38 rows=1034 width=19) (actual
> time=0.075..1.843 rows=263 loops=1)
>              Index Cond: ((cusip = '92193920'::text) AND (pkey >=
> '1999-12-15'::date) AND (pkey <= '2000-12-15'::date))
> Total runtime: 13.935 ms
> (10 rows)
>
> In this case it only took 13.935ms as compared to 8522.894 ms for the newer
> data.
>
> Benjamin
>
> Shoaib Mir wrote:
>> ....adding to the last email, for now try the work_mem but you should be
>> adding ANALYZE along with the VACUUM (with a cron job I guess) you do
>> regularly.
>>
>> ------------
>> Shoaib Mir
>> EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)
>>
>> On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com
>> <mailto:shoaibmir@gmail.com>> wrote:
>>
>>     Try increasing the work_mem first to see the change, that might help.
>>
>>     -------------
>>     Shoaib Mir
>>     EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)
>>
>>     On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com
>>     <mailto:benjamin@araisoft.com>> wrote:
>>
>>         I have been running pieces of my PL function by hand and I
>>         have found
>>         that the following queries work by themselves taking less than
>>         a second
>>         to execute.
>>
>>         getDateRange"('12/1/2005','12/1/2006')  <- simply generates a
>>         date
>>         list.  Doesn't even access a table
>>
>>         SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920'
>>         AND  pkey >=
>>         '12/15/2005' AND pkey <= '12/15/2006';
>>
>>         But when combined as below it takes 10 seconds to execute.
>>
>>         SELECT d1 as date, d2.data as data FROM
>>         "getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN
>>         (SELECT *
>>         FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND  pkey >=
>>         '12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;
>>
>>         Do I need to increase the work_mem or is this possible still a
>>         ANALYZE
>>         issue?
>>
>>         Benjamin
>>
>>         ---------------------------(end of
>>         broadcast)---------------------------
>>         TIP 7: You can help support the PostgreSQL project by donating at
>>
>>                         http://www.postgresql.org/about/donate
>>         <http://www.postgresql.org/about/donate>
>>
>>
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>               http://www.postgresql.org/about/donate
>
>

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: [ADMIN] OUTER JOIN IS SLOW

From
Tom Lane
Date:
Benjamin Arai <benjamin@araisoft.com> writes:
>    ->  Index Scan using mutualfd_weekday_qbid_pkey_idx on
> mutualfd_weekday_qbid  (cost=0.00..6.01 rows=1 width=19) (actual
> time=34.579..8510.801 rows=253 loops=1)
>          Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
> '2006-12-15'::date))
>          Filter: (cusip = '92193920'::text)

Hm, so how many rows in mutualfd_weekday_qbid for that date?
And how many satisfy the cusip condition?  (I suppose 253,
but it looks like that must be a very small fraction of all
the rows for that date.)

The selectivity estimators are not great about dealing with
zero-width intervals like this one (in fact, if you look at the
code you'll find it doesn't even bother to distinguish '>' from '>='
... something we should probably try to improve sometime).

You'd probably have better luck if you could fold the WHERE condition
down to "pkey = '2005-12-15'".  Dunno how feasible that is for your
application.

            regards, tom lane

Re: [ADMIN] OUTER JOIN IS SLOW

From
Benjamin Arai
Date:
Hi,

I did a vacuum with -z and it fixed the issue.  I was not aware that
vacuumdb didn't ANALYZE by default.    Thanks everybody for all of the help!

Benjamin

Tom Lane wrote:
> Benjamin Arai <benjamin@araisoft.com> writes:
>
>>    ->  Index Scan using mutualfd_weekday_qbid_pkey_idx on
>> mutualfd_weekday_qbid  (cost=0.00..6.01 rows=1 width=19) (actual
>> time=34.579..8510.801 rows=253 loops=1)
>>          Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
>> '2006-12-15'::date))
>>          Filter: (cusip = '92193920'::text)
>>
>
> Hm, so how many rows in mutualfd_weekday_qbid for that date?
> And how many satisfy the cusip condition?  (I suppose 253,
> but it looks like that must be a very small fraction of all
> the rows for that date.)
>
> The selectivity estimators are not great about dealing with
> zero-width intervals like this one (in fact, if you look at the
> code you'll find it doesn't even bother to distinguish '>' from '>='
> ... something we should probably try to improve sometime).
>
> You'd probably have better luck if you could fold the WHERE condition
> down to "pkey = '2005-12-15'".  Dunno how feasible that is for your
> application.
>
>             regards, tom lane
>
>

Re: [ADMIN] OUTER JOIN IS SLOW

From
Alban Hertroys
Date:
Benjamin Arai wrote:
> Just to make things more clear I ran EXPLAIN ANALYZE on the slow query.
> I got

>   ->  Index Scan using mutualfd_weekday_qbid_pkey_idx on
> mutualfd_weekday_qbid  (cost=0.00..6.01 rows=1 width=19) (actual
> time=34.579..8510.801 rows=253 loops=1)
>         Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
> '2006-12-15'::date))
>         Filter: (cusip = '92193920'::text)

This is the relevant part (including the index details). It's filtering
(text) over all index items within a period of a full year. I imagine
can be quite a lot of data (unless pkey is a primary key on a date
field; only 366 records).

As mentioned, ANALYZE helps, but you may want to check:
1) Are you comparing integers as text strings? You may not want to.
2) Do you have an index on cusip?

You probably have, just in case.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //