Thread: OUTER JOIN IS SLOW
I have been running pieces of my PL function by hand and I have found that the following queries work by themselves taking less than a second to execute. getDateRange"('12/1/2005','12/1/2006') <- simply generates a date list. Doesn't even access a table SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >= '12/15/2005' AND pkey <= '12/15/2006'; But when combined as below it takes 10 seconds to execute. SELECT d1 as date, d2.data as data FROM "getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >= '12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey; Do I need to increase the work_mem or is this possible still a ANALYZE issue? Benjamin
Try increasing the work_mem first to see the change, that might help.
-------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
-------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:
I have been running pieces of my PL function by hand and I have found
that the following queries work by themselves taking less than a second
to execute.
getDateRange"('12/1/2005','12/1/2006') <- simply generates a date
list. Doesn't even access a table
SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006';
But when combined as below it takes 10 seconds to execute.
SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;
Do I need to increase the work_mem or is this possible still a ANALYZE
issue?
Benjamin
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
....adding to the last email, for now try the work_mem but you should be adding ANALYZE along with the VACUUM (with a cron job I guess) you do regularly.
------------
Shoaib Mir
EntperpriseDB ( www.enterprisedb.com)
------------
Shoaib Mir
EntperpriseDB ( www.enterprisedb.com)
On 12/24/06, Shoaib Mir <shoaibmir@gmail.com> wrote:
Try increasing the work_mem first to see the change, that might help.
-------------
Shoaib Mir
EnterpriseDB ( www.enterprisedb.com)On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:I have been running pieces of my PL function by hand and I have found
that the following queries work by themselves taking less than a second
to execute.
getDateRange"('12/1/2005','12/1/2006') <- simply generates a date
list. Doesn't even access a table
SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006';
But when combined as below it takes 10 seconds to execute.
SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;
Do I need to increase the work_mem or is this possible still a ANALYZE
issue?
Benjamin
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Just to make things more clear I ran EXPLAIN ANALYZE on the slow query. I got Merge Full Join (cost=62.33..73.36 rows=1000 width=19) (actual time=39.205..8521.644 rows=272 loops=1) Merge Cond: ("outer".pkey = "inner".d1) -> Index Scan using mutualfd_weekday_qbid_pkey_idx on mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual time=34.579..8510.801 rows=253 loops=1) Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <= '2006-12-15'::date)) Filter: (cusip = '92193920'::text) -> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=4.604..5.674 rows=262 loops=1) Sort Key: d1.d1 -> Function Scan on "getDateRange" d1 (cost=0.00..12.50 rows=1000 width=4) (actual time=2.591..3.591 rows=262 loops=1) Total runtime: 8522.894 ms (9 rows) It looks like this line: "-> Index Scan using mutualfd_weekday_qbid_pkey_idx on mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual time=34.579..8510.801 rows=253 loops=1)" is the problem. As I understand it is using the index but it is low as dirt. Hopefully, the -z will fix this. I also ran the same query but with earlier dates in the table and the query ran much faster. Merge Full Join (cost=3492.48..3505.60 rows=1034 width=19) (actual time=7.605..12.851 rows=273 loops=1) Merge Cond: ("outer".d1 = "inner".pkey) -> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=4.586..5.530 rows=263 loops=1) Sort Key: d1.d1 -> Function Scan on "getDateRange" d1 (cost=0.00..12.50 rows=1000 width=4) (actual time=2.493..3.515 rows=263 loops=1) -> Sort (cost=3430.15..3432.74 rows=1034 width=19) (actual time=2.998..3.971 rows=263 loops=1) Sort Key: mutualfd_weekday_qbid.pkey -> Index Scan using mutualfd_weekday_qbid_pkey on mutualfd_weekday_qbid (cost=0.00..3378.38 rows=1034 width=19) (actual time=0.075..1.843 rows=263 loops=1) Index Cond: ((cusip = '92193920'::text) AND (pkey >= '1999-12-15'::date) AND (pkey <= '2000-12-15'::date)) Total runtime: 13.935 ms (10 rows) In this case it only took 13.935ms as compared to 8522.894 ms for the newer data. Benjamin Shoaib Mir wrote: > ....adding to the last email, for now try the work_mem but you should > be adding ANALYZE along with the VACUUM (with a cron job I guess) you > do regularly. > > ------------ > Shoaib Mir > EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>) > > On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com > <mailto:shoaibmir@gmail.com>> wrote: > > Try increasing the work_mem first to see the change, that might help. > > ------------- > Shoaib Mir > EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>) > > On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com > <mailto:benjamin@araisoft.com>> wrote: > > I have been running pieces of my PL function by hand and I > have found > that the following queries work by themselves taking less than > a second > to execute. > > getDateRange"('12/1/2005','12/1/2006') <- simply generates a > date > list. Doesn't even access a table > > SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' > AND pkey >= > '12/15/2005' AND pkey <= '12/15/2006'; > > But when combined as below it takes 10 seconds to execute. > > SELECT d1 as date, d2.data as data FROM > "getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN > (SELECT * > FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >= > '12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey; > > Do I need to increase the work_mem or is this possible still a > ANALYZE > issue? > > Benjamin > > ---------------------------(end of > broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > <http://www.postgresql.org/about/donate> > > >
Yes, ANALYZE should definitely improve the performance for query...
--------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
--------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:
Just to make things more clear I ran EXPLAIN ANALYZE on the slow query.
I got
Merge Full Join (cost=62.33..73.36 rows=1000 width=19) (actual
time=39.205..8521.644 rows=272 loops=1)
Merge Cond: ("outer".pkey = "inner".d1)
-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)
Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
'2006-12-15'::date))
Filter: (cusip = '92193920'::text)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.604..5.674 rows=262 loops=1)
Sort Key: d1.d1
-> Function Scan on "getDateRange" d1 (cost=0.00..12.50
rows=1000 width=4) (actual time=2.591..3.591 rows=262 loops=1)
Total runtime: 8522.894 ms
(9 rows)
It looks like this line:
"-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)"
is the problem. As I understand it is using the index but it is low as
dirt. Hopefully, the -z will fix this. I also ran the same query but
with earlier dates in the table and the query ran much faster.
Merge Full Join (cost=3492.48..3505.60 rows=1034 width=19) (actual
time= 7.605..12.851 rows=273 loops=1)
Merge Cond: ("outer".d1 = "inner".pkey)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.586..5.530 rows=263 loops=1)
Sort Key: d1.d1
-> Function Scan on "getDateRange" d1 (cost=0.00..12.50
rows=1000 width=4) (actual time=2.493..3.515 rows=263 loops=1)
-> Sort (cost=3430.15..3432.74 rows=1034 width=19) (actual
time=2.998..3.971 rows=263 loops=1)
Sort Key: mutualfd_weekday_qbid.pkey
-> Index Scan using mutualfd_weekday_qbid_pkey on
mutualfd_weekday_qbid (cost=0.00..3378.38 rows=1034 width=19) (actual
time=0.075..1.843 rows=263 loops=1)
Index Cond: ((cusip = '92193920'::text) AND (pkey >=
'1999-12-15'::date) AND (pkey <= '2000-12-15'::date))
Total runtime: 13.935 ms
(10 rows)
In this case it only took 13.935ms as compared to 8522.894 ms for the
newer data.
Benjamin
Shoaib Mir wrote:
> ....adding to the last email, for now try the work_mem but you should
> be adding ANALYZE along with the VACUUM (with a cron job I guess) you
> do regularly.
>
> ------------
> Shoaib Mir
> EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)
>
> On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com
> <mailto: shoaibmir@gmail.com>> wrote:
>
> Try increasing the work_mem first to see the change, that might help.
>
> -------------
> Shoaib Mir
> EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)
>
> On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com
> <mailto:benjamin@araisoft.com>> wrote:
>
> I have been running pieces of my PL function by hand and I
> have found
> that the following queries work by themselves taking less than
> a second
> to execute.
>
> getDateRange"('12/1/2005','12/1/2006') <- simply generates a
> date
> list. Doesn't even access a table
>
> SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920'
> AND pkey >=
> '12/15/2005' AND pkey <= '12/15/2006';
>
> But when combined as below it takes 10 seconds to execute.
>
> SELECT d1 as date, d2.data as data FROM
> "getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN
> (SELECT *
> FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
> '12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;
>
> Do I need to increase the work_mem or is this possible still a
> ANALYZE
> issue?
>
> Benjamin
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate
> <http://www.postgresql.org/about/donate >
>
>
>
On Sat, 23 Dec 2006, Benjamin Arai wrote: > "-> Index Scan using mutualfd_weekday_qbid_pkey_idx on mutualfd_weekday_qbid > (cost=0.00..6.01 rows=1 width=19) (actual time=34.579..8510.801 rows=253 > loops=1)" You're right that this is the problem and show that the planner was expecting a very low cost on the index scan, but it turned out to be much higher. This is because of old statistics. VACUUM ANALYZE should remedy this. Just run: ANALYZE mytablename; in psql and see if that gets you going. Also of note, you can set work_mem per session, so it's reasonable to benchmark various values until you find one that helps your query. Note that you might not want to set work_mem too high, because it's per operation, so you could easily run your machine out of RAM if you set this too high and have many concurrent queries running. Also, to answer your question regarding FSM settings, you should run a "vacuumdb -av" and look at the last 8 lines to see if you have your FSM settings high enough. > > is the problem. As I understand it is using the index but it is low as dirt. > Hopefully, the -z will fix this. I also ran the same query but with earlier > dates in the table and the query ran much faster. > > Merge Full Join (cost=3492.48..3505.60 rows=1034 width=19) (actual > time=7.605..12.851 rows=273 loops=1) > Merge Cond: ("outer".d1 = "inner".pkey) > -> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=4.586..5.530 > rows=263 loops=1) > Sort Key: d1.d1 > -> Function Scan on "getDateRange" d1 (cost=0.00..12.50 rows=1000 > width=4) (actual time=2.493..3.515 rows=263 loops=1) > -> Sort (cost=3430.15..3432.74 rows=1034 width=19) (actual > time=2.998..3.971 rows=263 loops=1) > Sort Key: mutualfd_weekday_qbid.pkey > -> Index Scan using mutualfd_weekday_qbid_pkey on > mutualfd_weekday_qbid (cost=0.00..3378.38 rows=1034 width=19) (actual > time=0.075..1.843 rows=263 loops=1) > Index Cond: ((cusip = '92193920'::text) AND (pkey >= > '1999-12-15'::date) AND (pkey <= '2000-12-15'::date)) > Total runtime: 13.935 ms > (10 rows) > > In this case it only took 13.935ms as compared to 8522.894 ms for the newer > data. > > Benjamin > > Shoaib Mir wrote: >> ....adding to the last email, for now try the work_mem but you should be >> adding ANALYZE along with the VACUUM (with a cron job I guess) you do >> regularly. >> >> ------------ >> Shoaib Mir >> EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>) >> >> On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com >> <mailto:shoaibmir@gmail.com>> wrote: >> >> Try increasing the work_mem first to see the change, that might help. >> >> ------------- >> Shoaib Mir >> EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>) >> >> On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com >> <mailto:benjamin@araisoft.com>> wrote: >> >> I have been running pieces of my PL function by hand and I >> have found >> that the following queries work by themselves taking less than >> a second >> to execute. >> >> getDateRange"('12/1/2005','12/1/2006') <- simply generates a >> date >> list. Doesn't even access a table >> >> SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' >> AND pkey >= >> '12/15/2005' AND pkey <= '12/15/2006'; >> >> But when combined as below it takes 10 seconds to execute. >> >> SELECT d1 as date, d2.data as data FROM >> "getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN >> (SELECT * >> FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >= >> '12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey; >> >> Do I need to increase the work_mem or is this possible still a >> ANALYZE >> issue? >> >> Benjamin >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 7: You can help support the PostgreSQL project by donating at >> >> http://www.postgresql.org/about/donate >> <http://www.postgresql.org/about/donate> >> >> >> > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > > -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Benjamin Arai <benjamin@araisoft.com> writes: > -> Index Scan using mutualfd_weekday_qbid_pkey_idx on > mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual > time=34.579..8510.801 rows=253 loops=1) > Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <= > '2006-12-15'::date)) > Filter: (cusip = '92193920'::text) Hm, so how many rows in mutualfd_weekday_qbid for that date? And how many satisfy the cusip condition? (I suppose 253, but it looks like that must be a very small fraction of all the rows for that date.) The selectivity estimators are not great about dealing with zero-width intervals like this one (in fact, if you look at the code you'll find it doesn't even bother to distinguish '>' from '>=' ... something we should probably try to improve sometime). You'd probably have better luck if you could fold the WHERE condition down to "pkey = '2005-12-15'". Dunno how feasible that is for your application. regards, tom lane
Hi, I did a vacuum with -z and it fixed the issue. I was not aware that vacuumdb didn't ANALYZE by default. Thanks everybody for all of the help! Benjamin Tom Lane wrote: > Benjamin Arai <benjamin@araisoft.com> writes: > >> -> Index Scan using mutualfd_weekday_qbid_pkey_idx on >> mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual >> time=34.579..8510.801 rows=253 loops=1) >> Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <= >> '2006-12-15'::date)) >> Filter: (cusip = '92193920'::text) >> > > Hm, so how many rows in mutualfd_weekday_qbid for that date? > And how many satisfy the cusip condition? (I suppose 253, > but it looks like that must be a very small fraction of all > the rows for that date.) > > The selectivity estimators are not great about dealing with > zero-width intervals like this one (in fact, if you look at the > code you'll find it doesn't even bother to distinguish '>' from '>=' > ... something we should probably try to improve sometime). > > You'd probably have better luck if you could fold the WHERE condition > down to "pkey = '2005-12-15'". Dunno how feasible that is for your > application. > > regards, tom lane > >
Benjamin Arai wrote: > Just to make things more clear I ran EXPLAIN ANALYZE on the slow query. > I got > -> Index Scan using mutualfd_weekday_qbid_pkey_idx on > mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual > time=34.579..8510.801 rows=253 loops=1) > Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <= > '2006-12-15'::date)) > Filter: (cusip = '92193920'::text) This is the relevant part (including the index details). It's filtering (text) over all index items within a period of a full year. I imagine can be quite a lot of data (unless pkey is a primary key on a date field; only 366 records). As mentioned, ANALYZE helps, but you may want to check: 1) Are you comparing integers as text strings? You may not want to. 2) Do you have an index on cusip? You probably have, just in case. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //