Thread: BUG #3667: Job scheduling with Greenplum fails
The following bug has been logged online: Bug reference: 3667 Logged by: Jon Roberts Email address: jon.roberts@asurion.com PostgreSQL version: 8.2 Operating system: Solaris Description: Job scheduling with Greenplum fails Details: Greenplum doesn't support "correlated subqueries" which PGAdmin III uses when PgAgent is installed. I suggest the SQL related to PgAgent be re-written to not use correlated subqueries.
"Jon Roberts" <jon.roberts@asurion.com> writes: > Greenplum doesn't support "correlated subqueries" which PGAdmin III uses > when PgAgent is installed. Surely this complaint should be directed to Greenplum. Correlated subqueries are a required entry-level feature in the SQL92 standard, and have been supported by PG for nigh ten years. It is by no means unreasonable for PGAdmin to expect them to work. regards, tom lane
"Roberts, Jon" <Jon.Roberts@asurion.com> writes: > What if the new query has a significantly lower cost compared to the older > one? Much as I'd like the planner to be infallible, it ain't; estimated costs are no proof of any real-world performance difference. Better show EXPLAIN ANALYZE numbers if you want to be taken seriously --- and *not* ones from a Greenplum-modified PG. In any case "I can make this particular query faster" seems a rather different argument from "you guys should eliminate all use of correlated subqueries". regards, tom lane
What if the new query has a significantly lower cost compared to the older one? The current query found in pgaJob.cpp: SELECT *, (SELECT jlgstatus FROM pgagent.pga_joblog jl WHERE jl.jlgjobid = j.jobid ORDER BY jlgid DESC LIMIT 1) AS joblastresult FROM pgagent.pga_job j JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid -- + restriction + ORDER BY jobname "Sort (cost=5359.18..5360.33 rows=460 width=221)" " Sort Key: j.jobname" " -> Hash Join (cost=69.50..5338.84 rows=460 width=221)" " Hash Cond: (j.jobjclid = cl.jclid)" " -> Hash Left Join (cost=33.40..54.33 rows=460 width=185)" " Hash Cond: (j.jobagentid = ag.jagpid)" " -> Seq Scan on pga_job j (cost=0.00..14.60 rows=460 width=141)" " -> Hash (cost=20.40..20.40 rows=1040 width=44)" " -> Seq Scan on pga_jobagent ag (cost=0.00..20.40 rows=1040 width=44)" " -> Hash (cost=21.60..21.60 rows=1160 width=36)" " -> Seq Scan on pga_jobclass cl (cost=0.00..21.60 rows=1160 width=36)" " SubPlan" " -> Limit (cost=0.00..11.40 rows=1 width=9)" " -> Index Scan Backward using pga_joblog_pkey on pga_joblog jl (cost=0.00..68.38 rows=6 width=9)" " Filter: (jlgjobid = $0)" My revised query: select j.*, cl.*, ag.*, sub3.jlgstatus from pgagent.pga_job j join pgagent.pga_jobclass cl on cl.jclid=jobjclid left outer join pgagent.pga_jobagent ag on ag.jagpid=jobagentid join (select j2.jlgstatus, sub.jlgjobid from pgagent.pga_joblog j2 join (select jl.jlgjobid, max(jl.jlgid) as max_jlgid from pgagent.pga_joblog jl group by jl.jlgjobid) sub on sub.jlgjobid = j2.jlgjobid and sub.max_jlgid = j2.jlgid) sub3 on sub3.jlgjobid = j.jobid -- + restriction + order by jobname "Sort (cost=81.07..81.07 rows=1 width=226)" " Sort Key: j.jobname" " -> Nested Loop (cost=66.00..81.06 rows=1 width=226)" " -> Nested Loop Left Join (cost=66.00..80.67 rows=1 width=190)" " -> Nested Loop (cost=66.00..80.29 rows=1 width=146)" " -> Hash Join (cost=66.00..72.01 rows=1 width=13)" " Hash Cond: ((sub.jlgjobid = j2.jlgjobid) AND (sub.max_jlgid = j2.jlgid))" " -> HashAggregate (cost=27.25..29.75 rows=200 width=8)" " -> Seq Scan on pga_joblog jl (cost=0.00..21.50 rows=1150 width=8)" " -> Hash (cost=21.50..21.50 rows=1150 width=13)" " -> Seq Scan on pga_joblog j2 (cost=0.00..21.50 rows=1150 width=13)" " -> Index Scan using pga_job_pkey on pga_job j (cost=0.00..8.27 rows=1 width=141)" " Index Cond: (sub.jlgjobid = j.jobid)" " -> Index Scan using pga_jobagent_pkey on pga_jobagent ag (cost=0.00..0.37 rows=1 width=44)" " Index Cond: (ag.jagpid = j.jobagentid)" " -> Index Scan using pga_jobclass_pkey on pga_jobclass cl (cost=0.00..0.37 rows=1 width=36)" " Index Cond: (cl.jclid = j.jobjclid)" Surely the cost dropping from 5359 to 81 is worth the trouble. Jon -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Wednesday, October 10, 2007 2:57 PM To: Jon Roberts Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails "Jon Roberts" <jon.roberts@asurion.com> writes: > Greenplum doesn't support "correlated subqueries" which PGAdmin III uses > when PgAgent is installed. Surely this complaint should be directed to Greenplum. Correlated subqueries are a required entry-level feature in the SQL92 standard, and have been supported by PG for nigh ten years. It is by no means unreasonable for PGAdmin to expect them to work. regards, tom lane
The original query: EXPLAIN ANALYZE SELECT *, (SELECT jlgstatus FROM pgagent.pga_joblog jl WHERE jl.jlgjobid = j.jobid ORDER BY jlgid DESC LIMIT 1) AS joblastresult FROM pgagent.pga_job j JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid -- + restriction + ORDER BY jobname; "Sort (cost=5359.18..5360.33 rows=460 width=221) (actual time=0.295..0.300 rows=2 loops=1)" " Sort Key: j.jobname" " -> Hash Join (cost=69.50..5338.84 rows=460 width=221) (actual time=0.189..0.249 rows=2 loops=1)" " Hash Cond: (j.jobjclid = cl.jclid)" " -> Hash Left Join (cost=33.40..54.33 rows=460 width=185) (actual time=0.068..0.084 rows=2 loops=1)" " Hash Cond: (j.jobagentid = ag.jagpid)" " -> Seq Scan on pga_job j (cost=0.00..14.60 rows=460 width=141) (actual time=0.025..0.030 rows=2 loops=1)" " -> Hash (cost=20.40..20.40 rows=1040 width=44) (actual time=0.019..0.019 rows=1 loops=1)" " -> Seq Scan on pga_jobagent ag (cost=0.00..20.40 rows=1040 width=44) (actual time=0.005..0.008 rows=1 loops=1)" " -> Hash (cost=21.60..21.60 rows=1160 width=36) (actual time=0.050..0.050 rows=5 loops=1)" " -> Seq Scan on pga_jobclass cl (cost=0.00..21.60 rows=1160 width=36) (actual time=0.011..0.022 rows=5 loops=1)" " SubPlan" " -> Limit (cost=0.00..11.40 rows=1 width=9) (actual time=0.023..0.025 rows=1 loops=2)" " -> Index Scan Backward using pga_joblog_pkey on pga_joblog jl (cost=0.00..68.38 rows=6 width=9) (actual time=0.014..0.014 rows=1 loops=2)" " Filter: (jlgjobid = $0)" "Total runtime: 0.519 ms" My revised query: EXPLAIN ANALYZE select j.*, cl.*, ag.*, sub3.jlgstatus from pgagent.pga_job j join pgagent.pga_jobclass cl on cl.jclid=jobjclid left outer join pgagent.pga_jobagent ag on ag.jagpid=jobagentid join (select j2.jlgstatus, sub.jlgjobid from pgagent.pga_joblog j2 join (select jl.jlgjobid, max(jl.jlgid) as max_jlgid from pgagent.pga_joblog jl group by jl.jlgjobid) sub on sub.jlgjobid = j2.jlgjobid and sub.max_jlgid = j2.jlgid) sub3 on sub3.jlgjobid = j.jobid -- + restriction + order by jobname; "Sort (cost=81.07..81.07 rows=1 width=226) (actual time=0.780..0.784 rows=2 loops=1)" " Sort Key: j.jobname" " -> Nested Loop (cost=66.00..81.06 rows=1 width=226) (actual time=0.633..0.736 rows=2 loops=1)" " -> Nested Loop Left Join (cost=66.00..80.67 rows=1 width=190) (actual time=0.613..0.684 rows=2 loops=1)" " -> Nested Loop (cost=66.00..80.29 rows=1 width=146) (actual time=0.598..0.651 rows=2 loops=1)" " -> Hash Join (cost=66.00..72.01 rows=1 width=13) (actual time=0.566..0.583 rows=2 loops=1)" " Hash Cond: ((sub.jlgjobid = j2.jlgjobid) AND (sub.max_jlgid = j2.jlgid))" " -> HashAggregate (cost=27.25..29.75 rows=200 width=8) (actual time=0.252..0.257 rows=2 loops=1)" " -> Seq Scan on pga_joblog jl (cost=0.00..21.50 rows=1150 width=8) (actual time=0.012..0.111 rows=44 loops=1)" " -> Hash (cost=21.50..21.50 rows=1150 width=13) (actual time=0.283..0.283 rows=44 loops=1)" " -> Seq Scan on pga_joblog j2 (cost=0.00..21.50 rows=1150 width=13) (actual time=0.023..0.145 rows=44 loops=1)" " -> Index Scan using pga_job_pkey on pga_job j (cost=0.00..8.27 rows=1 width=141) (actual time=0.014..0.017 rows=1 loops=2)" " Index Cond: (sub.jlgjobid = j.jobid)" " -> Index Scan using pga_jobagent_pkey on pga_jobagent ag (cost=0.00..0.37 rows=1 width=44) (actual time=0.002..0.002 rows=0 loops=2)" " Index Cond: (ag.jagpid = j.jobagentid)" " -> Index Scan using pga_jobclass_pkey on pga_jobclass cl (cost=0.00..0.37 rows=1 width=36) (actual time=0.006..0.009 rows=1 loops=2)" " Index Cond: (cl.jclid = j.jobjclid)" "Total runtime: 1.096 ms" My table only has 2 records in it so it might be different when we have several hundred jobs. The cost is significantly lower but the total runtime is higher. This is on a PostgreSQL database installed on my desktop. It has nothing to do with Greenplum. I can't even run an explain plan on GP with that first query because it fails. Another solution would be to call a function in the database rather than imbedding the SQL in C++. If you look at pgagent.sql, there are a few functions created to support jobs. Maybe this could be another function call so it could easily be modified to support Greenplum and make it easier for users to tweak the tool. Thanks! Jon -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Wednesday, October 10, 2007 4:24 PM To: Roberts, Jon Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails "Roberts, Jon" <Jon.Roberts@asurion.com> writes: > What if the new query has a significantly lower cost compared to the older > one? Much as I'd like the planner to be infallible, it ain't; estimated costs are no proof of any real-world performance difference. Better show EXPLAIN ANALYZE numbers if you want to be taken seriously --- and *not* ones from a Greenplum-modified PG. In any case "I can make this particular query faster" seems a rather different argument from "you guys should eliminate all use of correlated subqueries". regards, tom lane
On Thu, Oct 11, 2007 at 07:31:44AM -0500, Roberts, Jon wrote: > The cost is significantly lower but the total runtime is higher. Um, so you want developers to change the thing so that it performs more slowly, but has a prettier estimate of how much work it's going to do? That seems like a poor optimisation to me. A -- Andrew Sullivan | ajs@crankycanuck.ca The whole tendency of modern prose is away from concreteness. --George Orwell
Like I said in the email, I think the best solution is to put the code in a function. However, after I inserted a total of 25 jobs, the difference is more noticeable and my sql is better in terms of cost and total time. vacuum analyze pgagent.pga_job; vacuum analyze pgagent.pga_jobclass; vacuum analyze pgagent.pga_jobagent; Original SQL: explain analyze SELECT *, (SELECT jlgstatus FROM pgagent.pga_joblog jl WHERE jl.jlgjobid = j.jobid ORDER BY jlgid DESC LIMIT 1) AS joblastresult FROM pgagent.pga_job j JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid -- + restriction + ORDER BY jobname "Sort (cost=291.31..291.37 rows=25 width=127) (actual time=2.125..2.181 rows=25 loops=1)" " Sort Key: j.jobname" " -> Hash Join (cost=2.13..290.73 rows=25 width=127) (actual time=0.204..1.823 rows=25 loops=1)" " Hash Cond: (j.jobjclid = cl.jclid)" " -> Hash Left Join (cost=1.02..4.38 rows=25 width=105) (actual time=0.061..0.264 rows=25 loops=1)" " Hash Cond: (j.jobagentid = ag.jagpid)" " -> Seq Scan on pga_job j (cost=0.00..3.25 rows=25 width=63) (actual time=0.009..0.074 rows=25 loops=1)" " -> Hash (cost=1.01..1.01 rows=1 width=42) (actual time=0.021..0.021 rows=1 loops=1)" " -> Seq Scan on pga_jobagent ag (cost=0.00..1.01 rows=1 width=42) (actual time=0.005..0.008 rows=1 loops=1)" " -> Hash (cost=1.05..1.05 rows=5 width=22) (actual time=0.050..0.050 rows=5 loops=1)" " -> Seq Scan on pga_jobclass cl (cost=0.00..1.05 rows=5 width=22) (actual time=0.005..0.026 rows=5 loops=1)" " SubPlan" " -> Limit (cost=0.00..11.40 rows=1 width=9) (actual time=0.045..0.045 rows=0 loops=25)" " -> Index Scan Backward using pga_joblog_pkey on pga_joblog jl (cost=0.00..68.38 rows=6 width=9) (actual time=0.036..0.036 rows=0 loops=25)" " Filter: (jlgjobid = $0)" "Total runtime: 2.436 ms" My SQL: explain analyze select j.*, cl.*, ag.*, sub3.jlgstatus from pgagent.pga_job j join pgagent.pga_jobclass cl on cl.jclid=jobjclid left outer join pgagent.pga_jobagent ag on ag.jagpid=jobagentid join (select j2.jlgstatus, sub.jlgjobid from pgagent.pga_joblog j2 join (select jl.jlgjobid, max(jl.jlgid) as max_jlgid from pgagent.pga_joblog jl group by jl.jlgjobid) sub on sub.jlgjobid = j2.jlgjobid and sub.max_jlgid = j2.jlgid) sub3 on sub3.jlgjobid = j.jobid -- + restriction + order by jobname "Sort (cost=68.35..68.36 rows=1 width=132) (actual time=1.026..1.033 rows=2 loops=1)" " Sort Key: j.jobname" " -> Nested Loop (cost=36.69..68.34 rows=1 width=132) (actual time=0.877..0.961 rows=2 loops=1)" " -> Nested Loop Left Join (cost=36.69..67.58 rows=1 width=110) (actual time=0.838..0.882 rows=2 loops=1)" " -> Hash Join (cost=36.69..66.82 rows=1 width=68) (actual time=0.810..0.830 rows=2 loops=1)" " Hash Cond: ((j2.jlgjobid = j.jobid) AND (j2.jlgid = sub.max_jlgid))" " -> Seq Scan on pga_joblog j2 (cost=0.00..21.50 rows=1150 width=13) (actual time=0.024..0.130 rows=44 loops=1)" " -> Hash (cost=36.31..36.31 rows=25 width=71) (actual time=0.542..0.542 rows=2 loops=1)" " -> Hash Join (cost=30.81..36.31 rows=25 width=71) (actual time=0.506..0.523 rows=2 loops=1)" " Hash Cond: (sub.jlgjobid = j.jobid)" " -> HashAggregate (cost=27.25..29.75 rows=200 width=8) (actual time=0.270..0.275 rows=2 loops=1)" " -> Seq Scan on pga_joblog jl (cost=0.00..21.50 rows=1150 width=8) (actual time=0.013..0.116 rows=44 loops=1)" " -> Hash (cost=3.25..3.25 rows=25 width=63) (actual time=0.206..0.206 rows=25 loops=1)" " -> Seq Scan on pga_job j (cost=0.00..3.25 rows=25 width=63) (actual time=0.011..0.082 rows=25 loops=1)" " -> Index Scan using pga_jobagent_pkey on pga_jobagent ag (cost=0.00..0.75 rows=1 width=42) (actual time=0.006..0.006 rows=0 loops=2)" " Index Cond: (ag.jagpid = j.jobagentid)" " -> Index Scan using pga_jobclass_pkey on pga_jobclass cl (cost=0.00..0.75 rows=1 width=22) (actual time=0.016..0.020 rows=1 loops=2)" " Index Cond: (cl.jclid = j.jobjclid)" "Total runtime: 1.406 ms" I think this trend will continue as more and more jobs are inserted. BOOYA! Jon -----Original Message----- From: Andrew Sullivan [mailto:ajs@crankycanuck.ca] Sent: Thursday, October 11, 2007 10:14 AM To: Roberts, Jon Cc: 'Tom Lane'; pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails On Thu, Oct 11, 2007 at 07:31:44AM -0500, Roberts, Jon wrote: > The cost is significantly lower but the total runtime is higher. Um, so you want developers to change the thing so that it performs more slowly, but has a prettier estimate of how much work it's going to do? That seems like a poor optimisation to me. A -- Andrew Sullivan | ajs@crankycanuck.ca The whole tendency of modern prose is away from concreteness. --George Orwell
On Thu, 2007-10-11 at 10:42 -0500, Roberts, Jon wrote: > Like I said in the email, I think the best solution is to put the code in a > function. However, after I inserted a total of 25 jobs, the difference is > more noticeable and my sql is better in terms of cost and total time. You're missing a few key points: - this isn't the PgAdmin hackers list, so you're asking in the wrong place ... go to www.pgadmin.org and go from there. This is the standard PostgreSQL bug list and you haven't shown any bug with standard PostgreSQL. - PgAdmin 1.8 is almost released so you may have missed the boat - pgAdmin is trying to work with other PG variants, so a reasoned argument, in the right place, may help. I'll leave it to you to re-post appropriate parts of this thread. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
> ------- Original Message ------- > From: Simon Riggs <simon@2ndquadrant.com> > To: "Roberts, Jon" <Jon.Roberts@asurion.com> > Sent: 15/10/07, 21:19:04 > Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails > > On Thu, 2007-10-11 at 10:42 -0500, Roberts, Jon wrote: > > > Like I said in the email, I think the best solution is to put the code in a > > function. However, after I inserted a total of 25 jobs, the difference is > > more noticeable and my sql is better in terms of cost and total time. > > You're missing a few key points: > > - this isn't the PgAdmin hackers list, so you're asking in the wrong > place ... go to www.pgadmin.org and go from there. This is the standard > PostgreSQL bug list and you haven't shown any bug with standard > PostgreSQL. > > - PgAdmin 1.8 is almost released so you may have missed the boat > > - pgAdmin is trying to work with other PG variants, so a reasoned > argument, in the right place, may help. A change has been applied for GP compatibility. It will be in pgAdmin 1.8. Regards, Dave
On Mon, 2007-10-15 at 22:54 +0100, Dave Page wrote: > A change has been applied for GP compatibility. It will be in pgAdmin 1.8. Wow Dave, that was quick! It was the right place after all. :-) -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote: > On Mon, 2007-10-15 at 22:54 +0100, Dave Page wrote: > >> A change has been applied for GP compatibility. It will be in pgAdmin 1.8. > > Wow Dave, that was quick! It was the right place after all. :-) > I exchanged a couple of emails with Luke a day or two back :-) /D