Thread: BUG #3667: Job scheduling with Greenplum fails

BUG #3667: Job scheduling with Greenplum fails

From
"Jon Roberts"
Date:
The following bug has been logged online:

Bug reference:      3667
Logged by:          Jon Roberts
Email address:      jon.roberts@asurion.com
PostgreSQL version: 8.2
Operating system:   Solaris
Description:        Job scheduling with Greenplum fails
Details:

Greenplum doesn't support "correlated subqueries" which PGAdmin III uses
when PgAgent is installed.

I suggest the SQL related to PgAgent be re-written to not use correlated
subqueries.

Re: BUG #3667: Job scheduling with Greenplum fails

From
Tom Lane
Date:
"Jon Roberts" <jon.roberts@asurion.com> writes:
> Greenplum doesn't support "correlated subqueries" which PGAdmin III uses
> when PgAgent is installed.

Surely this complaint should be directed to Greenplum.  Correlated
subqueries are a required entry-level feature in the SQL92 standard,
and have been supported by PG for nigh ten years.  It is by no means
unreasonable for PGAdmin to expect them to work.

            regards, tom lane

Re: BUG #3667: Job scheduling with Greenplum fails

From
Tom Lane
Date:
"Roberts, Jon" <Jon.Roberts@asurion.com> writes:
> What if the new query has a significantly lower cost compared to the older
> one?

Much as I'd like the planner to be infallible, it ain't; estimated costs
are no proof of any real-world performance difference.  Better show
EXPLAIN ANALYZE numbers if you want to be taken seriously --- and *not*
ones from a Greenplum-modified PG.

In any case "I can make this particular query faster" seems a rather
different argument from "you guys should eliminate all use of correlated
subqueries".

            regards, tom lane

Re: BUG #3667: Job scheduling with Greenplum fails

From
"Roberts, Jon"
Date:
What if the new query has a significantly lower cost compared to the older
one?

The current query found in pgaJob.cpp:
SELECT *,
       (SELECT jlgstatus FROM pgagent.pga_joblog jl WHERE jl.jlgjobid =
j.jobid ORDER BY jlgid DESC LIMIT 1) AS joblastresult
       FROM pgagent.pga_job j
         JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid
         LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid
--       + restriction +
        ORDER BY jobname

"Sort  (cost=5359.18..5360.33 rows=460 width=221)"
"  Sort Key: j.jobname"
"  ->  Hash Join  (cost=69.50..5338.84 rows=460 width=221)"
"        Hash Cond: (j.jobjclid = cl.jclid)"
"        ->  Hash Left Join  (cost=33.40..54.33 rows=460 width=185)"
"              Hash Cond: (j.jobagentid = ag.jagpid)"
"              ->  Seq Scan on pga_job j  (cost=0.00..14.60 rows=460
width=141)"
"              ->  Hash  (cost=20.40..20.40 rows=1040 width=44)"
"                    ->  Seq Scan on pga_jobagent ag  (cost=0.00..20.40
rows=1040 width=44)"
"        ->  Hash  (cost=21.60..21.60 rows=1160 width=36)"
"              ->  Seq Scan on pga_jobclass cl  (cost=0.00..21.60 rows=1160
width=36)"
"        SubPlan"
"          ->  Limit  (cost=0.00..11.40 rows=1 width=9)"
"                ->  Index Scan Backward using pga_joblog_pkey on pga_joblog
jl  (cost=0.00..68.38 rows=6 width=9)"
"                      Filter: (jlgjobid = $0)"


My revised query:

select j.*, cl.*, ag.*, sub3.jlgstatus
   from pgagent.pga_job j join
        pgagent.pga_jobclass cl on cl.jclid=jobjclid left outer join
        pgagent.pga_jobagent ag on ag.jagpid=jobagentid
        join (select j2.jlgstatus,
                     sub.jlgjobid
                from pgagent.pga_joblog j2 join
                     (select jl.jlgjobid,
                             max(jl.jlgid) as max_jlgid
                        from pgagent.pga_joblog jl
                       group by jl.jlgjobid) sub
                     on sub.jlgjobid = j2.jlgjobid and
                        sub.max_jlgid = j2.jlgid) sub3
        on sub3.jlgjobid = j.jobid
--       + restriction +
order by jobname


"Sort  (cost=81.07..81.07 rows=1 width=226)"
"  Sort Key: j.jobname"
"  ->  Nested Loop  (cost=66.00..81.06 rows=1 width=226)"
"        ->  Nested Loop Left Join  (cost=66.00..80.67 rows=1 width=190)"
"              ->  Nested Loop  (cost=66.00..80.29 rows=1 width=146)"
"                    ->  Hash Join  (cost=66.00..72.01 rows=1 width=13)"
"                          Hash Cond: ((sub.jlgjobid = j2.jlgjobid) AND
(sub.max_jlgid = j2.jlgid))"
"                          ->  HashAggregate  (cost=27.25..29.75 rows=200
width=8)"
"                                ->  Seq Scan on pga_joblog jl
(cost=0.00..21.50 rows=1150 width=8)"
"                          ->  Hash  (cost=21.50..21.50 rows=1150 width=13)"
"                                ->  Seq Scan on pga_joblog j2
(cost=0.00..21.50 rows=1150 width=13)"
"                    ->  Index Scan using pga_job_pkey on pga_job j
(cost=0.00..8.27 rows=1 width=141)"
"                          Index Cond: (sub.jlgjobid = j.jobid)"
"              ->  Index Scan using pga_jobagent_pkey on pga_jobagent ag
(cost=0.00..0.37 rows=1 width=44)"
"                    Index Cond: (ag.jagpid = j.jobagentid)"
"        ->  Index Scan using pga_jobclass_pkey on pga_jobclass cl
(cost=0.00..0.37 rows=1 width=36)"
"              Index Cond: (cl.jclid = j.jobjclid)"


Surely the cost dropping from 5359 to 81 is worth the trouble.


Jon

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, October 10, 2007 2:57 PM
To: Jon Roberts
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails

"Jon Roberts" <jon.roberts@asurion.com> writes:
> Greenplum doesn't support "correlated subqueries" which PGAdmin III uses
> when PgAgent is installed.

Surely this complaint should be directed to Greenplum.  Correlated
subqueries are a required entry-level feature in the SQL92 standard,
and have been supported by PG for nigh ten years.  It is by no means
unreasonable for PGAdmin to expect them to work.

            regards, tom lane

Re: BUG #3667: Job scheduling with Greenplum fails

From
"Roberts, Jon"
Date:
The original query:

EXPLAIN ANALYZE
SELECT *,
       (SELECT jlgstatus FROM pgagent.pga_joblog jl WHERE jl.jlgjobid =
j.jobid ORDER BY jlgid DESC LIMIT 1) AS joblastresult
       FROM pgagent.pga_job j
         JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid
         LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid
--       + restriction +
        ORDER BY jobname;


"Sort  (cost=5359.18..5360.33 rows=460 width=221) (actual time=0.295..0.300
rows=2 loops=1)"
"  Sort Key: j.jobname"
"  ->  Hash Join  (cost=69.50..5338.84 rows=460 width=221) (actual
time=0.189..0.249 rows=2 loops=1)"
"        Hash Cond: (j.jobjclid = cl.jclid)"
"        ->  Hash Left Join  (cost=33.40..54.33 rows=460 width=185) (actual
time=0.068..0.084 rows=2 loops=1)"
"              Hash Cond: (j.jobagentid = ag.jagpid)"
"              ->  Seq Scan on pga_job j  (cost=0.00..14.60 rows=460
width=141) (actual time=0.025..0.030 rows=2 loops=1)"
"              ->  Hash  (cost=20.40..20.40 rows=1040 width=44) (actual
time=0.019..0.019 rows=1 loops=1)"
"                    ->  Seq Scan on pga_jobagent ag  (cost=0.00..20.40
rows=1040 width=44) (actual time=0.005..0.008 rows=1 loops=1)"
"        ->  Hash  (cost=21.60..21.60 rows=1160 width=36) (actual
time=0.050..0.050 rows=5 loops=1)"
"              ->  Seq Scan on pga_jobclass cl  (cost=0.00..21.60 rows=1160
width=36) (actual time=0.011..0.022 rows=5 loops=1)"
"        SubPlan"
"          ->  Limit  (cost=0.00..11.40 rows=1 width=9) (actual
time=0.023..0.025 rows=1 loops=2)"
"                ->  Index Scan Backward using pga_joblog_pkey on pga_joblog
jl  (cost=0.00..68.38 rows=6 width=9) (actual time=0.014..0.014 rows=1
loops=2)"
"                      Filter: (jlgjobid = $0)"
"Total runtime: 0.519 ms"


My revised query:
EXPLAIN ANALYZE
select j.*, cl.*, ag.*, sub3.jlgstatus
   from pgagent.pga_job j join
        pgagent.pga_jobclass cl on cl.jclid=jobjclid left outer join
        pgagent.pga_jobagent ag on ag.jagpid=jobagentid
        join (select j2.jlgstatus,
                     sub.jlgjobid
                from pgagent.pga_joblog j2 join
                     (select jl.jlgjobid,
                             max(jl.jlgid) as max_jlgid
                        from pgagent.pga_joblog jl
                       group by jl.jlgjobid) sub
                     on sub.jlgjobid = j2.jlgjobid and
                        sub.max_jlgid = j2.jlgid) sub3
        on sub3.jlgjobid = j.jobid
--       + restriction +
order by jobname;

"Sort  (cost=81.07..81.07 rows=1 width=226) (actual time=0.780..0.784 rows=2
loops=1)"
"  Sort Key: j.jobname"
"  ->  Nested Loop  (cost=66.00..81.06 rows=1 width=226) (actual
time=0.633..0.736 rows=2 loops=1)"
"        ->  Nested Loop Left Join  (cost=66.00..80.67 rows=1 width=190)
(actual time=0.613..0.684 rows=2 loops=1)"
"              ->  Nested Loop  (cost=66.00..80.29 rows=1 width=146) (actual
time=0.598..0.651 rows=2 loops=1)"
"                    ->  Hash Join  (cost=66.00..72.01 rows=1 width=13)
(actual time=0.566..0.583 rows=2 loops=1)"
"                          Hash Cond: ((sub.jlgjobid = j2.jlgjobid) AND
(sub.max_jlgid = j2.jlgid))"
"                          ->  HashAggregate  (cost=27.25..29.75 rows=200
width=8) (actual time=0.252..0.257 rows=2 loops=1)"
"                                ->  Seq Scan on pga_joblog jl
(cost=0.00..21.50 rows=1150 width=8) (actual time=0.012..0.111 rows=44
loops=1)"
"                          ->  Hash  (cost=21.50..21.50 rows=1150 width=13)
(actual time=0.283..0.283 rows=44 loops=1)"
"                                ->  Seq Scan on pga_joblog j2
(cost=0.00..21.50 rows=1150 width=13) (actual time=0.023..0.145 rows=44
loops=1)"
"                    ->  Index Scan using pga_job_pkey on pga_job j
(cost=0.00..8.27 rows=1 width=141) (actual time=0.014..0.017 rows=1
loops=2)"
"                          Index Cond: (sub.jlgjobid = j.jobid)"
"              ->  Index Scan using pga_jobagent_pkey on pga_jobagent ag
(cost=0.00..0.37 rows=1 width=44) (actual time=0.002..0.002 rows=0 loops=2)"
"                    Index Cond: (ag.jagpid = j.jobagentid)"
"        ->  Index Scan using pga_jobclass_pkey on pga_jobclass cl
(cost=0.00..0.37 rows=1 width=36) (actual time=0.006..0.009 rows=1 loops=2)"
"              Index Cond: (cl.jclid = j.jobjclid)"
"Total runtime: 1.096 ms"

My table only has 2 records in it so it might be different when we have
several hundred jobs.

The cost is significantly lower but the total runtime is higher.  This is on
a PostgreSQL database installed on my desktop.  It has nothing to do with
Greenplum.  I can't even run an explain plan on GP with that first query
because it fails.

Another solution would be to call a function in the database rather than
imbedding the SQL in C++.  If you look at pgagent.sql, there are a few
functions created to support jobs.  Maybe this could be another function
call so it could easily be modified to support Greenplum and make it easier
for users to tweak the tool.

Thanks!


Jon
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, October 10, 2007 4:24 PM
To: Roberts, Jon
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails

"Roberts, Jon" <Jon.Roberts@asurion.com> writes:
> What if the new query has a significantly lower cost compared to the older
> one?

Much as I'd like the planner to be infallible, it ain't; estimated costs
are no proof of any real-world performance difference.  Better show
EXPLAIN ANALYZE numbers if you want to be taken seriously --- and *not*
ones from a Greenplum-modified PG.

In any case "I can make this particular query faster" seems a rather
different argument from "you guys should eliminate all use of correlated
subqueries".

            regards, tom lane

Re: BUG #3667: Job scheduling with Greenplum fails

From
Andrew Sullivan
Date:
On Thu, Oct 11, 2007 at 07:31:44AM -0500, Roberts, Jon wrote:

> The cost is significantly lower but the total runtime is higher.

Um, so you want developers to change the thing so that it performs
more slowly, but has a prettier estimate of how much work it's going
to do?  That seems like a poor optimisation to me.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
        --George Orwell

Re: BUG #3667: Job scheduling with Greenplum fails

From
"Roberts, Jon"
Date:
Like I said in the email, I think the best solution is to put the code in a
function.  However, after I inserted a total of 25 jobs, the difference is
more noticeable and my sql is better in terms of cost and total time.

vacuum analyze pgagent.pga_job;
vacuum analyze pgagent.pga_jobclass;
vacuum analyze pgagent.pga_jobagent;

Original SQL:
explain analyze
SELECT *,
       (SELECT jlgstatus FROM pgagent.pga_joblog jl WHERE jl.jlgjobid =
j.jobid ORDER BY jlgid DESC LIMIT 1) AS joblastresult
       FROM pgagent.pga_job j
         JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid
         LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid
--       + restriction +
        ORDER BY jobname
"Sort  (cost=291.31..291.37 rows=25 width=127) (actual time=2.125..2.181
rows=25 loops=1)"
"  Sort Key: j.jobname"
"  ->  Hash Join  (cost=2.13..290.73 rows=25 width=127) (actual
time=0.204..1.823 rows=25 loops=1)"
"        Hash Cond: (j.jobjclid = cl.jclid)"
"        ->  Hash Left Join  (cost=1.02..4.38 rows=25 width=105) (actual
time=0.061..0.264 rows=25 loops=1)"
"              Hash Cond: (j.jobagentid = ag.jagpid)"
"              ->  Seq Scan on pga_job j  (cost=0.00..3.25 rows=25 width=63)
(actual time=0.009..0.074 rows=25 loops=1)"
"              ->  Hash  (cost=1.01..1.01 rows=1 width=42) (actual
time=0.021..0.021 rows=1 loops=1)"
"                    ->  Seq Scan on pga_jobagent ag  (cost=0.00..1.01
rows=1 width=42) (actual time=0.005..0.008 rows=1 loops=1)"
"        ->  Hash  (cost=1.05..1.05 rows=5 width=22) (actual
time=0.050..0.050 rows=5 loops=1)"
"              ->  Seq Scan on pga_jobclass cl  (cost=0.00..1.05 rows=5
width=22) (actual time=0.005..0.026 rows=5 loops=1)"
"        SubPlan"
"          ->  Limit  (cost=0.00..11.40 rows=1 width=9) (actual
time=0.045..0.045 rows=0 loops=25)"
"                ->  Index Scan Backward using pga_joblog_pkey on pga_joblog
jl  (cost=0.00..68.38 rows=6 width=9) (actual time=0.036..0.036 rows=0
loops=25)"
"                      Filter: (jlgjobid = $0)"
"Total runtime: 2.436 ms"


My SQL:
explain analyze
select j.*, cl.*, ag.*, sub3.jlgstatus
   from pgagent.pga_job j join
        pgagent.pga_jobclass cl on cl.jclid=jobjclid left outer join
        pgagent.pga_jobagent ag on ag.jagpid=jobagentid
        join (select j2.jlgstatus,
                     sub.jlgjobid
                from pgagent.pga_joblog j2 join
                     (select jl.jlgjobid,
                             max(jl.jlgid) as max_jlgid
                        from pgagent.pga_joblog jl
                       group by jl.jlgjobid) sub
                     on sub.jlgjobid = j2.jlgjobid and
                        sub.max_jlgid = j2.jlgid) sub3
        on sub3.jlgjobid = j.jobid
--       + restriction +
order by jobname

"Sort  (cost=68.35..68.36 rows=1 width=132) (actual time=1.026..1.033 rows=2
loops=1)"
"  Sort Key: j.jobname"
"  ->  Nested Loop  (cost=36.69..68.34 rows=1 width=132) (actual
time=0.877..0.961 rows=2 loops=1)"
"        ->  Nested Loop Left Join  (cost=36.69..67.58 rows=1 width=110)
(actual time=0.838..0.882 rows=2 loops=1)"
"              ->  Hash Join  (cost=36.69..66.82 rows=1 width=68) (actual
time=0.810..0.830 rows=2 loops=1)"
"                    Hash Cond: ((j2.jlgjobid = j.jobid) AND (j2.jlgid =
sub.max_jlgid))"
"                    ->  Seq Scan on pga_joblog j2  (cost=0.00..21.50
rows=1150 width=13) (actual time=0.024..0.130 rows=44 loops=1)"
"                    ->  Hash  (cost=36.31..36.31 rows=25 width=71) (actual
time=0.542..0.542 rows=2 loops=1)"
"                          ->  Hash Join  (cost=30.81..36.31 rows=25
width=71) (actual time=0.506..0.523 rows=2 loops=1)"
"                                Hash Cond: (sub.jlgjobid = j.jobid)"
"                                ->  HashAggregate  (cost=27.25..29.75
rows=200 width=8) (actual time=0.270..0.275 rows=2 loops=1)"
"                                      ->  Seq Scan on pga_joblog jl
(cost=0.00..21.50 rows=1150 width=8) (actual time=0.013..0.116 rows=44
loops=1)"
"                                ->  Hash  (cost=3.25..3.25 rows=25
width=63) (actual time=0.206..0.206 rows=25 loops=1)"
"                                      ->  Seq Scan on pga_job j
(cost=0.00..3.25 rows=25 width=63) (actual time=0.011..0.082 rows=25
loops=1)"
"              ->  Index Scan using pga_jobagent_pkey on pga_jobagent ag
(cost=0.00..0.75 rows=1 width=42) (actual time=0.006..0.006 rows=0 loops=2)"
"                    Index Cond: (ag.jagpid = j.jobagentid)"
"        ->  Index Scan using pga_jobclass_pkey on pga_jobclass cl
(cost=0.00..0.75 rows=1 width=22) (actual time=0.016..0.020 rows=1 loops=2)"
"              Index Cond: (cl.jclid = j.jobjclid)"
"Total runtime: 1.406 ms"


I think this trend will continue as more and more jobs are inserted.


BOOYA!

Jon
-----Original Message-----
From: Andrew Sullivan [mailto:ajs@crankycanuck.ca]
Sent: Thursday, October 11, 2007 10:14 AM
To: Roberts, Jon
Cc: 'Tom Lane'; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails

On Thu, Oct 11, 2007 at 07:31:44AM -0500, Roberts, Jon wrote:

> The cost is significantly lower but the total runtime is higher.

Um, so you want developers to change the thing so that it performs
more slowly, but has a prettier estimate of how much work it's going
to do?  That seems like a poor optimisation to me.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
        --George Orwell

Re: BUG #3667: Job scheduling with Greenplum fails

From
Simon Riggs
Date:
On Thu, 2007-10-11 at 10:42 -0500, Roberts, Jon wrote:

> Like I said in the email, I think the best solution is to put the code in a
> function.  However, after I inserted a total of 25 jobs, the difference is
> more noticeable and my sql is better in terms of cost and total time.

You're missing a few key points:

- this isn't the PgAdmin hackers list, so you're asking in the wrong
place ... go to www.pgadmin.org and go from there. This is the standard
PostgreSQL bug list and you haven't shown any bug with standard
PostgreSQL.

- PgAdmin 1.8 is almost released so you may have missed the boat

- pgAdmin is trying to work with other PG variants, so a reasoned
argument, in the right place, may help.

I'll leave it to you to re-post appropriate parts of this thread.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: BUG #3667: Job scheduling with Greenplum fails

From
"Dave Page"
Date:
> ------- Original Message -------
> From: Simon Riggs <simon@2ndquadrant.com>
> To: "Roberts, Jon" <Jon.Roberts@asurion.com>
> Sent: 15/10/07, 21:19:04
> Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails
>
> On Thu, 2007-10-11 at 10:42 -0500, Roberts, Jon wrote:
>
> > Like I said in the email, I think the best solution is to put the code in a
> > function.  However, after I inserted a total of 25 jobs, the difference is
> > more noticeable and my sql is better in terms of cost and total time.
>
> You're missing a few key points:
>
> - this isn't the PgAdmin hackers list, so you're asking in the wrong
> place ... go to www.pgadmin.org and go from there. This is the standard
> PostgreSQL bug list and you haven't shown any bug with standard
> PostgreSQL.
>
> - PgAdmin 1.8 is almost released so you may have missed the boat
>
> - pgAdmin is trying to work with other PG variants, so a reasoned
> argument, in the right place, may help.

A change has been applied for GP compatibility. It will be in pgAdmin 1.8.

Regards, Dave

Re: BUG #3667: Job scheduling with Greenplum fails

From
Simon Riggs
Date:
On Mon, 2007-10-15 at 22:54 +0100, Dave Page wrote:

> A change has been applied for GP compatibility. It will be in pgAdmin 1.8.

Wow Dave, that was quick! It was the right place after all. :-)

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: BUG #3667: Job scheduling with Greenplum fails

From
Dave Page
Date:
Simon Riggs wrote:
> On Mon, 2007-10-15 at 22:54 +0100, Dave Page wrote:
>
>> A change has been applied for GP compatibility. It will be in pgAdmin 1.8.
>
> Wow Dave, that was quick! It was the right place after all. :-)
>

I exchanged a couple of emails with Luke a day or two back :-)

/D