Thread: query optimization question

query optimization question

From
Jack Coates
Date:
Hi all,

I've got a query that needs some help, please. Is there a way to avoid
all the looping? I've got freedom to work with the double-indented
sections below ) AND (, but the initial select distinct wrapper is much
more difficult to change. This is auto-generated code.

explain analyze SELECT DISTINCT members_.emailaddr_, members_.memberid_
FROM members_  WHERE (
    members_.List_='list1'
    AND members_.MemberType_='normal'
    AND members_.SubType_='mail'
    AND members_.emailaddr_ IS NOT NULL
    ) AND (
        ( select count(*) from lyrActiveRecips, members_ a, outmail_
        where lyrActiveRecips.UserName = a.UserNameLC_
        and lyrActiveRecips.Domain = a.Domain_
        and a.MemberID_ = members_.MemberID_
        and outmail_.MessageID_ = lyrActiveRecips.MailingID
        and outmail_.Type_ = 'list'
        and lyrActiveRecips.NextAttempt > '2004-01-20 00:00:00'
        )
          +
        ( select count(*) from lyrCompletedRecips, members_ a, outmail_
        where a.MemberID_ = lyrCompletedRecips.MemberID
        and a.UserNameLC_ = members_.UserNameLC_
        and a.Domain_ = members_.Domain_
        and outmail_.MessageID_ = lyrCompletedRecips.MailingID
        and outmail_.Type_ = 'list'
        and lyrCompletedRecips.FinalAttempt > '2004-01-20 00:00:00'
        and lyrCompletedRecips.CompletionStatusID = 300 )
          = 3
    )
;
                                                                             QUERY PLAN
                                             

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=537.06..537.07 rows=1 width=72) (actual
time=114460.908..114460.908 rows=0 loops=1)
   ->  Sort  (cost=537.06..537.06 rows=1 width=72) (actual
time=114460.905..114460.905 rows=0 loops=1)
         Sort Key: emailaddr_, memberid_
         ->  Index Scan using ix_members_list_notifyerr on members_
(cost=0.00..537.05 rows=1 width=72) (actual time=114460.893..114460.893
rows=0 loops=1)
               Index Cond: ((list_)::text = 'list1'::text)
               Filter: (((membertype_)::text = 'normal'::text) AND
((subtype_)::text = 'mail'::text) AND (emailaddr_ IS NOT NULL) AND
(((subplan) + (subplan)) = 3))
               SubPlan
                 ->  Aggregate  (cost=52.39..52.39 rows=1 width=0)
(actual time=0.089..0.090 rows=1 loops=818122)
                       ->  Hash Join  (cost=47.55..52.39 rows=1 width=0)
(actual time=0.086..0.086 rows=0 loops=818122)
                             Hash Cond: ("outer".memberid_ =
"inner".memberid)
                             ->  Index Scan using ix_members_emaillc on
members_ a  (cost=0.00..4.83 rows=1 width=4) (actual time=0.077..0.081
rows=1 loops=818122)
                                   Index Cond: (((domain_)::text =
($2)::text) AND ((usernamelc_)::text = ($1)::text))
                             ->  Hash  (cost=47.55..47.55 rows=1
width=4) (actual time=0.025..0.025 rows=0 loops=1)
                                   ->  Hash Join  (cost=25.00..47.55
rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=1)
                                         Hash Cond: ("outer".messageid_
= "inner".mailingid)
                                         ->  Seq Scan on outmail_
(cost=0.00..22.50 rows=6 width=4) (actual time=0.001..0.001 rows=0
loops=1)
                                               Filter: ((type_)::text =
'list'::text)
                                         ->  Hash  (cost=25.00..25.00
rows=2 width=8) (actual time=0.003..0.003 rows=0 loops=1)
                                               ->  Seq Scan on
lyrcompletedrecips  (cost=0.00..25.00 rows=2 width=8) (actual
time=0.001..0.001 rows=0 loops=1)
                                                     Filter:
((finalattempt > '2004-01-20 00:00:00'::timestamp without time zone) AND
(completionstatusid = 300))
                 ->  Aggregate  (cost=51.59..51.59 rows=1 width=0)
(actual time=0.033..0.034 rows=1 loops=818122)
                       ->  Hash Join  (cost=27.35..51.59 rows=1 width=0)
(actual time=0.028..0.028 rows=0 loops=818122)
                             Hash Cond: ((("outer".username)::text =
("inner".usernamelc_)::text) AND (("outer"."domain")::text =
("inner".domain_)::text))
                             ->  Hash Join  (cost=22.52..46.72 rows=3
width=211) (actual time=0.003..0.003 rows=0 loops=818122)
                                   Hash Cond: ("outer".mailingid =
"inner".messageid_)
                                   ->  Seq Scan on lyractiverecips
(cost=0.00..22.50 rows=334 width=215) (actual time=0.001..0.001 rows=0
loops=818122)
                                         Filter: (nextattempt >
'2004-01-20 00:00:00'::timestamp without time zone)
                                   ->  Hash  (cost=22.50..22.50 rows=6
width=4) (actual time=0.003..0.003 rows=0 loops=1)
                                         ->  Seq Scan on outmail_
(cost=0.00..22.50 rows=6 width=4) (actual time=0.002..0.002 rows=0
loops=1)
                                               Filter: ((type_)::text =
'list'::text)
                             ->  Hash  (cost=4.82..4.82 rows=2
width=211) (actual time=0.017..0.017 rows=0 loops=818122)
                                   ->  Index Scan using pk_members_ on
members_ a  (cost=0.00..4.82 rows=2 width=211) (actual time=0.011..0.013
rows=1 loops=818122)
                                         Index Cond: (memberid_ = $0)
 Total runtime: 114474.407 ms
(34 rows)

that's with no data in lyractiverecips or lyrcompletedrecips. With data
in those tables, the query still hasn't completed after several hours on
two different machines.

thanks,
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
                --Olivier Fourdan



Re: query optimization question

From
Tom Lane
Date:
Jack Coates <jack@lyris.com> writes:
> I've got a query that needs some help, please. Is there a way to avoid
> all the looping? I've got freedom to work with the double-indented
> sections below ) AND (, but the initial select distinct wrapper is much
> more difficult to change. This is auto-generated code.

Well, you're not going to get any serious improvement without a
wholesale rewrite of the query --- I'd think that something driven by
a GROUP BY memberid_ HAVING count(*) = whatever at the outer level would
be a better way to approach it.  As you have it, the system has no
choice but to fully evaluate two very expensive subselects, from scratch,
for each outer row.

However...

>         ( select count(*) from lyrActiveRecips, members_ a, outmail_
>         where lyrActiveRecips.UserName = a.UserNameLC_
>         and lyrActiveRecips.Domain = a.Domain_
>         and a.MemberID_ = members_.MemberID_
>         and outmail_.MessageID_ = lyrActiveRecips.MailingID

Is memberid_ a unique identifier for members_, as one would think from
the name?  If so, can't you drop the join of members_ a in this
subselect, and just use the corresponding fields from the outer table?

>         ( select count(*) from lyrCompletedRecips, members_ a, outmail_
>         where a.MemberID_ = lyrCompletedRecips.MemberID
>         and a.UserNameLC_ = members_.UserNameLC_
>         and a.Domain_ = members_.Domain_
>         and outmail_.MessageID_ = lyrCompletedRecips.MailingID

Why are the join conditions different here from the other subselect?
Can't you rephrase them the same as above, and then again remove the
inner appearance of members_ ?

            regards, tom lane

Re: query optimization question

From
Jack Coates
Date:
On Wed, 2004-01-28 at 18:04, Tom Lane wrote:
> Jack Coates <jack@lyris.com> writes:
> > I've got a query that needs some help, please. Is there a way to avoid
> > all the looping? I've got freedom to work with the double-indented
> > sections below ) AND (, but the initial select distinct wrapper is much
> > more difficult to change. This is auto-generated code.
>
> Well, you're not going to get any serious improvement without a
> wholesale rewrite of the query --- I'd think that something driven by
> a GROUP BY memberid_ HAVING count(*) = whatever at the outer level would
> be a better way to approach it.  As you have it, the system has no
> choice but to fully evaluate two very expensive subselects, from scratch,
> for each outer row.
>

I hear you. There's definitely an understanding that this tool can
generate some gnarly queries, and we want to redesign in a way that will
allow some more intelligence to be applied to the problem. In the
meantime, I'll be happy if PG grinds at the same level as other
databases. MS-SQL completed that query in 25 minutes on a database with
31 times the data in it. Since I'm one of the bigger *nix fans around
here, that doesn't make me happy.

> However...
>
> >         ( select count(*) from lyrActiveRecips, members_ a, outmail_
> >         where lyrActiveRecips.UserName = a.UserNameLC_
> >         and lyrActiveRecips.Domain = a.Domain_
> >         and a.MemberID_ = members_.MemberID_
> >         and outmail_.MessageID_ = lyrActiveRecips.MailingID
>
> Is memberid_ a unique identifier for members_, as one would think from
> the name?  If so, can't you drop the join of members_ a in this
> subselect, and just use the corresponding fields from the outer table?
>
> >         ( select count(*) from lyrCompletedRecips, members_ a, outmail_
> >         where a.MemberID_ = lyrCompletedRecips.MemberID
> >         and a.UserNameLC_ = members_.UserNameLC_
> >         and a.Domain_ = members_.Domain_
> >         and outmail_.MessageID_ = lyrCompletedRecips.MailingID
>
> Why are the join conditions different here from the other subselect?
> Can't you rephrase them the same as above, and then again remove the
> inner appearance of members_ ?
>
>             regards, tom lane

unfortunately, the column names are different between lyrcompletedrecips
and lyractiverecips. However, one thing we were able to do is to reduce
the number of queries by not trying to match across multiple lists.

SELECT DISTINCT members_.emailaddr_, members_.memberid_ FROM members_
WHERE ( members_.List_='list1'
    AND members_.MemberType_='normal'
    AND members_.SubType_='mail'
    AND members_.emailaddr_ IS NOT NULL )
AND (
    ( select count(*) from lyrActiveRecips, outmail_
    where outmail_.MessageID_ = lyrActiveRecips.MailingID
    and outmail_.Type_ = 'list'
    and members_.MemberID_ = lyrActiveRecips.MemberID
    and lyrActiveRecips.NextAttempt > '2004-01-20 00:00:00' )
     +
    ( select count(*) from lyrCompletedRecips, outmail_
    where members_.MemberID_ = lyrCompletedRecips.MemberID
    and outmail_.MessageID_ = lyrCompletedRecips.MailingID
    and outmail_.Type_ = 'list'
    and lyrCompletedRecips.FinalAttempt > '2004-01-20 00:00:00'
    and lyrCompletedRecips.CompletionStatusID = 300 )
     = 3
);

That completed in 3.5 minutes on MS-SQL. I killed the query this morning
after 15 hours on PostgreSQL 7.4. I tried a GROUP BY memberid_ HAVING
variation, which completed in 59 seconds on MS-SQL. I killed it after 35
minutes on PostgreSQL.

On a more positive note, if you remember the benchmarking I was doing
last month, PostgreSQL got some pretty good relative numbers. It
requires a lot of hand-holding and tuning relative to MS-SQL, but it
certainly beat the pants off of Oracle 8 and 9 for speed and ease of
management. Oracle 8 was in fact unable to complete the uglier stress
tests. I'll be working on a tuning recommendations white paper today.

thanks for all the help,
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
                --Olivier Fourdan



Re: query optimization question

From
Tom Lane
Date:
Jack Coates <jack@lyris.com> writes:
> That completed in 3.5 minutes on MS-SQL. I killed the query this morning
> after 15 hours on PostgreSQL 7.4. I tried a GROUP BY memberid_ HAVING
> variation, which completed in 59 seconds on MS-SQL. I killed it after 35
> minutes on PostgreSQL.

Hm.  I'd like to think that 7.4 would be competitive on grouping
queries.  What sort of plan did you get from it?

            regards, tom lane

Re: query optimization question

From
Jack Coates
Date:
On Thu, 2004-01-29 at 10:05, Tom Lane wrote:
> Jack Coates <jack@lyris.com> writes:
> > That completed in 3.5 minutes on MS-SQL. I killed the query this morning
> > after 15 hours on PostgreSQL 7.4. I tried a GROUP BY memberid_ HAVING
> > variation, which completed in 59 seconds on MS-SQL. I killed it after 35
> > minutes on PostgreSQL.
>
> Hm.  I'd like to think that 7.4 would be competitive on grouping
> queries.  What sort of plan did you get from it?

Comparable to the first plan.

jackdb=# explain SELECT DISTINCT members_.memberid_
jackdb-# FROM members_
jackdb-# WHERE ( members_.List_='list1'
jackdb(# AND members_.MemberType_='normal'
jackdb(# AND members_.SubType_='mail'
jackdb(# AND members_.emailaddr_ IS NOT NULL )
jackdb-# GROUP BY memberid_ HAVING (
jackdb(#    ( select count(*) from lyrActiveRecips, outmail_
jackdb(# where outmail_.MessageID_ = lyrActiveRecips.MailingID
jackdb(# and outmail_.Type_ = 'list'
jackdb(# and members_.MemberID_ = lyrActiveRecips.MemberID
jackdb(# and lyrActiveRecips.NextAttempt > '2004-01-20 00:00:00' )
jackdb(# +
jackdb(# ( select count(*) from lyrCompletedRecips, outmail_
jackdb(# where members_.MemberID_ = lyrCompletedRecips.MemberID
jackdb(# and outmail_.MessageID_ = lyrCompletedRecips.MailingID
jackdb(# and outmail_.Type_ = 'list'
jackdb(# and lyrCompletedRecips.FinalAttempt > '2004-01-20 00:00:00'
jackdb(# and lyrCompletedRecips.CompletionStatusID = 300 )
jackdb(# = 3 );
                                                                                QUERY PLAN
                                                   

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=453.08..453.09 rows=1 width=4)
   ->  Group  (cost=453.08..453.09 rows=1 width=4)
         ->  Sort  (cost=453.08..453.08 rows=1 width=4)
               Sort Key: memberid_
               ->  Index Scan using ix_members_list_notifyerr on
members_  (cost=0.00..453.07 rows=1 width=4)
                     Index Cond: ((list_)::text = 'list1'::text)
                     Filter: (((membertype_)::text = 'normal'::text) AND
((subtype_)::text = 'mail'::text) AND (emailaddr_ IS NOT NULL) AND
(((subplan) + (subplan)) = 3))
                     SubPlan
                       ->  Aggregate  (cost=39.64..39.64 rows=1 width=0)
                             ->  Hash Join  (cost=17.10..39.64 rows=1
width=0)
                                   Hash Cond: ("outer".messageid_ =
"inner".mailingid)
                                   ->  Seq Scan on outmail_
(cost=0.00..22.50 rows=6 width=4)
                                         Filter: ((type_)::text =
'list'::text)
                                   ->  Hash  (cost=17.09..17.09 rows=1
width=4)
                                         ->  Index Scan using
ix_completedrecipsmemberid on lyrcompletedrecips  (cost=0.00..17.09
rows=1 width=4)
                                               Index Cond: ($0 =
memberid)
                                               Filter: ((finalattempt >
'2004-01-20 00:00:00'::timestamp without time zone) AND
(completionstatusid = 300))
                       ->  Aggregate  (cost=47.55..47.55 rows=1 width=0)
                             ->  Hash Join  (cost=25.00..47.55 rows=1
width=0)
                                   Hash Cond: ("outer".messageid_ =
"inner".mailingid)
                                   ->  Seq Scan on outmail_
(cost=0.00..22.50 rows=6 width=4)
                                         Filter: ((type_)::text =
'list'::text)
                                   ->  Hash  (cost=25.00..25.00 rows=2
width=4)
                                         ->  Seq Scan on
lyractiverecips  (cost=0.00..25.00 rows=2 width=4)
                                               Filter: (($0 = memberid)
AND (nextattempt > '2004-01-20 00:00:00'::timestamp without time zone))
(25 rows)

--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
                --Olivier Fourdan



Re: query optimization question

From
Tom Lane
Date:
Jack Coates <jack@lyris.com> writes:
> jackdb=# explain SELECT DISTINCT members_.memberid_
> jackdb-# FROM members_
> jackdb-# WHERE ( members_.List_='list1'
> jackdb(# AND members_.MemberType_='normal'
> jackdb(# AND members_.SubType_='mail'
> jackdb(# AND members_.emailaddr_ IS NOT NULL )
> jackdb-# GROUP BY memberid_ HAVING (

Um, that's not what I had in mind at all.  Does GROUP BY actually do
anything at all here?  (You didn't answer me as to whether memberid_
is a unique identifier or not, but if it is, this GROUP BY is just an
expensive no-op.)

What I was envisioning was pulling the sub-selects up to the top level
and using grouping to calculate the count(*) values for all memberids
in parallel.  Roughly speaking it would look like (again assuming
memberid_ is unique)

SELECT memberid_ FROM
(
  SELECT memberid_ FROM lyrActiveRecips, members_, outmail
  WHERE (all the conditions for this case)
  UNION ALL
  SELECT memberid_ FROM lyrCompletedRecips, members_, outmail
  WHERE (all the conditions for this case)
)
GROUP BY memberid_ HAVING count(*) = 3;

However, if you can't change the boilerplate part of your query then
this is all blue-sky speculation anyway.  What I'm actually more
interested in is your statement that MSSQL can do the original query
quickly.  I find that a bit hard to believe because I don't see any
relevant optimization techniques.  Do they have any equivalent to
EXPLAIN that would give some hint how they're doing it?

            regards, tom lane

Re: query optimization question

From
Jack Coates
Date:
On Thu, 2004-01-29 at 11:31, Tom Lane wrote:
> Jack Coates <jack@lyris.com> writes:
> > jackdb=# explain SELECT DISTINCT members_.memberid_
> > jackdb-# FROM members_
> > jackdb-# WHERE ( members_.List_='list1'
> > jackdb(# AND members_.MemberType_='normal'
> > jackdb(# AND members_.SubType_='mail'
> > jackdb(# AND members_.emailaddr_ IS NOT NULL )
> > jackdb-# GROUP BY memberid_ HAVING (
>
> Um, that's not what I had in mind at all.  Does GROUP BY actually do
> anything at all here?  (You didn't answer me as to whether memberid_
> is a unique identifier or not, but if it is, this GROUP BY is just an
> expensive no-op.)
>

Sorry for the misunderstanding. It should be unique, yes.

> What I was envisioning was pulling the sub-selects up to the top level
> and using grouping to calculate the count(*) values for all memberids
> in parallel.  Roughly speaking it would look like (again assuming
> memberid_ is unique)
>
> SELECT memberid_ FROM
> (
>   SELECT memberid_ FROM lyrActiveRecips, members_, outmail
>   WHERE (all the conditions for this case)
>   UNION ALL
>   SELECT memberid_ FROM lyrCompletedRecips, members_, outmail
>   WHERE (all the conditions for this case)
> )
> GROUP BY memberid_ HAVING count(*) = 3;
>
> However, if you can't change the boilerplate part of your query then
> this is all blue-sky speculation anyway.

Got it now -- I'm running into some subquery errors trying to implement
this, anyway.

> What I'm actually more
> interested in is your statement that MSSQL can do the original query
> quickly.  I find that a bit hard to believe because I don't see any
> relevant optimization techniques.  Do they have any equivalent to
> EXPLAIN that would give some hint how they're doing it?

yup -- here it is. It will probably be a nasty mess after linewrap gets
done with it, so let me know if you'd like me to post a copy on ftp.

SELECT DISTINCT members_.memberid_   FROM members_   WHERE (
members_.List_='list1'    AND members_.MemberType_='normal'    AND
members_.SubType_='mail' )   GROUP BY memberid_ HAVING (       ( select
count(*) from lyrActiveRecips, outmail_    where
outmail    11    1    0    NULL    NULL    1    NULL    102274.5    NULL    NULL    NULL    104.10356    NULL    NULL
SELECT    0    NULL 
  |--Parallelism(Gather Streams)    11    2    1    Parallelism    Gather
Streams    NULL    NULL    102274.5    0.0    0.22011127    23    104.10356    [members_].[MemberID_]    NULL
PLAN_ROW   -1    1.0 
       |--Filter(WHERE:(If ([Expr1006] IS NULL) then 0 else
[Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3))    11    3    2    Filter    Filter    WHERE:(If ([Expr1006] IS NULL) then
0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3)    NULL    102274.5    0.0    3.5393338    23    103.88345    [members_].[MemberID_]    NULL    PLAN_ROW
 -1    1.0 
            |--Hash Match(Right Outer Join,
HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]),
RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID]))    11    4    3    Hash Match    Right Outer Join
HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]),
RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID])   NULL    4782883.5    0.0    21.874712    23
100.34412   [members_].[MemberID_], [Expr1006], [Expr1012]    NULL    PLAN_ROW    -1    1.0 
                 |--Compute
Scalar(DEFINE:([Expr1012]=Convert([Expr1020])))    11    5    4    Compute
Scalar    Compute
Scalar    DEFINE:([Expr1012]=Convert([Expr1020]))    [Expr1012]=Convert([Expr1020])    119575.35    0.0    1.3723248
15   4.3749919    [lyrCompletedRecips].[MemberID], [Expr1012]    NULL    PLAN_ROW    -1    1.0 
                 |    |--Hash Match(Aggregate,
HASH:([lyrCompletedRecips].[MemberID]),
RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) DEFINE:([Expr1020]=COUNT(*)))    11    6
5   Hash Match    Aggregate    HASH:([lyrCompletedRecips].[MemberID]),
RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID])   [Expr1020]=COUNT(*)    119575.35    0.0
1.3723248   15    4.3749919    [lyrCompletedRecips].[MemberID], [Expr1020]    NULL    PLAN_ROW    -1    1.0 
                 |         |--Parallelism(Repartition Streams, PARTITION
COLUMNS:([lyrCompletedRecips].[MemberID]))    11    7    6    Parallelism    Repartition Streams    PARTITION
COLUMNS:([lyrCompletedRecips].[MemberID])   NULL    119640.6    0.0    0.32407209    173    3.002667
[lyrCompletedRecips].[MemberID]   NULL    PLAN_ROW    -1    1.0 
                 |              |--Nested Loops(Inner Join, OUTER
REFERENCES:([outmail_].[MessageID_]))    11    8    7    Nested Loops    Inner
Join    OUTER
REFERENCES:([outmail_].[MessageID_])    NULL    119640.6    0.0    0.75014657    173    2.6785948
[lyrCompletedRecips].[MemberID]   NULL    PLAN_ROW    -1    1.0 
                 |                   |--Parallelism(Distribute
Streams)    11    9    8    Parallelism    Distribute
Streams    NULL    NULL    1.0    0.0    2.8501874E-2    128    9.4664574E-2    [outmail_].[MessageID_]    NULL
PLAN_ROW   -1    1.0 
                 |                   |    |--Clustered Index
Scan(OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list'))    11    10    9    Clustered Index
Scan    Clustered Index
Scan    OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list')    [outmail_].[Type_],
[outmail_].[MessageID_]    1.0    0.01878925    3.9800001E-5    128    3.7658099E-2    [outmail_].[Type_],
[outmail_].[MessageID_]   NULL    PLAN_ROW    0    1.0 
                 |                   |--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]),
SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300
AN   11    11    8    Clustered Index Seek    Clustered Index Seek
OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]),
SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300
AND[lyrCompletedRecips].[FinalAttempt]>'Jan 20 2004 12:00AM')     [lyrCompletedRecips].[CompletionStatusID],
[lyrCompletedRecips].[FinalAttempt],[lyrCompletedRecips].[MemberID]    119640.6    0.5750553    0.13207871    53
1.5463468   [lyrCompletedRecips].[CompletionStatusID], [lyrCompletedRecips].[FinalAttempt],
[lyrCompletedRecips].[MemberID]   NULL    PLAN_ROW    -1    3.0 
                 |--Parallelism(Repartition Streams, PARTITION
COLUMNS:([members_].[MemberID_]))    11    19    4    Parallelism    Repartition
Streams    PARTITION
COLUMNS:([members_].[MemberID_])    NULL    4782883.5    0.0    15.474822    19    74.094414    [members_].[MemberID_],
[Expr1006]   NULL    PLAN_ROW    -1    1.0 
                      |--Nested Loops(Left Outer Join,
WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID]))    11    20    19    Nested Loops    Left Outer Join
WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID])   NULL    4782883.5    0.0    9.9962263    19    58.619591
 [members_].[MemberID_], [Expr1006]    NULL    PLAN_ROW    -1    1.0 
                           |--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'),
WHERE:([members_].[MemberType_]='normal' AND
[members_].[SubType_]='mail') ORDERED FORWARD)    11    22    20    Clustered Index
Seek    Clustered Index
Seek    OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'),
WHERE:([members_].[MemberType_]='normal' AND
[members_].[SubType_]='mail') ORDERED FORWARD    [members_].[SubType_],
[members_].[MemberType_],
[members_].[MemberID_]    4782883.5    40.160122    3.2745986    410    43.434719    [members_].[SubType_],
[members_].[MemberType_],[members_].[MemberID_]    NULL    PLAN_ROW    -1    1.0 
                           |--Table Spool    11    24    20    Table Spool    Lazy
Spool    NULL    NULL    1.0    1.6756756E-2    3.7999999E-7    15    0.90211391    [lyrActiveRecips].[MemberID],
[Expr1006]   NULL    PLAN_ROW    -1    4782883.5 
                                |--Compute
Scalar(DEFINE:([Expr1006]=Convert([Expr1021])))    11    25    24    Compute
Scalar    Compute
Scalar    DEFINE:([Expr1006]=Convert([Expr1021]))    [Expr1006]=Convert([Expr1021])    1.0    0.0    7.6000001E-6    15
  2.4437904E-2    [lyrActiveRecips].[MemberID], [Expr1006]    NULL    PLAN_ROW    -1    1.0 
                                     |--Stream Aggregate(GROUP
BY:([lyrActiveRecips].[MemberID])
DEFINE:([Expr1021]=Count(*)))    11    26    25    Stream Aggregate    Aggregate    GROUP
BY:([lyrActiveRecips].[MemberID])    [Expr1021]=Count(*)    1.0    0.0    7.6000001E-6    15    2.4437904E-2
[lyrActiveRecips].[MemberID],[Expr1021]    NULL    PLAN_ROW    -1    1.0 
                                          |--Sort(ORDER
BY:([lyrActiveRecips].[MemberID] ASC))    11    27    26    Sort    Sort    ORDER
BY:([lyrActiveRecips].[MemberID]
ASC)    NULL    1.0    1.1261261E-2    1.00011E-4    11    2.4430305E-2    [lyrActiveRecips].[MemberID]    NULL
PLAN_ROW   -1    1.0 

|--Filter(WHERE:([outmail_].[Type_]='list'))    11    28    27    Filter    Filter    WHERE:([outmail_].[Type_]='list')
  NULL    1.0    0.0    4.7999998E-7    156    1.3069032E-2    [lyrActiveRecips].[MemberID]    NULL    PLAN_ROW    -1
1.0 
                                                    |--Bookmark
Lookup(BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_]))    11    29    28    Bookmark Lookup    Bookmark
Lookup    BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_])    [outmail_].[Type_]    1.0    3.1249749E-3    0.0000011    156    1.3068552E-2
[lyrActiveRecips].[MemberID],[outmail_].[Type_]    NULL    PLAN_ROW    -1    1.0 
                                                         |--Nested
Loops(Inner Join, OUTER
REFERENCES:([lyrActiveRecips].[MailingID]))    11    30    29    Nested Loops    Inner
Join    OUTER
REFERENCES:([lyrActiveRecips].[MailingID])    NULL    1.0    0.0    0.00001254    138    9.9424766E-3
[lyrActiveRecips].[MemberID],[Bmk1004]    NULL    PLAN_ROW    -1    1.0 

|--Bookmark Lookup(BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips]))    11    31    30    Bookmark
Lookup    Bookmark Lookup    BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips])    [lyrActiveRecips].[MemberID],
[lyrActiveRecips].[MailingID]    1.0    3.1249749E-3    0.0000011    53    6.4091529E-3
[lyrActiveRecips].[MemberID],[lyrActiveRecips].[MailingID]    NULL    PLAN_ROW    -1    1.0 
                                                              |
|--Index
Seek(OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),
SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004 12:00AM') ORDERED
FORWARD)    11    32    31    Index Seek    Index
Seek    OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),
SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004 12:00AM') ORDERED
FORWARD    [Bmk1002]    1.0    3.2034749E-3    7.9603E-5    40    3.2830781E-3    [Bmk1002]    NULL    PLAN_ROW    -1
1.0 
                                                              |--Index
Seek(OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID]) ORDERED
FORWARD)    11    33    30    Index Seek    Index
Seek    OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID]) ORDERED
FORWARD    [Bmk1004]    1.0    3.2034749E-3    7.9603E-5    93    3.520784E-3    [Bmk1004]    NULL    PLAN_ROW    -1
3.0
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
                --Olivier Fourdan



Re: query optimization question

From
Tom Lane
Date:
Jack Coates <jack@lyris.com> writes:
> yup -- here it is. It will probably be a nasty mess after linewrap gets
> done with it,

yup, sure is :-(  If I was familiar with the layout I could probably
decipher where the line breaks are supposed to be, but right now I'm
just confused.

> so let me know if you'd like me to post a copy on ftp.

Probably better to repost it as a gzip'd attachment.  That should
protect the formatting and get it into the list archives.

            regards, tom lane

Re: query optimization question

From
Jack Coates
Date:
On Thu, 2004-01-29 at 14:01, Tom Lane wrote:

> Probably better to repost it as a gzip'd attachment.  That should
> protect the formatting and get it into the list archives.
>
>             regards, tom lane

complete with a picture of the GUI version. 26k zipped, let's see if
this makes it through.
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
                --Olivier Fourdan



Re: query optimization question

From
Dennis Bjorklund
Date:
On Thu, 29 Jan 2004, Tom Lane wrote:

> > jackdb-# GROUP BY memberid_ HAVING (
>
> Um, that's not what I had in mind at all.  Does GROUP BY actually do
> anything at all here?  (You didn't answer me as to whether memberid_
> is a unique identifier or not, but if it is, this GROUP BY is just an
> expensive no-op.)

From your comment I assume that there is no transformation in pg that
detects that the group by columns are unique?

> this is all blue-sky speculation anyway.  What I'm actually more
> interested in is your statement that MSSQL can do the original query
> quickly.  I find that a bit hard to believe because I don't see any
> relevant optimization techniques.

Getting rid of the group by would not give that kind of speedup? Maybe
mssql manage to rewrite the query like that before executing.

--
/Dennis Björklund


Re: query optimization question

From
Dennis Bjorklund
Date:
On Thu, 29 Jan 2004, Jack Coates wrote:

> > Probably better to repost it as a gzip'd attachment.  That should
>
> complete with a picture of the GUI version. 26k zipped, let's see if
> this makes it through.

Are you sure you attached it?

At least when it got here there was no attachment.

--
/Dennis Björklund


Re: query optimization question

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Jack Coates <jack@lyris.com> writes:
> > yup -- here it is. It will probably be a nasty mess after linewrap gets
> > done with it,
>
> yup, sure is :-(  If I was familiar with the layout I could probably
> decipher where the line breaks are supposed to be, but right now I'm
> just confused.

I just replaced all newlines that are followed by lines starting in column 1
with spaces and got something reasonable:

SELECT DISTINCT members_.memberid_   FROM members_   WHERE ( members_.List_='list1'    AND
members_.MemberType_='normal'   AND members_.SubType_='mail' )   GROUP BY memberid_ HAVING (       ( select count(*)
fromlyrActiveRecips, outmail_    where outmail    11    1    0    NULL    NULL    1    NULL    102274.5    NULL    NULL
  NULL    104.10356    NULL    NULL    SELECT    0    NULL 
  |--Parallelism(Gather Streams)    11    2    1    Parallelism    Gather Streams    NULL    NULL    102274.5    0.0
0.22011127   23    104.10356    [members_].[MemberID_]    NULL    PLAN_ROW    -1    1.0 
       |--Filter(WHERE:(If ([Expr1006] IS NULL) then 0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3))   11    3    2    Filter    Filter    WHERE:(If ([Expr1006] IS NULL) then 0 else [Expr1006]+If
([Expr1012]IS NULL) then 0 else [Expr1012]=3)    NULL    102274.5    0.0    3.5393338    23    103.88345
[members_].[MemberID_]   NULL    PLAN_ROW    -1    1.0 
            |--Hash Match(Right Outer Join, HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]),
RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID]))   11    4    3    Hash Match    Right Outer Join
HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]),
RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID])   NULL    4782883.5    0.0    21.874712    23
100.34412   [members_].[MemberID_], [Expr1006], [Expr1012]    NULL    PLAN_ROW    -1    1.0 
                 |--Compute Scalar(DEFINE:([Expr1012]=Convert([Expr1020])))    11    5    4    Compute Scalar
ComputeScalar    DEFINE:([Expr1012]=Convert([Expr1020]))    [Expr1012]=Convert([Expr1020])    119575.35    0.0
1.3723248   15    4.3749919    [lyrCompletedRecips].[MemberID], [Expr1012]    NULL    PLAN_ROW    -1    1.0 
                 |    |--Hash Match(Aggregate, HASH:([lyrCompletedRecips].[MemberID]),
RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID])DEFINE:([Expr1020]=COUNT(*)))    11    6    5
  Hash Match    Aggregate    HASH:([lyrCompletedRecips].[MemberID]),
RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID])   [Expr1020]=COUNT(*)    119575.35    0.0
1.3723248   15    4.3749919    [lyrCompletedRecips].[MemberID], [Expr1020]    NULL    PLAN_ROW    -1    1.0 
                 |         |--Parallelism(Repartition Streams, PARTITION COLUMNS:([lyrCompletedRecips].[MemberID]))
11   7    6    Parallelism    Repartition Streams    PARTITION COLUMNS:([lyrCompletedRecips].[MemberID])    NULL
119640.6   0.0    0.32407209    173    3.002667    [lyrCompletedRecips].[MemberID]    NULL    PLAN_ROW    -1    1.0 
                 |              |--Nested Loops(Inner Join, OUTER REFERENCES:([outmail_].[MessageID_]))    11    8    7
  Nested Loops    Inner Join    OUTER REFERENCES:([outmail_].[MessageID_])    NULL    119640.6    0.0    0.75014657
173   2.6785948    [lyrCompletedRecips].[MemberID]    NULL    PLAN_ROW    -1    1.0 
                 |                   |--Parallelism(Distribute Streams)    11    9    8    Parallelism    Distribute
Streams   NULL    NULL    1.0    0.0    2.8501874E-2    128    9.4664574E-2    [outmail_].[MessageID_]    NULL
PLAN_ROW   -1    1.0 
                 |                   |    |--Clustered Index Scan(OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list'))   11    10    9    Clustered Index Scan    Clustered Index Scan
OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),WHERE:([outmail_].[Type_]='list')    [outmail_].[Type_],
[outmail_].[MessageID_]   1.0    0.01878925    3.9800001E-5    128    3.7658099E-2    [outmail_].[Type_],
[outmail_].[MessageID_]   NULL    PLAN_ROW    0    1.0 
                 |                   |--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]),
SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300
AN   11    11    8    Clustered Index Seek    Clustered Index Seek
OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]),
SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300
AND[lyrCompletedRecips].[FinalAttempt]>'Jan 20 2004 12:00AM')     [lyrCompletedRecips].[CompletionStatusID],
[lyrCompletedRecips].[FinalAttempt],[lyrCompletedRecips].[MemberID]    119640.6    0.5750553    0.13207871    53
1.5463468   [lyrCompletedRecips].[CompletionStatusID], [lyrCompletedRecips].[FinalAttempt],
[lyrCompletedRecips].[MemberID]   NULL    PLAN_ROW    -1    3.0 
                 |--Parallelism(Repartition Streams, PARTITION COLUMNS:([members_].[MemberID_]))    11    19    4
Parallelism   Repartition Streams    PARTITION COLUMNS:([members_].[MemberID_])    NULL    4782883.5    0.0
15.474822   19    74.094414    [members_].[MemberID_], [Expr1006]    NULL    PLAN_ROW    -1    1.0 
                      |--Nested Loops(Left Outer Join, WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID]))
11   20    19    Nested Loops    Left Outer Join    WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID])    NULL
  4782883.5    0.0    9.9962263    19    58.619591    [members_].[MemberID_], [Expr1006]    NULL    PLAN_ROW    -1
1.0
                           |--Clustered Index Seek(OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'), WHERE:([members_].[MemberType_]='normal' AND [members_].[SubType_]='mail') ORDERED
FORWARD)   11    22    20    Clustered Index Seek    Clustered Index Seek
OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),SEEK:([members_].[List_]='list1'),
WHERE:([members_].[MemberType_]='normal'AND [members_].[SubType_]='mail') ORDERED FORWARD    [members_].[SubType_],
[members_].[MemberType_],[members_].[MemberID_]    4782883.5    40.160122    3.2745986    410    43.434719
[members_].[SubType_],[members_].[MemberType_], [members_].[MemberID_]    NULL    PLAN_ROW    -1    1.0 
                           |--Table Spool    11    24    20    Table Spool    Lazy Spool    NULL    NULL    1.0
1.6756756E-2   3.7999999E-7    15    0.90211391    [lyrActiveRecips].[MemberID], [Expr1006]    NULL    PLAN_ROW    -1
4782883.5 
                                |--Compute Scalar(DEFINE:([Expr1006]=Convert([Expr1021])))    11    25    24    Compute
Scalar   Compute Scalar    DEFINE:([Expr1006]=Convert([Expr1021]))    [Expr1006]=Convert([Expr1021])    1.0    0.0
7.6000001E-6   15    2.4437904E-2    [lyrActiveRecips].[MemberID], [Expr1006]    NULL    PLAN_ROW    -1    1.0 
                                     |--Stream Aggregate(GROUP BY:([lyrActiveRecips].[MemberID])
DEFINE:([Expr1021]=Count(*)))   11    26    25    Stream Aggregate    Aggregate    GROUP
BY:([lyrActiveRecips].[MemberID])   [Expr1021]=Count(*)    1.0    0.0    7.6000001E-6    15    2.4437904E-2
[lyrActiveRecips].[MemberID],[Expr1021]    NULL    PLAN_ROW    -1    1.0 
                                          |--Sort(ORDER BY:([lyrActiveRecips].[MemberID] ASC))    11    27    26
Sort   Sort    ORDER BY:([lyrActiveRecips].[MemberID] ASC)    NULL    1.0    1.1261261E-2    1.00011E-4    11
2.4430305E-2   [lyrActiveRecips].[MemberID]    NULL    PLAN_ROW    -1    1.0 
                                               |--Filter(WHERE:([outmail_].[Type_]='list'))    11    28    27    Filter
  Filter    WHERE:([outmail_].[Type_]='list')    NULL    1.0    0.0    4.7999998E-7    156    1.3069032E-2
[lyrActiveRecips].[MemberID]   NULL    PLAN_ROW    -1    1.0 
                                                    |--Bookmark Lookup(BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_]))   11    29    28    Bookmark Lookup    Bookmark Lookup    BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_])   [outmail_].[Type_]    1.0    3.1249749E-3    0.0000011    156    1.3068552E-2
[lyrActiveRecips].[MemberID],[outmail_].[Type_]    NULL    PLAN_ROW    -1    1.0 
                                                         |--Nested Loops(Inner Join, OUTER
REFERENCES:([lyrActiveRecips].[MailingID]))   11    30    29    Nested Loops    Inner Join    OUTER
REFERENCES:([lyrActiveRecips].[MailingID])   NULL    1.0    0.0    0.00001254    138    9.9424766E-3
[lyrActiveRecips].[MemberID],[Bmk1004]    NULL    PLAN_ROW    -1    1.0 
                                                              |--Bookmark Lookup(BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips]))   11    31    30    Bookmark Lookup    Bookmark Lookup
BOOKMARK:([Bmk1002]),OBJECT:([lmdb].[dbo].[lyrActiveRecips])    [lyrActiveRecips].[MemberID],
[lyrActiveRecips].[MailingID]   1.0    3.1249749E-3    0.0000011    53    6.4091529E-3    [lyrActiveRecips].[MemberID],
[lyrActiveRecips].[MailingID]   NULL    PLAN_ROW    -1    1.0 
                                                              |    |--Index
Seek(OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20
200412:00AM') ORDERED FORWARD)    11    32    31    Index Seek    Index Seek
OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004
12:00AM')ORDERED FORWARD    [Bmk1002]    1.0    3.2034749E-3    7.9603E-5    40    3.2830781E-3    [Bmk1002]    NULL
PLAN_ROW   -1    1.0 
                                                              |--Index
Seek(OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID])
ORDEREDFORWARD)    11    33    30    Index Seek    Index Seek    OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID])ORDERED FORWARD    [Bmk1004]    1.0    3.2034749E-3
7.9603E-5   93    3.520784E-3    [Bmk1004]    NULL    PLAN_ROW    -1    3.0 

I still can't make heads or tails of it though.

--
greg

Re: query optimization question

From
Tom Lane
Date:
Dennis Bjorklund <db@zigo.dhs.org> writes:
> Getting rid of the group by would not give that kind of speedup?

No.  Getting rid of the per-row subqueries (or at least finding a way to
make 'em a lot cheaper) is the only way to make any meaningful change.

            regards, tom lane

Re: query optimization question

From
Jack Coates
Date:
On Thu, 2004-01-29 at 23:23, Dennis Bjorklund wrote:
> On Thu, 29 Jan 2004, Jack Coates wrote:
>
> > > Probably better to repost it as a gzip'd attachment.  That should
> >
> > complete with a picture of the GUI version. 26k zipped, let's see if
> > this makes it through.
>
> Are you sure you attached it?
>
> At least when it got here there was no attachment.

argh; attached the 40K version which was in color, removed it to make
the new one with greyscale and forgot to attach that. Here it is again:
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
                --Olivier Fourdan

Attachment