Re: query optimization question - Mailing list pgsql-performance

From Jack Coates
Subject Re: query optimization question
Date
Msg-id 1075411407.7494.130.camel@cletus.lyris.com
Whole thread Raw
In response to Re: query optimization question  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: query optimization question  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Thu, 2004-01-29 at 11:31, Tom Lane wrote:
> Jack Coates <jack@lyris.com> writes:
> > jackdb=# explain SELECT DISTINCT members_.memberid_
> > jackdb-# FROM members_
> > jackdb-# WHERE ( members_.List_='list1'
> > jackdb(# AND members_.MemberType_='normal'
> > jackdb(# AND members_.SubType_='mail'
> > jackdb(# AND members_.emailaddr_ IS NOT NULL )
> > jackdb-# GROUP BY memberid_ HAVING (
>
> Um, that's not what I had in mind at all.  Does GROUP BY actually do
> anything at all here?  (You didn't answer me as to whether memberid_
> is a unique identifier or not, but if it is, this GROUP BY is just an
> expensive no-op.)
>

Sorry for the misunderstanding. It should be unique, yes.

> What I was envisioning was pulling the sub-selects up to the top level
> and using grouping to calculate the count(*) values for all memberids
> in parallel.  Roughly speaking it would look like (again assuming
> memberid_ is unique)
>
> SELECT memberid_ FROM
> (
>   SELECT memberid_ FROM lyrActiveRecips, members_, outmail
>   WHERE (all the conditions for this case)
>   UNION ALL
>   SELECT memberid_ FROM lyrCompletedRecips, members_, outmail
>   WHERE (all the conditions for this case)
> )
> GROUP BY memberid_ HAVING count(*) = 3;
>
> However, if you can't change the boilerplate part of your query then
> this is all blue-sky speculation anyway.

Got it now -- I'm running into some subquery errors trying to implement
this, anyway.

> What I'm actually more
> interested in is your statement that MSSQL can do the original query
> quickly.  I find that a bit hard to believe because I don't see any
> relevant optimization techniques.  Do they have any equivalent to
> EXPLAIN that would give some hint how they're doing it?

yup -- here it is. It will probably be a nasty mess after linewrap gets
done with it, so let me know if you'd like me to post a copy on ftp.

SELECT DISTINCT members_.memberid_   FROM members_   WHERE (
members_.List_='list1'    AND members_.MemberType_='normal'    AND
members_.SubType_='mail' )   GROUP BY memberid_ HAVING (       ( select
count(*) from lyrActiveRecips, outmail_    where
outmail    11    1    0    NULL    NULL    1    NULL    102274.5    NULL    NULL    NULL    104.10356    NULL    NULL
SELECT    0    NULL 
  |--Parallelism(Gather Streams)    11    2    1    Parallelism    Gather
Streams    NULL    NULL    102274.5    0.0    0.22011127    23    104.10356    [members_].[MemberID_]    NULL
PLAN_ROW   -1    1.0 
       |--Filter(WHERE:(If ([Expr1006] IS NULL) then 0 else
[Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3))    11    3    2    Filter    Filter    WHERE:(If ([Expr1006] IS NULL) then
0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3)    NULL    102274.5    0.0    3.5393338    23    103.88345    [members_].[MemberID_]    NULL    PLAN_ROW
 -1    1.0 
            |--Hash Match(Right Outer Join,
HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]),
RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID]))    11    4    3    Hash Match    Right Outer Join
HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]),
RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID])   NULL    4782883.5    0.0    21.874712    23
100.34412   [members_].[MemberID_], [Expr1006], [Expr1012]    NULL    PLAN_ROW    -1    1.0 
                 |--Compute
Scalar(DEFINE:([Expr1012]=Convert([Expr1020])))    11    5    4    Compute
Scalar    Compute
Scalar    DEFINE:([Expr1012]=Convert([Expr1020]))    [Expr1012]=Convert([Expr1020])    119575.35    0.0    1.3723248
15   4.3749919    [lyrCompletedRecips].[MemberID], [Expr1012]    NULL    PLAN_ROW    -1    1.0 
                 |    |--Hash Match(Aggregate,
HASH:([lyrCompletedRecips].[MemberID]),
RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) DEFINE:([Expr1020]=COUNT(*)))    11    6
5   Hash Match    Aggregate    HASH:([lyrCompletedRecips].[MemberID]),
RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID])   [Expr1020]=COUNT(*)    119575.35    0.0
1.3723248   15    4.3749919    [lyrCompletedRecips].[MemberID], [Expr1020]    NULL    PLAN_ROW    -1    1.0 
                 |         |--Parallelism(Repartition Streams, PARTITION
COLUMNS:([lyrCompletedRecips].[MemberID]))    11    7    6    Parallelism    Repartition Streams    PARTITION
COLUMNS:([lyrCompletedRecips].[MemberID])   NULL    119640.6    0.0    0.32407209    173    3.002667
[lyrCompletedRecips].[MemberID]   NULL    PLAN_ROW    -1    1.0 
                 |              |--Nested Loops(Inner Join, OUTER
REFERENCES:([outmail_].[MessageID_]))    11    8    7    Nested Loops    Inner
Join    OUTER
REFERENCES:([outmail_].[MessageID_])    NULL    119640.6    0.0    0.75014657    173    2.6785948
[lyrCompletedRecips].[MemberID]   NULL    PLAN_ROW    -1    1.0 
                 |                   |--Parallelism(Distribute
Streams)    11    9    8    Parallelism    Distribute
Streams    NULL    NULL    1.0    0.0    2.8501874E-2    128    9.4664574E-2    [outmail_].[MessageID_]    NULL
PLAN_ROW   -1    1.0 
                 |                   |    |--Clustered Index
Scan(OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list'))    11    10    9    Clustered Index
Scan    Clustered Index
Scan    OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list')    [outmail_].[Type_],
[outmail_].[MessageID_]    1.0    0.01878925    3.9800001E-5    128    3.7658099E-2    [outmail_].[Type_],
[outmail_].[MessageID_]   NULL    PLAN_ROW    0    1.0 
                 |                   |--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]),
SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300
AN   11    11    8    Clustered Index Seek    Clustered Index Seek
OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]),
SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300
AND[lyrCompletedRecips].[FinalAttempt]>'Jan 20 2004 12:00AM')     [lyrCompletedRecips].[CompletionStatusID],
[lyrCompletedRecips].[FinalAttempt],[lyrCompletedRecips].[MemberID]    119640.6    0.5750553    0.13207871    53
1.5463468   [lyrCompletedRecips].[CompletionStatusID], [lyrCompletedRecips].[FinalAttempt],
[lyrCompletedRecips].[MemberID]   NULL    PLAN_ROW    -1    3.0 
                 |--Parallelism(Repartition Streams, PARTITION
COLUMNS:([members_].[MemberID_]))    11    19    4    Parallelism    Repartition
Streams    PARTITION
COLUMNS:([members_].[MemberID_])    NULL    4782883.5    0.0    15.474822    19    74.094414    [members_].[MemberID_],
[Expr1006]   NULL    PLAN_ROW    -1    1.0 
                      |--Nested Loops(Left Outer Join,
WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID]))    11    20    19    Nested Loops    Left Outer Join
WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID])   NULL    4782883.5    0.0    9.9962263    19    58.619591
 [members_].[MemberID_], [Expr1006]    NULL    PLAN_ROW    -1    1.0 
                           |--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'),
WHERE:([members_].[MemberType_]='normal' AND
[members_].[SubType_]='mail') ORDERED FORWARD)    11    22    20    Clustered Index
Seek    Clustered Index
Seek    OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'),
WHERE:([members_].[MemberType_]='normal' AND
[members_].[SubType_]='mail') ORDERED FORWARD    [members_].[SubType_],
[members_].[MemberType_],
[members_].[MemberID_]    4782883.5    40.160122    3.2745986    410    43.434719    [members_].[SubType_],
[members_].[MemberType_],[members_].[MemberID_]    NULL    PLAN_ROW    -1    1.0 
                           |--Table Spool    11    24    20    Table Spool    Lazy
Spool    NULL    NULL    1.0    1.6756756E-2    3.7999999E-7    15    0.90211391    [lyrActiveRecips].[MemberID],
[Expr1006]   NULL    PLAN_ROW    -1    4782883.5 
                                |--Compute
Scalar(DEFINE:([Expr1006]=Convert([Expr1021])))    11    25    24    Compute
Scalar    Compute
Scalar    DEFINE:([Expr1006]=Convert([Expr1021]))    [Expr1006]=Convert([Expr1021])    1.0    0.0    7.6000001E-6    15
  2.4437904E-2    [lyrActiveRecips].[MemberID], [Expr1006]    NULL    PLAN_ROW    -1    1.0 
                                     |--Stream Aggregate(GROUP
BY:([lyrActiveRecips].[MemberID])
DEFINE:([Expr1021]=Count(*)))    11    26    25    Stream Aggregate    Aggregate    GROUP
BY:([lyrActiveRecips].[MemberID])    [Expr1021]=Count(*)    1.0    0.0    7.6000001E-6    15    2.4437904E-2
[lyrActiveRecips].[MemberID],[Expr1021]    NULL    PLAN_ROW    -1    1.0 
                                          |--Sort(ORDER
BY:([lyrActiveRecips].[MemberID] ASC))    11    27    26    Sort    Sort    ORDER
BY:([lyrActiveRecips].[MemberID]
ASC)    NULL    1.0    1.1261261E-2    1.00011E-4    11    2.4430305E-2    [lyrActiveRecips].[MemberID]    NULL
PLAN_ROW   -1    1.0 

|--Filter(WHERE:([outmail_].[Type_]='list'))    11    28    27    Filter    Filter    WHERE:([outmail_].[Type_]='list')
  NULL    1.0    0.0    4.7999998E-7    156    1.3069032E-2    [lyrActiveRecips].[MemberID]    NULL    PLAN_ROW    -1
1.0 
                                                    |--Bookmark
Lookup(BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_]))    11    29    28    Bookmark Lookup    Bookmark
Lookup    BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_])    [outmail_].[Type_]    1.0    3.1249749E-3    0.0000011    156    1.3068552E-2
[lyrActiveRecips].[MemberID],[outmail_].[Type_]    NULL    PLAN_ROW    -1    1.0 
                                                         |--Nested
Loops(Inner Join, OUTER
REFERENCES:([lyrActiveRecips].[MailingID]))    11    30    29    Nested Loops    Inner
Join    OUTER
REFERENCES:([lyrActiveRecips].[MailingID])    NULL    1.0    0.0    0.00001254    138    9.9424766E-3
[lyrActiveRecips].[MemberID],[Bmk1004]    NULL    PLAN_ROW    -1    1.0 

|--Bookmark Lookup(BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips]))    11    31    30    Bookmark
Lookup    Bookmark Lookup    BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips])    [lyrActiveRecips].[MemberID],
[lyrActiveRecips].[MailingID]    1.0    3.1249749E-3    0.0000011    53    6.4091529E-3
[lyrActiveRecips].[MemberID],[lyrActiveRecips].[MailingID]    NULL    PLAN_ROW    -1    1.0 
                                                              |
|--Index
Seek(OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),
SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004 12:00AM') ORDERED
FORWARD)    11    32    31    Index Seek    Index
Seek    OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),
SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004 12:00AM') ORDERED
FORWARD    [Bmk1002]    1.0    3.2034749E-3    7.9603E-5    40    3.2830781E-3    [Bmk1002]    NULL    PLAN_ROW    -1
1.0 
                                                              |--Index
Seek(OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID]) ORDERED
FORWARD)    11    33    30    Index Seek    Index
Seek    OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID]) ORDERED
FORWARD    [Bmk1004]    1.0    3.2034749E-3    7.9603E-5    93    3.520784E-3    [Bmk1004]    NULL    PLAN_ROW    -1
3.0
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack@lyris.com
"Interoperability is the keyword, uniformity is a dead end."
                --Olivier Fourdan



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: query optimization question
Next
From: Richard Huxton
Date:
Subject: Re: Explain plan for 2 column index