Thread: Sequential Scans On Complex Query With UNION - see why this fails
PostgreSql 8.0.3
I have a complex query that I want to read the data in a hierarchical manner - ie from master table filtered rows to sub set1, from subset1 to subset2 etc. The query is in the attached file, as is the explain. What I do not understand is why I get sequential scans on certain tables - ie the purchaseOrdDet, product, supplierProduct when each of the joins linking the tables hits a valid 'record number' based index.
The query is trying to find all purchase orders within a period, then all the lines for those orders, products for those lines etc. The union selects from the supplierproduct/product tables on both sides - in one case it uses the index and on the other does a sequential scan.
I am obviously doing something wrong in the structure of the query - any ideas?
Also,...as a quick 'method' question..when writing embedded joins, which syntax is easier for the optimiser? Should you do:
FROM table tableA AS A
JOIN tableB AS B
JOIN tableC AS C
JOIN tableD AS D
ON C.joinD = D.join
JOIN tableE AS E
ON c.joinE = E.join
ON B.joinC = C.join
ON A.joinB = B.join
OR
FROM table tableA AS A
JOIN tableB AS B
ON A.joinB = B.join
JOIN tableC AS C
ON B.joinC = C.join
JOIN tableD AS D
ON C.joinD = D.join
JOIN tableE AS E
ON c.joinE = E.join
OR is it immaterial and just a matter of style?
Regards, Steve Tucknott |
Attachment
Did anyone have any ideas on this? If not, I'll look at rewriting the query completely.
On Mon, 2006-01-02 at 09:10 +0000, Steve Tucknott wrote:
On Mon, 2006-01-02 at 09:10 +0000, Steve Tucknott wrote:
PostgreSql 8.0.3
I have a complex query that I want to read the data in a hierarchical manner - ie from master table filtered rows to sub set1, from subset1 to subset2 etc. The query is in the attached file, as is the explain. What I do not understand is why I get sequential scans on certain tables - ie the purchaseOrdDet, product, supplierProduct when each of the joins linking the tables hits a valid 'record number' based index.
The query is trying to find all purchase orders within a period, then all the lines for those orders, products for those lines etc. The union selects from the supplierproduct/product tables on both sides - in one case it uses the index and on the other does a sequential scan.
I am obviously doing something wrong in the structure of the query - any ideas?
Also,...as a quick 'method' question..when writing embedded joins, which syntax is easier for the optimiser? Should you do:
FROM table tableA AS A
JOIN tableB AS B
JOIN tableC AS C
JOIN tableD AS D
ON C.joinD = D.join
JOIN tableE AS E
ON c.joinE = E.join
ON B.joinC = C.join
ON A.joinB = B.join
OR
FROM table tableA AS A
JOIN tableB AS B
ON A.joinB = B.join
JOIN tableC AS C
ON B.joinC = C.join
JOIN tableD AS D
ON C.joinD = D.join
JOIN tableE AS E
ON c.joinE = E.join
OR is it immaterial and just a matter of style?
Regards,
Steve Tucknott
---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Regards, Steve Tucknott ReTSol Ltd DDI: 01903 828769 Mobile: 0773 671 5772 |
Steve, I don't have any explanation as to why your query is using an index in one case and a sequential scan in another. However, I do have a few observations after looking at your code that you might find helpful. On Jan 2, 2006, at 3:10 , Steve Tucknott wrote: > I am obviously doing something wrong in the structure of the query > - any ideas? When I first saw your FROM clause, I didn't even know that moving the ON clause away from the JOIN it was related to would even work. I was surprised to find that it does! I find it quite counterintuitive to separate a join and its condition. As I understand it, select * from a join b join c on c.b_name = b.b_name on b.a_name = a.a_name; is just another way of writing select * from a join b on b.a_name = a.a_name join c on c.b_name = b.b_name; and doesn't make a difference in query planning or performance. It's definitely not 'hierarchical' as I understand it. Comparing EXPLAIN ANALYZE output for the two query forms will show you if this is the case. In the queries you attached, I saw that you mix restrictions (e.g., c.b_name = 'some value') and join conditions (e.g., c.b_name = b.b_name) in both the FROM clause and the WHERE clause. I personally find this quite hard to read, though I don't think it makes much difference to the server. I believe both join conditions and restrictions are rewritten as they'd appear in the WHERE clause, so you *could* put them all in the WHERE clause. However, I find it useful to separate them out, and build queries using JOIN ... ON (and actually I prefer the JOIN ... USING syntax because only one set of join columns appears in the result). This helps me clearly see how the tables are related. I then place all restrictions in the WHERE clause. This helps me know where to look in the query to find what I'm looking for. The optimizer should be able to perform restrictions as necessary—not necessarily in the order the original query was written. After all, that's the reason for its existence :). I also noticed was that you've used column numbers rather than column names in your ORDER BY clause. I recommend using column names, as it makes it easier to really see what you're ordering by without having to refer to the SELECT target list. (In relational theory attribute order does not matter. The SQL standard *does* refer to column order, but you can go a long way and not have to worry about it, other than for UNION compatibility, and even then, you don't need to use column numbers explicitly.) I've provided a rewritten query at the bottom of this post. I'd be interested to see if it performs any differently that your original query (e.g., how their EXPLAIN ANALYZE output compares). The output also includes quite a few columns with duplicate column names (e.g., clnt.name, clntB.name; srcD.recNo, clnt.recNo, clntB.recNo, srcU.recNo). If feasible, I'd use AS to provide unique column names, as this can make it easier to reference the columns either in your client application or if you use this query as a subquery in another statement. (I haven't done this below. There could be a number of additional duplicates lurking in sProd.*, prod.*, VAT.*, and mrkUP.* .) What does the 1 = 1 do in your WHERE clause? Also, the mixed case column and table names you use get lowered in PostgreSQL, but you've probably already noticed that. To maintain case, you need to double-quote these identifiers. > Also,...as a quick 'method' question..when writing embedded joins, > which syntax is easier for the optimiser? Should you do: EXPLAIN ANALYZE will help you determine which form is more efficient in your situation. Though as I've indicated above, I'd rewrite the queries to make it easier for the programmer. Start by writing your query as you think it *should* be written to make it easiest for *you* to use. Then see how it performs. If you find it doesn't perform well, you'll have a baseline against which you can compare any improvement, and you'll learn what does and does not work. The optimizer is pretty good—don't second guess it. Code should be easy for the programmer to read and use first. Hope this helps! Michael Glaesemann grzm myrealbox com --- original query --- rewritten query SELECT clnt.name , clntB.name , periodOf(pOHdr.orderDate) , srcD.recNo , pODet.purchaseOrdHdrRecNo , clnt.recNo , clntB.recNo , srcU.recNo, srcU.replacementCost, srcU.requiredQty , sProd.* , prod.* , VAT.* , pODet.orderedQty , mrkUp.* FROM purchaseOrdHdr AS pOHdr JOIN purchaseOrdDet AS pODet ON (pODet.purchaseOrdHdrRecNo = pOHdr.recNo) JOIN sourceDetUpgr AS srcU ON (srcU.recNo = pODet.sourceDetUpgrRecNo) JOIN sourceDet AS srcD ON (srcD.recNo = srcU.sourceDetRecNo) JOIN sourceDetExtref AS srcE ON (srcE.foreignRecNo = srcD.recNo) JOIN clientBranch AS clntB ON (clntB.recNo = srcE.ownerForeignRecNo) JOIN client AS clnt ON (clntB.clientRecNo = clnt.recNo) JOIN markUp AS mrkUp ON (mrkUp.foreignRecNo = clnt.recNo) JOIN supplierProduct AS sProd ON (sProd.recNo = srcU.supplierProductRecNo) JOIN product AS prod ON (prod.recNo = sProd.productRecNo) JOIN VAT AS VAT ON (sProd.VATRecNo = VAT.recNo) WHERE pOHdr.orderDate BETWEEN '01/12/2005' AND '21/12/2005' AND 1=1 -- what does this do? AND pODet.lineStatus != 'V' AND srcD.actionStatus != 'V' AND srcD.serviceCoBranchRecNo = 2 AND srcU.lineStatus != 'V' AND srcE.tableName = 'sourcedet' -- moved down from join condition AND srcE.ownerForeignTableName = 'clientbranch' -- moved down from join condition AND mrkUp.foreignTableName = 'client' -- moved down from join condition UNION SELECT clnt.name , clntB.name , periodOf(srcD.enteredDate) , srcD.recNo , NULL , clnt.recNo , clntB.recNo , srcU.recNo, srcU.replacementCost, srcU.requiredQty , sProd.* , prod.* , VAT.* , 0 , mrkUp.* FROM sourceDet AS srcD JOIN sourceDetUpgr AS srcU ON (srcU.sourceDetRecNo = srcD.recNo) JOIN supplierProduct AS sProd ON (srcU.supplierProductRecNo = sProd.recNo) JOIN product AS prod ON (sProd.productRecNo = prod.recNo) JOIN VAT AS VAT ON (sProd.VATRecNo = VAT.recNo) JOIN sourceDetExtref AS srcE ON (srcE.foreignRecNo = srcD.recNo) JOIN clientBranch AS clntB ON (clntB.recNo = srcE.ownerForeignRecNo) JOIN client AS clnt ON (clntB.clientRecNo = clnt.recNo) JOIN markUp AS mrkUp ON (mrkUp.foreignRecNo = clnt.recNo) WHERE srcD.enteredDate BETWEEN '01/12/2005' AND '21/12/2005' AND srcD.serviceCoBranchRecNo = 2 AND srcD.actionStatus != 'V' AND srcU.lineStatus = 'S' AND mrkUp.foreignTableName = 'client' -- from join condition AND srcE.tableName = 'sourcedet' -- from join condition AND srcE.ownerForeignTableName = 'clientbranch' -- from join condition AND ( SELECT COUNT(*) FROM sourceDetUpgr AS srcU2 WHERE srcU2.sourceDetRecNo = srcD.recNo AND srcU2.lineStatus IN ('S','T') ) = ( SELECT COUNT(*) FROM sourceDetUpgr AS srcU2 WHERE srcU2.sourceDetRecNo = srcD.recNo AND srcU2.lineStatus != 'V' ) AND 1=1 -- what does this do? ORDER BY --3,2,1,4,5 periodof , clntB.name , clnt.name , srD.recNo , pODet.purchaseOrdHdrRecNo --- original query SELECT clnt.name,clntB.name,periodOf (pOHdr.orderDate),srcD.recNo,pODet.purchaseOrdHdrRecNo,clnt.recNo,clntB. recNo,srcU.recNo,srcU.replacementCost,srcU.requiredQty,sProd.*,prod.*,VA T.*,pODet.orderedQty,mrkUp.* FROM purchaseOrdHdr AS pOHdr JOIN purchaseOrdDet AS pODet JOIN sourceDetUpgr AS srcU JOIN sourceDet AS srcD JOIN sourceDetExtref AS srcE JOIN clientBranch AS clntB JOIN client AS clnt JOIN markUp AS mrkUp ON mrkUp.foreignRecNo = clnt.recNo AND mrkUp.foreignTableName = 'client' ON clntB.clientRecNo = clnt.recNo ON clntB.recNo = srcE.ownerForeignRecNo ON srcE.foreignRecNo = srcD.recNo AND srcE.tableName = 'sourcedet' AND srcE.ownerForeignTableName = 'clientbranch' ON srcD.recNo = srcU.sourceDetRecNo JOIN supplierProduct AS sProd JOIN product AS prod ON prod.recNo = sProd.productRecNo JOIN VAT AS VAT ON sProd.VATRecNo = VAT.recNo ON sProd.recNo = srcU.supplierProductRecNo ON srcU.recNo = pODet.sourceDetUpgrRecNo ON pODet.purchaseOrdHdrRecNo = pOHdr.recNo WHERE pOHdr.orderDate BETWEEN '01/12/2005' AND '21/12/2005' AND 1=1 AND pODet.lineStatus != 'V' AND srcD.actionStatus != 'V' AND srcD.serviceCoBranchRecNo = 2 AND srcU.lineStatus != 'V' UNION SELECT clnt.name,clntB.name,periodOf (srcD.enteredDate),srcD.recNo,NULL,clnt.recNo,clntB.recNo,srcU.recNo, srcU.replacementCost,srcU.requiredQty,sProd.*,prod.*,VAT.*, 0,mrkUp.* FROM sourceDet AS srcD JOIN sourceDetUpgr AS srcU JOIN supplierProduct AS sProd JOIN product AS prod ON sProd.productRecNo = prod.recNo JOIN VAT AS VAT ON sProd.VATRecNo = VAT.recNo ON srcU.supplierProductRecNo = sProd.recNo ON srcU.sourceDetRecNo = srcD.recNo JOIN sourceDetExtref AS srcE JOIN clientBranch AS clntB JOIN client AS clnt JOIN markUp AS mrkUp ON mrkUp.foreignRecNo = clnt.recNo AND mrkUp.foreignTableName = 'client' ON clntB.clientRecNo = clnt.recNo ON clntB.recNo = srcE.ownerForeignRecNo ON srcE.foreignRecNo = srcD.recNo AND srcE.tableName = 'sourcedet' AND srcE.ownerForeignTableName = 'clientbranch' WHERE srcD.enteredDate BETWEEN '01/12/2005' AND '21/12/2005' AND srcD.serviceCoBranchRecNo = 2 AND srcD.actionStatus != 'V' AND srcU.lineStatus = 'S' AND (SELECT COUNT(*) FROM sourceDetUpgr AS srcU2 WHERE srcU2.sourceDetRecNo = srcD.recNo AND srcU2.lineStatus IN ('S','T')) = (SELECT COUNT(*) FROM sourceDetUpgr AS srcU2 WHERE srcU2.sourceDetRecNo = srcD.recNo AND srcU2.lineStatus != 'V') AND 1=1 ORDER BY 3,2,1,4,5
Michael Glaesemann <grzm@myrealbox.com> writes: > When I first saw your FROM clause, I didn't even know that moving the > ON clause away from the JOIN it was related to would even work. I was > surprised to find that it does! I find it quite counterintuitive to > separate a join and its condition. As I understand it, I think what's happening is that the parser implicitly parenthesizes like this: from ((a join b on a.x=b.y) join c on b.y=c.z) versus this: from (a join (b join c on b.y=c.z) on a.x=b.y) These are equivalent formulations for inner joins, but they are decidedly not equivalent if one or both joins is OUTER. I'd tend to agree with Michael's advice to keep the join condition close to the pair of tables you think it's joining. Even better, use parentheses to make sure the parser reads it the same as you do. > I believe both join conditions and > restrictions are rewritten as they'd appear in the WHERE clause, so > you *could* put them all in the WHERE clause. Again, this is true for inner joins but very much not the case if any outer joins are involved. regards, tom lane
On Jan 12, 2006, at 12:21 , Tom Lane wrote: > I think what's happening is that the parser implicitly parenthesizes > like this: > > from ((a join b on a.x=b.y) join c on b.y=c.z) Any idea off hand if the SQL spec has anything to say on the subject? I haven't dug into my local version of the final draft to see. >> I believe both join conditions and >> restrictions are rewritten as they'd appear in the WHERE clause, so >> you *could* put them all in the WHERE clause. > > Again, this is true for inner joins but very much not the case if > any outer joins are involved. Thanks for the clarification, Tom. While I was writing I thought that OUTER JOIN was an exception, but I neglected to look into it or include my thoughts. Michael Glaesemann grzm myrealbox com
Michael Glaesemann <grzm@myrealbox.com> writes: > On Jan 12, 2006, at 12:21 , Tom Lane wrote: >> I think what's happening is that the parser implicitly parenthesizes >> like this: >> >> from ((a join b on a.x=b.y) join c on b.y=c.z) > Any idea off hand if the SQL spec has anything to say on the subject? SQL92 has this BNF: <from clause> ::= FROM <table reference> [ { <comma> <table reference> }... ] <table reference> ::= <table name> [ [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ] ] | <derived table> [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ] | <joined table> <derived table> ::= <table subquery> <derived column list> ::= <column name list> <joined table> ::= <cross join> | <qualified join> | <left paren> <joined table> <right paren> <cross join> ::= <table reference> CROSS JOIN <table reference> <qualified join> ::= <table reference> [ NATURAL ] [ <join type> ] JOIN <table reference> [ <join specification> ] <join specification> ::= <join condition> | <named columns join> <join condition> ::= ON <search condition> <named columns join> ::= USING <left paren> <join column list> <right paren> <join type> ::= INNER | <outer join type> [ OUTER ] | UNION <outer join type> ::= LEFT | RIGHT | FULL What we're talking about is the <qualified join> production, whose inputs are <table reference>s, and a <table reference> can be another <qualified join> with or without surrounding parentheses. So AFAICS SQL92 specifically allows both of these constructions. regards, tom lane