Thread: Refactored queries needing validation of syntactic equivalence
Hello! I'm a long time lurker who has become responsible for maintaining / updating utility queries at work. I've reworked two queries (as text attachment as they are wide lines) to enhance the planner's chance of speeding up the queries (Oracle8i's). I'm looking for someone to eyeball them and let me know if I've folded the sub-selects up correctly (I'm the ONLY sql speaking person at work so having a coworker do so is unfortunately not possible). Also unfortunately, there currently aren't any issues in the database that these queries are designed to find. All I can say for sure is (as you can see below each query) my refactored queries *at the least* return *no* data faster than the legacy queries... Thank you in advance and I wish the application at work used postgresql as it's backend! Michael Adams legacy | refactor ---------------------------------------------------------------------+-------------------------------------------------------------------- select m.co_cd, | select m.co_cd, m.ve_cd, | m.ve_cd, m.ivc_cd, | m.ivc_cd, m.po_cd, | m.po_cd, m.itm_cd, | m.itm_cd, m.qty, | m.qty, m.unit_cst, | m.unit_cst, (m.qty*m.unit_cst) as ext_cst, | (m.qty*m.unit_cst) as ext_cst, to_char(m.rcv_dt,'YYYY-MM-DD') as received, | to_char(m.rcv_dt,'YYYY-MM-DD') as received, origin_cd, | m.origin_cd, to_char(m.assigned_dt,'YYYY-MM-DD') as assigned | to_char(m.assigned_dt,'YYYY-MM-DD') as assigned from rcv_mo m | from rcv_mo m, rcv_mo r where ( m.origin_cd = 'MOM' ) | where ( m.origin_cd = 'MOM' ) and ( m.ASSIGNED_DT <= '31-Oct-2007' | and ( m.ASSIGNED_DT <= '31-Oct-2007' or | or m.ASSIGNED_DT is null | m.ASSIGNED_DT is null ) | ) and ( exists ( select 1 | and ( m.po_cd = r.po_cd ) from rcv_mo o | and ( m.itm_cd = r.itm_cd ) where ( m.po_cd = o.po_cd ) | and ( r.assigned_dt is null ) and ( m.itm_cd = o.itm_cd ) | and ( r.rcv_dt <= '31-Oct-2007') and ( o.assigned_dt is null ) | order by m.VE_CD, m.po_cd, m.itm_cd and ( o.rcv_dt <= '31-Oct-2007') | ) | -- 0 record(s) selected [Fetch MetaData: 0/ms] [FetchData: 0/ms] ) | -- [Executed: 10/10/07 9:24:09 AM CDT ] [Execution:937/ms] order by m.VE_CD, m.po_cd, m.itm_cd | | -- 0 record(s) selected [Fetch MetaData: 16/ms] [Fetch Data: 0/ms] | -- [Executed: 10/10/07 8:47:39 AM CDT ] [Execution: 2054333/ms] | | ---------------------------------------------------------------------+-------------------------------------------------------------------- ---------------------------------------------------------------------+-------------------------------------------------------------------- select o.co_cd, | select o.co_cd, o.ve_cd, | o.ve_cd, o.ivc_cd, | o.ivc_cd, o.po_cd, | o.po_cd, o.itm_cd, | o.itm_cd, o.qty, | o.qty, o.unit_cst, | o.unit_cst, (o.qty*o.unit_cst) as ext_cst, | (o.qty*o.unit_cst) as ext_cst, to_char(o.rcv_dt,'YYYY-MM-DD') as received, | to_char(o.rcv_dt,'YYYY-MM-DD') as received, o.origin_cd, | o.origin_cd, to_char(o.assigned_dt,'YYYY-MM-DD') as assigned | to_char(o.assigned_dt,'YYYY-MM-DD') as assigned from rcv_mo o | from rcv_mo o, rcv_mo m where ( o.assigned_dt is null ) | where ( o.assigned_dt is null ) and ( o.rcv_dt <= '31-Oct-2007' ) | and ( o.rcv_dt <= '31-Oct-2007' ) and ( exists ( select 1 | and ( m.origin_cd = 'MOM' ) from rcv_mo m | and ( o.po_cd = m.po_cd ) where ( m.origin_cd = 'MOM' ) | and ( o.itm_cd = m.itm_cd ) and ( o.po_cd = m.po_cd ) | and ( m.assigned_dt <= '31-Oct-2007' and ( o.itm_cd = m.itm_cd ) | or and ( m.assigned_dt <= '31-Oct-2007' | m.assigned_dt is null or | ) m.assigned_dt is null | order by o.ve_cd, o.po_cd, o.itm_cd ) | ) | -- 0 record(s) selected [Fetch MetaData: 0/ms] [FetchData: 0/ms] ) | -- [Executed: 10/10/07 9:32:03 AM CDT ] [Execution:344/ms] order by o.ve_cd, o.po_cd, o.itm_cd | | -- 0 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] | -- [Executed: 10/10/07 9:31:28 AM CDT ] [Execution: 148562/ms] | | ---------------------------------------------------------------------+--------------------------------------------------------------------- ---------------------------------------------------------------------+---------------------------------------------------------------------
Mike Adams wrote: > Hello! I'm a long time lurker who has become responsible for maintaining > / updating utility queries at work. I've reworked two queries (as text > attachment as they are wide lines) to enhance the planner's chance of > speeding up the queries (Oracle8i's). Well, I can't say it's standard procedure to look at Oracle queries, but if you don't tell anyone I won't :-) > I'm looking for someone to eyeball them and let me know if I've folded > the sub-selects up correctly (I'm the ONLY sql speaking person at work > so having a coworker do so is unfortunately not possible). > > Also unfortunately, there currently aren't any issues in the database > that these queries are designed to find. All I can say for sure is (as > you can see below each query) my refactored queries *at the least* > return *no* data faster than the legacy queries... Test data time then. No alternative to testing these things. > Thank you in advance and I wish the application at work used postgresql > as it's backend! OK, you've substituted and EXISTS check against a sub-query with a self-join. The key question of course is whether your join can return more than one match from "rcv_mo m" for each row in "rcv_mo o". I can't say without knowing more about your schema, and even then I'd want to test it. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: (quoted OP lines edited for brevity...) > Mike Adams wrote: ... >> I've reworked two >> queries (as text attachment as they are wide lines) to enhance the >> planner's chance of speeding up the queries (Oracle8i's). > > Well, I can't say it's standard procedure to look at Oracle queries, but > if you don't tell anyone I won't :-) No prob, my lips are sealed... ;-) It's basically generic sql sanity checking that's needed anyhow. > >> I'm looking for someone to eyeball them and let me know if I've folded >> the sub-selects up correctly ... >> >> Also unfortunately, there currently aren't any issues in the database >> that these queries are designed to find. All I can say for sure is >> (as you can see below each query) my refactored queries *at the least* >> return *no* data faster than the legacy queries... > > Test data time then. No alternative to testing these things. > I do plan to run the old and the new until I'm sure the new queries aren't borked and return the same set of info. >> Thank you in advance ... > OK, you've substituted and EXISTS check against a sub-query with a > self-join. The key question of course is whether your join can return > more than one match from "rcv_mo m" for each row in "rcv_mo o". I can't > say without knowing more about your schema, and even then I'd want to > test it. > Thanks for the response! The schema is basically: table rcv_mo( CO_CD VCHR(3), --COMPANY CODE. VE_CD NOT NULL VCHR(4), --VENDOR CODE. IVC_CD VCHR(20), --INVOICE CODE. PO_CD NOT NULL VCHR(13), --PURCHASE ORDER CODE. ITM_CD NOT NULL VCHR(9), --ITEM CODE. QTY NUM(13,2), --QUANTITY. UNIT_CST NUM(13,2), --UNIT COST. RCV_DT DATE, --RECEIVED DATE. ORIGIN_CD NOT NULL VCHR(5), --CODE REPRESENTING THE PROGRAM WHICH --CREATED THE RCV_MORECORD. STORE_CD VCHR(2), --RECEIVING STORE CODE. WAYBILL_NUM VCHR(20), --WAYBILL NUMBER FROM RECEIVING BOL ASSIGNED_DT DATE, --ASSIGNED DATE IS THE SYSTEM DATE WHEN --THE INVOICE ANDTHE RECEIVINGS ARE --LINKED TOGETHER. TMP_ADJ_ROWID VCHR(40), --THIS FIELD WAS CREATED TO BE USED FOR --SPECIAL PROCESSINGDONE IN MPOI. UPON --COMMITTING, THE TMP_ADJ_ROWID WILL --ALWAYS BE NULL. RCVR_ID VCHR(15), --ID OF THE PERSON RECEIVING THE ORDER. EMP_CD VCHR(15), --ID OF THE LAST PERSON TO POST A --CHANGE TO RCV_MO. ); indexes: NONUNIQE (CO_CD, VE_CD, IVC_CD, PO_CD, ITM_CD); NONUNIQE (VE_CD, PO_CD); Notice the date columns aren't indexed! If they were, even the original queries would be *much* faster! Unfortunately I cannot get indexes implemented (not w/o more aggravation than the wait for the original queries provides). Here's the "process": inventory mgmt system inserts tuples when/as product arrives. If more than one of an item (itm_cd) is in the same batch, it may, or may not, (it's very POM dependent ;) ) be split into multiple tuples. Then the accounting dpt enters the product invoice into the "Match Off Management" system and assigns received product to the appropriate vendor invoice. Occasionally, the receiving dpt may post oh say 48 of an item in one table entry, however, the acctng dpt has 2 invoices for 24 items each. In MOM the acctng dpt /manually/ splits the entry, thus inserting 2 records who's origin is 'MOM', each for 24 items, and assigns them to the invoice(s) (or maybe just assigns one since they've not yet rec'd an invoice for the other 24). So, we can have *many* 'MOM' records. They are /supposed/ to let me know so I can immediately assign the original record to a "fake" invoice called "SPLIT IN MOM" and it drops off the radar. So of course, I'm rarely notified. This table is used to accrue for received but unpaid merchandise: "orphaned" entries inflate the accrual and inflate the value of inventory (not good). So..... The first query should pull all 'MOM' records that have one or more corresponding, and possibly orphaned, unassigned receiving records belonging to the same po_cd and item_cd. The second query should pull all unassigned, and possibly orphaned receiving records that have one or more corresponding 'MOM' records once again matching on po_cd and item_cd. Using the results of both queries to double check each other, I can figure out which (if any) open records are, in fact, orphans and do an "after the fact" assignment to the "SPLIT IN MOM" invoice to reduce our accrual. Of course, our ERMS should take care of this automagically; but, tragically, it seems "real" inventory cost flow was attached to the system using duct tape, hot glue, and a couple of thumb tacks. So, given all the administriva above, have I actually refactored them correctly? Thanks much!
Mike Adams wrote: > So..... > The first query should pull all 'MOM' records that have one or more > corresponding, and possibly orphaned, unassigned receiving records > belonging to the same po_cd and item_cd. > > The second query should pull all unassigned, and possibly orphaned > receiving records that have one or more corresponding 'MOM' records once > again matching on po_cd and item_cd. > > Using the results of both queries to double check each other, I can > figure out which (if any) open records are, in fact, orphans and do an > "after the fact" assignment to the "SPLIT IN MOM" invoice to reduce our > accrual. > > Of course, our ERMS should take care of this automagically; but, > tragically, it seems "real" inventory cost flow was attached to the > system using duct tape, hot glue, and a couple of thumb tacks. > > So, given all the administriva above, have I actually refactored them > correctly? Well, clearly you could have multiple possible matches, because apart from anything else you could in theory have multiple entries with the same item-code on the same purchase-order-code. In practice it will be rare, but it could happen. However, since the purpose is to provide you with a list so you can make manual changes there's no problem with that. What I might be tempted to do is restrict the dates more - you had <= '31 Oct 2007' I'd also apply >= '1 Aug 2007' (or whatever period is reasonable). You can always run an unconstrained match once a month to catch any that slip through the net, but presumably most will fall within a 90-day period. HTH -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Mike Adams wrote: >> So..... >> The first query should pull all 'MOM' records that have one or more >> corresponding, and possibly orphaned, unassigned receiving records >> belonging to the same po_cd and item_cd. >> >> The second query should pull all unassigned, and possibly orphaned >> receiving records that have one or more corresponding 'MOM' records once >> again matching on po_cd and item_cd. >> >> Using the results of both queries to double check each other, I can >> figure out which (if any) open records are, in fact, orphans and do an >> "after the fact" assignment to the "SPLIT IN MOM" invoice to reduce our >> accrual. >> >> Of course, our ERMS should take care of this automagically; but, >> tragically, it seems "real" inventory cost flow was attached to the >> system using duct tape, hot glue, and a couple of thumb tacks. >> >> So, given all the administriva above, have I actually refactored them >> correctly? > > Well, clearly you could have multiple possible matches, because apart > from anything else you could in theory have multiple entries with the > same item-code on the same purchase-order-code. In practice it will be > rare, but it could happen. > Yep! and it's not rare: if we receive 20 serialized items, we *will* get 20 entries of same "itm_cd,po_cd" as serialized items are individually posted in inventory (in accounting speak, they have a "specific item" costing basis, whereas "non serialized" items (parts etc) are (by us) considered to have a "FIFO" costing basis and can be lumped into "lots"). Yesterday I ran both the "legacy" and "refactor" versions of each query after the AP clerk (for once) let me know that her assistant had "joined" a number of receivings (did the reverse of a split for some reason). The "orphans" query (select o.co_cd, ...) came back with the same result set for both the legacy and refactor versions. The "moms" query (select m.co_cd, ...) did not! What I had for the "moms" result sets were (fake products replacing the real ones in the results below): legacy | refactor --------------------+-------------------------------- 2 hotplate | 2 hotplate 6 scooper | 2 hotplate | 6 scooper | 6 scooper | 6 scooper | 6 scooper | 6 scooper | 6 scooper The "orphans" result sets were the same (faked products in results below): result set ----------------- 1 hotplate 1 hotplate 1 scooper 1 scooper 1 scooper 1 scooper 1 scooper 1 scooper In truth those eight records returned by both "orphans" versions *were* actually orphaned by the *2* "moms" records that /do/ exist and were correctly reported by the legacy version... Oops! the refactored "moms" query is an unintentional (by me) cross product! > However, since the purpose is to provide you with a list so you can make > manual changes there's no problem with that. > Except for the unwanted cross productions! Well, there isn't an available "natural" way to prevent that as the table /doesn't/ have a pkey or even a good candidate key. What I did, and it did fix the result set to reflect reality, was change the select o.co_cd, ... from ... to select distinct o.co_cd, ..., o.rowid from ... rowid being Oracle's version of ctid and is the only "unique" item "in" the table ( oh the shame ). > What I might be tempted to do is restrict the dates more - you had <= > '31 Oct 2007' I'd also apply >= '1 Aug 2007' (or whatever period is > reasonable). You can always run an unconstrained match once a month to > catch any that slip through the net, but presumably most will fall > within a 90-day period. > > HTH I may restrict the dates more, however the refactored queries both run in under 1000 ms, and given the rcv_mo table currently has >5 && <7 years worth of historical data for them to plow through, and the plan is to only keep the data in the table for 7 yrs max... Thank you for the help! I've appreciated it greatly! Mike.