Thread: Refactored queries needing validation of syntactic equivalence

Refactored queries needing validation of syntactic equivalence

From
Mike Adams
Date:
Hello! I'm a long time lurker who has become responsible for maintaining
/ updating utility queries at work.  I've reworked two queries (as text
attachment as they are wide lines) to enhance the planner's chance of
speeding up the queries (Oracle8i's).

I'm looking for someone to eyeball them and let me know if I've folded
the sub-selects up correctly (I'm the ONLY sql speaking person at work
so having a coworker do so is unfortunately not possible).

Also unfortunately, there currently aren't any issues in the database
that these queries are designed to find.  All I can say for sure is (as
you can see below each query) my refactored queries *at the least*
return *no* data faster than the legacy queries...

Thank you in advance and I wish the application at work used postgresql
as it's backend!

Michael Adams
                               legacy                                |                              refactor

---------------------------------------------------------------------+--------------------------------------------------------------------
 select m.co_cd,                                                     | select m.co_cd,
        m.ve_cd,                                                     |        m.ve_cd,
        m.ivc_cd,                                                    |        m.ivc_cd,
        m.po_cd,                                                     |        m.po_cd,
        m.itm_cd,                                                    |        m.itm_cd,
        m.qty,                                                       |        m.qty,
        m.unit_cst,                                                  |        m.unit_cst,
        (m.qty*m.unit_cst) as ext_cst,                               |        (m.qty*m.unit_cst) as ext_cst,
        to_char(m.rcv_dt,'YYYY-MM-DD') as received,                  |        to_char(m.rcv_dt,'YYYY-MM-DD') as
received,
        origin_cd,                                                   |        m.origin_cd,
        to_char(m.assigned_dt,'YYYY-MM-DD') as assigned              |        to_char(m.assigned_dt,'YYYY-MM-DD') as
assigned
 from   rcv_mo m                                                     | from   rcv_mo m, rcv_mo r
 where  ( m.origin_cd    = 'MOM' )                                   | where  ( m.origin_cd    = 'MOM' )
    and ( m.ASSIGNED_DT <= '31-Oct-2007'                             |    and ( m.ASSIGNED_DT <= '31-Oct-2007'
          or                                                         |          or
          m.ASSIGNED_DT is null                                      |          m.ASSIGNED_DT is null
    )                                                                |    )
    and ( exists ( select 1                                          |    and (   m.po_cd     =  r.po_cd  )
                   from   rcv_mo  o                                  |    and (   m.itm_cd    =  r.itm_cd )
                   where  ( m.po_cd        = o.po_cd  )              |    and ( r.assigned_dt is null       )
                      and ( m.itm_cd       = o.itm_cd )              |    and ( r.rcv_dt      <= '31-Oct-2007')
                      and ( o.assigned_dt is null )                  | order by m.VE_CD, m.po_cd, m.itm_cd
                      and ( o.rcv_dt      <= '31-Oct-2007')          |
                 )                                                   | -- 0 record(s) selected [Fetch MetaData: 0/ms]
[FetchData: 0/ms]
 
    )                                                                | -- [Executed: 10/10/07 9:24:09 AM CDT ]
[Execution:937/ms]
 
 order by m.VE_CD, m.po_cd, m.itm_cd                                 |
                                                                     |
 -- 0 record(s) selected [Fetch MetaData: 16/ms] [Fetch Data: 0/ms]  |
 -- [Executed: 10/10/07 8:47:39 AM CDT ] [Execution: 2054333/ms]     |
                                                                     |

---------------------------------------------------------------------+--------------------------------------------------------------------

---------------------------------------------------------------------+--------------------------------------------------------------------
 select o.co_cd,                                                     | select o.co_cd,
        o.ve_cd,                                                     |        o.ve_cd,
        o.ivc_cd,                                                    |        o.ivc_cd,
        o.po_cd,                                                     |        o.po_cd,
        o.itm_cd,                                                    |        o.itm_cd,
        o.qty,                                                       |        o.qty,
        o.unit_cst,                                                  |        o.unit_cst,
        (o.qty*o.unit_cst) as ext_cst,                               |        (o.qty*o.unit_cst) as ext_cst,
        to_char(o.rcv_dt,'YYYY-MM-DD') as received,                  |        to_char(o.rcv_dt,'YYYY-MM-DD') as
received,
        o.origin_cd,                                                 |        o.origin_cd,
        to_char(o.assigned_dt,'YYYY-MM-DD') as assigned              |        to_char(o.assigned_dt,'YYYY-MM-DD') as
assigned
 from   rcv_mo o                                                     | from   rcv_mo o, rcv_mo m
 where  ( o.assigned_dt is null               )                      | where  ( o.assigned_dt is null           )
    and ( o.rcv_dt      <= '31-Oct-2007' )                           |    and ( o.rcv_dt      <= '31-Oct-2007'  )
    and ( exists ( select 1                                          |    and ( m.origin_cd    = 'MOM'          )
                   from   rcv_mo  m                                  |    and ( o.po_cd        = m.po_cd        )
                   where  ( m.origin_cd    = 'MOM'         )         |    and ( o.itm_cd       = m.itm_cd       )
                      and ( o.po_cd        = m.po_cd       )         |    and ( m.assigned_dt <= '31-Oct-2007'
                      and ( o.itm_cd       = m.itm_cd      )         |        or
                      and ( m.assigned_dt <= '31-Oct-2007'           |          m.assigned_dt is null
                            or                                       |    )
                            m.assigned_dt is null                    | order by o.ve_cd, o.po_cd, o.itm_cd
                      )                                              |
                 )                                                   | -- 0 record(s) selected [Fetch MetaData: 0/ms]
[FetchData: 0/ms]
 
    )                                                                | -- [Executed: 10/10/07 9:32:03 AM CDT ]
[Execution:344/ms]
 
 order by o.ve_cd, o.po_cd, o.itm_cd                                 |
                                                                     |
 -- 0 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]   |
 -- [Executed: 10/10/07 9:31:28 AM CDT ] [Execution: 148562/ms]      |
                                                                     |

---------------------------------------------------------------------+---------------------------------------------------------------------

---------------------------------------------------------------------+---------------------------------------------------------------------

Re: Refactored queries needing validation of syntactic equivalence

From
Richard Huxton
Date:
Mike Adams wrote:
> Hello! I'm a long time lurker who has become responsible for maintaining 
> / updating utility queries at work.  I've reworked two queries (as text 
> attachment as they are wide lines) to enhance the planner's chance of 
> speeding up the queries (Oracle8i's).

Well, I can't say it's standard procedure to look at Oracle queries, but  if you don't tell anyone I won't :-)

> I'm looking for someone to eyeball them and let me know if I've folded 
> the sub-selects up correctly (I'm the ONLY sql speaking person at work 
> so having a coworker do so is unfortunately not possible).
> 
> Also unfortunately, there currently aren't any issues in the database 
> that these queries are designed to find.  All I can say for sure is (as 
> you can see below each query) my refactored queries *at the least* 
> return *no* data faster than the legacy queries...

Test data time then. No alternative to testing these things.

> Thank you in advance and I wish the application at work used postgresql 
> as it's backend!

OK, you've substituted and EXISTS check against a sub-query with a 
self-join. The key question of course is whether your join can return 
more than one match from "rcv_mo m" for each row in "rcv_mo o". I can't 
say without knowing more about your schema, and even then I'd want to 
test it.

--   Richard Huxton  Archonet Ltd


Re: Refactored queries needing validation of syntactic equivalence

From
Mike Adams
Date:
Richard Huxton wrote:
(quoted OP lines edited for brevity...)
> Mike Adams wrote:
...
>> I've reworked two 
>> queries (as text attachment as they are wide lines) to enhance the 
>> planner's chance of speeding up the queries (Oracle8i's).
> 
> Well, I can't say it's standard procedure to look at Oracle queries, but 
>  if you don't tell anyone I won't :-)
No prob, my lips are sealed... ;-)

It's basically generic sql sanity checking that's needed anyhow.

> 
>> I'm looking for someone to eyeball them and let me know if I've folded 
>> the sub-selects up correctly
...
>>
>> Also unfortunately, there currently aren't any issues in the database 
>> that these queries are designed to find.  All I can say for sure is 
>> (as you can see below each query) my refactored queries *at the least* 
>> return *no* data faster than the legacy queries...
> 
> Test data time then. No alternative to testing these things.
> 
I do plan to run the old and the new until I'm sure the new queries
aren't borked and return the same set of info.

>> Thank you in advance
...
> OK, you've substituted and EXISTS check against a sub-query with a 
> self-join. The key question of course is whether your join can return 
> more than one match from "rcv_mo m" for each row in "rcv_mo o". I can't 
> say without knowing more about your schema, and even then I'd want to 
> test it.
> 
Thanks for the response!
The schema is basically:

table rcv_mo(
CO_CD         VCHR(3),           --COMPANY CODE.
VE_CD         NOT NULL VCHR(4),  --VENDOR CODE.
IVC_CD        VCHR(20),          --INVOICE CODE.
PO_CD         NOT NULL VCHR(13), --PURCHASE ORDER CODE.
ITM_CD        NOT NULL VCHR(9),  --ITEM CODE.
QTY           NUM(13,2),         --QUANTITY.
UNIT_CST      NUM(13,2),         --UNIT COST.
RCV_DT        DATE,              --RECEIVED DATE.
ORIGIN_CD     NOT NULL VCHR(5),  --CODE REPRESENTING THE PROGRAM WHICH                                 --CREATED THE
RCV_MORECORD.
 
STORE_CD      VCHR(2),           --RECEIVING STORE CODE.
WAYBILL_NUM   VCHR(20),          --WAYBILL NUMBER FROM RECEIVING BOL
ASSIGNED_DT   DATE,              --ASSIGNED DATE IS THE SYSTEM DATE WHEN                                 --THE INVOICE
ANDTHE RECEIVINGS ARE                                 --LINKED TOGETHER.
 
TMP_ADJ_ROWID VCHR(40),          --THIS FIELD WAS CREATED TO BE USED FOR                                 --SPECIAL
PROCESSINGDONE IN MPOI. UPON                                 --COMMITTING, THE TMP_ADJ_ROWID WILL
         --ALWAYS BE NULL.
 
RCVR_ID       VCHR(15),          --ID OF THE PERSON RECEIVING THE ORDER. 
EMP_CD        VCHR(15),          --ID OF THE LAST PERSON TO POST A 
                                 --CHANGE TO RCV_MO.
);

indexes:
NONUNIQE (CO_CD, VE_CD, IVC_CD, PO_CD, ITM_CD);
NONUNIQE (VE_CD, PO_CD);

Notice the date columns aren't indexed! If they were, even the original
queries would be *much* faster! Unfortunately I cannot get indexes
implemented (not w/o more aggravation than the wait for the original
queries provides).

Here's the "process": inventory mgmt system inserts tuples when/as
product arrives.  If more than one of an item (itm_cd) is in the same
batch, it may, or may not, (it's very POM dependent ;) ) be split into
multiple tuples.

Then the accounting dpt enters the product invoice into the "Match Off
Management" system and assigns received product to the appropriate
vendor invoice.

Occasionally, the receiving dpt may post oh say 48 of an item in one
table entry, however, the acctng dpt has 2 invoices for 24 items each.
In MOM the acctng dpt /manually/ splits the entry, thus inserting 2
records who's origin is 'MOM', each for 24 items, and assigns them to
the invoice(s) (or maybe just assigns one since they've not yet rec'd an
invoice for the other 24).  So, we can have *many* 'MOM' records.

They are /supposed/ to let me know so I can immediately assign the
original record to a "fake" invoice called "SPLIT IN MOM" and it drops
off the radar.  So of course, I'm rarely notified. This table is used to
accrue for received but unpaid merchandise: "orphaned" entries inflate
the accrual and inflate the value of inventory (not good).

So.....
The first query should pull all 'MOM' records that have one or more
corresponding, and possibly orphaned, unassigned receiving records
belonging to the same po_cd and item_cd.

The second query should pull all unassigned, and possibly orphaned
receiving records that have one or more corresponding 'MOM' records once
again matching on po_cd and item_cd.

Using the results of both queries to double check each other, I can
figure out which (if any) open records are, in fact, orphans and do an
"after the fact" assignment to the "SPLIT IN MOM" invoice to reduce our
accrual.

Of course, our ERMS should take care of this automagically; but,
tragically, it seems "real" inventory cost flow was attached to the
system using duct tape, hot glue, and a couple of thumb tacks.

So, given all the administriva above, have I actually refactored them
correctly?

Thanks much!



Re: Refactored queries needing validation of syntactic equivalence

From
Richard Huxton
Date:
Mike Adams wrote:
> So.....
> The first query should pull all 'MOM' records that have one or more
> corresponding, and possibly orphaned, unassigned receiving records
> belonging to the same po_cd and item_cd.
> 
> The second query should pull all unassigned, and possibly orphaned
> receiving records that have one or more corresponding 'MOM' records once
> again matching on po_cd and item_cd.
> 
> Using the results of both queries to double check each other, I can
> figure out which (if any) open records are, in fact, orphans and do an
> "after the fact" assignment to the "SPLIT IN MOM" invoice to reduce our
> accrual.
> 
> Of course, our ERMS should take care of this automagically; but,
> tragically, it seems "real" inventory cost flow was attached to the
> system using duct tape, hot glue, and a couple of thumb tacks.
> 
> So, given all the administriva above, have I actually refactored them
> correctly?

Well, clearly you could have multiple possible matches, because apart 
from anything else you could in theory have multiple entries with the 
same item-code on the same purchase-order-code. In practice it will be 
rare, but it could happen.

However, since the purpose is to provide you with a list so you can make 
manual changes there's no problem with that.

What I might be tempted to do is restrict the dates more - you had <= 
'31 Oct 2007' I'd also apply >= '1 Aug 2007' (or whatever period is 
reasonable). You can always run an unconstrained match once a month to 
catch any that slip through the net, but presumably most will fall 
within a 90-day period.

HTH
--   Richard Huxton  Archonet Ltd


Re: Refactored queries needing validation of syntactic equivalence

From
Mike Adams
Date:
Richard Huxton wrote:
> Mike Adams wrote:
>> So.....
>> The first query should pull all 'MOM' records that have one or more
>> corresponding, and possibly orphaned, unassigned receiving records
>> belonging to the same po_cd and item_cd.
>>
>> The second query should pull all unassigned, and possibly orphaned
>> receiving records that have one or more corresponding 'MOM' records once
>> again matching on po_cd and item_cd.
>>
>> Using the results of both queries to double check each other, I can
>> figure out which (if any) open records are, in fact, orphans and do an
>> "after the fact" assignment to the "SPLIT IN MOM" invoice to reduce our
>> accrual.
>>
>> Of course, our ERMS should take care of this automagically; but,
>> tragically, it seems "real" inventory cost flow was attached to the
>> system using duct tape, hot glue, and a couple of thumb tacks.
>>
>> So, given all the administriva above, have I actually refactored them
>> correctly?
> 
> Well, clearly you could have multiple possible matches, because apart 
> from anything else you could in theory have multiple entries with the 
> same item-code on the same purchase-order-code. In practice it will be 
> rare, but it could happen.
> 

Yep! and it's not rare: if we receive 20 serialized items, we *will* get 
20 entries of same "itm_cd,po_cd" as serialized items are individually 
posted in inventory (in accounting speak, they have a "specific item" 
costing basis, whereas "non serialized" items (parts etc) are (by us) 
considered to have a "FIFO" costing basis and can be lumped into "lots").

Yesterday I ran both the "legacy" and "refactor" versions of each query 
after the AP clerk (for once) let me know that her assistant had 
"joined" a number of receivings (did the reverse of a split for some 
reason).

The "orphans" query (select o.co_cd, ...) came back with the same result 
set for both the legacy and refactor versions. The "moms" query (select 
m.co_cd, ...) did not!

What I had for the "moms" result sets were (fake products replacing the 
real ones in the results below):

legacy              | refactor
--------------------+--------------------------------
2 hotplate          | 2 hotplate
6 scooper           | 2 hotplate                    | 6 scooper                    | 6 scooper                    | 6
scooper                   | 6 scooper                    | 6 scooper                    | 6 scooper
 

The "orphans" result sets were the same (faked products in results below):
       result set       -----------------       1 hotplate       1 hotplate       1 scooper       1 scooper       1
scooper      1 scooper       1 scooper       1 scooper
 

In truth those eight records returned by both "orphans" versions *were* 
actually orphaned by the *2* "moms" records that /do/ exist and were 
correctly reported by the legacy version... Oops! the refactored "moms" 
query is an unintentional (by me) cross product!

> However, since the purpose is to provide you with a list so you can make 
> manual changes there's no problem with that.
> 

Except for the unwanted cross productions!  Well, there isn't an 
available "natural" way to prevent that as the table /doesn't/ have a 
pkey or even a good candidate key.

What I did, and it did fix the result set to reflect reality, was change the   select o.co_cd, ... from ...
to   select distinct o.co_cd, ..., o.rowid from ...

rowid being Oracle's version of ctid and is the only "unique" item "in" 
the table ( oh the shame ).

> What I might be tempted to do is restrict the dates more - you had <= 
> '31 Oct 2007' I'd also apply >= '1 Aug 2007' (or whatever period is 
> reasonable). You can always run an unconstrained match once a month to 
> catch any that slip through the net, but presumably most will fall 
> within a 90-day period.
> 
> HTH

I may restrict the dates more, however the refactored queries both run 
in under 1000 ms, and given the rcv_mo table currently has >5 && <7 
years worth of historical data for them to plow through, and the plan is 
to only keep the data in the table for 7 yrs max...

Thank you for the help! I've appreciated it greatly!

Mike.