Thread: update with from

update with from

From
Sim Zacks
Date:
Postgresql 8.2

I want to update a table with a from that has mutliple rows joining to
it. According to the docs, this is not advisable because:
 "If it does, then only one of the join rows will be used to update the
target row, but which one will be used is not readily predictable."

In my tests, if the joined rows are sorted it always updates with the
first row. Does anyone have any other experiences, or should I be
concerned that at some point it will behave differently?

The performance is currently unacceptable when changing the from to only
join to one row makes


Thanks
Sim

Re: update with from

From
Sim Zacks
Date:
> In my tests, if the joined rows are sorted it always updates with the
> first row. Does anyone have any other experiences, or should I be
> concerned that at some point it will behave differently?

I checked my tests again. It always uses the last one, not the first one.

Sim


Re: update with from

From
Alban Hertroys
Date:
On 23 January 2012 14:48, Sim Zacks <sim@compulab.co.il> wrote:
>> In my tests, if the joined rows are sorted it always updates with the
>> first row. Does anyone have any other experiences, or should I be
>> concerned that at some point it will behave differently?
>
> I checked my tests again. It always uses the last one, not the first one.
>
> Sim

I expect that your records get updated multiple times too.

Perhaps you can use DISTINCT ON in your from-based UPDATE?

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: update with from

From
Adrian Klaver
Date:
On Monday, January 23, 2012 2:00:29 am Sim Zacks wrote:
> Postgresql 8.2
>
> I want to update a table with a from that has mutliple rows joining to
> it. According to the docs, this is not advisable because:
>  "If it does, then only one of the join rows will be used to update the
> target row, but which one will be used is not readily predictable."
>
> In my tests, if the joined rows are sorted it always updates with the
> first row. Does anyone have any other experiences, or should I be
> concerned that at some point it will behave differently?
>
> The performance is currently unacceptable when changing the from to only
> join to one row makes

I guess the primary question here is, what are you trying to achieve?
Do want a particular row to supply the values to the target table i.e the row
with the most timestamp?
What is the query you are using?

>
>
> Thanks
> Sim

--
Adrian Klaver
adrian.klaver@gmail.com

Re: update with from

From
Sim Zacks
Date:
On 01/23/2012 04:34 PM, Alban Hertroys wrote:
> On 23 January 2012 14:48, Sim Zacks <sim@compulab.co.il> wrote:
>>> In my tests, if the joined rows are sorted it always updates with the
>>> first row. Does anyone have any other experiences, or should I be
>>> concerned that at some point it will behave differently?
>>
>> I checked my tests again. It always uses the last one, not the first one.
>>
>> Sim
>
> I expect that your records get updated multiple times too.
>
> Perhaps you can use DISTINCT ON in your from-based UPDATE?
>

The select in the from statement (i.e. update.. from (select...) )
returns 2739 records. It updates 617 records.
When I run the update it returns "617 rows affected"
So I'm guessing it is not multiple updates.

Sim

Re: update with from

From
Sim Zacks
Date:
On 01/23/2012 05:13 PM, Adrian Klaver wrote:

>
> I guess the primary question here is, what are you trying to achieve?
> Do want a particular row to supply the values to the target table i.e the row
> with the most timestamp?
> What is the query you are using?
>
The query returns a partid, unitprice and delivery weeks from the latest
set of rfqs sent. I want to update the table with the delivery weeks per
part of the cheapest of those rfqs.

This is the update stmt I am using, assuming that it always updates the
table with the last row per part:
update stat_allocated_components a set
partarrivedate=current_date+(b.deliverywks*7),partarrivedate_source='RFQ
Est'
from
(select b.popartid,b.partid,b.unitprice,b.deliverywks from poparts b
    join pos c using(poid)
    join lastrfqdateperpart d using(partid)
    where c.isrfq and c.issuedate > d.issuedate-7
    AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
c.postatusid = ANY (ARRAY[40, 41])
    order by b.partid,b.unitprice desc, b.deliverywks desc) b
where a.partid=b.partid and partarrivedate is null and
a.stock-a.previouscommitmentlf+a.quantity<0 and b.deliverywks is not null

This query take 163 ms.

When I throw in code to make the select only return the correct rows
The select statement takes 9 secs by itself:
select a.partid,a.deliverywks
from poparts a where popartid in (
    select b.popartid from poparts b
    join pos c using(poid)
    join stock.lastrfqdateperpart d using(partid)
    where c.isrfq and c.issuedate > d.issuedate-7
    AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
c.postatusid = ANY (ARRAY[40, 41])
    and b.partid=a.partid
    order by b.partid,b.unitprice, b.deliverywks
    limit 1
)

Re: update with from

From
Tom Lane
Date:
Sim Zacks <sim@compulab.co.il> writes:
> I want to update a table with a from that has mutliple rows joining to
> it. According to the docs, this is not advisable because:
>  "If it does, then only one of the join rows will be used to update the
> target row, but which one will be used is not readily predictable."

That means exactly what it says.

> In my tests, if the joined rows are sorted it always updates with the
> first row. Does anyone have any other experiences, or should I be
> concerned that at some point it will behave differently?

If you rely on this, your code *will* break some day.  Probably at
3AM while you're on vacation.  All it takes is a plan change.

            regards, tom lane

Re: update with from

From
Adrian Klaver
Date:
On Monday, January 23, 2012 7:32:35 am Sim Zacks wrote:
> On 01/23/2012 05:13 PM, Adrian Klaver wrote:
> > I guess the primary question here is, what are you trying to achieve?
> > Do want a particular row to supply the values to the target table i.e the
> > row with the most timestamp?
> > What is the query you are using?
>
> The query returns a partid, unitprice and delivery weeks from the latest
> set of rfqs sent. I want to update the table with the delivery weeks per
> part of the cheapest of those rfqs.

>
> This is the update stmt I am using, assuming that it always updates the
> table with the last row per part:
> update stat_allocated_components a set
> partarrivedate=current_date+(b.deliverywks*7),partarrivedate_source='RFQ
> Est'
> from
> (select b.popartid,b.partid,b.unitprice,b.deliverywks from poparts b
>     join pos c using(poid)
>     join lastrfqdateperpart d using(partid)
>     where c.isrfq and c.issuedate > d.issuedate-7
>     AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
> c.postatusid = ANY (ARRAY[40, 41])
>     order by b.partid,b.unitprice desc, b.deliverywks desc) b
> where a.partid=b.partid and partarrivedate is null and
> a.stock-a.previouscommitmentlf+a.quantity<0 and b.deliverywks is not null
>
> This query take 163 ms.
>
> When I throw in code to make the select only return the correct rows
> The select statement takes 9 secs by itself:
> select a.partid,a.deliverywks
> from poparts a where popartid in (
>     select b.popartid from poparts b
>     join pos c using(poid)
>     join stock.lastrfqdateperpart d using(partid)
>     where c.isrfq and c.issuedate > d.issuedate-7
>     AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
> c.postatusid = ANY (ARRAY[40, 41])
>     and b.partid=a.partid
>     order by b.partid,b.unitprice, b.deliverywks
>     limit 1
> )

From what I can see they are not the same queries, notwithstanding the
selectivity in the second query.  In fact I am not sure what the second query
accomplishes that cannot be done in the first query:)

Would you not get the same result in the first query by doing something like:

select b.popartid,b.partid,b.unitprice,b.deliverywks from poparts b
        join pos c using(poid)
        join lastrfqdateperpart d using(partid)
        where c.isrfq and c.issuedate > d.issuedate-7
        AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
c.postatusid = ANY (ARRAY[40, 41])
        order by b.partid,b.unitprice desc, b.deliverywks desc limit 1




--
Adrian Klaver
adrian.klaver@gmail.com

Re: update with from

From
Adrian Klaver
Date:
On Monday, January 23, 2012 7:32:35 am Sim Zacks wrote:
> On 01/23/2012 05:13 PM, Adrian Klaver wrote:

>
> When I throw in code to make the select only return the correct rows
> The select statement takes 9 secs by itself:
> select a.partid,a.deliverywks
> from poparts a where popartid in (
>     select b.popartid from poparts b
>     join pos c using(poid)
>     join stock.lastrfqdateperpart d using(partid)
>     where c.isrfq and c.issuedate > d.issuedate-7
>     AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
> c.postatusid = ANY (ARRAY[40, 41])
>     and b.partid=a.partid
>     order by b.partid,b.unitprice, b.deliverywks
>     limit 1
> )

To clarify what I posted earlier, my suggestion was based on rewriting the
second query as:

        select b.partid,b.deliverywks b.popartid from poparts b
        join pos c using(poid)
        join stock.lastrfqdateperpart d using(partid)
        where c.isrfq and c.issuedate > d.issuedate-7
        AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
c.postatusid = ANY (ARRAY[40, 41])
        order by b.partid,b.unitprice, b.deliverywks
        limit 1

I may be missing the intent of your original query, but I think the above gets
to the same result without the IN.

--
Adrian Klaver
adrian.klaver@gmail.com

Re: update with from

From
Sim Zacks
Date:
On 01/23/2012 07:10 PM, Adrian Klaver wrote:
> On Monday, January 23, 2012 7:32:35 am Sim Zacks wrote:
>> On 01/23/2012 05:13 PM, Adrian Klaver wrote:
>
>>
>> When I throw in code to make the select only return the correct rows
>> The select statement takes 9 secs by itself:
>> select a.partid,a.deliverywks
>> from poparts a where popartid in (
>>     select b.popartid from poparts b
>>     join pos c using(poid)
>>     join stock.lastrfqdateperpart d using(partid)
>>     where c.isrfq and c.issuedate > d.issuedate-7
>>     AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
>> c.postatusid = ANY (ARRAY[40, 41])
>>     and b.partid=a.partid
>>     order by b.partid,b.unitprice, b.deliverywks
>>     limit 1
>> )
>
> To clarify what I posted earlier, my suggestion was based on rewriting the
> second query as:
>
>         select b.partid,b.deliverywks b.popartid from poparts b
>         join pos c using(poid)
>         join stock.lastrfqdateperpart d using(partid)
>         where c.isrfq and c.issuedate > d.issuedate-7
>         AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
> c.postatusid = ANY (ARRAY[40, 41])
>         order by b.partid,b.unitprice, b.deliverywks
>         limit 1
>
> I may be missing the intent of your original query, but I think the above gets
> to the same result without the IN.
>

My first query returns all rows of each part ordered such so that the
row I want to actually update the table with is last. This query returns
12000 rows, for the 600 parts I want to update.

My second query with the limit within the subselect gets 1 row per part.
This returns 600 rows, 1 row for each part I want to update.

Your suggestion would only return one row.

See
http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks#Select_first_n_rows_from_group
for reference.

Re: update with from

From
Adrian Klaver
Date:
On Monday, January 23, 2012 10:11:00 pm Sim Zacks wrote:
> On 01/23/2012 07:10 PM, Adrian Klaver wrote:
> > On Monday, January 23, 2012 7:32:35 am Sim Zacks wrote:
> >> On 01/23/2012 05:13 PM, Adrian Klaver wrote:
> >>
> >>
> >> When I throw in code to make the select only return the correct rows
> >> The select statement takes 9 secs by itself:
> >> select a.partid,a.deliverywks
> >> from poparts a where popartid in (
> >>
> >>     select b.popartid from poparts b
> >>     join pos c using(poid)
> >>     join stock.lastrfqdateperpart d using(partid)
> >>     where c.isrfq and c.issuedate > d.issuedate-7
> >>     AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
> >>
> >> c.postatusid = ANY (ARRAY[40, 41])
> >>
> >>     and b.partid=a.partid
> >>     order by b.partid,b.unitprice, b.deliverywks
> >>     limit 1
> >>
> >> )
> >
> > To clarify what I posted earlier, my suggestion was based on rewriting
> > the
> >
> > second query as:
> >         select b.partid,b.deliverywks b.popartid from poparts b
> >         join pos c using(poid)
> >         join stock.lastrfqdateperpart d using(partid)
> >         where c.isrfq and c.issuedate > d.issuedate-7
> >         AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
> >
> > c.postatusid = ANY (ARRAY[40, 41])
> >
> >         order by b.partid,b.unitprice, b.deliverywks
> >         limit 1
> >
> > I may be missing the intent of your original query, but I think the above
> > gets to the same result without the IN.
>
> My first query returns all rows of each part ordered such so that the
> row I want to actually update the table with is last. This query returns
> 12000 rows, for the 600 parts I want to update.
>
> My second query with the limit within the subselect gets 1 row per part.
> This returns 600 rows, 1 row for each part I want to update.
>
> Your suggestion would only return one row.

Oops. So per a previous suggestion:

select DISTINCT ON (b.partid) b.partid, b.deliverywks from poparts b
        join pos c using(poid)
        join stock.lastrfqdateperpart d using(partid)
        where c.isrfq and c.issuedate > d.issuedate-7
        AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND
c.postatusid = ANY (ARRAY[40, 41])
        order by b.partid,b.unitprice, b.deliverywks


>
> See
> http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks#Select_first_n_rows_fro
> m_group for reference.

--
Adrian Klaver
adrian.klaver@gmail.com