Thread: update with from
Postgresql 8.2 I want to update a table with a from that has mutliple rows joining to it. According to the docs, this is not advisable because: "If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable." In my tests, if the joined rows are sorted it always updates with the first row. Does anyone have any other experiences, or should I be concerned that at some point it will behave differently? The performance is currently unacceptable when changing the from to only join to one row makes Thanks Sim
> In my tests, if the joined rows are sorted it always updates with the > first row. Does anyone have any other experiences, or should I be > concerned that at some point it will behave differently? I checked my tests again. It always uses the last one, not the first one. Sim
On 23 January 2012 14:48, Sim Zacks <sim@compulab.co.il> wrote: >> In my tests, if the joined rows are sorted it always updates with the >> first row. Does anyone have any other experiences, or should I be >> concerned that at some point it will behave differently? > > I checked my tests again. It always uses the last one, not the first one. > > Sim I expect that your records get updated multiple times too. Perhaps you can use DISTINCT ON in your from-based UPDATE? -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
On Monday, January 23, 2012 2:00:29 am Sim Zacks wrote: > Postgresql 8.2 > > I want to update a table with a from that has mutliple rows joining to > it. According to the docs, this is not advisable because: > "If it does, then only one of the join rows will be used to update the > target row, but which one will be used is not readily predictable." > > In my tests, if the joined rows are sorted it always updates with the > first row. Does anyone have any other experiences, or should I be > concerned that at some point it will behave differently? > > The performance is currently unacceptable when changing the from to only > join to one row makes I guess the primary question here is, what are you trying to achieve? Do want a particular row to supply the values to the target table i.e the row with the most timestamp? What is the query you are using? > > > Thanks > Sim -- Adrian Klaver adrian.klaver@gmail.com
On 01/23/2012 04:34 PM, Alban Hertroys wrote: > On 23 January 2012 14:48, Sim Zacks <sim@compulab.co.il> wrote: >>> In my tests, if the joined rows are sorted it always updates with the >>> first row. Does anyone have any other experiences, or should I be >>> concerned that at some point it will behave differently? >> >> I checked my tests again. It always uses the last one, not the first one. >> >> Sim > > I expect that your records get updated multiple times too. > > Perhaps you can use DISTINCT ON in your from-based UPDATE? > The select in the from statement (i.e. update.. from (select...) ) returns 2739 records. It updates 617 records. When I run the update it returns "617 rows affected" So I'm guessing it is not multiple updates. Sim
On 01/23/2012 05:13 PM, Adrian Klaver wrote: > > I guess the primary question here is, what are you trying to achieve? > Do want a particular row to supply the values to the target table i.e the row > with the most timestamp? > What is the query you are using? > The query returns a partid, unitprice and delivery weeks from the latest set of rfqs sent. I want to update the table with the delivery weeks per part of the cheapest of those rfqs. This is the update stmt I am using, assuming that it always updates the table with the last row per part: update stat_allocated_components a set partarrivedate=current_date+(b.deliverywks*7),partarrivedate_source='RFQ Est' from (select b.popartid,b.partid,b.unitprice,b.deliverywks from poparts b join pos c using(poid) join lastrfqdateperpart d using(partid) where c.isrfq and c.issuedate > d.issuedate-7 AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND c.postatusid = ANY (ARRAY[40, 41]) order by b.partid,b.unitprice desc, b.deliverywks desc) b where a.partid=b.partid and partarrivedate is null and a.stock-a.previouscommitmentlf+a.quantity<0 and b.deliverywks is not null This query take 163 ms. When I throw in code to make the select only return the correct rows The select statement takes 9 secs by itself: select a.partid,a.deliverywks from poparts a where popartid in ( select b.popartid from poparts b join pos c using(poid) join stock.lastrfqdateperpart d using(partid) where c.isrfq and c.issuedate > d.issuedate-7 AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND c.postatusid = ANY (ARRAY[40, 41]) and b.partid=a.partid order by b.partid,b.unitprice, b.deliverywks limit 1 )
Sim Zacks <sim@compulab.co.il> writes: > I want to update a table with a from that has mutliple rows joining to > it. According to the docs, this is not advisable because: > "If it does, then only one of the join rows will be used to update the > target row, but which one will be used is not readily predictable." That means exactly what it says. > In my tests, if the joined rows are sorted it always updates with the > first row. Does anyone have any other experiences, or should I be > concerned that at some point it will behave differently? If you rely on this, your code *will* break some day. Probably at 3AM while you're on vacation. All it takes is a plan change. regards, tom lane
On Monday, January 23, 2012 7:32:35 am Sim Zacks wrote: > On 01/23/2012 05:13 PM, Adrian Klaver wrote: > > I guess the primary question here is, what are you trying to achieve? > > Do want a particular row to supply the values to the target table i.e the > > row with the most timestamp? > > What is the query you are using? > > The query returns a partid, unitprice and delivery weeks from the latest > set of rfqs sent. I want to update the table with the delivery weeks per > part of the cheapest of those rfqs. > > This is the update stmt I am using, assuming that it always updates the > table with the last row per part: > update stat_allocated_components a set > partarrivedate=current_date+(b.deliverywks*7),partarrivedate_source='RFQ > Est' > from > (select b.popartid,b.partid,b.unitprice,b.deliverywks from poparts b > join pos c using(poid) > join lastrfqdateperpart d using(partid) > where c.isrfq and c.issuedate > d.issuedate-7 > AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND > c.postatusid = ANY (ARRAY[40, 41]) > order by b.partid,b.unitprice desc, b.deliverywks desc) b > where a.partid=b.partid and partarrivedate is null and > a.stock-a.previouscommitmentlf+a.quantity<0 and b.deliverywks is not null > > This query take 163 ms. > > When I throw in code to make the select only return the correct rows > The select statement takes 9 secs by itself: > select a.partid,a.deliverywks > from poparts a where popartid in ( > select b.popartid from poparts b > join pos c using(poid) > join stock.lastrfqdateperpart d using(partid) > where c.isrfq and c.issuedate > d.issuedate-7 > AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND > c.postatusid = ANY (ARRAY[40, 41]) > and b.partid=a.partid > order by b.partid,b.unitprice, b.deliverywks > limit 1 > ) From what I can see they are not the same queries, notwithstanding the selectivity in the second query. In fact I am not sure what the second query accomplishes that cannot be done in the first query:) Would you not get the same result in the first query by doing something like: select b.popartid,b.partid,b.unitprice,b.deliverywks from poparts b join pos c using(poid) join lastrfqdateperpart d using(partid) where c.isrfq and c.issuedate > d.issuedate-7 AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND c.postatusid = ANY (ARRAY[40, 41]) order by b.partid,b.unitprice desc, b.deliverywks desc limit 1 -- Adrian Klaver adrian.klaver@gmail.com
On Monday, January 23, 2012 7:32:35 am Sim Zacks wrote: > On 01/23/2012 05:13 PM, Adrian Klaver wrote: > > When I throw in code to make the select only return the correct rows > The select statement takes 9 secs by itself: > select a.partid,a.deliverywks > from poparts a where popartid in ( > select b.popartid from poparts b > join pos c using(poid) > join stock.lastrfqdateperpart d using(partid) > where c.isrfq and c.issuedate > d.issuedate-7 > AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND > c.postatusid = ANY (ARRAY[40, 41]) > and b.partid=a.partid > order by b.partid,b.unitprice, b.deliverywks > limit 1 > ) To clarify what I posted earlier, my suggestion was based on rewriting the second query as: select b.partid,b.deliverywks b.popartid from poparts b join pos c using(poid) join stock.lastrfqdateperpart d using(partid) where c.isrfq and c.issuedate > d.issuedate-7 AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND c.postatusid = ANY (ARRAY[40, 41]) order by b.partid,b.unitprice, b.deliverywks limit 1 I may be missing the intent of your original query, but I think the above gets to the same result without the IN. -- Adrian Klaver adrian.klaver@gmail.com
On 01/23/2012 07:10 PM, Adrian Klaver wrote: > On Monday, January 23, 2012 7:32:35 am Sim Zacks wrote: >> On 01/23/2012 05:13 PM, Adrian Klaver wrote: > >> >> When I throw in code to make the select only return the correct rows >> The select statement takes 9 secs by itself: >> select a.partid,a.deliverywks >> from poparts a where popartid in ( >> select b.popartid from poparts b >> join pos c using(poid) >> join stock.lastrfqdateperpart d using(partid) >> where c.isrfq and c.issuedate > d.issuedate-7 >> AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND >> c.postatusid = ANY (ARRAY[40, 41]) >> and b.partid=a.partid >> order by b.partid,b.unitprice, b.deliverywks >> limit 1 >> ) > > To clarify what I posted earlier, my suggestion was based on rewriting the > second query as: > > select b.partid,b.deliverywks b.popartid from poparts b > join pos c using(poid) > join stock.lastrfqdateperpart d using(partid) > where c.isrfq and c.issuedate > d.issuedate-7 > AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND > c.postatusid = ANY (ARRAY[40, 41]) > order by b.partid,b.unitprice, b.deliverywks > limit 1 > > I may be missing the intent of your original query, but I think the above gets > to the same result without the IN. > My first query returns all rows of each part ordered such so that the row I want to actually update the table with is last. This query returns 12000 rows, for the 600 parts I want to update. My second query with the limit within the subselect gets 1 row per part. This returns 600 rows, 1 row for each part I want to update. Your suggestion would only return one row. See http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks#Select_first_n_rows_from_group for reference.
On Monday, January 23, 2012 10:11:00 pm Sim Zacks wrote: > On 01/23/2012 07:10 PM, Adrian Klaver wrote: > > On Monday, January 23, 2012 7:32:35 am Sim Zacks wrote: > >> On 01/23/2012 05:13 PM, Adrian Klaver wrote: > >> > >> > >> When I throw in code to make the select only return the correct rows > >> The select statement takes 9 secs by itself: > >> select a.partid,a.deliverywks > >> from poparts a where popartid in ( > >> > >> select b.popartid from poparts b > >> join pos c using(poid) > >> join stock.lastrfqdateperpart d using(partid) > >> where c.isrfq and c.issuedate > d.issuedate-7 > >> AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND > >> > >> c.postatusid = ANY (ARRAY[40, 41]) > >> > >> and b.partid=a.partid > >> order by b.partid,b.unitprice, b.deliverywks > >> limit 1 > >> > >> ) > > > > To clarify what I posted earlier, my suggestion was based on rewriting > > the > > > > second query as: > > select b.partid,b.deliverywks b.popartid from poparts b > > join pos c using(poid) > > join stock.lastrfqdateperpart d using(partid) > > where c.isrfq and c.issuedate > d.issuedate-7 > > AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND > > > > c.postatusid = ANY (ARRAY[40, 41]) > > > > order by b.partid,b.unitprice, b.deliverywks > > limit 1 > > > > I may be missing the intent of your original query, but I think the above > > gets to the same result without the IN. > > My first query returns all rows of each part ordered such so that the > row I want to actually update the table with is last. This query returns > 12000 rows, for the 600 parts I want to update. > > My second query with the limit within the subselect gets 1 row per part. > This returns 600 rows, 1 row for each part I want to update. > > Your suggestion would only return one row. Oops. So per a previous suggestion: select DISTINCT ON (b.partid) b.partid, b.deliverywks from poparts b join pos c using(poid) join stock.lastrfqdateperpart d using(partid) where c.isrfq and c.issuedate > d.issuedate-7 AND b.unitprice > 0::numeric AND b.quantity >= 100::numeric AND c.postatusid = ANY (ARRAY[40, 41]) order by b.partid,b.unitprice, b.deliverywks > > See > http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks#Select_first_n_rows_fro > m_group for reference. -- Adrian Klaver adrian.klaver@gmail.com