Re: left join with smaller table or index on (XXX is not null) to avoid upsert - Mailing list pgsql-general

From Ivan Sergio Borgonovo
Subject Re: left join with smaller table or index on (XXX is not null) to avoid upsert
Date
Msg-id 20090119134012.36e35a82@dawn.webthatworks.it
Whole thread Raw
In response to Re: left join with smaller table or index on (XXX is not null) to avoid upsert  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-general
On Sun, 18 Jan 2009 19:44:40 -0700
"Scott Marlowe" <scott.marlowe@gmail.com> wrote:

> You could update returning rowsupdated, so you could run that and
> get a list of all the rows that were updated.  Then build a simple
> select where not in (those rows) to get the rest for inserting.

uh nice addition. I didn't check all the goodies I got when I moved
from 8.1 to 8.3. I mostly was interested in tsearch.
Still while it makes nearly trivial to write upsert it looks like it
will still make the server sweat compared to MySQL REPLACE.
In postgresql I could write a rule, but it will be globally defined
and it is a much permanent solution than using an upsert (aka
REPLACE) on a statement basis.

> > I'm expecting that:
> > - ProductPrice will contain roughly but less than 10% of the
> > catalogue.

> Then an index will only help when you're selecting on something
> more selective.  unless your rows are really skinny, a sequential
> scan will usually win over an index scan.

They should be very skinny.
create table ProductPrice(
  ProductID int references Product (ProductID),
  DiscountedPrice numeric(4,2)
);

> > Since I haven't been able to find a quick way to build up a
> > hierarchy of promotions to apply/re-apply discounts when
> > promotion are added/deleted, creating/deleting promotions looks
> > critical as well.
> > The best thing I was able to plan was just to reapply all
> > promotions if one is deleted.

> Watch out for bloat when doing this.  A simple where change of

> update table set b = 45 ;

> to

> update table set b = 45 where b <> 45 ;

> can save the db a lot of work, and if you can apply the same logic
> to your update to save some dead tuples it's worth looking into.
> Updating whole tables wholesale is not definitely not pgsql's
> strong suit.

oh that's really a good suggestion since just a higher discount have
to be applied.

I just have to understand how to apply it.

If I actually have all the rows everything will be an update and I
can actually exploit your suggestion.

update ProductPrices set DiscountedPrice=round(q.Price*Discount,2)
  from somefunction() q where ProductPrices.ProductID=q.ProductID and
  ProductPrices.DiscountedPrice>round(q.Price*Discount,2);
If I have to "upsert" most of the advantage of reducing the # of
updates with an additional condition seems to be lost.

> > So it looks to me that approach B is going to make updating of
> > discounts easier, but I was wondering if it makes retrieval of
> > Products and Prices slower.
>
> If you do bulk updates, you'll blow out your tables if you don't
> keep them vacuumed.  50% dead space is manageable, if your data
> set is reasonably small (under a few hundred meg).  Just make sure
> you don't run 20 updates on a table in a row, that kind of thing.

That's going exactly to be the case. If I can't easily spot
intersections between promotions, and I doubt I can in a cheap way,
I'll have to run 20 to 100 updates every time I delete a promotion.
Set the DiscountedPrice to null for every ProductID I'm going to
delete from the promotion and reapply all the promotions.

But well maybe you helped me to find another approach.

Since I'm going to reapply all promotions I could:
- delete whole table
- insert prices starting from the promotions with higher discount
- skip on failed insert since other promotions will have same or
lower discount.

So I'll have just the rows I need in ProductPrice table, no need to
index on is not null, smaller table so faster to left join and keep
it in memory.

But... well how am I going to:

-- Discount=40
insert into ProductPrice from (
  select ProductID, round(q.Price*Discount,2) from
   mypromofunction(...)
);
-- Discount=40
insert into ProductPrice from (
  select ProductID, round(q.Price*Discount,2) from
   mypromofunction(...)
);
-- Discount=30
insert into ProductPrice from (
  select ProductID, round(q.Price*Discount,2) from
   mypromofunction(...)
);

OK one more approach:
create table ProductPromoPrice(
  PromoID references Promo (PromoID) on delete cascade,
  ProductID references Product (ProductID on delete cascade,
  DiscountedPrice numeric
);
create table ProductPrice(
  ProductID references Product (ProductID) on delete cascade,
  DiscountedPrice numeric
);

insert into ProductPromoPrice from select PromoID, ProductID,
round(q.Price*Discount,2) from
   mypromofunction(...)
);
...
[1]
insert into ProductPrice select ProductID, min(DiscountedPrice) from
  ProductPromoPrice group by ProductID;

I just did:
create table test.Prices(ItemID int, Price real);

insert into test.Prices select BrandID, max(ListPrice) from
catalog_items group by BrandID;
took 1sec

insert into test.Prices select ProductID, ListPrice from
catalog_items;
took 4 sec

If I'm expecting that
- discounted articles may be 10% of the whole catalogue
- a maximum overlap of 60%.
- large overlap it may involve a small # of promotions
- no more than 100 promotions
what execution time should I expect from query [1]?

Should an index on ProductPromoPrice.Price help on taking out the
max?
Should I index on ProductPromoPrice.ItemID?

Meanwhile I thought I may have another requirement that may make
things enough complicated to need a completely different strategy.
If discounts have to be applied just when goods are in stock... I'll
have to rebuild the discount table every time a product goes out of
stock.

I may have a PriceInStock table and a PriceBackOrder table an then
just chose the LEAST.
Since few products will get out of stock concurrently I'll have to
regenerate the entries just for those product... with a rule or a
trigger.

I'm still looking for advices for an overall better strategy or just
to lower the numbers of actual tests to see if this stuff is
feasible in a decent time.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


pgsql-general by date:

Previous
From: Ivan Sergio Borgonovo
Date:
Subject: too smart update (was: left join with smaller table or index on (XXX is not null) to avoid upsert)
Next
From: "Thom Brown"
Date:
Subject: How can I display the contents of a function?