Thread: Exact index overhead
Hi there, I have a table which looks similar to: CREATE TABLE accounting ( id text NOT NULL, time timestamp with time zone, data1 int, data2 int, data3 int, data4 int, data5 int, data6 int, data7 int, data8 int, state int CONSTRAINT accounting_pkey PRIMARY KEY (id), ) The table has about 300k rows but is growing steadily. The usage of this table is few selects and inserts, tons of updates and no deletes ever. Ratios are roughly select:insert = 1:1 insert:update = 1:60 Now it turns out that almost all reporting queries use the time field and without any additional indexes it ends up doing slow and expensive sequential scans (10-20 seconds). Therefore I'd like to create and index on time to speed this up, yet I'm not entirely sure what overhead that introduces. Clearly there's some overhead during insertion of a new row which I can live with but what's not clear is the overhead during updates, and the postgresql manual doesn't make that explicit. You see, all updates change most of the data fields but never ever touch the time field. Assuming correct and efficient behaviour of postgresql it should then also never touch the time index and incur zero overhead in its presence, but is this really the case? If it somehow does update the index too even though the value hasn't changed by some weird implementation detail I'd rather not have that index and live with slow queries for the few times a day that reporting is run. Gunther
Gunther Mayer wrote: > You see, all updates change most of the data fields but never ever touch > the time field. Assuming correct and efficient behaviour of postgresql > it should then also never touch the time index and incur zero overhead > in its presence, but is this really the case? If it somehow does update > the index too even though the value hasn't changed by some weird > implementation detail I'd rather not have that index and live with slow > queries for the few times a day that reporting is run. Well, until 8.3 PG does indeed update the index. That's because with MVCC an update is basically a delete+insert, so you'll end up with two versions (the V in MVCC) of the row. With 8.3 there's a new feature called HOT which means updates that don't change an index can be more efficient. So - if you are running 8.3, I'd say try the index and see what difference it makes. -- Richard Huxton Archonet Ltd
Gunther Mayer wrote: > You see, all updates change most of the data fields but never ever touch > the time field. Assuming correct and efficient behaviour of postgresql > it should then also never touch the time index and incur zero overhead > in its presence, but is this really the case? If it somehow does update > the index too even though the value hasn't changed by some weird > implementation detail I'd rather not have that index and live with slow > queries for the few times a day that reporting is run. Updates do generally modify the indexes as well. The way MVCC has been implemented in PostgreSQL, UPDATE is internally very much like DELETE+INSERT. A new row version is inserted, new index pointers are added for the new row version, and the old row version is marked as deleted. In version 8.3, however, the new HOT feature reduces the need for that. In a nutshell, if the new row version fits on the same page as the old one, no new index pointers need to be created. I would suggest just testing how much additional overhead the new index incurs. It might be less expensive than you think. You didn't mention how often the inserts happen, in other words, how fast you expect the table to grow. If the table is expected to grow orders of magnitude larger, you might want to partition the table by date. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Apr 17, 2008 at 2:57 PM, Gunther Mayer <gunther.mayer@googlemail.com> wrote: > > > You see, all updates change most of the data fields but never ever touch > the time field. Assuming correct and efficient behaviour of postgresql it > should then also never touch the time index and incur zero overhead in its > presence, but is this really the case? Normally, whenever a row is updated, Postgres inserts a new index entry in each of the index. So to answer your question, there is certainly index overhead during updates, even if you are not changing the indexed column. But if you are using 8.3 then HOT may help you here, assuming you are not updating any index keys. HOT optimizes the case by *not* inserting a new index entry and also by performing retail vacuuming. The two necessary conditions for HOT are: 1. Update should not change any of the index keys. So if you have two indexes, one on column A and other on column B, update must not be modifying either A or B. 2. The existing block should have enough free space to accommodate the new version A less than 100 fillfactor may help you given your rate of updates. If your application satisfies 1, then I would suggest you to upgrade to 8.3 (if you are not using it already) and then you can create the index without bothering much about overheads. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote: > On Thu, Apr 17, 2008 at 2:57 PM, Gunther Mayer > <gunther.mayer@googlemail.com> wrote: > >> You see, all updates change most of the data fields but never ever touch >> the time field. Assuming correct and efficient behaviour of postgresql it >> should then also never touch the time index and incur zero overhead in its >> presence, but is this really the case? >> > > Normally, whenever a row is updated, Postgres inserts a new index entry in each > of the index. So to answer your question, there is certainly index > overhead during > updates, even if you are not changing the indexed column. > Ah, I knew these "obvious" assumptions wouldn't necessarily hold. Good that I checked. > But if you are using 8.3 then HOT may help you here, assuming you are > not updating > any index keys. HOT optimizes the case by *not* inserting a new index entry and > also by performing retail vacuuming. The two necessary conditions for HOT are: > > 1. Update should not change any of the index keys. So if you have two > indexes, one > on column A and other on column B, update must not be modifying either A or B. > That condition is always satisfied. > 2. The existing block should have enough free space to accommodate the > new version > A less than 100 fillfactor may help you given your rate of updates. > I see, as soon as a new block is required for the new version the index pointer needs updating too, I understand now. But at least in the common case of space being available the index overhead is reduced to zero. I can live with that. > If your application satisfies 1, then I would suggest you to upgrade > to 8.3 (if you are > not using it already) and then you can create the index without > bothering much about > overheads. > I'm still running 8.2.7 but I guess here's a compelling reason to upgrade ;-) Will do so soon. Thanks a lot to everyone who responded (and at what pace!). I love this community, it beats commercial support hands down. Gunther
On Thu, Apr 17, 2008 at 9:42 AM, Gunther Mayer <gunther.mayer@googlemail.com> wrote: > Pavan Deolasee wrote: > > > 2. The existing block should have enough free space to accommodate the > > new version > > A less than 100 fillfactor may help you given your rate of updates. > > > > > I see, as soon as a new block is required for the new version the index > pointer needs updating too, I understand now. But at least in the common > case of space being available the index overhead is reduced to zero. I can > live with that. Quick clarification, it's the table, not the index that has to have free space for the new row version. This rewards good normalization practices (narrower rows) and a lower fill factor.