Thread: Exact index overhead

Exact index overhead

From
Gunther Mayer
Date:
Hi there,

I have a table which looks similar to:

CREATE TABLE accounting
(
  id text NOT NULL,
  time timestamp with time zone,
  data1 int,
  data2 int,
  data3 int,
  data4 int,
  data5 int,
  data6 int,
  data7 int,
  data8 int,
  state int
  CONSTRAINT accounting_pkey PRIMARY KEY (id),
)

The table has about 300k rows but is growing steadily. The usage of this
table is few selects and inserts, tons of updates and no deletes ever.
Ratios are roughly
select:insert = 1:1
insert:update = 1:60

Now it turns out that almost all reporting queries use the time field
and without any additional indexes it ends up doing slow and expensive
sequential scans (10-20 seconds). Therefore I'd like to create and index
on time to speed this up, yet I'm not entirely sure what overhead that
introduces. Clearly there's some overhead during insertion of a new row
which I can live with but what's not clear is the overhead during
updates, and the postgresql manual doesn't make that explicit.

You see, all updates change most of the data fields but never ever touch
the time field. Assuming correct and efficient behaviour of postgresql
it should then also never touch the time index and incur zero overhead
in its presence, but is this really the case? If it somehow does update
the index too even though the value hasn't changed by some weird
implementation detail I'd rather not have that index and live with slow
queries for the few times a day that reporting is run.

Gunther

Re: Exact index overhead

From
Richard Huxton
Date:
Gunther Mayer wrote:
> You see, all updates change most of the data fields but never ever touch
> the time field. Assuming correct and efficient behaviour of postgresql
> it should then also never touch the time index and incur zero overhead
> in its presence, but is this really the case? If it somehow does update
> the index too even though the value hasn't changed by some weird
> implementation detail I'd rather not have that index and live with slow
> queries for the few times a day that reporting is run.

Well, until 8.3 PG does indeed update the index. That's because with
MVCC an update is basically a delete+insert, so you'll end up with two
versions (the V in MVCC) of the row.

With 8.3 there's a new feature called HOT which means updates that don't
change an index can be more efficient.

So - if you are running 8.3, I'd say try the index and see what
difference it makes.

--
   Richard Huxton
   Archonet Ltd

Re: Exact index overhead

From
Heikki Linnakangas
Date:
Gunther Mayer wrote:
> You see, all updates change most of the data fields but never ever touch
> the time field. Assuming correct and efficient behaviour of postgresql
> it should then also never touch the time index and incur zero overhead
> in its presence, but is this really the case? If it somehow does update
> the index too even though the value hasn't changed by some weird
> implementation detail I'd rather not have that index and live with slow
> queries for the few times a day that reporting is run.

Updates do generally modify the indexes as well. The way MVCC has been
implemented in PostgreSQL, UPDATE is internally very much like
DELETE+INSERT. A new row version is inserted, new index pointers are
added for the new row version, and the old row version is marked as deleted.

In version 8.3, however, the new HOT feature reduces the need for that.
In a nutshell, if the new row version fits on the same page as the old
one, no new index pointers need to be created.

I would suggest just testing how much additional overhead the new index
incurs. It might be less expensive than you think.

You didn't mention how often the inserts happen, in other words, how
fast you expect the table to grow. If the table is expected to grow
orders of magnitude larger, you might want to partition the table by date.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Exact index overhead

From
"Pavan Deolasee"
Date:
On Thu, Apr 17, 2008 at 2:57 PM, Gunther Mayer
<gunther.mayer@googlemail.com> wrote:
>
>
>  You see, all updates change most of the data fields but never ever touch
> the time field. Assuming correct and efficient behaviour of postgresql it
> should then also never touch the time index and incur zero overhead in its
> presence, but is this really the case?

Normally, whenever a row is updated, Postgres inserts a new index entry in each
of the index. So to answer your question, there is certainly index
overhead during
updates, even if you are not changing the indexed column.

But if you are using 8.3 then HOT may help you here, assuming you are
not updating
any index keys. HOT optimizes the case by *not* inserting a new index entry and
also by performing retail vacuuming. The two necessary conditions for HOT are:

1. Update should not change any of the index keys. So if you have two
indexes, one
on column A and other on column B, update must not be modifying either A or B.

2. The existing block should have enough free space to accommodate the
new version
A less than 100 fillfactor may help you given your rate of updates.

If your application satisfies 1, then I would suggest you to upgrade
to 8.3 (if you are
not using it already) and then you can create the index without
bothering much about
overheads.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: Exact index overhead

From
Gunther Mayer
Date:
Pavan Deolasee wrote:
> On Thu, Apr 17, 2008 at 2:57 PM, Gunther Mayer
> <gunther.mayer@googlemail.com> wrote:
>
>>  You see, all updates change most of the data fields but never ever touch
>> the time field. Assuming correct and efficient behaviour of postgresql it
>> should then also never touch the time index and incur zero overhead in its
>> presence, but is this really the case?
>>
>
> Normally, whenever a row is updated, Postgres inserts a new index entry in each
> of the index. So to answer your question, there is certainly index
> overhead during
> updates, even if you are not changing the indexed column.
>
Ah, I knew these "obvious" assumptions wouldn't necessarily hold. Good
that I checked.
> But if you are using 8.3 then HOT may help you here, assuming you are
> not updating
> any index keys. HOT optimizes the case by *not* inserting a new index entry and
> also by performing retail vacuuming. The two necessary conditions for HOT are:
>
> 1. Update should not change any of the index keys. So if you have two
> indexes, one
> on column A and other on column B, update must not be modifying either A or B.
>
That condition is always satisfied.
> 2. The existing block should have enough free space to accommodate the
> new version
> A less than 100 fillfactor may help you given your rate of updates.
>
I see, as soon as a new block is required for the new version the index
pointer needs updating too, I understand now. But at least in the common
case of space being available the index overhead is reduced to zero. I
can live with that.
> If your application satisfies 1, then I would suggest you to upgrade
> to 8.3 (if you are
> not using it already) and then you can create the index without
> bothering much about
> overheads.
>
I'm still running 8.2.7 but I guess here's a compelling reason to
upgrade ;-) Will do so soon.

Thanks a lot to everyone who responded (and at what pace!). I love this
community, it beats commercial support hands down.

Gunther

Re: Exact index overhead

From
"Scott Marlowe"
Date:
On Thu, Apr 17, 2008 at 9:42 AM, Gunther Mayer
<gunther.mayer@googlemail.com> wrote:
> Pavan Deolasee wrote:
>

> > 2. The existing block should have enough free space to accommodate the
> > new version
> > A less than 100 fillfactor may help you given your rate of updates.
> >
> >
>  I see, as soon as a new block is required for the new version the index
> pointer needs updating too, I understand now. But at least in the common
> case of space being available the index overhead is reduced to zero. I can
> live with that.

Quick clarification, it's the table, not the index that has to have
free space for the new row version.  This rewards good normalization
practices (narrower rows) and a lower fill factor.