Thread: Schema design question as it pertains to performance

Schema design question as it pertains to performance

From

"Benjamin Krajmalnik"

Date:

22 January 2013, 21:33:11

This question pertains to PG 9.0.4 running on FreeBSD amd64.  Was not
sure if it should go into the general list or the performance list, so
my apologies if I opted for the wrong list.

We currently have  a db schema which contains many wide indices which
usually contain one column which is constantly changing (such as a last
update time).
It was designed that way because at the time we were unaware that PG
could utilize multiple indices in a query.
Our database is very heavy un update queries, since the same records get
updated many times per minute.
As a result, that very wide index is constantly being updated, which I
am sure is one of the sources of some heavy disk i/o I am seeing.
Until recently, this design was not affecting performance, but as we
have more devices sending data, we are seeing a significant hit on the
i/o, and data is getting queued by the daemons which insert it creating
a backlog of "real time" data.
From a performance standpoint, is there a big hit on select performance
if a query ends up utilizing more than one index, taking into account
that an index has been used already to reduce the data set of potential
records, and the secondary index would mostly be used in the ordering of
the result set (such as a last updated time)?
I also assume that if no data has changed in an index, nothing is done
when the record is updated as pertains to the particular index - am I
correct in this assumption?

Our data is already partitioned and triggers send it to the correct
partition.

Thanks in advance,

Benjamin

Re: Schema design question as it pertains to performance

From

"Kevin Grittner"

Date:

22 January 2013, 21:56:08

Benjamin Krajmalnik wrote:

> From a performance standpoint, is there a big hit on select performance
> if a query ends up utilizing more than one index, taking into account
> that an index has been used already to reduce the data set of potential
> records, and the secondary index would mostly be used in the ordering of
> the result set (such as a last updated time)?

That depends on so many variables it is hard to give a simple answer.

> I also assume that if no data has changed in an index, nothing is done
> when the record is updated as pertains to the particular index - am I
> correct in this assumption?

No. If the update doesn't affect *any* indexed column, and there is
room in the page, it will do a HOT update and can skip all index
updates. If any indexed column is updated, it must expire the old
tuple and create a new tuple to represent the updated version of
the row, and this new tuple is not likely to land in the same page
as the old tuple; so it needs new entries in all the indexes. The
old index entries must remain until they can no longer be the
visible version of the row for any database transaction, so both
versions of the row will be on the index for a while.

-Kevin

Re: Schema design question as it pertains to performance

From

"Benjamin Krajmalnik"

Date:

22 January 2013, 22:27:07

Thanks, Kevin.
That was my intent - if no column of an index changes in an update then no changes are done on the index.
That helps quite a bit - I will try moving the columns which change continuously to their own index - hopefully this
willtame the overwhelming IO, since the index will now have a single column.  I am seeing too many semwaits or ufs
stateswhen running top, which means we are not quite keeping up.

A lot of the load has been offloaded to the read only hot standby - thanks for the pointers on the delays.  I only had
onequery cancelled today (as opposed to many previously), so it is just a matter of fine tuning right now.

-----Original Message-----
From: Kevin Grittner [mailto:kgrittn@mail.com] 
Sent: Tuesday, January 22, 2013 2:56 PM
To: Benjamin Krajmalnik; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Schema design question as it pertains to performance

Benjamin Krajmalnik wrote:

> From a performance standpoint, is there a big hit on select 
> performance if a query ends up utilizing more than one index, taking 
> into account that an index has been used already to reduce the data 
> set of potential records, and the secondary index would mostly be used 
> in the ordering of the result set (such as a last updated time)?

That depends on so many variables it is hard to give a simple answer.

> I also assume that if no data has changed in an index, nothing is done 
> when the record is updated as pertains to the particular index - am I 
> correct in this assumption?

No. If the update doesn't affect *any* indexed column, and there is room in the page, it will do a HOT update and can
skipall index updates. If any indexed column is updated, it must expire the old tuple and create a new tuple to
representthe updated version of the row, and this new tuple is not likely to land in the same page as the old tuple; so
itneeds new entries in all the indexes. The old index entries must remain until they can no longer be the visible
versionof the row for any database transaction, so both versions of the row will be on the index for a while.

-Kevin

Re: Schema design question as it pertains to performance

From

"Kevin Grittner"

Date:

23 January 2013, 00:23:09

Benjamin Krajmalnik wrote:
> Kevin Grittner wrote:
>> Benjamin Krajmalnik wrote:

>>> I also assume that if no data has changed in an index, nothing
>>> is done when the record is updated as pertains to the
>>> particular index - am I correct in this assumption?
>>
>> No. [...] If any indexed column is updated, [...] it needs new
>> entries in all the indexes.

> That was my intent - if no column of an index changes in an
> update then no changes are done on the index.

I don't think you understood -- if any column referenced by any
index is updated, then all indexes must get a new entry for that
row whether or not they include that column. The new version of the
row will be at a new location, so new entries are needed in all
indexes to point to the new location.

I did *not* say that only indexes referencing the updated column
need a new entry.

-Kevin

Re: Schema design question as it pertains to performance

From

"Benjamin Krajmalnik"

Date:

23 January 2013, 00:48:45

Thanks for the clarification.  Now I understand - it is an all or nothing due to required entries for each row version
onthe main table.

I will have to see if removing the constantly changing value from the indices will affect the performance of our
selects.

-----Original Message-----
From: Kevin Grittner [mailto:kgrittn@mail.com] 
Sent: Tuesday, January 22, 2013 5:23 PM
To: Benjamin Krajmalnik; pgsql-admin@postgresql.org
Subject: RE: [ADMIN] Schema design question as it pertains to performance

Benjamin Krajmalnik wrote:
> Kevin Grittner wrote:
>> Benjamin Krajmalnik wrote:

>>> I also assume that if no data has changed in an index, nothing is 
>>> done when the record is updated as pertains to the particular index 
>>> - am I correct in this assumption?
>> 
>> No. [...] If any indexed column is updated, [...] it needs new 
>> entries in all the indexes.

> That was my intent - if no column of an index changes in an update 
> then no changes are done on the index.

I don't think you understood -- if any column referenced by any index is updated, then all indexes must get a new entry
forthat row whether or not they include that column. The new version of the row will be at a new location, so new
entriesare needed in all indexes to point to the new location.

I did *not* say that only indexes referencing the updated column need a new entry.

-Kevin