Thread: Schema design question as it pertains to performance
This question pertains to PG 9.0.4 running on FreeBSD amd64. Was not sure if it should go into the general list or the performance list, so my apologies if I opted for the wrong list. We currently have a db schema which contains many wide indices which usually contain one column which is constantly changing (such as a last update time). It was designed that way because at the time we were unaware that PG could utilize multiple indices in a query. Our database is very heavy un update queries, since the same records get updated many times per minute. As a result, that very wide index is constantly being updated, which I am sure is one of the sources of some heavy disk i/o I am seeing. Until recently, this design was not affecting performance, but as we have more devices sending data, we are seeing a significant hit on the i/o, and data is getting queued by the daemons which insert it creating a backlog of "real time" data. From a performance standpoint, is there a big hit on select performance if a query ends up utilizing more than one index, taking into account that an index has been used already to reduce the data set of potential records, and the secondary index would mostly be used in the ordering of the result set (such as a last updated time)? I also assume that if no data has changed in an index, nothing is done when the record is updated as pertains to the particular index - am I correct in this assumption? Our data is already partitioned and triggers send it to the correct partition. Thanks in advance, Benjamin
Benjamin Krajmalnik wrote: > From a performance standpoint, is there a big hit on select performance > if a query ends up utilizing more than one index, taking into account > that an index has been used already to reduce the data set of potential > records, and the secondary index would mostly be used in the ordering of > the result set (such as a last updated time)? That depends on so many variables it is hard to give a simple answer. > I also assume that if no data has changed in an index, nothing is done > when the record is updated as pertains to the particular index - am I > correct in this assumption? No. If the update doesn't affect *any* indexed column, and there is room in the page, it will do a HOT update and can skip all index updates. If any indexed column is updated, it must expire the old tuple and create a new tuple to represent the updated version of the row, and this new tuple is not likely to land in the same page as the old tuple; so it needs new entries in all the indexes. The old index entries must remain until they can no longer be the visible version of the row for any database transaction, so both versions of the row will be on the index for a while. -Kevin
Thanks, Kevin. That was my intent - if no column of an index changes in an update then no changes are done on the index. That helps quite a bit - I will try moving the columns which change continuously to their own index - hopefully this willtame the overwhelming IO, since the index will now have a single column. I am seeing too many semwaits or ufs stateswhen running top, which means we are not quite keeping up. A lot of the load has been offloaded to the read only hot standby - thanks for the pointers on the delays. I only had onequery cancelled today (as opposed to many previously), so it is just a matter of fine tuning right now. -----Original Message----- From: Kevin Grittner [mailto:kgrittn@mail.com] Sent: Tuesday, January 22, 2013 2:56 PM To: Benjamin Krajmalnik; pgsql-admin@postgresql.org Subject: Re: [ADMIN] Schema design question as it pertains to performance Benjamin Krajmalnik wrote: > From a performance standpoint, is there a big hit on select > performance if a query ends up utilizing more than one index, taking > into account that an index has been used already to reduce the data > set of potential records, and the secondary index would mostly be used > in the ordering of the result set (such as a last updated time)? That depends on so many variables it is hard to give a simple answer. > I also assume that if no data has changed in an index, nothing is done > when the record is updated as pertains to the particular index - am I > correct in this assumption? No. If the update doesn't affect *any* indexed column, and there is room in the page, it will do a HOT update and can skipall index updates. If any indexed column is updated, it must expire the old tuple and create a new tuple to representthe updated version of the row, and this new tuple is not likely to land in the same page as the old tuple; so itneeds new entries in all the indexes. The old index entries must remain until they can no longer be the visible versionof the row for any database transaction, so both versions of the row will be on the index for a while. -Kevin
Benjamin Krajmalnik wrote: > Kevin Grittner wrote: >> Benjamin Krajmalnik wrote: >>> I also assume that if no data has changed in an index, nothing >>> is done when the record is updated as pertains to the >>> particular index - am I correct in this assumption? >> >> No. [...] If any indexed column is updated, [...] it needs new >> entries in all the indexes. > That was my intent - if no column of an index changes in an > update then no changes are done on the index. I don't think you understood -- if any column referenced by any index is updated, then all indexes must get a new entry for that row whether or not they include that column. The new version of the row will be at a new location, so new entries are needed in all indexes to point to the new location. I did *not* say that only indexes referencing the updated column need a new entry. -Kevin
Thanks for the clarification. Now I understand - it is an all or nothing due to required entries for each row version onthe main table. I will have to see if removing the constantly changing value from the indices will affect the performance of our selects. -----Original Message----- From: Kevin Grittner [mailto:kgrittn@mail.com] Sent: Tuesday, January 22, 2013 5:23 PM To: Benjamin Krajmalnik; pgsql-admin@postgresql.org Subject: RE: [ADMIN] Schema design question as it pertains to performance Benjamin Krajmalnik wrote: > Kevin Grittner wrote: >> Benjamin Krajmalnik wrote: >>> I also assume that if no data has changed in an index, nothing is >>> done when the record is updated as pertains to the particular index >>> - am I correct in this assumption? >> >> No. [...] If any indexed column is updated, [...] it needs new >> entries in all the indexes. > That was my intent - if no column of an index changes in an update > then no changes are done on the index. I don't think you understood -- if any column referenced by any index is updated, then all indexes must get a new entry forthat row whether or not they include that column. The new version of the row will be at a new location, so new entriesare needed in all indexes to point to the new location. I did *not* say that only indexes referencing the updated column need a new entry. -Kevin