Thread: HOT updates in index-less tables
If a table has no indexes, we will always decide that any same-page update operation is a HOT update, since obviously it isn't modifying any indexed columns. But is there any benefit to doing so? I don't see one offhand, and it has a downside: we're very likely to encounter broken HOT chains if an index is created later. That leads to the sort of unexpected behavior exhibited here: http://archives.postgresql.org/pgsql-performance/2010-11/msg00216.php I'm thinking maybe HeapSatisfiesHOTUpdate should be changed so that it always returns false if the relation has no indexes, which could be checked cheaply via relation->rd_rel->relhasindex. regards, tom lane
On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote: > If a table has no indexes, we will always decide that any same-page > update operation is a HOT update, since obviously it isn't modifying > any indexed columns. But is there any benefit to doing so? If we do the in-page "mini vacuum" even without HOT, then there should be no benefit from index-less HOT updates. If we don't try the mini vacuum in this case, then some current behaviuors could become much worse, say a table with one row and load of updates. > I don't > see one offhand, and it has a downside: we're very likely to > encounter broken HOT chains if an index is created later. That leads > to the sort of unexpected behavior exhibited here: > http://archives.postgresql.org/pgsql-performance/2010-11/msg00216.php > > I'm thinking maybe HeapSatisfiesHOTUpdate should be changed so that it > always returns false if the relation has no indexes, which could be > checked cheaply via relation->rd_rel->relhasindex. > > regards, tom lane > -- ------- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/
Hannu Krosing <hannu@2ndQuadrant.com> writes: > On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote: >> If a table has no indexes, we will always decide that any same-page >> update operation is a HOT update, since obviously it isn't modifying >> any indexed columns. But is there any benefit to doing so? > If we do the in-page "mini vacuum" even without HOT, then there should > be no benefit from index-less HOT updates. AFAICS we do: heap_update marks the page as prunable whether it's a HOT update or not. The only difference between treating the update as HOT vs not-HOT is that if there was more than one HOT update, the intermediate tuples could be completely reclaimed by page pruning (ie, their line pointers go away too). With not-HOT updates, the intermediate line pointers would have to remain in DEAD state until vacuum, since page pruning wouldn't know if there were index entries pointing at them. But that seems like a pretty tiny penalty. regards, tom lane
On Sat, 2010-11-13 at 12:13 -0500, Tom Lane wrote: > Hannu Krosing <hannu@2ndQuadrant.com> writes: > > On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote: > >> If a table has no indexes, we will always decide that any same-page > >> update operation is a HOT update, since obviously it isn't modifying > >> any indexed columns. But is there any benefit to doing so? > > > If we do the in-page "mini vacuum" even without HOT, then there should > > be no benefit from index-less HOT updates. > > AFAICS we do: heap_update marks the page as prunable whether it's a HOT > update or not. The only difference between treating the update as HOT vs > not-HOT is that if there was more than one HOT update, the intermediate > tuples could be completely reclaimed by page pruning (ie, their line > pointers go away too). With not-HOT updates, the intermediate line > pointers would have to remain in DEAD state until vacuum, How hard would it be to make the pruning logic be aware of there being no indexes and thus no possibility of index entries pointing at any tuple ? > since page > pruning wouldn't know if there were index entries pointing at them. > But that seems like a pretty tiny penalty. > > regards, tom lane -- ------- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/
Hannu Krosing <hannu@2ndQuadrant.com> writes: > On Sat, 2010-11-13 at 12:13 -0500, Tom Lane wrote: >> AFAICS we do: heap_update marks the page as prunable whether it's a HOT >> update or not. The only difference between treating the update as HOT vs >> not-HOT is that if there was more than one HOT update, the intermediate >> tuples could be completely reclaimed by page pruning (ie, their line >> pointers go away too). With not-HOT updates, the intermediate line >> pointers would have to remain in DEAD state until vacuum, > How hard would it be to make the pruning logic be aware of there being > no indexes and thus no possibility of index entries pointing at any > tuple ? I think it's problematic, because heap_page_prune can be executed with only AccessShareLock on the table, which means there's a race condition against concurrent CREATE INDEX. You could look at relhasindex easily enough, but that doesn't prove there's not a CREATE INDEX in progress. regards, tom lane
On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hannu Krosing <hannu@2ndQuadrant.com> writes: >> On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote: >>> If a table has no indexes, we will always decide that any same-page >>> update operation is a HOT update, since obviously it isn't modifying >>> any indexed columns. But is there any benefit to doing so? > >> If we do the in-page "mini vacuum" even without HOT, then there should >> be no benefit from index-less HOT updates. > > AFAICS we do: heap_update marks the page as prunable whether it's a HOT > update or not. The only difference between treating the update as HOT vs > not-HOT is that if there was more than one HOT update, the intermediate > tuples could be completely reclaimed by page pruning (ie, their line > pointers go away too). With not-HOT updates, the intermediate line > pointers would have to remain in DEAD state until vacuum, since page > pruning wouldn't know if there were index entries pointing at them. > But that seems like a pretty tiny penalty. I'm not at all convinced that's a tiny penalty. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14.11.2010 00:29, Robert Haas wrote: > On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Hannu Krosing<hannu@2ndQuadrant.com> writes: >>> On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote: >>>> If a table has no indexes, we will always decide that any same-page >>>> update operation is a HOT update, since obviously it isn't modifying >>>> any indexed columns. But is there any benefit to doing so? >> >>> If we do the in-page "mini vacuum" even without HOT, then there should >>> be no benefit from index-less HOT updates. >> >> AFAICS we do: heap_update marks the page as prunable whether it's a HOT >> update or not. The only difference between treating the update as HOT vs >> not-HOT is that if there was more than one HOT update, the intermediate >> tuples could be completely reclaimed by page pruning (ie, their line >> pointers go away too). With not-HOT updates, the intermediate line >> pointers would have to remain in DEAD state until vacuum, since page >> pruning wouldn't know if there were index entries pointing at them. >> But that seems like a pretty tiny penalty. > > I'm not at all convinced that's a tiny penalty. Me neither. It's a tiny penalty when you consider one update, but if you repeatedly update the same tuple, you accumulate dead line pointers until the next real vacuum runs. With HOT updates, you reach a steady state where page pruning is all you need. Then again, if you're repeatedly updating a row in a table with no indexes, presumably it's a very small table or you would create an index on it. And frequently autovacuuming a small index is quite cheap too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sun, Nov 14, 2010 at 1:12 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 14.11.2010 00:29, Robert Haas wrote: >> >> On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> >>> Hannu Krosing<hannu@2ndQuadrant.com> writes: >>>> >>>> On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote: >>>>> >>>>> If a table has no indexes, we will always decide that any same-page >>>>> update operation is a HOT update, since obviously it isn't modifying >>>>> any indexed columns. But is there any benefit to doing so? >>> >>>> If we do the in-page "mini vacuum" even without HOT, then there should >>>> be no benefit from index-less HOT updates. >>> >>> AFAICS we do: heap_update marks the page as prunable whether it's a HOT >>> update or not. The only difference between treating the update as HOT vs >>> not-HOT is that if there was more than one HOT update, the intermediate >>> tuples could be completely reclaimed by page pruning (ie, their line >>> pointers go away too). With not-HOT updates, the intermediate line >>> pointers would have to remain in DEAD state until vacuum, since page >>> pruning wouldn't know if there were index entries pointing at them. >>> But that seems like a pretty tiny penalty. >> >> I'm not at all convinced that's a tiny penalty. > > Me neither. It's a tiny penalty when you consider one update, but if you > repeatedly update the same tuple, you accumulate dead line pointers until > the next real vacuum runs. With HOT updates, you reach a steady state where > page pruning is all you need. Then again, if you're repeatedly updating a > row in a table with no indexes, presumably it's a very small table or you > would create an index on it. And frequently autovacuuming a small index is > quite cheap too. The case here is when you have say a control table that is managing a gapless sequence, or a materialization table with a very small number of records. These type of tables get updated very frequently, perhaps in every transaction. People without detailed implementation knowledge of postgresql might assume that leaving an index off the table is faster in these situations. The danger here is that if autovacuum is stalled for whatever reason, you get exponentially bad behavior as the table gets stuffed with bad records. index-less hot was put in intentionally. As autovacuum gets smarter and smarter, the reasoning to do this get weaker. merlin