Re: Frequent Update Project: Design Overview of HOT Updates - Mailing list pgsql-hackers
From: Zeugswetter Andreas ADI SD
Subject: Re: Frequent Update Project: Design Overview of HOT Updates
Msg-id: E1539E0ED7043848906A8FF995BDA579017C0928@m0143.s-mxs.net
In response to: Re: Frequent Update Project: Design Overview of HOT Updates (Gregory Stark <stark@enterprisedb.com>)
Responses: Re: Frequent Update Project: Design Overview of HOT Updates
List: pgsql-hackers
> > 1. It doubles the IO (original page + hot page), if the new row would
> > have fit into the original page.
>
> That's an awfully big IF there. Even if you use a fillfactor
> of 50% in which case you're paying a 100% performance penalty

I don't see where the 50% comes from. That is only needed if you update all rows on the page, and within a timeframe that does not allow reuse of other outdated tuples.

> > 4. although at first it might seem so I see no advantage for vacuum
> > with overflow
>
> The main problem with vacuum now is that it must scan the
> entire table (and the entire index) even if only a few
> records are garbage. If we isolate the garbage in a separate
> area then vacuum doesn't have to scan unrelated tuples.
>
> I'm not sure this really solves that problem because there
> are still DELETEs to consider but it does remove one factor
> that exacerbates it unnecessarily.

Yea, so you still need to vacuum the large table regularly.

> I think the vision is that the overflow table would never be
> very large because it can be vacuumed very aggressively. It
> has only tuples that are busy and will need vacuuming as soon
> as a transaction ends. Unlike the main table which is mostly
> tuples that don't need vacuuming.

Ok, but then you have to provide an extra vacuum that does only that (and it randomly touches heap pages, and only does partial work there).

> > 5. the size reduction of heap is imho moot because you trade it for a
> > growing overflow
> > (size reduction only comes from reusing dead tuples and not adding
> > index tuples --> SITC)
>
> I think you're comparing the wrong thing.

I mean, unless you do individually vacuum the overflow more frequently.

> Size isn't a problem in itself, size is a problem because it causes extra
> i/o.

Yes, and I state that at all possible occasions :-) On-disk size is a problem, really.

> So a heap that's double the necessary size takes twice as
> long as necessary to scan.
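[As an illustration of the scan-cost arithmetic being debated above: all numbers below are made up for the example, not taken from the thread. With a fillfactor of 50%, every page holds half the rows it could, so a sequential scan over the same live rows reads twice the pages.]

```python
# Back-of-the-envelope sketch of the fillfactor scan penalty.
# live_rows and rows_per_full_page are hypothetical numbers.

def pages_to_scan(live_rows, rows_per_full_page, fillfactor):
    """Pages a sequential scan must read when every page is only
    `fillfactor` percent filled with live rows."""
    rows_per_page = rows_per_full_page * fillfactor // 100
    return -(-live_rows // rows_per_page)  # ceiling division

full = pages_to_scan(1_000_000, 100, 100)  # fully packed pages
half = pages_to_scan(1_000_000, 100, 50)   # fillfactor 50%
print(full, half, f"{(half - full) / full:.0%}")  # 10000 20000 100%
```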
> The fact that the overflow tables
> are taking up space isn't interesting if they don't have to
> be scanned.

The overflow does have to be read for each seq scan. And it was stated that it would be accessed with random access (following the tuple chain). But maybe we can read the overflow the same as if it were an additional segment file?

> Hitting the overflow tables should be quite rare, it only
> comes into play when looking at concurrently updated tuples.
> It certainly happens but most tuples in the table will be
> committed and not being concurrently updated by anyone else.

The first update moves the row to overflow; only the 2nd-next update might be able to pull it back. So on average you would have at least 66% of all rows updated since the last vacuum in the overflow.

The problem with needing very frequent vacuums is that you might not be able to do any work because of long transactions.

Andreas
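[A toy model of where the 66% figure could come from, as I read the argument above (the model and the 50% re-update chance are my assumptions, not from the thread): the first update pushes a row's new version to the overflow, each following update flips it back and forth, so the current version sits in the overflow iff the row's update count is odd. If each updated row has a 50% chance of being updated again before the next vacuum, the odd-count share works out to 2/3.]

```python
import random

random.seed(1)

def ends_in_overflow(p_again=0.5):
    """One updated row: the 1st update goes to the overflow, the 2nd
    pulls it back, the 3rd pushes it out again, and so on.  The row's
    current version is in the overflow iff its update count is odd."""
    updates = 1                       # first update -> overflow
    while random.random() < p_again:  # chance of yet another update
        updates += 1
    return updates % 2 == 1

trials = 200_000
frac = sum(ends_in_overflow() for _ in range(trials)) / trials
print(f"{frac:.3f}")  # close to 2/3
```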