Re: crash-safe visibility map, take three - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: crash-safe visibility map, take three |
Date | |
Msg-id | 1291333025.30909.69.camel@jdavis-ux.asterdata.local Whole thread Raw |
In response to | Re: crash-safe visibility map, take three (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: crash-safe visibility map, take three
|
List | pgsql-hackers |
On Thu, 2010-12-02 at 17:00 -0500, Robert Haas wrote: > I'm not really convinced that this problem is confined to bulk > loading. Every INSERT or UPDATE results in a new tuple that may need > hit bits set and eventually to be frozen. A bulk load is just a time > when you do lots of inserts all at once; it seems to me that a large > update would cause all the same problems, plus bloat. A big UPDATE does a lot of work, I don't see any near-term approach for solving that. Avoiding WAL (and full page writes) for the updates to PD_ALL_VISIBLE, hint bits, VM bits, etc., would probably be the least of my concerns in that case. > The triple I/O > problem exists for small transactions as well (and isn't desirable > there either); it's just less noticeable because the second and third > writes are, like the first one, small. Bulk loading poses some unique challenges because there is no opportunity to set PD_ALL_VISIBLE or hint bits before the loading is complete; and by that time, many checkpoints will have already happened, and the pages have already hit disk. That means we need to re-read them, modify them, and write them again (plus WAL, if we were following the rules). Small transactions don't suffer from the same problems. They generally begin and end without an intervening checkpoint. That means that you have an opportunity to set PD_ALL_VISIBLE or hint bits before the checkpoint happens, thus avoiding unnecessary extra writes. Additionally, small transaction workloads will generally have, to some approximation, some working set of pages. So, even if you do a read of a single tuple, write PD_ALL_VISIBLE and hint bits (thus dirtying the page), there is a reasonable chance that someone will come by later and do an insert/update/delete (thus forcing WAL anyway). And if the small transaction workload is completely random and you aren't touching the same pages between checkpoints, then setting hint bits one-at-a-time is not a good strategy anyway. It would be much better to do it in bulk with a VACUUM. And if VACUUM does anything significant to a page, it's going to WAL anyway. I'm having trouble seeing a case other than bulk-loading which causes a real problem. Maybe a small-transaction workload with a few long-running transactions? Even that doesn't seem so bad. > It seems to me that a COPY command executed in a transaction with no > other open snapshots writing to a table created or truncated within > the same transaction should be able to write frozen tuples from the > get-go, regardless of anything else we do. Well, some transaction might pick up a snapshot between the time you begin the copy and the time it commits. We'd need to prevent such a transaction from actually reading the table. > I don't think it would be appropriate to hold off > making the visibility map crash-safe, on the off chance that our > design for so doing might complicate something else we want to do > later. I'm not suggesting we hold off on it at all. To the contrary, I'm suggesting that we simply log updates of PD_ALL_VISIBLE as well as VM bits, at least until a performance problem presents itself. That will _simplify_ the design. Then, when a performance problem does present itself for a certain use case, we can see how to fix it. If many cases are affected, then we might choose one of these more creative solutions that breaks the rules in controlled ways, understanding the trade-offs. If only bulk loading is affected, we might choose to address that case directly. Regards,Jeff Davis
pgsql-hackers by date: