Thread: index-only scans vs. Hot Standby, round two
Currently, we have a problem with index-only scans in Hot Standby mode: the xmin horizon on the standby might lag the master, and thus an index-only scan might mistakenly conclude that no heap fetch is needed when in fact it is. I suggested that we handle this by suppressing generation of index-only scan plans in Hot Standby mode, but Simon, Noah, and Dimitri were arguing that we should instead do the following, which is now on the open items list:

* Make XLOG_HEAP2_VISIBLE records generate recovery snapshot conflicts
  so that IndexOnlyScans work on Hot Standby

Ideally, this would also allow us to remove the following hack from heapgetpage(), which would improve sequential-scan performance for Hot Standby:

    /*
     * If the all-visible flag indicates that all tuples on the page are
     * visible to everyone, we can skip the per-tuple visibility tests. But
     * not in hot standby mode. A tuple that's already visible to all
     * transactions in the master might still be invisible to a read-only
     * transaction in the standby.
     */
    all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;

As far as I can see, to make this work, we'd need to have lazy_scan_heap() compute the latest xmin on any potentially all-visible page and treat that as a cutoff XID, cancelling queries with snapshots whose xmins equal or precede that value, just as we currently do when freezing tuples (cf. xlog_heap_freeze). There are two things to worry about: correctness, and more query cancellations.

In the department of query cancellations, I believe Noah argued previously that this wasn't really going to cause a problem. And, indeed, if the master has a mix of inserts, updates, and deletes, then it seems likely that any recovery conflicts generated this way would be hitting about the same place in the XID space as the ones caused by pruning away dead tuples. What will be different, if we go this route, is that an insert-only master, which right now only generates conflicts at freezing time, will instead generate conflicts much sooner, as soon as the relation is vacuumed. I do not use Hot Standby enough to know whether this is a problem, and I'm not trying to block this approach, but I do want to make sure that everyone agrees that it's acceptable before we go do it, because inevitably somebody is going to get bit.

In making our decision, I think we should keep in mind that we are currently pretty laid-back about marking pages all-visible, and that's probably something we're going to want to change over time: for example, we recently discussed the fact that a page with one dead tuple in it currently needs *two* vacuums to become all-visible, because only the first vacuum pass is smart enough to mark things all-visible, and the first heap pass only truncates the dead tuple to a dead line pointer, so the page isn't all-visible at that point. When we fix that, which I think we're almost certainly going to want to do at some point, these conflicts will occur that much sooner. I think we will likely also want to consider marking things all-visible on the fly at some point in the future; for example, if we pull a buffer in for an index-only scan and dirty it by setting a hint bit, we might want to take the plunge and mark it all-visible before evicting it, to save effort the next time. Again, not trying to dissuade anyone, just making sure we've thought through it enough to avoid being unhappy later.
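(To make the proposed conflict generation concrete, here is a rough sketch of what the redo side might look like. ResolveRecoveryConflictWithSnapshot() and InHotStandby are real, but the cutoff_xid field in xl_heap_visible is an assumption of this sketch, not existing code.)

    /* Sketch only: conflict generation in the XLOG_HEAP2_VISIBLE redo
     * routine.  The cutoff_xid field is hypothetical. */
    static void
    heap_xlog_visible(XLogRecPtr lsn, XLogRecord *record)
    {
        xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);

        /*
         * In Hot Standby, cancel any queries whose snapshots might still
         * need to see some tuple on this page as in-progress.  cutoff_xid
         * would be the newest xmin on the page, computed by
         * lazy_scan_heap() before it set the all-visible bit.
         */
        if (InHotStandby)
            ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
                                                xlrec->node);

        /* ... then replay the visibility map change itself, as today ... */
    }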
It's worth noting that none of this will normally matter if hot_standby_feedback=on, so part of the analysis here may depend on how many people we think have flipped that switch.

As to correctness, after further review of lazy_scan_heap(), I think there are some residual bugs here that need to be fixed whatever we decide to do about the Hot Standby case:

1. I noticed that when we do PageSetAllVisible() today we follow it up with SetBufferCommitInfoNeedsSave(). PD_ALL_VISIBLE is not merely a hint, so I think these should be changed to MarkBufferDirty(), which shouldn't make much difference given current code, but proposals to handle hint changes differently than non-hint changes abound, so it's probably not good to rely on those meaning the same thing forever.

2. More seriously, lazy_scan_heap() releases the buffer lock before setting the all-visible bit on the heap. This looks just plain wrong (and it's my fault, to be clear). Somebody else can come along after we've released the buffer lock and mark the page not-all-visible. That concurrent process will check the visibility map bit, find it already clear, and go on its merry way. Then, we set the visibility map bit and go on our merry way. Now we have a not-all-visible page with the all-visible bit set in the visibility map. Oops. I think this probably needs to be rewritten so that lazy_scan_heap() acquires a pin on the visibility map page before locking the buffer for cleanup and holds onto that pin until either we exit the range of heap pages covered by that visibility map page or trigger index vac due to maintenance_work_mem exhaustion. With that infrastructure in place, we can set the visibility map bit at the same time we set the page-level bit without risking holding the buffer lock across a buffer I/O (we might have to hold the buffer lock across a WAL flush in the worst case, but that hazard exists in a number of other places as well and fixing it here is well out of scope).

Now, suppose we fix the above bugs and also add logic to generate query conflicts as described above. How far can the standby now trust the visibility information? Setting the visibility map bit is a fully WAL-logged operation, so anyone for whom seeing the bit as set would potentially be a problem is certain to be killed before they get the chance to become confused. And when we first open for read-only connections after a new base backup, the initial snapshot had better be able to see all XIDs which have committed prior to that point, and only such things should be marked all-visible, so AFAICT the door is nailed shut pretty tight here.

I am a little less certain about the page-level bit. On the master, it's possible for the PD_ALL_VISIBLE bit to make it to disk before the WAL record that sets the visibility map bit; the LSN interlock only protects the visibility map page itself, as part of a complicated strategy to avoid emitting FPWs for the entire heap when we vacuum an insert-only table. On the standby, those same bits are going to get set either when the xlog records covering the setting of the visibility map bit get replayed (which would be OK, in the sense that the page-level bit would be trustworthy in this case), or when a copy of the buffer makes it from master to standby by some other means (which is the potentially problematic case). This could happen either as part of a base backup, or via a FPW. I don't think the base backup case is a problem, for the same reasons that it's OK for the visibility map bit case.
If an FPW turns the bit on, then somehow the page-level bit got set without the visibility map bit getting set. I think the only way for that to happen is:

1. vacuum on master sets the page-level bit and the visibility map bit
2. the heap page with the bit is written to disk BEFORE the WAL entry generated in (1) gets flushed; this is allowable, since it's not an error for the page-level bit to be set while the visibility-map bit is unset, only the other way around
3. crash (still before the WAL entry is flushed)
4. some operation on the master emits an FPW for the page without first rendering it not-all-visible

At present, I'm not sure there's any real problem here, since normally an all-visible heap page is only going to get another WAL entry if somebody does an insert, update, or delete on it, in which case the visibility map bit is going to get cleared anyway. Freezing the page might emit a new FPW, but that's going to generate conflicts anyway, so no problem there. So I think there's no real problem here, but it doesn't seem totally future-proof - any future type of WAL record that might modify the page without rendering it not-all-visible would create a very subtle hazard for Hot Standby. We could dodge the whole issue by leaving the hack in heapgetpage() intact and just be happy with making index-only scans work, or maybe somebody has a more clever idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
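(The lazy_scan_heap() reordering proposed above might look like this in outline. This is heavily simplified and the signatures are approximate; the real loop in vacuumlazy.c does far more. visibilitymap_pin(), LockBufferForCleanup(), PageSetAllVisible(), MarkBufferDirty(), and visibilitymap_set() are all real functions.)

    Buffer      vmbuffer = InvalidBuffer;
    XLogRecPtr  recptr;             /* LSN of the WAL record; emission elided */

    /* Pin the VM page covering blkno first; this can do I/O, but we are
     * not yet holding any buffer lock. */
    visibilitymap_pin(onerel, blkno, &vmbuffer);

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                             RBM_NORMAL, vac_strategy);
    LockBufferForCleanup(buf);

    /* ... prune the page and compute all_visible ... */

    if (all_visible && !PageIsAllVisible(page))
    {
        PageSetAllVisible(page);
        MarkBufferDirty(buf);       /* not SetBufferCommitInfoNeedsSave() */

        /* Set the VM bit while still holding the heap buffer lock, so
         * nobody can clear PD_ALL_VISIBLE between the two steps. */
        visibilitymap_set(onerel, blkno, recptr, &vmbuffer);
    }

    UnlockReleaseBuffer(buf);
    /* vmbuffer stays pinned until we move past the heap range it covers,
     * or pause for an index-vacuum cycle. */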
On Fri, Apr 13, 2012 at 5:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Currently, we have a problem with index-only scans in Hot Standby
> mode: the xmin horizon on the standby might lag the master, and thus
> an index-only scan might mistakenly conclude that no heap fetch is
> needed when in fact it is. I suggested that we handle this by
> suppressing generation of index-only scan plans in Hot Standby mode,
> but Simon, Noah, and Dimitri were arguing that we should instead do
> the following, which is now on the open items list:
>
> * Make XLOG_HEAP2_VISIBLE records generate recovery snapshot conflicts
> so that IndexOnlyScans work on Hot Standby

... <snip> very long email </snip>

Luckily it's much simpler than all of that suggests. It'll take a few hours for me to write a short reply but it's Sunday today, so that will happen later.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
> In the department of query cancellations, I believe Noah argued
> previously that this wasn't really going to cause a problem. And,
> indeed, if the master has a mix of inserts, updates, and deletes, then
> it seems likely that any recovery conflicts generated this way would
> be hitting about the same place in the XID space as the ones caused by
> pruning away dead tuples. What will be different, if we go this
> route, is that an insert-only master, which right now only generates
> conflicts at freezing time, will instead generate conflicts much
> sooner, as soon as the relation is vacuumed. I do not use Hot Standby
> enough to know whether this is a problem, and I'm not trying to block
> this approach, but I do want to make sure that everyone agrees that
> it's acceptable before we go do it, because inevitably somebody is
> going to get bit.

Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible. A standby with the GUC "off" would ignore all-visible indicators and also decline to generate conflicts when flipping them on.

> As to correctness, after further review of lazy_scan_heap(), I think
> there are some residual bugs here that need to be fixed whatever we
> decide to do about the Hot Standby case:
>
> 1. I noticed that when we do PageSetAllVisible() today we follow it up
> with SetBufferCommitInfoNeedsSave(). PD_ALL_VISIBLE is not merely a
> hint, so I think these should be changed to MarkBufferDirty(), which
> shouldn't make much difference given current code, but proposals to
> handle hint changes differently than non-hint changes abound, so it's
> probably not good to rely on those meaning the same thing forever.

Do you refer to PD_ALL_VISIBLE as "not merely a hint" due to the requirement to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a positive visibility map bit? That is to say, PD_ALL_VISIBLE is fully a hint in its role as a witness of tuple visibility, but it is not a hint in its role as a witness of the visibility map status? In any event, I agree that those call sites should use MarkBufferDirty().

The large comment in SetBufferCommitInfoNeedsSave() seems incorrect. On systems with weaker memory ordering, the FlushBuffer() process's clearing of BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave() process until some time after the former retrieved buffer contents from shared memory. Memory barriers could make the comment true, but we should probably just update the comment to explain that a race condition may eat the update entirely.

Incidentally, is there a good reason for MarkBufferDirty() to increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave() not to do so? Looks like an oversight.

> 2. More seriously, lazy_scan_heap() releases the buffer lock before
> setting the all-visible bit on the heap. This looks just plain wrong
> (and it's my fault, to be clear). Somebody else can come along after
> we've released the buffer lock and mark the page not-all-visible.
> That concurrent process will check the visibility map bit, find it
> already clear, and go on its merry way. Then, we set the visibility
> map bit and go on our merry way. Now we have a not-all-visible page
> with the all-visible bit set in the visibility map. Oops.

Hmm, indeed. This deserves its own open item.
> I think
> this probably needs to be rewritten so that lazy_scan_heap() acquires
> a pin on the visibility map page before locking the buffer for cleanup
> and holds onto that pin until either we exit the range of heap pages
> covered by that visibility map page or trigger index vac due to
> maintenance_work_mem exhaustion. With that infrastructure in place,
> we can set the visibility map bit at the same time we set the
> page-level bit without risking holding the buffer lock across a buffer
> I/O (we might have to hold the buffer lock across a WAL flush in the
> worst case, but that hazard exists in a number of other places as well
> and fixing it here is well out of scope).

Looks reasonable at a glance.

> 1. vacuum on master sets the page-level bit and the visibility map bit
> 2. the heap page with the bit is written to disk BEFORE the WAL entry
> generated in (1) gets flushed; this is allowable, since it's not an
> error for the page-level bit to be set while the visibility-map bit is
> unset, only the other way around
> 3. crash (still before the WAL entry is flushed)
> 4. some operation on the master emits an FPW for the page without
> first rendering it not-all-visible
>
> At present, I'm not sure there's any real problem here, since normally
> an all-visible heap page is only going to get another WAL entry if
> somebody does an insert, update, or delete on it, in which case the
> visibility map bit is going to get cleared anyway. Freezing the page
> might emit a new FPW, but that's going to generate conflicts anyway,
> so no problem there. So I think there's no real problem here, but it
> doesn't seem totally future-proof - any future type of WAL record that
> might modify the page without rendering it not-all-visible would
> create a very subtle hazard for Hot Standby. We could dodge the whole
> issue by leaving the hack in heapgetpage() intact and just be happy
> with making index-only scans work, or maybe somebody has a more clever
> idea.

Good point. We could finagle things so the standby notices end-of-recovery checkpoints and blocks the optimization for older snapshots. For example, maintain an integer count of end-of-recovery checkpoints seen and store that in each Snapshot instead of takenDuringRecovery; use 0 on a master. When the global value is greater than the snapshot's value, disable the optimization for that snapshot. I don't know whether the optimization is powerful enough to justify that level of trouble, but it's an option to consider.

For a different approach, targeting the fragility, we could add assertions to detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a full-page image.

Thanks,
nm
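(A minimal sketch of that counter idea follows; every name in it is hypothetical, since nothing like this exists in the tree.)

    /* Hypothetical shared-memory counter, bumped by the startup process
     * each time it replays an end-of-recovery checkpoint. */
    typedef struct RecoveryCounters
    {
        uint32      endOfRecoveryCheckpoints;   /* always 0 on a master */
    } RecoveryCounters;

    extern RecoveryCounters *recoveryCounters;

    /* At snapshot-build time, stamp the snapshot with the current count
     * (replacing the boolean takenDuringRecovery):
     *     snap->endOfRecoveryCount =
     *         recoveryCounters->endOfRecoveryCheckpoints;
     */

    /* The all-visible optimization stays usable only while no
     * end-of-recovery checkpoint has intervened since the snapshot was
     * taken (the counter only grows, so equality suffices). */
    static bool
    SnapshotTrustsAllVisible(uint32 snapshotCount)
    {
        return snapshotCount == recoveryCounters->endOfRecoveryCheckpoints;
    }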
On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch <noah@leadboat.com> wrote:
> On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
>> In the department of query cancellations, I believe Noah argued
>> previously that this wasn't really going to cause a problem. And,
>> indeed, if the master has a mix of inserts, updates, and deletes, then
>> it seems likely that any recovery conflicts generated this way would
>> be hitting about the same place in the XID space as the ones caused by
>> pruning away dead tuples. What will be different, if we go this
>> route, is that an insert-only master, which right now only generates
>> conflicts at freezing time, will instead generate conflicts much
>> sooner, as soon as the relation is vacuumed. I do not use Hot Standby
>> enough to know whether this is a problem, and I'm not trying to block
>> this approach, but I do want to make sure that everyone agrees that
>> it's acceptable before we go do it, because inevitably somebody is
>> going to get bit.
>
> Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
> A standby with the GUC "off" would ignore all-visible indicators and also
> decline to generate conflicts when flipping them on.

I'm against adding a new GUC, especially for that fairly thin reason.

>> 1. vacuum on master sets the page-level bit and the visibility map bit
>> 2. the heap page with the bit is written to disk BEFORE the WAL entry
>> generated in (1) gets flushed; this is allowable, since it's not an
>> error for the page-level bit to be set while the visibility-map bit is
>> unset, only the other way around
>> 3. crash (still before the WAL entry is flushed)
>> 4. some operation on the master emits an FPW for the page without
>> first rendering it not-all-visible
>>
>> At present, I'm not sure there's any real problem here, since normally
>> an all-visible heap page is only going to get another WAL entry if
>> somebody does an insert, update, or delete on it, in which case the
>> visibility map bit is going to get cleared anyway. Freezing the page
>> might emit a new FPW, but that's going to generate conflicts anyway,
>> so no problem there. So I think there's no real problem here, but it
>> doesn't seem totally future-proof - any future type of WAL record that
>> might modify the page without rendering it not-all-visible would
>> create a very subtle hazard for Hot Standby. We could dodge the whole
>> issue by leaving the hack in heapgetpage() intact and just be happy
>> with making index-only scans work, or maybe somebody has a more clever
>> idea.
>
> Good point. We could finagle things so the standby notices end-of-recovery
> checkpoints and blocks the optimization for older snapshots. For example,
> maintain an integer count of end-of-recovery checkpoints seen and store that
> in each Snapshot instead of takenDuringRecovery; use 0 on a master. When the
> global value is greater than the snapshot's value, disable the optimization
> for that snapshot. I don't know whether the optimization is powerful enough
> to justify that level of trouble, but it's an option to consider.
>
> For a different approach, targeting the fragility, we could add assertions to
> detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
> full-page image.

We don't need a future-proof solution, especially not at this stage of the release cycle. Every time you add a WAL record, we need to consider the possible conflict impact. We just need a simple and clear solution. I'll work on that.
--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16.04.2012 10:38, Simon Riggs wrote:
> On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch <noah@leadboat.com> wrote:
>> On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
>>> In the department of query cancellations, I believe Noah argued
>>> previously that this wasn't really going to cause a problem. And,
>>> indeed, if the master has a mix of inserts, updates, and deletes, then
>>> it seems likely that any recovery conflicts generated this way would
>>> be hitting about the same place in the XID space as the ones caused by
>>> pruning away dead tuples. What will be different, if we go this
>>> route, is that an insert-only master, which right now only generates
>>> conflicts at freezing time, will instead generate conflicts much
>>> sooner, as soon as the relation is vacuumed. I do not use Hot Standby
>>> enough to know whether this is a problem, and I'm not trying to block
>>> this approach, but I do want to make sure that everyone agrees that
>>> it's acceptable before we go do it, because inevitably somebody is
>>> going to get bit.
>>
>> Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
>> A standby with the GUC "off" would ignore all-visible indicators and also
>> decline to generate conflicts when flipping them on.
>
> I'm against adding a new GUC, especially for that fairly thin reason.

Same here. Can we have a "soft" hot standby conflict that doesn't kill the query, but disables index-only-scans?

In the long run, perhaps we need to store XIDs in the visibility map instead of just a bit. If we only stored one xid per 100 pages or something like that, the storage overhead would not be much higher than what we have at the moment. But that's obviously not going to happen for 9.2...

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
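(For scale, the arithmetic here is mine, not from the thread: one 4-byte XID per 100 heap pages costs about 0.32 bits per heap page, versus the one bit per page the map uses today. A purely illustrative layout:)

    /* Hypothetical XID-bearing visibility map entry, one per chunk of
     * heap pages.  A snapshot may treat every page in the chunk as
     * all-visible iff the chunk's newest xmin precedes the snapshot's
     * xmin. */
    #define VM_CHUNK_HEAP_PAGES 100

    typedef struct VMChunkEntry
    {
        TransactionId newest_xmin;  /* 4 bytes per 100 pages ~ 0.32 bit/page */
    } VMChunkEntry;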
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Can we have a "soft" hot standby conflict that doesn't kill the query,
> but disables index-only-scans?

Well, there wouldn't be any way for the planner to know whether an index-only scan would be safe or not. I think this would have to look like a run-time fallback. Could it be structured as "return that the page's all-visible bit is not set, instead of failing?" Or am I confused about where the conflict is coming from?

			regards, tom lane
On Mon, Apr 16, 2012 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Can we have a "soft" hot standby conflict that doesn't kill the query,
>> but disables index-only-scans?
>
> Well, there wouldn't be any way for the planner to know whether an
> index-only scan would be safe or not. I think this would have to look
> like a run-time fallback. Could it be structured as "return that the
> page's all-visible bit is not set, instead of failing?" Or am I
> confused about where the conflict is coming from?

The starting point is that HS now offers 4 different mechanisms for avoiding conflicts, probably the best of which is hot standby feedback. So we only need to improve things if those techniques don't work for people. So initially, my attitude is let's throw a conflict and wait for feedback during beta.

If we do need to do something, then introduce the concept of a visibility conflict.

On replay:
If feedback not set, set LSN of visibility conflict on PROCs that conflict, if not already set.

On query:
If feedback not set, check conflict LSN against page; if page is later, check visibility.

In terms of optimisation, I wouldn't expect to have to adjust costs at all. The difference would only show for long-running queries, and typically they don't touch too much new data as a fraction of the total. The cost model for index-only is pretty crude anyhow.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
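(A sketch of how that query-side check might look. PageIsAllVisible(), PageGetLSN(), XLByteLT(), and XLogRecPtrIsInvalid() are real; the visibilityConflictLSN field on PGPROC is purely hypothetical.)

    /* Replay side (sketch): when an XLOG_HEAP2_VISIBLE record conflicts
     * with a backend's snapshot and feedback is off, the startup process
     * would stamp that backend's PGPROC->visibilityConflictLSN
     * (hypothetical field) with the record's LSN, instead of cancelling
     * the query. */

    /* Query side: may this backend trust a page's all-visible bit? */
    static bool
    PageAllVisibleForSnapshot(Page page)
    {
        XLogRecPtr  conflict = MyProc->visibilityConflictLSN;

        /* No visibility conflict recorded against us: fully trustworthy. */
        if (XLogRecPtrIsInvalid(conflict))
            return PageIsAllVisible(page);

        /* Pages last stamped before the conflict point are still safe;
         * later pages fall back to ordinary per-tuple visibility checks. */
        if (XLByteLT(PageGetLSN(page), conflict))
            return PageIsAllVisible(page);

        return false;
    }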
On Mon, Apr 16, 2012 at 4:26 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Can we have a "soft" hot standby conflict that doesn't kill the query, but
> disables index-only-scans?

Yeah, something like that seems possible. For example, suppose the master includes, in each mark-heap-page-all-visible record, the newest XID on the page. On the standby, we keep track of the most recent such XID ever seen in shared memory. After noting that a page is all-visible, the standby cross-checks its snapshot against this XID and does the heap fetch anyway if it's too new. Conceivably this can be done with memory barriers but without locks: we must update the XID in shared memory *before* marking the page all-visible, and we must check the visibility map or page-level bit *before* fetching the XID from shared memory - but the actual reads and writes of the XID are atomic.

Now, if an index-only scan suffers a very high number of these "soft conflicts" and consequently does a lot more heap fetches than expected, performance might suck. But that should be rare, and could be mitigated by turning on hot_standby_feedback. Also, there might be some hit for repeatedly reading that XID from memory.

Alternatively, we could try to forbid index-only scan plans from being created in the first place *only when* the underlying snapshot is too old. But then what happens if a conflict happens later, after the plan has been created but before it's fully executed? At that point it's too late to switch gears, so we'd still need something like the above. And the above might be adequate all by itself, since in practice index-only scans are mostly going to be useful when the data isn't changing all that fast, so most of the queries that would be cancelled by "hard" conflicts wouldn't have actually had a problem anyway.

> In the long run, perhaps we need to store XIDs in the visibility map instead
> of just a bit. If we only stored one xid per 100 pages or something like
> that, the storage overhead would not be much higher than what we have at the
> moment. But that's obviously not going to happen for 9.2...

Well, that would also have the undesirable side effect of greatly reducing the granularity of the map. As it is, updating a tiny fraction of the tuples in the table could result in the entire table ceasing to be all-visible, if you happen to update exactly one tuple per page through the entire heap. Or more generally, updating X% of the rows in the table can cause Y% of the rows in the table to no longer be visible for index-only-scan purposes, where Y >> X. What you're proposing would make that much worse.

I think we're actually going to find that the current system isn't tight enough; my suspicion is that users are going to complain that we're not aggressive enough about marking pages all-visible, because autovac won't kick in until updates+deletes>20%, but (1) index-only scans also care about inserts and (2) by the time we've got 20% dead tuples index-only scans may well be worthless.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
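(A sketch of the barrier discipline described above. pg_write_barrier(), pg_read_barrier(), and TransactionIdPrecedes() are real; SharedRecoveryState and its latestAllVisibleXid field are hypothetical names for this sketch.)

    /* Startup-process side, just before replaying a mark-all-visible
     * change: publish the page's newest XID so that any backend that can
     * see the bit can also see an XID at least this new. */
    static void
    AdvanceLatestAllVisibleXid(TransactionId cutoff_xid)
    {
        SharedRecoveryState->latestAllVisibleXid = cutoff_xid;
        pg_write_barrier();     /* XID store must precede the bit store */
    }

    /* Backend side, after observing the all-visible bit as set: may this
     * snapshot skip the heap fetch? */
    static bool
    AllVisibleTrustedBySnapshot(Snapshot snapshot)
    {
        TransactionId xid;

        pg_read_barrier();      /* bit read must precede the XID read */
        xid = SharedRecoveryState->latestAllVisibleXid;

        /* Trust the bit only if the newest XID any all-visible page could
         * contain is already visible to our snapshot. */
        return TransactionIdPrecedes(xid, snapshot->xmin);
    }

Because only the global maximum is kept, the check is conservative: one recently marked page disables the optimization for every page until the snapshot ages past it.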
On Mon, Apr 16, 2012 at 3:02 AM, Noah Misch <noah@leadboat.com> wrote:
> Do you refer to PD_ALL_VISIBLE as "not merely a hint" due to the requirement
> to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
> positive visibility map bit? That is to say, PD_ALL_VISIBLE is fully a hint
> in its role as a witness of tuple visibility, but it is not a hint in its role
> as a witness of the visibility map status?

Exactly.

> The large comment in SetBufferCommitInfoNeedsSave() seems incorrect. On
> systems with weaker memory ordering, the FlushBuffer() process's clearing of
> BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
> process until some time after the former retrieved buffer contents from shared
> memory.

True.

> Memory barriers could make the comment true, but we should probably
> just update the comment to explain that a race condition may eat the update
> entirely.

Agreed.

> Incidentally, is there a good reason for MarkBufferDirty() to
> increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
> not to do so? Looks like an oversight.

Also agreed.

>> 2. More seriously, lazy_scan_heap() releases the buffer lock before
>> setting the all-visible bit on the heap. This looks just plain wrong
>> (and it's my fault, to be clear). Somebody else can come along after
>> we've released the buffer lock and mark the page not-all-visible.
>> That concurrent process will check the visibility map bit, find it
>> already clear, and go on its merry way. Then, we set the visibility
>> map bit and go on our merry way. Now we have a not-all-visible page
>> with the all-visible bit set in the visibility map. Oops.
>
> Hmm, indeed. This deserves its own open item.

Also agreed.

> Good point. We could finagle things so the standby notices end-of-recovery
> checkpoints and blocks the optimization for older snapshots. For example,
> maintain an integer count of end-of-recovery checkpoints seen and store that
> in each Snapshot instead of takenDuringRecovery; use 0 on a master. When the
> global value is greater than the snapshot's value, disable the optimization
> for that snapshot. I don't know whether the optimization is powerful enough
> to justify that level of trouble, but it's an option to consider.

I suspect not. It seems we can make index-only scans work without doing something like this; it's only the sequential-scan optimization we might lose. But nobody's ever really complained about not having that, to my knowledge.

> For a different approach, targeting the fragility, we could add assertions to
> detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
> full-page image.

The holes are narrow enough that I fear such cases would be detected only on high-velocity production systems, which is not exactly where you want to find out about such problems.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 16, 2012 at 1:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> If we do need to do something, then introduce the concept of a visibility conflict.
>
> On replay:
> If feedback not set, set LSN of visibility conflict on PROCs that
> conflict, if not already set.
>
> On query:
> If feedback not set, check conflict LSN against page; if page is
> later, check visibility.

Hmm, should have read the whole thread before replying. This is similar to what I just proposed in response to Heikki's message, but using LSN in lieu of (or maybe you mean in addition to) XID.

I don't think we can ignore the need to throw conflicts just because hot_standby_feedback is set; there are going to be corner cases, for example, when it's just recently been turned on and the master has already done cleanup, or if the master and standby have recently gotten disconnected for even just a few seconds. But fundamentally we all seem to be converging on some variant of the "soft conflict" idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 16, 2012 at 4:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> But fundamentally we all seem to be converging on some variant of the
> "soft conflict" idea.

So, as a first step, I've committed a patch that just throws a hard conflict. I think we probably want to optimize this further, and I'm going to investigate that next. But it seemed productive to get this much out of the way first, so I did.

In studying this, it strikes me that it would be rather nicer if recovery conflicts could somehow arrange to roll back the active transaction by some means short of a FATAL error. I think there are some protocol issues with doing that, but I still wish we could figure out a way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Apr 26, 2012 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> So, as a first step, I've committed a patch that just throws a hard
> conflict. I think we probably want to optimize this further, and I'm
> going to investigate that next. But it seemed productive to get
> this much out of the way first, so I did.

I've been thinking about this some more. What's worrying me is that a visibility conflict, however we implement it, could be *worse* from the user's point of view than just killing the query. After all, there's a reasonable likelihood that a single visibility map page covers the whole relation (or all the blocks that the user is interested in), so any sort of conflict is basically going to turn the index-only scan into an index scan plus some extra overhead. And if the planner had known that the user was going to get an index-only scan rather than just a plain index scan, it might well have picked some other plan in the first place.

Another problem is that, if we add a test for visibility conflicts into visibilitymap_test(), I'm afraid we're going to drive up the cost of that function very significantly. Previous testing suggests that the efficiency, or lack thereof, of that function is already a performance problem for index-only scans, which kinda makes me not that excited about adding another branch in there somewhere (and even less excited about any proposed implementation that would add an lwlock acquire/release or similar).

So on further reflection I'm thinking it may be best just to stick with a hard conflict for now and see what feedback we get from beta testers.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2 May 2012 13:41, Robert Haas <robertmhaas@gmail.com> wrote:
> So on further reflection I'm thinking it may be best just to stick
> with a hard conflict for now and see what feedback we get from beta
> testers.

Which is what I was expecting y'all to conclude once you'd looked at the task in more detail. And I'm happy with the concept of beta being a period where we listen to feedback, not just bugs, and decide whether further refinements are required.

What I think is possible is to alter the conflict as it hits the backend. If a backend has enable_indexonlyscan = off then it wouldn't be affected by those conflicts anyhow. So if we simply record whether we are using an index-only plan then we'll know whether to ignore it or abort. I think we can handle that by marking the snapshot at executor startup time. Needs a few other pushups but not that hard.

The likelihood is that SQL that uses index-only scans won't run for long enough to be cancelled anyway.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services