Thread: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was:[HACKERS] [WIP] Zipfian distribution in pgbench)
LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was:[HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
On Fri, Jul 14, 2017 at 5:06 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I think that what this probably comes down to, more than anything
> else, is that you have leftmost hot/bloated leaf pages like this:
>
>           idx          | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
> -----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
> ...
>  pgbench_accounts_pkey |     0 |      1 |     1 |         0 |      2751 |         65 | l    |        100 |         41 |            16 |      8192 |      5328 | 11 00 00 00 00 00 00 00
>  pgbench_accounts_pkey |     0 |      2 |  2751 |         1 |      2746 |         65 | l    |         48 |         90 |            16 |      8192 |      5388 | 32 00 00 00 00 00 00 00
> ...
>
> The high key for the leftmost shows that only values below 0x11 belong
> on the first page. This is about 16 or 17 possible distinct values,
> and yet the page has 100 live items, and 41 dead items; in total,
> there is room for 367 items. It's only slightly better with other
> nearby pages. This is especially bad because once the keyspace gets
> split up this finely, it's *impossible* to reverse it -- it's more or
> less a permanent problem, at least until a REINDEX.

I've been thinking about this a lot, because this really does look like a pathological case to me. I think that this workload is very sensitive to how effective kill_prior_tuples/LP_DEAD hinting is. Or at least, I can imagine that mechanism doing a lot better than it actually manages to do here.

I wonder if it's possible that commit 2ed5b87f9, which let MVCC snapshots not hold on to a pin on leaf pages, should have considered workloads like this. The whole point of that commit was to reduce blocking between VACUUM and index scans, and I think that it was effective in doing that in many important cases.
My concern is that the commit added this, which could make LP_DEAD hinting occur less frequently:

+ * If the pin was released after reading the page, then we re-read it.  If it
+ * has been modified since we read it (as determined by the LSN), we dare not
+ * flag any entries because it is possible that the old entry was vacuumed
+ * away and the TID was re-used by a completely different heap tuple.
   */
  void
 -_bt_killitems(IndexScanDesc scan, bool haveLock)
 +_bt_killitems(IndexScanDesc scan)

Even if I'm basically right about this making LP_DEAD hinting occur less frequently in cases like the one from Alik, it might have actually been worth it even from the point of view of index bloat, because VACUUM operations complete more frequently, and so there is an indirect boost. It's not as if we've been that great at assessing problems like this before now, and so it bears considering whether we could do better here without introducing much additional complexity. I just don't know.

I would like to hear opinions on: How much of a difference might this have actually made? If it's a significant factor for any important workload, what can be done about it? Does anyone have any ideas about a less restrictive strategy for avoiding spuriously killing an index tuple whose heap TID was just recycled?

ISTM that by doing this LSN check, _bt_killitems() is least likely to work when and where it is most needed. The most obvious way to correctly proceed even when the LSN has changed would be to revisit the heap page when the leaf page LSN check doesn't work out, and revalidate the safety of proceeding with killing tuples -- at least with unique indexes. It's probably extremely unlikely that the problem that we must avoid here will actually be observed in the real world, so it seems likely that an approach like this will be worth it.

-- 
Peter Geoghegan
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
On Tue, Jul 25, 2017 at 3:02 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I've been thinking about this a lot, because this really does look
> like a pathological case to me. I think that this workload is very
> sensitive to how effective kill_prior_tuples/LP_DEAD hinting is. Or at
> least, I can imagine that mechanism doing a lot better than it
> actually manages to do here. I wonder if it's possible that commit
> 2ed5b87f9, which let MVCC snapshots not hold on to a pin on leaf
> pages, should have considered workloads like this.

While the benchmark Alik came up with is non-trivial to reproduce, I can show a consistent regression for a simple case with only one active backend. I'm not sure whether or not this is considered an acceptable trade-off -- I didn't look through the archives from around March of 2015 just yet.

Setup:

Initialize pgbench (any scale factor).
create index on pgbench_accounts (aid);

The point of this example is to show a simple query that is never quite HOT-safe, where we need to create a new index entry in an index for that reason alone. Theoretically, only one index needs to be updated, not all indexes, because only one index covers attributes that have truly changed. For example, if we had WARM, I think that many of these theoretically unnecessary index tuple insertions would not happen. When they do happen, because we don't have something like WARM, it seems important that the kill_prior_tuples/LP_DEAD stuff does its job. I don't think that it will consistently do that, though.
Steps:

(Set a debugger breakpoint within _bt_killitems())

Test queries (run these from a single interactive psql session):

update pgbench_accounts set abalance = abalance + 1 where aid = 763;
update pgbench_accounts set abalance = abalance + 1 where aid = 763;
update pgbench_accounts set abalance = abalance + 1 where aid = 763;

We now see that no update ever kills items within _bt_killitems(), because our own update to the index leaf page itself nullifies our ability to kill anything, by changing the page LSN from the one stashed in the index scan state variable. Fortunately, we are not really "self-blocking" dead item cleanup here, because the _bt_check_unique() logic for killing all_dead index entries saves the day by not caring about LSNs. However, that only happens because the index on "aid" is a unique index.

If we drop the primary key, and create a regular index on "aid" in its place, then the same UPDATE queries really do self-block from killing index tuples due to the changes in 2ed5b87f9, which could be pretty bad if there weren't some SELECTs doing the kill_prior_tuple stuff instead. I verified that this regression against 9.4 exists, just to be sure that the problem wasn't somehow there all along -- the regression is real. :-(

-- 
Peter Geoghegan
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
On Tue, Jul 25, 2017 at 8:02 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> Setup:
>
> Initialize pgbench (any scale factor).
> create index on pgbench_accounts (aid);

That "create index" was meant to be on "abalance", to make the UPDATE queries always HOT-unsafe. (You'll want to *also* create this index once the PK is dropped, to show how killing items on recent Postgres versions hinges upon this being a unique index.)

-- 
Peter Geoghegan
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Robert Haas
Date:
On Tue, Jul 25, 2017 at 11:02 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> While the benchmark Alik came up with is non-trivial to reproduce, I
> can show a consistent regression for a simple case with only one
> active backend.

That's not good.

> We now see that no update ever kills items within _bt_killitems(),
> because our own update to the index leaf page itself nullifies our
> ability to kill anything, by changing the page LSN from the one
> stashed in the index scan state variable. Fortunately, we are not
> really "self-blocking" dead item cleanup here, because the
> _bt_check_unique() logic for killing all_dead index entries saves the
> day by not caring about LSNs. However, that only happens because the
> index on "aid" is a unique index.

This seems like an oversight. If we modify the page ourselves, could we check whether the saved LSN is still current just before, and if so, update the saved LSN just after?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
>> We now see that no update ever kills items within _bt_killitems(),
>> because our own update to the index leaf page itself nullifies our
>> ability to kill anything, by changing the page LSN from the one
>> stashed in the index scan state variable. Fortunately, we are not
>> really "self-blocking" dead item cleanup here, because the
>> _bt_check_unique() logic for killing all_dead index entries saves the
>> day by not caring about LSNs. However, that only happens because the
>> index on "aid" is a unique index.
>
> This seems like an oversight. If we modify the page ourselves, could
> we check whether the saved LSN is still current just before, and if
> so, update the saved LSN just after?

I really don't know if that would be worthwhile. It would certainly fix the regression shown in my test case, but that might not go far enough. I strongly suspect that there are more complicated workloads where LP_DEAD cleanup from SELECT statements matters, which is prevented by the LSN check, just because there are always other sessions that modify the page concurrently. This might be true of Alik's Zipfian test case, for example.

In Alik's workload, there are two queries: one UPDATE, one SELECT. Even though the bloated index was a unique index, and so still gets _bt_check_unique() item killing, the regression is still going to block LP_DEAD cleanup by the SELECTs, which seems like it might be quite important there. After all, _bt_check_unique() cleanup has to happen with an exclusive buffer lock held, whereas the kill_prior_tuple stuff happens with only a shared buffer lock held. It's not hard to imagine that there will be a whole lot less LP_DEAD setting, overall.

Alik's workload was one with a high degree of contention on just a few leaf pages, pages that become very bloated. It's not fair to blame the bloat we saw there on this regression, but I have to wonder how much it may have contributed.

-- 
Peter Geoghegan
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
Peter Geoghegan <pg@bowt.ie> wrote:
> In Alik's workload, there are two queries: one UPDATE, one SELECT. Even
> though the bloated index was a unique index, and so still gets
> _bt_check_unique() item killing, the regression is still going to block
> LP_DEAD cleanup by the SELECTs, which seems like it might be quite
> important there. After all, _bt_check_unique() cleanup has to happen
> with an exclusive buffer lock held, whereas the kill_prior_tuple stuff
> happens with only a shared buffer lock held. It's not hard to imagine
> that there will be a whole lot less LP_DEAD setting, overall.

Actually, there is a much bigger reason why SELECT statement LP_DEAD killing matters more to Alik's workload than _bt_check_unique() LP_DEAD killing: the UPDATE query itself does not affect indexed columns. Most UPDATEs are actually HOT-safe (despite the degree of index bloat we see), and so most UPDATEs will never reach _bt_check_unique().

-- 
Peter Geoghegan
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Amit Kapila
Date:
On Wed, Jul 26, 2017 at 3:32 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Jul 14, 2017 at 5:06 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> I think that what this probably comes down to, more than anything
>> else, is that you have leftmost hot/bloated leaf pages like this:
>>
>>           idx          | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
>> -----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
>> ...
>>  pgbench_accounts_pkey |     0 |      1 |     1 |         0 |      2751 |         65 | l    |        100 |         41 |            16 |      8192 |      5328 | 11 00 00 00 00 00 00 00
>>  pgbench_accounts_pkey |     0 |      2 |  2751 |         1 |      2746 |         65 | l    |         48 |         90 |            16 |      8192 |      5388 | 32 00 00 00 00 00 00 00
>> ...
>>
>> The high key for the leftmost shows that only values below 0x11 belong
>> on the first page. This is about 16 or 17 possible distinct values,
>> and yet the page has 100 live items, and 41 dead items; in total,
>> there is room for 367 items. It's only slightly better with other
>> nearby pages. This is especially bad because once the keyspace gets
>> split up this finely, it's *impossible* to reverse it -- it's more or
>> less a permanent problem, at least until a REINDEX.
>
> I've been thinking about this a lot, because this really does look
> like a pathological case to me. I think that this workload is very
> sensitive to how effective kill_prior_tuples/LP_DEAD hinting is. Or at
> least, I can imagine that mechanism doing a lot better than it
> actually manages to do here. I wonder if it's possible that commit
> 2ed5b87f9, which let MVCC snapshots not hold on to a pin on leaf
> pages, should have considered workloads like this.

Isn't it possible to confirm whether the problem is due to commit 2ed5b87f9? Basically, if we have unlogged tables, then it won't release the pin. So if the commit in question is the culprit, then the same workload should not lead to bloat.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
Amit Kapila <amit.kapila16@gmail.com> wrote:
> Isn't it possible to confirm if the problem is due to commit
> 2ed5b87f9? Basically, if we have unlogged tables, then it won't
> release the pin. So if the commit in question is the culprit, then
> the same workload should not lead to bloat.

That's a great idea. Alik?

-- 
Peter Geoghegan
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Robert Haas
Date:
On Thu, Jul 27, 2017 at 10:05 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I really don't know if that would be worthwhile. It would certainly fix
> the regression shown in my test case, but that might not go far enough.
> I strongly suspect that there are more complicated workloads where
> LP_DEAD cleanup from SELECT statements matters, which is prevented by
> the LSN check thing, just because there are always other sessions that
> modify the page concurrently. This might be true of Alik's Zipfian test
> case, for example.

I haven't studied the test case, but I think as a general principle it makes sense to be happy when someone comes up with an algorithm that holds a lock for a shorter period of time (and buffer pins are a type of lock). There are a number of places (fast-path locking, for example, or vacuum skipping pinned heap pages) where we have fast-paths that get taken most of the time and slow paths that get used when concurrent activity happens; empirically, such things often work out to a win.

I think it's disturbing that this code seems to be taking the slow-path (which, in context, means skipping LP_DEAD cleanup) even when there is no concurrent activity. That's hard to justify. But the fact that it is taking the slow-path when there *is* concurrent activity is harder to complain about. That might win or it might lose; the non-concurrent case only loses.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jul 27, 2017 at 10:05 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> I really don't know if that would be worthwhile. It would certainly fix
>> the regression shown in my test case, but that might not go far enough.
>> I strongly suspect that there are more complicated workloads where
>> LP_DEAD cleanup from SELECT statements matters, which is prevented by
>> the LSN check thing, just because there are always other sessions that
>> modify the page concurrently. This might be true of Alik's Zipfian test
>> case, for example.
>
> I haven't studied the test case, but I think as a general principle it
> makes sense to be happy when someone comes up with an algorithm that
> holds a lock for a shorter period of time (and buffer pins are a type
> of lock). There are a number of places (fast-path locking, for
> example, or vacuum skipping pinned heap pages) where we have
> fast-paths that get taken most of the time and slow paths that get
> used when concurrent activity happens; empirically, such things often
> work out to a win. I think it's disturbing that this code seems to be
> taking the slow-path (which, in context, means skipping LP_DEAD
> cleanup) even when there is no concurrent activity. That's hard to
> justify.

That is hard to justify. I don't think that failing to set LP_DEAD hints is the cost that must be paid to realize a benefit elsewhere, though. I don't see much problem with having both benefits consistently.

It's actually very unlikely that VACUUM will run, and a TID will be recycled, at exactly the wrong time. We could probably come up with a more discriminating way of detecting that that may have happened, at least for Postgres 11. We'd continue to use the LSN; the slow path would be taken when the LSN changed, but we would not give up on setting LP_DEAD bits. I think we can justify going to the heap again in this slow path, if that's what it takes.
> But the fact that it is taking the slow-path when there *is*
> concurrent activity is harder to complain about. That might win or it
> might lose; the non-concurrent case only loses.

Let's wait to see what difference it makes if Alik's zipfian distribution pgbench test case uses unlogged tables. That may give us a good sense of the problem for cases with contention/concurrency.

-- 
Peter Geoghegan
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Robert Haas
Date:
On Mon, Jul 31, 2017 at 1:54 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> That is hard to justify. I don't think that failing to set LP_DEAD hints
> is the cost that must be paid to realize a benefit elsewhere, though. I
> don't see much problem with having both benefits consistently. It's
> actually very unlikely that VACUUM will run, and a TID will be recycled
> at exactly the wrong time. We could probably come up with a more
> discriminating way of detecting that that may have happened, at least
> for Postgres 11. We'd continue to use LSN; the slow path would be taken
> when the LSN changed, but we do not give up on setting LP_DEAD bits. I
> think we can justify going to the heap again in this slow path, if
> that's what it takes.

Yeah, that might possibly be a good approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jul 31, 2017 at 1:54 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> That is hard to justify. I don't think that failing to set LP_DEAD hints
>> is the cost that must be paid to realize a benefit elsewhere, though. I
>> don't see much problem with having both benefits consistently. It's
>> actually very unlikely that VACUUM will run, and a TID will be recycled
>> at exactly the wrong time. We could probably come up with a more
>> discriminating way of detecting that that may have happened, at least
>> for Postgres 11. We'd continue to use LSN; the slow path would be taken
>> when the LSN changed, but we do not give up on setting LP_DEAD bits. I
>> think we can justify going to the heap again in this slow path, if
>> that's what it takes.
>
> Yeah, that might possibly be a good approach.

I also wonder if it's worth teaching lazy_scan_heap() to keep around a list of TIDs that can at least have their LP_DEAD bit set within their index page, for use during subsequent index vacuuming. Doing at least this much for TIDs from heap pages that are skipped due to some other session concurrently holding a pin on the heap page ("pinskipped_pages" pages) could help some cases, and seems doable. VACUUM does not need an extra interlock against another VACUUM (such as a buffer pin) here, of course.

I wouldn't expect this to help very much on many workloads, including Alik's Zipfian workload, but it might be useful to have a real guarantee about how long it can be, in VACUUM cycles, before a dead index tuple at least has its LP_DEAD bit set. LP_DEAD will only be set when an index scan goes to the heap, and it's not hard to imagine workloads where no index scan ever does that with dead tuples whose heap TIDs were killed only very recently.

Unlike with heap pruning, setting the LP_DEAD bit of all dead index tuples on a leaf page is pretty much as good as a full VACUUM of the page. The only thing that makes it not quite as good is that you cannot assume that it's safe to kill the heap TIDs afterwards.

-- 
Peter Geoghegan
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
On Mon, Jul 31, 2017 at 10:54 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> Let's wait to see what difference it makes if Alik's zipfian
> distribution pgbench test case uses unlogged tables. That may give us a
> good sense of the problem for cases with contention/concurrency.

Yura Sokolov of Postgres Pro performed this benchmark at my request. He took the 9.5 commit immediately preceding 2ed5b87f9 as a baseline. In all cases, logged tables were used. Note that this is not the most effective benchmark for showing the regression, because he didn't replace the PK with a non-unique index, though that is planned as follow-up; I wanted to stick with Alik's Zipfian test case for the time being. (Using a unique index is not the most effective way of showing the regression because unique indexes have most LP_DEAD setting done in _bt_check_unique(), so only SELECTs will do less LP_DEAD cleanup there; SELECTs are 50% of all queries.)

His results with 10 minute pgbench runs (logged tables):

 clients | 8217fb14 | 2ed5b87f | master | hashsnap | hashsnap_lwlock
---------+----------+----------+--------+----------+----------------
      10 |   201569 |   204154 | 201095 |   201793 |          206111
      20 |   328180 |   333358 | 334858 |   336721 |          370769
      40 |   196352 |   194692 | 232413 |   231968 |          393947
      70 |   121083 |   116881 | 148066 |   148443 |          224162
     110 |    77989 |    73414 |  99305 |   111404 |          161900
     160 |    48958 |    45830 |  65830 |    82340 |          115678
     230 |    27699 |    25510 |  38186 |    57617 |           80575
     310 |    16369 |    15137 |  21571 |    39745 |           56819
     400 |    10327 |     9486 |  13157 |    27318 |           40859
     500 |     6920 |     6496 |   8638 |    18677 |           29429
     650 |     4315 |     3971 |   5196 |    11762 |           17883
     800 |     2850 |     2772 |   3523 |     7733 |           10977

Note that you also see numbers from various patches from Yura, and the master branch, mixed in here, but 8217fb14 (before) and 2ed5b87f (after) are the interesting numbers as far as this regression goes. There is an appreciable reduction in TPS here, though this workload is not as bad by that measure as first thought.
There is a roughly 5% regression here past 40 clients or so. The regression in the *consistency* of transaction throughput is far more interesting, though. I've been doing analysis of this by drilling down to individual test cases with vimdiff, as follows:

$ vimdiff test_8217fb14_logged_1_pgbench_40.out test_2ed5b87f_logged_1_pgbench_40.out

(I attach these two files as a single example. I can provide the full results to those who ask for them privately; it's too much data to attach to an e-mail to the list.)

You can see in this example that for most 5 second slices of the 10 minute benchmark, commit 2ed5b87f actually increases TPS somewhat, which is good. But there are also occasional *big* drops in TPS, sometimes by as much as 50% over a single 5 second period (when ANALYZE runs, adding random I/O while holding an exclusive buffer lock [1]?). When this slowdown happens, latency can be over 3 times higher, too. We see much more consistent performance without the B-Tree buffer pin VACUUM optimization, where there is no discernible pattern of performance dropping. The headline regression of 4% or 5% is not the main problem here, it seems.

In summary, commit 2ed5b87f makes the workload have increased throughput most of the time, but occasionally sharply reduces throughput, which averages out to TPS being 4% or 5% lower overall. I think we'll find that there are bigger problems TPS-wise with non-unique indexes when that other test is performed by Yura; let's wait for those results to come back.

Finally, I find it interesting that when Yura ran the same benchmark, but with 5% SELECTs + 95% UPDATEs, rather than 50% SELECTs + 50% UPDATEs as above, the overall impact was surprisingly similar.
His results:

 clients | 8217fb14 | 2ed5b87f | master | hashsnap | hashsnap_lwlock
---------+----------+----------+--------+----------+----------------
      20 |   187697 |   187335 | 217558 |   215059 |          266894
      50 |    81272 |    78784 |  97948 |    97659 |          157710
      85 |    49446 |    47683 |  64597 |    70814 |          107748
     130 |    32358 |    30393 |  42216 |    50531 |           75001
     190 |    19403 |    17569 |  25704 |    35506 |           51292
     270 |    10803 |     9878 |  14166 |    23851 |           35257
     370 |     6108 |     5645 |   7684 |    15390 |           23659
     500 |     3649 |     3297 |   4225 |     9172 |           14643
     650 |     2239 |     2049 |   2635 |     5887 |            8588
     800 |     1475 |     1424 |   1804 |     4035 |            5611

If nothing else, this shows how reliant these kinds of workload can be on LP_DEAD setting. And there is one difference: the regression is seen here at *all* client counts, even with only 20 clients. This is presumably because with only 5% SELECTs, it's more important that those few remaining SELECTs be able to perform LP_DEAD setting.

[1] https://wiki.postgresql.org/wiki/Key_normalization#Avoiding_unnecessary_unique_index_enforcement

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page(Was: [HACKERS] [WIP] Zipfian distribution in pgbench)
From
Peter Geoghegan
Date:
On Fri, Aug 4, 2017 at 3:30 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> Yura Sokolov of Postgres Pro performed this benchmark at my request.
> He took the 9.5 commit immediately preceding 2ed5b87f9 as a baseline.

I attach a simple patch that comments out the release of the buffer pin for logged tables where an MVCC snapshot is used for a scan that is not an index-only scan. This is the simplest way of evaluating the effects of disabling 2ed5b87f9. Yura or others may find this helpful.

To be clear, I am certainly not suggesting that we should revert 2ed5b87f9. I do think that we need to give serious thought to fixing the regression that 2ed5b87f9 introduced for LP_DEAD setting by index scans, though.

-- 
Peter Geoghegan