Thread: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-12 16:49:25 +0000, Kevin Grittner wrote:
> On a big NUMA machine with 1000 connections in saturation load
> there was a performance regression due to spinlock contention, for
> acquiring values which were never used.  Just fill with dummy
> values if we're not going to use them.

FWIW, I could see massive regressions with just 64 connections.

I'm a bit scared of having an innocuous-sounding option regress things
by a factor of 10.  I think, in addition to this fix, we need to
actually solve the scalability issue here to a good degree.  One way to
do so is to apply the parts of 0001 in
http://archives.postgresql.org/message-id/20160330230914.GH13305%40awork2.anarazel.de
defining PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY and rely on that.  Another
is to apply the whole patch and simply put the LSN in an 8-byte atomic.

- Andres
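For context, the second option would amount to something like the
following minimal sketch.  It assumes the 64-bit variants of the
in-tree atomics API (pg_atomic_uint64, pg_atomic_read_u64,
pg_atomic_write_u64); the SharedLsnHolder struct and its field are
hypothetical names invented purely for illustration:

#include "postgres.h"
#include "port/atomics.h"

/* Hypothetical shared-memory struct; not from the patch. */
typedef struct SharedLsnHolder
{
	pg_atomic_uint64	current_lsn;	/* an LSN read by many backends */
} SharedLsnHolder;

static void
set_shared_lsn(SharedLsnHolder *h, XLogRecPtr lsn)
{
	pg_atomic_write_u64(&h->current_lsn, (uint64) lsn);
}

static XLogRecPtr
read_shared_lsn(SharedLsnHolder *h)
{
	/* one atomic read replaces a SpinLockAcquire/read/SpinLockRelease cycle */
	return (XLogRecPtr) pg_atomic_read_u64(&h->current_lsn);
}

On platforms without native 64-bit atomics this still degrades to a
lock-based fallback, which is the portability question taken up later
in the thread.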
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Kevin Grittner
On Tue, Apr 12, 2016 at 12:38 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-12 16:49:25 +0000, Kevin Grittner wrote:
>> On a big NUMA machine with 1000 connections in saturation load
>> there was a performance regression due to spinlock contention, for
>> acquiring values which were never used.  Just fill with dummy
>> values if we're not going to use them.
>
> FWIW, I could see massive regressions with just 64 connections.

With what settings?  With or without the patch to avoid the locks when
off?

> I'm a bit scared of having an innocuous-sounding option regress things
> by a factor of 10.  I think, in addition to this fix, we need to
> actually solve the scalability issue here to a good degree.  One way to
> do so is to apply the parts of 0001 in
> http://archives.postgresql.org/message-id/20160330230914.GH13305%40awork2.anarazel.de
> defining PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY and rely on that.  Another
> is to apply the whole patch and simply put the LSN in an 8-byte atomic.

I think that we are well due for atomic access to aligned 8-byte
values.  That would eliminate one potential hot spot in the "snapshot
too old" code, for sure.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-12 13:44:00 -0500, Kevin Grittner wrote:
> On Tue, Apr 12, 2016 at 12:38 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-04-12 16:49:25 +0000, Kevin Grittner wrote:
> >> On a big NUMA machine with 1000 connections in saturation load
> >> there was a performance regression due to spinlock contention, for
> >> acquiring values which were never used.  Just fill with dummy
> >> values if we're not going to use them.
> >
> > FWIW, I could see massive regressions with just 64 connections.
>
> With what settings?

You mean pgbench or postgres?  The former -M prepared -c 64 -j 64 -S.
The latter just a large enough shared_buffers to contain the scale 300
database, and adapted maintenance_work_mem.  Nothing special.

> With or without the patch to avoid the locks when off?

Without.  Your commit message made it sound like you need unrealistic
or at least unusual numbers of connections, and that's afaics not the
case.

> > I'm a bit scared of having an innocuous-sounding option regress things
> > by a factor of 10.  I think, in addition to this fix, we need to
> > actually solve the scalability issue here to a good degree.  One way to
> > do so is to apply the parts of 0001 in
> > http://archives.postgresql.org/message-id/20160330230914.GH13305%40awork2.anarazel.de
> > defining PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY and rely on that.  Another
> > is to apply the whole patch and simply put the LSN in an 8-byte atomic.
>
> I think that we are well due for atomic access to aligned 8-byte
> values.  That would eliminate one potential hot spot in the "snapshot
> too old" code, for sure.

I'm kinda inclined to apply that portion (or just the whole patch with
the spurious #ifdef 0 et al fixed) into 9.6, and add the necessary
checks in a few places.  Because I really think this is likely to hit
unsuspecting users.

FWIW, accessing a frequently changing value from a significant number
of connections, at a high frequency, isn't exactly free without a
spinlock either.  But it should be much less bad.

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Kevin Grittner
On Tue, Apr 12, 2016 at 1:56 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-12 13:44:00 -0500, Kevin Grittner wrote:
>> On Tue, Apr 12, 2016 at 12:38 PM, Andres Freund <andres@anarazel.de> wrote:
>>> On 2016-04-12 16:49:25 +0000, Kevin Grittner wrote:
>>>> On a big NUMA machine with 1000 connections in saturation load
>>>> there was a performance regression due to spinlock contention, for
>>>> acquiring values which were never used.  Just fill with dummy
>>>> values if we're not going to use them.
>>>
>>> FWIW, I could see massive regressions with just 64 connections.
>>
>> With what settings?
>
> You mean pgbench or postgres?  The former -M prepared -c 64 -j 64 -S.
> The latter just a large enough shared_buffers to contain the scale 300
> database, and adapted maintenance_work_mem.  Nothing special.

Well, something is different between your environment and mine, since I
saw no difference at scale 100 and 2.2% at scale 200.  So, knowing more
about your hardware, OS, configuration, etc., might allow me to
duplicate a problem so I can fix it.

For example, I used a "real" pg config, like I would for a production
machine (because that seems to me to be the environment that is most
important): the kernel is 3.13 (not one with pessimal scheduling) and
has tuning for THP, the deadline scheduler, the vm.*dirty* settings,
etc.  Without knowing even the kernel and what tuning the OS and pg
have had on your box, I could take a lot of shots in the dark without
hitting anything.

Oh, and the output of `numactl --hardware` would be good to have.
Thanks for all the information you can provide.

>> With or without the patch to avoid the locks when off?
>
> Without.  Your commit message made it sound like you need unrealistic
> or at least unusual numbers of connections, and that's afaics not the
> case.

It was the only reported case to that point, so the additional data
point is valuable, if I can tell where that point is.  And you don't
have any evidence that even with your configuration any performance
regression remains for those who have the default value for
old_snapshot_threshold?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
Hi,

On 2016-04-12 14:17:12 -0500, Kevin Grittner wrote:
> Well, something is different between your environment and mine,
> since I saw no difference at scale 100 and 2.2% at scale 200.

In a readonly test or r/w?

A lot of this will be different between single-socket and multi-socket
servers; as soon as you have the latter, the likelihood of contention
being bad goes up dramatically.

> So, knowing more about your hardware, OS, configuration, etc., might
> allow me to duplicate a problem so I can fix it.
>
> For example, I used a "real" pg config, like I would for a production
> machine (because that seems to me to be the environment that is most
> important): the kernel is 3.13 (not one with pessimal scheduling) and
> has tuning for THP, the deadline scheduler, the vm.*dirty* settings,
> etc.  Without knowing even the kernel and what tuning the OS and pg
> have had on your box, I could take a lot of shots in the dark without
> hitting anything.

That shouldn't really matter much for a read-only, shared_buffers
resident test?  There's no IO, and THP pretty much plays no role
because there are very few memory allocations (removing the pressure
causing the well-known degradations).

> Oh, and the output of `numactl --hardware` would be good to have.
> Thanks for all the information you can provide.

That was on Alexander's/PgPro's machine.  Numactl wasn't installed, and
I didn't have root.  But it has four NUMA domains (gathered via /sys/).

> It was the only reported case to that point, so the additional data
> point is valuable, if I can tell where that point is.  And you don't
> have any evidence that even with your configuration any performance
> regression remains for those who have the default value for
> old_snapshot_threshold?

I haven't tested yet.

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Kevin Grittner
On Tue, Apr 12, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-12 14:17:12 -0500, Kevin Grittner wrote:
>> Well, something is different between your environment and mine,
>> since I saw no difference at scale 100 and 2.2% at scale 200.
>
> In a readonly test or r/w?

Readonly, with client and job counts matching scale.

> A lot of this will be different between single-socket and multi-socket
> servers; as soon as you have the latter, the likelihood of contention
> being bad goes up dramatically.

Yeah, I know, and 4-socket has been at least an order of magnitude more
problematic in my experience than 2-socket.  And the problems are far,
far, far worse on kernels prior to 3.8, especially on 3.x before 3.8,
so it's hard to know how to take any report of problems on a 4-node
NUMA machine without knowing the kernel version.

>> knowing more about your hardware, OS, configuration, etc., might
>> allow me to duplicate a problem so I can fix it.
>
>> For example, I used a "real" pg config, like I would for a production
>> machine (because that seems to me to be the environment that is most
>> important): the kernel is 3.13 (not one with pessimal scheduling) and
>> has tuning for THP, the deadline scheduler, the vm.*dirty* settings,
>> etc.  Without knowing even the kernel and what tuning the OS and pg
>> have had on your box, I could take a lot of shots in the dark without
>> hitting anything.
>
> That shouldn't really matter much for a read-only, shared_buffers
> resident test?  There's no IO, and THP pretty much plays no role
> because there are very few memory allocations (removing the pressure
> causing the well-known degradations).

I hate to assume which differences matter without trying, but some of
them seem less probable than others.

>> Oh, and the output of `numactl --hardware` would be good to have.
>> Thanks for all the information you can provide.
>
> That was on Alexander's/PgPro's machine.  Numactl wasn't installed,
> and I didn't have root.  But it has four NUMA domains (gathered via
> /sys/).

On the machines I've used, it will give you the hardware report without
being root.  But of course, it can't do that if it's not installed.  I
hadn't yet seen a machine with multiple NUMA memory segments that
didn't have the numactl executable installed; I'll keep in mind that
can happen.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Kevin Grittner
On Tue, Apr 12, 2016 at 2:53 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> Readonly, with client and job counts matching scale.

Single-socket i7, BTW.

>> A lot of this will be different between single-socket and multi-socket
>> servers; as soon as you have the latter, the likelihood of contention
>> being bad goes up dramatically.
>
> Yeah, I know, and 4-socket has been at least an order of magnitude
> more problematic in my experience than 2-socket.  And the problems
> are far, far, far worse on kernels prior to 3.8, especially on 3.x
> before 3.8, so it's hard to know how to take any report of problems
> on a 4-node NUMA machine without knowing the kernel version.

Also, with 4-node NUMA I have seen far better scaling with
hyper-threading turned off.  I know there are environments where it
helps, but high concurrency on multi-node NUMA is not one of them.  So,
anyway, mentioning the HT setting is important, too.

Kevin Grittner
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Alvaro Herrera
Andres Freund wrote:

> I'm kinda inclined to apply that portion (or just the whole patch with
> the spurious #ifdef 0 et al fixed) into 9.6, and add the necessary
> checks in a few places.  Because I really think this is likely to hit
> unsuspecting users.

!!!

Be sure to consult with the RMT before doing anything of the sort.  It
might as well decide to revert the whole patch.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-12 23:52:14 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > I'm kinda inclined to apply that portion (or just the whole patch with
> > the spurious #ifdef 0 et al fixed) into 9.6, and add the necessary
> > checks in a few places.  Because I really think this is likely to hit
> > unsuspecting users.
>
> !!!
>
> Be sure to consult with the RMT before doing anything of the sort.

I didn't plan to do anything without a few +1's.  I don't think we can
release with the state of things as is, though.  I don't see a less
intrusive way than to get rid of that spinlock on all platforms capable
of significant concurrency.

So, RMT, what are your thoughts on this?

Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Robert Haas
On Tue, Apr 12, 2016 at 11:05 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-12 23:52:14 -0300, Alvaro Herrera wrote:
>> Andres Freund wrote:
>> > I'm kinda inclined to apply that portion (or just the whole patch with
>> > the spurious #ifdef 0 et al fixed) into 9.6, and add the necessary
>> > checks in a few places.  Because I really think this is likely to hit
>> > unsuspecting users.
>>
>> !!!
>>
>> Be sure to consult with the RMT before doing anything of the sort.
>
> I didn't plan to do anything without a few +1's.  I don't think we can
> release with the state of things as is, though.  I don't see a less
> intrusive way than to get rid of that spinlock on all platforms capable
> of significant concurrency.
>
> So, RMT, what are your thoughts on this?

I think that a significant performance regression which affects people
not using snapshot_too_old would be a stop-ship issue, but I disagree
that an issue which only affects people using the feature is a
must-fix.  It may be desirable to fix it, but I don't think we should
regard it as a hard requirement.  It's reasonable to fix some kinds of
issues after feature freeze, but not at the price of accepting
arbitrary amounts of new code that may have problems of its own.  Every
release will have some warts.

My testing of latest master last night, specifically
deb71fa9713dfe374a74fc58a5d298b5f25da3f5, did not show evidence of a
regression under heavy concurrency, as per
http://www.postgresql.org/message-id/CA+TgmobpHAqsOeHc-ooRsjzTKw1H4s4P1VBtwh1KkKO+6Mp8_Q@mail.gmail.com
- that test was of course run without enabling "snapshot too old".  My
guess is that 2201d801b03c2d1b0bce4d6580b718dc34d38b3e was sufficient
to put things right, and that we now have a problem only when "snapshot
too old" is enabled.

I have never understood why you didn't include 64-bit atomics in the
original atomics implementation, and I really think we should have
committed a patch to add them long before now.  Also noteworthy is the
fact that, by itself, such a patch cannot break anything except perhaps
the build, for, lo!, unused macros and functions do not do anything.
On the whole, I think that putting such a patch into PostgreSQL 9.6 is
likely to save us more pain than it causes us.  I would be disinclined
to endorse applying part of it, because that seems likely to complicate
back-patching for no real gain.

Of course, the real fly in the ointment here is what we're going to do
with the atomics once we have them.  But AFAICS, there's no patch for
that, yet.  I don't think that I wish to take a position on whether a
patch that hasn't been written yet should be applied.  So I think the
next step is that you should post the patches that you think should be
applied in final form and those should be reviewed by knowledgeable
people.  Then, based on those reviews, the RMT can decide what to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> I have never understood why you didn't include 64-bit atomics in the
> original atomics implementation, and I really think we should have
> committed a patch to add them long before now.

What will you do on 32-bit platforms (or, more generally, anything
lacking 64-bit-wide atomics)?

			regards, tom lane
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Alvaro Herrera
Robert Haas wrote:
> On Tue, Apr 12, 2016 at 11:05 PM, Andres Freund <andres@anarazel.de> wrote:
> > I didn't plan to do anything without a few +1's.  I don't think we can
> > release with the state of things as is, though.  I don't see a less
> > intrusive way than to get rid of that spinlock on all platforms capable
> > of significant concurrency.
> >
> > So, RMT, what are your thoughts on this?
>
> I think that a significant performance regression which affects people
> not using snapshot_too_old would be a stop-ship issue,

Agreed.

> but I disagree that an issue which only affects people using the
> feature is a must-fix.

Agreed.

> It's reasonable to fix some kinds of issues after feature freeze, but
> not at the price of accepting arbitrary amounts of new code that may
> have problems of its own.  Every release will have some warts.

Agreed.  The patch being proposed for commit is fiddly
architecture-specific stuff which is likely to destabilize the tree for
quite some time, and cause lots of additional work for Andres and
anyone else likely to work on such low-level details, such as Robert,
both of whom already have plenty to do.

The snapshot-too-old feature is said to be great and shows lots of
improvement in certain cases, and no regression can be measured for
those who have it turned off.  The regression only seems to show up if
you turn it on and have a crazily high rate of read-only transactions.
I think this can wait for 9.7.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Robert Haas
On Wed, Apr 13, 2016 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I have never understood why you didn't include 64-bit atomics in the
>> original atomics implementation, and I really think we should have
>> committed a patch to add them long before now.
>
> What will you do on 32-bit platforms (or, more generally, anything
> lacking 64-bit-wide atomics)?

We fall back to emulating it using spinlocks.  This isn't really an
issue in practice because 32-bit x86 has native 64-bit atomics, and
it's hard to point to another 32-bit platform that is likely to have
enough concurrency for the lack of 64-bit atomics to matter.  Actually,
it looks like we have 64-bit atomics in the tree already; it's only the
fallback implementation that is missing (so anything you do using
64-bit atomics would need an alternate implementation that did not rely
on them).

But the really interesting thing that the patch to which Andres linked
does is introduce machinery to try to determine whether a platform has
8-byte single-copy atomicity; that is, whether a load or store of an
aligned 8-byte value is guaranteed not to be torn.  We currently avoid
assuming that, but this requires additional spinlocks in a significant
number of places; the regression seen using "snapshot too old" at high
concurrency is merely the tip of the iceberg.  And the annoying thing
about avoiding that assumption is that it actually is true on pretty
much every modern platform.  Look at this gem Andres wrote in that
patch:

+/*
+ * 8 byte reads / writes have single-copy atomicity on 32 bit x86 platforms
+ * since at least the 586. As well as on all x86-64 cpus.
+ */
+#if defined(__i568__) || defined(__i668__) || /* gcc i586+ */ \
+	(defined(_M_IX86) && _M_IX86 >= 500) || /* msvc i586+ */ \
+	defined(__x86_64__) || defined(__x86_64) || defined(_M_X64) /* gcc, sunpro, msvc */
+#define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
+#endif	/* 8 byte single-copy atomicity */

I don't know if that test is actually correct, and I wonder about
compile-time environment vs. run-time environment, but I have my doubts
about how well PostgreSQL 9.6 would run on an i486.  I doubt that is
the platform for which we should be optimizing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
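The fallback referred to above is shaped roughly like this minimal
sketch.  It is not the actual atomics code; the type and function names
are invented for illustration, with only slock_t, SpinLockAcquire(),
and SpinLockRelease() taken from the tree:

#include "postgres.h"
#include "storage/spin.h"

/* Emulated 64-bit "atomic": every access pays a full spinlock cycle. */
typedef struct
{
	slock_t		mutex;		/* guards value where no native 64-bit atomics exist */
	uint64		value;
} emulated_atomic_u64;

static uint64
emulated_read_u64(emulated_atomic_u64 *ptr)
{
	uint64		val;

	SpinLockAcquire(&ptr->mutex);
	val = ptr->value;
	SpinLockRelease(&ptr->mutex);
	return val;
}

This shape is why code converted from one explicit spinlock to several
atomic operations can end up doing several lock cycles on platforms
that take the emulated path, which is the concern raised next.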
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 13, 2016 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I have never understood why you didn't include 64-bit atomics in the
>>> original atomics implementation, and I really think we should have
>>> committed a patch to add them long before now.

>> What will you do on 32-bit platforms (or, more generally, anything
>> lacking 64-bit-wide atomics)?

> We fall back to emulating it using spinlocks.

That's what I thought you were going to say, and it means that any
"performance improvement" patch that relies on 64-bit atomics in
hotspot code paths is going to be a complete disaster on anything but
modern Intel hardware.  I'm not sure that's a direction we want to go
in.  We need to stick to a set of atomics that's pretty widely
portable.

> This isn't really an issue in practice because 32-bit x86 has native
> 64-bit atomics, and it's hard to point to another 32-bit platform that
> is likely to have enough concurrency for the lack of 64-bit atomics to
> matter.

It's not concurrency I'm worried about, it's the sheer overhead of
going through the spinlock code.

I'd be okay with atomics that were defined as "pointer width", if we
have a need for that, but I'm suspicious of 64-bits-exactly.

			regards, tom lane
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Robert Haas
On Wed, Apr 13, 2016 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Apr 13, 2016 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> I have never understood why you didn't include 64-bit atomics in the
>>>> original atomics implementation, and I really think we should have
>>>> committed a patch to add them long before now.
>
>>> What will you do on 32-bit platforms (or, more generally, anything
>>> lacking 64-bit-wide atomics)?
>
>> We fall back to emulating it using spinlocks.
>
> That's what I thought you were going to say, and it means that any
> "performance improvement" patch that relies on 64-bit atomics in
> hotspot code paths is going to be a complete disaster on anything but
> modern Intel hardware.  I'm not sure that's a direction we want to go
> in.  We need to stick to a set of atomics that's pretty widely
> portable.

I think 64-bit atomics *are* pretty widely portable.  Can you name a
system with more than 4 CPU cores that doesn't support them?

>> This isn't really an issue in practice because 32-bit x86 has native
>> 64-bit atomics, and it's hard to point to another 32-bit platform that
>> is likely to have enough concurrency for the lack of 64-bit atomics to
>> matter.
>
> It's not concurrency I'm worried about, it's the sheer overhead of
> going through the spinlock code.

I'm not sure I understand exactly what the concern is here.  I agree
that there is a possibility that any patch which uses 64-bit atomics
could regress performance on platforms that do not support 64-bit
atomics.  That's why I argued initially against having fallbacks for
*any* atomic operations; I was of the opinion that we should be
prepared to carry two implementations of anything that was going to
depend on atomics.  I lost that argument, perhaps for the best.

I think one of the problems here is that very few of us have any
hardware available which we could even use to test performance on
systems that lack support for both 32- and 64-bit atomics.  We can
compile without atomics on the hardware we do have and see how that
goes, but that's not necessarily indicative of what will happen on some
altogether different CPU architecture.  In some cases there might be an
emulator, like the VAX emulator Greg Stark was playing with, but that's
not necessarily indicative either, and also, really, who cares?

I think it would be cool if somebody started a project to try to
optimize the performance of PostgreSQL on, say, a Raspberry Pi.  Then
we might learn whether any of this stuff actually matters there or
whether the problems are completely elsewhere (like too much
per-backend memory consumption).  However, for reasons that are
probably sort of obvious, I doubt I'll have much luck getting
EnterpriseDB to fund work on that project - if it ever happens, it will
probably have to be the work of a dedicated hobbyist, or somebody who
has a tangible need to build an embedded system using PostgreSQL.

> I'd be okay with atomics that were defined as "pointer width", if
> we have a need for that, but I'm suspicious of 64-bits-exactly.

I think LSNs are an important case, and they are not pointer width.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 13, 2016 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That's what I thought you were going to say, and it means that any
>> "performance improvement" patch that relies on 64-bit atomics in
>> hotspot code paths is going to be a complete disaster on anything but
>> modern Intel hardware.  I'm not sure that's a direction we want to go
>> in.  We need to stick to a set of atomics that's pretty widely
>> portable.

> I think 64-bit atomics *are* pretty widely portable.  Can you name a
> system with more than 4 CPU cores that doesn't support them?

No, you're ignoring my point, which is what happens on single-CPU
32-bit machines, and whether we aren't going to destroy performance on
low-end machines in pursuit of better performance on high-end.

Now, to the extent that a patch uses a 64-bit atomic op to replace a
spinlock acquisition, it might be pretty much a wash if low-end
machines have to use a spinlock to emulate the atomic op.  But it would
be really easy for the translation to replace one spinlock acquisition
with multiple spinlock acquisitions, and that would hurt.

			regards, tom lane
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 08:36:47 -0400, Robert Haas wrote:
> I think that a significant performance regression which affects people
> not using snapshot_too_old would be a stop-ship issue, but I disagree
> that an issue which only affects people using the feature is a
> must-fix.  It may be desirable to fix it, but I don't think we should
> regard it as a hard requirement.  It's reasonable to fix some kinds of
> issues after feature freeze, but not at the price of accepting
> arbitrary amounts of new code that may have problems of its own.
> Every release will have some warts.

My problem with that is that snapshot-too-old is essentially an
efficiency feature for busy and large databases.  Regressing noticeably
when it's enabled in its natural habitat seems sad.

> Of course, the real fly in the ointment here is what we're going to do
> with the atomics once we have them.  But AFAICS, there's no patch for
> that, yet.  I don't think that I wish to take a position on whether a
> patch that hasn't been written yet should be applied.  So I think the
> next step is that you should post the patches that you think should be
> applied in final form and those should be reviewed by knowledgeable
> people.  Then, based on those reviews, the RMT can decide what to do.

Well, I'm less likely to write a patch when there's no chance that it's
going to be applied.  Which is what the rest of the thread sounds
like...

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Robert Haas
On Wed, Apr 13, 2016 at 10:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Apr 13, 2016 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That's what I thought you were going to say, and it means that any
>>> "performance improvement" patch that relies on 64-bit atomics in
>>> hotspot code paths is going to be a complete disaster on anything but
>>> modern Intel hardware.  I'm not sure that's a direction we want to go
>>> in.  We need to stick to a set of atomics that's pretty widely
>>> portable.
>
>> I think 64-bit atomics *are* pretty widely portable.  Can you name a
>> system with more than 4 CPU cores that doesn't support them?
>
> No, you're ignoring my point, which is what happens on single-CPU
> 32-bit machines, and whether we aren't going to destroy performance
> on low-end machines in pursuit of better performance on high-end.
>
> Now, to the extent that a patch uses a 64-bit atomic op to replace
> a spinlock acquisition, it might be pretty much a wash if low-end
> machines have to use a spinlock to emulate the atomic op.  But it
> would be really easy for the translation to replace one spinlock
> acquisition with multiple spinlock acquisitions, and that would hurt.

One of us is confused, or we're just talking past each other, because I
don't think I'm ignoring your point at all.  In fact, I think I just
responded to it rather directly.  I agree that the exact risk you are
describing exists.  However, the multiple spinlock cycles that you are
concerned about will only occur on a platform that doesn't support
64-bit atomics.  In order to test whether there is a performance
problem on such hardware, or how serious that problem is, we'd need to
have access to such hardware, and I don't know where to find any such
hardware.  Do you?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 11:08:21 -0300, Alvaro Herrera wrote:
> The patch being proposed for commit is fiddly architecture-specific
> stuff which is likely to destabilize the tree for quite some time, and
> cause lots of additional work for Andres and anyone else likely to
> work on such low-level details, such as Robert, both of whom already
> have plenty to do.

Personally I think this is a 9.6 open item, and primarily Kevin has to
work on it.  Note that there really shouldn't be too many fiddly bits;
we've already had 64-bit atomics, just no fallback.  This is just
copying the fallback code from 32-bit atomics to 64-bit atomics.

But what I'm actually proposing isn't even using the 64-bit atomics
from that patch, just to add

--- a/src/include/port/atomics/arch-ppc.h
+++ b/src/include/port/atomics/arch-ppc.h
@@ -24,3 +24,6 @@
 #define pg_read_barrier_impl()		__asm__ __volatile__ ("lwsync" : : : "memory")
 #define pg_write_barrier_impl()	__asm__ __volatile__ ("lwsync" : : : "memory")
 #endif
+
+/* per architecture manual doubleword accesses have single copy atomicity */
+#define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY

to the appropriate files (ia64, ppc, x86) and then add an
#ifndef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
to GetXLogInsertRecPtr's acquisition of the spinlock.  I.e.

#ifndef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
	SpinLockAcquire(&Insert->insertpos_lck);
#endif
	current_bytepos = Insert->CurrBytePos;
#ifndef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
	SpinLockRelease(&Insert->insertpos_lck);
#endif

not because I think it's perfectly pretty that way, but because it's
very easy to demonstrate that there's no regression for anybody.

> The regression only seems to show up if you turn it on and have a
> crazily high rate of read-only transactions.  I think this can wait
> for 9.7.

I don't think 120k read tps is all that high anymore these days.  And
you can easily create scenarios that are *much* worse than pgbench.
E.g. a loop in a volatile plpgsql function will acquire a new snapshot
for every statement.

Greetings,

Andres Freund
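Assembled, that change would leave GetXLogInsertRecPtr looking roughly
like the sketch below; the function body around the #ifndefs is
paraphrased from xlog.c rather than quoted from an actual patch:

XLogRecPtr
GetXLogInsertRecPtr(void)
{
	XLogCtlInsert *Insert = &XLogCtl->Insert;
	uint64		current_bytepos;

	/*
	 * Where aligned 8-byte reads cannot be torn, the spinlock protecting
	 * CurrBytePos is compiled away entirely; otherwise behavior is
	 * unchanged from today.
	 */
#ifndef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
	SpinLockAcquire(&Insert->insertpos_lck);
#endif
	current_bytepos = Insert->CurrBytePos;
#ifndef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
	SpinLockRelease(&Insert->insertpos_lck);
#endif

	return XLogBytePosToRecPtr(current_bytepos);
}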
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Robert Haas
On Wed, Apr 13, 2016 at 11:01 AM, Andres Freund <andres@anarazel.de> wrote:
> Well, I'm less likely to write a patch when there's no chance that it's
> going to be applied.  Which is what the rest of the thread sounds
> like...

I hope somebody writes it at some point, because we surely want to fix
this for 9.7.  However, I agree that there seems to be a tangible lack
of enthusiasm for doing anything about it right now.  I'm slightly
surprised by that, but that's OK: I just work here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 10:42:03 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Apr 13, 2016 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> That's what I thought you were going to say, and it means that any
> >> "performance improvement" patch that relies on 64-bit atomics in
> >> hotspot code paths is going to be a complete disaster on anything but
> >> modern Intel hardware.  I'm not sure that's a direction we want to go
> >> in.  We need to stick to a set of atomics that's pretty widely
> >> portable.
>
> > I think 64-bit atomics *are* pretty widely portable.  Can you name a
> > system with more than 4 CPU cores that doesn't support them?
>
> No, you're ignoring my point, which is what happens on single-CPU
> 32-bit machines, and whether we aren't going to destroy performance
> on low-end machines in pursuit of better performance on high-end.

I think generally the only platform of concern here is ARM (< armv8),
which doesn't have 64-bit atomicity and doesn't have single-copy
atomicity for 8-byte values either (cf.
https://wiki.postgresql.org/wiki/Atomics).  But:

> Now, to the extent that a patch uses a 64-bit atomic op to replace
> a spinlock acquisition, it might be pretty much a wash if low-end
> machines have to use a spinlock to emulate the atomic op.  But it
> would be really easy for the translation to replace one spinlock
> acquisition with multiple spinlock acquisitions, and that would hurt.

Which is why I'm actually proposing to *not* use a pg_atomic_uint64,
just a single define to remove the spinlock acquisition:
http://archives.postgresql.org/message-id/20160413150839.mevdlgekizxyjhc5%40alap3.anarazel.de

I think there are a number of LSNs where replacing the manipulations
with an actual atomic operation (including the fallback to the
spinlock) might be beneficial.  But that'd be a larger patch, and would
require more testing; which is why I'm proposing the above.

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Robert Haas
On Wed, Apr 13, 2016 at 11:18 AM, Andres Freund <andres@anarazel.de> wrote:
> I think generally the only platform of concern here is ARM (< armv8),
> which doesn't have 64-bit atomicity and doesn't have single-copy
> atomicity for 8-byte values either (cf.
> https://wiki.postgresql.org/wiki/Atomics).

That page is sort of confusing, because it says that platform has those
things but then says ***, which is footnoted to mean "linux kernel
emulation available", but it's not too clear whether that applies to
all atomics or just 8-byte atomics.  The operator precedence of / (used
as a separator) vs. footnotes is not stated.  It's also not clear what
"linux kernel emulation available" actually means.  Should we think of
those things as being fast, or slow?

At any rate, I do actually have a Raspberry Pi 2 here, so if we ever
commit a patch that might suck without real 64-bit atomics, we might be
able to actually test whether it does or not.  But as you say, no such
patch is being proposed at the moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 13, 2016 at 10:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No, you're ignoring my point, which is what happens on single-CPU
>> 32-bit machines, and whether we aren't going to destroy performance
>> on low-end machines in pursuit of better performance on high-end.

> One of us is confused, or we're just talking past each other, because
> I don't think I'm ignoring your point at all.  In fact, I think I just
> responded to it rather directly.  I agree that the exact risk you are
> describing exists.  However, the multiple spinlock cycles that you are
> concerned about will only occur on a platform that doesn't support
> 64-bit atomics.  In order to test whether there is a performance
> problem on such hardware, or how serious that problem is, we'd need to
> have access to such hardware, and I don't know where to find any such
> hardware.  Do you?

As Andres says, low-end ARM machines are probably the most common such
hardware right now.  I have two non-ARM machines in the buildfarm that
certainly haven't got such instructions (prairiedog and
gaur/pademelon).  Now I wouldn't propose that we need to concern
ourselves very much with performance on those two decade-plus-old
platforms, but I do think that performance on small ARM machines is
still of interest.

			regards, tom lane
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Kevin Grittner
On Wed, Apr 13, 2016 at 10:01 AM, Andres Freund <andres@anarazel.de> wrote:
> My problem with that is that snapshot-too-old is essentially an
> efficiency feature for busy and large databases.  Regressing noticeably
> when it's enabled in its natural habitat seems sad.

With a real-world application with realistic simulated user load there
was no such regression and a big gain in performance over time, so
we're talking about adjusting how broad a range of workloads it
benefits.  I don't have a strong opinion yet, since I haven't run the
benchmarks on big machines (scheduled for the day after tomorrow); but
as an example, if I only see such regression on a Linux kernel with a
version < 3.8, I am going to be less concerned about getting something
into 9.6, since IMO it is completely irresponsible to run a NUMA
machine with 4 or more nodes on an OS with a substandard NUMA
scheduler.  I'm not sure when 3.8 became available, but according to
Wikipedia, version 3.10 of the Linux kernel was released in June 2013,
so it's not like you need to be on the bleeding edge to have a decent
scheduler.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 11:27:09 -0400, Robert Haas wrote:
> That page is sort of confusing, because it says that platform has those
> things but then says ***, which is footnoted to mean "linux kernel
> emulation available", but it's not too clear whether that applies to
> all atomics or just 8-byte atomics.  The operator precedence of / (used
> as a separator) vs. footnotes is not stated.

/ has a higher precedence than footnotes.  Not sure how to make that
easily clear; I'm not exactly a mediawiki expert.

> It's also not clear what "linux kernel emulation available" actually
> means.  Should we think of those things as being fast, or slow?

Slow.  It means that the compiler generates a syscall to perform the
atomic.  The syscall disables preemption, then performs the actual
math, re-enables preemption, and returns.  That's a lot more expensive
than a spinlock.  There's

/*
 * 64 bit atomics on arm are implemented using kernel fallbacks and might be
 * slow, so disable entirely for now.
 * XXX: We might want to change that at some point for AARCH64
 */
#define PG_DISABLE_64_BIT_ATOMICS

for that reason (in the current tree, not the patch).  The whole
fallback facility exists to make it easier to port software to ARM, but
I wouldn't want to rely on it if not necessary.

Greetings,

Andres Freund
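To make that cost concrete: a plain gcc __sync builtin on a 64-bit
value, as in the snippet below, is typically lowered on pre-armv8
Linux/ARM targets to a libgcc/kernel-helper path rather than a single
native instruction.  The snippet is illustrative only; how the builtin
is lowered depends entirely on the target:

#include <stdint.h>

/*
 * On older ARM targets this commonly expands to a call into a
 * kernel-assisted helper (the "kernel emulation" discussed above), so
 * each operation pays a trip through the helper path rather than
 * executing one CPU instruction.
 */
uint64_t
bump_counter(uint64_t *counter)
{
	return __sync_add_and_fetch(counter, 1);
}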
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 10:31:19 -0500, Kevin Grittner wrote:
> With a real-world application with realistic simulated user load
> there was no such regression and a big gain in performance over
> time, so we're talking about adjusting how broad a range of
> workloads it benefits.

I think it depends very heavily on the type of application.  To be
affected you need a high rate of snapshot acquisitions.  So lots of
small statements, or possibly longer-running stuff involving volatile
functions (which IIRC get new snapshots continually).

> but as an example, if I only see such regression on a Linux kernel
> with a version < 3.8, I am going to be less concerned about getting
> something into 9.6, since IMO it is completely irresponsible to run a
> NUMA machine with 4 or more nodes on an OS with a substandard NUMA
> scheduler.  I'm not sure when 3.8 became available, but according to
> Wikipedia, version 3.10 of the Linux kernel was released in June 2013,
> so it's not like you need to be on the bleeding edge to have a decent
> scheduler.

I don't think the effect of adding a single spinlock (an exclusive
lock!) in a hot path is likely to be hugely dependent on the kernel
version.  We've had such cases before, and felt the pain.  E.g. the
spinlock in ProcArrayLock used to be a *HUGE* contention point, and it
has pretty much the same acquisition pattern as this spinlock now.

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Kevin Grittner
On Wed, Apr 13, 2016 at 10:59 AM, Andres Freund <andres@anarazel.de> wrote:
>> but as an example, if I only see such regression on a Linux kernel
>> with a version < 3.8, I am going to be less concerned about getting
>> something into 9.6, since IMO it is completely irresponsible to run a
>> NUMA machine with 4 or more nodes on an OS with a substandard NUMA
>> scheduler.  I'm not sure when 3.8 became available, but according to
>> Wikipedia, version 3.10 of the Linux kernel was released in June 2013,
>> so it's not like you need to be on the bleeding edge to have a decent
>> scheduler.
>
> I don't think the effect of adding a single spinlock (an exclusive
> lock!) in a hot path is likely to be hugely dependent on the kernel
> version.

My experience is that it easily can be.  We had a customer who could
not scale beyond a certain point due to spinlock contention on a single
spinlock already present in stock pg.  We tried lots of config tweaks
and some custom patches to no avail.  Then we had them upgrade from
RHEL 6.latest to RHEL 7.latest, and they could scale much, much
farther.  No OS or pg config changes were made at the same time.  The
difference is that they went from kernel 2.6.32 to kernel 3.10.0.  The
early version 3 kernels had a NUMA scheduler rewrite that was a
disaster compared to 2.6.32.  They rewrote it again in 3.8, with
dramatic effect.

> We've had such cases before, and felt the pain.  E.g. the spinlock in
> ProcArrayLock used to be a *HUGE* contention point, and it has pretty
> much the same acquisition pattern as this spinlock now.

It would be great to have improvements in such access patterns, no
doubt.  I'll be happy if we get there.  I don't have a problem trying
to contribute to the effort, either, if people think that might
actually be a net gain.  But if we have a point where those not using
the new feature are unaffected, and the question is about the range of
workloads where the new feature will be helpful in 9.6, it doesn't seem
to me to rise to the level of a bug or a release blocker.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-12 14:53:57 -0500, Kevin Grittner wrote:
> On Tue, Apr 12, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-04-12 14:17:12 -0500, Kevin Grittner wrote:
> >> Well, something is different between your environment and mine,
> >> since I saw no difference at scale 100 and 2.2% at scale 200.
> >
> > In a readonly test or r/w?
>
> Readonly, with client and job counts matching scale.
>
> > A lot of this will be different between single-socket and multi-socket
> > servers; as soon as you have the latter, the likelihood of contention
> > being bad goes up dramatically.
>
> Yeah, I know, and 4-socket has been at least an order of magnitude
> more problematic in my experience than 2-socket.  And the problems
> are far, far, far worse on kernels prior to 3.8, especially on 3.x
> before 3.8, so it's hard to know how to take any report of problems
> on a 4-node NUMA machine without knowing the kernel version.

On an EC2 m4.10xlarge (dedicated, but still a VM) - sorry, I don't have
anything better at hand right now, and it was already running.

postgres config:

postgres -D /srv/data/dev/ \
    -c shared_buffers=64GB \
    -c max_wal_size=64GB \
    -c maintenance_work_mem=32GB \
    -c huge_pages=on \
    -c max_connections=400 \
    -c logging_collector=on -c log_filename='postgresql.log' \
    -c log_checkpoints=on -c autovacuum=off \
    -c autovacuum_freeze_max_age=80000000 \
    -c synchronous_commit=off

Initialized with: pgbench -q -i -s 300

Before each run I prewarmed with:

psql -c "create extension if not exists pg_prewarm; select sum(x.x) from (select pg_prewarm(oid) as x from pg_class where relkind in ('i', 'r') order by oid) x;" > /dev/null 2>&1

running: pgbench -M prepared -c 128 -j 128 -n -P 1 -T 100 -S

With -c old_snapshot_threshold=0:

latency average = 0.218 ms
latency stddev = 0.154 ms
tps = 584666.289753 (including connections establishing)
tps = 584867.785569 (excluding connections establishing)

With -c old_snapshot_threshold=10:

latency average = 1.112 ms
latency stddev = 1.246 ms
tps = 114883.528964 (including connections establishing)
tps = 114905.555943 (excluding connections establishing)

With 848ef42bb8c7909c9d7baa38178d4a209906e7c1 (and followups) reverted:

latency average = 0.210 ms
latency stddev = 0.050 ms
tps = 607734.407158 (including connections establishing)
tps = 607918.118566 (excluding connections establishing)

A quicker (each -T 10) test, without restarts between scale runs, of
other scales (tps at thresh=0 vs. thresh=10):

scale	thresh=0	thresh=10
1	15377.761645	15017.789751
1	16285.111754	14829.493870
2	29563.478651	28790.462964
4	62649.628931	50935.364141
8	84557.464387	85631.348766
16	101475.002295	93908.910894
32	347435.607586	167702.527893
64	575640.880911	150139.375351
128	594782.154256	112183.933956
196	584290.957806	92080.129402
256	583921.995839	79345.378887
398	582138.372414	58100.798609

- Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Robert Haas
On Wed, Apr 13, 2016 at 1:19 PM, Andres Freund <andres@anarazel.de> wrote:
> On an EC2 m4.10xlarge (dedicated, but still a VM) - sorry, I don't have
> anything better at hand right now, and it was already running.
>
> postgres config:
>
> postgres -D /srv/data/dev/ \
>     -c shared_buffers=64GB \
>     -c max_wal_size=64GB \
>     -c maintenance_work_mem=32GB \
>     -c huge_pages=on \
>     -c max_connections=400 \
>     -c logging_collector=on -c log_filename='postgresql.log' \
>     -c log_checkpoints=on -c autovacuum=off \
>     -c autovacuum_freeze_max_age=80000000 \
>     -c synchronous_commit=off
>
> Initialized with: pgbench -q -i -s 300
>
> Before each run I prewarmed with:
>
> psql -c "create extension if not exists pg_prewarm; select sum(x.x) from (select pg_prewarm(oid) as x from pg_class where relkind in ('i', 'r') order by oid) x;" > /dev/null 2>&1
>
> running: pgbench -M prepared -c 128 -j 128 -n -P 1 -T 100 -S
>
> With -c old_snapshot_threshold=0:
>
> latency average = 0.218 ms
> latency stddev = 0.154 ms
> tps = 584666.289753 (including connections establishing)
> tps = 584867.785569 (excluding connections establishing)
>
> With -c old_snapshot_threshold=10:
>
> latency average = 1.112 ms
> latency stddev = 1.246 ms
> tps = 114883.528964 (including connections establishing)
> tps = 114905.555943 (excluding connections establishing)
>
> With 848ef42bb8c7909c9d7baa38178d4a209906e7c1 (and followups) reverted:
>
> latency average = 0.210 ms
> latency stddev = 0.050 ms
> tps = 607734.407158 (including connections establishing)
> tps = 607918.118566 (excluding connections establishing)

Yuck.  Aside from the fact that performance tanks when the feature is
turned on, it seems that there is a significant effect even with it
turned off.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 13:25:14 -0400, Robert Haas wrote:
> > With -c old_snapshot_threshold=0:
> >
> > latency average = 0.218 ms
> > latency stddev = 0.154 ms
> > tps = 584666.289753 (including connections establishing)
> > tps = 584867.785569 (excluding connections establishing)
> >
> > With -c old_snapshot_threshold=10:
> >
> > latency average = 1.112 ms
> > latency stddev = 1.246 ms
> > tps = 114883.528964 (including connections establishing)
> > tps = 114905.555943 (excluding connections establishing)
> >
> > With 848ef42bb8c7909c9d7baa38178d4a209906e7c1 (and followups) reverted:
> >
> > latency average = 0.210 ms
> > latency stddev = 0.050 ms
> > tps = 607734.407158 (including connections establishing)
> > tps = 607918.118566 (excluding connections establishing)
>
> Yuck.  Aside from the fact that performance tanks when the feature is
> turned on

A quick look at the former shows that it's primarily contention around
the new OldSnapshotTimeMapLock, not, on that hardware in that workload,
the spinlock.  Which isn't that surprising, because it adds an
exclusive lock to a path which doesn't contain any other exclusive
locks these days...

I have to say, I'm *highly* doubtful that it's OK to add an exclusive
lock in a read-only workload to such a hot path, without any clear path
forward for how to fix that scalability issue.  This doesn't appear to
require just a bit of elbow grease, but a fair bit more.

> it seems that there is a significant effect even with it turned off.

It looks that way, but I'd rather run more careful and repeated tests
to make sure about that part.  At a factor of 5, as with the on/off
tests, per-run variations don't play a large role, but at smaller
percentages it's worthwhile to put more care into it.  If possible it'd
be helpful to avoid a VM too...

Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Kevin Grittner
On Wed, Apr 13, 2016 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> [test results with old_snapshot_threshold = 0 and 10]

From the docs:

| A value of -1 disables this feature, and is the default.

> Yuck.  Aside from the fact that performance tanks when the feature is
> turned on, it seems that there is a significant effect even with it
> turned off.

No evidence of that has been provided.  -1 is off; 0 is for testing
very fast expiration.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 13:52:15 -0500, Kevin Grittner wrote:
> On Wed, Apr 13, 2016 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> > [test results with old_snapshot_threshold = 0 and 10]
>
> From the docs:
>
> | A value of -1 disables this feature, and is the default.

Hm, OK, let me run that as well then.  The reason for the massive
performance difference presumably is that
MaintainOldSnapshotTimeMapping() is cut short due to

	/* No further tracking needed for 0 (used for testing). */
	if (old_snapshot_threshold == 0)
		return;

which means that OldSnapshotTimeMap isn't acquired exclusively.

> > Yuck.  Aside from the fact that performance tanks when the feature is
> > turned on, it seems that there is a significant effect even with it
> > turned off.
>
> No evidence of that has been provided.  -1 is off; 0 is for testing
> very fast expiration.

I'll run with -1 once the current (longer) run has finished.

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Kevin Grittner
On Wed, Apr 13, 2016 at 1:56 PM, Andres Freund <andres@anarazel.de> wrote:

> I'll run with -1 once the current (longer) run has finished.

Just for the record, were any of the other results purporting to be
with the feature "off" also actually running with the feature set for
its fastest possible timeout?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Robert Haas
On Wed, Apr 13, 2016 at 3:08 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Wed, Apr 13, 2016 at 1:56 PM, Andres Freund <andres@anarazel.de> wrote:
>
>> I'll run with -1 once the current (longer) run has finished.
>
> Just for the record, were any of the other results purporting to be
> with the feature "off" also actually running with the feature set
> for its fastest possible timeout?

Mine were testing something else entirely, so I didn't touch
old_snapshot_threshold at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 13:52:15 -0500, Kevin Grittner wrote:
> On Wed, Apr 13, 2016 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> > [test results with old_snapshot_threshold = 0 and 10]
>
> From the docs:
>
> | A value of -1 disables this feature, and is the default.
>
> > Yuck.  Aside from the fact that performance tanks when the feature is
> > turned on, it seems that there is a significant effect even with it
> > turned off.
>
> No evidence of that has been provided.  -1 is off; 0 is for testing
> very fast expiration.

Longer tests are running, but, again on the previous hardware with only
two sockets, the results for 128 clients are:

0:
progress: 100.0 s, 593351.0 tps, lat 0.215 ms stddev 0.118
progress: 200.0 s, 594035.9 tps, lat 0.215 ms stddev 0.118
progress: 300.0 s, 594013.3 tps, lat 0.215 ms stddev 0.117

-1:
progress: 100.0 s, 600835.3 tps, lat 0.212 ms stddev 0.049
progress: 200.0 s, 601466.1 tps, lat 0.212 ms stddev 0.048
progress: 300.0 s, 601529.5 tps, lat 0.212 ms stddev 0.047

reverted:
progress: 100.0 s, 612676.6 tps, lat 0.208 ms stddev 0.048
progress: 200.0 s, 613214.3 tps, lat 0.208 ms stddev 0.047
progress: 300.0 s, 613384.3 tps, lat 0.208 ms stddev 0.047

This is all on virtualized (though using a dedicated instance)
hardware, so the numbers are to be taken with a grain of salt.  But I
did run shorter tests in various orders, and the runtime difference
appears to be very small.

- Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From: Andres Freund
On 2016-04-13 14:08:49 -0500, Kevin Grittner wrote:
> On Wed, Apr 13, 2016 at 1:56 PM, Andres Freund <andres@anarazel.de> wrote:
>
> > I'll run with -1 once the current (longer) run has finished.
>
> Just for the record, were any of the other results purporting to be
> with the feature "off" also actually running with the feature set
> for its fastest possible timeout?

Yes, I'd only used 0 / 10.  I think that shows that the contention, for
me, is primarily the lwlock, not the spinlock.

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
Hi Kevin,

On 2016-04-13 12:21:10 -0700, Andres Freund wrote:
> 0:
> progress: 100.0 s, 593351.0 tps, lat 0.215 ms stddev 0.118
> progress: 200.0 s, 594035.9 tps, lat 0.215 ms stddev 0.118
> progress: 300.0 s, 594013.3 tps, lat 0.215 ms stddev 0.117
>
> -1:
> progress: 100.0 s, 600835.3 tps, lat 0.212 ms stddev 0.049
> progress: 200.0 s, 601466.1 tps, lat 0.212 ms stddev 0.048
> progress: 300.0 s, 601529.5 tps, lat 0.212 ms stddev 0.047
>
> reverted:
> progress: 100.0 s, 612676.6 tps, lat 0.208 ms stddev 0.048
> progress: 200.0 s, 613214.3 tps, lat 0.208 ms stddev 0.047
> progress: 300.0 s, 613384.3 tps, lat 0.208 ms stddev 0.047

Setting it to 1 gives:

progress: 100.0 s, 115413.7 tps, lat 1.107 ms stddev 1.240
progress: 200.0 s, 114907.4 tps, lat 1.113 ms stddev 1.244
progress: 300.0 s, 115621.4 tps, lat 1.106 ms stddev 1.238

If you want me to run some other tests I can, but ISTM we have the data
we need?

- Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Apr 13, 2016 at 3:01 PM, Andres Freund <andres@anarazel.de> wrote: > If you want me to rn some other tests I can, but ISTM we have the > data we need? Thanks for the additional detail on how this was run. I think I still need a little more context, though: What is the kernel on which these tests were run? Which pg commit were these tests run against? If 2201d801 was not included in your -1 tests, have you identified where the 2% extra run time is going on -1 versus reverted? Since several other threads lately have reported bigger variation than that based on random memory alignment issues, can we confirm that this is a real difference in what is at master's HEAD? Of course, I'm still scheduled to test on bare metal machines in a couple days, on two different architectures, so we'll have a few more data points after that. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-04-13 15:21:31 -0500, Kevin Grittner wrote:
> On Wed, Apr 13, 2016 at 3:01 PM, Andres Freund <andres@anarazel.de> wrote:
>
> > If you want me to run some other tests I can, but ISTM we have the
> > data we need?
>
> Thanks for the additional detail on how this was run. I think I
> still need a little more context, though:
>
> What is the kernel on which these tests were run?

3.16. I can upgrade to 4.4 if necessary. But I still believe very
strongly that this is side-tracking the issue. An exclusive lock (or
spinlock) in a very hot path, which previously had no exclusively held
lock at all, will present scalability issues, regardless of kernel.

> Which pg commit were these tests run against?

85e00470, plus some reverts (the whitespace commits make this
harder...) in the reverted case.

> If 2201d801 was not included in your -1 tests, have you identified
> where the 2% extra run time is going on -1 versus reverted?

No. It's hard to do good profiles on most virtualized hardware, since
hardware performance counters are disabled, so you can only do OS-level
sampling, which has a pretty big performance influence.

I'm not entirely sure what you mean by "2201d801 was not included in
your -1 tests". The optimization was present.

> Since several other threads lately have reported bigger variation than
> that based on random memory alignment issues, can we confirm that this
> is a real difference in what is at master's HEAD?

It's unfortunately hard to measure this conclusively here (and in
general). I guess we'll have to look, on native hardware, at where the
difference comes from. The difference is smaller on my laptop, my
workstation is somewhere on a container ship, and I have no other
physical hardware.

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Apr 13, 2016 at 3:47 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-04-13 15:21:31 -0500, Kevin Grittner wrote: >> What is the kernel on which these tests were run? > > 3.16. I can upgrade to 4.4 if necessary. No, I'm not aware of any problems from 3.8 on. > But I still believe very strongly that this is side-tracking the issue. As long as I know it isn't a broken NUMA scheduler, or that there were fewer than four NUMA memory nodes, I consider it a non-issue. I just need to know whether it fits that problem profile to feel comfortable that I can interpret the results correctly. >> Which pg commit were these tests run against? > > 85e00470. + some reverts (the whitespace commits make this harder...) in > the reverted case. > > >> If 2201d801 was not included in your -1 tests, have you identified >> where the 2% extra run time is going on -1 versus reverted? > > No. It's hard to do good profiles on most virtualized hardware, since > hardware performance counters are disabled. So you only can do OS > sampling; which has a pretty big performance influence. > > I'm not entirely sure what you mean with "2201d801 was not included in > your -1 tests". The optimization was present. Sorry, the "not" was accidental -- I hate reverse logic errors like that. Based on the commit you used, I have my answer. Thanks. >> Since several other threads lately have reported bigger variation than >> that based on random memory alignment issues, can we confirm that this >> is a real difference in what is at master's HEAD? > > It's unfortunately hard to measure this conclusively here (and in > general). I guess we'll have to look, on native hardware, where the > difference comes from. The difference is smaller on my laptop, and my > workstation is somewhere on a container ship, other physical hardware I > do not have. OK, thanks. I can't think of anything else to ask for at this point. If you feel that you have enough to press for some particular course of action, go for it. Personally, I want to do some more investigation on those big machines. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-04-13 16:05:25 -0500, Kevin Grittner wrote:
> OK, thanks. I can't think of anything else to ask for at this
> point. If you feel that you have enough to press for some
> particular course of action, go for it.

I think we, at the very least, need a clear proposal for how to resolve
the scalability issue around OldSnapshotTimeMapLock in 9.6. Personally
I think we shouldn't release with such a large regression due to a
performance-oriented feature; but if we do, we need to be confident
that we can easily resolve it for 9.7. In contrast to the spinlock
issue I unfortunately don't see an easy way. Without such a plan it
seems too likely to go unfixed for a long time.

> Personally, I want to do some more investigation on those big
> machines.

Sounds good, especially around the regression with the feature
disabled.

Andres
Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alexander Korotkov
Date:
On Thu, Apr 14, 2016 at 12:23 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-13 16:05:25 -0500, Kevin Grittner wrote:
> OK, thanks. I can't think of anything else to ask for at this
> point. If you feel that you have enough to press for some
> particular course of action, go for it.
I think we, at the very least, need a clear proposal how to resolve the
scalability issue around OldSnapshotTimeMapLock in 9.6. Personally I
think we shouldn't release with such a large regression due to a
performance oriented feature; but if we do, we need to be confident that
we can easily resolve it for 9.7. In contrast to the spinlock issue I
don't see an easy way unfortunately. Without such a plan it seems too
likely to go unfixed for a long time otherwise.
> Personally, I want to do some more investigation on those big
> machines.
Sounds good, especially around the regression with the feature disabled.
I've also run a read-only test on a 4x18 Intel machine, comparing master against master with snapshot_too_old reverted. In particular, I've reverted the following commits:
8b65cf4c5edabdcae45ceaef7b9ac236879aae50
848ef42bb8c7909c9d7baa38178d4a209906e7c1
80647bf65a03e232c995c0826ef394dad8d685fe
a6f6b78196a701702ec4ff6df56c346bdcf9abd2
2201d801b03c2d1b0bce4d6580b718dc34d38b3e
I've obtained the following results.
clients   master    sto-reverted
1         13918     12997
2         26143     26728
4         50521     52539
8         104330    103785
10        129067    132606
20        255561    255844
30        368472    371359
40        444486    450429
50        489950    497705
60        563606    564385
70        710579    718860
80        916480    934170
90        1089917   1152961
100       1201337   1240055
110       1147208   1207727
120       1116256   1167681
130       1066475   1120891
140       1040379   1085904
150       974064    1022160
160       938396    976487
170       953636    978120
180       920772    953843
We can see a small but definite regression with the snapshot-too-old feature in place.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Noah Misch
Date:
On Wed, Apr 13, 2016 at 03:21:31PM -0500, Kevin Grittner wrote: > If 2201d801 was not included in your -1 tests, have you identified > where the 2% extra run time is going on -1 versus reverted? Since > several other threads lately have reported bigger variation than > that based on random memory alignment issues, can we confirm that > this is a real difference in what is at master's HEAD? If anyone wishes to confirm that, I recommend this method: http://www.postgresql.org/message-id/87vbitb2zp.fsf@news-spur.riddles.org.uk PostgreSQL has not required that from contributors, though. For putative regressions this small, we've either analyzed them theoretically or just dismissed them. The key judgment to finalize here is whether it's okay to release this feature given its current effect[1], when enabled, on performance. That is more controversial than the potential ~2% regression for old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing that way, and Andres[4] is not. If anyone else wants to weigh in, now is the time. [1] http://www.postgresql.org/message-id/20160413192110.fogwesjti3kxycnu@alap3.anarazel.de [2] http://www.postgresql.org/message-id/20160413140821.GA6568@alvherre.pgsql [3] http://www.postgresql.org/message-id/CA+TgmoZqN0xevR+1pZ6j-99-ZCBoOphr-23tiREb+QW1Eu=KOA@mail.gmail.com [4] http://www.postgresql.org/message-id/20160413212356.uv4velailmivnihh@alap3.anarazel.de
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> That is more controversial than the potential ~2% regression for
> old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing
> that way, and Andres[4] is not.

FWIW, I could be kinda convinced that it's temporarily ok, if there'd
be a clear proposal on the table how to solve the scalability issue
around MaintainOldSnapshotTimeMapping(). Postponing the optimization
around something as trivial as a spinlock around reading an LSN is one
thing, postponing something we don't know the solution to is another.

Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > On 2016-04-16 16:44:52 -0400, Noah Misch wrote: >> That is more controversial than the potential ~2% regression for >> old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing >> that way, and Andres[4] is not. > FWIW, I could be kinda convinced that it's temporarily ok, if there'd be > a clear proposal on the table how to solve the scalability issue around > MaintainOldSnapshotTimeMapping(). Postponing the optimization around > something as trivial as a spinlock around reading an LSN is one thing, > postponing something we don't know the solution to is anohter. The message Noah cited mentions only a 4% regression, but this one seems far worse: http://www.postgresql.org/message-id/20160413200148.bawmwjdmggbllhha@alap3.anarazel.de That's more than a 5X penalty, which seems like it would make the feature unusable; unless there is an argument that that's an extreme case that wouldn't be representative of most real-world usage. Which there may well be; I've not been following this thread carefully. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-04-16 17:52:44 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> >> That is more controversial than the potential ~2% regression for
> >> old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing
> >> that way, and Andres[4] is not.
>
> > FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
> > a clear proposal on the table how to solve the scalability issue around
> > MaintainOldSnapshotTimeMapping(). Postponing the optimization around
> > something as trivial as a spinlock around reading an LSN is one thing,
> > postponing something we don't know the solution to is another.
>
> The message Noah cited mentions only a 4% regression, but this one
> seems far worse:
>
> http://www.postgresql.org/message-id/20160413200148.bawmwjdmggbllhha@alap3.anarazel.de
>
> That's more than a 5X penalty, which seems like it would make the
> feature unusable; unless there is an argument that that's an extreme
> case that wouldn't be representative of most real-world usage.
> Which there may well be; I've not been following this thread carefully.

The 4% was with the feature disabled (compared to before its
introduction); we're not sure where that's coming from. But the 5x -
and that was just on a mid-sized box - is with the feature enabled.

- Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > On 2016-04-16 17:52:44 -0400, Tom Lane wrote: >> That's more than a 5X penalty, which seems like it would make the >> feature unusable; unless there is an argument that that's an extreme >> case that wouldn't be representative of most real-world usage. >> Which there may well be; I've not been following this thread carefully. > The 4 % was with the feature disabled (in comparison to before it's > introduction), we're not sure where that's coming from. But the 5x - and > that was just on a mid-sized box - is with the feature enabled. 128 processors is a mid-sized box? Or if you didn't have 128 processors, why are you testing "-c 128 -j 128" cases? More seriously, the complaints here seem to center on performance in a read-only workload; but I don't actually see why you'd want to turn on this feature in a read-only, or even read-mostly, workload. It exists for the benefit of people who are trying to keep their pg_xlog/ directories reasonably sized, no? That doesn't sound very read-only-ish to me. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
Hi,

On 2016-04-16 18:27:06 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-04-16 17:52:44 -0400, Tom Lane wrote:
> >> That's more than a 5X penalty, which seems like it would make the
> >> feature unusable; unless there is an argument that that's an extreme
> >> case that wouldn't be representative of most real-world usage.
> >> Which there may well be; I've not been following this thread carefully.
>
> > The 4% was with the feature disabled (compared to before its
> > introduction); we're not sure where that's coming from. But the 5x -
> > and that was just on a mid-sized box - is with the feature enabled.
>
> 128 processors is a mid-sized box?

It has fewer.

> Or if you didn't have 128 processors, why are you testing "-c 128 -j
> 128" cases?

I tried 128 because it's a random number I picked out of my hat. I've
posted various different client numbers elsewhere in the thread. The
machine (a VM, this isn't the best test!) has 20 cores / 40 hw threads
afaik.

But 128 connections on 40 "cpus" isn't unrealistic. Many workloads have
a lot more connections and concurrent queries than cores - besides
often being operationally easier, it's also sensible from a hardware
utilization perspective. Due to latency effects, individual connections
are frequently idle, even if the client were issuing queries as fast as
possible.

> More seriously, the complaints here seem to center on performance in a
> read-only workload; but I don't actually see why you'd want to turn on
> this feature in a read-only, or even read-mostly, workload.

In a purely read-only workload it's surely pointless. But I don't see
why the results would be any better in a 75% read / 25% write mix,
which is probably already more write-heavy than a lot of the actual
workloads out there.

> It exists for
> the benefit of people who are trying to keep their pg_xlog/ directories
> reasonably sized, no? That doesn't sound very read-only-ish to me.

pg_xlog size? By my understanding it's there to cope with the bloat
introduced by long-running read-only transactions. Isn't the idea that
"old snapshots" basically don't enforce their xmin anymore, allowing
vacuum/HOT pruning? Such old snapshots continue to work as long as
they're not used to make visibility decisions about pages which have
been modified "recently". The whole feature is only interesting if such
old snapshots are likely to only access data that's not frequently
modified.

- Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
David Steele
Date:
On 4/16/16 4:44 PM, Noah Misch wrote: > The key judgment to finalize here is whether it's okay to release this feature > given its current effect[1], when enabled, on performance. That is more > controversial than the potential ~2% regression for old_snapshot_threshold=-1. > Alvaro[2] and Robert[3] are okay releasing that way, and Andres[4] is not. If > anyone else wants to weigh in, now is the time. I'm in favor of releasing the feature even with the performance regression when enabled. First, there are use cases where a feature like this is absolutely critical. Second, I don't think it will improve and become performant without exposure to a wider audience. I think it's entirely within the PostgreSQL philosophy to release a feature that has warts and doesn't perform as well as we'd like as long as it is stable and does not corrupt data. In my opinion this feature meets these criteria and it is an important capability to add to PostgreSQL. -- -David david@pgmasters.net
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
> Second, I don't think it will improve and become performant without
> exposure to a wider audience.

Huh? The issue is a relatively simple-to-spot architectural one (taking
a single exclusive lock during snapshot acquisition, in a path which
otherwise only needs shared locks) - I don't see what wider input is
needed. And for that matter, I don't see why such a lock got through
review.
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Amit Kapila
Date:
On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> > That is more controversial than the potential ~2% regression for
> > old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing
> > that way, and Andres[4] is not.
>
> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
> a clear proposal on the table how to solve the scalability issue around
> MaintainOldSnapshotTimeMapping().
>
It seems that for read-only workloads, MaintainOldSnapshotTimeMapping() takes an EXCLUSIVE LWLock, which seems to be the probable reason for the performance regression. Now, the question is: do we need to acquire that lock if xmin has not changed since oldSnapshotControl->latest_xmin was last updated, or if xmin is less than or equal to oldSnapshotControl->latest_xmin?
If we don't need it in the above cases, I think skipping it can address the performance regression to a good degree for read-only workloads when the feature is enabled.
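In rough outline, the check being suggested could sit at the top of MaintainOldSnapshotTimeMapping() in snapmgr.c, roughly as below. This is a sketch of the idea only, not actual or proposed source; the mutex_latest_xmin spinlock guarding latest_xmin is an assumed addition, not an existing field.

    static void
    MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
    {
        TransactionId latest_xmin;

        /* Publish our xmin under a (hypothetical) spinlock, not the LWLock. */
        SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
        latest_xmin = oldSnapshotControl->latest_xmin;
        if (TransactionIdFollows(xmin, latest_xmin))
            oldSnapshotControl->latest_xmin = xmin;
        SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);

        /*
         * If our xmin does not advance past what was already published,
         * the map cannot change, so the exclusive lock can be skipped.
         */
        if (!TransactionIdFollows(xmin, latest_xmin))
            return;

        LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
        /* ... existing time-to-xid map maintenance ... */
        LWLockRelease(OldSnapshotTimeMapLock);
    }

With something like this, the common read-only case would touch only the spinlock, and only snapshots that actually advance xmin would pay for the exclusive lock.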
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote: >> >> On 2016-04-16 16:44:52 -0400, Noah Misch wrote: >> > That is more controversial than the potential ~2% regression for >> > old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing >> > that way, and Andres[4] is not. >> >> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be >> a clear proposal on the table how to solve the scalability issue around >> MaintainOldSnapshotTimeMapping(). > > It seems that for read-only workloads, MaintainOldSnapshotTimeMapping() > takes EXCLUSIVE LWLock which seems to be a probable reason for a performance > regression. Now, here the question is do we need to acquire that lock if > xmin is not changed since the last time value of > oldSnapshotControl->latest_xmin is updated or xmin is lesser than equal to > oldSnapshotControl->latest_xmin? > If we don't need it for above cases, I think it can address the performance > regression to a good degree for read-only workloads when the feature is > enabled. Thanks, Amit -- I think something along those lines is the right solution to the scaling issues when the feature is enabled. For now I'm focusing on the back-patching issues and the performance regression when the feature is disabled, but I'll shift focus to this once the "killer" issues are in hand. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Tue, Apr 19, 2016 at 11:11 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote: >>> >>> On 2016-04-16 16:44:52 -0400, Noah Misch wrote: >>> > That is more controversial than the potential ~2% regression for >>> > old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing >>> > that way, and Andres[4] is not. >>> >>> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be >>> a clear proposal on the table how to solve the scalability issue around >>> MaintainOldSnapshotTimeMapping(). >> >> It seems that for read-only workloads, MaintainOldSnapshotTimeMapping() >> takes EXCLUSIVE LWLock which seems to be a probable reason for a performance >> regression. Now, here the question is do we need to acquire that lock if >> xmin is not changed since the last time value of >> oldSnapshotControl->latest_xmin is updated or xmin is lesser than equal to >> oldSnapshotControl->latest_xmin? >> If we don't need it for above cases, I think it can address the performance >> regression to a good degree for read-only workloads when the feature is >> enabled. > > Thanks, Amit -- I think something along those lines is the right > solution to the scaling issues when the feature is enabled. For > now I'm focusing on the back-patching issues and the performance > regression when the feature is disabled, but I'll shift focus to > this once the "killer" issues are in hand. Maybe Amit could try his idea in parallel. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, Apr 19, 2016 at 10:14 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Apr 19, 2016 at 11:11 AM, Kevin Grittner <kgrittn@gmail.com> wrote: >> On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote: >>>> >>>> On 2016-04-16 16:44:52 -0400, Noah Misch wrote: >>>> > That is more controversial than the potential ~2% regression for >>>> > old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing >>>> > that way, and Andres[4] is not. >>>> >>>> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be >>>> a clear proposal on the table how to solve the scalability issue around >>>> MaintainOldSnapshotTimeMapping(). >>> >>> It seems that for read-only workloads, MaintainOldSnapshotTimeMapping() >>> takes EXCLUSIVE LWLock which seems to be a probable reason for a performance >>> regression. Now, here the question is do we need to acquire that lock if >>> xmin is not changed since the last time value of >>> oldSnapshotControl->latest_xmin is updated or xmin is lesser than equal to >>> oldSnapshotControl->latest_xmin? >>> If we don't need it for above cases, I think it can address the performance >>> regression to a good degree for read-only workloads when the feature is >>> enabled. >> >> Thanks, Amit -- I think something along those lines is the right >> solution to the scaling issues when the feature is enabled. For >> now I'm focusing on the back-patching issues and the performance >> regression when the feature is disabled, but I'll shift focus to >> this once the "killer" issues are in hand. > > Maybe Amit could try his idea in parallel. That would be great! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Amit Kapila
Date:
On Tue, Apr 19, 2016 at 8:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Apr 19, 2016 at 11:11 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> > On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >> It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
> >> takes EXCLUSIVE LWLock which seems to be a probable reason for a performance
> >> regression. Now, here the question is do we need to acquire that lock if
> >> xmin is not changed since the last time value of
> >> oldSnapshotControl->latest_xmin is updated or xmin is lesser than equal to
> >> oldSnapshotControl->latest_xmin?
> >> If we don't need it for above cases, I think it can address the performance
> >> regression to a good degree for read-only workloads when the feature is
> >> enabled.
> >
> > Thanks, Amit -- I think something along those lines is the right
> > solution to the scaling issues when the feature is enabled. For
> > now I'm focusing on the back-patching issues and the performance
> > regression when the feature is disabled, but I'll shift focus to
> > this once the "killer" issues are in hand.
>
> Maybe Amit could try his idea in parallel.
>
Okay, will look into it.
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-04-19 20:27:31 +0530, Amit Kapila wrote: > On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote: > > > > On 2016-04-16 16:44:52 -0400, Noah Misch wrote: > > > That is more controversial than the potential ~2% regression for > > > old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing > > > that way, and Andres[4] is not. > > > > FWIW, I could be kinda convinced that it's temporarily ok, if there'd be > > a clear proposal on the table how to solve the scalability issue around > > MaintainOldSnapshotTimeMapping(). > > > > It seems that for read-only workloads, MaintainOldSnapshotTimeMapping() > takes EXCLUSIVE LWLock which seems to be a probable reason for a > performance regression. Yes, that's the major problem. > Now, here the question is do we need to acquire that lock if xmin is > not changed since the last time value of > oldSnapshotControl->latest_xmin is updated or xmin is lesser than > equal to oldSnapshotControl->latest_xmin? If we don't need it for > above cases, I think it can address the performance regression to a > good degree for read-only workloads when the feature is enabled. I think the more fundamental issue is that the time->xid mapping is built at GetSnapshotData() time (via MaintainOldSnapshotTimeMapping()), and not when xids are assigned. Snapshots are created a lot more frequently in nearly all use-cases than xids are assigned. That's what forces the exclusive lock to be in the read path, rather than the write path. What's the reason for this? Greetings, Andres Freund
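For concreteness, the alternative shape being described here would move the map maintenance out of the snapshot path and into the xid-assignment path, roughly as sketched below. The call placement near GetNewTransactionId() in varsup.c is hypothetical pseudocode, not a proposed patch.

    /*
     * Sketch only: advance the time->xid map where xids are handed out,
     * so that only writers pay for the exclusive lock; readers taking
     * snapshots would no longer need it at all.
     */
    TransactionId
    GetNewTransactionId(bool isSubXact)
    {
        TransactionId xid;

        /* ... existing xid assignment under XidGenLock ... */

        if (old_snapshot_threshold >= 0)
            MaintainOldSnapshotTimeMapping(GetSnapshotCurrentTimestamp(), xid);

        return xid;
    }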
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Ants Aasma
Date:
On Tue, Apr 19, 2016 at 6:11 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote:
>>>
>>> On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
>>> > That is more controversial than the potential ~2% regression for
>>> > old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing
>>> > that way, and Andres[4] is not.
>>>
>>> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
>>> a clear proposal on the table how to solve the scalability issue around
>>> MaintainOldSnapshotTimeMapping().
>>
>> It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
>> takes an EXCLUSIVE LWLock, which seems to be the probable reason for the
>> performance regression. Now, the question is: do we need to acquire that
>> lock if xmin has not changed since oldSnapshotControl->latest_xmin was
>> last updated, or if xmin is less than or equal to
>> oldSnapshotControl->latest_xmin?
>> If we don't need it in the above cases, I think skipping it can address
>> the performance regression to a good degree for read-only workloads when
>> the feature is enabled.
>
> Thanks, Amit -- I think something along those lines is the right
> solution to the scaling issues when the feature is enabled. For
> now I'm focusing on the back-patching issues and the performance
> regression when the feature is disabled, but I'll shift focus to
> this once the "killer" issues are in hand.

I had an idea I wanted to test out. The gist of it is to effectively
have the last slot of the timestamp-to-xid map stored in the
latest_xmin field and only update the mapping when slot boundaries are
crossed. See the attached WIP patch for details. This way the exclusive
lock only needs to be acquired once per minute. The common case is a
spinlock, which could be replaced with atomics later. And it seems to
me that the mutex_threshold taken in TestForOldSnapshot() can also get
pretty hot under some workloads, so that may also need some tweaking.

I think a better approach would be to base the whole mechanism on a
periodically updated counter instead of timestamps. The autovacuum
launcher looks like a good candidate to play the clock keeper; without
it the feature has little point anyway. AFAICS only the clock keeper
needs to have the timestamp-to-xid mapping; others can make do with a
couple of periodically updated values. I haven't worked it out in
detail, but it feels like the code would be simpler. But this was a
larger change than I felt comfortable trying out, so I went with the
simple change first.

However, while checking whether my proof-of-concept patch actually
works I hit another issue: I couldn't get my test for the feature to
work at all. The test script I used is attached. Basically I have a
table with 1000 rows, one high-throughput worker deleting old rows and
inserting new ones, one long query that acquires a snapshot and sleeps
for 30min, and one worker that has a repeatable read snapshot and
periodically does count(*) on the table. Based on the documentation I
would expect the following:

* The interfering query gets cancelled
* The long running query gets to run
* Old rows will start to be cleaned up after the threshold expires

However, testing on commit 9c75e1a36b6b2f3ad9f76ae661f42586c92c6f7c,
I'm seeing that the old rows do not get cleaned up, and that the
interfering query only gets cancelled when old_snapshot_threshold = 0.
Larger values do not result in cancellation. Am I doing something
wrong, or is the feature just not working at all?

Regards,
Ants Aasma
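The once-per-minute idea can be pictured as follows; this is a rough illustration under assumed names (latest_slot_ts and mutex_latest_xmin are hypothetical; the real details are in the attached patch), not the patch itself:

    /* Common case: still inside the current one-minute slot. */
    ts = whenTaken - (whenTaken % USECS_PER_MINUTE);

    SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
    if (ts <= oldSnapshotControl->latest_slot_ts)
    {
        /* Just advance the in-progress slot's xmin; no LWLock needed. */
        if (TransactionIdFollows(xmin, oldSnapshotControl->latest_xmin))
            oldSnapshotControl->latest_xmin = xmin;
        SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
        return;
    }
    SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);

    /* Slot boundary crossed: fold the finished slot into the map proper. */
    LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
    /* ... append a new slot, advance head_timestamp/count_used ... */
    LWLockRelease(OldSnapshotTimeMapLock);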
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Apr 20, 2016 at 8:08 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

> However, while checking whether my proof-of-concept patch actually
> works I hit another issue: I couldn't get my test for the feature to
> work at all. The test script I used is attached.

Could you provide enough to make that a self-contained reproducible
test case (i.e., so that I don't need to infer or re-write any steps or
guess how to call it)? In previous cases people have given me where
they felt that the feature wasn't working, there have been valid
reasons for it to behave as it did (e.g., a transaction with a
transaction ID and an xmin which prevented cleanup from advancing).
I'll be happy to look at your case and see whether it's another such
case or some bug, but it seems a waste to reverse engineer or rewrite
parts of the test case to do so.

--
Kevin Grittner
EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Ants Aasma
Date:
On Thu, Apr 21, 2016 at 5:16 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Wed, Apr 20, 2016 at 8:08 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
>
>> However, while checking whether my proof-of-concept patch actually
>> works I hit another issue: I couldn't get my test for the feature to
>> work at all. The test script I used is attached.
>
> Could you provide enough to make that a self-contained reproducible
> test case (i.e., so that I don't need to infer or re-write any steps
> or guess how to call it)? In previous cases people have given me where
> they felt that the feature wasn't working, there have been valid
> reasons for it to behave as it did (e.g., a transaction with a
> transaction ID and an xmin which prevented cleanup from advancing).
> I'll be happy to look at your case and see whether it's another such
> case or some bug, but it seems a waste to reverse engineer or rewrite
> parts of the test case to do so.

Just to be sure I didn't have anything screwy in my build environment,
I redid the test on a freshly installed Fedora 23 VM. Steps to
reproduce:

1. Build postgresql from git. I used:

   ./configure --enable-debug --enable-cassert --prefix=/home/ants/pg-master

2. Set up the database:

   cat << EOF > test-settings.conf
   old_snapshot_threshold = 1min

   logging_collector = on
   log_directory = 'pg_log'
   log_filename = 'postgresql.log'
   log_line_prefix = '[%m] '
   log_autovacuum_min_duration = 0
   EOF

   pg-master/bin/initdb data/
   cat test-settings.conf >> data/postgresql.conf
   pg-master/bin/pg_ctl -D data/ start
   pg-master/bin/createdb

3. Install python-psycopg2 and get the test script from my earlier
   e-mail [1].

4. Run the test: python test_oldsnapshot.py "host=/tmp"

5. Observe that the table keeps growing even after the old snapshot
   threshold is exceeded and autovacuum has run. The autovacuum log
   shows 0 tuples removed.

Only the write workload has an xid assigned; the other two backends
only have a snapshot held:

[ants@localhost ~]$ pg-master/bin/psql -c "SELECT application_name, backend_xid, backend_xmin, NOW()-xact_start AS tx_age, state FROM pg_stat_activity"
   application_name   | backend_xid | backend_xmin |     tx_age      |        state
----------------------+-------------+--------------+-----------------+---------------------
 write-workload       |       95637 |              | 00:00:00.009314 | active
 long-unrelated-query |             |         1806 | 00:04:33.914048 | active
 interfering-query    |             |         2444 | 00:04:32.910742 | idle in transaction
 psql                 |             |        95637 | 00:00:00        | active

Output from the test tool is attached. After killing the test tool and
the long running query, autovacuum cleans things up as expected.

I'm too tired right now to chase this down myself. The mental toll that
two small kids can take is pretty staggering. But I might find the time
to fire up a debugger sometime tomorrow.

Regards,
Ants Aasma

[1] http://www.postgresql.org/message-id/attachment/43859/test_oldsnapshot.py
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Thu, Apr 21, 2016 at 2:10 PM, Ants Aasma <ants.aasma@eesti.ee> wrote: > On Thu, Apr 21, 2016 at 5:16 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> Could you provide enough to make that a self-contained >> reproducible test case [?] > [provided] Thanks! I have your test case running, and it is not immediately clear why old rows are not being vacuumed away. Will investigate. > I'm too tired right now to chase this down myself. The mental > toll that two small kids can take is pretty staggering. Been there, done that; so I know just what you mean. :-) It is rewarding though, eh? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Thu, Apr 21, 2016 at 4:13 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> I have your test case running, and it is not immediately
> clear why old rows are not being vacuumed away.

I have not found the reason that the vacuuming is not as aggressive as
it should be with this old_snapshot_threshold, but I left your test
case running overnight and found that it eventually did kick in. So the
question is why it was not nearly as aggressive as one would expect.
From the server log:

kgrittn@Kevin-Desktop:~/pg/master$ grep -B2 -A3 'tuples: [1-9]' Debug/data/pg_log/postgresql.log
[2016-04-21 16:21:29.658 CDT] LOG: automatic vacuum of table "kgrittn.public.high_throughput": index scans: 1
    pages: 0 removed, 2759 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 94 removed, 159928 remain, 158935 are dead but not yet removable
    buffer usage: 6005 hits, 0 misses, 8 dirtied
    avg read rate: 0.000 MB/s, avg write rate: 0.090 MB/s
    system usage: CPU 0.00s/0.08u sec elapsed 0.69 sec
--
[2016-04-21 16:55:31.971 CDT] LOG: automatic vacuum of table "kgrittn.pg_catalog.pg_statistic": index scans: 1
    pages: 0 removed, 23 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 2 removed, 515 remain, 128 are dead but not yet removable
    buffer usage: 50 hits, 11 misses, 14 dirtied
    avg read rate: 4.048 MB/s, avg write rate: 5.152 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec
--
[2016-04-22 00:33:11.978 CDT] LOG: automatic vacuum of table "kgrittn.pg_catalog.pg_statistic": index scans: 1
    pages: 0 removed, 68 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 1016 removed, 409 remain, 22 are dead but not yet removable
    buffer usage: 89 hits, 127 misses, 111 dirtied
    avg read rate: 1.478 MB/s, avg write rate: 1.292 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.67 sec
[2016-04-22 00:33:18.572 CDT] LOG: automatic vacuum of table "kgrittn.public.high_throughput": index scans: 1
    pages: 0 removed, 20196 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 292030 removed, 3941 remain, 3553 are dead but not yet removable
    buffer usage: 68541 hits, 14415 misses, 20638 dirtied
    avg read rate: 1.674 MB/s, avg write rate: 2.396 MB/s
    system usage: CPU 0.23s/1.09u sec elapsed 67.29 sec
--
[2016-04-22 00:52:13.013 CDT] LOG: automatic vacuum of table "kgrittn.public.high_throughput": index scans: 1
    pages: 0 removed, 20233 remain, 0 skipped due to pins, 19575 skipped frozen
    tuples: 8463 removed, 30533 remain, 28564 are dead but not yet removable
    buffer usage: 8136 hits, 4 misses, 158 dirtied
    avg read rate: 0.027 MB/s, avg write rate: 1.065 MB/s
    system usage: CPU 0.00s/0.03u sec elapsed 1.15 sec
--
[2016-04-22 01:28:22.812 CDT] LOG: automatic vacuum of table "kgrittn.pg_catalog.pg_statistic": index scans: 1
    pages: 0 removed, 68 remain, 0 skipped due to pins, 44 skipped frozen
    tuples: 26 removed, 760 remain, 108 are dead but not yet removable
    buffer usage: 37 hits, 27 misses, 12 dirtied
    avg read rate: 4.963 MB/s, avg write rate: 2.206 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.04 sec
--
[2016-04-22 06:51:23.042 CDT] LOG: automatic vacuum of table "kgrittn.pg_catalog.pg_statistic": index scans: 1
    pages: 0 removed, 68 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 692 removed, 403 remain, 16 are dead but not yet removable
    buffer usage: 90 hits, 109 misses, 76 dirtied
    avg read rate: 1.646 MB/s, avg write rate: 1.148 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.51 sec
[2016-04-22 06:52:45.174 CDT] LOG: automatic vacuum of table "kgrittn.public.high_throughput": index scans: 1
    pages: 0 removed, 28152 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 928116 removed, 14021 remain, 14021 are dead but not yet removable
    buffer usage: 88738 hits, 33068 misses, 45857 dirtied
    avg read rate: 1.811 MB/s, avg write rate: 2.511 MB/s
    system usage: CPU 0.43s/1.93u sec elapsed 142.68 sec
--
[2016-04-22 06:53:23.665 CDT] LOG: automatic vacuum of table "kgrittn.public.high_throughput": index scans: 1
    pages: 0 removed, 28313 remain, 0 skipped due to pins, 27853 skipped frozen
    tuples: 43 removed, 9002 remain, 8001 are dead but not yet removable
    buffer usage: 9795 hits, 22 misses, 28 dirtied
    avg read rate: 0.159 MB/s, avg write rate: 0.202 MB/s
    system usage: CPU 0.00s/0.03u sec elapsed 1.08 sec
--
[2016-04-22 07:22:25.240 CDT] LOG: automatic vacuum of table "kgrittn.public.high_throughput": index scans: 1
    pages: 0 removed, 28313 remain, 0 skipped due to pins, 25627 skipped frozen
    tuples: 51 removed, 149639 remain, 147733 are dead but not yet removable
    buffer usage: 14227 hits, 18 misses, 15 dirtied
    avg read rate: 0.089 MB/s, avg write rate: 0.074 MB/s
    system usage: CPU 0.01s/0.10u sec elapsed 1.58 sec

From the output stream of the python test script:

[00:32:00] Counted 1000 rows with max 1094 in high_throughput table
[00:32:05] High throughput table size @ 31240s. Size 161624kB Last vacuum 0:00:42.664285 ago
[00:32:10] Interfering query got error snapshot too old
[00:32:10] Waiting 3min to restart interfering query
[06:50:00] Counted 1000 rows with max 1176984 in high_throughput table
[06:50:05] High throughput table size @ 53920s. Size 225088kB Last vacuum 0:00:23.685084 ago
[06:50:10] Interfering query got error snapshot too old
[06:50:10] Waiting 3min to restart interfering query

I don't see any evidence that it returned incorrect query results at
any point, and it did eventually limit bloat; what we have is that it
is not limiting it nearly as tightly as one would expect in this
particular test case. I'm continuing to investigate.

--
Kevin Grittner
EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Amit Kapila
Date:
On Wed, Apr 20, 2016 at 7:39 PM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2016-04-19 20:27:31 +0530, Amit Kapila wrote:
> > On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> > > > That is more controversial than the potential ~2% regression for
> > > > old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing
> > > > that way, and Andres[4] is not.
> > >
> > > FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
> > > a clear proposal on the table how to solve the scalability issue around
> > > MaintainOldSnapshotTimeMapping().
> > >
> >
> > It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
> > takes EXCLUSIVE LWLock which seems to be a probable reason for a
> > performance regression.
>
> Yes, that's the major problem.
>
>
> > Now, here the question is do we need to acquire that lock if xmin is
> > not changed since the last time value of
> > oldSnapshotControl->latest_xmin is updated or xmin is lesser than
> > equal to oldSnapshotControl->latest_xmin? If we don't need it for
> > above cases, I think it can address the performance regression to a
> > good degree for read-only workloads when the feature is enabled.
>
> I think the more fundamental issue is that the time->xid mapping is
> built at GetSnapshotData() time (via MaintainOldSnapshotTimeMapping()),
> and not when xids are assigned. Snapshots are created a lot more
> frequently in nearly all use-cases than xids are assigned. That's what
> forces the exclusive lock to be in the read path, rather than the write
> path.
>
> What's the reason for this?
>
I don't see any particular reason for doing so, but I'm not sure it would be beneficial in all kinds of cases to build that mapping when xids are assigned. As an example, consider the case where a couple of write transactions start at the same time and immediately afterwards a read statement is executed: each of those write transactions would need to take the exclusive lock to build an old-snapshot entry, whereas with the optimization suggested above, the exclusive lock is taken just once, for the read statement.
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Amit Kapila
Date:
On Thu, Apr 21, 2016 at 6:38 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:
>
> On Tue, Apr 19, 2016 at 6:11 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> > On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote:
> >>>
> >>> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
> >>> a clear proposal on the table how to solve the scalability issue around
> >>> MaintainOldSnapshotTimeMapping().
> >>
> >> It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
> >> takes EXCLUSIVE LWLock which seems to be a probable reason for a performance
> >> regression. Now, here the question is do we need to acquire that lock if
> >> xmin is not changed since the last time value of
> >> oldSnapshotControl->latest_xmin is updated or xmin is lesser than equal to
> >> oldSnapshotControl->latest_xmin?
> >> If we don't need it for above cases, I think it can address the performance
> >> regression to a good degree for read-only workloads when the feature is
> >> enabled.
> >
> > Thanks, Amit -- I think something along those lines is the right
> > solution to the scaling issues when the feature is enabled. For
> > now I'm focusing on the back-patching issues and the performance
> > regression when the feature is disabled, but I'll shift focus to
> > this once the "killer" issues are in hand.
>
> I had an idea I wanted to test out. The gist of it is to effectively
> have the last slot of timestamp to xid map stored in the latest_xmin
> field and only update the mapping when slot boundaries are crossed.
> See attached WIP patch for details. This way the exclusive lock only
> needs to be acquired once per minute.
>
Why do we need to acquire the exclusive lock at all if xmin is not changing? Also, I think your proposed patch can affect the update of xids for existing mappings. In particular, I am talking about the code below:
else if (ts <= (oldSnapshotControl->head_timestamp +
                ((oldSnapshotControl->count_used - 1) * USECS_PER_MINUTE)))
{
    /* existing mapping; advance xid if possible */
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Amit Kapila
Date:
On Tue, Apr 19, 2016 at 8:41 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>
> On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote:
> >>
> >> On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> >> > That is more controversial than the potential ~2% regression for
> >> > old_snapshot_threshold=-1. Alvaro[2] and Robert[3] are okay releasing
> >> > that way, and Andres[4] is not.
> >>
> >> FWIW, I could be kinda convinced that it's temporarily ok, if there'd be
> >> a clear proposal on the table how to solve the scalability issue around
> >> MaintainOldSnapshotTimeMapping().
> >
> > It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
> > takes EXCLUSIVE LWLock which seems to be a probable reason for a performance
> > regression. Now, here the question is do we need to acquire that lock if
> > xmin is not changed since the last time value of
> > oldSnapshotControl->latest_xmin is updated or xmin is lesser than equal to
> > oldSnapshotControl->latest_xmin?
> > If we don't need it for above cases, I think it can address the performance
> > regression to a good degree for read-only workloads when the feature is
> > enabled.
>
> Thanks, Amit -- I think something along those lines is the right
> solution to the scaling issues when the feature is enabled.
>
I have tried the attached patch along the above lines, and it seems to address the performance regression to a good degree when the feature is enabled at a moderate client count like 32, but more still needs to be done for somewhat higher client counts like 64.
Performance data is the median of three 5-minute runs of a read-only workload:
pgbench -c $client_count -j $client_count -T 300 -M prepared -S postgres
o_s_t - old_snapshot_threshold
Client_Count/Patch_Ver |     32 |     64 |
HEAD (o_s_t = -1)      | 354077 | 552063 |
HEAD (o_s_t = 1)       |  92809 |  55847 |
Patch (o_s_t = 1)      | 319759 | 191741 |
If you think the attached patch is correct functionality-wise, then I think we can go ahead with it and then investigate what more can be improved. The newly introduced spinlocks might be the reason for the performance degradation at higher client counts; if that turns out to be true, we can replace them with atomics once Andres's patch completing the 64-bit atomics implementation is committed.
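To make the atomics idea concrete: the spinlock-protected accesses to 64-bit control fields could become plain atomic loads and stores once 64-bit atomics are guaranteed. A sketch under that assumption follows; threshold_timestamp stands in for whichever field turns out to be hot, and the atomic variants of these accessors are hypothetical, not existing code.

    #include "port/atomics.h"

    /* In OldSnapshotControlData, the field would become an atomic: */
    /*     pg_atomic_uint64 threshold_timestamp;   (was slock_t-protected) */

    /* Reader: a single lock-free atomic load. */
    static TimestampTz
    GetOldSnapshotThresholdTimestamp(void)
    {
        return (TimestampTz)
            pg_atomic_read_u64(&oldSnapshotControl->threshold_timestamp);
    }

    /* Writer: an atomic store instead of SpinLockAcquire/Release. */
    static void
    SetOldSnapshotThresholdTimestamp(TimestampTz ts)
    {
        pg_atomic_write_u64(&oldSnapshotControl->threshold_timestamp,
                            (uint64) ts);
    }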
Machine details used for performance testing:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191
Attachment
What is the recommended procedure for replying to a pgsql-committers message? Is cross-posting to hackers really the right approach, as it causes duplicate messages? (pgsql-committers CC removed.) --------------------------------------------------------------------------- On Tue, Apr 12, 2016 at 10:38:56AM -0700, Andres Freund wrote: > On 2016-04-12 16:49:25 +0000, Kevin Grittner wrote: > > On a big NUMA machine with 1000 connections in saturation load > > there was a performance regression due to spinlock contention, for > > acquiring values which were never used. Just fill with dummy > > values if we're not going to use them. > > FWIW, I could see massive regressions with just 64 connections. > > I'm a bit scared of having an innoccuous sounding option regress things > by a factor of 10. I think, in addition to this fix, we need to actually > solve the scalability issue here to a good degree. One way to do so is > to apply the parts of 0001 in > http://archives.postgresql.org/message-id/20160330230914.GH13305%40awork2.anarazel.de > defining PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY and rely on that. Another > to apply the whole patch and simply put the lsn in an 8 byte atomic. > > - Andres > > > -- > Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-committers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce Momjian wrote: > > What is the recommended procedure for replying to a pgsql-committers > message? Is cross-posting to hackers really the right approach, as it > causes duplicate messages? (pgsql-committers CC removed.) CCing pgsql-hackers when replying to -committers was discussed some time ago, and the consensus seemed to be that that's preferable to keeping the discussion in -committers only, because that one is so much smaller. Whether -committers is kept in cc/to or not seems not to be important. This is all in the archives somewhere ... You must be new to this email thing. Any millennial will tell you that there's no duplicate because Gmail already de-duplicates them. You would only see one copy, ever. Now, if you still live in a cave and don't use Gmail (like, say, me), you could still change the options in Majordomo to send a unique copy of each message, that is to say change the delivery class to "unique" rather than "each". Then it will see that you have the same message in two lists and send you only one. Now, whenever you're in the CC list of a message (something which I'm told is somewhat common around here) you would additionally get that copy too! There's nothing Majordomo could do about that, of course. Again the solution is to use Gmail (what else!). If you won't do that, you can install a procmail recipe to remove the dupes, say :0 Wh: msgid.lock | formail -D 65536 $HOME/.msgid.cache -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/29/2016 10:20 AM, Alvaro Herrera wrote: > :0 Wh: msgid.lock > | formail -D 65536 $HOME/.msgid.cache > And Alvaro drowns in the drool of his own sarcasm. -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
Joshua D. Drake wrote: > On 04/29/2016 10:20 AM, Alvaro Herrera wrote: > > >:0 Wh: msgid.lock > >| formail -D 65536 $HOME/.msgid.cache > > And Alvaro drowns in the drool of his own sarcasm. I was going to add an apology that this wasn't supposed to be insulting, only funny, but refrained. There was no offense meant. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 29, 2016 at 02:49:38PM -0300, Alvaro Herrera wrote: > Joshua D. Drake wrote: > > On 04/29/2016 10:20 AM, Alvaro Herrera wrote: > > > > >:0 Wh: msgid.lock > > >| formail -D 65536 $HOME/.msgid.cache > > > > And Alvaro drowns in the drool of his own sarcasm. > > I was going to add an apology that this wasn't supposed to be insulting, > only funny, but refrained. There was no offense meant. I took it as humorous. I didn't turn on duplicate removal on Majordomo so I could police people who cross-posted unnecessarily. Are you saying I shouldn't worry about that and just turn on duplicate removal? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Bruce Momjian wrote: > On Fri, Apr 29, 2016 at 02:49:38PM -0300, Alvaro Herrera wrote: > > Joshua D. Drake wrote: > > > On 04/29/2016 10:20 AM, Alvaro Herrera wrote: > > > > > > >:0 Wh: msgid.lock > > > >| formail -D 65536 $HOME/.msgid.cache > > > > > > And Alvaro drowns in the drool of his own sarcasm. > > > > I was going to add an apology that this wasn't supposed to be insulting, > > only funny, but refrained. There was no offense meant. > > I took it as humorous. Great, thanks. > I didn't turn on duplicate removal on Majordomo so I could police > people who cross-posted unnecessarily. Are you saying I shouldn't > worry about that and just turn on duplicate removal? Yeah, I've done that for years. I notice duplicates by looking at CCs anyway. Note that our archives system is perfectly prepared to deal with messages cross-posted to several mailing lists, as a first-class feature. I don't think there's a strong need to police them specifically. If consensus is that we should completely forbid cross-posting, we can talk about that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-04-29 15:27:15 -0300, Alvaro Herrera wrote: > If consensus is that we should completely forbid cross-posting, we can > talk about that. I find xposts rather useful. WRT committers vs. hackers thing, I'll e.g. be far more likely to be able to keep up with committers than hackers. Andres
On Fri, Apr 29, 2016 at 02:20:40PM -0300, Alvaro Herrera wrote: > Now, if you still live in a cave and don't use Gmail (like, say, me), > you could still change the options in Majordomo to send a unique copy of > each message, that is to say change the delivery class to "unique" > rather than "each". Then it will see that you have the same message in > two lists and send you only one. OK, I set every Majordomo email list to "each unduplicated message" --- who says I am behind the times. :-) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, Apr 22, 2016 at 8:06 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Thu, Apr 21, 2016 at 4:13 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > >> I have your test case running, and it is not immediately >> clear why old rows are not being vacuumed away. > > I have not found the reason that the vacuuming is not as aggressive > as it should be with this old_snapshot_threshold, but I left your > test case running overnight and found that it eventually did kick > in. So the question is why it was not nearly as aggressive as one > would expect. Once I found it, it turned out to be a bit of a "forehead-slapper". Because the array of entries mapping time to TransactionId was exactly the same size as old_snapshot_threshold, the override of the xmin for pruning or vacuum would not be seen if another transaction got in fast enough, and this python test case was pounding hard enough that the override was rarely seen. By expanding the array by 10 entries, we will only miss the more aggressive cleanup if the thread stalls at that point for more than 10 minutes, which seems like a reasonable degree of patience, given that there is no correctness problem if that does happen. Ants, I think you'll find your test case behaving as you expected now. Now to continue with the performance benchmarks. I'm pretty sure we've fixed the problems when the feature is disabled (old_snapshot_threshold = -1), and there are several suggestions for improving performance while it is on that need to be compared and benchmarked. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
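In code terms the sizing change amounts to something like the following (a sketch; the constant's name here is illustrative, not necessarily what was committed):

    /*
     * Give the time-to-xid map some slop beyond old_snapshot_threshold, so
     * the more aggressive xmin override stays visible even if other
     * transactions keep advancing the map while a reader stalls for up to
     * ten minutes.
     */
    #define OLD_SNAPSHOT_PADDING_ENTRIES 10

    Size
    SnapMgrShmemSize(void)
    {
        Size        size;

        size = offsetof(OldSnapshotControlData, xid_by_minute);
        if (old_snapshot_threshold > 0)
            size = add_size(size, mul_size(sizeof(TransactionId),
                                           old_snapshot_threshold
                                           + OLD_SNAPSHOT_PADDING_ENTRIES));
        return size;
    }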
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > Now to continue with the performance benchmarks. I'm pretty sure > we've fixed the problems when the feature is disabled > (old_snapshot_threshold = -1), and there are several suggestions > for improving performance while it is on that need to be compared > and benchmarked. If anyone thinks that the issue with the feature disabled is NOT fixed, please speak up! I'm moving the corresponding open item to CLOSE_WAIT status, meaning that it will be closed if nobody shows up to say that there is still an issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-02 09:03:19 -0400, Robert Haas wrote: > On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > > Now to continue with the performance benchmarks. I'm pretty sure > > we've fixed the problems when the feature is disabled > > (old_snapshot_threshold = -1), and there are several suggestions > > for improving performance while it is on that need to be compared > > and benchmarked. > > If anyone thinks that the issue with the feature disabled is NOT > fixed, please speak up! I'm moving the corresponding open item to > CLOSE_WAIT status, meaning that it will be closed if nobody shows up > to say that there is still an issue. Well, I don't agree that the feature is in a releasable state. The datastructure is pretty much non-scalable, and maintained on the wrong side (every read, instead of once in writing xacts). There's no proposal actually addressing the scalability issues. Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Mon, May 2, 2016 at 10:21 AM, Andres Freund <andres@anarazel.de> wrote: > On 2016-05-02 09:03:19 -0400, Robert Haas wrote: >> On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> > Now to continue with the performance benchmarks. I'm pretty sure >> > we've fixed the problems when the feature is disabled >> > (old_snapshot_threshold = -1), and there are several suggestions >> > for improving performance while it is on that need to be compared >> > and benchmarked. >> >> If anyone thinks that the issue with the feature disabled is NOT >> fixed, please speak up! I'm moving the corresponding open item to >> CLOSE_WAIT status, meaning that it will be closed if nobody shows up >> to say that there is still an issue. > > Well, I don't agree that the feature is in a releaseable state. The > datastructure is pretty much non-scalable, and maintained on the wrong > side (every read, instead of once in writing writing xacts). There's no > proposal actually addressing the scalability issues. You are certainly welcome to add a new open item to cover those complaints. But I do not want to blur together the discussion of whether the feature is well-designed with the question of whether it regresses performance when it is turned off. Those are severable issues, meriting separate discussion (and probably separate threads). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Ants Aasma
Date:
On Mon, May 2, 2016 at 5:21 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-05-02 09:03:19 -0400, Robert Haas wrote: >> On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> > Now to continue with the performance benchmarks. I'm pretty sure >> > we've fixed the problems when the feature is disabled >> > (old_snapshot_threshold = -1), and there are several suggestions >> > for improving performance while it is on that need to be compared >> > and benchmarked. >> >> If anyone thinks that the issue with the feature disabled is NOT >> fixed, please speak up! I'm moving the corresponding open item to >> CLOSE_WAIT status, meaning that it will be closed if nobody shows up >> to say that there is still an issue. > > Well, I don't agree that the feature is in a releaseable state. The > datastructure is pretty much non-scalable, and maintained on the wrong > side (every read, instead of once in writing writing xacts). There's no > proposal actually addressing the scalability issues. Unless I'm missing something fundamental the feature only requires tracking an upper bound on xmin observed by snapshots between clock ticks. The simplest way to do this would be a periodic process that increments a clock counter (32bit counter would be plenty) and then calculates xmin for the preceding range. With this scheme GetSnapshotData would need two atomic fetches to get current LSN and the timestamp. Test for old snapshot can also run completely lock free with a single atomic fetch of threshold timestamp. The negative side is that we need to have a process running that runs the clock ticks and the ticks may sometimes be late. Giving something like autovacuum launcher this task doesn't seem too bad and the consequence of falling behind is just delayed timing out of old snapshots. As far as I can see this approach would get rid of any scalability issues, but it is a pretty significant change and requires 64bit atomic reads to get rid of contention on xlog insert lock. Regards, Ants Aasma
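Under the stated assumptions, the read side of such a scheme could reduce to something like the following (a sketch only; pg_atomic_read_u64() is the existing atomics API, but the tick and threshold fields shown here are hypothetical and presume lock-free 8-byte reads being available):

    /*
     * Sketch: GetSnapshotData() records the clock tick and LSN with two
     * lock-free atomic fetches; the fields are hypothetical.
     */
    uint64      tick;

    tick = pg_atomic_read_u64(&oldSnapshotControl->current_tick);
    snapshot->whenTaken = (int64) tick;
    snapshot->lsn = (XLogRecPtr)
        pg_atomic_read_u64(&oldSnapshotControl->current_lsn);

    /* Later, testing for an old snapshot is one more atomic fetch. */
    if (tick < pg_atomic_read_u64(&oldSnapshotControl->threshold_tick))
        ereport(ERROR,
                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
                 errmsg("snapshot too old")));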
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-02 18:15:40 +0300, Ants Aasma wrote: > On Mon, May 2, 2016 at 5:21 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-05-02 09:03:19 -0400, Robert Haas wrote: > >> On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > >> > Now to continue with the performance benchmarks. I'm pretty sure > >> > we've fixed the problems when the feature is disabled > >> > (old_snapshot_threshold = -1), and there are several suggestions > >> > for improving performance while it is on that need to be compared > >> > and benchmarked. > >> > >> If anyone thinks that the issue with the feature disabled is NOT > >> fixed, please speak up! I'm moving the corresponding open item to > >> CLOSE_WAIT status, meaning that it will be closed if nobody shows up > >> to say that there is still an issue. > > > > Well, I don't agree that the feature is in a releaseable state. The > > datastructure is pretty much non-scalable, and maintained on the wrong > > side (every read, instead of once in writing writing xacts). There's no > > proposal actually addressing the scalability issues. > > Unless I'm missing something fundamental the feature only requires > tracking an upper bound on xmin observed by snapshots between clock > ticks. I'm not saying that there's no datastructure that can make the whole thing efficient - just that current datastructure doesn't look viable and that I've not seen that point addressed seriously. > The simplest way to do this would be a periodic process that > increments a clock counter (32bit counter would be plenty) and then > calculates xmin for the preceding range. With this scheme > GetSnapshotData would need two atomic fetches to get current LSN and > the timestamp. Test for old snapshot can also run completely lock free > with a single atomic fetch of threshold timestamp. The negative side > is that we need to have a process running that runs the clock ticks > and the ticks may sometimes be late. Giving something like autovacuum > launcher this task doesn't seem too bad and the consequence of falling > behind is just delayed timing out of old snapshots. That'd be one way, yes. I suspect it'd even be sufficient to move maintaining the map around GetNewTransactionId(); so only writers pay the overhead. Given that writes obviously are slower than reads that might make the problem disappear for a long while. > As far as I can see this approach would get rid of any scalability > issues, but it is a pretty significant change and requires 64bit > atomic reads to get rid of contention on xlog insert lock. Yea. Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Bruce Momjian
Date:
On Mon, May 2, 2016 at 07:21:21AM -0700, Andres Freund wrote: > On 2016-05-02 09:03:19 -0400, Robert Haas wrote: > > On Fri, Apr 29, 2016 at 6:08 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > > > Now to continue with the performance benchmarks. I'm pretty sure > > > we've fixed the problems when the feature is disabled > > > (old_snapshot_threshold = -1), and there are several suggestions > > > for improving performance while it is on that need to be compared > > > and benchmarked. > > > > If anyone thinks that the issue with the feature disabled is NOT > > fixed, please speak up! I'm moving the corresponding open item to > > CLOSE_WAIT status, meaning that it will be closed if nobody shows up > > to say that there is still an issue. > > Well, I don't agree that the feature is in a releaseable state. The > datastructure is pretty much non-scalable, and maintained on the wrong > side (every read, instead of once in writing writing xacts). There's no > proposal actually addressing the scalability issues. I also strongly question whether we should revert this feature and try again in 9.7. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-02 10:32:28 -0400, Robert Haas wrote: > You are certainly welcome to add a new open item to cover those > complaints. Done that. > But I do not want to blur together the discussion of > whether the feature is well-designed with the question of whether it > regresses performance when it is turned off. Those are severable > issues, meriting separate discussion (and probably separate threads). The current thread already contains a lot of information about both, so separating doesn't seem beneficial. Greetings, Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Apr 20, 2016 at 8:08 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> I had an idea I wanted to test out. The gist of it is to effectively
> have the last slot of timestamp to xid map stored in the latest_xmin
> field and only update the mapping when slot boundaries are crossed.
> See attached WIP patch for details. This way the exclusive lock only
> needs to be acquired once per minute. The common case is a spinlock
> that could be replaced with atomics later.
I rebased the patch Ants posted (attached), and am running
benchmarks on a cthulhu (a big NUMA machine with 8 memory nodes).
Normally I wouldn't post results without a lot more data points
with multiple samples at each, but the initial results have me
wondering whether people would like to see this pushed later today
so that it has some time in the buildfarm and then into beta1.
Running the r/w TPC-B (sort of) load with scale, jobs, and threads
at 1000, and the database configured as I would for a production
server of that size, preliminary TPS results are:
master, -1: 8158
master, 10min: 2019
Ants' patch, 10min: 7804
Basically it just skips the maintenance of the time/xid mapping
unless current time has advanced to a new minute.
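In outline, the common path becomes something like this (a sketch of the idea rather than the patch itself; next_map_update is the field the patch adds to OldSnapshotControlData):

    /*
     * Sketch: MaintainOldSnapshotTimeMapping() takes only a spinlock
     * until the minute boundary is crossed.
     */
    SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
    if (whenTaken < oldSnapshotControl->next_map_update)
    {
        /* Still inside the current one-minute slot: track its xmin. */
        if (TransactionIdFollows(xmin, oldSnapshotControl->latest_xmin))
            oldSnapshotControl->latest_xmin = xmin;
        SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
        return;
    }
    SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);

    /*
     * Crossed a slot boundary: the exclusive lock is taken at most once
     * per minute, to fold latest_xmin into the map and open the next slot.
     */
    LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);

The difference from the earlier xmin-only check is that here the exclusive lock is bounded to roughly once per minute regardless of xmin traffic.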
I can see arguments for tuning this far in time for the beta, as
well as the argument to wait until after the beta, so I'm just
throwing it out there to see what other people think. I wouldn't
do it unless I have three runs at -1 and 10min with the patch, all
showing similar numbers. If the BF chokes on it I would revert
this optimization attempt.
Thoughts?
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
Hi, On 2016-05-06 14:18:22 -0500, Kevin Grittner wrote: > I rebased the patch Ants posted (attached), and am running > benchmarks on a cthulhu (a big NUMA machine with 8 memory nodes). > Normally I wouldn't post results without a lot more data points > with multiple samples at each, but the initial results have me > wondering whether people would like to see this pushed later today > so that it has some time in the buildfarm and then into beta1. I think that generally would make sense. We quite possibly need some further changes, but it seems more likely that we can find them if the patch runs close to the disabled performance. > Running the r/w TPC-B (sort of) load with scale, jobs, and threads > at 1000, and the database configured as I would for a production > server of that size, preliminary TPS results are: > > master, -1: 8158 > master, 10min: 2019 > Ants' patch, 10min: 7804 That's rather nice. Did you test read-only as well? If you'd feel more comfortable committing after I've run some performance tests, I could kick off some soon. > I can see arguments for tuning this far in time for the beta, as > well as the argument to wait until after the beta, so I'm just > throwing it out there to see what other people think. I wouldn't > do it unless I have three runs at -1 and 10min with the patch, all > showing similar numbers. If the BF chokes on it I would revert > this optimization attempt. +1 for going forward. I'm still doubtful that it's a good idea to the map maintenance from GetSnapshotData(), but the issue becomes much less severe when addressed like this. The primary reasons why I'd like to move it is because of the significant amount of added gettimeofday() calls which are a real hog in some virtualized environments, and because I'm doubtful of tying the current time to the xmin horizon. > diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c > index e1551a3..b7d965a 100644 > --- a/src/backend/utils/time/snapmgr.c > +++ b/src/backend/utils/time/snapmgr.c > @@ -80,8 +80,11 @@ typedef struct OldSnapshotControlData > */ > slock_t mutex_current; /* protect current timestamp */ > int64 current_timestamp; /* latest snapshot timestamp */ > - slock_t mutex_latest_xmin; /* protect latest snapshot xmin */ > + slock_t mutex_latest_xmin; /* protect latest snapshot xmin > + * and next_map_update > + */ > TransactionId latest_xmin; /* latest snapshot xmin */ > + int64 next_map_update; /* latest snapshot valid up to */ > slock_t mutex_threshold; /* protect threshold fields */ > int64 threshold_timestamp; /* earlier snapshot is old */ > TransactionId threshold_xid; /* earlier xid may be gone */ Overly nitpickily I'd refer to the actual variable name (instead of "latest snapshot xmin") in the mutex_latest_xmin comment. > if (!same_ts_as_threshold) > { > + if (ts == update_ts) > + { > + xlimit = latest_xmin; > + if (NormalTransactionIdFollows(xlimit, recentXmin)) > + SetOldSnapshotThresholdTimestamp(ts, xlimit); > + } > + else > + { > LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED); > > if (oldSnapshotControl->count_used > 0 I guess it's just an issue in my mail-reader, but the indentation looks funky here. Looks roughly sensible. Greetings, Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, May 6, 2016 at 5:07 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-05-06 14:18:22 -0500, Kevin Grittner wrote:
>> I rebased the patch Ants posted (attached), and am running
>> benchmarks on a cthulhu (a big NUMA machine with 8 memory nodes).
>> Normally I wouldn't post results without a lot more data points
>> with multiple samples at each, but the initial results have me
>> wondering whether people would like to see this pushed later today
>> so that it has some time in the buildfarm and then into beta1.
>
> I think that generally would make sense. We quite possibly need some
> further changes, but it seems more likely that we can find them if the
> patch runs close to the disabled performance.
ok
>> Running the r/w TPC-B (sort of) load with scale, jobs, and threads
>> at 1000, and the database configured as I would for a production
>> server of that size, preliminary TPS results are:
>>
>> master, -1: 8158
>> master, 10min: 2019
>> Ants' patch, 10min: 7804
>
> That's rather nice. Did you test read-only as well?
Not yet. I don't trust short runs, so I've been going with -T2400;
with setup times and so on, that limits me to one run per hour of
time I book the machine, and I'm competing with others for that. I
do plan to run read-only, too.
From the 40 minute tests so far with Ants' patch (alternating settings):
old_snapshot_threshold = 10
7804
9524
9512
old_snapshot_threshold = -1
10421
8691
8977
It's disappointing that I am not getting more consistent numbers,
but NUMA can be hard to manage that way.
> If you'd feel more comfortable committing after I've run some
> performance tests, I could kick off some soon.
I think I should get it onto the buildfarm if we're going for
beta2, so there's time to recognize any problem (unlikely as that
*seems*) and back this out before beta if needed. That said, all
additional data points welcome!
>> I can see arguments for tuning this far in time for the beta, as
>> well as the argument to wait until after the beta, so I'm just
>> throwing it out there to see what other people think. I wouldn't
>> do it unless I have three runs at -1 and 10min with the patch, all
>> showing similar numbers. If the BF chokes on it I would revert
>> this optimization attempt.
>
> +1 for going forward. I'm still doubtful that it's a good idea to move
> the map maintenance from GetSnapshotData(), but the issue becomes much less
> severe when addressed like this.
>
> The primary reasons why I'd like to move it are the significant
> number of added gettimeofday() calls, which are a real hog in
> some virtualized environments, and my doubts about tying the
> current time to the xmin horizon.
When I initially presented the proof of concept patch during the
9.5 development cycle it was based on transaction counts, and that
was the biggest criticism, and it came from many quarters. Using
time was the big demand from just about everyone, and I'm not sure
how you do that without a mapping of time to xmin horizon. If you
have some other idea, I'm all ears.
> Looks roughly sensible.
Will push shortly with the nit-pick fixes you requested.
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-06 19:43:24 -0500, Kevin Grittner wrote: > It's disappointing that I am not getting more consistent numbers, > but NUMA can be hard to manage that way. FWIW, in my experience, unless you disable autovacuum (or rather auto-analyze), the effects from non-predictable analyze runs with long-running snapshots are worse. I mean the numa effects suck, but in r/w workload effects of analyze are often much worse. That comment reminds me of a question I had: Did you consider the effect of this patch on analyze? It uses a snapshot, and by memory you've not built in a defense against analyze being cancelled. > Will push shortly with the nit-pick fixes you requested. Cool. Greetings, Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-05-06 19:43:24 -0500, Kevin Grittner wrote:
>> It's disappointing that I am not getting more consistent numbers,
>> but NUMA can be hard to manage that way.
>
> FWIW, in my experience, unless you disable autovacuum (or rather
> auto-analyze), the effects from non-predictable analyze runs with
> long-running snapshots are worse. I mean the numa effects suck, but in
> r/w workload effects of analyze are often much worse.
Hm. But the benefits of the patch are not there if autovacuum is
off. I'm gonna need to ponder the best way to test given all that.
> That comment reminds me of a question I had: Did you consider the effect
> of this patch on analyze? It uses a snapshot, and by memory you've not
> built in a defense against analyze being cancelled.
Will need to check on that.
>> Will push shortly with the nit-pick fixes you requested.
>
> Cool.
Done.
I will be checking in on the buildfarm, but if it starts causing
problems while I'm, say, sleeping -- I won't be offended if someone
else reverts 7e3da1c4737fd6582e12c80983987e4d2cbc1d17.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-06 20:28:27 -0500, Kevin Grittner wrote: > On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-05-06 19:43:24 -0500, Kevin Grittner wrote: > >> It's disappointing that I am not getting more consistent numbers, > >> but NUMA can be hard to manage that way. > > > > FWIW, in my experience, unless you disable autovacuum (or rather > > auto-analyze), the effects from non-predictable analyze runs with > > long-running snapshots are worse. I mean the numa effects suck, but in > > r/w workload effects of analyze are often much worse. > > Hm. But the benefits of the patch are not there if autovacuum is > off. I'm gonna need to ponder the best way to test given all that. It's sufficient to set the threshold for analyze very high, as vacuum itself doesn't have that problem. I basically just set autovacuum_analyze_threshold to INT_MAX; that alleviates the problem for normal runs. Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: >> That comment reminds me of a question I had: Did you consider the effect >> of this patch on analyze? It uses a snapshot, and by memory you've not >> built in a defense against analyze being cancelled. > > Will need to check on that. With a 1min threshold, after loading a table "v" with a million rows, beginning a repeatable read transaction on a different connection and opening a cursor against that table, deleting almost all rows on the original connection, and waiting a few minutes I see this in the open transaction: test=# analyze verbose v; INFO: analyzing "public.v" INFO: "v": scanned 4425 of 4425 pages, containing 1999 live rows and 0 dead rows; 1999 rows in sample, 1999 estimated total rows ANALYZE test=# select count(*) from v; ERROR: snapshot too old Meanwhile, no errors appeared in the log from autovacuum. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote: > On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > > On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: > > >> That comment reminds me of a question I had: Did you consider the effect > >> of this patch on analyze? It uses a snapshot, and by memory you've not > >> built in a defense against analyze being cancelled. > > > > Will need to check on that. > > With a 1min threshold, after loading a table "v" with a million > rows, beginning a repeatable read transaction on a different > connection and opening a cursor against that table, deleting almost > all rows on the original connection, and waiting a few minutes I > see this in the open transaction: > > test=# analyze verbose v; > INFO: analyzing "public.v" > INFO: "v": scanned 4425 of 4425 pages, containing 1999 live rows and > 0 dead rows; 1999 rows in sample, 1999 estimated total rows > ANALYZE > test=# select count(*) from v; > ERROR: snapshot too old > > Meanwhile, no errors appeared in the log from autovacuum. I'd guess that that problem could only be reproduced if autoanalyze takes longer than the timeout, and there's rows pruned after it has started? Analyze IIRC acquires a new snapshot when getting sample rows, so it'll not necessarily trigger in the above scenario, right? Is there anything preventing this from becoming a problem? Greetings, Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: > Analyze IIRC acquires a new snapshot when getting sample rows, I could not find anything like that, and a case-insensitive search of analyze.c finds no occurrences of "snap". Can you remember where you think you saw something that would cause the ANALYZE command in my test to use a snapshot other than the one from the REPEATABLE READ transaction in which it was run? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alvaro Herrera
Date:
Kevin Grittner wrote: > On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: > > > Analyze IIRC acquires a new snapshot when getting sample rows, > > I could not find anything like that, and a case-insensitive search > of analyze.c finds no occurrences of "snap". Can you remember > where you think you saw something that would cause the ANALYZE > command in my test to use a snapshot other than the one from the > REPEATABLE READ transaction in which it was run? For ANALYZE, the snapshot is set in vacuum() prior to calling analyze_rel(); see vacuum.c. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Tom Lane
Date:
Kevin Grittner <kgrittn@gmail.com> writes: > On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: >> Analyze IIRC acquires a new snapshot when getting sample rows, > I could not find anything like that, and a case-insensitive search > of analyze.c finds no occurrences of "snap". Can you remember > where you think you saw something that would cause the ANALYZE > command in my test to use a snapshot other than the one from the > REPEATABLE READ transaction in which it was run? The control logic concerned with that is in vacuum.c, not analyze.c. AFAICS a manual ANALYZE that's within a transaction block would use the prevailing snapshot (see in_outer_xact tests). There are a lot of cases that wouldn't, though. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-24 13:04:09 -0500, Kevin Grittner wrote: > On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: > > > Analyze IIRC acquires a new snapshot when getting sample rows, > > I could not find anything like that, and a case-insensitive search > of analyze.c finds no occurrences of "snap". Can you remember > where you think you saw something that would cause the ANALYZE > command in my test to use a snapshot other than the one from the > REPEATABLE READ transaction in which it was run? It's outside of analyze.c: autovacuum_do_vac_analyze() -> vacuum() -> if (options & VACOPT_VACUUM) use_own_xacts = true; else { Assert(options & VACOPT_ANALYZE); if (IsAutoVacuumWorkerProcess()) use_own_xacts = true; ... if (options & VACOPT_ANALYZE) { /* * If using separate xacts, start one for analyze. Otherwise, * we can use the outer transaction. */ if (use_own_xacts) { StartTransactionCommand(); /* functions in indexes may want a snapshot set */ PushActiveSnapshot(GetTransactionSnapshot()); } analyze_rel(relid, relation, options, params, va_cols, in_outer_xact, vac_strategy); if (use_own_xacts) { PopActiveSnapshot(); CommitTransactionCommand(); } } Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote: >> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: >> >>>> That comment reminds me of a question I had: Did you consider the effect >>>> of this patch on analyze? It uses a snapshot, and by memory you've not >>>> built in a defense against analyze being cancelled. The primary defense is not considering a cancellation except in 30 of the 500 places a page is used. None of those 30 are, as far as I can see (upon review in response to your question), used in the analyze process. >> With a 1min threshold, after loading a table "v" with a million >> rows, beginning a repeatable read transaction on a different >> connection and opening a cursor against that table, deleting almost >> all rows on the original connection, and waiting a few minutes I >> see this in the open transaction: >> >> test=# analyze verbose v; >> INFO: analyzing "public.v" >> INFO: "v": scanned 4425 of 4425 pages, containing 1999 live rows and >> 0 dead rows; 1999 rows in sample, 1999 estimated total rows >> ANALYZE >> test=# select count(*) from v; >> ERROR: snapshot too old >> >> Meanwhile, no errors appeared in the log from autovacuum. > > I'd guess that that problem could only be reproduced if autoanalyze > takes longer than the timeout, and there's rows pruned after it has > started? Analyze IIRC acquires a new snapshot when getting sample rows, > so it'll not necessarily trigger in the above scenario, right? Per Tom's recollection and my review of the code, the transaction snapshot would be used in the test I show above, and the combination of the verbose output the subsequent query show clearly that if one of the page references capable of throwing the error were encountered with this snapshot, the error would be thrown. So at least in this ANALYZE run, there is empirical support for what I found in reviewing the code -- none of the places where we check for an old snapshot are exercised during an ANALYZE. > Is there anything preventing this from becoming a problem? The fundamental approach that the error can only appear on user-facing scans, not internal reads and positioning. Unless there is some code path that uses a scan where the snapshot age is checked, this cannot be a problem. I don't see any such path, but if you do, please let me know, and I'll fix things. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
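For reference, the page-level check under discussion is TestForOldSnapshot() in bufmgr.h; paraphrased from memory of the 9.6-era source (details may differ slightly), it only fires for MVCC snapshots on pages newer than the snapshot's LSN:

    /* Paraphrase of the bufmgr.h macro; treat the details as approximate. */
    #define TestForOldSnapshot(snapshot, relation, page) \
        do { \
            if (old_snapshot_threshold >= 0 \
                && (snapshot) != NULL \
                && (snapshot)->satisfies == HeapTupleSatisfiesMVCC \
                && !XLogRecPtrIsInvalid((snapshot)->lsn) \
                && PageGetLSN(page) > (snapshot)->lsn) \
                ereport(ERROR, \
                        (errcode(ERRCODE_SNAPSHOT_TOO_OLD), \
                         errmsg("snapshot too old"))); \
        } while (0)

Internal scans that use non-MVCC snapshots, or call sites that never invoke the check at all, cannot raise the error -- which is exactly the property being debated here.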
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alvaro Herrera
Date:
Kevin Grittner wrote: > On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote: > >> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > >>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: > >> > >>>> That comment reminds me of a question I had: Did you consider the effect > >>>> of this patch on analyze? It uses a snapshot, and by memory you've not > >>>> built in a defense against analyze being cancelled. > > The primary defense is not considering a cancellation except in 30 > of the 500 places a page is used. None of those 30 are, as far as > I can see (upon review in response to your question), used in the > analyze process. I think what this means is that vacuum might remove tuples that would otherwise be visible to analyze's snapshot. I suppose that's acceptable. I wondered if it could cause harm to the size of the sample, but after looking at acquire_sample_rows briefly I think it'd be unharmed. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-24 14:48:35 -0500, Kevin Grittner wrote: > On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote: > >> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > >>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: > >> > >>>> That comment reminds me of a question I had: Did you consider the effect > >>>> of this patch on analyze? It uses a snapshot, and by memory you've not > >>>> built in a defense against analyze being cancelled. > > The primary defense is not considering a cancellation except in 30 > of the 500 places a page is used. None of those 30 are, as far as > I can see (upon review in response to your question), used in the > analyze process. Uh. How's that a defense? That seems like a recipe for corruption, not a protection. That might be acceptable in the analyze case, but what about e.g. concurrent index builds? E.g. IndexBuildHeapRangeScan() doesn't seem to contain any checks against outdated blocks, and that's certainly not ok? It appears that concurrent index builds are currently broken from a quick skim? > > Is there anything preventing this from becoming a problem? > > The fundamental approach that the error can only appear on > user-facing scans, not internal reads and positioning. > Unless there is some code path that uses a scan where the snapshot > age is checked, this cannot be a problem. I don't see any such > path, but if you do, please let me know, and I'll fix things. That scares me. Not throwing an error, and not being broken are entirely different things. Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 24, 2016 at 3:54 PM, Andres Freund <andres@anarazel.de> wrote: > what about e.g. concurrent index builds? E.g. IndexBuildHeapRangeScan() doesn't > seem to contain any checks against outdated blocks Why would it? We're talking about blocks where there were dead tuples, with the transaction which updated or deleted the tuples being complete, where those dead tuples were vacuumed away. These should not appear in the index, should they? > and that's certainly not ok? Why not? > It appears that concurrent index builds are currently broken > from a quick skim? Either you don't understand this feature very well, or I don't understand concurrent index build very well. I thought we burned a lot of time going through the index an extra time just to get rid of dead tuples -- why would it be a problem not to add them in the first place? >>> Is there anything preventing this from becoming a problem? >> >> The fundamental approach that the error can only appear on >> user-facing scans, not internal reads and positioning. > >> Unless there is some code path that uses a scan where the snapshot >> age is checked, this cannot be a problem. I don't see any such >> path, but if you do, please let me know, and I'll fix things. > > That scares me. Not throwing an error, and not being broken are entirely > different things. Absolutely, but let's not start pointing at random chunks of code and asking why an error isn't thrown there without showing that you get bad results otherwise, or at least some plausible argument why you might. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Tue, May 24, 2016 at 3:48 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: >> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote: >>> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >>>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: >>> >>>>> That comment reminds me of a question I had: Did you consider the effect >>>>> of this patch on analyze? It uses a snapshot, and by memory you've not >>>>> built in a defense against analyze being cancelled. > > The primary defense is not considering a cancellation except in 30 > of the 500 places a page is used. None of those 30 are, as far as > I can see (upon review in response to your question), used in the > analyze process. It's not obvious to me how this is supposed to work. If ANALYZE's snapshot is subject to being ignored for xmin purposes because of snapshot_too_old, then I would think it would need to consider cancelling itself if it reads a page with possibly-removed data, just like in any other case. It seems that we might not actually need a snapshot set for ANALYZE in all cases, because the comments say: /* functions in indexes may want a snapshot set */ PushActiveSnapshot(GetTransactionSnapshot()); If we can get away with it, it would be a rather large win to only set a snapshot when the table has an expression index. For purposes of "snapshot too old", though, it will be important that a function in an index which tries to read data from some other table which has been pruned cancels itself when necessary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 24, 2016 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > For purposes of > "snapshot too old", though, it will be important that a function in an > index which tries to read data from some other table which has been > pruned cancels itself when necessary. Hm. I'll try to work up a test case for this. If you have one, please send it along to me. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 24, 2016 at 4:09 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Tue, May 24, 2016 at 3:54 PM, Andres Freund <andres@anarazel.de> wrote: >> It appears that concurrent index builds are currently broken >> from a quick skim? > > Either you don't understand this feature very well, or I don't > understand concurrent index build very well. And it may be the latter. On closer review I think some adjustment may be needed in IndexBuildHeapRangeScan(). I'm not sure that throwing the error is necessary, since there is a flag to say that the index is unsafe for existing snapshots -- it may be enough to set that flag. Sorry for my earlier email. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-05-24 16:09:27 -0500, Kevin Grittner wrote: > On Tue, May 24, 2016 at 3:54 PM, Andres Freund <andres@anarazel.de> wrote: > > > what about e.g. concurrent index builds? E.g. IndexBuildHeapRangeScan() doesn't > > seem to contain any checks against outdated blocks > > Why would it? We're talking about blocks where there were dead > tuples, with the transaction which updated or deleted the tuples > being complete, where those dead tuples were vacuumed away. These > should not appear in the index, should they? (it appears I had switched around the concerns for concurrent and !concurrent in my head. But I think both might be negatively affected.) For concurrent indexes we'll actually use a normal mvcc snapshot, which means heap_getnext() in IndexBuildHeapRangeScan() and validate_index_heapscan() will error out when encountering removed tuples. Which means it'll be hard to ever perform a concurrent reindex where an individual phase takes more than old_snapshot_threshold - problematic from my POV, given that old_snapshot_threshold cannot be changed at runtime. For normal index scans the following appears to be problematic: case HEAPTUPLE_RECENTLY_DEAD: /* * If tuple is recently deleted then we must index it * anyway to preserve MVCC semantics. (Pre-existing * transactions could try to use the index after we finish * building it, and may need to see such tuples.) * * However, if it was HOT-updated then we must only index * the live tuple at the end of the HOT-chain. Since this * breaks semantics for pre-existing snapshots, mark the * index as unusable for them. */ afaics that detection is broken if we advance the xmin horizon more aggressively than the snapshot. The LSNs of the created index pages will be new, and we'll thus not necessarily error out when required. At the very least we'd have to set indexInfo->ii_BrokenHotChain in that case. As far as I can see normal index builds will allow concurrent hot prunes and everything; since those only require page-level exclusive locks. So for !concurrent builds we might end up with a corrupt index. I think many of the places relying on heap scans with !mvcc snapshots are problematic in that way. Outdated reads will not be detected by TestForOldSnapshot() (given the (snapshot)->satisfies == HeapTupleSatisfiesMVCC condition therein), and rely on the snapshot to be actually working. E.g. CLUSTER/VACUUM FULL rely on accurate HEAPTUPLE_RECENTLY_DEAD switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)) { ... case HEAPTUPLE_RECENTLY_DEAD: tups_recently_dead += 1; /* fall through */ case HEAPTUPLE_LIVE: /* Live or recently dead, must copy it */ isdead = false; break; If an old session with >= repeatable read accesses a clustered table (after the cluster committed), they'll now not see any errors, because all the LSNs look new. > >>> Is there anything preventing this from becoming a problem? > >> > >> The fundamental approach that the error can only appear on > >> user-facing scans, not internal reads and positioning. > > > >> Unless there is some code path that uses a scan where the snapshot > >> age is checked, this cannot be a problem. I don't see any such > >> path, but if you do, please let me know, and I'll fix things. > > > > That scares me. Not throwing an error, and not being broken are entirely > > different things. > > Absolutely, but let's not start pointing at random chunks of code > and asking why an error isn't thrown there without showing that you > get bad results otherwise, or at least some plausible argument why > you might.
This attitude is disturbing. You've evidently not at all looked at the snapshot issues around analyze before; even though snapshot too old at the very least introduces a behavioural issue there (if not a bug at least in the case of expression based stuff). I've asked you about principled defenses, and your answer was "we don't error out". Now I point to another place, and you respond with a relatively strong dismissal. Independent of me being right or wrong, that seems to be the completely wrong way round. Andres
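For reference, the guard Andres is pointing at is the age check in TestForOldSnapshot(); a simplified sketch of the 9.6-era logic (paraphrased here, not the verbatim bufmgr.h macro) looks like this:

    /*
     * Sketch: raise "snapshot too old" only for a plain MVCC snapshot
     * reading a page whose LSN is newer than the snapshot's LSN.
     */
    if (old_snapshot_threshold >= 0 &&
        snapshot != NULL &&
        snapshot->satisfies == HeapTupleSatisfiesMVCC &&
        !XLogRecPtrIsInvalid(snapshot->lsn) &&
        PageGetLSN(page) > snapshot->lsn)
        ereport(ERROR,
                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
                 errmsg("snapshot too old")));

The HeapTupleSatisfiesMVCC test is the point of contention: scans using non-MVCC snapshots (CLUSTER, VACUUM FULL, non-concurrent index builds) never take the error branch.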
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 24, 2016 at 4:56 PM, Andres Freund <andres@anarazel.de> wrote: > The LSNs of the created index pages will be new, and we'll thus > not necessarily error out when required. It is *old* LSNs that are *safe* -- *new* LSNs are what trigger the "snapshot too old" error. So your observation may be a reason that there is not, in fact, any bug here. That said, even if there is no chance that using the index could generate incorrect results, it may be worth looking at whether avoiding index use (as mentioned below) might avoid false positives, and have enough value as an optimization to add. Of course, if that is the case, there is the question of whether that is appropriate for 9.6 or material for the first version 10 CF. > At the very least we'd have to set indexInfo->ii_BrokenHotChain > in that case. If there is a bug, this is what I will look at first -- having queries avoid the index if they are using an old snapshot, rather than throwing an error. > As far as I can see normal index builds will allow concurrent hot > prunes and everything; since those only require page-level > exclusive locks. > > So for !concurrent builds we might end up with a corrupt index. ... by which you mean an index that omits certainly-dead heap tuples which have been the subject of early pruning or vacuum, even if there are still registered snapshots that include those tuples? Or do you see something else? Again, since both the heap pages involved and all the new index pages would have LSNs recent enough to trigger the old snapshot check, I'm not sure where the problem is, but will review closely to see what I might have missed. > I think many of the places relying on heap scans with !mvcc > snapshots are problematic in that way. Outdated reads will not be > detected by TestForOldSnapshot() (given the (snapshot)->satisfies > == HeapTupleSatisfiesMVCC condition therein), and rely on the > snapshot to be actually working. E.g. CLUSTER/VACUUM FULL rely > on accurate HEAPTUPLE_RECENTLY_DEAD Don't the "RECENTLY" values imply that the transaction is still running which caused the tuple to be dead? Since tuples are not subject to early pruning or vacuum when that is the case, where do you see a problem? The snapshot itself has the usual xmin et al., so I'm not sure what you might mean by "the snapshot to be actually working" if not the override at the time of pruning/vacuuming. > If an old session with >= repeatable read accesses a clustered > table (after the cluster committed), they'll now not see any > errors, because all the LSNs look new. Again, it is new LSNs that trigger errors; if the page has not been written recently the LSN is old and there is no error. I think you may be seeing problems based on getting the basics of this backwards. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, May 27, 2016 at 9:58 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Tue, May 24, 2016 at 4:56 PM, Andres Freund <andres@anarazel.de> wrote: >> If an old session with >= repeatable read accesses a clustered >> table (after the cluster committed), they'll now not see any >> errors, because all the LSNs look new. > > Again, it is new LSNs that trigger errors; if the page has not been > written recently the LSN is old and there is no error. I think you > may be seeing problems based on getting the basics of this > backwards. I am reviewing the suggestion of a possible bug now, and will make it my top priority until resolved. By the end of 1 June I will either have committed a fix or posted an explanation of why the concern is mistaken, with test results to demonstrate correct behavior. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 24, 2016 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, May 24, 2016 at 3:48 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote: >>> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote: >>>> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >>>>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote: >>>> >>>>>> That comment reminds me of a question I had: Did you consider the effect >>>>>> of this patch on analyze? It uses a snapshot, and by memory you've not >>>>>> built in a defense against analyze being cancelled. >> >> The primary defense is not considering a cancellation except in 30 >> of the 500 places a page is used. None of those 30 are, as far as >> I can see (upon review in response to your question), used in the >> analyze process. > > It's not obvious to me how this is supposed to work. If ANALYZE's > snapshot is subject to being ignored for xmin purposes because of > snapshot_too_old, then I would think it would need to consider > cancelling itself if it reads a page with possibly-removed data, just > like in any other case. It seems that we might not actually need a > snapshot set for ANALYZE in all cases, because the comments say: > > /* functions in indexes may want a snapshot set */ > PushActiveSnapshot(GetTransactionSnapshot()); > > If we can get away with it, it would be a rather large win to only set > a snapshot when the table has an expression index. For purposes of > "snapshot too old", though, it will be important that a function in an > index which tries to read data from some other table which has been > pruned cancels itself when necessary. I will make this my top priority after resolving the question of whether there is an issue with CREATE INDEX. I expect to have a resolution, probably involving some patch, by 3 June. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> As far as I can see normal index builds will allow concurrent hot
>> prunes and everything; since those only require page-level
>> exclusive locks.
>>
>> So for !concurrent builds we might end up with a corrupt index.
>
> ... by which you mean an index that omits certainly-dead heap
> tuples which have been the subject of early pruning or vacuum, even
> if there are still registered snapshots that include those tuples?
> Or do you see something else?

I think that is the danger.

> Again, since both the heap pages involved and all the new index
> pages would have LSNs recent enough to trigger the old snapshot
> check, I'm not sure where the problem is, but will review closely
> to see what I might have missed.

This is a good point, though, I think.

>> I think many of the places relying on heap scans with !mvcc
>> snapshots are problematic in that way. Outdated reads will not be
>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
>> == HeapTupleSatisfiesMVCC condition therein), and rely on the
>> snapshot to be actually working. E.g. CLUSTER/VACUUM FULL rely
>> on accurate HEAPTUPLE_RECENTLY_DEAD
>
> Don't the "RECENTLY" values imply that the transaction is still
> running which caused the tuple to be dead? Since tuples are not
> subject to early pruning or vacuum when that is the case, where do
> you see a problem? The snapshot itself has the usual xmin et al.,
> so I'm not sure what you might mean by "the snapshot to be actually
> working" if not the override at the time of pruning/vacuuming.

Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that
is newer than the oldest registered snapshot in the system (based on
some snapshots being ignored) might get a return value of
HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD. It seems
necessary to carefully audit all calls to HeapTupleSatisfiesVacuum()
to see whether that difference matters. I took a quick look and
here's what I see:

statapprox_heap(): Statistical information for the DBA. The
difference is non-critical.

heap_prune_chain(): Seeing the tuple as dead might cause it to be
removed early. This should be OK. Removing the tuple early will cause
the page LSN to be bumped unless RelationNeedsWAL() returns false, and
TransactionIdLimitedForOldSnapshots() includes that as a condition for
disabling early pruning.

IndexBuildHeapRangeScan(): We might end up with indexIt = false
instead of indexIt = true. That should be OK because anyone using the
old snapshot will see a new page LSN and error out. We might also
fail to set indexInfo->ii_BrokenHotChain = true. I suspect that's a
problem, but I'm not certain of it.

acquire_sample_rows(): Both return values are treated in the same
way. No problem.

copy_heap_data(): We'll end up setting isdead = true instead of
tups_recently_dead += 1. That means that the new heap won't include
the tuple, which is OK because old snapshots can't read the new heap
without erroring out, assuming that the new heap has LSNs. The xmin
used here comes from vacuum_set_xid_limits() which goes through
TransactionIdLimitedForOldSnapshots() so this should be OK for the
same reasons as heap_prune_chain(). Another effect of seeing the
tuple as prematurely dead is that we'll call
rewrite_heap_dead_tuple() on it; rewrite_heap_dead_tuple() will
presume that if this tuple is dead, its predecessor in the ctid chain
is also dead. I don't see any obvious problem with that.

lazy_scan_heap(): Basically, the same thing as heap_prune_chain().

CheckForSerializableConflictOut(): Maybe a problem? If the tuple is
dead, there's no issue, but if it's recently-dead, there might be.

We might want to add comments to some of these places addressing
snapshot_too_old specifically.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
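For reference, the gate Robert mentions lives in TransactionIdLimitedForOldSnapshots() in snapmgr.c; a simplified sketch (paraphrased, not the verbatim 9.6 source) of the conditions under which the xmin horizon may be advanced:

    TransactionId
    TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
                                        Relation relation)
    {
        if (TransactionIdIsNormal(recentXmin) &&
            old_snapshot_threshold >= 0 &&
            RelationNeedsWAL(relation) &&      /* page LSNs must advance */
            !IsCatalogRelation(relation) &&
            !RelationIsAccessibleInLogicalDecoding(relation))
        {
            /* ... possibly return an xmin advanced past recentXmin ... */
        }
        return recentXmin;   /* otherwise the horizon is unchanged */
    }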
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 31, 2016 at 10:03 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner <kgrittn@gmail.com> wrote: >>> As far as I can see normal index builds will allow concurrent hot >>> prunes and everything; since those only require page-level >>> exclusive locks. >>> >>> So for !concurrent builds we might end up with a corrupt index. >> >> ... by which you mean an index that omits certainly-dead heap >> tuples which have been the subject of early pruning or vacuum, even >> if there are still registered snapshots that include those tuples? >> Or do you see something else? > > I think that is the danger. Well, the *perceived* danger -- since every page in the new index would be new enough to be recognized as too recently modified to be safe for a snapshot that could still see the omitted rows, the only question I'm sorting out on this is how safe it might be to cause the index to be ignored in planning when using such a snapshot. That and demonstrating the safe behavior to those not following closely enough to see what will happen without a demonstration. >> Again, since both the heap pages involved and all the new index >> pages would have LSNs recent enough to trigger the old snapshot >> check, I'm not sure where the problem is, > This is a good point, though, I think. The whole perception of risk in this area seems to be based on getting that wrong; although the review of this area may yield some benefit in terms of minimizing false positives. >>> I think many of the places relying on heap scans with !mvcc >>> snapshots are problematic in that way. Outdated reads will not be >>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies >>> == HeapTupleSatisfiesMVCC condition therein), and rely on the >>> snapshot to be actually working. E.g. CLUSTER/ VACUUM FULL rely >>> on accurate HEAPTUPLE_RECENTLY_DEAD >> >> Don't the "RECENTLY" values imply that the transaction is still >> running which cause the tuple to be dead? Since tuples are not >> subject to early pruning or vacuum when that is the case, where do >> you see a problem? The snapshot itself has the usual xmin et al., >> so I'm not sure what you might mean by "the snapshot to be actually >> working" if not the override at the time of pruning/vacuuming. > > Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that > is newer that the oldest registered snapshot in the system (based on > some snapshots being ignored) might get a return value of > HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD. Since the override xmin cannot advance past the earliest transaction which is still active, HEAPTUPLE_DEAD indicates that the transaction causing the tuple to be dead has completed and the tuple is irrevocably dead -- even if there are still snapshots registered which can see it. Seeing HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD is strictly limited to tuples which are certainly and permanently dead and for which the only possible references are non-MVCC snapshots or existing snapshots subject to "snapshot too old" monitoring. > IndexBuildHeapRangeScan(): We might end up with indexIt = false > instead of indexIt = true. That should be OK because anyone using the > old snapshot will see a new page LSN and error out. We might also > fail to set indexInfo->ii_BrokenHotChain = true. I suspect that's a > problem, but I'm not certain of it. 
The latter flag is what I'm currently digging at; but my hope is that whenever old_snapshot_threshold >= 0 we can set indexInfo->ii_BrokenHotChain = true to cause the planner to skip consideration of the index if the snapshot is an "old" one. That will avoid some false positives (seeing the error when not strictly necessary). If that works out the way I hope, the only down side is that a scan using a snapshot from an old transaction or cursor would use some other index or a heap scan; but we already have that possibility in some cases -- that seems to be the point of the flag. > CheckForSerializableConflictOut: Maybe a problem? If the tuple is > dead, there's no issue, but if it's recently-dead, there might be. If the tuple is not visible to the scan, the behavior is unchanged (a simple return from the function on either HEAPTUPLE_DEAD or HEAPTUPLE_RECENTLY_DEAD with !visible) and (thus) clearly correct. If the tuple is visible to us it is currently subject to early pruning or vacuum (since those operations would get the same modified xmin) but has not yet had any such treatment since we made it to this function in the first place. The processing for SSI purposes would be unaffected by the possibility that there could later be early pruning/vacuuming. Thanks for the review and feedback! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, May 27, 2016 at 10:18 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Fri, May 27, 2016 at 9:58 AM, Kevin Grittner <kgrittn@gmail.com> wrote: >> On Tue, May 24, 2016 at 4:56 PM, Andres Freund <andres@anarazel.de> wrote: > >>> If an old session with >= repeatable read accesses a clustered >>> table (after the cluster committed), they'll now not see any >>> errors, because all the LSNs look new. >> >> Again, it is new LSNs that trigger errors; if the page has not been >> written recently the LSN is old and there is no error. I think you >> may be seeing problems based on getting the basics of this >> backwards. > > I am reviewing the suggestion of a possible bug now, and will make > it my top priority until resolved. By the end of 1 June I will > either have committed a fix or posted an explanation of why the > concern is mistaken, with test results to demonstrate correct > behavior. This got set back by needing to fix a bug in the 9.5 release. I am back on this and have figured out that everyone who commented on this specific issue was wrong about a very important fact -- the LSNs in index pages after CREATE INDEX (with or without CONCURRENTLY) and for REINDEX are always == InvalidXLogRecPtr (0). That means that a snapshot from before an index build does not always generate errors when it should on the use of the new index. (Any early pruning/vacuuming from before the index build is missed; activity subsequent to the index build is recognized.) Consequently, causing the index to be ignored in planning when using the old index is not a nice optimization, but necessary for correctness. We already have logic to do this for other cases (like HOT updates), so it is a matter of tying in to that existing code correctly. This won't be all that novel. I now expect to push a fix along those lines by Tuesday, 6 June. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
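The "existing logic" Kevin refers to is the indcheckxmin mechanism. Roughly (a sketch of get_relation_info() in plancat.c, not the verbatim source), the planner skips an index whose usability horizon is too new for the current snapshot:

    /* While assembling the relation's index list for planning: */
    if (index->indcheckxmin &&
        !TransactionIdPrecedes(
            HeapTupleHeaderGetXmin(indexRelation->rd_indextuple->t_data),
            TransactionXmin))
    {
        /* Index is too new for our snapshot; leave it out of planning. */
        index_close(indexRelation, NoLock);
        continue;
    }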
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, May 27, 2016 at 10:35 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Tue, May 24, 2016 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> [ANALYZE of index with expression may fail to update statistics >> if ANALYZE runs longer than old_snapshot_threshold] >> If we can get away with it, it would be a rather large win to only set >> a snapshot when the table has an expression index. For purposes of >> "snapshot too old", though, it will be important that a function in an >> index which tries to read data from some other table which has been >> pruned cancels itself when necessary. > > I will make this my top priority after resolving the question of whether > there is an issue with CREATE INDEX. I expect to have a resolution, > probably involving some patch, by 3 June. Due to 9.5 bug-fixing and the index issue taking a bit longer than I expected, this is now looking like a 7 June resolution. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, Jun 3, 2016 at 4:24 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> Consequently, causing the index to be ignored in planning when
> using the old index
That last line should have read "using an old snapshot"
> is not a nice optimization, but necessary for
> correctness. We already have logic to do this for other cases
> (like HOT updates), so it is a matter of tying in to that existing
> code correctly. This won't be all that novel.
Just to demonstrate that, the minimal patch to fix behavior in this
area would be:
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31a1438..6c379da 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2268,6 +2268,9 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		Assert(numblocks == InvalidBlockNumber);
 	}
 
+	if (old_snapshot_threshold >= 0)
+		indexInfo->ii_BrokenHotChain = true;
+
 	reltuples = 0;
 
 	/*
Of course, ii_BrokenHotChain should be renamed to something like
ii_UnsafeForOldSnapshots, and some comments need to be updated; but
the above is the substance of it.
Attached is what I have so far. I'm still looking at what other
comments might need to be adjusted, but thought I should put this
much out for comment now.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Sat, Jun 4, 2016 at 4:21 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Fri, Jun 3, 2016 at 4:24 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> Consequently, causing the index to be ignored in planning when >> using the old index > > That last line should have read "using an old snapshot" > >> is not a nice optimization, but necessary for >> correctness. We already have logic to do this for other cases >> (like HOT updates), so it is a matter of tying in to that existing >> code correctly. This won't be all that novel. > > Just to demonstrate that, the minimal patch to fix behavior in this > area would be: > > diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c > index 31a1438..6c379da 100644 > --- a/src/backend/catalog/index.c > +++ b/src/backend/catalog/index.c > @@ -2268,6 +2268,9 @@ IndexBuildHeapRangeScan(Relation heapRelation, > Assert(numblocks == InvalidBlockNumber); > } > > + if (old_snapshot_threshold >= 0) > + indexInfo->ii_BrokenHotChain = true; > + > reltuples = 0; > > /* > > Of course, ii_BrokenHotChain should be renamed to something like > ii_UnsafeForOldSnapshots, and some comments need to be updated; but > the above is the substance of it. I don't know why we'd want to rename it like that... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, Jun 7, 2016 at 10:40 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Jun 4, 2016 at 4:21 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> the minimal patch to fix behavior in this area would be: >> >> diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c >> index 31a1438..6c379da 100644 >> --- a/src/backend/catalog/index.c >> +++ b/src/backend/catalog/index.c >> @@ -2268,6 +2268,9 @@ IndexBuildHeapRangeScan(Relation heapRelation, >> Assert(numblocks == InvalidBlockNumber); >> } >> >> + if (old_snapshot_threshold >= 0) >> + indexInfo->ii_BrokenHotChain = true; >> + >> reltuples = 0; >> >> /* >> >> Of course, ii_BrokenHotChain should be renamed to something like >> ii_UnsafeForOldSnapshots, and some comments need to be updated; but >> the above is the substance of it. > > I don't know why we'd want to rename it like that... If we made the above change, the old name would be misleading, but I've thought better of that; a slightly different approach (tested, but not yet with comment adjustments) is attached. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Wed, Jun 8, 2016 at 9:40 AM, Kevin Grittner <kgrittn@gmail.com> wrote: >>> Of course, ii_BrokenHotChain should be renamed to something like >>> ii_UnsafeForOldSnapshots, and some comments need to be updated; but >>> the above is the substance of it. >> >> I don't know why we'd want to rename it like that... > > If we made the above change, the old name would be misleading, but > I've thought better of that and attach a slightly different > approach (tested but not yet with comment adjustments); attached. Kevin asked me to look at this patch, and maybe update it, but after some further study, I am not at all convinced that there's any actual bug here. Here's why: in order for the HeapTupleSatisfiesVacuum() in IndexBuildHeapRangeScan() to return HEAPTUPLE_DEAD instead of HEAPTUPLE_RECENTLY_DEAD, it would have to be using an OldestXmin value that doesn't include all of the snapshots in the system. But that will never happen, because that xmin comes directly from GetOldestXmin(heapRelation, true), which knows nothing about snapshot_too_old and will therefore never exclude any snapshots. If we were to pass the output of GetOldestXmin() through TransactionIdLimitedForOldSnapshots() before using it here, we would have a bug. But we don't do that. Do you have a test case that demonstrates a problem, or an explanation of why you think there is one? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
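In code terms, the distinction Robert draws is roughly this (a sketch under that reading of the sources, not verbatim code):

    /* IndexBuildHeapRangeScan() takes the full horizon, never limited
     * by old_snapshot_threshold: */
    OldestXmin = GetOldestXmin(heapRelation, true);

    /* The pruning and vacuum paths, by contrast, pass their horizon
     * through the old-snapshot override before using it (see
     * heap_page_prune_opt() and vacuum_set_xid_limits()): */
    OldestXmin = TransactionIdLimitedForOldSnapshots(OldestXmin, relation);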
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 8, 2016 at 2:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Do you have a test case that demonstrates a problem, or an explanation
> of why you think there is one?

With old_snapshot_threshold = '1min'

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
insert into t1 select generate_series(1, 1000000);
begin transaction isolation level repeatable read;
select 1;

-- connection 2
insert into t2 values (1);
delete from t1 where c1 between 200000 and 299999;
delete from t1 where c1 = 1000000;
vacuum analyze verbose t1;
select pg_sleep_for('2min');
vacuum analyze verbose t1;  -- repeat if needed until dead rows vacuumed

-- connection 1
select c1 from t1 where c1 = 100;
select c1 from t1 where c1 = 250000;

The problem occurs when an index is built while an old snapshot
exists which can't see the effects of early pruning/vacuuming. The
fix prevents use of such an index until all snapshots early enough
to have a problem have been released.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Wed, Jun 8, 2016 at 4:04 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Wed, Jun 8, 2016 at 2:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Do you have a test case that demonstrates a problem, or an explanation >> of why you think there is one? > > With old_snapshot_threshold = '1min' > > -- connection 1 > drop table if exists t1; > create table t1 (c1 int not null); > insert into t1 select generate_series(1, 1000000); > begin transaction isolation level repeatable read; > select 1; > > -- connection 2 > insert into t2 values (1); > delete from t1 where c1 between 200000 and 299999; > delete from t1 where c1 = 1000000; > vacuum analyze verbose t1; > select pg_sleep_for('2min'); > vacuum analyze verbose t1; -- repeat if needed until dead rows vacuumed > > -- connection 1 > select c1 from t1 where c1 = 100; > select c1 from t1 where c1 = 250000; > > The problem occurs when an index is built while an old snapshot > exists which can't see the effects of early pruning/vacuuming. The > fix prevents use of such an index until all snapshots early enough > to have a problem have been released. This example doesn't seem to involve any CREATE INDEX or REINDEX operations, so I'm a little confused. Generally, I think I see the hazard you're concerned about: I had failed to realize, as you mentioned upthread, that new index pages would have an LSN of 0. So if a tuple is pruned early and then the index is reindexed, old snapshots won't realize that data is missing. What I'm less certain about is whether you can actually get by with reusing ii_BrokenHotChain to handle this case. For example, note this comment: * However, when reindexing an existing index, we should do nothing here. * Any HOT chains that are broken with respect to the index must predate * the index's original creation, so there is no need to change the * index's usability horizon. Moreover, we *must not* try to change the * index's pg_index entry while reindexing pg_index itself, and this * optimization nicely prevents that. This logic doesn't apply to the old snapshot case; there, you'd need to distrust the index whether it was an initial build or a REINDEX, but it doesn't look like that's what the patch does. I'm worried there could be other places where we rely on ii_BrokenHotChain to detect only one specific condition that isn't quite the same as what you're trying to use it for here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
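The code behind the comment Robert quotes is in index_build(); simplified (a sketch, not the verbatim source), the flag is only acted on for an initial, non-concurrent build:

    if (indexInfo->ii_BrokenHotChain && !isreindex &&
        !indexInfo->ii_Concurrent)
    {
        /* mark the index's pg_index row so that planning checks its
         * usability horizon */
        indexForm->indcheckxmin = true;
    }

which is why a REINDEX would be missed if ii_BrokenHotChain were simply reused for the old-snapshot case.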
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Noah Misch
Date:
On Fri, Jun 03, 2016 at 04:29:40PM -0500, Kevin Grittner wrote: > On Fri, May 27, 2016 at 10:35 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > > On Tue, May 24, 2016 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > >> [ANALYZE of index with expression may fail to update statistics > >> if ANALYZE runs longer than old_snapshot_threshold] > > >> If we can get away with it, it would be a rather large win to only set > >> a snapshot when the table has an expression index. For purposes of > >> "snapshot too old", though, it will be important that a function in an > >> index which tries to read data from some other table which has been > >> pruned cancels itself when necessary. > > > > I will make this my top priority after resolving the question of whether > > there is an issue with CREATE INDEX. I expect to have a resolution, > > probably involving some patch, by 3 June. > > Due to 9.5 bug-fixing and the index issue taking a bit longer than > I expected, this is now looking like a 7 June resolution. This PostgreSQL 9.6 open item is past due for your status update. Kindly send a status update within 24 hours, and include a date for your subsequent status update. Refer to the policy on open item ownership: http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
[Thanks to Robert for stepping up to keep this moving while I was down yesterday with a minor injury. I'm back on it today.] On Wed, Jun 8, 2016 at 3:11 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 8, 2016 at 4:04 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> -- connection 1 >> drop table if exists t1; >> create table t1 (c1 int not null); >> insert into t1 select generate_series(1, 1000000); >> begin transaction isolation level repeatable read; >> select 1; >> >> -- connection 2 >> insert into t2 values (1); >> delete from t1 where c1 between 200000 and 299999; >> delete from t1 where c1 = 1000000; >> vacuum analyze verbose t1; >> select pg_sleep_for('2min'); >> vacuum analyze verbose t1; -- repeat if needed until dead rows vacuumed >> >> -- connection 1 >> select c1 from t1 where c1 = 100; >> select c1 from t1 where c1 = 250000; >> >> The problem occurs when an index is built while an old snapshot >> exists which can't see the effects of early pruning/vacuuming. The >> fix prevents use of such an index until all snapshots early enough >> to have a problem have been released. > > This example doesn't seem to involve any CREATE INDEX or REINDEX > operations, so I'm a little confused. Sorry; pasto -- the index build is supposed to be on connection 2 right after the dead rows are confirmed vacuumed away: create index t1_c1 on t1 (c1); > Generally, I think I see the hazard you're concerned about: I had > failed to realize, as you mentioned upthread, that new index pages > would have an LSN of 0. So if a tuple is pruned early and then the > index is reindexed, old snapshots won't realize that data is missing. > What I'm less certain about is whether you can actually get by with > reusing ii_BrokenHotChain to handle this case. v2 and later does not do that. v1 did, but that was a more blunt instrument. > For example, note this comment: > > * However, when reindexing an existing index, we should do nothing here. > * Any HOT chains that are broken with respect to the index must predate > * the index's original creation, so there is no need to change the > * index's usability horizon. Moreover, we *must not* try to change the > * index's pg_index entry while reindexing pg_index itself, and this > * optimization nicely prevents that. > > This logic doesn't apply to the old snapshot case; there, you'd need > to distrust the index whether it was an initial build or a REINDEX, > but it doesn't look like that's what the patch does. I'm worried > there could be other places where we rely on ii_BrokenHotChain to > detect only one specific condition that isn't quite the same as what > you're trying to use it for here. Well spotted. I had used a lot of discrete calls to get that reindex logic right, but it was verbose and ugly, so I had just added the new macros in this patch and started to implement them before I knocked off for the day. At handover I was too distracted to remember where I was with it. :-( See if it looks right to you now. Attached is v3. I will commit this patch to resolve this issue tomorrow, barring any objections before then. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Thu, Jun 9, 2016 at 10:28 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > [Thanks to Robert to stepping up to keep this moving while I was > down yesterday with a minor injury. I'm back on it today.] >> Generally, I think I see the hazard you're concerned about: I had >> failed to realize, as you mentioned upthread, that new index pages >> would have an LSN of 0. So if a tuple is pruned early and then the >> index is reindexed, old snapshots won't realize that data is missing. >> What I'm less certain about is whether you can actually get by with >> reusing ii_BrokenHotChain to handle this case. > > v2 and later does not do that. v1 did, but that was a more blunt > instrument. > >> For example, note this comment: >> >> * However, when reindexing an existing index, we should do nothing here. >> * Any HOT chains that are broken with respect to the index must predate >> * the index's original creation, so there is no need to change the >> * index's usability horizon. Moreover, we *must not* try to change the >> * index's pg_index entry while reindexing pg_index itself, and this >> * optimization nicely prevents that. >> >> This logic doesn't apply to the old snapshot case; there, you'd need >> to distrust the index whether it was an initial build or a REINDEX, >> but it doesn't look like that's what the patch does. I'm worried >> there could be other places where we rely on ii_BrokenHotChain to >> detect only one specific condition that isn't quite the same as what >> you're trying to use it for here. > > Well spotted. I had used a lot of discreet calls to get that > reindex logic right, but it was verbose and ugly, so I had just > added the new macros in this patch and started to implement them > before I knocked off for the day. At handover I was too distracted > to remember where I was with it. :-( See if it looks right to you > now. > > Attached is v3. I will commit this patch to resolve this issue > tomorrow, barring any objections before then. So I like the idea of centralizing checks in RelationAllowsEarlyVacuum, but shouldn't it really be called RelationAllowsEarlyPruning? Will look at this a bit more if I get time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Thu, Jun 9, 2016 at 6:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: > So I like the idea of centralizing checks in > RelationAllowsEarlyVacuum, but shouldn't it really be called > RelationAllowsEarlyPruning? Since vacuum calls the pruning function, and not the other way around, the name you suggest would be technically more correct. Committed using "Pruning" instead of "Vacuum" in both new macros. I have closed the CREATE INDEX versus "snapshot too old" issue in the "PostgreSQL 9.6 Open Items" Wiki page. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
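As committed, the two macros read roughly as follows (paraphrased sketch; see snapmgr.h for the authoritative definitions):

    #define RelationAllowsEarlyPruning(rel) \
    ( \
        RelationNeedsWAL(rel) \
        && !IsCatalogRelation(rel) \
        && !RelationIsAccessibleInLogicalDecoding(rel) \
    )

    #define EarlyPruningEnabled(rel) \
        (old_snapshot_threshold >= 0 && RelationAllowsEarlyPruning(rel))

Pruning, vacuum, and the index-build paths can then test EarlyPruningEnabled(rel) instead of repeating the individual conditions.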
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Fri, Jun 10, 2016 at 10:45 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Thu, Jun 9, 2016 at 6:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> So I like the idea of centralizing checks in >> RelationAllowsEarlyVacuum, but shouldn't it really be called >> RelationAllowsEarlyPruning? > > Since vacuum calls the pruning function, and not the other way > around, the name you suggest would be technically more correct. > Committed using "Pruning" instead of "Vacuum" in both new macros. > > I have closed the CREATE INDEX versus "snapshot too old" issue in > the "PostgreSQL 9.6 Open Items" Wiki page. You've still got an early_vacuum_enabled variable name floating around there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, Jun 10, 2016 at 10:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jun 10, 2016 at 10:45 AM, Kevin Grittner <kgrittn@gmail.com> wrote: >> Since vacuum calls the pruning function, and not the other way >> around, the name you suggest would be technically more correct. >> Committed using "Pruning" instead of "Vacuum" in both new macros. > You've still got an early_vacuum_enabled variable name floating around there. Gah! Renamed for consistency. Thanks! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, May 24, 2016 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 24, 2016 at 3:48 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote:
>>> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
>>>> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>>>>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote:
>>>>
>>>>>> That comment reminds me of a question I had: Did you consider the effect
>>>>>> of this patch on analyze? It uses a snapshot, and by memory you've not
>>>>>> built in a defense against analyze being cancelled.
>>
>> The primary defense is not considering a cancellation except in 30
>> of the 500 places a page is used. None of those 30 are, as far as
>> I can see (upon review in response to your question), used in the
>> analyze process.
>
> It's not obvious to me how this is supposed to work. If ANALYZE's
> snapshot is subject to being ignored for xmin purposes because of
> snapshot_too_old, then I would think it would need to consider
> cancelling itself if it reads a page with possibly-removed data, just
> like in any other case. It seems that we might not actually need a
> snapshot set for ANALYZE in all cases, because the comments say:
>
> /* functions in indexes may want a snapshot set */
> PushActiveSnapshot(GetTransactionSnapshot());
>
> If we can get away with it, it would be a rather large win to only set
> a snapshot when the table has an expression index. For purposes of
> "snapshot too old", though, it will be important that a function in an
> index which tries to read data from some other table which has been
> pruned cancels itself when necessary.

I have reviewed the code and run tests to try to find something here
which could be considered a bug, without finding any problem. When
reading pages for the random sample for ANALYZE (or auto-analyze)
there is not an age check; so ANALYZE completes without error,
keeping statistics up-to-date.

There really is no difference in behavior except in the case that:

(1) old_snapshot_threshold >= 0 to enable the "snapshot too old"
feature, and
(2) there were tuples that were dead as the result of completed
transactions, and
(3) those tuples became older than the threshold, and
(4) those tuples were pruned or vacuumed away, and
(5) an ANALYZE process would have read those dead tuples had they
not been removed.

In such a case the irrevocably dead, permanently removed tuples are
not counted in the statistics. I have trouble seeing a better
outcome than that. Among my tests, I specifically checked for an
ANALYZE of a table having an index on an expression, using an old
snapshot:

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
drop table if exists t2;
create table t2 (c1 int not null);
insert into t1 select generate_series(1, 10000);
drop function mysq(i int);
create function mysq(i int)
returns int
language plpgsql
immutable
as $mysq$
begin
return (i * i);
end
$mysq$;
create index t1_c1sq on t1 ((mysq(c1)));
begin transaction isolation level repeatable read;
select 1;

-- connection 2
vacuum analyze verbose t1;
delete from t1 where c1 between 1000 and 1999;
delete from t1 where c1 = 8000;
insert into t2 values (1);
select pg_sleep_for('2min');
vacuum verbose t1;  -- repeat if necessary to see the dead rows disappear

-- connection 1
analyze verbose t1;

This runs to completion, as I would want and expect.

I am closing this item on the "PostgreSQL 9.6 Open Items" page. If
anyone feels that I've missed something, please provide a test to
show the problem, or a clear description of the problem and how you
feel behavior should be different.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Sat, Jun 11, 2016 at 11:29 AM, Kevin Grittner <kgrittn@gmail.com> wrote: > I have reviewed the code and run tests to try to find something > here which could be considered a bug, without finding any problem. > When reading pages for the random sample for ANALYZE (or > auto-analyze) there is not an age check; so ANALYZE completes > without error, keeping statistics up-to-date. > > There really is no difference in behavior except in the case that: > > (1) old_snapshot_threshold >= 0 to enable the "snapshot too old" > feature, and > (2) there were tuples that were dead as the result of completed > transactions, and > (3) those tuples became older than the threshold, and > (4) those tuples were pruned or vacuumed away, and > (5) an ANALYZE process would have read those dead tuples had they > not been removed. > > In such a case the irrevocably dead, permanently removed tuples are > not counted in the statistics. I have trouble seeing a better > outcome than that. Among my tests, I specifically checked for an > ANALYZE of a table having an index on an expression, using an old > snapshot: > > -- connection 1 > drop table if exists t1; > create table t1 (c1 int not null); > drop table if exists t2; > create table t2 (c1 int not null); > insert into t1 select generate_series(1, 10000); > drop function mysq(i int); > create function mysq(i int) > returns int > language plpgsql > immutable > as $mysq$ > begin > return (i * i); > end > $mysq$; > create index t1_c1sq on t1 ((mysq(c1))); > begin transaction isolation level repeatable read; > select 1; > > -- connection 2 > vacuum analyze verbose t1; > delete from t1 where c1 between 1000 and 1999; > delete from t1 where c1 = 8000; > insert into t2 values (1); > select pg_sleep_for('2min'); > vacuum verbose t1; -- repeat if necessary to see the dead rows > disappear > > -- connection 1 > analyze verbose t1; > > This runs to completion, as I would want and expect. > > I am closing this item on the "PostgreSQL 9.6 Open Items" page. If > anyone feels that I've missed something, please provide a test to > show the problem, or a clear description of the problem and how you > feel behavior should be different. So what happens in this scenario: 1. ANALYZE runs really slowly - maybe the user-defined function it's running for the expression index is extremely long-running. 2. Eventually, the snapshot for ANALYZE is older than the configured value of snapshot_too_old. 3. Then, ANALYZE selects a page with an LSN new enough that it might have been pruned. Presumably, the ANALYZE ought to error out in this scenario, just as it would in any other situation where an old snapshot sees a new page. No? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 10:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Jun 11, 2016 at 11:29 AM, Kevin Grittner <kgrittn@gmail.com> wrote: >> I have reviewed the code and run tests to try to find something >> here which could be considered a bug, without finding any problem. >> When reading pages for the random sample for ANALYZE (or >> auto-analyze) there is not an age check; so ANALYZE completes >> without error, keeping statistics up-to-date. >> >> There really is no difference in behavior except in the case that: >> >> (1) old_snapshot_threshold >= 0 to enable the "snapshot too old" >> feature, and >> (2) there were tuples that were dead as the result of completed >> transactions, and >> (3) those tuples became older than the threshold, and >> (4) those tuples were pruned or vacuumed away, and >> (5) an ANALYZE process would have read those dead tuples had they >> not been removed. >> >> In such a case the irrevocably dead, permanently removed tuples are >> not counted in the statistics. I have trouble seeing a better >> outcome than that. Among my tests, I specifically checked for an >> ANALYZE of a table having an index on an expression, using an old >> snapshot: >> >> -- connection 1 >> drop table if exists t1; >> create table t1 (c1 int not null); >> drop table if exists t2; >> create table t2 (c1 int not null); >> insert into t1 select generate_series(1, 10000); >> drop function mysq(i int); >> create function mysq(i int) >> returns int >> language plpgsql >> immutable >> as $mysq$ >> begin >> return (i * i); >> end >> $mysq$; >> create index t1_c1sq on t1 ((mysq(c1))); >> begin transaction isolation level repeatable read; >> select 1; >> >> -- connection 2 >> vacuum analyze verbose t1; >> delete from t1 where c1 between 1000 and 1999; >> delete from t1 where c1 = 8000; >> insert into t2 values (1); >> select pg_sleep_for('2min'); >> vacuum verbose t1; -- repeat if necessary to see the dead rows >> disappear >> >> -- connection 1 >> analyze verbose t1; >> >> This runs to completion, as I would want and expect. >> >> I am closing this item on the "PostgreSQL 9.6 Open Items" page. If >> anyone feels that I've missed something, please provide a test to >> show the problem, or a clear description of the problem and how you >> feel behavior should be different. > > So what happens in this scenario: > > 1. ANALYZE runs really slowly - maybe the user-defined function it's > running for the expression index is extremely long-running. > 2. Eventually, the snapshot for ANALYZE is older than the configured > value of snapshot_too_old. > 3. Then, ANALYZE selects a page with an LSN new enough that it might > have been pruned. > > Presumably, the ANALYZE ought to error out in this scenario, just as > it would in any other situation where an old snapshot sees a new page. > No? The test I showed creates a situation which (to ANALYZE) is identical to what you describe -- ANALYZE sees a page with an LSN recent enough that it could have been (and actually has been) pruned. Why would it be better for the ANALYZE to fail than to complete? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> So what happens in this scenario: >> 1. ANALYZE runs really slowly - maybe the user-defined function it's >> running for the expression index is extremely long-running. >> 2. Eventually, the snapshot for ANALYZE is older than the configured >> value of snapshot_too_old. >> 3. Then, ANALYZE selects a page with an LSN new enough that it might >> have been pruned. >> >> Presumably, the ANALYZE ought to error out in this scenario, just as >> it would in any other situation where an old snapshot sees a new page. >> No? > > The test I showed creates a situation which (to ANALYZE) is > identical to what you describe -- ANALYZE sees a page with an LSN > recent enough that it could have been (and actually has been) > pruned. Why would it be better for the ANALYZE to fail than to > complete? As I understand it, the reason we need to sometimes give "ERROR: snapshot too old" after early pruning is because we might otherwise give the wrong answer. Maybe I'm confused. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 1:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner <kgrittn@gmail.com> > wrote: >>> So what happens in this scenario: >>> 1. ANALYZE runs really slowly - maybe the user-defined function it's >>> running for the expression index is extremely long-running. >>> 2. Eventually, the snapshot for ANALYZE is older than the configured >>> value of snapshot_too_old. >>> 3. Then, ANALYZE selects a page with an LSN new enough that it might >>> have been pruned. >>> >>> Presumably, the ANALYZE ought to error out in this scenario, just as >>> it would in any other situation where an old snapshot sees a new page. >>> No? >> >> The test I showed creates a situation which (to ANALYZE) is >> identical to what you describe -- ANALYZE sees a page with an LSN >> recent enough that it could have been (and actually has been) >> pruned. Why would it be better for the ANALYZE to fail than to >> complete? > > As I understand it, the reason we need to sometimes give "ERROR: > snapshot too old" after early pruning is because we might otherwise > give the wrong answer. > > Maybe I'm confused. In the scenario that you describe, ANALYZE is coming up with statistics to use in query planning, and the numbers are not expected to always be 100% accurate. I can think of two conditions which might prevail when seeing the recent LSN. (1) The recent LSN is from a cause having nothing to do with the STO feature, like DML. As things stand, the behavior is the same as without the patch -- the rows are counted just the same as always. If we did as you suggest, we instead would abort ANALYZE and have stale statistics. (2) The recent LSN is related to STO pruning. The dead rows are gone forever, and will not be counted. This seems more correct than counting them, even if it were possible, and also superior to aborting the ANALYZE and leaving stale statistics. Of course, a subset of (1) is the case that the rows can be early-pruned at the next opportunity. In such a case ANALYZE is still counting them according to the rules that we had before the snapshot too old feature. If someone wants to argue that those rules are wrong, that seems like material for a separate patch. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alvaro Herrera
Date:
Robert Haas wrote: > On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner <kgrittn@gmail.com> > wrote: > > The test I showed creates a situation which (to ANALYZE) is > > identical to what you describe -- ANALYZE sees a page with an LSN > > recent enough that it could have been (and actually has been) > > pruned. Why would it be better for the ANALYZE to fail than to > > complete? > > As I understand it, the reason we need to sometimes give "ERROR: > snapshot too old" after early pruning is because we might otherwise > give the wrong answer. So what constitutes "the wrong answer"? A regular transaction reading a page and not finding a tuple that should have been there but was removed is a serious problem and should be aborted. For ANALYZE it may not matter a great deal. Some very old tuple that might have been chosen for the sample is not there; a different tuple is chosen instead, so the stats might be slightly different. No big deal. Maybe it is possible to get into trouble if you're taking a sample for an expression index. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 1:45 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Maybe it is possible to get into trouble if you're taking a sample for > an expression index. Maybe -- if you are using a function for an index expression which does an explicit SELECT against some database table rather than only using values from the row itself. In such a case you would have had to mark a function as IMMUTABLE which depends on table contents. I say you get to keep both pieces. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: >> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner <kgrittn@gmail.com> >> wrote: >> > The test I showed creates a situation which (to ANALYZE) is >> > identical to what you describe -- ANALYZE sees a page with an LSN >> > recent enough that it could have been (and actually has been) >> > pruned. Why would it be better for the ANALYZE to fail than to >> > complete? >> >> As I understand it, the reason we need to sometimes give "ERROR: >> snapshot too old" after early pruning is because we might otherwise >> give the wrong answer. > > So what constitutes "the wrong answer"? A regular transaction reading a > page and not finding a tuple that should have been there but was > removed, is a serious problem and should be aborted. For ANALYZE it may > not matter a great deal. Some very old tuple that might have been > chosen for the sample is not there; a different tuple is chosen instead, > so the stats might be slightly difference. No big deal. > > Maybe it is possible to get into trouble if you're taking a sample for > an expression index. The expression index case is the one to worry about; if there is a problem, that's where it is. What bothers me is that a function used in an expression index could do anything at all - it can read any table in the database. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alvaro Herrera
Date:
Robert Haas wrote: > On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > Maybe it is possible to get into trouble if you're taking a sample for > > an expression index. > > The expression index case is the one to worry about; if there is a > problem, that's where it is. What bothers me is that a function used > in an expression index could do anything at all - it can read any > table in the database. Hmm, but if it does that, then the code that actually implements that query would run the STO checks, right? The analyze code doesn't need to. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: > The expression index case is the one to worry about; if there is a > problem, that's where it is. What bothers me is that a function used > in an expression index could do anything at all - it can read any > table in the database. It *can*, but then you are lying to the database when you call it IMMUTABLE. Such an index can easily become corrupted through normal DML. Without DML the ANALYZE has no problem. So you seem to be concerned that if someone is lying to the database engine to force it accept a function as IMMUTABLE when it actually isn't, and then updating the referenced rows (which is very likely to render the index corrupted), that statistics might also become stale. They might. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alvaro Herrera
Date:
Kevin Grittner wrote: > On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > > The expression index case is the one to worry about; if there is a > > problem, that's where it is. What bothers me is that a function used > > in an expression index could do anything at all - it can read any > > table in the database. > > It *can*, but then you are lying to the database when you call it > IMMUTABLE. Such an index can easily become corrupted through > normal DML. Without DML the ANALYZE has no problem. So you seem > to be concerned that if someone is lying to the database engine to > force it accept a function as IMMUTABLE when it actually isn't, and > then updating the referenced rows (which is very likely to render > the index corrupted), that statistics might also become stale. We actually go quite some lengths to support this case, even when it's the opinion of many that we shouldn't. For example VACUUM doesn't try to find index entries using the values in each deleted tuple; instead we remember the TIDs and then scan the indexes (possibly many times) to find entries that match those TIDs -- which is much slower. Yet we do it this way to protect the case that somebody is doing the not-really-IMMUTABLE function. In other words, I don't think we consider the position you argued as acceptable. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 1:54 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: >> The expression index case is the one to worry about; if there is a >> problem, that's where it is. What bothers me is that a function used >> in an expression index could do anything at all - it can read any >> table in the database. > > Hmm, but if it does that, then the code that actually implements that > query would run the STO checks, right? The analyze code doesn't need > to. Right. In the described case, you would get a "snapshot too old" error inside the expression which is trying to generate the index value. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Kevin Grittner wrote: >> On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >>> The expression index case is the one to worry about; if there is a >>> problem, that's where it is. What bothers me is that a function used >>> in an expression index could do anything at all - it can read any >>> table in the database. >> >> It *can*, but then you are lying to the database when you call it >> IMMUTABLE. Such an index can easily become corrupted through >> normal DML. Without DML the ANALYZE has no problem. So you seem >> to be concerned that if someone is lying to the database engine to >> force it accept a function as IMMUTABLE when it actually isn't, and >> then updating the referenced rows (which is very likely to render >> the index corrupted), that statistics might also become stale. > > We actually go quite some lengths to support this case, even when it's > the opinion of many that we shouldn't. For example VACUUM doesn't try > to find index entries using the values in each deleted tuple; instead we > remember the TIDs and then scan the indexes (possibly many times) to > find entries that match those TIDs -- which is much slower. Yet we do > it this way to protect the case that somebody is doing the > not-really-IMMUTABLE function. > > In other words, I don't think we consider the position you argued as > acceptable. What are you saying is unacceptable, and what behavior would be acceptable instead? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-15 14:50:46 -0400, Robert Haas wrote:
> On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Robert Haas wrote:
> >> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner <kgrittn@gmail.com>
> >> wrote:
> >> > The test I showed creates a situation which (to ANALYZE) is
> >> > identical to what you describe -- ANALYZE sees a page with an LSN
> >> > recent enough that it could have been (and actually has been)
> >> > pruned. Why would it be better for the ANALYZE to fail than to
> >> > complete?
> >>
> >> As I understand it, the reason we need to sometimes give "ERROR:
> >> snapshot too old" after early pruning is because we might otherwise
> >> give the wrong answer.
> >
> > So what constitutes "the wrong answer"? A regular transaction reading a
> > page and not finding a tuple that should have been there but was
> > removed, is a serious problem and should be aborted. For ANALYZE it may
> > not matter a great deal. Some very old tuple that might have been
> > chosen for the sample is not there; a different tuple is chosen instead,
> > so the stats might be slightly different. No big deal.
> >
> > Maybe it is possible to get into trouble if you're taking a sample for
> > an expression index.
>
> The expression index case is the one to worry about; if there is a
> problem, that's where it is. What bothers me is that a function used
> in an expression index could do anything at all - it can read any
> table in the database.

Isn't that also a problem around fetching toast tuples? Since we don't run TestForOldSnapshot_impl() for toast, we might fetch a toast tuple which has since been re-purposed for a datum of a different type. Which can have arbitrarily bad consequences afaics.

Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund <andres@anarazel.de> wrote:

> We might fetch a toast tuple which has since been re-purposed for a
> datum of a different type.

How would that happen?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alvaro Herrera
Date:
Kevin Grittner wrote: > On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > We actually go quite some lengths to support this case, even when it's > > the opinion of many that we shouldn't. For example VACUUM doesn't try > > to find index entries using the values in each deleted tuple; instead we > > remember the TIDs and then scan the indexes (possibly many times) to > > find entries that match those TIDs -- which is much slower. Yet we do > > it this way to protect the case that somebody is doing the > > not-really-IMMUTABLE function. > > > > In other words, I don't think we consider the position you argued as > > acceptable. > > What are you saying is unacceptable, and what behavior would be > acceptable instead? The answer "we don't support the situation where you have an index using an IMMUTABLE function that isn't actually immutable" is not acceptable. The acceptable solution would be a design that doesn't have that property as a requisite. I think having various executor(/heapam) checks that raise errors when queries are executed from within ANALYZE is acceptable. I don't know about the TOAST related angle Andres just raised. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Wed, Jun 15, 2016 at 2:59 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Kevin Grittner wrote: >> On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> > The expression index case is the one to worry about; if there is a >> > problem, that's where it is. What bothers me is that a function used >> > in an expression index could do anything at all - it can read any >> > table in the database. >> >> It *can*, but then you are lying to the database when you call it >> IMMUTABLE. Such an index can easily become corrupted through >> normal DML. Without DML the ANALYZE has no problem. So you seem >> to be concerned that if someone is lying to the database engine to >> force it accept a function as IMMUTABLE when it actually isn't, and >> then updating the referenced rows (which is very likely to render >> the index corrupted), that statistics might also become stale. > > We actually go quite some lengths to support this case, even when it's > the opinion of many that we shouldn't. For example VACUUM doesn't try > to find index entries using the values in each deleted tuple; instead we > remember the TIDs and then scan the indexes (possibly many times) to > find entries that match those TIDs -- which is much slower. Yet we do > it this way to protect the case that somebody is doing the > not-really-IMMUTABLE function. > > In other words, I don't think we consider the position you argued as > acceptable. Well, I actually don't think there's a giant problem here. I'm just trying to follow the chain of the argument to its (illogical) conclusion. I mean, if ANALYZE itself fails to see a tuple subjected to early pruning, that should be fine. And if some query run by a supposedly-but-not-actually immutable function errors out because snapshot_too_old is set, that also seems more or less fine. The statistics might not get updated, but oh well: either make your supposedly-immutable function actually immutable, or else turn off snapshot_too_old, or else live with the fact that ANALYZE will fail some percentage of the time. Supporting people who cheat and do things that technically aren't allowed is one thing; saying that every new feature must never have any downsides for such people is something else. If we took the latter approach, parallel query would be right out, because you sure can break things by mislabeling functions as PARALLEL SAFE. I *do* think it *must* be possible to get an ANALYZE to do something funky - either error out, or give wrong answers - if you set up a strange enough set of circumstances, but I don't think that's necessarily something we need to do anything about. I think this whole discussion of snapshot too old has provoked far too many bitter emails -- on all sides. I entirely believe that there are legitimate reasons to have concerns about this feature, and I entirely suspect that it has problems we haven't found yet, and I also entirely believe that there will be some future bugs that stem from this feature that we would not have had otherwise. I think it is entirely legitimate to have concerns about those things. On the other hand, I *also* believe that Kevin is a pretty careful guy and that he's done his best to make this patch safe and that he did not just go off and commit something half-baked without due reflection. We have to expect that if people who are committers don't get much review of their patches, they will eventually commit them anyway. The "I can't believe you committed this" reactions seem out of line to me. 
This feature is not perfect. Nor is it the worst thing anybody's ever committed. But it's definitely provoked more ire than most. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
"David G. Johnston"
Date:
Alvaro Herrera wrote:
> Kevin Grittner wrote:
> > On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera
> > <alvherre@2ndquadrant.com> wrote:
> > > We actually go quite some lengths to support this case, even when it's
> > > the opinion of many that we shouldn't. For example VACUUM doesn't try
> > > to find index entries using the values in each deleted tuple; instead we
> > > remember the TIDs and then scan the indexes (possibly many times) to
> > > find entries that match those TIDs -- which is much slower. Yet we do
> > > it this way to protect the case that somebody is doing the
> > > not-really-IMMUTABLE function.
> > >
> > > In other words, I don't think we consider the position you argued as
> > > acceptable.
> >
> > What are you saying is unacceptable, and what behavior would be
> > acceptable instead?
>
> The answer "we don't support the situation where you have an index using
> an IMMUTABLE function that isn't actually immutable" is not acceptable.
> The acceptable solution would be a design that doesn't have that
> property as a requisite.
Yes, a much better solution would be for PostgreSQL to examine the body of every function and determine on its own the proper volatility - or, lacking that, to "sandbox" (for lack of a better term) function execution so it simply cannot do things that conflict with its user-specified marking. But the prevailing opinion on this list is that such an effort is not worth the resources, and that "let the user keep both pieces" is the more expedient policy. That this patch is being defended using that argument is consistent with policy and thus quite acceptable.

The technical details here are just beyond my reach ATM, but I think Robert's meta-points are spot on. Though to be fair, we are changing a fundamental assumption underlying how the system and transactions operate - the amount of code whose assumptions are now being stressed is non-trivial - and for a feature that will generally see less use in production, and likely in much higher-stakes arenas, having a professionally hostile approach will help to ensure that what is released has been thoroughly vetted.

These edge cases should be thought of, discussed, and ideally documented somewhere, so that future coders can see and understand that said edges have been considered, even if the answer is: "well, we don't blow up and at worst have some slightly off statistics; that seems fine".
David J.
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund <andres@anarazel.de> wrote:
>
> > We might fetch a toast tuple which has since been re-purposed for a
> > datum of a different type.
>
> How would that happen?

Autovac vacuums toast and heap tables independently. Once a toast datum isn't used anymore, the oid used can be reused (because it doesn't conflict via GetNewOidWithIndex() anymore). If ANALYZE then detoasts a datum which hasn't been removed, the contents of that toast id might actually be for something different. That's not super likely to happen (given how rare oid wraparounds usually are), but it appears to be possible.

Regards,

Andres
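To make the linkage concrete: a toasted datum points at its chunks purely by a chunk_id OID in the table's toast relation, so OID reuse is the only thing standing between a stale pointer and somebody else's data. A hedged way to look at this (relation names vary per installation):

  -- Find the toast relation backing a table.
  select reltoastrelid::regclass as toast_rel
    from pg_class
   where relname = 'toasted';

  -- Chunks are keyed only by (chunk_id, chunk_seq); once a dead chunk is
  -- vacuumed away, GetNewOidWithIndex() sees no conflict and can hand the
  -- same OID out again for a new datum of any column in the same table.
  select chunk_id, chunk_seq, length(chunk_data)
    from pg_toast.pg_toast_16437  -- hypothetical toast relation name
   order by chunk_id, chunk_seq
   limit 10;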
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> > We might fetch a toast tuple which has since been re-purposed for a
>> > datum of a different type.
>>
>> How would that happen?
>
> Autovac vacuums toast and heap tables independently. Once a toast datum
> isn't used anymore, the oid used can be reused (because it doesn't
> conflict via GetNewOidWithIndex() anymore). If ANALYZE then detoasts a
> datum which hasn't been removed, the contents of that toast id might
> actually be for something different.

What prevents that from happening now, without STO?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 2:40 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
>> > We actually go quite some lengths to support this case, even when it's
>> > the opinion of many that we shouldn't. For example VACUUM doesn't try
>> > to find index entries using the values in each deleted tuple; instead we
>> > remember the TIDs and then scan the indexes (possibly many times) to
>> > find entries that match those TIDs -- which is much slower. Yet we do
>> > it this way to protect the case that somebody is doing the
>> > not-really-IMMUTABLE function.
>> >
>> > In other words, I don't think we consider the position you argued as
>> > acceptable.
>>
>> What are you saying is unacceptable, and what behavior would be
>> acceptable instead?
>
> The answer "we don't support the situation where you have an index using
> an IMMUTABLE function that isn't actually immutable" is not acceptable.
> The acceptable solution would be a design that doesn't have that
> property as a requisite.
>
> I think having various executor(/heapam) checks that raise errors when
> queries are executed from within ANALYZE is acceptable.

Here is an example of a test case showing that:

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
drop table if exists t2;
create table t2 (c1 int not null);
insert into t1 select generate_series(1, 10000);
drop function mysq(i int);
create function mysq(i int)
  returns int
  language plpgsql
  immutable
as $mysq$
begin
  return (i * (select c1 from t2));
end
$mysq$;
insert into t2 values (1);
create index t1_c1sq on t1 ((mysq(c1)));
begin transaction isolation level repeatable read;
select 1;

-- connection 2
vacuum analyze verbose t1;
delete from t1 where c1 between 1000 and 1999;
delete from t1 where c1 = 8000;
update t2 set c1 = 1;

-- connection 1
analyze verbose t1;  -- when run after threshold, STO error occurs

The tail end of that, running the analyze once immediately and once after the threshold, is:

test=# -- connection 1
test=# analyze verbose t1;  -- when run after threshold, STO error occurs
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and 1001 dead rows; 8999 rows in sample, 8999 estimated total rows
ANALYZE
test=# -- connection 1
analyze verbose t1;  -- when run after threshold, STO error occurs
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and 1001 dead rows; 8999 rows in sample, 8999 estimated total rows
ERROR:  snapshot too old
CONTEXT:  SQL statement "SELECT (i * (select c1 from t2))"
PL/pgSQL function mysq(integer) line 3 at RETURN

Is there some other behavior which would be preferred?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-15 16:58:25 -0500, Kevin Grittner wrote: > On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote: > >> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund <andres@anarazel.de> wrote: > >> > >> > We might fetch a toast tuple which > >> > since have been re-purposed for a datum of a different type. > >> > >> How would that happen? > > > > Autovac vacuums toast and heap tables independently. Once a toast datum > > isn't used anymore, the oid used can be reused (because it doesn't > > conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a > > datum, which hasn't been removed, the contents of that toast id, might > > actually be for something different. > > What prevents that from happening now, without STO? Afaics we shouldn't ever look (i.e. detoast) at a "dead for everyone" tuple in autovacuum (or anywhere else). There's one minor exception to that, and that's enum datums in indexes, which is why we currently have weird transactional requirements for them. I'm not entirely sure this can be hit, but it's worth checking. Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 5:34 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-06-15 16:58:25 -0500, Kevin Grittner wrote: >> On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund <andres@anarazel.de> wrote: >>> On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote: >>>> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund <andres@anarazel.de> wrote: >>>> >>>>> We might fetch a toast tuple which >>>>> since have been re-purposed for a datum of a different type. >>>> >>>> How would that happen? >>> >>> Autovac vacuums toast and heap tables independently. Once a toast datum >>> isn't used anymore, the oid used can be reused (because it doesn't >>> conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a >>> datum, which hasn't been removed, the contents of that toast id, might >>> actually be for something different. >> >> What prevents that from happening now, without STO? > > Afaics we shouldn't ever look (i.e. detoast) at a "dead for everyone" > tuple in autovacuum (or anywhere else). There's one minor exception to > that, and that's enum datums in indexes, which is why we currently have > weird transactional requirements for them. I'm not entirely sure this > can be hit, but it's worth checking. I'm not clear where you see this as being in any way different with STO. Above it seemed that you saw this as an issue related to ANALYZE. If there is not early pruning for the table being analyzed, nothing is at all different. If there is early pruning the rows are not seen and there could be no detoasting. If there is a function that lies about IMMUTABLE and reads from a table, it either functions as before or throws a STO error on page access (long before any detoasting). Am I missing something? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-15 20:22:24 -0500, Kevin Grittner wrote:
> I'm not clear where you see this as being in any way different with
> STO. Above it seemed that you saw this as an issue related to
> ANALYZE. If there is not early pruning for the table being
> analyzed, nothing is at all different. If there is early pruning
> the rows are not seen and there could be no detoasting. If there
> is a function that lies about IMMUTABLE and reads from a table, it
> either functions as before or throws a STO error on page access
> (long before any detoasting). Am I missing something?

I'm not sure, I might be missing something myself. Given the frequency of confusion of all senior hackers involved in this discussion...

I previously was thinking of this in the context of ANALYZE, but I now think it's a bigger problem (and might not really affect ANALYZE itself).

The simplest version of the scenario I'm concerned about is that a tuple in a table is *not* vacuumed, even though it's eligible to be removed due to STO. If that tuple has toasted columns, it could be that the toast table was independently vacuumed (autovac considers main/toast tables separately, or the horizon could change between the two heap scans, or pins could prevent vacuuming of one page, or ...). Toast accesses via tuptoaster.c currently don't perform TestForOldSnapshot_impl(), because they use SnapshotToastData, not SnapshotMVCC.

That seems to mean two things:

1) You might get 'missing chunk number ...' errors on access to toasted columns. Misleading error, but ok.

2) Because the tuple has been pruned from the toast table, it's possible that the toast oid/va_valueid is reused, because now there's no conflict with GetNewOidWithIndex() anymore. In that case toast_fetch_datum() might return a tuple from another column & type (all columns in a table share the same toast table), which could lead to almost arbitrary problems. That's not super likely to happen, but could have quite severe consequences once it starts.

It seems the easiest way to fix this would be to make TestForOldSnapshot() "accept" SnapshotToast as well.

Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alvaro Herrera
Date:
Kevin Grittner wrote:

> test=# -- connection 1
> analyze verbose t1;  -- when run after threshold, STO error occurs
> INFO:  analyzing "public.t1"
> INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and
> 1001 dead rows; 8999 rows in sample, 8999 estimated total rows
> ERROR:  snapshot too old
> CONTEXT:  SQL statement "SELECT (i * (select c1 from t2))"
> PL/pgSQL function mysq(integer) line 3 at RETURN
>
> Is there some other behavior which would be preferred?

The fact that the ERROR is being thrown seems okay to me. I was a bit surprised that the second INFO line is shown, but there's a simple explanation: we first acquire the sample rows (using acquire_sample_rows) and only after that's done we try to compute the stats from them.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jun 15, 2016 at 8:57 PM, Andres Freund <andres@anarazel.de> wrote:

> The simplest version of the scenario I'm concerned about is that a tuple
> in a table is *not* vacuumed, even though it's eligible to be removed
> due to STO. If that tuple has toasted columns, it could be that the
> toast table was independently vacuumed (autovac considers main/toast
> tables separately,

If that were really true, why would we not have the problem in current production versions that the toast table could be vacuumed before the heap, leading to exactly the issue you are talking about? It seems to me that a simple test shows that it is not the case that the heap is vacuumed without considering toast:

test=# create table tt (c1 text not null);
CREATE TABLE
test=# insert into tt select repeat(md5(n::text),100000)
         from (select generate_series(1,100)) x(n);
INSERT 0 100
test=# delete from tt;
DELETE 100
test=# vacuum verbose tt;
INFO:  vacuuming "public.tt"
INFO:  "tt": removed 100 row versions in 1 pages
INFO:  "tt": found 100 removable, 0 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16552"
INFO:  scanned index "pg_toast_16552_index" to remove 1900 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16552": removed 1900 row versions in 467 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_16552_index" now contains 0 row versions in 8 pages
DETAIL:  1900 index row versions were removed.
5 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16552": found 1900 removable, 0 nonremovable row versions in 467 out of 467 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

> or the horizon could change between the two heap scans,

Not a problem in current production why?

> or pins could prevent vacuuming of one page, or ...).

Not a problem in current production why?

So far I am not seeing any way for TOAST tuples to be pruned in advance of the referencing heap tuples, nor any way for the problem you describe to happen in the absence of that. If you see such, could you provide a more detailed description or a reproducible test case?

> Toast accesses via tuptoaster.c currently don't perform
> TestForOldSnapshot_impl(), because they use SnapshotToastData, not
> SnapshotMVCC.
>
> That seems to mean two things:
>
> 1) You might get 'missing chunk number ...' errors on access to toasted
> columns. Misleading error, but ok.
>
> 2) Because the tuple has been pruned from the toast table, it's possible
> that the toast oid/va_valueid is reused, because now there's no
> conflict with GetNewOidWithIndex() anymore. In that case
> toast_fetch_datum() might return a tuple from another column & type
> (all columns in a table share the same toast table), which could lead
> to almost arbitrary problems. That's not super likely to happen, but
> could have quite severe consequences once it starts.
>
> It seems the easiest way to fix this would be to make
> TestForOldSnapshot() "accept" SnapshotToast as well.

I don't think that would be appropriate without testing the characteristics of the underlying table to see whether it should be excluded. But is the TOAST data ever updated without an update to the referencing heap tuple? If not, I don't see any benefit. And we certainly don't want to add some new way to prune TOAST tuples which might still have referencing heap tuples; that could provide a route to *create* the problem you describe.

I am on vacation tomorrow (Friday the 17th) through Monday the 27th, so I will be unable to respond to further issues during that time. Hopefully I can get this particular issue sorted out before I go.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-16 09:50:09 -0500, Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 8:57 PM, Andres Freund <andres@anarazel.de> wrote:
>
> > The simplest version of the scenario I'm concerned about is that a tuple
> > in a table is *not* vacuumed, even though it's eligible to be removed
> > due to STO. If that tuple has toasted columns, it could be that the
> > toast table was independently vacuumed (autovac considers main/toast
> > tables separately,
>
> If that were really true, why would we not have the problem in
> current production versions that the toast table could be vacuumed
> before the heap, leading to exactly the issue you are talking
> about?

The issue isn't there without the feature, because we (should) never access a tuple/detoast a column when it's invisible enough for the corresponding toast tuple to be vacuumed away. But with old_snapshot_threshold that's obviously (intentionally) not the case anymore. Due to old_snapshot_threshold we'll prune tuples which, without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.

> It seems to me that a simple test shows that it is not the
> case that the heap is vacuumed without considering toast:

That's why I mentioned autovacuum:

    /*
     * Scan pg_class to determine which tables to vacuum.
     *
     * We do this in two passes: on the first one we collect the list of plain
     * relations and materialized views, and on the second one we collect
     * TOAST tables. The reason for doing the second pass is that during it we
     * want to use the main relation's pg_class.reloptions entry if the TOAST
     * table does not have any, and we cannot obtain it unless we know
     * beforehand what's the main table OID.
     *
     * We need to check TOAST tables separately because in cases with short,
     * wide tables there might be proportionally much more activity in the
     * TOAST table than in its parent.
     */
...
    tab->at_vacoptions = VACOPT_SKIPTOAST |
        (dovacuum ? VACOPT_VACUUM : 0) |
        (doanalyze ? VACOPT_ANALYZE : 0) |
        (!wraparound ? VACOPT_NOWAIT : 0);

(note the skiptoast)

...
    /*
     * Remember the relation's TOAST relation for later, if the caller asked
     * us to process it. In VACUUM FULL, though, the toast table is
     * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
     */
    if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
        toast_relid = onerel->rd_rel->reltoastrelid;
    else
        toast_relid = InvalidOid;
...
    if (toast_relid != InvalidOid)
        vacuum_rel(toast_relid, relation, options, params);

> > or the horizon could change between the two heap scans,
>
> Not a problem in current production why?

Because the horizon will never go to a value which allows "surely dead" tuples to be read, thus we never detoast columns from a tuple for which we'd removed toast data. That's why we're performing visibility tests (hopefully) everywhere, before accessing tuple contents (as opposed to inspecting the header).

> > or pins could prevent vacuuming of one page, or ...).
>
> Not a problem in current production why?

Same reason.

> So far I am not seeing any way for TOAST tuples to be pruned in
> advance of the referencing heap tuples, nor any way for the problem
> you describe to happen in the absence of that.

Didn't I just list three different ways, only one of which you doubted the veracity of? Saying "not a problem in current production" doesn't change it being a problem.

> > It seems the easiest way to fix this would be to make
> > TestForOldSnapshot() "accept" SnapshotToast as well.
>
> I don't think that would be appropriate without testing the
> characteristics of the underlying table to see whether it should be
> excluded.

You mean checking whether it's a toast table? We could check that, but since we never use a toast scan outside of toast, it doesn't seem necessary.

> But is the TOAST data ever updated without an update to
> the referencing heap tuple?

It shouldn't be.

> If not, I don't see any benefit.

Huh?

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Thu, Jun 16, 2016 at 11:37 AM, Andres Freund <andres@anarazel.de> wrote:
>> If that were really true, why would we not have the problem in
>> current production versions that the toast table could be vacuumed
>> before the heap, leading to exactly the issue you are talking
>> about?
>
> The issue isn't there without the feature, because we (should) never
> access a tuple/detoast a column when it's invisible enough for the
> corresponding toast tuple to be vacuumed away. But with
> old_snapshot_threshold that's obviously (intentionally) not the case
> anymore. Due to old_snapshot_threshold we'll prune tuples which,
> without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.

Is there really an assumption that the heap and the TOAST heap are only ever vacuumed with the same OldestXmin value? Because that seems like it would be massively flaky.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-16 12:02:39 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 11:37 AM, Andres Freund <andres@anarazel.de> wrote:
> >> If that were really true, why would we not have the problem in
> >> current production versions that the toast table could be vacuumed
> >> before the heap, leading to exactly the issue you are talking
> >> about?
> >
> > The issue isn't there without the feature, because we (should) never
> > access a tuple/detoast a column when it's invisible enough for the
> > corresponding toast tuple to be vacuumed away. But with
> > old_snapshot_threshold that's obviously (intentionally) not the case
> > anymore. Due to old_snapshot_threshold we'll prune tuples which,
> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
>
> Is there really an assumption that the heap and the TOAST heap are
> only ever vacuumed with the same OldestXmin value? Because that seems
> like it would be massively flaky.

There's not. They can be vacuumed days apart. But if we vacuum the toast table with an OldestXmin, and encounter a dead toast tuple, by the definition of OldestXmin (excluding STO), there cannot be a session reading the referencing tuple anymore - so that shouldn't matter.

IIRC we actually reverted a patch that caused significant problems around this. I think there's a small race condition around ProcessStandbyHSFeedbackMessage(), and you can restart with a different vacuum_defer_cleanup_age (we should just remove that), but other than that we shouldn't run into any issues without STO.
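The independent scheduling is easy to see in practice; a hedged illustration (the toast relation name here is hypothetical):

  -- autovacuum queues the main heap with VACOPT_SKIPTOAST and treats the
  -- toast relation as a separate job; manually, the toast heap can likewise
  -- be vacuumed on its own, at any time:
  vacuum verbose pg_toast.pg_toast_16437;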
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Thu, Jun 16, 2016 at 12:14 PM, Andres Freund <andres@anarazel.de> wrote:
>> > The issue isn't there without the feature, because we (should) never
>> > access a tuple/detoast a column when it's invisible enough for the
>> > corresponding toast tuple to be vacuumed away. But with
>> > old_snapshot_threshold that's obviously (intentionally) not the case
>> > anymore. Due to old_snapshot_threshold we'll prune tuples which,
>> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
>>
>> Is there really an assumption that the heap and the TOAST heap are
>> only ever vacuumed with the same OldestXmin value? Because that seems
>> like it would be massively flaky.
>
> There's not. They can be vacuumed days apart. But if we vacuum the toast
> table with an OldestXmin, and encounter a dead toast tuple, by the
> definition of OldestXmin (excluding STO), there cannot be a session
> reading the referencing tuple anymore - so that shouldn't matter.

I don't understand how STO changes that. I'm not saying it doesn't change it, but I don't understand why it would.

The root of my confusion is: if we prune a tuple, we'll bump the page LSN, so any session that is still referencing that tuple will error out as soon as it touches the page on which that tuple used to exist. It won't even survive long enough to care that the tuple isn't there any more.

Maybe it would help if you lay out the whole sequence of events, like:

S1: Does this.
S2: Does that.
S1: Now does something else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 12:14 PM, Andres Freund <andres@anarazel.de> wrote:
> >> > The issue isn't there without the feature, because we (should) never
> >> > access a tuple/detoast a column when it's invisible enough for the
> >> > corresponding toast tuple to be vacuumed away. But with
> >> > old_snapshot_threshold that's obviously (intentionally) not the case
> >> > anymore. Due to old_snapshot_threshold we'll prune tuples which,
> >> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
> >>
> >> Is there really an assumption that the heap and the TOAST heap are
> >> only ever vacuumed with the same OldestXmin value? Because that seems
> >> like it would be massively flaky.
> >
> > There's not. They can be vacuumed days apart. But if we vacuum the toast
> > table with an OldestXmin, and encounter a dead toast tuple, by the
> > definition of OldestXmin (excluding STO), there cannot be a session
> > reading the referencing tuple anymore - so that shouldn't matter.
>
> I don't understand how STO changes that. I'm not saying it doesn't
> change it, but I don't understand why it would.

Because we advance OldestXmin more aggressively, while allowing snapshots that are *older* than OldestXmin to access old tuples on pages which haven't been touched.

> The root of my confusion is: if we prune a tuple, we'll bump the page
> LSN, so any session that is still referencing that tuple will error
> out as soon as it touches the page on which that tuple used to exist.

Right. On the main table. But we don't perform that check on the toast table/pages. So if we prune toast tuples which are still referenced by the (unvacuumed) main relation, we can get into trouble.

> It won't even survive long enough to care that the tuple isn't there
> any more.
>
> Maybe it would help if you lay out the whole sequence of events, like:
>
> S1: Does this.
> S2: Does that.
> S1: Now does something else.

I presume it'd be something like:

Assuming a 'toasted' table, which contains one row, with a 1GB field.

S1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
S1: SELECT SUM(length(one_gb_record)) FROM toasted;
S2: DELETE FROM toasted;
AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
S1: SELECT SUM(length(one_gb_record)) FROM toasted;
<missing chunk error>
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Thu, Jun 16, 2016 at 12:54 PM, Andres Freund <andres@anarazel.de> wrote:
>> The root of my confusion is: if we prune a tuple, we'll bump the page
>> LSN, so any session that is still referencing that tuple will error
>> out as soon as it touches the page on which that tuple used to exist.
>
> Right. On the main table. But we don't perform that check on the toast
> table/pages. So if we prune toast tuples which are still referenced by
> the (unvacuumed) main relation, we can get into trouble.

OK, if it's true that we don't perform that check on the TOAST table, then I agree there's a potential problem there. I don't immediately know where in the code to look to check that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-16 13:44:34 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 12:54 PM, Andres Freund <andres@anarazel.de> wrote:
> >> The root of my confusion is: if we prune a tuple, we'll bump the page
> >> LSN, so any session that is still referencing that tuple will error
> >> out as soon as it touches the page on which that tuple used to exist.
> >
> > Right. On the main table. But we don't perform that check on the toast
> > table/pages. So if we prune toast tuples which are still referenced by
> > the (unvacuumed) main relation, we can get into trouble.
>
> OK, if it's true that we don't perform that check on the TOAST table,
> then I agree there's a potential problem there. I don't immediately
> know where in the code to look to check that.

static inline void
TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
{
    Assert(relation != NULL);

    if (old_snapshot_threshold >= 0
        && (snapshot) != NULL
        && (snapshot)->satisfies == HeapTupleSatisfiesMVCC
        && !XLogRecPtrIsInvalid((snapshot)->lsn)
        && PageGetLSN(page) > (snapshot)->lsn)
        TestForOldSnapshot_impl(snapshot, relation);
}

The relevant part is the HeapTupleSatisfiesMVCC check; we're using SatisfiesToast for toast lookups.

FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 - but ran into the problem that I couldn't get it to vacuum anything away (neither main nor toast rel). That appears to be because of

    if (old_snapshot_threshold == 0)
    {
        if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
            && TransactionIdFollows(latest_xmin, xlimit))
            xlimit = latest_xmin;

because latest_xmin always is equal to MyPgXact->xmin, which is actually kinda unsurprising?

Regards,

Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Thu, Jun 16, 2016 at 1:01 PM, Andres Freund <andres@anarazel.de> wrote:

> The relevant part is the HeapTupleSatisfiesMVCC check; we're using
> SatisfiesToast for toast lookups.
>
> FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 -
> but ran into the problem that I couldn't get it to vacuum anything away
> (neither main nor toast rel). That appears to be because of
>
>     if (old_snapshot_threshold == 0)
>     {
>         if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
>             && TransactionIdFollows(latest_xmin, xlimit))
>             xlimit = latest_xmin;
>
> because latest_xmin always is equal to MyPgXact->xmin, which is actually
> kinda unsurprising?

Sure -- the STO feature *never* advances the point for early pruning past the earliest still-active transaction ID. If it did we would have all sorts of weird problems.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
>
>> The root of my confusion is: if we prune a tuple, we'll bump the page
>> LSN, so any session that is still referencing that tuple will error
>> out as soon as it touches the page on which that tuple used to exist.
>
> Right. On the main table. But we don't perform that check on the toast
> table/pages. So if we prune toast tuples which are still referenced by
> the (unvacuumed) main relation, we can get into trouble.

I thought that we should never be using visibility information from the toast table; that the visibility information in the heap should control. If that's the case, how would we prune toast rows without pruning the heap? You pointed out that the *reverse* case has an option bit -- if that is ever set there could be toasted values which would not have a row. Do they still have a line pointer in the heap, like "dead" index entries? How are they cleaned up in current production versions? (Note the question mark -- I'm not big on using that with assertions and rarely fall back on rhetorical questions.)

>> It won't even survive long enough to care that the tuple isn't there
>> any more.
>>
>> Maybe it would help if you lay out the whole sequence of events, like:
>>
>> S1: Does this.
>> S2: Does that.
>> S1: Now does something else.
>
> I presume it'd be something like:
>
> Assuming a 'toasted' table, which contains one row, with a 1GB field.
>
> S1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> S2: DELETE FROM toasted;
> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> <missing chunk error>

I'll put together a test like that and post in a bit.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-16 13:16:35 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 1:01 PM, Andres Freund <andres@anarazel.de> wrote:
>
> > The relevant part is the HeapTupleSatisfiesMVCC check; we're using
> > SatisfiesToast for toast lookups.
> >
> > FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 -
> > but ran into the problem that I couldn't get it to vacuum anything away
> > (neither main nor toast rel). That appears to be because of
> >
> >     if (old_snapshot_threshold == 0)
> >     {
> >         if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
> >             && TransactionIdFollows(latest_xmin, xlimit))
> >             xlimit = latest_xmin;
> >
> > because latest_xmin always is equal to MyPgXact->xmin, which is actually
> > kinda unsurprising?
>
> Sure -- the STO feature *never* advances the point for early
> pruning past the earliest still-active transaction ID. If it did
> we would have all sorts of weird problems.

Both latest_xmin and MyPgXact->xmin are equivalent to txid_current() here. Note that a threshold of 1 actually vacuums in this case (after waiting, obviously), but 0 never does. Afaics that's because, before TransactionIdLimitedForOldSnapshots() is reached, MaintainOldSnapshotTimeMapping() will have updated latest_xmin to the current value.

With old_snapshot_threshold=1 I indeed can reproduce the issue. I disabled autovacuum to make the scheduling more predictable, but it should "work" just as well with autovacuum.

S1: CREATE TABLE toasted(largecol text);
    INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM generate_series(1, 10000000);
    BEGIN;
    DELETE FROM toasted;
S2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ...
S1: COMMIT;
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ...
S1: /* wait for snapshot threshold to be passed */
S1: VACUUM pg_toast.pg_toast_16437;
> INFO:  00000: "pg_toast_16437": found 61942 removable, 0 nonremovable row versions in 15486 out of 15486 pages
> DETAIL:  0 dead row versions cannot be removed yet.
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
ERROR:  XX000: missing chunk number 0 for toast value 16455 in pg_toast_16437
LOCATION:  toast_fetch_datum, tuptoaster.c:1945

Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
Hi,

On 2016-06-16 13:19:13 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
>
> >> The root of my confusion is: if we prune a tuple, we'll bump the page
> >> LSN, so any session that is still referencing that tuple will error
> >> out as soon as it touches the page on which that tuple used to exist.
> >
> > Right. On the main table. But we don't perform that check on the toast
> > table/pages. So if we prune toast tuples which are still referenced by
> > the (unvacuumed) main relation, we can get into trouble.
>
> I thought that we should never be using visibility information from
> the toast table; that the visibility information in the heap should
> control.

We use visibility information for vacuuming: toast vacuum puts toast chunk tuples through HeapTupleSatisfiesVacuum(), just like normal tuples. Otherwise we'd have to collect dead toast tuples during the normal vacuum, and then do explicit vacuums for those. That'd end up being pretty expensive.

> If that's the case, how would we prune toast rows without
> pruning the heap?

I'm not sure what you mean? We prune toast tuples by checking xmin/xmax, and then comparing with OldestXmin. Without STO that's safe, because we know nobody could look up those toast tuples.

> You pointed out that the *reverse* case has an
> option bit -- if that is ever set there could be toasted values
> which would not have a row.

We vacuum toast tables without the main table, by simply calling vacuum() on the toast relation. So you can get the case that only the normal relation is vacuumed *or* that only the toast relation is vacuumed.

> Do they still have a line pointer in the heap, like "dead" index
> entries?

You can have non-pruned toast tuples, where any evidence of the referencing main-heap tuple is gone.

> How are they cleaned up in current production versions?

There's simply no interlock other than OldestXmin preventing referenced toast tuples from being vacuumed while any live snapshot can still read them.

Greetings,

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Thu, Jun 16, 2016 at 1:19 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
>>> Maybe it would help if you lay out the whole sequence of events, like:
>>>
>>> S1: Does this.
>>> S2: Does that.
>>> S1: Now does something else.
>>
>> I presume it'd be something like:
>>
>> Assuming a 'toasted' table, which contains one row, with a 1GB field.
>>
>> S1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
>> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
>> S2: DELETE FROM toasted;
>> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
>> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
>> <missing chunk error>
>
> I'll put together a test like that and post in a bit.

old_snapshot_threshold = '1min'
autovacuum_vacuum_threshold = 0
autovacuum_vacuum_scale_factor = 0.0000000001

test=# CREATE TABLE gb (rec bytea not null);
CREATE TABLE
test=# ALTER TABLE gb ALTER COLUMN rec SET STORAGE external;
ALTER TABLE
test=# INSERT INTO gb SELECT t FROM (SELECT repeat('x', 1000000000)::bytea) x(t);
INSERT 0 1
test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
test=# SELECT SUM(length(rec)) FROM gb;
    sum
------------
 1000000000
(1 row)

[wait for autovacuum to run]

test=# SELECT SUM(length(rec)) FROM gb;
ERROR:  snapshot too old

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Thu, Jun 16, 2016 at 1:32 PM, Andres Freund <andres@anarazel.de> wrote:

> With old_snapshot_threshold=1 I indeed can reproduce the issue. I
> disabled autovacuum to make the scheduling more predictable, but it
> should "work" just as well with autovacuum.
>
> S1: CREATE TABLE toasted(largecol text);
>     INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM generate_series(1, 10000000);
>     BEGIN;
>     DELETE FROM toasted;
> S2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
>> ...
> S1: COMMIT;
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
>> ...
> S1: /* wait for snapshot threshold to be passed */
> S1: VACUUM pg_toast.pg_toast_16437;
>> INFO:  00000: "pg_toast_16437": found 61942 removable, 0 nonremovable row versions in 15486 out of 15486 pages
>> DETAIL:  0 dead row versions cannot be removed yet.
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ERROR:  XX000: missing chunk number 0 for toast value 16455 in pg_toast_16437
> LOCATION:  toast_fetch_datum, tuptoaster.c:1945

Thanks! That's something I should be able to work with. Unfortunately, I am going to be on vacation, so I won't have any results until sometime after 28 June.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-16 13:53:01 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 1:19 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> > On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund <andres@anarazel.de> wrote:
> >> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
> >>> Maybe it would help if you lay out the whole sequence of events, like:
> >>>
> >>> S1: Does this.
> >>> S2: Does that.
> >>> S1: Now does something else.
> >>
> >> I presume it'd be something like:
> >>
> >> Assuming a 'toasted' table, which contains one row, with a 1GB field.
> >>
> >> S1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> >> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> >> S2: DELETE FROM toasted;
> >> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
> >> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> >> <missing chunk error>
> >
> > I'll put together a test like that and post in a bit.
>
> old_snapshot_threshold = '1min'
> autovacuum_vacuum_threshold = 0
> autovacuum_vacuum_scale_factor = 0.0000000001
>
> test=# CREATE TABLE gb (rec bytea not null);
> CREATE TABLE
> test=# ALTER TABLE gb ALTER COLUMN rec SET STORAGE external;
> ALTER TABLE
> test=# INSERT INTO gb SELECT t FROM (SELECT repeat('x', 1000000000)::bytea) x(t);
> INSERT 0 1
> test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> BEGIN
> test=# SELECT SUM(length(rec)) FROM gb;
>     sum
> ------------
>  1000000000
> (1 row)
>
> [wait for autovacuum to run]
>
> test=# SELECT SUM(length(rec)) FROM gb;
> ERROR:  snapshot too old

See https://www.postgresql.org/message-id/20160616183207.wygoktoplycdzav7@alap3.anar for a recipe that reproduces the issue. I presume your example also vacuums the main table due to the threshold and scale factor you set (which will pretty much always vacuum a table, no?).

Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-16 11:57:48 -0700, Andres Freund wrote: > See https://www.postgresql.org/message-id/20160616183207.wygoktoplycdzav7@alap3.anar For posterity's sake, that was supposed to be https://www.postgresql.org/message-id/20160616183207.wygoktoplycdzav7@alap3.anarazel.de
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Noah Misch
Date:
On Thu, Jun 16, 2016 at 01:56:44PM -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 1:32 PM, Andres Freund <andres@anarazel.de> wrote:
>
> > With old_snapshot_threshold=1 I indeed can reproduce the issue. I
> > disabled autovacuum to make the scheduling more predictable, but it
> > should "work" just as well with autovacuum.
> >
> > S1: CREATE TABLE toasted(largecol text);
> >     INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM generate_series(1, 10000000);
> >     BEGIN;
> >     DELETE FROM toasted;
> > S2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> > S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> >> ...
> > S1: COMMIT;
> > S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> >> ...
> > S1: /* wait for snapshot threshold to be passed */
> > S1: VACUUM pg_toast.pg_toast_16437;
> >> INFO:  00000: "pg_toast_16437": found 61942 removable, 0 nonremovable row versions in 15486 out of 15486 pages
> >> DETAIL:  0 dead row versions cannot be removed yet.
> > S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> > ERROR:  XX000: missing chunk number 0 for toast value 16455 in pg_toast_16437
> > LOCATION:  toast_fetch_datum, tuptoaster.c:1945
>
> Thanks! That's something I should be able to work with.
> Unfortunately, I am going to be on vacation, so I won't have any
> results until sometime after 28 June.

This PostgreSQL 9.6 open item is past due for your status update. Kindly send a status update within 24 hours, and include a date for your subsequent status update. Refer to the policy on open item ownership:
http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-06-30 23:51:18 -0400, Noah Misch wrote: > On Thu, Jun 16, 2016 at 01:56:44PM -0500, Kevin Grittner wrote: > > On Thu, Jun 16, 2016 at 1:32 PM, Andres Freund <andres@anarazel.de> wrote: > > > > > With old_snapshot_threshold=1 I indeed can reproduce the issue. I > > > disabled autovacuum, to make the scheduling more predictable. But it > > > should "work" just as well with autovacuum. > > > > > > S1: CREATE TABLE toasted(largecol text); > > > INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM generate_series(1, 10000000); > > > BEGIN; > > > DELETE FROM toasted; > > > S2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; > > > S2: SELECT hashtext(largecol), length(largecol) FROM toasted; > > >> ... > > > S1: COMMIT; > > > S2: SELECT hashtext(largecol), length(largecol) FROM toasted; > > >> ... > > > S1: /* wait for snapshot threshold to be passed */ > > > S1: VACUUM pg_toast.pg_toast_16437; > > >> INFO: 00000: "pg_toast_16437": found 61942 removable, 0 nonremovable row versions in 15486 out of 15486 pages > > >> DETAIL: 0 dead row versions cannot be removed yet. > > > S2: SELECT hashtext(largecol), length(largecol) FROM toasted; > > > ERROR: XX000: missing chunk number 0 for toast value 16455 in pg_toast_16437 > > > LOCATION: toast_fetch_datum, tuptoaster.c:1945 > > > > Thanks! That's something I should be able to work with. > > Unfortunately, I am going to be on vacation, so I won't have any > > results until sometime after 28 June. > > This PostgreSQL 9.6 open item is past due for your status update. Kindly send > a status update within 24 hours, and include a date for your subsequent status > update. Refer to the policy on open item ownership: > http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com IIRC Kevin is out of the office this week, so this'll have to wait till next week. Andres
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Fri, Jul 1, 2016 at 2:48 AM, Andres Freund <andres@anarazel.de> wrote: >> This PostgreSQL 9.6 open item is past due for your status update. Kindly send >> a status update within 24 hours, and include a date for your subsequent status >> update. Refer to the policy on open item ownership: >> http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com > > IIRC Kevin is out of the office this week, so this'll have to wait till > next week. No, he's been back since Tuesday - it was last week that he was out. I spoke with him yesterday about this and he indicated that he had been thinking about it and had several ideas about how to fix it. I'm not sure why he hasn't posted here yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, Jul 1, 2016 at 7:17 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jul 1, 2016 at 2:48 AM, Andres Freund <andres@anarazel.de> wrote: >>> This PostgreSQL 9.6 open item is past due for your status update. Kindly send >>> a status update within 24 hours, and include a date for your subsequent status >>> update. Refer to the policy on open item ownership: >>> http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com >> >> IIRC Kevin is out of the office this week, so this'll have to wait till >> next week. > > No, he's back since Tuesday - it was last week that he was out. I > spoke with him yesterday about this and he indicated that he had been > thinking about it and had several ideas about how to fix it. I'm not > sure why he hasn't posted here yet. I have been looking at several possible fixes, and weighing the pros and cons of each. I expect to post a patch later today. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Noah Misch
Date:
On Fri, Jul 01, 2016 at 09:00:45AM -0500, Kevin Grittner wrote: > On Fri, Jul 1, 2016 at 7:17 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Jul 1, 2016 at 2:48 AM, Andres Freund <andres@anarazel.de> wrote: > >>> This PostgreSQL 9.6 open item is past due for your status update. Kindly send > >>> a status update within 24 hours, and include a date for your subsequent status > >>> update. Refer to the policy on open item ownership: > >>> http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com > >> > >> IIRC Kevin is out of the office this week, so this'll have to wait till > >> next week. > > > > No, he's back since Tuesday - it was last week that he was out. I > > spoke with him yesterday about this and he indicated that he had been > > thinking about it and had several ideas about how to fix it. I'm not > > sure why he hasn't posted here yet. > > I have been looking at several possible fixes, and weighing the > pros and cons of each. I expect to post a patch later today. This PostgreSQL 9.6 open item is past due for your status update. Kindly send a status update within 24 hours, and include a date for your subsequent status update. Refer to the policy on open item ownership: http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Sat, Jul 2, 2016 at 1:29 PM, Noah Misch <noah@leadboat.com> wrote: > On Fri, Jul 01, 2016 at 09:00:45AM -0500, Kevin Grittner wrote: >> I have been looking at several possible fixes, and weighing the >> pros and cons of each. I expect to post a patch later today. > > This PostgreSQL 9.6 open item is past due for your status update. Kindly send > a status update within 24 hours, and include a date for your subsequent status > update. Refer to the policy on open item ownership: > http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com Attached is a patch which fixes this issue, which I will push Monday unless there are objections. The problem to be fixed is this: TOAST values can be pruned or vacuumed away while the heap still has references to them, but the visibility data is such that you should not be able to see the referencing heap tuple once the TOAST value is old enough to clean up. When the old_snapshot_threshold is set to allow early pruning, early cleanup of the TOAST values could occur while a connection can still see the heap row, and the "snapshot too old" error might not be triggered. (In practice, it's fairly hard to hit that, but a test case will be included in a bit.) It would even be possible to have an overlapping transaction which is old enough to create a new value with the old OID after it is removed, which might even be of a different data type. The gymnastics required to hit that are too daunting to have created a test case, but it seems possible. The possible fixes considered were these: (1) Always vacuum the heap before its related TOAST table. (2) Same as (1) but only when old_snapshot_threshold >= 0. (3) Allow the special snapshot used for TOAST access to generate the "snapshot too old" error, so that the modified page from the pruning/vacuuming (along with other conditions) would cause that rather than something suggesting corrupt internal structure. (4) When looking to read a toasted value for a tuple past the early pruning horizon, if the value was not found consider it a "snapshot too old" error. (5) Don't vacuum or prune a TOAST table except as part of the heap vacuum when early pruning is enabled. (6) Don't allow early vacuuming/pruning of TOAST values except as part of the vacuuming of the related heap. It became evident pretty quickly that the HOT pruning of TOAST values should not do early cleanup, based on practical concerns of coordinating that with the heap cleanup for any of the above options. What's more, since we don't allow updating of tuples holding TOAST values, HOT pruning seems to be of dubious value on a TOAST table in general -- but removing that would be the matter for a separate patch. Anyway, this patch includes a small hunk of code (two lines) to avoid early HOT pruning for TOAST tables. For the vacuuming, option (6) seems a clear winner, and that is what this patch implements. A TOAST table can still be vacuumed on its own, but in that case it will not use old_snapshot_threshold to try to do any early cleanup. We were already normally vacuuming the TOAST table whenever we vacuumed the related heap; in such a case it uses the "oldestXmin" used for the heap to vacuum the TOAST table. The other options either could not limit errors to cases when they were really needed or had to pass through way too much information through many layers to know what actions to take when. 
Option (6) basically conditions the attempt to use a more aggressive cleanup threshold on two things: whether the relation is a TOAST relation, and a flag indicating that we reached the vacuum function through the recursive call the heap's vacuum makes to cover its TOAST table (a rough sketch follows the test case below). Not the most elegant code, but fairly straightforward. The net result is that, like existing production versions, we can have heap rows pointing to missing TOAST values, but only when those heap rows are not visible to anyone. Test case (adapted from one provided by Andres Freund): -- START WITH: -- autovacuum = off -- old_snapshot_threshold = 1 -- connection 1 SHOW autovacuum; SHOW old_snapshot_threshold; DROP TABLE IF EXISTS toasted; CREATE TABLE toasted(largecol text); INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM generate_series(1, 10000000); BEGIN; DELETE FROM toasted; -- connection 2 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT hashtext(largecol), length(largecol) FROM toasted; -- connection 1 COMMIT; -- connection 2 SELECT hashtext(largecol), length(largecol) FROM toasted; -- connection 1 SELECT hashtext(largecol), length(largecol) FROM toasted; -- connection 1 /* wait for snapshot threshold to be passed */ SELECT oid FROM pg_class WHERE relname = 'toasted'; VACUUM VERBOSE pg_toast.pg_toast_?; SELECT hashtext(largecol), length(largecol) FROM toasted; -- connection 2 SELECT hashtext(largecol), length(largecol) FROM toasted; -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
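To make the shape of that conditioning concrete, here is a minimal sketch, not the patch itself; the helper name and the recursing_from_heap flag are assumptions, while TransactionIdLimitedForOldSnapshots() is the existing snapmgr call that advances the cleanup horizon when old_snapshot_threshold is enabled:

/*
 * Sketch only: skip the old_snapshot_threshold-based horizon adjustment
 * when vacuuming a TOAST table on its own; allow it when we got here via
 * the recursive call from the owning heap's vacuum (which passes down
 * the heap's horizon).
 */
static TransactionId
vacuum_cleanup_horizon(Relation rel, TransactionId oldestXmin,
                       bool recursing_from_heap)
{
    if (rel->rd_rel->relkind == RELKIND_TOASTVALUE && !recursing_from_heap)
        return oldestXmin;      /* standalone TOAST vacuum: no early cleanup */

    /* heap, or TOAST reached from its heap: early cleanup stays coordinated */
    return TransactionIdLimitedForOldSnapshots(oldestXmin, rel);
}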
Attachment
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Sat, Jul 2, 2016 at 3:20 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Sat, Jul 2, 2016 at 1:29 PM, Noah Misch <noah@leadboat.com> wrote: >> On Fri, Jul 01, 2016 at 09:00:45AM -0500, Kevin Grittner wrote: > >>> I have been looking at several possible fixes, and weighing the >>> pros and cons of each. I expect to post a patch later today. >> >> This PostgreSQL 9.6 open item is past due for your status update. Kindly send >> a status update within 24 hours, and include a date for your subsequent status >> update. Refer to the policy on open item ownership: >> http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com > > Attached is a patch which fixes this issue, which I will push > Monday unless there are objections. Considering that (1) this was posted on a weekend and (2) that Monday is also a US holiday and (3) that we are not about to wrap a release, I think you should postpone the proposed commit date by a few days to allow time for review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Sat, Jul 2, 2016 at 5:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Jul 2, 2016 at 3:20 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> Attached is a patch which fixes this issue, which I will push >> Monday unless there are objections. > > Considering that (1) this was posted on a weekend and (2) that Monday > is also a US holiday and (3) that we are not about to wrap a release, > I think you should postpone the proposed commit date by a few days to > allow time for review. OK, will push Thursday, the 7th of July unless there are objections. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-02 14:20:13 -0500, Kevin Grittner wrote: > The possible fixes considered were these: > > (1) Always vacuum the heap before its related TOAST table. I think that's clearly not ok from a cost perspective. > (2) Same as (1) but only when old_snapshot_threshold >= 0. > (3) Allow the special snapshot used for TOAST access to generate > the "snapshot too old" error, so that the modified page from the > pruning/vacuuming (along with other conditions) would cause that > rather than something suggesting corrupt internal structure. > (4) When looking to read a toasted value for a tuple past the > early pruning horizon, if the value was not found consider it a > "snapshot too old" error. Doesn't solve the issue that a toast id might end up being reused. > (5) Don't vacuum or prune a TOAST table except as part of the heap > vacuum when early pruning is enabled. That's pretty costly. > (6) Don't allow early vacuuming/pruning of TOAST values except as > part of the vacuuming of the related heap. > It became evident pretty quickly that the HOT pruning of TOAST > values should not do early cleanup, based on practical concerns of > coordinating that with the heap cleanup for any of the above > options. What's more, since we don't allow updating of tuples > holding TOAST values, HOT pruning seems to be of dubious value on a > TOAST table in general -- but removing that would be the matter for > a separate patch. I'm not following here. Sure, there'll be no HOT chains, but hot pruning also releases space (though not item pointers) for dead tuples. And that's fairly valuable in high-churn tables? > Anyway, this patch includes a small hunk of code > (two lines) to avoid early HOT pruning for TOAST tables. I see it's only prohibiting the old_snapshot_threshold triggered cleanup, good. > For the vacuuming, option (6) seems a clear winner, and that is > what this patch implements. A TOAST table can still be vacuumed on > its own, but in that case it will not use old_snapshot_threshold to > try to do any early cleanup. > We were already normally vacuuming > the TOAST table whenever we vacuumed the related heap; in such a > case it uses the "oldestXmin" used for the heap to vacuum the TOAST > table. That's not the case. Autovacuum schedules main and toast tables independently. Check the two collection loops in do_autovacuum:

/*
 * On the first pass, we collect main tables to vacuum, and also the main
 * table relid to TOAST relid mapping.
 */
while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
{
    ...
    relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
                              effective_multixact_freeze_max_age,
                              &dovacuum, &doanalyze, &wraparound);
    ...
    /* relations that need work are added to table_oids */
    if (dovacuum || doanalyze)
        table_oids = lappend_oid(table_oids, relid);
}
...
/* second pass: check TOAST tables */
while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
{
    ...
    relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
                              effective_multixact_freeze_max_age,
                              &dovacuum, &doanalyze, &wraparound);

    /* ignore analyze for toast tables */
    if (dovacuum)
        table_oids = lappend_oid(table_oids, relid);
}

So I don't think that approach still allows old snapshot related cleanups for toast triggered vacuums? Is that an acceptable restriction? Greetings, Andres Freund
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund <andres@anarazel.de> wrote: > So I don't think that approach still allows old snapshot related > cleanups for toast triggered vacuums? Is that an acceptable > restriction? What I would rather see is that if the heap is vacuumed (whether or not by autovacuum) then the related TOAST table is also vacuumed (using the same horizon the heap used), but if the TOAST relation is chosen for vacuum by itself that it does not attempt to adjust the horizon based on old_snapshot_threshold. I am looking to see how to make that happen; expect a new patch Monday. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote: > On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund <andres@anarazel.de> wrote: > > > So I don't think that approach still allows old snapshot related > > cleanups for toast triggered vacuums? Is that an acceptable > > restriction? > > What I would rather see is that if the heap is vacuumed (whether or > not by autovacuum) then the related TOAST table is also vacuumed > (using the same horizon the heap used), but if the TOAST relation > is chosen for vacuum by itself that it does not attempt to adjust > the horizon based on old_snapshot_threshold. Uh, wouldn't that quite massively regress the autovacuum workload in some cases? There's a reason they're considered separately after all. And in many cases, even if there's lots of updates in the heap table, the toast table doesn't get any updates. And the toast table is often a lot larger than the data. Regards, Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, Jul 8, 2016 at 12:53 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote: >> On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund <andres@anarazel.de> wrote: >> >> > So I don't think that approach still allows old snapshot related >> > cleanups for toast triggered vacuums? Is that an acceptable >> > restriction? >> >> What I would rather see is that if the heap is vacuumed (whether or >> not by autovacuum) then the related TOAST table is also vacuumed >> (using the same horizon the heap used), but if the TOAST relation >> is chosen for vacuum by itself that it does not attempt to adjust >> the horizon based on old_snapshot_threshold. > > Uh, wouldn't that quite massively regress the autovacuum workload in > some cases? There's a reason they're considered separately after > all. And in many cases, even if there's lots of updates in the heap > table, the toast table doesn't get any updates. And the toast table is > often a lot larger than the data. Of course, the toast table has only one index, and it is narrow. With the visibility map, it should visit only the needed pages in the toast's heap area, so any regression would be in the case that: (1) old_snapshot_threshold >= 0 (2) the "normal" heap met the conditions for vacuum, but the TOAST table didn't (3) when passing the toast heap based on visibility map, *some* cleanup was done (otherwise the TID list would be empty, so no index pass is needed) Any extra work would be at least partially offset by pushing back the point where the next vacuum of toast data would be needed and by removing index entries and keeping both the toast data and index smaller. I'm sure you could find cases where there was a net performance loss, but I'm also sure that by containing toast size when it would otherwise grow for weeks or months, it could be a very large performance gain. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-08 13:32:35 -0500, Kevin Grittner wrote: > On Fri, Jul 8, 2016 at 12:53 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote: > >> On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund <andres@anarazel.de> wrote: > >> > >> > So I don't think that approach still allows old snapshot related > >> > cleanups for toast triggered vacuums? Is that an acceptable > >> > restriction? > >> > >> What I would rather see is that if the heap is vacuumed (whether or > >> not by autovacuum) then the related TOAST table is also vacuumed > >> (using the same horizon the heap used), but if the TOAST relation > >> is chosen for vacuum by itself that it does not attempt to adjust > >> the horizon based on old_snapshot_threshold. > > > > Uh, wouldn't that quite massively regress the autovacuum workload in > > some cases? There's a reason they're considered separately after > > all. And in many cases, even if there's lots of updates in the heap > > table, the toast table doesn't get any updates. And the toast table is > > often a lot larger than the data. > > Of course, the toast table has only one index, and it is narrow. But that index and the table are often large... > With the visibility map, it should visit only the needed pages in > the toast's heap area, so any regression would be in the case that: > > (1) old_snapshot_threshold >= 0 > (2) the "normal" heap met the conditions for vacuum, but the TOAST > table didn't > (3) when passing the toast heap based on visibility map, *some* > cleanup was done (otherwise the TID list would be empty, so no > index pass is needed) Unfortunately btree performs an index scan, even if there's no tids to clean up. See the unconditional calls to lazy_cleanup_index()->amvacuumcleanup(). C.f.

/*
 * If btbulkdelete was called, we need not do anything, just return the
 * stats from the latest btbulkdelete call.  If it wasn't called, we must
 * still do a pass over the index, to recycle any newly-recyclable pages
 * and to obtain index statistics.
 *
 * Since we aren't going to actually delete any leaf items, there's no
 * need to go through all the vacuum-cycle-ID pushups.
 */

> but I'm also sure that by containing toast size > when it would otherwise grow for weeks or months, it could be a > very large performance gain. That's an argument for changing autovacuum heuristics, not for making this change as a side-effect of a bugfix. I'm a bit confused, why aren't we simply adding LSN interlock checks for toast? Doesn't look that hard? Seems like a much more natural course of fixing this issue? Regards, Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: > I'm a bit confused, why aren't we simply adding LSN interlock > checks for toast? Doesn't look that hard? Seems like a much more > natural course of fixing this issue? I took some time trying to see what you have in mind, and I'm really not "getting it". I definitely applaud you for spotting the problem, but this suggestion for solving it doesn't seem to be useful. Basically, after turning this suggestion every way I could, I see two alternative ways to implement it. (1) Whenever TestForOldSnapshot() checks a heap page, check whether the related toast is OK for all visible tuples on that page. It would be enough to check one toast tuple for one value per heap tuple, but still -- this would be really nasty from a performance perspective. (2) To deal with the fact that only about 7% of the BufferGetPage() calls need to make this check, all functions and macros which read toast data from the table would need extra parameters, and all call sites for the toast API would need to have such context information passed to them, so they could specify this correctly. Ugh. Compared to those options, the approach I was taking, where the fix is "automatic" but some workloads where old_snapshot_threshold is on would sequentially read some toast indexes more often seems pretty tame. Do you see some other option that fits what you describe? I'll give you a couple days to respond before coding the patch. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Amit Kapila
Date:
On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: > >> I'm a bit confused, why aren't we simply adding LSN interlock >> checks for toast? Doesn't look that hard? Seems like a much more >> natural course of fixing this issue? > > I took some time trying to see what you have in mind, and I'm > really not "getting it". > Isn't it possible if we initialize lsn and whenTaken in SnapshotToast when old_snapshot_threshold > 0 and add a check for HeapTupleSatisfiesToast in TestForOldSnapshot()? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: >> >>> I'm a bit confused, why aren't we simply adding LSN interlock >>> checks for toast? Doesn't look that hard? Seems like a much more >>> natural course of fixing this issue? >> >> I took some time trying to see what you have in mind, and I'm >> really not "getting it". > > Isn't it possible if we initialize lsn and whenTaken in SnapshotToast > when old_snapshot_threshold > 0 and add a check for > HeapTupleSatisfiesToast in TestForOldSnapshot()? With that approach, how will we know *not* to generate an error when reading the chain of tuples for a value we are deleting? Or positioning to modify an index on toast data. Etc., etc. etc. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-12 10:04:45 -0500, Kevin Grittner wrote: > On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: > > > I'm a bit confused, why aren't we simply adding LSN interlock > > checks for toast? Doesn't look that hard? Seems like a much more > > natural course of fixing this issue? > > I took some time trying to see what you have in mind, and I'm > really not "getting it". I definitely applaud you for spotting the > problem, but this suggestion for solving it doesn't seem to be > useful. ... > Basically, after turning this suggestion every way I could, I see > two alternative ways to implement it. What I was actually getting at was to perform TestForOldSnapshot() in the HeapTupleSatisfiesToast case as well. That'd require minor amounts of work to keep the lsn up2date, but otherwise should be fairly easy to implement. It seems much more logical to use the same mechanism we use for heap for toast as well, rather than implementing something separate. - Andres
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote: > On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > >> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: > >> > >>> I'm a bit confused, why aren't we simply adding LSN interlock > >>> checks for toast? Doesn't look that hard? Seems like a much more > >>> natural course of fixing this issue? > >> > >> I took some time trying to see what you have in mind, and I'm > >> really not "getting it". > > > > Isn't it possible if we initialize lsn and whenTaken in SnapshotToast > > when old_snapshot_threshold > 0 and add a check for > > HeapTupleSatisfiesToast in TestForOldSnapshot()? > > With that approach, how will we know *not* to generate an error > when reading the chain of tuples for a value we are deleting. Or > positioning to modify an index on toast data. Etc., etc. etc. I'm not following. How is that different in the toast case than in the heap case?
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Wed, Jul 13, 2016 at 12:48 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote: >> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >>>> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: >>>> >>>>> I'm a bit confused, why aren't we simply adding LSN interlock >>>>> checks for toast? Doesn't look that hard? Seems like a much more >>>>> natural course of fixing this issue? >>>> >>>> I took some time trying to see what you have in mind, and I'm >>>> really not "getting it". >>> >>> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast >>> when old_snapshot_threshold > 0 and add a check for >>> HeapTupleSatisfiesToast in TestForOldSnapshot()? >> >> With that approach, how will we know *not* to generate an error >> when reading the chain of tuples for a value we are deleting. Or >> positioning to modify an index on toast data. Etc., etc. etc. > > I'm not following. How is that different in the toast case than in the > heap case? A short answer is that a normal table's heap doesn't go through systable_getnext_ordered(). That function is used both for cases where the check should not be made, like toast_delete_datum(), and cases where it should, like toast_fetch_datum(). Since this keeps coming up, I'll produce a patch this way. I'm skeptical, but maybe it will look better than I think it will. I should be able to post that by Friday. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-13 15:57:02 -0500, Kevin Grittner wrote: > A short answer is that a normal table's heap doesn't go through > systable_getnext_ordered(). That function is used both for cases > where the check should not be made, like toast_delete_datum(), and > cases where it should, like toast_fetch_datum(). It *has* to be made for toast_delete_datum(). Otherwise we could end up deleting a since-reused toast id. Or am I missing something? Andres
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Noah Misch
Date:
On Wed, Jul 13, 2016 at 03:57:02PM -0500, Kevin Grittner wrote: > On Wed, Jul 13, 2016 at 12:48 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote: > >> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > >>> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > >>>> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: > >>>> > >>>>> I'm a bit confused, why aren't we simply adding LSN interlock > >>>>> checks for toast? Doesn't look that hard? Seems like a much more > >>>>> natural course of fixing this issue? > >>>> > >>>> I took some time trying to see what you have in mind, and I'm > >>>> really not "getting it". > >>> > >>> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast > >>> when old_snapshot_threshold > 0 and add a check for > >>> HeapTupleSatisfiesToast in TestForOldSnapshot()? > >> > >> With that approach, how will we know *not* to generate an error > >> when reading the chain of tuples for a value we are deleting. Or > >> positioning to modify an index on toast data. Etc., etc. etc. > > > > I'm not following. How is that different in the toast case than in the > > heap case? > > A short answer is that a normal table's heap doesn't go through > systable_getnext_ordered(). That function is used both for cases > where the check should not be made, like toast_delete_datum(), and > cases where it should, like toast_fetch_datum(). > > Since this keeps coming up, I'll produce a patch this way. I'm > skeptical, but maybe it will look better than I think it will. I > should be able to post that by Friday. This PostgreSQL 9.6 open item is past due for your status update. Kindly send a status update within 24 hours, and include a date for your subsequent status update. Refer to the policy on open item ownership: http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Noah Misch
Date:
On Sat, Jul 16, 2016 at 06:48:08PM -0400, Noah Misch wrote: > On Wed, Jul 13, 2016 at 03:57:02PM -0500, Kevin Grittner wrote: > > On Wed, Jul 13, 2016 at 12:48 PM, Andres Freund <andres@anarazel.de> wrote: > > > On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote: > > >> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > >>> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > > >>>> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: > > >>>> > > >>>>> I'm a bit confused, why aren't we simply adding LSN interlock > > >>>>> checks for toast? Doesn't look that hard? Seems like a much more > > >>>>> natural course of fixing this issue? > > >>>> > > >>>> I took some time trying to see what you have in mind, and I'm > > >>>> really not "getting it". > > >>> > > >>> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast > > >>> when old_snapshot_threshold > 0 and add a check for > > >>> HeapTupleSatisfiesToast in TestForOldSnapshot()? > > >> > > >> With that approach, how will we know *not* to generate an error > > >> when reading the chain of tuples for a value we are deleting. Or > > >> positioning to modify an index on toast data. Etc., etc. etc. > > > > > > I'm not following. How is that different in the toast case than in the > > > heap case? > > > > A short answer is that a normal table's heap doesn't go through > > systable_getnext_ordered(). That function is used both for cases > > where the check should not be made, like toast_delete_datum(), and > > cases where it should, like toast_fetch_datum(). > > > > Since this keeps coming up, I'll produce a patch this way. I'm > > skeptical, but maybe it will look better than I think it will. I > > should be able to post that by Friday. > > This PostgreSQL 9.6 open item is past due for your status update. Kindly send > a status update within 24 hours, and include a date for your subsequent status > update. Refer to the policy on open item ownership: > http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due for your status update. Please reacquaint yourself with the policy on open item ownership[1] and then reply immediately. If I do not hear from you by 2016-07-20 03:00 UTC, I will transfer this item to release management team ownership without further notice. [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Mon, Jul 18, 2016 at 9:10 PM, Noah Misch <noah@leadboat.com> wrote: > On Sat, Jul 16, 2016 at 06:48:08PM -0400, Noah Misch wrote: >> On Wed, Jul 13, 2016 at 03:57:02PM -0500, Kevin Grittner wrote: >>> On Wed, Jul 13, 2016 at 12:48 PM, Andres Freund <andres@anarazel.de> wrote: >>>> On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote: >>>>> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>>>> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote: >>>>>>> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund <andres@anarazel.de> wrote: >>>>>>> >>>>>>>> I'm a bit confused, why aren't we simply adding LSN interlock >>>>>>>> checks for toast? Doesn't look that hard? Seems like a much more >>>>>>>> natural course of fixing this issue? >>>>>>> >>>>>>> I took some time trying to see what you have in mind, and I'm >>>>>>> really not "getting it". >>>>>> >>>>>> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast >>>>>> when old_snapshot_threshold > 0 and add a check for >>>>>> HeapTupleSatisfiesToast in TestForOldSnapshot()? >>>>> >>>>> With that approach, how will we know *not* to generate an error >>>>> when reading the chain of tuples for a value we are deleting. Or >>>>> positioning to modify an index on toast data. Etc., etc. etc. >>>> >>>> I'm not following. How is that different in the toast case than in the >>>> heap case? >>> >>> A short answer is that a normal table's heap doesn't go through >>> systable_getnext_ordered(). That function is used both for cases >>> where the check should not be made, like toast_delete_datum(), and >>> cases where it should, like toast_fetch_datum(). >>> >>> Since this keeps coming up, I'll produce a patch this way. I'm >>> skeptical, but maybe it will look better than I think it will. I >>> should be able to post that by Friday. >> >> This PostgreSQL 9.6 open item is past due for your status update. Kindly send >> a status update within 24 hours, and include a date for your subsequent status >> update. Refer to the policy on open item ownership: >> http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com > > IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due > for your status update. Please reacquaint yourself with the policy on open > item ownership[1] and then reply immediately. If I do not hear from you by > 2016-07-20 03:00 UTC, I will transfer this item to release management team > ownership without further notice. > > [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com As far as I can see, to do this the way that Andres and Amit suggest involves tying in to indexam.c and other code in incredibly ugly ways. I think it is entirely the wrong way to go, as I can't find a way to make it look remotely sane. The question is whether I should do it the way that I think is sane, or whether someone else wants to show me what I'm missing by producing at least a rough patch along these lines. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote: > As far as I can see, to do this the way that Andres and Amit > suggest involves tying in to indexam.c and other code in incredibly > ugly ways. Could you explain the problem you're seeing? Isn't pretty much all that we need to do: 1) add an InitSnapshotToast(Snapshot originMVCCSnap), which sets SnapshotData->lsn to the origin snapshot's lsn 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC and HeapTupleSatisfiesToast? I mean the only difference between toast / plain heap table WRT old_snapshot_threshold is that we don't use an MVCC snapshot. > I think it is entirely the wrong way to go, as I can't > find a way to make it look remotely sane. The question is whether > I should do it the way that I think is sane, or whether someone > else wants to show me what I'm missing by producing at least a > rough patch along these lines. I'll produce one, but I'd prefer you explain the problem first. Maybe it's me missing the obvious problem. Andres
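Concretely, a rough sketch of those two steps; the helper name and placement are assumptions rather than a finished patch, but it shows how little new machinery the interlock needs:

/* (1) hypothetical initializer: carry the origin MVCC snapshot's
 *     age information (lsn, whenTaken) over to the toast snapshot */
#define InitToastSnapshot(snapshotdata, l, w) \
    ((snapshotdata).satisfies = HeapTupleSatisfiesToast, \
     (snapshotdata).lsn = (l), \
     (snapshotdata).whenTaken = (w))

/* (2) let the existing page-LSN check fire for toast snapshots too;
 *     TestForOldSnapshot_impl() here stands for the out-of-line part
 *     that actually raises the "snapshot too old" error */
static inline void
TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
{
    if (old_snapshot_threshold >= 0
        && snapshot != NULL
        && (snapshot->satisfies == HeapTupleSatisfiesMVCC
            || snapshot->satisfies == HeapTupleSatisfiesToast)
        && !XLogRecPtrIsInvalid(snapshot->lsn)
        && PageGetLSN(page) > snapshot->lsn)
        TestForOldSnapshot_impl(snapshot, relation);
}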
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Noah Misch
Date:
On Tue, Jul 19, 2016 at 06:09:59PM -0500, Kevin Grittner wrote: > On Mon, Jul 18, 2016 at 9:10 PM, Noah Misch <noah@leadboat.com> wrote: > > On Sat, Jul 16, 2016 at 06:48:08PM -0400, Noah Misch wrote: > >> This PostgreSQL 9.6 open item is past due for your status update. Kindly send > >> a status update within 24 hours, and include a date for your subsequent status > >> update. Refer to the policy on open item ownership: > >> http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com > > > > IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due > > for your status update. Please reacquaint yourself with the policy on open > > item ownership[1] and then reply immediately. If I do not hear from you by > > 2016-07-20 03:00 UTC, I will transfer this item to release management team > > ownership without further notice. > > > > [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com > > As far as I can see, to do this the way that Andres and Amit > suggest involves tying in to indexam.c and other code in incredibly > ugly ways. I think it is entirely the wrong way to go, as I can't > find a way to make it look remotely sane. The question is whether > I should do it the way that I think is sane, or whether someone > else wants to show me what I'm missing by producing at least a > rough patch along these lines. This does not qualify as a status update, because it does not include a date for your subsequent status update.
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Amit Kapila
Date:
On Wed, Jul 20, 2016 at 5:02 AM, Andres Freund <andres@anarazel.de> wrote: > On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote: >> As far as I can see, to do this the way that Andres and Amit >> suggest involves tying in to indexam.c and other code in incredibly >> ugly ways. > > Could you explain the problem you're seing? > > Isn't pretty much all all that we need to do: > 1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets SnapshotData->lsn > to the the origin snapshot's lsn > 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC and > HeapTupleSatisfiesToast? > I also think so. However, it is not clear what is the best place to initialize toast snapshot. One idea could be to do it in GetSnapshotData() after capturing the required information for the valid value of old_snapshot_threshold. Do you have something else in mind? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On July 19, 2016 7:14:42 PM PDT, Amit Kapila <amit.kapila16@gmail.com> wrote: >On Wed, Jul 20, 2016 at 5:02 AM, Andres Freund <andres@anarazel.de> >wrote: >> On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote: >>> As far as I can see, to do this the way that Andres and Amit >>> suggest involves tying in to indexam.c and other code in incredibly >>> ugly ways. >> >> Could you explain the problem you're seing? >> >> Isn't pretty much all all that we need to do: >> 1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets >SnapshotData->lsn >> to the the origin snapshot's lsn >> 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC >and >> HeapTupleSatisfiesToast? >> > >I also think so. However, it is not clear what is the best place to >initialize toast snapshot. One idea could be to do it in >GetSnapshotData() after capturing the required information for the >valid value of old_snapshot_threshold. Do you have something else in >mind? There's very few callsites using toast snapshots. I'd just do it there. Don't think we ever use GetSnapshotData for them. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Amit Kapila
Date:
On Wed, Jul 20, 2016 at 7:57 AM, Andres Freund <andres@anarazel.de> wrote: > > > On July 19, 2016 7:14:42 PM PDT, Amit Kapila <amit.kapila16@gmail.com> wrote: >>On Wed, Jul 20, 2016 at 5:02 AM, Andres Freund <andres@anarazel.de> >>wrote: >>> On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote: >>>> As far as I can see, to do this the way that Andres and Amit >>>> suggest involves tying in to indexam.c and other code in incredibly >>>> ugly ways. >>> >>> Could you explain the problem you're seing? >>> >>> Isn't pretty much all all that we need to do: >>> 1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets >>SnapshotData->lsn >>> to the the origin snapshot's lsn >>> 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC >>and >>> HeapTupleSatisfiesToast? >>> >> >>I also think so. However, it is not clear what is the best place to >>initialize toast snapshot. One idea could be to do it in >>GetSnapshotData() after capturing the required information for the >>valid value of old_snapshot_threshold. Do you have something else in >>mind? > > There's very few callsites using toast snapshots. I'd just do it there. Don't think we ever use GetSnapshotData for them. > I think Snapshot's members whenTaken and lsn are updated/initialized only in GetSnapshotData(). So if GetSnapshotData() is not used, how will you expect those fields to be updated. We need those fields to be updated for TestForOldSnapshot(). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On July 19, 2016 7:43:05 PM PDT, Amit Kapila <amit.kapila16@gmail.com> wrote: >On Wed, Jul 20, 2016 at 7:57 AM, Andres Freund <andres@anarazel.de> >wrote: >> >> >> On July 19, 2016 7:14:42 PM PDT, Amit Kapila ><amit.kapila16@gmail.com> wrote: >>>On Wed, Jul 20, 2016 at 5:02 AM, Andres Freund <andres@anarazel.de> >>>wrote: >>>> On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote: >>>>> As far as I can see, to do this the way that Andres and Amit >>>>> suggest involves tying in to indexam.c and other code in >incredibly >>>>> ugly ways. >>>> >>>> Could you explain the problem you're seing? >>>> >>>> Isn't pretty much all all that we need to do: >>>> 1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets >>>SnapshotData->lsn >>>> to the the origin snapshot's lsn >>>> 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC >>>and >>>> HeapTupleSatisfiesToast? >>>> >>> >>>I also think so. However, it is not clear what is the best place to >>>initialize toast snapshot. One idea could be to do it in >>>GetSnapshotData() after capturing the required information for the >>>valid value of old_snapshot_threshold. Do you have something else in >>>mind? >> >> There's very few callsites using toast snapshots. I'd just do it >there. Don't think we ever use GetSnapshotData for them. >> > >I think Snapshot's members whenTaken and lsn are updated/initialized >only in GetSnapshotData(). So if GetSnapshotData() is not used, how >will you expect those fields to be updated. We need those fields to >be updated for TestForOldSnapshot(). That's why I suggested copying them from the current mvcc snapshot. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
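In code form, that copying could be a tiny helper at the toast call sites; a minimal sketch, assuming an InitToastSnapshot() initializer like the one sketched upthread, with GetActiveSnapshot() standing in for whichever snapshot the caller can reach:

/*
 * Sketch: set up the toast snapshot from an MVCC snapshot's age fields,
 * so TestForOldSnapshot() has a real lsn/whenTaken to compare against.
 */
static void
init_toast_snapshot(Snapshot toast_snapshot)
{
    Snapshot    snapshot = GetActiveSnapshot();    /* or the oldest registered one */

    if (snapshot == NULL)
        elog(ERROR, "no known snapshots");

    InitToastSnapshot(*toast_snapshot, snapshot->lsn, snapshot->whenTaken);
}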
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Kevin Grittner
Date:
On Tue, Jul 19, 2016 at 6:32 PM, Andres Freund <andres@anarazel.de> wrote: > I mean the only difference between toast / plain heap table WRT > old_snapshot_threshold is that we don't use a mvcc snapshot. We use different functions and never, ever call BufferGetPage -- except for deep in the bowels of the AMs. Countless functions would need to be modified to pass in information about whether any call is one of those which need to test for snapshot-too-old. Since "normal" heap and index access is already covered without that, yet use the AMs, there would be a weird "double coverage" to look out for. On top of all that, you would need to not only throw errors for some cases but (as you pointed out earlier in the thread) turn others into no-ops. Also, some of the toast calls are very far from the calls for the base row, where a function might decide to de-toast some toast pointer. With the naive approach of what you suggest, frequency of checking would go from once per page (containing multiple tuples) to that *plus* once per toast chunk per value per heap tuple, although it seems like checking any one (like the first) toast chunk for a value would suffice. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Wed, Jul 20, 2016 at 3:39 AM, Andres Freund <andres@anarazel.de> wrote: >>I think Snapshot's members whenTaken and lsn are updated/initialized >>only in GetSnapshotData(). So if GetSnapshotData() is not used, how >>will you expect those fields to be updated. We need those fields to >>be updated for TestForOldSnapshot(). > > That's why I suggested copying them from the current mvcc snapshot. And how do you obtain that? The functions that reference SnapshotToast are toast_delete_datum, toastrel_value_exists, toast_fetch_datum, and toast_fetch_datum_slice, but none of those take a snapshot as an argument, nor is there any reasonable way to make them do so. Those are indirectly called by things like bttextcmp, which don't know what snapshot was used to fetch the datum that they are detoasting and can't reasonably be made to know. I mean, you could do something *approximately* correct by calling GetActiveSnapshot() but that doesn't seem likely to be correct in detail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-20 11:26:11 -0400, Robert Haas wrote: > On Wed, Jul 20, 2016 at 3:39 AM, Andres Freund <andres@anarazel.de> wrote: > >>I think Snapshot's members whenTaken and lsn are updated/initialized > >>only in GetSnapshotData(). So if GetSnapshotData() is not used, how > >>will you expect those fields to be updated. We need those fields to > >>be updated for TestForOldSnapshot(). > > > > That's why I suggested copying them from the current mvcc snapshot. > > And how do you obtain that? The functions that reference > SnapshotToast are toast_delete_datum, toastrel_value_exists, and > toast_fetch_datum, toast_fetch_datum_slice, but none of those take a > snapshot as an argument, nor is there any reasonable way to make them > do so. Those are indirectly called by things like bttextcmp, which > don't know what snapshot was used to fetch the datum that they are > detoasting and can't reasonably be made to know. > > I mean, you could do something *approximately* correct by calling > GetActiveSnapshot() but that doesn't seem likely to be correct in > detail. GetActiveSnapshot() seems like it should work well enough in this case, or we could use pairingheap_first() to get the actual oldest registered one.
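A rough sketch of the pairingheap_first() variant, assuming snapmgr.c's existing RegisteredSnapshots pairing heap and OldestActiveSnapshot tracking (error handling elided):

/*
 * Sketch: return the oldest snapshot this backend knows about,
 * preferring whichever of the oldest registered and the oldest
 * active snapshot carries the older lsn.
 */
Snapshot
GetOldestSnapshot(void)
{
    Snapshot    oldest_registered = NULL;
    XLogRecPtr  registered_lsn = InvalidXLogRecPtr;

    if (!pairingheap_is_empty(&RegisteredSnapshots))
    {
        oldest_registered = pairingheap_container(SnapshotData, ph_node,
                                pairingheap_first(&RegisteredSnapshots));
        registered_lsn = oldest_registered->lsn;
    }

    if (OldestActiveSnapshot != NULL)
    {
        XLogRecPtr  active_lsn = OldestActiveSnapshot->as_snap->lsn;

        if (XLogRecPtrIsInvalid(registered_lsn) || registered_lsn > active_lsn)
            return OldestActiveSnapshot->as_snap;
    }

    return oldest_registered;
}

Which of the two candidates to prefer is exactly the older-versus-newer question taken up below.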
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Alvaro Herrera
Date:
Andres Freund wrote: > On 2016-07-20 11:26:11 -0400, Robert Haas wrote: > > On Wed, Jul 20, 2016 at 3:39 AM, Andres Freund <andres@anarazel.de> wrote: > > >>I think Snapshot's members whenTaken and lsn are updated/initialized > > >>only in GetSnapshotData(). So if GetSnapshotData() is not used, how > > >>will you expect those fields to be updated. We need those fields to > > >>be updated for TestForOldSnapshot(). > > > > > > That's why I suggested copying them from the current mvcc snapshot. > > > > And how do you obtain that? The functions that reference > > SnapshotToast are toast_delete_datum, toastrel_value_exists, and > > toast_fetch_datum, toast_fetch_datum_slice, but none of those take a > > snapshot as an argument, nor is there any reasonable way to make them > > do so. Those are indirectly called by things like bttextcmp, which > > don't know what snapshot was used to fetch the datum that they are > > detoasting and can't reasonably be made to know. > > > > I mean, you could do something *approximately* correct by calling > > GetActiveSnapshot() but that doesn't seem likely to be correct in > > detail. > > GetActiveSnapshot() seems like it should work well enough in this case, > or we could use pairingheap_first() to get the actual oldest registered > one. Hmm. Why is the active snapshot not sufficient? Perhaps we need some kind of redesign or minor tweak to snapmgr to keep track of the oldest snapshot of a resowner or something like that? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Wed, Jul 20, 2016 at 12:30 PM, Andres Freund <andres@anarazel.de> wrote: >> And how do you obtain that? The functions that reference >> SnapshotToast are toast_delete_datum, toastrel_value_exists, and >> toast_fetch_datum, toast_fetch_datum_slice, but none of those take a >> snapshot as an argument, nor is there any reasonable way to make them >> do so. Those are indirectly called by things like bttextcmp, which >> don't know what snapshot was used to fetch the datum that they are >> detoasting and can't reasonably be made to know. >> >> I mean, you could do something *approximately* correct by calling >> GetActiveSnapshot() but that doesn't seem likely to be correct in >> detail. > > GetActiveSnapshot() seems like it should work well enough in this case, > or we could use pairingheap_first() to get the actual oldest registered > one. It's hard to believe that it's equally good to use the newest registered snapshot (which is, I think, what you will often get from GetActiveSnapshot()) and the oldest registered snapshot (which is what you will get from pairingheap_first()). It seems to me that we need to analyze what happens if we choose a snapshot that is older than the one used to find the datum which contained the toast pointer, and conversely what happens if we use a snapshot that is newer than the one we used to find the toast pointer. Here's an attempt: 1. If we pick a snapshot that is older than the one that found the scan tuple, we might get a "snapshot too old" error that is not strictly necessary. 2. If we pick a snapshot that is newer than the one that found the scan tuple, then haven't we failed to fix the problem? I'm not so sure about this direction, but if it's OK to test an arbitrarily new snapshot, then I can't see why we need the test at all. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Andres Freund
Date:
On 2016-07-20 13:59:32 -0400, Robert Haas wrote: > It's hard to believe that it's equally good to use the newest > registered snapshot (which is, I think, what you will often get from > GetActiveSnapshot()) and the oldest registered snapshot (which is what > you will get from pairingheap_first()). It seems to me that we need > to analyze what happens if we choose a snapshot that is older than the > one used to find the datum which contained the toast pointer, and > conversely what happens if we use a snapshot that is newer than the > one we used to find the toast pointer. Yea, the oldest seems better. > Here's an attempt: > > 1. If we pick a snapshot that is older than the one that found the > scan tuple, we might get a "snapshot too old" error that is not > strictly necessary. Right. Which still seems a lot better than essentially pessimizing vacuuming for toast tables considerably. > 2. If we pick a snapshot that is newer than the one that found the > scan tuple, then haven't we failed to fix the problem? I'm not so > sure about this direction, but if it's OK to test an arbitrarily new > snapshot, then I can't see why we need the test at all. I think some argument could be construed why it'd possibly be safe, but I feel a lot better with the other option. Andres
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Noah Misch
Date:
On Tue, Jul 19, 2016 at 09:01:05PM -0400, Noah Misch wrote: > On Tue, Jul 19, 2016 at 06:09:59PM -0500, Kevin Grittner wrote: > > On Mon, Jul 18, 2016 at 9:10 PM, Noah Misch <noah@leadboat.com> wrote: > > > On Sat, Jul 16, 2016 at 06:48:08PM -0400, Noah Misch wrote: > > >> This PostgreSQL 9.6 open item is past due for your status update. Kindly send > > >> a status update within 24 hours, and include a date for your subsequent status > > >> update. Refer to the policy on open item ownership: > > >> http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com > > > > > > IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 9.6 open item is long past due > > > for your status update. Please reacquaint yourself with the policy on open > > > item ownership[1] and then reply immediately. If I do not hear from you by > > > 2016-07-20 03:00 UTC, I will transfer this item to release management team > > > ownership without further notice. > > > > > > [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com > > > > As far as I can see, to do this the way that Andres and Amit > > suggest involves tying in to indexam.c and other code in incredibly > > ugly ways. I think it is entirely the wrong way to go, as I can't > > find a way to make it look remotely sane. The question is whether > > I should do it the way that I think is sane, or whether someone > > else wants to show me what I'm missing by producing at least a > > rough patch along these lines. > > This does not qualify as a status update, because it does not include a date > for your subsequent status update. This PostgreSQL 9.6 open item now needs a permanent owner. Would any other committer like to take ownership? If this role interests you, please read this thread and the policy linked above, then send an initial status update bearing a date for your subsequent status update. If the item does not have a permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by reverting commit 848ef42 and followups. Thanks, nm
Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
From
Robert Haas
Date:
On Wed, Jul 20, 2016 at 9:15 PM, Noah Misch <noah@leadboat.com> wrote: > This PostgreSQL 9.6 open item now needs a permanent owner. Would any other > committer like to take ownership? If this role interests you, please read > this thread and the policy linked above, then send an initial status update > bearing a date for your subsequent status update. If the item does not have a > permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by reverting > commit 848ef42 and followups. I will adopt this item. I will provide an initial patch for this issue, or convince someone else to do so, within one week. Therefore, expect a further status update from me on or before July 28th. I expect that the patch will be based on ideas from these emails: https://www.postgresql.org/message-id/1AB8F80A-D16E-4154-9497-98FBB164253D@anarazel.de https://www.postgresql.org/message-id/20160720181213.f4io7gc6lyc377sw@alap3.anarazel.de -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company