Thread: cost-based vacuum
I am beginning to look at Postgres 8, and am particularly interested in cost-based vacuum/analyze. I'm hoping someone can shed some light on the behavior I am seeing. Suppose there are three threads: writer_thread every 1/15 second do BEGIN TRANSACTION COPY table1 FROM stdin ... COPY tableN FROM stdin perform several UPDATEs, DELETEs and INSERTs COMMIT reader_thread every 1/15 second do BEGIN TRANSACTION SELECT FROM table1 ... ... SELECT FROM tableN ... COMMIT analyze_thread every 5 minutes do ANALYZE table1 ... ANALYZE tableN Now, Postgres 8.0.3 out-of-the-box (all default configs) on a particular piece of hardware runs the Postgres connection for writer_thread at about 15% CPU (meaningless, I know, but for comparison) and runs the Postgres connection for reader_thread at about 30% CPU. Latency for reader_thread seeing updates from writer_thread is well under 1/15s. Impact of analyze_thread is negligible. If I make the single configuration change of setting vacuum_cost_delay=1000, each iteration in analyze_thread takes much longer, of course. But what I also see is that the CPU usage of the connections for writer_thread and reader_thread spike up to well over 80% each (this is a dualie) and latency drops to 8-10s, during the ANALYZEs. I don't understand why this would be. I don't think there are any lock issues, and I don't see any obvious I/O issues. Am I missing something? Is there any way to get some insight into what those connections are doing? Thanks, --Ian
Ian Westmacott <ianw@intellivid.com> writes: > If I make the single configuration change of setting > vacuum_cost_delay=1000, each iteration in analyze_thread takes > much longer, of course. But what I also see is that the CPU > usage of the connections for writer_thread and reader_thread > spike up to well over 80% each (this is a dualie) and latency > drops to 8-10s, during the ANALYZEs. [ scratches head... ] That doesn't make any sense at all. > I don't understand why this would be. I don't think there > are any lock issues, and I don't see any obvious I/O issues. > Am I missing something? Is there any way to get some > insight into what those connections are doing? Profiling maybe? Can you put together a self-contained test case that replicates this behavior, so other people could look? regards, tom lane
On Fri, 2005-07-08 at 12:25 -0400, Ian Westmacott wrote: > I am beginning to look at Postgres 8, and am particularly > interested in cost-based vacuum/analyze. I'm hoping someone > can shed some light on the behavior I am seeing. > > Suppose there are three threads: > > writer_thread > every 1/15 second do > BEGIN TRANSACTION > COPY table1 FROM stdin > ... > COPY tableN FROM stdin > perform several UPDATEs, DELETEs and INSERTs > COMMIT > > reader_thread > every 1/15 second do > BEGIN TRANSACTION > SELECT FROM table1 ... > ... > SELECT FROM tableN ... > COMMIT > > analyze_thread > every 5 minutes do > ANALYZE table1 > ... > ANALYZE tableN > > > Now, Postgres 8.0.3 out-of-the-box (all default configs) on a > particular piece of hardware runs the Postgres connection for > writer_thread at about 15% CPU (meaningless, I know, but for > comparison) and runs the Postgres connection for reader_thread > at about 30% CPU. Latency for reader_thread seeing updates > from writer_thread is well under 1/15s. Impact of > analyze_thread is negligible. > > If I make the single configuration change of setting > vacuum_cost_delay=1000, each iteration in analyze_thread takes > much longer, of course. But what I also see is that the CPU > usage of the connections for writer_thread and reader_thread > spike up to well over 80% each (this is a dualie) and latency > drops to 8-10s, during the ANALYZEs. > > I don't understand why this would be. I don't think there > are any lock issues, and I don't see any obvious I/O issues. > Am I missing something? Is there any way to get some > insight into what those connections are doing? The ANALYZE commands hold read locks on the tables you wish to write to. If you slow them down, you merely slow down your write transactions also, and then the read transactions that wait behind them. Every time the ANALYZE sleeps it wakes up the other transactions, which then realise they can't move because of locks and then wake up the ANALYZEs for another shot. The end result is that you introduce more context- switching, without any chance of doing more useful work while the ANALYZEs sleep. Don't use the vacuum_cost_delay in this situation. You might try setting it to 0 for the analyze_thread only. Sounds like you could speed things up by splitting everything into two sets of tables, with writer_thread1 and writer_thread2 etc. That way your 2 CPUs would be able to independently be able to get through more work without locking each other out. Best Regards, Simon Riggs
On Mon, 2005-07-11 at 07:31, Simon Riggs wrote: > The ANALYZE commands hold read locks on the tables you wish to write to. > If you slow them down, you merely slow down your write transactions > also, and then the read transactions that wait behind them. Every time > the ANALYZE sleeps it wakes up the other transactions, which then > realise they can't move because of locks and then wake up the ANALYZEs > for another shot. The end result is that you introduce more context- > switching, without any chance of doing more useful work while the > ANALYZEs sleep. Let me make sure I understand. ANALYZE acquires a read lock on the table, that it holds until the operation is complete (including any sleeps). That read lock blocks the extension of that table via COPY. Is that right? According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE lock on the table, and that conflicts only with ACCESS EXCLUSIVE. Thats why I didn't think I had a lock issue, since I think COPY only needs ROW EXCLUSIVE. Or perhaps the transaction needs something more? Thanks, --Ian
Simon Riggs <simon@2ndquadrant.com> writes: >> I don't understand why this would be. I don't think there >> are any lock issues, and I don't see any obvious I/O issues. > The ANALYZE commands hold read locks on the tables you wish to write to. Unless there were more commands that Ian didn't show us, he's not taking any locks that would conflict with ANALYZE. So I don't believe this is the explanation. regards, tom lane
On Mon, 2005-07-11 at 09:07 -0400, Ian Westmacott wrote: > On Mon, 2005-07-11 at 07:31, Simon Riggs wrote: > > The ANALYZE commands hold read locks on the tables you wish to write to. > > If you slow them down, you merely slow down your write transactions > > also, and then the read transactions that wait behind them. Every time > > the ANALYZE sleeps it wakes up the other transactions, which then > > realise they can't move because of locks and then wake up the ANALYZEs > > for another shot. The end result is that you introduce more context- > > switching, without any chance of doing more useful work while the > > ANALYZEs sleep. > > Let me make sure I understand. ANALYZE acquires a read > lock on the table, that it holds until the operation is > complete (including any sleeps). That read lock blocks > the extension of that table via COPY. Is that right? > > According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE > lock on the table, and that conflicts only with ACCESS > EXCLUSIVE. Thats why I didn't think I had a lock issue, > since I think COPY only needs ROW EXCLUSIVE. Or perhaps > the transaction needs something more? The docs are correct, but don't show catalog and buffer locks. ...but on further reading of the code there are no catalog locks or buffer locks held across the sleep points. So, my explanation doesn't work as an explanation for the sleep/no sleep difference you have observed. Best Regards, Simon Riggs
On Mon, 2005-07-11 at 15:51 +0100, Simon Riggs wrote: > On Mon, 2005-07-11 at 09:07 -0400, Ian Westmacott wrote: > > On Mon, 2005-07-11 at 07:31, Simon Riggs wrote: > > > The ANALYZE commands hold read locks on the tables you wish to write to. > > > If you slow them down, you merely slow down your write transactions > > > also, and then the read transactions that wait behind them. Every time > > > the ANALYZE sleeps it wakes up the other transactions, which then > > > realise they can't move because of locks and then wake up the ANALYZEs > > > for another shot. The end result is that you introduce more context- > > > switching, without any chance of doing more useful work while the > > > ANALYZEs sleep. > > > > Let me make sure I understand. ANALYZE acquires a read > > lock on the table, that it holds until the operation is > > complete (including any sleeps). That read lock blocks > > the extension of that table via COPY. Is that right? > > > > According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE > > lock on the table, and that conflicts only with ACCESS > > EXCLUSIVE. Thats why I didn't think I had a lock issue, > > since I think COPY only needs ROW EXCLUSIVE. Or perhaps > > the transaction needs something more? > > The docs are correct, but don't show catalog and buffer locks. > > ...but on further reading of the code there are no catalog locks or > buffer locks held across the sleep points. So, my explanation doesn't > work as an explanation for the sleep/no sleep difference you have > observed. I've been through all the code now and can't find any resource that is held across a delay point. Nor any reason to believe that the vacuum cost accounting would slow anything down. Since vacuum_cost_delay is a userset parameter, you should be able to SET this solely for the analyze_thread. That way we will know with more certainty that it is the analyze_thread that is interfering. What is your default_statistics_target? Do you have other stats targets set? How long does ANALYZE take to run, with/without the vacuum_cost_delay? Thanks, Best Regards, Simon Riggs
On Tue, 2005-07-12 at 03:45, Simon Riggs wrote: > Since vacuum_cost_delay is a userset parameter, you should be able to > SET this solely for the analyze_thread. That way we will know with more > certainty that it is the analyze_thread that is interfering. That is what I have been doing. In fact, I have eliminated the reader_thread and analyze_thread. I just have the writer_thread running, and a psql connection with which I perform ANALYZE, for various vacuum_cost_* parameters. (I'm trying to extract a reproducible experiment) It appears not to matter whether it is one of the tables being written to that is ANALYZEd. I can ANALYZE an old, quiescent table, or a system table and see this effect. > What is your default_statistics_target? All other configs are default; default_statistics_target=10. > Do you have other stats targets set? No. The only thing slightly out of the ordinary with the tables is that they are created WITHOUT OIDS. Some indexes, but no primary keys. All columns NOT NULL. > How long does ANALYZE take to run, with/without the vacuum_cost_delay? Well, on one table with about 50K rows, it takes about 1/4s to ANALYZE with vacuum_cost_delay=0, and about 15s with vacuum_cost_delay=1000. Other things of note: - VACUUM has the same effect. If I VACUUM or ANALYZE the whole DB, the CPU spikes reset between tables. - vmstat reports blocks written drops as the CPU rises. Don't know if it is cause or effect yet. On a small test system, I'm writing about 1.5MB/s. After about 20s of cost-based ANALYZE, this drops under 0.5MB/s. - this is a dual Xeon. I have tried both with and without hyperthreading. I haven't tried to reproduce it elsewhere yet, but will. - Looking at oprofile reports for 10-minute runs of a database-wide VACUUM with vacuum_cost_delay=0 and 1000, shows the latter spending a lot of time in LWLockAcquire and LWLockRelease (20% each vs. 2%). Thanks, --Ian
On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote: > It appears not to matter whether it is one of the tables > being written to that is ANALYZEd. I can ANALYZE an old, > quiescent table, or a system table and see this effect. Can you confirm that this effect is still seen even when the ANALYZE doesn't touch *any* of the tables being accessed? > - this is a dual Xeon. Is that Xeon MP then? > - Looking at oprofile reports for 10-minute runs of a > database-wide VACUUM with vacuum_cost_delay=0 and 1000, > shows the latter spending a lot of time in LWLockAcquire > and LWLockRelease (20% each vs. 2%). Is this associated with high context switching also? Best Regards, Simon Riggs
On Wed, 2005-07-13 at 11:55, Simon Riggs wrote: > On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote: > > It appears not to matter whether it is one of the tables > > being written to that is ANALYZEd. I can ANALYZE an old, > > quiescent table, or a system table and see this effect. > > Can you confirm that this effect is still seen even when the ANALYZE > doesn't touch *any* of the tables being accessed? Yes. > > - this is a dual Xeon. > > Is that Xeon MP then? Yes. > > - Looking at oprofile reports for 10-minute runs of a > > database-wide VACUUM with vacuum_cost_delay=0 and 1000, > > shows the latter spending a lot of time in LWLockAcquire > > and LWLockRelease (20% each vs. 2%). > > Is this associated with high context switching also? Yes, it appears that context switches increase up to 4-5x during cost-based ANALYZE. --Ian
Ian Westmacott <ianw@intellivid.com> writes: > On Wed, 2005-07-13 at 11:55, Simon Riggs wrote: >> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote: >>> It appears not to matter whether it is one of the tables >>> being written to that is ANALYZEd. I can ANALYZE an old, >>> quiescent table, or a system table and see this effect. >> >> Can you confirm that this effect is still seen even when the ANALYZE >> doesn't touch *any* of the tables being accessed? > Yes. This really isn't making any sense at all. I took another look through the vacuum_delay_point() calls, and I can see a couple that are questionably placed: * the one in count_nondeletable_pages() is done while we are holding exclusive lock on the table; we might be better off not to delay there, so as not to block non-VACUUM processes longer than we have to. * the ones in hashbulkdelete and rtbulkdelete are done while holding various forms of exclusive locks on the index (this was formerly true of gistbulkdelete as well). Again it might be better not to delay. However, these certainly do not explain Ian's problem, because (a) these only apply to VACUUM, not ANALYZE; (b) they would only lock the table being VACUUMed, not other ones; (c) if these locks were to block the reader or writer thread, it'd manifest as blocking on a semaphore, not as a surge in LWLock thrashing. >> Is that Xeon MP then? > Yes. The LWLock activity is certainly suggestive of prior reports of excessive buffer manager lock contention, but it makes *no* sense that that would be higher with vacuum cost delay than without. I'd have expected the other way around. I'd really like to see a test case for this... regards, tom lane
On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote: > Ian Westmacott <ianw@intellivid.com> writes: > > On Wed, 2005-07-13 at 11:55, Simon Riggs wrote: > >> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote: > >>> It appears not to matter whether it is one of the tables > >>> being written to that is ANALYZEd. I can ANALYZE an old, > >>> quiescent table, or a system table and see this effect. > >> > >> Can you confirm that this effect is still seen even when the ANALYZE > >> doesn't touch *any* of the tables being accessed? > > > Yes. > > This really isn't making any sense at all. Agreed. I think all of this indicates that some wierdness (technical term) is happening at a different level in the computing stack. I think all of this points fairly strongly to it *not* being a PostgreSQL algorithm problem, i.e. if the code was executed by an idealised Knuth- like CPU then we would not get this problem. Plus, I have faith that if it was a problem in that "plane" then you or another would have uncovered it by now. > However, these certainly do not explain Ian's problem, because (a) these > only apply to VACUUM, not ANALYZE; (b) they would only lock the table > being VACUUMed, not other ones; (c) if these locks were to block the > reader or writer thread, it'd manifest as blocking on a semaphore, not > as a surge in LWLock thrashing. I've seen enough circumstantial evidence to connect the time spent inside LWLockAcquire/Release as being connected to the Semaphore ops within them, not the other aspects of the code. Months ago we discussed the problem of false sharing on closely packed arrays of shared variables because of the large cache line size of the Xeon MP. When last we touched on that thought, I focused on the thought that the LWLock array was too tightly packed for the predefined locks. What we didn't discuss (because I was too focused on the other array) was the PGPROC shared array is equally tightly packed, which could give problems on the semaphores in LWLock. Intel says fairly clearly that this would be an issue. > >> Is that Xeon MP then? > > > Yes. > > The LWLock activity is certainly suggestive of prior reports of > excessive buffer manager lock contention, but it makes *no* sense that > that would be higher with vacuum cost delay than without. I'd have > expected the other way around. > > I'd really like to see a test case for this... My feeling is that a "micro-architecture" test would be more likely to reveal some interesting information. Best Regards, Simon Riggs
I can at least report that the problem does not seem to occur with Postgres 8.0.1 running on a dual Opteron. --Ian On Wed, 2005-07-13 at 16:39, Simon Riggs wrote: > On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote: > > Ian Westmacott <ianw@intellivid.com> writes: > > > On Wed, 2005-07-13 at 11:55, Simon Riggs wrote: > > >> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote: > > >>> It appears not to matter whether it is one of the tables > > >>> being written to that is ANALYZEd. I can ANALYZE an old, > > >>> quiescent table, or a system table and see this effect. > > >> > > >> Can you confirm that this effect is still seen even when the ANALYZE > > >> doesn't touch *any* of the tables being accessed? > > > > > Yes. > > > > This really isn't making any sense at all. > > Agreed. I think all of this indicates that some wierdness (technical > term) is happening at a different level in the computing stack. I think > all of this points fairly strongly to it *not* being a PostgreSQL > algorithm problem, i.e. if the code was executed by an idealised Knuth- > like CPU then we would not get this problem. Plus, I have faith that if > it was a problem in that "plane" then you or another would have > uncovered it by now. > > > However, these certainly do not explain Ian's problem, because (a) these > > only apply to VACUUM, not ANALYZE; (b) they would only lock the table > > being VACUUMed, not other ones; (c) if these locks were to block the > > reader or writer thread, it'd manifest as blocking on a semaphore, not > > as a surge in LWLock thrashing. > > I've seen enough circumstantial evidence to connect the time spent > inside LWLockAcquire/Release as being connected to the Semaphore ops > within them, not the other aspects of the code. > > Months ago we discussed the problem of false sharing on closely packed > arrays of shared variables because of the large cache line size of the > Xeon MP. When last we touched on that thought, I focused on the thought > that the LWLock array was too tightly packed for the predefined locks. > What we didn't discuss (because I was too focused on the other array) > was the PGPROC shared array is equally tightly packed, which could give > problems on the semaphores in LWLock. > > Intel says fairly clearly that this would be an issue. > > > >> Is that Xeon MP then? > > > > > Yes. > > > > The LWLock activity is certainly suggestive of prior reports of > > excessive buffer manager lock contention, but it makes *no* sense that > > that would be higher with vacuum cost delay than without. I'd have > > expected the other way around. > > > > I'd really like to see a test case for this... > > My feeling is that a "micro-architecture" test would be more likely to > reveal some interesting information. > > Best Regards, Simon Riggs > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend