Thread: Possible explanations for catastrophic performance deterioration?
I recently had a puzzling experience (performance-related).

We had a DB running presumably smoothly, on a server with a Dual-Core Opteron and 4GB of RAM (and SATA2 drives with hardware RAID-1). (PostgreSQL 8.2.4 installed from source on an FC4 system --- databases with no encoding --- initdb -E SQL_ASCII --no-locale, and all the databases created with encoding SQL_ASCII.)

We thought that performance had gone down little by little, but the evidence now suggests that something must have triggered a big step down in the performance of the server.

Thinking that it was simply a hardware bottleneck, we moved to a different machine (lower performance CPU-wise, but with two hard disks, so I put the pg_xlog directory on a partition on a separate disk, figuring that this would outweigh the lower CPU power and the 2GB of RAM instead of 4).

Not only was the new machine faster overall --- a query like

    select count(*) from customer

was *instantaneous* on the new machine (just after populating it, without having even analyzed it!), whereas it took over a minute on the old machine (the first time). The second time, it took a little over two seconds on the old machine (at this point, both machines had *zero* activity --- they were both essentially disconnected from the outside world, serving exclusively my psql connection).

Funny thing: I dropped the database (on the old machine) and re-created it from the backup I had just made, and the performance on the old one was back to normal (the above query now gives me a result in essentially no time --- same as on the new machine).

In retrospect, I'm now wondering whether a vacuum full would have solved the issue? (We do run vacuumdb -z --- vacuum analyze --- daily.)

Any comments?? I'm worried that three months down the road we'll face the same issue with this new server (that's about how long it took from when we started running the other server until we noticed the poor performance level) --- and we cannot afford to completely stop the system to drop-and-recreate the db on a regular basis.

Thanks,

Carlos
--
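(For reference, a minimal sketch of how the timings above can be measured inside psql; \timing reports per-statement elapsed time, and running the query twice separates the cold-cache run from the warm-cache run. The table name is the one from the post; nothing else is assumed.)

    -- in psql: enable per-statement timing, then run the query twice to
    -- compare the cold-cache run against the warm-cache run
    \timing
    SELECT count(*) FROM customer;
    SELECT count(*) FROM customer;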
You didn't specify the database size, but my guess is that the total data size is about enough to fit in shared_buffers or the kernel cache. On the new system (or the dropped/recreated database), it would have all or mostly fit in memory, which would make things like count(*) run quickly. On the old database, you probably had a lot of fragmentation, which would have caused significantly more I/O to be performed, thereby causing a slowdown. You could easily check by comparing relation sizes.

My guess is that a vacuum full would've brought the other database back up to speed. In the future, you probably want to set fillfactor to a reasonable amount to account for updates-to-blocks-between-vacuum to try and capture as few row-migrations as possible.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/
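(A minimal sketch of the relation-size comparison suggested above, assuming the customer table from the original post; pg_relation_size() and pg_total_relation_size() are available in 8.1 and later. Run it on both servers and compare.)

    -- physical size of the heap alone, and of the heap plus its indexes;
    -- a bloated copy will be much larger for the same number of live rows
    SELECT pg_size_pretty(pg_relation_size('customer'))       AS heap_size,
           pg_size_pretty(pg_total_relation_size('customer')) AS with_indexes;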
"Jonah H. Harris" <jonah.harris@gmail.com> writes: > My guess is that a vacuum full would've brought the other database > back up to speed. Yeah, table bloat is what it sounds like to me too. > In the future, you probably want to set fillfactor > to a reasonable amount to account for updates-to-blocks-between-vacuum > to try and capture as few row-migrations as possible. More to the point, check your FSM size and make sure vacuums are happening often enough. regards, tom lane
Jonah H. Harris wrote:
> You didn't specify the database size

Oops, sorry about that one --- the full backup is a 950MB file. The entire database should fit in memory (and effective_cache_size was set to 2GB on the machine with 4GB of memory).

> , but my guess is that the total data size is about enough to fit in
> shared_buffers or the kernel cache. On the new system (or the
> dropped/recreated database), it would have all or mostly fit in memory,
> which would make things like count(*) run quickly.

I don't understand this argument --- the newer system actually has less memory than the old one; how could it fit there and not on the old one? Plus, how could dropping and recreating the database on the same machine change whether the entire dataset fits in memory??

The other part that puzzled me is that after running "select count(*) ..." several times (that particular table is *very* small --- just 200 thousand records of no more than 100 or 200 bytes each), the entire table *should* have been in memory ... Yet it would still take a few seconds (notice that there was a *considerable* improvement from the first run of that query to the second one on the old server --- from more than a minute to just above two seconds ... But still, on the new server, and after recreating the DB on the old one, it runs in *no time* the first time).

> My guess is that a vacuum full would've brought the other database
> back up to speed.

I'm furious now that the vacuum full didn't occur to me until *after* I had recreated the database and seen the problem disappear...

I wonder if I should then periodically run a vacuum full --- say, once a week? Once a month?

> In the future, you probably want to set fillfactor
> to a reasonable amount to account for updates-to-blocks-between-vacuum
> to try and capture as few row-migrations as possible.

Could you elaborate a bit on this? Or point me to the right places in the documentation to help me understand the above?? (I'm 100% blank after reading that paragraph.)

Thanks,

Carlos
--
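(Since the fillfactor question doesn't get answered later in the thread, a hedged illustration: in 8.2, fillfactor is a per-table storage parameter; the 80% figure below is only an illustrative value, not a recommendation from the thread.)

    -- reserve ~20% free space in each heap page so that updated row versions
    -- can land on the same page instead of extending the table between vacuums
    ALTER TABLE customer SET (fillfactor = 80);
    -- this only affects pages written afterwards; existing pages are repacked
    -- only by a rewrite such as CLUSTER or VACUUM FULL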
Carlos Moreno wrote:

>> , but my guess is that the total data size is about enough to fit in
>> shared_buffers or the kernel cache. On the new system (or the
>> dropped/recreated database), it would have all or mostly fit in memory,
>> which would make things like count(*) run quickly.
>
> I don't understand this argument --- the newer system actually has
> less memory than the old one; how could it fit there and not on the
> old one? Plus, how could dropping and recreating the database on the
> same machine change whether the entire dataset fits in memory??

Because on the older server it is bloated, while on the new one it is fresh and thus has no dead tuples.

> The other part that puzzled me is that after running "select count(*)
> ..." several times (that particular table is *very* small --- just
> 200 thousand records of no more than 100 or 200 bytes each), the
> entire table *should* have been in memory ... Yet it would still
> take a few seconds (notice that there was a *considerable*
> improvement from the first run of that query to the second one on the
> old server --- from more than a minute to just above two seconds ...
> But still, on the new server, and after recreating the DB on the old
> one, it runs in *no time* the first time).

Bloat can explain this as well.

>> My guess is that a vacuum full would've brought the other database
>> back up to speed.
>
> I'm furious now that the vacuum full didn't occur to me until *after*
> I had recreated the database and seen the problem disappear...
>
> I wonder if I should then periodically run a vacuum full --- say, once
> a week? Once a month?

Never. What you need to do is make sure your FSM settings (max_fsm_pages in particular) are high enough, and that you VACUUM (not full) frequently enough.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
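(A hedged postgresql.conf sketch of the "vacuum frequently enough" advice for 8.2, where autovacuum exists but is off by default and needs row-level statistics; the max_fsm_pages value is a placeholder to be sized from the VACUUM VERBOSE summary, not a figure from the thread.)

    # postgresql.conf (8.2) -- frequent plain VACUUMs instead of VACUUM FULL
    stats_start_collector = on   # required by autovacuum in 8.2
    stats_row_level = on         # required by autovacuum in 8.2
    autovacuum = on
    max_fsm_pages = 500000       # placeholder; size it from VACUUM VERBOSE output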
>> I don't understand this argument --- the newer system actually has
>> less memory than the old one; how could it fit there and not on the
>> old one? Plus, how could dropping and recreating the database on the
>> same machine change whether the entire dataset fits in memory??
>
> Because on the older server it is bloated, while on the new one it is
> fresh and thus has no dead tuples.

Wait a second --- am I correct in understanding then that the bloating you guys are referring to occurs *in memory*?? My mind has been operating under the assumption that bloating only occurs on disk, and never in memory --- is that where my logic is mistaken?

>> I wonder if I should then periodically run a vacuum full --- say, once
>> a week? Once a month?
>
> Never. What you need to do is make sure your FSM settings
> (max_fsm_pages in particular) are high enough, and that you VACUUM (not
> full) frequently enough.

Noted. Thanks!

Carlos
--
On 9/23/07, Carlos Moreno <moreno_pg@mochima.com> wrote:
> Wait a second --- am I correct in understanding then that the bloating
> you guys are referring to occurs *in memory*??

No, bloating occurs on-disk, but it does affect memory. Bloat means that even though your table data may take up 1G after the initial load, due to poor vacuuming, table layout, etc. it can grow to something more... say 2G.

The thing is, even though the table only stores 1G of data, it is now physically 2G. So anything that needs to read the entire table (like COUNT(*)), or large sections of it sequentially, performs twice as many I/Os to do so, which means you're actually waiting on two things: the extra I/O, and the additional CPU time spent reading blocks that have very little viable data in them.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/
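(One cheap way to watch for this kind of growth over time, as a sketch: pg_class keeps the page count and live-row estimate that VACUUM/ANALYZE refresh, so a table whose relpages keeps climbing while reltuples stays flat is accumulating dead space.)

    -- pages on disk vs. estimated live rows for the table from the thread
    SELECT relname, relpages, reltuples,
           pg_size_pretty(pg_relation_size('customer')) AS on_disk
    FROM pg_class
    WHERE relname = 'customer';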
Jonah H. Harris wrote:
> On 9/23/07, Carlos Moreno <moreno_pg@mochima.com> wrote:
>> Wait a second --- am I correct in understanding then that the bloating
>> you guys are referring to occurs *in memory*??
>
> No, bloating occurs on-disk, but it does affect memory. Bloat means
> that even though your table data may take up 1G after the initial
> load, due to poor vacuuming, table layout, etc. it can grow to
> something more... say 2G.
>
> The thing is, even though the table only stores 1G of data, it is now
> physically 2G. So anything that needs to read the entire table
> (like COUNT(*)), or large sections of it sequentially, performs
> twice as many I/Os to do so.

OK --- that was my initial impression... But then I'm still puzzled about why, *the second time* I run the query, it still takes a few seconds.

That is: the first time I run the query, it has to go through the disk; in the normal case it would have to read 100MB of data, but due to bloating, it actually has to go through 2GB of data. Ok, but then, it will load only the 100MB that are not "uncollected disk garbage" into memory. The next time I run the query, the server would only need to read 100MB from memory --- the result should be instantaneous...

The behaviour I observed was: the first run of the query took over one minute; the second, a little above two seconds. I tried four or five more times; in every instance it was around 2 seconds.

On the new server, *the first time* I run the query, it takes *no time* (I repeat: *no time* --- as in perhaps 10 to 100 msec; in any case, my eyes could not resolve the gap between the moment I hit enter and the moment I see the result with the count of rows --- that's between one and two orders of magnitude faster than with the old server --- and again, we're comparing *the first* time I execute the query on the new machine, in which case it is expected that it would have to read from disk, against the second and subsequent times I execute it on the old machine, in which case, since the bloating does not occur in memory, the entire seq. scan should occur exclusively in memory ...)

That's what still puzzles me --- Alvaro's reply seemed to explain it if I accept that the bloating affects memory (dead tuples loaded into memory reduce the capacity to hold the entire dataset in memory)...

Could someone shed some light and point out if there's still something I'm missing or some other mistake in my analysis?? Hope I'm not sounding like I'm being dense!!

Thanks,

Carlos
--
Carlos Moreno wrote:

> That is: the first time I run the query, it has to go through the
> disk; in the normal case it would have to read 100MB of data, but due
> to bloating, it actually has to go through 2GB of data. Ok, but
> then, it will load only the 100MB that are not "uncollected disk
> garbage" into memory. The next time I run the query, the server would
> only need to read 100MB from memory --- the result should be
> instantaneous...

Wrong. If there is 2GB of data, 1900MB of which is dead tuples, those pages would still have to be scanned for the count(*). The system does not distinguish "pages which have no live tuples" from other pages, so it has to load them all.

--
Alvaro Herrera               http://www.amazon.com/gp/registry/CTMLCN8V17R4
"[PostgreSQL] is a great group; in my opinion it is THE best open source
development communities in existence anywhere."                (Lamar Owen)
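(To put this point in numbers on the bloated copy, one could have compared the table's physical page count before and after a one-off VACUUM FULL; a sketch only, and per the advice above not something to schedule routinely.)

    SELECT relpages FROM pg_class WHERE relname = 'customer';  -- before
    VACUUM FULL VERBOSE customer;                              -- one-off repair
    SELECT relpages FROM pg_class WHERE relname = 'customer';  -- after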
Alvaro Herrera wrote:
> Carlos Moreno wrote:
>
>> That is: the first time I run the query, it has to go through the
>> disk; in the normal case it would have to read 100MB of data, but due
>> to bloating, it actually has to go through 2GB of data. Ok, but
>> then, it will load only the 100MB that are not "uncollected disk
>> garbage" into memory. The next time I run the query, the server would
>> only need to read 100MB from memory --- the result should be
>> instantaneous...
>
> Wrong. If there is 2GB of data, 1900MB of which is dead tuples, those
> pages would still have to be scanned for the count(*). The system does
> not distinguish "pages which have no live tuples" from other pages, so
> it has to load them all.

Yes, that part I understand --- I think I now know where the error is in my logic. I was thinking as follows: we read 2GB, of which 1900MB are dead tuples. But then, once they're read, the system will only keep in memory the 100MB that are valid tuples.

I'm now thinking that the problem with my logic is that the system does not keep anything in memory (or not all tuples, in any case), since it is only counting, so it does not *have to* keep them; and since the total amount read from disk exceeds the amount of physical memory, the valid tuples are "pushed out" of memory.

So, the second time I execute the query, it will still need to scan the disk (in my mind, the way I was seeing it, the second time I execute the "select count(*) from customer", the entire customer table would be in memory from the previous run, and that's why I was thinking that the bloating would not explain why the second time it is still slow).

Am I understanding it right?

Thanks for your patience!

Carlos
--
On 9/23/07, Carlos Moreno <moreno_pg@mochima.com> wrote:
> Yes, that part I understand --- I think I now know where the error is
> in my logic. I was thinking as follows: we read 2GB, of which 1900MB
> are dead tuples. But then, once they're read, the system will only
> keep in memory the 100MB that are valid tuples.

Yes, this is wrong.

> I'm now thinking that the problem with my logic is that the system does
> not keep anything in memory (or not all tuples, in any case), since it
> is only counting, so it does not *have to* keep them; and since the
> total amount read from disk exceeds the amount of physical memory,
> the valid tuples are "pushed out" of memory.

Yes, it does keep some of it in memory, but not all of it.

> So, the second time I execute the query, it will still need to scan the
> disk (in my mind, the way I was seeing it, the second time I execute
> the "select count(*) from customer", the entire customer table would be
> in memory from the previous run, and that's why I was thinking that
> the bloating would not explain why the second time it is still slow).

Yes, it is still performing additional I/O and additional CPU work to read the bloated data.

> Am I understanding it right?

Now, I think so.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/
"Carlos Moreno" <moreno_pg@mochima.com> writes: > I'm now thinking that the problem with my logic is that the system does > not keep anything in memory (or not all tuples, in any case), since it > is only counting, so it does not *have to* keep them That's really not how it works. When Postgres talks to the OS they're just bits. There's no cache of rows or values or anything higher level than bits. Neither the OS's filesystem cache nor the Postgres shared memory knows the difference between live or dead rows or even pages that don't contain any rows. > and since the total amount of reading from the disk exceeds the amount of > physical memory, then the valid tuples are "pushed out" of memory. That's right. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com