Thread: Re: Huge number of disk writes after migration to 8.1
Marcin <migor@op.pl> writes:
> Attached you'll find the archive with the data you asked for.

Well, breaking down the contents of the stats file I find:

	190 backends     (218880 bytes)
	118 databases    (8496 bytes)
	81952 tables     (8523008 bytes)

And the ps listing shows there actually are about 190 live backends, so
the idea about leaked backend stats entries is clearly wrong.  The
per-table data is 104 bytes/table, up from 80 in 8.0, so that's a
noticeable increase but hardly a killer.

What I have to conclude is that 8.1 is tracking stats for a lot more
tables than 8.0 was.  (Do you happen to have a number for the size of
the stats file in the 8.0 installation?  I don't need to see its
contents, I just want to know how big it is.)

Do you actually have 81952 tables in the installation?  Maybe we are
looking at a table-level leak.  Do you have autovacuum on where it was
not in 8.0?  Maybe that's causing it.

			regards, tom lane
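The breakdown above can be sanity-checked with a little arithmetic; this is my own check, not something from the thread:

```python
# Sanity check (mine, not from the thread) of the stats-file breakdown
# quoted above: derive the implied per-entry sizes from the byte counts.

backends, backend_bytes = 190, 218880
databases, database_bytes = 118, 8496
tables, table_bytes = 81952, 8523008

assert backend_bytes // backends == 1152     # bytes per backend entry
assert database_bytes // databases == 72     # bytes per database entry
assert table_bytes // tables == 104          # 104 bytes/table, as stated

# The table entries dominate the file: over 97% of the bytes counted here.
total = backend_bytes + database_bytes + table_bytes
assert table_bytes / total > 0.97
```

This makes it easy to see why the table-entry count, not the backend count, is the thing to chase.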
Marcin <migor@op.pl> writes:
> Autovacuum is off.
> The VACUUM FULL bottom line is:
> INFO:  free space map contains 59842 pages in 25371 relations
> DETAIL:  A total of 426720 page slots are in use (including overhead).
> 426720 page slots are required to track all free space.
> A quick iteration over all the DBs with psql -t -c "\dt" shows
> 67654 rows.
> So there's not 81952, BUT, the tables are created (and dropped) quite
> often during the work hours (and they're regular, not TEMPORARY tables).
> I also found out that there were 11170 tables created (and most of them
> dropped) today.

Looking at the code, stats entries for dropped tables are cleaned out
only when a VACUUM command is done; which is something we probably ought
to change.  However, that's always been the case, so I don't understand
why your stats file is so much bigger in 8.1.  Have you changed your
vacuuming strategy at all since the 8.0 installation?  Perhaps row or
block statistics weren't enabled in the 8.0 installation and are now?

			regards, tom lane
Tom Lane wrote:
> Marcin <migor@op.pl> writes:
>> So there's not 81952, BUT, the tables are created (and dropped) quite
>> often during the work hours (and they're regular, not TEMPORARY tables).
>> I also found out that there were 11170 tables created (and most of them
>> dropped) today.
>
> Looking at the code, stats entries for dropped tables are cleaned out
> only when a VACUUM command is done; which is something we probably ought
> to change.

I was going to ask if you were confusing pgstat_vacuum_tabstat with a
VACUUM command, when I noticed that only in vacuum() is that function
called!  This surprised me, and I agree that it should be changed.  I'm
not sure what would be a more appropriate place to call it, however.
Maybe we could place a call in heap_drop_with_catalog(), or a limited
version that would only "vacuum" the table being dropped.

-- 
Alvaro Herrera                http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today"  (Mary Gardiner)
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Looking at the code, stats entries for dropped tables are cleaned out
>> only when a VACUUM command is done; which is something we probably ought
>> to change.

> I was going to ask if you were confusing pgstat_vacuum_tabstat with a
> VACUUM command, when I noticed that only in vacuum() is that function
> called!  This surprised me and I agree that it should be changed.  I'm
> not sure what would be a more appropriate place to call it, however.

I don't have a problem with that.  What I'm thinking is that a DROP
TABLE command should issue a tabpurge message for the specific table
(or index) being zapped.  We still need vacuum_tabstat as a backstop in
case the tabpurge message gets lost, though.

Another thought is that in autovacuum, pgstat_vacuum_tabstat is really
called too often: once per autovac cycle would be sufficient, but
instead it's repeated for each table we vacuum.

			regards, tom lane
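The proposal above can be sketched as a toy simulation (this is illustrative Python, not PostgreSQL source; the class and method names are made up): DROP TABLE sends an immediate tabpurge for the dropped relation, while a periodic pgstat_vacuum_tabstat-style sweep remains as a backstop in case that message is lost.

```python
# Toy model of the stats collector's per-table hash and the two cleanup
# paths being discussed.  Names are hypothetical, not real pgstat APIs.

class StatsCollector:
    def __init__(self):
        self.table_stats = {}          # oid -> counters, like the stats hash

    def recv_tabstat(self, oid, n_tuples):
        self.table_stats[oid] = n_tuples

    def recv_tabpurge(self, oid):
        # Proposed path: drop the entry as soon as the table is dropped,
        # instead of waiting for the next VACUUM command.
        self.table_stats.pop(oid, None)

    def sweep(self, live_oids):
        # Backstop sweep (what pgstat_vacuum_tabstat does during VACUUM):
        # discard entries for relations no longer present in the catalogs.
        self.table_stats = {oid: n for oid, n in self.table_stats.items()
                            if oid in live_oids}

collector = StatsCollector()
collector.recv_tabstat(16384, 100)
collector.recv_tabstat(16385, 200)

collector.recv_tabpurge(16384)         # DROP TABLE purges immediately
assert 16384 not in collector.table_stats

collector.recv_tabstat(16390, 5)       # entry whose tabpurge got lost
collector.sweep(live_oids={16385})     # backstop sweep cleans it up
assert set(collector.table_stats) == {16385}
```

With only the sweep (the 8.1 behavior), entries for the thousands of tables created and dropped each day would linger until the next VACUUM, which is exactly the bloat Marcin is seeing.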
On Tue, Jan 17, 2006 at 05:12:09PM -0500, Tom Lane wrote:
> However, that's always been the case, so I don't understand
> why your stats file is so much bigger in 8.1.  Have you changed your
> vacuuming strategy at all since the 8.0 installation?  Perhaps row or
> block statistics weren't enabled in the 8.0 installation and are now?

The only change was the upgrade to 8.1.  I run VACUUM FULL every day,
and double-checked that it finished successfully in the last two days.

The block and row statistics are disabled:

stats_start_collector = on
stats_command_string = off
stats_block_level = off
stats_row_level = off
stats_reset_on_server_start = off

I have no idea why the pgstat.stat file is so large.  I just shut down
the idle backend connections, so there are only 30 left (instead of
200), and the pgstat file size decreased by 200 KB, from 8.8 to 8.6 MB.
The lowest size of the file in the last six hours was 8.5 MB.

-- 
Marcin
Marcin <migor@op.pl> writes:
> On Tue, Jan 17, 2006 at 05:12:09PM -0500, Tom Lane wrote:
>> However, that's always been the case, so I don't understand
>> why your stats file is so much bigger in 8.1.  Have you changed your
>> vacuuming strategy at all since the 8.0 installation?  Perhaps row or
>> block statistics weren't enabled in the 8.0 installation and are now?

> The only change was the upgrade to 8.1.  I run VACUUM FULL every day,
> and double-checked that it finished successfully in the last two days.
> The block and row statistics are disabled:
> stats_start_collector = on
> stats_command_string = off
> stats_block_level = off
> stats_row_level = off
> stats_reset_on_server_start = off

Stats off and it's still bloating the file??

[ studies code... ]  I see the culprit: it's the pgstat_report_vacuum
and pgstat_report_analyze routines that were added in 8.1.  Those send
messages unconditionally, meaning that the collector will create table
entries for every table during a database-wide vacuum, even with stats
turned off.

This seems like a bad idea.  Given the nature of what's counted, I
think that treating these messages as "row level" stats would be
appropriate.  Alvaro, what do you think?

			regards, tom lane
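The fix being proposed can be sketched as follows (a toy Python simulation, not PostgreSQL source; the class and field names are invented): treat the vacuum/analyze reports as "row level" stats, i.e. send them only when stats_row_level is on, so a database-wide VACUUM with stats disabled no longer creates a collector entry for every relation.

```python
# Toy model: a backend that gates its vacuum report on the row-level
# stats setting, and a collector represented as a plain dict.

class Backend:
    def __init__(self, collector, stats_row_level):
        self.collector = collector
        self.stats_row_level = stats_row_level

    def report_vacuum(self, oid, live_tuples):
        if not self.stats_row_level:
            return                     # the 8.1 bug: this check was missing
        self.collector.setdefault(oid, {})["n_live_tuples"] = live_tuples

collector = {}

# With row-level stats off, a database-wide vacuum reports nothing,
# so the stats file does not gain one entry per table.
off = Backend(collector, stats_row_level=False)
for oid in range(16384, 16484):        # "every table in the database"
    off.report_vacuum(oid, 1000)
assert collector == {}

# With row-level stats on, the report is stored as before.
on = Backend(collector, stats_row_level=True)
on.report_vacuum(16384, 1000)
assert collector[16384]["n_live_tuples"] == 1000
```

This matches Marcin's symptoms: stats were off, yet the daily VACUUM FULL repopulated the file with an entry per table.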
Tom Lane wrote:
> Stats off and it's still bloating the file??
>
> [ studies code... ]  I see the culprit: it's the pgstat_report_vacuum
> and pgstat_report_analyze routines that were added in 8.1.  Those send
> messages unconditionally, meaning that the collector will create table
> entries for every table during a database-wide vacuum, even with stats
> turned off.
>
> This seems like a bad idea.

Sorry, clearly my bug :-(

> Given the nature of what's counted, I think that treating these
> messages as "row level" stats would be appropriate.  Alvaro, what do
> you think?

Yeah, row level seems appropriate for what we use it for.  I'll take
care of it, unless you want to do it.

-- 
Alvaro Herrera                           http://www.PlanetPostgreSQL.org
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
       more or less, right?
<crab> i.e., "deadly poison"
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Yeah, row level seems appropriate for what we use it for.  I'll take
> care of it, unless you want to do it.

I'll fix it --- I want to put in an immediate tabpurge on drop, too.

			regards, tom lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> Given the nature of what's counted, I think that treating these
>> messages as "row level" stats would be appropriate.  Alvaro, what do
>> you think?

> Yeah, row level seems appropriate for what we use it for.  I'll take
> care of it, unless you want to do it.

Actually, there's another problem here: if we do have row-level stats
turned on, a manual "VACUUM" command will still cause the set of tables
listed in the stats file to grow to include every table in the DB,
whether or not anything interesting is happening to that table.  I
think this is probably undesirable.  I'm tempted to change
pgstat_recv_vacuum and pgstat_recv_analyze so that they will not create
new hash entries, but only insert the row count if the hash entry
already exists.  I am a bit worried that I might be missing something
about possible interactions with autovacuum, though.  I see that
autovac skips vacuuming tables that have no hash entry ... is there
some circular reasoning going on there?

			regards, tom lane
Tom Lane wrote:
> Actually, there's another problem here: if we do have row-level stats
> turned on, a manual "VACUUM" command will still cause the set of tables
> listed in the stats file to grow to include every table in the DB,
> whether or not anything interesting is happening to that table.  I think
> this is probably undesirable.  I'm tempted to change pgstat_recv_vacuum
> and pgstat_recv_analyze so that they will not create new hash entries,
> but only insert the row count if the hash entry already exists.  I am a
> bit worried that I might be missing something about possible
> interactions with autovacuum though.  I see that autovac skips vacuuming
> tables that have no hash entry ... is there some circular reasoning
> going on there?

The idea was that autovac should skip tables for which it doesn't have
info, because it can't decide and we chose to err on the side of
caution.  However, after a vacuum or analyze we do have info about the
table, which is what we store in the pgstat_recv functions
unconditionally.  Thus the next autovacuum is able to make an informed
decision about this table.

The principles are: 1) store as much information as possible,
2) autovacuum should act iff it has information.

I don't think this is a bug, but I'm not dead set on it if you want to
change this behavior.

-- 
Alvaro Herrera                http://www.amazon.com/gp/registry/DXLWNGRJD34J
"The first of April is the day we remember what we are
the other 364 days of the year"  (Mark Twain)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> I'm tempted to change pgstat_recv_vacuum
>> and pgstat_recv_analyze so that they will not create new hash entries,
>> but only insert the row count if the hash entry already exists.  I am a
>> bit worried that I might be missing something about possible
>> interactions with autovacuum though.  I see that autovac skips vacuuming
>> tables that have no hash entry ... is there some circular reasoning
>> going on there?

> The idea was that autovac should skip tables for which it doesn't have
> info, because it can't decide and we chose to err on the side of
> caution.  However, after a vacuum or analyze we do have info about the
> table, which is what we store in the pgstat_recv functions
> unconditionally.  Thus the next autovacuum is able to make an informed
> decision about this table.

However, if it skips the table for lack of info, then there still won't
be any info next time.

More to the point, that's not why we skip entry-less tables.  We skip
entry-less tables because there is no need for autovac to do anything to
a table that's not being modified, and if it *is* being modified then
those operations will result in creation of a table entry.  So I'm still
not seeing why vacuum or analyze should force creation of an entry.

			regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> The idea was that autovac should skip tables for which it doesn't have
>> info, because it can't decide and we chose to err on the side of
>> caution.  However, after a vacuum or analyze we do have info about the
>> table, which is what we store in the pgstat_recv functions
>> unconditionally.  Thus the next autovacuum is able to make an informed
>> decision about this table.
>
> However, if it skips the table for lack of info, then there still won't
> be any info next time.

Of course, because there's nowhere to take it from.

> More to the point, that's not why we skip entry-less tables.  We skip
> entry-less tables because there is no need for autovac to do anything to
> a table that's not being modified, and if it *is* being modified then
> those operations will result in creation of a table entry.  So I'm still
> not seeing why vacuum or analyze should force creation of an entry.

Well, if you issue a manual vacuum then you can argue that the table
_has_ been "modified" (it's seeing some activity).  But immediately
after a vacuum the table will not be vacuumed by autovac if there's no
other activity, because there's no need for it, so we can create the
entry anyway and it won't make an immediate difference.  However,
having the pgstat table entry _will_ make a difference as soon as that
entry is modified by other activity, because the number of tuples
pgstat knows about will be more accurate.

So maybe we could suppress the entry creation on analyze (because
analyze _can_ force a vacuum during the next autovac iteration, which
is something we may want to avoid), but keep it on vacuum.

-- 
Alvaro Herrera                                        Valdivia, Chile
ICBM: S 39º 49' 17.7", W 73º 14' 26.8"
"The West won the world not by the superiority of its ideas or values
or religion but rather by its superiority in applying organized
violence.  Westerners often forget this fact, non-Westerners never do."
(Samuel P. Huntington)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Well, if you issue a manual vacuum then you can argue that it has been
> "modified" (it's seeing some activity).  But immediately after a vacuum
> the table will not be vacuumed by autovac if there's no other activity,
> because there's no need for it, so we can create the entry anyway and it
> won't make an immediate difference.

You're ignoring the point of this thread, which is that creating
hashtable entries for tables that aren't actively being modified causes
significant ongoing overhead that we ought to try to minimize.

What I'm suggesting is that only foreground modification activity
(insert/update/delete) should trigger creation of a hashtable entry.
Once that has happened, it's reasonable to update the entry after any
vacuum or analyze.  But a vacuum or analyze shouldn't force an entry to
spring into existence when nothing else is happening to the table.
That just bloats the stats file.

Given the current autovac logic, ordinary autovac operations won't
cause useless entries to be created anyway ... but an anti-wraparound
vacuum will, and so will manual vacuum or analyze commands.

			regards, tom lane
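The policy Tom is proposing can be sketched like this (a toy Python simulation, not PostgreSQL source; the function and field names are invented): only foreground DML creates a hash entry, while vacuum/analyze reports update an entry only if it already exists, the equivalent of looking it up with HASH_FIND instead of HASH_ENTER.

```python
# Toy model of the collector's per-table hash under the proposed policy.

table_stats = {}

def recv_tabstat(oid, n_modified):
    # Foreground insert/update/delete traffic may create the entry.
    entry = table_stats.setdefault(oid, {"changes": 0, "n_live_tuples": None})
    entry["changes"] += n_modified

def recv_vacuum(oid, n_live_tuples):
    # Vacuum/analyze only refreshes an existing entry; it never creates one.
    entry = table_stats.get(oid)
    if entry is not None:
        entry["n_live_tuples"] = n_live_tuples
        entry["changes"] = 0

# A database-wide VACUUM over untouched tables creates no entries ...
for oid in range(16384, 16484):
    recv_vacuum(oid, 1000)
assert table_stats == {}

# ... but a table with real write activity gets one, which the next
# vacuum then updates with an accurate tuple count.
recv_tabstat(16400, 50)
recv_vacuum(16400, 950)
assert table_stats[16400] == {"changes": 0, "n_live_tuples": 950}
```

Under this scheme autovacuum's "skip tables with no entry" rule stays sound: a missing entry means no foreground activity, so there is nothing for autovac to do.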
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> Well, if you issue a manual vacuum then you can argue that it has been
>> "modified" (it's seeing some activity).  But immediately after a vacuum
>> the table will not be vacuumed by autovac if there's no other activity,
>> because there's no need for it, so we can create the entry anyway and it
>> won't make an immediate difference.
>
> You're ignoring the point of this thread, which is that creating
> hashtable entries for tables that aren't actively being modified
> causes significant ongoing overhead that we ought to try to minimize.

True.  I agree with the rest of the reasoning in this case.

Maybe the fact that the stats file is completely rewritten every 500 ms
should also be reconsidered, if in the future someone chooses to
rewrite the stats system.  We can reconsider this part then, as well.

-- 
Alvaro Herrera                            http://www.PlanetPostgreSQL.org
Jason Tesser: You might not have understood me or I am not understanding you.
Paul Thomas: It feels like we're 2 people divided by a common language...
Tom Lane wrote:
> [ studies code... ]  I see the culprit: it's the pgstat_report_vacuum
> and pgstat_report_analyze routines that were added in 8.1.  Those send
> messages unconditionally, meaning that the collector will create table
> entries for every table during a database-wide vacuum, even with stats
> turned off.

So in my case disabling the daily vacuum would, in fact, improve the
write performance ;) ... just joking.

I think I'll turn off the stats completely and wait patiently for a
fix.  Do you think it will be included in one of the next 8.1.x
releases, or do I have to wait for 8.2?

Thanks again for the great help,

-- 
Marcin
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Maybe the fact that the stats file is completely rewritten every 500 ms
> should also be reconsidered, if in the future someone chooses to
> rewrite the stats system.  We can reconsider this part then, as well.

Yeah, it's becoming pretty obvious that that design does not scale very
well.  I don't immediately have any ideas about a better way, though.
I am working on some marginal hacks, like not writing more of the
backend activity strings than is needed, but it'd be nicer to think of
a different solution.

			regards, tom lane
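The "marginal hack" mentioned above can be illustrated with a small sketch (mine, not PostgreSQL source; buffer size and field layout are invented for the example): instead of dumping each backend's fixed-size activity buffer in full on every rewrite cycle, write a length header plus only the bytes actually in use.

```python
# Toy comparison of writing a fixed-size activity buffer vs. only its
# used (NUL-terminated) portion, as a length-prefixed record.

import struct

ACTIVITY_BUF_SIZE = 1024               # hypothetical fixed buffer size

def write_activity_fixed(buf: bytes) -> bytes:
    # Old approach: the whole buffer goes to disk every rewrite cycle.
    return buf.ljust(ACTIVITY_BUF_SIZE, b"\x00")

def write_activity_trimmed(buf: bytes) -> bytes:
    # Hack: a 4-byte length header plus only the bytes up to the NUL.
    used = buf.split(b"\x00", 1)[0]
    return struct.pack("<I", len(used)) + used

query = b"SELECT 1;\x00"
assert len(write_activity_fixed(query)) == 1024
assert len(write_activity_trimmed(query)) == 4 + len(b"SELECT 1;")
```

For a couple of hundred mostly-idle backends rewritten twice a second, trimming short activity strings saves a large fraction of the write volume, though it leaves the every-500-ms full rewrite itself untouched.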
Marcin <migor@op.pl> writes:
> Do you think it will be included in one of the next 8.1.x releases, or
> do I have to wait for 8.2?

I'm working on a fix for 8.1.3.

			regards, tom lane
Greetings,

pg_restore doesn't handle errors very well.  While attempting to
upgrade a database from 8.0 to 8.1, there were some errors creating
some tables because 8.0 had PostGIS installed (along with some tables)
and 8.1 did not.  This also meant that attempts to load the tables
which had failed to be created didn't work.

The bad part about this is that instead of skipping to the end of the
data from the COPY command, pg_restore seems to have thought that the
contents of the table were an SQL command; it read in the whole table
and, I think, then got very confused.  Here's the process list from
this:

postgres  6692  2.7  0.7   78380   48524 pts/3 S+ Jan19  32:26 /usr/lib/postgresql/8.1/bin/pg_dump -h /var/run/postgresql -p 5432 -Fc tsf
postgres  6693  1.2 51.7 3198716 3169240 pts/3 S+ Jan19  15:06 /usr/lib/postgresql/8.1/bin/pg_restore -h /var/run/postgresql -p 5433 -d template1 -C
postgres  6694 10.8  0.9   96780   59316 ?     S  Jan19 126:40 postgres: postgres tsf [local] COPY
postgres  6782  3.4  0.3   52364   24092 ?     S  Jan19  40:12 postgres: postgres tsf [local] idle

As you can see, pg_restore is using up 3G or so, which seems likely to
be the size of one of the PostGIS-based tables.  The 8.0 COPY seems to
think it's still going (perhaps it's blocking because pg_restore is
trying to do something, but it's not actually using any CPU time),
while the 8.1 connection is idle (waiting for pg_restore to give it
something to do).

It seems like pg_restore really should be able to handle COPY errors
correctly by skipping to the end of the COPY data segment when the
initial COPY command comes back as an error.
If it's not too invasive of a fix, I really think it should be included
in the next point release of 8.1.

Thanks!

	Stephen
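The behavior Stephen asks for can be sketched like this (my own illustration, not pg_restore code; the function names and archive format here are simplified stand-ins): when the initial COPY command fails, consume and discard the data segment up to the "\." end-of-copy marker instead of feeding the table contents to the server as SQL.

```python
# Toy restore loop showing error recovery for a failed COPY: on error,
# skip forward to the end-of-copy terminator so the reader stays in sync.

def restore_copy(lines, send_command, copy_failed):
    """lines: iterator over archive lines, positioned at a COPY command."""
    copy_cmd = next(lines)
    send_command(copy_cmd)
    if copy_failed(copy_cmd):
        # Error path: discard data rows until the end-of-copy marker.
        for line in lines:
            if line.rstrip("\n") == "\\.":
                break
        return "skipped"
    # Success path would stream the data rows to the server instead.
    return "copied"

archive = iter([
    "COPY broken_postgis_table FROM stdin;",
    "1\tPOINT(0 0)",
    "2\tPOINT(1 1)",
    "\\.",
    "COPY next_table FROM stdin;",
])
result = restore_copy(archive, send_command=lambda c: None,
                      copy_failed=lambda c: True)
assert result == "skipped"
assert next(archive) == "COPY next_table FROM stdin;"  # reader is resynced
```

Without the skip, the restore loop would interpret gigabytes of row data as SQL commands, which matches the 3G pg_restore process in the listing above.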
Stephen Frost <sfrost@snowman.net> writes:
> It seems like pg_restore really should be able to handle COPY errors
> correctly by skipping to the end of the COPY data segment when the
> initial COPY command comes back as an error.

Send a patch ;-)

			regards, tom lane