Thread: Re: Schema-only dump dumps no constraints, no triggers
Hi,

On 28 July 2012 at 1:10, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> What were the deleted files?
> WAL, logs, other?
> What type of WAL replication are you doing?
> Streaming, log shipping, etc?
> What are your settings for the WAL replication?
> In particular wal_keep_segments?
> Is the WAL replication actually working?

At this time - a couple of days after the restart - the clog hasn't re-formed yet. Thus, I am unable to tell you what files they were; we didn't pay that much attention to it then - there were some WAL files, but I can't tell what the actual structure was. I'll provide this information whenever possible.

The WAL replication is streaming replication with a hot standby server. The servers have a direct connection to one another. Configuration is appended. The replication is working fine.

The primary server also has a single, mostly "idle" transaction from each client node - a trait of an "always online" client application. Immediately after a primary server restart this does not add much overhead, though. I can't tell whether this, in the long run, might cause the problem.

Configuration follows:
- primary server postgresql.conf - WAL section
- standby server postgresql.conf - WAL section
- standby server recovery.conf

The primary server WAL configuration is:

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

wal_level = hot_standby                 # minimal, archive, or hot_standby
                                        # (change requires restart)
#fsync = on                             # turns forced synchronization on or off
#synchronous_commit = on                # immediate fsync at commit
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync (default on Linux)
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
#full_page_writes = on                  # recover from partial page writes
wal_buffers = 1MB                       # min 32kB
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds
#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

# - Checkpoints -

#checkpoint_segments = 30               # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 disables

# - Archiving -

archive_mode = on                       # allows archiving to be done
                                        # (change requires restart)
archive_command = 'rsync %p <CUT>/%f'   # command to use to archive a logfile segment
archive_timeout = 0                     # force a logfile segment switch after this
                                        # number of seconds; 0 disables

# - Streaming Replication -

max_wal_senders = 5                     # max number of walsender processes
                                        # (change requires restart)
#wal_sender_delay = 200ms               # walsender cycle time, 1-10000 milliseconds
wal_keep_segments = 32                  # in logfile segments, 16MB each; 0 disables
#vacuum_defer_cleanup_age = 0           # number of xacts by which cleanup is delayed

# - Standby Servers -

#hot_standby = off                      # "on" allows queries during recovery
                                        # (change requires restart)
#max_standby_archive_delay = 30s        # max delay before canceling queries
                                        # when reading WAL from archive;
                                        # -1 allows indefinite delay
#max_standby_streaming_delay = 30s      # max delay before canceling queries
                                        # when reading streaming WAL;
                                        # -1 allows indefinite delay

The standby server WAL configuration is:

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

#wal_level = minimal                    # minimal, archive, or hot_standby
                                        # (change requires restart)
#fsync = on                             # turns forced synchronization on or off
#synchronous_commit = on                # immediate fsync at commit
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync (default on Linux)
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
#full_page_writes = on                  # recover from partial page writes
#wal_buffers = 64kB                     # min 32kB
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds
#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3                # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 disables

# - Archiving -

#archive_mode = on                      # allows archiving to be done
                                        # (change requires restart)
#archive_command = 'cp %p /backup/repl/%f'  # command to use to archive a logfile segment
#archive_timeout = 0                    # force a logfile segment switch after this
                                        # number of seconds; 0 disables

# - Streaming Replication -

#max_wal_senders = 5                    # max number of walsender processes
                                        # (change requires restart)
#wal_sender_delay = 200ms               # walsender cycle time, 1-10000 milliseconds
#wal_keep_segments = 32                 # in logfile segments, 16MB each; 0 disables
#vacuum_defer_cleanup_age = 0           # number of xacts by which cleanup is delayed

# - Standby Servers -

hot_standby = on                        # "on" allows queries during recovery
                                        # (change requires restart)
#max_standby_archive_delay = 30s        # max delay before canceling queries
                                        # when reading WAL from archive;
                                        # -1 allows indefinite delay
#max_standby_streaming_delay = 30s      # max delay before canceling queries
                                        # when reading streaming WAL;
                                        # -1 allows indefinite delay

The standby server recovery.conf file:

standby_mode = 'on'
primary_conninfo = 'host=<CUT> port=<CUT> user=<CUT>'
trigger_file = '<CUT>/repl_trigger'
restore_command = 'rsync <CUT>/%f "%p"'
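For reference, a quick way to confirm that streaming replication is keeping up on 9.1 (the view and functions below are standard in that release; run the first query on the primary and the second on the standby) is to compare WAL positions:

-- On the primary: one row per connected standby.
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;

-- On the standby: received and replayed WAL positions, and apparent replay lag.
SELECT pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location(),
       now() - pg_last_xact_replay_timestamp() AS replay_lag;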
Re: Clogging problem (was: Schema-only dump dumps no constraints, no triggers)
From: Marek Kielar
Date:
Hi,

To complement the information from the previous message:

On 29 July 2012 at 12:29, Marek Kielar <mkielar@go2.pl> wrote:
> Hi,
>
> On 28 July 2012 at 1:10, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> What were the deleted files?
>> WAL, logs, other?
>
> At this time - a couple of days after the restart - the clog hasn't re-formed yet. Thus, I am unable to tell you what files they were; we didn't pay that much attention to it then - there were some WAL files, but I can't tell what the actual structure was. I'll provide this information whenever possible.

The clog has somewhat re-formed - the full listing of lsof (filtered for unique files) for the postmaster(s) on the database mount is here:
http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html

Consecutive commands were issued in a matter of minutes and differ slightly.

Some totals / aggregates:
df   - /data                    83 141 382 144
du   - /data                    29 170 365 801
lsof - /data                    75 348 037 632
lsof - /data/base               74 975 969 280
lsof - /data/base (deleted)     53 769 936 896
lsof - /data/pg_xlog               369 098 752
lsof - /data/pg_xlog (deleted)     201 326 592
lsof - /data/global                  2 965 504

It is clear that the server processes are keeping most of the files from being actually deleted.
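For comparison with the du figure, the space PostgreSQL itself accounts for can be listed from the server side; this is only a rough cross-check, since it covers the databases but not pg_xlog or other cluster files:

-- Per-database on-disk size as reported by the server.
SELECT datname, pg_size_pretty(pg_database_size(oid)) AS size
FROM pg_database
ORDER BY pg_database_size(oid) DESC;

-- Cluster-wide total of the databases (the cast is needed on 9.1,
-- where pg_size_pretty() only accepts bigint).
SELECT pg_size_pretty(sum(pg_database_size(oid))::bigint) AS total
FROM pg_database;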
On 08/06/2012 05:08 AM, Marek Kielar wrote:
> Hi,
>
> To complement the information from the previous message:
>
> On 29 July 2012 at 12:29, Marek Kielar <mkielar@go2.pl> wrote:
>> Hi,
>>
>> On 28 July 2012 at 1:10, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>>> What were the deleted files?
>>> WAL, logs, other?
>>
>> At this time - a couple of days after the restart - the clog hasn't re-formed yet. Thus, I am unable to tell you what files they were; we didn't pay that much attention to it then - there were some WAL files, but I can't tell what the actual structure was. I'll provide this information whenever possible.
>
> The clog has somewhat re-formed - the full listing of lsof (filtered for unique files) for the postmaster(s) on the database mount is here:
> http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html

FYI, you might want to consider using some other site for uploads. The above is sort of scary and leads you down all sorts of false paths.

> Consecutive commands were issued in a matter of minutes and differ slightly.
>
> Some totals / aggregates:
> df   - /data                    83 141 382 144
> du   - /data                    29 170 365 801
> lsof - /data                    75 348 037 632
> lsof - /data/base               74 975 969 280
> lsof - /data/base (deleted)     53 769 936 896
> lsof - /data/pg_xlog               369 098 752
> lsof - /data/pg_xlog (deleted)     201 326 592
> lsof - /data/global                  2 965 504
>
> It is clear that the server processes are keeping most of the files from being actually deleted.

Well, the nature of database data files is that they expand and/or contract as needed. Unless you get rid of the actual objects they refer to, they will not be deleted. The WAL files in pg_xlog are a different matter, but in the listing you sent they seem reasonable. There are a couple of things off the top of my head that can cause data files to expand unnecessarily:
1) Autovacuum is not aggressive enough.
2) There are open transactions keeping old tuples from being removed.

From previous posts, you mentioned a 'permanent' connection to the database. Are you sure it is not holding an open transaction? The pg_locks view would be a good place to start:
http://www.postgresql.org/docs/9.1/interactive/view-pg-locks.html

--
Adrian Klaver
adrian.klaver@gmail.com
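For example, on 9.1 a session holding a transaction open shows up in pg_stat_activity with a non-null xact_start, and an idle one shows '<IDLE> in transaction' as its current_query (these are the 9.1 column names; procpid and current_query were renamed in 9.2):

-- Sessions ordered by how long their current transaction has been open.
SELECT procpid, usename, xact_start,
       now() - xact_start AS xact_age,
       current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;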
On 6 August 2012 at 17:00, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> The clog has somewhat re-formed - the full listing of lsof (filtered for unique files) for the postmaster(s) on the database mount is here:
>> http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html
>
> FYI, you might want to consider using some other site for uploads. The above
> is sort of scary and leads you down all sorts of false paths.

Sorry about that. It's the first time I have used a hosting service, and I picked one that didn't require creating an account (from a list on Wikipedia). I guess using NoScript spoils one by saving one from trouble, but it also makes one come to wrong assumptions. I'll try to choose better next time.

>> Consecutive commands were issued in a matter of minutes and differ slightly.
>>
>> Some totals / aggregates:
>> df   - /data                    83 141 382 144
>> du   - /data                    29 170 365 801
>> lsof - /data                    75 348 037 632
>> lsof - /data/base               74 975 969 280
>> lsof - /data/base (deleted)     53 769 936 896
>> lsof - /data/pg_xlog               369 098 752
>> lsof - /data/pg_xlog (deleted)     201 326 592
>> lsof - /data/global                  2 965 504
>>
>> It is clear that the server processes are keeping most of the files from being actually deleted.
>
> Well, the nature of database data files is that they expand and/or contract as
> needed. Unless you get rid of the actual objects they refer to,
> they will not be deleted. The WAL files in pg_xlog are a different
> matter, but in the listing you sent they seem reasonable. There
> are a couple of things off the top of my head that can cause data files
> to expand unnecessarily:
> 1) Autovacuum is not aggressive enough.
> 2) There are open transactions keeping old tuples from being removed.
>
> From previous posts, you mentioned a 'permanent' connection to the
> database. Are you sure it is not holding an open transaction?
> The pg_locks view would be a good place to start:
> http://www.postgresql.org/docs/9.1/interactive/view-pg-locks.html

1) Running, through pgAdmin3, an ordinary VACUUM FULL ANALYZE and REINDEX on all of our databases in the cluster, plus the "postgres" database, didn't release a substantial amount of disk space - it might add up to maybe a few percent of the overall.
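A rough way to judge whether autovacuum is keeping up, independent of a manual VACUUM FULL, is to look at the dead-tuple counters in the standard statistics views, for example:

-- Tables with the most dead tuples and when they were last vacuumed.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;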
2) It doesn't seem there are any long-running transactions, even though the PIDs do repeat for some time (but since the connections are kept open this seems reasonable):

postgres=# SELECT * FROM pg_locks ORDER BY pid;
locktype;database;relation;page;tuple;virtualxid;transactionid;classid;objid;objsubid;virtualtransaction;pid;mode;granted
relation;11874;10985;;;;;;;;85/101738;24367;AccessShareLock;t
virtualxid;;;;;85/101738;;;;;85/101738;24367;ExclusiveLock;t
virtualxid;;;;;20/788838;;;;;20/788838;24505;ExclusiveLock;t
virtualxid;;;;;14/923780;;;;;14/923780;24621;ExclusiveLock;t
virtualxid;;;;;76/139304;;;;;76/139304;24699;ExclusiveLock;t
virtualxid;;;;;55/199999;;;;;55/199999;24703;ExclusiveLock;t
virtualxid;;;;;59/363780;;;;;59/363780;24926;ExclusiveLock;t
(7 rows)

And after some time, with a different invocation of psql (to let go of the PID):

postgres=# SELECT * FROM pg_locks ORDER BY pid;
locktype;database;relation;page;tuple;virtualxid;transactionid;classid;objid;objsubid;virtualtransaction;pid;mode;granted
virtualxid;;;;;56/410614;;;;;56/410614;25105;ExclusiveLock;t
virtualxid;;;;;3/667499;;;;;3/667499;25145;ExclusiveLock;t
relation;11874;10985;;;;;;;;85/101817;25171;AccessShareLock;t
virtualxid;;;;;85/101817;;;;;85/101817;25171;ExclusiveLock;t
(4 rows)

We are again slowly approaching the point where a server restart will be needed. If / when this happens, I'll provide statistics again.

Best regards,
Marek Kielar
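To tie such lock rows back to sessions and see how long each backend's transaction has actually been open, pg_locks can be joined to pg_stat_activity, for example (again using the 9.1 column names):

-- For each backend holding a lock: transaction age, lock details and current query.
SELECT a.procpid,
       a.usename,
       now() - a.xact_start AS xact_age,
       l.locktype,
       l.mode,
       l.relation::regclass AS locked_relation,
       a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
ORDER BY a.xact_start;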