Thread: Re: Schema-only dump dumps no constraints, no triggers

Re: Schema-only dump dumps no constraints, no triggers

From: Marek Kielar
Hi,


On 28 July 2012 at 1:10, Adrian Klaver <adrian.klaver@gmail.com> wrote:

> What were the deleted files?
>    WAL, Logs, other?
> What type of WAL replication are you doing?
>    Streaming, log shipping, etc?
> What are your settings for the WAL replication?
>    In particular wal_keep_segments ?
> Is the WAL replication actually working?


At this time - a couple of days after the restart - the clog hasn't re-formed yet. Thus, I am unable to tell you what
the deleted files were; we didn't pay that much attention to them then. There were some WAL files, but I can't tell
what the actual structure was. I'll provide this information whenever possible.

The WAL replication is streaming replication with a hot standby server. The servers have a direct connection with one
another. The configuration is appended below.

The replication is working fine.
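
For reference, one way to confirm this is to query pg_stat_replication on the primary and compare WAL positions on the
standby - a minimal sketch using the standard 9.1 monitoring view and functions:

-- On the primary: one row per connected standby; state "streaming" with
-- replay_location close to sent_location means the standby is keeping up
SELECT application_name, state, sent_location, replay_location
  FROM pg_stat_replication;

-- On the standby: last WAL received/replayed and the approximate replay lag
SELECT pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location(),
       now() - pg_last_xact_replay_timestamp() AS replay_lag;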

The primary server also has a single, mostly "idle" transaction from each client node - a trait of an "always online"
client application. Immediately after a primary server restart this does not add much overhead, though. I can't tell
whether this, in the long run, might cause the problem.
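
If it matters, a rough way to check whether those connections are sitting "idle in transaction" (which is what would
hold cleanup back) is a query along these lines, sketched against the 9.1 pg_stat_activity columns:

-- Sessions with an open transaction that are not currently executing anything
SELECT procpid, usename, xact_start, now() - xact_start AS xact_age
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction'
 ORDER BY xact_start;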

Configuration follows:
 - primary server postgresql.conf - WAL section
 - standby server postgresql.conf - WAL section
 - standby server recovery.conf

The primary server WAL configuration is:
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

wal_level = hot_standby            # minimal, archive, or hot_standby
                    # (change requires restart)
#fsync = on                # turns forced synchronization on or off
#synchronous_commit = on        # immediate fsync at commit
#wal_sync_method = fsync        # the default is the first option
                    # supported by the operating system:
                    #   open_datasync
                    #   fdatasync (default on Linux)
                    #   fsync
                    #   fsync_writethrough
                    #   open_sync
#full_page_writes = on            # recover from partial page writes
wal_buffers = 1MB            # min 32kB
                    # (change requires restart)
#wal_writer_delay = 200ms        # 1-10000 milliseconds

#commit_delay = 0            # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

#checkpoint_segments = 30        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min        # range 30s-1h
#checkpoint_completion_target = 0.5    # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s        # 0 disables

# - Archiving -

archive_mode = on        # allows archiving to be done
                # (change requires restart)
archive_command = 'rsync %p <CUT>/%f'        # command to use to archive a logfile segment
archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables

# - Streaming Replication -

max_wal_senders = 5        # max number of walsender processes
                # (change requires restart)
#wal_sender_delay = 200ms    # walsender cycle time, 1-10000 milliseconds
wal_keep_segments = 32        # in logfile segments, 16MB each; 0 disables
#vacuum_defer_cleanup_age = 0    # number of xacts by which cleanup is delayed

# - Standby Servers -

#hot_standby = off            # "on" allows queries during recovery
                    # (change requires restart)
#max_standby_archive_delay = 30s    # max delay before canceling queries
                    # when reading WAL from archive;
                    # -1 allows indefinite delay
#max_standby_streaming_delay = 30s    # max delay before canceling queries
                    # when reading streaming WAL;
                    # -1 allows indefinite delay




The standby server WAL configuration is:
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

#wal_level = minimal            # minimal, archive, or hot_standby
                    # (change requires restart)
#fsync = on                # turns forced synchronization on or off
#synchronous_commit = on        # immediate fsync at commit
#wal_sync_method = fsync        # the default is the first option
                    # supported by the operating system:
                    #   open_datasync
                    #   fdatasync (default on Linux)
                    #   fsync
                    #   fsync_writethrough
                    #   open_sync
#full_page_writes = on            # recover from partial page writes
#wal_buffers = 64kB            # min 32kB
                    # (change requires restart)
#wal_writer_delay = 200ms        # 1-10000 milliseconds

#commit_delay = 0            # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min        # range 30s-1h
#checkpoint_completion_target = 0.5    # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s        # 0 disables

# - Archiving -

#archive_mode = on        # allows archiving to be done
                # (change requires restart)
#archive_command = 'cp %p /backup/repl/%f'        # command to use to archive a logfile segment
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables

# - Streaming Replication -

#max_wal_senders = 5        # max number of walsender processes
                # (change requires restart)
#wal_sender_delay = 200ms    # walsender cycle time, 1-10000 milliseconds
#wal_keep_segments = 32        # in logfile segments, 16MB each; 0 disables
#vacuum_defer_cleanup_age = 0    # number of xacts by which cleanup is delayed

# - Standby Servers -

hot_standby = on            # "on" allows queries during recovery
                    # (change requires restart)
#max_standby_archive_delay = 30s    # max delay before canceling queries
                    # when reading WAL from archive;
                    # -1 allows indefinite delay
#max_standby_streaming_delay = 30s    # max delay before canceling queries
                    # when reading streaming WAL;
                    # -1 allows indefinite delay




The standby server recovery.conf file:
standby_mode = 'on'
primary_conninfo = 'host=<CUT> port=<CUT> user=<CUT>'

trigger_file = '<CUT>/repl_trigger'

restore_command = 'rsync <CUT>/%f "%p"'


Re: Clogging problem (was: Schema-only dump dumps no constraints, no triggers)

From: Marek Kielar
Hi,

to complement information from the previous message:


On 29 July 2012 at 12:29, Marek Kielar <mkielar@go2.pl> wrote:

> Hi,
>
>
> On 28 July 2012 at 1:10, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>
> > What were the deleted files?
> >    WAL, Logs, other?
>
>
> At this time - a couple of days after the restart - the clog hasn't re-formed yet. Thus, I am unable to tell you what
> the deleted files were; we didn't pay that much attention to them then. There were some WAL files, but I can't tell
> what the actual structure was. I'll provide this information whenever possible.


The clog has somewhat re-formed - the full listing of lsof (filtered for unique files) for postmaster(s) on the
database mount is here:
http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html

The commands were issued within minutes of each other, which is why their totals differ slightly.

Some totals / aggregates:
df – /data    83 141 382 144
du – /data    29 170 365 801
lsof – /data    75 348 037 632
lsof – /data/base    74 975 969 280
lsof – /data/base (deleted)    53 769 936 896
lsof – /data/pg_xlog    369 098 752
lsof – /data/pg_xlog (deleted)    201 326 592
lsof – /data/global    2 965 504

It is clear that the server processes are holding open most of the deleted files, which keeps their disk space from actually being released.
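
For comparison, the size of the live (non-deleted) files as PostgreSQL itself sees them - which should roughly match
the du figure, since deleted-but-open files are not counted - can be listed per database with something like:

-- On-disk size of each database's current files
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
 ORDER BY pg_database_size(datname) DESC;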


Re: Clogging problem

From: Adrian Klaver
On 08/06/2012 05:08 AM, Marek Kielar wrote:
> Hi,
>
> to complement information from the previous message:
>
>
> On 29 July 2012 at 12:29, Marek Kielar <mkielar@go2.pl> wrote:
>
>> Hi,
>>
>>
>> On 28 July 2012 at 1:10, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>>
>>> What were the deleted files?
>>>     WAL, Logs, other?
>>
>>
>> At this time - a couple of days after the restart - the clog hasn't re-formed yet. Thus, I am unable to tell you what
>> the deleted files were; we didn't pay that much attention to them then. There were some WAL files, but I can't tell
>> what the actual structure was. I'll provide this information whenever possible.
>
>
> The clog has somewhat re-formed - the full listing of lsof (filtered for unique files) for postmaster(s) on the
> database mount is here:
> http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html

FYI, you might want to consider using some other site for uploads. The above
is sort of scary and leads you down all sorts of false paths.

>
> Consecutive commands were issued in a matter of minutes and differ slightly.
>
> Some totals / aggregates:
> df – /data    83 141 382 144
> du – /data    29 170 365 801
> lsof – /data    75 348 037 632
> lsof – /data/base    74 975 969 280
> lsof – /data/base (deleted)    53 769 936 896
> lsof – /data/pg_xlog    369 098 752
> lsof – /data/pg_xlog (deleted)    201 326 592
> lsof – /data/global    2 965 504
>
> It is clear that the server processes are holding open most of the deleted files, which keeps their disk space from actually being released.

Well, the nature of database data files is that they expand and/or contract as
needed. Unless you are getting rid of the actual objects they refer to,
they will not be deleted. The WAL files in pg_xlog are a different
matter, but in the listing you sent they seem to be reasonable. There
are a couple of things, off the top of my head, that can cause data files
to expand unnecessarily:
1) Autovacuum is not aggressive enough.
2) There are open transactions keeping old tuples from being removed.

From previous posts, you mentioned a 'permanent' connection to the
database. Are you sure it is not holding an open transaction?
The pg_locks view would be a good place to start:
http://www.postgresql.org/docs/9.1/interactive/view-pg-locks.html
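
For example, joining pg_locks to pg_stat_activity shows which backends hold locks and how long their transactions have
been open - a sketch using the 9.1 column names:

-- Lock holders together with their transaction start time and current query
SELECT l.pid, l.locktype, l.mode, a.xact_start, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
 ORDER BY a.xact_start;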

>
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Clogging problem

From: Marek Kielar


On 6 August 2012 at 17:00, Adrian Klaver <adrian.klaver@gmail.com> wrote:

> > The clog has somewhat re-formed - the full listing of lsof (filtered for unique files) for postmaster(s) on the
> > database mount is here:
> > http://BillionUploads.com/ya9kjv78t9es/postmaster_files_sorted.csv.html
>
> FYI, you might want to consider using some other site for uploads. The above
> is sort of scary and leads you down all sorts of false paths.
>

Sorry about that - it's the first time I have used a hosting service, and the one I picked (from a list on Wikipedia)
didn't require creating an account. I guess using NoScript spoils you by saving you from trouble, but it also leads to
wrong assumptions. I'll try to choose better next time.

> >
> > Consecutive commands were issued in a matter of minutes and differ slightly.
> >
> > Some totals / aggregates:
> > df – /data    83 141 382 144
> > du – /data    29 170 365 801
> > lsof – /data    75 348 037 632
> > lsof – /data/base    74 975 969 280
> > lsof – /data/base (deleted)    53 769 936 896
> > lsof – /data/pg_xlog    369 098 752
> > lsof – /data/pg_xlog (deleted)    201 326 592
> > lsof – /data/global    2 965 504
> >
> > It is clear that the server processes are holding open most of the deleted files, which keeps their disk space from actually being released.
>
> Well, the nature of database data files is that they expand and/or contract as
> needed. Unless you are getting rid of the actual objects they refer to,
> they will not be deleted. The WAL files in pg_xlog are a different
> matter, but in the listing you sent they seem to be reasonable. There
> are a couple of things, off the top of my head, that can cause data files
> to expand unnecessarily:
> 1) Autovacuum is not aggressive enough.
> 2) There are open transactions keeping old tuples from being removed.
>
> From previous posts, you mentioned a 'permanent' connection to the
> database. Are you sure it is not holding an open transaction?
> The pg_locks view would be a good place to start:
> http://www.postgresql.org/docs/9.1/interactive/view-pg-locks.html
>


1) Running, through pgAdmin3, an ordinary VACUUM FULL ANALYZE and REINDEX on all of our databases in the cluster, plus
the "postgres" database, didn't release a substantial amount of disk space - the space freed might add up to maybe a
few percent of the total.
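
For what it's worth, one way to see where the space actually sits is to list the largest relations in each database,
roughly like this (pg_total_relation_size includes indexes and TOAST data):

-- Ten largest tables and indexes in the current database
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
 WHERE relkind IN ('r', 'i')
 ORDER BY pg_total_relation_size(oid) DESC
 LIMIT 10;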


2) It doesn't seem there are any long-running transactions, even though the PIDs do repeat over time (but since the
connections are kept open, this seems reasonable); a more direct check via pg_stat_activity is sketched after the
listings below:

postgres=# SELECT * FROM pg_locks ORDER BY pid;

locktype;database;relation;page;tuple;virtualxid;transactionid;classid;objid;objsubid;virtualtransaction;pid;mode;granted
relation;11874;10985;;;;;;;;85/101738;24367;AccessShareLock;t
virtualxid;;;;;85/101738;;;;;85/101738;24367;ExclusiveLock;t
virtualxid;;;;;20/788838;;;;;20/788838;24505;ExclusiveLock;t
virtualxid;;;;;14/923780;;;;;14/923780;24621;ExclusiveLock;t
virtualxid;;;;;76/139304;;;;;76/139304;24699;ExclusiveLock;t
virtualxid;;;;;55/199999;;;;;55/199999;24703;ExclusiveLock;t
virtualxid;;;;;59/363780;;;;;59/363780;24926;ExclusiveLock;t
(7 rows)

And after some time, with a different invocation of psql (so as to let go of the previous PID):

postgres=# SELECT * FROM pg_locks ORDER BY pid;

locktype;database;relation;page;tuple;virtualxid;transactionid;classid;objid;objsubid;virtualtransaction;pid;mode;granted
virtualxid;;;;;56/410614;;;;;56/410614;25105;ExclusiveLock;t
virtualxid;;;;;3/667499;;;;;3/667499;25145;ExclusiveLock;t
relation;11874;10985;;;;;;;;85/101817;25171;AccessShareLock;t
virtualxid;;;;;85/101817;;;;;85/101817;25171;ExclusiveLock;t
(4 rows)
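
As mentioned above, a more direct way to look for long-lived transactions is to check xact_start - a sketch against
the 9.1 pg_stat_activity columns:

-- Oldest open transactions, if any, with their age and current query
SELECT procpid, xact_start, now() - xact_start AS xact_age, current_query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 5;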


We are again slowly approaching the point at which a server restart will be needed. If/when this happens, I'll provide
statistics again.

Best regards,
Marek Kielar