Thread: log_truncate_on_rotation=on is not truncating
We discovered that on our servers where we do daily shutdown and startup for a cold backup, the pg_log logs are not truncated when it switches to the new day, when log_truncation_on_rotation=on. The shutdown/startup of postgresql happens daily at 16:00 system time. Also observed that only when the shutdown event happen at 16:00 pm, it is only when the log is rotated for the current day, and the event message is appended to last week's postgreql-%a.log. On our other servers where there is not a daily shutdown/startup, the postgresql-%a.log is rotated to the new day at 00:00 hour and is truncated of all last week's content. We first noticed this only our standby servers where we do daily cold backup and restart postgresql, and no other activities. Then I was able to reproduce this issue on a standalone server by scheduling a daily shutdown/restart. * * *Here are settings and log details:* * * Postgresql is shutdown/startup via service: /sbin/service postgresql-9.1 stop/start -- calls pg_ctl OS versions: centos-release-notes-5.5-0 and centos-release-6-2.el6.centos.7.x86_64 PostgreSQL version: PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit and PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit Postgresql.conf settings for pg_logs related: logging_collector=on log_directory='pg_log' log_filename='postgresql-%a.log' log_truncate_on_rotation=on log_rotation_age=1d log_rotation_size=0 timezone = 'UTC' Service timezone: >date Tue Jun 19 11:25:56 PDT 2012 Content of a postgresql-Tue.log in its entirety. Note this is a standby slave so there are not much activities other than shutdown/restart performed for a cold backup. The *bold *date is just to de-mark the current day where the entries were appended. The entries of 2012-06-19 was an intentional syntax error to get the an entry for the current day. 2012-06-12 16:00:02 PDT 23761 4fd67fc4.5cd1 LOG: received fast shutdown request 2012-06-12 16:00:02 PDT 23761 4fd67fc4.5cd1 LOG: aborting any active transactions 2012-06-12 16:00:02 PDT 23800 4fd67fc8.5cf8 FATAL: terminating walreceiver process due to administrator command 2012-06-12 16:00:02 PDT 23765 4fd67fc5.5cd5 LOG: shutting down 2012-06-12 16:00:02 PDT 23765 4fd67fc5.5cd5 LOG: database system is shut down 2012-06-12 16:31:08 PDT 30458 4fd7d13c.76fa LOG: database system was shut down in recovery at 2012-06-12 16:00:02 PDT 2012-06-12 16:31:08 PDT 30458 4fd7d13c.76fa LOG: entering standby mode 2012-06-12 16:31:08 PDT 30458 4fd7d13c.76fa LOG: redo starts at 6B/D3E1EA70 2012-06-12 16:31:10 PDT 30458 4fd7d13c.76fa LOG: consistent recovery state reached at 6B/DB8FDFE8 2012-06-12 16:31:10 PDT 30458 4fd7d13c.76fa LOG: unexpected pageaddr 69/D68FE000 in log file 107, segment 219, offset 9428992 2012-06-12 16:31:10 PDT 30454 4fd7d13a.76f6 LOG: database system is ready to accept read only connections 2012-06-12 16:31:11 PDT 30494 4fd7d13e.771e LOG: streaming replication successfully connected to primary *2012-06-19 *11:22:04 PDT postgres [local] 30698 4fe0c34c.77ea ERROR: syntax error at or near ":" at character 17 *2012-06-19 *11:22:04 PDT postgres [local] 30698 4fe0c34c.77ea STATEMENT: select version(): Listing of pg_log directory. You can see here that the last entry of each day (file timestamp) is when postgresql is restarted at 16:30. Except June 19, I just force a syntax error to create an entry for today, June 19: -rw------- 1 postgres postgres 1403 Jun 13 16:30 postgresql-Wed.log -rw------- 1 postgres postgres 2868 Jun 14 16:31 postgresql-Thu.log -rw------- 1 postgres postgres 2418 Jun 15 16:31 postgresql-Fri.log -rw------- 1 postgres postgres 2222 Jun 16 16:31 postgresql-Sat.log -rw------- 1 postgres postgres 2208 Jun 17 16:30 postgresql-Sun.log -rw------- 1 postgres postgres 2832 Jun 18 16:31 postgresql-Mon.log -rw------- 1 postgres postgres 1355 Jun 19 11:22 postgresql-Tue.log Here's normal behavior on a server that does NOT get restarted daily. The logs are all rotated and truncated at 00:00 as expected: $ pwd /data/postgres/9.1/pg_log ls -ltr total 4 -rw------- 1 postgres postgres 0 Jun 13 00:00 postgresql-Wed.log -rw------- 1 postgres postgres 0 Jun 14 00:00 postgresql-Thu.log -rw------- 1 postgres postgres 1563 Jun 15 14:12 postgresql-Fri.log -rw------- 1 postgres postgres 0 Jun 16 00:00 postgresql-Sat.log -rw------- 1 postgres postgres 0 Jun 17 00:00 postgresql-Sun.log -rw------- 1 postgres postgres 0 Jun 18 00:00 postgresql-Mon.log -rw------- 1 postgres postgres 0 Jun 19 00:00 postgresql-Tue.log -- * Chan Pham Database Administrator Direct: 503.943.6773 Fax: 503.224.1581 // AIM: ioChanny // Twitter: iovation www.iovation.com <http://www.iovation.com/eml-msg/> *
Hi Chan, I think this is configuration issue. This issue is happening because log files will always append when server is restarted. What I can see from your configuration, log_rotation_age=3D1d (1 day) which is greater than daily schedule shutdown/restart time. So this is the reason why log files are growing. You can change the log_rotation_age=3D8h (i.e. any time less than the schedule shutdown/restart time) as log file name=20 Is postgreql-%a.log this is sufficient to preserve till last week's log, as at the time of rotation after 8 hours if the day is not changed then log file name will be same as previous then truncation will not happen. =A0 Thanks & Regards, Kiran Kumar =A0=A0=A0 -------------------------- VISWANATHAM=A0 KIRAN KUMAR SENIOR TECHNICAL LEADER. HUAWEI TECHNOLOGIES INDIA PVT. LTD. EXTN: 7560; LEVEL-7 C & D TOWER DIAMOND DISTRICT, OLD AIRPORT ROAD BANGALORE-08 MOBILE: +91-9945115987 =A0 From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Chan Pham Sent: Wednesday, June 20, 2012 11:54 PM To: pgsql-bugs@postgresql.org Subject: [BUGS] log_truncate_on_rotation=3Don is not truncating We discovered that on our servers where we do daily shutdown and startup for a cold backup, the pg_log logs are not truncated when it switches to the new day, when log_truncation_on_rotation=3Don. The shutdown/startup of postgres= ql happens daily at 16:00 system time. Also observed that only when the shutdown event happen at 16:00 pm, it is only when the log is rotated for the current day, and the event message is appended to last week's postgreql-%a.log.=A0 On our other servers where there is not a daily shutdown/startup, the postgresql-%a.log is rotated to the new day at 00:00 hour and is truncated of all last week's content. We first noticed this only our standby servers where we do daily cold backup and restart postgresql, and no other activities. Then I was able to reproduce this issue on a standalone server by scheduling a daily shutdown/restart. Here are settings and log details: Postgresql is shutdown/startup via service: =A0 =A0/sbin/service postgresql-9.1 stop/start =A0-- calls pg_ctl OS versions: =A0centos-release-notes-5.5-0 =A0 and =A0centos-release-6-2.el6.centos.7.x86_64 PostgreSQL version: =A0 =A0PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) = 4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit =A0 =A0and =A0 =A0PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) = 4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit Postgresql.conf settings for pg_logs related: =A0 =A0logging_collector=3Don =A0 =A0log_directory=3D'pg_log' =A0 =A0log_filename=3D'postgresql-%a.log' =A0 =A0log_truncate_on_rotation=3Don =A0 =A0log_rotation_age=3D1d =A0 =A0log_rotation_size=3D0 =A0 =A0timezone =3D 'UTC' Service timezone: =A0 >date Tue Jun 19 11:25:56 PDT 2012 Content of a postgresql-Tue.log in its entirety. Note this is a standby slave so there are not much activities other than shutdown/restart performed for a cold backup. The bold date is just to de-mark the current day where the entries were appended. The entries of 2012-06-19 was an intentional syntax error to get the an entry for the current day. 2012-06-12 16:00:02 PDT =A0 23761 4fd67fc4.5cd1 LOG: =A0received fast shutd= own request 2012-06-12 16:00:02 PDT =A0 23761 4fd67fc4.5cd1 LOG: =A0aborting any active transactions 2012-06-12 16:00:02 PDT =A0 23800 4fd67fc8.5cf8 FATAL: =A0terminating walreceiver process due to administrator command 2012-06-12 16:00:02 PDT =A0 23765 4fd67fc5.5cd5 LOG: =A0shutting down 2012-06-12 16:00:02 PDT =A0 23765 4fd67fc5.5cd5 LOG: =A0database system is = shut down 2012-06-12 16:31:08 PDT =A0 30458 4fd7d13c.76fa LOG: =A0database system was= shut down in recovery at 2012-06-12 16:00:02 PDT 2012-06-12 16:31:08 PDT =A0 30458 4fd7d13c.76fa LOG: =A0entering standby mo= de 2012-06-12 16:31:08 PDT =A0 30458 4fd7d13c.76fa LOG: =A0redo starts at 6B/D3E1EA70 2012-06-12 16:31:10 PDT =A0 30458 4fd7d13c.76fa LOG: =A0consistent recovery state reached at 6B/DB8FDFE8 2012-06-12 16:31:10 PDT =A0 30458 4fd7d13c.76fa LOG: =A0unexpected pageaddr 69/D68FE000 in log file 107, segment 219, offset 9428992 2012-06-12 16:31:10 PDT =A0 30454 4fd7d13a.76f6 LOG: =A0database system is = ready to accept read only connections 2012-06-12 16:31:11 PDT =A0 30494 4fd7d13e.771e LOG: =A0streaming replicati= on successfully connected to primary 2012-06-19 11:22:04 PDT postgres [local] 30698 4fe0c34c.77ea ERROR: =A0synt= ax error at or near ":" at character 17 2012-06-19 11:22:04 PDT postgres [local] 30698 4fe0c34c.77ea STATEMENT: =A0select version(): Listing of pg_log directory. You can see here that the last entry of each day (file timestamp) is when postgresql is restarted at 16:30. Except June 19, I just force a syntax error to create an entry for today, June 19: -rw------- 1 postgres postgres 1403 Jun 13 16:30 postgresql-Wed.log -rw------- 1 postgres postgres 2868 Jun 14 16:31 postgresql-Thu.log -rw------- 1 postgres postgres 2418 Jun 15 16:31 postgresql-Fri.log -rw------- 1 postgres postgres 2222 Jun 16 16:31 postgresql-Sat.log -rw------- 1 postgres postgres 2208 Jun 17 16:30 postgresql-Sun.log -rw------- 1 postgres postgres 2832 Jun 18 16:31 postgresql-Mon.log -rw------- 1 postgres postgres 1355 Jun 19 11:22 postgresql-Tue.log Here's normal behavior on a server that does NOT get restarted daily. The logs are all rotated and truncated at 00:00 as expected: $ pwd /data/postgres/9.1/pg_log ls -ltr total 4 -rw------- 1 postgres postgres =A0 =A00 Jun 13 00:00 postgresql-Wed.log -rw------- 1 postgres postgres =A0 =A00 Jun 14 00:00 postgresql-Thu.log -rw------- 1 postgres postgres 1563 Jun 15 14:12 postgresql-Fri.log -rw------- 1 postgres postgres =A0 =A00 Jun 16 00:00 postgresql-Sat.log -rw------- 1 postgres postgres =A0 =A00 Jun 17 00:00 postgresql-Sun.log -rw------- 1 postgres postgres =A0 =A00 Jun 18 00:00 postgresql-Mon.log -rw------- 1 postgres postgres =A0 =A00 Jun 19 00:00 postgresql-Tue.log --=20 Chan Pham Database Administrator Direct: 503.943.6773 Fax: 503.224.1581 // AIM: ioChanny // Twitter: iovation www.iovation.com
[ just got around to looking at this bug report, sorry for the delay ] Viswanatham Kiran Kumar <viswanatham.kirankumar@huawei.com> writes: > Hi Chan, > I think this is configuration issue. > This issue is happening because log files will always append when > server is restarted. What I can see from your configuration, > log_rotation_age=1d (1 day) which is greater than daily schedule > shutdown/restart time. So this is the reason why log files are growing. Yeah, I think that's part of it, but it does still seem like a bug. The relevant bit of the code is /* * The requirements here are to choose the next time > now that is a * "multiple" of the log rotation interval. "Multiple" can be interpreted * fairly loosely. In this version we align to log_timezone rather than * GMT. */ rotinterval = Log_RotationAge * SECS_PER_MINUTE; /* convert to seconds */ now = (pg_time_t) time(NULL); tm = pg_localtime(&now, log_timezone); now += tm->tm_gmtoff; now -= now % rotinterval; now += rotinterval; now -= tm->tm_gmtoff; next_rotation_time = now; Looking at this, it seems like it will choose a next_rotation_time that is more than rotinterval beyond "now" if you have a log_timezone for which advancing by gmtoff moves past the next natural rotinterval multiple. So if the server gets shut down before that time elapses, it never will think it should rotate the logfile. Chan, could you say exactly what log_timezone setting is in use on the servers that are acting funny? regards, tom lane
I wrote: > The relevant bit of the code is > /* > * The requirements here are to choose the next time > now that is a > * "multiple" of the log rotation interval. "Multiple" can be interpreted > * fairly loosely. In this version we align to log_timezone rather than > * GMT. > */ Oh, no, scratch that --- there's nothing wrong with that calculation. Actually I think the issue is with the logic that decides whether to truncate or append during rotation: * Decide whether to overwrite or append. We can overwrite if (a) * Log_truncate_on_rotation is set, (b) the rotation was triggered by * elapsed time and not something else, and (c) the computed file name is * different from what we were previously logging into. * * Note: during the first rotation after forking off from the postmaster, * last_file_name will be NULL. (We don't bother to set it in the * postmaster because it ain't gonna work in the EXEC_BACKEND case.) So we * will always append in that situation, even though truncating would * usually be safe. This dates from around 2004, and at the time we did not think it was very important if the first rotation opportunity failed to truncate. Of course, if you restart the postmaster so often that it never gets to the second rotation opportunity, that theory falls down. But I guess it's worth asking why you think it's a good plan to reboot so often. Most people running Postgres are appalled by the idea that they might have to have any downtime at all. regards, tom lane
Hello Tom and all, Thank you for following up on this. 1. log_timezone: admin_datamart=# show log_timezone; log_timezone -------------- US/Pacific (1 row) I have been playing around with changing log_rotation_age = 7h, +/- 1 hour to see if I can get the elapse timing right. The shutdown/startup time is still at 1600/18:30. At 8h, it was still appending. But recently, I've change it to 7h. Some day the log will truncate and some day it didn't. I'm guessing if its based on elapsed time since startup for truncate to happen or not, then it make sense that truncate didn't always happen because startup time can varies (some day tar runs longer). 2. To answer your question as why we are restarting daily is that these servers are hot standby and we are taking cold backups from there. The plan is to do hot backup (now that we are on 9.1 and can pause recovery) but we just haven't gotten time to implement the hot backup yet. Thanks, Chan On Sun, Jul 15, 2012 at 10:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > The relevant bit of the code is > > > /* > > * The requirements here are to choose the next time > now that is a > > * "multiple" of the log rotation interval. "Multiple" can be > interpreted > > * fairly loosely. In this version we align to log_timezone rather > than > > * GMT. > > */ > > Oh, no, scratch that --- there's nothing wrong with that calculation. > Actually I think the issue is with the logic that decides whether to > truncate or append during rotation: > > * Decide whether to overwrite or append. We can overwrite if (a) > * Log_truncate_on_rotation is set, (b) the rotation was triggered by > * elapsed time and not something else, and (c) the computed file name > is > * different from what we were previously logging into. > * > * Note: during the first rotation after forking off from the > postmaster, > * last_file_name will be NULL. (We don't bother to set it in the > * postmaster because it ain't gonna work in the EXEC_BACKEND case.) > So we > * will always append in that situation, even though truncating would > * usually be safe. > > This dates from around 2004, and at the time we did not think it was > very important if the first rotation opportunity failed to truncate. > Of course, if you restart the postmaster so often that it never gets to > the second rotation opportunity, that theory falls down. But I guess > it's worth asking why you think it's a good plan to reboot so often. > Most people running Postgres are appalled by the idea that they might > have to have any downtime at all. > > regards, tom lane > -- * Chan Pham Database Administrator Direct: 503.943.6773 Fax: 503.224.1581 // AIM: ioChanny // Twitter: iovation www.iovation.com <http://www.iovation.com/eml-msg/> *