BUG #17543: CSVLOG malformed from disk space error

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17543
Logged by:          Nathan Ballance
Email address:      nathanfballance@gmail.com
PostgreSQL version: 14.4
Operating system:   Linux/UNIX
Description:

A PostgreSQL server with the csvlog log_destination enabled will produce
malformed CSV upon a disk space error. Any subsequent attempt to load the
malformed *.csv log file then fails.


Steps to Reproduce:
*Applies to all postgresql versions. Tested using git mirror -
https://github.com/postgres/postgres

Modify 'postgresql.conf' for the following parameter changes:
log_destination = 'stderr,csvlog'
logging_collector = on

Cause system to run out of disk space

postgresql-2022-XX-XX_XXXXXX.csv is now malformed.

An error is returned when executing \copy postgres_log FROM
postgresql-2022-XX-XX_XXXXXX.csv csv
Testing example:
template1=# \copy postgres_log FROM
'/home/ec2-user/postgres/test_postgres/log/postgresql-2022-07-08_012958.csv'
csv
ERROR:  unterminated CSV quoted field
CONTEXT:  COPY postgres_log, line 107: "2022-07-08 01:41:59.246
UTC,,,12154,,62c78b67.2f7a,1,,2022-07-08 01:41:59
UTC,4/17,0,ERROR,53100,"co..."
template1=# 

In postgresql-2022-XX-XX_XXXXXX.csv the affected line does not terminate in
proper CSV format.

Log snippet from postgresql-2022-07-08_012958.csv
=================
2022-07-08 01:41:49.969
UTC,"ec2-user","template1",12152,"[local]",62c78b2f.2f78,1,"idle",2022-07-08
01:41:03 UTC,3/51,0,ERROR,42601,"syntax error at or near ""IF""",,,,,,"D
ROP TABLE test_5 IF EXISTS;",19,,"psql","client backend",,0
2022-07-08 01:41:59.246 UTC,,,12154,,62c78b67.2f7a,1,,2022-07-08 01:41:59
UTC,4/17,0,ERROR,53100,"could not extend file ""base/1/16389_vm"": No space
left on device",,"Check
free disk space.",,,"while scanning block 32672 of relation
""public.test_5""
automatic vacuum of table ""template1.public.test_5""",,,,"","autovacuum
wor2022-07-08 01:42:44.035 UTC,,,12073,,62c78896.2f29,31,,2022-07-08
01:29:58 UTC,,0,LOG,00000,"chec
kpoint complete: wrote 4169 buffers (25.4%); 0 WAL file(s) added, 0 removed,
33 recycled; write=269.903 s, sync=0.002 s, total=269.972 s; sync files=2,
longest=0.002 s, aver
age=0.001 s; distance=538440 kB, estimate=540448 kB; lsn=1/7E026E90, redo
lsn=1/6C1D0148",,,,,,,,,"","checkpointer",,0
2022-07-08 01:43:14.040 UTC,,,12073,,62c78896.2f29,32,,2022-07-08 01:29:58
UTC,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,"","checkpointer",,0
=================

From the above, the malformed portion of the log is:
autovacuum wor2022-07-08 01:42:44.035 UTC

The syslogger was in the middle of writing 'autovacuum worker' when the write
was cut short, so the record is never completed in the CSV file and the next
log line begins immediately after it. This is the malformation.
Please note that when testing, the exact location at which the write is cut
short is not repeatable; however, the CSV will always be malformed after a
disk space error.


ASK: Can the CSV file be written to in a safer way which ensures proper
format even upon disk error?


Re: BUG #17543: CSVLOG malformed from disk space error

From
"David G. Johnston"
Date:
On Saturday, July 9, 2022, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      17543
Logged by:          Nathan Ballance
Email address:      nathanfballance@gmail.com
PostgreSQL version: 14.4
Operating system:   Linux/UNIX
Description:       

Postgresql server with csvlog log_destination enabled will have malformed
CSV upon a disk space error. This causes any loading of the malformed *.csv
log file to error


ASK: Can the CSV file be written to in a safer way which ensures proper
format even upon disk error?


This list is for reporting bugs, not feature requests.

I’d have to say that there is little interest in sacrificing performance for safety here, which seems like an unavoidable proposition.

David J.

Re: BUG #17543: CSVLOG malformed from disk space error

From
Michael Paquier
Date:
On Mon, Jul 11, 2022 at 05:31:35AM -0700, David G. Johnston wrote:
> This list is for reporting bugs, not feature requests.
>
> I’d have to say that there is little interest in sacrificing performance
> for safety here, which seems like an unavoidable proposition.

I'd be curious to hear about an actual proposal if there are things
that could be improved.  Saying that, after playing with this area in
the last couple of months to add support for the JSON format, the
design of the syslogger with its set of synced pipes to ensure the
ordering of the logs sent by the backends to the syslogger process
relies on many assumptions, and it may be hard to make things more
reliable without sacrificing performance.  In other words, this can already
become a bottleneck with short log messages and a rather high logging level,
and we don't want to make that worse.
--
Michael


Re: BUG #17543: CSVLOG malformed from disk space error

From
Andres Freund
Date:
Hi,

On 2022-07-11 05:31:35 -0700, David G. Johnston wrote:
> On Saturday, July 9, 2022, PG Bug reporting form <noreply@postgresql.org>
> wrote:
> 
> > The following bug has been logged on the website:
> >
> > Bug reference:      17543
> > Logged by:          Nathan Ballance
> > Email address:      nathanfballance@gmail.com
> > PostgreSQL version: 14.4
> > Operating system:   Linux/UNIX
> > Description:
> >
> > Postgresql server with csvlog log_destination enabled will have malformed
> > CSV upon a disk space error. This causes any loading of the malformed *.csv
> > log file to error
> >
> >
> > ASK: Can the CSV file be written to in a safer way which ensures proper
> > format even upon disk error?
> >
> >
> This list is for reporting bugs, not feature requests.
> 
> I’d have to say that there is little interest in sacrificing performance
> for safety here, which seems like an unavoidable proposition.

I agree in general, but this specific issue seems easy enough to address. We
could just track whether the last write failed, and if so, emit an additional
newline.
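
For what it's worth, a minimal sketch of that tracking (illustrative C only,
not PostgreSQL source; `logfile_write` and the flag are hypothetical names):

```c
#include <unistd.h>

/* Illustrative sketch only, not PostgreSQL source: a hypothetical wrapper
 * around the syslogger's file write.  If the previous write failed partway
 * through a message, emit a bare newline first so the next record at least
 * starts on a fresh line. */
static int	last_write_failed = 0;

static ssize_t
logfile_write(int fd, const char *msg, size_t len)
{
	ssize_t		rc;

	if (last_write_failed)
	{
		/* Best effort: terminate the torn record before the new one. */
		if (write(fd, "\n", 1) == 1)
			last_write_failed = 0;
	}

	rc = write(fd, msg, len);
	if (rc < 0 || (size_t) rc != len)
		last_write_failed = 1;	/* e.g. ENOSPC mid-message */
	return rc;
}
```

Gated on the flag, the extra newline costs nothing on the normal path.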

But that just fixes the simple case - if the last successful write contained
the start of an escaped string, the newline won't necessarily be recognized as
the end of a "row".

We can't generally solve this with reasonable cost, because if e.g. postgres
runs out of space and then is restarted, we'll not know how to "break out" of
whatever state we were in. We'd have to read the file to figure that out,
obviously not a realistic proposition.

It might be possible to devise a scheme in which we force rotation in cases
where corruption might have occurred?

Greetings,

Andres Freund



Re: BUG #17543: CSVLOG malformed from disk space error

From
Kyotaro Horiguchi
Date:
At Tue, 12 Jul 2022 17:48:53 -0700, Andres Freund <andres@anarazel.de> wrote in 
> It might be possible to devise a scheme in which we force rotation in cases
> where corruption might have occurred?

It would be cumbersome to generate unique names for new files and then have
the server grab the newest file at restart. I'm also not sure what happens
if many files are generated in quick succession due to ENOSPC. (Maybe it
fills up the directory entries and then moves on to another failure mode..)

Considering how often that can happen, I would choose to make the
reader function robust or to fix the file manually..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #17543: CSVLOG malformed from disk space error

From
Noah Misch
Date:
On Tue, Jul 12, 2022 at 05:48:53PM -0700, Andres Freund wrote:
> On 2022-07-11 05:31:35 -0700, David G. Johnston wrote:
> > On Saturday, July 9, 2022, PG Bug reporting form <noreply@postgresql.org> wrote:
> > > Postgresql server with csvlog log_destination enabled will have malformed
> > > CSV upon a disk space error. This causes any loading of the malformed *.csv
> > > log file to error
> > >
> > > ASK: Can the CSV file be written to in a safer way which ensures proper
> > > format even upon disk error?

> > I’d have to say that there is little interest in sacrificing performance
> > for safety here, which seems like an unavoidable proposition.
> 
> I agree in general, but this specific issue seems easy enough to address. We
> could just track whether the last write failed, and if so, emit an additional
> newline.
> 
> But that just fixes the simple case - if the last successful write contained
> the start of an escaped string, the newline won't necessarily be recognized as
> the end of a "row".

Here's one approach avoiding that problem.  After ENOSPC causes the logfile to
end with a prefix of a message, issue ftruncate(logfile, logfile_length -
written_bytes_of_message_prefix).

An alternative would be to periodically posix_fallocate() substantial space in
the logfile, and write messages only to already-allocated space.  At rotation,
clean shutdown, or startup, ftruncate() away trailing NUL bytes.  I figure
this is inferior to the other approach, because the trailing NUL bytes will be
user-visible after OS crashes and when tailing active logs.

(Neither approach prevents CSV corruption if the OS crashes in the middle of
syslogger's processing of one record.  I don't know a low-cost, general fix
for that.  One tough case is a field that should have been "foo""bar" getting
truncated to "foo".)