Thread: WAL file size vs. data file size

WAL file size vs. data file size

From
Ben Chobot
Date:
Today I tried to restore a 70GB database with the standard "pg_dump -h old_server <…> | psql -h new_server <…>" method. I had 100GB set aside for WAL files, which I figured surely would be enough, because all of the data, including indices, is only 70GB. So I was a bit surprised when the restore hung mis-way because my pg_xlogs directory ran out of space. 

Is it expected that WAL files are less dense than data files? I understand that they'll include multiple versions of the data, while the data files only include the last one (if recently vacuumed), but it's not like a restore does much besides COPY commands. I also understand that when you specify an archive_timeout value, you might be bloating the WAL files, but again, in the case of a restore, it seems like each WAL file will be full of useful data. So how is it that I can have more WAL bytes than data bytes?

Now, the new server is dedicated to running just this database, so it's not like there was anything else that could be  adding info the the WAL files. But it is a master for a hot standby cluster…. maybe that makes a difference?

Re: WAL file size vs. data file size

From
Tom Lane
Date:
Ben Chobot <bench@silentmedia.com> writes:
> Today I tried to restore a 70GB database with the standard "pg_dump -h old_server <�> | psql -h new_server <�>"
method.I had 100GB set aside for WAL files, which I figured surely would be enough, because all of the data, including
indices,is only 70GB. So I was a bit surprised when the restore hung mis-way because my pg_xlogs directory ran out of
space. 

> Is it expected that WAL files are less dense than data files?

Yes, that's not particularly surprising ... but how come they weren't
getting recycled?  Perhaps you had configured WAL archiving but it was
broken?

            regards, tom lane

Re: WAL file size vs. data file size

From
Ben Chobot
Date:
On Oct 27, 2011, at 8:44 AM, Tom Lane wrote:

> Ben Chobot <bench@silentmedia.com> writes:
>> Today I tried to restore a 70GB database with the standard "pg_dump -h old_server <∑> | psql -h new_server <∑>"
method.I had 100GB set aside for WAL files, which I figured surely would be enough, because all of the data, including
indices,is only 70GB. So I was a bit surprised when the restore hung mis-way because my pg_xlogs directory ran out of
space. 
>
>> Is it expected that WAL files are less dense than data files?
>
> Yes, that's not particularly surprising ... but how come they weren't
> getting recycled?  Perhaps you had configured WAL archiving but it was
> broken?

It's because I'm archiving wal files into Amazon's S3, which is slooooooooooow. PG is recycling as fast as it can, but
whena few MB of COPY rows seem to ballon up to a few hundred MB of WAL files, it has a lot to archive before it can
recycle.It'll be fine for steady state but it looks like it's just going to be a waste for this initial load. 

What's the expected density ratio? I was always under the impression it would be about 1:1 when doing things like COPY,
andhave never seen anything to the contrary.