Thread: question on error during COPY FROM

question on error during COPY FROM

From
Jerome Wagner
Date:
Hello,

in the documentation I read


COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.

does that mean that I should always execute a VACUUM to recover the wasted space when an error is triggered or will the auto-vacuum mechanism do the job by itself ?

Thanks
Jerome

Re: question on error during COPY FROM

From
"Ilya Kazakevich"
Date:
Hi.

>does that mean that I should always execute a VACUUM to recover the
>wasted space when an error is triggered or will the auto-vacuum mechanism
>do the job by itself ?


If you have autovacuum enabled it will clean up tablespace. However, space will not be returned to filesystem but will
bereused by database. 
You may run VACUUM FULL manually to return it to filesystem.

https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html


Ilya Kazakevich

JetBrains
http://www.jetbrains.com
The Drive to Develop



Re: question on error during COPY FROM

From
Francisco Olarte
Date:
On Tue, Aug 23, 2016 at 2:32 PM, Ilya Kazakevich
<Ilya.Kazakevich@jetbrains.com> wrote:
>>does that mean that I should always execute a VACUUM to recover the
>>wasted space when an error is triggered or will the auto-vacuum mechanism
>>do the job by itself ?
> If you have autovacuum enabled it will clean up tablespace. However, space will not be returned to filesystem but
willbe reused by database. 
> You may run VACUUM FULL manually to return it to filesystem.

A normal vacuum may also return some space, specially after a big bulk
load, see second paragraph of 23.1.2 the URL you posted:
> https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html

Where it says "However, it will not return the space to the operating
system, except in the special case where one or more pages at the end
of a table become entirely free and an exclusive table lock can be
easily obtained.". A big aborted bulk load may just fit the case, as
it may put a lot of tuples at new pages at the end and be executed in
a low-load period where the lock is easier to acquire.


Francisco Olarte.


Re: question on error during COPY FROM

From
Rakesh Kumar
Date:
Is it true that one datafile in PG can only belong to one object (table/index)

On Tue, Aug 23, 2016 at 9:55 AM, Francisco Olarte
<folarte@peoplecall.com> wrote:
> On Tue, Aug 23, 2016 at 2:32 PM, Ilya Kazakevich
> <Ilya.Kazakevich@jetbrains.com> wrote:
>>>does that mean that I should always execute a VACUUM to recover the
>>>wasted space when an error is triggered or will the auto-vacuum mechanism
>>>do the job by itself ?
>> If you have autovacuum enabled it will clean up tablespace. However, space will not be returned to filesystem but
willbe reused by database. 
>> You may run VACUUM FULL manually to return it to filesystem.
>
> A normal vacuum may also return some space, specially after a big bulk
> load, see second paragraph of 23.1.2 the URL you posted:
>> https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
>
> Where it says "However, it will not return the space to the operating
> system, except in the special case where one or more pages at the end
> of a table become entirely free and an exclusive table lock can be
> easily obtained.". A big aborted bulk load may just fit the case, as
> it may put a lot of tuples at new pages at the end and be executed in
> a low-load period where the lock is easier to acquire.
>
>
> Francisco Olarte.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: question on error during COPY FROM

From
Adrian Klaver
Date:
On 08/23/2016 07:06 AM, Rakesh Kumar wrote:
> Is it true that one datafile in PG can only belong to one object (table/index)

Yes, assuming by datafile you mean an on disk file. Though one object
may have many in disk files associated with it:

https://www.postgresql.org/docs/9.5/static/storage-file-layout.html

Details start just under Table 63-1. Contents of PGDATA


>
> On Tue, Aug 23, 2016 at 9:55 AM, Francisco Olarte
> <folarte@peoplecall.com> wrote:
>> On Tue, Aug 23, 2016 at 2:32 PM, Ilya Kazakevich
>> <Ilya.Kazakevich@jetbrains.com> wrote:
>>>> does that mean that I should always execute a VACUUM to recover the
>>>> wasted space when an error is triggered or will the auto-vacuum mechanism
>>>> do the job by itself ?
>>> If you have autovacuum enabled it will clean up tablespace. However, space will not be returned to filesystem but
willbe reused by database. 
>>> You may run VACUUM FULL manually to return it to filesystem.
>>
>> A normal vacuum may also return some space, specially after a big bulk
>> load, see second paragraph of 23.1.2 the URL you posted:
>>> https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
>>
>> Where it says "However, it will not return the space to the operating
>> system, except in the special case where one or more pages at the end
>> of a table become entirely free and an exclusive table lock can be
>> easily obtained.". A big aborted bulk load may just fit the case, as
>> it may put a lot of tuples at new pages at the end and be executed in
>> a low-load period where the lock is easier to acquire.
>>
>>
>> Francisco Olarte.
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: question on error during COPY FROM

From
Francisco Olarte
Date:
On Tue, Aug 23, 2016 at 4:06 PM, Rakesh Kumar
<rakeshkumar464a3@gmail.com> wrote:
> Is it true that one datafile in PG can only belong to one object (table/index)

If this is a question, yes, AFAIK ( in fact they are split in 1G
chunks to prevent problems with quirky filesystems ). Search for "Each
table and index is stored in a separate file" in
https://www.postgresql.org/docs/9.5/static/storage-file-layout.html


A full periodic read, even if superficial, of the postgres manual, is
quite beneficial. I feel poetic today ;-> . Seriously, is quite well
written and full of interesting information, even if skipping large
chunks knowing where the info is can sava you a lot of work and mails.
AAMOF, it's one of the main reasons I've been using postgres all this
years.

Francisco Olarte.


Re: question on error during COPY FROM

From
Melvin Davidson
Date:


On Tue, Aug 23, 2016 at 6:04 AM, Jerome Wagner <jerome.wagner@laposte.net> wrote:
Hello,

in the documentation I read


COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.

does that mean that I should always execute a VACUUM to recover the wasted space when an error is triggered or will the auto-vacuum mechanism do the job by itself ?

Thanks
Jerome

>but the target table will already have received earlier rows in a COPY FROM
No, that is not what happens. The entire copy is not permanently written to disk until a COMMIT or the entire statement is completed (;). On first error, a ROLLBACK is done so there is no wasted space. That being said, it is always good practice to schedule a VACUUM ANALYZE (but not FULL) in the off hours to insure a cleanup and fresh statistics.


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.