Thread: COPY TO and VACUUM

COPY TO and VACUUM

From
Roberto Grandi
Date:
Dear All,

I'm running Postgres 8.4 on an Ubuntu 10.4 Linux server (64-bit).
I have a big table that contains product information; during the day we run
a process that continuously imports new products into this table with the
statement COPY TO .. from files.

As a result the table's disk space is growing fast; it seems that Postgres
is not able to free the space used by old rows.

Is it possible to run a specific autovacuum activity, or to tell Postgres
"every time I delete a row, delete it immediately and don't take care of
other transactions"?

Do you have any suggestion for me?

I'll appreciate every suggestion you can provide.

Many thanks in advance

Roberto


Re: COPY TO and VACUUM

From
Kevin Grittner
Date:
Roberto Grandi <roberto.grandi@trovaprezzi.it> wrote:

> I'm running Postgres 8.4 on an Ubuntu 10.4 Linux server (64-bit).
> I have a big table that contains product information; during the
> day we run a process that continuously imports new products into
> this table with the statement COPY TO .. from files.
>
> As a result the table's disk space is growing fast; it seems that
> Postgres is not able to free the space used by old rows.

COPY TO would not free any space.  Is there some other activity you
haven't yet mentioned?

> Is it possible to run a specific autovacuum activity, or to tell
> Postgres "every time I delete a row, delete it immediately and
> don't take care of other transactions"?

You can configure autovacuum to be more aggressive, or you could
run VACUUM statements.

> Do you have any suggestion for me?

8.4 is getting pretty old; there have been a lot of autovacuum
improvements in recent years.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: COPY TO and VACUUM

From
Roberto Grandi
Date:
Hi Kevin,

First of all, thanks for your help. I made a mistake: we are using Postgres 8.3.

I didn't expect COPY TO to free space, but I was expecting autovacuum to
delete dead rows as soon as possible. In fact my scenario is:

- Delete all product records for a vendor
- Reload all product records (from the new listing) for the same vendor.

Obviously we repeat this process continuously, and the table space is
growing really fast.

Can you suggest an approach for autovacuum in this scenario and, if you
like, an appropriate version of Postgres that would help solve my problem?

Many thanks in advance again.

BR,
Roberto



Re: COPY TO and VACUUM

From
Kevin Grittner
Date:
Roberto Grandi <roberto.grandi@trovaprezzi.it> wrote:

> we are using Postgres 8.3.

> my scenario is:
>
> - Delete all product records for a vendor
> - Reload all product records (from the new listing) for the same
>   vendor.
>
> Obviously we repeat this process continuously, and the table space
> is growing really fast.
>
> Can you suggest an approach for autovacuum in this scenario and,
> if you like, an appropriate version of Postgres that would help
> solve my problem?

At this point I would recommend the latest minor release of 9.2 for
production use.  If you were early in a development cycle I would
suggest considering the soon-to-be-released 9.3.0.  Be sure to stay
current on minor releases.

http://www.postgresql.org/support/versioning/

If your table space is growing fast with this usage pattern, it
suggests that autovacuum is not configured to be aggressive enough.
My suggestions:

Make sure autovacuum is on.

Decrease autovacuum_naptime to 15s, so that it will notice deletes
sooner.

You could consider reducing autovacuum_vacuum_scale_factor below the
default of 0.2 so that it triggers based on fewer deletes.

You should probably set autovacuum_vacuum_cost_limit to 400 and
incrementally increase it until autovacuum is able to keep up with
the activity you describe.  It defaults to 200 and I have had to
set it to 650 on some systems to allow it to keep up.  It wouldn't
be surprising if some systems need a higher setting.  Higher
settings may cause autovacuum activity to have a more noticeable
impact on foreground processes; but if it is too low, you will
develop bloat which will harm performance and eat disk space.

If all autovacuum workers are sometimes busy with big tables for
extended periods and you see other tables neglected for too long,
you should boost autovacuum_max_workers until that problem is
solved.
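
For illustration, the settings above as a rough postgresql.conf sketch; the
values are only starting points to experiment with, not tuned
recommendations, and a configuration reload (e.g. pg_ctl reload) is needed
for them to take effect:

    autovacuum = on
    autovacuum_naptime = 15s
    autovacuum_vacuum_scale_factor = 0.1   # default 0.2
    autovacuum_vacuum_cost_limit = 400     # default 200; raise until it keeps up
    autovacuum_max_workers = 3             # default; raise if workers stay busy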

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: COPY TO and VACUUM

From
Jeff Janes
Date:
On Tue, Sep 3, 2013 at 11:15 PM, Roberto Grandi
<roberto.grandi@trovaprezzi.it> wrote:
> Hi Kevin,
>
> First of all, thanks for your help. I made a mistake: we are using Postgres 8.3.
>
> I didn't expect COPY TO to free space, but I was expecting autovacuum to
> delete dead rows as soon as possible. In fact my scenario is:
>
> - Delete all product records for a vendor
> - Reload all product records (from the new listing) for the same vendor.
>
> Obviously we repeat this process continuously, and the table space is growing really fast.

It isn't obvious to me why you would do this continuously.  Surely
your vendors don't change their catalogs exactly as fast as your
database can digest them!

In any event, I'd probably just incorporate a manual vacuum statement
into the delete/reload cycle.  Since delete and copy are not
throttled, while autovacuum is throttled by default to a rather low
level, it is quite possible that default autovacuum can never keep up
with the workload you are generating.  Rather than trying to tune
autovacuum to fit this special case, it would be easier to just throw
in some manual vacuuming.  (Not instead of autovac, just as a
supplement to it)
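
For illustration, a minimal sketch of such a delete/reload cycle with a
trailing manual vacuum; the table, columns, and file path are made-up names,
and the COPY-from-file form assumes a superuser (otherwise use psql's \copy):

    BEGIN;
    DELETE FROM products WHERE vendor_id = 42;
    COPY products (vendor_id, sku, price)
        FROM '/tmp/vendor_42_listing.csv' WITH CSV;
    COMMIT;

    -- VACUUM cannot run inside a transaction block, so issue it separately,
    -- e.g. after every Nth reload rather than after every single one.
    VACUUM ANALYZE products;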

Cheers,

Jeff


Re: COPY TO and VACUUM

From
Roberto Grandi
Date:
Hi Jeff,

The problem is that when continuously uploading vendor listings into our
"big" table, autovacuum is not able to free space as fast as we would like.
Secondly, if we launch a VACUUM after each "upload" we collide with other
uploads that are running in parallel.

From your point of view, is it possible to work with isolation levels or
table partitioning to minimize table space growth?
Thanks again for all your help.

BR,
Roberto


Re: COPY TO and VACUUM

From
David Kerr
Date:
Hi Roberto,

Yes, you could partition by vendor and then truncate the partition before loading.

Truncate reclaims space immediately and is generally much faster than delete.
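
As a rough sketch of how that could look on the 8.x/9.2 series
(inheritance-based partitioning, since declarative partitioning did not
exist yet; all names here are hypothetical):

    -- Parent table plus one child table per vendor.
    CREATE TABLE products (vendor_id int, sku text, price numeric);
    CREATE TABLE products_vendor_42 (
        CHECK (vendor_id = 42)
    ) INHERITS (products);

    -- Reload cycle for that vendor: TRUNCATE returns the space immediately,
    -- then the fresh listing is bulk loaded into the empty child table.
    TRUNCATE products_vendor_42;
    COPY products_vendor_42 (vendor_id, sku, price)
        FROM '/tmp/vendor_42_listing.csv' WITH CSV;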


Re: COPY TO and VACUUM

From
Jeff Janes
Date:
On Thu, Sep 5, 2013 at 9:05 AM, Roberto Grandi
<roberto.grandi@trovaprezzi.it> wrote:
> Hi Jeff,
>
> The problem is that when continuously uploading vendor listings into our
> "big" table, autovacuum is not able to free space as fast as we would like.

It might not be able to free it (to be reused) as fast as you need it
to, but it should be freeing it eventually.

> Secondly, if we launch a VACUUM after each "upload" we collide with other
> uploads that are running in parallel.

I wouldn't do a manual vacuum after *each* upload.  Doing one after
every Nth upload, where N is estimated to make up about 1/5 of the
table, should be good.  You are probably IO limited, so you probably
don't gain much by running these uploads in parallel; I would try to
avoid that.  But in any case, there shouldn't be a collision between
a manual vacuum and a concurrent upload.  There would be one between
two manual vacuums, but you could code around that by explicitly locking
the table in the correct mode with NOWAIT or with a timeout, and skipping
the vacuum if it can't get the lock.
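
One hedged way to implement that "probe the lock, skip if busy" idea (the
table name is made up, and real code would catch the lock failure in the
application or driving script rather than letting it abort):

    BEGIN;
    -- VACUUM takes a SHARE UPDATE EXCLUSIVE lock; NOWAIT makes this fail
    -- immediately if another vacuum (manual or auto) already holds it.
    LOCK TABLE products IN SHARE UPDATE EXCLUSIVE MODE NOWAIT;
    COMMIT;

    -- VACUUM cannot run inside a transaction block, so it is issued
    -- separately, and only if the probe above succeeded.
    VACUUM products;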

>
> From your point of view, is it possible to work with isolation levels or
> table partitioning to minimize table space growth?

Partitioning by vendor might work well for that purpose.

Cheers,

Jeff


Re: COPY TO and VACUUM

From
Roberto Grandi
Date:
Hi Guys,

We found a suitable solution for our process: every 5-6 hours we run a
CLUSTER statement on our big table. This "locks" activities but allows us
to recover all the available space.

While testing this task we discovered another issue, and that's why I'm
coming back to you for your experience:

During our process we run multiple simultaneous "COPY ... FROM" statements
to load data into our table, but at the same time we also run "COPY ... TO"
statements in parallel to export data for other clients.

We found that COPY ... TO queries are sometimes pending for more than 100
minutes while the destination file stays at 0 KB. Can you advise me how to
solve this issue?
Is there a better way to bulk-export the data that avoids any kind of
blocking when running in parallel?

Many thanks in advance



Re: COPY TO and VACUUM

From
Jeff Janes
Date:
On Wed, Sep 11, 2013 at 11:14 PM, Roberto Grandi <roberto.grandi@trovaprezzi.it> wrote:
> Hi Guys,
>
> We found a suitable solution for our process: every 5-6 hours we run a
> CLUSTER statement on our big table. This "locks" activities but allows us
> to recover all the available space.


If you can tolerate the locks, that is fine, but it just doesn't seem like
this should be necessary.  A manual vacuum should get the job done with
weaker locking.  Did you try running a manual vacuum every 5-6 hours
instead?  (It would probably not reclaim the space, but it would make it
available for reuse and so cap the steady-state size of the file, hopefully
at about the same size as the maximum size under the CLUSTER regime.)
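
For what it's worth, the locking difference as a sketch (table and index
names are hypothetical):

    -- CLUSTER rewrites the whole table and needs an ACCESS EXCLUSIVE lock,
    -- blocking both reads and writes for the duration.
    CLUSTER products USING products_pkey;

    -- A plain VACUUM only takes a SHARE UPDATE EXCLUSIVE lock, so normal
    -- reads and writes continue; it marks dead space as reusable rather
    -- than returning it to the operating system.
    VACUUM VERBOSE products;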


> While testing this task we discovered another issue, and that's why I'm
> coming back to you for your experience:
>
> During our process we run multiple simultaneous "COPY ... FROM" statements
> to load data into our table, but at the same time we also run "COPY ... TO"
> statements in parallel to export data for other clients.
>
> We found that COPY ... TO queries are sometimes pending for more than 100
> minutes while the destination file stays at 0 KB. Can you advise me how to
> solve this issue?

Are your COPY ... FROM statements also blocking, just in a way you are not
detecting (because there is no growing file to watch the size of)?  What
does pg_locks say?
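
A hedged example of the sort of query that shows ungranted locks on 8.3
(in 9.2 pg_stat_activity renamed procpid to pid and current_query to query):

    SELECT l.pid, l.relation::regclass, l.mode, l.granted, a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
    WHERE NOT l.granted;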

Cheers,

Jeff

Re: COPY TO and VACUUM

From
Roberto Grandi
Date:
Hi Jeff,

Thanks for your suggestion; we'll test VACUUM instead of CLUSTER and come
back with live results.
At the same time I discovered that our COPY (...) TO statements are really,
really slow: I see 0 KB at the beginning, and in the end the files grow by
4 KB each second.
Our export is standard (i.e. SELECT a, b, c FROM table1) but sometimes it's
very slow; what would be your suggestion? Is it possible to detect whether
we are facing a problem with IO or with the Linux system itself?

Many thanks in advance for all your help.
Regards,

Roberto

