Thread: COPY TO and VACUUM
Dear All,

I'm running Postgres 8.4 on an Ubuntu 10.04 Linux server (64-bit). I have a big table that contains product information: during the day we run a process that imports new products continuously with the statement COPY TO .. from files to this table.

As a result the table's disk space is growing fast; it seems that Postgres is not able to free the space taken by old rows. Is it possible to run a specific autovacuum activity, or to say to Postgres "every time I delete a row, delete it immediately and don't take care of other transactions"?

Do you have any suggestions for me? I'll appreciate every suggestion you can provide.

Many thanks in advance

Roberto
Roberto Grandi <roberto.grandi@trovaprezzi.it> wrote:

> I'm running Postgres 8.4 on an Ubuntu 10.04 Linux server (64-bit).
> I have a big table that contains product information: during the
> day we run a process that imports new products continuously with
> the statement COPY TO .. from files to this table.
>
> As a result the table's disk space is growing fast; it seems that
> Postgres is not able to free the space taken by old rows.

COPY TO would not free any space. Is there some other activity you haven't yet mentioned?

> Is it possible to run a specific autovacuum activity, or to say to
> Postgres "every time I delete a row, delete it immediately and
> don't take care of other transactions"?

You can configure autovacuum to be more aggressive, or you could run VACUUM statements.

> Do you have any suggestions for me?

8.4 is getting pretty old; there have been a lot of autovacuum improvements in recent years.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Kevin,

first of all thanks for your help. I made a mistake: we are using Postgres 8.3.

I didn't expect COPY TO to free space, but I was expecting autovacuum to delete dead rows as soon as possible. In fact my scenario is:

- Delete all product records for a vendor
- Reload all product records (from the new listing) for the same vendor.

Obviously we repeat this process continuously, and the table space is growing really fast.

Can you suggest an approach for autovacuum within this scenario and, if you want, suggest an appropriate version of Postgres that would help solve my problem?

Many thanks in advance again.

BR,
Roberto
Roberto Grandi <roberto.grandi@trovaprezzi.it> wrote:

> we are using Postgres 8.3.
> my scenario is:
>
> - Delete all product records for a vendor
> - Reload all product records (from the new listing) for the same
>   vendor.
>
> Obviously we repeat this process continuously, and the table space
> is growing really fast.
>
> Can you suggest an approach for autovacuum within this scenario
> and, if you want, suggest an appropriate version of Postgres that
> would help solve my problem?

At this point I would recommend the latest minor release of 9.2 for production use. If you were early in a development cycle I would suggest considering the soon-to-be-released 9.3.0. Be sure to stay current on minor releases.

http://www.postgresql.org/support/versioning/

If your table space is growing fast with this usage pattern, it suggests that autovacuum is not configured to be aggressive enough. My suggestions:

Make sure autovacuum is on.

Decrease autovacuum_naptime to 15s, so that it will notice deletes sooner.

Consider reducing autovacuum_vacuum_scale_factor below the default of 0.2 so that it triggers based on fewer deletes.

You should probably set autovacuum_vacuum_cost_limit to 400 and incrementally increase it until autovacuum is able to keep up with the activity you describe. It defaults to 200, and I have had to set it to 650 on some systems to allow it to keep up; it wouldn't be surprising if some systems needed a higher setting still. Higher settings may cause autovacuum activity to have a more noticeable impact on foreground processes, but if the setting is too low you will develop bloat, which will harm performance and eat disk space.

If all autovacuum workers are sometimes busy with big tables for extended periods and you see other tables neglected for too long, boost autovacuum_max_workers until that problem is solved.
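As a starting point, those suggestions might look like this in postgresql.conf; the exact numbers here are only illustrative first guesses, to be tuned against your actual workload:

    # postgresql.conf -- more aggressive autovacuum (illustrative values)
    autovacuum = on
    autovacuum_naptime = 15s                 # notice deletes sooner (default 1min)
    autovacuum_vacuum_scale_factor = 0.1     # trigger on fewer dead rows (default 0.2)
    autovacuum_vacuum_cost_limit = 400       # raise incrementally until it keeps up (default 200)
    autovacuum_max_workers = 5               # boost if big tables monopolize all workers (default 3)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company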
On Tue, Sep 3, 2013 at 11:15 PM, Roberto Grandi <roberto.grandi@trovaprezzi.it> wrote:

> Hi Kevin,
>
> first of all thanks for your help. I made a mistake: we are using
> Postgres 8.3.
>
> I didn't expect COPY TO to free space, but I was expecting
> autovacuum to delete dead rows as soon as possible. In fact my
> scenario is:
>
> - Delete all product records for a vendor
> - Reload all product records (from the new listing) for the same
>   vendor.
>
> Obviously we repeat this process continuously, and the table space
> is growing really fast.

It isn't obvious to me why you would do this continuously. Surely your vendors don't change their catalogs exactly as fast as your database can digest them!

In any event, I'd probably just incorporate a manual vacuum statement into the delete/reload cycle. Since delete and copy are not throttled, while autovacuum is throttled by default to a rather low level, it is quite possible that default autovacuum can never keep up with the workload you are generating. Rather than trying to tune autovacuum to fit this special case, it would be easier to just throw in some manual vacuuming. (Not instead of autovacuum, just as a supplement to it.)
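A sketch of what that cycle might look like, with the table name, vendor key, and file path all invented for the example:

    BEGIN;
    DELETE FROM products WHERE vendor_id = 42;           -- drop the vendor's old listing
    COPY products FROM '/data/vendor_42.csv' WITH CSV;   -- reload the new listing
    COMMIT;
    VACUUM ANALYZE products;   -- unthrottled manual vacuum; makes the deleted rows' space reusable

Cheers,

Jeff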
Hi Jeff,

the problem is that when continuously uploading vendor listings to our "big" table, autovacuum is not able to free space as we would like. Secondly, if we launch a VACUUM after each upload, we collide with the other uploads that are running in parallel.

Is it possible, from your point of view, to work with isolation levels or table partitioning to minimize table space growth?

Thanks again for all your help.

BR,
Roberto
Hi Roberto,

Yes, you could partition by vendor and then truncate the partition before loading. TRUNCATE reclaims space immediately and is generally much faster than DELETE.
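On 8.3 this would be inheritance-based partitioning. A minimal sketch, with all table, column, and file names invented for the example:

    -- one child table per vendor under an assumed parent table "products"
    CREATE TABLE products_vendor_42 (
        CHECK (vendor_id = 42)
    ) INHERITS (products);

    -- reload cycle: TRUNCATE returns the space immediately, leaving no dead rows behind
    TRUNCATE products_vendor_42;
    COPY products_vendor_42 FROM '/data/vendor_42.csv' WITH CSV;

Queries against the parent table still see every vendor's rows, and since nothing is deleted there is nothing for vacuum to chase.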
On Thu, Sep 5, 2013 at 9:05 AM, Roberto Grandi <roberto.grandi@trovaprezzi.it> wrote:

> Hi Jeff,
>
> the problem is that when continuously uploading vendor listings to
> our "big" table, autovacuum is not able to free space as we would
> like.

It might not be able to free it (to be reused) as fast as you need it to, but it should be freeing it eventually.

> Secondly, if we launch a VACUUM after each upload, we collide with
> the other uploads that are running in parallel.

I wouldn't do a manual vacuum after *each* upload. Doing one after every Nth upload, where N is estimated to make up about 1/5 of the table, should be good.

You are probably IO limited, so you probably don't gain much by running these uploads in parallel; I would try to avoid that. But in any case, there shouldn't be a collision between a manual vacuum and a concurrent upload. There would be one between two manual vacuums, but you could code around that by explicitly locking the table in the correct mode, with NOWAIT or a timeout, and skipping the vacuum if you can't get the lock.

> Is it possible, from your point of view, to work with isolation
> levels or table partitioning to minimize table space growth?

Partitioning by vendor might work well for that purpose.
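That lock-or-skip guard might look something like this (table name invented; note there is a small window between releasing the probe lock and the VACUUM itself):

    BEGIN;
    -- VACUUM takes SHARE UPDATE EXCLUSIVE; probe for the same lock without waiting
    LOCK TABLE products IN SHARE UPDATE EXCLUSIVE MODE NOWAIT;
    COMMIT;
    -- if the LOCK errored out, another vacuum is running: skip this round;
    -- otherwise run the vacuum now (it cannot run inside the transaction)
    VACUUM ANALYZE products;

Cheers,

Jeff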
Hi Guys,

we found a suitable solution for our process: every 5-6 hours we run a CLUSTER statement on our big table. This "locks" activities but allows us to recover all the available space.

When testing this task we discovered another issue, and that's why I'm coming back to you for your experience: during our process we run multiple simultaneous "COPY ... FROM" statements in order to load data into our table, but at the same time we also run "COPY ... TO" statements in parallel to export data for other clients.

We found that the COPY ... TO queries are sometimes pending for more than 100 minutes while the destination file stays at 0 kB. Can you advise me how to solve this issue? Is there a better way to bulk-download data that avoids any kind of blocking when running in parallel?

Many thanks in advance

Roberto
On Wed, Sep 11, 2013 at 11:14 PM, Roberto Grandi <roberto.grandi@trovaprezzi.it> wrote:
> Hi Guys,
>
> we found a suitable solution for our process: every 5-6 hours we run a CLUSTER statement on our big table. This "locks" activities but allows us to recover all the available space.
If you can tolerate the locks, that is fine, but it just doesn't seem like this should be necessary. A manual vacuum should get the job done with weaker locking. Did you try running a manual vacuum every 5-6 hours instead? It would probably not reclaim the space, but it would make the space available for reuse, and so cap the steady-state size of the file, hopefully at about the same size as the maximum size under the CLUSTER regime.
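For comparison, the locking profiles of the two approaches (table and index names invented here):

    VACUUM ANALYZE products;                -- marks dead rows reusable; takes only a
                                            -- SHARE UPDATE EXCLUSIVE lock, so reads and
                                            -- writes continue
    CLUSTER products USING products_pkey;   -- rewrites the table and gives space back to
                                            -- the OS, but holds an ACCESS EXCLUSIVE lock
                                            -- for the whole rewrite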
> When testing this task we discovered another issue, and that's why I'm coming back to you for your experience:
>
> during our process we run multiple simultaneous "COPY ... FROM" statements in order to load data into our table, but at the same time we also run "COPY ... TO" statements in parallel to export data for other clients.
>
> We found that the COPY ... TO queries are sometimes pending for more than 100 minutes while the destination file stays at 0 kB. Can you advise me how to solve this issue?
Are your COPY ... FROM statements also blocking, just in a way you are not detecting (because there is no growing file whose size you can watch)? What does pg_locks say?
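For instance, something along these lines shows which lock requests are still waiting and on what:

    SELECT locktype, relation::regclass AS rel, mode, granted, pid
    FROM pg_locks
    WHERE NOT granted;   -- rows here are sessions stuck waiting for a lock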
Cheers,
Jeff
Hi Jeff,

thanks for your suggestion. We'll test VACUUM instead of CLUSTER and come back with live results.

At the same time I discovered that our COPY (...) TO statements are really, really slow: I see 0 kB at the beginning, but at the end the files grow by 4 kB each second. Our export is standard (i.e.: SELECT a, b, c FROM table1), but sometimes it's very slow. What would your suggestion be? Is it possible to detect whether we are facing a problem in IO or in the Linux system itself?

Many thanks in advance for all your help.

Regards,
Roberto