Thread: Index space growing even after cleanup via autovacuum in Postgres 9.2
Hi,

We have a table where we insert about 10 million rows every day. We keep 14
days' worth of entries (i.e. 140 million). A scheduled task wakes up every
day and deletes all entries outside the 14-day window (i.e. deletes entries
from the 15th day in the past).

We have autovacuum set to trigger when 1% of rows are deleted, and it is
running consistently. However, we are noticing that after autovacuum our
disk space consumption is still increasing, and the increase is in the index
size (checked by querying pg_total_relation_size('mytable') and
pg_indexes_size('mytable')).

In Postgres 9.2.2, doesn't autovacuum clean up dead index entries and reuse
that space too? So why do we see this not happening?

Thanks
- tirthankar
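For reference, the size checks described above can be run like this (a
minimal sketch; 'mytable' is a placeholder for the real table name, and note
that these functions take single-quoted regclass arguments):

    SELECT pg_size_pretty(pg_total_relation_size('mytable')) AS total_size,
           pg_size_pretty(pg_indexes_size('mytable'))        AS index_size,
           pg_size_pretty(pg_relation_size('mytable'))       AS heap_size;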
Tirthankar Barari <tbarari@verizon.com> writes:
> However, we are noticing that after autovacuum our disk space
> consumption is still increasing, and the increase is in the index size
> (checked by querying pg_total_relation_size('mytable') and
> pg_indexes_size('mytable')).
> In Postgres 9.2.2, doesn't autovacuum clean up dead index entries and
> reuse that space too? So why do we see this not happening?

You've not provided enough info to speak in more than generalities, but:
whether index space is reclaimable depends a whole lot on your usage
patterns. There's provision to recycle pages that've become completely
empty, but as long as there's at least one key left on a leaf page, it
won't be recycled. So for instance if you have a time-ordered index and
you delete all but one out of every hundred entries in time sequence,
you're not going to get that space back short of a REINDEX.

			regards, tom lane
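One hedged way to observe the effect Tom describes, assuming the pgstattuple
contrib module is available (the index name here is taken from the table
definitions posted later in this thread):

    -- avg_leaf_density falls toward zero when leaf pages are mostly empty
    -- but still hold at least one key, so they can't be recycled
    CREATE EXTENSION pgstattuple;
    SELECT leaf_pages, empty_pages, deleted_pages, avg_leaf_density
      FROM pgstatindex('test_metric_pkey1');

    -- REINDEX rebuilds the index compactly, but in 9.2 it takes an
    -- exclusive lock on the table (there is no REINDEX CONCURRENTLY yet)
    REINDEX INDEX test_metric_pkey1;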
Re: Index space growing even after cleanup via autovacuum in Postgres 9.2
From: Francisco Olarte
Hi:

On Thu, Jan 9, 2014 at 7:50 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
> We have a table where we insert about 10 million rows every day. We keep
> 14 days' worth of entries (i.e. 140 million). A scheduled task wakes up
> every day and deletes all entries outside the 14-day window (i.e. deletes
> entries from the 15th day in the past).

Not related to your vacuum problem, but if your pattern is something
like deleting everything inserted 15 days ago you may want to think of
using partitioning or simple inheritance: make the scheduled task
create a new partition/child table, redirect insertions to it, and drop
the oldest partition (sketched below).

Also, if the pattern is insert-only all the time with whole-day deletes,
it may be easier to exclude the table from autovacuum and make the
scheduled task vacuum the table after deleting.

Francisco Olarte.
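A minimal sketch of the inheritance approach as it would look on 9.2 (which
has no declarative partitioning yet); all table and column names here are
hypothetical:

    CREATE TABLE metrics (
        ts      timestamp NOT NULL,
        payload float8
    );
    -- one child table per day, constrained to that day's range
    CREATE TABLE metrics_20140110 (
        CHECK (ts >= '2014-01-10' AND ts < '2014-01-11')
    ) INHERITS (metrics);
    -- the scheduled task creates tomorrow's child, redirects insertions
    -- to it (from the application, or via a trigger on the parent), and
    -- retires the oldest day with a cheap DROP instead of a slow DELETE:
    DROP TABLE metrics_20131227;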
On Fri, 10 Jan 2014 13:06:21 +0100 Francisco Olarte <folarte@peoplecall.com> wrote:
> Hi:
>
> On Thu, Jan 9, 2014 at 7:50 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
> > We have a table where we insert about 10 million rows every day. We keep
> > 14 days' worth of entries (i.e. 140 million). A scheduled task wakes up
> > every day and deletes all entries outside the 14-day window (i.e. deletes
> > entries from the 15th day in the past).

I missed the early part of this thread, so I apologize if I'm repeating
anything that was previously stated.

In my experience, rolling datasets like this are far better maintained
with more frequent purging. Most database systems seem to struggle when
loading up large amounts of data, then removing large amounts of data.
What has helped immensely for me is to increase the frequency of the
purging process, i.e. instead of purging once a day, purge once an hour,
or possibly even more frequently. Even though the purge happens more
often, it's less overhead each time since it's affecting fewer rows.

I have one system that collects about 60 GPS coordinates per second and
keeps 2 weeks' worth of data -- trial and error has found that the most
efficient ongoing maintenance is to purge about every 10 seconds. (In the
case of this particular system, I actually wrote a program that estimates
the current load on the database and purges more or less frequently based
on what else is going on, but over the course of a day it averages out to
about once every 10 seconds.)

YMMV, and the suggestion about partitioning is something that might work
in your case as well.

-- 
Bill Moran <wmoran@potentialtech.com>
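A hedged sketch of the frequent-purge idea ('mytable' and 'ts' are
placeholders): the same DELETE, just scheduled hourly instead of daily, so
each run touches roughly 1/24th as many rows and leaves vacuum far less to
clean up:

    -- run from cron once an hour; assumes an index on ts
    DELETE FROM mytable WHERE ts < now() - interval '14 days';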
Re: Index space growing even after cleanup via autovacuum in Postgres 9.2
From: Francisco Olarte
Hi:

On Mon, Jan 13, 2014 at 5:26 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
> On 01/10/2014 07:06 AM, Francisco Olarte wrote:
>> Not related to your vacuum problem, but if your pattern is something
>> like deleting everything inserted 15 days ago you may want to think of
>> using partitioning or simple inheritance. Make the scheduled task
>> create a new partition/child table, redirect insertions to it, drop
>> the oldest partition.
...
> Thanks for your input. The retention window was supposed to be variable
> and dynamically changeable. So, partitioning is our last resort. Will try
> the vacuum after delete instead of autovacuum.

The ability to use partitioning/inheritance does not depend on a static
window, but on whether your deletion pattern is as described. Suppose you
do it daily. You can name your partitions / child tables child_YYYYMMDD.
Then to delete data that is more than N days old you just build the cutoff
date, select from the system catalog the relations whose name matches
child_\d\d\d\d\d\d\d\d, whose name is less than child_12345678 (substitute
the cutoff date), and whose parent is the appropriate table, and drop all
of them (sketched below). If the retention window just grew (say from 15
to 20), for the first 5 days you'll find no child table to drop (same as
when starting, which looks like the window growing from 0 to N). If it
shrank from 15 to 10, the first day you'll drop the 5 extra tables.

Alternatively, depending on how you change the retention window, you can
delete just the appropriate partition, ignoring the error in case it does
not exist (to accommodate window-growing cases you can even use just
DROP TABLE IF EXISTS), and when the window shrinks you can zap the extra
tables manually or in the procedure which shrinks the window. The
advantage of this second approach is avoiding the system catalog query,
but I personally would use the first alternative. The logic is much the
same as deleting, just using partition drops.

Regards.

Francisco Olarte.
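A hedged sketch of that catalog query (the parent and child names are
hypothetical, following the child_YYYYMMDD convention above):

    -- list child tables of 'metrics' that fall before the cutoff date;
    -- the job then issues a DROP TABLE for each name returned
    SELECT c.relname
      FROM pg_inherits i
      JOIN pg_class c ON c.oid = i.inhrelid
      JOIN pg_class p ON p.oid = i.inhparent
     WHERE p.relname = 'metrics'
       AND c.relname ~ '^child_\d{8}$'
       AND c.relname < 'child_20131227';  -- cutoff name built by the job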
Re: Index space growing even after cleanup via autovacuum in Postgres 9.2
From: Tirthankar Barari
On 01/10/2014 07:06 AM, Francisco Olarte wrote:
> Hi:
>
> On Thu, Jan 9, 2014 at 7:50 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
>> We have a table where we insert about 10 million rows every day. We keep
>> 14 days' worth of entries (i.e. 140 million). A scheduled task wakes up
>> every day and deletes all entries outside the 14-day window (i.e. deletes
>> entries from the 15th day in the past).
> Not related to your vacuum problem, but if your pattern is something
> like deleting everything inserted 15 days ago you may want to think of
> using partitioning or simple inheritance: make the scheduled task
> create a new partition/child table, redirect insertions to it, and drop
> the oldest partition.
>
> Also, if the pattern is insert-only all the time with whole-day deletes,
> it may be easier to exclude the table from autovacuum and make the
> scheduled task vacuum the table after deleting.
>
> Francisco Olarte.

Thanks for your input. The retention window was supposed to be variable
and dynamically changeable, so partitioning is our last resort. Will try
the vacuum after delete instead of autovacuum.
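A minimal sketch of that vacuum-after-delete setup ('mytable' and 'ts' are
placeholders):

    -- disable autovacuum for just this table and its TOAST table
    ALTER TABLE mytable SET (autovacuum_enabled = false,
                             toast.autovacuum_enabled = false);
    -- then the scheduled task vacuums explicitly right after its purge
    DELETE FROM mytable WHERE ts < now() - interval '14 days';
    VACUUM ANALYZE mytable;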
Re: Index space growing even after cleanup via autovacuum in Postgres 9.2
From: Tirthankar Barari
On 01/13/2014 01:38 PM, Francisco Olarte wrote:
> Hi:
>
> On Mon, Jan 13, 2014 at 5:26 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
>> Thanks for your input. The retention window was supposed to be variable
>> and dynamically changeable. So, partitioning is our last resort. Will try
>> the vacuum after delete instead of autovacuum.
> The ability to use partitioning/inheritance does not depend on a static
> window, but on whether your deletion pattern is as described. [...]
>
> Francisco Olarte.

My tables are:

    table test_metric (
        id varchar(255) not null,   -- just an auto-generated uuid from the app
        timestamp timestamp not null,
        version int4,
        entity_id varchar(255) not null,
        primary key (id, timestamp)
    );
    Indexes:
        "test_metric_pkey1" PRIMARY KEY, btree (id)
        "test_metric_entity_id_timestamp_key" UNIQUE CONSTRAINT, btree (entity_id, "timestamp")

and

    table test_metric_metrics (
        metric_id varchar(255) not null,
        metrics float8,
        metrics_key varchar(255) not null,
        primary key (metric_id, metrics_key)
    );
    Indexes:
        "test_metric_metrics_pkey" PRIMARY KEY, btree (metric_id, metrics_key)
    Foreign-key constraints:
        "fk3b8e13abb63406d5" FOREIGN KEY (metric_id) REFERENCES test_metric(id)

Basically, test_metric holds the timestamp and some metadata, and
test_metric_metrics holds the set of key/value pairs for the given entity
and timestamp in the parent table.

Is it possible to partition the second table by the timestamp field from
the first table? We are using postgres 9.2.2.

Thanks,
- Tirthankar
Re: Index space growing even after cleanup via autovacuum in Postgres 9.2
From: Francisco Olarte
On Wed, Jan 15, 2014 at 6:07 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
> My tables are:
> [table definitions trimmed -- see previous message]
>
> Basically, test_metric holds the timestamp and some metadata, and
> test_metric_metrics holds the set of key/value pairs for the given
> entity and timestamp in the parent table.
>
> Is it possible to partition the second table by the timestamp field
> from the first table?

Mmmm... I think not. You could copy the timestamp field into the second
table, but then things will begin to get hairy, as you'll need to maintain
redirection rules for insertions on both tables, and I do not remember
offhand what the rules are when mixing foreign keys and inheritance; I
would need to test them anyway to convince myself it works.

To design a solution for these, knowledge of the application and the
access patterns is needed. I just wanted to point out that for maintaining
rolling logs of data, inheritance plus some logic can greatly enhance your
performance, as you do not need deletes, just table drops, which are
normally faster. I do it to maintain call records, but I have logic for my
particular access pattern, which creates partitions on demand and just
uses COPY against the relevant partition, exploiting timestamp locality
and invariance (CDRs are normally inserted for recent calls, and
timestamps do not normally change). It has made maintenance much easier,
as I just coalesce old partitions into non-inheriting archive tables and
drop them (sketched below).

You may be able to design something like this, but if you begin to have
complex relationships, the way to go until you have explored them is
probably to just crank up the deleting/vacuuming frequency (many small
delete/vacuum cycles normally lead to more total time but less impact on
db performance, as has been pointed out previously). And if the tables use
the classical pattern for log tables (insert at one end, delete at the
other, extremely rare updates), they will probably perform better if
excluded from autovacuum and vacuumed explicitly after deletion batches,
or just periodically (inserting does not leave many opportunities to
recover space, but you may want to ANALYZE if vacuum runs infrequently).

Francisco Olarte.
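A minimal sketch of that detach-and-drop archival step (table names are
hypothetical, following the earlier examples):

    -- detach an old partition from its parent; it becomes a plain table
    -- and no longer appears in queries against 'metrics'
    ALTER TABLE metrics_20131201 NO INHERIT metrics;
    -- archive it however suits (pg_dump, copy elsewhere), then drop it
    DROP TABLE metrics_20131201;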