Thread: Re: Need help in reclaiming disk space by deleting the selected records

Re: Need help in reclaiming disk space by deleting the selected records

From
"Yelai, Ramkumar IN BLR STS"
Date:
Thanks Steve and Andres,
 
I read these articles
 
and I have some more questions about Steve's comments.
 
  1. Do I need to run REINDEX to reduce space, or will autovacuum handle reindexing?
  2. CLUSTER, REINDEX and VACUUM FULL lock the table. Hence, do we need to avoid database operations (select, delete, insert) while doing disk cleanup? Just curious: what happens if I keep inserting while running these commands?
  3. All three commands need some additional space to do this operation? Am I correct?
  4. Would all database servers (Oracle, SQL Server and MySQL) need downtime while doing disk cleanup?
  5. I am very happy to use TRUNCATE and table partitioning; it satisfies my requirements. But in order to achieve this, for 10 years (I currently have 6 unique archiving tables) I would have to create 1440 monthly tables. Will that create any issues, and is there anything I need to consider carefully while doing this?
 
Thanks & Regards,
Ramkumar
_____________________________________________
From: Yelai, Ramkumar IN BLR STS
Sent: Thursday, September 13, 2012 7:03 PM
To: 'pgsql-general@postgresql.org'
Subject: Need help in reclaiming disk space by deleting the selected records
 
 
Hi All,
 
I am a beginner with PostgreSQL and databases. I have a requirement to reclaim disk space by deleting the rows in a selected time span. I went through the documents and articles to learn how to get the table size (http://wiki.postgresql.org/wiki/Disk_Usage).
 
But before letting the user delete, I have to show the size of the records in the selected time span. Here I don't know how to calculate the size of the selected records.
In addition to this, I assume that after deleting the records I have to run the VACUUM FULL command to reclaim the space (please correct me if I am wrong, or let me know the best approach).
 
The table looks like this
 
CREATE TABLE IF NOT EXISTS "SN_SamplTable"
(
  "ID" integer NOT NULL,
  "Data" integer,
  "CLIENT_COUNT_TIMESTAMP" timestamp without time zone
);
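To show the user roughly how much space the rows in a chosen time span occupy, one option is to sum `pg_column_size()` over the whole row for the selected range. This is a sketch, and only a lower-bound estimate, since it ignores the per-tuple header and the index entries; the date bounds are illustrative:

```sql
-- Approximate on-disk size of the rows in a time span.
-- pg_column_size(t.*) returns the bytes of each row's data,
-- excluding the ~24-byte tuple header and any index entries,
-- so treat the result as a lower-bound estimate.
SELECT pg_size_pretty(sum(pg_column_size(t.*))::bigint) AS approx_size,
       count(*)                                         AS row_count
FROM "SN_SamplTable" AS t
WHERE "CLIENT_COUNT_TIMESTAMP" >= '2012-01-01'
  AND "CLIENT_COUNT_TIMESTAMP" <  '2012-04-01';
```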
 
Please help me with how to proceed on this.
 
Thanks & Regards,
Ramkumar.
 
 

Re: Re: Need help in reclaiming disk space by deleting the selected records

From
"Albe Laurenz"
Date:
Yelai, Ramkumar wrote:
> 1. Do I need to run REINDEX to reduce space, or will autovacuum
> handle reindexing?

Autovacuum will not rebuild the index as REINDEX does.
It will only free index entries that can be reused later.
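If the indexes have become bloated after a large delete, they can be rebuilt explicitly; note that this takes an exclusive lock on the table while it runs (illustrative, using the table from the original post, with a hypothetical index name):

```sql
-- Rebuild all indexes on the table (locks it for the duration):
REINDEX TABLE "SN_SamplTable";

-- Or rebuild a single index by name (name is hypothetical):
-- REINDEX INDEX "SN_SamplTable_pkey";
```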

> 2. CLUSTER, REINDEX and VACUUM FULL lock the table. Hence do we
> need to avoid database operations (select, delete, insert) while
> doing disk cleanup? Just curious what if I keep inserting while
> running this command?

The INSERT would get blocked until the CLUSTER, REINDEX or VACUUM FULL
is done.

> 3. All three commands need some additional space to do this
> operation? Am I correct?

Yes.

> 4. Would all database servers (Oracle, SQL Server and MySQL) need
> downtime while doing disk cleanup?

"Disk cleanup" is a very vague term.
All database management systems handle these things differently.

Note that PostgreSQL normally does not need any downtime if the
vacuum strategy is right.
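For example, after a bulk DELETE a plain (non-FULL) VACUUM marks the dead rows' space as reusable without taking the exclusive lock that VACUUM FULL needs, so selects and inserts keep running; the files do not shrink, but future inserts reuse the freed space. A minimal sketch (date bound is illustrative):

```sql
-- Delete the old rows, then make their space reusable
-- without blocking concurrent reads and writes:
DELETE FROM "SN_SamplTable"
WHERE "CLIENT_COUNT_TIMESTAMP" < '2011-01-01';

VACUUM ANALYZE "SN_SamplTable";  -- plain VACUUM: no exclusive lock
```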

> 5. I am very happy to use TRUNCATE and table partitioning; it
> satisfies my requirements. But in order to achieve this, for 10
> years (I currently have 6 unique archiving tables) I would have to
> create 1440 monthly tables. Will that create any issues, and is
> there anything I need to consider carefully while doing this?

Anything exceeding a few hundred partitions is not considered a good
idea.
The system needs to keep track of all the tables, and query planning
for such a partitioned table might be expensive.

1440 is probably pushing the limits, but maybe somebody with more
experience can say more.

Yours,
Laurenz Albe


Re: Re: Need help in reclaiming disk space by deleting the selected records

From
"Albe Laurenz"
Date:
Yelai, Ramkumar wrote:
>> Anything exceeding a few hundred partitions is not considered a
>> good idea. The system needs to keep track of all the tables, and
>> query planning for such a partitioned table might be expensive.
>>
>> 1440 is probably pushing the limits, but maybe somebody with more
>> experience can say more.

> By mistake I added 1440 tables, but that is incorrect; below is the
> total number of tables:
>
> 7 base tables X 120 months = 840 child tables.
>
> As per your statement, if I create this many tables then it will
> affect the performance. But as per the document
> (http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html)
> constraint_exclusion will improve query performance. Please clarify
> for me how query planning will be expensive.

The planner will have to decide which of the 840 tables
to access.

> I have one more strategy: instead of creating 10 years of tables,
> I'd like to use batch processing, i.e. create 2 years of tables
> (240 tables), and when we go beyond 2 years we create the next 2
> years of tables and update the trigger; or use 5 years (480 tables)
> instead of 2 years.
>
> The above approach will not create as many partitioned tables, and
> if the user wants space they can truncate the old tables. Please
> let me know if this is a good approach.

I don't understand that in detail.

I would recommend that you prototype some of these variants
and run some performance tests.  That's the only good way to
know what will perform well in your environment.

Yours,
Laurenz Albe



Re: Re: Need help in reclaiming disk space by deleting the selected records

From
John R Pierce
Date:
On 09/28/12 12:36 AM, Albe Laurenz wrote:
> Yelai, Ramkumar wrote:
>> 7 base tables X 120 months = 840 child tables. As per your
>> statement, if I create this many tables then it will affect the
>> performance. But as per the document
>> (http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html)
>> constraint_exclusion will improve query performance. Please clarify
>> for me how query planning will be expensive.
> The planner will have to decide which of the 840 tables
> to access.
>


Well, really, which of the 120 tables for a given base table; he's got
7 different base tables. 120 partitions is still too many.

If I were partitioning a table for 2-year retention, I'd probably do it
by month, or even quarter, so there would be 24 or 8 child tables for 2
years. We use weekly tables for 6 months, but almost all of our
activity is to the latest week and the one before that; it's quite rare
that we need to dig back to older records.
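In the PostgreSQL 9.1 the original poster cites, this kind of partitioning is done with inheritance plus CHECK constraints. A quarterly layout along these lines (a sketch with made-up table names and a single example child) keeps the partition count small, and reclaiming an expired quarter is a cheap TRUNCATE or DROP rather than a DELETE plus VACUUM:

```sql
-- Parent holds no rows itself; each child carries a CHECK
-- constraint so constraint_exclusion can skip irrelevant quarters.
CREATE TABLE measurements (
    id   integer NOT NULL,
    data integer,
    ts   timestamp without time zone NOT NULL
);

CREATE TABLE measurements_2012q4 (
    CHECK (ts >= DATE '2012-10-01' AND ts < DATE '2013-01-01')
) INHERITS (measurements);

-- Reclaiming the space of an expired quarter is then instant:
-- TRUNCATE measurements_2012q4;
-- or: DROP TABLE measurements_2012q4;
```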



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast



Re: Re: Need help in reclaiming disk space by deleting the selected records

From
"Yelai, Ramkumar IN BLR STS"
Date:

-----Original Message-----
From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at]
Sent: Friday, September 28, 2012 1:07 PM
To: Yelai, Ramkumar IN BLR STS; pgsql-general@postgresql.org
Cc: scrawford@pinpointresearch.com; andres@2ndquadrant.com
Subject: RE: [GENERAL] Re: Need help in reclaiming disk space by deleting the selected records


Thanks Laurenz Albe.

After I went through the articles below, I understand query planning for partitioned tables and its limits.

http://stackoverflow.com/questions/6104774/how-many-table-partitions-is-too-many-in-postgres
http://postgresql.1045698.n5.nabble.com/Table-partitioning-td3410542.html


As per our functionality (the 7 tables represent 7 unique archiving logs), we will not be querying 840 tables or 7
base tables at the same time, i.e. each unique archiving log table will have only 120 child tables, hence the planner
will have to decide which of the 120 tables to access.

In addition to this, at any time I will be reading only one partition table among the 120 tables as per our computations.
In this computation we will not use any joins or combine the partition tables.

As Laurenz said, I will build a prototype and check the query plans based on our queries.
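When testing such a prototype, EXPLAIN shows whether constraint exclusion is actually pruning the child tables; a sketch (the table name here is illustrative, standing in for one of the 7 partitioned archiving-log tables):

```sql
SET constraint_exclusion = partition;  -- the 9.1 default

-- With working CHECK constraints on the children, the plan should
-- scan only the one child table whose constraint matches the WHERE
-- clause, not all 120 of them:
EXPLAIN
SELECT *
FROM measurements
WHERE ts >= DATE '2012-10-01' AND ts < DATE '2012-11-01';
```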

Please let me know if you have any points or suggestions.

Thanks & regards,
Ramkumar