Thread: Table partitioning
I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like the ideaof creating and dropping tables all the time. I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'', time_arrived)= 1 (or 2 for February, 3 for March etc.). I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worthevery time. So, I have two questions. First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says something like"time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be able to tell that date_part("month",time_arrived)for all the records is 4, and therefore avoid selecting from any partitions other than theapril one? Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or do Ineed to create rules/triggers for that? TIA, Herouth
On 2013-10-28 12:27, Herouth Maoz wrote: > I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like theidea of creating and dropping tables all the time. > > I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'',time_arrived) = 1 (or 2 for February, 3 for March etc.). > > I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worthevery time. > > So, I have two questions. > > First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says something like"time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be able to tell that date_part("month",time_arrived)for all the records is 4, and therefore avoid selecting from any partitions other than theapril one? > > Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or doI need to create rules/triggers for that? > > > TIA, > Herouth > 1. No - you'd need a condition like "where date_part("month", time_arrived) = 1" in your select statements in order for the constraint exclusion to kick in 2. Yes - there is no need to create rules or triggers for deletes on the parent table (check out the syntax for "delete from <table>" versus "delete from only <table>)
Thanks. Assuming there is an index on the time_arrived column, and that there are about 10.5 million records in each childtable, how bad will performance be if the query actually accesses all the 12 tables? Will it be as bad as using thefull table? On 28/10/2013, at 18:31, Elliot wrote: > On 2013-10-28 12:27, Herouth Maoz wrote: >> I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like theidea of creating and dropping tables all the time. >> >> I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'',time_arrived) = 1 (or 2 for February, 3 for March etc.). >> >> I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worthevery time. >> >> So, I have two questions. >> >> First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says something like"time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be able to tell that date_part("month",time_arrived)for all the records is 4, and therefore avoid selecting from any partitions other than theapril one? >> >> Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or doI need to create rules/triggers for that? >> >> >> TIA, >> Herouth >> > 1. No - you'd need a condition like "where date_part("month", time_arrived) = 1" in your select statements in order forthe constraint exclusion to kick in > 2. Yes - there is no need to create rules or triggers for deletes on the parent table (check out the syntax for "deletefrom <table>" versus "delete from only <table>) >
On 2013-10-28 12:47, Herouth Maoz wrote: > Thanks. Assuming there is an index on the time_arrived column, and that there are about 10.5 million records in each childtable, how bad will performance be if the query actually accesses all the 12 tables? Will it be as bad as using thefull table? > > In the case where you don't add a predicate to your select statement that matches your constraint checks, meaning your statements are visiting every child table, you're not going to experience any performance gains over leaving everything in one table (plus you're adding on extra maintenance overhead and likely degrading query performance, even if only by a small amount). What are your concerns with creating/dropping tables? You may have even better success with creating a child table for each week. Fifty-two extra tables is extra maintenance but the performance boost you could get by dividing out your data is probably significant, and that kind of maintenance should be automated anyway.
On 10/28/2013 09:27 AM, Herouth Maoz wrote: > I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like theidea of creating and dropping tables all the time. What is slow about it? Inserting? Selecting? Deleting? Partitioning can assist with some issues but does no good if what you really need is an index or better query. Partitioning shines as an option to manage archiving/purging of time-series data but only if you work with it, not against it. What don't you like about creating and dropping tables? You can easily automate it: https://github.com/keithf4/pg_partman > > I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'',time_arrived) = 1 (or 2 for February, 3 for March etc.). > > I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worthevery time. You are missing out on one of the best aspects of partitioning. Compared to dropping or truncating a child table, deleting is far slower and causes table bloat which may impact future queries. > > Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or doI need to create rules/triggers for that? > Yes unless you use the keyword "ONLY": "If specified, deletes rows from the named table only. When not specified, any tables inheriting from the named table are also processed." Cheers, Steve
Everything is slow about it - selects, deletes and inserts, that is. I don't do updates on that table.
The inserts and deletes are less of an issue because they are done once a week. Of course it would be nicer if they were faster, but that's less of an issue. The real issue is with self-joins, which are a common query. But I have indexes on the relevant fields: the connecting field (the one used for the self-join) as well as the date field. The queries are mostly of the format SELECT ... FROM tablename t1 left join tablename t2 on t1.indexed_field = t2.indexed_field and t2.boolean_field where t1.date_field >= 'date1' and t1.date_field < 'date2' and not t1.boolean; This can take about 15 minutes to run, depending on the date range. And it doesn't matter much if I put a date range within the left join, either.
As I mentioned in another message, I don't like the idea that my script may fail, and then inserts into the table might start to fail as well. I'm not always available to run it manually within a set time. And writing an automation that is different than all the other tables I maintain in that database makes for maintenance spaghetti. I also don't like running automated DDL commands. They don't play well with backups.
-----הודעה מקורית-----
מאת: Steve Crawford [mailto:scrawford@pinpointresearch.com]
נשלח: ב 28/10/2013 22:31
אל: Herouth Maoz; pgsql-general@postgresql.org
נושא: Re: [GENERAL] Table partitioning
On 10/28/2013 09:27 AM, Herouth Maoz wrote:
> I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like the idea of creating and dropping tables all the time.
What is slow about it? Inserting? Selecting? Deleting? Partitioning can
assist with some issues but does no good if what you really need is an
index or better query. Partitioning shines as an option to manage
archiving/purging of time-series data but only if you work with it, not
against it.
What don't you like about creating and dropping tables? You can easily
automate it: https://github.com/keithf4/pg_partman
>
> I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 3 for March etc.).
>
> I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worth every time.
You are missing out on one of the best aspects of partitioning. Compared
to dropping or truncating a child table, deleting is far slower and
causes table bloat which may impact future queries.
>
> Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or do I need to create rules/triggers for that?
>
Yes unless you use the keyword "ONLY": "If specified, deletes rows from
the named table only. When not specified, any tables inheriting from the
named table are also processed."
Cheers,
Steve