Thread: Table performance with millions of rows
Question on large tables...

When should one consider table partitioning vs. just stuffing 10 million rows into one table?

I currently have CDRs that are injected into a table at a rate of over 100,000 a day, which is large.

At some point I'll want to prune these records out, so being able to just drop or truncate the table in one shot makes child table partitions attractive.

From a pure data warehousing standpoint, what are the dos/don'ts of keeping such large tables?

Other notes:
- This table is never updated, only appended to (CDRs).
- Right now a daily SQL job deletes records older than X days. (Costly; it purges ~100k records at a time.)

--
inoc.net!rblayzor
XMPP: rblayzor.AT.inoc.net
PGP:  https://inoc.net/~rblayzor/
On Wed, Dec 27, 2017 at 07:54:23PM -0500, Robert Blayzor wrote:
> Question on large tables...
>
> When should one consider table partitioning vs. just stuffing 10 million
> rows into one table?

IMO, whenever constraint exclusion, DROP vs. DELETE, or seq scans on individual children justify the minor administrative overhead of partitioning. Note that partitioning may be implemented as direct insertion into child tables, or may involve triggers or rules.

> I currently have CDRs that are injected into a table at a rate of over
> 100,000 a day, which is large.
>
> At some point I'll want to prune these records out, so being able to just
> drop or truncate the table in one shot makes child table partitions
> attractive.

That's one of the major use cases for partitioning (DROP rather than DELETE, and thus avoiding any following vacuum+analyze).
https://www.postgresql.org/docs/10/static/ddl-partitioning.html#DDL-PARTITIONING-OVERVIEW

Justin
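[For readers following along, a minimal sketch of what this looks like with the declarative partitioning added in PostgreSQL 10, per the docs linked above. Table and column names here are invented for illustration:]

```sql
-- Hypothetical CDR table, range-partitioned by month (PostgreSQL 10+).
CREATE TABLE cdr (
    call_id   bigint,
    ts        timestamptz NOT NULL,
    duration  integer
) PARTITION BY RANGE (ts);

CREATE TABLE cdr_2017_12 PARTITION OF cdr
    FOR VALUES FROM ('2017-12-01') TO ('2018-01-01');

CREATE TABLE cdr_2018_01 PARTITION OF cdr
    FOR VALUES FROM ('2018-01-01') TO ('2018-02-01');

-- Retiring a month is a cheap metadata operation,
-- not a row-by-row DELETE followed by vacuum:
DROP TABLE cdr_2017_12;
```

[With declarative partitioning, rows inserted into `cdr` are routed to the matching partition automatically; no triggers or rules are needed, though pre-10 inheritance setups still require them.]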
On Dec 27, 2017, at 8:20 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> That's one of the major use cases for partitioning (DROP rather than DELETE and
> thus avoiding any following vacuum+analyze).
> https://www.postgresql.org/docs/10/static/ddl-partitioning.html#DDL-PARTITIONING-OVERVIEW

That's the plan: partition, and I can easily change the code to insert directly into the child tables.

Right now I was going to use date ranges (per month) based on a timestamp. But could I just create 12 child tables, one for each month, instead of creating one for year+month?

ie: instead of:

  (CHECK (ts >= DATE '2017-12-01' AND ts < DATE '2018-01-01'))

use:

  (CHECK (EXTRACT(MONTH FROM ts) = 12))

I'll never need more than the last six months, so I'll just truncate the older child tables. By the time the data wraps around, the child table will be empty.

I'm not even sure if the above CHECK (w/ EXTRACT) instead of just looking for a date range is valid.
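[Spelled out, the two constraint styles being compared would look like this on old-style inheritance children. The parent table `cdr` and its `ts` column are assumed names; whether the planner can actually use the second form is the open question here:]

```sql
-- Option 1: one child per year+month, with a simple range CHECK.
CREATE TABLE cdr_2017_12 (
    CHECK (ts >= DATE '2017-12-01' AND ts < DATE '2018-01-01')
) INHERITS (cdr);

-- Option 2: twelve reusable children, one per month number,
-- each with an EXTRACT-based CHECK.
CREATE TABLE cdr_m12 (
    CHECK (EXTRACT(MONTH FROM ts) = 12)
) INHERITS (cdr);
```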
No, it's unfortunately not possible. The documentation says, in the Caveats section:

"Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.

Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don't need to be visited. Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that partitioning constraints should contain only comparisons of the partitioning column(s) to constants using B-tree-indexable operators."

Even making a function in SQL or plpgsql and declaring it IMMUTABLE will not help: Postgres will always check all the partitions. It's not "simple" enough for the planner.

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
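[An easy way to verify this yourself, reusing the hypothetical `cdr` table and children from earlier in the thread: compare the EXPLAIN output for the two constraint styles.]

```sql
-- Ensure constraint exclusion is considered for inheritance children
-- (this is the default setting).
SET constraint_exclusion = partition;

-- Against children with simple range CHECKs, the plan touches only
-- the December child:
EXPLAIN SELECT * FROM cdr
 WHERE ts >= DATE '2017-12-01' AND ts < DATE '2018-01-01';

-- Against children with CHECK (EXTRACT(MONTH FROM ts) = n), the planner
-- cannot prove the EXTRACT expression consistent with a range condition
-- on ts, so the same query shows scans of every child.
```

[The first plan will list a single child table; the second will list all of them, which is the behavior the Caveats passage above describes.]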