Thread: Table performance with millions of rows

From: Robert Blayzor
Date: Wed, 27 Dec 2017 19:54:23 -0500
Question on large tables…


When should one consider table partitioning vs. just stuffing 10 million rows into one table?

I currently have CDRs being inserted into a table at a rate of over 100,000 a day, so it grows large quickly.


At some point I’ll want to prune these records out, so being able to just drop or truncate a table in one shot makes
child-table partitions attractive.


From a pure data warehousing standpoint, what are the dos and don’ts of keeping such large tables?

Other notes…
- This table is never updated, only appended to (CDRs)
- Right now a daily SQL job deletes records older than X days (costly, purging ~100k records at a time)
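
The cost difference between the two purge strategies can be sketched as follows (table and column names here are hypothetical):

```sql
-- Current approach: a daily DELETE. Every deleted row becomes a dead
-- tuple that autovacuum must later scan and reclaim.
DELETE FROM cdr WHERE ts < now() - interval '90 days';

-- With per-month child tables, the same purge is a near-instant
-- metadata operation that leaves no dead tuples behind:
DROP TABLE cdr_2017_09;
-- or, to keep the table around for reuse:
TRUNCATE cdr_2017_09;
```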



--
inoc.net!rblayzor
XMPP: rblayzor.AT.inoc.net
PGP:  https://inoc.net/~rblayzor/

Re: Table performance with millions of rows (partitioning)

From: Justin Pryzby
Date: Wed, 27 Dec 2017, 8:20 PM
On Wed, Dec 27, 2017 at 07:54:23PM -0500, Robert Blayzor wrote:
> Question on large tables…
> 
> When should one consider table partitioning vs. just stuffing 10 million rows into one table?

IMO, whenever constraint exclusion, DROP vs DELETE, or seq scan on individual
children justify the minor administrative overhead of partitioning.  Note that
partitioning may be implemented as direct insertion into child tables, or may
involve triggers or rules.

> I currently have CDR’s that are injected into a table at the rate of over
> 100,000 a day, which is large.
> 
> At some point I’ll want to prune these records out, so being able to just
> drop or truncate the table in one shot makes child table partitions
> attractive.

That's one of the major use cases for partitioning (DROP rather than DELETE and
thus avoiding any following vacuum+analyze).
https://www.postgresql.org/docs/10/static/ddl-partitioning.html#DDL-PARTITIONING-OVERVIEW
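
As a minimal sketch of the inheritance-style setup described above, with direct insertion into a child table (all names illustrative; PostgreSQL 10 also offers declarative partitioning, per the link above):

```sql
-- Parent table holds no rows itself; each month gets a child table
-- with a CHECK constraint covering its date range.
CREATE TABLE cdr (
    id      bigserial,
    ts      timestamptz NOT NULL,
    details text
);

CREATE TABLE cdr_2017_12 (
    CHECK (ts >= DATE '2017-12-01' AND ts < DATE '2018-01-01')
) INHERITS (cdr);

-- Direct insertion into the child avoids trigger/rule overhead:
INSERT INTO cdr_2017_12 (ts, details)
VALUES (TIMESTAMPTZ '2017-12-15 00:00', 'example row');

-- Queries against the parent see all children's rows.
SELECT count(*) FROM cdr;
```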

Justin


Re: Table performance with millions of rows (partitioning)

From: Robert Blayzor
Date:
On Dec 27, 2017, at 8:20 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> That's one of the major use cases for partitioning (DROP rather than DELETE and
> thus avoiding any following vacuum+analyze).
> https://www.postgresql.org/docs/10/static/ddl-partitioning.html#DDL-PARTITIONING-OVERVIEW


That’s the plan: partition, and I can easily change the code to insert directly into the child tables.

Right now, I was going to use date ranges (per month) based on a timestamp.

But could I just create 12 child tables, one for each calendar month, instead of creating one per year+month?

ie: instead of:

  (CHECK (ts >= DATE '2017-12-01' AND ts < DATE '2018-01-01'))

use:

  (CHECK (EXTRACT(MONTH FROM ts) = 12))


I’ll never need more than the last six months, so I’ll just truncate the older child tables. By the time the data
wraps around, the child table will be empty.


I’m not even sure whether the above CHECK (with EXTRACT), as opposed to a plain date range, is even valid for this.
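
One way to test a candidate CHECK constraint empirically is to ask the planner: with constraint exclusion enabled, excluded children simply do not appear in the plan (illustrative, assuming child tables inheriting from a parent `cdr`):

```sql
SET constraint_exclusion = partition;  -- the default setting

-- With range-style CHECKs, this plan should scan only the December
-- child; with the EXTRACT()-style CHECK, every child is scanned.
EXPLAIN SELECT *
FROM cdr
WHERE ts >= DATE '2017-12-01' AND ts < DATE '2018-01-01';
```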

Re: Table performance with millions of rows (partitioning)

From: pinker
Date:
No, unfortunately that's not possible.
The documentation says, in the Caveats section:
"Constraint exclusion only works when the query's WHERE clause contains
constants (or externally supplied parameters). For example, a comparison
against a non-immutable function such as CURRENT_TIMESTAMP cannot be
optimized, since the planner cannot know which partition the function value
might fall into at run time.

Keep the partitioning constraints simple, else the planner may not be able
to prove that partitions don't need to be visited. Use simple equality
conditions for list partitioning, or simple range tests for range
partitioning, as illustrated in the preceding examples. A good rule of thumb
is that partitioning constraints should contain only comparisons of the
partitioning column(s) to constants using B-tree-indexable operators."

Even writing a function in SQL or PL/pgSQL and declaring it IMMUTABLE will
not help: Postgres will still check all the partitions. The EXTRACT() form
is not "simple" enough for the planner.
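
A sketch of that failure mode (hypothetical names; note that marking a function over EXTRACT(... FROM timestamptz) as IMMUTABLE is itself dubious, since the result depends on the session time zone):

```sql
-- Wrapping the month test in a function declared IMMUTABLE...
CREATE FUNCTION month_of(timestamptz) RETURNS int
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT EXTRACT(month FROM $1)::int $$;

CREATE TABLE cdr_m12 (
    CHECK (month_of(ts) = 12)
) INHERITS (cdr);

-- ...does not help: a query filtering on a ts range still scans every
-- child, because the planner only excludes partitions via simple
-- B-tree-indexable comparisons of the column against constants.
EXPLAIN SELECT * FROM cdr
WHERE ts >= DATE '2017-12-01' AND ts < DATE '2018-01-01';
```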




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html