Re: PostgreSQL suitable? - Mailing list pgsql-general

From Stephen Frost
Subject Re: PostgreSQL suitable?
Date
Msg-id 20171219163208.GL4628@tamriel.snowman.net
Whole thread Raw
In response to Re: PostgreSQL suitable?  (Vincenzo Romano <vincenzo.romano@notorand.it>)
Responses Re: PostgreSQL suitable?
List pgsql-general
Greetings,

* Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
> 2017-12-19 16:48 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> > * Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
> >> Sorry, my bad: I confused v10 with v11.
> >> But according to a discussion with Bruce Momjian, table partitioning in v10 is
> >> little more than syntactic sugar around old-fashioned table partitioning.
> >
> > Well, it's a bit more than that since there's tuple-routing, but you're
> > right that the partition elimination is the same as it was in earlier
> > versions and based on constraint exclusion.  That said, as noted in the
> > email you replied to, reasonable numbers of partitions aren't too bad
> > even with the planning cost; it's when you have many thousands of
> > partitions that you get into cases where planning time for queries is
> > really bad.
>
> When you have to handle a 100TB table, the number of partitions
> shouldn't be 10 or 12
> as seen in most examples and tests.

Probably not, but it would depend on the data and what you're doing with
it.  While there are some general rules-of-thumb, there's no one right
answer when it comes to the size of individual partitions.

> This is the same type of issues you hit with partial indexes (this is
> why I mentioned them earlier).
> Sub-table (and partial index) selection algorithm should be
> logarithmic or sub-linear.

Sure, and work is being done to improve PG in that exact area, but that
doesn't mean it can't handle workloads like this today; you just have to
be aware of the costs associated with today's partitioning.
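For reference, the v10 declarative syntax looks roughly like this (table
and column names are invented for illustration):

```sql
-- Illustrative only: PostgreSQL 10 declarative range partitioning.
CREATE TABLE measurements (
    device_id  bigint NOT NULL,
    taken_on   date   NOT NULL,
    reading    numeric
) PARTITION BY RANGE (taken_on);

CREATE TABLE measurements_201712 PARTITION OF measurements
    FOR VALUES FROM ('2017-12-01') TO ('2018-01-01');

-- Tuple routing sends INSERTs through the parent to the correct child,
-- but SELECTs through the parent still rely on constraint exclusion,
-- which considers every partition linearly at plan time.
```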

> As long as it'll be linear, you'll hear about "reasonable number of partitions".
> One thousand partitions for a 100TB table would make "manageable"
> 100GB sub-tables.

Yup, and 100G tables are certainly large and a bit awkward but they can
be managed.

> I could easily be wrong, but this is an area where PG needs improvements.

Of course, and work is being done to improve it.  What's important is
knowing that there's a cost to having more partitions when querying
through the parent once you get to a point where you have thousands of
partitions.  That cost may be perfectly fine in some use-cases; in
others it might not be, and instead you'd likely have to build logic
into the application layer to address it.  That's not ideal, which is
why there's work being done to improve PG, but it's not necessarily a
big deal either.  Dynamic SQL is another approach.

> One could maybe think about multi-level partitioning, though.

Not sure exactly what you're referring to here, but a simple multi-level
partitioning setup with PG wouldn't actually change the cost for
partition elimination today.

> > Also as noted on this thread, PG could handle this data volume, but to
> > be efficient there would be work to be done in normalization,
> > aggregation, and analyzing the system to ensure you're storing and
> > querying on the data efficiently.
>
> Normalization will grow the number of tables (and later joins) and
> you'll very likely end up with at least one table with a
> "gazillion" rows.

Naturally, but that "gazillion"-row table would be much smaller for
having the data normalized; if you don't normalize it then the gazillion
row table is a huge amount of duplicated data, making the entire system
much larger than necessary.

> I fear normalization, provided it's really needed, would provide little help.

I seriously doubt that's the case.  Normalization might reduce that
100TB down to tens of TB instead, or perhaps even smaller.
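As a hypothetical sketch of why (the tables and sizes are made up):
pulling a wide, repeated attribute out of the big table into a small
lookup table replaces a long text value in every row with a tiny key.

```sql
-- Hypothetical example: a small lookup table plus a narrow fact table.
CREATE TABLE sensor_types (
    sensor_type_id smallint PRIMARY KEY,
    description    text NOT NULL
);

CREATE TABLE readings (
    device_id      bigint,
    sensor_type_id smallint REFERENCES sensor_types,  -- 2 bytes per row
    reading        numeric
);

-- Storing a 2-byte key instead of, say, a 50-byte description in each
-- of the "gazillion" rows shrinks the big table dramatically.
```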

> With tables that big I usually do "software partitioning".
> I make the software aware of the partition schema so it can direct the
> queries to the needed (sub-)tables.

Yes, implementing partition elimination at the application level is
certainly an approach that can work with PG.  Even then, ideally, you'd
normalize the data so that the footprint is much smaller, making more
efficient use of the storage and memory available.
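A minimal sketch of that application-level routing, assuming a monthly
range scheme with sub-tables named `measurements_YYYYMM` (the names and
schema are invented for illustration):

```python
from datetime import date

def partition_for(day: date) -> str:
    """Return the name of the sub-table holding rows for the given day,
    under the assumed monthly naming scheme measurements_YYYYMM."""
    return f"measurements_{day.year:04d}{day.month:02d}"

def build_query(day: date) -> str:
    # Targeting the sub-table directly keeps planning time constant
    # regardless of how many partitions exist; the planner never has
    # to consider the other children.  The value itself is still passed
    # as a bind parameter to the driver.
    return (f"SELECT device_id, reading FROM {partition_for(day)} "
            f"WHERE taken_on = %s")

print(build_query(date(2017, 12, 19)))
```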

Thanks!

Stephen

