Thread: PostgreSQL suitable?

PostgreSQL suitable?

From
Kellner Thiemo
Date:
Hi

We are developing a data warehouse whose integration layer will start with over 100 TB of data. There are not many entities, though we can probably partition and, above all, we should use inheritance for the lab results. I was just wondering whether PostgreSQL is able to cope with that. In case it depends on the modelling approach, we have not yet decided between classic ERD, anchor modelling and data vault.

Does someone have experience with such a set up?

Kind regards

Thiemo


Re: PostgreSQL suitable?

From
"Rakesh Kumar"
Date:
I think with 100TB the biggest challenge will be taking backups. PG has no real incremental backups like those offered in Oracle or SQL Server.
 


Re: PostgreSQL suitable?

From
Chris Travers
Date:


On Tue, Dec 19, 2017 at 3:07 PM, Kellner Thiemo <thiemo.kellner@usb.ch> wrote:
Hi

We are developing a data warehouse of which the integration layer will start with over 100 TB of data. There are not many entities though we probably can partition and foremost we should use inheritance for the lab results. I just was wondering if PostgreSQL was able to cope with. In case it depends on the modelling kind, we have not yet decided between classic erd, anchor modelling and data vault.

Does someone have experience with such a set up?

There are a significant number of issues that come up when you try to put that much data into PostgreSQL.  Remember there is a hard limit of 32TB per table.

I currently help administer an analytics environment where 400TB of data is pre-aggregated into 32TB of ready-to-serve metrics.  We generally try to keep our initial point-of-entry databases to under 20TB if possible.

Nonetheless it is quite possible, either by using distributed add-ons like Citus or forks like Postgres-XL, or (as we did at Adjust) by using a mixture of data consolidation and application-level sharding.

As a plug, I expect to deliver talks about this in various places.  Stay tuned ;-)

 

Kind regards

Thiemo




--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: PostgreSQL suitable?

From
Vincenzo Romano
Date:
I think table partitioning isn't yet suitable for a large number of partitions, or for a large number of partial indexes.

This will be a challenge too.

On 19 Dec 2017 at 15:07, "Kellner Thiemo" <thiemo.kellner@usb.ch> wrote:
Hi

We are developing a data warehouse of which the integration layer will start with over 100 TB of data. There are not many entities though we probably can partition and foremost we should use inheritance for the lab results. I just was wondering if PostgreSQL was able to cope with. In case it depends on the modelling kind, we have not yet decided between classic erd, anchor modelling and data vault.

Does someone have experience with such a set up?

Kind regards

Thiemo

Re: PostgreSQL suitable?

From
Andreas Kretschmer
Date:

On 19.12.2017 at 15:07, Kellner Thiemo wrote:
> Hi
>
> We are developing a data warehouse of which the integration layer will start with over 100 TB of data. There are not
> many entities though we probably can partition and foremost we should use inheritance for the lab results. I just was
> wondering if PostgreSQL was able to cope with. In case it depends on the modelling kind, we have not yet decided between
> classic erd, anchor modelling and data vault.
 
>
> Does someone have experience with such a set up?
>
> Kind regards
>
> Thiemo
>


It depends at least on the data and the workload. PG 10 contains better
solutions for table partitioning; up to 1,000 (maybe more) child tables
aren't a problem.
We have customers in that range.
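
For illustration, a minimal sketch of the declarative partitioning PG 10 introduced (table and column names here are hypothetical):

    CREATE TABLE measurement (
        sample_id bigint      NOT NULL,
        taken_at  timestamptz NOT NULL,
        result    numeric
    ) PARTITION BY RANGE (taken_at);

    CREATE TABLE measurement_2017_12 PARTITION OF measurement
        FOR VALUES FROM ('2017-12-01') TO ('2018-01-01');
    CREATE TABLE measurement_2018_01 PARTITION OF measurement
        FOR VALUES FROM ('2018-01-01') TO ('2018-02-01');

    -- Rows inserted through the parent are routed to the matching child.
    INSERT INTO measurement VALUES (1, '2017-12-19 15:07', 42.0);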


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: PostgreSQL suitable?

From
Vincenzo Romano
Date:
I've not tested PG10. But it's not released for production yet!

On 19 Dec 2017 at 15:48, "Andreas Kretschmer" <andreas@a-kretschmer.de> wrote:


On 19.12.2017 at 15:07, Kellner Thiemo wrote:
Hi

We are developing a data warehouse of which the integration layer will start with over 100 TB of data. There are not many entities though we probably can partition and foremost we should use inheritance for the lab results. I just was wondering if PostgreSQL was able to cope with. In case it depends on the modelling kind, we have not yet decided between classic erd, anchor modelling and data vault.

Does someone have experience with such a set up?

Kind regards

Thiemo



depends at least on the data and the workload. pg10 contains better solutions for table-partitioning, up to 1000 (maybe more) child-tables aren't that problem.
We have customers in that range.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com


Re: PostgreSQL suitable?

From
Alvaro Herrera
Date:
Vincenzo Romano wrote:
> I've not tested PG10. But it's not released for production yet!

It definitely is, for a couple of months now.  10.1 (the first bugfix
release) has been out for over a month.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: PostgreSQL suitable?

From
Tim Clarke
Date:
PG10 is happily in production - in active use here :)

Tim Clarke
IT Director
Manifest

Tel: +44 (0)1376 504510 | Mobile: +44 (0)7887 563420 |
Main: +44 (0)1376 503500 | Fax: +44 (0)1376 503550

Blog: https://blog.manifest.co.uk/ | Web: https://www.manifest.co.uk/

9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | England



On 19/12/17 14:50, Vincenzo Romano wrote:
> I've not tested PG10. But it's not released for production yet!
>
> On 19 Dec 2017 at 15:48, "Andreas Kretschmer" <andreas@a-kretschmer.de> wrote:
>
>
>
>     On 19.12.2017 at 15:07, Kellner Thiemo wrote:
>
>         Hi
>
>         We are developing a data warehouse of which the integration
>         layer will start with over 100 TB of data. There are not many
>         entities though we probably can partition and foremost we
>         should use inheritance for the lab results. I just was
>         wondering if PostgreSQL was able to cope with. In case it
>         depends on the modelling kind, we have not yet decided between
>         classic erd, anchor modelling and data vault.
>
>         Does someone have experience with such a set up?
>
>         Kind regards
>
>         Thiemo
>
>
>
>     depends at least on the data and the workload. pg10 contains
>     better solutions for table-partitioning, up to 1000 (maybe more)
>     child-tables aren't that problem.
>     We have customers in that range.
>
>
>     Regards, Andreas
>
>     --
>     2ndQuadrant - The PostgreSQL Support Company.
>     www.2ndQuadrant.com
>
>




Re: PostgreSQL suitable?

From
Vincenzo Romano
Date:
Sorry, my bad: I confused V10 with V11.
But according to a discussion with Bruce Momjian, table partitioning in V10 is little more than syntactic sugar around old-fashioned table partitioning.
Sub-table partition selection is linear in complexity.

On 19 Dec 2017 at 15:55, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
Vincenzo Romano wrote:
> I've not tested PG10. But it's not released for production yet!

It definitely is, for a couple of months now.  10.1 (the first bugfix
release) has been out for over a month.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: PostgreSQL suitable?

From
Stephen Frost
Date:
Greetings,

* Rakesh Kumar (rakeshkumar464@mail.com) wrote:
> I think with 100TB the biggest challenge will be taking backups. PG has no real incremental backups offered in Oracle
> or SQL Server.
 

There are multiple solutions to doing incremental backups with
PostgreSQL, so I'm not sure why you're saying that they don't exist,
because that's really not accurate.

Thanks!

Stephen


Re: PostgreSQL suitable?

From
Stephen Frost
Date:
Greetings,

* Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
> Sorry, my bad: I confused V10 with v11.
> But according to a discussion with Bruce Momjian, table partitioning in V10 is
> little more than syntactic sugar around old-fashioned table partitioning.

Well, it's a bit more than that since there's tuple-routing, but you're
right that the partition elimination is the same as it was in earlier
versions and based on constraint exclusion.  That said, as noted in the
email you replied to, reasonable numbers of partitions aren't too bad
even with the planning cost; it's when you have many thousands of
partitions that you get into cases where planning time for queries is
really bad.
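
For context, a minimal sketch of the inheritance-plus-CHECK-constraint scheme that this constraint-exclusion behaviour comes from (table and column names are hypothetical):

    CREATE TABLE lab_result (
        sample_id bigint      NOT NULL,
        taken_at  timestamptz NOT NULL,
        result    numeric
    );

    CREATE TABLE lab_result_2017 (
        CHECK (taken_at >= '2017-01-01' AND taken_at < '2018-01-01')
    ) INHERITS (lab_result);

    -- The planner tests the CHECK constraint of every child, so partition
    -- elimination is linear in the number of children.
    SET constraint_exclusion = partition;
    EXPLAIN SELECT * FROM lab_result WHERE taken_at >= '2017-06-01';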

Also as noted on this thread, PG could handle this data volume, but to
be efficient there would be work to be done in normalization,
aggregation, and analyzing the system to ensure you're storing and
querying on the data efficiently.

Thanks!

Stephen


Re: PostgreSQL suitable?

From
Andreas Kretschmer
Date:

On 19.12.2017 at 15:58, Vincenzo Romano wrote:
> But according to a discussion with Bruce Momjian, table partitioning in
> V10 is little more than syntactic sugar around old-fashioned table
> partitioning.

there is always room for improvement, but table partitioning in pg10 is
much, much better than before.
(There was an impressive talk about that:

https://www.postgresql.eu/events/schedule/pgconfeu2017/session/1643-everything-wrong-with-partitioning-in-postgresql-10/

)

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: PostgreSQL suitable?

From
"Rakesh Kumar"
Date:
> There are multiple solutions to doing incremental backups with
> PostgreSQL, so I'm not sure why you're saying that they don't exist,
> because that's really not accurate.

PG's incremental backup is essentially WAL logs applied to a point in time.  I am talking about true incremental backup
where the db backs up only the changed blocks after the last full backup.  In a DW system, where most of the time it is
append only, it makes a huge difference in backup time.
 

I believe there is one tool which looks at the mtime/atime of each of the data files and takes the decision to back it up
or not.  Not sure how robust it is.
 


Re: PostgreSQL suitable?

From
Vincenzo Romano
Date:
2017-12-19 16:48 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> * Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
>> Sorry, my bad: I confused V10 with v11.
>> But according to a discussion with Bruce Momjian, table partitioning in V10 is
>> little more than syntactic sugar around old-fashioned table partitioning.
>
> Well, it's a bit more than that since there's tuple-routing, but you're
> right that the partition elimination is the same as it was in earlier
> versions and based on constraint exclusion.  That said, as noted in the
> email you replied to, reasonable numbers of partitions aren't too bad
> even with the planning cost; it's when you have many thousands of
> partitions that you get into cases where planning time for queries is
> really bad.

When you have to handle a 100TB table, the number of partitions
shouldn't be 10 or 12, as seen in most examples and tests.
This is the same type of issue you hit with partial indexes (this is
why I mentioned them earlier).
The sub-table (and partial index) selection algorithm should be
logarithmic or sub-linear.
As long as it's linear, you'll keep hearing about a "reasonable number of partitions".
One thousand partitions for a 100TB table would make for "manageable"
100GB sub-tables.
I could easily be wrong, but this is an area where PG needs improvement.
One could maybe think about multi-level partitioning, though.

> Also as noted on this thread, PG could handle this data volume, but to
> be efficient there would be work to be done in normalization,
> aggregation, and analyzing the system to ensure you're storing and
> querying on the data efficiently.

Normalization will grow the number of tables (and later joins) and
you will very likely end up with at least one table with a
"gazillion" rows.
I fear normalization, provided it's really needed, would provide little help.
With tables that big I usually do "software partitioning".
I make the software aware of the partition schema so it can direct the
queries to the needed (sub-)tables.
So I will have the sub-table selection algorithm in my language of
choice with effective support data structures.
Of course this is far from being the right solution.
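
As a rough illustration of the idea (the table names are hypothetical), the application works out the child table itself and queries it directly rather than going through the parent:

    -- Querying the parent makes the planner consider every child:
    SELECT count(*) FROM lab_result WHERE taken_at >= '2017-12-01';

    -- The application instead derives the child name from its own knowledge
    -- of the partition scheme and queries it directly:
    SELECT count(*) FROM lab_result_2017_12 WHERE taken_at >= '2017-12-01';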

-- 
Vincenzo Romano - NotOrAnd.IT
Information Technologies
--
NON QVIETIS MARIBVS NAVTA PERITVS


Re: PostgreSQL suitable?

From
James Keener
Date:
Would a storage-block-level incremental like ZFS work?

On December 19, 2017 11:12:48 AM EST, Rakesh Kumar <rakeshkumar464@mail.com> wrote:

There are multiple solutions to doing incremental backups with
PostgreSQL, so I'm not sure why you're saying that they don't exist,
because that's really not accurate.

PG's incremental backup is essentially WAL logs applied to a point in time. I am talking about true incremental backup where the db backs up only the changed blocks after the last full backup. In a DW system where most of the time it is append only, it makes a huge difference in backup time.

I believe there is one tool which looks at mtime/atime of each of the data files and takes the decision to back it up or not. Not sure how robust it is.


--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: PostgreSQL suitable?

From
Stephen Frost
Date:
Greetings Rakesh,

* Rakesh Kumar (rakeshkumar464@mail.com) wrote:
>
> > There are multiple solutions to doing incremental backups with
> > PostgreSQL, so I'm not sure why you're saying that they don't exist,
> > because that's really not accurate.
>
> PG's incremental backup is essentially WAL logs applied to a point in time.  I am talking about true incremental
> backup where the db backs up only the changed blocks after the last full backup.  In a DW system where most of the time
> it is append only, it makes a huge difference in backup time.
>
> I believe there is one tool which looks at mtime/atime of each of the data files and takes the decision to back it up
> or not.  Not sure how robust it is.

Yes, there are multiple such solutions including pgBackRest and barman,
both of which are robust, well supported projects that provide exactly
the incremental backup capability you're asking about.  I can't speak
for barman, but I know that there are many people using pgBackRest
to do incremental backups, some on very large PG databases.

I will point out that supporting incremental backups is tricky to do
correctly, which is why using an existing project that's had years to
work on getting it right is really the best approach.  I wouldn't
recommend trying to roll your own.

Thanks!

Stephen


Re: PostgreSQL suitable?

From
Stephen Frost
Date:
Greetings,

* Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
> 2017-12-19 16:48 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> > * Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
> >> Sorry, my bad: I confused V10 with v11.
> >> But according to a discussion with Bruce Momjian, table partitioning in V10 is
> >> little more than syntactic sugar around old-fashioned table partitioning.
> >
> > Well, it's a bit more than that since there's tuple-routing, but you're
> > right that the partition elimination is the same as it was in earlier
> > versions and based on constraint exclusion.  That said, as noted in the
> > email you replied to, reasonable numbers of partitions aren't too bad
> > even with the planning cost; it's when you have many thousands of
> > partitions that you get into cases where planning time for queries is
> > really bad.
>
> When you have to handle a 100TB table, the number of partitions
> shouldn't be 10 or 12
> as seen in most examples and tests.

Probably not, but it would depend on the data and what you're doing with
it.  While there are some general rules-of-thumb, there's no one right
answer when it comes to the size of individual partitions.

> This is the same type of issues you hit with partial indexes (this is
> why I mentioned them earlier).
> Sub-table (and partial index) selection algorithm should be
> logarithmic or sub-linear.

Sure, and work is being done to improve PG in that exact area, but that
doesn't mean it can't handle workloads like this today; you just have to
be aware of the costs associated with today's partitions.

> As long as it'll be linear, you'll hear about "reasonable number of partitions".
> One thousand partitions for a 100TB table would make "manageable"
> 100GB sub-tables.

Yup, and 100G tables are certainly large and a bit awkward but they can
be managed.

> I could easily be wrong, but this is an area where PG needs improvement.

Of course, and work is being done to improve it.  What's important is
knowing that there's a cost to having more partitions when querying
through the parent, once you get to the point where you have thousands of
partitions.  That cost may be perfectly fine in some use-cases and in
others it might not be, and instead you'd likely have to build logic
into the application layer to address it.  That's not ideal, which is
why there's work being done to improve PG, but it's not necessarily a
big deal either.  Dynamic SQL is another approach.
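
For example, a sketch of the dynamic SQL approach, assuming (hypothetically) monthly children named like lab_result_YYYY_MM:

    CREATE OR REPLACE FUNCTION results_for_day(p_day date)
    RETURNS SETOF numeric
    LANGUAGE plpgsql AS $$
    BEGIN
        -- Build the child table name here, so only that one partition is
        -- ever planned and scanned.
        RETURN QUERY EXECUTE format(
            'SELECT result FROM %I WHERE taken_at::date = $1',
            'lab_result_' || to_char(p_day, 'YYYY_MM'))
        USING p_day;
    END;
    $$;

    SELECT sum(r) FROM results_for_day(date '2017-12-19') AS r;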

> One could maybe think about multi-level partitioning, though.

Not sure exactly what you're referring to here, but a simple multi-level
partitioning setup with PG wouldn't actually change the cost for
partition elimination today.

> > Also as noted on this thread, PG could handle this data volume, but to
> > be efficient there would be work to be done in normalization,
> > aggregation, and analyzing the system to ensure you're storing and
> > querying on the data efficiently.
>
> Normalization will grow the number of tables (and later joins) and
> you will very likely end up with at least a table with a
> "gazillion" rows.

Naturally, but that "gazillion"-row table would be much smaller for
having the data normalized; if you don't normalize it, then the
gazillion-row table contains a huge amount of duplicated data, making
the entire system much larger than necessary.
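
As a minimal, hypothetical sketch of that kind of normalization:

    CREATE TABLE test_type (
        test_type_id smallint PRIMARY KEY,
        name         text NOT NULL UNIQUE,
        unit         text NOT NULL
    );

    -- The big table carries a 2-byte key instead of repeating the
    -- descriptive text on each of its rows.
    CREATE TABLE lab_result (
        sample_id    bigint   NOT NULL,
        test_type_id smallint NOT NULL REFERENCES test_type,
        result       numeric
    );

    -- The small lookup table is joined back in when needed:
    SELECT t.name, t.unit, r.result
    FROM   lab_result r
    JOIN   test_type  t USING (test_type_id);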

> I fear normalization, provided it's really needed, would provide little help.

I seriously doubt that's the case.  Normalization might reduce that
100TB down to tens of TB instead, or perhaps even less.

> With tables that big I usually do "software partitioning".
> I make the software aware of the partition schema so it can direct the
> queries to the needed (sub-)tables.

Yes, implementing partition elimination at the application level is
certainly an approach that can work with PG.  Even then, ideally, you'd
normalize the data so that the footprint is much smaller, making more
efficient use of the storage and memory available.

Thanks!

Stephen


Re: PostgreSQL suitable?

From
Stephen Frost
Date:
Greetings,

* James Keener (jim@jimkeener.com) wrote:
> Would a storage block level incremental like zfs work?

This really depends on what you want out of your backups and just
exactly how the ZFS filesystem is set up.  Remember that any backup of
PG that doesn't use PG's start/stop backup must be atomic across all
tablespaces, and even then that really just allows you to bring PG back
up as of the point of the snapshot.  I wouldn't recommend trying to
play WAL forward from that kind of a backup.  If you do use
pg_start/stop_backup with ZFS snapshots, and make sure to track all of
the WAL that's generated between the start/stop backup and ensure it's
reliably stored, etc., then they can work, but it's not simple.
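
Roughly, the sequence looks like this on PG 10 (a sketch only; the ZFS snapshot itself is taken outside the database between the two calls):

    -- PG 10-era exclusive backup bracket, run from psql:
    SELECT pg_start_backup('zfs-snapshot');

    -- ... take the ZFS snapshot(s) here, covering the data directory and
    -- all tablespaces atomically ...

    SELECT pg_stop_backup();

    -- Every WAL segment generated between the two calls must be archived
    -- and kept with the snapshot for it to be restorable.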

Thanks!

Stephen


Re: PostgreSQL suitable?

From
Vincenzo Romano
Date:
Hi.

2017-12-19 17:32 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> Greetings,
>
> * Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
>> 2017-12-19 16:48 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
>> > * Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
>> >> Sorry, my bad: I confused V10 with v11.
>> >> But according to a discussion with Bruce Momjian, table partitioning in V10 is
>> >> little more than syntactic sugar around old-fashioned table partitioning.
>> >
>> > Well, it's a bit more than that since there's tuple-routing, but you're
>> > right that the partition elimination is the same as it was in earlier
>> > versions and based on constraint exclusion.  That said, as noted in the
>> > email you replied to, reasonable numbers of partitions aren't too bad
>> > even with the planning cost; it's when you have many thousands of
>> > partitions that you get into cases where planning time for queries is
>> > really bad.
>>
>> When you have to handle a 100TB table, the number of partitions
>> shouldn't be 10 or 12
>> as seen in most examples and tests.
>
> Probably not, but it would depend on the data and what you're doing with
> it.  While there are some general rules-of-thumb, there's no one right
> answer when it comes to the size of individual partitions.

I think partitioning is done in order to make smaller tables or to
balance table size and table number.
You currently have to keep in mind a limit to the number of sub-tables.
This makes partitioning less effective.

>> This is the same type of issues you hit with partial indexes (this is
>> why I mentioned them earlier).
>> Sub-table (and partial index) selection algorithm should be
>> logarithmic or sub-linear.
>
> Sure, and work is being done to improve PG in that exact area, but that
> doesn't mean it can't handle workloads like this today, but you have to
> be aware of the costs associated with today's partitions.
>
>> As long as it'll be linear, you'll hear about "reasonable number of partitions".
>> One thousand partitions for a 100TB table would make "manageable"
>> 100GB sub-tables.
>
> Yup, and 100G tables are certainly large and a bit awkward but they can
> be managed.

Of course. But I would bet they are still considered as "very large tables".

>> I could easily be wrong, but this is an area where PG needs improvement.
>
> Of course, and work is being done to improve it.  What's important is
> knowing that there's a cost to having more partitions when querying
> through the parent when you get to a point where you have thousands of
> partitions.  That cost may be perfectly fine in some use-cases and in
> others it might not be, and instead you'd likely have to build logic
> into the application layer to address it.  That's not ideal, which is
> why there's work being done to improve PG, but it's not necessarily a
> big deal either.  Dynamic SQL is another approach.
>
>> One could maybe think about multi-level partitioning, though.
>
> Not sure exactly what you're referring to here, but a simple multi-level
> partitioning setup with PG wouldn't actually change the cost for
> partition elimination today.

It'd be a tree of tables, allowing logarithmic selection.
This is why we love tree data structures.

>> > Also as noted on this thread, PG could handle this data volume, but to
>> > be efficient there would be work to be done in normalization,
>> > aggregation, and analyzing the system to ensure you're storing and
>> > querying on the data efficiently.
>>
>> Normalization will grow the number of tables (and later joins) and
>> you will very likely end up with at least a table with a
>> "gazillion" rows.
>
> Naturally, but that "gazillion" rows table would be much smaller for
> having the data normalized- if you don't normalize it then the gazillion
> row table is a huge amount of duplicated data, making the entire system
> much larger than necessary.

Partitioning is done for other reasons than de-duplicating data.
The number of rows to be analyzed would still be the same and the
indexes over those columns would still be rather large.

>> I fear normalization, provided it's really needed, would provide little help.
>
> I seriously doubt that's the case.  Normalization might reduce that
> 100TB down to 10's of TB instead, or perhaps even smaller.

A 10x duplication factor seems very bad to me.
If that's the case, then normalization would provide some improvement,
though it costs resources to do.
Storage nowadays isn't a big issue. Querying is.

>> With tables that big I usually do "software partitioning".
>> I make the software aware of the partition schema so it can direct the
>> queries to the needed (sub-)tables.
>
> Yes, implementing partition elimination at the application level is
> certainly an approach that can work with PG.  Even then, ideally, you'd
> normalize the data so that the footprint is much smaller, making more
> efficient use of the storage and memory available.

Normalizing can save resources, but you later need resources to join the tables back together.
It's a matter of trade-offs, and in my opinion it strictly depends on
the queries to be run.
Effective table partitioning, instead, seems to me to be of greater importance.

My bottom line here is: table partitioning is nice, but not yet ready
for the big data sets where it makes the most sense.

> Thanks!

I do thank you for your acute remarks.

> Stephen

-- 
Vincenzo Romano - NotOrAnd.IT
Information Technologies
--
NON QVIETIS MARIBVS NAVTA PERITVS


Re: PostgreSQL suitable?

From
Alexander Stoddard
Date:


On Tue, Dec 19, 2017 at 10:39 AM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* James Keener (jim@jimkeener.com) wrote:
> Would a storage block level incremental like zfs work?

This really depends on what you want out of your backups and just
exactly how the ZFS filesystem is set up.  Remember that any backup of
PG that doesn't use PG's start/stop backup must be atomic across all
tablespaces and even then that really just allows you to bring PG back
up as of that point of the snapshot.  I wouldn't recommend trying to
play WAL forward from that kind of a backup.  If you do use
pg_start/stop_backup with ZFS snapshots, and make sure to track all of
the WAL that's generated between the start/stop backup and ensure it's
reliably stored, etc, then they can work, but it's not simple.


I believe the thread started with a data warehouse use case. That might be one application where data ingestion and processing can be stopped and started in a controlled manner, as opposed to a continuously live system where changes continually accumulate in the WAL.

Best regards,
Alex

Re: PostgreSQL suitable?

From
Stephen Frost
Date:
Greetings,

* Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
> 2017-12-19 17:32 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> > * Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
> >> 2017-12-19 16:48 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> >> > * Vincenzo Romano (vincenzo.romano@notorand.it) wrote:
> >> >> Sorry, my bad: I confused V10 with v11.
> >> >> But according to a discussion with Bruce Momjian, table partitioning in V10 is
> >> >> little more than syntactic sugar around old-fashioned table partitioning.
> >> >
> >> > Well, it's a bit more than that since there's tuple-routing, but you're
> >> > right that the partition elimination is the same as it was in earlier
> >> > versions and based on constraint exclusion.  That said, as noted in the
> >> > email you replied to, reasonable numbers of partitions aren't too bad
> >> > even with the planning cost; it's when you have many thousands of
> >> > partitions that you get into cases where planning time for queries is
> >> > really bad.
> >>
> >> When you have to handle a 100TB table, the number of partitions
> >> shouldn't be 10 or 12
> >> as seen in most examples and tests.
> >
> > Probably not, but it would depend on the data and what you're doing with
> > it.  While there are some general rules-of-thumb, there's no one right
> > answer when it comes to the size of individual partitions.
>
> I think partitioning is done in order to make smaller tables or to
> balance table size and table number.
> You currently have to keep in mind a limit to the number of sub-tables.
> This makes partitioning less effective.

Sure, but that doesn't make PG unsuitable for such use-cases, just that
you have to realize these costs and address them.

> >> This is the same type of issues you hit with partial indexes (this is
> >> why I mentioned them earlier).
> >> Sub-table (and partial index) selection algorithm should be
> >> logarithmic or sub-linear.
> >
> > Sure, and work is being done to improve PG in that exact area, but that
> > doesn't mean it can't handle workloads like this today, but you have to
> > be aware of the costs associated with today's partitions.
> >
> >> As long as it'll be linear, you'll hear about "reasonable number of partitions".
> >> One thousand partitions for a 100TB table would make "manageable"
> >> 100GB sub-tables.
> >
> > Yup, and 100G tables are certainly large and a bit awkward but they can
> > be managed.
>
> Of course. But I would bet they are still considered as "very large tables".

They're not small. :)

> >> I could easily be wrong, but this is an area where PG needs improvement.
> >
> > Of course, and work is being done to improve it.  What's important is
> > knowing that there's a cost to having more partitions when querying
> > through the parent when you get to a point where you have thousands of
> > partitions.  That cost may be perfectly fine in some use-cases and in
> > others it might not be, and instead you'd likely have to build logic
> > into the application layer to address it.  That's not ideal, which is
> > why there's work being done to improve PG, but it's not necessarily a
> > big deal either.  Dynamic SQL is another approach.
> >
> >> One could maybe think about multi-level partitioning, though.
> >
> > Not sure exactly what you're referring to here, but a simple multi-level
> > partitioning setup with PG wouldn't actually change the cost for
> > partition elimination today.
>
> It'd be a tree of tables, pointing to logarithmic selection.
> This is why we love tree data structures.

Sure, but that doesn't matter when it comes to how constraint exclusion
works with PostgreSQL today- all partitions under a given parent are
considered when doing partition elimination.

> >> > Also as noted on this thread, PG could handle this data volume, but to
> >> > be efficient there would be work to be done in normalization,
> >> > aggregation, and analyzing the system to ensure you're storing and
> >> > querying on the data efficiently.
> >>
> >> Normalization will grow the number of tables (and later joins) and
> > you will very likely end up with at least a table with a
> >> "gazillion" rows.
> >
> > Naturally, but that "gazillion" rows table would be much smaller for
> > having the data normalized- if you don't normalize it then the gazillion
> > row table is a huge amount of duplicated data, making the entire system
> > much larger than necessary.
>
> Partitioning is done for other reasons than de-duplicating data.

Yes, I wasn't suggesting to use partitioning to de-duplicate data, I was
suggesting to use normalization for that.

> The number of rows to be analyzed would be still the same and the
> indexes over those columns would still be rather large.

While true, the indexes would be smaller than they would be without the
data being normalized.

> >> I fear normalization, provided it's really needed, would provide little help.
> >
> > I seriously doubt that's the case.  Normalization might reduce that
> > 100TB down to 10's of TB instead, or perhaps even smaller.
>
> A 10x duplication factor seems very bad to me.

I'm not sure why, it's not actually that uncommon.

> If that's the case then normalization would provide for some
> improvement while eating resources to be done.

It's unclear what you're referring to here regarding 'eating resources'.
Clearly there's some CPU time that's consumed when doing a join, but
that's generally far cheaper than the time required to pull data off of
disk because it's not in memory.

> Storage nowadays isn't a big issue. Querying is.

Sure, and if all you had to pay was the storage cost then no one would
care- but you also have to pay the memory cost and that's a much bigger
deal.  If you have to go out to disk constantly then you've basically
already lost when it comes to query time.

> >> With tables that big I usually do "software partitioning".
> >> I make the software aware of the partition schema so it can direct the
> >> queries to the needed (sub-)tables.
> >
> > Yes, implementing partition elimination at the application level is
> > certainly an approach that can work with PG.  Even then, ideally, you'd
> > normalize the data so that the footprint is much smaller, making more
> > efficient use of the storage and memory available.
>
> Normalizing can save resources that you later need to join tables back.

You were commenting above on how normalization eats resources and now
it's saving them. :)  In my experience, the join cost is nothing
compared to the cost to pull data off of disk because you couldn't fit
it all in memory.

> It's a matter of trade-offs and it strictly depends on the queries to
> be run in my opinion.

I agree that it's a matter of trade-offs and certainly depends on the
queries being run, though none of that information is available here.

> Effective table partitioning, instead, seems to me to have bigger importance.

They're both important, imv.

> My bottom line here is: table partitioning is nice, but not ready yet
> for the big data where it makes the bigger sense.

I agree that there's more work to be done there, certainly, but it's
also getting better and there are ways to work around the existing
limitations.

Thanks!

Stephen


RE: PostgreSQL suitable?

From
Kellner Thiemo
Date:
Hi

Thank you all for your thoughts and the light shed on this subject. I am very pleased to have got such engaged feedback, and also that PostgreSQL seems to be so capable. I am quite sad that we most probably will not set up our DWH with it as the RDBMS. It is, I fear, rather an in-house political decision.

Kind regards

Thiemo