Re: Is this still accurate? - Mailing list pgsql-docs

From Simon Riggs
Subject Re: Is this still accurate?
Msg-id CANP8+jJ+qfuoiZ8RZq9Wr+6FHKM0aqktoUK2EdHCNwAz54FbpQ@mail.gmail.com
In response to Re: Is this still accurate?  ("Jonathan S. Katz" <jkatz@postgresql.org>)
List pgsql-docs
On 6 January 2018 at 16:35, Jonathan S. Katz <jkatz@postgresql.org> wrote:
> Hi,
>
> On Jan 6, 2018, at 9:45 AM, Magnus Hagander <magnus@hagander.net> wrote:
>
> On Fri, Jan 5, 2018 at 8:09 PM, Jonathan S. Katz <jkatz@postgresql.org>
> wrote:
>>
>> Hi,
>>
>> On Jan 5, 2018, at 1:33 PM, Steve Atkins <steve@blighty.com> wrote:
>>
>>
>> On Jan 5, 2018, at 10:00 AM, Stephen Frost <sfrost@snowman.net> wrote:
>>
>> Greetings,
>>
>> * Moser, Glen G (Glen.Moser@charter.com) wrote:
>>
>> That's really the gist of the concern from a team member of mine.  Not
>> that the 4TB number is wrong but that it could be misleading to assume that
>> 4TB is some sort of upper bound.
>>
>> That's how this concern was relayed to me and I am just following up.
>>
>>
>> Well, saying 'in excess of' is pretty clear, but I don't think the
>> sentence is really adding much either, so perhaps we should just remove
>> it.
>>
>>
>> It's been useful a few times to reassure people that we can handle "large"
>> databases operationally, rather than just having large theoretical limits.
>>
>> Updating it would be great, or wrapping a little more verbiage around the
>> 4TB number, but a mild -1 on removing it altogether.
>>
>>
>> Here is a proposed patch that updates the wording:
>>
>> "There are active PostgreSQL instances in production environments that
>> manage many terabytes of data, as well as clusters managing petabytes."
>>
>> The idea is that it gives a sense of scope for how big
>> instances/clusters can run without fixating on a single number.
>> People can draw their own conclusions from the hard limits further
>> down the page.
>>
> +1.

I don't think that's as useful, so -1 for removing the stated limit.

People always ask "how big can it go?" and having a specific number
there is important. We have publicly documented cases above 50 TB, so I
think we should say that.

Clusters in the petabyte range? We would need to substantiate that
with publicly documented cases, and they would need to be pure
PostgreSQL, not "with added tech", no?


Also, I can't see how the stated 1.6 TB maximum row size can be
accurate: reaching it would mean 1600 TOAST pointers at 20 bytes each
= 32,000 bytes in a single tuple, which is more than we can normally
support with the 8 kB block size we ship by default.
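
A quick sanity check of that arithmetic, as a sketch in Python (the
20-byte pointer size and the bare 8 kB page are assumptions for
illustration, not exact values from the source tree):

    MAX_COLUMNS = 1600             # hard per-table column limit
    TOAST_POINTER_BYTES = 20       # assumed size of one TOAST pointer
    PAGE_BYTES = 8192              # default block size (8 kB)

    row_bytes = MAX_COLUMNS * TOAST_POINTER_BYTES
    print(row_bytes)               # 32000 -- roughly four pages' worth
    print(row_bytes > PAGE_BYTES)  # True: it cannot fit on one page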

Lastly, the "per table limit" should really say "32 TB per table, 128
PB for a partitioned table (4000 partitions)".
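
For reference, the 128 PB figure is just the per-table limit
multiplied out; a minimal check, taking 1 PB = 1000 TB and the 4000
partitions cited above (an illustrative count, not a hard limit):

    TB_PER_TABLE = 32       # per-table size limit, in TB
    PARTITIONS = 4000       # partition count cited above
    total_tb = TB_PER_TABLE * PARTITIONS
    print(total_tb)         # 128000 TB
    print(total_tb // 1000) # 128 -- i.e. 128 PB at 1 PB = 1000 TB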

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

