Re: On columnar storage - Mailing list pgsql-hackers

From Michael Nolan
Subject Re: On columnar storage
Date
Msg-id CAOzAquJGTzR6vSbsiZXBys2OKZdLRDYk3kqp+Dp+Bko8SeyOAA@mail.gmail.com
In response to Re: On columnar storage  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers


On Sun, Jun 14, 2015 at 10:30 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


Are you looking to avoid all hardware-based limits, or would using a 64-bit
row pointer be possible? That would give you 2^64, or about 1.8E19, unique
rows over whatever granularity/uniqueness you use (per table, per
database, etc.)
--
Mike Nolan.

I don't think the number of tuples is the main problem here; it's the number of pages a single relation can have. Looking at the number of rows as a direct function of TID size is misleading, because the TID is split into two fixed parts: a page number (32 bits) and a tuple number (16 bits).

For the record, 2^48 is 281,474,976,710,656, which ought to be enough for anybody, but we waste a large part of that because we assume there might be up to 2^16 tuples per page, although the actual limit is way lower (~290 for 8kB pages, and ~1200 for 32kB pages).

So we can only have ~4 billion pages, which is where the 32TB limit comes from (with 32kB pages it's 128TB).
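A quick back-of-the-envelope check of that arithmetic (my own Python sketch, not anything from the PostgreSQL source):

```python
# Max heap size = number of addressable pages * page size.
# Current TID layout: 32-bit page number, 16-bit tuple (line pointer) number.

def table_limit_bytes(page_number_bits, page_size):
    return (2 ** page_number_bits) * page_size

TB = 2 ** 40  # binary terabyte

assert table_limit_bytes(32, 8 * 1024) == 32 * TB    # 32 TB with 8kB pages
assert table_limit_bytes(32, 32 * 1024) == 128 * TB  # 128 TB with 32kB pages
```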

Longer TIDs are a straightforward way to work around this limit, assuming you add the bits to the 'page number' field. Adding 16 bits (thus using 64-bit pointers) would increase the limit 2^16 times, to about 2048 petabytes (with 8kB pages). But that of course comes with a cost, because you have to keep those larger TIDs in indexes etc.

Another option might be to split the 48 bits differently, by moving 5 bits to the page number part of TID (so that we expect ~2048 tuples per page at most). That'd increase the limit to 1PB (4PB with 32kB pages).
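Checking the arithmetic for those two alternatives (again my own sketch, not PostgreSQL code):

```python
PB = 2 ** 50  # binary petabyte

# Alternative 1: 64-bit TID, with the extra 16 bits added to the page
# number (48-bit page number, 16-bit tuple number):
assert (2 ** 48) * 8 * 1024 == 2048 * PB   # ~2048 PB with 8kB pages

# Alternative 2: keep 48-bit TIDs but re-split them as 37 (page) + 11
# (tuple), i.e. at most 2^11 = 2048 tuples per page:
assert (2 ** 37) * 8 * 1024 == 1 * PB      # 1 PB with 8kB pages
assert (2 ** 37) * 32 * 1024 == 4 * PB     # 4 PB with 32kB pages
```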

The column store approach is somewhat orthogonal to this, because it splits the table vertically into multiple pieces, each stored in a separate relfilenode and thus using a separate sequence of page numbers.

And of course, the usual 'horizontal' partitioning has a very similar effect (separate filenodes).

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Thanks for the reply. It's been a while since my last data structures course (1971), but I do remember a few things. I have never personally needed more than 1500 columns in a table, but I can see how some might. Likewise, the 32TB limit hasn't affected me yet either. I doubt either ever will.

Solving either or both of those may at some point require a larger bit space for (at least some) TIDs, which is why I was wondering whether a goal here is to eliminate all (practical) limits.

It probably doesn't make sense to force all users to use that larger bit space (with the associated space and performance penalties). If there's a way to do this, then you are all truly wizards. (This all reminds me of how the IPv4 address space was parcelled up into Class A, B, C and D addresses, at a time when people thought 32 bits would last us forever. Maybe 128 bits actually will.)
--
Mike Nolan


