Re: On columnar storage - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: On columnar storage
Date
Msg-id 557D9E02.6030605@2ndquadrant.com
In response to Re: On columnar storage  (Michael Nolan <htfoot@gmail.com>)
Responses Re: On columnar storage  (Michael Nolan <htfoot@gmail.com>)
List pgsql-hackers
Hi,

On 06/13/15 00:07, Michael Nolan wrote:
>
>
> On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com <mailto:alvherre@2ndquadrant.com>> wrote:
>
>     We hope to have a chance to discuss this during the upcoming developer
>     unconference in Ottawa.  Here are some preliminary ideas to shed some
>     light on what we're trying to do.
>
>
>     I've been trying to figure out a plan to enable native column stores
>     (CS or "colstore") for Postgres.  Motivations:
>
>     * avoid the 32 TB limit for tables
>     * avoid the 1600 column limit for tables
>     * increased performance
>
> Are you looking to avoid all hardware-based limits, or would using a 64
> bit row pointer be possible?  That would give you 2^64 or 1.8 E19 unique
> rows over whatever granularity/uniqueness you use (per table, per
> database, etc.)
> --
> Mike Nolan.

I don't think the number of tuples is the main problem here, it's the 
number of pages a single relation can have. Looking at the number of 
rows as a direct function of TID size is misleading, because the TID is 
split into two fixed parts - a page number (32 bits) and a tuple number 
within the page (16 bits).
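
For reference, this is roughly how the 6-byte TID looks in the Postgres 
headers (simplified from src/include/storage/block.h and itemptr.h; the 
page number is kept as two 16-bit halves so the struct needs no padding):

    typedef struct BlockIdData
    {
        uint16      bi_hi;      /* high half of the 32-bit page number */
        uint16      bi_lo;      /* low half of the 32-bit page number */
    } BlockIdData;

    typedef struct ItemPointerData
    {
        BlockIdData  ip_blkid;  /* page number, 32 bits */
        OffsetNumber ip_posid;  /* tuple number within the page (uint16) */
    } ItemPointerData;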

For the record, 2^48 is 281,474,976,710,656, which ought to be enough 
for anybody, but we waste a large part of that because we reserve room 
for up to 2^16 tuples per page, while the actual limit is much lower 
(~290 for 8kB pages, ~1200 for 32kB pages).

So we can only have ~4 billion pages, which is where the 32TB limit 
comes from (with 32kB pages it's 128TB).
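
Spelled out:

    2^32 pages *  8 kB/page = 2^45 bytes =  32 TB
    2^32 pages * 32 kB/page = 2^47 bytes = 128 TB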

Longer TIDs are one straightforward way to work around this limit, 
assuming the extra bits go to the 'page number' field. Adding 16 bits 
(i.e. using 64-bit pointers) would increase the limit 2^16 times, to 
about 2048 petabytes (with 8kB pages). But that of course comes at a 
cost, because the larger TIDs have to be stored in indexes etc.

Another option might be to split the existing 48 bits differently, 
moving 5 bits from the tuple number to the page number part of the TID 
(so that we'd allow at most ~2048 tuples per page). That'd increase the 
limit to 1PB (4PB with 32kB pages).
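
To make the variants easy to compare, here's a throwaway C sketch 
(max_bytes is just for illustration, not anything from the tree):

    #include <stdio.h>
    #include <stdint.h>

    /* max table size in bytes, given the number of page-number bits
     * in the TID and the page size */
    static uint64_t
    max_bytes(int page_bits, uint64_t page_size)
    {
        return page_size << page_bits;      /* pages * bytes per page */
    }

    int main(void)
    {
        const uint64_t TB = UINT64_C(1) << 40;

        /* current layout: 32-bit page number, 16-bit tuple number */
        printf("32/16 split, 8kB pages: %8llu TB\n",
               (unsigned long long) (max_bytes(32, 8192) / TB));
        /* 64-bit TID: 48-bit page number, 16-bit tuple number */
        printf("48/16 split, 8kB pages: %8llu TB\n",
               (unsigned long long) (max_bytes(48, 8192) / TB));
        /* 48-bit TID rebalanced: 37-bit page, 11-bit tuple number */
        printf("37/11 split, 8kB pages: %8llu TB\n",
               (unsigned long long) (max_bytes(37, 8192) / TB));
        return 0;
    }

That prints 32 TB, 2097152 TB (= 2048 PB) and 1024 TB (= 1 PB) 
respectively.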

The column store approach is somewhat orthogonal to this, because it 
splits the table vertically into multiple pieces, each stored in a 
separate relfilenode and thus using a separate sequence of page numbers.

And of course, the usual 'horizontal' partitioning has a very similar 
effect (separate filenodes).

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


