Thread: 2D partitioning of VLDB - sane or not?
I am building up a schema for storing a bunch of data about proteins, which on a certain level can be modelled with quite simple tables. The problem is that the database I am building needs to house lots of it (>10TB and growing), with one table in particular threatening to top 1TB. In the case of that table and in the case of the overall database, the size can be expected to grow quickly (and most of it can never be deleted).

In the past, with smaller tables, I have had success partitioning on a 64-bit CRC hash that takes a more or less uniform distribution of input data and pumps out a more-or-less uniform distribution of partitioned data with a very small probability of collision. The hash itself is implemented as a C add-on library, returns a BIGINT, and serves as a candidate key for what for our purposes we can call a protein record.

Now back to the big table, which relates two of these records (in a theoretically symmetric way). Assume I set the table up as something like:

CREATE TABLE big_protein_relation_partition_dimA_dimB (
    protein_id_a BIGINT NOT NULL CHECK ( bin_num(protein_id_a) = dimA ), -- key (hash) from some table
    protein_id_b BIGINT NOT NULL CHECK ( bin_num(protein_id_b) = dimB ), -- key (hash) from some table
    ...
);

and do a little C bit-twiddling to define some binning mechanism on the BIGINTs.

As near as I can tell, binning out along the A and B dimensions into 256 bins each, I shouldn't be in any danger of running out of OIDs or anything like that (despite having to deal with 2^16 tables). Theoretically, at least, I should be able to do UNIONs along each axis (to avoid causing the analyzer too much overhead) and use range exclusion to make my queries zip along with proper indexing.

Aside from running into a known bug with "too many triggers" when creating gratuitous indices on these tables, I feel as if it may be possible to do what I want without breaking everything. But then again, am I taking too many liberties with technology that maybe didn't have use cases like this one in mind?

Jason

--
========================================================
Jason Nerothin
Programmer/Analyst IV - Database Administration
UCLA-DOE Institute for Genomics & Proteomics
Howard Hughes Medical Institute
========================================================
611 C.E. Young Drive East | Tel: (310) 206-3907
105 Boyer Hall, Box 951570 | Fax: (310) 206-3914
Los Angeles, CA 90095. USA | Mail: jason@mbi.ucla.edu
========================================================
http://www.mbi.ucla.edu/~jason
========================================================
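For concreteness, a minimal sketch of the scheme described above; the bin_num() body, the parent table, and the example bin numbers (3, 17) are assumptions standing in for the C library and the actual naming, not code from the thread:

-- begin SQL --
-- Assumed stand-in for the C bit-twiddling: keep the low 8 bits of the
-- 64-bit hash, giving 256 bins per dimension (256 x 256 = 2^16 children).
CREATE FUNCTION bin_num(BIGINT) RETURNS INTEGER AS
$$ SELECT ($1 & 255)::INTEGER; $$
LANGUAGE SQL IMMUTABLE STRICT;

-- Assumed parent, plus one of the 65,536 children (dimA = 3, dimB = 17):
CREATE TABLE big_protein_relation (
    protein_id_a BIGINT NOT NULL,
    protein_id_b BIGINT NOT NULL
);

CREATE TABLE big_protein_relation_3_17 (
    CHECK ( bin_num(protein_id_a) = 3 ),
    CHECK ( bin_num(protein_id_b) = 17 )
) INHERITS (big_protein_relation);
-- end SQL --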
Jason,

> Aside from running into a known bug with "too many triggers" when creating
> gratuitous indices on these tables, I feel as if it may be possible to do what
> I want without breaking everything. But then again, am I taking too many
> liberties with technology that maybe didn't have use cases like this one in
> mind?

Well, you're pushing PostgreSQL partitioning further than it's currently able
to go. Right now our range exclusion becomes too costly to be useful
somewhere around 300 to 1000 partitions (depending on CPU and other issues)
because the constraints are checked linearly.

To make your scheme work, you'd need to improve our existing partitioning
mechanisms to scale to 100,000 partitions. It would also help you to
implement multiple inheritance so that you could have a partition which
belonged to two masters. I'd be very interested in seeing you do so, of
course, but this may be more hacking than you had in mind.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
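To see the linear check Josh describes, a hedged illustration using the assumed tables from the sketch above (constraint_exclusion is the real GUC; the table and function names are hypothetical): the planner has to compare the WHERE clause against every child's CHECK constraint before it can discard the non-matching children, so planning time grows with partition count even when the query ultimately touches a single table.

-- begin SQL --
SET constraint_exclusion = on;

-- Planning this query tests the predicates against the CHECK constraints
-- of each of the 2^16 children, one by one; only execution is cheap.
EXPLAIN SELECT *
  FROM big_protein_relation
 WHERE bin_num(protein_id_a) = 3
   AND bin_num(protein_id_b) = 17;
-- end SQL --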
Josh,
I think what you are suggesting is something like this:
-- begin SQL --
core=# CREATE TABLE temp_x( x_id BIGINT PRIMARY KEY, x_info VARCHAR(16) NOT NULL DEFAULT 'x_info');
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "temp_x_pkey" for table "temp_x"
CREATE TABLE
core=# CREATE TABLE temp_y( y_id BIGINT PRIMARY KEY, y_info VARCHAR(16) NOT NULL DEFAULT 'y_info');
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "temp_y_pkey" for table "temp_y"
CREATE TABLE
core=# CREATE TABLE temp_xy() INHERITS (temp_x, temp_y);
CREATE TABLE
core=# \d temp_xy
Table "core.temp_xy"
Column | Type | Modifiers
--------+-----------------------+----------------------------------------------
x_id | bigint | not null
x_info | character varying(16) | not null default 'x_info'::character varying
y_id | bigint | not null
y_info | character varying(16) | not null default 'y_info'::character varying
Inherits: temp_x,
temp_y
-- end SQL --
The problem is that what I really want to do is something like this:
-- begin SQL --
core=# CREATE TABLE temp_xx() INHERITS (temp_x, temp_x);
ERROR: inherited relation "temp_x" duplicated
-- end SQL --
The issue is that the relations are in fact reflexive and, due to the sheer size of the data I'm trying to warehouse, I'd rather not keep them around more than once.
I'm sort of thinking aloud here, but based on what you've told me, I guess I'm left having to choose which direction I want to search in, since the domains and ranges are theoretically the same. On the other hand, perhaps I could take the overhead hit and just keep two copies of the parent tables around. The relation table is on the order of 300-500x as large as the parent tables, and that multiplier is expected to stay relatively constant over time...?
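One hedged alternative, not something proposed in the thread itself: since the relation is symmetric, each pair could be stored exactly once in canonical order (smaller hash first) and searched from both columns, which avoids duplicating either the relation rows or the parent tables. Table and column names here are assumed:

-- begin SQL --
-- Store each symmetric pair once, in canonical (a <= b) order.
CREATE TABLE protein_relation (
    protein_id_a BIGINT NOT NULL,
    protein_id_b BIGINT NOT NULL,
    CHECK ( protein_id_a <= protein_id_b ),
    PRIMARY KEY ( protein_id_a, protein_id_b )
);

-- Find all partners of a given protein by searching both directions;
-- the <> filter keeps a reflexive pair (42, 42) from appearing twice.
SELECT protein_id_b AS partner
  FROM protein_relation WHERE protein_id_a = 42
UNION ALL
SELECT protein_id_a
  FROM protein_relation WHERE protein_id_b = 42 AND protein_id_a <> 42;
-- end SQL --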
Which brings us back to the original issue. If I decide to stick with the current implementation and not "improve our existing partitioning
mechanisms to scale to 100,000 partitions", I could do something like:
Maintain 2 copies of the parent table (partitioned by 256).
Inherit from both to a relation table.
Does this get me out of the woods with the query analyzer? It doesn't seem like it would, necessarily.
--
========================================================
Jason Nerothin
Programmer/Analyst IV - Database Administration
UCLA-DOE Institute for Genomics & Proteomics
Howard Hughes Medical Institute
========================================================
611 C.E. Young Drive East | Tel: (310) 206-3907
105 Boyer Hall, Box 951570 | Fax: (310) 206-3914
Los Angeles, CA 90095. USA | Mail: jason@mbi.ucla.edu
========================================================
http://www.mbi.ucla.edu/~jason
========================================================
Jason,

> Which brings us back to the original issue. If I decide to stick with
> the current implementation and not "improve our existing partitioning
> mechanisms to scale to 100,000 partitions", I could do something like:
>
> Maintain 2 copies of the parent table (partitioned by 256).
> Inherit from both to a relation table.
>
> Does this get me out of the woods with the query analyzer? It doesn't
> seem like it would, necessarily.

You don't get a table's partitions when you inherit. Just the schema of the
master.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
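A tiny sketch of Josh's point, with assumed names: inheriting copies only the master's column definitions (and the master's own constraints), not the master's children, so the partitions do not come along.

-- begin SQL --
CREATE TABLE master   ( id BIGINT NOT NULL );
CREATE TABLE master_0 ( CHECK ( id % 256 = 0 ) ) INHERITS (master);

-- child receives master's columns, but not master_0 or any other partition:
CREATE TABLE child () INHERITS (master);
-- SELECT * FROM child scans only child itself; rows living in master_0
-- are reachable through master, never through child.
-- end SQL --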
> Which brings us back to the original issue. If I decide to stick with
> the current implementation and not "improve our existing partitioning
> mechanisms to scale to 100,000 partitions", I could do something like:

There is a point where you can leave the selection of the correct rows to
normal btree indexes. I'd say that that point currently is well below 2000
partitions for all common db systems. I'd argue that more partitions will
only be useful for very limited use cases, and would be very interested in
hearing of a practical partitioning scheme where more partitions still show
a performance advantage (on any db system).

It looks like in your case a partitioning scheme with 256 partitions on one
of the 2 dimensions sounds reasonable. Or 32 instead of 256 bins for each
dim, if your searches are bidirectional (32 x 32 = 1024 partitions).

Andreas
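A sketch of Andreas's one-dimensional variant, again with assumed names: partition on bin_num(protein_id_a) alone (256 children), and let an ordinary btree index do the selection on the B dimension within each partition.

-- begin SQL --
-- One of 256 children, constrained on the A dimension only:
CREATE TABLE protein_relation_a3 (
    CHECK ( bin_num(protein_id_a) = 3 )
) INHERITS (protein_relation);

-- Inside the partition, a plain btree handles the other dimension:
CREATE INDEX protein_relation_a3_b_idx
    ON protein_relation_a3 (protein_id_b);
-- end SQL --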
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes: > I'd say that that point currently is well below 2000 partitions for all > common db systems. I think it will depend heavily on the type of queries you're talking about. Postgres's constraint_exclusion is a linear search and does quite a bit of work for each constraint. So it isn't terribly efficient for more than 1,000 partitions or so. *But* that only affects planning time. If your queries are always effectively pruned to few partitions and you execute them thousands of times then you not care about slow planning time. And if the overall table is large enough and you're dropping and loading partitions then you may still be benefiting from partitioning by keeping all the loaded records together and allowing dropping a partition to be constant time. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com