2D partitioning of VLDB - sane or not? - Mailing list pgsql-hackers
From | Jason Nerothin |
---|---|
Subject | 2D partitioning of VLDB - sane or not? |
Date | |
Msg-id | f42b58b90708091434w396fe7e0s418256b29d32ce7d@mail.gmail.com Whole thread Raw |
Responses |
Re: 2D partitioning of VLDB - sane or not?
|
List | pgsql-hackers |
I am building up a schema for storing a bunch of data about proteins, which on a certain level can be modelled with quitesimple tables. The problem is that the database I am building needs to house lots of it >10TB and growing, with onetable in particular threatening to top 1TB. In the case of the table and in the case of the overall database, the sizecan be expected to grow quickly (and most of it can never be deleted). <br /><br />In the past, with smaller tables,I have had success partitioning on a 64-bit crc hash that takes a more or less uniform distribution of input dataand pumps out a more-or-less uniform distribution of partitioned data with a very small probability of collision. Thehash itself is implemented as a c add-on library, returns a BIGINT and serves as a candidate key for what for our purposeswe can call a protein record. <br /><br />Now back to the big table, which relates two of these records (in a theoreticallysymmetric way). Assuming I set the the table up as something like:<br /><br />CREATE TABLE big_protein_relation_partition_dimA_dimB{<br/> protein_id_a BIGINTEGER NOT NULL CHECK( bin_num(protein_id_a) = dimA ), ---key (hash) from some table<br />protein_id_a BIGINTEGER NOT NULL CHECK( bin_num(protein_id_b) = dimB ), --- key (hash)from some table<br />...<br />} <br /><br />and do a little c bit-twiddling and define some binning mechanism on theBIGINTEGERs.<br /><br />As near I can tell, binning out along the A and B dimensions into 256 bins, I shouldn't be inany danger of running out of OIDs or anything like that (despite having to deal with 2^16 tables). Theoretically, at least,I should be able to do UNIONS along each axis (to avoid causing the analyzer too much overhead) and use range exclusionto make my queries zip along with proper indexing. <br /><br />Aside from running into a known bug with "too manytriggers" when creating gratuitous indices on these tables, I feel as it may be possible to do what I want without breakingeverything. But then again, am I taking too many liberties with technology that maybe didn't have use cases likethis one in mind? <br /><br />Jason<br /><br />-- <br />========================================================<br />JasonNerothin<br />Programmer/Analyst IV - Database Administration<br />UCLA-DOE Institute for Genomics & Proteomics<br/>Howard Hughes Medical Institute <br />========================================================<br />611 C.E.Young Drive East | Tel: (310) 206-3907<br />105 Boyer Hall, Box 951570 | Fax: (310) 206-3914<br />Los Angeles, CA90095. USA | Mail: <a href="mailto:jason@mbi.ucla.edu"> jason@mbi.ucla.edu</a><br />========================================================<br/><a href="http://www.mbi.ucla.edu/~jason">http://www.mbi.ucla.edu/~jason</a><br />========================================================
pgsql-hackers by date: