Re: partitioning performance question - Mailing list pgsql-performance
From | Robert Klemme
---|---
Subject | Re: partitioning performance question
Date |
Msg-id | CAM9pMnOLvSJ9AYCEN-ka-DmB-cX5K78=YyHRnqFdyQXAOBVdiQ@mail.gmail.com
In response to | partitioning performance question (Kevin Kempter <cs_dba@consistentstate.com>)
List | pgsql-performance
On Sat, Jun 9, 2012 at 7:58 PM, Kevin Kempter
<cs_dba@consistentstate.com> wrote:
> Hi All;
>
> We have a client that has a table where large blobs (bytea) are
> stored. The table has a key column that holds numbers (like 112362),
> but unfortunately it's a varchar column, so the blobs are accessed
> via queries like:
>
> select * from bigtable where keycol = '217765'
>
> The primary reason we want to partition the table is for maintenance;
> the table is well over 1.2 terabytes in size and they have never
> successfully vacuumed it. However, I don't want to make performance
> even worse. The table does have a serial key, so I'm thinking the
> best options will be to partition by range on the serial key, or
> maybe via the keycol character column using an IN statement in the
> check constraints, thus allowing the planner to actually leverage the
> above SQL. I suspect doing a typecast to integer in the check
> constraints will prove to be a bad idea if the keycol column remains
> a varchar.
>
> Thoughts?
>
> Here's the table:
>
>                        Table "problemchild"
>     Column    |           Type           |            Modifiers
> --------------+--------------------------+-----------------------------------------------------------------
>  keycol       | character varying        |
>  blob_data    | bytea                    |
>  removed_date | timestamp with time zone |
>  alt_key      | bigint                   | not null default nextval('problemchild_alt_key_seq'::regclass)
> Indexes:
>     "pc_pkey" PRIMARY KEY, btree (alt_key)
>     "key2" btree (keycol)

I find it odd that you have a column "keycol" which is not the PK and
your PK column is named "alt_key". Is "keycol" always the character
representation of "alt_key"? Are they unrelated?

It would also help to know how the data in this table changes. Do you
only ever add data? Is some data removed from time to time (maybe
based on the "removed_date")?

If the table grows continually, then range partitioning sounds good.
However, I think you should really make "keycol" a number type,
because otherwise range partitioning will be a pain: you would need to
include the length of the string in the criterion if you want your
varchar ranges to mimic number ranges.

If, on the other hand, you are deleting from time to time, and hence
the table does not grow in the long run, then hash partitioning might
be a better idea, because then you do not need to create new
partitions all the time. Example on alt_key:

create table problemchild (
  keycol varchar(100),
  blob_data bytea,
  removed_date timestamp with time zone,
  alt_key bigint primary key
);
create table problemchild_00 (
  check ( alt_key % 16 = 0 )
) inherits (problemchild);
create table problemchild_01 (
  check ( alt_key % 16 = 1 )
) inherits (problemchild);
create table problemchild_02 (
  check ( alt_key % 16 = 2 )
) inherits (problemchild);
...

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
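With inheritance-based partitioning of this era, the child tables alone are
not enough: rows inserted into the parent stay in the parent unless a
BEFORE INSERT trigger routes them. A minimal routing sketch, assuming the
16-way modulo scheme in the example above; the function and trigger names
here are illustrative, not part of the original message:

create or replace function problemchild_route() returns trigger as $$
begin
  -- build the child table name (problemchild_00 .. problemchild_15)
  -- from alt_key and forward the whole row to it
  execute 'insert into problemchild_'
          || lpad((new.alt_key % 16)::text, 2, '0')
          || ' select ($1).*'
  using new;
  return null;  -- suppress the insert into the parent table
end;
$$ language plpgsql;

create trigger problemchild_route_trg
  before insert on problemchild
  for each row execute procedure problemchild_route();

Returning NULL from a BEFORE INSERT row trigger cancels the insert on the
parent, so each row ends up in exactly one child.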
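To make the "include the length of the string" caveat concrete: text
comparison orders '9' after '10000', so a varchar range only mimics a
numeric range when all values compared have the same length. A hedged
sketch of what one range-partitioned child might look like if keycol stays
a varchar (the table name and bounds are made up for illustration):

create table problemchild_r100k (
  -- lexicographic order matches numeric order only for equal-length
  -- strings, hence the explicit length check
  check ( length(keycol) = 6
          and keycol >= '100000'
          and keycol < '200000' )
) inherits (problemchild);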