Thread: Table Partitioning and Rules

Table Partitioning and Rules

From
"Girish Bajaj"
Date:
Hello All,

I have a set of tables partitioned horizontally. DDL below.

Essentially I'm trying to store a person's information in a table in the
database. Since we could have millions of people (with duplicates!), I've
decided we need to partition the table into segments where all people with
the LastName starting from A to G will be in one table, H-N will be in
another table and O-Z in the third. I've created a VIEW that does a UNION
on all the tables.

Now, I'm trying to create an INSERT and UPDATE RULE on the VIEW that would
direct an insert or update into the appropriate partition table depending
on the person's LastName.

I can't seem to figure this out. Does anyone know how to do this?

Thanks,
Girish


-- Table: public."contact_A_G"
CREATE TABLE public."contact_A_G" (
  "CONTACTID" int8 NOT NULL,
  "LastName" varchar(50),
  "FirstName" varchar(50),
  CONSTRAINT "contact_A_G_pkey" PRIMARY KEY ("CONTACTID")
) WITH OIDS;

-- Table: public."contact_H_N"
CREATE TABLE public."contact_H_N" (
  "CONTACTID" int8 NOT NULL,
  "LastName" varchar(50),
  "FirstName" varchar(50),
  CONSTRAINT "contact_H_N_pkey" PRIMARY KEY ("CONTACTID")
) WITH OIDS;

-- Table: public."contact_O_Z"
CREATE TABLE public."contact_O_Z" (
  "CONTACTID" int8 NOT NULL,
  "LastName" varchar(50),
  "FirstName" varchar(50),
  CONSTRAINT "contact_O_Z_pkey" PRIMARY KEY ("CONTACTID")
) WITH OIDS;


CREATE VIEW contact AS
SELECT * FROM "contact_A_G"
UNION
SELECT * FROM "contact_H_N"
UNION
SELECT * FROM "contact_O_Z";

Re: Table Partitioning and Rules

From
Richard Huxton
Date:
On Thursday 17 Jul 2003 6:20 pm, Girish Bajaj wrote:
> Hello All,
>
> I have a set of tables partitioned horizontally. DDL below.
>
> Essentially I'm trying to store a person's information in a table in the
> database. Since we could have millions of people (with duplicates!), I've
> decided we need to partition the table into segments where all people with
> the LastName starting from A to G will be in one table, H-N will be in
> another table and O-Z in the third. I've created a VIEW that does a UNION
> on all the tables.

I'd not bother - databases are good at managing large amounts of information.
If you really need to, you can set up multiple partial indexes:

CREATE INDEX my_index_a ON contact (LastName)
  WHERE LastName >= 'A' AND LastName < 'B';
CREATE INDEX my_index_b ON contact (LastName)
  WHERE LastName >= 'B' AND LastName < 'C';
etc.
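
A query that stays inside one letter range should then be able to use just
the matching index. For example (untested):

EXPLAIN SELECT * FROM contact WHERE LastName = 'Bajaj';

should be satisfiable by an index scan on my_index_b alone, since the
condition implies that index's WHERE clause.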

> Now, I'm trying to create an INSERT and UPDATE RULE on the VIEW that would
> direct an insert or update into the appropriate partition table depending
> on the person's LastName.
>
> I can't seem to figure this out. Does anyone know how to do this?

Something like:
CREATE RULE ... ON INSERT TO Contact WHERE NEW.LastName>='A' AND
NEW.LastName<'H'
DO INSTEAD ...insert into correct table here...
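
Fleshed out for the A-G table (untested, and using the names from your DDL),
one approach that should work is an unconditional INSTEAD rule, so that
PostgreSQL accepts inserts on the view at all, plus one conditional rule per
partition:

CREATE RULE contact_ins_default AS
  ON INSERT TO contact
  DO INSTEAD NOTHING;

CREATE RULE contact_ins_a_g AS
  ON INSERT TO contact
  WHERE new."LastName" >= 'A' AND new."LastName" < 'H'
  DO INSERT INTO "contact_A_G" ("CONTACTID", "LastName", "FirstName")
     VALUES (new."CONTACTID", new."LastName", new."FirstName");

Repeat the conditional rule for the H-N and O-Z ranges, and build the
ON UPDATE rules along the same lines.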

You'll want to read the chapter on the rule system in the "Server Programming"
section of the manual.

--  Richard Huxton


Re: Table Partitioning and Rules

From
Josh Berkus
Date:
Girish,

> > Essentially I'm trying to store a person's information in a table in the
> > database. Since we could have millions of people (with duplicates!), I've
> > decided we need to partition the table into segments where all people with
> > the LastName starting from A to G will be in one table, H-N will be in
> > another table and O-Z in the third. I've created a VIEW that does a UNION
> > on all the tables.

This sounds hideously inefficient and a management headache besides.  I think
PostgreSQL will accept up to 2 billion rows in any one table, and splitting
stuff into 3 tables will not improve your performance ... quite the opposite.

Change your database design.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco



Re: Table Partitioning and Rules

From
Rod Taylor
Date:
On Thu, 2003-07-17 at 19:03, Josh Berkus wrote:
> Girish,
>
> > > Essentially I'm trying to store a person's information in a table in the
> > > database. Since we could have millions of people (with duplicates!), I've
> > > decided we need to partition the table into segments where all people with
> > > the LastName starting from A to G will be in one table, H-N will be in
> > > another table and O-Z in the third. I've created a VIEW that does a UNION
> > > on all the tables.
>
> This sounds hideously inefficient and a management headache besides.  I think
> PostgreSQL will accept up to 2 billion rows in any one table, and splitting
> stuff into 3 tables will not improve your performance ... quite the opposite.

PostgreSQL will go well beyond 2 billion rows in a table.  It just
becomes difficult to use OIDs.

Re: Table Partitioning and Rules

From
"Girish Bajaj"
Date:
The problem is that I'm worried about sequential scans. This particular table
can have up to 150 cols and 250 million records. Now we have a reporting
requirement that someone could select on ANY col and filter on any col as
well. Meaning someone could do a SELECT on col number 1,2,310,1000 from
contact where col num 75='X' and col num 139 = 'Y'.

I can't possibly index all the cols in the table. So I thought I'd best
manage the data by splitting up the table into multiple partitions and
eventually, depending on application logic, only scan those tables that are
necessary to scan sequentially instead of the whole big table.

I'm getting a little confused here because eventually I would want to join in
this gigantic 250 million row table as well, and that would be a real big
problem causing loads of sequential scans, wouldn't it?

Thanks,
Girish


-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Thursday, July 17, 2003 2:03 PM
To: Richard Huxton; Girish Bajaj; pgsql-sql@postgresql.org
Subject: Re: [SQL] Table Partitioning and Rules

Girish,

> > Essentially I'm trying to store a person's information in a table in the
> > database. Since we could have millions of people (with duplicates!), I've
> > decided we need to partition the table into segments where all people with
> > the LastName starting from A to G will be in one table, H-N will be in
> > another table and O-Z in the third. I've created a VIEW that does a UNION
> > on all the tables.

This sounds hideously inefficient and a management headache besides.  I
think
PostgreSQL will accept up to 2 billion rows in any one table, and splitting
stuff into 3 tables will not improve your performance ... quite the
opposite.

Change your database design.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: Table Partitioning and Rules

From
Josh Berkus
Date:
Girish,

> I can't possibly index all the cols in the table.

Why not?

Seriously, what you can do is use query logging to log the columns and
combinations of columns that users most frequently ask for.  Then you can
index for those queries.

Your partitioning scheme assumes that users will be searching on the last name
most of the time, so I'd start with that.
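
That is, keep the single table and start with something like (using the
column name from your DDL):

CREATE INDEX contact_lastname_idx ON contact ("LastName");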

> I'm getting a little confused here because eventually I would want to join
> in this gigantic 250 million row table as well, and that would be a real
> big problem causing loads of sequential scans, wouldn't it?

Yes, it would.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco



Re: Table Partitioning and Rules

From
Steve Crawford
Date:
OK, so basically you are trying to keep a hundred some-odd attributes on
everyone in the US. It's possible that a 150-column table is properly
normalized (I have a similar situation), but it is rare.

Suppose it is really properly normalized. You can still benefit from indexes 
on just some of the columns by choosing those most commonly used in queries. 
You may also want to research partial indexes (create index foo ... where 
bar=baz) which can under certain circumstances be far smaller and faster than 
full indexes.
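
For example (the "is_current" column here is purely made up to illustrate),
an index like:

CREATE INDEX contact_current_lastname_idx
  ON contact ("LastName")
  WHERE is_current = true;

only stores entries for the rows matching the WHERE clause, so it is much
smaller than a full index and is only usable by queries whose conditions
imply that same clause.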

Breaking the table up won't improve the full table scan - in fact, as each
sub-table's file grows it will probably fragment on the disk much worse than
a single growing file would, which, along with all the overhead of joining
all the tables, will make things worse.

Review your structure carefully. Plan on $$$ for the hardware.

Cheers,
Steve


On Thursday 17 July 2003 12:22 pm, Girish Bajaj wrote:
> The problem is that I'm worried about sequential scans. This particular
> table can have up to 150 cols and 250 million records. Now we have a
> reporting requirement that someone could select on ANY col and filter on
> any col as well. Meaning someone could do a SELECT on col number
> 1,2,310,1000 from contact where col num 75='X' and col num 139 = 'Y'.
>
> I can't possibly index all the cols in the table. So I thought I'd best
> manage the data by splitting up the table into multiple partitions and
> eventually, depending on application logic, only scan those tables that
> are necessary to scan sequentially instead of the whole big table.
>
> I'm getting a little confused here because eventually I would want to join
> in this gigantic 250 million row table as well, and that would be a real
> big problem causing loads of sequential scans, wouldn't it?
>
> Thanks,
> Girish
>
>
> -----Original Message-----
> From: Josh Berkus [mailto:josh@agliodbs.com]
> Sent: Thursday, July 17, 2003 2:03 PM
> To: Richard Huxton; Girish Bajaj; pgsql-sql@postgresql.org
> Subject: Re: [SQL] Table Partitioning and Rules
>
> Girish,
>
> > > Essentially I'm trying to store a person's information in a table in the
> > > database. Since we could have millions of people (with duplicates!), I've
> > > decided we need to partition the table into segments where all people with
> > > the LastName starting from A to G will be in one table, H-N will be in
> > > another table and O-Z in the third. I've created a VIEW that does a UNION
> > > on all the tables.
>
> This sounds hideously inefficient and a management headache besides.  I
> think
> PostgreSQL will accept up to 2 billion rows in any one table, and splitting
> stuff into 3 tables will not improve your performance ... quite the
> opposite.
>
> Change your database design.



Re: Table Partitioning and Rules

From
Dmitry Tkach
Date:
But what makes you think that it is quicker to scan 10 tables with 25
million rows each than it would be to scan one table with 250 million rows?
It won't be... in fact, it will take *longer*.

If you have a way to narrow the number of tables to scan down based on 
the condition, you can have that logic implemented with partial indices, 
as it was suggested earlier in this thread...

Dima

Girish Bajaj wrote:

>The problem is that I'm worried about sequential scans. This particular table
>can have up to 150 cols and 250 million records. Now we have a reporting
>requirement that someone could select on ANY col and filter on any col as
>well. Meaning someone could do a SELECT on col number 1,2,310,1000 from
>contact where col num 75='X' and col num 139 = 'Y'.
>
>I can't possibly index all the cols in the table. So I thought I'd best
>manage the data by splitting up the table into multiple partitions and
>eventually, depending on application logic, only scan those tables that are
>necessary to scan sequentially instead of the whole big table.
>
>I'm getting a little confused here because eventually I would want to join in
>this gigantic 250 million row table as well, and that would be a real big
>problem causing loads of sequential scans, wouldn't it?
>
>Thanks,
>Girish
>
>
>-----Original Message-----
>From: Josh Berkus [mailto:josh@agliodbs.com] 
>Sent: Thursday, July 17, 2003 2:03 PM
>To: Richard Huxton; Girish Bajaj; pgsql-sql@postgresql.org
>Subject: Re: [SQL] Table Partitioning and Rules
>
>Girish,
>
>>>Essentially I'm trying to store a person's information in a table in the
>>>database. Since we could have millions of people (with duplicates!), I've
>>>decided we need to partition the table into segments where all people with
>>>the LastName starting from A to G will be in one table, H-N will be in
>>>another table and O-Z in the third. I've created a VIEW that does a UNION
>>>on all the tables.
>
>This sounds hideously inefficient and a management headache besides.  I
>think 
>PostgreSQL will accept up to 2 billion rows in any one table, and splitting 
>stuff into 3 tables will not improve your performance ... quite the
>opposite.
>
>Change your database design.
>




Re: Table Partitioning and Rules

From
Greg Stark
Date:
"Girish Bajaj" <gbajaj@tietronix.com> writes:

> I can't possibly index all the cols in the table. So I thought I'd best
> manage the data by splitting up the table into multiple partitions and
> eventually, depending on application logic, only scan those tables that are
> necessary to scan sequentially instead of the whole big table.

But that's only going to help if one of the columns they're searching on is
the last name column, isn't it?

I'm a fan of partitioned tables, but you have to consider what advantage
you're trying to achieve in order to decide whether it will actually be
helpful for you:

Partitioned tables only really improve query performance if virtually all
queries use a common constraint. The canonical example is accounting tables
being partitioned based on fiscal year. Virtually all the queries--even
massive batch queries best served by sequential scans--will only scan the
current fiscal year.
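
As a sketch of what I mean (table and column names invented for
illustration):

CREATE TABLE invoice_fy2002 (
  invoice_id  int8 PRIMARY KEY,
  fiscal_year int  NOT NULL CHECK (fiscal_year = 2002),
  amount      numeric
);

CREATE TABLE invoice_fy2003 (
  invoice_id  int8 PRIMARY KEY,
  fiscal_year int  NOT NULL CHECK (fiscal_year = 2003),
  amount      numeric
);

CREATE VIEW invoice AS
  SELECT * FROM invoice_fy2002
  UNION ALL
  SELECT * FROM invoice_fy2003;

-- virtually every query carries the common constraint, so the application
-- can aim even big batch reports straight at the one partition it needs:
SELECT sum(amount) FROM invoice_fy2003;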

In your case, unless you can impose a constraint on the UI that users always
perform their queries on a single letter of the alphabet at a time, and only
see results for people whose last names match that letter, it's not really a
great match as far as query performance goes.


The other advantage of partitioned tables is space management; it allows
placing each partition on a separate physical storage space. However, without
native support in Postgres, doing it via rules is going to be a headache. I
would think you would be better off striping the disks together and storing it
as a single large table. That's the only clean approach Postgres really allows
at this point anyways. 


Finally, if I WERE going to partition based on the first letter of a text
string, which I doubt I would, I would probably create 26 partitions right off
the bat rather than trying to make up arbitrary break points. If those
arbitrary break points turn out to be poorly chosen, it'll be a complex manual
job to move them, whereas if you just have 26 partitions, some will be large
and some small, and you can move partitions between physical storage freely to
balance things.

-- 
greg



Re: Table Partitioning and Rules

From
Richard Huxton
Date:
On Thursday 17 Jul 2003 8:39 pm, Steve Crawford wrote:
> OK, so basically you are trying to keep a hundred some-odd attributes on
> everyone in the US. It's possible that a 150-column table is properly
> normalized (I have a similar situation), but it is rare.
>
> Suppose it is really properly normalized. You can still benefit from
> indexes on just some of the columns by choosing those most commonly used in
> queries. You may also want to research partial indexes (create index foo
> ... where bar=baz) which can under certain circumstances be far smaller and
> faster than full indexes.

The other issue is that, even if your table is normalised, you may want to
split it vertically. That's going to depend on usage patterns, and I don't
know what you've got, but say you used three tables:

contact_address
contact_personal
contact_bank_details

Now a search by sales would be interested in ...address & ...personal,
whereas accounts would look at ...address & ...bank_details.
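
Roughly (column names invented just to illustrate the split):

CREATE TABLE contact_personal (
  "CONTACTID" int8 PRIMARY KEY,
  "LastName"  varchar(50),
  "FirstName" varchar(50)
);

CREATE TABLE contact_address (
  "CONTACTID" int8 NOT NULL REFERENCES contact_personal,
  city        varchar(50),
  postcode    varchar(10)
);

CREATE TABLE contact_bank_details (
  "CONTACTID" int8 NOT NULL REFERENCES contact_personal,
  bank_name   varchar(50),
  account_no  varchar(20)
);

-- sales only ever join the groups they need:
SELECT p."LastName", a.city
FROM contact_personal p
JOIN contact_address a USING ("CONTACTID");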

This _might_ make sense, but probably only if you can group columns into
related groups and users are more interested in some groups than others. If
you are lucky the gains might be in cache usage, whereas the costs will be in
joining groups for results.

I'm not saying you should do this just to try and improve performance, but it
might make sense if users look at it that way.

> Review your structure carefully. Plan on $$$ for the hardware.

Or remind your users that patience is a virtue ;-)

--  Richard Huxton