Home > mailing lists

Re: Partitioning feature ... - Mailing list pgsql-hackers

From	Kedar Potdar
Subject	Re: Partitioning feature ...
Date	March 30, 2009 12:16:42
Msg-id	bd8134a40903300451lfbc1cc5w5756140902f2689c@mail.gmail.com Whole thread Raw
In response to	Re: Partitioning feature ... (Emmanuel Cecchet <manu@asterdata.com>)
Responses	Re: Partitioning feature ... (Jaime Casanova <jcasanov@systemguards.com.ec>)
List	pgsql-hackers

Tree view

Hi Emmanuel,

Thanks for your time. This is a WIP patch and we will integrate your suggestions/comments as appropriate.

Regards,
--
Kedar.

On Fri, Mar 27, 2009 at 3:38 AM, Emmanuel Cecchet <manu@asterdata.com> wrote:

Hi Kedar,

First of all, congratulations for the excellent work.
I have some comments and questions.

In get_relevent_partition (btw, relevant is spelled with an a) you are maintaining 2 lists.

> Oops! 'a' typographical error.

I guess this is only useful for multi-column partitions, right?
If you have a single column partition (without subpartitions), I think you could directly return on the first match (without maintaining any list) since you guarantee that there is no overlap between partitions.
A simple but effective optimization for inserts consists of caching the last partition used (consecutive inserts often go to the same partition) and try it first before going through the whole loop.

> Yep.

The update trigger should first check if the tuple needs to be moved. If the updated tuple still matches the constraints of the partitions it will not have to be moved and will save a lot of overhead.

> Yes. We agree on that.

The COPY operation should probably be optimized to use the same code as the one in the insert trigger for partitioned tables. I guess some code could be factorized in COPY to make the inserts more efficient.

The current trigger approach should prevent other triggers to be added to the table, or you should make sure that the partition trigger is always the one to execute last.

> As triggers are executed in order of their names, we've prefixed the trigger names with "zz". This should work fine as long as no-one uses trigger-name which starts with "zz".

As we don't have automatic partition creation, it would be interesting to have an optional mechanism to deal with tuples that don't match any partition (very useful when you do bulk insert and some new data require a new partition). Having a simple overflow partition or an error logging mechanism would definitely help to identify these tuples and prevent things like large COPY operations to fail.

> Will get back on this.

Looking forward to your responses,
Emmanuel

We are implementing table partitioning feature to support Range and Hash partitions. Please find attached, the WIP patch and test-cases.

The syntax used conforms to most of the suggestions mentioned in http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring the following:
-- Specification of partition names is optional. System will be able to generate partition names in such cases.
-- Sub partitioning

We are maintaining a system catalog(pg_partition) for partition meta-data. System will look-up this table to find appropriate partition to operate on.
System internally uses low-level 'C' triggers to row-movement.

Regards,
--
Kedar.

------------------------------------------------------------------------

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com

pgsql-hackers by date:

From: Marko Kreen
Date: 30 March 2009, 10:37:57
Subject: 8.3.5: Crash in CountActiveBackends() - lockless race?

From: Dave Page
Date: 30 March 2009, 12:28:37
Subject: Re: [GENERAL] New shapshot RPMs (Mar 27, 2009) are ready for testing

Re: Partitioning feature ... - Mailing list pgsql-hackers

Previous

Next