Thread:

From

"Matt Casters"

Date:

20 January 2005, 09:34:50

Hi,

I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a
replacement for Oracle.
The reason for the test is that the number of users of the warehouse is going to increase and this
will have a serious impact on licencing costs. (I bet that sounds familiar)

We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle.
Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be
hitting 1B rows soon.
(around 250Gb of data)

My questions to the list are: has this sort of thing been attempted before? If so, what where the
performance results compared to Oracle?
I've been reading up on partitioned tabes on pgsql, will the performance benefit will be
comparable to Oracle partitioned tables?
What are the gotchas?
Should I be testing on 8 or the 7 version?
While I didn't find any documents immediately, are there any fine manuals to read on data
warehouse performance tuning on PostgreSQL?

Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date
on the results.

Best regards,

Matt
------
Matt Casters <matt.casters@ibridge.be>
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37

Re:

From

Stephen Frost

Date:

20 January 2005, 14:26:01

* Matt Casters (Matt.Casters@advalvas.be) wrote:
> I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a
> replacement for Oracle.
> The reason for the test is that the number of users of the warehouse is going to increase and this
> will have a serious impact on licencing costs. (I bet that sounds familiar)

Rather familiar, yes... :)

> We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle.
> Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be
> hitting 1B rows soon.
> (around 250Gb of data)

Quite a bit of data.  There's one big thing to note here I think-
Postgres will not take advantage of multiple CPUs for a given query,
Oracle will.  So, it depends on your workload as to how that may impact
you.  Situations where this will be unlikely to affect you:

Your main bottle-neck is IO/disk and not CPU.
You run multiple queries in parallel frequently.
There are other processes on the system which chew up CPU time anyway.

Situations where you're likely to be affected would be:

You periodically run one big query.
You run a set of queries in sequential order.

> My questions to the list are: has this sort of thing been attempted before? If so, what where the
> performance results compared to Oracle?

I'm pretty sure it's been attempted before but unfortunately I don't
have any numbers on it myself.  My data sets aren't that large (couple
million rows) but I've found PostgreSQL at least as fast as Oracle for
what we do, and much easier to work with.

> I've been reading up on partitioned tabes on pgsql, will the performance benefit will be
> comparable to Oracle partitioned tables?

In this case I would think so, except that PostgreSQL still won't use
multiple CPUs for a given query, even against partitioned tables, aiui.

> What are the gotchas?

See above? :)  Other issues are things having to do w/ your specific
SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
something like select x,y from a,b where x=%y; to do a right-join,
iirc).

> Should I be testing on 8 or the 7 version?

Now that 8.0 is out I'd say probably test with that and just watch for
8.0.x releases before you go production, if you have time before you
have to go into production with the new solution (sounds like you do-
changing databases takes time anyway).

> Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date
> on the results.

Hope that helps.  Others on here will correct me if I misspoke. :)

    Stephen

Attachment

signature.asc

Re:

From

Alex Turner

Date:

20 January 2005, 16:31:36

I am curious - I wasn't aware that postgresql supported partitioned tables,
Could someone point me to the docs on this.

Thanks,

Alex Turner
NetEconomist


On Thu, 20 Jan 2005 09:26:03 -0500, Stephen Frost <sfrost@snowman.net> wrote:
> * Matt Casters (Matt.Casters@advalvas.be) wrote:
> > I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a
> > replacement for Oracle.
> > The reason for the test is that the number of users of the warehouse is going to increase and this
> > will have a serious impact on licencing costs. (I bet that sounds familiar)
>
> Rather familiar, yes... :)
>
> > We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle.
> > Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be
> > hitting 1B rows soon.
> > (around 250Gb of data)
>
> Quite a bit of data.  There's one big thing to note here I think-
> Postgres will not take advantage of multiple CPUs for a given query,
> Oracle will.  So, it depends on your workload as to how that may impact
> you.  Situations where this will be unlikely to affect you:
>
> Your main bottle-neck is IO/disk and not CPU.
> You run multiple queries in parallel frequently.
> There are other processes on the system which chew up CPU time anyway.
>
> Situations where you're likely to be affected would be:
>
> You periodically run one big query.
> You run a set of queries in sequential order.
>
> > My questions to the list are: has this sort of thing been attempted before? If so, what where the
> > performance results compared to Oracle?
>
> I'm pretty sure it's been attempted before but unfortunately I don't
> have any numbers on it myself.  My data sets aren't that large (couple
> million rows) but I've found PostgreSQL at least as fast as Oracle for
> what we do, and much easier to work with.
>
> > I've been reading up on partitioned tabes on pgsql, will the performance benefit will be
> > comparable to Oracle partitioned tables?
>
> In this case I would think so, except that PostgreSQL still won't use
> multiple CPUs for a given query, even against partitioned tables, aiui.
>
> > What are the gotchas?
>
> See above? :)  Other issues are things having to do w/ your specific
> SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
> something like select x,y from a,b where x=%y; to do a right-join,
> iirc).
>
> > Should I be testing on 8 or the 7 version?
>
> Now that 8.0 is out I'd say probably test with that and just watch for
> 8.0.x releases before you go production, if you have time before you
> have to go into production with the new solution (sounds like you do-
> changing databases takes time anyway).
>
> > Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date
> > on the results.
>
> Hope that helps.  Others on here will correct me if I misspoke. :)
>
>         Stephen
>
>
>

Re:

From

Greg Stark

Date:

20 January 2005, 16:34:16

"Matt Casters" <Matt.Casters@advalvas.be> writes:

> I've been reading up on partitioned tabes on pgsql, will the performance
> benefit will be comparable to Oracle partitioned tables?

Postgres doesn't have any built-in support for partitioned tables. You can do
it the same way people did it on Oracle up until 8.0 which is by creating
views of UNIONs or using inherited tables.

The main advantage of partitioned tables is being able to load and drop data
in large chunks instantaneously. This avoids having to perform large deletes
and then having to vacuum huge tables to recover the space.

However in Postgres you aren't going to get most of the performance advantage
of partitions in your query plans. The Oracle planner can prune partitions it
knows aren't relevant to the query to avoid having to search through them.

This can let it get the speed of a full table scan without the disadvantage of
having to read irrelevant tuples. Postgres is sometimes going to be forced to
either do a much slower index scan or read tables that aren't relevant.

--
greg

Re:

From

"Matt Casters"

Date:

20 January 2005, 20:06:26

Thanks Stephen,

My main concern is to get as much read performance on the disks as possible
on this given system.  CPU is rarely a problem on a typical data warehouse
system, this one's not any different.

We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third one
coming along.(around 350Gb)
I was kind of hoping that the new PGSQL tablespaces would allow me to create
a storage container spanning multiple file-systems, but unfortunately, that
seems to be not the case.  Is this correct?

That tells me that I probably need to do a full reconfiguration of the disks
on the Solaris level to get maximum performance out of the system.
Mmmm. This is going to be a though one to crack.  Perhaps it will be
possible to get some extra juice out of placing the indexes on the smaller
disks (150G) and the data on the bigger ones?

Thanks!

Matt

-----Oorspronkelijk bericht-----
Van: Stephen Frost [mailto:sfrost@snowman.net]
Verzonden: donderdag 20 januari 2005 15:26
Aan: Matt Casters
CC: pgsql-performance@postgresql.org
Onderwerp: Re: [PERFORM]

* Matt Casters (Matt.Casters@advalvas.be) wrote:
> I have the go ahead of a customer to do some testing on Postgresql in
> a couple of weeks as a replacement for Oracle.
> The reason for the test is that the number of users of the warehouse
> is going to increase and this will have a serious impact on licencing
> costs. (I bet that sounds familiar)

Rather familiar, yes... :)

> We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb
RAM) on Oracle.
> Basically we have 2 large fact tables to deal with: one going for 400M
> rows, the other will be hitting 1B rows soon.
> (around 250Gb of data)

Quite a bit of data.  There's one big thing to note here I think- Postgres
will not take advantage of multiple CPUs for a given query, Oracle will.
So, it depends on your workload as to how that may impact you.  Situations
where this will be unlikely to affect you:

Your main bottle-neck is IO/disk and not CPU.
You run multiple queries in parallel frequently.
There are other processes on the system which chew up CPU time anyway.

Situations where you're likely to be affected would be:

You periodically run one big query.
You run a set of queries in sequential order.

> My questions to the list are: has this sort of thing been attempted
> before? If so, what where the performance results compared to Oracle?

I'm pretty sure it's been attempted before but unfortunately I don't have
any numbers on it myself.  My data sets aren't that large (couple million
rows) but I've found PostgreSQL at least as fast as Oracle for what we do,
and much easier to work with.

> I've been reading up on partitioned tabes on pgsql, will the
> performance benefit will be comparable to Oracle partitioned tables?

In this case I would think so, except that PostgreSQL still won't use
multiple CPUs for a given query, even against partitioned tables, aiui.

> What are the gotchas?

See above? :)  Other issues are things having to do w/ your specific
SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
something like select x,y from a,b where x=%y; to do a right-join, iirc).

> Should I be testing on 8 or the 7 version?

Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x
releases before you go production, if you have time before you have to go
into production with the new solution (sounds like you do- changing
databases takes time anyway).

> Thanks in advance for any help you may have, I'll do my best to keep
> pgsql-performance up to date on the results.

Hope that helps.  Others on here will correct me if I misspoke. :)

    Stephen

Re:

From

"Joshua D. Drake"

Date:

20 January 2005, 20:26:38

Matt Casters wrote:
>
> Thanks Stephen,
>
> My main concern is to get as much read performance on the disks as possible
> on this given system.  CPU is rarely a problem on a typical data warehouse
> system, this one's not any different.
>
> We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third one
> coming along.(around 350Gb)

Why not run two raid systems. A RAID 1 for your OS and a RAID 10 for
your database? Push all of your extra drives into the RAID 10.

Sincerely,

Joshua D. Drake




> I was kind of hoping that the new PGSQL tablespaces would allow me to create
> a storage container spanning multiple file-systems, but unfortunately, that
> seems to be not the case.  Is this correct?
>
> That tells me that I probably need to do a full reconfiguration of the disks
> on the Solaris level to get maximum performance out of the system.
> Mmmm. This is going to be a though one to crack.  Perhaps it will be
> possible to get some extra juice out of placing the indexes on the smaller
> disks (150G) and the data on the bigger ones?
>
> Thanks!
>
> Matt
>
> -----Oorspronkelijk bericht-----
> Van: Stephen Frost [mailto:sfrost@snowman.net]
> Verzonden: donderdag 20 januari 2005 15:26
> Aan: Matt Casters
> CC: pgsql-performance@postgresql.org
> Onderwerp: Re: [PERFORM]
>
> * Matt Casters (Matt.Casters@advalvas.be) wrote:
>
>>I have the go ahead of a customer to do some testing on Postgresql in
>>a couple of weeks as a replacement for Oracle.
>>The reason for the test is that the number of users of the warehouse
>>is going to increase and this will have a serious impact on licencing
>>costs. (I bet that sounds familiar)
>
>
> Rather familiar, yes... :)
>
>
>>We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb
>
> RAM) on Oracle.
>
>>Basically we have 2 large fact tables to deal with: one going for 400M
>>rows, the other will be hitting 1B rows soon.
>>(around 250Gb of data)
>
>
> Quite a bit of data.  There's one big thing to note here I think- Postgres
> will not take advantage of multiple CPUs for a given query, Oracle will.
> So, it depends on your workload as to how that may impact you.  Situations
> where this will be unlikely to affect you:
>
> Your main bottle-neck is IO/disk and not CPU.
> You run multiple queries in parallel frequently.
> There are other processes on the system which chew up CPU time anyway.
>
> Situations where you're likely to be affected would be:
>
> You periodically run one big query.
> You run a set of queries in sequential order.
>
>
>>My questions to the list are: has this sort of thing been attempted
>>before? If so, what where the performance results compared to Oracle?
>
>
> I'm pretty sure it's been attempted before but unfortunately I don't have
> any numbers on it myself.  My data sets aren't that large (couple million
> rows) but I've found PostgreSQL at least as fast as Oracle for what we do,
> and much easier to work with.
>
>
>>I've been reading up on partitioned tabes on pgsql, will the
>>performance benefit will be comparable to Oracle partitioned tables?
>
>
> In this case I would think so, except that PostgreSQL still won't use
> multiple CPUs for a given query, even against partitioned tables, aiui.
>
>
>>What are the gotchas?
>
>
> See above? :)  Other issues are things having to do w/ your specific
> SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
> something like select x,y from a,b where x=%y; to do a right-join, iirc).
>
>
>>Should I be testing on 8 or the 7 version?
>
>
> Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x
> releases before you go production, if you have time before you have to go
> into production with the new solution (sounds like you do- changing
> databases takes time anyway).
>
>
>>Thanks in advance for any help you may have, I'll do my best to keep
>>pgsql-performance up to date on the results.
>
>
> Hope that helps.  Others on here will correct me if I misspoke. :)
>
>     Stephen
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


--
Command Prompt, Inc., your source for PostgreSQL replication,
professional support, programming, managed services, shared
and dedicated hosting. Home of the Open Source Projects plPHP,
plPerlNG, pgManage,  and pgPHPtoolkit.
Contact us now at: +1-503-667-4564 - http://www.commandprompt.com

Attachment

jd.vcf

Re:

From

"Matt Casters"

Date:

20 January 2005, 21:39:21

Joshua,

Actually that's a great idea!
I'll have to check if Solaris wants to play ball though.
We'll have to see as we don't have the new disks yet, ETA is next week.

Cheers,

Matt

-----Oorspronkelijk bericht-----
Van: Joshua D. Drake [mailto:jd@commandprompt.com]
Verzonden: donderdag 20 januari 2005 21:26
Aan: matt.casters@advalvas.be
CC: pgsql-performance@postgresql.org
Onderwerp: Re: [PERFORM]

Matt Casters wrote:
>
> Thanks Stephen,
>
> My main concern is to get as much read performance on the disks as
> possible on this given system.  CPU is rarely a problem on a typical
> data warehouse system, this one's not any different.
>
> We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third
> one coming along.(around 350Gb)

Why not run two raid systems. A RAID 1 for your OS and a RAID 10 for your
database? Push all of your extra drives into the RAID 10.

Sincerely,

Joshua D. Drake




> I was kind of hoping that the new PGSQL tablespaces would allow me to
> create a storage container spanning multiple file-systems, but
> unfortunately, that seems to be not the case.  Is this correct?
>
> That tells me that I probably need to do a full reconfiguration of the
> disks on the Solaris level to get maximum performance out of the system.
> Mmmm. This is going to be a though one to crack.  Perhaps it will be
> possible to get some extra juice out of placing the indexes on the
> smaller disks (150G) and the data on the bigger ones?
>
> Thanks!
>
> Matt
>
> -----Oorspronkelijk bericht-----
> Van: Stephen Frost [mailto:sfrost@snowman.net]
> Verzonden: donderdag 20 januari 2005 15:26
> Aan: Matt Casters
> CC: pgsql-performance@postgresql.org
> Onderwerp: Re: [PERFORM]
>
> * Matt Casters (Matt.Casters@advalvas.be) wrote:
>
>>I have the go ahead of a customer to do some testing on Postgresql in
>>a couple of weeks as a replacement for Oracle.
>>The reason for the test is that the number of users of the warehouse
>>is going to increase and this will have a serious impact on licencing
>>costs. (I bet that sounds familiar)
>
>
> Rather familiar, yes... :)
>
>
>>We're running a medium sized data warehouse on a Solaris box (4CPU,
>>8Gb
>
> RAM) on Oracle.
>
>>Basically we have 2 large fact tables to deal with: one going for 400M
>>rows, the other will be hitting 1B rows soon.
>>(around 250Gb of data)
>
>
> Quite a bit of data.  There's one big thing to note here I think-
> Postgres will not take advantage of multiple CPUs for a given query,
Oracle will.
> So, it depends on your workload as to how that may impact you.
> Situations where this will be unlikely to affect you:
>
> Your main bottle-neck is IO/disk and not CPU.
> You run multiple queries in parallel frequently.
> There are other processes on the system which chew up CPU time anyway.
>
> Situations where you're likely to be affected would be:
>
> You periodically run one big query.
> You run a set of queries in sequential order.
>
>
>>My questions to the list are: has this sort of thing been attempted
>>before? If so, what where the performance results compared to Oracle?
>
>
> I'm pretty sure it's been attempted before but unfortunately I don't
> have any numbers on it myself.  My data sets aren't that large (couple
> million
> rows) but I've found PostgreSQL at least as fast as Oracle for what we
> do, and much easier to work with.
>
>
>>I've been reading up on partitioned tabes on pgsql, will the
>>performance benefit will be comparable to Oracle partitioned tables?
>
>
> In this case I would think so, except that PostgreSQL still won't use
> multiple CPUs for a given query, even against partitioned tables, aiui.
>
>
>>What are the gotchas?
>
>
> See above? :)  Other issues are things having to do w/ your specific
> SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is
> it, something like select x,y from a,b where x=%y; to do a right-join,
iirc).
>
>
>>Should I be testing on 8 or the 7 version?
>
>
> Now that 8.0 is out I'd say probably test with that and just watch for
> 8.0.x releases before you go production, if you have time before you
> have to go into production with the new solution (sounds like you do-
> changing databases takes time anyway).
>
>
>>Thanks in advance for any help you may have, I'll do my best to keep
>>pgsql-performance up to date on the results.
>
>
> Hope that helps.  Others on here will correct me if I misspoke. :)
>
>     Stephen
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


--
Command Prompt, Inc., your source for PostgreSQL replication, professional
support, programming, managed services, shared and dedicated hosting. Home
of the Open Source Projects plPHP, plPerlNG, pgManage,  and pgPHPtoolkit.
Contact us now at: +1-503-667-4564 - http://www.commandprompt.com

Re:

From

Mark Kirkwood

Date:

20 January 2005, 21:40:31

Matt Casters wrote:
> Hi,
>
> My questions to the list are: has this sort of thing been attempted before? If so, what where the
> performance results compared to Oracle?
> I've been reading up on partitioned tabes on pgsql, will the performance benefit will be
> comparable to Oracle partitioned tables?
> What are the gotchas?
> Should I be testing on 8 or the 7 version?
> While I didn't find any documents immediately, are there any fine manuals to read on data
> warehouse performance tuning on PostgreSQL?
>
Some of the previous postings on this list discuss various methods for
doing partitioning (UNION and INHERIT), as well as the use of partial
indexes - see the thread titled : 'Data Warehouse Reevaluation - MySQL
vs Postgres -- merge tables'.

Unfortunately none of these work well for a standard 'star' because :

i) all conditions are on the dimension tables, and
ii) the optimizer can eliminate 'partition' tables only on the basis of
  *constant* conditions, and the resulting implied restrictions caused
by the join to the dimension table(s) are not usable for this.

So I think to get it to work well some violence to your 'star' may be
required (e.g. adding constant columns to 'fact' tables to aid the
optimizer, plus rewriting queries to include conditions on the added
columns).

One other gotcha is that Pg cannot do index only access, which can hurt.
However it may be possibly to get good performance using CLUSTER on the
fact tables (or just loading them in a desirable order) plus using
partial indexes.

regards

Mark

Re:

From

Bruno Wolff III

Date:

21 January 2005, 00:00:37

On Thu, Jan 20, 2005 at 11:31:29 -0500,
  Alex Turner <armtuk@gmail.com> wrote:
> I am curious - I wasn't aware that postgresql supported partitioned tables,
> Could someone point me to the docs on this.

Some people have been doing it using a union view. There isn't actually
a partition feature.

Re:

From

Mischa

Date:

21 January 2005, 04:55:36

No support for partitioned tables? Perhaps in name ... but I use a time-based
"partition" tables that inherit from a base table; new partitions are "placed"
(moved) round-robin on a set of drives. Somewhat manual, but if you really need
a solution now, it works.

Quoting Greg Stark <gsstark@mit.edu>:

>
> "Matt Casters" <Matt.Casters@advalvas.be> writes:
>
> > I've been reading up on partitioned tabes on pgsql, will the performance
> > benefit will be comparable to Oracle partitioned tables?
>
> Postgres doesn't have any built-in support for partitioned tables. You can
> do
> it the same way people did it on Oracle up until 8.0 which is by creating
> views of UNIONs or using inherited tables.
>
> The main advantage of partitioned tables is being able to load and drop data
> in large chunks instantaneously. This avoids having to perform large deletes
> and then having to vacuum huge tables to recover the space.
>
> However in Postgres you aren't going to get most of the performance
> advantage
> of partitions in your query plans. The Oracle planner can prune partitions
> it
> knows aren't relevant to the query to avoid having to search through them.
>
> This can let it get the speed of a full table scan without the disadvantage
> of
> having to read irrelevant tuples. Postgres is sometimes going to be forced
> to
> either do a much slower index scan or read tables that aren't relevant.
>
> --
> greg
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>


--
"Dreams come true, not free."

Re:

From

"Matt Casters"

Date:

21 January 2005, 08:51:03

> On Thu, Jan 20, 2005 at 11:31:29 -0500,
>   Alex Turner <armtuk@gmail.com> wrote:
>> I am curious - I wasn't aware that postgresql supported partitioned tables,
>> Could someone point me to the docs on this.
>
> Some people have been doing it using a union view. There isn't actually
> a partition feature.
>
>

Actually, there is.  If found this example on pgsql-performance:

>>   CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
>>   ANALYZE super_foo ;
>>
>>   CREATE TABLE sub_foo1 () INHERITS ( super_foo );
>>   INSERT INTO sub_foo1 VALUES ( 1, 1 );
>>   -- repeat insert until sub_foo1 has 1,000,000 rows
>>   CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
>>   ANALYZE sub_foo1 ;
>>
>>   CREATE TABLE sub_foo2 () INHERITS ( super_foo );
>>   INSERT INTO sub_foo2 VALUES ( 2, 1 );
>>   -- repeat insert until sub_foo2 has 1,000,000 rows
>>   CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
>>   ANALYZE sub_foo2 ;
>>

I think that in certain cases this system even beats Oracle as it stores less information in the
table partitions. (and in doing so is causing less disk IO)
BTW, internally, Oracle sees partitions as tables too.  Even the "Union all" system that MS SQL
Server uses works fine as long as the optimiser supports it to prune correctly.

Cheers,

Matt
------
Matt Casters <matt.casters@ibridge.be>
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37

Re:

From

Yann Michel

Date:

21 January 2005, 12:30:23

Hi,

On Fri, Jan 21, 2005 at 09:50:46AM +0100, Matt Casters wrote:
>
> > Some people have been doing it using a union view. There isn't actually
> > a partition feature.
>
> Actually, there is.  If found this example on pgsql-performance:
>
> >>   CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
> >>   ANALYZE super_foo ;
> >>
> >>   CREATE TABLE sub_foo1 () INHERITS ( super_foo );
[...]
> >>
> >>   CREATE TABLE sub_foo2 () INHERITS ( super_foo );
[...]
> >>

Yes, this could be used instead of a view. But there is one thing
missing. You can't just insert into super_foo and aquire the "correct
partition". You will still have to insert into the correct underlying
table. "Real" partitioning will take care of correct partition
selection.

Regards,
Yann

Re:

From

"Matt Casters"

Date:

21 January 2005, 12:51:19

>> > Some people have been doing it using a union view. There isn't actually
>> > a partition feature.
>>
>> Actually, there is.  If found this example on pgsql-performance:
>>
>> >>   CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
>> >>   ANALYZE super_foo ;
>> >>
>> >>   CREATE TABLE sub_foo1 () INHERITS ( super_foo );
> [...]
>> >>
>> >>   CREATE TABLE sub_foo2 () INHERITS ( super_foo );
> [...]
>> >>
>
> Yes, this could be used instead of a view. But there is one thing
> missing. You can't just insert into super_foo and aquire the "correct
> partition". You will still have to insert into the correct underlying
> table. "Real" partitioning will take care of correct partition
> selection.

This IS bad news.  It would mean a serious change in the ETL.
I think I can solve the other problems, but I don't know about this one...

Regards,

Matt

Re:

From

Matteo Beccati

Date:

21 January 2005, 14:37:21

Hi,

>>>>  CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
>>>>  ANALYZE super_foo ;
>>>>
>>>>  CREATE TABLE sub_foo1 () INHERITS ( super_foo );
>>>>  CREATE TABLE sub_foo2 () INHERITS ( super_foo );
>
> Yes, this could be used instead of a view. But there is one thing
> missing. You can't just insert into super_foo and aquire the "correct
> partition". You will still have to insert into the correct underlying
> table. "Real" partitioning will take care of correct partition
> selection.

I've recently used this method for partitioning data. In my setup
inserts are done inside a pl/pgsql function called at regular intervals,
so this isn't a problem for me. I didn't test it, but I think some rules
(or a trigger) could do the trick.


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

Re:

From

Yann Michel

Date:

21 January 2005, 16:06:46

Hi,

On Fri, Jan 21, 2005 at 03:37:20PM +0100, Matteo Beccati wrote:
>
> >>>> CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
> >>>> ANALYZE super_foo ;
> >>>>
> >>>> CREATE TABLE sub_foo1 () INHERITS ( super_foo );
> >>>> CREATE TABLE sub_foo2 () INHERITS ( super_foo );
> >
> >Yes, this could be used instead of a view. But there is one thing
> >missing. You can't just insert into super_foo and aquire the "correct
> >partition". You will still have to insert into the correct underlying
> >table. "Real" partitioning will take care of correct partition
> >selection.
>
> I've recently used this method for partitioning data. In my setup
> inserts are done inside a pl/pgsql function called at regular intervals,
> so this isn't a problem for me. I didn't test it, but I think some rules
> (or a trigger) could do the trick.

Yes, a pl/pgsql function or any software solution can solve this
problem, but what you normally expect from a partitioning support is
that you don't have to care about where to put your data due to the db
will take care for that.
Of cause a trigger could do this as well, but don't forget, that a
trigger in dwh environments, where you process thousands of row at once
during data loading, is very expensive and therefore no solution for
production use.

Regards,
Yann