Thread: Partition column should be part of PK

Partition column should be part of PK

From
Nagaraj Raj
Date:

We have some partitioned tables using inheritance and are planning to migrate them to declarative partitioning.

Table DDL:

CREATE TABLE c_account_p
(
    billing_account_guid character varying(40)  NOT NULL,
    ingestion_process_id bigint NOT NULL DEFAULT '-1'::integer,
    load_dttm timestamp(6) without time zone NOT NULL,
    ban integer NOT NULL,
    CONSTRAINT billing_account_pkey PRIMARY KEY (billing_account_guid, ban)
) PARTITION by RANGE(load_dttm);

When I try to create the table, it throws the error below:

ERROR:  insufficient columns in the PRIMARY KEY constraint definition

DETAIL:  PRIMARY KEY constraint on table "l_billing_account_p" lacks column "load_dttm" which is part of the partition key.

SQL state: 0A000

Is it mandatory that the partition column be part of the primary key? If I include load_dttm in the PK, it works fine (db<>fiddle).


If the partition column has to be part of the PK, it's challenging to partition by range on the date column, because load_dttm can have duplicates when data is loaded via COPY or INSERT:

INSERT INTO c_account_p SELECT * from c_account_p_bkp ON CONFLICT (billing_account_guid,ban,load_dttm) DO UPDATE SET 'some stuff..'

If I receive the same (billing_account_guid, ban) combination with a different load_dttm, it will end up as a duplicate key.


Could someone please help me understand this scenario?

Thanks.

 



Re: Partition column should be part of PK

From
Justin Pryzby
Date:
Declarative partitioning was new in v10.
In v11, it became possible to create an index on a partitioned table, including
unique indexes.

However it's not a "global" index - instead, it's an "inherited" index.
For a unique index, uniqueness is enforced within each individual index.
And so global uniqueness is only guaranteed if the partition key is included in
the index.
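
For reference, a form of the DDL that satisfies this rule (the original
poster confirmed this works); the partition definition is a made-up example:

CREATE TABLE c_account_p
(
    billing_account_guid character varying(40) NOT NULL,
    ingestion_process_id bigint NOT NULL DEFAULT '-1'::integer,
    load_dttm timestamp(6) without time zone NOT NULL,
    ban integer NOT NULL,
    -- the partition key column is now part of the PK:
    CONSTRAINT billing_account_pkey PRIMARY KEY (billing_account_guid, ban, load_dttm)
) PARTITION by RANGE(load_dttm);

-- Each partition gets an inherited index that enforces uniqueness locally;
-- that is safe globally because rows with equal key values share the same
-- load_dttm and therefore always land in the same partition.
CREATE TABLE c_account_p_2021 PARTITION OF c_account_p
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');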

On Fri, Jun 25, 2021 at 03:10:07AM +0000, Nagaraj Raj wrote:
> CREATE TABLE c_account_p (
>     billing_account_guid character varying(40)  NOT NULL,
>     ingestion_process_id bigint NOT NULL DEFAULT '-1'::integer,
>     load_dttm timestamp(6) without time zone NOT NULL,
>     ban integer NOT NULL,
>     CONSTRAINT billing_account_pkey PRIMARY KEY (billing_account_guid, ban)
> ) PARTITION by RANGE(load_dttm);
> When I try to create the table, it throws the error below:
> ERROR:  insufficient columns in the PRIMARY KEY constraint definition
> DETAIL:  PRIMARY KEY constraint on table "l_billing_account_p" lacks column "load_dttm" which is part of the partition key.
> 
> If the partition column has to be part of the PK, it's challenging to partition by range on the date column, because load_dttm can have duplicates when data is loaded via COPY or INSERT.
> INSERT INTO c_account_p SELECT * from c_account_p_bkp ON CONFLICT (billing_account_guid,ban,load_dttm) DO UPDATE SET 'some stuff..'
> 
> If I receive the same (billing_account_guid, ban) combination with a different load_dttm, it will end up as a duplicate key.

It sounds like you want a unique index on (billing_account_guid, ban) to
support INSERT ON CONFLICT.  If DO UPDATE SET will never move tuples to a new
partition, then you could do the INSERT ON CONFLICT on the partition rather
than its parent.

But it cannot be a unique, "partitioned" index, without including load_dttm.
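
A sketch of that approach, with a hypothetical partition name, assuming the
backup table's rows all fall within that partition's bounds and match its
column order:

-- Allowed on an individual partition (a plain table), though not on the
-- parent without load_dttm:
CREATE UNIQUE INDEX c_account_p_2021_guid_ban
    ON c_account_p_2021 (billing_account_guid, ban);

-- Targeting the partition directly lets (billing_account_guid, ban) act as
-- the arbiter; note this does not stop the same pair from appearing in a
-- different partition.
INSERT INTO c_account_p_2021
SELECT * FROM c_account_p_bkp
ON CONFLICT (billing_account_guid, ban)
DO UPDATE SET ingestion_process_id = EXCLUDED.ingestion_process_id;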

-- 
Justin



Re: Partition column should be part of PK

From
Justin Pryzby
Date:
If I'm not wrong, this is the same thing you asked 2 weeks ago.

If so, why not continue the conversation on the same thread, and why not
reference the old thread?

I went to the effort to find the old conversation.
https://www.postgresql.org/message-id/20210625042228.GJ29179@telsasoft.com

If declaratively partitioned tables and partitioned indexes don't do what you
wanted, then you should consider not using them for this.

On Fri, Jun 25, 2021 at 03:10:07AM +0000, Nagaraj Raj wrote:
> we have some partitioned tables using inheritance and are planning to migrate them to declarative partitioning.
> Table DDL:
> CREATE TABLE c_account_p
> (
>     billing_account_guid character varying(40)  NOT NULL,
>     ingestion_process_id bigint NOT NULL DEFAULT '-1'::integer,
>     load_dttm timestamp(6) without time zone NOT NULL,
>     ban integer NOT NULL,
>     CONSTRAINT billing_account_pkey PRIMARY KEY (billing_account_guid, ban)
> ) PARTITION by RANGE(load_dttm);
> When I try to create the table, it throws the error below:
> ERROR:  insufficient columns in the PRIMARY KEY constraint definition
> DETAIL:  PRIMARY KEY constraint on table "l_billing_account_p" lacks column "load_dttm" which is part of the partition key.
> SQL state: 0A000
> Is it mandatory that the partition column be part of the primary key? If I include load_dttm in the PK, it works fine (db<>fiddle).
> If the partition column has to be part of the PK, it's challenging to partition by range on the date column, because load_dttm can have duplicates when data is loaded via COPY or INSERT.
> INSERT INTO c_account_p SELECT * from c_account_p_bkp ON CONFLICT (billing_account_guid,ban,load_dttm) DO UPDATE SET 'some stuff..'
> If I receive the same (billing_account_guid, ban) combination with a different load_dttm, it will end up as a duplicate key.
> Could someone please help me understand this scenario?
> Thanks.



Re: Partition column should be part of PK

From
Alvaro Herrera
Date:
On 2021-Jul-08, Justin Pryzby wrote:

> If I'm not wrong, this is the same thing you asked 2 weeks ago.
> 
> If so, why not continue the conversation on the same thread, and why not
> reference the old thread?
> 
> I went to the effort to find the old conversation.
> https://www.postgresql.org/message-id/20210625042228.GJ29179@telsasoft.com

Actually, it looks like you're replying to the same email you replied to
two weeks ago.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



RE: Partition column should be part of PK

From
"Mike Sofen"
Date:
I believe this thread qualifies for the funniest thread of 2021 (so far).  And yes, this is a top post.  :-)

Mike Sofen

-----Original Message-----
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Sent: Thursday, July 08, 2021 3:29 PM
To: Justin Pryzby <pryzby@telsasoft.com>
Cc: Nagaraj Raj <nagaraj.sf@yahoo.com>; pgsql-performance@postgresql.org
Subject: Re: Partition column should be part of PK

On 2021-Jul-08, Justin Pryzby wrote:

> If I'm not wrong, this is the same thing you asked 2 weeks ago.
>
> If so, why not continue the conversation on the same thread, and why
> not reference the old thread?
>
> I went to the effort to find the old conversation.
> https://www.postgresql.org/message-id/20210625042228.GJ29179@telsasoft
> .com

Actually, it looks like you're replying to the same email you replied to two weeks ago.

--
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/






Re: Partition column should be part of PK

From
Nagaraj Raj
Date:
My apologies for the confusion caused by the new thread. Yes, it's the same issue as in the earlier post.

I was trying to figure out how to ensure unique values for the columns (billing_account_guid, ban). If I add the partition key to the constraint, it won't do what I'm looking for.

My use case is as below:

INSERT INTO t1 SELECT * from t2 ON CONFLICT (billing_account_guid,ban) DO UPDATE SET something…

Or

INSERT INTO t1 SELECT * from t2 ON CONFLICT constraint (pk or uk)(billing_account_guid,ban) DO UPDATE SET something…


Thanks

Sent from Yahoo Mail for iPhone






Re: Partition column should be part of PK

From
Christophe Pettus
Date:

> On Jul 8, 2021, at 20:32, Nagaraj Raj <nagaraj.sf@yahoo.com> wrote:
>
> My apologies for the confusion caused by the new thread. Yes, it's the same issue as in the earlier post.
>
> I was trying to figure out how to ensure unique values for the columns (billing_account_guid, ban). If I add the partition key to the constraint, it won't do what I'm looking for.
>
> My use case as below
>
> INSERT INTO t1 SELECT * from t2 ON CONFLICT (billing_account_guid,ban) DO UPDATE SET something…
>
> Or
>
> INSERT INTO t1 SELECT * from t2 ON CONFLICT constraint (pk or uk)(billing_account_guid,ban) DO UPDATE SET something…

Right now, PostgreSQL does not support unique indexes on partitioned tables (that operate across all partitions) unless the partition key is included in the index definition.  If it didn't have that requirement, it would have to independently (and in a concurrency-supported way) scan every partition individually to see if there is a duplicate key violation in any of the partitions, and the machinery to do that does not exist right now.

If the goal is to make sure there is only one (billing_account_guid, ban, date) combination across the entire partition set, you can create a unique index on the partitioned set as (billing_account_guid, ban, date), and INSERT ... ON CONFLICT DO NOTHING works properly then.
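
A sketch of that first option, using the table from this thread:

-- Allowed on the partitioned parent because the index includes the
-- partition key; it cascades to every partition.
CREATE UNIQUE INDEX c_account_p_guid_ban_dttm
    ON c_account_p (billing_account_guid, ban, load_dttm);

INSERT INTO c_account_p
SELECT * FROM c_account_p_bkp
ON CONFLICT (billing_account_guid, ban, load_dttm) DO NOTHING;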

If the goal is to make sure there is only one (billing_account_uid, ban) in any partition regardless of date, you'll need to do something more sophisticated to make sure that two sessions don't insert a (billing_account_uid, ban) value into two different partitions.  This isn't a great fit for table partitioning, and you might want to reconsider whether partitioning the table is the right answer here.  If you *must* have table partitioning, a possible algorithm is:

-- Start a transaction
-- Hash the (billing_account_uid, ban) key into a 64 bit value.
-- Use that 64 bit value as a key to a call to pg_advisory_xact_lock() [1] to, in essence, create a signal to any other transaction attempting to insert that pair that it is being modified.
-- SELECT on that pair to make sure one does not already exist.
-- If one does not, do the INSERT.
-- Commit, which releases the advisory lock.

This doesn't provide quite the same level of uniqueness that a cross-partition index would, but if this is the only code path that does the INSERT, it should keep duplicates from showing up in different partitions.

[1] https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
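
A minimal sketch of that algorithm, with made-up key values; hashtextextended()
(available since v11) supplies the 64-bit lock key:

BEGIN;

-- Serialize all writers of this (billing_account_guid, ban) pair; any
-- concurrent session hashing the same pair blocks here until commit.
SELECT pg_advisory_xact_lock(
           hashtextextended('some-guid-value' || ':' || 12345::text, 0));

-- Insert only if no partition already holds the pair.
INSERT INTO c_account_p (billing_account_guid, ban, load_dttm)
SELECT 'some-guid-value', 12345, now()
WHERE NOT EXISTS (
    SELECT 1 FROM c_account_p
    WHERE billing_account_guid = 'some-guid-value' AND ban = 12345);

COMMIT;  -- releases the transaction-scoped advisory lock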


Re: Partition column should be part of PK

From
Nagaraj Raj
Date:
Personally, I feel this design is very bad compared to other DB servers.

> If the goal is to make sure there is only one (billing_account_uid, ban) in any partition regardless of date, you'll need to do something more sophisticated to make sure that two sessions don't insert a (billing_account_uid, ban) value into two different partitions.  This isn't a great fit for table partitioning, and you might want to reconsider whether partitioning the table is the right answer here.  If you *must* have table partitioning, a possible algorithm is:

Yes, this is my use case.

Can I use a BEFORE INSERT trigger on each partition that calls a function like the one below to handle the conflict?


CREATE OR REPLACE FUNCTION insert_trigger()
    RETURNS trigger
    LANGUAGE plpgsql
AS $BODY$
DECLARE
    conn_name text;
BEGIN
    -- Look up the unique constraint of the table the trigger fired on
    -- (assumes the table has exactly one).
    SELECT conname INTO conn_name
    FROM pg_constraint
    WHERE conrelid = TG_RELID AND contype = 'u';

    -- Re-insert the row dynamically, letting the partition-local
    -- constraint arbitrate the conflict.
    EXECUTE format(
        'INSERT INTO %I.%I SELECT ($1).* ON CONFLICT ON CONSTRAINT %I DO NOTHING',  -- or something
        TG_TABLE_SCHEMA, TG_TABLE_NAME, conn_name)
    USING NEW;

    RETURN NULL;  -- suppress the original insert
END;
$BODY$;

CREATE TRIGGER insert
    BEFORE INSERT ON t4
    FOR EACH ROW
    WHEN (pg_trigger_depth() < 1)
    EXECUTE FUNCTION insert_trigger();

CREATE TRIGGER insert
    BEFORE INSERT ON t3
    FOR EACH ROW
    WHEN (pg_trigger_depth() < 1)
    EXECUTE FUNCTION insert_trigger();

-- ... and so on for each partition ...
    




Please suggest.


Thanks,
Rj



Re: Partition column should be part of PK

From
Christophe Pettus
Date:

> On Jul 11, 2021, at 17:36, Nagaraj Raj <nagaraj.sf@yahoo.com> wrote:
>
> Personally, I feel this design is very bad compared to other DB servers.

Patches accepted.  The issue is that in order to have a partition-set-wide unique index, the system would have to lock the unique index entries in *all* partitions, not just the target one.  This logic does not currently exist, and it's not trivial to implement efficiently.

> Can I use a BEFORE INSERT trigger on each partition that calls a function like the one below to handle the conflict?

That doesn't handle the core problem, which is ensuring that two different sessions do not insert the same (billing_account_uid, ban) into two different partitions.  That requires some kind of higher-level lock.  The example you give isn't required; PostgreSQL will perfectly happily accept a unique constraint on (billing_account_uid, ban) on each partition, and handle attempts to insert a duplicate row correctly (either by returning an error or processing an ON CONFLICT clause).  What that does not prevent is a duplicate (billing_account_uid, ban) in two different partitions.

There's another issue here, which is that this design implies that once a particular (billing_account_uid, ban) row is created in the partitioned table, it is never deleted.  This means older partitions are never dropped, which means the number of partitions in the table will grow unbounded.  This is not going to scale well as the number of partitions starts getting very large.

You might consider, instead, hash-partitioning on one of billing_account_uid or ban, or reconsider whether partitioning is the right solution here.
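
For what it's worth, a sketch of the hash-partitioning alternative (partition
names and modulus made up); because the partition key is then a subset of the
primary key, the partitioned unique index, and ON CONFLICT against the parent,
become legal:

CREATE TABLE c_account_p
(
    billing_account_guid character varying(40) NOT NULL,
    ban integer NOT NULL,
    load_dttm timestamp(6) without time zone NOT NULL,
    PRIMARY KEY (billing_account_guid, ban)
) PARTITION BY HASH (billing_account_guid);

-- All rows with a given guid land in the same partition, so the
-- per-partition indexes add up to a global uniqueness guarantee.
CREATE TABLE c_account_p_h0 PARTITION OF c_account_p
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE c_account_p_h1 PARTITION OF c_account_p
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE c_account_p_h2 PARTITION OF c_account_p
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE c_account_p_h3 PARTITION OF c_account_p
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);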


Re: Partition column should be part of PK

From
David Rowley
Date:
On Mon, 12 Jul 2021 at 12:37, Nagaraj Raj <nagaraj.sf@yahoo.com> wrote:
> personally, I feel this design is very bad compared to other DB servers.

I'm not sure exactly what you're referring to here as you didn't quote
it, but my guess is you mean our lack of global index support.

Generally, there's not all that much consensus in the community that
this would be a good feature to have.  Why do people want to use
partitioning?  Many people do it so that they can quickly remove data
that's no longer required with a simple DETACH operation.  This is
metadata only and is generally very fast.  Another set of people
partition as their tables are very large and they become much easier
to manage when broken down into parts.  There's also a group of people
who do it for the improved data locality.   Unfortunately, if we had a
global index feature then that requires building a single index over
all partitions.  DETACH is no longer a metadata-only operation as we
must somehow invalidate or remove tuples that belong to the detached
partition. The group of people who partitioned to get away from very
large tables now have a very large index.  Maybe the only group to get
off lightly here are the data locality group. They'll still have the
same data locality on the heap.

So in short, many of the benefits of partitioning disappear when you
have a global index.

So, why did you partition your data in the first place?  If you feel
like you wouldn't mind having a large global index over all partitions
then maybe you're better off just using a non-partitioned table to
store this data.
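
For illustration, the metadata-only removal path being described, with
hypothetical partition names:

-- With only partitioned (inherited) indexes this is a quick catalog change;
-- with a global index, every index entry belonging to the detached
-- partition would first have to be invalidated or removed.
ALTER TABLE c_account_p DETACH PARTITION c_account_p_2020;
DROP TABLE c_account_p_2020;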

David



Re: Partition column should be part of PK

From
Thomas Kellerer
Date:
David Rowley schrieb am 12.07.2021 um 02:57:
> Generally, there's not all that much consensus in the community that
> this would be a good feature to have.  Why do people want to use
> partitioning?  Many people do it so that they can quickly remove data
> that's no longer required with a simple DETACH operation.  This is
> metadata only and is generally very fast.  Another set of people
> partition as their tables are very large and they become much easier
> to manage when broken down into parts.  There's also a group of people
> who do it for the improved data locality.   Unfortunately, if we had a
> global index feature then that requires building a single index over
> all partitions.  DETACH is no longer a metadata-only operation as we
> must somehow invalidate or remove tuples that belong to the detached
> partition. The group of people who partitioned to get away from very
> large tables now have a very large index.  Maybe the only group to get
> off lightly here are the data locality group. They'll still have the
> same data locality on the heap.
>
> So in short, many of the benefits of partitioning disappear when you
> have a global index.

The situations where this is useful are large tables where partitioning
would turn Seq Scans of the whole table into Seq Scans of a partition,
or where it would allow for partition-wise joins and still have
foreign keys referencing the partitioned table.

I agree they do have downsides. I only know Oracle as one of those systems
where this is possible, and in general global indexes are somewhat
avoided but there are still situations where they are useful.
E.g. if you want to have foreign keys referencing your partitioned
table and including the partition key in the primary key makes no
sense.

Even though they have disadvantages, I think it would be nice to
have the option to create them.

I know that in the Oracle world, they are seldom used (precisely
because of the disadvantages you mentioned) but they do have a place.

Thomas



RE: Partition column should be part of PK

From
"Michel SALAIS"
Date:
Hi all,
I think that global indexes could be useful sometimes. That is why Oracle implements them.
Just to mention two benefits that could be required by a lot of people:
- Global uniqueness which shouldn't be in conflict with partitioning
- Performance! When the index is on a column which is not the partitioning key, a global index would be better for
performance...

Nevertheless, this doesn't come without a price, and you have described this very well. That is why Oracle invalidates
global indexes when certain partition maintenance operations are performed. These indexes then have to be rebuilt. But,
anyway, such operations can be done "concurrently" or "online"...

Michel SALAIS






Performance benchmark of PG

From
Manish Lad
Date:
Dear all,
We are planning to migrate an Oracle Exadata database to PostgreSQL; database sizes range from 1 TB to 60 TB.

Will PostgreSQL support this with performance matching that of the Exadata appliance?
If anyone could point me in the right direction where I can get benchmarking done for these two databases, either on-premises or in any cloud, that would be great.

Thanks all in advance. 

Manish

Re: Performance benchmark of PG

From
Laurenz Albe
Date:
On Mon, 2021-07-19 at 15:39 +0530, Manish Lad wrote:
> We are planning to migrate an Oracle Exadata database to PostgreSQL; database sizes range from 1 TB to 60 TB.
> 
> Will PostgreSQL support this with performance matching that of the Exadata appliance?
> If anyone could point me in the right direction where I can get benchmarking done
>  for these two databases, either on-premises or in any cloud, that would be great.

You won't find any trustworthy benchmarks anywhere, because Oracle expressly
forbids publishing benchmark results under its license, unless Oracle has given
its permission.

The question cannot be answered, because performance depends on your workload,
configuration, software and hardware.  Perhaps PostgreSQL will be faster, perhaps not.

Test and see.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Performance benchmark of PG

From
Manish Lad
Date:
Yes, you are right. I experienced the same in one such migration from DB2 to PG: reads were faster, but writes were not meeting the need.

We then noticed the differences in disk types.

Once changed, it matched the source.

Thanks and Regards

Manish 


Re: Performance benchmark of PG

From
Hüseyin Demir
Date:
Hi,

The question cannot be answered properly, because in PostgreSQL performance (response time of query executions) depends on:

1. Your disk/storage hardware. Performance can vary between SSD and HDD, for example.
2. Your PostgreSQL configuration. In other words, configuration parameters can change your performance metrics. You also have to define your queries, the data size a query SELECTs each time, and the queries that INSERT/UPDATE into the database.
3. Your CPU and memory hardware can also change your performance metrics. You have to compare your hardware infrastructure with the Exadata appliances.
4. You also have to consider connection pooling on the application side. PostgreSQL can suffer from performance problems because of a lack of connection pooling.

Regards.





--
Hüseyin Demir

Senior Database Platform Engineer

Linkedin: hseyindemir

Re: Performance benchmark of PG

From
Thomas Kellerer
Date:
Manish Lad schrieb am 19.07.2021 um 12:09:
> We are planning to migrate an Oracle Exadata database to PostgreSQL;
> database sizes range from 1 TB to 60 TB.
>
> Will PostgreSQL support this with performance matching that of the
> Exadata appliance? If anyone could point me in the right direction
> where I can get benchmarking done for these two databases, either
> on-premises or in any cloud, that would be great.


As already pointed out, you won't find such a benchmark.

You will have to run such a benchmark yourself. Ideally with a workload
that represents your use case. Or maybe with something like HammerDB.

But Exadata isn't only software, it's also hardware especially designed
to work together with Oracle's enterprise edition.

So if you want to get any reasonable results, you will at least have to
buy hardware that matches the Exadata HW specifications.

So if you run your own tests, make sure you buy comparable HW for
Postgres as well (lots of RAM and many fast server-grade NVMe drives).




Re: Performance benchmark of PG

From
Manish Lad
Date:
Thank you all for your swift response. 

Thank you again.

Manish


Re: Performance benchmark of PG

From
Ninad Shah
Date:
As Thomas rightly pointed out about the feasibility of benchmarking, you may still compare the performance of queries on both Exadata and PostgreSQL. IMO it may not be on par, but it should be acceptable.

In the contemporary world, 60 TB isn't really a huge database, so I hardly think you will find any performance issues on PostgreSQL.

All the best.


Regards,
Ninad Shah



Re: Performance benchmark of PG

From
Manish Lad
Date:
Thanks a lot. 
