Thread: Quick Performance Poll

Quick Performance Poll

From

"Simon Dale"

Date:

20 April 2006, 13:19:07

Hi,

I was just wondering whether anyone has had success with storing more than 1TB of data with PostgreSQL and how they have found the performance.

We need a database that can store in excess of this amount and still show good performance. We will probably be implementing several tables with foreign keys and also indexes which will obviously impact on both data size and performance too.

Many thanks in advance,

Simon

Visit our Website at www.rm.com

This message is confidential. You should not copy it or disclose its contents to anyone. You may use and apply the information for the intended purpose only. Internet communications are not secure; therefore, RM does not accept legal responsibility for the contents of this message. Any views or opinions presented are those of the author only and not of RM. If this email has come to you in error, please delete it, along with any attachments. Please note that RM may intercept incoming and outgoing email communications.

Freedom of Information Act 2000
This email and any attachments may contain confidential information belonging to RM. Where the email and any attachments do contain information of a confidential nature, including without limitation information relating to trade secrets, special terms or prices these shall be deemed for the purpose of the Freedom of Information Act 2000 as information provided in confidence by RM and the disclosure of which would be prejudicial to RM's commercial interests.

This email has been scanned for viruses by Trend ScanMail.

Re: Quick Performance Poll

From

"Jim Buttafuoco"

Date:

20 April 2006, 13:36:32

Simon,

I have many databases over 1T with the largest being ~6T.  All of my databases store telecom data, such as call detail
records.  The access is very fast when looking for a small subset of the data.  For servers, I am using white box intel
XEON and P4 systems with SATA disks, 4G of memory.  SCSI is out of our price range, but if I had unlimited $ I would go
with SCSI /SCSI raid instead.

Jim

---------- Original Message -----------
From: "Simon Dale" <sdale@rm.com>
To: <pgsql-performance@postgresql.org>
Sent: Thu, 20 Apr 2006 14:18:58 +0100
Subject: [PERFORM] Quick Performance Poll

> Hi,
>
> I was just wondering whether anyone has had success with storing more
> than 1TB of data with PostgreSQL and how they have found the
> performance.
>
> We need a database that can store in excess of this amount and still
> show good performance. We will probably be implementing several tables
> with foreign keys and also indexes which will obviously impact on both
> data size and performance too.
>
> Many thanks in advance,
>
> Simon
> Visit our Website at http://www.rm.com
>
> This message is confidential. You should not copy it or disclose its contents to anyone. You may use and apply
> the information for the intended purpose only. Internet communications are not secure; therefore, RM does not
> accept legal responsibility for the contents of this message. Any views or opinions presented are those of the
> author only and not of RM. If this email has come to you in error, please delete it, along with any
> attachments. Please note that RM may intercept incoming and outgoing email communications.
>
> Freedom of Information Act 2000
> This email and any attachments may contain confidential information belonging to RM.  Where the email and any
> attachments do contain information of a confidential nature, including without limitation information relating
> to trade secrets, special terms or prices these shall be deemed for the purpose of the Freedom of Information
> Act 2000 as information provided in confidence by RM and the disclosure of which would be prejudicial to RM's
> commercial interests.
>
> This email has been scanned for viruses by Trend ScanMail.
------- End of Original Message -------

Re: Quick Performance Poll

From

"Luke Lonergan"

Date:

20 April 2006, 14:32:01

Jim,

On 4/20/06 6:36 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote:

> The access is very fast when looking for a small subset of the data.

I guess you are not using indexes because building a (non bitmap) index on
6TB on a single machine would take days if not weeks.

So if you are using table partitioning, do you have to refer to each child
table separately in your queries?

- Luke

Re: Quick Performance Poll

From

"Jim Buttafuoco"

Date:

20 April 2006, 14:41:02

First of all this is NOT a single table and yes I am using partitioning and the constaint exclusion stuff.  the largest
set of tables is over 2T.  I have not had to rebuild the biggest database yet, but for a smaller one ~1T the restore
takes about 12 hours including many indexes on both large and small tables

Jim



---------- Original Message -----------
From: "Luke Lonergan" <llonergan@greenplum.com>
To: jim@contactbda.com, "Simon Dale" <sdale@rm.com>, pgsql-performance@postgresql.org
Sent: Thu, 20 Apr 2006 07:31:33 -0700
Subject: Re: [PERFORM] Quick Performance Poll

> Jim,
>
> On 4/20/06 6:36 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote:
>
> > The access is very fast when looking for a small subset of the data.
>
> I guess you are not using indexes because building a (non bitmap) index on
> 6TB on a single machine would take days if not weeks.
>
> So if you are using table partitioning, do you have to refer to each child
> table separately in your queries?
>
> - Luke
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
------- End of Original Message -------

Re: Quick Performance Poll

From

"Luke Lonergan"

Date:

20 April 2006, 15:03:39

Jim,

On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote:

> First of all this is NOT a single table and yes I am using partitioning and
> the constaint exclusion stuff.  the largest
> set of tables is over 2T.  I have not had to rebuild the biggest database yet,
> but for a smaller one ~1T the restore
> takes about 12 hours including many indexes on both large and small tables

You would probably benefit greatly from the new on-disk bitmap index feature
in Bizgres Open Source.  It's 8.1 plus the sort speed improvement and
on-disk bitmap index.

Index creation and sizes for the binary version are in the table below (from
a performance report on bizgres network.  The version in CVS tip on
pgfoundry is much faster on index creation as well.

The current drawback to bitmap index is that it isn't very maintainable
under insert/update, although it is safe for those operations.  For now, you
have to drop index, do inserts/updates, rebuild index.

We'll have a version that is maintained for insert/update next.

- Luke

  #   Indexed Columns   Create Time (seconds)   Space Used (MBs)
                                BITMAP   BTREE   BITMAP   BTREE
  1   L_SHIPMODE                454.8   2217.1   58     1804
  2   L_QUANTITY                547.2   937.8    117    1804
  3   L_LINENUMBER              374.5   412.4    59     1285
  4   L_SHIPMODE, L_QUANTITY    948.7   2933.4   176    2845
  5   O_ORDERSTATUS             83.5    241.3    5      321
  6   O_ORDERPRIORITY           108.5   679.1    11     580
  7   C_MKTSEGMENT              10.9    51.3     1      45
  8   C_NATIONKEY               8.3     9.3      2      32

Re: Quick Performance Poll

From

"Jim Buttafuoco"

Date:

20 April 2006, 15:06:40

I have been following your work with great interest.  I believe I spoke to someone from Greenplum at linux world in
Boston a couple of weeks ago.

---------- Original Message -----------
From: "Luke Lonergan" <llonergan@greenplum.com>
To: jim@contactbda.com, "Simon Dale" <sdale@rm.com>, pgsql-performance@postgresql.org
Sent: Thu, 20 Apr 2006 08:03:10 -0700
Subject: Re: [PERFORM] Quick Performance Poll

> Jim,
>
> On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote:
>
> > First of all this is NOT a single table and yes I am using partitioning and
> > the constaint exclusion stuff.  the largest
> > set of tables is over 2T.  I have not had to rebuild the biggest database yet,
> > but for a smaller one ~1T the restore
> > takes about 12 hours including many indexes on both large and small tables
>
> You would probably benefit greatly from the new on-disk bitmap index feature
> in Bizgres Open Source.  It's 8.1 plus the sort speed improvement and
> on-disk bitmap index.
>
> Index creation and sizes for the binary version are in the table below (from
> a performance report on bizgres network.  The version in CVS tip on
> pgfoundry is much faster on index creation as well.
>
> The current drawback to bitmap index is that it isn't very maintainable
> under insert/update, although it is safe for those operations.  For now, you
> have to drop index, do inserts/updates, rebuild index.
>
> We'll have a version that is maintained for insert/update next.
>
> - Luke
>
>   #   Indexed Columns   Create Time (seconds)   Space Used (MBs)
>                                 BITMAP   BTREE   BITMAP   BTREE
>   1   L_SHIPMODE                454.8   2217.1   58     1804
>   2   L_QUANTITY                547.2   937.8    117    1804
>   3   L_LINENUMBER              374.5   412.4    59     1285
>   4   L_SHIPMODE, L_QUANTITY    948.7   2933.4   176    2845
>   5   O_ORDERSTATUS             83.5    241.3    5      321
>   6   O_ORDERPRIORITY           108.5   679.1    11     580
>   7   C_MKTSEGMENT              10.9    51.3     1      45
>   8   C_NATIONKEY               8.3     9.3      2      32
------- End of Original Message -------

Re: Quick Performance Poll

From

Markus Schaber

Date:

20 April 2006, 15:11:23

Hi, Luke,

Luke Lonergan wrote:

> The current drawback to bitmap index is that it isn't very maintainable
> under insert/update, although it is safe for those operations.  For now, you
> have to drop index, do inserts/updates, rebuild index.

So they effectively turn the table into a read-only table for now.

Are they capable to index custom datatypes like the PostGIS geometries
that use the GIST mechanism? This could probably speed up our Geo
Databases for Map rendering, containing static data that is updated
approx. 2 times per year.

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Quick Performance Poll

From

"Luke Lonergan"

Date:

20 April 2006, 15:17:13

Markus,

On 4/20/06 8:11 AM, "Markus Schaber" <schabi@logix-tt.com> wrote:

> Are they capable to index custom datatypes like the PostGIS geometries
> that use the GIST mechanism? This could probably speed up our Geo
> Databases for Map rendering, containing static data that is updated
> approx. 2 times per year.

Should work fine - the other limitation is cardinality, or number of unique
values in the column being indexed.  A reasonable limit is about 10,000
unique values in the column.

We're also going to improve this aspect of the implementation, but the
progress might take the useful limit to 300,000 or so.

- Luke

Re: Quick Performance Poll

From

"Jim C. Nasby"

Date:

20 April 2006, 16:55:42

Interested in doing a case study for the website?

On Thu, Apr 20, 2006 at 09:36:25AM -0400, Jim Buttafuoco wrote:
>
> Simon,
>
> I have many databases over 1T with the largest being ~6T.  All of my databases store telecom data, such as call
detail
> records.  The access is very fast when looking for a small subset of the data.  For servers, I am using white box
intel
> XEON and P4 systems with SATA disks, 4G of memory.  SCSI is out of our price range, but if I had unlimited $ I would
go
> with SCSI /SCSI raid instead.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Quick Performance Poll

From

"Milen Kulev"

Date:

20 April 2006, 19:45:23

Hi Luke,
I (still) haven't tried Bizgres, but what do you mean  with "The current drawback to bitmap index is that it isn't very
maintainable under insert/update, although it is safe for those operations"?

Do you mean that INSERT/UPDATE operations against bitmap indexes are imperformant ?
If yes, to what extend ?

Or you mean that bitmap index corruption is possible when issueing DML  againts BMP indexes?
Or  BMP indexes are growing too fast as a result of DML ?

I am asking this question because Oracle needed 3 years to solve its BMP index problems (BMP index corruption/ space
usage explosion when several processes are performing DML operations ).

Is Bizgres implementation  suffering from this kind child deseases ?

Regards . Milen

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luke Lonergan
Sent: Thursday, April 20, 2006 5:03 PM
To: jim@contactbda.com; Simon Dale; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Quick Performance Poll

Jim,

On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote:

> First of all this is NOT a single table and yes I am using
> partitioning and the constaint exclusion stuff.  the largest set of
> tables is over 2T.  I have not had to rebuild the biggest database
> yet, but for a smaller one ~1T the restore takes about 12 hours
> including many indexes on both large and small tables

You would probably benefit greatly from the new on-disk bitmap index feature in Bizgres Open Source.  It's 8.1 plus the
sort speed improvement and on-disk bitmap index.

Index creation and sizes for the binary version are in the table below (from a performance report on bizgres network.
The version in CVS tip on pgfoundry is much faster on index creation as well.

The current drawback to bitmap index is that it isn't very maintainable under insert/update, although it is safe for
those operations.  For now, you have to drop index, do inserts/updates, rebuild index.

We'll have a version that is maintained for insert/update next.

- Luke

  #   Indexed Columns   Create Time (seconds)   Space Used (MBs)
                                BITMAP   BTREE   BITMAP   BTREE
  1   L_SHIPMODE                454.8   2217.1   58     1804
  2   L_QUANTITY                547.2   937.8    117    1804
  3   L_LINENUMBER              374.5   412.4    59     1285
  4   L_SHIPMODE, L_QUANTITY    948.7   2933.4   176    2845
  5   O_ORDERSTATUS             83.5    241.3    5      321
  6   O_ORDERPRIORITY           108.5   679.1    11     580
  7   C_MKTSEGMENT              10.9    51.3     1      45
  8   C_NATIONKEY               8.3     9.3      2      32

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: Quick Performance Poll

From

"Luke Lonergan"

Date:

20 April 2006, 21:28:38

Milen,

On 4/20/06 12:45 PM, "Milen Kulev" <makulev@gmx.net> wrote:

> I (still) haven't tried Bizgres, but what do you mean  with "The current
> drawback to bitmap index is that it isn't very
> maintainable under insert/update, although it is safe for those operations"?

Yes.

> Do you mean that INSERT/UPDATE operations against bitmap indexes are
> imperformant ?
> If yes, to what extend ?

Insert/Update (but not delete) operations will often invalidate a bitmap
index in our current implementation because we have not implemented a
maintenance method for them when insertions re-use TIDs.  We are in the
planning stages for an update that will fix this.

> Or you mean that bitmap index corruption is possible when issueing DML
> againts BMP indexes?

We check for the case of an insertion that causes a re-used TID and issue an
error that indicates the index should be removed before the operation is
retried.  This isn't particularly useful for cases where inserts occur
frequently, so the current use-case if for tables where DML should be done
in batches after removing the index, then the index re-applied.

> I am asking this question because Oracle needed 3 years to solve its BMP index
> problems (BMP index corruption/ space
> usage explosion when several processes are performing DML operations ).

We will be much faster than that!  Concurrency will be less than ideal with
our maintenance approach initially, but there shouldn't be a corruption
problem.

> Is Bizgres implementation  suffering from this kind child deseases ?

Sneeze, cough.

- Luke
>
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luke Lonergan
> Sent: Thursday, April 20, 2006 5:03 PM
> To: jim@contactbda.com; Simon Dale; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Quick Performance Poll
>
>
> Jim,
>
> On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote:
>
>> First of all this is NOT a single table and yes I am using
>> partitioning and the constaint exclusion stuff.  the largest set of
>> tables is over 2T.  I have not had to rebuild the biggest database
>> yet, but for a smaller one ~1T the restore takes about 12 hours
>> including many indexes on both large and small tables
>
> You would probably benefit greatly from the new on-disk bitmap index feature
> in Bizgres Open Source.  It's 8.1 plus the
> sort speed improvement and on-disk bitmap index.
>
> Index creation and sizes for the binary version are in the table below (from a
> performance report on bizgres network.
> The version in CVS tip on pgfoundry is much faster on index creation as well.
>
> The current drawback to bitmap index is that it isn't very maintainable under
> insert/update, although it is safe for
> those operations.  For now, you have to drop index, do inserts/updates,
> rebuild index.
>
> We'll have a version that is maintained for insert/update next.
>
> - Luke
>
>   #   Indexed Columns   Create Time (seconds)   Space Used (MBs)
>                                 BITMAP   BTREE   BITMAP   BTREE
>   1   L_SHIPMODE                454.8   2217.1   58     1804
>   2   L_QUANTITY                547.2   937.8    117    1804
>   3   L_LINENUMBER              374.5   412.4    59     1285
>   4   L_SHIPMODE, L_QUANTITY    948.7   2933.4   176    2845
>   5   O_ORDERSTATUS             83.5    241.3    5      321
>   6   O_ORDERPRIORITY           108.5   679.1    11     580
>   7   C_MKTSEGMENT              10.9    51.3     1      45
>   8   C_NATIONKEY               8.3     9.3      2      32
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>
>

Re: Quick Performance Poll

From

"Milen Kulev"

Date:

21 April 2006, 05:22:36

Hi Luke,
Thank you very much for your prompt reply.
I have got ***much*** more information than  expected ;)

Obviously there are thing to be improved in the current implementation of BMP indexes,
But anyway they are worth usung (I wa pretty impressed from the BMP index performance, after
Reading a PDF document on Bizgres site).

Thanks ahain for the information.

Regards. Milen

-----Original Message-----
From: Luke Lonergan [mailto:llonergan@greenplum.com]
Sent: Thursday, April 20, 2006 11:28 PM
To: Milen Kulev
Cc: pgsql-performance@postgresql.org; bizgres-general
Subject: Re: [PERFORM] Quick Performance Poll


Milen,

On 4/20/06 12:45 PM, "Milen Kulev" <makulev@gmx.net> wrote:

> I (still) haven't tried Bizgres, but what do you mean  with "The
> current drawback to bitmap index is that it isn't very maintainable
> under insert/update, although it is safe for those operations"?

Yes.

> Do you mean that INSERT/UPDATE operations against bitmap indexes are
> imperformant ? If yes, to what extend ?

Insert/Update (but not delete) operations will often invalidate a bitmap index in our current implementation because we
have not implemented a maintenance method for them when insertions re-use TIDs.  We are in the planning stages for an
update that will fix this.

> Or you mean that bitmap index corruption is possible when issueing DML
> againts BMP indexes?

We check for the case of an insertion that causes a re-used TID and issue an error that indicates the index should be
removed before the operation is retried.  This isn't particularly useful for cases where inserts occur frequently, so
the current use-case if for tables where DML should be done in batches after removing the index, then the index
re-applied.

> I am asking this question because Oracle needed 3 years to solve its
> BMP index problems (BMP index corruption/ space usage explosion when
> several processes are performing DML operations ).

We will be much faster than that!  Concurrency will be less than ideal with our maintenance approach initially, but
there shouldn't be a corruption problem.

> Is Bizgres implementation  suffering from this kind child deseases ?

Sneeze, cough.

- Luke
>
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luke
> Lonergan
> Sent: Thursday, April 20, 2006 5:03 PM
> To: jim@contactbda.com; Simon Dale; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Quick Performance Poll
>
>
> Jim,
>
> On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote:
>
>> First of all this is NOT a single table and yes I am using
>> partitioning and the constaint exclusion stuff.  the largest set of
>> tables is over 2T.  I have not had to rebuild the biggest database
>> yet, but for a smaller one ~1T the restore takes about 12 hours
>> including many indexes on both large and small tables
>
> You would probably benefit greatly from the new on-disk bitmap index
> feature in Bizgres Open Source.  It's 8.1 plus the sort speed
> improvement and on-disk bitmap index.
>
> Index creation and sizes for the binary version are in the table below
> (from a performance report on bizgres network. The version in CVS tip
> on pgfoundry is much faster on index creation as well.
>
> The current drawback to bitmap index is that it isn't very
> maintainable under insert/update, although it is safe for those
> operations.  For now, you have to drop index, do inserts/updates,
> rebuild index.
>
> We'll have a version that is maintained for insert/update next.
>
> - Luke
>
>   #   Indexed Columns   Create Time (seconds)   Space Used (MBs)
>                                 BITMAP   BTREE   BITMAP   BTREE
>   1   L_SHIPMODE                454.8   2217.1   58     1804
>   2   L_QUANTITY                547.2   937.8    117    1804
>   3   L_LINENUMBER              374.5   412.4    59     1285
>   4   L_SHIPMODE, L_QUANTITY    948.7   2933.4   176    2845
>   5   O_ORDERSTATUS             83.5    241.3    5      321
>   6   O_ORDERPRIORITY           108.5   679.1    11     580
>   7   C_MKTSEGMENT              10.9    51.3     1      45
>   8   C_NATIONKEY               8.3     9.3      2      32
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>
>