Thread: Quick Performance Poll
Hi,
I was just wondering whether anyone has had success with storing more than 1TB of data with PostgreSQL and how they have found the performance.
We need a database that can store in excess of this amount and still show good performance. We will probably be implementing several tables with foreign keys and also indexes which will obviously impact on both data size and performance too.
Many thanks in advance,
Simon
Visit our Website at www.rm.com
This message is confidential. You should not copy it or disclose its contents to anyone. You may use and apply the information for the intended purpose only. Internet communications are not secure; therefore, RM does not accept legal responsibility for the contents of this message. Any views or opinions presented are those of the author only and not of RM. If this email has come to you in error, please delete it, along with any attachments. Please note that RM may intercept incoming and outgoing email communications.
Freedom of Information Act 2000
This email and any attachments may contain confidential information belonging to RM. Where the email and any attachments do contain information of a confidential nature, including without limitation information relating to trade secrets, special terms or prices these shall be deemed for the purpose of the Freedom of Information Act 2000 as information provided in confidence by RM and the disclosure of which would be prejudicial to RM's commercial interests.
This email has been scanned for viruses by Trend ScanMail.
Simon, I have many databases over 1T with the largest being ~6T. All of my databases store telecom data, such as call detail records. The access is very fast when looking for a small subset of the data. For servers, I am using white box intel XEON and P4 systems with SATA disks, 4G of memory. SCSI is out of our price range, but if I had unlimited $ I would go with SCSI /SCSI raid instead. Jim ---------- Original Message ----------- From: "Simon Dale" <sdale@rm.com> To: <pgsql-performance@postgresql.org> Sent: Thu, 20 Apr 2006 14:18:58 +0100 Subject: [PERFORM] Quick Performance Poll > Hi, > > I was just wondering whether anyone has had success with storing more > than 1TB of data with PostgreSQL and how they have found the > performance. > > We need a database that can store in excess of this amount and still > show good performance. We will probably be implementing several tables > with foreign keys and also indexes which will obviously impact on both > data size and performance too. > > Many thanks in advance, > > Simon > Visit our Website at http://www.rm.com > > This message is confidential. You should not copy it or disclose its contents to anyone. You may use and apply > the information for the intended purpose only. Internet communications are not secure; therefore, RM does not > accept legal responsibility for the contents of this message. Any views or opinions presented are those of the > author only and not of RM. If this email has come to you in error, please delete it, along with any > attachments. Please note that RM may intercept incoming and outgoing email communications. > > Freedom of Information Act 2000 > This email and any attachments may contain confidential information belonging to RM. Where the email and any > attachments do contain information of a confidential nature, including without limitation information relating > to trade secrets, special terms or prices these shall be deemed for the purpose of the Freedom of Information > Act 2000 as information provided in confidence by RM and the disclosure of which would be prejudicial to RM's > commercial interests. > > This email has been scanned for viruses by Trend ScanMail. ------- End of Original Message -------
Jim, On 4/20/06 6:36 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote: > The access is very fast when looking for a small subset of the data. I guess you are not using indexes because building a (non bitmap) index on 6TB on a single machine would take days if not weeks. So if you are using table partitioning, do you have to refer to each child table separately in your queries? - Luke
First of all this is NOT a single table and yes I am using partitioning and the constaint exclusion stuff. the largest set of tables is over 2T. I have not had to rebuild the biggest database yet, but for a smaller one ~1T the restore takes about 12 hours including many indexes on both large and small tables Jim ---------- Original Message ----------- From: "Luke Lonergan" <llonergan@greenplum.com> To: jim@contactbda.com, "Simon Dale" <sdale@rm.com>, pgsql-performance@postgresql.org Sent: Thu, 20 Apr 2006 07:31:33 -0700 Subject: Re: [PERFORM] Quick Performance Poll > Jim, > > On 4/20/06 6:36 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote: > > > The access is very fast when looking for a small subset of the data. > > I guess you are not using indexes because building a (non bitmap) index on > 6TB on a single machine would take days if not weeks. > > So if you are using table partitioning, do you have to refer to each child > table separately in your queries? > > - Luke > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly ------- End of Original Message -------
Jim, On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote: > First of all this is NOT a single table and yes I am using partitioning and > the constaint exclusion stuff. the largest > set of tables is over 2T. I have not had to rebuild the biggest database yet, > but for a smaller one ~1T the restore > takes about 12 hours including many indexes on both large and small tables You would probably benefit greatly from the new on-disk bitmap index feature in Bizgres Open Source. It's 8.1 plus the sort speed improvement and on-disk bitmap index. Index creation and sizes for the binary version are in the table below (from a performance report on bizgres network. The version in CVS tip on pgfoundry is much faster on index creation as well. The current drawback to bitmap index is that it isn't very maintainable under insert/update, although it is safe for those operations. For now, you have to drop index, do inserts/updates, rebuild index. We'll have a version that is maintained for insert/update next. - Luke # Indexed Columns Create Time (seconds) Space Used (MBs) BITMAP BTREE BITMAP BTREE 1 L_SHIPMODE 454.8 2217.1 58 1804 2 L_QUANTITY 547.2 937.8 117 1804 3 L_LINENUMBER 374.5 412.4 59 1285 4 L_SHIPMODE, L_QUANTITY 948.7 2933.4 176 2845 5 O_ORDERSTATUS 83.5 241.3 5 321 6 O_ORDERPRIORITY 108.5 679.1 11 580 7 C_MKTSEGMENT 10.9 51.3 1 45 8 C_NATIONKEY 8.3 9.3 2 32
I have been following your work with great interest. I believe I spoke to someone from Greenplum at linux world in Boston a couple of weeks ago. ---------- Original Message ----------- From: "Luke Lonergan" <llonergan@greenplum.com> To: jim@contactbda.com, "Simon Dale" <sdale@rm.com>, pgsql-performance@postgresql.org Sent: Thu, 20 Apr 2006 08:03:10 -0700 Subject: Re: [PERFORM] Quick Performance Poll > Jim, > > On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote: > > > First of all this is NOT a single table and yes I am using partitioning and > > the constaint exclusion stuff. the largest > > set of tables is over 2T. I have not had to rebuild the biggest database yet, > > but for a smaller one ~1T the restore > > takes about 12 hours including many indexes on both large and small tables > > You would probably benefit greatly from the new on-disk bitmap index feature > in Bizgres Open Source. It's 8.1 plus the sort speed improvement and > on-disk bitmap index. > > Index creation and sizes for the binary version are in the table below (from > a performance report on bizgres network. The version in CVS tip on > pgfoundry is much faster on index creation as well. > > The current drawback to bitmap index is that it isn't very maintainable > under insert/update, although it is safe for those operations. For now, you > have to drop index, do inserts/updates, rebuild index. > > We'll have a version that is maintained for insert/update next. > > - Luke > > # Indexed Columns Create Time (seconds) Space Used (MBs) > BITMAP BTREE BITMAP BTREE > 1 L_SHIPMODE 454.8 2217.1 58 1804 > 2 L_QUANTITY 547.2 937.8 117 1804 > 3 L_LINENUMBER 374.5 412.4 59 1285 > 4 L_SHIPMODE, L_QUANTITY 948.7 2933.4 176 2845 > 5 O_ORDERSTATUS 83.5 241.3 5 321 > 6 O_ORDERPRIORITY 108.5 679.1 11 580 > 7 C_MKTSEGMENT 10.9 51.3 1 45 > 8 C_NATIONKEY 8.3 9.3 2 32 ------- End of Original Message -------
Hi, Luke, Luke Lonergan wrote: > The current drawback to bitmap index is that it isn't very maintainable > under insert/update, although it is safe for those operations. For now, you > have to drop index, do inserts/updates, rebuild index. So they effectively turn the table into a read-only table for now. Are they capable to index custom datatypes like the PostGIS geometries that use the GIST mechanism? This could probably speed up our Geo Databases for Map rendering, containing static data that is updated approx. 2 times per year. Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Markus, On 4/20/06 8:11 AM, "Markus Schaber" <schabi@logix-tt.com> wrote: > Are they capable to index custom datatypes like the PostGIS geometries > that use the GIST mechanism? This could probably speed up our Geo > Databases for Map rendering, containing static data that is updated > approx. 2 times per year. Should work fine - the other limitation is cardinality, or number of unique values in the column being indexed. A reasonable limit is about 10,000 unique values in the column. We're also going to improve this aspect of the implementation, but the progress might take the useful limit to 300,000 or so. - Luke
Interested in doing a case study for the website? On Thu, Apr 20, 2006 at 09:36:25AM -0400, Jim Buttafuoco wrote: > > Simon, > > I have many databases over 1T with the largest being ~6T. All of my databases store telecom data, such as call detail > records. The access is very fast when looking for a small subset of the data. For servers, I am using white box intel > XEON and P4 systems with SATA disks, 4G of memory. SCSI is out of our price range, but if I had unlimited $ I would go > with SCSI /SCSI raid instead. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Hi Luke, I (still) haven't tried Bizgres, but what do you mean with "The current drawback to bitmap index is that it isn't very maintainable under insert/update, although it is safe for those operations"? Do you mean that INSERT/UPDATE operations against bitmap indexes are imperformant ? If yes, to what extend ? Or you mean that bitmap index corruption is possible when issueing DML againts BMP indexes? Or BMP indexes are growing too fast as a result of DML ? I am asking this question because Oracle needed 3 years to solve its BMP index problems (BMP index corruption/ space usage explosion when several processes are performing DML operations ). Is Bizgres implementation suffering from this kind child deseases ? Regards . Milen -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luke Lonergan Sent: Thursday, April 20, 2006 5:03 PM To: jim@contactbda.com; Simon Dale; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Quick Performance Poll Jim, On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote: > First of all this is NOT a single table and yes I am using > partitioning and the constaint exclusion stuff. the largest set of > tables is over 2T. I have not had to rebuild the biggest database > yet, but for a smaller one ~1T the restore takes about 12 hours > including many indexes on both large and small tables You would probably benefit greatly from the new on-disk bitmap index feature in Bizgres Open Source. It's 8.1 plus the sort speed improvement and on-disk bitmap index. Index creation and sizes for the binary version are in the table below (from a performance report on bizgres network. The version in CVS tip on pgfoundry is much faster on index creation as well. The current drawback to bitmap index is that it isn't very maintainable under insert/update, although it is safe for those operations. For now, you have to drop index, do inserts/updates, rebuild index. We'll have a version that is maintained for insert/update next. - Luke # Indexed Columns Create Time (seconds) Space Used (MBs) BITMAP BTREE BITMAP BTREE 1 L_SHIPMODE 454.8 2217.1 58 1804 2 L_QUANTITY 547.2 937.8 117 1804 3 L_LINENUMBER 374.5 412.4 59 1285 4 L_SHIPMODE, L_QUANTITY 948.7 2933.4 176 2845 5 O_ORDERSTATUS 83.5 241.3 5 321 6 O_ORDERPRIORITY 108.5 679.1 11 580 7 C_MKTSEGMENT 10.9 51.3 1 45 8 C_NATIONKEY 8.3 9.3 2 32 ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Milen, On 4/20/06 12:45 PM, "Milen Kulev" <makulev@gmx.net> wrote: > I (still) haven't tried Bizgres, but what do you mean with "The current > drawback to bitmap index is that it isn't very > maintainable under insert/update, although it is safe for those operations"? Yes. > Do you mean that INSERT/UPDATE operations against bitmap indexes are > imperformant ? > If yes, to what extend ? Insert/Update (but not delete) operations will often invalidate a bitmap index in our current implementation because we have not implemented a maintenance method for them when insertions re-use TIDs. We are in the planning stages for an update that will fix this. > Or you mean that bitmap index corruption is possible when issueing DML > againts BMP indexes? We check for the case of an insertion that causes a re-used TID and issue an error that indicates the index should be removed before the operation is retried. This isn't particularly useful for cases where inserts occur frequently, so the current use-case if for tables where DML should be done in batches after removing the index, then the index re-applied. > I am asking this question because Oracle needed 3 years to solve its BMP index > problems (BMP index corruption/ space > usage explosion when several processes are performing DML operations ). We will be much faster than that! Concurrency will be less than ideal with our maintenance approach initially, but there shouldn't be a corruption problem. > Is Bizgres implementation suffering from this kind child deseases ? Sneeze, cough. - Luke > > > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luke Lonergan > Sent: Thursday, April 20, 2006 5:03 PM > To: jim@contactbda.com; Simon Dale; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Quick Performance Poll > > > Jim, > > On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote: > >> First of all this is NOT a single table and yes I am using >> partitioning and the constaint exclusion stuff. the largest set of >> tables is over 2T. I have not had to rebuild the biggest database >> yet, but for a smaller one ~1T the restore takes about 12 hours >> including many indexes on both large and small tables > > You would probably benefit greatly from the new on-disk bitmap index feature > in Bizgres Open Source. It's 8.1 plus the > sort speed improvement and on-disk bitmap index. > > Index creation and sizes for the binary version are in the table below (from a > performance report on bizgres network. > The version in CVS tip on pgfoundry is much faster on index creation as well. > > The current drawback to bitmap index is that it isn't very maintainable under > insert/update, although it is safe for > those operations. For now, you have to drop index, do inserts/updates, > rebuild index. > > We'll have a version that is maintained for insert/update next. > > - Luke > > # Indexed Columns Create Time (seconds) Space Used (MBs) > BITMAP BTREE BITMAP BTREE > 1 L_SHIPMODE 454.8 2217.1 58 1804 > 2 L_QUANTITY 547.2 937.8 117 1804 > 3 L_LINENUMBER 374.5 412.4 59 1285 > 4 L_SHIPMODE, L_QUANTITY 948.7 2933.4 176 2845 > 5 O_ORDERSTATUS 83.5 241.3 5 321 > 6 O_ORDERPRIORITY 108.5 679.1 11 580 > 7 C_MKTSEGMENT 10.9 51.3 1 45 > 8 C_NATIONKEY 8.3 9.3 2 32 > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > > >
Hi Luke, Thank you very much for your prompt reply. I have got ***much*** more information than expected ;) Obviously there are thing to be improved in the current implementation of BMP indexes, But anyway they are worth usung (I wa pretty impressed from the BMP index performance, after Reading a PDF document on Bizgres site). Thanks ahain for the information. Regards. Milen -----Original Message----- From: Luke Lonergan [mailto:llonergan@greenplum.com] Sent: Thursday, April 20, 2006 11:28 PM To: Milen Kulev Cc: pgsql-performance@postgresql.org; bizgres-general Subject: Re: [PERFORM] Quick Performance Poll Milen, On 4/20/06 12:45 PM, "Milen Kulev" <makulev@gmx.net> wrote: > I (still) haven't tried Bizgres, but what do you mean with "The > current drawback to bitmap index is that it isn't very maintainable > under insert/update, although it is safe for those operations"? Yes. > Do you mean that INSERT/UPDATE operations against bitmap indexes are > imperformant ? If yes, to what extend ? Insert/Update (but not delete) operations will often invalidate a bitmap index in our current implementation because we have not implemented a maintenance method for them when insertions re-use TIDs. We are in the planning stages for an update that will fix this. > Or you mean that bitmap index corruption is possible when issueing DML > againts BMP indexes? We check for the case of an insertion that causes a re-used TID and issue an error that indicates the index should be removed before the operation is retried. This isn't particularly useful for cases where inserts occur frequently, so the current use-case if for tables where DML should be done in batches after removing the index, then the index re-applied. > I am asking this question because Oracle needed 3 years to solve its > BMP index problems (BMP index corruption/ space usage explosion when > several processes are performing DML operations ). We will be much faster than that! Concurrency will be less than ideal with our maintenance approach initially, but there shouldn't be a corruption problem. > Is Bizgres implementation suffering from this kind child deseases ? Sneeze, cough. - Luke > > > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luke > Lonergan > Sent: Thursday, April 20, 2006 5:03 PM > To: jim@contactbda.com; Simon Dale; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Quick Performance Poll > > > Jim, > > On 4/20/06 7:40 AM, "Jim Buttafuoco" <jim@contactbda.com> wrote: > >> First of all this is NOT a single table and yes I am using >> partitioning and the constaint exclusion stuff. the largest set of >> tables is over 2T. I have not had to rebuild the biggest database >> yet, but for a smaller one ~1T the restore takes about 12 hours >> including many indexes on both large and small tables > > You would probably benefit greatly from the new on-disk bitmap index > feature in Bizgres Open Source. It's 8.1 plus the sort speed > improvement and on-disk bitmap index. > > Index creation and sizes for the binary version are in the table below > (from a performance report on bizgres network. The version in CVS tip > on pgfoundry is much faster on index creation as well. > > The current drawback to bitmap index is that it isn't very > maintainable under insert/update, although it is safe for those > operations. For now, you have to drop index, do inserts/updates, > rebuild index. > > We'll have a version that is maintained for insert/update next. > > - Luke > > # Indexed Columns Create Time (seconds) Space Used (MBs) > BITMAP BTREE BITMAP BTREE > 1 L_SHIPMODE 454.8 2217.1 58 1804 > 2 L_QUANTITY 547.2 937.8 117 1804 > 3 L_LINENUMBER 374.5 412.4 59 1285 > 4 L_SHIPMODE, L_QUANTITY 948.7 2933.4 176 2845 > 5 O_ORDERSTATUS 83.5 241.3 5 321 > 6 O_ORDERPRIORITY 108.5 679.1 11 580 > 7 C_MKTSEGMENT 10.9 51.3 1 45 > 8 C_NATIONKEY 8.3 9.3 2 32 > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > > >