Thread: 2 million+ entries

2 million+ entries

From

Matthew Hixson

Date:

20 August 1999, 20:46:09

Hi, I'm going to be working on a project soon that involves an SQL database. If
I have my way it will be using PostgreSQL 6.5 on a dual PII-350 with 256 MB of
RAM and two 18GB UW2 SCSI drives (Seagate Barracudas). However there is a
possibility that it could be running on an NT server with even beefier hardware. What I'm wondering is; is there any
problemwith Postgres handling a table

with 2 million entries in it? Is there an NT solution that would work better?
Personally I can't stand NT, but its not really my decision to make. This is
going to be mission critical so must remain rock solid 24x7. (On second
thought that kindof rules out NT doesn't it...? :) Anyway here is the table structure that my project manager came up
with. I

mentioned that it might be better to put the "Subscriber Table", "Usage Table",
"Personal Information Table", "Credit Card Information Table", and "Interests
Table" all into one larger table. So now we're wondering: Does having
separate tables like this make sense for speeding up lookups or would it be
faster with my one large table idea? In either case will PostgreSQL be able to
handle this much data gracefully? Thanks in advance for any insight into this. -M@

Subscriber TableUserName (unique)PasswordActive

Usage TableUserName
(unique)DiskSpaceAuthorizedAmountDiskSpaceAuthorizedDateLastLoginFailedAttemptsDefaultDirectoryLocalDefaultDirectoryzDiskAccountCreateDateOverDrawnWarningFlagNotifyFlag

Personal Information TableUserName
(unique)FirstNameLastNameEmailAddressHomeAddress1HomeAddress2HomeCityHomeStateHomeCountryHomePostalCodeHomePhone1HomePhone2HomeFaxWorkAddress1WorkAddress2WorkCityWorkStateWorkCountryWorkPostalCodeWorkPhone1WorkPhone2WorkFaxSexBirthYearInterests

Credit Card Info TableUserName
(unique)CreditCardCardNumberIssuingBankNameOnCardExpirationDateBillingNameBillingAddress1BillingAddress2BillingCityBillingStateBillingPostalCode

Disk Space Pricing TableInitialFreeSizeNextBlockSizeNextBlockPriceQuantityDiscount

Interests TableSports - Active�Sports - Leisure�Theater�Collecting�Fashion�
--
Matthew Hixson - CIO
FroZenWave Communications
http://www.frozenwave.com

Re: [SQL] 2 million+ entries

From

Michael Richards

Date:

24 August 1999, 02:30:35

On Fri, 20 Aug 1999, Matthew Hixson wrote:

>   I'm going to be working on a project soon that involves an SQL database.  If
> I have my way it will be using PostgreSQL 6.5 on a dual PII-350 with 256 MB of
> RAM and two 18GB UW2 SCSI drives (Seagate Barracudas).  However there is a
> possibility that it could be running on an NT server with even beefier hardware.
If you're planning on doing some heavy database processing and have the
funds, I'd suggest using smaller drives with a RAID controller. Large
drives such as the 18Gb store to much data per platter and as a result end
up a little IO bound when you're doing heavy processing.

I'd suggest using a *BSD, ie FreeBSD, as it's filsystem performance is
much better than something like ext2. As for the large table size, I ran a
postgres database with 15,000,000 tuples once. It was 6.3, and the vacuum
process was particularily nasty, but I don't think you will have any
problems with 6.5.

We're currently testing a system with about 1.2 million records with 6.5
and it outperforms SQL server 6 by quite a lot. It's running on a dual P3
with a 4 9.1 gig Cheetah array on a DPT controller running RAID0. I'm
pretty happy with it... 

If you want to look at another aspect, even if Postgres didn't perform
quite as well as SQL server, consider about $1k for an NT server license
and $2k for a 10 client SQL server license.... put that moolah into
hardware and you're ahead again...

-Michael

Re: [SQL] 2 million+ entries

From

Matthew Hixson

Date:

24 August 1999, 03:31:36

On Mon, 23 Aug 1999, Michael Richards wrote:

Thanks for the reply, Michael.  I'll forward it on to my project manager.

>I'd suggest using a *BSD, ie FreeBSD, as it's filsystem performance is
>much better than something like ext2. 

Speaking of filesystems does anyone know when SGI's XFS filesystem is supposed
to be available on Linux?  I hear they recently GPL'd it. Thanks,  -M@

--
Matthew Hixson - CIO 
FroZenWave Communications 
http://www.frozenwave.com

Re: [SQL] 2 million+ entries

From

Moray McConnachie

Date:

24 August 1999, 04:09:36

> On Mon, 23 Aug 1999, Michael Richards wrote:
> 

> 
> >I'd suggest using a *BSD, ie FreeBSD, as it's filsystem performance is
> >much better than something like ext2. 
> 

People that ought to know tell me:

"No. I don't think even the *BSD advocates claim that *BSD 
beats Linux on large boxes (i.e. SMP and plenty of memory). 
The only places I can think of off-hand where *BSD advocates
can argue even the possibility for better performance 
against Linux are 
(1) when heavy swapping is going on (which means the box was
badly balanced when bought especially if it's a compute 
server) and
(2) certain unusual network uses (due to design differences and
historical curiosities). There are some functionality differences
between *BSD and Linux but that's a different issue."

However, we were talking about 2 GB plus of memory here...

----------------------
Moray.McConnachie@computing-services.oxford.ac.uk

Re: [SQL] 2 million+ entries

From

"Rudy Gireyev"

Date:

24 August 1999, 04:23:36

While we are on the hardware subject, why not replace the PIII
with Xeons? Either PII or PIII.

Rudy

On 24 Aug 99, at 3:30, Michael Richards wrote:

> On Fri, 20 Aug 1999, Matthew Hixson wrote:
> 
> >   I'm going to be working on a project soon that involves an SQL
> >   database.  If
> > I have my way it will be using PostgreSQL 6.5 on a dual PII-350 with 256
> > MB of RAM and two 18GB UW2 SCSI drives (Seagate Barracudas).  However
> > there is a possibility that it could be running on an NT server with
> > even beefier hardware.
> If you're planning on doing some heavy database processing and have the
> funds, I'd suggest using smaller drives with a RAID controller. Large
> drives such as the 18Gb store to much data per platter and as a result end
> up a little IO bound when you're doing heavy processing.
> 
> I'd suggest using a *BSD, ie FreeBSD, as it's filsystem performance is
> much better than something like ext2. As for the large table size, I ran a
> postgres database with 15,000,000 tuples once. It was 6.3, and the vacuum
> process was particularily nasty, but I don't think you will have any
> problems with 6.5.
> 
> We're currently testing a system with about 1.2 million records with 6.5
> and it outperforms SQL server 6 by quite a lot. It's running on a dual P3
> with a 4 9.1 gig Cheetah array on a DPT controller running RAID0. I'm
> pretty happy with it... 
> 
> If you want to look at another aspect, even if Postgres didn't perform
> quite as well as SQL server, consider about $1k for an NT server license
> and $2k for a 10 client SQL server license.... put that moolah into
> hardware and you're ahead again...
> 
> -Michael
> 
> 
> ************
> 
>

Re: [SQL] 2 million+ entries

From

Michael Richards

Date:

24 August 1999, 21:25:55

On Tue, 24 Aug 1999, Moray McConnachie wrote:

> > >I'd suggest using a *BSD, ie FreeBSD, as it's filsystem performance is
> > >much better than something like ext2. 
> 
> People that ought to know tell me:
> 
> "No. I don't think even the *BSD advocates claim that *BSD 
> beats Linux on large boxes (i.e. SMP and plenty of memory). 
> The only places I can think of off-hand where *BSD advocates
> can argue even the possibility for better performance 
> against Linux are 
> (1) when heavy swapping is going on (which means the box was
> badly balanced when bought especially if it's a compute 
> server) and
If your system has resorted to swapping, it's obviously in big trouble.
Linux's comparison problems start long before a system starts swapping.
Given the swap fragmentation problems, there is no "possibility" for 
better performance... Even under paging conditions, linux falls far
short. You can't do a linear search through the entire map of the swap
every page you write and expect to maintain any sort of comparison against
some of the more mature unixes. Quite simply, it's flawed and needs to be
fixed; someone will do it sooner or later, but I'm building solutions now.

I'm not suggesting FreeBSD because I'm blindly advocating it's use... ie
"Linux RULEZ"
Which other unixes have you tried?
"Um... Windows???"
Rather, I'm suggesting it because we've tried it and found it performed
better for us than linux. This was not a blind decision; rather an
informed one following many tests and benchmarks.

> (2) certain unusual network uses (due to design differences and
> historical curiosities). There are some functionality differences
> between *BSD and Linux but that's a different issue."
This is not really an issue with filesystem performance. With a table of
2 million tuples, that is what eats the majority of your query processing
time.

I think it's well known that ffs is much faster than ext2. Given
softupdates, it's even better. Softupdates also avoids the filesystem
corruption problems with ext2 because it writes dependant entities in the 
proper order, not just any old order they appear in the filesystem cache.

If ext2 did perform just as well as ffs, then why is it being re-designed
for the 3rd time? Maybe ext3 will be better, but for now, I stick with
proven solutions. If it's not broken, don't fix it.

-Michael