Re: Test database for new installs? - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Test database for new installs?
Date
Msg-id 87ekilzljr.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: Test database for new installs?  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:

> Also, for tables where the text key is required in the table, *adding* an 
> additional INT column as a key is no improvement in performance.

Not true. Every table which references the varchar-keyed table needs to have a
complete copy of the varchar key. Plus the indexes on the table (and often
those referencing tables) are all bigger too. 

> > Additionally, I regretted *every* case where I decided to use some text
> > data as key, sooner or later.
> 
> Well, obviously you and I have had different workloads then.

My experience agrees with his.

A good example was using user provided text usernames as a primary key. The
application guaranteed they would be unique, and they couldn't be changed. Had
those requirements changed things would have gotten very nasty. No, ON UPDATE
CASCADE doesn't solve things when you have a few hundred million records
referencing the table on a 24x7 application. Especially when you have a few
hundred million more archived records on tape and in your data warehouse for
doing statistical analyses.

In fact those requirements never changed. And yet we still ended up regretting
that decision for multiple reasons:

. The varchar field spread throughout the database like a slow rot to tables that referenced users. Some of the largest
tablesin the database ended up 10-30% inflated in size due to that field alone. Their indexes were even more inflated.
 

. Later we had to export data to a third party and receive data back from them. Their mainframe uppercased all the text
weprovided for them in the key. It was also fixed position so any trailing spaces were effectively lost.
 

Because of the latter problem we added an integer field. 

Afterwards we started using that to reference the users table on any new
table. The speed difference on index scans was noticable. Actually this was
Oracle. I don't have empirical tests for postgres to know if it would be the
same. Actually I would expect the difference for Postgres would be even
greater. Postgres stores integers directly in Datums but varchars require a
palloc, and any comparison involves strcoll calls which can be quite slow
compared to an integer == call.

> > We will probably find that *everthing* each of us does will offend somebody
> > else. I'm not too keen to get into "best practice" wars. That's what IRC is
> > for ;-)

Now can we discuss naming conventions for primary keys? :)

-- 
greg



pgsql-hackers by date:

Previous
From: Robert Treat
Date:
Subject: Re: Test database for new installs?
Next
From: Tom Lane
Date:
Subject: Re: another plperl bug