Re: blobs - Mailing list pgsql-admin

From Steve Holdoway
Subject Re: blobs
Date
Msg-id 20070202073937.b80f6ccc.steve.holdoway@firetrust.com
Whole thread Raw
In response to Re: blobs  ("Chad Wagner" <chad.wagner@gmail.com>)
Responses Re: blobs  ("Chad Wagner" <chad.wagner@gmail.com>)
List pgsql-admin
On Wed, 31 Jan 2007 18:45:15 -0500
"Chad Wagner" <chad.wagner@gmail.com> wrote:

> On 1/31/07, Steve Holdoway <steve.holdoway@firetrust.com> wrote:
> >
> > The table I'm concerned with at the moment have (currently) 5 million
> > rows, with a churn of about 300,000 rows a week. The table has about a
> > million hits a day, which makes it the main potential bottleneck in this
> > database.
>
>
> Why would it be a "bottleneck"?  If you are updating or deleting 300K a
> week, definitely make sure you take a look at autovacuum and turn it ON.
It is a potential bottleneck as I said. I am replacing between 5 and 10 percent of the data in the table every week.
Thereare, in addition, a daily total of about a million hits ( read/update/delete ), as I said.  

This is the busiest table in the database. Therefore, it's the most likely candidate to cause performance problems.
That'swhat I call a bottleneck. 
>
>
> We need to store some large ( 0 -> 100kB ) data with each row. Would you
> > recommend adding it as columns in this table, given that blobs will be
> > stored in the pg_largeobject table anyway, or would you recommend a daughter
> > table for this?
>
>
> Depends on how you are querying the table.  This is really a database
> modeling question, and leads into many many more questions.  I would say if
> your frequently range scanning the table (selecting several rows) and in
> those cases you rarely need the "blob", then I would fork it off into a
> child table.  If the "blob" is rarely accessed, and only accessed directly,
> then definitely a child table in my book.
>
> The reason is if your frequently fetching rows from this table and rarely
> using the blob all you are doing is consuming memory that could be better
> used for other things, and spinning I/O when it is not necessary.
Sorry, you're completley wrong. If you consider that the only way of getting info is select *, then this is true.
Personally,I think that anyone who does this in code is plain lazy and should find a job more suited to them (: 

I am asking for input from those who have been in this situation before, and have experience in the tradeoff of running
aseparate table for the big stuff as against the extra ( 8 byte? ) column that would be added to the master table. Why
amI asking this? Because Postgres has an unique way of handling this kind of data, unique from even the last time I
usedpostgres in anger - 7.4. It's different from every other rdbms ( and MySQL ), and I have no practical experience of
itin the wild. 
>
>
> Any other suggestions on how to avoid performance problems with this table (
> > hardware is dual Xeon, 4GB mem, 2 hardware raid channels for storage + 1 for
> > logs, all running debian 32 bit ).
> >
>
> It really has to do with how you design your schema.
No. It really has to do with the gathering of relevant information so that I can design my schema properly.
Academically,3NF may be the solution to every design, but in the real world, there are many other things to take in to
account.I've monitored the data flows, sized the hardware to handle the IO, and can either spend a week or two
benchmarkingdifferent solutions from cold, or I can take the advice of those with relevant experience ( who I expect to
findon this list ) to point me in the right direction first. 
>
>
> --
> Chad
> http://www.postgresqlforums.com/
>
Steve.

PS. Please fix your headers so replies to your mails appear on the list directly.

pgsql-admin by date:

Previous
From: Sidar López Cruz
Date:
Subject: User OID
Next
From: "Alexander B."
Date:
Subject: Install Tsearch2