Re: VARCHAR -vs- CHAR: huge performance difference? - Mailing list pgsql-admin

From C. Bensend
Subject Re: VARCHAR -vs- CHAR: huge performance difference?
Date
Msg-id 58063.63.227.74.41.1087355391.squirrel@webmail.stinkweasel.net
Whole thread Raw
In response to Re: VARCHAR -vs- CHAR: huge performance difference?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: VARCHAR -vs- CHAR: huge performance difference?
List pgsql-admin
> Right.  The only thing that will do for you is waste space with padding
> blanks.  The performance improvement you saw was due to something else,
> which I theorize is not having to waste time reading dead space.
>
> Since you obviously doubt this assertion, try the same experiment except
> load the data into a new table with the *same* structure as the original
> (ie, all varchar).  You'll see the same or probably better performance.

Hi Tom,

   I don't doubt your assertion, I just don't understand it all yet.  :)
Let me assure you, you're the expert here, not I.

   I did as you suggested - I created a third copy of the table, using
the exact same structure.  And you're absolutely right - it was lightning
fast (around 100ms).

>> Actually, all databases on this server are vacuumed nightly, right
>> before backups.
>
> Not often enough evidently...

This statement worries me a bit.  The data in the original table was
bulk-loaded the other night (less than three days I'm sure), and new
entries have been added at the approximate rate of 300 per day.  Is this
going to continue to happen?

Or do I just need to vacuum more often?  I _did_ try a vacuum before
asking the list for help, but it didn't give any improvement (just a
vacuum analyze).

> Note the difference in "pages".  emails is nearly double the physical
> size, even though the live data in it is doubtless smaller.  (You could
> use the contrib/pgstattuple functions to see exactly how much live data
> there is.)

OK.  I see (and understand) the pages value now.

> I was actually expecting to see more than a 2:1 difference in file size,
> seeing that you reported more than a 2:1 difference in read time.  It
> could be that there is also some question of physical layout of the file
> on disk.  The original table probably accreted over time and was given
> space that's not very consecutive on disk.  The copied table was
> written in one swoop, more or less, and is very likely stored in a more
> nearly consecutive set of physical disk blocks.  I'm not sure how you'd
> investigate this theory though --- AFAIK there isn't any real easy way
> to find out how badly fragmented a file is in most Unix filesystems.

Ugh.  So, what would you recommend as a fix?  I see the problem, and I
see the fix that just worked, but I certainly can't be the only person
around that is using a "wide" table with a lot of character data being
added at a rather slow rate...

Thank you _very much_ for all your help,

Benny


--
"Oh, the Jedis are going to feel this one!"       -- Professor Farnsworth,
                                                     "Futurama"

pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: VARCHAR -vs- CHAR: huge performance difference?
Next
From: "Iain"
Date:
Subject: Re: VARCHAR -vs- CHAR: huge performance difference?