Thread: VARCHAR -vs- CHAR: huge performance difference?
Hey folks,

I am working on a rather small, simple database. I'm running 7.3.5 on an OpenBSD 3.5-STABLE machine (1.3GHz, 512M RAM, IDE drive using UDMA-5).

I am parsing and storing emails, so I have a lot of character string data to worry about. In one particular table, I have 26 columns of type varchar, with widths of 24-8192 characters. This is the table that is giving me some pretty severe performance problems.

For example, many of the statistics I run against these emails are simple count()s, whether it be on a column or *. Right now, counting the rows takes in excess of 3.5 seconds. OK, that's pretty slow. However, there are only --> 24,000 <-- rows in this table! I could see it taking a long time if I had 500,000 rows, or a million, but 24,000?

Now, I am not an expert with database design whatsoever, so I fully admit that having a table with 26 varchar columns (along with a handful of other fixed-width columns) probably isn't the brightest design. :)  So, I went ahead and created an exact copy of this table, with the exception of creating all character columns as type char(), not varchar(). I was pondering if making PostgreSQL worry about the varying lengths by using varchar was the problem...

And sure enough, counting the rows on the new table takes around 148ms. That's a pretty big difference from 3600ms. And no, 3.6 seconds doesn't sound like much, until you have several operations on the same table to draw a single web page, and suddenly the page takes 20s to load. :(

So, my question for the gurus: is using varchars THAT detrimental? Or am I just going way way overboard by having so _many_ in a single table? Or am I an idiot? (honest question - I'm armpit-deep in learning mode here)

I'm also curious to know if I've crossed some invisible line with the number of columns/width of rows that makes performance degrade rapidly.

If further info is needed, please ask - I just didn't want to spam the list with further table definitions and explain analyze output if it wasn't needed.

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth, "Futurama"
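A minimal sketch of the kind of timing comparison being described here, assuming the original table is called emails and the char() copy is called emails_char (both names are illustrative, not from the thread):

-- EXPLAIN ANALYZE reports the actual execution time of the seq scan:
EXPLAIN ANALYZE SELECT count(*) FROM emails;
EXPLAIN ANALYZE SELECT count(*) FROM emails_char;

-- Rough on-disk size comparison (relpages is 8kB pages, as of the
-- last VACUUM or ANALYZE on each table):
SELECT relname, relpages
FROM pg_class
WHERE relname IN ('emails', 'emails_char');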
On Tue, 2004-06-15 at 18:58, C. Bensend wrote:
> Hey folks,
>
> I am working on a rather small, simple database. I'm running 7.3.5 on
> an OpenBSD 3.5-STABLE machine (1.3GHz, 512M RAM, IDE drive using UDMA-5).
>
> I am parsing and storing emails, so I have a lot of character string
> data to worry about. In one particular table, I have 26 columns of type
> varchar, with widths of 24-8192 characters. This is the table that is
> giving me some pretty severe performance problems.
>
> For example, many of the statistics I run against these emails are
> simple count()s, whether it be on a column or *. Right now, counting the
> rows takes in excess of 3.5 seconds. OK, that's pretty slow. However,
> there are only --> 24,000 <-- rows in this table! I could see it taking
> a long time if I had 500,000 rows, or a million, but 24,000?
>
> Now, I am not an expert with database design whatsoever, so I fully
> admit that having a table with 26 varchar columns (along with a handful
> of other fixed-width columns) probably isn't the brightest design. :)
> So, I went ahead and created an exact copy of this table, with the
> exception of creating all character columns as type char(), not varchar().
> I was pondering if making PostgreSQL worry about the varying lengths
> by using varchar was the problem...
>
> And sure enough, counting the rows on the new table takes around
> 148ms. That's a pretty big difference from 3600ms. And no, 3.6 seconds
> doesn't sound like much, until you have several operations on the same
> table to draw a single web page, and suddenly the page takes 20s to
> load. :(

Were those fields populated just like the varchar fields? If not, then the test proves little. If so, I find it hard to believe that char(x) would be any faster than varchar. They're all handled about the same.

If you want to do count(*) on the table, do it by having a table with nothing but IDs in it that is referenced by the table with all the text. PostgreSQL can't really optimize aggregate functions with indexes, so it always winds up doing seq scans.
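A sketch of the narrow counting table Scott suggests; the table and column names below are assumptions made for the example, not taken from the thread:

-- Narrow table that holds only the keys; a count here scans far fewer pages.
CREATE TABLE email_ids (
    email_id integer PRIMARY KEY
);

-- The wide table references the narrow one so the two stay in sync
-- (only a couple of the character columns are shown for brevity).
CREATE TABLE emails_wide (
    email_id integer REFERENCES email_ids (email_id),
    subject  varchar(1024),
    body     varchar(8192)
);

-- Row counts now come from the narrow table:
SELECT count(*) FROM email_ids;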
> Were those fields populated just like the varchar fields? If not, then
> the test proves little. If so, I find it hard to believe that char(x)
> would be any faster than varchar. They're all handled about the same.

Hi Scott,

Yes, the new table was populated from the data from the original, via a:

INSERT INTO emails2 SELECT * FROM emails;

This should be correct, yes?

> If you want to do count(*) on the table, do it by having a table with
> nothing but IDs in it that is referenced by the table with all the
> text. PostgreSQL can't really optimize aggregate functions with
> indexes, so it always winds up doing seq scans.

I have also tried doing a count(column) on the emails table, using a column that is indexed. It showed no improvement - I would have expected at least a little gain.

And here's the clincher - when I do a count(*) on a different table with the same number of rows but only four varchar columns, it returns the result in 75ms. (!!!)

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth, "Futurama"
"C. Bensend" <benny@bennyvision.com> writes: > So, I went ahead and created an exact copy of this table, with the > exception of creating all character columns as type char(), not varchar(). > I was pondering if making PostgreSQL worry about the varying lengths > by using varchar was the problem... The above transformation is a guaranteed loser in Postgres. > And sure enough, counting the rows on the new table takes around > 148ms. That's a pretty big difference from 3600ms. I'm betting that the original table is physically huge because you've not vacuumed it regularly. The copying produced a table with no wasted space, so physically smaller even though the live data is noticeably bigger (because of all the padding blanks you forced to be added). Check what VACUUM VERBOSE has to say about each of these tables... regards, tom lane
> "C. Bensend" <benny@bennyvision.com> writes: >> So, I went ahead and created an exact copy of this table, with the >> exception of creating all character columns as type char(), not >> varchar(). >> I was pondering if making PostgreSQL worry about the varying lengths >> by using varchar was the problem... > > The above transformation is a guaranteed loser in Postgres. Hi Tom, By transformation, do you mean the varchar() -> char() change? If so, I'm not sure I understand - it certainly improved the performance. Or am I misunderstanding? > I'm betting that the original table is physically huge because you've > not vacuumed it regularly. The copying produced a table with no wasted > space, so physically smaller even though the live data is noticeably > bigger (because of all the padding blanks you forced to be added). > > Check what VACUUM VERBOSE has to say about each of these tables... Actually, all databases on this server are vacuumed nightly, right before backups. But here is the data: prod01=> vacuum verbose emails; INFO: --Relation public.emails-- INFO: Index emails_email_id_idx: Pages 358; Tuples 24198: Deleted 82. CPU 0.03s/0.01u sec elapsed 0.41 sec. INFO: Index emails_date_received_idx: Pages 325; Tuples 24198: Deleted 82. CPU 0.00s/0.00u sec elapsed 0.63 sec. INFO: Removed 82 tuples in 23 pages. CPU 0.00s/0.00u sec elapsed 0.06 sec. INFO: Pages 5793: Changed 0, Empty 0; Tup 24198: Vac 82, Keep 0, UnUsed 71757. Total CPU 0.24s/0.06u sec elapsed 4.71 sec. INFO: --Relation pg_toast.pg_toast_399420-- INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0. Total CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM prod01=> vacuum verbose emails2; INFO: --Relation public.emails2-- INFO: Pages 2646: Changed 0, Empty 0; Tup 24162: Vac 0, Keep 0, UnUsed 0. Total CPU 0.10s/0.00u sec elapsed 1.00 sec. INFO: --Relation pg_toast.pg_toast_859969-- INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0. Total CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM Thanks very much, Benny -- "Oh, the Jedis are going to feel this one!" -- Professor Farnsworth, "Futurama"
"C. Bensend" <benny@bennyvision.com> writes: >> The above transformation is a guaranteed loser in Postgres. > By transformation, do you mean the varchar() -> char() change? Right. The only thing that will do for you is waste space with padding blanks. The performance improvement you saw was due to something else, which I theorize is not having to waste time reading dead space. Since you obviously doubt this assertion, try the same experiment except load the data into a new table with the *same* structure as the original (ie, all varchar). You'll see the same or probably better performance. > Actually, all databases on this server are vacuumed nightly, right > before backups. Not often enough evidently... > INFO: --Relation public.emails-- > INFO: Pages 5793: Changed 0, Empty 0; Tup 24198: Vac 82, Keep 0, UnUsed > 71757. > INFO: --Relation public.emails2-- > INFO: Pages 2646: Changed 0, Empty 0; Tup 24162: Vac 0, Keep 0, UnUsed 0. Note the difference in "pages". emails is nearly double the physical size, even though the live data in it is doubtless smaller. (You could use the contrib/pgstattuple functions to see exactly how much live data there is.) I was actually expecting to see more than a 2:1 difference in file size, seeing that you reported more than a 2:1 difference in read time. It could be that there is also some question of physical layout of the file on disk. The original table probably accreted over time and was given space that's not very consecutive on disk. The copied table was written in one swoop, more or less, and is very likely stored in a more nearly consecutive set of physical disk blocks. I'm not sure how you'd investigate this theory though --- AFAIK there isn't any real easy way to find out how badly fragmented a file is in most Unix filesystems. regards, tom lane
> Right. The only thing that will do for you is waste space with padding
> blanks. The performance improvement you saw was due to something else,
> which I theorize is not having to waste time reading dead space.
>
> Since you obviously doubt this assertion, try the same experiment except
> load the data into a new table with the *same* structure as the original
> (ie, all varchar). You'll see the same or probably better performance.

Hi Tom,

I don't doubt your assertion, I just don't understand it all yet. :)  Let me assure you, you're the expert here, not I.

I did as you suggested - I created a third copy of the table, using the exact same structure. And you're absolutely right - it was lightning fast (around 100ms).

>> Actually, all databases on this server are vacuumed nightly, right
>> before backups.
>
> Not often enough evidently...

This statement worries me a bit. The data in the original table was bulk-loaded the other night (less than three days I'm sure), and new entries have been added at the approximate rate of 300 per day. Is this going to continue to happen?

Or do I just need to vacuum more often? I _did_ try a vacuum before asking the list for help, but it didn't give any improvement (just a vacuum analyze).

> Note the difference in "pages". emails is nearly double the physical
> size, even though the live data in it is doubtless smaller. (You could
> use the contrib/pgstattuple functions to see exactly how much live data
> there is.)

OK. I see (and understand) the pages value now.

> I was actually expecting to see more than a 2:1 difference in file size,
> seeing that you reported more than a 2:1 difference in read time. It
> could be that there is also some question of physical layout of the file
> on disk. The original table probably accreted over time and was given
> space that's not very consecutive on disk. The copied table was
> written in one swoop, more or less, and is very likely stored in a more
> nearly consecutive set of physical disk blocks. I'm not sure how you'd
> investigate this theory though --- AFAIK there isn't any real easy way
> to find out how badly fragmented a file is in most Unix filesystems.

Ugh. So, what would you recommend as a fix? I see the problem, and I see the fix that just worked, but I certainly can't be the only person around that is using a "wide" table with a lot of character data being added at a rather slow rate...

Thank you _very much_ for all your help,

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth, "Futurama"
Hi Benny,

What happens if you recreate the table using varchar? I mean, use the same procedure you used to create the table with char, but leave the definitions as varchar.

Personally, I can't see any logical reason for char being faster. The problem is the size of the row: each row spans multiple database pages, and all of them have to be read to do a count. It's a lot of IO, and varchar should actually be more compact in your case. I would think that the speedup you see is a result of the table having a fresh new organization.

Anyway, I'm interested to hear what happens if you do the test above.

I liked Scott's suggestion of using a small table containing only the IDs to use for counting. A serial scan of this small table could be expected to be much faster. Postgres doesn't use an index in the case you specified, as I understand it.

Regards
Iain

----- Original Message -----
From: "C. Bensend" <benny@bennyvision.com>
To: <pgsql-admin@postgresql.org>
Sent: Wednesday, June 16, 2004 11:12 AM
Subject: Re: [ADMIN] VARCHAR -vs- CHAR: huge performance difference?

> > "C. Bensend" <benny@bennyvision.com> writes:
> >> So, I went ahead and created an exact copy of this table, with the
> >> exception of creating all character columns as type char(), not
> >> varchar().
> >> I was pondering if making PostgreSQL worry about the varying lengths
> >> by using varchar was the problem...
> >
> > The above transformation is a guaranteed loser in Postgres.
>
> Hi Tom,
>
> By transformation, do you mean the varchar() -> char() change? If
> so, I'm not sure I understand - it certainly improved the performance.
> Or am I misunderstanding?
>
> > I'm betting that the original table is physically huge because you've
> > not vacuumed it regularly. The copying produced a table with no wasted
> > space, so physically smaller even though the live data is noticeably
> > bigger (because of all the padding blanks you forced to be added).
> >
> > Check what VACUUM VERBOSE has to say about each of these tables...
>
> Actually, all databases on this server are vacuumed nightly, right
> before backups. But here is the data:
>
> prod01=> vacuum verbose emails;
> INFO:  --Relation public.emails--
> INFO:  Index emails_email_id_idx: Pages 358; Tuples 24198: Deleted 82.
>         CPU 0.03s/0.01u sec elapsed 0.41 sec.
> INFO:  Index emails_date_received_idx: Pages 325; Tuples 24198: Deleted 82.
>         CPU 0.00s/0.00u sec elapsed 0.63 sec.
> INFO:  Removed 82 tuples in 23 pages.
>         CPU 0.00s/0.00u sec elapsed 0.06 sec.
> INFO:  Pages 5793: Changed 0, Empty 0; Tup 24198: Vac 82, Keep 0, UnUsed 71757.
>         Total CPU 0.24s/0.06u sec elapsed 4.71 sec.
> INFO:  --Relation pg_toast.pg_toast_399420--
> INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
>         Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
> VACUUM
> prod01=> vacuum verbose emails2;
> INFO:  --Relation public.emails2--
> INFO:  Pages 2646: Changed 0, Empty 0; Tup 24162: Vac 0, Keep 0, UnUsed 0.
>         Total CPU 0.10s/0.00u sec elapsed 1.00 sec.
> INFO:  --Relation pg_toast.pg_toast_859969--
> INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
>         Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
> VACUUM
>
> Thanks very much,
>
> Benny
>
> --
> "Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
> "Futurama"
> Or do I just need to vacuum more often? I _did_ try a vacuum before
> asking the list for help, but it didn't give any improvement (just a
> vacuum analyze).

"vacuum analyse" allows dead space to be re-used, but doesn't compact the table; you need "vacuum full analyse" for that.

I'm not sure how the free space map relates to this, maybe someone can shed some light, but if it is too small, not all dead rows can be re-used (as I understand it). If your table (or even the DB in general) is updated a lot, the default FSM size that comes with 7.3 may not be enough to ensure the most efficient space reclamation. Last time I looked the docs were a little hazy on that.
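The commands under discussion, as a sketch; the FSM values shown are illustrative assumptions, not recommendations from the thread:

-- Rewrite the table compactly and reclaim dead space (takes an exclusive lock):
VACUUM FULL VERBOSE ANALYZE emails;

-- Check the current free space map settings:
SHOW max_fsm_pages;
SHOW max_fsm_relations;

-- Both are set in postgresql.conf and need a postmaster restart to change, e.g.:
--   max_fsm_pages     = 100000    -- illustrative value
--   max_fsm_relations = 1000      -- illustrative value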
On Tue, 2004-06-15 at 21:09, C. Bensend wrote:
> > Right. The only thing that will do for you is waste space with padding
> > blanks. The performance improvement you saw was due to something else,
> > which I theorize is not having to waste time reading dead space.
> >
> > Since you obviously doubt this assertion, try the same experiment except
> > load the data into a new table with the *same* structure as the original
> > (ie, all varchar). You'll see the same or probably better performance.
>
> Hi Tom,
>
> I don't doubt your assertion, I just don't understand it all yet. :)
> Let me assure you, you're the expert here, not I.
>
> I did as you suggested - I created a third copy of the table, using
> the exact same structure. And you're absolutely right - it was lightning
> fast (around 100ms).
>
> >> Actually, all databases on this server are vacuumed nightly, right
> >> before backups.
> >
> > Not often enough evidently...
>
> This statement worries me a bit. The data in the original table was
> bulk-loaded the other night (less than three days I'm sure), and new
> entries have been added at the approximate rate of 300 per day. Is this
> going to continue to happen?
>
> Or do I just need to vacuum more often? I _did_ try a vacuum before
> asking the list for help, but it didn't give any improvement (just a
> vacuum analyze).
>
> > Note the difference in "pages". emails is nearly double the physical
> > size, even though the live data in it is doubtless smaller. (You could
> > use the contrib/pgstattuple functions to see exactly how much live data
> > there is.)
>
> OK. I see (and understand) the pages value now.
>
> > I was actually expecting to see more than a 2:1 difference in file size,
> > seeing that you reported more than a 2:1 difference in read time. It
> > could be that there is also some question of physical layout of the file
> > on disk. The original table probably accreted over time and was given
> > space that's not very consecutive on disk. The copied table was
> > written in one swoop, more or less, and is very likely stored in a more
> > nearly consecutive set of physical disk blocks. I'm not sure how you'd
> > investigate this theory though --- AFAIK there isn't any real easy way
> > to find out how badly fragmented a file is in most Unix filesystems.
>
> Ugh. So, what would you recommend as a fix? I see the problem, and I
> see the fix that just worked, but I certainly can't be the only person
> around that is using a "wide" table with a lot of character data being
> added at a rather slow rate...

You might want to look into the autovacuum daemon, and / or increasing fsm settings to be large enough to hold all the spare tuples released by vacuuming.
C. Bensend wrote:

> I am parsing and storing emails, so I have a lot of character string
> data to worry about. In one particular table, I have 26 columns of type
> varchar, with widths of 24-8192 characters. This is the table that is
> giving me some pretty severe performance problems.
>
> For example, many of the statistics I run against these emails are
> simple count()s, whether it be on a column or *. Right now, counting the
> rows takes in excess of 3.5 seconds. OK, that's pretty slow. However,
> there are only --> 24,000 <-- rows in this table! I could see it taking
> a long time if I had 500,000 rows, or a million, but 24,000?

For a similar post, someone suggested doing the count like:

select count(*) from table where any = any limit 1;

Try this and see if there's any performance increase.

--
Regards,
Vishal Kashyap
Director / Lead Software Developer,
Sai Hertz And Control Systems Pvt Ltd,
http://saihertz.rediffblogs.com
Yahoo IM: mailforvishal[ a t ]yahoo.com
> You might want to look into the autovacuum daemon, and / or increasing
> fsm settings to be large enough to hold all the spare tuples released by
> vacuuming.

IIRC, the autovacuum stuff was added in 7.4, which I'm not running (yet). An upgrade to 7.4.3 might be prudent for me, while the database is still small.

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth, "Futurama"
On Wed, 16 Jun 2004 09:37:58 -0500 (CDT), C. Bensend <benny@bennyvision.com> wrote:

>> You might want to look into the autovacuum daemon, and / or increasing
>> fsm settings to be large enough to hold all the spare tuples released by
>> vacuuming.
>
> IIRC, the autovacuum stuff was added in 7.4, which I'm not running (yet).
> An upgrade to 7.4.3 might be prudent for me, while the database is still
> small.
>
> Benny

By the way, be careful. The pg_autovacuum daemon is not good enough to be used as the only way of cleaning the database, especially if you have a lot of updates and deletes on the same tables.

--
Downloading signature ... 99% *CRC FAILED* signature aborted
> By the way, be careful. The pg_autovacuum daemon is not good enough to be
> used as the only way of cleaning the database, especially if you have a
> lot of updates and deletes on the same tables.

For right now, I'm just vacuuming more often to see how things go. I didn't think I was even generating enough transactions for it to be a problem, but a 'vacuum full' made a night-and-day difference.

If it continues to be problematic, I'll probably look at an upgrade.

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth, "Futurama"
On Thu, 2004-06-17 at 00:46, Olivier Hubaut wrote:
> On Wed, 16 Jun 2004 09:37:58 -0500 (CDT), C. Bensend
> <benny@bennyvision.com> wrote:
>
> >> You might want to look into the autovacuum daemon, and / or increasing
> >> fsm settings to be large enough to hold all the spare tuples released by
> >> vacuuming.
> >
> > IIRC, the autovacuum stuff was added in 7.4, which I'm not running (yet).
> > An upgrade to 7.4.3 might be prudent for me, while the database is still
> > small.
> >
> > Benny
>
> By the way, be careful. The pg_autovacuum daemon is not good enough to be
> used as the only way of cleaning the database, especially if you have a
> lot of updates and deletes on the same tables.

The problem here isn't pg_autovacuum, but too small of settings for fsm. I've run multi-day tests where autovacuum kept the size of the database pretty much the same with 200+ updates a second going on.
> The problem here isn't pg_autovacuum, but too small of settings for
> fsm. I've run multi-day tests where autovacuum kept the size of the
> database pretty much the same with 200+ updates a second going on.

Hi Scott,

Could you explain the fsm a little more? I have done _no_ tuning on my database simply because I don't know how, but I would be interested to hear your recommendations for the fsm settings.

Thanks!

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth, "Futurama"
On Thu, 2004-06-17 at 09:57, C. Bensend wrote:
> > The problem here isn't pg_autovacuum, but too small of settings for
> > fsm. I've run multi-day tests where autovacuum kept the size of the
> > database pretty much the same with 200+ updates a second going on.
>
> Hi Scott,
>
> Could you explain the fsm a little more? I have done _no_ tuning on
> my database simply because I don't know how, but I would be interested
> to hear your recommendations for the fsm settings.

OK, here's the basics of what fsm does.

As you probably already know, PostgreSQL uses an in-store multi-versioning system (MVCC), where the first version of a tuple points to the next newest version, and so on, until we get to the last version. So, if you update the same row four times, you'll have five copies of it in the data storage files: the original, plus four copies. Each transaction that needs to access said tuple will grab the first one, check the date stamp on it, and go on to the next one, until it finds the last valid one for it, based on the rules of visibility. We won't get into that right now, because what's important is that we have five versions of that tuple in the table.

Now, after all the currently processing transactions finish, all the new transactions are going to be seeing the fifth version of the tuple from then on. But they'll all start by looking at the first one and working forward one at a time.

Along comes vacuum and vacuum full. Vacuum full marks all these dead tuples, then actually compresses the data file back down to reclaim the lost space. Plain (or lazy) vacuums simply mark the rows as free, without actually reclaiming them.

The free space map is what vacuum uses to mark those tuples as free, and what the storage manager checks to find free space to place new tuples in the tables. So, the free space map needs to be big enough to hold a reference to every single freed row from vacuuming, or the vacuumed tuple space will not get reused, and the storage manager will simply append new tuples onto the end of the data file.

You can find a bit more on this subject at:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
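A small, self-contained illustration of the mechanism Scott describes (the table and values are made up for the example):

-- Every UPDATE leaves the previous version of the row behind as a dead tuple.
CREATE TABLE fsm_demo (id integer, payload varchar(100));
INSERT INTO fsm_demo VALUES (1, 'original');

UPDATE fsm_demo SET payload = 'version 2' WHERE id = 1;
UPDATE fsm_demo SET payload = 'version 3' WHERE id = 1;
UPDATE fsm_demo SET payload = 'version 4' WHERE id = 1;

-- A plain VACUUM records the dead tuples' space in the free space map so
-- later inserts and updates can reuse it; the data file itself is not shrunk.
VACUUM VERBOSE fsm_demo;

-- VACUUM FULL rewrites the table and actually gives the space back.
VACUUM FULL VERBOSE fsm_demo;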
There is some information on www.varlena.com; I think it may have been Scott who wrote the article that I'm thinking about. There is other information around too - your best bet may be google.

Regards
Iain

----- Original Message -----
From: "C. Bensend" <benny@bennyvision.com>
To: <pgsql-admin@postgresql.org>
Sent: Friday, June 18, 2004 12:57 AM
Subject: Re: [ADMIN] VARCHAR -vs- CHAR: huge performance difference?

> > The problem here isn't pg_autovacuum, but too small of settings for
> > fsm. I've run multi-day tests where autovacuum kept the size of the
> > database pretty much the same with 200+ updates a second going on.
>
> Hi Scott,
>
> Could you explain the fsm a little more? I have done _no_ tuning on
> my database simply because I don't know how, but I would be interested
> to hear your recommendations for the fsm settings.
>
> Thanks!
>
> Benny
>
> --
> "Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
> "Futurama"