Thread: strange performance regression between 7.4 and 8.1

From:
"Alex Deucher"
Date:

Hello,

I have noticed a strange performance regression and I'm at a loss as
to what's happening.  We have a fairly large database (~16 GB).  The
original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
of ram running Solaris on local scsi discs.  The new server is a sun
Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
(AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
it was created from scratch rather than copying over the old one,
however the table structure is almost identical (UTF8 on the new one
vs. C on the old). The problem is queries are ~10x slower on the new
hardware.  I read several places that the SAN might be to blame, but
testing with bonnie and dd indicates that the SAN is actually almost
twice as fast as the scsi discs in the old sun server.  I've tried
adjusting just about every option in the postgres config file, but
performance remains the same.  Any ideas?

Thanks,

Alex

From:
"Joshua D. Drake"
Date:

Alex Deucher wrote:
> Hello,
>
> I have noticed a strange performance regression and I'm at a loss as
> to what's happening.  We have a fairly large database (~16 GB).  The
> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
> of ram running Solaris on local scsi discs.  The new server is a sun
> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
> it was created from scratch rather than copying over the old one,
> however the table structure is almost identical (UTF8 on the new one
> vs. C on the old). The problem is queries are ~10x slower on the new
> hardware.  I read several places that the SAN might be to blame, but
> testing with bonnie and dd indicates that the SAN is actually almost
> twice as fast as the scsi discs in the old sun server.  I've tried
> adjusting just about every option in the postgres config file, but
> performance remains the same.  Any ideas?

Vacuum? Analyze? default_statistics_target? How many shared_buffers?
effective_cache_size? work_mem?
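All of those can be read off a running server in one query, e.g.:

```sql
-- Current values of the settings in question, as the server sees them:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('default_statistics_target', 'shared_buffers',
               'effective_cache_size', 'work_mem');
```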

Sincerely,

Joshua D. Drake


>
> Thanks,
>
> Alex
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From:
Jeff Frost
Date:

On Thu, 1 Mar 2007, Joshua D. Drake wrote:

> Alex Deucher wrote:
>> Hello,
>>
>> I have noticed a strange performance regression and I'm at a loss as
>> to what's happening.  We have a fairly large database (~16 GB).  The
>> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
>> of ram running Solaris on local scsi discs.  The new server is a sun
>> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
>> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
>> it was created from scratch rather than copying over the old one,
>> however the table structure is almost identical (UTF8 on the new one
>> vs. C on the old). The problem is queries are ~10x slower on the new
>> hardware.  I read several places that the SAN might be to blame, but
>> testing with bonnie and dd indicates that the SAN is actually almost
>> twice as fast as the scsi discs in the old sun server.  I've tried
>> adjusting just about every option in the postgres config file, but
>> performance remains the same.  Any ideas?
>
> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> effective_cache_size? work_mem?

Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
10x slower queries would probably be handy.

What do you mean by "created from scratch rather than copying over the old
one"?  How did you put the data in?  Did you run analyze after loading it?
Is autovacuum enabled and if so, what are the thresholds?
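The autovacuum state and thresholds can also be read back from a running 8.1 server, e.g.:

```sql
-- Everything autovacuum-related in one go:
SELECT name, setting FROM pg_settings WHERE name LIKE 'autovacuum%';
```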

--
Jeff Frost, Owner     <>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

From:
"Alex Deucher"
Date:

On 3/1/07, Joshua D. Drake <> wrote:
> Alex Deucher wrote:
> > Hello,
> >
> > I have noticed a strange performance regression and I'm at a loss as
> > to what's happening.  We have a fairly large database (~16 GB).  The
> > original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
> > of ram running Solaris on local scsi discs.  The new server is a sun
> > Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
> > (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
> > it was created from scratch rather than copying over the old one,
> > however the table structure is almost identical (UTF8 on the new one
> > vs. C on the old). The problem is queries are ~10x slower on the new
> > hardware.  I read several places that the SAN might be to blame, but
> > testing with bonnie and dd indicates that the SAN is actually almost
> > twice as fast as the scsi discs in the old sun server.  I've tried
> > adjusting just about every option in the postgres config file, but
> > performance remains the same.  Any ideas?
>
> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> effective_cache_size? work_mem?
>

I'm running the autovacuum process on the 8.1 server.  vacuuming on
the old server was done manually.

default_statistics_target and effective_cache_size are set to the
defaults on both.

postgres 7.4 server:
# - Memory -
shared_buffers = 82000 # 1000            min 16, at least max_connections*2, 8KB each
sort_mem = 8000        # 1024            min 64, size in KB
vacuum_mem = 32000     # 8192            min 1024, size in KB
# - Free Space Map -
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each
# - Kernel Resource Usage -
#max_files_per_process = 1000   # min 25

postgres 8.1 server:
# - Memory -
shared_buffers = 100000                 # min 16 or max_connections*2, 8KB each
temp_buffers = 2000 #1000                       # min 100, 8KB each
max_prepared_transactions = 100 #5              # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 10000        #1024           # min 64, size in KB
maintenance_work_mem = 524288 #16384            # min 1024, size in KB
#max_stack_depth = 2048                 # min 100, size in KB

I've also tried using the same settings from the old server on the new
one; same performance issues.

Thanks,

Alex

> Sincerely,
>
> Joshua D. Drake
>
>
> >
> > Thanks,
> >
> > Alex
> >
>
>

From:
"Alex Deucher"
Date:

On 3/1/07, Jeff Frost <> wrote:
> On Thu, 1 Mar 2007, Joshua D. Drake wrote:
>
> > Alex Deucher wrote:
> >> Hello,
> >>
> >> I have noticed a strange performance regression and I'm at a loss as
> >> to what's happening.  We have a fairly large database (~16 GB).  The
> >> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
> >> of ram running Solaris on local scsi discs.  The new server is a sun
> >> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
> >> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
> >> it was created from scratch rather than copying over the old one,
> >> however the table structure is almost identical (UTF8 on the new one
> >> vs. C on the old). The problem is queries are ~10x slower on the new
> >> hardware.  I read several places that the SAN might be to blame, but
> >> testing with bonnie and dd indicates that the SAN is actually almost
> >> twice as fast as the scsi discs in the old sun server.  I've tried
> >> adjusting just about every option in the postgres config file, but
> >> performance remains the same.  Any ideas?
> >
> > Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> > effective_cache_size? work_mem?
>
> Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
> 10x slower queries would probably be handy.
>

I'll run some and get back to you.

> What do you mean by "created from scratch rather than copying over the old
> one"?  How did you put the data in?  Did you run analyze after loading it?
> Is autovacuum enabled and if so, what are the thresholds?

Both the databases were originally created from xml files.  We just
re-created the new one from the xml rather than copying the old
database over.  I didn't manually run analyze on it, but we are
running the autovacuum process:

autovacuum = on #off                    # enable autovacuum subprocess?
autovacuum_naptime = 360 #60            # time between autovacuum runs, in secs
autovacuum_vacuum_threshold = 10000 #1000       # min # of tuple updates before
                                        # vacuum
autovacuum_analyze_threshold = 5000 #500        # min # of tuple updates before
                                        # analyze

Thanks,

Alex

>

From:
Jeff Frost
Date:

On Thu, 1 Mar 2007, Alex Deucher wrote:

> On 3/1/07, Jeff Frost <> wrote:
>> On Thu, 1 Mar 2007, Joshua D. Drake wrote:
>>
>> > Alex Deucher wrote:
>> >> Hello,
>> >>
>> >> I have noticed a strange performance regression and I'm at a loss as
>> >> to what's happening.  We have a fairly large database (~16 GB).  The
>> >> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
>> >> of ram running Solaris on local scsi discs.  The new server is a sun
>> >> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
>> >> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
>> >> it was created from scratch rather than copying over the old one,
>> >> however the table structure is almost identical (UTF8 on the new one
>> >> vs. C on the old). The problem is queries are ~10x slower on the new
>> >> hardware.  I read several places that the SAN might be to blame, but
>> >> testing with bonnie and dd indicates that the SAN is actually almost
>> >> twice as fast as the scsi discs in the old sun server.  I've tried
>> >> adjusting just about every option in the postgres config file, but
>> >> performance remains the same.  Any ideas?
>> >
>> > Vacuum? Analyze? default_statistics_target? How many shared_buffers?
>> > effective_cache_size? work_mem?
>>
>> Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
>> 10x slower queries would probably be handy.
>>
>
> I'll run some and get back to you.
>
>> What do you mean by "created from scratch rather than copying over the old
>> one"?  How did you put the data in?  Did you run analyze after loading it?
>> Is autovacuum enabled and if so, what are the thresholds?
>
> Both the databases were originally created from xml files.  We just
> re-created the new one from the xml rather than copying the old
> database over.  I didn't manually run analyze on it, but we are
> running the autovacuum process:

You should probably manually run analyze and see if that resolves your
problem.
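A minimal way to do that by hand from psql (t1 follows the placeholder table name used elsewhere in the thread):

```sql
-- Rebuild planner statistics for the whole database; VERBOSE shows progress:
ANALYZE VERBOSE;
-- Or vacuum and analyze just the one hot table:
VACUUM ANALYZE t1;
```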

>
> autovacuum = on #off                    # enable autovacuum subprocess?
> autovacuum_naptime = 360 #60            # time between autovacuum runs, in secs
> autovacuum_vacuum_threshold = 10000 #1000       # min # of tuple updates before
>                                         # vacuum
> autovacuum_analyze_threshold = 5000 #500        # min # of tuple updates before
>                                         # analyze

Most people make autovacuum more aggressive and not less aggressive.  In fact,
the new defaults in 8.2 are:

#autovacuum_vacuum_threshold = 500      # min # of tuple updates before
                                         # vacuum
#autovacuum_analyze_threshold = 250     # min # of tuple updates before
                                         # analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of rel size before
                                         # vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of rel size before
                                         # analyze

I'd recommend trying those, otherwise you might not vacuum enough.

It'll be interesting to see the explain analyze output after you've run
analyze by hand.


From:
Jeff Frost
Date:

On Thu, 1 Mar 2007, Alex Deucher wrote:

>> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
>> effective_cache_size? work_mem?
>>
>
> I'm running the autovacuum process on the 8.1 server.  vacuuming on
> the old server was done manually.
>
> default_statistics_target and effective_cache_size are set to the
> defaults on both.
>
> postgres 7.4 server:
> # - Memory -
> shared_buffers = 82000 # 1000            min 16, at least max_connections*2, 8KB each
> sort_mem = 8000        # 1024            min 64, size in KB
> vacuum_mem = 32000     # 8192            min 1024, size in KB
> # - Free Space Map -
> #max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
> #max_fsm_relations = 1000       # min 100, ~50 bytes each
> # - Kernel Resource Usage -
> #max_files_per_process = 1000   # min 25
>
> postgres 8.1 server:
> # - Memory -
> shared_buffers = 100000                 # min 16 or max_connections*2, 8KB each
> temp_buffers = 2000 #1000                       # min 100, 8KB each
> max_prepared_transactions = 100 #5              # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> work_mem = 10000        #1024           # min 64, size in KB
> maintenance_work_mem = 524288 #16384            # min 1024, size in KB
> #max_stack_depth = 2048                 # min 100, size in KB
>
> I've also tried using the same settings from the old server on the new
> one; same performance issues.
>

If this is a linux system, could you give us the output of the 'free' command?
Postgresql might be choosing a bad plan because your effective_cache_size is
way off (it's the default now right?).  Also, what was the block read/write
speed of the SAN from your bonnie tests?  Probably want to tune
random_page_cost as well if it's also at the default.



From:
"Alex Deucher"
Date:

On 3/1/07, Jeff Frost <> wrote:
> On Thu, 1 Mar 2007, Alex Deucher wrote:
>
> >> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> >> effective_cache_size? work_mem?
> >>
> >
> > I'm running the autovacuum process on the 8.1 server.  vacuuming on
> > the old server was done manually.
> >
> > default_statistics_target and effective_cache_size are set to the
> > defaults on both.
> >
> > postgres 7.4 server:
> > # - Memory -
> > shared_buffers = 82000 # 1000            min 16, at least max_connections*2, 8KB each
> > sort_mem = 8000        # 1024            min 64, size in KB
> > vacuum_mem = 32000     # 8192            min 1024, size in KB
> > # - Free Space Map -
> > #max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
> > #max_fsm_relations = 1000       # min 100, ~50 bytes each
> > # - Kernel Resource Usage -
> > #max_files_per_process = 1000   # min 25
> >
> > postgres 8.1 server:
> > # - Memory -
> > shared_buffers = 100000                 # min 16 or max_connections*2, 8KB each
> > temp_buffers = 2000 #1000                       # min 100, 8KB each
> > max_prepared_transactions = 100 #5              # can be 0 or more
> > # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
> > # per transaction slot, plus lock space (see max_locks_per_transaction).
> > work_mem = 10000        #1024           # min 64, size in KB
> > maintenance_work_mem = 524288 #16384            # min 1024, size in KB
> > #max_stack_depth = 2048                 # min 100, size in KB
> >
> > I've also tried using the same settings from the old server on the new
> > one; same performance issues.
> >
>
> If this is a linux system, could you give us the output of the 'free' command?

             total       used       free     shared    buffers     cached
Mem:       8059852    8042868      16984          0        228    7888648
-/+ buffers/cache:     153992    7905860
Swap:     15631224       2164   15629060


> Postgresql might be choosing a bad plan because your effective_cache_size is
> way off (it's the default now right?).  Also, what was the block read/write

yes it's set to the default.

> speed of the SAN from your bonnie tests?  Probably want to tune
> random_page_cost as well if it's also at the default.
>

                ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7  0

effective_cache_size is the default.

Alex

From:
Jeff Frost
Date:

On Thu, 1 Mar 2007, Alex Deucher wrote:

> On 3/1/07, Jeff Frost <> wrote:
>> On Thu, 1 Mar 2007, Alex Deucher wrote:
>>
>> >> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
>> >> effective_cache_size? work_mem?
>> >>
>> >
>> > I'm running the autovacuum process on the 8.1 server.  vacuuming on
>> > the old server was done manually.
>> >
>> > default_statistics_target and effective_cache_size are set to the
>> > defaults on both.
>> >
>> > postgres 7.4 server:
>> > # - Memory -
>> > shared_buffers = 82000 # 1000            min 16, at least max_connections*2, 8KB each
>> > sort_mem = 8000        # 1024            min 64, size in KB
>> > vacuum_mem = 32000     # 8192            min 1024, size in KB
>> > # - Free Space Map -
>> > #max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
>> > #max_fsm_relations = 1000       # min 100, ~50 bytes each
>> > # - Kernel Resource Usage -
>> > #max_files_per_process = 1000   # min 25
>> >
>> > postgres 8.1 server:
>> > # - Memory -
>> > shared_buffers = 100000                 # min 16 or max_connections*2, 8KB each
>> > temp_buffers = 2000 #1000                       # min 100, 8KB each
>> > max_prepared_transactions = 100 #5              # can be 0 or more
>> > # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
>> > # per transaction slot, plus lock space (see max_locks_per_transaction).
>> > work_mem = 10000        #1024           # min 64, size in KB
>> > maintenance_work_mem = 524288 #16384            # min 1024, size in KB
>> > #max_stack_depth = 2048                 # min 100, size in KB
>> >
>> > I've also tried using the same settings from the old server on the new
>> > one; same performance issues.
>> >
>>
>> If this is a linux system, could you give us the output of the 'free'
>> command?
>
>            total       used       free     shared    buffers     cached
> Mem:       8059852    8042868      16984          0        228    7888648
> -/+ buffers/cache:     153992    7905860
> Swap:     15631224       2164   15629060

So, I would set effective_cache_size = 988232 (7905860/8).
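That arithmetic (the `-/+ buffers/cache` free figure in kB divided by the 8 kB page size) can be scripted; the awk field position assumes the old two-row `free` layout shown above:

```shell
# Convert the "-/+ buffers/cache" free column (kB) from the quoted
# 'free' output into 8 kB pages for effective_cache_size:
awk '/buffers\/cache/ { printf "effective_cache_size = %d\n", $4 / 8 }' <<'EOF'
             total       used       free     shared    buffers     cached
Mem:       8059852    8042868      16984          0        228    7888648
-/+ buffers/cache:     153992    7905860
Swap:     15631224       2164   15629060
EOF
# prints: effective_cache_size = 988232
```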

>
>> Postgresql might be choosing a bad plan because your effective_cache_size
>> is way off (it's the default now right?).  Also, what was the block read/write
>
> yes it's set to the default.
>
>> speed of the SAN from your bonnie tests?  Probably want to tune
>> random_page_cost as well if it's also at the default.
>>
>
>                 ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7  0
>

So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
speed is about the same as my single SATA drive write speed on my workstation,
so not that great.  The read speed is decent, though, and with that sort of
read performance, you might want to lower random_page_cost to something like
2.5 or 2 so the planner will tend to prefer index scans.
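random_page_cost can be trialled per session before committing anything to postgresql.conf; a sketch, with the query shape borrowed from the examples later in the thread:

```sql
-- Session-local experiment; nothing persists after disconnect:
SET random_page_cost = 2.5;
EXPLAIN ANALYZE SELECT c1, c2 FROM t1 WHERE c1 = '6258261';
```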


From:
"Alex Deucher"
Date:

On 3/1/07, Jeff Frost <> wrote:
> On Thu, 1 Mar 2007, Alex Deucher wrote:
>
> > On 3/1/07, Jeff Frost <> wrote:
> >> On Thu, 1 Mar 2007, Alex Deucher wrote:
> >>
> >> >> Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> >> >> effective_cache_size? work_mem?
> >> >>
> >> >
> >> > I'm running the autovacuum process on the 8.1 server.  vacuuming on
> >> > the old server was done manually.
> >> >
> >> > default_statistics_target and effective_cache_size are set to the
> >> > defaults on both.
> >> >
> >> > postgres 7.4 server:
> >> > # - Memory -
> >> > shared_buffers = 82000 # 1000            min 16, at least max_connections*2, 8KB each
> >> > sort_mem = 8000        # 1024            min 64, size in KB
> >> > vacuum_mem = 32000     # 8192            min 1024, size in KB
> >> > # - Free Space Map -
> >> > #max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
> >> > #max_fsm_relations = 1000       # min 100, ~50 bytes each
> >> > # - Kernel Resource Usage -
> >> > #max_files_per_process = 1000   # min 25
> >> >
> >> > postgres 8.1 server:
> >> > # - Memory -
> >> > shared_buffers = 100000                 # min 16 or max_connections*2, 8KB each
> >> > temp_buffers = 2000 #1000                       # min 100, 8KB each
> >> > max_prepared_transactions = 100 #5              # can be 0 or more
> >> > # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
> >> > # per transaction slot, plus lock space (see max_locks_per_transaction).
> >> > work_mem = 10000        #1024           # min 64, size in KB
> >> > maintenance_work_mem = 524288 #16384            # min 1024, size in KB
> >> > #max_stack_depth = 2048                 # min 100, size in KB
> >> >
> >> > I've also tried using the same settings from the old server on the new
> >> > one; same performance issues.
> >> >
> >>
> >> If this is a linux system, could you give us the output of the 'free'
> >> command?
> >
> >            total       used       free     shared    buffers     cached
> > Mem:       8059852    8042868      16984          0        228    7888648
> > -/+ buffers/cache:     153992    7905860
> > Swap:     15631224       2164   15629060
>
> So, I would set effective_cache_size = 988232 (7905860/8).
>
> >
> >> Postgresql might be choosing a bad plan because your effective_cache_size
> >> is way off (it's the default now right?).  Also, what was the block read/write
> >
> > yes it's set to the default.
> >
> >> speed of the SAN from your bonnie tests?  Probably want to tune
> >> random_page_cost as well if it's also at the default.
> >>
> >
> >                 ------Sequential Output------ --Sequential Input- --Random-
> >                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7  0
> >
>
> So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
> speed is about the same as my single SATA drive write speed on my workstation,
> so not that great.  The read speed is decent, though and with that sort of
> read performance, you might want to lower random_page_cost to something like
> 2.5 or 2 so the planner will tend to prefer index scans.
>

Right, but the old box was getting ~45MBps on both reads and writes,
so it's an improvement for me :)  Thanks for the advice, I'll let you
know how it goes.

Alex

From:
"Alex Deucher"
Date:

On 3/1/07, Jeff Frost <> wrote:
> On Thu, 1 Mar 2007, Joshua D. Drake wrote:
>
> > Alex Deucher wrote:
> >> Hello,
> >>
> >> I have noticed a strange performance regression and I'm at a loss as
> >> to what's happening.  We have a fairly large database (~16 GB).  The
> >> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
> >> of ram running Solaris on local scsi discs.  The new server is a sun
> >> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
> >> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
> >> it was created from scratch rather than copying over the old one,
> >> however the table structure is almost identical (UTF8 on the new one
> >> vs. C on the old). The problem is queries are ~10x slower on the new
> >> hardware.  I read several places that the SAN might be to blame, but
> >> testing with bonnie and dd indicates that the SAN is actually almost
> >> twice as fast as the scsi discs in the old sun server.  I've tried
> >> adjusting just about every option in the postgres config file, but
> >> performance remains the same.  Any ideas?
> >
> > Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> > effective_cache_size? work_mem?
>
> Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
> 10x slower queries would probably be handy.

here are some examples.  Analyze is still running on the new db, I'll
post results when that is done.  Mostly what our apps do is prepared
row selects from different tables:
select c1,c2,c3,c4,c5 from t1 where c1='XXX';

old server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42 width=26) (actual time=5.722..5.809 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 5.912 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72 width=26) (actual time=12.423..12.475 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 12.538 ms
(3 rows)


new server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11 width=26) (actual time=33.461..51.377 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 51.419 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907 width=26) (actual time=45.733..46.271 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 46.325 ms
(3 rows)


Alex

From:
Jeff Frost
Date:

On Thu, 1 Mar 2007, Alex Deucher wrote:

> here are some examples.  Analyze is still running on the new db, I'll
> post results when that is done.  Mostly what our apps do is prepared
> row selects from different tables:
> select c1,c2,c3,c4,c5 from t1 where c1='XXX';
>
> old server:
> db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
>                                                       QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------
> Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42 width=26) (actual time=5.722..5.809 rows=2 loops=1)
>   Index Cond: ((c2)::text = '6258261'::text)
> Total runtime: 5.912 ms
> (3 rows)
>
> db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
>                                                        QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
> Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72 width=26) (actual time=12.423..12.475 rows=12 loops=1)
>   Index Cond: ((c1)::text = '6258261'::text)
> Total runtime: 12.538 ms
> (3 rows)
>
>
> new server:
> db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
>                                                        QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
> Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11 width=26) (actual time=33.461..51.377 rows=2 loops=1)
>   Index Cond: ((c2)::text = '6258261'::text)
> Total runtime: 51.419 ms
> (3 rows)
>
> db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
>                                                          QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
> Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907 width=26) (actual time=45.733..46.271 rows=12 loops=1)
>   Index Cond: ((c1)::text = '6258261'::text)
> Total runtime: 46.325 ms
> (3 rows)

Notice the huge disparity here between the expected number of rows (2907) and
the actual rows (12)?  That's indicative of needing to run analyze.  The time is
only about 4x the 7.4 runtime, and that's with the analyze running merrily
along in the background.  It's probably not as bad off as you think.  At least
this query isn't 10x. :-)

Run these again for us after analyze is complete.


From:
Jeff Frost
Date:

On Thu, 1 Mar 2007, Alex Deucher wrote:

>> >> Postgresql might be choosing a bad plan because your effective_cache_size
>> >> is way off (it's the default now right?).  Also, what was the block read/write
>> >
>> > yes it's set to the default.
>> >
>> >> speed of the SAN from your bonnie tests?  Probably want to tune
>> >> random_page_cost as well if it's also at the default.
>> >>
>> >
>> >                 ------Sequential Output------ --Sequential Input- --Random-
>> >                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13  397.7  0
>> >
>>
>> So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
>> speed is about the same as my single SATA drive write speed on my workstation,
>> so not that great.  The read speed is decent, though, and with that sort of
>> read performance, you might want to lower random_page_cost to something like
>> 2.5 or 2 so the planner will tend to prefer index scans.
>>
>
> Right, but the old box was getting ~45MBps on both reads and writes,
> so it's an improvement for me :)  Thanks for the advice, I'll let you
> know how it goes.

Do you think that is because you have a different interface between you and
the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do
that and costs quite a bit less than a SAN.

Is the SAN being shared between the database servers and other servers?  Maybe
it was just random timing that gave you the poor write performance on the old
server, which might also be yielding occasional poor performance on the new
one.

--
Jeff Frost, Owner     <>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

From:
Ron
Date:

At 07:36 PM 3/1/2007, Jeff Frost wrote:
>On Thu, 1 Mar 2007, Alex Deucher wrote:
>
>>> >> Postgresql might be choosing a bad plan because your
>>> effective_cache_size
>>> >> is
>>> >> way off (it's the default now right?).  Also, what was the
>>> block read/write
>>> >
>>> > yes it's set to the default.
>>> >
>>> >> speed of the SAN from your bonnie tests?  Probably want to tune
>>> >> random_page_cost as well if it's also at the default.
>>> >>
>>> >
>>> >                   ------Sequential Output------ --Sequential Input-
>>> > --Random-
>>> >                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>>> > --Seeks--
>>> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>>> K/sec %CP /sec
>>> > %CP
>>> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82
>>> 145504 13 397.7
>>> > 0
>>> >
>>>So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
>>>speed is about the same as my single SATA drive write speed on my
>>>workstation,
>>>so not that great.  The read speed is decent, though and with that sort of
>>>read performance, you might want to lower random_page_cost to something like
>>>2.5 or 2 so the planner will tend to prefer index scans.
>>
>>Right, but the old box was getting ~45MBps on both reads and writes,
>>so it's an improvement for me :)  Thanks for the advice, I'll let you
>>know how it goes.
>
>Do you think that is because you have a different interface between
>you and the SAN?  ~45MBps is pretty slow - your average 7200RPM
>ATA133 drive can do that and costs quite a bit less than a SAN.
>
>Is the SAN being shared between the database servers and other
>servers?  Maybe it was just random timing that gave you the poor
>write performance on the old server which might be also yielding
>occassional poor performance on the new one.

Remember that pg, even pg 8.2.3, has a known history of very poor
insert speed (see comments on this point by Josh Berkus, Luke Lonergan, etc.).

For some reason, the code changes that have resulted in dramatic
improvements in pg's read speed have not had nearly the same efficacy
for writes.

Bottom line: pg presently has a fairly low and fairly harsh upper
bound on write performance.   What exactly that bound is has been the
subject of some discussion, but IIUC the fact of its existence is
well established.

Various proposals for improving the situation exist, I've even made
some of them, but AFAIK this is currently considered one of the
"tough pg problems".

Cheers,
Ron Peacetree


From:
"Alex Deucher"
Date:

On 3/1/07, Jeff Frost <> wrote:
> On Thu, 1 Mar 2007, Alex Deucher wrote:
>
> >> >> Postgresql might be choosing a bad plan because your
> >> effective_cache_size
> >> >> is
> >> >> way off (it's the default now right?).  Also, what was the block
> >> read/write
> >> >
> >> > yes it's set to the default.
> >> >
> >> >> speed of the SAN from your bonnie tests?  Probably want to tune
> >> >> random_page_cost as well if it's also at the default.
> >> >>
> >> >
> >> >                   ------Sequential Output------ --Sequential Input-
> >> > --Random-
> >> >                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> >> > --Seeks--
> >> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> >> /sec
> >> > %CP
> >> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13
> >> 397.7
> >> > 0
> >> >
> >>
> >> So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
> >> speed is about the same as my single SATA drive write speed on my
> >> workstation,
> >> so not that great.  The read speed is decent, though and with that sort of
> >> read performance, you might want to lower random_page_cost to something
> >> like
> >> 2.5 or 2 so the planner will tend to prefer index scans.
> >>
> >
> > Right, but the old box was getting ~45MBps on both reads and writes,
> > so it's an improvement for me :)  Thanks for the advice, I'll let you
> > know how it goes.
>
> Do you think that is because you have a different interface between you and
> the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do
> that and costs quite a bit less than a SAN.
>
> Is the SAN being shared between the database servers and other servers?  Maybe
> it was just random timing that gave you the poor write performance on the old
> server which might be also yielding occassional poor performance on the new
> one.
>

The direct attached scsi discs on the old database server were getting
45MBps, not the SAN.  The SAN got 62/145Mbps, which is not as bad.  We
have 4 servers on the SAN, each with its own 4 GBps FC link via an FC
switch.  I'll try and re-run the numbers when the servers are idle
this weekend.

Alex

From:
"Joshua D. Drake"
Date:

>> Is the SAN being shared between the database servers and other
>> servers?  Maybe
>> it was just random timing that gave you the poor write performance on
>> the old
>> server which might be also yielding occassional poor performance on
>> the new
>> one.
>>
>
> The direct attached scsi discs on the old database server we getting
> 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.

How many spindles you got in that SAN?

> We
> have 4 servers on the SAN each with it's own 4 GBps FC link via an FC
> switch.  I'll try and re-run the numbers when the servers are idle
> this weekend.
>
> Alex
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From:
"Alex Deucher"
Date:

On 3/1/07, Joshua D. Drake <> wrote:
> \
> >> Is the SAN being shared between the database servers and other
> >> servers?  Maybe
> >> it was just random timing that gave you the poor write performance on
> >> the old
> >> server which might be also yielding occassional poor performance on
> >> the new
> >> one.
> >>
> >
> > The direct attached scsi discs on the old database server we getting
> > 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.
>
> How many spindles you got in that SAN?

105 IIRC.

Alex

From:
Jeff Frost
Date:

On Thu, 1 Mar 2007, Alex Deucher wrote:

> On 3/1/07, Jeff Frost <> wrote:
>> On Thu, 1 Mar 2007, Alex Deucher wrote:
>>
>> >> >> Postgresql might be choosing a bad plan because your
>> >> effective_cache_size
>> >> >> is
>> >> >> way off (it's the default now right?).  Also, what was the block
>> >> read/write
>> >> >
>> >> > yes it's set to the default.
>> >> >
>> >> >> speed of the SAN from your bonnie tests?  Probably want to tune
>> >> >> random_page_cost as well if it's also at the default.
>> >> >>
>> >> >
>> >> >                   ------Sequential Output------ --Sequential Input-
>> >> > --Random-
>> >> >                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>> >> > --Seeks--
>> >> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>> >> /sec
>> >> > %CP
>> >> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13
>> >> 397.7
>> >> > 0
>> >> >
>> >>
>> >> So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that
>> write
>> >> speed is about the same as my single SATA drive write speed on my
>> >> workstation,
>> >> so not that great.  The read speed is decent, though and with that sort
>> of
>> >> read performance, you might want to lower random_page_cost to something
>> >> like
>> >> 2.5 or 2 so the planner will tend to prefer index scans.
>> >>
>> >
>> > Right, but the old box was getting ~45MBps on both reads and writes,
>> > so it's an improvement for me :)  Thanks for the advice, I'll let you
>> > know how it goes.
>>
>> Do you think that is because you have a different interface between you and
>> the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do
>> that and costs quite a bit less than a SAN.
>>
>> Is the SAN being shared between the database servers and other servers?
>> Maybe
>> it was just random timing that gave you the poor write performance on the
>> old
>> server which might be also yielding occassional poor performance on the new
>> one.
>>
>
> The direct attached scsi discs on the old database server we getting
> 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.  We
> have 4 servers on the SAN each with it's own 4 GBps FC link via an FC
> switch.  I'll try and re-run the numbers when the servers are idle
> this weekend.

Sorry, I thought the old server was also attached to the SAN.  My fault for
not hanging onto the entire email thread.

I think you're mixing and matching your capital and lower case Bs in your
sentence above though. :-)

I suspect what you really mean is the SAN got 62/145MBps (megabytes/sec) and
the FC link is 4Gbps (gigabits/sec), or 500MBps.  Is that correct?  If so, and
seeing that you think there are 105 spindles on the SAN, I'd say you're either
maxing out the switch fabric of the SAN with your servers or you have a
really poorly performing SAN in general, or you just misunderstood the .

As a comparison, with 8 WD Raptors configured in a RAID10 with normal ext3, I
get about 160MB/s write and 305MB/s read performance.  Hopefully the SAN has
lots of other super nifty features that make up for the poor performance. :-(
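
The arithmetic behind that suspicion can be sketched as follows (a back-of-the-envelope check; the per-spindle streaming figure is an assumption, not a measurement):

```python
# Compare the measured bonnie throughput against the theoretical
# capacity of one 4 Gbps FC link and of 105 spindles.

GBPS = 1_000_000_000  # one "gigabit" as marketed for FC links

fc_link_mbps = 4 * GBPS / 8 / 1_000_000   # 4 Gbps link -> megabytes/sec
measured_write = 62                        # MB/s, bonnie block write
measured_read = 145                        # MB/s, bonnie block read

spindles = 105
per_spindle = 40                           # MB/s streaming, assumed modest value
aggregate_disk = spindles * per_spindle    # what the spindles could stream

print(f"FC link capacity:   {fc_link_mbps:.0f} MB/s")             # ~500
print(f"Aggregate spindles: {aggregate_disk} MB/s")               # 4200
print(f"Read utilization:   {measured_read / fc_link_mbps:.0%}")  # ~29%
```

Since the measured 62/145 MB/s is well under the ~500 MB/s a single 4 Gbps link can carry, the host link itself is unlikely to be the first bottleneck; the SAN's internal configuration is the better suspect.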

--
Jeff Frost, Owner     <>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

From:
"Alex Deucher"
Date:

On 3/1/07, Jeff Frost <> wrote:
> On Thu, 1 Mar 2007, Alex Deucher wrote:
>
> > On 3/1/07, Jeff Frost <> wrote:
> >> On Thu, 1 Mar 2007, Alex Deucher wrote:
> >>
> >> >> >> Postgresql might be choosing a bad plan because your
> >> >> effective_cache_size
> >> >> >> is
> >> >> >> way off (it's the default now right?).  Also, what was the block
> >> >> read/write
> >> >> >
> >> >> > yes it's set to the default.
> >> >> >
> >> >> >> speed of the SAN from your bonnie tests?  Probably want to tune
> >> >> >> random_page_cost as well if it's also at the default.
> >> >> >>
> >> >> >
> >> >> >                   ------Sequential Output------ --Sequential Input-
> >> >> > --Random-
> >> >> >                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> >> >> > --Seeks--
> >> >> > Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> >> >> /sec
> >> >> > %CP
> >> >> > luna12-san   16000M 58896  91 62931   9 35870   5 54869  82 145504 13
> >> >> 397.7
> >> >> > 0
> >> >> >
> >> >>
> >> >> So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that
> >> write
> >> >> speed is about the same as my single SATA drive write speed on my
> >> >> workstation,
> >> >> so not that great.  The read speed is decent, though and with that sort
> >> of
> >> >> read performance, you might want to lower random_page_cost to something
> >> >> like
> >> >> 2.5 or 2 so the planner will tend to prefer index scans.
> >> >>
> >> >
> >> > Right, but the old box was getting ~45MBps on both reads and writes,
> >> > so it's an improvement for me :)  Thanks for the advice, I'll let you
> >> > know how it goes.
> >>
> >> Do you think that is because you have a different interface between you and
> >> the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do
> >> that and costs quite a bit less than a SAN.
> >>
> >> Is the SAN being shared between the database servers and other servers?
> >> Maybe
> >> it was just random timing that gave you the poor write performance on the
> >> old
> >> server which might be also yielding occassional poor performance on the new
> >> one.
> >>
> >
> > The direct attached scsi discs on the old database server we getting
> > 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.  We
> > have 4 servers on the SAN each with it's own 4 GBps FC link via an FC
> > switch.  I'll try and re-run the numbers when the servers are idle
> > this weekend.
>
> Sorry, I thought the old server was also attached to the SAN.  My fault for
> not hanging onto the entire email thread.
>
> I think you're mixing and matching your capitol and lower case Bs in your
> sentence above though. :-)

whoops :)

>
> I suspect what you really mean is The SAN got 62/145MBps (megabytes/sec) and
> teh FC link is 4Gbps (gigabits/sec) or 500MBps.  Is that correct?  If so, and
> seeing that you think there are 105 spindles on the SAN, I'd say you're either
> maxxing out the switch fabric of the SAN with your servers or you have a
> really poorly performing SAN in general, or you just misunderstood the .
>
> As a comparison With 8 WD Raptors configured in a RAID10 with normal ext3 I
> get about 160MB/s write and 305MB/s read performance.  Hopefully the SAN has
> lots of other super nifty features that make up for the poor performance. :-(
>

It's big and reliable (and compared to lots of others, relatively
inexpensive) which is why we bought it.  We bought it mostly as a huge
file store.  The RAID groups on the SAN were set up for maximum
capacity rather than for performance.  Using it for the databases just
came up recently.

Alex

From:
"Joshua D. Drake"
Date:

Alex Deucher wrote:
> On 3/1/07, Joshua D. Drake <> wrote:
>> \
>> >> Is the SAN being shared between the database servers and other
>> >> servers?  Maybe
>> >> it was just random timing that gave you the poor write performance on
>> >> the old
>> >> server which might be also yielding occassional poor performance on
>> >> the new
>> >> one.
>> >>
>> >
>> > The direct attached scsi discs on the old database server we getting
>> > 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.
>>
>> How many spindles you got in that SAN?
>
> 105 IIRC.

You have 105 spindles and you are only getting 62 megs on writes? That seems
like about half of what you should be getting, at least.

Joshua D. Drake


>
> Alex
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From:
"Alex Deucher"
Date:

On 3/1/07, Joshua D. Drake <> wrote:
> Alex Deucher wrote:
> > On 3/1/07, Joshua D. Drake <> wrote:
> >> \
> >> >> Is the SAN being shared between the database servers and other
> >> >> servers?  Maybe
> >> >> it was just random timing that gave you the poor write performance on
> >> >> the old
> >> >> server which might be also yielding occassional poor performance on
> >> >> the new
> >> >> one.
> >> >>
> >> >
> >> > The direct attached scsi discs on the old database server we getting
> >> > 45MBps not the SAN.  The SAN got 62/145Mbps, which is not as bad.
> >>
> >> How many spindles you got in that SAN?
> >
> > 105 IIRC.
>
> You have 105 spindles are you are only get 62megs on writes? That seems
> about half what you should be getting. (at least).
>

Take the numbers with a grain of salt.  They are by no means a thorough
evaluation.  I just ran bonnie a couple of times to get a rough reference
point.  I can do a more thorough analysis.

Alex

> Joshua D. Drake
>
>
> >
> > Alex
> >
>

From:
Florian Weimer
Date:

* Alex Deucher:

> I have noticed a strange performance regression and I'm at a loss as
> to what's happening.  We have a fairly large database (~16 GB).

Sorry for asking, but is this a typo?  Do you mean 16 *TB* instead of
16 *GB*?

If it's really 16 GB, you should check if it's cheaper to buy more RAM
than to fiddle with the existing infrastructure.

> however the table structure is almost identical (UTF8 on the new one
> vs. C on the old).

Locale settings make a huge difference for sorting and LIKE queries.
We usually use the C locale and SQL_ASCII encoding, mostly for
performance reasons.  (Proper UTF-8 can be enforced through
constraints if necessary.)
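
If the database has to stay UTF-8 with a non-C locale, one common workaround for the LIKE case (hedged: whether it helps depends on the queries, and the table/column names here are from the examples earlier in the thread) is a second index with the pattern operator class, so prefix LIKEs can still use an index scan:

```sql
-- Ordinary index: used for = comparisons, sorted per the database locale.
CREATE INDEX t1_c1_index ON t1 (c1);

-- Pattern-ops index: lets the planner use index scans for
-- LIKE 'abc%' style prefix searches even under a non-C locale.
CREATE INDEX t1_c1_like ON t1 (c1 varchar_pattern_ops);
```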

--
Florian Weimer                <>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

From:
Carlos Moreno
Date:

Florian Weimer wrote:
> * Alex Deucher:
>
>
>> I have noticed a strange performance regression and I'm at a loss as
>> to what's happening.  We have a fairly large database (~16 GB).
>>
>
> Sorry for asking, but is this a typo?  Do you mean 16 *TB* instead of
> 16 *GB*?
>
> If it's really 16 GB, you should check if it's cheaper to buy more RAM
> than to fiddle with the existing infrastructure.
>

This brings me to a related question:

Do I need to specifically configure something to take advantage of
such increase of RAM?

In particular, is the amount of work that postgres can do in RAM
limited by shared_buffers or some other parameter?
Should shared_buffers be a fixed fraction of the total amount of
physical RAM, or should it be the total amount minus half a gigabyte
or so?

As an example, if one upgrades a host from 1GB to 4GB, what would
be the right thing to do in the configuration, assuming 8.1 or 8.2?  (at
least what would be the critical aspects?)

Thanks,

Carlos
--


From:
"Alex Deucher"
Date:

On 3/2/07, Florian Weimer <> wrote:
> * Alex Deucher:
>
> > I have noticed a strange performance regression and I'm at a loss as
> > to what's happening.  We have a fairly large database (~16 GB).
>
> Sorry for asking, but is this a typo?  Do you mean 16 *TB* instead of
> 16 *GB*?
>
> If it's really 16 GB, you should check if it's cheaper to buy more RAM
> than to fiddle with the existing infrastructure.
>

Yes, 16 GB.  I'd rather not shell out for more ram, if I'm not even
sure that will help.  The new system should be faster, or at least as
fast, so I'd like to sort out what's going on before I buy more ram.


> > however the table structure is almost identical (UTF8 on the new one
> > vs. C on the old).
>
> Locale settings make a huge difference for sorting and LIKE queries.
> We usually use the C locale and SQL_ASCII encoding, mostly for
> performance reasons.  (Proper UTF-8 can be enforced through
> constraints if necessary.)
>

I suppose that might be a factor.  How much of a performance
difference do you see between utf-8 and C?


Alex

From:
Ron
Date:

At 08:56 AM 3/2/2007, Carlos Moreno wrote:
>Florian Weimer wrote:
>>* Alex Deucher:
>>
>>
>>>I have noticed a strange performance regression and I'm at a loss as
>>>to what's happening.  We have a fairly large database (~16 GB).
>>>
>>
>>Sorry for asking, but is this a typo?  Do you mean 16 *TB* instead of
>>16 *GB*?
>>
>>If it's really 16 GB, you should check if it's cheaper to buy more RAM
>>than to fiddle with the existing infrastructure.
>>
>
>This brings me to a related question:
>
>Do I need to specifically configure something to take advantage of
>such increase of RAM?
>
>In particular, is the amount of things that postgres can do with RAM
>limited by the amount of shared_buffers or some other parameter?
>Should shared_buffers be a fixed fraction of the total amount of
>physical RAM, or should it be the total amount minus half a gigabyte
>or so?
>
>As an example, if one upgrades a host from 1GB to 4GB, what would
>be the right thing to do in the configuration, assuming 8.1 or 8.2?  (at
>least what would be the critical aspects?)
>
>Thanks,
>
>Carlos

Unfortunately, pg does not (yet! ;-) ) treat all available RAM as a
common pool and dynamically allocate it intelligently to each of the
various memory data structures.

So if you increase your RAM, you will have to manually change the
entries in the pg config file to take advantage of it
(and restart pg after changing it for the new config values to take effect).

The pertinent values are all those listed under "Memory" in the
annotated pg conf file:  shared_buffers, work_mem, maintenance_work_mem, etc.
http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
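
As a sketch of what that means in practice: for a dedicated 8.1 box upgraded from 1 GB to 4 GB of RAM, commonly cited starting points would look like the fragment below (illustrative assumptions, not measured values; 8.1 takes these as raw integers, in 8 kB pages for shared_buffers/effective_cache_size and in kB for the work_mem settings):

```
# postgresql.conf -- illustrative starting points for a 4 GB machine
shared_buffers = 65536          # 512 MB in 8 kB pages (~1/8 of RAM)
work_mem = 16384                # 16 MB per sort/hash, per backend
maintenance_work_mem = 262144   # 256 MB for VACUUM, CREATE INDEX
effective_cache_size = 262144   # ~2 GB: planner's estimate of OS cache
```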

Cheers,
Ron Peacetree


From:
"Alex Deucher"
Date:

On 3/1/07, Jeff Frost <> wrote:
> On Thu, 1 Mar 2007, Alex Deucher wrote:
>
> > here are some examples.  Analyze is still running on the new db, I'll
> > post results when that is done.  Mostly what our apps do is prepared
> > row selects from different tables:
> > select c1,c2,c3,c4,c5 from t1 where c1='XXX';
> >
> > old server:
> > db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
> >                                                       QUERY PLAN
> >
---------------------------------------------------------------------------------------------------------------------------
> > Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42
> > width=26) (actual time=5.722..5.809 rows=2 loops=1)
> >  Index Cond: ((c2)::text = '6258261'::text)
> > Total runtime: 5.912 ms
> > (3 rows)
> >
> > db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
> >                                                        QUERY PLAN
> >
----------------------------------------------------------------------------------------------------------------------------
> > Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72
> > width=26) (actual time=12.423..12.475 rows=12 loops=1)
> >  Index Cond: ((c1)::text = '6258261'::text)
> > Total runtime: 12.538 ms
> > (3 rows)
> >
> >
> > new server:
> > db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
> >                                                        QUERY PLAN
> >
----------------------------------------------------------------------------------------------------------------------------
> > Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11
> > width=26) (actual time=33.461..51.377 rows=2 loops=1)
> >  Index Cond: ((c2)::text = '6258261'::text)
> > Total runtime: 51.419 ms
> > (3 rows)
> >
> > db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
> >                                                          QUERY PLAN
> >
--------------------------------------------------------------------------------------------------------------------------------
> > Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907
> > width=26) (actual time=45.733..46.271 rows=12 loops=1)
> >  Index Cond: ((c1)::text = '6258261'::text)
> > Total runtime: 46.325 ms
> > (3 rows)
>
> Notice the huge disparity here betwen the expected number of rows (2907) and
> the actual rows?  That's indicative of needing to run analyze.  The time is
> only about 4x the 7.4 runtime and that's with the analyze running merrily
> along in the background.  It's probably not as bad off as you think.  At least
> this query isn't 10x. :-)
>
> Run these again for us after analyze is complete.

Well, while the DB isn't 10x slower, the application using the DB shows a 10x
decrease in performance.  Pages that used to take 5 seconds to load
take 50 secs (I suppose the problem is compounded as there are
several queries per page).  Anyway, new numbers after the analyze.
Unfortunately, they are improved, but still not great:

old server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42
width=26) (actual time=0.204..0.284 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 0.421 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72
width=26) (actual time=0.299..0.354 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 0.451 ms
(3 rows)



new server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11
width=26) (actual time=0.126..0.134 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 0.197 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907
width=26) (actual time=5.820..5.848 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 5.899 ms
(3 rows)

Here's another example:
old server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6000001';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42
width=26) (actual time=4.031..55.349 rows=8 loops=1)
   Index Cond: ((c2)::text = '6000001'::text)
 Total runtime: 55.459 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6000001';
                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72
width=26) (actual time=0.183..0.203 rows=4 loops=1)
   Index Cond: ((c1)::text = '6000001'::text)
 Total runtime: 0.289 ms
(3 rows)


new server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6000001';
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11
width=26) (actual time=115.412..202.151 rows=8 loops=1)
   Index Cond: ((c2)::text = '6000001'::text)
 Total runtime: 202.234 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6000001';
                                                          QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907
width=26) (actual time=99.811..99.820 rows=4 loops=1)
   Index Cond: ((c1)::text = '6000001'::text)
 Total runtime: 99.861 ms
(3 rows)

I haven't gotten a chance to restart postgres with the config changes
you suggested yet.  The row estimates have improved for some queries but
not all, and the times are still slow.  Any ideas?

Alex

From:
Ron
Date:

At 10:16 AM 3/2/2007, Alex Deucher wrote:
>On 3/2/07, Florian Weimer <> wrote:
>>* Alex Deucher:
>>
>> > I have noticed a strange performance regression and I'm at a loss as
>> > to what's happening.  We have a fairly large database (~16 GB).
>>
>>Sorry for asking, but is this a typo?  Do you mean 16 *TB* instead of
>>16 *GB*?
>>
>>If it's really 16 GB, you should check if it's cheaper to buy more RAM
>>than to fiddle with the existing infrastructure.
>
>Yes, 16 GB.  I'd rather not shell out for more ram, if I'm not even
>sure that will help.  The new system should be faster, or at least as
>fast, so I'd like to sort out what's going on before I buy more ram.
>
OK.  You
a= went from pg 7.4.x to 8.1.4 AND

b= you changed from 4 SPARC CPUs (how many cores?  If this is > 4...)
to 2 2C Opterons AND
(SPEC and TPC bench differences between these CPUs?)

c= you went from a Sun box to a "white box" AND
(memory subsystem differences?  other differences?)

d=  you went from local HD IO to a SAN
(many differences hidden in that one line...  ...and is the physical
layout of tables and things like pg_xlog sane on the SAN?)


...and you did this by just pulling over the old DB onto the new HW?

May I suggest that it is possible that your schema, queries, etc were
all optimized for pg 7.x running on the old HW?
(explain analyze shows the old system taking ~1/10 the time per row
as well as estimating the number of rows more accurately)

RAM is =cheap=.  Much cheaper than the cost of a detective hunt
followed by rework to queries, schema, etc.
Fitting the entire DB into RAM is guaranteed to help unless this is
an OLTP-like application where HD IO is required to be synchronous.
If you can fit the entire DB comfortably into RAM, do it and buy
yourself the time to figure out the rest of the story w/o impacting
on production performance.

Cheers,
Ron Peacetree


From:
"Alex Deucher"
Date:

On 3/2/07, Ron <> wrote:
> At 10:16 AM 3/2/2007, Alex Deucher wrote:
> >On 3/2/07, Florian Weimer <> wrote:
> >>* Alex Deucher:
> >>
> >> > I have noticed a strange performance regression and I'm at a loss as
> >> > to what's happening.  We have a fairly large database (~16 GB).
> >>
> >>Sorry for asking, but is this a typo?  Do you mean 16 *TB* instead of
> >>16 *GB*?
> >>
> >>If it's really 16 GB, you should check if it's cheaper to buy more RAM
> >>than to fiddle with the existing infrastructure.
> >
> >Yes, 16 GB.  I'd rather not shell out for more ram, if I'm not even
> >sure that will help.  The new system should be faster, or at least as
> >fast, so I'd like to sort out what's going on before I buy more ram.
> >
> OK.  You
> a= went from pg 7.4.x to 8.1.4 AND
>

yes.

> b= you changed from 4 SPARC CPUs (how many cores?  If this is > 4...)
> to 2 2C Opterons AND
> (SPEC and TPC bench differences between these CPUs?)
>

4 single-core 800 MHz SPARCs to 2 dual-core 2.2 GHz Opterons.

> c= you went from a Sun box to a "white box" AND
> (memory subsystem differences?  other differences?)
>

The new hardware is Sun as well. X4100s running Linux.  It should be
faster all around because the old server is 5 years old.

> d=  you went from local HD IO to a SAN
> (many differences hidden in that one line...  ...and is the physical
> layout of tables and things like pg_xlog sane on the SAN?)
>
>
> ...and you did this by just pulling over the old DB onto the new HW?
>

We rebuilt the DB from scratch on the new server.  Same table
structure though.  We reloaded from the source material directly.

> May I suggest that it is possible that your schema, queries, etc were
> all optimized for pg 7.x running on the old HW?
> (explain analyze shows the old system taking ~1/10 the time per row
> as well as estimating the number of rows more accurately)
>
> RAM is =cheap=.  Much cheaper than the cost of a detective hunt
> followed by rework to queries, schema, etc.
> Fitting the entire DB into RAM is guaranteed to help unless this is
> an OLTP like application where HD IO is  required to be synchronous..
> If you can fit the entire DB comfortably into RAM, do it and buy
> yourself the time to figure out the rest of the story w/o impacting
> on production performance.

Perhaps so.  I just don't want to spend $1000 on ram and have it only
marginally improve performance if at all.  The old DB works, so we can
keep using that until we sort this out.

Alex

>
> Cheers,
> Ron Peacetree
>
>

From:
Tom Lane
Date:

"Alex Deucher" <> writes:
> Anyway, new numbers after the analyze.
> Unfortunately, they are improved, but still not great:

Why are the index names different between the old and new servers?
Is that just cosmetic, or is 8.2 actually picking a different
(and less suitable) index for the c1 queries?

            regards, tom lane

From:
"Alex Deucher"
Date:

On 3/2/07, Tom Lane <> wrote:
> "Alex Deucher" <> writes:
> > Anyway, new numbers after the analyze.
> > Unfortunately, they are improved, but still not great:
>
> Why are the index names different between the old and new servers?
> Is that just cosmetic, or is 8.2 actually picking a different
> (and less suitable) index for the c1 queries?
>

That's just cosmetic.  They are the same.

Alex

From:
Scott Marlowe
Date:

On Fri, 2007-03-02 at 10:03, Alex Deucher wrote:
> On 3/2/07, Ron <> wrote:
> > At 10:16 AM 3/2/2007, Alex Deucher wrote:

> > d=  you went from local HD IO to a SAN
> > (many differences hidden in that one line...  ...and is the physical
> > layout of tables and things like pg_xlog sane on the SAN?)
> >
> >
> > ...and you did this by just pulling over the old DB onto the new HW?
> >
>
> We rebuilt the DB from scratch on the new server.  Same table
> structure though.  We reloaded from the source material directly.

I would REALLY recommend testing this machine out with a simple software
RAID-1 pair of SCSI or SATA drives just to eliminate or confirm the SAN
as the root problem.


From:
Ron
Date:

At 11:03 AM 3/2/2007, Alex Deucher wrote:
>On 3/2/07, Ron <> wrote:
>
>>May I suggest that it is possible that your schema, queries, etc were
>>all optimized for pg 7.x running on the old HW?
>>(explain analyze shows the old system taking ~1/10 the time per row
>>as well as estimating the number of rows more accurately)
>>
>>RAM is =cheap=.  Much cheaper than the cost of a detective hunt
>>followed by rework to queries, schema, etc.
>>Fitting the entire DB into RAM is guaranteed to help unless this is
>>an OLTP like application where HD IO is  required to be synchronous..
>>If you can fit the entire DB comfortably into RAM, do it and buy
>>yourself the time to figure out the rest of the story w/o impacting
>>on production performance.
>
>Perhaps so.  I just don't want to spend $1000 on ram and have it only
>marginally improve performance if at all.  The old DB works, so we can
>keep using that until we sort this out.
>
>Alex
1=  $1000 worth of RAM is very likely less than the $ worth of, say,
10 hours of your time to your company.  Perhaps much less.
(Your =worth=, not your pay or even your fully loaded cost.  This
number tends to be >= 4x what you are paid unless the organization
you are working for is in imminent financial danger.)
You've already put considerably more than 10 hours of your time
into this...

2= If the DB goes from not fitting completely into RAM to being
completely RAM resident, you are almost 100% guaranteed a big
performance boost.
The exception is an OLTP-like app where DB writes can't be done
asynchronously (financial transactions, real-time control systems, etc.).
Data mines should never have this issue.

3= Whether adding enough RAM to make the DB RAM resident (and
re-configuring conf, etc, appropriately) solves the problem or not,
you will have gotten a serious lead as to what's wrong.

...and I still think looking closely at the actual physical layout of
the tables in the SAN is likely to be worth it.

Cheers,
Ron Peacetree



From:
Anton Rommerskirchen
Date:

Am Donnerstag 01 März 2007 21:44 schrieb Alex Deucher:
> Hello,
>
> I have noticed a strange performance regression and I'm at a loss as
> to what's happening.  We have a fairly large database (~16 GB).  The
> original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
> of ram running Solaris on local scsi discs.  The new server is a sun
> Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
> (AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
> it was created from scratch rather than copying over the old one,
> however the table structure is almost identical (UTF8 on the new one
> vs. C on the old). The problem is queries are ~10x slower on the new
> hardware.  I read several places that the SAN might be to blame, but
> testing with bonnie and dd indicates that the SAN is actually almost
> twice as fast as the scsi discs in the old sun server.  I've tried
> adjusting just about every option in the postgres config file, but
> performance remains the same.  Any ideas?
>


1. Do you use numactl to bind the db to one NUMA node?

2. Do you use the BIOS to interleave memory?

3. Does your cache span more than one NUMA node?

> Thanks,
>
> Alex
>

--

ATRSoft GmbH
Rosellstrasse 9
D 50354 Hürth
Deutschland
Tel .: +49(0)2233 691324

Geschäftsführer Anton Rommerskirchen

Köln HRB 44927
STNR 224/5701 - 1010

From:
Alvaro Herrera
Date:

Florian Weimer escribió:

> Locale settings make a huge difference for sorting and LIKE queries.
> We usually use the C locale and SQL_ASCII encoding, mostly for
> performance reasons.  (Proper UTF-8 can be enforced through
> constraints if necessary.)

Hmm, you are aware of varchar_pattern_ops and related opclasses, right?
That helps for LIKE queries in non-C locales (though you do have to keep
almost-duplicate indexes).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:
"Alex Deucher"
Date:

On 3/2/07, Ron <> wrote:
> At 11:03 AM 3/2/2007, Alex Deucher wrote:
> >On 3/2/07, Ron <> wrote:
> >
> >>May I suggest that it is possible that your schema, queries, etc were
> >>all optimized for pg 7.x running on the old HW?
> >>(explain analyze shows the old system taking ~1/10 the time per row
> >>as well as estimating the number of rows more accurately)
> >>
> >>RAM is =cheap=.  Much cheaper than the cost of a detective hunt
> >>followed by rework to queries, schema, etc.
> >>Fitting the entire DB into RAM is guaranteed to help unless this is
> >>an OLTP like application where HD IO is  required to be synchronous..
> >>If you can fit the entire DB comfortably into RAM, do it and buy
> >>yourself the time to figure out the rest of the story w/o impacting
> >>on production performance.
> >
> >Perhaps so.  I just don't want to spend $1000 on ram and have it only
> >marginally improve performance if at all.  The old DB works, so we can
> >keep using that until we sort this out.
> >
> >Alex
> 1=  $1000 worth of RAM is very likely less than the $ worth of, say,
> 10 hours of your time to your company.  Perhaps much less.
> (Your =worth=, not your pay or even your fully loaded cost.  This
> number tends to be >= 4x what you are paid unless the organization
> you are working for is in imminent financial danger.)
> You've already put considerably more than 10 hours of your time
> into this...
>
> 2= If the DB goes from not fitting completely into RAM to being
> completely RAM resident, you are almost 100% guaranteed a big
> performance boost.
> The exception is an OLTP-like app where DB writes can't be done
> asynchronously (financial transactions, real-time control systems, etc.).
> Data mines should never have this issue.
>
> 3= Whether adding enough RAM to make the DB RAM resident (and
> re-configuring conf, etc, appropriately) solves the problem or not,
> you will have gotten a serious lead as to what's wrong.
>
> ...and I still think looking closely at the actual physical layout of
> the tables in the SAN is likely to be worth it.
>

How would I go about doing that?

Thanks,

Alex

From:
Ron
Date:

At 02:43 PM 3/2/2007, Alex Deucher wrote:
>On 3/2/07, Ron <> wrote:
>>
>>...and I still think looking closely at the actual physical layout of
>>the tables in the SAN is likely to be worth it.
>
>How would I go about doing that?
>
>Alex

Hard for me to give specific advice when I don't know what SAN
product we are talking about nor what kind of HDs are in it nor how
those HDs are presently configured...

I quote you in an earlier post:
"The RAID groups on the SAN were set up for maximum capacity rather
than for performance.  Using it for the databases just came up recently."

That implies to me that the SAN is more or less set up as a huge 105
HD (assuming this number is correct?  We all know how "assume" is
spelled...) JBOD or RAID 5 (or 6, or 5*, or 6*) set.

=IF= that is true, tables are not being given dedicated RAID
groups.  That implies that traditional lore like having pg_xlog on
dedicated spindles is being ignored.
Nor is the more general Best Practice of putting the most heavily
used tables onto dedicated spindles being followed.

In addition, the most space-efficient RAID levels, 5* or 6*, are not
the best-performing ones (that would be RAID 10: striping your mirrors).

In short, configuring a SAN for maximum capacity is exactly the wrong
thing to do if one is planning to use it in the best way to support
DB performance.
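The capacity-versus-performance trade-off is easy to quantify.  A rough
sketch in Python (the 14-disk, 146 GB group is purely hypothetical, just
to show why a capacity-driven admin would pick RAID 5):

```python
def usable_capacity(n_disks, disk_gb, level):
    """Approximate usable capacity (GB) for common RAID levels.

    RAID 5 gives up one disk's worth of space to parity; RAID 6
    gives up two; RAID 10 mirrors every disk, so only half the
    raw space is usable -- but it avoids the parity write penalty.
    """
    if level == "raid5":
        return (n_disks - 1) * disk_gb
    if level == "raid6":
        return (n_disks - 2) * disk_gb
    if level == "raid10":
        return n_disks * disk_gb // 2
    raise ValueError(f"unknown level: {level}")

# Hypothetical 14-disk group of 146 GB drives:
print(usable_capacity(14, 146, "raid5"))   # most capacity
print(usable_capacity(14, 146, "raid10"))  # best random-write performance
```

For the same spindles, RAID 5 yields nearly twice the usable space of
RAID 10, which is exactly why "maximum capacity" configurations end up
on the wrong RAID level for DB work.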

I assume (there's that word again...) that there is someone in your
organization who understands how the SAN is configured and administered.
You need to talk to them about these issues.

Cheers,
Ron



From:
"Alex Deucher"
Date:

On 3/2/07, Ron <> wrote:
> At 02:43 PM 3/2/2007, Alex Deucher wrote:
> >On 3/2/07, Ron <> wrote:
> >>
> >>...and I still think looking closely at the actual physical layout of
> >>the tables in the SAN is likely to be worth it.
> >
> >How would I go about doing that?
> >
> >Alex
>
> Hard for me to give specific advice when I don't know what SAN
> product we are talking about nor what kind of HDs are in it nor how
> those HDs are presently configured...
>
> I quote you in an earlier post:
> "The RAID groups on the SAN were set up for maximum capacity rather
> than for performance.  Using it for the databases just came up recently."
>
> That implies to me that the SAN is more or less set up as a huge 105
> HD (assuming this number is correct?  We all know how "assume" is
> spelled...) JBOD or RAID 5 (or 6, or 5*, or 6*) set.
>
> =IF= that is true, tables are not being given dedicated RAID
> groups.  That implies that traditional lore like having pg_xlog on
> dedicated spindles is being ignored.
> Nor is the more general Best Practice of putting the most heavily
> used tables onto dedicated spindles being followed.
>
> In addition, the most space-efficient RAID levels, 5* or 6*, are not
> the best-performing ones (that would be RAID 10: striping your mirrors).
>
> In short, configuring a SAN for maximum capacity is exactly the wrong
> thing to do if one is planning to use it in the best way to support
> DB performance.
>
> I assume (there's that word again...) that there is someone in your
> organization who understands how the SAN is configured and administered.
> You need to talk to them about these issues.
>

Ah OK.  I see what you are saying;  thank you for clarifying.  Yes,
the SAN is configured for maximum capacity; it has large RAID 5
groups.  As I said earlier, we never intended to run a DB on the SAN,
it just happened to come up, hence the configuration.

Alex

From:
Guido Neitzer
Date:

On 02.03.2007, at 14:20, Alex Deucher wrote:

> Ah OK.  I see what you are saying;  thank you for clarifying.  Yes,
> the SAN is configured for maximum capacity; it has large RAID 5
> groups.  As I said earlier, we never intended to run a DB on the SAN,
> it just happened to come up, hence the configuration.

So why not dump the data once, import it into a PG instance configured
to use local discs (or even ONE local disc; you should have room for
the 16 GB you gave as the size of the db on the local machine, right?),
and test whether the problem is PG talking to the SAN.  Then you have
one factor less to consider after all your changes.

Maybe it's just that something in the chain from PG to the actual HD
spindles kills your random access performance for getting the actual
rows.

cug

From:
Jeff Frost
Date:

On Fri, 2 Mar 2007, Guido Neitzer wrote:

> On 02.03.2007, at 14:20, Alex Deucher wrote:
>
>> Ah OK.  I see what you are saying;  thank you for clarifying.  Yes,
>> the SAN is configured for maximum capacity; it has large RAID 5
>> groups.  As I said earlier, we never intended to run a DB on the SAN,
>> it just happened to come up, hence the configuration.
>
> So why not dump the data once, import it into a PG instance configured to
> use local discs (or even ONE local disc; you should have room for the 16 GB
> you gave as the size of the db on the local machine, right?), and test
> whether the problem is PG talking to the SAN.  Then you have one factor less
> to consider after all your changes.
>
> Maybe it's just that something in the chain from PG to the actual HD spindles
> kills your random access performance for getting the actual rows.

I am actually starting to think that the SAN may be introducing enough
latency to kill your random IO, which is what all of the queries in
question are doing: look up in index, fetch row from table.

If you have the time, it would be totally worth it to test with a local disk
and see how that affects the speed.

I would think that even with RAID5, a SAN with that many spindles would be
quite fast in raw throughput, but perhaps it's just seek latency that's
killing you.

When you run the bonnie tests again, take note of what the seeks/sec is
compared with the old disk.  Also, you should run bonnie with the -b switch to
see if that causes significant slowdown of the writes...maybe minor synced
write activity to pg_xlog is bogging the entire system down.  Is the system
spending most of its time in IO wait?
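A crude way to compare seek latency between the SAN volume and a local
disc, independent of bonnie, is to time small random reads against a big
test file.  A minimal Python sketch (the file paths are hypothetical, and
for honest numbers you'd want to defeat the page cache, e.g. by using a
file much larger than RAM or opening with O_DIRECT):

```python
import os
import random
import time

def random_read_latency(path, reads=1000, block=8192):
    """Time `reads` random 8 KB reads (the Postgres page size)
    spread across `path`; return average seconds per read."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.time()
        for _ in range(reads):
            offset = random.randrange(0, max(1, size - block))
            os.pread(fd, block, offset)
        return (time.time() - start) / reads
    finally:
        os.close(fd)

# e.g. compare a file on the SAN mount vs. one on a local disc:
# print(random_read_latency("/san/testfile"))
# print(random_read_latency("/local/testfile"))
```

If the SAN number is an order of magnitude worse per read, that would
line up with the index-lookup-then-fetch pattern being ~10x slower.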

Also, another item of note might be the actual on-disk DB size... I wonder
if it has changed significantly going from SQL_ASCII to UTF8.

In 8.1 you can do this:

SELECT  datname,
         pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database;

In 7.4, you'll need to install the dbsize contrib module to get the same info.
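On the encoding question: UTF-8 is byte-identical to ASCII for pure-ASCII
text, so the on-disk size should only grow where non-ASCII characters
actually appear in the data.  A quick Python check illustrates (the sample
strings are made up):

```python
ascii_text = "plain ascii row data"
accented = "r\u00e9sum\u00e9"  # "résumé"

# ASCII characters stay one byte each in UTF-8:
assert len(ascii_text.encode("utf-8")) == len(ascii_text)

# Non-ASCII characters expand to multiple bytes each:
print(len(accented))                  # character count
print(len(accented.encode("utf-8")))  # byte count (larger)
```

So unless the source material contains a lot of accented or non-Latin
text, the encoding change alone shouldn't account for a big size jump.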

--
Jeff Frost, Owner     <>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

From:
"Alex Deucher"
Date:

On 3/2/07, Jeff Frost <> wrote:
> On Fri, 2 Mar 2007, Guido Neitzer wrote:
>
> > On 02.03.2007, at 14:20, Alex Deucher wrote:
> >
> >> Ah OK.  I see what you are saying;  thank you for clarifying.  Yes,
> >> the SAN is configured for maximum capacity; it has large RAID 5
> >> groups.  As I said earlier, we never intended to run a DB on the SAN,
> >> it just happened to come up, hence the configuration.
> >
> > So why not dumping the stuff ones, importing into a PG configured to use
> > local discs (Or even ONE local disc, you might have the 16GB you gave as a
> > size for the db on the local machine, right?) and testing whether the problem
> > is with PG connecting to the SAN. So you have one factor less to consider
> > after all your changes.
> >
> > Maybe it's just that something in the chain from PG to the actual HD spindles
> > kills your random access performance for getting the actual rows.
>
> I am actually starting to think that the SAN may be introducing enough
> latency to kill your random IO, which is what all of the queries in
> question are doing: look up in index, fetch row from table.
>
> If you have the time, it would be totally worth it to test with a local disk
> and see how that affects the speed.
>
> I would think that even with RAID5, a SAN with that many spindles would be
> quite fast in raw throughput, but perhaps it's just seek latency that's
> killing you.
>
> When you run the bonnie tests again, take note of what the seeks/sec is
> compared with the old disk.  Also, you should run bonnie with the -b switch to
> see if that causes significant slowdown of the writes...maybe minor synced
> write activity to pg_xlog is bogging the entire system down.  Is the system
> spending most of its time in IO wait?
>
> Also, another item of note might be the actual on-disk DB size... I wonder
> if it has changed significantly going from SQL_ASCII to UTF8.
>
> In 8.1 you can do this:
>
> SELECT  datname,
>          pg_size_pretty(pg_database_size(datname)) AS size
> FROM pg_database;
>
> In 7.4, you'll need to install the dbsize contrib module to get the same info.
>

I'm beginning to think the same thing.  I'm planning to try the tests
above next week.  I'll let you know what I find out.

Thanks!

Alex