Thread: Beginner optimization questions, esp. regarding Tsearch2 configuration

Beginner optimization questions, esp. regarding Tsearch2 configuration

From: "Carl Youngblood"

I'm trying to optimize a resume search engine that uses Tsearch2
indexes.  It's running on a dual-core Opteron 165 system with 4GB of RAM
and a RAID1 3Gb/s SATA array.  Each text entry is about 2-3K of
text, and there are about 23,000 rows in the search table, with a goal
of reaching about 100,000 rows eventually.
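
(For reference, a typical tsearch2 setup on 8.1 looks roughly like the
sketch below; the "resumes" table and "body" column are hypothetical
stand-ins for the actual schema, and the tsearch2 contrib module is
assumed to be installed in the database.)

-- Hedged sketch only; table and column names are hypothetical.
ALTER TABLE resumes ADD COLUMN body_fti tsvector;
UPDATE resumes SET body_fti = to_tsvector('default', body);
CREATE INDEX resumes_body_fti_idx ON resumes USING gist (body_fti);
VACUUM ANALYZE resumes;

-- Searches go through the @@ operator so the GiST index can be used:
SELECT id FROM resumes
 WHERE body_fti @@ to_tsquery('default', 'postgres & tuning');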

I'm running Ubuntu 6.06 amd64 server edition.  The raid array is a
software-based linux array with LVM on top of it and the file system
for the database mount point is XFS.  The only optimization I've done
so far is to put the following in /etc/sysctl.conf:

# shmall is measured in 4kB pages (2097152 pages = 8GB);
# shmmax is the largest single segment, in bytes (2GB)
kernel.shmall = 2097152
kernel.shmmax = 2147483648
# maximum number of shared memory segments
kernel.shmmni = 4096
# semaphores: SEMMSL SEMMNS SEMOPM SEMMNI
kernel.sem = 250 32000 100 128
fs.file-max = 65536

And in postgresql.conf I set the following parameters:

shared_buffers = 131072    # in 8kB buffers, i.e. 1GB
work_mem = 65536           # in kB, i.e. 64MB per sort/hash operation
max_stack_depth = 4096     # in kB
max_fsm_pages = 40000
max_fsm_relations = 2000

These probably aren't ideal but I was hoping they would perform a
little better than the defaults.  I got the following results from a
pgbench script I picked up off the web:

CHECKPOINT
===== sync ======
10 concurrent users...
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 632.146016 (including connections establishing)
tps = 710.474526 (excluding connections establishing)
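
(For reference, figures like these come from a stock pgbench run along
the following lines; the database name "test" is a placeholder:)

pgbench -i -s 10 test        # initialize with scaling factor 10
pgbench -c 10 -t 100 test    # 10 clients, 100 transactions each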

Once again I don't know if these results are good or not for my hardware.

I have a couple of questions:

- Does anyone have some good advice for optimizing postgres for
tsearch2 queries?
- I noticed that there are six different postmaster daemons running.
Only one of them is taking up a lot of RAM (1076m virtual and 584m
resident).  The second one is using 181m resident while the others are
less than 20m each.  Is it normal to have multiple postmaster
processes?  Even the biggest process doesn't seem to be using near as
much RAM as I have on this machine.  Is that bad?  What percentage of
my physical memory should I expect postgres to use for itself?  How
can I encourage it to cache more query results in memory?

Thanks in advance for your time.

Carl Youngblood

Re: Beginner optimization questions, esp. regarding Tsearch2

From: Richard Huxton
Date: Thu, 10 Aug 2006 10:23:55 +0100

Carl Youngblood wrote:
> - I noticed that there are six different postmaster daemons running.
> Only one of them is taking up a lot of RAM (1076m virtual and 584m
> resident).  The second one is using 181m resident while the others are
> less than 20m each.  Is it normal to have multiple postmaster
> processes?

You should have one master backend process and one per connection. PG is
a classic multi-process server design.

> Even the biggest process doesn't seem to be using near as
> much RAM as I have on this machine.  Is that bad?  What percentage of
> my physical memory should I expect postgres to use for itself?  How
> can I encourage it to cache more query results in memory?

OK - one of the key things with PostgreSQL is that it relies on the O.S.
to cache its disk files. So, allocating too much memory to PG can be
counterproductive.

From your figures, you're allocating about 64MB to work_mem, which is
per sort. So, a complex query could use several times that amount. If
you don't have many concurrent queries that might be what you want.

Also, you've allocated 1GB to your shared_buffers which is more than I'd
use as a starting point.

You've only mentioned one main table with 100,000 rows, so presumably
you're going to cache the entire DB in RAM. So, you'll want to increase
effective_cache_size and reduce random_page_cost.
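
(As a hedged illustration of that advice on a 4GB machine: in 8.1,
effective_cache_size is measured in 8kB pages, and the numbers below
are assumptions to tune from, not values from this thread.)

effective_cache_size = 262144    # 8kB pages, i.e. ~2GB of expected OS cache
random_page_cost = 2             # down from the default of 4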

--
   Richard Huxton
   Archonet Ltd

Re: Beginner optimization questions, esp. regarding Tsearch2

From: Markus Schaber

Hi, Richard and Carl,

Richard Huxton wrote:
> Carl Youngblood wrote:
>> - I noticed that there are six different postmaster daemons running.
>> Only one of them is taking up a lot of RAM (1076m virtual and 584m
>> resident).  The second one is using 181m resident while the others are
>> less than 20m each.  Is it normal to have multiple postmaster
>> processes?
>
> You should have one master backend process and one per connection. PG is
> a classic multi-process server design.

There may be some additional background processes, such as the
background writer, stats collector or autovacuum, depending on your
version and configuration.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Beginner optimization questions, esp. regarding Tsearch2

From: "Carl Youngblood"

Thanks a lot for the advice, Richard.  I will try those things out and
report back to the list.

Carl

On 8/10/06, Richard Huxton <dev@archonet.com> wrote:
> From your figures, you're allocating about 64MB to work_mem, which is
> per sort. So, a complex query could use several times that amount. If
> you don't have many concurrent queries that might be what you want.
>
> Also, you've allocated 1GB to your shared_buffers which is more than I'd
> use as a starting point.
>
> You've only mentioned one main table with 100,000 rows, so presumably
> you're going to cache the entire DB in RAM. So, you'll want to increase
> effective_cache_size and reduce random_page_cost.

Re: Beginner optimization questions, esp. regarding Tsearch2

From: "Jim C. Nasby"

On Thu, Aug 10, 2006 at 10:23:55AM +0100, Richard Huxton wrote:
> Carl Youngblood wrote:
> >- I noticed that there are six different postmaster daemons running.
> >Only one of them is taking up a lot of RAM (1076m virtual and 584m
> >resident).  The second one is using 181m resident while the others are
> >less than 20m each.  Is it normal to have multiple postmaster
> >processes?
>
> You should have one master backend process and one per connection. PG is
> a classic multi-process server design.
>
> > Even the biggest process doesn't seem to be using near as
> >much RAM as I have on this machine.  Is that bad?  What percentage of
> >my physical memory should I expect postgres to use for itself?  How
> >can I encourage it to cache more query results in memory?
>
> OK - one of the key things with PostgreSQL is that it relies on the O.S.
> to cache its disk files. So, allocating too much memory to PG can be
> counterproductive.
>
> From your figures, you're allocating about 64MB to work_mem, which is
> per sort. So, a complex query could use several times that amount. If
> you don't have many concurrent queries that might be what you want.
>
> Also, you've allocated 1GB to your shared_buffers which is more than I'd
> use as a starting point.

See the recent thread about how old rules of thumb for shared_buffers
are now completely bunk. With 4G of memory, setting shared_buffers to 2G
could easily be reasonable. The OP really needs to test several
different values with their actual workload and see what works best.
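
(One hedged way to run that comparison; $PGDATA, the database name
"test", and the candidate values are all placeholders:)

# Sketch: set each candidate value, restart, re-run the same benchmark.
for sb in 65536 131072 196608 262144; do
    sed -i "s/^shared_buffers = .*/shared_buffers = $sb/" $PGDATA/postgresql.conf
    pg_ctl -D $PGDATA restart -w -m fast
    pgbench -c 10 -t 1000 test
done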

> You've only mentioned one main table with 100,000 rows, so presumably
> you're going to cache the entire DB in RAM. So, you'll want to increase
> effective_cache_size and reduce random_page_cost.
>
> --
>   Richard Huxton
>   Archonet Ltd

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Beginner optimization questions, esp. regarding Tsearch2

From: "Carl Youngblood"
Date: Tue, 15 Aug 2006 12:47:54 -0600

I tried setting it to 2GB and postgres wouldn't start.  Didn't
investigate in much greater detail as to why it wouldn't start, but
after switching it back to 1GB it started fine.

On 8/15/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> See the recent thread about how old rules of thumb for shared_buffers
> are now completely bunk. With 4G of memory, setting shared_buffers to 2G
> could easily be reasonable. The OP really needs to test several
> different values with their actual workload and see what works best.

Re: Beginner optimization questions, esp. regarding Tsearch2

From: "Carl Youngblood"

By the way, can you please post a link to that thread?

On 8/15/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> See the recent thread about how old rules of thumb for shared_buffers
> are now completely bunk. With 4G of memory, setting shared_buffers to 2G
> could easily be reasonable. The OP really needs to test several
> different values with their actual workload and see what works best.

Re: Beginner optimization questions, esp. regarding Tsearch2

From: "Jim C. Nasby"

On Tue, Aug 15, 2006 at 12:47:54PM -0600, Carl Youngblood wrote:
> I tried setting it to 2GB and postgres wouldn't start.  Didn't
> investigate in much greater detail as to why it wouldn't start, but
> after switching it back to 1GB it started fine.

Most likely because you didn't set the kernel's shared memory settings
high enough.

To answer your other question:
http://archives.postgresql.org/pgsql-performance/2006-08/msg00095.php

> On 8/15/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> >See the recent thread about how old rules of thumb for shared_buffers
> >are now completely bunk. With 4G of memory, setting shared_buffers to 2G
> >could easily be reasonable. The OP really needs to test several
> >different values with their actual workload and see what works best.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Beginner optimization questions, esp. regarding Tsearch2

From: Frank Wiles

On Tue, 15 Aug 2006 12:47:54 -0600
"Carl Youngblood" <carl@youngbloods.org> wrote:

> I tried setting it to 2GB and postgres wouldn't start.  Didn't
> investigate in much greater detail as to why it wouldn't start, but
> after switching it back to 1GB it started fine.
>
> On 8/15/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> > See the recent thread about how old rules of thumb for
> > shared_buffers are now completely bunk. With 4G of memory, setting
> > shared_buffers to 2G could easily be reasonable. The OP really
> > needs to test several different values with their actual workload
> > and see what works best.

   Sounds like you need to increase your kernel's maximum amount
   of shared memory.  This is typically why an increase in
   shared_buffers causes PostgreSQL not to start.

   Check out this page in the docs for more information:

   http://www.postgresql.org/docs/8.1/static/kernel-resources.html
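
   (For reference, the current limits can be checked, and raised without
   a reboot, like so on Linux; the new value here is illustrative only:)

   sysctl kernel.shmmax kernel.shmall
   sysctl -w kernel.shmmax=3500000000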

 ---------------------------------
   Frank Wiles <frank@wiles.org>
   http://www.wiles.org
 ---------------------------------


Re: Beginner optimization questions, esp. regarding Tsearch2

From: "Carl Youngblood"
Date: Wed, 16 Aug 2006 09:34:24 -0600

The relevant portion of my sysctl.conf file looks like this:

kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536

I understood it was a good idea to set shmmax to half of available
memory (2GB in this case).  I assume that I need to set shared_buffers
slightly lower than 2GB for PostgreSQL to start successfully.
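
(The arithmetic behind that assumption: the main shared memory segment
is roughly shared_buffers plus bookkeeping overhead, and

    262144 buffers x 8192 bytes = 2147483648 bytes = shmmax exactly,

so the overhead for the FSM, lock tables, etc. pushes the request just
past shmmax and startup fails.)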

Carl

On 8/15/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> On Tue, Aug 15, 2006 at 12:47:54PM -0600, Carl Youngblood wrote:
> > I tried setting it to 2GB and postgres wouldn't start.  Didn't
> > investigate in much greater detail as to why it wouldn't start, but
> > after switching it back to 1GB it started fine.
>
> Most likely because you didn't set the kernel's shared memory settings
> high enough.

Re: Beginner optimization questions, esp. regarding Tsearch2

From: "Jim C. Nasby"

On Wed, Aug 16, 2006 at 09:34:24AM -0600, Carl Youngblood wrote:
> The relevant portion of my sysctl.conf file looks like this:
>
> kernel.shmall = 2097152
> kernel.shmmax = 2147483648
> kernel.shmmni = 4096
> kernel.sem = 250 32000 100 128
> fs.file-max = 65536
>
> I understood it was a good idea to set shmmax to half of available
> memory (2GB in this case).  I assume that I need to set shared_buffers

I don't see any reason to do that, so long as you have control over
what's being run on the system. Just set it to 3000000000 or so.
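
(In /etc/sysctl.conf terms that would be something like the line below,
applied with "sysctl -p":)

kernel.shmmax = 3000000000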

> slightly lower than 2GB for PostgreSQL to start successfully.
>
> Carl
>
> On 8/15/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> >On Tue, Aug 15, 2006 at 12:47:54PM -0600, Carl Youngblood wrote:
> >> I tried setting it to 2GB and postgres wouldn't start.  Didn't
> >> investigate in much greater detail as to why it wouldn't start, but
> >> after switching it back to 1GB it started fine.
> >
> >Most likely because you didn't set the kernel's shared memory settings
> >high enough.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461