Re: 15,000 tables - next step - Mailing list pgsql-performance

From: Jan Wieck
Subject: Re: 15,000 tables - next step
Date:
Msg-id: 43950068.3040402@Yahoo.com
In response to: Re: 15,000 tables - next step  (Michael Riess <mlriess@gmx.de>)
List: pgsql-performance
On 12/4/2005 4:33 AM, Michael Riess wrote:
> I will do the following:
>
> - switch to 10k buffers on a 1GB machine, 20k buffers on a 2GB machine
> - try to optimize my connection pools to remember which apps (groups of
> 30 tables) were accessed, so that there is a better chance of using caches
> - "swap out" tables which are rarely used: export the content, drop the
> table, and re-create it on the fly upon access.
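
(A minimal sketch of that "swap out" idea, assuming a plain server-side
COPY for the export and re-import; the table, column and path names
below are purely illustrative:)

     -- archive a rarely used table
     COPY app_0042_data TO '/var/spool/pg_archive/app_0042_data.copy';
     DROP TABLE app_0042_data;

     -- re-create it on the fly when the application touches it again
     CREATE TABLE app_0042_data (id integer PRIMARY KEY, payload text);
     COPY app_0042_data FROM '/var/spool/pg_archive/app_0042_data.copy';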

I hacked pgbench a little and did some tests (finally had to figure out
for myself if there is much of an impact with hundreds or thousands of
tables).

The changes done to pgbench:

     - Use the [-s n] value always, instead of determining the
       scaling from the DB.

     - Lower the number of accounts per scaling factor to 10,000.

     - Add another scaling type. Option [-a n] splits up the test
       into n schemas, each containing [-s n] branches.
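
For illustration, the [-a n] split presumably gives every schema its
own copy of the four standard pgbench tables, roughly like this (a
sketch only, using the stock pgbench DDL of the time; the actual patch
may differ in detail):

     CREATE SCHEMA s0001;
     CREATE TABLE s0001.branches (bid int NOT NULL, bbalance int, filler char(88));
     CREATE TABLE s0001.tellers  (tid int NOT NULL, bid int, tbalance int, filler char(84));
     CREATE TABLE s0001.accounts (aid int NOT NULL, bid int, abalance int, filler char(84));
     CREATE TABLE s0001.history  (tid int, bid int, aid int, delta int, mtime timestamp, filler char(22));
     -- ...repeated for s0002 through s<n>, each then loaded with
     -- [-s n] branches and 10,000 accounts per branch (see above)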

The tests were performed on a 667 MHz P3 with 640 MB RAM and a single
IDE disk. All tests were I/O bound. In all tests, 5 clients were used
for the default transaction and 50 clients for the read-only test
(option -S). The FreeBSD kernel on this system is configured to handle
up to 50,000 open files, to fully cache directories in virtual memory
and to lock all shared memory into physical RAM.
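
On FreeBSD of that era, those settings would typically be expressed as
sysctls roughly along these lines (illustrative values only, not the
actual configuration used for these tests):

     # /etc/sysctl.conf
     kern.maxfiles=50000        # allow up to 50,000 open files
     vfs.vmiodirenable=1        # cache directory data in virtual memory
     kern.ipc.shm_use_phys=1    # wire SysV shared memory into physical RAM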

The different scalings used were

     init -a1 -s3000
     run  -a1 -s300

and

     init -a3000 -s1
     run  -a300 -s1

The latter creates a database of 12,000 tables (four pgbench tables per
schema times 3,000 schemas), of which 1,200 are actually in use during
the test (the 300 schemas touched by the run). Both databases are about
4 GB in size.

The performance loss for going from -s3000 to -a3000 is about 10-15%.

The performance gain for going from 1,000 shared_buffers to 48,000 is
roughly 70% (-a3000 test case) and 100% (-s3000 test case).

Conclusion: The gain from the right shared memory configuration easily
outweighs the loss from the increase in the number of tables, provided
that the kernel is also configured to cope with thousands of files
being accessed by that many backends.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
