Thread: Insert performance, what should I expect?

Insert performance, what should I expect?

From
Brock Henry
Date:
Hi,

I'm after some opinions about insert performance.

I'm importing a file with 13,002 lines into a database that ends up with
75,703 records across 6 tables. This is a partial file; the full data set
is 4 files totalling 95,174 lines. I'll be loading these files each
morning, and then running a number of queries on them.

The select queries run fast enough (mostly - 2 queries are slow, but I'll
look into those later). Importing, however, is slower than I'd like, and
I'm wondering what I should expect.

I've done some manual benchmarking by running my script under 'time
script.pl'. I realise my script itself uses some of the time; benchmarking
shows that 50% of the time is spent in DBD::Pg's execute().

Test 1: for each import, I'm dropping all indexes and pkeys/fkeys, then
importing, then re-adding the keys and indexes. I've done successive runs,
and I figure the reindexing will get more expensive as the database grows?

Successive imports: 44, 49, 50, 57, 55, 61, 72 (seconds)
= average 1051 inserts/second (which now that I've written this seems
fairly good)
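
The drop/re-add handling boils down to something like this (a simplified
sketch; the table, column and index names are placeholders, not my real
schema):

use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=import_db', undef, undef,
                       { RaiseError => 1, AutoCommit => 1 });

# Drop keys/indexes before the bulk INSERTs...
$dbh->do('ALTER TABLE my_table DROP CONSTRAINT my_table_pkey');
$dbh->do('DROP INDEX my_table_some_col_idx');

# ... the INSERT loop for each input file runs here ...

# ...then re-create them once the data is loaded.
$dbh->do('ALTER TABLE my_table ADD PRIMARY KEY (id)');
$dbh->do('CREATE INDEX my_table_some_col_idx ON my_table (some_col)');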

Test 2: no dropping etc. of indexes, just INSERTs
Import: 61, 62, 73, 68, 78, 74 (seconds)
= average 1091 inserts/second

Machine is Linux 2.6.4, 1GB RAM, 3.something GHz Xeon processor, SCSI
disks (RAID 1). PostgreSQL 7.4.2. Lightly loaded machine, not doing much
other than my script. Script and DB are on the same machine.

sysctl -a | grep shm
kernel.shmmni = 4096
kernel.shmall = 134217728     (pages or bytes? Anyway…)
kernel.shmmax = 134217728

postgresql.conf
tcpip_socket = true
max_connections = 32
superuser_reserved_connections = 2
shared_buffers = 8192
sort_mem = 4096
vacuum_mem = 16384
max_fsm_relations = 300
fsync = true
wal_buffers = 64
checkpoint_segments = 10
effective_cache_size = 16000
syslog = 1
silent_mode = false
log_connections = true
log_pid = true
log_timestamp = true
stats_start_collector = true
stats_row_level = true

Can I expect it to go faster than this? I'll see where I can make my
script itself go faster, but I don't think I'll be able to do much.
I'll do some pre-prepare type stuff, but I don't expect significant
gains, maybe 5-10%. I could happily turn off fsync for this job, but
not for some other databases the server is hosting.

Any comments/suggestions would be appreciated.

Thanks :)

Brock Henry

Re: Insert performance, what should I expect?

From
"Matthew T. O'Connor"
Date:
Brock Henry wrote:

>Hi,
>
>I'm after some opinions about insert performance.
>
Have you looked into using the copy command instead of inserts?  For
bulk loading of data it can be significantly faster.

Re: Insert performance, what should I expect?

From
Rod Taylor
Date:
> I've done some manual benchmarking by running my script under 'time
> script.pl'. I realise my script itself uses some of the time; benchmarking
> shows that 50% of the time is spent in DBD::Pg's execute().

The Perl drivers don't currently use database-level prepared statements,
which would give a small boost.

But your best bet is to switch to using COPY instead of INSERT. There are
two ways to do this:

1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
compliant) which has functions similar to putline() that allow COPY to
be used.

2) Have your perl script output a .sql file with the data prepared (COPY
statements) which you feed into the database via psql.
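
A rough sketch of option 2 (the table, column and file names here are only
placeholders, and the input parsing will depend on your file format):

use strict;
use warnings;

open my $in,  '<', 'import_data.txt' or die "can't read input: $!";
open my $out, '>', 'load.sql'        or die "can't write load.sql: $!";

print $out "COPY my_table (col_a, col_b, col_c) FROM stdin;\n";
while ( my $line = <$in> ) {
    chomp $line;
    my @fields = split /,/, $line;            # parse the input however it needs
    print $out join( "\t", @fields ), "\n";   # COPY's default delimiter is a tab
}
print $out "\\.\n";                           # end-of-data marker for COPY
close $out;
close $in;

# Then load the whole thing in one shot:
system( 'psql', '-d', 'my_db', '-f', 'load.sql' ) == 0
    or die "psql failed: $?";

(Real data may also need tabs, newlines and backslashes escaped before being
handed to COPY.)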

You can probably achieve a 50% increase in throughput.


Re: Insert performance, what should I expect?

From
Andrew McMillan
Date:
On Wed, 2004-10-20 at 11:53 +1000, Brock Henry wrote:
>
> Test 1: for each import, I'm dropping all indexes and pkeys/fkeys, then
> importing, then re-adding the keys and indexes. I've done successive runs,
> and I figure the reindexing will get more expensive as the database grows?

Sounds like the right approach to me, if the tables are empty before the
import.


> Successive imports: 44, 49, 50, 57, 55, 61, 72 (seconds)
> = average 1051 inserts/second (which now that I've written this seems
> fairly good)

(A) Are you doing the whole thing inside a transaction?  This will be
significantly quicker.  COPY would probably be quicker still, but the
biggest difference will be a single transaction.
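
With DBI that just means turning AutoCommit off for the load. A minimal
sketch (connection details, table and file names are made up):

use strict;
use warnings;
use DBI;

# With AutoCommit off the whole load is one transaction: nothing is
# committed (and no per-row WAL flush happens) until the final commit.
my $dbh = DBI->connect('dbi:Pg:dbname=import_db', undef, undef,
                       { RaiseError => 1, AutoCommit => 0 });

my $sth = $dbh->prepare('INSERT INTO my_table (col_a, col_b) VALUES (?, ?)');

open my $in, '<', 'import_data.txt' or die $!;
while ( my $line = <$in> ) {
    chomp $line;
    $sth->execute( split /\t/, $line );
}
close $in;

$dbh->commit;        # single commit for the whole import
$dbh->disconnect;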

(B) If you are starting with empty tables, are you ensuring that the dead
records are vacuumed before you start?  I would recommend a "vacuum
full" on the affected tables prior to the first import run (i.e. when
the tables are empty).  This is likely to be the reason that the timing
on your successive imports increases so much.



> sort_mem = 4096

You probably want to increase this - if you have 1G of RAM then there is
probably some spare.  But if you actually expect to use 32 connections
then 32 * 4M = 128M might mean a careful calculation is needed.  If you
are really only likely to have 1-2 connections running concurrently then
increase it to (e.g.) 32768.

> max_fsm_relations = 300

If you do a "vacuum full verbose;" the last line will give you some
clues as to what to set this (and max_fsm_pages) to.


> effective_cache_size = 16000

16000 * 8k = 128M seems low for a 1G machine - probably you could say
64000 without fear of being wrong.  What does "free" show as "cached"?
Depending on how dedicated the machine is to the database, the effective
cache size may be as much as 80-90% of that.
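
Taken together with the sort_mem suggestion above, the relevant
postgresql.conf lines might end up looking something like this (only a
sketch - check the numbers against how many concurrent sorts you really
run and what "free" reports as cached on your box):

sort_mem = 32768                # 32MB per sort, OK for 1-2 concurrent sessions
effective_cache_size = 64000    # 64000 * 8k = 512MB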


> Can I expect it to go faster than this? I'll see where I can make my
> script itself go faster, but I don't think I'll be able to do much.
> I'll do some pre-prepare type stuff, but I don't expect significant
> gains, maybe 5-10%. I could happily turn off fsync for this job, but
> not for some other databases the server is hosting.

You can probably double the speed - maybe more.

Cheers,
                    Andrew,
-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/            PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201      MOB: +64(272)DEBIAN      OFFICE: +64(4)499-2267
              How many things I can do without! -- Socrates
-------------------------------------------------------------------------



Re: Insert performance, what should I expect?

From
Robert Creager
Date:
When grilled further on (Tue, 19 Oct 2004 22:12:28 -0400),
Rod Taylor <pg@rbt.ca> confessed:

> > I've done some manual benchmarking by running my script under 'time
> > script.pl'. I realise my script itself uses some of the time; benchmarking
> > shows that 50% of the time is spent in DBD::Pg's execute().
> >
> 1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
> compliant) which has functions similar to putline() that allow COPY to
> be used.

COPY can be used with DBD::Pg, per a script I use:

$dbh->do( "COPY temp_obs_$band ( $col_list ) FROM stdin" );
$dbh->func( join ( "\t", @data ) . "\n", 'putline' );
$dbh->func( "\\.\n", 'putline' );
$dbh->func( 'endcopy' );

With sets of data from 1000 to 8000 records, my COPY performance is consistent
at ~10000 records per second.
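
For a whole file the putline() calls just go in a loop, something like this
($dbh and $col_list as above, $in an open filehandle on the data file; the
table name is made up):

$dbh->do( "COPY temp_obs_r ( $col_list ) FROM stdin" );
while ( my $line = <$in> ) {
    chomp $line;
    my @data = split /,/, $line;                       # parse each input line as needed
    $dbh->func( join( "\t", @data ) . "\n", 'putline' );
}
$dbh->func( "\\.\n", 'putline' );
$dbh->func( 'endcopy' );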

Cheers,
Rob

--
 10:39:31 up 2 days, 16:25,  2 users,  load average: 2.15, 2.77, 3.06
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


Re: Insert performance, what should I expect?

From
Rod Taylor
Date:
On Wed, 2004-10-20 at 12:45, Robert Creager wrote:
> When grilled further on (Tue, 19 Oct 2004 22:12:28 -0400),
> Rod Taylor <pg@rbt.ca> confessed:
>
> > > I've done some manual benchmarking by running my script under 'time
> > > script.pl'. I realise my script itself uses some of the time; benchmarking
> > > shows that 50% of the time is spent in DBD::Pg's execute().
> > >
> > 1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
> > compliant) which has functions similar to putline() that allow COPY to
> > be used.
>
> COPY can be used with DBD::Pg, per a script I use:
>
> $dbh->do( "COPY temp_obs_$band ( $col_list ) FROM stdin" );
> $dbh->func( join ( "\t", @data ) . "\n", 'putline' );
> $dbh->func( "\\.\n", 'putline' );
> $dbh->func( 'endcopy' );

Thanks for that. All of the conversations I've seen on the subject
stated that DBD::Pg only supported standard DB features -- copy not
amongst them.

> With sets of data from 1000 to 8000 records, my COPY performance is consistent
> at ~10000 records per second.

Well done.



Re: Insert performance, what should I expect?

From
Gaetano Mendola
Date:
Brock Henry wrote:
 > Any comments/suggestions would be appreciated.

Also tune the disk I/O elevator.

Take a look at this: http://www.varlena.com/varlena/GeneralBits/49.php


Regards
Gaetano Mendola



Re: Insert performance, what should I expect?

From
"Steinar H. Gunderson"
Date:
On Sat, Oct 23, 2004 at 12:31:32PM +0200, Gaetano Mendola wrote:
>> Any comments/suggestions would be appreciated.
> Also tune the disk I/O elevator.
>
> Take a look at this: http://www.varlena.com/varlena/GeneralBits/49.php

Mm, interesting. I've heard somewhere that the best thing for database-like
loads on Linux is to disable the anticipatory I/O scheduler
(http://kerneltrap.org/node/view/567), which should probably also influence
the numbers for elvtune -- does anybody know whether this is true for
PostgreSQL?

/* Steinar */
--
Homepage: http://www.sesse.net/