Thread: Insert performance
Hi all,

A couple of days ago, one of my colleagues, Rahul Iyer, posted a query regarding insert performance of 5M rows. A common suggestion was to use COPY. Unfortunately he cannot use COPY due to some constraints, so I was helping him get the maximum out of plain inserts. We were playing with a data set of 500K rows on SunOS 5.6 and PostgreSQL 7.3.3. The best we could get was 500K records in 355 sec - roughly 1400 inserts per sec. This was with default settings and around 10K inserts per transaction. PostgreSQL was hogging all the CPU during the load.

Troubled by this, I set up a similar database at home. That is an Athlon XP2000+/512MB machine with a 40GB Seagate disk, running Slackware 9.0 with kernel 2.4.20 (IIRC). I have attached the results of my experiments and the code I used to benchmark. It was a simple table with an integer and a varchar(30) field.

I was really amazed to see the numbers. First of all, it beat the SunOS machine left and right. Bruce posted some numbers of 9K inserts/sec; here we see the same. Secondly, I also played with filesystems. Ext3 does not seem to perform as well; reiserfs and ext2 did fine. Unfortunately the kernel didn't support XFS/JFS, so I could not test them. I have also attached the raw benchmark data in KSpread format, for the curious. I didn't export it to anything else because KSpread had trouble exporting formula values.

I also noticed that reiserfs takes a lot more CPU than ext2 and ext3 - it nearly peaks out all CPU capacity. Not so with ext2.

Comments? One thing I can't help noticing is that SunOS is not on the same scale. The SunOS machine is a 1GB RAM machine. It has Oracle and MySQL running on it and has 300MB of swap in use, but I am sure it has a SCSI disk, and in all respects I would rather expect a RISC machine to perform better than an Athlon XP machine, at least for IO. If you want me to post details of the sparc machine, please let me know how to find them.
I have never worked with sparcs before and have no intention of doing so again.. :-)

Bye
Shridhar

--
Fourth Law of Applied Terror: The night before the English History mid-term, your Biology instructor will assign 200 pages on planaria.

Corollary: Every instructor assumes that you have nothing else to do except study for that instructor's course.
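The batching setup described above (many inserts per COMMIT rather than one transaction per row) can be sketched as follows. This is a hypothetical reconstruction, not the attached benchmark code, and it uses Python's bundled sqlite3 as a stand-in so it runs without a server; the original test ran against PostgreSQL 7.3.3 with the same table shape (an integer and a varchar(30)).

```python
import sqlite3
import time

def load_batched(conn, n_rows, batch_size):
    """Insert n_rows rows, committing once every batch_size rows."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE bench (id INTEGER, name VARCHAR(30))")
    conn.commit()
    start = time.time()
    for i in range(n_rows):
        cur.execute("INSERT INTO bench VALUES (?, ?)", (i, "row-%d" % i))
        if (i + 1) % batch_size == 0:
            conn.commit()          # one commit per batch, not per row
    conn.commit()                  # flush the final partial batch
    return n_rows / (time.time() - start)   # inserts per second

conn = sqlite3.connect(":memory:")
rate = load_batched(conn, 50000, 10000)    # ~10K inserts/transaction, as in the test
print("%.0f inserts/sec" % rate)
```

Against a real PostgreSQL server the per-commit cost is dominated by the WAL fsync, which is why the batch size matters so much more there than it does in this in-memory stand-in.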
Shridhar,

> Unfortunately he can not use copy due to some constraints.

Why not use COPY to load the table, and then apply the constraints by query afterwards? It might not be faster, but then again it might.

> I was really amazed to see the numbers. First of all, it beat the sunOS
> machine left and right. Bruce posted some numbers of 9K inserts/sec. Here
> we see the same.

<snip>

> Comments? One thing I can't help to notice is sunOs is not on same scale.
> The sunOS machine is a 1GB RAM machine. It has oracle and mysql running on
> it and have 300MB swap in use but I am sure it has SCSI disk and in all
> respect I would rather expect a RISC machine to perform better than an
> athlon XP machine, at least for an IO.

It's been reported on this list several times that Solaris is the *worst* of the *nixes for PostgreSQL performance. No analysis has been posted as to why; my own thoughts are:

- Solaris' multi-threaded architecture imposes a hefty per-process overhead, about triple that of Linux, slowing new connections and large multi-user activity;
- Poor filesystem management; Sun simply hasn't kept up with IBM, Reiser, Red Hat and BSD in developing filesystems.

... but that's based on inadequate experimentation, just a few tests with Bonnie++ on a Netra running Solaris 8.

--
Josh Berkus
Aglio Database Solutions
San Francisco
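Transforming rows into COPY-able form, as suggested above, mostly means rendering them in the tab-delimited text format that `COPY ... FROM STDIN` expects. A minimal sketch, assuming the integer-plus-varchar(30) table from the benchmark (the table and column names here are made up):

```python
import io

def to_copy_text(rows):
    """Render (id, name) tuples in the default tab-delimited text
    format understood by COPY ... FROM STDIN."""
    buf = io.StringIO()
    for rid, name in rows:
        # COPY's default text format needs backslash, tab and newline
        # escaped; escape the backslash first so we don't double-escape.
        safe = (name.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n"))
        buf.write("%d\t%s\n" % (rid, safe))
    buf.seek(0)
    return buf

buf = to_copy_text([(1, "alpha"), (2, "beta\twith tab")])
print(buf.getvalue(), end="")
# The buffer can then be streamed to the server, e.g. with psycopg2:
#   cur.copy_from(buf, "bench", columns=("id", "name"))
```

Whether the constraints in question allow this intermediate text form at all is, of course, the question Shridhar answers below.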
Martha Stewart called it a Good Thing when josh@agliodbs.com (Josh Berkus) wrote:
> Shridhar,
>> Unfortunately he can not use copy due to some constraints.
> Why not use COPY to load the table, and then apply the constraints
> by query afterwords? It might not be faster, but then again it
> might.

If you can transform the information into COPYable form, that's certainly a good thing. Whether it will be workable or not is another question.

> ... but that's based on inadequate experimentation, just a few tests
> on Bonnie++ on a Netra running Solaris 8.

As far as the filesystem issues are concerned, you're probably using an "old, standard" version of UFS. The "high performance" option on Solaris involves using third-party Veritas software. The persistence of this is somewhat surprising; I'd somewhat have expected Sun to have bought out Veritas or some such thing, as it's a pretty vital technology that _isn't_ totally under their control. Actually, an entertaining option would be for them to buy out SGI, as that would get them control of XFS and a number of other interesting technologies.

My expectations of a Netra wouldn't be terribly high, either; they seem to exist as a product so that people who need a cheap Sun box have an option. They are mostly running IDE disks, and the latest IA-32 hardware is likely to have newer, faster interface options.

--
(format nil "~S@~S" "cbbrowne" "ntlug.org")
http://cbbrowne.com/info/advocacy.html
"Windows NT was designed to be administered by an idiot and usually is..." -- Chris Adams <adamsc@gigante.ml.org>
On 16 Aug 2003 at 11:40, Josh Berkus wrote:

> Shridhar,
>
> > Unfortunately he can not use copy due to some constraints.
>
> Why not use COPY to load the table, and then apply the constraints by query
> afterwords? It might not be faster, but then again it might.

Lol.. The constraints I mentioned weren't database constraints. It needs to interface with another database module, and the data import cannot be text - that's a project requirement. I should have been clearer in the first place..

Bye
Shridhar

--
O'Toole's commentary on Murphy's Law: Murphy was an optimist.
On 18 Aug 2003 at 18:52, Hannu Krosing wrote:

> My own experimentation also got numbers in 9k/sec range (on a quad
> 1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
> ~1000 inserts/transaction.
>
> Performance dropped to ~300/sec (at about 60M rows) when I added an
> index (primary key) - as I did random inserts, the hit rates for index
> pages were probably low.

I was loading geographic data a couple of months back. It was 3GB of data when loaded into PostgreSQL. I tried loading the data first and creating the index later, but that ran out of the available 9GB of space. So I created the index on an empty table and then started loading. It was slow, but at least it finished after 3 hours... Coincidentally, Oracle had the same problem. So creating the index beforehand remains the only option at times, it seems. Tom remarked that it shouldn't have made a difference, but apparently it does..

You mentioned parallel runs and still getting 9K/sec. Was that 9K overall or per connection? If it is the former, WAL is probably being hit too hard. You could do some additional testing by giving WAL its own disk.

I was planning to do the multiple-writers test. But since the objective was to find out why PostgreSQL was so slow, and it turned out to be slowaris in the first place, I didn't have any enthusiasm left. I recommended my colleague move to Linux. But apparently this product is part of a suite which runs on some HUGE Solaris machines. So if it has to run PostgreSQL, it has to run SunOS. I hope they are faster with SCSI..

Bye
Shridhar

--
On my planet, to rest is to rest -- to cease using energy. To me, it is quite illogical to run up and down on green grass, using energy, instead of saving it. -- Spock, "Shore Leave", stardate 3025.2
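The trade-off discussed above - build the index before loading and pay per-insert maintenance, or load into an indexless table and build the index in one pass afterwards - can be timed with a small sketch. This again uses sqlite3 as a self-contained stand-in (the thread's actual tests were on PostgreSQL and Oracle, where the space and time costs differ), with a table shaped loosely like the geographic data mentioned (an id plus three floats):

```python
import sqlite3
import time

def timed_load(index_first, n_rows=20000):
    """Load n_rows rows, building the index either before or after."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE pts (id INTEGER, x REAL, y REAL, z REAL)")
    if index_first:
        cur.execute("CREATE INDEX pts_id ON pts (id)")  # maintained per insert
    start = time.time()
    cur.executemany("INSERT INTO pts VALUES (?, ?, ?, ?)",
                    ((i, 0.0, 0.0, 0.0) for i in range(n_rows)))
    if not index_first:
        cur.execute("CREATE INDEX pts_id ON pts (id)")  # one bulk build
    conn.commit()
    return time.time() - start

print("index first: %.3fs" % timed_load(True))
print("index after: %.3fs" % timed_load(False))
```

The index-after ordering is usually faster but, as the message above notes, the bulk index build needs scratch space of its own - which is exactly what ran out here.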
Chris,

> My expectations of a Netra wouldn't be terribly high, either; they
> seem to exist as a product so that people that need a cheap Sun box
> have an option. They are mostly running IDE disk, and the latest
> IA-32 hardware is likely to have newer faster interface options.

Neither are ours ... we bought an array of 1U Netras as a dot-com closeout. At $330 per machine including OS, they were a darned good deal, and with 4 of them the redundancy makes up for the lack of individual server performance.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Shridhar Daithankar wrote on Mon, 18.08.2003 at 09:21:
> On 16 Aug 2003 at 11:40, Josh Berkus wrote:
>
> > Shridhar,
> >
> > > Unfortunately he can not use copy due to some constraints.
> >
> > Why not use COPY to load the table, and then apply the constraints by query
> > afterwords? It might not be faster, but then again it might.
>
> Lol.. The constraints I mentioned weren't database constraints..

Can't you still apply them later ;)

My own experimentation also got numbers in the 9k/sec range (on a quad of 1.3GHz Xeons, 2GB mem, 50MB/sec RAID) when doing 10-20 parallel runs of ~1000 inserts/transaction.

Performance dropped to ~300/sec (at about 60M rows) when I added an index (primary key) - as I did random inserts, the hit rates for index pages were probably low.

--------------
Hannu
Shridhar Daithankar wrote on Mon, 18.08.2003 at 19:02:
> On 18 Aug 2003 at 18:52, Hannu Krosing wrote:
> > My own experimentation also got numbers in 9k/sec range (on a quad
> > 1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
> > ~1000 inserts/transaction.
> >
> > Performance dropped to ~300/sec (at about 60M rows) when I added an
> > index (primary key) - as I did random inserts, the hit rates for index
> > pages were probably low.
>
> I was loading a geographic data couple of months back.. It was 3GB data when
> loaded in postgresql.

With or without indexes?

> I tried loading data first and creating index later. It ran out of available
> 9GB space. So I created index on an empty table and started loading it. It was
> slow but at least finished after 3 hours... Co-incidentally oracle had same
> problems as well. So creating index beforehand remains only option at times, it
> seems. Tom remarked that it shouldn't have made difference but apparently it
> does..

Tom just fixed some memory leaks in array indexing the other day. Could there be something like that in the geographic types?

> You mentioned parallel runs and still getting 9K/sec. Was that overall 9K or
> per connection?

Overall. But notice that my setup was (a little) slower per processor.

> If it is former, probably WAL is hit too hard. You could do
> some additional testing by having WALit's own disk.

I guess that today's IDE disks are about the same speed (~50MB/sec) as my test RAID was. I ran multiple parallel runs to have a chance to use all 4 processors (though IIRC it was heavily IO-bound), as well as to make better use of writing time on the WAL platters (not waiting for a full rotation on each platter).

--------------
Hannu
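For the record, in 7.3-era PostgreSQL there was no configuration parameter for the WAL location; giving WAL its own disk meant stopping the server, moving the cluster's pg_xlog directory onto the other spindle, and symlinking it back. A sketch with made-up stand-in paths so it is safe to run as-is; in practice $PGDATA would be the real data directory and $WALDISK a mount point on a dedicated disk:

```shell
# Stand-in directories for illustration; substitute the real cluster
# paths, and stop the server before touching pg_xlog.
PGDATA=$(mktemp -d)
WALDISK=$(mktemp -d)
mkdir -p "$PGDATA/pg_xlog"

# Move the WAL directory to the other disk and symlink it back.
mv "$PGDATA/pg_xlog" "$WALDISK/pg_xlog"
ln -s "$WALDISK/pg_xlog" "$PGDATA/pg_xlog"

ls -ld "$PGDATA/pg_xlog"    # now a symlink pointing at the WAL disk
```

This separates the sequential WAL writes from the random heap and index I/O, which is the point of the "WAL on its own disk" suggestion above.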
On 18 Aug 2003 at 9:57, Josh Berkus wrote:

> Chris,
>
> > My expectations of a Netra wouldn't be terribly high, either; they
> > seem to exist as a product so that people that need a cheap Sun box
> > have an option. They are mostly running IDE disk, and the latest
> > IA-32 hardware is likely to have newer faster interface options.
>
> Neither are ours ... we bought an array of 1U Netras as a dot-com closeout.
> At $330 per machine including OS, they were a darned good deal, and with 4 of
> them the redundancy makes up for the lack of individual server performance.

I am sure they would run much better with Linux on them rather than Solaris..

Bye
Shridhar

--
Shannon's Observation: Nothing is so frustrating as a bad situation that is beginning to improve.
On 19 Aug 2003 at 1:16, Hannu Krosing wrote:

> Shridhar Daithankar kirjutas E, 18.08.2003 kell 19:02:
> > I was loading a geographic data couple of months back.. It was 3GB data when
> > loaded in postgresql.
>
> With or without indexes ?

Without the index. The index was another 3GB at 50% utilisation. It was 81M rows with 3 floats each..

> > I tried loading data first and creating index later. It ran out of available
> > 9GB space. So I created index on an empty table and started loading it. It was
> > slow but at least finished after 3 hours... Co-incidentally oracle had same
> > problems as well. So creating index beforehand remains only option at times, it
> > seems. Tom remarked that it shouldn't have made difference but apparently it
> > does..
>
> Tom just fixed some memory leaks on array indexing the other day. Could
> there be something like that on geographic types ?

Dunno.. This was 7.3.2 or earlier.. Later the project abandoned databases altogether and went to in-memory structures, since the flat data was on the order of 200MB. Now they are returning to databases as the flat data approaches 3GB.. God knows what they will do next. It's an ideal example of what schedule pressure can do to the architecture of a software design..

Bye
Shridhar

--
Air Force Inertia Axiom: Consistency is always easier to defend than correctness.