Re: Sustained inserts per sec ... ? - Mailing list pgsql-performance

From Simon Riggs
Subject Re: Sustained inserts per sec ... ?
Date
Msg-id 1112645479.16721.806.camel@localhost.localdomain
In response to Re: Sustained inserts per sec ... ?  (Christopher Petrilli <petrilli@gmail.com>)
List pgsql-performance
On Mon, 2005-04-04 at 15:56 -0400, Christopher Petrilli wrote:
> On Apr 4, 2005 3:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Mon, 2005-04-04 at 09:48 -0400, Christopher Petrilli wrote:
> > > The point, in the rough middle, is where the program begins inserting
> > > into a new table (inherited). The X axis is the "total" number of rows
> > > inserted.
> >
> > and you also mention the same data plotted with elapsed time:
> > http://www.amber.org/~petrilli/diagrams/pgsql_copyperf_timeline.png
> >
> > Your graphs look identical to others I've seen, so I think we're
> > touching on something wider than your specific situation. The big
> > difference is that things seem to go back to high performance when you
> > switch to a new inherited table.
>
> This is correct.
>
> > I'm very interested in the graphs of elapsed time for COPY 500 rows
> > against rows inserted. The simplistic inference from those graphs is
> > that if you only inserted 5 million rows into each table, rather than 10
> > million rows then everything would be much quicker. I hope this doesn't
> > work, but could you try that to see if it works? I'd like to rule out a
> > function of "number of rows" as an issue, or focus in on it depending
> > upon the results.

Any chance of running a multi-table load of 4 million rows per table,
leaving the test running for at least 3 tables' worth (12M+ rows)?

> >
> > Q: Please can you confirm that the discontinuity on the graph at around
> > 5000 elapsed seconds matches EXACTLY with the switch from one table to
> > another? That is an important point.
>
> Well, the change over happens at 51593.395205 seconds :-)  Here's two
> lines from the results with row count and time added:
>
> 10000000    51584.9818912    8.41331386566
> 10000500    51593.395205    0.416964054108
>
> Note that 10M is when it swaps.  I see no reason to interpret it
> differently, so it seems to be totally based around switching tables
> (and thereby indices).

OK, but do you have some other external knowledge that it is definitely
happening at that time? Your argument above seems slightly circular to
me.

This is really important because we need to know whether the
discontinuity ties in with the table switch, or with some other event.
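
A minimal sketch of locating the discontinuity from the timing data alone, so that it can then be compared against an independent record of the table switch (for example, a timestamp logged by the loader at the moment it changes tables). It assumes the results file has three whitespace-separated columns, as in the two lines quoted above: row count, cumulative elapsed seconds, and time for that batch. The function names are hypothetical.

```python
def parse_results(text):
    """Parse lines of 'row_count elapsed_s batch_s' into tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # ignore blank or malformed lines
            rows.append((int(parts[0]), float(parts[1]), float(parts[2])))
    return rows


def largest_drop(rows):
    """Return (drop_s, row_count_before, row_count_after) for the largest
    fall in batch time between consecutive batches, or None if there are
    fewer than two rows."""
    best = None
    for prev, cur in zip(rows, rows[1:]):
        drop = prev[2] - cur[2]
        if best is None or drop > best[0]:
            best = (drop, prev[0], cur[0])
    return best
```

This only shows where the batch time falls; it says nothing about why. The row count it reports still has to be checked against something other than the graph itself.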

Have you run this for more than 2 files, say 3 or more?

Do you COMMIT after every 500 rows?

> > Q: How many data files are there for these relations? Wouldn't be two,
> > by any chance, when we have 10 million rows in them?
>
> I allow PostgreSQL to manage all the data files itself, so here's the
> default tablespace:
>
> total 48
> drwx------  2 pgsql pgsql 4096 Jan 26 20:59 1
> drwx------  2 pgsql pgsql 4096 Dec 17 19:15 17229
> drwx------  2 pgsql pgsql 4096 Feb 16 17:55 26385357
> drwx------  2 pgsql pgsql 4096 Mar 24 23:56 26425059
> drwx------  2 pgsql pgsql 8192 Mar 28 11:31 26459063
> drwx------  2 pgsql pgsql 8192 Mar 31 23:54 26475755
> drwx------  2 pgsql pgsql 4096 Apr  4 15:07 26488263
> [root@bigbird base]# du
> 16624   ./26425059
> 5028    ./26385357
> 5660    ./26459063
> 4636    ./17229
> 6796    ./26475755
> 4780    ./1
> 1862428 ./26488263
> 1905952 .

OK. Please run:

    cd $PGDATA/base/26488263
    ls -l

I'm looking for the number of files associated with each inherited table
(heap).
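
As a rough sketch, the file names from that listing could be tallied per relation as follows. It assumes the usual on-disk naming, where a relation's first segment is named after its relfilenode and each further 1 GB segment gets a `.1`, `.2`, ... suffix. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches "12345" or "12345.1": a relfilenode with an optional segment suffix.
SEGMENT_RE = re.compile(r"^(\d+)(?:\.(\d+))?$")


def count_segments(filenames):
    """Return {relfilenode: number of segment files} for the given names."""
    counts = Counter()
    for name in filenames:
        match = SEGMENT_RE.match(name)
        if match:  # skip anything that is not a relation segment
            counts[match.group(1)] += 1
    return dict(counts)
```

Passing it the result of `os.listdir()` on the database directory gives one count per relfilenode; which relfilenode belongs to which inherited table would still need to be looked up in `pg_class`.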

> > Q: What is the average row length?
> > About 150-160 bytes?
>
> Raw data is around 150 bytes; after insertion, I'd need to do some
> other calculations.

By my calculations, you should have just 2 data files per 10M rows for
the main table. The performance degradation seems to coincide with the
point where we move to inserting into the second of the two files.
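
The arithmetic behind that estimate can be sketched roughly as below. The 8 kB page size and 1 GB segment size are the PostgreSQL defaults; the per-page and per-row overhead figures are approximate assumptions for illustration, not measured values, and the function name is hypothetical.

```python
import math

PAGE_SIZE = 8192          # default PostgreSQL block size, in bytes
SEGMENT_SIZE = 1024 ** 3  # heap files are split into 1 GB segments by default
PAGE_HEADER = 24          # ASSUMED per-page header overhead (approximate)
TUPLE_OVERHEAD = 32       # ASSUMED per-row header plus line pointer (approximate)


def estimate_segments(rows, row_bytes):
    """Estimate how many 1 GB heap files a table of `rows` rows needs."""
    per_row = row_bytes + TUPLE_OVERHEAD
    rows_per_page = (PAGE_SIZE - PAGE_HEADER) // per_row
    pages = math.ceil(rows / rows_per_page)
    return math.ceil(pages * PAGE_SIZE / SEGMENT_SIZE)
```

Under these assumptions, 10 million rows of about 155 bytes come to roughly 1.8 GB of heap, which spills into a second file, while 4 million rows stay within a single file.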

I'm not looking for explanations yet, just examining coincidences and
trying to predict the behaviour based upon conjectures.

Best Regards, Simon Riggs

