Thread: performance while importing a very large data set into database
Hello Everyone,
I have a very big database, around 15 million in size, and the dump file is around 12 GB.
While importing this dump into the database I have noticed that initially query response time is very slow, but it does improve with time.
Any suggestions to improve performance after the dump is imported into the database will be highly appreciated!
Regards,
Ashish
On 12/02/2009 11:31 PM, Ashish Kumar Singh wrote:
> While importing this dump into the database I have noticed that initially
> query response time is very slow, but it does improve with time.
>
> Any suggestions to improve performance after the dump is imported into
> the database will be highly appreciated!

Analyse your tables?

-J
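Something along these lines right after the restore finishes is usually enough; the table name below is only a placeholder:

    -- Refresh planner statistics for the whole database after the restore
    ANALYZE;

    -- or target specific large tables (hypothetical name)
    ANALYZE VERBOSE bigtable;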
Re: performance while importing a very large data set into database
From: "Ing. Marcos Luís Ortíz Valmaseda"
Ashish Kumar Singh wrote:
> Hello Everyone,
>
> I have a very big database, around 15 million in size, and the dump
> file is around 12 GB.
>
> While importing this dump into the database I have noticed that initially
> query response time is very slow, but it does improve with time.
>
> Any suggestions to improve performance after the dump is imported into
> the database will be highly appreciated!
>
> Regards,
>
> Ashish

My suggestions are:
1. After the db restore, you can run a manual VACUUM ANALYZE on your big tables to clear out any dead rows.
2. Then you can REINDEX your big tables in any case where you use them.
3. Then apply a CLUSTER command on the right tables that have those indexes.

Regards

--
-------------------------------------
"TIP 4: Don't 'kill -9' the postmaster"
Ing. Marcos Luís Ortíz Valmaseda
PostgreSQL System DBA
Centro de Tecnologías de Almacenamiento y Análisis de Datos (CENTALAD)
Universidad de las Ciencias Informáticas
Linux User # 418229
http://www.postgresql-es.org
http://www.postgresql.org
http://www.planetpostgresql.org
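A rough sketch of that sequence, assuming a large table called orders with an index orders_customer_idx (both names are made up for illustration):

    VACUUM ANALYZE orders;                      -- step 1: collect statistics (a fresh restore has no dead rows to remove)
    REINDEX TABLE orders;                       -- step 2: rebuild the table's indexes
    CLUSTER orders USING orders_customer_idx;   -- step 3: physically reorder the table by that index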
On Wed, Dec 2, 2009 at 4:31 PM, Ashish Kumar Singh <ashishkumar.singh@altair.com> wrote:
> Hello Everyone,
>
> I have a very big database, around 15 million in size, and the dump file
> is around 12 GB.
>
> While importing this dump into the database I have noticed that initially
> query response time is very slow, but it does improve with time.
>
> Any suggestions to improve performance after the dump is imported into the
> database will be highly appreciated!

This is pretty normal. When the db first starts up or right after a
load it has nothing in its buffers or the kernel cache. As you access
more and more data, the db and OS learn what is most commonly
accessed and start holding onto that data, throwing the less used
stuff away to make room for it. Our production dbs run at a load
factor of about 4 to 6, but when first started and put in the loop
they'll hit 25 or 30 and have slow queries for a minute or so.

Having a fast IO subsystem will help offset some of this, and
sometimes "select * from bigtable" might too.
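If you'd rather warm the caches deliberately than wait, a crude sketch (bigtable is a placeholder) is to force a full read without shipping every row back to the client:

    -- Sequentially reads the whole table so the kernel cache and shared
    -- buffers pick up its pages, but returns only a single row
    SELECT count(*) FROM bigtable;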
On Sat, Dec 5, 2009 at 7:16 AM, "Ing. Marcos Luís Ortíz Valmaseda" <mlortiz@uci.cu> wrote:
> Ashish Kumar Singh wrote:
>> I have a very big database, around 15 million in size, and the dump
>> file is around 12 GB.
>>
>> While importing this dump into the database I have noticed that initially
>> query response time is very slow, but it does improve with time.
>
> My suggestions are:
> 1. After the db restore, you can run a manual VACUUM ANALYZE on your big
> tables to clear out any dead rows.

Well, there should be no dead rows, it's a fresh restore, so just plain
analyze would be enough. Note that autovacuum will kick in eventually
and do this for you.

> 2. Then you can REINDEX your big tables in any case where you use them.

Again, a freshly loaded db does not need to be reindexed. The indexes
are fresh and new and clean.

> 3. Then apply a CLUSTER command on the right tables that have those indexes.

Now that's useful, but if you're gonna cluster, do it FIRST, then
analyze the tables.
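So, if clustering is wanted at all, the order would look roughly like this (same made-up placeholder names as above):

    CLUSTER orders USING orders_customer_idx;   -- rewrite the table in index order first
    ANALYZE orders;                              -- then gather statistics on the rewritten table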
Re: performance while importing a very large data set into database
From: Pierre Frédéric Caillaud
>> I have a very big database, around 15 million in size, and the dump file
>> is around 12 GB.
>>
>> While importing this dump into the database I have noticed that initially
>> query response time is very slow, but it does improve with time.
>>
>> Any suggestions to improve performance after the dump is imported into the
>> database will be highly appreciated!
>
> This is pretty normal. When the db first starts up or right after a
> load it has nothing in its buffers or the kernel cache. As you access
> more and more data, the db and OS learn what is most commonly
> accessed and start holding onto that data, throwing the less used
> stuff away to make room for it. Our production dbs run at a load
> factor of about 4 to 6, but when first started and put in the loop
> they'll hit 25 or 30 and have slow queries for a minute or so.
>
> Having a fast IO subsystem will help offset some of this, and
> sometimes "select * from bigtable" might too.

Maybe it's the updating of the hint bits?...
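If hint bits are the cause, the first full pass over each freshly loaded table has to set them and dirty those pages again, which would explain the slow initial scans. A hedged way to pay that cost up front, right after the restore (table name is a placeholder):

    -- VACUUM (or any full read of the table) sets the hint bits once,
    -- so later queries are not slowed down by writing them lazily
    VACUUM bigtable;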
Does postgres have the concept of "pinning" procs, functions, etc. to
cache? As you mention, typically frequently used statements are cached,
improving performance. If waiting for the DBMS to do this is not an
option, then pinning critical ones should improve performance immediately
following startup. This is an approach I have used with Oracle to address
this situation.

Kris

On 5-Dec-09, at 15:42, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> This is pretty normal. When the db first starts up or right after a
> load it has nothing in its buffers or the kernel cache. As you access
> more and more data, the db and OS learn what is most commonly
> accessed and start holding onto that data, throwing the less used
> stuff away to make room for it.
>
> Having a fast IO subsystem will help offset some of this, and
> sometimes "select * from bigtable" might too.
Kris Kewley wrote:
> Does postgres have the concept of "pinning" procs, functions, etc. to
> cache?

No. Everything that's in PostgreSQL's cache gets a usage count attached
to it. When the buffer is used by something else, that count gets
incremented. And when new buffers need to be allocated, the process that
searches for them decrements usage counts until it finds one with a count
of 0, which is then evicted.

There's no way to pin things using this scheme; the best you can do is
try to access the data in advance and/or regularly enough that its usage
count never drops too far.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
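For example, "accessing the data in advance" can be as simple as touching the hot tables from a script after startup; the table name below is hypothetical. In later PostgreSQL releases (9.4 and up, so after this thread) the pg_prewarm contrib module can load a relation straight into shared buffers:

    -- Simple warm-up: read the table so its pages land in cache
    SELECT count(*) FROM hot_table;

    -- PostgreSQL 9.4+ with the contrib module installed
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('hot_table');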