Thread: performance while importing a very large data set into database
Hello Everyone,
I have a very big database, around 15 million in size, and the dump file is around 12 GB.
While importing this dump into the database I have noticed that initially query response time is very slow, but it does improve with time.
Any suggestions to improve performance after the dump is imported into the database will be highly appreciated!
Regards,
Ashish
On 12/02/2009 11:31 PM, Ashish Kumar Singh wrote:
> While importing this dump into the database I have noticed that initially
> query response time is very slow, but it does improve with time.
>
> Any suggestions to improve performance after the dump is imported into
> the database will be highly appreciated!

Analyse your tables?

-J
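Something along these lines right after the restore finishes is usually enough; the table name below is only a placeholder:

    -- Refresh planner statistics for the whole database after the restore
    ANALYZE;

    -- or target specific large tables (hypothetical name)
    ANALYZE VERBOSE bigtable;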
Re: performance while importing a very large data set into database
From: "Ing. Marcos Luís Ortíz Valmaseda"
Ashish Kumar Singh wrote:
> Hello Everyone,
>
> I have a very big database, around 15 million in size, and the dump
> file is around 12 GB.
>
> While importing this dump into the database I have noticed that initially
> query response time is very slow, but it does improve with time.
>
> Any suggestions to improve performance after the dump is imported into
> the database will be highly appreciated!
>
> Regards,
>
> Ashish

My suggestions are:
1. After the db restore, you can run a manual VACUUM ANALYZE on your big tables to clear out any dead rows.
2. Then you can REINDEX your big tables in any case where you use them.
3. Then apply a CLUSTER command on the right tables that have those indexes.

Regards

--
-------------------------------------
"TIP 4: Don't 'kill -9' the postmaster"
Ing. Marcos Luís Ortíz Valmaseda
PostgreSQL System DBA
Centro de Tecnologías de Almacenamiento y Análisis de Datos (CENTALAD)
Universidad de las Ciencias Informáticas
Linux User # 418229
http://www.postgresql-es.org
http://www.postgresql.org
http://www.planetpostgresql.org
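A rough sketch of that sequence, assuming a large table called orders with an index orders_customer_idx (both names are made up for illustration):

    VACUUM ANALYZE orders;                      -- step 1: collect statistics (a fresh restore has no dead rows to remove)
    REINDEX TABLE orders;                       -- step 2: rebuild the table's indexes
    CLUSTER orders USING orders_customer_idx;   -- step 3: physically reorder the table by that index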
On Wed, Dec 2, 2009 at 4:31 PM, Ashish Kumar Singh <ashishkumar.singh@altair.com> wrote:
> Hello Everyone,
>
> I have a very big database, around 15 million in size, and the dump file
> is around 12 GB.
>
> While importing this dump into the database I have noticed that initially
> query response time is very slow, but it does improve with time.
>
> Any suggestions to improve performance after the dump is imported into the
> database will be highly appreciated!

This is pretty normal. When the db first starts up or right after a
load it has nothing in its buffers or the kernel cache. As you access
more and more data, the db and OS learn what is most commonly
accessed and start holding onto that data, throwing the less used
stuff away to make room for it. Our production dbs run at a load
factor of about 4 to 6, but when first started and put in the loop
they'll hit 25 or 30 and have slow queries for a minute or so.

Having a fast IO subsystem will help offset some of this, and
sometimes "select * from bigtable" might too.
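If you'd rather warm the caches deliberately than wait, a crude sketch (bigtable is a placeholder) is to force a full read without shipping every row back to the client:

    -- Sequentially reads the whole table so the kernel cache and shared
    -- buffers pick up its pages, but returns only a single row
    SELECT count(*) FROM bigtable;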
On Sat, Dec 5, 2009 at 7:16 AM, "Ing. Marcos Luís Ortíz Valmaseda" <mlortiz@uci.cu> wrote:
> Ashish Kumar Singh wrote:
>> I have a very big database, around 15 million in size, and the dump
>> file is around 12 GB.
>>
>> While importing this dump into the database I have noticed that initially
>> query response time is very slow, but it does improve with time.
>
> My suggestions are:
> 1. After the db restore, you can run a manual VACUUM ANALYZE on your big
> tables to clear out any dead rows.

Well, there should be no dead rows, it's a fresh restore, so just plain
analyze would be enough. Note that autovacuum will kick in eventually
and do this for you.

> 2. Then you can REINDEX your big tables in any case where you use them.

Again, a freshly loaded db does not need to be reindexed. The indexes
are fresh and new and clean.

> 3. Then apply a CLUSTER command on the right tables that have those indexes.

Now that's useful, but if you're gonna cluster, do it FIRST, then
analyze the tables.
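So, if clustering is wanted at all, the order would look roughly like this (same made-up placeholder names as above):

    CLUSTER orders USING orders_customer_idx;   -- rewrite the table in index order first
    ANALYZE orders;                              -- then gather statistics on the rewritten table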
Re: performance while importing a very large data set into database
From: Pierre Frédéric Caillaud
>> I have a very big database, around 15 million in size, and the dump file
>> is around 12 GB.
>>
>> While importing this dump into the database I have noticed that initially
>> query response time is very slow, but it does improve with time.
>>
>> Any suggestions to improve performance after the dump is imported into the
>> database will be highly appreciated!
>
> This is pretty normal. When the db first starts up or right after a
> load it has nothing in its buffers or the kernel cache. As you access
> more and more data, the db and OS learn what is most commonly
> accessed and start holding onto that data, throwing the less used
> stuff away to make room for it. Our production dbs run at a load
> factor of about 4 to 6, but when first started and put in the loop
> they'll hit 25 or 30 and have slow queries for a minute or so.
>
> Having a fast IO subsystem will help offset some of this, and
> sometimes "select * from bigtable" might too.

Maybe it's the updating of the hint bits?...
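If hint bits are the cause, the first full pass over each freshly loaded table has to set them and dirty those pages again, which would explain the slow initial scans. A hedged way to pay that cost up front, right after the restore (table name is a placeholder):

    -- VACUUM (or any full read of the table) sets the hint bits once,
    -- so later queries are not slowed down by writing them lazily
    VACUUM bigtable;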
Does postgres have the concept of "pinning" procs, functions, etc. to
cache? As you mention, typically frequently used statements are cached,
improving performance. If waiting for the DBMS to do this is not an
option, then pinning critical ones should improve performance immediately
following startup. This is an approach I have used with Oracle to address
this situation.

Kris

On 5-Dec-09, at 15:42, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> This is pretty normal. When the db first starts up or right after a
> load it has nothing in its buffers or the kernel cache. As you access
> more and more data, the db and OS learn what is most commonly
> accessed and start holding onto that data, throwing the less used
> stuff away to make room for it.
>
> Having a fast IO subsystem will help offset some of this, and
> sometimes "select * from bigtable" might too.
Kris Kewley wrote:
> Does postgres have the concept of "pinning" procs, functions, etc. to
> cache?

No. Everything that's in PostgreSQL's cache gets a usage count attached
to it. When the buffer is used by something else, that count gets
incremented. And when new buffers need to be allocated, the process that
searches for them decrements usage counts until it finds one with a count
of 0, which is then evicted.

There's no way to pin things using this scheme; the best you can do is
try to access the data in advance and/or regularly enough that its usage
count never drops too far.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
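For example, "accessing the data in advance" can be as simple as touching the hot tables from a script after startup; the table name below is hypothetical. In later PostgreSQL releases (9.4 and up, so after this thread) the pg_prewarm contrib module can load a relation straight into shared buffers:

    -- Simple warm-up: read the table so its pages land in cache
    SELECT count(*) FROM hot_table;

    -- PostgreSQL 9.4+ with the contrib module installed
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('hot_table');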