Thread: Storing database in cluster (Memory)
Hi all,
As I am inserting 100 million rows daily into partitioned tables (partitioned day-wise), it is getting slower. Even the retrieval of data, a select statement on those tables, takes about 30 minutes. I have tried increasing the parameters in postgresql.conf, but still that doesn't help me much, as the number of rows is huge. Will the idea of storing the database in cluster (memory) increase the performance of insert/update/select on the tables in a situation like this? Thanks a lot in advance.
Regards
Roopa
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of roopa perumalraja
Sent: Tuesday, February 06, 2007 12:33 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Storing database in cluster (Memory)
Hello Roopa,
Are you doing any vacuum runs on these tables? Most of the time, degrading performance on heavily updated tables is caused by not performing any vacuum runs.
Greetings,
Matthias
roopa perumalraja wrote:
> Hi all,
>
> As I am inserting 100 million rows daily into partitioned tables
> (day-wise), it is getting slower.

What is - the inserts? By how much? What tables? What indexes? How are you inserting these rows?

> Even the retrieval of data, a select statement on those tables, takes
> about 30 minutes.

All selects take 30 minutes, regardless what work they do? Or do you have specific selects that are causing problems?

> I have tried increasing the parameters in postgresql.conf but still
> that doesn't help me much as the number of rows is huge.

What parameters, to what values? What hardware are you running on? What load do you place on it?

> Will the idea of storing the database in cluster (memory) increase
> the performance of insert/update/select on the tables in a situation
> like this? Thanks a lot in advance.

A better question would be "Why isn't my data being cached?". The operating system should cache regularly used files.

Still - more details please, Roopa, and we'll see if anyone can help you.

--
Richard Huxton
Archonet Ltd
You might take a look at index ANDing for speeding up your selects.
>>Hi all,
>>
>>As I am inserting 100million rows daily into partitioned tables (daily wise), it is getting slower.
>What is - the inserts? By how much? What tables? What indexes? How are you inserting these rows?
I take my words back about 100 million rows. The insert of 20 million rows every day takes only 10 minutes, as I use a COPY statement to copy into a temporary table from flat files, then do some manipulation to the data and insert it into the partitioned tables. I have solved that problem. I have partitioned the tables date-wise. (Is partitioning the tables monthly recommended?)
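In case the setup is useful to others on the list: date-wise partitioning in the 8.1/8.2 era is normally done with table inheritance plus CHECK constraints. A minimal sketch (table and column names here are made up for illustration, not Roopa's actual schema):

```sql
-- Parent table; child tables each hold one day of data.
CREATE TABLE ticks (
    trade_time  timestamp NOT NULL,
    price       numeric
);

-- One child partition per day, with a CHECK constraint describing its range.
CREATE TABLE ticks_2007_02_06 (
    CHECK (trade_time >= DATE '2007-02-06'
       AND trade_time <  DATE '2007-02-07')
) INHERITS (ticks);

-- With constraint exclusion enabled, date-filtered selects only scan
-- the matching child partitions:
SET constraint_exclusion = on;
SELECT count(*) FROM ticks WHERE trade_time >= DATE '2007-02-06';
```

Whether daily or monthly partitions are better depends on how much data each partition holds and how the queries filter; constraint exclusion only helps when the WHERE clause matches the partitions' CHECK constraints.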
The problem is with another insert: selecting data from one of the partitioned tables, doing some calculations, then inserting the result into another table. That is around 280,000 rows every day. This takes a really long time (almost a day) if somebody is doing something else with the database. If nobody is using the database, it takes almost two hours. Even a select statement on other tables in the database affects this insert. While inserting I use
BEGIN WORK;
SELECT foo1;
LOCK TABLE foo2 IN EXCLUSIVE MODE;
INSERT INTO foo2 (SELECT ...);
COMMIT WORK;
All the tables are indexed. I am using 4 indexes including the pkey index. Will dropping the indexes before inserting and reindexing after the insert help?
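For reference, the usual drop-and-recreate pattern only touches the secondary indexes (the pkey index stays, since dropping it means dropping the constraint). Index and column names below are hypothetical:

```sql
-- Drop the secondary indexes before the bulk insert.
DROP INDEX foo2_col_a_idx;
DROP INDEX foo2_col_b_idx;
DROP INDEX foo2_col_c_idx;

-- ... run the bulk INSERT INTO foo2 (SELECT ...) here ...

-- Rebuild each index in one pass, then refresh planner statistics.
CREATE INDEX foo2_col_a_idx ON foo2 (col_a);
CREATE INDEX foo2_col_b_idx ON foo2 (col_b);
CREATE INDEX foo2_col_c_idx ON foo2 (col_c);
ANALYZE foo2;
```

Building an index once over the finished table is often cheaper than maintaining it row-by-row during a large insert, but it briefly leaves concurrent selects without that index.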
>>Even the retrieval of data, a select statement on those tables, takes about 30 minutes.
>All selects take 30 minutes, regardless what work they do? Or do you have specific selects that are causing problems?
Any time more than one statement is running on the database, things take time. Say for example I am inserting 20 million rows into one table, and at the same time I try updating another table; that takes a lot of time.
>> I have tried increasing the parameters in postgresql.conf but still that doesn't help me much as the number of rows is huge.
>What parameters, to what values? What hardware are you running on? What load do you place on it?
parameters in postgresql.conf which I increased are
fsync on
wal_buffers 128
checkpoint_segments 256 (2 GB)
checkpoint_timeout 3600 (1 hour)
work_mem 128MB
maintenance_work_mem 512MB
I wanted to increase shared_buffers to 60,000 but I am not able to restart the database if I change it even to 10,000. It says 'could not start postmaster'. (max_connections is 100.)
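For what it's worth, that failure at higher shared_buffers settings is very often the kernel's shared-memory ceiling (SHMMAX) rather than PostgreSQL itself: at the default 8 kB block size, 10,000 buffers is roughly 80 MB and 60,000 roughly 470 MB, while many Linux kernels ship with SHMMAX as low as 32 MB. A sketch of checking and raising it on Linux (the 629145600-byte figure is just an example leaving headroom above 470 MB):

```shell
# Show the current shared-memory ceiling, in bytes
sysctl kernel.shmmax

# Raise it enough for shared_buffers = 60000 (60000 * 8192 bytes, ~470 MB)
sysctl -w kernel.shmmax=629145600

# To persist across reboots, add to /etc/sysctl.conf:
#   kernel.shmmax = 629145600
```

The server log normally spells this out when startup fails, including the exact segment size PostgreSQL asked for, which is why the logs are worth checking.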
>>Will the idea of storing the database in cluster (memory) increase the performance of
>> insert/update/select in the table in a situation like this? Thanks a lot in advance.
>A better question would be "Why isn't my data being cached?". The operating-system should cache regularly used files.
So I think the problem is that when two things are happening in the database at once, it takes a long time. That's the reason I was wondering if storing the database in cluster (memory) will solve the problem.
And I even run vacuum every night.
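If the nightly run is a plain VACUUM, a cron entry using the bundled vacuumdb tool can refresh the planner statistics at the same time (the database name below is hypothetical):

```shell
# crontab entry: nightly at 02:00, vacuum and ANALYZE the whole database
0 2 * * * vacuumdb --analyze mydb
```

Stale statistics can make the planner misjudge the large partitioned tables, so combining VACUUM with ANALYZE is the common habit.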
>Still - more details please roopa and we'll see if anyone can help you.
I hope this helps.
Thanks
--
Roopa
roopa perumalraja wrote:
>>> Hi all,
>>>
>>> As I am inserting 100 million rows daily into partitioned tables
>>> (day-wise), it is getting slower.
>
>> What is - the inserts? By how much? What tables? What indexes? How
>> are you inserting these rows?
>
> I take my words back about 100 million rows. The insert of 20 million
> rows every day takes only 10 minutes, as I use a COPY statement to copy
> into a temporary table from flat files, then do some manipulation to
> the data and insert it into the partitioned tables. I have solved that
> problem. I have partitioned the tables date-wise. (Is partitioning the
> tables monthly recommended?)

Depends upon your usage - some people partition daily if they have a lot of incoming data.

> The problem is with another insert: selecting data from one of the
> partitioned tables, doing some calculations, then inserting into
> another table. That is around 280,000 rows every day. This takes a
> really long time (almost a day) if somebody is doing something else
> with the database. If nobody is using the database, it takes almost
> two hours. Even a select statement on other tables in the database
> affects this insert. While inserting I use
>
> BEGIN WORK; SELECT foo1; LOCK TABLE foo2 IN EXCLUSIVE MODE;
> INSERT INTO foo2 (SELECT ...); COMMIT WORK;
>
> All the tables are indexed. I am using 4 indexes including the pkey
> index. Will dropping the indexes before inserting and reindexing
> after the insert help?

Maybe. It is often quicker to recreate an index if you are updating lots of rows.

>>> Even the retrieval of data, a select statement on those tables,
>>> takes about 30 minutes.
>
>> All selects take 30 minutes, regardless what work they do? Or do
>> you have specific selects that are causing problems?
>
> Any time more than one statement is running on the database, things
> take time. Say for example I am inserting 20 million rows into one
> table, and at the same time I try updating another table; that takes
> a lot of time.

You haven't given details of the select statements, or what the bottleneck is in your system. It sounds like it might be disk I/O. Have you tried monitoring activity with vmstat/iostat (assuming you're on a Linux/Unix system)?

>>> I have tried increasing the parameters in postgresql.conf but still
>>> that doesn't help me much as the number of rows is huge.
>
>> What parameters, to what values? What hardware are you running on?
>> What load do you place on it?
>
> parameters in postgresql.conf which I increased are
>
> fsync on
> wal_buffers 128
> checkpoint_segments 256 (2 GB)
> checkpoint_timeout 3600 (1 hour)

How did you calculate these? Is this based on your bulk data load?

> work_mem 128MB
> maintenance_work_mem 512MB

I assume you have enough memory to support these settings? Something over 4GB, yes?

> I wanted to increase shared_buffers to 60,000 but I am not able to
> restart the database if I change it even to 10,000. It says 'could
> not start postmaster'. (max_connections is 100.)

"It" says? What do the logs say?

>>> Will the idea of storing the database in cluster (memory)
>>> increase the performance of insert/update/select on the tables in
>>> a situation like this? Thanks a lot in advance.
>
>> A better question would be "Why isn't my data being cached?". The
>> operating system should cache regularly used files.
>
> So I think the problem is that when two things are happening in the
> database at once, it takes a long time. That's the reason I was
> wondering if storing the database in cluster will solve the problem.

The data should be cached anyway. That's the point.

You still haven't told us any of the information needed to help you. Some of the questions we still need answers for:
1. What queries are giving you trouble, and how many rows do they deal with?
2. What hardware is the database running on?
3. What load is on the system (try "vmstat 10" or similar)

--
Richard Huxton
Archonet Ltd