TPC-H Scaling Factors X PostgreSQL Cluster Command - Mailing list pgsql-performance
From | Nelson Kotowski |
---|---|
Subject | TPC-H Scaling Factors X PostgreSQL Cluster Command |
Date | |
Msg-id | d34b24380704210754q53aa86c8l3a63211d81febbf7@mail.gmail.com Whole thread Raw |
Responses |
Re: TPC-H Scaling Factors X PostgreSQL Cluster Command
Re: TPC-H Scaling Factors X PostgreSQL Cluster Command |
List | pgsql-performance |
Hello everyone,
This is my first post in here, i am in need of some help...
Wel, i am running PostgreSQL 8.2.4, in a single node, in this machine:
Dell PowerEdge 1950
Intel Xeon 2.33 (2 CPUs)
4GB RAM
Sata HD 160GB
Debian Distribution
Some relevant parameters:
In Linux:
-> SHMMAX: 32MB
In postgresql.conf:
-> shared_buffers: 24MB
-> maintenance_work_mem: 16MB
I need to: build, load and do some optimization procedures in a TPC-H benchmark database.
So far, i need to do it in three different scale factors (1, 2 and 5GB databases).
My build process comprehends creating the tables without any foreign keys, indexes, etc. - Running OK!
Then, i load the data from the flat files generated through DBGEN software into these tables. - Running OK!
Finally, i run a "optimize" script that does the following:
- Alter the tables to add the mandatory foreign keys;
- Create all mandatory indexes;
- Cluster the orders table by the orders table index;
- Cluster the lineitem table by the lineitem table index;
- Vacuum the database;
- Analyze statistics.
This is the step which is causing me some headaches, mainly related to the 5GB database. I identified that the cluster command over the lineitem table (cluster idx_lineitem on lineitem) is the responsible. I got to this conclusion because when i run it in the 1GB and 2GB database i am able to complete this script in 10 and 30 minutes each. But when i run this command over the 5GB database, it simply seems as it won't end. I watched it running over 12 hours and nothing happened.
To investigate a bit, i tried to tune these parameters and these parameters only, and re-run the script (rebooted the machine and restarted it all over):
In Linux: SHMMAX -> Tuned it to 2GB via echo "..." > /proc/sys/kernel/shmmax
In postgresql.conf:
shared_buffers: 512MB
maintenance_work_mem: 800MB
I thought that this might improve the performance, but as a matter of fact, that's what happened:
1 GB database - cluster command time remains the same (more or less 10 minutes)
2 GB database - cluster command now takes 3 hours instead of 30 minutes! BAD
5 GB database - still can't complete the command in over 12 hours.
To add some info, i did a top command on the machine, i saw that the postmaster consumes all the "shared_buffers" configured in the physical memory (13,3% of the RAM --> 512MB of 4GB) but the total free mem is 0% (all the 4GB is used, but not by the postmaster), and no swap is in use, CPU is around 1% busy (this 1% is for the postmaster), and this machine is dedicate, for my personal use, and there's nothing else running but PostgreSQL.
Does anyone have any clues?
Thanks in advance,
Nelson P Kotowski Filho.
This is my first post in here, i am in need of some help...
Wel, i am running PostgreSQL 8.2.4, in a single node, in this machine:
Dell PowerEdge 1950
Intel Xeon 2.33 (2 CPUs)
4GB RAM
Sata HD 160GB
Debian Distribution
Some relevant parameters:
In Linux:
-> SHMMAX: 32MB
In postgresql.conf:
-> shared_buffers: 24MB
-> maintenance_work_mem: 16MB
I need to: build, load and do some optimization procedures in a TPC-H benchmark database.
So far, i need to do it in three different scale factors (1, 2 and 5GB databases).
My build process comprehends creating the tables without any foreign keys, indexes, etc. - Running OK!
Then, i load the data from the flat files generated through DBGEN software into these tables. - Running OK!
Finally, i run a "optimize" script that does the following:
- Alter the tables to add the mandatory foreign keys;
- Create all mandatory indexes;
- Cluster the orders table by the orders table index;
- Cluster the lineitem table by the lineitem table index;
- Vacuum the database;
- Analyze statistics.
This is the step which is causing me some headaches, mainly related to the 5GB database. I identified that the cluster command over the lineitem table (cluster idx_lineitem on lineitem) is the responsible. I got to this conclusion because when i run it in the 1GB and 2GB database i am able to complete this script in 10 and 30 minutes each. But when i run this command over the 5GB database, it simply seems as it won't end. I watched it running over 12 hours and nothing happened.
To investigate a bit, i tried to tune these parameters and these parameters only, and re-run the script (rebooted the machine and restarted it all over):
In Linux: SHMMAX -> Tuned it to 2GB via echo "..." > /proc/sys/kernel/shmmax
In postgresql.conf:
shared_buffers: 512MB
maintenance_work_mem: 800MB
I thought that this might improve the performance, but as a matter of fact, that's what happened:
1 GB database - cluster command time remains the same (more or less 10 minutes)
2 GB database - cluster command now takes 3 hours instead of 30 minutes! BAD
5 GB database - still can't complete the command in over 12 hours.
To add some info, i did a top command on the machine, i saw that the postmaster consumes all the "shared_buffers" configured in the physical memory (13,3% of the RAM --> 512MB of 4GB) but the total free mem is 0% (all the 4GB is used, but not by the postmaster), and no swap is in use, CPU is around 1% busy (this 1% is for the postmaster), and this machine is dedicate, for my personal use, and there's nothing else running but PostgreSQL.
Does anyone have any clues?
Thanks in advance,
Nelson P Kotowski Filho.
pgsql-performance by date: