Thread: Query much faster with enable_seqscan=0
Hello,
I have received some help from the IRC channel, however, the problem still exists. When running the following query with enable_seqscan set to 0, it takes less than a second, whereas with it set to 1, the query returns in 14 seconds. The machine itself has 8GB of RAM and is running PostgreSQL 9.0 on Debian Lenny. The database size is about 7GB.
Query:
SELECT tr.id, tr.sid
FROM test_registration tr
INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
GROUP BY tr.id, tr.sid
demo=# \d test_registration
                     Table "public.test_registration"
         Column         |            Type             |       Modifiers
------------------------+-----------------------------+------------------------
 id                     | uuid                        | not null
 sid                    | character varying(36)       | not null
 created_date           | timestamp without time zone | not null default now()
 modified_date          | timestamp without time zone | not null
 test_administration_id | uuid                        | not null
 teacher_number         | character varying(15)       |
 test_version_id        | uuid                        |
Indexes:
    "test_registration_pkey" PRIMARY KEY, btree (id)
    "test_registration_sid_key" UNIQUE, btree (sid, test_administration_id)
    "test_registration_teacher" btree (teacher_number)
    "test_registration_test_id" btree (test_administration_id)

demo=# \d test_registration_result
       Table "public.test_registration_result"
        Column        |         Type          | Modifiers
----------------------+-----------------------+-----------
 answer               | character varying(15) |
 question_id          | uuid                  | not null
 score                | double precision      |
 test_registration_id | uuid                  | not null
Indexes:
    "test_registration_result_pkey" PRIMARY KEY, btree (question_id, test_registration_id)
    "test_registration_result_answer" btree (test_registration_id, answer, score)
    "test_registration_result_test" btree (test_registration_id)
Explain Analyze:

 HashAggregate  (cost=951169.97..951198.37 rows=2840 width=25) (actual time=14669.039..14669.843 rows=2972 loops=1)
   ->  Hash Join  (cost=2988.07..939924.85 rows=2249024 width=25) (actual time=551.464..14400.061 rows=638980 loops=1)
         Hash Cond: (r.test_registration_id = tr.id)
         ->  Seq Scan on test_registration_result r  (cost=0.00..681946.72 rows=37199972 width=16) (actual time=0.015..6073.101 rows=37198734 loops=1)
         ->  Hash  (cost=2952.57..2952.57 rows=2840 width=25) (actual time=2.516..2.516 rows=2972 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 160kB
               ->  Bitmap Heap Scan on test_registration tr  (cost=44.29..2952.57 rows=2840 width=25) (actual time=0.528..1.458 rows=2972 loops=1)
                     Recheck Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
                     ->  Bitmap Index Scan on test_registration_test_administration_id  (cost=0.00..43.58 rows=2840 width=0) (actual time=0.507..0.507 rows=2972 loops=1)
                           Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
 Total runtime: 14670.337 ms
(11 rows)

real    0m14.698s
user    0m0.000s
sys     0m0.008s
With "set enable_seqscan=0;" SET QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=1225400.19..1225428.59 rows=2840 width=25) (actual time=748.397..749.160 rows=2972 loops=1) -> Nested Loop (cost=0.00..1214155.07 rows=2249024 width=25) (actual time=0.107..465.165 rows=638980 loops=1) -> Index Scan using test_registration_test_administration_id on test_registration tr (cost=0.00..4413.96 rows=2840 width=25) (actual time=0.050..1.610 rows=2972 loops=1) Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid) -> Index Scan using test_registration_result_answer on test_registration_result r (cost=0.00..416.07 rows=792 width=16) (actual time=0.019..0.106 rows=215 loops=2972) Index Cond: (r.test_registration_id = tr.id) Total runtime: 749.745 ms (7 rows) real 0m0.759s user 0m0.008s sys 0m0.000s
The following parameters are changed in postgresql.conf and I have routinely vacuum analyzed the tables and database:
shared_buffers = 2048MB
work_mem = 8MB
maintenance_work_mem = 256MB
wal_buffers = 640kB
random_page_cost = 4.0
effective_cache_size = 7000MB
default_statistics_target = 200
free -m:
             total       used       free     shared    buffers     cached
Mem:          8003       7849        153          0         25       7555
-/+ buffers/cache:        268       7735
Swap:         7640          0       7639
Any help would be appreciated. Thank you very much.
Ogden
Your DB is more than likely cached. You should adjust your page costs to better reflect reality, and then the planner can make more accurate estimates and choose the proper plan.

Cheers,
Ken

On Tue, Sep 21, 2010 at 12:32:01PM -0500, Ogden wrote:
> [...]
I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?

Thank you

Ogden

On Sep 21, 2010, at 1:06 PM, Kenneth Marshall wrote:
> Your DB is more than likely cached. You should adjust your
> page costs to better reflect reality, and then the planner
> can make more accurate estimates and choose the proper plan.
>
> Cheers,
> Ken
>
> [...]
On 2010-09-21 20:21, Ogden wrote:
> I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?

It should be lowered to a bit over seq_page_cost. And, more importantly, make sure that you have updated your statistics - run "ANALYZE";

-- 
Jesper
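For reference, the effect of those settings can be tried per-session before touching postgresql.conf. A minimal sketch reusing the query from upthread (the specific values are illustrative, not prescribed here):

SET seq_page_cost = 1.0;
SET random_page_cost = 2.0;   -- closer to seq_page_cost than the 4.0 default

-- Refresh planner statistics on the tables involved.
ANALYZE test_registration;
ANALYZE test_registration_result;

-- See which plan the planner now picks.
EXPLAIN ANALYZE
SELECT tr.id, tr.sid
FROM test_registration tr
INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
GROUP BY tr.id, tr.sid;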
How odd, I set the following:

seq_page_cost = 1.0
random_page_cost = 2.0

And now the query runs in milliseconds as opposed to 14 seconds. Could this really be the change? I am running ANALYZE now - how often is it recommended to do this?

Thank you

Ogden

On Sep 21, 2010, at 1:51 PM, Jesper Krogh wrote:
> On 2010-09-21 20:21, Ogden wrote:
>> I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?
>
> It should be lowered to a bit over seq_page_cost. And, more importantly, make sure that you have updated your statistics - run "ANALYZE";
>
> -- 
> Jesper
On Tue, 2010-09-21 at 14:02 -0500, Ogden wrote:
> How odd, I set the following:
>
> seq_page_cost = 1.0
> random_page_cost = 2.0
>
> And now the query runs in milliseconds as opposed to 14 seconds. Could this really be the change? I am running ANALYZE now - how often is it recommended to do this?

PostgreSQL's defaults are based on extremely small (some would say non-production) databases. As a matter of course I always recommend bringing seq_page_cost and random_page_cost more in line.

However, you may want to try moving random_page_cost back to 4 and increasing cpu_tuple_cost instead.

Sincerely,

Joshua D. Drake

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
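A sketch of that alternative experiment (the cpu_tuple_cost value below is purely illustrative; its default is 0.01):

SET random_page_cost = 4.0;   -- back to the default
SET cpu_tuple_cost = 0.05;    -- make per-row CPU work look relatively pricier
-- then re-run EXPLAIN ANALYZE on the problem query and compare the plans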
Joshua D. Drake wrote:
> PostgreSQL's defaults are based on extremely small (some would say
> non-production) databases. As a matter of course I always
> recommend bringing seq_page_cost and random_page_cost more in line.

Also, they presume that not all of your data is going to be in memory, and that the query optimizer needs to be careful about what it does and doesn't pull from disk. If that's not the case, like here where there's 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost and random_page_cost can make sense. Don't be afraid that lowering below 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect reality here.

-- 
Greg Smith, 2ndQuadrant US
greg@2ndQuadrant.com
Baltimore, MD
PostgreSQL Training, Services and Support
www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"
Pre-ordering at: https://www.packtpub.com/postgresql-9-0-high-performance/book
On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:
> Also, they presume that not all of your data is going to be in memory, and that the query optimizer needs to be careful about what it does and doesn't pull from disk. If that's not the case, like here where there's 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost and random_page_cost can make sense. Don't be afraid that lowering below 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect reality here.

I have done just that, per your recommendations, and now what took 14 seconds takes less than a second, so it was certainly these figures I messed around with. I have set:

seq_page_cost = 0.01
random_page_cost = 0.02
cpu_tuple_cost = 0.01

Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.

I truly appreciate everyone's help.

Ogden
Greg Smith <greg@2ndquadrant.com> writes:
> ... the query optimizer needs to be careful about what it does and
> doesn't pull from disk. If that's not the case, like here where there's
> 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost
> and random_page_cost can make sense. Don't be afraid that lowering
> below 1.0 is going too far--something more like 0.01 for sequential and
> 0.02 for random may actually reflect reality here.

If you are tuning for an all-in-RAM situation, you should set random_page_cost equal to seq_page_cost (and usually set both smaller than 1). By definition, those costs are equal if you're fetching from RAM. If it's only mostly-in-RAM then keeping random_page_cost a bit higher makes sense.

			regards, tom lane
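A minimal sketch of that advice (the specific values are illustrative judgment calls, not prescriptions from this thread):

-- All-in-RAM tuning: random and sequential fetches cost the same,
-- and both are cheap relative to per-tuple CPU costs.
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;   -- equal to seq_page_cost, per Tom's point

-- Mostly-in-RAM variant: keep random access slightly more expensive.
-- SET seq_page_cost = 0.1;
-- SET random_page_cost = 0.2;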
Ogden <lists@darkstatic.com> writes:
> SELECT tr.id, tr.sid
> FROM test_registration tr
> INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
> WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
> GROUP BY tr.id, tr.sid

Seeing that tr.id is a primary key, I think you might be a lot better off if you avoided the inner join and group by. I think what you really want here is something like

SELECT tr.id, tr.sid
FROM test_registration tr
WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
  AND EXISTS(SELECT 1 FROM test_registration_result r
             WHERE tr.id = r.test_registration_id)

			regards, tom lane
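The gain comes from EXISTS making this a semi-join: the executor can stop probing test_registration_result at the first match per registration, instead of joining all ~215 result rows per registration and collapsing them again with GROUP BY. A quick sketch of how to confirm the plan change (the exact node names vary by version and statistics):

EXPLAIN ANALYZE
SELECT tr.id, tr.sid
FROM test_registration tr
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
  AND EXISTS (SELECT 1 FROM test_registration_result r
              WHERE tr.id = r.test_registration_id);
-- Expect a semi-join variant (e.g. a Nested Loop Semi Join over the index)
-- in place of the full Hash Join + HashAggregate of the original query.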
On Sep 21, 2010, at 2:34 PM, Ogden wrote:
> I have done just that, per your recommendations, and now what took 14 seconds takes less than a second, so it was certainly these figures I messed around with. I have set:
>
> seq_page_cost = 0.01
> random_page_cost = 0.02
> cpu_tuple_cost = 0.01
>
> [...]

I spoke too soon - I came in this morning and reran the query that was sped up so much yesterday after tweaking those numbers. The first time I ran it this morning, it took 16 seconds, whereas every subsequent run was a matter of 2 seconds. I assume there is OS caching going on for those results. Is this normal, or could it also be the speed of my disks causing a lag when I first run it (it's RAID 5 across 6 disks)? Is there any explanation for this, or what should those settings really be? Perhaps 0.01 is too low?

Thank you

Ogden
On Sep 22, 2010, at 6:36 AM, Ogden wrote:
> I spoke too soon - I came in this morning and reran the query that was sped up so much yesterday after tweaking those numbers. The first time I ran it this morning, it took 16 seconds, whereas every subsequent run was a matter of 2 seconds. [...] Perhaps 0.01 is too low?

When not cached, the plan with sequential scans will almost always be much faster. When cached in memory, the ones using indexes are almost always faster. The tuning parameters are essentially telling postgres the likelihood of finding things on disk versus in memory. The default parameters essentially say "not likely in memory, with a somewhat slow disk".
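One crude way to separate the cold-cache disk cost from the plan cost (a sketch; PostgreSQL 9.0 has no pg_prewarm, so a full pass over the table stands in for it):

-- In psql: time the query cold, warm the caches, then time it again.
\timing on
SELECT count(*) FROM test_registration_result;  -- pulls the big table through the OS cache
-- Now re-run the problem query twice; the gap between the first (cold)
-- and later (warm) timings is the disk-read cost that the page-cost
-- settings are trying to model.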
On Wed, Sep 22, 2010 at 9:36 AM, Ogden <lists@darkstatic.com> wrote:
> I spoke too soon - I came in this morning and reran the query that was sped up so much yesterday after tweaking those numbers. The first time I ran it this morning, it took 16 seconds, whereas every subsequent run was a matter of 2 seconds. I assume there is OS caching going on for those results. Is this normal, or could it also be the speed of my disks causing a lag when I first run it (it's RAID 5 across 6 disks)? Is there any explanation for this, or what should those settings really be? Perhaps 0.01 is too low?

Yeah, I think those numbers are a bit low. Your database probably isn't fully cached. Keep in mind there's going to be some fluctuation as to what is and is not in cache, and you can't expect whatever plan the planner picks to be exactly perfect for both cases. I might try something more like 0.2 / 0.1.

If you really need the query to be fast, though, you might need to do more than jigger the page costs. Did you try Tom's suggested rewrite?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Sep 21, 2010, at 6:30 PM, Tom Lane wrote:
> Seeing that tr.id is a primary key, I think you might be a lot better
> off if you avoided the inner join and group by. I think what you really
> want here is something like
>
> SELECT tr.id, tr.sid
> FROM test_registration tr
> WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>   AND EXISTS(SELECT 1 FROM test_registration_result r
>              WHERE tr.id = r.test_registration_id)

Thank you for this suggestion. However, what if I wanted some columns from test_registration_result? This wouldn't work if, for example, I wanted test_registration_result.answer to be fetched; hence I had to have a JOIN with test_registration_result and a GROUP BY. I still am not happy with my query - the EXISTS version executes at great speed, but I cannot retrieve any of the columns from that table.

Thank you

Ogden
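For illustration, a sketch of two ways to pull columns from test_registration_result without the join-then-GROUP-BY (untested against this schema; DISTINCT ON is PostgreSQL-specific):

-- If one row per (registration, result) pair is acceptable, the GROUP BY
-- can simply be dropped: each result row is returned as-is.
SELECT tr.id, tr.sid, r.answer, r.score
FROM test_registration tr
INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid;

-- If exactly one row per registration is wanted, pick which result row
-- with DISTINCT ON (here the first answer in sort order - an arbitrary choice).
SELECT DISTINCT ON (tr.id) tr.id, tr.sid, r.answer
FROM test_registration tr
INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
ORDER BY tr.id, r.answer;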
On Tue, Sep 21, 2010 at 4:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ogden <lists@darkstatic.com> writes:
>> SELECT tr.id, tr.sid
>> FROM test_registration tr
>> INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
>> WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>> GROUP BY tr.id, tr.sid
>
> Seeing that tr.id is a primary key, I think you might be a lot better
> off if you avoided the inner join and group by. I think what you really
> want here is something like
>
> SELECT tr.id, tr.sid
> FROM test_registration tr
> WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>   AND EXISTS(SELECT 1 FROM test_registration_result r
>              WHERE tr.id = r.test_registration_id)
>
> 			regards, tom lane
Could you explain the logic behind why this structure is better than the other? Is it always the case that one should just use the 'exists(select 1 from x...)' structure when trying to strip rows that don't join, or is that only the case when you know that the rows which do join are a fairly limited subset? Does the same advantage exist if filtering rows in the joined table on some criteria, or is it better at that point to use an inner join and add a where clause to filter the joined rows?
select table1.columns
from table1, table2
where table1.column = 'some_value'
and table1.fk = table2.pk
AND table2.column = 'some_other_value'

versus

select table1.columns
from table1
where table1.column = 'some_value'
and exists(select 1 from table2 where table1.fk = table2.pk
and table2.column ='some_other_value')
On Tue, Oct 12, 2010 at 10:28 PM, Samuel Gendler <sgendler@ideasculptor.com> wrote:
> Could you explain the logic behind why this structure is better than the
> other? Is it always the case that one should just use the
> 'exists(select 1 from x...)' structure when trying to strip rows that don't
> join, or is that only the case when you know that the rows which do join
> are a fairly limited subset? [...]

I don't think there's much difference between those two cases. I think Tom's point was that GROUP BY can be expensive - which it certainly can be. It's absolutely necessary and unavoidable for certain queries, of course, but don't include it unless you need it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
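To make that concrete, a sketch of when the GROUP BY earns its keep versus when the semi-join form is enough (table and column names from earlier in the thread; values illustrative):

-- GROUP BY is necessary here: we actually aggregate over the joined result rows.
SELECT tr.id, tr.sid, count(*) AS n_answers, avg(r.score) AS avg_score
FROM test_registration tr
INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
GROUP BY tr.id, tr.sid;

-- GROUP BY is not needed here: we only test for existence, so EXISTS avoids
-- materializing and re-collapsing ~639k joined rows.
SELECT tr.id, tr.sid
FROM test_registration tr
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
  AND EXISTS (SELECT 1 FROM test_registration_result r
              WHERE tr.id = r.test_registration_id);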