Impact of foreign keys on a simple count(*) ? - Mailing list pgsql-sql
From | C. Bensend |
---|---|
Subject | Impact of foreign keys on a simple count(*) ? |
Date | |
Msg-id | 51319.63.227.74.41.1097293075.squirrel@63.227.74.41 Whole thread Raw |
Responses |
Re: Impact of foreign keys on a simple count(*) ?
|
List | pgsql-sql |
Hey folks (long email, my apologies), I am wrapping up my schema upgrade, and I just noticed a real show-stopper for me... Here is the "before" table structure: email_id | integer | not nulljoejob | boolean | default falsebayes_poison | boolean | default falseperm_error | boolean | defaultfalsebait_id | integer | default 0num_recip | integer |default 1product | integer | default 1043date_received | timestamp with time zone |only_date | date |only_time | time with time zone |maildir_file | charactervarying(64) |errors_to | character varying(512) |reply_to | character varying(512) |spammer | character varying(512) |return_path | character varying(512) |received_from | character varying(512) |message_id | character varying(512) |target_domain | character varying(512) |mail_date | character varying(512) |x_priority | character varying(512) |x_msmail_priority| character varying(512) |x_mimeole | character varying(512) |mime_version | charactervarying(512) |subject | character varying(1024) |mail_to | character varying(2048) |x_mailer | character varying(2048) |content_type | character varying(2048) |user_agent | character varying(2048) |cc | character varying(2048) |comments | charactervarying(8192) |last_mod | timestamp without time zone | default ('now'::text)::timest amp(6) with time zone Indexes: "emails_pkey" primary key, btree (email_id) "emails_idx_bait_id" btree (bait_id) "emails_idx_mail_to" btree(mail_to) "emails_idx_only_date" btree (only_date) "emails_idx_only_time" btree (only_time) "emails_idx_product"btree (product) "emails_idx_received_from" btree (received_from) "emails_idx_subject" btree (subject) "emails_idx_target_domain" btree (target_domain) And here is the "after" structure: email_id | integer | not nulljoejob | boolean | default falsebayes_poison | boolean | default falseperm_error | boolean | default falsenum_recip | integer | default 1mydom_id | integer | default 0spamv_id | integer | default 1053spammer_id | integer | default 1003last_mod | timestamp with time zone | default ('now'::text)::timestampwith time zonebait_id | integer |product_id | integer |date_received | timestamp with time zone |only_date | date |only_time |time with time zone |maildir_file | character varying(128) |x_priority | character varying(128) |x_msmail_priority| character varying(128) |x_mimeole | character varying(128) |mime_version | charactervarying(512) |received_from | character varying(512) |content_type | character varying(512) |errors_to | character varying(512) |user_agent | character varying(512) |mail_date | charactervarying(512) |x_mailer | character varying(512) |return_path | character varying(512) |message_id | character varying(512) |reply_to | character varying(512) |subject | charactervarying(1024) |mail_to | character varying(1024) |cc | character varying(2048) | Indexes: "emails_pkey" primary key, btree (email_id) "emails_maildir_file_key" unique, btree (maildir_file) Foreign-key constraints: "$1" FOREIGN KEY (bait_id) REFERENCES bait(bait_id) ON UPDATE CASCADE ON DELETE SET NULL "$2" FOREIGN KEY (product_id) REFERENCES products(product_id) ON UPDATE CASCADE ON DELETE SET NULL "$3" FOREIGN KEY (mydom_id) REFERENCES my_domains(mydom_id) ON UPDATE CASCADE ON DELETE SET NULL "$4" FOREIGN KEY (spamv_id) REFERENCES spamvertisers(spamv_id) ON UPDATE CASCADE ON DELETE SET NULL "$5" FOREIGN KEY (spammer_id) REFERENCES spammers(spammer_id) ON UPDATE CASCADE ON DELETE SET NULL I am absolutely aware of the fact that the new schema can still use some work. :) But, it's a step in the right direction as I muddle my way through this. Not to mention, I really did need those foreign keys. The problem is this: Original schema: prod01=> EXPLAIN ANALYZE SELECT count(*) FROM emails; QUERY PLAN --------------------------------------------------------------------------------------------------------------------Aggregate (cost=6345.61..6345.61 rows=1 width=0) (actual time=1066.727..1066.728 rows=1 loops=1) -> Seq Scan on emails (cost=0.00..6121.49 rows=89649 width=0) (actual time=18.214..980.040 rows=89649 loops=1)Total runtime: 1066.931 ms (3 rows) New schema: prod01=> EXPLAIN ANALYZE SELECT count(*) FROM emails; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------Aggregate (cost=22.50..22.50 rows=1 width=0) (actual time=40818.750..40818.752 rows=1 loops=1) -> Seq Scan on emails (cost=0.00..20.00 rows=1000 width=0) (actual time=38801.127..40458.369 rows=89649 loops=1)Total runtime: 40819.115 ms Is it the addition of the multiple foreign keys that is slowing this down so much? I have several calculations I need to do with the count, so this is a big thing for me. Suggestions appreciated, both on the immediate problem or regarding the new schema. Even "you're an idiot" is fine, if it accompanies a bit of advice. :) Thanks! Benny -- "Even if a man chops off your hand with a sword, you still have two nice, sharp bones to stick in his eyes." -- .sig on Slashdot