Hey folks (long email, my apologies),
I am wrapping up my schema upgrade, and I just noticed a real
show-stopper for me... Here is the "before" table structure:
email_id | integer | not nulljoejob | boolean | default
falsebayes_poison | boolean | default falseperm_error | boolean |
defaultfalsebait_id | integer | default 0num_recip | integer
|default 1product | integer | default 1043date_received | timestamp with time zone
|only_date | date |only_time | time with time zone |maildir_file |
charactervarying(64) |errors_to | character varying(512) |reply_to | character varying(512)
|spammer | character varying(512) |return_path | character varying(512) |received_from
| character varying(512) |message_id | character varying(512) |target_domain | character
varying(512) |mail_date | character varying(512) |x_priority | character varying(512)
|x_msmail_priority| character varying(512) |x_mimeole | character varying(512) |mime_version |
charactervarying(512) |subject | character varying(1024) |mail_to | character
varying(2048) |x_mailer | character varying(2048) |content_type | character varying(2048)
|user_agent | character varying(2048) |cc | character varying(2048) |comments |
charactervarying(8192) |last_mod | timestamp without time zone | default
('now'::text)::timest
amp(6) with time zone
Indexes: "emails_pkey" primary key, btree (email_id) "emails_idx_bait_id" btree (bait_id) "emails_idx_mail_to"
btree(mail_to) "emails_idx_only_date" btree (only_date) "emails_idx_only_time" btree (only_time)
"emails_idx_product"btree (product) "emails_idx_received_from" btree (received_from) "emails_idx_subject" btree
(subject) "emails_idx_target_domain" btree (target_domain)
And here is the "after" structure:
email_id | integer | not nulljoejob | boolean | default
falsebayes_poison | boolean | default falseperm_error | boolean | default
falsenum_recip | integer | default 1mydom_id | integer | default
0spamv_id | integer | default 1053spammer_id | integer | default
1003last_mod | timestamp with time zone | default
('now'::text)::timestampwith time zonebait_id | integer |product_id | integer
|date_received | timestamp with time zone |only_date | date |only_time
|time with time zone |maildir_file | character varying(128) |x_priority | character varying(128)
|x_msmail_priority| character varying(128) |x_mimeole | character varying(128) |mime_version |
charactervarying(512) |received_from | character varying(512) |content_type | character varying(512)
|errors_to | character varying(512) |user_agent | character varying(512) |mail_date |
charactervarying(512) |x_mailer | character varying(512) |return_path | character varying(512)
|message_id | character varying(512) |reply_to | character varying(512) |subject |
charactervarying(1024) |mail_to | character varying(1024) |cc | character varying(2048) |
Indexes: "emails_pkey" primary key, btree (email_id) "emails_maildir_file_key" unique, btree (maildir_file)
Foreign-key constraints: "$1" FOREIGN KEY (bait_id) REFERENCES bait(bait_id) ON UPDATE CASCADE ON
DELETE SET NULL "$2" FOREIGN KEY (product_id) REFERENCES products(product_id) ON UPDATE
CASCADE ON DELETE SET NULL "$3" FOREIGN KEY (mydom_id) REFERENCES my_domains(mydom_id) ON UPDATE
CASCADE ON DELETE SET NULL "$4" FOREIGN KEY (spamv_id) REFERENCES spamvertisers(spamv_id) ON UPDATE
CASCADE ON DELETE SET NULL "$5" FOREIGN KEY (spammer_id) REFERENCES spammers(spammer_id) ON UPDATE
CASCADE ON DELETE SET NULL
I am absolutely aware of the fact that the new schema can still use
some work. :) But, it's a step in the right direction as I muddle my
way through this. Not to mention, I really did need those foreign keys.
The problem is this:
Original schema:
prod01=> EXPLAIN ANALYZE SELECT count(*) FROM emails; QUERY PLAN
--------------------------------------------------------------------------------------------------------------------Aggregate
(cost=6345.61..6345.61 rows=1 width=0) (actual
time=1066.727..1066.728 rows=1 loops=1) -> Seq Scan on emails (cost=0.00..6121.49 rows=89649 width=0) (actual
time=18.214..980.040 rows=89649 loops=1)Total runtime: 1066.931 ms
(3 rows)
New schema:
prod01=> EXPLAIN ANALYZE SELECT count(*) FROM emails; QUERY PLAN
----------------------------------------------------------------------------------------------------------------------Aggregate
(cost=22.50..22.50 rows=1 width=0) (actual
time=40818.750..40818.752 rows=1 loops=1) -> Seq Scan on emails (cost=0.00..20.00 rows=1000 width=0) (actual
time=38801.127..40458.369 rows=89649 loops=1)Total runtime: 40819.115 ms
Is it the addition of the multiple foreign keys that is slowing this
down so much? I have several calculations I need to do with the count,
so this is a big thing for me.
Suggestions appreciated, both on the immediate problem or regarding
the new schema. Even "you're an idiot" is fine, if it accompanies a
bit of advice. :)
Thanks!
Benny
--
"Even if a man chops off your hand with a sword, you still have two nice,
sharp bones to stick in his eyes." -- .sig on Slashdot