Re: cost and actual time - Mailing list pgsql-performance
| From | Christopher Kings-Lynne |
|---|---|
| Subject | Re: cost and actual time |
| Msg-id | 07e301c2d94a$e6eec450$6500a8c0@fhp.internal |
| In response to | Re: cost and actual time ("Josh Berkus" <josh@agliodbs.com>) |
| List | pgsql-performance |
I nominate Manfred for support response award of the week!

Chris

----- Original Message -----
From: "Manfred Koizar" <mkoi-pg@aon.at>
To: "Chantal Ackermann" <chantal.ackermann@biomax.de>
Cc: "Josh Berkus" <josh@agliodbs.com>; <pgsql-performance@postgresql.org>; <tgl@sss.pgh.pa.us>
Sent: Thursday, February 20, 2003 6:00 PM
Subject: Re: [PERFORM] cost and actual time

> On Wed, 19 Feb 2003 10:38:54 +0100, Chantal Ackermann
> <chantal.ackermann@biomax.de> wrote:
> >Nested Loop: 53508.86 msec
> >Merge Join: 113066.81 msec
> >Hash Join: 439344.44 msec
>
> Chantal,
>
> you might have reached the limit of what Postgres (or any other
> database?) can do for you with these data structures. Time for
> something completely different: Try calculating the counts in
> advance.
>
> CREATE TABLE occ_stat (
>     did INT NOT NULL,
>     gid INT NOT NULL,
>     cnt INT NOT NULL
> ) WITHOUT OIDS;
>
> CREATE INDEX occ_stat_dg ON occ_stat(did, gid);
> CREATE INDEX occ_stat_gd ON occ_stat(gid, did);
>
> There is *no* UNIQUE constraint on (did, gid). You get the numbers
> you're after by
>
> SELECT did, sum(cnt) AS cnt
>   FROM occ_stat
>  WHERE gid = 'whatever'
>  GROUP BY did
>  ORDER BY cnt DESC;
>
> occ_stat is initially loaded by
>
> INSERT INTO occ_stat
> SELECT did, gid, count(*)
>   FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
>  GROUP BY did, gid;
>
> Doing it in chunks
>
>   WHERE sid BETWEEN a::bigint AND b::bigint
>
> might be faster.
>
> You have to block any INSERT/UPDATE/DELETE activity on d_o and g_o
> while you do the initial load. If it takes too long, see below for
> how to do it in the background; hopefully the load task will catch
> up some day :-)
>
> Keeping occ_stat current:
>
> CREATE RULE d_o_i AS ON INSERT
>   TO d_o DO (
>     INSERT INTO occ_stat
>     SELECT NEW.did, g_o.gid, 1
>       FROM g_o
>      WHERE g_o.sid = NEW.sid);
>
> CREATE RULE d_o_d AS ON DELETE
>   TO d_o DO (
>     INSERT INTO occ_stat
>     SELECT OLD.did, g_o.gid, -1
>       FROM g_o
>      WHERE g_o.sid = OLD.sid);
>
> On UPDATE do both. Create a set of similar rules for g_o.
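Manfred leaves the ON UPDATE rule as "do both" and the g_o rules as "similar". A minimal sketch of what they might look like, assuming the d_o(sid, did) and g_o(sid, gid) layout implied above; the rule names (d_o_u, g_o_i, g_o_d) are made up for illustration:

```sql
-- Sketch only: rule names are hypothetical, not from the original message.
-- Assumes d_o(sid, did) and g_o(sid, gid) as used above.
CREATE RULE d_o_u AS ON UPDATE
  TO d_o DO (
    -- retract the counts for the old row ...
    INSERT INTO occ_stat
    SELECT OLD.did, g_o.gid, -1
      FROM g_o
     WHERE g_o.sid = OLD.sid;
    -- ... and add the counts for the new one
    INSERT INTO occ_stat
    SELECT NEW.did, g_o.gid, 1
      FROM g_o
     WHERE g_o.sid = NEW.sid
  );

-- The g_o rules mirror the d_o rules, joining against d_o instead:
CREATE RULE g_o_i AS ON INSERT
  TO g_o DO (
    INSERT INTO occ_stat
    SELECT d_o.did, NEW.gid, 1
      FROM d_o
     WHERE d_o.sid = NEW.sid);

CREATE RULE g_o_d AS ON DELETE
  TO g_o DO (
    INSERT INTO occ_stat
    SELECT d_o.did, OLD.gid, -1
      FROM d_o
     WHERE d_o.sid = OLD.sid);
```

Like the d_o rules above, these only append +1/-1 delta rows; the periodic reorganisation Manfred describes next collapses the duplicates.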
> These rules will create a lot of duplicates on (did, gid) in
> occ_stat. Updating existing rows and inserting only new combinations
> might seem obvious, but this method has concurrency problems (cf.
> the thread "Hard problem with concurrency" on -hackers). So occ_stat
> calls for reorganisation from time to time:
>
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>
> CREATE TEMP TABLE t (did INT, gid INT, cnt INT) WITHOUT OIDS;
>
> INSERT INTO t
> SELECT did, gid, sum(cnt)
>   FROM occ_stat
>  GROUP BY did, gid
> HAVING count(*) > 1;
>
> DELETE FROM occ_stat
>  WHERE t.did = occ_stat.did
>    AND t.gid = occ_stat.gid;
>
> INSERT INTO occ_stat SELECT * FROM t;
>
> DROP TABLE t;
> COMMIT;
> VACUUM ANALYZE occ_stat;  -- very important!!
>
> Now this should work, but the rules could kill INSERT/UPDATE/DELETE
> performance. Depending on your rate of modifications you might be
> forced to push the statistics calculation to the background.
>
> CREATE TABLE d_o_change (
>     sid BIGINT NOT NULL,
>     did INT NOT NULL,
>     cnt INT NOT NULL
> ) WITHOUT OIDS;
>
> ... ON INSERT TO d_o DO (
>       INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));
>
> ... ON DELETE TO d_o DO (
>       INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1));
>
> ... ON UPDATE TO d_o
>     WHERE OLD.sid != NEW.sid OR OLD.did != NEW.did
>     DO both
>
> And the same for g_o.
>
> You need a task that periodically scans [dg]_o_change and does
>
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT <any row (or some rows) from x_o_change>;
> INSERT INTO occ_stat <see above>;
> DELETE <the selected row(s) from x_o_change>;
> COMMIT;
>
> Don't forget to VACUUM!
>
> If you invest a little more work, I guess you can combine the
> reorganisation into the loader task ...
>
> I have no idea whether this approach is better than what you have
> now. With a high INSERT/UPDATE/DELETE rate it may lead to a complete
> performance disaster. You have to try ...
>
> Servus
>  Manfred
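One plausible way to fill in the placeholders of that periodic loader, under the assumption that each run simply drains every d_o_change row visible in its snapshot (the original message does not prescribe a strategy; the g_o_change variant would be symmetric):

```sql
-- Sketch only: the "drain everything visible" strategy is an assumption,
-- not spelled out in the original message.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- fold the queued +1/-1 changes into occ_stat
INSERT INTO occ_stat
SELECT c.did, g_o.gid, c.cnt
  FROM d_o_change c INNER JOIN g_o ON (g_o.sid = c.sid);

-- remove exactly what we consumed: under SERIALIZABLE both statements
-- see the same snapshot, so rows queued concurrently survive for the
-- next run
DELETE FROM d_o_change;

COMMIT;
VACUUM ANALYZE occ_stat;
VACUUM d_o_change;
```

Vacuuming the queue table after each run matters as much as vacuuming occ_stat, since d_o_change churns constantly.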