Re: cost and actual time - Mailing list pgsql-performance

From: Christopher Kings-Lynne
Subject: Re: cost and actual time
Msg-id: 07e301c2d94a$e6eec450$6500a8c0@fhp.internal
In response to: Re: cost and actual time ("Josh Berkus" <josh@agliodbs.com>)
List: pgsql-performance
I nominate Manfred for support response award of the week!

Chris

----- Original Message -----
From: "Manfred Koizar" <mkoi-pg@aon.at>
To: "Chantal Ackermann" <chantal.ackermann@biomax.de>
Cc: "Josh Berkus" <josh@agliodbs.com>; <pgsql-performance@postgresql.org>;
<tgl@sss.pgh.pa.us>
Sent: Thursday, February 20, 2003 6:00 PM
Subject: Re: [PERFORM] cost and actual time


> On Wed, 19 Feb 2003 10:38:54 +0100, Chantal Ackermann
> <chantal.ackermann@biomax.de> wrote:
> >Nested Loop: 53508.86 msec
> >Merge Join: 113066.81 msec
> >Hash Join:  439344.44 msec
>
> Chantal,
>
> you might have reached the limit of what Postgres (or any other
> database?) can do for you with these data structures.  Time for
> something completely different:  Try calculating the counts in
> advance.
>
>     CREATE TABLE occ_stat (
>         did INT NOT NULL,
>         gid INT NOT NULL,
>         cnt INT NOT NULL
>     ) WITHOUT OIDS;
>
>     CREATE INDEX occ_stat_dg ON occ_stat(did, gid);
>     CREATE INDEX occ_stat_gd ON occ_stat(gid, did);
>
> There is *no* UNIQUE constraint on (did, gid).  You get the numbers
> you're after by
>     SELECT did, sum(cnt) AS cnt
>       FROM occ_stat
>      WHERE gid = 'whatever'
>      GROUP BY did
>      ORDER BY cnt DESC;
>
> occ_stat is initially loaded by
>
>     INSERT INTO occ_stat
>     SELECT did, gid, count(*)
>       FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
>      GROUP BY did, gid;
>
> Doing it in chunks
>     WHERE sid BETWEEN a::bigint AND b::bigint
> might be faster.
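>
> For example, one chunk of the load might look like this (the bounds
> are arbitrary placeholders to step through the sid range; qualify the
> column, since both tables have a sid):
>
>     INSERT INTO occ_stat
>     SELECT did, gid, count(*)
>       FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
>      WHERE d_o.sid BETWEEN 0::bigint AND 999999::bigint
>      GROUP BY did, gid;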
>
> You have to block any INSERT/UPDATE/DELETE activity on d_o and g_o
> while you do the initial load.  If it takes too long, see below for
> how to do it in the background; hopefully the load task will catch up
> some day :-)
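>
> A simple way to do the blocking (SHARE mode conflicts with
> INSERT/UPDATE/DELETE but still lets readers in):
>
>     BEGIN;
>     LOCK TABLE d_o, g_o IN SHARE MODE;
>     -- run the initial occ_stat load from above here
>     COMMIT;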
>
> Keeping occ_stat current:
>
>     CREATE RULE d_o_i AS ON INSERT
>         TO d_o DO (
>             INSERT INTO occ_stat
>             SELECT NEW.did, g_o.gid, 1
>               FROM g_o
>              WHERE g_o.sid = NEW.sid);
>
>     CREATE RULE d_o_d AS ON DELETE
>         TO d_o DO (
>             INSERT INTO occ_stat
>             SELECT OLD.did, g_o.gid, -1
>               FROM g_o
>              WHERE g_o.sid = OLD.sid);
>
> On UPDATE do both.  Create a set of similar rules for g_o.
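>
> Spelled out, the UPDATE rule might look like this (the rule name is
> arbitrary; its action list is just the DELETE action followed by the
> INSERT action):
>
>     CREATE RULE d_o_u AS ON UPDATE
>         TO d_o DO (
>             INSERT INTO occ_stat
>             SELECT OLD.did, g_o.gid, -1
>               FROM g_o
>              WHERE g_o.sid = OLD.sid;
>             INSERT INTO occ_stat
>             SELECT NEW.did, g_o.gid, 1
>               FROM g_o
>              WHERE g_o.sid = NEW.sid);
>
> and the g_o rules are the mirror image, e.g. for INSERT:
>
>     CREATE RULE g_o_i AS ON INSERT
>         TO g_o DO (
>             INSERT INTO occ_stat
>             SELECT d_o.did, NEW.gid, 1
>               FROM d_o
>              WHERE d_o.sid = NEW.sid);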
>
> These rules will create a lot of duplicates on (did, gid) in occ_stat.
> Updating existing rows and inserting only new combinations might seem
> obvious, but this method has concurrency problems (cf. the thread
> "Hard problem with concurrency" on -hackers).  So occ_stat calls for
> reorganisation from time to time:
>
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>     CREATE TEMP TABLE t (did INT, gid INT, cnt INT) WITHOUT OIDS;
>
>     INSERT INTO t
>     SELECT did, gid, sum(cnt)
>       FROM occ_stat
>      GROUP BY did, gid
>     HAVING count(*) > 1;
>
>     DELETE FROM occ_stat
>      USING t
>      WHERE t.did = occ_stat.did
>        AND t.gid = occ_stat.gid;
>
>     INSERT INTO occ_stat SELECT * FROM t;
>
>     DROP TABLE t;
>     COMMIT;
>     VACUUM ANALYZE occ_stat;  -- very important!!
>
> Now this should work, but the rules could kill INSERT/UPDATE/DELETE
> performance.  Depending on your rate of modifications you might be
> forced to push the statistics calculation to the background.
>
>     CREATE TABLE d_o_change (
>         sid BIGINT NOT NULL,
>         did INT NOT NULL,
>         cnt INT NOT NULL
>     ) WITHOUT OIDS;
>
>     ... ON INSERT TO d_o DO (
>         INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));
>
>     ... ON DELETE TO d_o DO (
>         INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1));
>
>     ... ON UPDATE TO d_o
>         WHERE OLD.sid != NEW.sid OR OLD.did != NEW.did
>         DO both
>
> And the same for g_o.
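>
> Spelled out for d_o, with arbitrary rule names, that might look like:
>
>     CREATE RULE d_o_ci AS ON INSERT
>         TO d_o DO (
>             INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));
>
>     CREATE RULE d_o_cd AS ON DELETE
>         TO d_o DO (
>             INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1));
>
>     CREATE RULE d_o_cu AS ON UPDATE
>         TO d_o WHERE OLD.sid != NEW.sid OR OLD.did != NEW.did
>         DO (
>             INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1);
>             INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));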
>
> You need a task that periodically scans [dg]_o_change and does ...
>
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>     SELECT <any row (or some rows) from x_o_change>;
>     INSERT INTO occ_stat <see above>;
>     DELETE <the selected row(s) from x_o_change>;
>     COMMIT;
>
> Don't forget to VACUUM!
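>
> One concrete shape for the d_o_change task (a sketch; the g_o_change
> task is the mirror image).  SERIALIZABLE is what makes this safe: the
> DELETE removes exactly the rows the INSERT saw, and rows added by
> concurrent writers stay for the next run:
>
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>
>     INSERT INTO occ_stat
>     SELECT c.did, g_o.gid, c.cnt
>       FROM d_o_change c INNER JOIN g_o ON (g_o.sid = c.sid);
>
>     DELETE FROM d_o_change;
>
>     COMMIT;
>     VACUUM ANALYZE occ_stat;
>     VACUUM d_o_change;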
>
> If you invest a little more work, I guess you can combine the
> reorganisation into the loader task ...
>
> I have no idea whether this approach is better than what you have now.
> With a high INSERT/UPDATE/DELETE rate it may lead to a complete
> performance disaster.  You have to try ...
>
> Servus
>  Manfred
>

