Re: Massive table bloat - Mailing list pgsql-general

From Thom Brown
Subject Re: Massive table bloat
Date
Msg-id bddc86151003030756o722902f7t31accd588365ef3b@mail.gmail.com
Whole thread Raw
In response to Massive table bloat  ("Markus Wollny" <Markus.Wollny@computec.de>)
Responses Re: Massive table bloat  ("Markus Wollny" <Markus.Wollny@computec.de>)
List pgsql-general
On 3 March 2010 15:33, Markus Wollny <Markus.Wollny@computec.de> wrote:
> Hi!
>
> I've set up some system to track slow page executions in one of our (as
> yet not live) web apps. The tracking itself is handled completely within
> the database using a function. Within a very short time (approx. 1 week)
> and although we haven't got that much traffic on our testpages, the
> table in question as grown beyond a size of 23 GB, even though a SELECT
> count(*) on it will tell me that it only contains 235 rows. I'm sure I
> must be missing something obvious here...
>
> Here's the DDL for the table:
>
> CREATE TABLE stats.slowpages
> (
> url text NOT NULL,
> lastexecduration integer NOT NULL,
> avgslowexecduration integer,
> execcount integer,
> lastexectime timestamp without time zone,
> site_id integer NOT NULL,
> slowestexecduration integer,
> totaltimespent bigint,
> CONSTRAINT "slowpages_pkey" PRIMARY KEY (url)
> )WITHOUT OIDS;
>
> -- Indexes
> CREATE INDEX idx_slowpages_duration ON stats.slowpages USING btree
> (lastexecduration);
> CREATE INDEX idx_slowpages_avgduration ON stats.slowpages USING btree
> (avgslowexecduration);
> CREATE INDEX idx_slowpages_execcount ON stats.slowpages USING btree
> (execcount);
> CREATE INDEX idx_slowpages_lastexec ON stats.slowpages USING btree
> (lastexectime);
> CREATE INDEX idx_slowpages_site ON stats.slowpages USING btree
> (site_id);
> CREATE UNIQUE INDEX uidx_slowpages_url_site ON stats.slowpages USING
> btree (url, site_id);
> CREATE INDEX idx_slowpages_totaltimespent ON stats.slowpages USING btree
> (totaltimespent);
>
> And this here is the function we use to insert or update entries in this
> table:
>
> CREATE or REPLACE FUNCTION "stats"."iou_slowpages"(
> IN "_site_id" integer,
> IN "_url" text,
> IN "_duration" integer)
> RETURNS void AS
> $BODY$
> BEGIN
>    LOOP
>
>        UPDATE stats.slowpages
>               SET  avgslowexecduration =
> ((avgslowexecduration*execcount)+_duration)/(execcount+1)
>               ,    execcount = execcount+1
>               ,    lastexectime = now()
>               ,    lastexecduration = _duration
>               ,    totaltimespent = totaltimespent + _duration
>               ,    slowestexecduration = CASE WHEN _duration >
> slowestexecduration
>                    THEN _duration ELSE slowestexecduration END
>               WHERE url = _url AND site_id = _site_id;
>        IF found THEN
>            RETURN;
>        END IF;
>
>        BEGIN
>            INSERT INTO
> stats.slowpages(url,lastexecduration,avgslowexecduration,slowestexecdura
> tion,totaltimespent,execcount,lastexectime,site_id)
>            VALUES (_url, _duration, _duration,_duration,_duration, 1,
> now(), _site_id);
>            RETURN;
>        EXCEPTION WHEN unique_violation THEN
>
>        END;
>    END LOOP;
> END;
> $BODY$
> LANGUAGE 'plpgsql' VOLATILE;
>
> _site_id is a small integer value, _url is a full URL string to a page
> and _duration is a value in milliseconds. We're on PostgreSQL 8.3.7.
>
> Any idea about what I may be missing here?
>
> Kind regards
>
>   Markus
>

If you update rows, it actually creates a new version of it.  The old
one doesn't get removed until the VACUUM process cleans it up, so
maybe you need to run that against the database?

Regards

Thom

pgsql-general by date:

Previous
From: "Markus Wollny"
Date:
Subject: Massive table bloat
Next
From: Grzegorz Jaśkiewicz
Date:
Subject: Re: Massive table bloat