Massive table bloat - Mailing list pgsql-general

From Markus Wollny
Subject Massive table bloat
Date
Msg-id 28011CD60FB1724DBA4442E38277F6260F91793D@hermes.computec.de
Whole thread Raw
Responses Re: Massive table bloat  (Thom Brown <thombrown@gmail.com>)
Re: Massive table bloat  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
List pgsql-general
Hi!

I've set up some system to track slow page executions in one of our (as
yet not live) web apps. The tracking itself is handled completely within
the database using a function. Within a very short time (approx. 1 week)
and although we haven't got that much traffic on our testpages, the
table in question as grown beyond a size of 23 GB, even though a SELECT
count(*) on it will tell me that it only contains 235 rows. I'm sure I
must be missing something obvious here...

Here's the DDL for the table:

CREATE TABLE stats.slowpages
(
url text NOT NULL,
lastexecduration integer NOT NULL,
avgslowexecduration integer,
execcount integer,
lastexectime timestamp without time zone,
site_id integer NOT NULL,
slowestexecduration integer,
totaltimespent bigint,
CONSTRAINT "slowpages_pkey" PRIMARY KEY (url)
)WITHOUT OIDS;

-- Indexes
CREATE INDEX idx_slowpages_duration ON stats.slowpages USING btree
(lastexecduration);
CREATE INDEX idx_slowpages_avgduration ON stats.slowpages USING btree
(avgslowexecduration);
CREATE INDEX idx_slowpages_execcount ON stats.slowpages USING btree
(execcount);
CREATE INDEX idx_slowpages_lastexec ON stats.slowpages USING btree
(lastexectime);
CREATE INDEX idx_slowpages_site ON stats.slowpages USING btree
(site_id);
CREATE UNIQUE INDEX uidx_slowpages_url_site ON stats.slowpages USING
btree (url, site_id);
CREATE INDEX idx_slowpages_totaltimespent ON stats.slowpages USING btree
(totaltimespent);

And this here is the function we use to insert or update entries in this
table:

CREATE or REPLACE FUNCTION "stats"."iou_slowpages"(
IN "_site_id" integer,
IN "_url" text,
IN "_duration" integer)
RETURNS void AS
$BODY$
BEGIN
    LOOP

        UPDATE stats.slowpages
               SET  avgslowexecduration =
((avgslowexecduration*execcount)+_duration)/(execcount+1)
               ,    execcount = execcount+1
               ,    lastexectime = now()
               ,    lastexecduration = _duration
               ,    totaltimespent = totaltimespent + _duration
               ,    slowestexecduration = CASE WHEN _duration >
slowestexecduration
                    THEN _duration ELSE slowestexecduration END
               WHERE url = _url AND site_id = _site_id;
        IF found THEN
            RETURN;
        END IF;

        BEGIN
            INSERT INTO
stats.slowpages(url,lastexecduration,avgslowexecduration,slowestexecdura
tion,totaltimespent,execcount,lastexectime,site_id)
            VALUES (_url, _duration, _duration,_duration,_duration, 1,
now(), _site_id);
            RETURN;
        EXCEPTION WHEN unique_violation THEN

        END;
    END LOOP;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

_site_id is a small integer value, _url is a full URL string to a page
and _duration is a value in milliseconds. We're on PostgreSQL 8.3.7.

Any idea about what I may be missing here?

Kind regards

   Markus


Computec Media AG
Sitz der Gesellschaft und Registergericht: Furth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jurg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276



pgsql-general by date:

Previous
From: Richard Huxton
Date:
Subject: Re: disable triggers isolated to transaction only?
Next
From: Thom Brown
Date:
Subject: Re: Massive table bloat