Re: two memory-consuming postgres processes - Mailing list pgsql-performance

From PFC
Subject Re: two memory-consuming postgres processes
Date
Msg-id op.uajp1iotcigqcu@apollo13.peufeu.com
In response to Re: two memory-consuming postgres processes  (Alexy Khrabrov <deliverable@gmail.com>)
Responses Re: two memory-consuming postgres processes
List pgsql-performance
> I created several indices for the primary table, yes.  Sure I can do a
> table for a volatile column, but then I'll have to create a new such
> table for each derived column -- that's why I tried to add a column to
the existing table.  Yet seeing this is really slow, and I need to do
> many derived analyses like this -- which are later scanned in other
> computations, so should persist -- I indeed see no other way but to
> procreate derived tables with the same key, one column per each...

    OK, so in that case, do all of your derived column calculations in one
query, like this:

CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived
calculations here)

    or:

BEGIN;    <-- important: creating and filling the table in the same transaction avoids writing xlog
CREATE TABLE derived ( ... );
INSERT INTO derived SELECT ... FROM ... (perform all your derived
calculations here)
COMMIT;
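
    For example (table and column names here are made up, just to show the
pattern), all the derived columns can be computed in one pass over the big
table:

BEGIN;
CREATE TABLE series_derived AS
SELECT s.id,
       s.value * 100.0          AS value_pct,
       s.value - r.baseline     AS delta,
       ln(s.value)              AS log_value
FROM   big_series s
JOIN   reference_values r USING (id);   -- small lookup table joined in the same scan
COMMIT;

    Here the three derived columns cost one scan of big_series instead of
three separate full-table UPDATEs.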

    Basically, updating the entire table several times to add a few simple
columns is a bad idea. If you can compute all the data you need in one
query, like above, it will be much faster, especially if you join one
large table to several smaller ones, and as long as the huge data set
doesn't need to be sorted (check the query plan using EXPLAIN). Try to do
as much as possible in one query so the large dataset is scanned only once.
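
    To check that, you can EXPLAIN just the SELECT part of the query (again
using the made-up names from the sketch above):

EXPLAIN
SELECT s.id, s.value - r.baseline AS delta
FROM   big_series s
JOIN   reference_values r USING (id);

    You want to see big_series scanned once (a seq scan feeding a hash join
is fine); a Sort node sitting on top of the big table means it is being
sorted, which is what you want to avoid.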

    Note that the above will be faster than updating the entire table, since
it needs to write much less data: it doesn't leave behind the old row
versions that a full-table UPDATE would, and it doesn't need to write the
transaction log, since if the transaction rolls back, the table never
existed anyway. Also, since your newly created table doesn't have any
indexes yet, there are none to update during the load.
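
    If the later analyses need indexes on the derived table, create them
after the table is filled; building an index once over the finished table
is much cheaper than maintaining it row by row during the load. A sketch,
still with the made-up names:

CREATE INDEX series_derived_id_idx ON series_derived (id);
ANALYZE series_derived;    -- refresh planner statistics for the new table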

    If you really need to update an entire table multiple times, you will
need to:

    - use hardware that can handle disk writes at a decent speed (that isn't
a characteristic of a laptop drive)
    - use MyISAM, yes (but if you need to make complex queries on the data
afterwards, it could suck).

