Re: FW: Out of swap space & memory - Mailing list pgsql-general

From Paul Tillotson
Subject Re: FW: Out of swap space & memory
Date
Msg-id 4115108D.7040108@shentel.net
Whole thread Raw
In response to FW: Out of swap space & memory  ("Kevin Bartz" <kbartz@loyaltymatrix.com>)
List pgsql-general
Kevin,

I've been casually following this thread (you might say "lurking"), and
I don't know *why* the problem is occurring, but you said that you
didn't see a foolproof way to break the problem into steps. I think
there is such a way, which will at least allow you to get the job done.

This example uses the member_id column as the partitioning column, and
uses 3 smaller steps. Substitute a different column or more steps if
necessary.

Pick 2 values of member id which can be used as dividers to distribute
your table 3 ways evenly. For example, if your member_id's are
sequential from 1 to 3000, use values 1000 and 2000. Then do this:
(notice inserted WHERE clause)

-- de-dup the table
select member_id, campaign_id, catalog_type, pushed, delivered, clicks,
opened, month, type1, type2, type3, type4, type5
into octanenights
from octanenights_raw
WHERE member_id < 1000
group by member_id, campaign_id, catalog_type, pushed, delivered, clicks,
         opened, month, type1, type2, type3, type4, type5;

-- de-dup the table
select member_id, campaign_id, catalog_type, pushed, delivered, clicks,
opened, month, type1, type2, type3, type4, type5
into octanenights
from octanenights_raw
WHERE member_id >= 1000 AND member_id < 2000
group by member_id, campaign_id, catalog_type, pushed, delivered, clicks,
         opened, month, type1, type2, type3, type4, type5;

-- de-dup the table
select member_id, campaign_id, catalog_type, pushed, delivered, clicks,
opened, month, type1, type2, type3, type4, type5
into octanenights
from octanenights_raw
WHERE member_id >= 2000
group by member_id, campaign_id, catalog_type, pushed, delivered, clicks,
         opened, month, type1, type2, type3, type4, type5;

This way, the system never has to do a GROUP BY with the whole table. If
it still runs out of memory this way, you can partition the data into as
many smaller pieces as necessary, although I'm sure you'll want to write
a script to do so if you have to do very many partitions.

Paul

Kevin Bartz wrote:

>-----Original Message-----
>From: Kevin Bartz [mailto:kbartz@loyaltymatrix.com]
>Sent: Friday, August 06, 2004 8:41 PM
>To: 'mike@thegodshalls.com'
>Subject: RE: [GENERAL] Out of swap space & memory
>
>Well, all I'm doing right now is using psql from the command line. A bit
>unintelligent, I know, but I want to make sure things will work
>appropriately before I dive into the world of query editors.
>
>Thanks for your suggestions. My data won't need to be updated regularly. In
>this case, as in all others, loading each set of data is a one-shot process,
>so I don't think I'll need to worry about truncating.
>
>I can't think of any way to break the problem into steps, since the
>duplicates may well be scattered throughout the table. If I split
>octanenights into octanenights1 and octanenights2 and then de-dup each
>individually, I would have to first be sure that octanenights1 does not
>share a duplicate with octanenights2, or that duplicate would appear in the
>UNION ALLed version. Maybe I'm missing something?
>
>Thanks for your kind response.
>
>Kevin
>
>


pgsql-general by date:

Previous
From: Mike Nolan
Date:
Subject: Re: Rounding in PGSQL
Next
From: Joe Conway
Date:
Subject: Re: Rounding in PGSQL