Thread: Deleting large amounts of data.

Deleting large amounts of data.

From: "Vilson farias"
Greetings,
 
    I'm having some performance problems related to the large number of tuples in a table, so I decided to start programming some shell scripts to be placed in crontab to remove old data every night.
 
   Now, my questions are:
 
  If I just execute DELETE FROM <table> WHERE call_start < '2002-08-21' AND call_start > '2002-08-20', and that interval has 100,000 tuples, does DELETE start and commit a transaction for every single deletion?
 
  And if I put a BEGIN at the beginning and a COMMIT at the end, do the deleted tuples overload the transaction logs?
 
  Finally, what's the best thing to do?
 
Best regards.
 
--------------------------------------------------------------------------------
José Vilson de Mello de Farias
Software Engineer
 
Dígitro Tecnologia Ltda - www.digitro.com.br - Brazil
APC - Customer Oriented Applications
E-mail: vilson.farias@digitro.com.br
Tel.: +55 48 281 7158
ICQ 11866179
--------------------------------------------------------------------------------

Re: Deleting large amounts of data.

From: Stephan Szabo
On Mon, 26 Aug 2002, Vilson farias wrote:

> Greetings,
>
>     I'm having some performance problems related to the large number of
> tuples in a table, so I decided to start programming some shell
> scripts to be placed in crontab to remove old data every night.

Hmm, you might want to send some info on the schema/queries/explain
output, maybe someone'll be able to help.

>
>    Now, my questions are:
>
>   If I just execute DELETE FROM <table> WHERE call_start < '2002-08-21'
> AND call_start > '2002-08-20', and that interval has 100,000 tuples,
> does DELETE start and commit a transaction for every single deletion?

No.  It's all one statement, so it'll be in its containing transaction (if
you haven't started one, it'll be put into its own).
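
For illustration, both of these delete all matching rows as a single
transaction (the table name here is just a placeholder, since the original
statement didn't include one):

  -- Implicit transaction: the statement commits or rolls back as a whole.
  DELETE FROM calls
   WHERE call_start > '2002-08-20' AND call_start < '2002-08-21';

  -- Explicitly wrapped: same effect for a single statement.
  BEGIN;
  DELETE FROM calls
   WHERE call_start > '2002-08-20' AND call_start < '2002-08-21';
  COMMIT;

Either way you get one commit for the whole delete, not one per row.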


Re: Deleting large amounts of data.

From: "Vilson farias"

Thanks for answering my post, Mr. Szabo.

   Thanks for your concern about my performance problems, but they are to be
expected for my database, since my hardware isn't a Ferrari and my table has
more than 2 million tuples and is related to a lot of other heavily populated
tables. I realized I can keep no more than 6 months of data in these tables,
so I would like to remove the older data every night. I haven't written any
code yet and can't start testing the best approach on my databases, so I'm
posting here to get some advice. I'm worried about the performance problems
that could arise when I start removing 12,000 tuples in a single transaction,
since I don't know how PostgreSQL works internally. Can a single transaction
deal with that much data without problems or without overloading transaction
buffers (or something like that), or do I need to split the deletion process
into smaller sets of data?

My main table, which has more than 2 million tuples:
CREATE TABLE cham_chamada (
       cod_bxs              integer NOT NULL,
       chave                integer NOT NULL,
       identificacao        integer NOT NULL,
       dt_inicial           timestamp  NOT NULL,
       indicadora           integer,
       cod_categoria        integer,
       identidadea          varchar(20),
       dt_final             timestamp ,
       juntor               integer,
       indicadorb           integer,
       identidadeb          varchar(20),
       flg_chamada          char(1),
       flg_liberacao        char(1),
       temp_total           integer,
       ddd                  smallint,
       cod_liberjuntor      integer,
       cod_resultcham       integer,
       cod_fds              integer,
       cifra_origem         integer,
       csp                  char(2),
       conta                integer,
       ramal                varchar(20),
       ultimo_ramal         varchar(20),
       CONSTRAINT XPKcham_chamada PRIMARY KEY (dt_inicial, cod_bxs, chave,
                                               identificacao)
);

CREATE INDEX XIE1cham_chamada ON cham_chamada
(
       dt_final                       ,
       identidadea
);

CREATE INDEX XIE2cham_chamada ON cham_chamada
(
       dt_final                       ,
       flg_chamada
);

Re: Deleting large amounts of data.

From: Stephan Szabo
On Tue, 27 Aug 2002, Vilson farias wrote:

>
>
> Thanks for answering my post, Mr. Szabo.
>
>    Thanks for your concern about my performance problems, but they are to be
> expected for my database, since my hardware isn't a Ferrari and my table has
> more than 2 million tuples and is related to a lot of other heavily populated
> tables. I realized I can keep no more than 6 months of data in these tables,
> so I would like to remove the older data every night. I haven't written any
> code yet and can't start testing the best approach on my databases, so I'm
> posting here to get some advice. I'm worried about the performance problems
> that could arise when I start removing 12,000 tuples in a single transaction,
> since I don't know how PostgreSQL works internally. Can a single transaction
> deal with that much data without problems or without overloading transaction
> buffers (or something like that), or do I need to split the deletion process
> into smaller sets of data?

12,000 rows shouldn't be a big deal.  I assume you're adding about the same
number per day, so it's basically a rotating window.

The things to be sure of are that the column you're deleting by is indexed
(with the definition below, that means dt_inicial or dt_final, since each is
the leading column of an index).  You should check with EXPLAIN to make sure
the delete is using the index (if it says it's not, you should ANALYZE the
table and try again).  Also, if you're running 7.2, you should probably do a
VACUUM afterwards (without FULL), and if after a couple of weeks of this you
notice your disk space is still increasing, there are some configuration
options that can help (look for the free space map options in the
configuration file).
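
As a rough sketch, the nightly job could then boil down to something like
this (the cutoff date is only an example; your cron script would compute it):

  -- Confirm the planner uses the index on dt_inicial (the leading
  -- primary key column) rather than a sequential scan:
  EXPLAIN DELETE FROM cham_chamada WHERE dt_inicial < '2002-02-27';

  -- If it reports a sequential scan, refresh statistics and re-check:
  ANALYZE cham_chamada;

  -- Remove the old rows, then a plain (non-FULL) vacuum so the freed
  -- space can be reused by the next day's inserts:
  DELETE FROM cham_chamada WHERE dt_inicial < '2002-02-27';
  VACUUM ANALYZE cham_chamada;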

>
> My main table, which has more than 2 million tuples:
> CREATE TABLE cham_chamada (
>        cod_bxs              integer NOT NULL,
>        chave                integer NOT NULL,
>        identificacao        integer NOT NULL,
>        dt_inicial           timestamp  NOT NULL,
>        indicadora           integer,
>        cod_categoria        integer,
>        identidadea          varchar(20),
>        dt_final             timestamp ,
>        juntor               integer,
>        indicadorb           integer,
>        identidadeb          varchar(20),
>        flg_chamada          char(1),
>        flg_liberacao        char(1),
>        temp_total           integer,
>        ddd                  smallint,
>        cod_liberjuntor      integer,
>        cod_resultcham       integer,
>        cod_fds              integer,
>        cifra_origem         integer,
>        csp                  char(2),
>        conta                integer,
>        ramal                varchar(20),
>        ultimo_ramal         varchar(20),
>        CONSTRAINT XPKcham_chamada PRIMARY KEY (dt_inicial, cod_bxs, chave,
>                                                identificacao)
> );
>
> CREATE INDEX XIE1cham_chamada ON cham_chamada
> (
>        dt_final                       ,
>        identidadea
> );
>
> CREATE INDEX XIE2cham_chamada ON cham_chamada
> (
>        dt_final                       ,
>        flg_chamada
> );