Thread: Deleting large amount of data.
Greetings,
I'm having some performance problems related to the large number of tuples in a table, so I decided to write some shell scripts, to be run from crontab, that remove old data every night.
Now, my questions are:
If I just execute a DELETE FROM ... WHERE call_start < '2002-08-21' AND call_start > '2002-08-20', and that interval has 100,000 tuples, does DELETE start and commit a transaction for every single deletion?
And if I put a BEGIN at the beginning and a COMMIT at the end, will the deleted tuples overload the transaction logs?
Finally, what's the best thing to do?
Best regards.
--------------------------------------------------------------------------------
José Vilson de Mello de Farias
Software Engineer
Dígitro Tecnologia Ltda - www.digitro.com.br - Brazil
APC - Customer Oriented Applications
E-mail: vilson.farias@digitro.com.br
Tel.: +55 48 281 7158
ICQ 11866179
On Mon, 26 Aug 2002, Vilson farias wrote:

> Greetings,
>
> I'm having some performance problems related to the large number of
> tuples in a table, so I decided to write some shell scripts, to be run
> from crontab, that remove old data every night.

Hmm, you might want to send some info on the schema/queries/explain output, maybe someone'll be able to help.

> Now, my questions are:
>
> If I just execute a DELETE FROM ... WHERE call_start < '2002-08-21'
> AND call_start > '2002-08-20', and that interval has 100,000 tuples,
> does DELETE start and commit a transaction for every single deletion?

No. It's all one statement, so it'll be run in its containing transaction (if you haven't started one, it'll be put into its own).
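To illustrate the point above: a single DELETE is already atomic, and an explicit BEGIN/COMMIT only matters when grouping several statements. This is a sketch, not code from the thread; the table and column names are borrowed from the schema posted later in the thread, and the date range is an assumption.

```sql
-- One statement, one (implicit) transaction: either all matching rows
-- are deleted or, on error, none are.
DELETE FROM cham_chamada
 WHERE dt_inicial >= '2002-08-20' AND dt_inicial < '2002-08-21';

-- Equivalent explicit form; only useful if you want to group
-- additional statements into the same transaction:
BEGIN;
DELETE FROM cham_chamada
 WHERE dt_inicial >= '2002-08-20' AND dt_inicial < '2002-08-21';
COMMIT;
```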
Thanks for answering my post, Mr. Szabo.

Thanks for worrying about my performance problems, but they are natural for my database: my hardware isn't a Ferrari, my table has more than 2 million tuples, and it's related to a lot of other heavily populated tables. I realized I can keep no more than 6 months of data in these tables, so I would like to remove older data every night. I haven't written any code yet and I can't start testing the best approach on my databases, so I'm posting here to get some advice.

I'm worried about performance problems that could arise when I start removing 12,000 tuples in a single transaction, since I don't know how PostgreSQL works internally. Can a single transaction deal with that much data without problems or without overloading transaction buffers (or something like that), or do I need to split the deletion process into smaller sets of data?

My main table, which has more than 2 million tuples:

CREATE TABLE cham_chamada (
    cod_bxs         integer NOT NULL,
    chave           integer NOT NULL,
    identificacao   integer NOT NULL,
    dt_inicial      timestamp NOT NULL,
    indicadora      integer,
    cod_categoria   integer,
    identidadea     varchar(20),
    dt_final        timestamp,
    juntor          integer,
    indicadorb      integer,
    identidadeb     varchar(20),
    flg_chamada     char(1),
    flg_liberacao   char(1),
    temp_total      integer,
    ddd             smallint,
    cod_liberjuntor integer,
    cod_resultcham  integer,
    cod_fds         integer,
    cifra_origem    integer,
    csp             char(2),
    conta           integer,
    ramal           varchar(20),
    ultimo_ramal    varchar(20),
    CONSTRAINT XPKcham_chamada PRIMARY KEY (dt_inicial, cod_bxs, chave, identificacao)
);

CREATE INDEX XIE1cham_chamada ON cham_chamada (dt_final, identidadea);

CREATE INDEX XIE2cham_chamada ON cham_chamada (dt_final, flg_chamada);
On Tue, 27 Aug 2002, Vilson farias wrote:

> I'm worried about performance problems that could arise when I start
> removing 12,000 tuples in a single transaction, since I don't know how
> PostgreSQL works internally. Can a single transaction deal with that
> much data without problems or without overloading transaction buffers
> (or something like that), or do I need to split the deletion process
> into smaller sets of data?

12,000 rows shouldn't be a big deal. I assume you're adding about the same number per day, so it's basically a rotating window. The things to be sure of are that the column you're deleting by is indexed (given the definition below, that means dt_inicial or dt_final has to be part of the condition). You should check with EXPLAIN to make sure it's using the index (if it says it's not, you should ANALYZE the table and try again).

Also, if you're running 7.2, you should probably do a VACUUM afterwards (without FULL), and after say a couple of weeks of this, if you notice your disk space is increasing, there are some configuration options which can help (look for the free space map options in the configuration file).
> My main table, which has more than 2 million tuples:
>
> CREATE TABLE cham_chamada (
>     cod_bxs         integer NOT NULL,
>     chave           integer NOT NULL,
>     identificacao   integer NOT NULL,
>     dt_inicial      timestamp NOT NULL,
>     indicadora      integer,
>     cod_categoria   integer,
>     identidadea     varchar(20),
>     dt_final        timestamp,
>     juntor          integer,
>     indicadorb      integer,
>     identidadeb     varchar(20),
>     flg_chamada     char(1),
>     flg_liberacao   char(1),
>     temp_total      integer,
>     ddd             smallint,
>     cod_liberjuntor integer,
>     cod_resultcham  integer,
>     cod_fds         integer,
>     cifra_origem    integer,
>     csp             char(2),
>     conta           integer,
>     ramal           varchar(20),
>     ultimo_ramal    varchar(20),
>     CONSTRAINT XPKcham_chamada PRIMARY KEY (dt_inicial, cod_bxs, chave, identificacao)
> );
>
> CREATE INDEX XIE1cham_chamada ON cham_chamada (dt_final, identidadea);
>
> CREATE INDEX XIE2cham_chamada ON cham_chamada (dt_final, flg_chamada);
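The checks suggested above can be sketched against the posted schema. This is a sketch, not code from the thread: the cutoff date (roughly six months before the thread's date) is an assumption, and it assumes the cleanup deletes by dt_inicial, which is usable for a range scan because it is the leading column of the primary key.

```sql
-- Check that the planner will use an index for the cleanup predicate
-- (dt_inicial is indexed via the primary key; dt_final via XIE1/XIE2):
EXPLAIN DELETE FROM cham_chamada
 WHERE dt_inicial < '2002-02-27';

-- If a sequential scan shows up instead, refresh the statistics and
-- check the plan again:
ANALYZE cham_chamada;

-- Run the delete, then reclaim the dead tuples for reuse with a plain
-- VACUUM (not VACUUM FULL):
DELETE FROM cham_chamada
 WHERE dt_inicial < '2002-02-27';
VACUUM ANALYZE cham_chamada;
```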