Re: [GENERAL] slow inserts and updates on large tables - Mailing list pgsql-general

From jim@reptiles.org (Jim Mercer)
Subject Re: [GENERAL] slow inserts and updates on large tables
Date
Msg-id m10D7g0-00080dC@mailbox.reptiles.org
In response to Re: [GENERAL] slow inserts and updates on large tables  (Herouth Maoz <herouth@oumail.openu.ac.il>)
Responses Re: [GENERAL] slow inserts and updates on large tables
List pgsql-general
> > if anyone has any pointers as to why this is so slow, lemme know.
>
> Have you checked the usual stuff:
>
> 1) Each insert and update creates its own transaction. That is, an
>    insert is in fact:
>
>    BEGIN TRANSACTION;
>    INSERT...;
>    COMMIT;
>
>    So, to make things faster you should BEGIN TRANSACTION explicitly
>    before all the inserts and COMMIT after them. Or separate into
>    manageable bulks if you run into a memory problem.
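
(for reference, the batched insert file being suggested would look roughly
like this; testtable's real schema differs, and these rows are made up:)

    begin transaction;
    insert into testtable values (1, 'first record');
    insert into testtable values (2, 'second record');
    -- ... one insert statement per input record ...
    commit transaction;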

i have a script which times various methods:

------
data is 6047 records
insertama time was 56 seconds
psqlinsert time was 51 seconds
traninsert time was 51 seconds
psqlselect time was 0
------

the script drops and recreates the testtable before each test.
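
(the setup step amounts to something like this; the column list is a
stand-in, the real schema has more fields:)

    drop table testtable;
    -- hypothetical two-column schema, for illustration only
    create table testtable (callid int4, calldata text);
    create index testtable_callid_idx on testtable (callid);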

insertama reads the data from a text file and uses PQexec to insert each record.
it does not use begin/commit.

psqlinsert is "psql -f insert.sql" where insert.sql is a file of insert
commands (created by printf'ing the PQexec args from insertama).

traninsert is the same as psqlinsert, except the file starts with "begin
transaction" and ends with "commit transaction".

psqlselect is "psql -c 'select count(callid) from testtable'".

so, according to my tests, the begin/commit stuff doesn't do me any good.

> 2) Indexing tables mean faster SELECTs at the expense of slower INSERTs
>    and UPDATEs. There is no magic. The reasoning is that normally you
>    query the data a lot more than you change it.
>
>    Thus, it is preferable, before doing bulk inserts, to drop the
>    indices and recreate them afterwards. This is true when you are not
>    expecting the database to be queried at the same time the inserts are
>    made.
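
(i.e., wrapping each batch like this; the index name below is a guess:)

    drop index testtable_callid_idx;
    -- ... bulk inserts go here ...
    create index testtable_callid_idx on testtable (callid);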

the production database currently has over 5 million records, and is expected
to run at 10-15 million in full production.

the inserts happen in batches, every 10 minutes or so.

recreating the index takes some 30 minutes as it is.

however, if i use the above bench test script, but with the index removed,
i get:

------
data is 6047 records
insertama time was 53 seconds
psqlinsert time was 47 seconds
traninsert time was 48 seconds
psqlselect time was 0 seconds
------

as you can see, it didn't make much difference.

>    As for updates, it's trickier, because you actually use the index
>    for the WHERE part of the update. If speed is of the essence, I would
>    probably try the following:
>
>    SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition;
>    DELETE FROM your_table WHERE update_condition;
>    DROP INDEX...;
>    INSERT INTO your_table SELECT ... FROM temp_table; -- update within select
>    CREATE INDEX...;

as stated above, the size of the table means i can't do this on every update.

> 3) Back to the issue of INSERTS - copies are faster. If you can transform
>    the data into tab-delimited format as required by COPY, you save a lot
>    of time on parsing, planning etc.

this sorta defeats the purpose of putting the data in an SQL database. 8^)
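
(that said, for anyone who wants to try it, the copy form would be roughly
this; the filename is made up:)

    -- input file is tab-delimited, one line per record; path is hypothetical
    copy testtable from '/tmp/calls.txt';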

--
[ Jim Mercer    Reptilian Research      jim@reptiles.org   +1 416 410-5633 ]
[ The telephone, for those of you who  have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.            ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]
