Re: How batch processing works - Mailing list pgsql-general

From Ron Johnson
Subject Re: How batch processing works
Msg-id CANzqJaBahWw3gGhgaatOQnUh3Daij7kjtk7a3+Z=dJNL4d4SKg@mail.gmail.com
In response to Re: How batch processing works  (Lok P <loknath.73@gmail.com>)
List pgsql-general
On Thu, Sep 19, 2024 at 5:24 AM Lok P <loknath.73@gmail.com> wrote:

On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

[snip] 

Method-4

INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
commit;

If I knew that I had to load a structured input data file (even if it had parent and child records), this is how I'd do it (though I'd probably first check whether an "in-memory COPY INTO" is such a thing).
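
The nearest real equivalent I know of is COPY ... FROM STDIN, which psql can feed inline. A minimal sketch, assuming the same two-column parent_table as above:

COPY parent_table FROM STDIN WITH (FORMAT csv);
1,a
2,a
\.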



I was trying to reproduce this behaviour by comparing row-by-row commit vs. batch commit vs. true batch insert as you mentioned, but I am not able to see any difference in response time between "batch commit" and "true batch insert". Am I missing anything?

[snip] 
DO $$
DECLARE
    num_inserts INTEGER := 100000;
    batch_size INTEGER := 50;
    start_time TIMESTAMP;
    end_time TIMESTAMP;
    elapsed_time INTERVAL;
    i INTEGER;
BEGIN
    -- Method 1: Individual Inserts with Commit after every Row
    start_time := clock_timestamp();

    FOR i IN 1..num_inserts LOOP
        INSERT INTO parent_table VALUES (i, 'a');
        COMMIT;  
    END LOOP;

    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 1: Individual Inserts with Commit after every Row', start_time, end_time, elapsed_time);

    -- Method 2: Individual Inserts with Commit after 50 Rows
    start_time := clock_timestamp();

    FOR i IN 1..num_inserts LOOP
        INSERT INTO parent_table2 VALUES (i, 'a');
        -- Commit after every batch_size (50) rows
        IF i % batch_size = 0 THEN
            COMMIT;  
        END IF;
    END LOOP;

    -- Final commit for any remaining uncommitted rows
    COMMIT;

    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 2: Individual Inserts with Commit after 50 Rows', start_time, end_time, elapsed_time);

    -- Method 3: Batch Inserts with Commit after all
    start_time := clock_timestamp();

    FOR i IN 1..(num_inserts / batch_size) LOOP
        INSERT INTO parent_table3 VALUES
            (1 + (i - 1) * batch_size, 'a'),
[snip] 
            (49 + (i - 1) * batch_size, 'a'),
            (50 + (i - 1) * batch_size, 'a');
    END LOOP;

    COMMIT;  -- Final commit for all
    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 3: Batch Inserts with Commit after All', start_time, end_time, elapsed_time);

END $$;

Reproduce what behavior?

Anyway, plpgsql functions (including anonymous DO statements) are, to PostgreSQL, single statements. Thus, they'll be faster than individual calls.
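
For example, issuing the rows as separate statements from psql makes the per-statement, per-commit overhead visible in a way the DO block above cannot. A minimal sketch, reusing the same parent_table (\timing is a psql meta-command, and under autocommit each statement below is its own transaction):

\timing on

-- One transaction per row: each INSERT is a separate round trip and commit.
INSERT INTO parent_table VALUES (1, 'a');
INSERT INTO parent_table VALUES (2, 'a');

-- One statement, one commit: the multi-row form.
INSERT INTO parent_table VALUES (3, 'a'), (4, 'a');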

An untrusted language like plpython3u might speed things up even more, if you have to read a heterogeneous external file and insert all the records into the db.
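
A minimal sketch of what that could look like, assuming the parent_table/child_table layout above; the function name, file path, and line format ("P,..." for parent rows, everything else a child row) are made up for illustration:

CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION load_mixed_file(path text)
RETURNS integer
LANGUAGE plpython3u
AS $$
# Prepared plans avoid re-parsing the INSERTs for every line.
parent_plan = plpy.prepare(
    "INSERT INTO parent_table VALUES ($1, $2)", ["int", "text"])
child_plan = plpy.prepare(
    "INSERT INTO child_table VALUES ($1, $2, $3)", ["int", "int", "text"])
n = 0
with open(path) as f:              # untrusted language: full file access
    for line in f:
        fields = line.rstrip("\n").split(",")
        if fields[0] == "P":       # hypothetical parent-record marker
            plpy.execute(parent_plan, [int(fields[1]), fields[2]])
        else:                      # everything else: a child record
            plpy.execute(child_plan,
                         [int(fields[1]), int(fields[2]), fields[3]])
        n += 1
return n
$$;

-- The whole load then runs as a single statement:
-- SELECT load_mixed_file('/path/to/input.csv');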

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!
