Re: How batch processing works - Mailing list pgsql-general

From Ron Johnson
Subject Re: How batch processing works
Date
Msg-id CANzqJaDLSfHeWqe5BGS93bLhE4PnUgW4eSZS1E4xUa83kQtBgA@mail.gmail.com
Whole thread Raw
In response to How batch processing works  (Lok P <loknath.73@gmail.com>)
Responses Re: How batch processing works
List pgsql-general
On Thu, Sep 19, 2024 at 1:31 AM Lok P <loknath.73@gmail.com> wrote:
Hello,
Saw multiple threads around the same , so I want some clarification. As we know row by row is slow by slow processing , so in heavy write systems(say the client app is in Java) , people asked to do DMLS in batches rather in a row by row fashion to minimize the chatting or context switches between database and client which is resource intensive. What I understand is that , a true batch processing means the client has to collect all the input bind values and  prepare the insert statement and submit to the database at oneshot and then commit.

What it means actually and if we divide the option as below, which method truly does batch processing or there exists some other method for doing the batch processing considering postgres as backend database?

I understand, the first method below is truly a row by row processing in which context switches happen between client and database with each row, whereas the second method is just batching the commits but not a true batch processing as it will do the same amount of context switching between the database and client. But regarding the third and fourth method, will both execute in a similar fashion in the database with the same number of context switches? If any other better method exists to do these inserts in batches? Appreciate your guidance.


CREATE TABLE parent_table (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE child_table (
    id SERIAL PRIMARY KEY,
    parent_id INT REFERENCES parent_table(id),
    value TEXT
);


Method-1

insert into parent_table values(1,'a');
commit;
insert into parent_table values(2,'a');
commit;
insert into child_table values(1,1,'a');
Commit;
insert into child_table values(1,2,'a');
commit;

VS

Method-2

insert into parent_table values(1,'a');
insert into parent_table values(2,'a');
insert into child_table values(1,1,'a');
insert into child_table values(1,2,'a');  
Commit;

As a former "DP" programmer, from an application point of view, this is absolutely batch programming.

My experience was with COBOL and C, though, which were low overhead.  From what I've seen in PG log files, JDBC is astoundingly chatty.
 
[snip] 

Method-4

INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
commit;

If I knew that I had to load a structured input data file (even if it had parent and child records), this is how I'd do it (but probably first try and see if "in-memory COPY INTO" is such a thing).

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

pgsql-general by date:

Previous
From: Lok P
Date:
Subject: How batch processing works
Next
From: veem v
Date:
Subject: Re: IO related waits