Re: How batch processing works - Mailing list pgsql-general

From Lok P
Subject Re: How batch processing works
Date
Msg-id CAKna9Vb7kg4FjkYRrDJLFGX6cqwYSxEJwo3uEq5uSupH=wp+ew@mail.gmail.com
In response to Re: How batch processing works  (Lok P <loknath.73@gmail.com>)
List pgsql-general

Below are the results for the posted methods. Tested on local, and it showed no difference in timing between method-2 and method-3. It failed to run in dbfiddle somehow.

Also, I was initially worried that adding a trigger to our target table would worsen performance, because it would turn every execution into "row by row" processing rather than a true batch insert (method-3 as posted), with more context switches. But it seems it will still do the batch commits (the way method-2 does), so we won't lose any performance as such. Is this understanding correct?
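For reference, the kind of row-level trigger I have in mind is something like the below sketch (the audit table, function, and trigger names are just placeholders, not from the script). The trigger fires FOR EACH ROW, but it does not change the COMMIT pattern of the calling code:

```sql
-- Hypothetical row-level audit trigger on the target table.
-- The trigger body runs once per inserted row, but the batch COMMITs
-- issued by the calling loop (method-2 style) are unaffected by it.
CREATE OR REPLACE FUNCTION parent_table_audit_fn()
RETURNS trigger
LANGUAGE plpgsql
AS $fn$
BEGIN
    INSERT INTO parent_table_audit (id, changed_at)
    VALUES (NEW.id, clock_timestamp());
    RETURN NEW;
END;
$fn$;

CREATE TRIGGER parent_table_audit_trg
AFTER INSERT ON parent_table
FOR EACH ROW
EXECUTE FUNCTION parent_table_audit_fn();
```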


Method-1- 00:01:44.48

Method-2- 00:00:02.67

Method-3- 00:00:02.39

https://gist.github.com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6


On Thu, Sep 19, 2024 at 6:42 PM Lok P <loknath.73@gmail.com> wrote:


On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Thu, Sep 19, 2024 at 5:24 AM Lok P <loknath.73@gmail.com> wrote:


[snip] 
DO $$
DECLARE
    num_inserts INTEGER := 100000;
    batch_size INTEGER := 50;
    start_time TIMESTAMP;
    end_time TIMESTAMP;
    elapsed_time INTERVAL;
    i INTEGER;
BEGIN
    -- Method 1: Individual Inserts with Commit after every Row
    start_time := clock_timestamp();

    FOR i IN 1..num_inserts LOOP
        INSERT INTO parent_table VALUES (i, 'a');
        COMMIT;  
    END LOOP;

    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 1: Individual Inserts with Commit after every Row', start_time, end_time, elapsed_time);

    -- Method 2: Individual Inserts with Commit after 100 Rows
    start_time := clock_timestamp();

    FOR i IN 1..num_inserts LOOP
        INSERT INTO parent_table2 VALUES (i, 'a');
        -- Commit after every 100 rows
        IF i % batch_size = 0 THEN
            COMMIT;  
        END IF;
    END LOOP;

    -- Final commit for any remaining uncommitted rows
    COMMIT;

    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 2: Individual Inserts with Commit after 100 Rows', start_time, end_time, elapsed_time);

    -- Method 3: Batch Inserts with Commit after all
    start_time := clock_timestamp();

    FOR i IN 1..(num_inserts / batch_size) LOOP
        INSERT INTO parent_table3 VALUES
            (1 + (i - 1) * batch_size, 'a'),
[snip] 
            (49 + (i - 1) * batch_size, 'a'),
            (50 + (i - 1) * batch_size, 'a');
        COMMIT;
    END LOOP;

    COMMIT;  -- Final commit for all
    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 3: Batch Inserts with Commit after All', start_time, end_time, elapsed_time);

END $$;

Reproduce what behavior?

Anyway, plpgsql functions (including anonymous DO statements) are -- to PostgreSQL -- single statements.  Thus, they'll be faster than individual calls.

An untrusted language like plpython3u might speed things up even more, if you have to read a heterogeneous external file and insert all the records into the db.
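A minimal sketch of that idea, assuming the plpython3u extension is installed; the file path and its simple "id,value" line format are hypothetical, just to illustrate reading an external file and inserting inside a single function call:

```sql
CREATE EXTENSION IF NOT EXISTS plpython3u;

-- Hypothetical loader: reads a comma-separated file (id,value per line)
-- and inserts each row via a prepared plan.  The whole load is one
-- statement (and one transaction) from PostgreSQL's point of view.
CREATE OR REPLACE FUNCTION load_parent_table(path text)
RETURNS bigint
LANGUAGE plpython3u
AS $fn$
plan = plpy.prepare(
    "INSERT INTO parent_table VALUES ($1, $2)", ["integer", "text"])
n = 0
with open(path) as f:
    for line in f:
        id_str, value = line.rstrip("\n").split(",", 1)
        plpy.execute(plan, [int(id_str), value])
        n += 1
return n
$fn$;

SELECT load_parent_table('/tmp/data.csv');
```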

Here, if you see my script, method-1 commits after each row insert, method-2 does a batch commit i.e. a commit after every 50 rows, and method-3 does a true batch insert i.e. it combines all 50 values into one INSERT statement, submits that to the database in one shot, and then commits, so the context switching is much lower. So I was expecting method-3 to be the fastest way to insert the rows here, but the results show the same response time for method-2 and method-3. Method-1 is the slowest, though.
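As an aside, a set-based variant avoids the PL/pgSQL loop entirely; this is just a sketch of the same 100000-row load using generate_series, against the parent_table3 target from the script above:

```sql
-- Set-based equivalent of the loads above: a single INSERT ... SELECT,
-- one implicit transaction, no PL/pgSQL loop and no per-batch COMMITs.
INSERT INTO parent_table3
SELECT g, 'a'
FROM generate_series(1, 100000) AS g;
```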
