Thread: Manual query vs trigger during data load

Manual query vs trigger during data load

From

yudhi s

Date:

13 September 2024, 10:31:58

Hello All,

We are having a table which is going to be inserted with 100's of millions of rows each day. And we now want to have a requirement in which we need to do some transformation/lookup logic built on top of a few of the input bind values , while inserting the data. So I wanted to understand ,is it possible to do it along with the INSERT query or is it better to have a trigger created for the same?

For. e.g Below is the current Insert query used in the Java code. We want to fetch the value for "column2" from a lookup table rather than directly inserting as it's coming from the customer side. So I am thinking of a trigger like below. But at the same time I also want to compare the performance of a normal way of doing the lookup vs having it performed using triggers.

So one way i am thinking is first fetching the value of the "column2" from reference_tab1 using a separate "select query" in Java code itself, and then passing that to the below insert query, but i think that will increase the response time as that will be a separate DB call.

1)So, is there a way I can do it directly using the single INSERT query itself without additional SELECT query? And then will try to compare that with the trigger based approach.
2)Additionally , if this decision will impact a batch insert approach. i.e. say , in case of trigger , will the batch insert fail because trigger will force it to make it row by row?

INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)
VALUES (:v_col1, :v_col2, :v_col3, :v_col3, :v_col4,:v_col5, CURRENT_DATE);

CREATE OR REPLACE FUNCTION trg_func_populate_column2() RETURNS TRIGGER AS $$
BEGIN
-- Fetch reference value and populate column2
NEW.column2 := (SELECT lookup_key FROM reference_tab1 WHERE lookup_key = old.column2);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Regards

Yudhi

Manual query vs trigger during data load

From

Thiemo Kellner

Date:

13 September 2024, 10:49:59

Hi

To me it seems possible to create an insert select, e.g.

Insert into tab1 (val1, val2)
Select valA, valB
From tab2
Where valC = :param1

Cheers

Thiemo

Re: Manual query vs trigger during data load

From

Juan Rodrigo Alejandro Burgos Mella

Date:

13 September 2024, 12:58:05

Hello, I find it unlikely that the trigger will work properly, since the reserved fields of the OLD subset have no value in an INSERT

Atte
JRBN

El vie, 13 de sept de 2024, 04:32, yudhi s <learnerdatabase99@gmail.com> escribió:

Hello All,

We are having a table which is going to be inserted with 100's of millions of rows each day. And we now want to have a requirement in which we need to do some transformation/lookup logic built on top of a few of the input bind values , while inserting the data. So I wanted to understand ,is it possible to do it along with the INSERT query or is it better to have a trigger created for the same?

For. e.g Below is the current Insert query used in the Java code. We want to fetch the value for "column2" from a lookup table rather than directly inserting as it's coming from the customer side. So I am thinking of a trigger like below. But at the same time I also want to compare the performance of a normal way of doing the lookup vs having it performed using triggers.

So one way i am thinking is first fetching the value of the "column2" from reference_tab1 using a separate "select query" in Java code itself, and then passing that to the below insert query, but i think that will increase the response time as that will be a separate DB call.

1)So, is there a way I can do it directly using the single INSERT query itself without additional SELECT query? And then will try to compare that with the trigger based approach.
2)Additionally , if this decision will impact a batch insert approach. i.e. say , in case of trigger , will the batch insert fail because trigger will force it to make it row by row?

INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)
VALUES (:v_col1, :v_col2, :v_col3, :v_col3, :v_col4,:v_col5, CURRENT_DATE);

CREATE OR REPLACE FUNCTION trg_func_populate_column2() RETURNS TRIGGER AS $$
BEGIN
-- Fetch reference value and populate column2
NEW.column2 := (SELECT lookup_key FROM reference_tab1 WHERE lookup_key = old.column2);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Regards
Yudhi

Re: Manual query vs trigger during data load

From

Adrian Klaver

Date:

13 September 2024, 17:50:56

On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
> Hello, I find it unlikely that the trigger will work properly, since the 
> reserved fields of the OLD subset have no value in an INSERT

I'm not seeing that the OP is asking for OLD.* values, they are just 
looking to include the result of a lookup on another table in the INSERT.

> 
> Atte
> JRBN
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Manual query vs trigger during data load

From

Adrian Klaver

Date:

13 September 2024, 17:57:28

On 9/13/24 07:50, Adrian Klaver wrote:
> On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
>> Hello, I find it unlikely that the trigger will work properly, since 
>> the reserved fields of the OLD subset have no value in an INSERT
> 
> I'm not seeing that the OP is asking for OLD.* values, they are just 
> looking to include the result of a lookup on another table in the INSERT.

My mistake I see the OLD reference now.

> 
>>
>> Atte
>> JRBN
>>
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Manual query vs trigger during data load

From

Rob Sargent

Date:

13 September 2024, 20:22:53


> On Sep 13, 2024, at 10:57 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 9/13/24 07:50, Adrian Klaver wrote:
>>> On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
>>> Hello, I find it unlikely that the trigger will work properly, since the reserved fields of the OLD subset have no
valuein an INSERT 
>> I'm not seeing that the OP is asking for OLD.* values, they are just looking to include the result of a lookup on
anothertable in the INSERT. 
>
> My mistake I see the OLD reference now.
>
>>>
Personally I would cache the lookup on the java side a send the correct value to a simple insert statement

>
>
>

Re: Manual query vs trigger during data load

From

yudhi s

Date:

13 September 2024, 22:24:49

On Fri, Sep 13, 2024 at 8:27 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 9/13/24 07:50, Adrian Klaver wrote:
> On 9/13/24 02:58, Juan Rodrigo Alejandro Burgos Mella wrote:
>> Hello, I find it unlikely that the trigger will work properly, since
>> the reserved fields of the OLD subset have no value in an INSERT
>
> I'm not seeing that the OP is asking for OLD.* values, they are just
> looking to include the result of a lookup on another table in the INSERT.

My mistake I see the OLD reference now.

>

My mistake.The trigger was supposed to use "new.col2" and fetch the corresponding lookup value from the lookup table and insert that value to the target table.

Now my question was ,in such a situation , the trigger will work fine , but is that the optimal way of doing ? Or should we convert the query someway such that the lookup table can be queried along with the INSERT at one shot from the database with a single DB call? And is it true that the trigger on the target table will suppress the batch insert and make it row by row, even if we call it in a batch fashion?

As "thiemo" mentioned , it can be done as below method, but if we have multiple lookup tables to be populated for multiple columns , then , how can the INSERT query be tweaked to cater the need here? And I understand , the lookup table can be cached in Java and refreshed at a certain point in time, but I was trying to understand if this can be doable by directly querying the database, considering the lookup tables are having large data sets in them.

Insert into tab1 (val1, val2)
Select valA, valB
From tab2
Where valC = :param1

Re: Manual query vs trigger during data load

From

yudhi s

Date:

14 September 2024, 18:51:45

On Sat, Sep 14, 2024 at 4:17 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

On 2024-09-14 00:54:49 +0530, yudhi s wrote:
> As "thiemo" mentioned , it can be done as below method, but if we have
> multiple lookup tables to be populated for multiple columns , then , how can
> the INSERT query be tweaked to cater the need here?

Just use a join:
insert into target(val1, val2, val3, val4)
select :param1, cfgA.substA, :param3, cfgB.substB
from cfgA, cfgB
where cfgA.keyA = :param2 and cfgB.keyB = :param4

Or use a CTE per lookup which might be more readable:

with cA as ( select substA from cfgA where keyA = :param2 ),
cB as ( select substB from cfgB where keyB = :param4 )
insert into target(val1, val2, val3, val4)
select :param1, cA.substA, :param3, cB.substB
from cA, cB

Thank you. I will try these options.

Also we are trying to do something as below , which will separate the tables based on the specific lookup fields for the target tables and thus it will look simple rather than using those reference tables in the From clause which may cause some confusion in reading the code or not sure if it will cause cartesian. Please correct me if I'm wrong.

INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)

VALUES ( :v_col1, (SELECT lookup_value FROM reference_tab1 WHERE lookup_key = :v_col2), :v_col3, :v_col4, :v_col5, CURRENT_DATE );