Thread: Detect which sum of cartesian product (+ any combination of n records in tables) exceeds a value
Detect which sum of cartesian product (+ any combination of n records in tables) exceeds a value
From
agharta
Date:
Hi all, I hope someone can helps me.... I have a problem detecting a sum of cartesian product of tables. ---------------------------- Test case: //CREATE TABLES create table t1 ( id serial, field_1 integer); create table t2 ( id serial, field_1 integer); create table t3 ( id serial, field_1 integer); create table t4 ( id serial, field_1 integer); //FILL TABLES insert into t1 (field_1) select cast(random()*10 as integer) from generate_series(1,10); insert into t2 (field_1) select cast(random()*10 as integer) from generate_series(1,10); insert into t3 (field_1) select cast(random()*10 as integer) from generate_series(1,10); insert into t4 (field_1) select cast(random()*10 as integer) from generate_series(1,10); -------------------- Example: i have 4 tables with fields, i would detect which combination of field_1 in any table exceed a value (eg. 35). Simple, ugly & slow but simple: select * from t1, t2,t3,t4 where t1.field_1 + t2.field_1 + t3.field_1 + t4.field_1 >35 It works. Now my question: i would determine which combination on field_1 of t1,t2,t3 plus a combination(any) of 2 records on field_1 of t4, exceeds a value (eg. 35) It should be something like t1.field_1 + t2.field_1 + t3.field_1 + ( any combination of 2 records of t4.field_1) > 35 Suppose i have these records in tables (field_1), for simple explain of my problem: t1 = 1 t2 = 5 t3 = 4 t4 = 1,3,4 the combination of 2 record on t4.field_1 should be: 1+5+4 + ( 1+3) 1+5+4 + ( 1+4) 1+5+4 + ( 3+1) 1+5+4 + ( 3+4) 1+5+4 + ( 4+1) 1+5+4 + ( 4+3) How to do it??? This is a static test case with a static (2 records) problem, in my production db it could be any combination (2,3,4,5+ records ) of field_1 of any table. Hope I was clear, Best regards and thanks in advance, Agharta
Re: Detect which sum of cartesian product (+ any combination of n records in tables) exceeds a value
From
"David G. Johnston"
Date:
<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,sans-serif">OnThu, Mar 19, 2015 at 9:30 AM, agharta </span><span dir="ltr" style="font-family:arial,sans-serif"><<ahref="mailto:agharta82@gmail.com" target="_blank">agharta82@gmail.com</a>></span><spanstyle="font-family:arial,sans-serif"> wrote:</span><br /></div><divclass="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px#ccc solid;padding-left:1ex"><br /> It should be something like t1.field_1 + t2.field_1 + t3.field_1+ ( any combination of 2 records of t4.field_1) > 35<br /><span class="HOEnZb"><font color="#888888"><br /></font></span></blockquote></div><br/></div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Icould probably brute-force write such a query in maybe a half-hour. I likelywould not be alive if I tried executing it on any non-trivial sized database though.</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Asan algorithm:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Createtwo relations (temp tables/views/materialized views), one for t1/t2/t3and one for t4/t4 each having a single row for every potential combination of rows. Each table would contributetwo values, the content of "field_1" and the primary key of the corresponding table. The new PK would be a compositeof all the contributing PKs</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif">For each relation, if the sum of the valuecolumns is > 35 then every single row from the other table will provide a match. This is your first output.</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif">Cross Join the two relations, after removing those ineach that were matched above, and sum together all 5 fields. This is your second output.</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">UnionAll the two outputs together and you have your result.</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Itcan be done in one step but this at least gives you a prayer of executingin reasonable time for meaningfully sized datasets. You can just write the second part and avoid the union untilyour data warrants the more complex, but likely faster, setup.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">DavidJ.</div></div></div>
Re: Detect which sum of cartesian product (+ any combination of n records in tables) exceeds a value
From
agharta
Date:
<div class="moz-cite-prefix"><br /> On 03/19/2015 06:05 PM, David G. Johnston wrote:<br /></div><blockquote cite="mid:CAKFQuwY27=A_Y3HjndKhAQzZtg4Y3wAHfDCXvEiHL6fMciq_ng@mail.gmail.com"type="cite"><div dir="ltr"><div class="gmail_extra"><br/></div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Ilikely would not be alive if I tried executing it on any non-trivial sizeddatabase though.</div></div></div></blockquote><br /> Me too :) !<br /><br /><blockquote cite="mid:CAKFQuwY27=A_Y3HjndKhAQzZtg4Y3wAHfDCXvEiHL6fMciq_ng@mail.gmail.com"type="cite"><div dir="ltr"><div class="gmail_extra"><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif">As an algorithm:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Createtwo relations (temp tables/views/materialized views), one for t1/t2/t3and one for t4/t4 each having a single row for every potential combination of rows. Each table would contributetwo values, the content of "field_1" and the primary key of the corresponding table. The new PK would be a compositeof all the contributing PKs</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif">For each relation, if the sum of the valuecolumns is > 35 then every single row from the other table will provide a match. This is your first output.</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif">Cross Join the two relations, after removing those ineach that were matched above, and sum together all 5 fields. This is your second output.</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">UnionAll the two outputs together and you have your result.</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Itcan be done in one step but this at least gives you a prayer of executingin reasonable time for meaningfully sized datasets. You can just write the second part and avoid the union untilyour data warrants the more complex, but likely faster, setup.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">DavidJ.</div></div></div></blockquote><br /> You're right, this should bethe fastest implementation possible, but cross/cartesian matching is very slow with a huge amount of data (it is natural).<br /><br /> I think that a simple & dynamic (t4/t4/t4/t4... n times) solution is not possible, as 9.4 PG version.Correct me if i am wrong.<br /><br /><div id="gt-src-tools"><div id="gt-src-tools-l"><div id="gt-input-tool" style="display:inline-block;"><div id="itamenu"><span class="ita-kd-inputtools-div"></span></div></div></div></div><div class="almost_half_cell"id="gt-res-content"><div dir="ltr" style="zoom:1"><span class="short_text" id="result_box" lang="en"><spanclass="hps alt-edited">I hoped</span> <span class="hps">that there was</span> <span class="hps">a magic-trick-functionthat would resolve the problem. Nope. :(<br /><br /> I need to review & rewrite my db/applicationto solve the problem in another way. <br /><br /></span></span><br /><span class="short_text" id="result_box"lang="en"><span class="hps"><span class="short_text" id="result_box" lang="en"><span class="hps">I owe you</span><span class="hps">a beer</span></span>, thanks a lot for your suggestions.<br /><br /><br /> Cheers,<br /><br />Agharta<br /><br /><br /></span></span></div></div><br />
Re: Detect which sum of cartesian product (+ any combination of n records in tables) exceeds a value
From
agharta
Date:
<div class="moz-text-html" lang="x-unicode"><div class="moz-cite-prefix"><br /> On 03/19/2015 06:05 PM, David G. Johnstonwrote:<br /></div><blockquote cite="mid:CAKFQuwY27=A_Y3HjndKhAQzZtg4Y3wAHfDCXvEiHL6fMciq_ng@mail.gmail.com" type="cite"><divdir="ltr"><div class="gmail_extra"><br /></div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Ilikely would not be alive if I tried executing it on any non-trivial sizeddatabase though.</div></div></div></blockquote><br /> Me too :) !<br /><br /><blockquote cite="mid:CAKFQuwY27=A_Y3HjndKhAQzZtg4Y3wAHfDCXvEiHL6fMciq_ng@mail.gmail.com"type="cite"><div dir="ltr"><div class="gmail_extra"><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif">As an algorithm:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Createtwo relations (temp tables/views/materialized views), one for t1/t2/t3and one for t4/t4 each having a single row for every potential combination of rows. Each table would contributetwo values, the content of "field_1" and the primary key of the corresponding table. The new PK would be a compositeof all the contributing PKs</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif">For each relation, if the sum of the valuecolumns is > 35 then every single row from the other table will provide a match. This is your first output.</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif">Cross Join the two relations, after removing those ineach that were matched above, and sum together all 5 fields. This is your second output.</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">UnionAll the two outputs together and you have your result.</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif"><br /></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Itcan be done in one step but this at least gives you a prayer of executingin reasonable time for meaningfully sized datasets. You can just write the second part and avoid the union untilyour data warrants the more complex, but likely faster, setup.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">DavidJ.</div></div></div></blockquote><br /> You're right, this should bethe fastest implementation possible, but cross/cartesian matching is very slow with a huge amount of data (it is natural).<br /><br /> I think that a simple & dynamic (t4/t4/t4/t4... n times) solution is not possible, as 9.4 PG version.Correct me if i am wrong.<br /><br /><div id="gt-src-tools"><div id="gt-src-tools-l"><div id="gt-input-tool" style="display:inline-block;"><div id="itamenu"><span class="ita-kd-inputtools-div"></span></div></div></div></div><div class="almost_half_cell"id="gt-res-content"><div dir="ltr" style="zoom:1"><span class="short_text" id="result_box" lang="en"><spanclass="hps alt-edited">I hoped</span> <span class="hps">that there was</span> <span class="hps">a magic-trick-functionthat would resolve the problem. Nope. :(<br /><br /> I need to review & rewrite my db/applicationto solve the problem in another way. <br /><br /></span></span><br /><span class="short_text" id="result_box"lang="en"><span class="hps"><span class="short_text" id="result_box" lang="en"><span class="hps">I owe you</span><span class="hps">a beer</span></span>, thanks a lot for your suggestions.<br /><br /><br /> Cheers,<br /><br />Agharta<br /><br /><br /></span></span></div></div><br /></div>