Thread: execute same query only one time?
Hi, is there a best practice to share data between two select statements? Imaging following situation: I want to receive two result sets from two tables, referring to a specific id from table t0 AND I try not to query for that specific id a second time. Table t0 returns 1 row and table t1 returns multiple rows. begin; select id, col1, col2, ... from t0 where id = (select max(id) from t0 where col1 = value1 and col2 = value2 and ...); select col1 from t1 where t0_id = (select max(id) from t0 where col1 = value1 and col2 = value2 and ...); commit; Best regards Johannes
Attachment
Hi,
is there a best practice to share data between two select statements?
Imaging following situation: I want to receive two result sets from two
tables, referring to a specific id from table t0 AND I try not to query
for that specific id a second time.
Table t0 returns 1 row and table t1 returns multiple rows.
begin;
select id, col1, col2, ... from t0 where id = (select max(id) from t0
where col1 = value1 and col2 = value2 and ...);
select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
value1 and col2 = value2 and ...);
commit;
Please confirm:
You want the result of "SELECT max(id) FROM t0" to be used in the second query without having to recompute it?
What client are you using to execute these statements?
Dave
Not an expert, but I would try a temporary unlogged table.
Sent from my android device.
-----Original Message-----
From: Johannes <jotpe@posteo.de>
To: pgsql-general@postgresql.org
Sent: Mon, 08 Feb 2016 11:07
Subject: [GENERAL] execute same query only one time?
Hi, Sent from my android device.
-----Original Message-----
From: Johannes <jotpe@posteo.de>
To: pgsql-general@postgresql.org
Sent: Mon, 08 Feb 2016 11:07
Subject: [GENERAL] execute same query only one time?
is there a best practice to share data between two select statements?
Imaging following situation: I want to receive two result sets from two
tables, referring to a specific id from table t0 AND I try not to query
for that specific id a second time.
Table t0 returns 1 row and table t1 returns multiple rows.
begin;
select id, col1, col2, ... from t0 where id = (select max(id) from t0
where col1 = value1 and col2 = value2 and ...);
select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
value1 and col2 = value2 and ...);
commit;
Best regards Johannes
On 02/08/2016 11:05 AM, Johannes wrote: > Hi, > > is there a best practice to share data between two select statements? A join: http://www.postgresql.org/docs/9.4/interactive/sql-select.html Search for: join_type > > Imaging following situation: I want to receive two result sets from two > tables, referring to a specific id from table t0 AND I try not to query > for that specific id a second time. > > Table t0 returns 1 row and table t1 returns multiple rows. > > begin; > select id, col1, col2, ... from t0 where id = (select max(id) from t0 > where col1 = value1 and col2 = value2 and ...); > select col1 from t1 where t0_id = (select max(id) from t0 where col1 = > value1 and col2 = value2 and ...); > commit; Based on rough guess of the above, without seeing actual table schemas: select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id = t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2 = value2 and ...); > > Best regards Johannes > -- Adrian Klaver adrian.klaver@aklaver.com
On 2/8/16, Johannes <jotpe@posteo.de> wrote: > Hi, > > is there a best practice to share data between two select statements? > > Imaging following situation: I want to receive two result sets from two > tables, referring to a specific id from table t0 AND I try not to query > for that specific id a second time. > > Table t0 returns 1 row and table t1 returns multiple rows. > > begin; > select id, col1, col2, ... from t0 where id = (select max(id) from t0 > where col1 = value1 and col2 = value2 and ...); > select col1 from t1 where t0_id = (select max(id) from t0 where col1 = > value1 and col2 = value2 and ...); > commit; > > Best regards Johannes Yes. You can use temporary autodeleting tables[1] for that. Similar to: BEGIN; CREATE TEMPORARY TABLE temptable(id int) ON COMMIT DROP; INSERT INTO temptable SELECT max(id) FROM t0 WHERE col1 = value1 and col2 = value2 and ...; SELECT id, col1, col2, ... FROM t0 INNER NATURAL JOIN temptable; SELECT col1 FROM t1 INNER JOIN temptable ON (t0_id = temptable.id); COMMIT; [1]http://www.postgresql.org/docs/9.5/static/sql-createtable.html -- Best regards, Vitaly Burovoy
On 2/8/16, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 02/08/2016 11:05 AM, Johannes wrote: >> Imaging following situation: I want to receive two result sets from two >> tables, referring to a specific id from table t0 AND I try not to query >> for that specific id a second time. > >> Table t0 returns 1 row and table t1 returns multiple rows. >> >> begin; >> select id, col1, col2, ... from t0 where id = (select max(id) from t0 >> where col1 = value1 and col2 = value2 and ...); >> select col1 from t1 where t0_id = (select max(id) from t0 where col1 = >> value1 and col2 = value2 and ...); >> commit; >> >> Best regards Johannes > > Based on rough guess of the above, without seeing actual table schemas: > > select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id = > t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2 > = value2 and ...); I don't think it is a good solution because it leads to copying columns from the t0 which is wasting net traffic and increasing complexity at the client side. Moreover it works iff t0 returns only one row. > > -- > Adrian Klaver > adrian.klaver@aklaver.com -- Best regards, Vitaly Burovoy
On 2/8/16, travis@traviswellman.com <travis@traviswellman.com> wrote: > Not an expert, but I would try a temporary unlogged table. Note: temporary tables are always unlogged. Please, 1. Don't top post. 2. Use "Reply to all" to be sure an author of an original letter gets your answer even if he hasn't subscribed to the list. > -----Original Message----- > From: Johannes <jotpe@posteo.de> > To: pgsql-general@postgresql.org > Sent: Mon, 08 Feb 2016 11:07 > Subject: [GENERAL] execute same query only one time? > > Hi, > > is there a best practice to share data between two select statements? > > Imaging following situation: I want to receive two result sets from two > tables, referring to a specific id from table t0 AND I try not to query > for that specific id a second time. > > Table t0 returns 1 row and table t1 returns multiple rows. > > begin; > select id, col1, col2, ... from t0 where id = (select max(id) from t0 > where col1 = value1 and col2 = value2 and ...); > select col1 from t1 where t0_id = (select max(id) from t0 where col1 = > value1 and col2 = value2 and ...); > commit; > > Best regards Johannes -- Best regards, Vitaly Burovoy
Am 08.02.2016 um 20:15 schrieb David G. Johnston: > On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote: > >> Hi, >> >> is there a best practice to share data between two select statements? >> >> Imaging following situation: I want to receive two result sets from two >> tables, referring to a specific id from table t0 AND I try not to query >> for that specific id a second time. >> >> Table t0 returns 1 row and table t1 returns multiple rows. >> >> begin; >> select id, col1, col2, ... from t0 where id = (select max(id) from t0 >> where col1 = value1 and col2 = value2 and ...); >> select col1 from t1 where t0_id = (select max(id) from t0 where col1 = >> value1 and col2 = value2 and ...); >> commit; > > > Please confirm: > > You want the result of "SELECT max(id) FROM t0" to be used in the second > query without having to recompute it? Yes. > What client are you using to execute these statements? JDBC. I execute both statements at once and iterate through the resultsets. Johannes
Attachment
On 2/8/16, Johannes <jotpe@posteo.de> wrote: > Am 08.02.2016 um 20:15 schrieb David G. Johnston: >> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote: >> >>> Hi, >>> >>> is there a best practice to share data between two select statements? >>> >>> Imaging following situation: I want to receive two result sets from two >>> tables, referring to a specific id from table t0 AND I try not to query >>> for that specific id a second time. >>> >>> Table t0 returns 1 row and table t1 returns multiple rows. >>> >>> begin; >>> select id, col1, col2, ... from t0 where id = (select max(id) from t0 >>> where col1 = value1 and col2 = value2 and ...); >>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 = >>> value1 and col2 = value2 and ...); >>> commit; >> >> >> Please confirm: >> >> You want the result of "SELECT max(id) FROM t0" to be used in the >> second >> query without having to recompute it? > > Yes. > >> What client are you using to execute these statements? > > JDBC. I execute both statements at once and iterate through the resultsets. > > Johannes Hmm. Could you clarify why you don't want to pass id from the first query to the second one: select col1 from t1 where t0_id = value_id_from_the_first_query -- Best regards, Vitaly Burovoy
Am 08.02.2016 um 20:32 schrieb Vitaly Burovoy: > On 2/8/16, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> On 02/08/2016 11:05 AM, Johannes wrote: >>> Imaging following situation: I want to receive two result sets from two >>> tables, referring to a specific id from table t0 AND I try not to query >>> for that specific id a second time. >> >>> Table t0 returns 1 row and table t1 returns multiple rows. >>> >>> begin; >>> select id, col1, col2, ... from t0 where id = (select max(id) from t0 >>> where col1 = value1 and col2 = value2 and ...); >>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 = >>> value1 and col2 = value2 and ...); >>> commit; >>> >>> Best regards Johannes >> >> Based on rough guess of the above, without seeing actual table schemas: >> >> select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id = >> t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2 >> = value2 and ...); > > I don't think it is a good solution because it leads to copying > columns from the t0 which is wasting net traffic and increasing > complexity at the client side. Moreover it works iff t0 returns only > one row. I had same doubts. CTE would be first class, if it was be reusable for other statements. Johannes
Attachment
Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy: > On 2/8/16, Johannes <jotpe@posteo.de> wrote: >> Am 08.02.2016 um 20:15 schrieb David G. Johnston: >>> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote: >>> >>>> Hi, >>>> >>>> is there a best practice to share data between two select statements? >>>> >>>> Imaging following situation: I want to receive two result sets from two >>>> tables, referring to a specific id from table t0 AND I try not to query >>>> for that specific id a second time. >>>> >>>> Table t0 returns 1 row and table t1 returns multiple rows. >>>> >>>> begin; >>>> select id, col1, col2, ... from t0 where id = (select max(id) from t0 >>>> where col1 = value1 and col2 = value2 and ...); >>>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 = >>>> value1 and col2 = value2 and ...); >>>> commit; >>> >>> >>> Please confirm: >>> >>> You want the result of "SELECT max(id) FROM t0" to be used in the >>> second >>> query without having to recompute it? >> >> Yes. >> >>> What client are you using to execute these statements? >> >> JDBC. I execute both statements at once and iterate through the resultsets. >> >> Johannes > > Hmm. Could you clarify why you don't want to pass id from the first > query to the second one: > > select col1 from t1 where t0_id = value_id_from_the_first_query > Of course I could do that, but in that case I would not ask. I thougt there could be a better solution to execute all statements at once. Saving roundtrips, increase speed, a more sophistacted solution, learn something new... Johannes
Attachment
On 2/8/16, Johannes <jotpe@posteo.de> wrote: > > Am 08.02.2016 um 20:32 schrieb Vitaly Burovoy: >> On 2/8/16, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >>> Based on rough guess of the above, without seeing actual table schemas: >>> >>> select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id = >>> t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2 >>> = value2 and ...); >> >> I don't think it is a good solution because it leads to copying >> columns from the t0 which is wasting net traffic and increasing >> complexity at the client side. Moreover it works iff t0 returns only >> one row. > > I had same doubts. > CTE would be first class, if it was be reusable for other statements. > > Johannes CTEs are temporary tables for a _statement_ for using a single statement instead of several ones (create temp table, insert into, select from it, select from it, drop temp table). But it is not your case because CTEs are for a queries which return a single set of rows. Your case is returning two sets (one row with several columns from t0 and several rows with a single columns from t1). -- Best regards, Vitaly Burovoy
Am 08.02.2016 um 21:33 schrieb Vitaly Burovoy: > On 2/8/16, Johannes <jotpe@posteo.de> wrote: >> >> Am 08.02.2016 um 20:32 schrieb Vitaly Burovoy: >>> On 2/8/16, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >>>> Based on rough guess of the above, without seeing actual table schemas: >>>> >>>> select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id = >>>> t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2 >>>> = value2 and ...); >>> >>> I don't think it is a good solution because it leads to copying >>> columns from the t0 which is wasting net traffic and increasing >>> complexity at the client side. Moreover it works iff t0 returns only >>> one row. >> >> I had same doubts. >> CTE would be first class, if it was be reusable for other statements. >> >> Johannes > > CTEs are temporary tables for a _statement_ for using a single > statement instead of several ones (create temp table, insert into, > select from it, select from it, drop temp table). > > But it is not your case because CTEs are for a queries which return a > single set of rows. Your case is returning two sets (one row with > several columns from t0 and several rows with a single columns from > t1). Sure. Thanks for the temporary table example! Johannes
Attachment
> On 08 Feb 2016, at 20:05, Johannes <jotpe@posteo.de> wrote: > > select id, col1, col2, ... from t0 where id = (select max(id) from t0 > where col1 = value1 and col2 = value2 and …); > select col1 from t1 where t0_id = (select max(id) from t0 where col1 = > value1 and col2 = value2 and …); select t0.id, t0.col1, t0.col2, t0…., t1.col1 from t0 join t1 on (t1.t0_id = t0.id) group by t0.id, t0.col1, t0.col2, t0…., t1.col1 having t0.id = max(t0.id); Low complexity and works with any number of rows from t0 (as does Adrian's solution, btw). I'm not sure what you mean by "copying of columns" in your reply to Adrian's solution, but I don't think that happens here. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
On 2/8/16, Johannes <jotpe@posteo.de> wrote: > Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy: >> On 2/8/16, Johannes <jotpe@posteo.de> wrote: >>> Am 08.02.2016 um 20:15 schrieb David G. Johnston: >>>> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote: >>>> >>>>> Hi, >>>>> >>>>> is there a best practice to share data between two select statements? >>>>> >>>>> Imaging following situation: I want to receive two result sets from >>>>> two >>>>> tables, referring to a specific id from table t0 AND I try not to >>>>> query >>>>> for that specific id a second time. >>>>> >>>>> Table t0 returns 1 row and table t1 returns multiple rows. >>>>> >>>>> begin; >>>>> select id, col1, col2, ... from t0 where id = (select max(id) from t0 >>>>> where col1 = value1 and col2 = value2 and ...); >>>>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 = >>>>> value1 and col2 = value2 and ...); >>>>> commit; >> >> Hmm. Could you clarify why you don't want to pass id from the first >> query to the second one: >> >> select col1 from t1 where t0_id = value_id_from_the_first_query >> > > Of course I could do that, but in that case I would not ask. > > I thougt there could be a better solution to execute all statements at > once. What the reason to execute all statements which return different columns at once? > Saving roundtrips, In most cases they are not so big. Getting a bunch of duplicated data is wasting you network bandwidth and don't increase speed. > increase speed, Speed will be at least the same. In your case either you have to use more DDL (like CREATE TEMP TABLE) or get copied columns that leads more time to encode/decode and send it via network. > a more sophisticated solution, It usually depends on a task. Your case is simple enough and can't lead any sophisticated solution. =( > learn something new... It makes sense. =) > Johannes -- Best regards, Vitaly Burovoy
On 2/8/16, Alban Hertroys <haramrae@gmail.com> wrote: > >> On 08 Feb 2016, at 20:05, Johannes <jotpe@posteo.de> wrote: >> >> select id, col1, col2, ... from t0 where id = (select max(id) from t0 >> where col1 = value1 and col2 = value2 and …); >> >> select col1 from t1 where t0_id = (select max(id) from t0 where col1 = >> value1 and col2 = value2 and …); > > select t0.id, t0.col1, t0.col2, t0…., t1.col1 > from t0 > join t1 on (t1.t0_id = t0.id) > group by t0.id, t0.col1, t0.col2, t0…., t1.col1 > having t0.id = max(t0.id); > > Low complexity and works with any number of rows from t0 (as does Adrian's > solution, btw). I think it fully ruins speed at all. Try to create tables, insert at least 100000 rows into each of them (note that cardinality between them is 1:m) and see EXPLAIN of your query. You are joining two big tables, sort and group a resulting table and remove most rows to fit into one statement... > I'm not sure what you mean by "copying of columns" in your reply to Adrian's > solution, but I don't think that happens here. In the original letter the first query returns one row: "(id, col1, col2)", and the second one returns rows "(val1), (val2), (val3), ..." (values of the t1.col1). If you use joining, you get rows: (id, col1, col2, val1) (id, col1, col2, val2) (id, col1, col2, val3) ... where values of the first three columns are the same. > > Alban Hertroys -- Best regards, Vitaly Burovoy
Am 08.02.2016 um 21:50 schrieb Vitaly Burovoy: > On 2/8/16, Johannes <jotpe@posteo.de> wrote: >> Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy: >>> On 2/8/16, Johannes <jotpe@posteo.de> wrote: >>>> Am 08.02.2016 um 20:15 schrieb David G. Johnston: >>>>> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> is there a best practice to share data between two select statements? >>>>>> >>>>>> Imaging following situation: I want to receive two result sets from >>>>>> two >>>>>> tables, referring to a specific id from table t0 AND I try not to >>>>>> query >>>>>> for that specific id a second time. >>>>>> >>>>>> Table t0 returns 1 row and table t1 returns multiple rows. >>>>>> >>>>>> begin; >>>>>> select id, col1, col2, ... from t0 where id = (select max(id) from t0 >>>>>> where col1 = value1 and col2 = value2 and ...); >>>>>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 = >>>>>> value1 and col2 = value2 and ...); >>>>>> commit; >>> >>> Hmm. Could you clarify why you don't want to pass id from the first >>> query to the second one: >>> >>> select col1 from t1 where t0_id = value_id_from_the_first_query >>> >> >> Of course I could do that, but in that case I would not ask. >> >> I thougt there could be a better solution to execute all statements at >> once. > > What the reason to execute all statements which return different > columns at once? > >> Saving roundtrips, > > In most cases they are not so big. Getting a bunch of duplicated data > is wasting you network bandwidth and don't increase speed. In my and your example no duplicated data (result sets) is send over the network. The server do not need to wait until the client snips out the id and sends it id in the next query again. So the server can compute the result set without external dependencies as fast as possible. >> increase speed, > > Speed will be at least the same. In your case either you have to use > more DDL (like CREATE TEMP TABLE) or get copied columns that leads > more time to encode/decode and send it via network. The time difference is small, yes. My old variant with executing the first select, remember the returned id value and paste it into the second query and execute it takes 32ms. Your temp table variant need 29ms. Nice to see. That are 10% speed improvement. >> a more sophisticated solution, > > It usually depends on a task. Your case is simple enough and can't > lead any sophisticated solution. =( > No problem. >> learn something new... > > It makes sense. =) > >> Johannes Good night.
Attachment
On 2/8/16, Johannes <jotpe@posteo.de> wrote: > Am 08.02.2016 um 21:50 schrieb Vitaly Burovoy: >> On 2/8/16, Johannes <jotpe@posteo.de> wrote: >>> Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy: >>>> Hmm. Could you clarify why you don't want to pass id from the first >>>> query to the second one: >>>> >>>> select col1 from t1 where t0_id = value_id_from_the_first_query >>> >>> Of course I could do that, but in that case I would not ask. >>> >>> I thougt there could be a better solution to execute all statements at >>> once. >> >> What the reason to execute all statements which return different >> columns at once? >> >>> Saving roundtrips, >> >> In most cases they are not so big. Getting a bunch of duplicated data >> is wasting you network bandwidth and don't increase speed. > > In my and your example no duplicated data (result sets) is send over the > network. The server do not need to wait until the client snips out the > id and sends it id in the next query again. So the server can compute > the result set without external dependencies as fast as possible. We are talking about executing all statements at once to save RTT. Are we? And a parallel thread has advice to join tables (queries). It is a way to run both queries at once, but it is not a solution. >>> increase speed, >> >> Speed will be at least the same. In your case either you have to use >> more DDL (like CREATE TEMP TABLE) or get copied columns that leads >> more time to encode/decode and send it via network. > > The time difference is small, yes. > My old variant with executing the first select, remember the returned id > value and paste it into the second query and execute it takes 32ms. > > Your temp table variant need 29ms. Nice to see. That are 10% speed > improvement. I guess you measure it by your app. It is just a measurement error. +-3ms can be a sum of TCP packet loss, system interrupts, system timer inaccuracy, multiple cache missing, different layers (you are using Java, it has a VM and a lot of intermediate abstraction layers). Remember, my version has 6 statements each of them requires some work at PG's side, plus my version has two joins which usually slower than direct search by a value. Your version has only 4 statements and the only one slow place -- "where" clause in the second select which can be replaced by a value founded in the first select (your version sends more data: value1, value2, ...). You also can avoid "begin" and "commit" since default transaction isolation is "READ COMMITTED"[1]: > Also note that two successive SELECT commands can see different data, > even though they are within a single transaction, if other transactions commit > changes after the first SELECT starts and before the second SELECT starts. If you want to measure time, run both versions 10000 times in 8 connections simultaneously and compare results. ;-) 32ms * 10k requests / 8 threads = 40000ms = 40sec [1]http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-READ-COMMITTED -- Best regards, Vitaly Burovoy
Johannes <jotpe@posteo.de> writes: >> What the reason to execute all statements which return different >> columns at once? >> >>> Saving roundtrips, >> >> In most cases they are not so big. Getting a bunch of duplicated data >> is wasting you network bandwidth and don't increase speed. > > In my and your example no duplicated data (result sets) is send over the > network. The server do not need to wait until the client snips out the > id and sends it id in the next query again. So the server can compute > the result set without external dependencies as fast as possible. Sounds like what you're really after is a stored procedure, isn't it?
On 2/9/16, Harald Fuchs <hari.fuchs@gmail.com> wrote: > Johannes <jotpe@posteo.de> writes: > >>> What the reason to execute all statements which return different >>> columns at once? >>> >>>> Saving roundtrips, >>> >>> In most cases they are not so big. Getting a bunch of duplicated data >>> is wasting you network bandwidth and don't increase speed. >> >> In my and your example no duplicated data (result sets) is send over the >> network. The server do not need to wait until the client snips out the >> id and sends it id in the next query again. So the server can compute >> the result set without external dependencies as fast as possible. > > Sounds like what you're really after is a stored procedure, isn't it? Unfortunately, his case is different, because he needs to get two different set of rows that is impossible even with stored procedures. -- Best regards, Vitaly Burovoy
On Tuesday, February 9, 2016, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
On 2/9/16, Harald Fuchs <hari.fuchs@gmail.com> wrote:
> Johannes <jotpe@posteo.de> writes:
>
>>> What the reason to execute all statements which return different
>>> columns at once?
>>>
>>>> Saving roundtrips,
>>>
>>> In most cases they are not so big. Getting a bunch of duplicated data
>>> is wasting you network bandwidth and don't increase speed.
>>
>> In my and your example no duplicated data (result sets) is send over the
>> network. The server do not need to wait until the client snips out the
>> id and sends it id in the next query again. So the server can compute
>> the result set without external dependencies as fast as possible.
>
> Sounds like what you're really after is a stored procedure, isn't it?
Unfortunately, his case is different, because he needs to get two
different set of rows that is impossible even with stored procedures.
Correct, though it might be workable to use cursors in this situation. Not exactly sure how, though...
David J.
>>>> Hi, >>>> >>>> is there a best practice to share data between two select statements? Hi, I didn't check the whole thread so forgive me if this was already proposed, but maybe you could do something like: create temp table result2 (...) query_1: WITH cte as (select ..), tmp as ( INSERT INTO result2 select ... from cte), SELECT ... from cte; query_2: select * from result2; regards, Marc Mamin
>>>> Hi,
>>>>
>>>> is there a best practice to share data between two select statements?
Hi,
I didn't check the whole thread so forgive me if this was already proposed,
but maybe you could do something like:
create temp table result2 (...)
query_1:
WITH cte as (select ..),
tmp as ( INSERT INTO result2 select ... from cte),
SELECT ... from cte;
query_2:
select * from result2;
It was, more or less. I'm not sure you buy much using an updating CTE in lieu of a dedicated statement populating the temporary table. It seems a bit more confusing to comprehend and the performance benefit has to be marginal given we expect to only insert a single row into the temp table.
David J.
On 2/9/16, Marc Mamin <M.Mamin@intershop.de> wrote: > >>>>> Hi, >>>>> >>>>> is there a best practice to share data between two select statements? > > Hi, > I didn't check the whole thread Try it[1]. The thread is not so long (21 letters before yours) and it worth it. > so forgive me if this was already proposed, > but maybe you could do something like: > > create temp table result2 (...) > > query_1: > WITH cte as (select ..), > tmp as ( INSERT INTO result2 select ... from cte), > SELECT ... from cte; > > query_2: > select * from result2; There is a mistake here: query2 returns the same result as the query_1. > > regards, > Marc Mamin It is similar to the fourth answer[2] in the thread. If you are able to create temporary table with all fields of the first result only for avoiding one statement, it can be rewritten without CTE and one INNER JOIN (five statements): BEGIN; CREATE TEMPORARY TABLE temptable(id ..., col1 ..., col2 ..., ...) ON COMMIT DROP; INSERT INTO temptable SELECT id, col1, col2, ... FROM t0 WHERE col1 = value1 and col2 = value2 and ... RETURNING *; SELECT col1 FROM t1 INNER JOIN temptable ON (t0_id = temptable.id); COMMIT; === If it is hard to detect (or just lazy to write) what types temporary table has, you can rewrite it as (also five statements): BEGIN; CREATE TEMPORARY TABLE temptable ON COMMIT DROP AS SELECT id, col1, col2, ... FROM t0 WHERE id = ( SELECT max(id) FROM t0 WHERE col1 = value1 and col2 = value2 and ... ); SELECT * FROM temptable; SELECT col1 FROM t1 INNER JOIN temptable ON (t0_id = temptable.id); COMMIT; === But it is not so useful. In the case in original letter the best way is to use only two queries and pass id from the result of the first query as an argument to the second query. See the other letter[3] in the thread. Temporary tables are useful for keeping a lot of rows to prevent copying them between client and server. [1]http://www.postgresql.org/message-id/flat/56B8E6F9.9070600@posteo.de [2]http://www.postgresql.org/message-id/CAKOSWNkRB0kfWHcw9CvOcRmkS8HuzrPVFLr0kKcjuYn6juK5NA@mail.gmail.com [3]http://www.postgresql.org/message-id/CAKOSWN=wX9Myw4q9mpiqESyizU4tWrGivgbw+3Ef=5Q60viFSg@mail.gmail.com -- Best regards, Vitaly Burovoy
Am 09.02.2016 um 00:56 schrieb Vitaly Burovoy: > On 2/8/16, Johannes <jotpe@posteo.de> wrote: >> Am 08.02.2016 um 21:50 schrieb Vitaly Burovoy: >>> On 2/8/16, Johannes <jotpe@posteo.de> wrote: >>>> Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy: >>>>> Hmm. Could you clarify why you don't want to pass id from the first >>>>> query to the second one: >>>>> >>>>> select col1 from t1 where t0_id = value_id_from_the_first_query >>>> >>>> Of course I could do that, but in that case I would not ask. >>>> >>>> I thougt there could be a better solution to execute all statements at >>>> once. >>> >>> What the reason to execute all statements which return different >>> columns at once? >>> >>>> Saving roundtrips, >>> >>> In most cases they are not so big. Getting a bunch of duplicated data >>> is wasting you network bandwidth and don't increase speed. >> >> In my and your example no duplicated data (result sets) is send over the >> network. The server do not need to wait until the client snips out the >> id and sends it id in the next query again. So the server can compute >> the result set without external dependencies as fast as possible. > > We are talking about executing all statements at once to save RTT. Are we? Yes, we do. > And a parallel thread has advice to join tables (queries). It is a way > to run both queries at once, but it is not a solution. Right. >>>> increase speed, >>> >>> Speed will be at least the same. In your case either you have to use >>> more DDL (like CREATE TEMP TABLE) or get copied columns that leads >>> more time to encode/decode and send it via network. >> >> The time difference is small, yes. >> My old variant with executing the first select, remember the returned id >> value and paste it into the second query and execute it takes 32ms. >> >> Your temp table variant need 29ms. Nice to see. That are 10% speed >> improvement. > > I guess you measure it by your app. It is just a measurement error. > +-3ms can be a sum of TCP packet loss, system interrupts, system timer > inaccuracy, multiple cache missing, different layers (you are using > Java, it has a VM and a lot of intermediate abstraction layers). I know all these facts. I run it a "only" a few times, single threaded. With high resolution. The rounded result of 3ms was reliable and thats exact enough for my rule of thumb. Your temp table variant looks more elegant, and is sightly faster. I'm fine with that. > Remember, my version has 6 statements each of them requires some work > at PG's side, plus my version has two joins which usually slower than > direct search by a value. Your version has only 4 statements and the > only one slow place -- "where" clause in the second select which can > be replaced by a value founded in the first select (your version sends > more data: value1, value2, ...). Anyway, it is faster ;) > You also can avoid "begin" and "commit" since default transaction > isolation is "READ COMMITTED"[1]: >> Also note that two successive SELECT commands can see different data, >> even though they are within a single transaction, if other transactions commit >> changes after the first SELECT starts and before the second SELECT starts. I know. I did not tell that I run my queries in with repeatable read isolation. And I read it is wise to bundle multiple queries in an transaction, because the overhead of multiple transaction can be avoid to one. > If you want to measure time, run both versions 10000 times in 8 > connections simultaneously and compare results. ;-) > > 32ms * 10k requests / 8 threads = 40000ms = 40sec Thats more complicated, I think I learned enough about it. But thanks. Ciao Johannes
Attachment
On 2/9/16, Johannes <jotpe@posteo.de> wrote: > Am 09.02.2016 um 00:56 schrieb Vitaly Burovoy: >> On 2/8/16, Johannes <jotpe@posteo.de> wrote: >>> Am 08.02.2016 um 21:50 schrieb Vitaly Burovoy: >>>> On 2/8/16, Johannes <jotpe@posteo.de> wrote: >>>>> increase speed, >>>> >>>> Speed will be at least the same. In your case either you have to use >>>> more DDL (like CREATE TEMP TABLE) or get copied columns that leads >>>> more time to encode/decode and send it via network. >>> >>> The time difference is small, yes. >>> My old variant with executing the first select, remember the returned id >>> value and paste it into the second query and execute it takes 32ms. >>> >>> Your temp table variant need 29ms. Nice to see. That are 10% speed >>> improvement. >> >> I guess you measure it by your app. It is just a measurement error. >> +-3ms can be a sum of TCP packet loss, system interrupts, system timer >> inaccuracy, multiple cache missing, different layers (you are using >> Java, it has a VM and a lot of intermediate abstraction layers). > > I know all these facts. I run it a "only" a few times, single threaded. > With high resolution. The rounded result of 3ms was reliable and thats > exact enough for my rule of thumb. > Your temp table variant looks more elegant, and is sightly faster. I'm > fine with that. > >> Remember, my version has 6 statements each of them requires some work >> at PG's side, plus my version has two joins which usually slower than >> direct search by a value. Your version has only 4 statements and the >> only one slow place -- "where" clause in the second select which can >> be replaced by a value founded in the first select (your version sends >> more data: value1, value2, ...). > > Anyway, it is faster ;) I can't believe it. I insist it is a measurement error. >> You also can avoid "begin" and "commit" since default transaction >> isolation is "READ COMMITTED"[1]: >>> Also note that two successive SELECT commands can see different data, >>> even though they are within a single transaction, if other transactions >>> commit >>> changes after the first SELECT starts and before the second SELECT >>> starts. > > I know. I did not tell that I run my queries in with repeatable read > isolation. > And I read it is wise to bundle multiple queries in an transaction, > because the overhead of multiple transaction can be avoid to one. It is not true for transactions with SELECTs only. It is wise to join multiple queries in one transaction when you do multiple _changes_ in a DB, because after each change DB must save it into WAL file and increase shared "Transaction ID sequence". Also as in your example, transactions with isolation SERIALIZABLE and REPEATABLE READ are used to do (even RO) queries to the DB in a consistent state. But RO queries don't become faster. >> If you want to measure time, run both versions 10000 times in 8 >> connections simultaneously and compare results. ;-) >> >> 32ms * 10k requests / 8 threads = 40000ms = 40sec > > Thats more complicated, I think I learned enough about it. But thanks. Real world is complex. 40 seconds per test is not long. If you have to save only one param for the other query creating a temp table is heavy enough. I'm waiting for a result of 10k requests test. -- Best regards, Vitaly Burovoy