Thread: A complex plproxy query
This is a complex question, and I couldn't form it in a short and easy way, and I'm sorry for that. First of all, let me introduce you to the DB (to form a question), for you to understand what am I talking about. The module looks like a social network, just the users have friends, which can be in different groups. Also it is clustered with PLPROXY by user_id, so the user itself, and his friends list (the list of ID's) is always in the same DB, but the information about the friends is not (it is clustered through all the partitions). Here is a little sketch of a the needed tables: CREATE TABLE friend ( id bigint, user_id integer, friend_id integer, group_id bigint, ... ); This table is a 'friend link' from one user to another, which can be marked as being in some 'group', and the backward link exists also (from the 2nd user to the 1st), which can possibly be in another 'group'. CREATE TABLE user ( user_id integer, nickname text, .... -- lots of other info ); This is just a user table. Both of these are clustered by user_id. I need to form the following query, for it to be as fast as possible (here it is written as if it the DB was not partitioned): SELECT something FROM user u, friend f WHERE u.user_id = f.friend.id AND f.user_id = $1 (this is given as an argument) AND f.group_id = $2 So to say, give me the list of friends (not only their ID's, but all the needed columns!) of given individual, which are in a given group. That seems ok without plproxy, but with using it, I can't imagine how can I form a nice query, or a function (or a set of plpgsql + plproxy functions) to do the job. Thanks in advance and regards, Igor Katson.
On Thu, 2009-01-22 at 02:33 +0300, Igor Katson wrote: > So to say, give me the list of friends (not only their ID's, but all the > needed columns!) of given individual, which are in a given group. That > seems ok without plproxy, but with using it, I can't imagine how can I > form a nice query, or a function (or a set of plpgsql + plproxy > functions) to do the job. You need to do it in two steps - first run a query on the partition the user is in to get list of friends ids, then run a second RUN ON ALL query with WHERE f.friend.id in (list of ids from f1) AND f.group_id = $2 to gather all friend info in parallel -- ------------------------------------------ Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing wrote: > On Thu, 2009-01-22 at 02:33 +0300, Igor Katson wrote: > > >> So to say, give me the list of friends (not only their ID's, but all the >> needed columns!) of given individual, which are in a given group. That >> seems ok without plproxy, but with using it, I can't imagine how can I >> form a nice query, or a function (or a set of plpgsql + plproxy >> functions) to do the job. >> > > You need to do it in two steps - first run a query on the partition the > user is in to get list of friends ids, then run a second RUN ON ALL > query with > > WHERE f.friend.id in (list of ids from f1) AND f.group_id = $2 > > to gather all friend info in parallel > > I was thinking about that. But I don't understand, how can I pass the list of id's. Should I turn the output of a select into an array? How then? What if the array gets hundreds of items long?
On 1/22/09, Igor Katson <descentspb@gmail.com> wrote: > Hannu Krosing wrote: > > On Thu, 2009-01-22 at 02:33 +0300, Igor Katson wrote: > > > > > >> So to say, give me the list of friends (not only their ID's, but all the > >> needed columns!) of given individual, which are in a given group. That > >> seems ok without plproxy, but with using it, I can't imagine how can I > >> form a nice query, or a function (or a set of plpgsql + plproxy > >> functions) to do the job. > >> > > > > You need to do it in two steps - first run a query on the partition the > > user is in to get list of friends ids, then run a second RUN ON ALL > > query with > > > > WHERE f.friend.id in (list of ids from f1) AND f.group_id = $2 > > > > to gather all friend info in parallel > > > > > > I was thinking about that. But I don't understand, how can I pass the > list of id's. Should I turn the output of a select into an array? How > then? What if the array gets hundreds of items long? Yes, array works fine. And if it's long, then let it be long... -- marko
Marko Kreen wrote: > On 1/22/09, Igor Katson <descentspb@gmail.com> wrote: > >> Hannu Krosing wrote: >> > On Thu, 2009-01-22 at 02:33 +0300, Igor Katson wrote: >> > >> > >> >> So to say, give me the list of friends (not only their ID's, but all the >> >> needed columns!) of given individual, which are in a given group. That >> >> seems ok without plproxy, but with using it, I can't imagine how can I >> >> form a nice query, or a function (or a set of plpgsql + plproxy >> >> functions) to do the job. >> >> >> > >> > You need to do it in two steps - first run a query on the partition the >> > user is in to get list of friends ids, then run a second RUN ON ALL >> > query with >> > >> > WHERE f.friend.id in (list of ids from f1) AND f.group_id = $2 >> > >> > to gather all friend info in parallel >> > >> > >> >> I was thinking about that. But I don't understand, how can I pass the >> list of id's. Should I turn the output of a select into an array? How >> then? What if the array gets hundreds of items long? >> > > Yes, array works fine. And if it's long, then let it be long... > > Ok, thank you, guys. What is the best way to make an array out of a column? I didn't make up anything better then writing a function: CREATE OR REPLACE FUNCTION int_column_to_array(query text) RETURNS int[] AS $$ DECLARE arr int[]; rec int; BEGIN FOR rec IN EXECUTE query LOOP arr := array_append('{}',rec); END LOOP; RETURN arr; END; $$ language plpgsql;
On Thu, 2009-01-22 at 15:10 +0300, Igor Katson wrote: > Ok, thank you, guys. What is the best way to make an array out of a > column? I didn't make up anything better then writing a function: > > CREATE OR REPLACE FUNCTION int_column_to_array(query text) RETURNS int[] > AS $$ > DECLARE > arr int[]; > rec int; > BEGIN > FOR rec IN EXECUTE query > LOOP > arr := array_append('{}',rec); > END LOOP; > RETURN arr; > END; > $$ language plpgsql; hannu=# select ARRAY(select usename from pg_user); ?column? ------------------------------- {postgres,hannu,m1,skyncuser} (1 row) -- ------------------------------------------ Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing wrote: > On Thu, 2009-01-22 at 15:10 +0300, Igor Katson wrote: > > >> Ok, thank you, guys. What is the best way to make an array out of a >> column? I didn't make up anything better then writing a function: >> >> CREATE OR REPLACE FUNCTION int_column_to_array(query text) RETURNS int[] >> AS $$ >> DECLARE >> arr int[]; >> rec int; >> BEGIN >> FOR rec IN EXECUTE query >> LOOP >> arr := array_append('{}',rec); >> END LOOP; >> RETURN arr; >> END; >> $$ language plpgsql; >> > > hannu=# select ARRAY(select usename from pg_user); > ?column? > ------------------------------- > {postgres,hannu,m1,skyncuser} > (1 row) > > > > Lots of thanks! I tried the same one, but with ARRAY[], so i didn't get anything.
On Thu, 2009-01-22 at 16:13 +0200, Hannu Krosing wrote: > On Thu, 2009-01-22 at 15:10 +0300, Igor Katson wrote: > > > Ok, thank you, guys. What is the best way to make an array out of a > > column? I didn't make up anything better then writing a function: > > > > CREATE OR REPLACE FUNCTION int_column_to_array(query text) RETURNS int[] > > AS $$ > > DECLARE > > arr int[]; > > rec int; > > BEGIN > > FOR rec IN EXECUTE query > > LOOP > > arr := array_append('{}',rec); > > END LOOP; > > RETURN arr; > > END; > > $$ language plpgsql; > > hannu=# select ARRAY(select usename from pg_user); > ?column? > ------------------------------- > {postgres,hannu,m1,skyncuser} > (1 row) So what yo need is select * from gather_friend_info_from_all_partitions(get_friend_id_list_form_user_partition(user_id)); -- ------------------------------------------ Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing wrote: > On Thu, 2009-01-22 at 16:13 +0200, Hannu Krosing wrote: > >> On Thu, 2009-01-22 at 15:10 +0300, Igor Katson wrote: >> >> >>> Ok, thank you, guys. What is the best way to make an array out of a >>> column? I didn't make up anything better then writing a function: >>> >>> CREATE OR REPLACE FUNCTION int_column_to_array(query text) RETURNS int[] >>> AS $$ >>> DECLARE >>> arr int[]; >>> rec int; >>> BEGIN >>> FOR rec IN EXECUTE query >>> LOOP >>> arr := array_append('{}',rec); >>> END LOOP; >>> RETURN arr; >>> END; >>> $$ language plpgsql; >>> >> hannu=# select ARRAY(select usename from pg_user); >> ?column? >> ------------------------------- >> {postgres,hannu,m1,skyncuser} >> (1 row) >> > > So what yo need is > > select * from > gather_friend_info_from_all_partitions(get_friend_id_list_form_user_partition(user_id)); > > Yes, after using arrays, I figured out perfectly, how to do that. And thanks for the help!