Thread: trouble converting several serial queries into a parallel query

trouble converting several serial queries into a parallel query

From

Jonathan Vanasco

Date:

04 July 2015, 23:30:12

I have a very simple query that is giving me some issues due to the size of the database and the number of requests I
maketo it in order to compile the report I need: 

A dumbed down version of the table and query:

    CREATE TABLE a_to_b (
        id_a INT NOT NULL REFERENCES table_a(id),
        id_b INT NOT NULL REFERENCES table_b(id),
        PRIMARY KEY (id_a, id_b)
    );
    SELECT id_a, id_b FROM a_2_b WHERE id_a = 1 LIMIT 5;

The problem is that the table has a few million records and I need to query it 30+ times in a row.

I'd like to improve this with a parallel search using `IN()`

    SELECT id_a, id_b FROM a_2_b WHERE id_a = IN
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26.27,28,29,30);

That technique has generally fixed a lot of bottlenecks for us.

However I can't wrap my head around structuring it so that I can apply a limit based on the column -- so that I only
get5 records per id_a. 

The table has columns that I would use for ordering in the future, but I'm fine with just getting random values right
now. 

Can anyone offer some suggestions?  Thanks in advance.

Re: trouble converting several serial queries into a parallel query

From

Chris Mair

Date:

05 July 2015, 12:04:44

> I have a very simple query that is giving me some issues due to the size of the database and the number of requests I
maketo it in order to compile the report I need: 
>
> A dumbed down version of the table and query:
>
>     CREATE TABLE a_to_b (
>         id_a INT NOT NULL REFERENCES table_a(id),
>         id_b INT NOT NULL REFERENCES table_b(id),
>         PRIMARY KEY (id_a, id_b)
>     );
>     SELECT id_a, id_b FROM a_2_b WHERE id_a = 1 LIMIT 5;
>
> The problem is that the table has a few million records and I need to query it 30+ times in a row.
>
> I'd like to improve this with a parallel search using `IN()`
>
>     SELECT id_a, id_b FROM a_2_b WHERE id_a = IN
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26.27,28,29,30);
>
> That technique has generally fixed a lot of bottlenecks for us.
>
> However I can't wrap my head around structuring it so that I can apply a limit based on the column -- so that I only
get5 records per id_a. 
>
> The table has columns that I would use for ordering in the future, but I'm fine with just getting random values right
now. 
>
> Can anyone offer some suggestions?  Thanks in advance.

Hi,

I had exactly the same problem some time ago and came up with this:


select * from (
    select *, rank() over (partition by id_a order by id_b) as r
    from a_to_b where id_a in (1, 2)
) as subsel where r <= 5;

Note the ordering is already there (by id_b), you can pick other columns
of course).

It looks a bit complicated, though. If anybody knows a more
straitforward way I'd be glad to hear it :)

Bye,
Chris.