Re: Specifying many rows in a table - Mailing list pgsql-general

From John Sidney-Woollett
Subject Re: Specifying many rows in a table
Date
Msg-id 1761.192.168.0.64.1075328098.squirrel@mercury.wardbrook.com
Whole thread Raw
In response to Specifying many rows in a table  (Steve Atkins <steve@blighty.com>)
List pgsql-general
If you could insert the "in" values into a temporary table then join
against your *large* table you may find that you getter performance, at
the expense of having to populate the temp tables with your "in" values.

Rather than having to populate and delete your "in" values ready for the
next query, consider adding a search counter column to the table, eg

(in pseudo code)
create temporary table INTABLE (searchid integer, invalue integer);

create a function InsertInValues(searchid integer, values text) which
takes a comma delimited list of "in" values, and uses a loop to insert
them into your (temp) INTABLE.

now your query is

select * from table t, intable it
where t.keyvalue = it.invalue
and searchid = XXXX;

Next query, call InsertInValues(searchid integer, values text)  again with
the next set of data, incrementing the searchid value for the function
call, and then perform the next search.

Or write a function SearchUsingInValues(invalue integer) which does the
two steps above within the one function (generating its own internal
searchid), and returns a setof values.

Or do away with the searchid stuff, and either drop/recreate the temp
table between calls, or truncate the data.

Someone else may be able to suggest a better/faster way of generating the
"in" values...

I don't know if that's any use. Hope so.

John Sidney-Woollett

Steve Atkins said:
> I have a large table (potentially tens or hundreds of millions of rows)
> and
> I need to extract some number of these rows, defined by an integer primary
> key.
>
> So, the obvious answer is
>
>   select * from table where id in (1,3,4);
>
> But I may want to extract a large number of rows, many thousands
>
>   select * from table where id in (1, 3, 5, ...., 100000, 100017, 23000);
>
> This falls over when it exceeds the maximum expression depth of 10,000.
> And I have a sneaky feeling that increasing max_expr_depth isn't the
> right fix.
>
> Performance is pretty important, so does anyone have a good suggestion
> for how to phrase this query so that it'll at worst only be a single
> seq-scan through the mondo table, and make a reasonable choice as to
> whether to use an index-scan or seq-scan, depending on the number
> of rows I'm pulling out?
>
> Cheers,
>   Steve
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>


pgsql-general by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Specifying many rows in a table
Next
From: Tom Lane
Date:
Subject: Re: 7.3.4 freezing