Thread: Re: [SQL] "SELECT IN" Still Broken in 7.4b

Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Within the scope of the new hashed IN stuff I believe so in at least some
> cases.  I have a few million row table of integers where searching for
> values IN (~10000 values) takes longer than creating the temp table,
> copying into it and doing the in subquery.

I did some profiling and soon realized that the main problem is the
executor's trick for not returning the same row multiple times in a
multiple-indexscan plan node.  The point is that givenWHERE a = 1 OR b = 1
you could create a plan that first indexscans on a, then indexscans on
b --- but you mustn't return any tuples in the second scan that you
already returned in the first.  IndexNext solves this by evaluating the
prior-scan index conditions to see if they are true.  Which is okay if
there aren't too many of them.  But when you have an N-element IN list
this means you are going to do O(N^2) index expression evaluations.
In the 10000-element IN-list test case, ExecQual gets called almost
50 million times :-(

I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs.  This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.

Comments?
        regards, tom lane


Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Stephan Szabo
Date:
On Thu, 21 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Within the scope of the new hashed IN stuff I believe so in at least some
> > cases.  I have a few million row table of integers where searching for
> > values IN (~10000 values) takes longer than creating the temp table,
> > copying into it and doing the in subquery.
>
> I did some profiling and soon realized that the main problem is the
> executor's trick for not returning the same row multiple times in a
> multiple-indexscan plan node.  The point is that given
>     WHERE a = 1 OR b = 1
> you could create a plan that first indexscans on a, then indexscans on
> b --- but you mustn't return any tuples in the second scan that you
> already returned in the first.  IndexNext solves this by evaluating the
> prior-scan index conditions to see if they are true.  Which is okay if
> there aren't too many of them.  But when you have an N-element IN list
> this means you are going to do O(N^2) index expression evaluations.
> In the 10000-element IN-list test case, ExecQual gets called almost
> 50 million times :-(
>
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.

Well, if you want to be safer, I guess you could (at runtime) decide that
the table's gotten too big and fall back to the old method if you didn't
entirely rip it out.  I'm not sure if that'd be too ugly though, but it
would mean that you wouldn't have to worry about it returning too many
tuples.



Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Manfred Koizar
Date:
On Thu, 21 Aug 2003 16:42:20 -0400, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
>The point is that given
>    WHERE a = 1 OR b = 1
>you could create a plan that first indexscans on a, then indexscans on
>b --- but you mustn't return any tuples in the second scan that you
>already returned in the first.  IndexNext solves this by evaluating the
>prior-scan index conditions to see if they are true.
WHERE a = 1 OR b = 2
andWHERE a = 1 OR a = 2

are totally different things.  In the latter case you don't have to
check prior conditions because the conditions are mutually exclusive.
Is this reasonably easy to find out at plan creation time?

Yes, I changed your example to make my point clear, becauseWHERE a = 1 OR a = 1
has its own set of problems.

ServusManfred


Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Josh Berkus
Date:
Tom,

> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.

Don't forget all of the tyros who tune their queries through "set
enable_seqscan=false".  I think we'd need some kind of memory safety valve on
this, like sandboxing it in sort_mem.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
>     WHERE a = 1 OR a = 2
> are totally different things.  In the latter case you don't have to
> check prior conditions because the conditions are mutually exclusive.
> Is this reasonably easy to find out at plan creation time?

Yeah, I know, but I see no easy way to verify this (where "easy" means
"doesn't take an unreasonable amount of time").  A naive check would
take O(N^2) time, putting you right back where you started.  Even a
smart check would surely take more time per item than one hashtable probe.
I'm also concerned about how much the planner would have to assume about
the semantics of the operators in order to prove the conditions are
mutually exclusive.

Finally, I suspect that once we get rid of the O(N^2) behavior in the
executor, we will find that the next biggest bottleneck is in the
planner; adding more work for it to do per OR-clause item will make
things worse not better.
        regards, tom lane


Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Joe Conway
Date:
Tom Lane wrote:
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.

I thought with your recent changes, you could use dynahash, which 
already spills to disk instead of consuming all memory?

Joe



Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> I'm toying with the notion of ripping out that logic and instead
>> building an in-memory hashtable of already-returned TIDs.  This could
>> theoretically run out of memory if the multiple indexscan returns enough
>> tuples, but I think in practice that wouldn't happen because the planner
>> wouldn't choose an indexscan when very large numbers of tuples are being
>> selected.

> I thought with your recent changes, you could use dynahash, which 
> already spills to disk instead of consuming all memory?

I was going to use dynahash, but it doesn't spill to disk.  You're
confusing that with the HashJoin mechanism, which is quite different and
only really useful for joins.
        regards, tom lane


Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
"Christopher Kings-Lynne"
Date:
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.
> 
> Comments?

Sounds kind of like a bitmap index almost..

Chris



Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
"Shridhar Daithankar"
Date:
On 21 Aug 2003 at 16:42, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Within the scope of the new hashed IN stuff I believe so in at least some
> > cases.  I have a few million row table of integers where searching for
> > values IN (~10000 values) takes longer than creating the temp table,
> > copying into it and doing the in subquery.
> 
> I did some profiling and soon realized that the main problem is the
> executor's trick for not returning the same row multiple times in a
> multiple-indexscan plan node.  The point is that given
>     WHERE a = 1 OR b = 1
> you could create a plan that first indexscans on a, then indexscans on
> b --- but you mustn't return any tuples in the second scan that you
> already returned in the first.  IndexNext solves this by evaluating the
> prior-scan index conditions to see if they are true.  Which is okay if
> there aren't too many of them.  But when you have an N-element IN list
> this means you are going to do O(N^2) index expression evaluations.
> In the 10000-element IN-list test case, ExecQual gets called almost
> 50 million times :-(
> 
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.

If I understand it correctly, we are essentially looking at ~10000 individual 
index scans, in above example, isn't it? So if planner takes this in account, 
it probably would not opt for seq. scan.

Other idea could be, index the in list first and search the index using 
locality of values in the in list. If the 10K values in the list are between 
10K-20K, they should be pretty easy to locate with single index sweep, assuming 
uniform distribution. May be planner could build this IN list index before 
drawing plan to determine cardinality and distribution of individual values.

On the least side, it could draw a plan for fetching a tuple range between min 
and max of IN values and building in memory hash of these values for 
comparison. That's would surely be cheaper than scanning entire table/result 
set over ad over 

Frankly, I would say a temp table is far better solution..:-)

Just a thought...

ByeShridhar

--
Youth doesn't excuse everything.        -- Dr. Janice Lester (in Kirk's body), 
"Turnabout Intruder",           stardate 5928.5.



Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Finally, I suspect that once we get rid of the O(N^2) behavior in the
> executor, we will find that the next biggest bottleneck is in the
> planner; adding more work for it to do per OR-clause item will make
> things worse not better.

But time spent in the planner doesn't matter if you're using prepared queries
which is true for OLTP applications where there are very few queries being
executed many many times. I hear web sites are quite popular these days.

I missed the beginning of this thread so I'm not sure if it's relevant. But
wouldn't it be possible to just check if all the clauses and see if they're
using precisely the same index with an equality type operator? That won't
catch things like "a BETWEEN 1 AND 2 OR a BETWEEN 5 AND 6" but it will catch
things like "a IN (1,2,3,4,5,6)". And I don't see how it wouldn't be linear
in the number of clauses.

-- 
greg



Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Stephan Szabo
Date:
On 22 Aug 2003, Greg Stark wrote:

>
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> > Finally, I suspect that once we get rid of the O(N^2) behavior in the
> > executor, we will find that the next biggest bottleneck is in the
> > planner; adding more work for it to do per OR-clause item will make
> > things worse not better.
>
> But time spent in the planner doesn't matter if you're using prepared queries
> which is true for OLTP applications where there are very few queries being
> executed many many times. I hear web sites are quite popular these days.

True, but not every application is such.  You still need to balance
planning time with other concerns.

> I missed the beginning of this thread so I'm not sure if it's relevant. But
> wouldn't it be possible to just check if all the clauses and see if they're
> using precisely the same index with an equality type operator? That won't
> catch things like "a BETWEEN 1 AND 2 OR a BETWEEN 5 AND 6" but it will catch
> things like "a IN (1,2,3,4,5,6)". And I don't see how it wouldn't be linear
> in the number of clauses.

But that wouldn't help unless you made sure that what was being compared
was not the same AFAICT (since the point would be to not need to check if
the rows were already returned) since a=1 or a=1 is legal if meaningless.
And that's not necessarily immediately obvious depending on the behavior
of the = operator without trying it, for example does where a='c ' or
a='c' have a redundancy or not?



Re: [SQL] "SELECT IN" Still Broken in 7.4b

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Well, if you want to be safer, I guess you could (at runtime) decide that
> the table's gotten too big and fall back to the old method if you didn't
> entirely rip it out.  I'm not sure if that'd be too ugly though, but it
> would mean that you wouldn't have to worry about it returning too many
> tuples.

I did it this way --- it falls back to the old code if the TID hash
table grows to exceed SortMem.  Should be noticeably faster than the
old code for reasonably-sized IN lists.
        regards, tom lane