Thread: Re: [SQL] "SELECT IN" Still Broken in 7.4b
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Within the scope of the new hashed IN stuff I believe so in at least some
> cases.  I have a few million row table of integers where searching for
> values IN (~10000 values) takes longer than creating the temp table,
> copying into it and doing the in subquery.

I did some profiling and soon realized that the main problem is the
executor's trick for not returning the same row multiple times in a
multiple-indexscan plan node.  The point is that given
	WHERE a = 1 OR b = 1
you could create a plan that first indexscans on a, then indexscans on
b --- but you mustn't return any tuples in the second scan that you
already returned in the first.  IndexNext solves this by evaluating the
prior-scan index conditions to see if they are true.  Which is okay if
there aren't too many of them.  But when you have an N-element IN list
this means you are going to do O(N^2) index expression evaluations.
In the 10000-element IN-list test case, ExecQual gets called almost
50 million times :-(

I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs.  This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.

Comments?

			regards, tom lane
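[Editorial note: the TID-hashtable idea above can be sketched in a few lines of Python. This is a toy model of the executor logic, not the actual C code; the `(tid, tuple)` pairs and `multi_index_scan` name are invented for illustration.]

```python
def multi_index_scan(scans):
    """Toy model of the proposal: dedup tuples across several index
    scans by remembering already-returned TIDs in an in-memory hash
    (a Python set), instead of re-evaluating every prior scan's
    index condition per tuple (the O(N^2) path)."""
    seen = set()                    # hashtable of already-returned TIDs
    for scan in scans:              # each scan yields (tid, tuple) pairs
        for tid, tup in scan:
            if tid not in seen:     # one O(1) probe per tuple
                seen.add(tid)
                yield tup

# Two overlapping "index scans" over toy (tid, value) pairs;
# tid 2 appears in both but is returned only once:
scan_a = [(1, 'x'), (2, 'y')]
scan_b = [(2, 'y'), (3, 'z')]
print(list(multi_index_scan([scan_a, scan_b])))  # ['x', 'y', 'z']
```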
On Thu, 21 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Within the scope of the new hashed IN stuff I believe so in at least some
> > cases.  I have a few million row table of integers where searching for
> > values IN (~10000 values) takes longer than creating the temp table,
> > copying into it and doing the in subquery.
>
> I did some profiling and soon realized that the main problem is the
> executor's trick for not returning the same row multiple times in a
> multiple-indexscan plan node.  The point is that given
> 	WHERE a = 1 OR b = 1
> you could create a plan that first indexscans on a, then indexscans on
> b --- but you mustn't return any tuples in the second scan that you
> already returned in the first.  IndexNext solves this by evaluating the
> prior-scan index conditions to see if they are true.  Which is okay if
> there aren't too many of them.  But when you have an N-element IN list
> this means you are going to do O(N^2) index expression evaluations.
> In the 10000-element IN-list test case, ExecQual gets called almost
> 50 million times :-(
>
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.

Well, if you want to be safer, I guess you could (at runtime) decide that
the table's gotten too big and fall back to the old method if you didn't
entirely rip it out.  I'm not sure if that'd be too ugly though, but it
would mean that you wouldn't have to worry about it returning too many
tuples.
On Thu, 21 Aug 2003 16:42:20 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>The point is that given
>	WHERE a = 1 OR b = 1
>you could create a plan that first indexscans on a, then indexscans on
>b --- but you mustn't return any tuples in the second scan that you
>already returned in the first.  IndexNext solves this by evaluating the
>prior-scan index conditions to see if they are true.

	WHERE a = 1 OR b = 2
and
	WHERE a = 1 OR a = 2
are totally different things.  In the latter case you don't have to
check prior conditions because the conditions are mutually exclusive.
Is this reasonably easy to find out at plan creation time?

Yes, I changed your example to make my point clear, because
	WHERE a = 1 OR a = 1
has its own set of problems.

Servus
 Manfred
Tom,

> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.

Don't forget all of the tyros who tune their queries through "set
enable_seqscan=false".  I think we'd need some kind of memory safety
valve on this, like sandboxing it in sort_mem.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Manfred Koizar <mkoi-pg@aon.at> writes:
> 	WHERE a = 1 OR a = 2
> ... In the latter case you don't have to
> check prior conditions because the conditions are mutually exclusive.
> Is this reasonably easy to find out at plan creation time?

Yeah, I know, but I see no easy way to verify this (where "easy" means
"doesn't take an unreasonable amount of time").  A naive check would
take O(N^2) time, putting you right back where you started.  Even a
smart check would surely take more time per item than one hashtable
probe.

I'm also concerned about how much the planner would have to assume
about the semantics of the operators in order to prove the conditions
are mutually exclusive.

Finally, I suspect that once we get rid of the O(N^2) behavior in the
executor, we will find that the next biggest bottleneck is in the
planner; adding more work for it to do per OR-clause item will make
things worse not better.

			regards, tom lane
Tom Lane wrote:
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.

I thought with your recent changes, you could use dynahash, which
already spills to disk instead of consuming all memory?

Joe
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> I'm toying with the notion of ripping out that logic and instead
>> building an in-memory hashtable of already-returned TIDs.  This could
>> theoretically run out of memory if the multiple indexscan returns enough
>> tuples, but I think in practice that wouldn't happen because the planner
>> wouldn't choose an indexscan when very large numbers of tuples are being
>> selected.

> I thought with your recent changes, you could use dynahash, which
> already spills to disk instead of consuming all memory?

I was going to use dynahash, but it doesn't spill to disk.  You're
confusing that with the HashJoin mechanism, which is quite different
and only really useful for joins.

			regards, tom lane
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.
>
> Comments?

Sounds kind of like a bitmap index, almost...

Chris
On 21 Aug 2003 at 16:42, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Within the scope of the new hashed IN stuff I believe so in at least some
> > cases.  I have a few million row table of integers where searching for
> > values IN (~10000 values) takes longer than creating the temp table,
> > copying into it and doing the in subquery.
>
> I did some profiling and soon realized that the main problem is the
> executor's trick for not returning the same row multiple times in a
> multiple-indexscan plan node.  The point is that given
> 	WHERE a = 1 OR b = 1
> you could create a plan that first indexscans on a, then indexscans on
> b --- but you mustn't return any tuples in the second scan that you
> already returned in the first.  IndexNext solves this by evaluating the
> prior-scan index conditions to see if they are true.  Which is okay if
> there aren't too many of them.  But when you have an N-element IN list
> this means you are going to do O(N^2) index expression evaluations.
> In the 10000-element IN-list test case, ExecQual gets called almost
> 50 million times :-(
>
> I'm toying with the notion of ripping out that logic and instead
> building an in-memory hashtable of already-returned TIDs.  This could
> theoretically run out of memory if the multiple indexscan returns enough
> tuples, but I think in practice that wouldn't happen because the planner
> wouldn't choose an indexscan when very large numbers of tuples are being
> selected.

If I understand it correctly, we are essentially looking at ~10000
individual index scans in the above example, aren't we?  So if the
planner takes this into account, it probably would not opt for a seq.
scan.

Another idea could be to index the IN list first and search that index
using the locality of values in the IN list.  If the 10K values in the
list are between 10K and 20K, they should be pretty easy to locate with
a single index sweep, assuming uniform distribution.
Maybe the planner could build this IN-list index before drawing the
plan, to determine the cardinality and distribution of the individual
values.  At the least, it could draw a plan that fetches the tuple range
between the min and max of the IN values and builds an in-memory hash of
those values for comparison.  That would surely be cheaper than scanning
the entire table/result set over and over.

Frankly, I would say a temp table is a far better solution.. :-)

Just a thought...

Bye
 Shridhar

--
Youth doesn't excuse everything.
		-- Dr. Janice Lester (in Kirk's body), "Turnabout
		   Intruder", stardate 5928.5.
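[Editorial note: the min/max range-scan idea above can be modeled as follows. This is a hypothetical Python sketch; the sorted list stands in for an index range scan, and all names are invented.]

```python
def in_list_range_scan(table, in_values):
    """Toy model of the suggestion: instead of ~N separate index
    probes, do a single index sweep between the min and max of the
    IN list, filtering rows through an in-memory hash of the IN
    values.  This pays off when the IN values are clustered."""
    wanted = set(in_values)               # in-memory hash of IN values
    lo, hi = min(in_values), max(in_values)
    # stand-in for one index range scan over a sorted table
    return [v for v in table if lo <= v <= hi and v in wanted]

table = list(range(0, 100000, 3))         # sorted "indexed" column
print(in_list_range_scan(table, [9, 12, 20, 30]))  # [9, 12, 30]
```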
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Finally, I suspect that once we get rid of the O(N^2) behavior in the
> executor, we will find that the next biggest bottleneck is in the
> planner; adding more work for it to do per OR-clause item will make
> things worse not better.

But time spent in the planner doesn't matter if you're using prepared
queries, which is true for OLTP applications where there are very few
queries being executed many many times.  I hear web sites are quite
popular these days.

I missed the beginning of this thread so I'm not sure if it's relevant.
But wouldn't it be possible to just check all the clauses and see if
they're using precisely the same index with an equality-type operator?
That won't catch things like "a BETWEEN 1 AND 2 OR a BETWEEN 5 AND 6"
but it will catch things like "a IN (1,2,3,4,5,6)".  And I don't see
how it wouldn't be linear in the number of clauses.

-- 
greg
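[Editorial note: the linear check proposed above could look roughly like this. A hedged Python sketch: the `(column, operator, constant)` clause representation and the function name are invented, and Python `==` stands in for the real operator's semantics, which is exactly the part the planner would have to be careful about.]

```python
def mutually_exclusive_equalities(clauses):
    """Linear-time test of the idea: every OR arm is 'col = const'
    on the same column, and all constants are distinct, so no row
    can satisfy two arms and the duplicate check could be skipped.
    Each clause is a (column, operator, constant) triple.
    NOTE: 'distinct' here uses Python ==; a real planner must use
    the index operator's own notion of equality."""
    cols = {col for col, _, _ in clauses}
    consts = [c for _, _, c in clauses]
    return (all(op == '=' for _, op, _ in clauses)
            and len(cols) == 1                      # same column/index
            and len(set(consts)) == len(consts))    # no repeated const

print(mutually_exclusive_equalities(
    [('a', '=', 1), ('a', '=', 2), ('a', '=', 3)]))  # True
print(mutually_exclusive_equalities(
    [('a', '=', 1), ('b', '=', 1)]))                 # False
```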
On 22 Aug 2003, Greg Stark wrote:

> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> > Finally, I suspect that once we get rid of the O(N^2) behavior in the
> > executor, we will find that the next biggest bottleneck is in the
> > planner; adding more work for it to do per OR-clause item will make
> > things worse not better.
>
> But time spent in the planner doesn't matter if you're using prepared
> queries, which is true for OLTP applications where there are very few
> queries being executed many many times.  I hear web sites are quite
> popular these days.

True, but not every application is such.  You still need to balance
planning time with other concerns.

> I missed the beginning of this thread so I'm not sure if it's relevant.
> But wouldn't it be possible to just check all the clauses and see if
> they're using precisely the same index with an equality-type operator?
> That won't catch things like "a BETWEEN 1 AND 2 OR a BETWEEN 5 AND 6"
> but it will catch things like "a IN (1,2,3,4,5,6)".  And I don't see
> how it wouldn't be linear in the number of clauses.

But that wouldn't help unless you also made sure that what was being
compared against was not the same value, AFAICT (since the point would
be to not need to check whether the rows were already returned) --
"a=1 or a=1" is legal, if meaningless.  And that's not necessarily
immediately obvious, depending on the behavior of the = operator.  For
example, does
	where a='c ' or a='c'
have a redundancy or not?
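[Editorial note: the 'c ' vs 'c' example hinges on CHAR(n) comparison semantics, where trailing blanks are not significant. A toy Python model, with rstrip standing in for the blank-padded comparison rule; the function name is invented:]

```python
def bpchar_eq(x, y):
    """Toy model of SQL CHAR(n) (bpchar) equality: trailing blanks
    are insignificant, so 'c ' = 'c' is true under these semantics
    even though the strings differ byte-for-byte -- which is why
    constant distinctness depends on the operator, not the bytes."""
    return x.rstrip(' ') == y.rstrip(' ')

print(bpchar_eq('c ', 'c'))   # True: the two OR arms are redundant
print('c ' == 'c')            # False: plain byte-wise equality
```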
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Well, if you want to be safer, I guess you could (at runtime) decide that
> the table's gotten too big and fall back to the old method if you didn't
> entirely rip it out.  I'm not sure if that'd be too ugly though, but it
> would mean that you wouldn't have to worry about it returning too many
> tuples.

I did it this way --- it falls back to the old code if the TID hash
table grows to exceed SortMem.  Should be noticeably faster than the
old code for reasonably-sized IN lists.

			regards, tom lane
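[Editorial note: the committed fallback behaviour can be modeled roughly as below. A Python sketch under assumed semantics: `max_entries` stands in for the SortMem limit, the quals are plain predicates, and all names are invented.]

```python
def dedup_with_fallback(scans, prior_quals, max_entries):
    """Toy model of the committed behaviour: remember returned TIDs
    in an in-memory hash until it exceeds a memory limit (SortMem in
    the real code), then fall back to the old method of re-evaluating
    the prior scans' index conditions for each tuple."""
    seen = set()
    overflowed = False
    for i, scan in enumerate(scans):     # scans yield (tid, tuple)
        for tid, tup in scan:
            if not overflowed:
                if tid in seen:
                    continue             # already returned earlier
                seen.add(tid)
                if len(seen) > max_entries:
                    overflowed = True    # hash too big: switch paths
            else:
                # old O(N^2) path: a tuple matching any earlier
                # scan's condition was already returned by that scan
                if any(qual(tup) for qual in prior_quals[:i]):
                    continue
            yield tup
```

Whichever path is active, each row is returned exactly once; only the cost of the duplicate check changes.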