Re: SQL WHERE: many sql or large IN() - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: SQL WHERE: many sql or large IN() |
Date | |
Msg-id | Pine.LNX.4.64.0704061735050.12152@sn.sai.msu.ru Whole thread Raw |
In response to | Re: SQL WHERE: many sql or large IN() (tom <tom@tacocat.net>) |
List | pgsql-general |
Aha, then why not use gin index for text[] ? see, for example, my testing http://www.sai.msu.su/~megera/wiki/GinTest oleg On Fri, 6 Apr 2007, tom wrote: > If I read this right, intarray is for reading values from an array data type. > > I don't have this. > > I have a varchar() field that is indexed (unique), call it 'foo' > > I have a choice of running: > > SELECT bar FROM tokens WHERE foo IN ('apple','orange','biscuit'....) for up > to ~300 words > > OR > > SELECT bar FROM tokens WHERE foo = 'apple' up to ~300 times as a > prepared/cached SQL statements. > > > > On Apr 6, 2007, at 9:10 AM, Oleg Bartunov wrote: > >> Tom, >> >> have you seen contrib/intarray ? >> >> Oleg >> On Fri, 6 Apr 2007, tom wrote: >> >>> I'm wondering where the differences are in running two different types of >>> SQL statements. >>> >>> Given ~300 tokens/words I can either run 1 sql statement with a large list >>> in a "WHERE foo IN (...300 tokens...)" >>> or I can run ~300 statements, one for each token. >>> In the first case, the SQL is not prepared, but just executed. >>> In the second case, the SQL is prepared and run as a cached execution plan >>> (I think). >>> >>> Now. It would seem that the second approach would be painfully slow. But >>> I'm not sure that I'm seeing this. >>> Currently I have <5 users. As always, this might change... >>> >>> Before I start going about coding and testing lots of stuff I thought I >>> would ask for some historical experiences someone might have had when >>> comparing these two approaches and if there are inflection points between >>> the performance in terms of the number of tokens or simultaneous users. >>> >>> I should add that the tokens are either indexed or primary indexed but in >>> both cases, unique, and not guaranteed to exist in every case. >>> >>> Initially it seems that the WHERE IN (...) approach takes a turn for the >>> worse when the list gets very large. >>> It also seems to do comparatively worse when the number of tokens is very >>> small. >>> But I can't claim any scientifically sound basis for making this >>> distinction. >>> >>> Any experiences someone would like to share? >>> >>> ---------------------------(end of broadcast)--------------------------- >>> TIP 1: if posting/reading through Usenet, please send an appropriate >>> subscribe-nomail command to majordomo@postgresql.org so that your >>> message can get through to the mailing list cleanly >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), >> Sternberg Astronomical Institute, Moscow University, Russia >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(495)939-16-83, +007(495)939-23-83 >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 6: explain analyze is your friend >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-general by date: