Home > mailing lists

Re: SQL WHERE: many sql or large IN() - Mailing list pgsql-general

From	Oleg Bartunov
Subject	Re: SQL WHERE: many sql or large IN()
Date	April 6, 2007 10:36:01
Msg-id	Pine.LNX.4.64.0704061735050.12152@sn.sai.msu.ru Whole thread Raw
In response to	Re: SQL WHERE: many sql or large IN() (tom <tom@tacocat.net>)
List	pgsql-general

Tree view

Aha,

then why not use gin index for text[] ?

see, for example, my testing
http://www.sai.msu.su/~megera/wiki/GinTest

oleg
On Fri, 6 Apr 2007, tom wrote:

> If I read this right, intarray is for reading values from an array data type.
>
> I don't have this.
>
> I have a varchar() field that is indexed (unique), call it 'foo'
>
> I have a choice of running:
>
> SELECT bar FROM tokens WHERE foo IN ('apple','orange','biscuit'....) for up
> to ~300 words
>
> OR
>
> SELECT bar FROM tokens WHERE foo = 'apple' up to ~300 times as a
> prepared/cached SQL statements.
>
>
>
> On Apr 6, 2007, at 9:10 AM, Oleg Bartunov wrote:
>
>> Tom,
>>
>> have you seen contrib/intarray ?
>>
>> Oleg
>> On Fri, 6 Apr 2007, tom wrote:
>>
>>> I'm wondering where the differences are in running two different types of
>>> SQL statements.
>>>
>>> Given ~300 tokens/words I can either run 1 sql statement with a large list
>>> in a "WHERE foo IN (...300 tokens...)"
>>> or I can run ~300 statements, one for each token.
>>> In the first case, the SQL is not prepared, but just executed.
>>> In the second case, the SQL is prepared and run as a cached execution plan
>>> (I think).
>>>
>>> Now.  It would seem that the second approach would be painfully slow.  But
>>> I'm not sure that I'm seeing this.
>>> Currently I have <5 users.  As always, this might change...
>>>
>>> Before I start going about coding and testing lots of stuff I thought I
>>> would ask for some historical experiences someone might have had when
>>> comparing these two approaches and if there are inflection points between
>>> the performance in terms of the number of tokens or simultaneous users.
>>>
>>> I should add that the tokens are either indexed or primary indexed but in
>>> both cases, unique, and not guaranteed to exist in every case.
>>>
>>> Initially it seems that the WHERE IN (...) approach takes a turn for the
>>> worse when the list gets very large.
>>> It also seems to do comparatively worse when the number of tokens is very
>>> small.
>>> But I can't claim any scientifically sound basis for making this
>>> distinction.
>>>
>>> Any experiences someone would like to share?
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>>    subscribe-nomail command to majordomo@postgresql.org so that your
>>>    message can get through to the mailing list cleanly
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-general by date:

From: tom
Date: 06 April 2007, 10:23:59
Subject: Re: SQL WHERE: many sql or large IN()

From: Listmail
Date: 06 April 2007, 11:07:59
Subject: Re: SQL WHERE: many sql or large IN()

Re: SQL WHERE: many sql or large IN() - Mailing list pgsql-general

Previous

Next