Re: Select where id in (LARGE LIST) ? - Mailing list pgsql-jdbc

From Jasper Potts
Subject Re: Select where id in (LARGE LIST) ?
Date
Msg-id 42EF3B8F.1020603@jasperpotts.com
Whole thread Raw
In response to Re: Select where id in (LARGE LIST) ?  (Csaba Nagy <nagy@ecircle-ag.com>)
Responses Re: Select where id in (LARGE LIST) ?  (Oliver Jowett <oliver@opencloud.com>)
Re: Select where id in (LARGE LIST) ?  (Csaba Nagy <nagy@ecircle-ag.com>)
List pgsql-jdbc
We are working on a desktop(Swing) application, postgresql is running
embedded in the application and is hence single user. We have a
scrollable view of grid of icons much like explorer. The data set is a
list of item ids resulting from a complex query that can take 30
seconds. This data set of the query results are stored in table in the
database. The user then needs to be able to browse(scroll) through the
list of items and make a selection. Then they can perform operations on
that selection. Those operations require a way of the selection being
used in a database query. The options are:

(1)    Store the selection on memory on the client.
(2)   Store the selection as a column in the query results table
(boolean maybe)
(3)    Hybrid solution where the selection is in memory and in the database.

The problem with (1) is how to transfer the selection to the db if it
gets large. This is where the chunking comes in.
The problem with the pure selection in the database route(2) is the
simple selection operations like clicking on an item to select only it.
Are too slow.
(3) could be good but is complex to implement with Threading/Locking etc.

My current plan is to go with (1) but change the storage model for the
selection. The first idea was just a set of ids of selected items,
problem is if that is 100,000 then is hard to transfer to the WHERE part
of a query. The new idea is to store it as a list of ranges eg.
(26-32,143-198,10922-10923) this could then be sent as a "WHERE (row
 >=26 and row <=32) or (row >=143 and row <=198) or (row >=10922 and row
<=10923)". As the user has to choose each range by hand it is unlikely
there could be more than 100 ranges.

The big question here is whats the max length of a query sent though
JDBC to Postgresql???

Many Thanks

Jasper

Csaba Nagy wrote:

>I don't quite understand what you're doing here, but smells to me for
>something which shouldn't be interactive in the first place. If some
>query expectedly exceeds a few seconds, we make the operation
>asynchronous, i.e. the user starts it, gets a page saying the thing is
>currently being done, and then he receives some kind of notification
>when the thing is finished, either on the web page (by periodically
>reloading) or by email for really long lasting things.
>Now the chunking method does have an overhead indeed, but it has a set
>of good properties too: it's easily interruptible, it won't bug your
>database down (it will let some other things go too between the chunks),
>and you won't get connection timeouts for really long operations... it's
>just more manageable in many respects.
>And if you do have some long lasting things, make sure you won't let the
>user hit reload 50 times... an asynchronous way of doing it will help in
>this regard too, cause then you know what the user did and reload will
>only show the status.
>
>HTH,
>Csaba.
>
>
>
>On Fri, 2005-07-29 at 21:18, Jasper Potts wrote:
>
>
>>I have been trying the chunk method and it wins in cases where the
>>number of ids is not too high. I was comparing:
>>
>>(1) joining the main table with a table that had a column with ids and a
>>boolean column for selected, both id columns have indexes.
>>(2) select over main table where id in (...) done in 100s with prepared
>>statement and results accumulated in java
>>
>>with a select that did a sum() of  another column the results were:
>>
>>No. of Selected Items  |   Join Select Time in sec   |  Chunk Select
>>Time in sec
>>30                               |   0.4
>>|  0.007
>>4000                           |   0.5
>>|  0.24
>>30000                         |   0.7                                 |
>>1.12
>>
>>All of these were with 30,000 rows in main table from JDBC.
>>
>>These results don't take into account the time it takes to clear then
>>populate the select table with selection data from client. At the moment
>>this can take 1-2 seconds which is far from interactive for the user.
>>Which is why I am looking for a better method.
>>
>>Any other suggestions? :-)
>>
>>I would like to push the data size up to a million, which means the user
>>could in theory select a million rows and apply operation. This is going
>>to get very slow with the chunk method.
>>
>>Many Thanks
>>
>>Jasper
>>
>>Csaba Nagy wrote:
>>
>>
>>
>>>Jasper,
>>>
>>>You can chunk your operation. That means to only use 100 entries in one
>>>run, and repeat it until all ids were processed. Use a prepared
>>>statement, that will save you some overhead. The last chunk will have
>>>less entries than the parameter placeholders, so you will have to build
>>>a special last statement, or to set the superfluous parameters to null
>>>or to one of the values from the last chunk, depends on what kind of
>>>query you have. We do all our data import/export this way, and it works
>>>fine.
>>>
>>>Cheers,
>>>Csaba.
>>>
>>>
>>>On Fri, 2005-07-29 at 17:30, Jasper Potts wrote:
>>>
>>>
>>>
>>>
>>>>I am working on a gui application which has a list of ids of selected
>>>>items. To perform an operation on the selected items I do a
>>>>"select/update .. where id in(...)". There seems to be a limit of
>>>>100-200 items in the list. Is there a way of doing this with large
>>>>lists, maybe 10,000, 100,000 long?
>>>>
>>>>The best solution I have so far is to create a selection table and write
>>>>the select out to that and perform a join but the 100,000 inserts are slow.
>>>>
>>>>The ids are int8(long), so not talking large amounts of data, couple Mb
>>>>at most. Database and client running on same machine over localhost.
>>>>
>>>>Many Thanks
>>>>
>>>>Jasper
>>>>
>>>>---------------------------(end of broadcast)---------------------------
>>>>TIP 9: In versions below 8.0, the planner will ignore your desire to
>>>>      choose an index scan if your joining column's datatypes do not
>>>>      match
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 6: explain analyze is your friend
>>
>>
>
>
>

pgsql-jdbc by date:

Previous
From: Oliver Jowett
Date:
Subject: Re: Timestamp changes committed to HEAD
Next
From: Oliver Jowett
Date:
Subject: Re: Select where id in (LARGE LIST) ?