Re: Select where id in (LARGE LIST) ? - Mailing list pgsql-jdbc

From Csaba Nagy
Subject Re: Select where id in (LARGE LIST) ?
Date
Msg-id 1122885611.2837.209.camel@coppola.muc.ecircle.de
Whole thread Raw
In response to Re: Select where id in (LARGE LIST) ?  (Jasper Potts <jasper@jasperpotts.com>)
Responses Re: Select where id in (LARGE LIST) ?
List pgsql-jdbc
I don't quite understand what you're doing here, but smells to me for
something which shouldn't be interactive in the first place. If some
query expectedly exceeds a few seconds, we make the operation
asynchronous, i.e. the user starts it, gets a page saying the thing is
currently being done, and then he receives some kind of notification
when the thing is finished, either on the web page (by periodically
reloading) or by email for really long lasting things.
Now the chunking method does have an overhead indeed, but it has a set
of good properties too: it's easily interruptible, it won't bug your
database down (it will let some other things go too between the chunks),
and you won't get connection timeouts for really long operations... it's
just more manageable in many respects.
And if you do have some long lasting things, make sure you won't let the
user hit reload 50 times... an asynchronous way of doing it will help in
this regard too, cause then you know what the user did and reload will
only show the status.

HTH,
Csaba.



On Fri, 2005-07-29 at 21:18, Jasper Potts wrote:
> I have been trying the chunk method and it wins in cases where the
> number of ids is not too high. I was comparing:
>
> (1) joining the main table with a table that had a column with ids and a
> boolean column for selected, both id columns have indexes.
> (2) select over main table where id in (...) done in 100s with prepared
> statement and results accumulated in java
>
> with a select that did a sum() of  another column the results were:
>
> No. of Selected Items  |   Join Select Time in sec   |  Chunk Select
> Time in sec
> 30                               |   0.4
> |  0.007
> 4000                           |   0.5
> |  0.24
> 30000                         |   0.7                                 |
> 1.12
>
> All of these were with 30,000 rows in main table from JDBC.
>
> These results don't take into account the time it takes to clear then
> populate the select table with selection data from client. At the moment
> this can take 1-2 seconds which is far from interactive for the user.
> Which is why I am looking for a better method.
>
> Any other suggestions? :-)
>
> I would like to push the data size up to a million, which means the user
> could in theory select a million rows and apply operation. This is going
> to get very slow with the chunk method.
>
> Many Thanks
>
> Jasper
>
> Csaba Nagy wrote:
>
> >Jasper,
> >
> >You can chunk your operation. That means to only use 100 entries in one
> >run, and repeat it until all ids were processed. Use a prepared
> >statement, that will save you some overhead. The last chunk will have
> >less entries than the parameter placeholders, so you will have to build
> >a special last statement, or to set the superfluous parameters to null
> >or to one of the values from the last chunk, depends on what kind of
> >query you have. We do all our data import/export this way, and it works
> >fine.
> >
> >Cheers,
> >Csaba.
> >
> >
> >On Fri, 2005-07-29 at 17:30, Jasper Potts wrote:
> >
> >
> >>I am working on a gui application which has a list of ids of selected
> >>items. To perform an operation on the selected items I do a
> >>"select/update .. where id in(...)". There seems to be a limit of
> >>100-200 items in the list. Is there a way of doing this with large
> >>lists, maybe 10,000, 100,000 long?
> >>
> >>The best solution I have so far is to create a selection table and write
> >>the select out to that and perform a join but the 100,000 inserts are slow.
> >>
> >>The ids are int8(long), so not talking large amounts of data, couple Mb
> >>at most. Database and client running on same machine over localhost.
> >>
> >>Many Thanks
> >>
> >>Jasper
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 9: In versions below 8.0, the planner will ignore your desire to
> >>       choose an index scan if your joining column's datatypes do not
> >>       match
> >>
> >>
> >
> >
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend


pgsql-jdbc by date:

Previous
From: Oliver Jowett
Date:
Subject: Timestamp changes committed to HEAD
Next
From: Andrus Adamchik
Date:
Subject: Re: BUG #1780: JDBC driver "setNull" throws for BLOB and CLOB