Thread: Best way to reduce server rounds getting big BLOBs

Best way to reduce server rounds getting big BLOBs

From
Jorge Arévalo
Date:
Hello,

I'd like to know what's the best way to reduce the number of server rounds in a libpq C app that fetches BLOBs from a
remote PostgreSQL server.

About 75% of the time my app uses is spent querying the database. I basically get binary objects (images). I have to
fetch all the images from a table. This table can be really big (in number of rows) and each image can be big too.

I guess I should go for cursors. If I understood the concept of "cursor" correctly: the query is executed, a ResultSet
is generated inside the database server, and the client receives a "pointer" to this ResultSet. You can get all the
rows by moving this pointer over the ResultSet, calling the right functions. But you still have to go to the database
for each chunk of data. Am I right?

I also understand that the "perfect" situation would be to have the whole table content in memory after just one server
round, available to my app. But as that's not scalable at all, I want a "good enough" method to avoid (expensive)
trips to the database server.

Are cursors my best bet?

Many thanks in advance, and sorry if the question is too naive.

--
Jorge Arevalo
Freelance developer

http://www.krop.com/jorgearevalo
http://about.me/jorgeas80

Sent with Sparrow (http://www.sparrowmailapp.com/?sig)




Re: Best way to reduce server rounds getting big BLOBs

From
Merlin Moncure
Date:
On Wed, May 15, 2013 at 11:31 AM, Jorge Arévalo
<jorgearevalo@libregis.org> wrote:
> Hello,
>
> I'd like to know what's the best way to reduce the number of server rounds in a libpq C app that fetches BLOBs from a
> remote PostgreSQL server.
>
> About 75% of the time my app uses is spent querying database. I basically get binary objects (images). I have to
> fetch all the images from a table. This table can be really big (in number of rows) and each image can be big too.

#1 thing to make sure of when getting big blobs is that you are
fetching data in binary.  If you are not, do so before changing
anything else (I wrote a library to help do that, libpqtypes).
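For reference, asking plain libpq for binary results is a matter of passing resultFormat = 1 to PQexecParams. A minimal sketch, assuming a hypothetical "images" table with an "img" bytea column and a default connection string:

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* Connection string is a placeholder; adjust for your setup. */
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    const char *params[] = { "42" };  /* value for $1, sent as text */

    /* The final argument (resultFormat = 1) asks the server to send
     * column values in binary, skipping the text encoding on the
     * server and the decode on the client. */
    PGresult *res = PQexecParams(conn,
                                 "SELECT img FROM images WHERE id = $1",
                                 1,          /* one parameter */
                                 NULL,       /* let the server infer its type */
                                 params,
                                 NULL, NULL, /* text-format parameter */
                                 1);         /* binary result */
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0) {
        /* PQgetvalue returns the raw bytes; PQgetlength gives the size. */
        printf("fetched %d bytes of binary image data\n",
               PQgetlength(res, 0, 0));
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}
```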

> I guess I should go for cursors. If I understood the concept of "cursor", basically the query is executed, a
> ResultSet is generated inside the database server, and the client receives a "pointer" to this ResultSet. You can get
> all the rows by moving this pointer over the ResultSet, calling the right functions. But you still have to go to the
> database for each chunk of data. Am I right?

cursors are a way to page through a query result without fetching all
the data at once.  this would be most useful if you are processing one
row at a time on the client side.  but if the client needs all the
data held in memory, cursors will only help in terms of reducing the
temporary memory demands while doing the transfer.  So it's hard to
say if it's worth using them until you describe the client side
requirements a little better.
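a rough sketch of that paging pattern (cursor, table, and column names
invented; error handling trimmed; assumes an already-open PGconn):

```c
#include <libpq-fe.h>

/* Page through a large result with a cursor: one round trip per
 * batch of 100 rows instead of one giant result set. */
static void fetch_images_paged(PGconn *conn)
{
    PQclear(PQexec(conn, "BEGIN"));  /* cursors live inside a transaction */
    PQclear(PQexec(conn,
        "DECLARE img_cur BINARY CURSOR FOR SELECT img FROM images"));

    for (;;) {
        PGresult *res = PQexec(conn, "FETCH 100 FROM img_cur");
        int n = (PQresultStatus(res) == PGRES_TUPLES_OK) ? PQntuples(res) : 0;
        for (int i = 0; i < n; i++) {
            /* one row at a time: PQgetvalue(res, i, 0) is the raw
             * image, PQgetlength(res, i, 0) its size in bytes */
        }
        PQclear(res);
        if (n == 0)
            break;  /* done, or an error occurred */
    }

    PQclear(PQexec(conn, "CLOSE img_cur"));
    PQclear(PQexec(conn, "COMMIT"));
}
```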

merlin


Re: Best way to reduce server rounds getting big BLOBs

From
Jorge Arévalo
Date:
Hello,

First of all, sorry for the duplicate. I accidentally sent the same message twice, and I got responses to both of them.
I include here the person who responded to the other message.


On Wednesday, May 15, 2013 at 20:43, Merlin Moncure wrote:

> On Wed, May 15, 2013 at 11:31 AM, Jorge Arévalo
> <jorgearevalo@libregis.org (mailto:jorgearevalo@libregis.org)> wrote:
> > Hello,
> >
> > I'd like to know what's the best way to reduce the number of server rounds in a libpq C app that fetches BLOBs from
> > a remote PostgreSQL server.
> >
> > About 75% of the time my app uses is spent querying database. I basically get binary objects (images). I have to
> > fetch all the images from a table. This table can be really big (in number of rows) and each image can be big too.
>
> #1 thing to make sure of when getting big blobs is that you are
> fetching data in binary. If you are not, do so before changing
> anything else (I wrote a library to help do that, libpqtypes).

Yes, I get data in binary. But thanks for the reference to the library.

>
> > I guess I should go for cursors. If I understood the concept of "cursor", basically the query is executed, a
> > ResultSet is generated inside the database server, and the client receives a "pointer" to this ResultSet. You can
> > get all the rows by moving this pointer over the ResultSet, calling the right functions. But you still have to go
> > to the database for each chunk of data. Am I right?
>
> cursors are a way to page through a query result without fetching all
> the data at once. this would be most useful if you are processing one
> row at a time on the client side. but if the client needs all the
> data held in memory, cursors will only help in terms of reducing the
> temporary memory demands while doing the transfer. So it's hard to
> say if it's worth using them until you describe the client side
> requirements a little better.
>
Ok, more context here. The images stored in the database are really PostGIS Raster images. So they are stored in this
format [1], and are read in this other one [2], as binary objects (you can get a string representation of the format
too, but it's slower).

My point is to make the GDAL PostGIS Raster driver [3] faster, and my approach is: "instead of one server round per
user data request, try to get the whole raster data with as few queries as possible".

The GDAL library raises one read operation per chunk of data in the case of tiled raster formats. PostGIS Raster is a
(special) tiled raster format, but one read operation per data chunk means one query (= one trip to the server), and
that's expensive. Even the simplest app (transforming one raster table into a GeoTIFF file) spends 75% of its time in
database queries, because of this architecture.

Plus, each read request requires an intersection between the desired area and the data stored in the database (using
ST_Intersects [4]). An intersection in the database is more expensive than one in memory.

So, my goal is to save server rounds. My thought was: using a cursor, I still have to go to the database, but my data
is stored in a ResultSet, and I just need to seek over it instead of raising expensive queries. Am I wrong? Is there
any better approach?
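One alternative sketch, assuming ST_Intersects accepts the raster column directly in your PostGIS version: pull every tile intersecting the requested region in a single binary-format query, then serve the driver's per-chunk reads from that in-memory result. The table name ("rastertable"), column name ("rast"), SRID, and polygon below are all placeholders:

```c
#include <libpq-fe.h>

/* Fetch all tiles intersecting one region in a single round trip,
 * instead of issuing one ST_Intersects query per tile.
 * Assumes an already-open PGconn; names are hypothetical. */
static PGresult *fetch_region_tiles(PGconn *conn, const char *region_wkt)
{
    const char *params[] = { region_wkt };

    return PQexecParams(conn,
        "SELECT ST_AsBinary(rast) FROM rastertable "
        "WHERE ST_Intersects(rast, ST_GeomFromText($1, 4326))",
        1, NULL, params, NULL, NULL,
        1 /* binary results */);
}

/* Usage sketch: each row of the result is one intersecting tile,
 * already client-side, so subsequent chunk reads for this region
 * need no further server trips:
 *
 *   PGresult *res = fetch_region_tiles(conn,
 *       "POLYGON((0 0, 0 100, 100 100, 100 0, 0 0))");
 *   for (int i = 0; i < PQntuples(res); i++)
 *       handle_tile(PQgetvalue(res, i, 0), PQgetlength(res, i, 0));
 *   PQclear(res);
 */
```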

Eduardo, about your response: many thanks. I think blob data compression doesn't apply here. And out-db storage is an
option of the PostGIS Raster format, but it's not mandatory.


[1] http://trac.osgeo.org/postgis/browser/trunk/raster/doc/RFC1-SerializedFormat
[2] http://trac.osgeo.org/postgis/browser/trunk/raster/doc/RFC2-WellKnownBinaryFormat
[3] http://trac.osgeo.org/gdal/wiki/frmts_wtkraster.html
[4] http://postgis.net/docs/manual-2.0/ST_Intersects.html


Best regards,
Jorge
