Thread: What's a good way to improve this query?

From: Jorge Arévalo
Hello,

I'm running this PostGIS Raster query

select
st_scalex(rast),
st_scaley(rast),
st_skewx(rast),
st_skewy(rast),
st_width(rast),
st_height(rast),
rid,
st_upperleftx(rast),
st_upperlefty(rast),
st_numbands(rast)
from
my_postgis_raster_table



I want to remark that, even though 'rast' is a complex type and can be really big, I'm getting just metadata, not the whole 'rast' column. Anyway, the average dimensions of a 'rast' object are about 600x400 pixels at 8 bits per pixel. So, not that big (about 234 KB per rast object).

My table has 1257 rows, and this query takes about 45 seconds to execute (45646 ms). I think that's too slow, since I'm just getting metadata, not the whole 'rast' object, as said.

This is the EXPLAIN ANALYZE output:

Seq Scan on my_postgis_raster_table (cost=0.00..198.85 rows=1257 width=36) (actual time=86.867..51861.495 rows=1257 loops=1)
Total runtime: 51863.919 ms



So, basically a sequential scan. As expected, I guess (I'm not a postgres expert, so sorry if I'm talking nonsense).

I've calculated the effective transfer rate for this table:

SELECT pg_size_pretty(CAST(pg_relation_size('my_postgis_raster_table') / 45646 * 1000 as int8)) AS bytes_per_second;

The result is about 27 KB per second. Isn't that a slow rate? Is there any kind of index I could create to speed up this query? Maybe use some kind of cache system?
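
(By the way, I'm not sure the calculation above is even fair: as far as I know, pg_relation_size() counts only the main table, not the TOAST table where the big raster values actually live. A variant with pg_total_relation_size(), which includes TOAST and indexes, might give a more realistic rate:)

SELECT pg_size_pretty(CAST(pg_total_relation_size('my_postgis_raster_table') / 45646 * 1000 as int8)) AS bytes_per_second;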

Many thanks in advance,

--
Jorge Arevalo
Freelance developer

http://www.krop.com/jorgearevalo
http://about.me/jorgeas80

Sent with Sparrow (http://www.sparrowmailapp.com/?sig)




Re: What's a good way to improve this query?

From: Paul Ramsey
Well, your objects are larger than the page size, so you're getting them out of the TOAST tables, not directly out of main storage. You may also have your type declared as 'main' storage, which means it's zipped up, so it's being unzipped before you can access it; that's also an overhead.
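
A quick way to check how the column is declared (assuming the table and column names from your original post) is to look at pg_attribute.attstorage ('p' = plain, 'm' = main, 'x' = extended, 'e' = external):

SELECT a.attname, a.attstorage
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
WHERE c.relname = 'my_postgis_raster_table'
  AND a.attname = 'rast';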

For metadata retrieval, the thing to do is store the metadata at the head of the object (which I'm not looking into pgraster to see if you do, but I'll assume so for now) and then use PG_DETOAST_DATUM_SLICE in the metadata accessor functions, so that you only pull the bytes you want, rather than detoasting the whole object just to get the header information.

You may be causing further pain by having all the metadata functions separate, so that in fact the object is being read 9 separate times by your different functions. It'll float into cache quickly enough, but the decompression step at each access will still be there. You might want to stuff the query through a sampling profiler (OS X Shark!) and confirm, but I would guess you'll find a lot of cycles spinning in zlib for this query.

Paul

--
Paul Ramsey
http://cleverelephant.ca
http://postgis.net







Re: What's a good way to improve this query?

From: Jorge Arévalo
Hello,


On Wednesday, June 5, 2013 at 20:31, Paul Ramsey wrote:

> Well, your objects are larger than the page size, so you're getting them out of the TOAST tables, not directly out of
> main storage. You may also have your type declared as 'main' storage, which means it's zipped up, so it's being
> unzipped before you can access it; that's also an overhead.

Good to know. I'll check it.

>
> For metadata retrieval, the thing to do is store the metadata at the head of the object (which I'm not looking into
> pgraster to see if you do, but I'll assume so for now) and then use PG_DETOAST_DATUM_SLICE in the metadata accessor
> functions, so that you only pull the bytes you want, rather than detoasting the whole object just to get the header
> information.
>
Ok. I'll check the PostGIS Raster functions too.

>
> You may be causing further pain by having all the metadata functions separate, so that in fact the object is being
> read 9 separate times by your different functions. It'll float into cache quickly enough, but the decompression step
> at each access will still be there. You might want to stuff the query through a sampling profiler (OS X Shark!) and
> confirm, but I would guess you'll find a lot of cycles spinning in zlib for this query.
>
Yes, you're right. Actually, replacing the calls with a single ST_MetaData call and unpacking the record at the client side is really faster. Many thanks!
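
In case it's useful to someone else, the rewritten query looks roughly like this (the second, server-side form is just a sketch of an alternative):

-- One ST_MetaData() call per row instead of nine accessors; the composite
-- record is unpacked at the client, so each raster header is read only once.
SELECT rid, ST_MetaData(rast) FROM my_postgis_raster_table;

-- Server-side unpacking also works (OFFSET 0 keeps the planner from
-- inlining the subquery and re-evaluating ST_MetaData once per column):
SELECT rid, (md).*
FROM (SELECT rid, ST_MetaData(rast) AS md
      FROM my_postgis_raster_table OFFSET 0) AS t;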

And many thanks for the Shark tip too. Looks great.

Best regards,
Jorge







Re: What's a good way to improve this query?

From: Jorge Arévalo
Hello again,

On Thursday, June 6, 2013 at 12:23, Jorge Arévalo wrote:

> Hello,
>
>
> On Wednesday, June 5, 2013 at 20:31, Paul Ramsey wrote:
>
> > Well, your objects are larger than the page size, so you're getting them out of the TOAST tables, not directly out
> > of main storage. You may also have your type declared as 'main' storage, which means it's zipped up, so it's being
> > unzipped before you can access it; that's also an overhead.
>
> Good to know. I'll check it.
Checked. The storage strategy is 'extended'. If I understood correctly, that means the data is compressed and, if it's still too long, moved out of main storage. Maybe an 'external' strategy would perform faster (no need to unzip), with the penalty of increased storage space. Am I right?

In the end, if I want more speed while fetching data, I should change the storage strategy of the raster type. That would probably require rebuilding the entire database. I don't think it's worth it (unless it's *really* much faster).
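
(For what it's worth, changing the strategy for the existing column shouldn't need a full rebuild; something like the following should be enough, though it only affects newly written rows unless the table is rewritten:)

ALTER TABLE my_postgis_raster_table
    ALTER COLUMN rast SET STORAGE EXTERNAL;

-- Existing rows keep their compressed TOAST data until rewritten,
-- e.g. with a no-op update that forces re-TOASTing:
UPDATE my_postgis_raster_table SET rast = rast;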
>
>
> > For metadata retrieval, the thing to do is store the metadata at the head of the object (which I'm not looking into
> > pgraster to see if you do, but I'll assume so for now) and then use PG_DETOAST_DATUM_SLICE in the metadata accessor
> > functions, so that you only pull the bytes you want, rather than detoasting the whole object just to get the header
> > information.
>
> Ok. I'll check the PostGIS Raster functions too.
>
Yes, PG_DETOAST_DATUM_SLICE is used in all metadata functions.

> >
> > You may be causing further pain by having all the metadata functions separate, so that in fact the object is being
> > read 9 separate times by your different functions. It'll float into cache quickly enough, but the decompression step
> > at each access will still be there. You might want to stuff the query through a sampling profiler (OS X Shark!) and
> > confirm, but I would guess you'll find a lot of cycles spinning in zlib for this query.
>
> Yes, you're right. Actually, replacing the calls with a single ST_MetaData call and unpacking the record at the
> client side is really faster. Many thanks!
>
> And many thanks for the Shark tip too. Looks great.
>
> Best regards,
> Jorge
>
So, I think the only thing I can do is try to rewrite the queries, if possible. But if I need the metadata of all tiles in order to know individual pixel sizes and dimensions, there's not much room for improvement.

I guess things like choosing a tile size that fits within a postgres page may help. Or creating a cache with precalculated values at the client side.
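
A server-side version of that cache could be a materialized view (available from PostgreSQL 9.3); 'raster_metadata_cache' is just an illustrative name:

-- Precompute per-tile metadata once, so viewers never touch the raster data.
CREATE MATERIALIZED VIEW raster_metadata_cache AS
SELECT rid, (md).*
FROM (SELECT rid, ST_MetaData(rast) AS md
      FROM my_postgis_raster_table OFFSET 0) AS t;

-- Refresh whenever tiles are added or changed:
-- REFRESH MATERIALIZED VIEW raster_metadata_cache;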

Any other suggestions to improve the data visualization experience when using PostGIS Raster as the raster storage backend are welcome.

Many thanks again, Paul.

--
Jorge
