Thread: Using the right tool

Using the right tool

From

"Nathan Hopper"

Date:

08 September 2002, 23:08:37

I spent about 12 hours crawling over Google searches, Usenet archives, and a
great deal of GIS matter without finding much in the way of solid answers.
Perhaps someone on this list can provide a bit of insight.

I'm in the need of cataloging a growing collection of USGS satellite
imagery. That in itself has been handled rather handily using flat files,
but with a recent decision to acquire non-USGS data (potentially tens of
thousands of images) covering geographical regions, the need for a reliable
catalog is paramount. You can imaging how easily this would become a
management nightmare. This data will be tagged, cataloged, and filed by me,
rather than the easy-to-handle USGS data that has a common packaging and
naming style.

Most people who create an image library are doing so for web-based
applications, usually, it seems, using MySQL. While I have no convictions to
either database (it's just a tool, so the end is more important than the
means), I've decided to use Postgres since it has strong ties to the GIS
community.

But for this task, I've encountered little in the way of stories from users
who have used Postgres to catalog image data, despite fairly intense
digging. The thing is, these are not small images by any means. So what
works well for a few thousand < 250k JPEGS might not work so well for a few
thousand 25-125 meg TIFFs and SID files.

Can anyone suggest some references for storage of binary objects of this
magnitude (100 gig)? While it isn't particularly important to store the file
in the database itself, it sure would simplify things -- that way if the
file is moved, the db link isn't broken. Rather than reinvent the wheel and
build a database and interface solution, is any information available on
implementations such as this?

Regards,
Nathan Hopper

Re: Using the right tool

From

Tom Lane

Date:

09 September 2002, 12:50:44

"Nathan Hopper" <nathanh@broszengineering.com> writes:
> But for this task, I've encountered little in the way of stories from users
> who have used Postgres to catalog image data, despite fairly intense
> digging. The thing is, these are not small images by any means. So what
> works well for a few thousand < 250k JPEGS might not work so well for a few
> thousand 25-125 meg TIFFs and SID files.
> Can anyone suggest some references for storage of binary objects of this
> magnitude (100 gig)?

Nathan, I'd suggest talking to the guys at Refractions Inc (Dave Blasby
and Paul Ramsey).  I'm pretty sure they have direct experience with
exactly this sort of thing ... but I dunno how closely they follow the
PG mailing lists, so they might not see your request.

            regards, tom lane

Vacuum and indexes problem

From

Joe Murphy

Date:

09 September 2002, 13:06:36

I'm running a simple query on a simple table (see create syntax below).

before running vacuum on the table explain tells me that the index
"mytable_id_name_idx" is being used
after running vacuum on the table explain tells me that a sequential scan is
being used.
If I run reindex, I'm back to the index being used.

Any ideas why this is happening?

PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00)

Output of the commands below


CREATE TABLE mytable (
        id          INT NOT NULL,
        name        TEXT NOT NULL,
        num     INT NOT NULL,
        answer      INT NOT NULL,
        field1     INT,
        field2     INT,
        field3     TEXT,
        field4     TEXT
        );

CREATE INDEX mytable_id_name_idx ON mytable (id,name);
CREATE INDEX mytable_num_idx ON mytable (num);
CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num);


aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE:  QUERY PLAN:

Index Scan using mytable_id_name_idx on mytable  (cost=0.00..4.83 rows=1
width=116)

EXPLAIN

aw_db_joe_1=# vacuum mytable;
VACUUM

aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE:  QUERY PLAN:

Seq Scan on mytable  (cost=0.00..0.00 rows=1 width=116)

EXPLAIN

aw_db_joe_1=# reindex table mytable;
REINDEX

aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE:  QUERY PLAN:

Index Scan using mytable_id_name_idx on mytable  (cost=0.00..4.83 rows=1
width=116)

EXPLAIN

Re: Vacuum and indexes problem

From

Martijn van Oosterhout

Date:

09 September 2002, 13:40:04

Firstly, how many rows in the table? If it's less than a few dozen, a seq
scan is the right answer. Secondly, vacuum analyse is usually recommended
reasonably often.

HTH,

On Mon, Sep 09, 2002 at 03:04:30PM +0100, Joe Murphy wrote:
> I'm running a simple query on a simple table (see create syntax below).
>
> before running vacuum on the table explain tells me that the index
> "mytable_id_name_idx" is being used
> after running vacuum on the table explain tells me that a sequential scan is
> being used.
> If I run reindex, I'm back to the index being used.
>
> Any ideas why this is happening?
>
> PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00)
>
> Output of the commands below
>
>
> CREATE TABLE mytable (
>         id          INT NOT NULL,
>         name        TEXT NOT NULL,
>         num     INT NOT NULL,
>         answer      INT NOT NULL,
>         field1     INT,
>         field2     INT,
>         field3     TEXT,
>         field4     TEXT
>         );
>
> CREATE INDEX mytable_id_name_idx ON mytable (id,name);
> CREATE INDEX mytable_num_idx ON mytable (num);
> CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num);
>
>
> aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
> NOTICE:  QUERY PLAN:
>
> Index Scan using mytable_id_name_idx on mytable  (cost=0.00..4.83 rows=1
> width=116)
>
> EXPLAIN
>
> aw_db_joe_1=# vacuum mytable;
> VACUUM
>
> aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on mytable  (cost=0.00..0.00 rows=1 width=116)
>
> EXPLAIN
>
> aw_db_joe_1=# reindex table mytable;
> REINDEX
>
> aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
> NOTICE:  QUERY PLAN:
>
> Index Scan using mytable_id_name_idx on mytable  (cost=0.00..4.83 rows=1
> width=116)
>
> EXPLAIN
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

Re: Vacuum and indexes problem

From

Joe Murphy

Date:

09 September 2002, 14:10:27

There were fewer than a dozen rows OK, I've tested with ~1000 and indexes
are used.
Thanks.
Martijn van Oosterhout wrote:
Firstly, how many rows in the table? If it's less
than a few dozen, a seq
scan is the right answer. Secondly, vacuum analyse is usually recommended
reasonably often.
HTH,
On Mon, Sep 09, 2002 at 03:04:30PM +0100, Joe Murphy wrote:
> I'm running a simple query on a simple table (see create syntax below).
>
> before running vacuum on the table explain tells me that the index
> "mytable_id_name_idx" is being used
> after running vacuum on the table explain tells me that a sequential
scan is
> being used.
> If I run reindex, I'm back to the index being used.
>
> Any ideas why this is happening?
>
> PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00)
>
> Output of the commands below
>
>
> CREATE TABLE mytable (
>         id         
INT NOT NULL,
>         name       
TEXT NOT NULL,
>         num    
INT NOT NULL,
>         answer     
INT NOT NULL,
>         field1    
INT,
>         field2    
INT,
>         field3    
TEXT,
>         field4    
TEXT
>         );
>
> CREATE INDEX mytable_id_name_idx ON mytable (id,name);
> CREATE INDEX mytable_num_idx ON mytable (num);
> CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num);
>
>
> aw_db_joe_1=# explain select * from mytable where id = 1 and name
= 'john';
> NOTICE:  QUERY PLAN:
>
> Index Scan using mytable_id_name_idx on mytable  (cost=0.00..4.83
rows=1
> width=116)
>
> EXPLAIN
>
> aw_db_joe_1=# vacuum mytable;
> VACUUM
>
> aw_db_joe_1=# explain select * from mytable where id = 1 and name
= 'john';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on mytable  (cost=0.00..0.00 rows=1 width=116)
>
> EXPLAIN
>
> aw_db_joe_1=# reindex table mytable;
> REINDEX
>
> aw_db_joe_1=# explain select * from mytable where id = 1 and name
= 'john';
> NOTICE:  QUERY PLAN:
>
> Index Scan using mytable_id_name_idx on mytable  (cost=0.00..4.83
rows=1
> width=116)
>
> EXPLAIN
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
--
Martijn van Oosterhout   <kleptog@svana.org> 
http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                       
JoeMurphy - AerSoft Limited 
             2 Northumberland Avenue, Dun Laoghaire, Co.
Dublin.            
  phone: +353-1-2301166     direct: +353-1-2145953     fax:
+353-1-2301167  
  mailto:joe@aersoft.com   mobile: +353-86-8526181  http://www.aersoft.com 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~