Thread: Docs: GIST

Docs: GIST

From: Lincoln Yeoh
I'm a bit confused.

In 7.3 is it possible to use GIST without using any of the stuff in
contrib/? If it is, how can it be done and in which cases should it be done?

The pgsql docs about indexes keep talking about GIST here and there, but I
can't seem to use it on anything. And there's no gist in the operator class
("ops") and access method listings.

Having the docs say that PostgreSQL provides GIST as one of the four access
methods, that GIST supports multicolumn indexes, and so on, is just confusing
if the docs pertaining to indexes don't also say that in a default PostgreSQL
installation you cannot create an index using GIST (if you actually can
create a GIST index out of the box, how?).

Another thing: is Eugene Selkov's 1998 message on GIST indexes in the 7.3
docs (see GIST Indexes) still valid? There's mention of PostgreSQL 6.3 and
Postgres95 there too.

BTW, 7.3 is GREAT! Multiple col/row returns, prepared queries, schemas, etc.
Also, set enable_seqscan=off can now get rolled back to whatever it was
before, right? Cool, because I have to force index use for a particular select.

Thanks to the postgresql dev team and everyone involved!

Cheerio,
Link.


Re: Docs: GIST

From: Tom Lane
Lincoln Yeoh <lyeoh@pop.jaring.my> writes:
> I'm a bit confused.
> In 7.3 is it possible to use GIST without using any of the stuff in
> contrib/?

No, because there are no GIST opclasses in the standard installation.
They are only in contrib.
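To use them you have to load a module's SQL script into your database
first.  Roughly, using contrib/intarray as an example (the install path
will vary with your setup):

    -- from the shell, after building and installing contrib:
    --   psql -d mydb -f /usr/local/pgsql/share/contrib/_int.sql

    -- the script creates the GIST opclasses; after that USING gist works:
    CREATE TABLE msgs (id serial, sections int[]);
    CREATE INDEX msgs_sections_idx ON msgs USING gist (sections gist__int_ops);

    -- queries using intarray's operators (e.g. overlap) can then use the index:
    SELECT id FROM msgs WHERE sections && '{3,4}';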

Yes, that's a bit silly.  As GIST improves out of the "academic toy"
category into the "production tool" category, I expect we will migrate
GIST opclasses into the standard installation.

            regards, tom lane

Re: Docs: GIST

From: Lincoln Yeoh
I did figure it out eventually, but it would be clearer to mention that in
the docs - e.g. that the only way to use GIST is to use the stuff in contrib.
I had a bit of wishful thinking - I thought that maybe some bits of GIST
might have at least become usable by default in 7.3, e.g. the simpler stuff
(the docs didn't quite contradict that).

Definitely not asking for it to be rushed in, though. Software is more
reliable when the developers know what they are doing and get to release
stuff when they think it's ready, not when others say it is.

Cheerio,
Link.

At 12:41 AM 12/12/02 -0500, Tom Lane wrote:

>Lincoln Yeoh <lyeoh@pop.jaring.my> writes:
> > I'm a bit confused.
> > In 7.3 is it possible to use GIST without using any of the stuff in
> > contrib/?
>
>No, because there are no GIST opclasses in the standard installation.
>They are only in contrib.



CLUSTER command

From: Jean-Luc Lachance
Hi all,

I just read about the CLUSTER command and was a little (very)
disappointed.
Clustered tables do not remain clustered after inserts.
Clustered tables are useful when the table is very large and there are
few different keys.


Because the table file is already split into additional files with numeric
extensions (.N) once it passes the 2 GB limit, how complicated would it be
(in terms of modifying the code) to have the table files split according to
the cluster key instead?

This would:

Greatly improve performance when the cluster key is included in the search
criteria.
Allow for a much larger table before a file has to be split (.N).
Simplify the management of symlinks (that's something else we need to
look at).
Remove the need for an index file on that field.

Of course, there should be only one cluster key per table.
The length of the key should be short, and the number of unique keys should
be low as well.

SO... ?

JLL

Re: [PERFORM] CLUSTER command

From: Jean-Luc Lachance
Oh, and something else,

I think the syntax should be:

Cluster <table> on <attribute>


Maybe inheritance can be used here.
The problem is creating the new "table" when a new key is detected.
I know, I can use rules, but the optimiser is not aware of the
clustering.
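Something along these lines, with made-up table names - conditional rules
route the inserts, and inheritance glues the children back together for
reads:

    CREATE TABLE orders (region int, placed date, amount numeric);
    CREATE TABLE orders_r1 () INHERITS (orders);
    CREATE TABLE orders_r2 () INHERITS (orders);

    -- route inserts on the parent to the matching child
    CREATE RULE orders_ins_r1 AS ON INSERT TO orders
      WHERE new.region = 1 DO INSTEAD
      INSERT INTO orders_r1 (region, placed, amount)
      VALUES (new.region, new.placed, new.amount);
    CREATE RULE orders_ins_r2 AS ON INSERT TO orders
      WHERE new.region = 2 DO INSTEAD
      INSERT INTO orders_r2 (region, placed, amount)
      VALUES (new.region, new.placed, new.amount);

SELECT * FROM orders still sees every row through inheritance, but the
planner scans all the children even when the query is restricted to one
region - which is exactly the problem.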

Enough from me for now.

What do you think?

JLL


Jean-Luc Lachance wrote:
>
> Hi all,
>
> I just read about the CLUSTER command and was a little (very)
> disappointed.
> Clustered tables do not remain clustered after inserts.
> Clustered tables are useful when the table is very large and there are
> few different keys.
>
> Because the table file is already split into additional files with numeric
> extensions (.N) once it passes the 2 GB limit, how complicated would it be
> (in terms of modifying the code) to have the table files split according to
> the cluster key instead?
>
> This would:
>
> Greatly improve performance when the cluster key is included in the search
> criteria.
> Allow for a much larger table before a file has to be split (.N).
> Simplify the management of symlinks (that's something else we need to
> look at).
> Remove the need for an index file on that field.
>
> Of course, there should be only one cluster key per table.
> The length of the key should be short, and the number of unique keys should
> be low as well.
>
> SO... ?
>
> JLL
>

Re: CLUSTER command

From: Stephan Szabo
On Thu, 12 Dec 2002, Jean-Luc Lachance wrote:

> Hi all,
>
> I just read about the CLUSTER command and was a little (very)
> disappointed.
> Clustered tables do not remain clustered after inserts.
> Clustered tables are useful when the table is very large and there are
> few different keys.
>
>
> Because the table file is already split into additional files with numeric
> extensions (.N) once it passes the 2 GB limit, how complicated would it be
> (in terms of modifying the code) to have the table files split according to
> the cluster key instead?

I'd vote against changing the existing CLUSTER: while not great, it handles
many different key values fairly well, and this solution wouldn't.  Clustering
on a key with many different values is still useful if you're doing searches
over ranges, since it lowers the number of heap file reads necessary.  If this
is done, it should probably be separate from the existing CLUSTER, or at least
both versions should be possible.
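
Concretely, with made-up names:

    CLUSTER events_ts_idx ON events;   -- 7.3 syntax: CLUSTER indexname ON tablename

    SELECT * FROM events
    WHERE stamp BETWEEN '2002-12-01' AND '2002-12-07';

After the CLUSTER, rows with nearby timestamps sit in nearby heap pages, so a
range scan like this touches far fewer pages even though stamp has many
distinct values.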



Re: CLUSTER command

From: Jean-Luc Lachance
The current CLUSTER command is equivalent to:

CREATE TABLE b AS SELECT * FROM a ORDER BY i;

So you would not be losing anything.



Stephan Szabo wrote:
>
> On Thu, 12 Dec 2002, Jean-Luc Lachance wrote:
>
> > Hi all,
> >
> > I just read about the CLUSTER command and was a little (very)
> > disappointed.
> > Clustered tables do not remain clustered after inserts.
> > Clustered tables are useful when the table is very large and there are
> > few different keys.
> >
> >
> > Because the table file is already split into additional files with numeric
> > extensions (.N) once it passes the 2 GB limit, how complicated would it be
> > (in terms of modifying the code) to have the table files split according to
> > the cluster key instead?
>
> I'd vote against changing the existing CLUSTER: while not great, it handles
> many different key values fairly well, and this solution wouldn't.  Clustering
> on a key with many different values is still useful if you're doing searches
> over ranges, since it lowers the number of heap file reads necessary.  If this
> is done, it should probably be separate from the existing CLUSTER, or at least
> both versions should be possible.

Re: CLUSTER command

From: Stephan Szabo
On Thu, 12 Dec 2002, Jean-Luc Lachance wrote:

> The current CLUSTER command is equivalent to:
>
> CREATE TABLE b AS SELECT * FROM a ORDER BY i;
>
> So you would not be losing anything.

Except for the fact that CLUSTER is intended (although I don't know
whether it does yet) to retain things like constraints and other
indexes, which the above doesn't.
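
For instance, with made-up names, given

    CREATE TABLE t (i int PRIMARY KEY, j int CHECK (j > 0));
    CREATE INDEX t_j_idx ON t (j);

the hand-rolled rewrite

    CREATE TABLE b AS SELECT * FROM t ORDER BY i;

leaves you with a bare ordered copy: b has no primary key, no CHECK
constraint, and no index on j, and anything that referenced t still points
at the old table.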


Re: CLUSTER command

From: Lincoln Yeoh
Splitting table files by indexed value may not help if the operating system
doesn't manage to keep the tables unfragmented on disk. I suppose the O/S
should know how to do it though.

Cheerio,
Link.

At 04:31 PM 12/12/02 -0500, Jean-Luc Lachance wrote:

>Hi all,
>
>I just read about the CLUSTER command and was a little (very)
>disappointed.
>Clustered tables do not remain clustered after inserts.
>Clustered tables are useful when the table is very large and there are
>few different keys.
>
>
>Because the table file is already split into additional files with numeric
>extensions (.N) once it passes the 2 GB limit, how complicated would it be
>(in terms of modifying the code) to have the table files split according to
>the cluster key instead?



Re: [PERFORM] CLUSTER command

From: Jean-Luc Lachance
OK fine,

Let's create a new command:

PARTITION <table> ON <attribute>

I did not want to start a fight. You can keep the CLUSTER command as it
is.

I still think clustering/partitioning would be a great idea.
This is what I want to talk about. Look at the original post for the
reasons.


JLL



johnnnnnn wrote:
>
> On Thu, Dec 12, 2002 at 02:03:56PM -0800, Stephan Szabo wrote:
> > I'd vote against changing the existing CLUSTER: while not great, it handles
> > many different key values fairly well, and this solution wouldn't.
>
> I would agree. What's being proposed sounds much more like table
> partitioning than clustering.
>
> That's not to say that the existing CLUSTER couldn't be improved, at
> the very least to the point where it allows inserts to respect the
> clustered structure. That's a post for another thread, though.
>
> -johnnnnnnnnnnn
>

PerformPortalClose warning in 7.3

From: Michael Engelhart
Hi -
I've been running PostgreSQL 7.3 on Mac OS X 10.2 since it was released
and it's been running fine.  I'm using pyPgSQL 2.3 for client side
programming which also was working great until tonight.   Now whenever
I do any query of any type, I get warnings like this:

WARNING:  PerformPortalClose: portal "pgsql_00179f10" not found

It "appears" that everything is still working the way it was but it's a
bit discomforting to have these  show up on my screen while running my
applications.

Anyone that can explain this?

Here's a tiny bit of Python sample code that I used to make sure it
wasn't my other code causing the problems

from pyPgSQL import PgSQL

dbname = "template1"
conn = PgSQL.connect(database=dbname)
cursor = conn.cursor()
sql = "SELECT now()";
cursor.execute(sql)
res = cursor.fetchall()
for i in res:
    print i
cursor.close()
conn.commit()

Strangely, if I remove the last two lines (cursor.close() and
conn.commit()) I don't get the errors.

Also, I notice that I don't have this problem with the psql command
line either.  Is this the Python API causing this?

Thanks for any help

Mike


Re: PerformPortalClose warning in 7.3

From: Gerhard Haering
Michael Engelhart wrote in gmane.comp.db.postgresql.general:
> Hi -
> I've been running PostgreSQL 7.3 on Mac OS X 10.2 since it was released
> and it's been running fine.  I'm using pyPgSQL 2.3 for client side
> programming which also was working great until tonight.   Now whenever
> I do any query of any type, I get warnings like this:
>
> WARNING:  PerformPortalClose: portal "pgsql_00179f10" not found
>
> It "appears" that everything is still working the way it was but it's a
> bit discomforting to have these  show up on my screen while running my
> applications.
>
> Anyone that can explain this?
>
> Here's a tiny bit of Python sample code that I used to make sure it
> wasn't my other code causing the problems
>
> from pyPgSQL import PgSQL
>
> dbname = "template1"
> conn = PgSQL.connect(database=dbname)
> cursor = conn.cursor()
> sql = "SELECT now()";
> cursor.execute(sql)
> res = cursor.fetchall()
> for i in res:
>     print i
> cursor.close()
> conn.commit()

Actually, pyPgSQL is using PostgreSQL portals behind your back. This
is a feature!

To show this, we use the undocumented, but very handy toggleShowQuery
flag. The effect is that we can see what SQL pyPgSQL sends to the
backend using libpq (the lines starting with QUERY: below):

#v+
gerhard@gargamel:~$ python
Python 2.2.2 (#1, Nov 30 2002, 23:19:58)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyPgSQL import PgSQL
>>> con = PgSQL.connect()
>>> con.conn.toggleShowQuery
'On'
>>> cursor = con.cursor()
QUERY: BEGIN WORK
>>> cursor.execute("select * from test")
QUERY: DECLARE "PgSQL_0811F1EC" CURSOR FOR select * from test
QUERY: FETCH 1 FROM "PgSQL_0811F1EC"
QUERY: SELECT typname, -1 , typelem FROM pg_type WHERE oid = 23
QUERY: SELECT typname, -1 , typelem FROM pg_type WHERE oid = 1043
>>> result = cursor.fetchmany(5)
QUERY: FETCH 4 FROM "PgSQL_0811F1EC"
>>> result
[[None, 'A'], [None, 'B'], [None, 'C'], [None, 'F'], [None, 'F']]
>>> con.commit()
QUERY: CLOSE PgSQL_0811F1EC
QUERY: COMMIT WORK
>>>
#v-

This gives me a warning like this:

#v+
WARNING:  PerformPortalClose: portal "pgsql_0811f1ec" not found
#v-

As far as I can see, the SQL pyPgSQL emits is perfectly ok. But I'd be
glad to hear a clarification.

> Strangely, if I remove the last two lines (cursor.close() and
> conn.commit()) I don't get the errors.
>
> Also, I notice that I don't have this problem with the psql command
> line either.  Is this the Python API causing this?

If you use the same SQL statements using portals in psql, you get the
same warning (obviously). I just tried.
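
For the record, a minimal reproduction (the cursor name is made up):

#v+
BEGIN;
DECLARE "PgSQL_Demo" CURSOR FOR SELECT now();
FETCH 1 FROM "PgSQL_Demo";
CLOSE PgSQL_Demo;   -- unquoted, so the name folds to pgsql_demo
-- WARNING:  PerformPortalClose: portal "pgsql_demo" not found
COMMIT;
#v-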

Gerhard (pyPgSQL developer)
--
Favourite database:             http://www.postgresql.org/
Favourite programming language: http://www.python.org/
Combine the two:                http://pypgsql.sf.net/
Embedded database for Python:   http://pysqlite.sf.net/

Re: PerformPortalClose warning in 7.3

From: Tom Lane
Gerhard Haering <haering_postgresql@gmx.de> writes:
> To show this, we use the undocumented, but very handy toggleShowQuery
> flag. The effect is that we can see what SQL pyPgSQL sends to the
> backend using libpq (the lines starting with QUERY: below):


> QUERY: DECLARE "PgSQL_0811F1EC" CURSOR FOR select * from test
> ...
> QUERY: CLOSE PgSQL_0811F1EC

This looks like a pyPgSQL bug to me.  If it's going to use a mixed-case
name for the cursor then it must either always double-quote the name or
never do so.  Failing to double-quote in the CLOSE command is wrong.
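
That is, it has to spell the name the same way at both ends, e.g. (inside a
transaction block):

    DECLARE "PgSQL_0811F1EC" CURSOR FOR SELECT * FROM test;
    CLOSE "PgSQL_0811F1EC";        -- quoted in both places

or

    DECLARE PgSQL_0811F1EC CURSOR FOR SELECT * FROM test;
    CLOSE PgSQL_0811F1EC;          -- unquoted in both places; both fold to pgsql_0811f1ec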

            regards, tom lane