Thread: Show stopper ? (was: Re: "cruising" or "browsing" through tables using an index / ordering)

  Hi again,

   It seems unfortunate, but if I can't efficiently (i.e. no copying of
whole table, no blocking and using index)  browse the table, I might not
be able to "sell" postgres into the commercial environment. :-( RAIMA
Velocis might win :-(

  Please please help me solve this or make workarounds or anything. I
would *really* like to see PosgreSQL to be playing against the Big
(commercial) Boys !

  What the whole problem really reduces to, is to be able to get the next
/ previous value in an index. If I can do that, I win. No SELECT (and thus
no locking) and no huge copying !). It seems to me like something
primitive, so it should be easy. But it isn't obvious to me. Please help.

  Below is more description and questions.

On Thu, 22 Jan 1998, Dustin Sallings wrote:

> >    Well, first let the user look up the manufacturer's part, and then let
> > him "browse" from there one (look at the next / previous ones), where
> > "next" is defined as the one closest bigger than the current one (based on
> > a sort attribute(s) on which I have an index).
>
>     That's easy in a normal application type place where you keep state, if
> you're doing it as a web based application, it's a different story...  Assuming
> you're writing a standalone application, you just keep a cursor to what you're
> looking at, and move it back and forth as needed, otherwise, otherwise, you're
> going to have to do the whole query again AFAIK.

  a few questions :

   0. having a value of a field on which there is an index, how can I do :
       a) current_pointer = some_function("value_I_have");
       b) next_pointer    = some_other_function(current_pointer);
       c) one_tupple      = yet_another_function(next_pointer);
   If I can accomplish a,b,c, then I win and I don't have to do questions
1..5 below.

   1. I assume I have to say "declare cursor mycursor for SELECT
field1,field2 FROM mytable ORDER BY one_of_fields_on_which_I_have_index"
    Am I right ? Or can I do it some other way ?

   2. Will not the step (1.) *copy* the whole 40MB table ? (that would
trash the system if several people at once want to "browse" through the
table.)

   3. Will not the step (1.) read-lock *lock* the whole table ? (i.e.
while somebody is "browsing", I need other people to be able to update the
database).

   4. Will the step (1.) cause Postgres to use index for moving forth /
back ? Because if it doesn't, it might take ages to either declare the
cursor or to move forth / back.

   5. If the step (1.) will not cause Postgres to use index, what will ?

   6. How do I do it if I don't want to start "browsing" from the first
row ?

  Thanx,

          Jan

 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com


On Fri, 23 Jan 1998, Jan Vicherek wrote:

>   Hi again,
>
>    It seems unfortunate, but if I can't efficiently (i.e. no copying of
> whole table, no blocking and using index)  browse the table, I might not
> be able to "sell" postgres into the commercial environment. :-( RAIMA
> Velocis might win :-(
>
>   Please please help me solve this or make workarounds or anything. I
> would *really* like to see PosgreSQL to be playing against the Big
> (commercial) Boys !

    I'm curious, but can the "Big (commercial) Boys" do this?  If so,
can you please provide an example of which and how?  Most of us here have
access to an one or the other (me, Oracle) to use as a sample system...if
we can prove that it does work on another system, then we have something
to work with, but right now all I've seen is "I wish I could do this", and
several examples on how to accomplish it using PostgreSQL, but that's
it...

>
>   What the whole problem really reduces to, is to be able to get the next
> / previous value in an index. If I can do that, I win. No SELECT (and thus
> no locking) and no huge copying !). It seems to me like something
> primitive, so it should be easy. But it isn't obvious to me. Please help.

    If there is no SELECT, how do you get data out of an SQL database?
*raised eyebrow*

>    0. having a value of a field on which there is an index, how can I do :
>        a) current_pointer = some_function("value_I_have");
>        b) next_pointer    = some_other_function(current_pointer);
>        c) one_tupple      = yet_another_function(next_pointer);
>    If I can accomplish a,b,c, then I win and I don't have to do questions
>    1..5 below.

    Why not put a sequence field on the table so that you can do:

    select * from table where rowid = n;     -or-
    select * from table where rowid = n - 1; -or-
    select * from table where rowid = n + 1; -or-
    select * from table where rowid >= n and rowid <= n+x;

    And create the index on rowid?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


On Fri, 23 Jan 1998, The Hermit Hacker wrote:

>     I'm curious, but can the "Big (commercial) Boys" do this?  If so,
> can you please provide an example of which and how?

    Hmm, well, the one we are switching from does this ;-) (Informix 3.3
ALL-II C interface). It's not SQL, tho.

>  Most of us here have
> access to an one or the other (me, Oracle) to use as a sample system...if
> we can prove that it does work on another system, then we have something
> to work with

> all I've seen is ... and several examples on how to accomplish it using
> PostgreSQL, but that's it...

   Wait, have you seen here an example that accomplishes this which
wouldn't need the whole table copied and wouldn't lock the table against
updates ?

> >   What the whole problem really reduces to, is to be able to get the next
> > / previous value in an index. If I can do that, I win. No SELECT (and thus
> > no locking) and no huge copying !). It seems to me like something
> > primitive, so it should be easy. But it isn't obvious to me. Please help.
>
>     If there is no SELECT, how do you get data out of an SQL database?
> *raised eyebrow*

   Sorry, I meant no SELECT that couln't be done through index and
so wouldn't copy huge amounts of data into a result table. And wouldn't
lock the whole table.

> >    0. having a value of a field on which there is an index, how can I do :
> >        a) current_pointer = some_function("value_I_have");
> >        b) next_pointer    = some_other_function(current_pointer);
> >        c) one_tupple      = yet_another_function(next_pointer);
> >    If I can accomplish a,b,c, then I win and I don't have to do questions
> >    1..5 below.
>
>     Why not put a sequence field on the table so that you can do:
>     select * from table where rowid = n;     -or-
>     select * from table where rowid = n - 1; -or-
>     select * from table where rowid = n + 1; -or-
>     select * from table where rowid >= n and rowid <= n+x;
>
>     And create the index on rowid?

   Because I also need to be able to INSERT rows.  That would require
renumeration of half the table (remember, it's 40MB, 400,000 rows) every
time I do an INSERT.

  I *still* think that there *has to* be a way to find a value that is
immediatelly next to one I have. This seems like such a primitive
operation. Even the backend must be doing it on some operations, it would
seem.

 Maybe even in SQL. Maybe something like (I'm not an SQL expert) : "SELECT
IndexField from MyTable where InxdexField > 'my_current_value' and
IndexField < ("all IndexFields that are bigger than the IndexField
searched for")

  Important : I'm not looking for a "pure SQL" solution. I'm writing a C
emulation library, so if it can be achieved via a call to a C Postgres
function, it would be great.

   Thanx,

       Jan


 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com


On Fri, 23 Jan 1998, Jan Vicherek wrote:

> On Fri, 23 Jan 1998, The Hermit Hacker wrote:
>
> >     I'm curious, but can the "Big (commercial) Boys" do this?  If so,
> > can you please provide an example of which and how?
>
>     Hmm, well, the one we are switching from does this ;-) (Informix 3.3
> ALL-II C interface). It's not SQL, tho.

    Okay, well, ummm...now you are comparing apples to oranges
then...if you wanted a non-SQL engine to replace Informix, PostgreSQL
isn't what you are looking for :(

> > all I've seen is ... and several examples on how to accomplish it using
> > PostgreSQL, but that's it...
>
>    Wait, have you seen here an example that accomplishes this which
> wouldn't need the whole table copied and wouldn't lock the table against
> updates ?

    First off, *nothing* you are going to be able to accomplish in
*any* SQL engine is going to prevent locking the table against
updates...the code that Bruce put in this afternoon for v6.3 is going to
reduce the possibility of the lock causing a deadlock, but that is about
it...the lock will still be created.

> >     Why not put a sequence field on the table so that you can do:
> >     select * from table where rowid = n;     -or-
> >     select * from table where rowid = n - 1; -or-
> >     select * from table where rowid = n + 1; -or-
> >     select * from table where rowid >= n and rowid <= n+x;
> >
> >     And create the index on rowid?
>
>    Because I also need to be able to INSERT rows.  That would require
> renumeration of half the table (remember, it's 40MB, 400,000 rows) every
> time I do an INSERT.

    Okay, you are confusing INSERT then...INSERT in SQL just adds a
row, that's it...it doesn't perform any "sorting" on it...that's what the
ORDER BY command does...

    ...but, I now understand what *you* mean by INSERT...

>   I *still* think that there *has to* be a way to find a value that is
> immediatelly next to one I have. This seems like such a primitive
> operation. Even the backend must be doing it on some operations, it would
> seem.

    Not possible...INSERT into a table doesn't "merge" the record
between its lower/higher bounds...it just adds it to the very end of the
table.  And an index doesn't "sort" the data either...that is what the
ORDER BY clause is for...

>  Maybe even in SQL. Maybe something like (I'm not an SQL expert) : "SELECT
> IndexField from MyTable where InxdexField > 'my_current_value' and
> IndexField < ("all IndexFields that are bigger than the IndexField
> searched for")

    From your sample above, is your first SQL call going to pull out
all 40MB of data in one select statement, then your second 40MB minus the
first X records?

    What you want to do is:

begin
  declare mycursor cursor for select * from pg-user order by <somefield>;
  move $forward  in FOO;
  fetch $retrieve in FOO;
  close foo;
end;

    Basically, take your table, move to the $forward record in it,
grab the next $retrieve records and then release the table.

    Your first time through, $forward might just equal 0...but when
you run it the second time through, you pass it back a $forward value
equal to $forward + $retrieve, so that you start at where you ended the
first time.  This is how I deal with one of my projects, where I have to
do exactly what you are looking for...

    About the only part of the SELECT above is the ORDER BY, and alot
of that is as much a restriction on your hardware then anything...the
major performance boost in an ORDER BY is memory..keeping it off the hard
drive.

>   Important : I'm not looking for a "pure SQL" solution. I'm writing a C
> emulation library, so if it can be achieved via a call to a C Postgres
> function, it would be great.

    You'd be better off looking at something like GDBM (which, by the
way, also creates a lock against updates while another is reading the
database)...unless I'm missing something, you aren't looking at doing
anything that *requires* an SQL engine :(

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org



Attn PG gurus / coders : New approach for ORDER BY ? (was: Re: Show stopper ?)

From
Jan Vicherek
Date:
On Fri, 23 Jan 1998, The Hermit Hacker wrote:

> On Fri, 23 Jan 1998, Jan Vicherek wrote:
>
> > On Fri, 23 Jan 1998, The Hermit Hacker wrote:
> >
> > >     I'm curious, but can the "Big (commercial) Boys" do this?  If so,
> > > can you please provide an example of which and how?
> >
> >     Hmm, well, the one we are switching from does this ;-) (Informix 3.3
> > ALL-II C interface). It's not SQL, tho.
>
>     Okay, well, ummm...now you are comparing apples to oranges
> then...if you wanted a non-SQL engine to replace Informix, PostgreSQL
> isn't what you are looking for :(

   Hmm, I thought that PostgreSQL would have two interfaces :

  an SQL interface (doing all the optimizing, joining and the complicated
stuff) and
  a low(er)-level C/C++/whatever interface which would allow to use less
of the intelligence of PostgreSQL and more of it's retireval system,
including indexing.

> > > all I've seen is ... and several examples on how to accomplish it using
> > > PostgreSQL, but that's it...
> >
> >    Wait, have you seen here an example that accomplishes this which
> > wouldn't need the whole table copied and wouldn't lock the table against
> > updates ?
>
>     First off, *nothing* you are going to be able to accomplish in
> *any* SQL engine is going to prevent locking the table against
> updates...

   Sorry, I wasn't precise again : I mean I can't stop other people from
updating a table just because one guy is out there "browsing" through it.

> > >     Why not put a sequence field on the table so that you can do:
> > >     select * from table where rowid = n;     -or-
> > >     select * from table where rowid = n - 1; -or-
> > >     select * from table where rowid = n + 1; -or-
> > >     select * from table where rowid >= n and rowid <= n+x;
> > >
> > >     And create the index on rowid?
> >
> >    Because I also need to be able to INSERT rows.  That would require
> > renumeration of half the table (remember, it's 40MB, 400,000 rows) every
> > time I do an INSERT.
>
>     Okay, you are confusing INSERT then...INSERT in SQL just adds a
> row, that's it...it doesn't perform any "sorting" on it...that's what the
> ORDER BY command does...

   The INSERT causes all indices on that table to be updated.
 If I used a field in the table to serve as a pseudo-index (user-level
index, not system-level index), *then* that would require "manual"
renumeration of rowid field in half the table every time I do an INSERT.

>     ...but, I now understand what *you* mean by INSERT...

 good :)

> >   I *still* think that there *has to* be a way to find a value that is
> > immediatelly next to one I have. This seems like such a primitive
> > operation. Even the backend must be doing it on some operations, it would
> > seem.
>
>     Not possible...

   Remeber that I have an index on the field that I'm looking for the
"next" value.
   Is it not possible in such case ?
   Even by calling some lower-level C function to give me the next value ?

> INSERT into a table doesn't "merge" the record
> between its lower/higher bounds...it just adds it to the very end of the
> table.  And an index doesn't "sort" the data either...that is what the
> ORDER BY clause is for...

   I guess I wasn't precise enough. I was aware of these two facts when I
was writing the previous msgs.
   Index doesn't "sort". Index gets sorted. But pseudo-index doesn't even
get sorted, I would have to sort it "manually" after every INSERT I would
make.

> >  Maybe even in SQL. Maybe something like (I'm not an SQL expert) : "SELECT
> > IndexField from MyTable where InxdexField > 'my_current_value' and
> > IndexField < ("all IndexFields that are bigger than the IndexField
> > searched for")
>
>     From your sample above, is your first SQL call going to pull out
> all 40MB of data in one select statement, then your second 40MB minus the
> first X records?

   Hmm, I could have hoped that the optimizer would recognize what I'm
trying to do and would only pull out that one record, but I guess not.
Never mind this attempt then.

>     What you want to do is:
>
> begin
>   declare mycursor cursor for select * from pg-user order by <somefield>;
>   move $forward  in FOO;
>   fetch $retrieve in FOO;
>   close foo;
> end;

 (I guess you meant "close mycursor", not "close foo".)

  I've just tried this on half-the-size table, and it took 70 seconds :-(

meray=> begin;
BEGIN
meray=> declare mycursor cursor for select * from master order by mpartno;
SELECT
meray=> fetch forward 1 in mycursor;
close mycursor;
end;
mpartno            |mlistprc|mwarcomp|mcost|msupercd|mbasecd|mflag|mpackqty|mdescrip
-------------------+--------+--------+-----+--------+-------+-----+--------+----------------
F       0 AVZK     |     581|     418|  311|S       |       |    6|       0|SPARK PLUG
(1 row)

meray=> close mycursor;
CLOSE
meray=> end;
END
meray=> \d masterind

Table    = masterind
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| mpartno                          | (bp)char                         |    19 |
+----------------------------------+----------------------------------+-------+

  Since the ORDER BY field (mpartno) has an index on it, why wouldn't
Postgres just take that index and go with it one-by-one ?! Such approach
wouldn't take 70 seconds on idle P200+ 64MB, but would take 1/2 a second.

Is such approach pricipally/fundamentally wrong, or is this correct
approach, and postgres just needs some more code written for this thing to
work ? (I.e. would other things break if this was implemented ?)

 -- whereas single select (which uses index) takes half a second.
meray=>   select * from master where mpartno = 'F       0 AVZK     ';
mpartno            |mlistprc|mwarcomp|mcost|msupercd|mbasecd|mflag|mpackqty|mdescrip
-------------------+--------+--------+-----+--------+-------+-----+--------+----------------
F       0 AVZK     |     581|     418|  311|S       |       |    6|       0|SPARK PLUG
F       0 AVZK     |     581|     418|  311|S       |       |    6|       0|SPARK PLUG
(2 rows)
meray=>

  Now I want do see the row that is in the masterind after the first, in
this case it contains the same data.

  If we can get "this new approach" to work,

>     Basically, take your table, move to the $forward record in it,
> grab the next $retrieve records and then release the table.

   But this creates 70-second lock on the table every time I want to see
next record :-(

> >   Important : I'm not looking for a "pure SQL" solution. I'm writing a C
> > emulation library, so if it can be achieved via a call to a C Postgres
> > function, it would be great.
>
>     You'd be better off looking at something like GDBM (which, by the
> way, also creates a lock against updates while another is reading the
> database)...unless I'm missing something, you aren't looking at doing
> anything that *requires* an SQL engine :(

   This looks like a sad suggestion to me. I'll kick and scream just that
I wouldn't have to go with Velocis RAIMA's lower-level C interface. (I'm
just staring into its manual to figure out the sequence of C calls I would
have to make to accomplish this.)


   Thanx to all,

        Jan

 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com


>     You'd be better off looking at something like GDBM (which, by the
> way, also creates a lock against updates while another is reading the
> database)...unless I'm missing something, you aren't looking at doing
> anything that *requires* an SQL engine :(

I agree.  GDBM is a fine system for such uses.

--
Bruce Momjian
maillist@candle.pha.pa.us

>
> On Fri, 23 Jan 1998, The Hermit Hacker wrote:
>
> >     I'm curious, but can the "Big (commercial) Boys" do this?  If so,
> > can you please provide an example of which and how?
>
>     Hmm, well, the one we are switching from does this ;-) (Informix 3.3
> ALL-II C interface). It's not SQL, tho.

Ah, the Informix ISAM interface that used to sit underneath the Informix
SQL engine.

>   Important : I'm not looking for a "pure SQL" solution. I'm writing a C
> emulation library, so if it can be achieved via a call to a C Postgres
> function, it would be great.
>

You can put an index on the table, and embed a function inside the
engine to spin through the index, getting valid rows.  The code is
already there to spin through the index, so you could just hook on to
the index, get the 'tid' and use that to get data from the physical
table.  No locking, but you may get in trouble if other people are
reading the index.  Maybe a quick lock-unlock would do it for you.  Not
easy, because the engine really doesn't provide a user-friendly C
interface for this, but it could be done.

--
Bruce Momjian
maillist@candle.pha.pa.us

On Fri, 23 Jan 1998, Bruce Momjian wrote:

> >     You'd be better off looking at something like GDBM (which, by the
> > way, also creates a lock against updates while another is reading the
> > database)...unless I'm missing something, you aren't looking at doing
> > anything that *requires* an SQL engine :(
>
> I agree.  GDBM is a fine system for such uses.

   We would like to convert to something that has more perspective, like
PostgreSQL does. We are not converting to PostgreSQL just to get this one
job done, but to layout a good platform for the future ...

  Yes ! PostgreSQL is highly regarded !

       Jan

 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com



On Fri, 23 Jan 1998, Jan Vicherek wrote:
[big chop]

>    This looks like a sad suggestion to me. I'll kick and scream just that
> I wouldn't have to go with Velocis RAIMA's lower-level C interface. (I'm
> just staring into its manual to figure out the sequence of C calls I would
> have to make to accomplish this.)

If it is still the same as when it was called DBVista... then

    d_keyfind(xx,x,x)
    while(db_status == S_OK)
       {
       d_recread(x,x,x,x);
       d_keynext(ssss);
       }
(No error checking in that code...<g>)

------------ end Raima note ------------------------------------

BUT - the functionality you want could be a special case of the Postgres
backend and would benefit a very important real world scenario
(scrolling inquiries)

IF - declare cursor
  and select 'whole table'
  and 'order by a field which is a key'

Treat as special - dont handle ANY data until the FETCH statement.
  IE. Defer the IO until the fetch happens -  then access the row(s)
      via the index

This would give an enormous boost in effective performance with the common
application task of a scrolling inquiry against a large table.
Implementing it would be transparent to current application code and would
offer a competitive edge over other products without it.

Normally I'm not one to support 'special case' coding tricks, but the
public impact on visible performance probably justifies looking into this
a bit further.
I suspect the query processor can identify the situation. It would need to
remember that it was special and leave enough control info to allow the
'fetch' processing to respond to that special case.

Comments from the guru's who are up to their necks in the code ?

Regards
Ron O'Hara

>
>
>    Thanx to all,
>
>         Jan
>
>  -- Gospel of Jesus is the saving power of God for all who believe --
> Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
>     >>>    Free Software Union President  ...  www.fslu.org    <<<
> Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com
>
>


  Hey ! You sound like you know exactly what I mean ! ;-)

On Fri, 23 Jan 1998, Bruce Momjian wrote:

> >   Important : I'm not looking for a "pure SQL" solution. I'm writing a C
> > emulation library, so if it can be achieved via a call to a C Postgres
> > function, it would be great.
>
> You can put an index on the table, and embed a function inside the
> engine to spin through the index, getting valid rows.

  Aha, this implies that in the index there are valid and non-valid rows.
I guess those that are to be "valid" (no current transactions on that
row) and the non-valid "those that are subject to an update lock /
transaction".

>  The code is
> already there to spin through the index, so you could just hook on to
> the index, get the 'tid' and use that to get data from the physical
> table.

    This sounds like what I meant. Would you have any pointers (to source
files / functions I should read / understand / call ) ?


>  No locking, but you may get in trouble if other people are
> reading the index.

  I don't see what trouble I could get into. Any hints appreciated.


> Maybe a quick lock-unlock would do it for you.

   Do you mean lock-unlock on the table or on the index ?


> Not easy, because the engine really doesn't provide a user-friendly C
> interface for this, but it could be done.

   Can you point me to some of the user-nonfriendly files / functions I
should use then ?

      Thanx a bunch (I have to have the implementation by tomorrow 10am),

           Jan

 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com




   Here is another exact perfect descrtiption of what I mean :
 (you don't have to read it again, it's only repost)

        Jan

On Sat, 24 Jan 1998, Ron O'Hara wrote:

> On Fri, 23 Jan 1998, Jan Vicherek wrote:
> [big chop]
>
> >    This looks like a sad suggestion to me. I'll kick and scream just that
> > I wouldn't have to go with Velocis RAIMA's lower-level C interface. (I'm
> > just staring into its manual to figure out the sequence of C calls I would
> > have to make to accomplish this.)
>
> If it is still the same as when it was called DBVista... then
>
>     d_keyfind(xx,x,x)
>     while(db_status == S_OK)
>        {
>        d_recread(x,x,x,x);
>        d_keynext(ssss);
>        }
> (No error checking in that code...<g>)
>
> ------------ end Raima note ------------------------------------

  Yea, it seems like that would be still the same ... thanx for saving me
the time ! :)

 v--- Yes, yes yes, this is exactly what I'm trying to accompliesh   ---v
> BUT - the functionality you want could be a special case of the Postgres
> backend and would benefit a very important real world scenario
> (scrolling inquiries)
>
> IF - declare cursor
>   and select 'whole table'
>   and 'order by a field which is a key'
>
> Treat as special - dont handle ANY data until the FETCH statement.
>   IE. Defer the IO until the fetch happens -  then access the row(s)
>       via the index
>
> This would give an enormous boost in effective performance with the common
> application task of a scrolling inquiry against a large table.
> Implementing it would be transparent to current application code and would
> offer a competitive edge over other products without it.

    yes yes yes

> Normally I'm not one to support 'special case' coding tricks, but the
> public impact on visible performance probably justifies looking into this
> a bit further.
> I suspect the query processor can identify the situation. It would need to
> remember that it was special and leave enough control info to allow the
> 'fetch' processing to respond to that special case.
>
> Comments from the guru's who are up to their necks in the code ?

  ( I am eager ;-)


          Jan

 -- Gospel of Jesus is the saving power of God for all who believe --
Jan Vicherek ## To some, nothing is impossible. ##  www.ied.com/~honza
    >>>    Free Software Union President  ...  www.fslu.org    <<<
Interactive Electronic Design Inc.    -#-    PGP: finger honza@ied.com


Re: Attn PG gurus / coders : New approach for ORDER BY ?

From
Jan Vicherek
Date:

On Sat, 24 Jan 1998, Ron O'Hara wrote:

[ description of spercial-case deleted ]

> Comments from the guru's who are up to their necks in the code ?

    Hmm, I don't know how busy you developers are, so I don't know what
should I tell the customer tomorrow. Any proximity guesses ?

        Thanx,

             Jan


Re: Attn PG gurus / coders : New approach for ORDER BY ? (was: Re: Show stopper ?)

From
The Hermit Hacker
Date:
On Fri, 23 Jan 1998, Jan Vicherek wrote:

>   an SQL interface (doing all the optimizing, joining and the complicated
> stuff) and
>   a low(er)-level C/C++/whatever interface which would allow to use less
> of the intelligence of PostgreSQL and more of it's retireval system,
> including indexing.

    *scratch head*  indexing != sorting...indexing doesn't produce
sorted data...indexing provides a means to very quickly retrieve
*segments* of data from a large table of data...

> >     First off, *nothing* you are going to be able to accomplish in
> > *any* SQL engine is going to prevent locking the table against
> > updates...
>
>    Sorry, I wasn't precise again : I mean I can't stop other people from
> updating a table just because one guy is out there "browsing" through it.

    The new code that Bruce put in *should* provide for better
schedualing of the locks on a table.  Bruce can explain it alot better
then me, but the way I understand it, before, if someone had a READ lock
on the table and someone came in looking for a WRITE lock, they got held
out.  If, while the READ lock was still in effect, someone else came in to
READ the table, they got in, effectively pushing the WRITE lock further
down the queue.  With Bruce's patches, that WRITE lock will prevent
another READ from grabbing a "lock" until after the WRITE finishes.  As I
haven't had a chance to look at it yet, I'm curious as to how often
someone sees a deadlock with the new code.

    I'm running an radiusd accounting system on v6.2.1 that has one
table in it that is:

postgres@zeus> ls -lt rad*
-rw-------  1 postgres  wheel   2678784 Jan 23 23:01 radlog_uniq_id
-rw-------  1 postgres  wheel   1564672 Jan 23 23:01 radlog_stop
-rw-------  1 postgres  wheel   2031616 Jan 23 23:01 radlog_start
-rw-------  1 postgres  wheel   6406144 Jan 23 23:01 radlog
-rw-------  1 postgres  wheel   1572864 Jan 23 23:01 radlog_userid
-rw-------  1 postgres  wheel  79519744 Jan 23 22:51 radhist
-rw-------  1 postgres  wheel  16818176 Jan 23 22:51 radhist_userid
-rw-------  1 postgres  wheel  16793600 Jan 23 22:51 radhist_uniq_id
-rw-------  1 postgres  wheel  11927552 Jan 23 22:51 radhist_stop
-rw-------  1 postgres  wheel  11919360 Jan 23 22:51 radhist_start

    Those are just two tables in it.  The only problem that I've yet
to have with it, as far as "deadlocks" are concerned (deadlocks mean that
one lock is in place while a write lock is trying to happen) is during a
vacuum...and that's running on an old 486 with one hard drive and *just
upgraded to* 64Meg of RAM.

    Oh...and I'd say a good 75% of the activities on those above
tables is multiple simultaneous updates...and the only thing that disrupts
that is the vacuum process...

    Vadim...what is that humongous database you are running?  And what
are you running it on?


> > >   I *still* think that there *has to* be a way to find a value that is
> > > immediatelly next to one I have. This seems like such a primitive
> > > operation. Even the backend must be doing it on some operations, it would
> > > seem.
> >
> >     Not possible...
>
>    Remeber that I have an index on the field that I'm looking for the
> "next" value.
>    Is it not possible in such case ?

    No, as the index isn't "ordered"...there is no logical next/prev
record in an index

>    Even by calling some lower-level C function to give me the next value ?

    there are no lower-level C functions...for that, you'd be looking
at something like GDBM...

>    Index doesn't "sort". Index gets sorted. But pseudo-index doesn't even
> get sorted, I would have to sort it "manually" after every INSERT I would
> make.

    Index gets sorted by...?

>   I've just tried this on half-the-size table, and it took 70 seconds :-(
>
> meray=> begin;
> BEGIN
> meray=> declare mycursor cursor for select * from master order by mpartno;
> SELECT
> meray=> fetch forward 1 in mycursor;
> close mycursor;
> end;
> mpartno            |mlistprc|mwarcomp|mcost|msupercd|mbasecd|mflag|mpackqty|mdescrip
> -------------------+--------+--------+-----+--------+-------+-----+--------+----------------
> F       0 AVZK     |     581|     418|  311|S       |       |    6|       0|SPARK PLUG
> (1 row)
>
> meray=> close mycursor;
> CLOSE
> meray=> end;
> END
> meray=> \d masterind
>
> Table    = masterind
> +----------------------------------+----------------------------------+-------+
> |              Field               |              Type                | Length|
> +----------------------------------+----------------------------------+-------+
> | mpartno                          | (bp)char                         |    19 |
> +----------------------------------+----------------------------------+-------+
>
>   Since the ORDER BY field (mpartno) has an index on it, why wouldn't
> Postgres just take that index and go with it one-by-one ?! Such approach

    An index isn't sorted.  An index is only used with a WHERE clause
(Bruce, I'm not misrepresenting that, am I?).  A 'select * from table'
will only do a Sequential Scan through the table for the data it needs.

acctng=> select count(start) from radhist;
 count
------
563258
(1 row)

acctng=> explain select * from radhist;
NOTICE:QUERY PLAN:

Seq Scan on radhist  (cost=27022.38 size=537648 width=72)

EXPLAIN
acctng=> explain select * from radhist where userid = 'scrappy';
NOTICE:QUERY PLAN:

Index Scan on radhist  (cost=2.05 size=1 width=72)

EXPLAIN
acctng=>


    Damn, I wasn't thinking.  Okay...an index *is* sorted, in a sense.
You index on *one* field of a table (with one exception that I know of,
but I'm not getting into that here).  The index only makes use of that
if/when you do a WHERE clause on that field.

acctng=> explain select * from radhist where port = 1;
NOTICE:QUERY PLAN:

Seq Scan on radhist  (cost=27022.38 size=1 width=72)

EXPLAIN

    Notice above it does a Seq Scan?  I don't have an index on port,
since I'll never do a WHERE "search" on it.


> wouldn't take 70 seconds on idle P200+ 64MB, but would take 1/2 a second.

    How are you starting up postmaster?  Take a look at the man page
for postgres, which has an option:

       -S   Specifies the amount of memory to be used by internal
            sorts before using  disk  files  for  sorting.   This
            value  is specified in 1k bytes, and defaults to 512.

    By default, very little memory is "reserved" for sorting (ORDER
BY)...try increasing that and see if your performance improves any.  My
accounting server has it set for 10240 (10Meg), since it too had 64Meg of
RAM and nobody expect as an accounting/RDBMS server.  I could probably
increase it past that, not sure...

> Is such approach pricipally/fundamentally wrong, or is this correct
> approach, and postgres just needs some more code written for this thing to
> work ? (I.e. would other things break if this was implemented ?)

    See my above explanation...Bruce or Vadim could probably explain
it better, as they are more solid in the internals themselves, but you are
misunderstanding what an index is and what it is used for :(

>  -- whereas single select (which uses index) takes half a second.
> meray=>   select * from master where mpartno = 'F       0 AVZK     ';
> mpartno            |mlistprc|mwarcomp|mcost|msupercd|mbasecd|mflag|mpackqty|mdescrip
> -------------------+--------+--------+-----+--------+-------+-----+--------+----------------
> F       0 AVZK     |     581|     418|  311|S       |       |    6|       0|SPARK PLUG
> F       0 AVZK     |     581|     418|  311|S       |       |    6|       0|SPARK PLUG
> (2 rows)
> meray=>

    Assuming you have an index built on mpartno, then yes, this would
use the index because (and only because) of the WHERE clause.  Try using
'explain' at the beginning of your select statements...it will explain
*what* it is going to do, but not actually do it.

>   Now I want do see the row that is in the masterind after the first, in
> this case it contains the same data.
>
>   If we can get "this new approach" to work,

    You can't, unless there is some logical "greater/less then" for
mpartno, such that you could do something like (syntax will be wrong, its
the concept that i'm showing):

begin;
declare cursor mycursor for select where mpartno >= 'F       0 AVZK     '
ORDER by mpartno;
fetch forward 1 from mycursor;
close mycursor;
end;

That would speed up the search (should?) because now you are finding, in
the index, all mpartno's that come after the one your stipulate, and then
you are grabbing the first one off the stack...I don't guarantee it will
be faster, because it also depends on how much data is returned.

Just thought of it...after you load this table, you did do a 'vacuum
analyze' on it, right?  Just to get the stats up dated?

> >     Basically, take your table, move to the $forward record in it,
> > grab the next $retrieve records and then release the table.
>
>    But this creates 70-second lock on the table every time I want to see
> next record :-(

    Most of that 70-seconds, I am guess, was spent in the ORDER
by...try increasing -S to as much as you dare and see if that helps any...


Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


On Fri, 23 Jan 1998, Jan Vicherek wrote:

> > You can put an index on the table, and embed a function inside the
> > engine to spin through the index, getting valid rows.
>
>   Aha, this implies that in the index there are valid and non-valid rows.
> I guess those that are to be "valid" (no current transactions on that
> row) and the non-valid "those that are subject to an update lock /
> transaction".

    There is no record level locking available at this point in time,
only table level.  Bruce, just curious...this whole discussion...would it
be moot if we did have record level locking?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: [HACKERS] Re: Attn PG gurus / coders : New approach for ORDER BY ?

From
The Hermit Hacker
Date:
On Fri, 23 Jan 1998, Jan Vicherek wrote:

>
>
> On Sat, 24 Jan 1998, Ron O'Hara wrote:
>
> [ description of spercial-case deleted ]
>
> > Comments from the guru's who are up to their necks in the code ?
>
>     Hmm, I don't know how busy you developers are, so I don't know what
> should I tell the customer tomorrow. Any proximity guesses ?

    IMHO...very little chance of this happening in the next several
months...Bruce/Vadim might say differently, but somehow I doubt it :(
There are too many things currently on the go that are needed for 99% of
the users...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Keep in mind that the entire database gets locked for as long as
you've got it open in read/write mode, regardless of if you've read or
written.

On Fri, 23 January 1998, at 22:53:27, Bruce Momjian wrote:

> >     You'd be better off looking at something like GDBM (which, by the
> > way, also creates a lock against updates while another is reading the
> > database)...unless I'm missing something, you aren't looking at doing
> > anything that *requires* an SQL engine :(
>
> I agree.  GDBM is a fine system for such uses.