Thread: some points for FAQ

some points for FAQ

From

"Pavel Stehule"

Date:

09 October 2007, 09:28:33

4.1)
To SELECT a random row, use:   SELECT col   FROM tab   ORDER BY random()   LIMIT 1;

+ On bigger tables this solution is slow. Please, find smarter
solution on network.


4.6)

ILIKE is slow, specially on multibyte encodings. If is possible use
FULLTEXT. LIKE '%some%' is slow always .. thing about FULLTEXT.

4.11.2)

+ Alternatively (on PostgreSQL 8.2.0 and all later releases) you could
RETURNING clause for retrieving used SERIAL value, e.g.,

new_id = execute("SELECT INSERT INTO person(name) VALUES('Blaise
Pascal') RETURNING id");

4.19)

+ most of problems with invalid OIDs in cache are solved in PostgreSQL
8.3. Please remeber, so every replanning of SQL statements needs time.
Write your application, they can exist without cache invalidation.


I am sorry, I am not able create patch via my minimalistic english knowleage.

Regards
Pavel Stehule

p.s. can we create some general F.A.Q XML format and store FAQ there?

WIP Proposal:

<faq name = .....  language = >
<entry number="1.1.1"> <query>....</query> <ansver>    ...
we need some tags from html: <p><br><a><i><b><ul><li><table>

Re: some points for FAQ

From

Alvaro Herrera

Date:

09 October 2007, 11:02:47

Pavel Stehule escribió:

> p.s. can we create some general F.A.Q XML format and store FAQ there?
> 
> WIP Proposal:
> 
> <faq name = .....  language = >
> <entry number="1.1.1">
>   <query>....</query>
>   <ansver>
>      ...
> we need some tags from html: <p><br><a><i><b><ul><li><table>

There is a DocBook spec for FAQ lists.  Actually a friend of mine was
working on converting our FAQ into that kind of XML.

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"God is real, unless declared as int"

Re: some points for FAQ

From

"Pavel Stehule"

Date:

09 October 2007, 11:23:58

2007/10/9, Alvaro Herrera <alvherre@commandprompt.com>:
> Pavel Stehule escribió:
>
> > p.s. can we create some general F.A.Q XML format and store FAQ there?
> >
> > WIP Proposal:
> >
> > <faq name = .....  language = >
> > <entry number="1.1.1">
> >   <query>....</query>
> >   <ansver>
> >      ...
> > we need some tags from html: <p><br><a><i><b><ul><li><table>
>
> There is a DocBook spec for FAQ lists.  Actually a friend of mine was
> working on converting our FAQ into that kind of XML.
>

I'll look on it

Pavel

Re: some points for FAQ

From

Chris Browne

Date:

09 October 2007, 13:25:56

alvherre@commandprompt.com (Alvaro Herrera) writes:
> Pavel Stehule escribió:
>
>> p.s. can we create some general F.A.Q XML format and store FAQ there?
>> 
>> WIP Proposal:
>> 
>> <faq name = .....  language = >
>> <entry number="1.1.1">
>>   <query>....</query>
>>   <ansver>
>>      ...
>> we need some tags from html: <p><br><a><i><b><ul><li><table>
>
> There is a DocBook spec for FAQ lists.  Actually a friend of mine was
> working on converting our FAQ into that kind of XML.

Yup, the structure is known as a <qandaset>
 <http://www.docbook.org/tdg/en/html/qandaset.html>

There is an example of this in the Slony-I docs - the admin guide has
a FAQ defined using qandaset and its children.
-- 
"cbbrowne","@","acm.org"
http://www3.sympatico.ca/cbbrowne/faq.html
All extremists should be taken out and shot.

Re: some points for FAQ

From

Bruce Momjian

Date:

09 October 2007, 17:13:14

Pavel Stehule wrote:
> 4.1)
> 
>  To SELECT a random row, use:
>     SELECT col
>     FROM tab
>     ORDER BY random()
>     LIMIT 1;
> 
> + On bigger tables this solution is slow. Please, find smarter
> solution on network.
> 

Well, give me a better example that works.

> 4.6)
> 
> ILIKE is slow, specially on multibyte encodings. If is possible use
> FULLTEXT. LIKE '%some%' is slow always .. thing about FULLTEXT.

I added a mention of "full text indexing" for word searches.

> 4.11.2)
> 
> + Alternatively (on PostgreSQL 8.2.0 and all later releases) you could
> RETURNING clause for retrieving used SERIAL value, e.g.,
> 
> new_id = execute("SELECT INSERT INTO person(name) VALUES('Blaise
> Pascal') RETURNING id");

Agreed.  I have updated the text to suggest RETURNING be used and
reduced the other examples.  The web site should have the updated
content shortly but CVS will have FAQ.html as well soon.

> 4.19)
> 
> + most of problems with invalid OIDs in cache are solved in PostgreSQL
> 8.3. Please remeber, so every replanning of SQL statements needs time.
> Write your application, they can exist without cache invalidation.

Agreed.  Item removed.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: some points for FAQ

From

"Pavel Stehule"

Date:

09 October 2007, 17:51:37

2007/10/9, Bruce Momjian <bruce@momjian.us>:
> Pavel Stehule wrote:
> > 4.1)
> >
> >  To SELECT a random row, use:
> >     SELECT col
> >     FROM tab
> >     ORDER BY random()
> >     LIMIT 1;
> >
> > + On bigger tables this solution is slow. Please, find smarter
> > solution on network.
> >
>
> Well, give me a better example that works.

Better universal solution doesn't exist. Exists only unelegant
solutions - but mutch faster.

SELECT id, ...  FROM data WHERE id = ANY(ARRAY(                      SELECT (random()*:max_id)::int
   FROM generate_series(1,20))) LIMIT 1;

max_id is host variable ~ real max id + some

-- fast solution if id is PK of data

>
> > 4.19)
> >
> > + most of problems with invalid OIDs in cache are solved in PostgreSQL
> > 8.3. Please remeber, so every replanning of SQL statements needs time.
> > Write your application, they can exist without cache invalidation.
>
> Agreed.  Item removed.
>

Cache invalidation isn't 100% protection before this error message.
With specific using of EXECUTE statement, you can get this message
too. But all temp tables related problems are solved.

Regards
Pavel Stehule

Re: some points for FAQ

From

Bruce Momjian

Date:

09 October 2007, 17:54:17

Pavel Stehule wrote:
> 2007/10/9, Bruce Momjian <bruce@momjian.us>:
> > Pavel Stehule wrote:
> > > 4.1)
> > >
> > >  To SELECT a random row, use:
> > >     SELECT col
> > >     FROM tab
> > >     ORDER BY random()
> > >     LIMIT 1;
> > >
> > > + On bigger tables this solution is slow. Please, find smarter
> > > solution on network.
> > >
> >
> > Well, give me a better example that works.
> 
> Better universal solution doesn't exist. Exists only unelegant
> solutions - but mutch faster.
> 
> SELECT id, ...
>    FROM data
>   WHERE id = ANY(ARRAY(
>                        SELECT (random()*:max_id)::int
>                           FROM generate_series(1,20)))
>   LIMIT 1;
> 
> max_id is host variable ~ real max id + some
> 
> -- fast solution if id is PK of data

Right.  We really only want general solutions in the FAQ.


> > > 4.19)
> > >
> > > + most of problems with invalid OIDs in cache are solved in PostgreSQL
> > > 8.3. Please remeber, so every replanning of SQL statements needs time.
> > > Write your application, they can exist without cache invalidation.
> >
> > Agreed.  Item removed.
> >
> 
> Cache invalidation isn't 100% protection before this error message.
> With specific using of EXECUTE statement, you can get this message
> too. But all temp tables related problems are solved.

OK, let's see how many bug reports we get and we can always re-add it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: some points for FAQ

From

"Pavel Stehule"

Date:

09 October 2007, 17:59:30

2007/10/9, Bruce Momjian <bruce@momjian.us>:
> Pavel Stehule wrote:
> > 2007/10/9, Bruce Momjian <bruce@momjian.us>:
> > > Pavel Stehule wrote:
> > > > 4.1)
> > > >
> > > >  To SELECT a random row, use:
> > > >     SELECT col
> > > >     FROM tab
> > > >     ORDER BY random()
> > > >     LIMIT 1;
> > > >
> > > > + On bigger tables this solution is slow. Please, find smarter
> > > > solution on network.
> > > >
> > >
> > > Well, give me a better example that works.
> >
> > Better universal solution doesn't exist. Exists only unelegant
> > solutions - but mutch faster.
> >
> > SELECT id, ...
> >    FROM data
> >   WHERE id = ANY(ARRAY(
> >                        SELECT (random()*:max_id)::int
> >                           FROM generate_series(1,20)))
> >   LIMIT 1;
> >
> > max_id is host variable ~ real max id + some
> >
> > -- fast solution if id is PK of data
>
> Right.  We really only want general solutions in the FAQ.
>

ok. I accept it. Can be some note there? Not this strange select.


> > > > 4.19)
> > > >
> > > > + most of problems with invalid OIDs in cache are solved in PostgreSQL
> > > > 8.3. Please remeber, so every replanning of SQL statements needs time.
> > > > Write your application, they can exist without cache invalidation.
> > >
> > > Agreed.  Item removed.
> > >
> >
> > Cache invalidation isn't 100% protection before this error message.
> > With specific using of EXECUTE statement, you can get this message
> > too. But all temp tables related problems are solved.
>
> OK, let's see how many bug reports we get and we can always re-add it.
>

It's true :).  You have to try really wild things inside plpgsql procedures.

Pavel

Re: some points for FAQ

From

Bruce Momjian

Date:

09 October 2007, 18:07:26

Pavel Stehule wrote:
> > > Better universal solution doesn't exist. Exists only unelegant
> > > solutions - but mutch faster.
> > >
> > > SELECT id, ...
> > >    FROM data
> > >   WHERE id = ANY(ARRAY(
> > >                        SELECT (random()*:max_id)::int
> > >                           FROM generate_series(1,20)))
> > >   LIMIT 1;
> > >
> > > max_id is host variable ~ real max id + some
> > >
> > > -- fast solution if id is PK of data
> >
> > Right.  We really only want general solutions in the FAQ.
> >
> 
> ok. I accept it. Can be some note there? Not this strange select.

Well, with 8.3 having this be faster I am thinking we should wait to see
if the hacks are needed.

> > > Cache invalidation isn't 100% protection before this error message.
> > > With specific using of EXECUTE statement, you can get this message
> > > too. But all temp tables related problems are solved.
> >
> > OK, let's see how many bug reports we get and we can always re-add it.
> >
> 
> It's true :).  You have to try really wild things inside plpgsql procedures.

Good.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: some points for FAQ

From

"Pavel Stehule"

Date:

09 October 2007, 18:36:37

> > >
> >
> > ok. I accept it. Can be some note there? Not this strange select.
>
> Well, with 8.3 having this be faster I am thinking we should wait to see
> if the hacks are needed.
>

difference, on 10K lines (on small think table)

postgres=# select * from test where i = any(array(select
(random()*10000)::int from generate_series(1,20))) limit 1; i  |  v
-----+-----869 | 113
(1 row)

Time: 3,984 ms

postgres=# select * from test order by random() limit 1; i   |  v
------+-----3687 | 293
(1 row)

Time: 21,978 ms

8.2
postgres=# select * from test order by random() limit 1; i   |  v
------+-----4821 | 608
(1 row)

Time: 51,299 ms

postgres=# select * from test where i = any(array(select
(random()*10000)::int from generate_series(1,20))) limit 1; i  |  v
-----+-----762 | 254
(1 row)

Time: 4,530 ms

Results:

8.3  "fast solution' is 6x faster
8.2  'fast solution' is 11x faster  .. it's minimum.

Pavel

for me, it's one from typical beginers mistakes

Re: some points for FAQ

From

Bruce Momjian

Date:

09 October 2007, 18:49:16

Pavel Stehule wrote:
> > > >
> > >
> > > ok. I accept it. Can be some note there? Not this strange select.
> >
> > Well, with 8.3 having this be faster I am thinking we should wait to see
> > if the hacks are needed.
> >
> 
> difference, on 10K lines (on small think table)
> 
> postgres=# select * from test where i = any(array(select
> (random()*10000)::int from generate_series(1,20))) limit 1;
>   i  |  v
> -----+-----
>  869 | 113
> (1 row)
> 
> Time: 3,984 ms
> 
> postgres=# select * from test order by random() limit 1;
>   i   |  v
> ------+-----
>  3687 | 293
> (1 row)
> 
> Time: 21,978 ms
> 
> 8.2
> postgres=# select * from test order by random() limit 1;
>   i   |  v
> ------+-----
>  4821 | 608
> (1 row)
> 
> Time: 51,299 ms
> 
> postgres=# select * from test where i = any(array(select
> (random()*10000)::int from generate_series(1,20))) limit 1;
>   i  |  v
> -----+-----
>  762 | 254
> (1 row)
> 
> Time: 4,530 ms
> 
> Results:
> 
> 8.3  "fast solution' is 6x faster
> 8.2  'fast solution' is 11x faster  .. it's minimum.

OK, how do we even explain this idea in the FAQ.  It pulls 20 random
values from 1 to 10000?  That seems pretty hard to code to me.  Where do
you get the 10000 number from?  How do you know you will hit a match in
20 tries?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: some points for FAQ

From

"Pavel Stehule"

Date:

10 October 2007, 02:11:41

>
> OK, how do we even explain this idea in the FAQ.  It pulls 20 random
> values from 1 to 10000?  That seems pretty hard to code to me.  Where do
> you get the 10000 number from?  How do you know you will hit a match in
> 20 tries?
>

Number 10000 you have to store in application .. it's magic constant.
It similar our statistics. And sometimes you have to actualise it.
This is stochastic methods, so it's possible so it doesn't return any
value, and you have to repeat it. Using this method expect knowledge
about generating random numbers. This method is far to ideal, but on
databases with big traffic only this is usable.

Pavel

Re: some points for FAQ

From

Bruce Momjian

Date:

10 October 2007, 11:44:49

Pavel Stehule wrote:
> >
> > OK, how do we even explain this idea in the FAQ.  It pulls 20 random
> > values from 1 to 10000?  That seems pretty hard to code to me.  Where do
> > you get the 10000 number from?  How do you know you will hit a match in
> > 20 tries?
> >
> 
> Number 10000 you have to store in application .. it's magic constant.
> It similar our statistics. And sometimes you have to actualise it.
> This is stochastic methods, so it's possible so it doesn't return any
> value, and you have to repeat it. Using this method expect knowledge
> about generating random numbers. This method is far to ideal, but on
> databases with big traffic only this is usable.

OK, but this is clearly something I can't just throw into the FAQ and
expect people to figure it out, and going into major detail to explain
it in the FAQ isn't logical either.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +