Thread: quick question abt pg_dump and restore

quick question abt pg_dump and restore

From
"Josh Harrison"
Date:
Hi,
When restoring the pg_dumped data thro psql does the rows of the table are restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore it to another database, will it have the rows in the same order r1,r2,r3,r4,r5? Does this apply to big tables also?
Thanks
josh

Re: quick question abt pg_dump and restore

From
Raymond O'Donnell
Date:
On 09/01/2008 14:02, Josh Harrison wrote:

> When restoring the pg_dumped data thro psql does the rows of the table
> are restored in the same order? ie for example if
> Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
> restore it to another database, will it have the rows in the same order
> r1,r2,r3,r4,r5? Does this apply to big tables also?

If you use the text dump format, you can see the order in which the rows
are restored.

Ray.

---------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
---------------------------------------------------------------

Re: quick question abt pg_dump and restore

From
"A. Kretschmer"
Date:
am  Wed, dem 09.01.2008, um  9:02:23 -0500 mailte Josh Harrison folgendes:
> Hi,
> When restoring the pg_dumped data thro psql does the rows of the table are
> restored in the same order? ie for example if
> Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore it
> to another database, will it have the rows in the same order r1,r2,r3,r4,r5?

No. If you need an order than you need an ORDER BY in the
SELECT-Statement.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: quick question abt pg_dump and restore

From
"A. Kretschmer"
Date:
am  Wed, dem 09.01.2008, um 14:07:13 +0000 mailte Raymond O'Donnell folgendes:
> On 09/01/2008 14:02, Josh Harrison wrote:
>
> >When restoring the pg_dumped data thro psql does the rows of the table
> >are restored in the same order? ie for example if
> >Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
> >restore it to another database, will it have the rows in the same order
> >r1,r2,r3,r4,r5? Does this apply to big tables also?
>
> If you use the text dump format, you can see the order in which the rows
> are restored.

Right, but within the table the rows hasn't an order. You can see this
when you select the ctid-column:

test=# create table order_test (i int);
CREATE TABLE
test=*# insert into order_test values (1);
INSERT 0 1
test=*# insert into order_test values (2);
INSERT 0 1
test=*# insert into order_test values (3);
INSERT 0 1
test=*# select ctid, i from order_test ;
 ctid  | i
-------+---
 (0,1) | 1
 (0,2) | 2
 (0,3) | 3
(3 rows)

test=*# update order_test set i=20 where i=2;
UPDATE 1
test=*# update order_test set i=2 where i=20;
UPDATE 1
test=*# select ctid, i from order_test ;
 ctid  | i
-------+---
 (0,1) | 1
 (0,3) | 3
 (0,5) | 2
(3 rows)



Now a pg_dump:

ALTER TABLE public.order_test OWNER TO webmaster;

--
-- Data for Name: order_test; Type: TABLE DATA; Schema: public; Owner:
webmaster
--

COPY order_test (i) FROM stdin;
1
3
2
\.


Now the question: what is the correct order?


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: quick question abt pg_dump and restore

From
"Josh Harrison"
Date:


On Jan 9, 2008 9:12 AM, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:
am  Wed, dem 09.01.2008, um  9:02:23 -0500 mailte Josh Harrison folgendes:
> Hi,
> When restoring the pg_dumped data thro psql does the rows of the table are
> restored in the same order? ie for example if
> Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore it
> to another database, will it have the rows in the same order r1,r2,r3,r4,r5?

No. If you need an order than you need an ORDER BY in the
SELECT-Statement.

Fine. I can use order by when I want  to order it in terms of some columns. But What if I want to maintain the same order as in the database1? ie., I want my rows of TableABC in Database2 to be the same order as the rows in TableABC in Database 1 ???

Thanks
josh

Re: quick question abt pg_dump and restore

From
Alvaro Herrera
Date:
Josh Harrison escribió:

> Fine. I can use order by when I want  to order it in terms of some columns.
> But What if I want to maintain the same order as in the database1? ie., I
> want my rows of TableABC in Database2 to be the same order as the rows in
> TableABC in Database 1 ???

You can't.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: quick question abt pg_dump and restore

From
"Josh Harrison"
Date:


On Jan 9, 2008 9:35 AM, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:
am  Wed, dem 09.01.2008, um 14:07:13 +0000 mailte Raymond O'Donnell folgendes:
> On 09/01/2008 14:02, Josh Harrison wrote:
>
> >When restoring the pg_dumped data thro psql does the rows of the table
> >are restored in the same order? ie for example if
> >Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
> >restore it to another database, will it have the rows in the same order
> >r1,r2,r3,r4,r5? Does this apply to big tables also?
>
> If you use the text dump format, you can see the order in which the rows
> are restored.

Right, but within the table the rows hasn't an order. You can see this
when you select the ctid-column:

test=# create table order_test (i int);
CREATE TABLE
test=*# insert into order_test values (1);
INSERT 0 1
test=*# insert into order_test values (2);
INSERT 0 1
test=*# insert into order_test values (3);
INSERT 0 1
test=*# select ctid, i from order_test ;
 ctid  | i
-------+---
 (0,1) | 1
 (0,2) | 2
 (0,3) | 3
(3 rows)

test=*# update order_test set i=20 where i=2;
UPDATE 1
test=*# update order_test set i=2 where i=20;
UPDATE 1
test=*# select ctid, i from order_test ;
 ctid  | i
-------+---
 (0,1) | 1
 (0,3) | 3
 (0,5) | 2
(3 rows)



Now a pg_dump:

ALTER TABLE public.order_test OWNER TO webmaster;

--
-- Data for Name: order_test; Type: TABLE DATA; Schema: public; Owner:
webmaster
--

COPY order_test (i) FROM stdin;
1
3
2
\.


Now the question: what is the correct order?

All my requirement is that the dumped table in database2 should be in the same order as the original table(at the time of dump) in database1 .
Thanks
josh

Re: quick question abt pg_dump and restore

From
"A. Kretschmer"
Date:
am  Wed, dem 09.01.2008, um  9:45:11 -0500 mailte Josh Harrison folgendes:
> What if I want to maintain the same order as in the database1? ie., I want my
> rows of TableABC in Database2 to be the same order as the rows in TableABC in
> Database 1 ???

For what reason?

Again: there is no order within the database.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: quick question abt pg_dump and restore

From
"Josh Harrison"
Date:


On Jan 9, 2008 9:59 AM, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:
am  Wed, dem 09.01.2008, um  9:45:11 -0500 mailte Josh Harrison folgendes:
> What if I want to maintain the same order as in the database1? ie., I want my
> rows of TableABC in Database2 to be the same order as the rows in TableABC in
> Database 1 ???

For what reason?

Again: there is no order within the database.

Thanks...
Another quick question...When you issue a query like this
select * from dummy limit 10
What 10 rows are fetched? like first 10  or last 10 or  the first 10 from first block or.... ?
And this query always returns the same 10 rows (when there are no updates/deletes)


josh

Re: quick question abt pg_dump and restore

From
Alvaro Herrera
Date:
Josh Harrison escribió:

> Another quick question...When you issue a query like this
> select * from dummy limit 10
> What 10 rows are fetched? like first 10  or last 10 or  the first 10 from
> first block or.... ?

Any 10.  (First 10 in the physical table _if_ a seqscan is used).

> And this query always returns the same 10 rows (when there are no
> updates/deletes)

Yes, assuming there is no VACUUM or CLUSTER either.  And no inserts
either if there is free space on the first pages.

You really shouldn't trust it.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: quick question abt pg_dump and restore

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Josh Harrison escribi�:
>> Fine. I can use order by when I want  to order it in terms of some columns.
>> But What if I want to maintain the same order as in the database1? ie., I
>> want my rows of TableABC in Database2 to be the same order as the rows in
>> TableABC in Database 1 ???

> You can't.

According to the SQL standard, a table is an *unordered* collection of
rows, and the results of any query are produced in an unspecified order
(unless you use ORDER BY).  The ambiguity about row ordering is
intentional and is exploited by most DBMSes including Postgres to
improve implementation efficiency.  If you assume there is such a thing
as a specific ordering within a table, you'll live to regret it
eventually.

            regards, tom lane

Re: quick question abt pg_dump and restore

From
"Josh Harrison"
Date:


On Jan 9, 2008 10:27 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Josh Harrison escribió:

> Another quick question...When you issue a query like this
> select * from dummy limit 10
> What 10 rows are fetched? like first 10  or last 10 or  the first 10 from
> first block or.... ?

Any 10.  (First 10 in the physical table _if_ a seqscan is used).

Okay. Here is another scenario where Im confused.
I have a a table with around  30,000,000 recs. This is not a production system but a test system. So in the test system generally we upload the rows in some order say rows corresponding to a particular  patient or something like that. But in the production system, it generally doesn't happen like that. The rows of 1 particular patient can be shuffled anywhere (ie., inserted in any order). We r trying to duplicate the same by shuffling te data in the table so that the rows are not in any order and also not stored in contiguous blocks

So now I have a table Dummy with 30,000,000 recs and a table Shuffled_Dummy (Create table Shuffled_Dummy as select * from Dummy order by random() ) with the same shuffled rows of dummy.

My questions
1.  I pg_dumped dummy and Shuffled_dummy (from database1) to another database(database2)
When I issued the query in both database (database1 and database2)

select * from dummy limit 1000 ( the planner chooses seq scan for this query)
 ----- the output results from dummy are different in the 2 databases
But
select * from shuffled_dummy limit 1000 (planner chooses seq scan)
 ----- the outputs from shuffled_dummy are same from both the database

Why?

2. Also when does the planner switch from choosing index scan to bitmap index scan? Is it dependent on the number of rows to be retrieved or the position of the relevant data in the blocks or something else?

Thanks
josh

Re: quick question abt pg_dump and restore

From
Ivan Sergio Borgonovo
Date:
On Wed, 09 Jan 2008 10:54:21 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Josh Harrison escribió:
> >> Fine. I can use order by when I want  to order it in terms of
> >> some columns. But What if I want to maintain the same order as
> >> in the database1? ie., I want my rows of TableABC in Database2
> >> to be the same order as the rows in TableABC in Database 1 ???
>
> > You can't.
>
> According to the SQL standard, a table is an *unordered* collection
> of rows, and the results of any query are produced in an
> unspecified order (unless you use ORDER BY).  The ambiguity about
> row ordering is intentional and is exploited by most DBMSes
> including Postgres to improve implementation efficiency.  If you
> assume there is such a thing as a specific ordering within a table,
> you'll live to regret it eventually.

Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?

And yeah I know that once you start deleting/updating row you may
lose the advantage you gained betting on some peculiarity of the
implementation... but in case you're dealing with a mostly static
table?

eg. if I'm importing a table does it make any sense to pre-sort it
before importing it in postgres?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: quick question abt pg_dump and restore

From
Andrew Sullivan
Date:
On Wed, Jan 09, 2008 at 05:28:15PM +0100, Ivan Sergio Borgonovo wrote:
> Does it make any sense *knowing* how the implementation works to load
> records in a table in a specific order to improve performances?

Well, this is more or less what CLUSTER does.  There are some cases where
happening to know about the order the table is in will yield happy effects,
yes.

A


Re: quick question abt pg_dump and restore

From
"Josh Harrison"
Date:


On Jan 9, 2008 11:28 AM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Wed, 09 Jan 2008 10:54:21 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alvaro Herrera < alvherre@commandprompt.com> writes:
> > Josh Harrison escribió:
> >> Fine. I can use order by when I want  to order it in terms of
> >> some columns. But What if I want to maintain the same order as
> >> in the database1? ie., I want my rows of TableABC in Database2
> >> to be the same order as the rows in TableABC in Database 1 ???
>
> > You can't.
>
> According to the SQL standard, a table is an *unordered* collection
> of rows, and the results of any query are produced in an
> unspecified order (unless you use ORDER BY).  The ambiguity about
> row ordering is intentional and is exploited by most DBMSes
> including Postgres to improve implementation efficiency.  If you
> assume there is such a thing as a specific ordering within a table,
> you'll live to regret it eventually.

Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?

And yeah I know that once you start deleting/updating row you may
lose the advantage you gained betting on some peculiarity of the
implementation... but in case you're dealing with a mostly static
table?

eg. if I'm importing a table does it make any sense to pre-sort it
before importing it in postgres?
 
Okay. Let me explain this again
Lets say you load the data related to a particular person sequentially into a table in the test  database. This results in all of the data for any one person being located one or a very few sequential data blocks.  Testing access to the person's data then reveals access to be very fast.  However, the data is never loaded in that way in the production database.  It is almost always spread out across many data blocks within the database, roughly organized by the date-time in which the data arrived.In this case access to a particular person's data is not as fast as compared to the previous one where the data are located close to each other.

we have this problem when we compare Oracle's performance with postgres since Oracle has index-only scan where it can access the data just from the index when the query involves only indexed columns. But since postgres currently doesn't implement index-only scan and goes to the heap for fetching the tuples it becomes very slow when the data are shuffled

Let me know if it makes sense now
Thanks
josh

Re: quick question abt pg_dump and restore

From
"Josh Harrison"
Date:


On Jan 9, 2008 11:39 AM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Wed, Jan 09, 2008 at 05:28:15PM +0100, Ivan Sergio Borgonovo wrote:
> Does it make any sense *knowing* how the implementation works to load
> records in a table in a specific order to improve performances?

Well, this is more or less what CLUSTER does.  There are some cases where
happening to know about the order the table is in will yield happy effects,
yes.

You are right. Sometimes when i cluster the table according to the frequently accessed indexes then it makes queries pretty fast. But its not a feasible solution always since some tables have more indexes which are accessed frequently. So clustering the table according to one index will yield poor performance to queries involving other indexes. Index-only scan is a good solution for this I guess for queries involving indexed columns (like in oracle) !!!

josh

Re: quick question abt pg_dump and restore

From
Andreas Kretschmer
Date:
Josh Harrison <joshques@gmail.com> schrieb:
> My questions
> 1.  I pg_dumped dummy and Shuffled_dummy (from database1) to another database
> (database2)
> When I issued the query in both database (database1 and database2)
>
> select * from dummy limit 1000 ( the planner chooses seq scan for this query)
> select * from shuffled_dummy limit 1000 (planner chooses seq scan)
>
>
> 2. Also when does the planner switch from choosing index scan to bitmap index
> scan? Is it dependent on the number of rows to be retrieved or the position of
> the relevant data in the blocks or something else?

For a select * ... without a WHERE the db can't use an index, this query
forced a seq-scan.

A index is used when:
- a index are created
- a propper WHERE or ORDER BY in the query
- this index is useful
  (a index isn't useful, for instance, for a small table or when almost
  all rows are in the result set)

A bitmap index scan performed when:
- 2 or more propper indexes available
- see above



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: quick question abt pg_dump and restore

From
Andrew Sullivan
Date:
On Wed, Jan 09, 2008 at 11:51:16AM -0500, Josh Harrison wrote:

> accessed frequently. So clustering the table according to one index will
> yield poor performance to queries involving other indexes.

Maybe not poor, but certainly not optimised.

> Index-only scan is a good solution for this I guess for queries involving
> indexed columns (like in oracle) !!!

I think I don't know what you mean by "index-only scan".  Oracle can't be
returning you data just by scanning an index, can it?  It needs to get the
tuple if you need it back.

My bet is that you haven't tuned your vacuums correctly, or you aren't doing
ANALYSE often enough on the affected tables, or you need to SET STATISTICS
higher on some of the columns in order to get better estimates (and
therefore better plans).

A

Re: quick question abt pg_dump and restore

From
"Josh Harrison"
Date:


On Jan 9, 2008 11:56 AM, Andreas Kretschmer <akretschmer@spamfence.net> wrote:
Josh Harrison <joshques@gmail.com> schrieb:
> My questions
> 1.  I pg_dumped dummy and Shuffled_dummy (from database1) to another database
> (database2)
> When I issued the query in both database (database1 and database2)
>
> select * from dummy limit 1000 ( the planner chooses seq scan for this query)
> select * from shuffled_dummy limit 1000 (planner chooses seq scan)
>
>
> 2. Also when does the planner switch from choosing index scan to bitmap index
> scan? Is it dependent on the number of rows to be retrieved or the position of
> the relevant data in the blocks or something else?

For a select * ... without a WHERE the db can't use an index, this query
forced a seq-scan.

A index is used when:
- a index are created
- a propper WHERE or ORDER BY in the query
- this index is useful
 (a index isn't useful, for instance, for a small table or when almost
 all rows are in the result set)

A bitmap index scan performed when:
- 2 or more propper indexes available
- see above

Thanks
In my database, I have a table 'person' containing roughly 30,000,000 records

explain select count(*) from person where person_id >  1147000000
QUERY PLAN                                                                                                  
 ------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=307708.20..307708.21 rows=1 width=0)                                                       
   ->  Index Scan using person_pk on person  (cost=0.00..307379.79 rows=131364 width=0)
         Index Cond: (person_id > 1147000000::numeric)                                            

 3 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]
This returns the result
count   
 --------
 78718 

But for this query where the condition is slightly different the query plan is different. The planner goes for bitmap index

explain select count(*) from person where person_id >  1146000000
QUERY PLAN                                                                                     
 -----------------------------------------------------------------------------------------------
 Aggregate  (cost=342178.51..342178.52 rows=1 width=0)                                          
   ->  Bitmap Heap Scan on person  (cost= 3120.72..341806.71 rows=148721 width=0)     
         Recheck Cond: (person_id > 1146000000::numeric)                             
         ->  Bitmap Index Scan on person_pk  (cost=0.00..3083.53 rows=148721 width=0)
               Index Cond: (person_id > 1146000000::numeric)                         

 5 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]
and the result is
count   
 --------
 90625 

How does the planner choose the plan?
josh

Re: quick question abt pg_dump and restore

From
"Josh Harrison"
Date:


On Jan 9, 2008 12:11 PM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Wed, Jan 09, 2008 at 11:51:16AM -0500, Josh Harrison wrote:

> accessed frequently. So clustering the table according to one index will
> yield poor performance to queries involving other indexes.

Maybe not poor, but certainly not optimised.

> Index-only scan is a good solution for this I guess for queries involving
> indexed columns (like in oracle) !!!

I think I don't know what you mean by "index-only scan".  Oracle can't be
returning you data just by scanning an index, can it?  It needs to get the
tuple if you need it back.
 
For example if I have a query like
select  column2 from ABC where column1 > 20
and table ABC is indexed on (column1,column2) then Oracle will not goto the heap to fetch the tuples. It will return them from the index itself since the column information is available in the index. But postgres always goes to the heap even if it has all the queried columns in the index.
For queries that involves all the columns(like select * from ABC where column1>20) ... of course oracle goes to the heap to fetch them

My bet is that you haven't tuned your vacuums correctly, or you aren't doing
ANALYSE often enough on the affected tables, or you need to SET STATISTICS
higher on some of the columns in order to get better estimates (and
therefore better plans).

I had vacuumed and analysed the tables ...tried increasing the statistics too. But the performance compared to oracle for these types of queries(that i had mentioned above) is pretty slow

josh

Re: quick question abt pg_dump and restore

From
Andrew Sullivan
Date:
On Wed, Jan 09, 2008 at 12:24:37PM -0500, Josh Harrison wrote:
> For example if I have a query like
> select  column2 from ABC where column1 > 20
> and table ABC is indexed on (column1,column2) then Oracle will not goto the
> heap to fetch the tuples. It will return them from the index itself since
> the column information is available in the index.

Ah, I didn't know that.  If your description is right, there would seem to
be some pretty severe I/O penalties when using an index that doesn't cover
all the fields you want.  I mean, I can see the gain, but it would seem
that's a significant cost.  Anyway, no point in speculating about the
implementation details of a system I don't know in detail.

> I had vacuumed and analysed the tables ...tried increasing the statistics
> too. But the performance compared to oracle for these types of queries(that
> i had mentioned above) is pretty slow

I see.  Without rather more information, we're not going to be able to help
you.  What you could do is take some detailed examples to the -performance
list.  Oracle is terrifically capable, but Postgres can usually hold its own
when correctly tuned (there are some cases where it can't, though).

A

Re: quick question abt pg_dump and restore

From
Gregory Stark
Date:
"Josh Harrison" <joshques@gmail.com> writes:

>  Aggregate  (cost=342178.51..342178.52 rows=1 width=0)
>    ->  Bitmap Heap Scan on person  (cost=3120.72..341806.71 rows=148721 width=0)
>          Recheck Cond: (person_id > 1146000000::numeric)
>          ->  Bitmap Index Scan on person_pk  (cost=0.00..3083.53 rows=148721 width=0)
>                Index Cond: (person_id > 1146000000::numeric)
...
> How does the planner choose the plan?

The short answer is that it looks at the "cost" for all the reasonable plans
it can come up with and picks the plan with the lowest cost.

Scans which return many rows will tend to prefer bitmap index scans (when they
don't do a full sequential scan of the t table) since it avoids random access.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!