Thread: splitting a table?

splitting a table?

From: "Johnson, Shaunn"

Howdy:

Running PostgreSQL 7.2.1 on RedHat Linux 7.2.

I have a table with about 10 million records and
even though it's indexed on three key columns,
it's still pretty slow, I believe, because of the
size.

Is there a way to split the data so that I
can access all the records, but maintain
smaller tables?

Thanks!

-X

Re: splitting a table?

From: Richard Huxton

On Friday 20 Jun 2003 3:02 pm, Johnson, Shaunn wrote:
> Howdy:
>
> Running PostgreSQL 7.2.1 on RedHat Linux 7.2.
>
> I have a table with about 10 million records and
> even though it's indexed on three key columns,
> it's still pretty slow, I believe, because of the
> size.

Depends what you mean by "pretty slow" - can you provide us with an example
(EXPLAIN ANALYSE SELECT ...) and details of the schema?

> Is there a way to split the data so that I
> can access all the records, but maintain
> smaller tables?

You could split the table and build a view that UNIONs the individual tables,
but that's not very elegant and 10 million records isn't much.
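
For what it's worth, a rough sketch of that approach (table and column names
here are invented, not from your schema):

  -- the one big table, split into two smaller pieces
  CREATE TABLE big_part1 (id integer, val text);
  CREATE TABLE big_part2 (id integer, val text);

  -- a view that glues the pieces back together, so queries can
  -- still see all the records in one place
  CREATE VIEW big_all AS
      SELECT * FROM big_part1
      UNION ALL
      SELECT * FROM big_part2;

  SELECT count(*) FROM big_all WHERE id = 42;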

Things to check for (see the sketch after this list):
 - Have you run VACUUM FULL or VACUUM ANALYSE recently?
 - Are your "where" data-types the same as the index data-types?
 - See if the indexes are used in the EXPLAIN ANALYSE output.
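
For example (made-up names again; assume a table big_table with a char column
acct and a smallint column seq_no). A "Seq Scan" in the EXPLAIN output where
you expected an index scan is the thing to look for:

  -- reclaim dead space and refresh the planner's statistics
  VACUUM FULL ANALYZE big_table;

  -- check whether the index actually gets used
  EXPLAIN ANALYZE SELECT * FROM big_table WHERE acct = 'ABC123';

  -- on 7.2 a smallint column compared with a plain integer literal can
  -- miss the index; casting the literal avoids that
  EXPLAIN ANALYZE SELECT * FROM big_table WHERE seq_no = 42::smallint;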

--
  Richard Huxton

Re: splitting a table?

From: "Arjen van der Meijden"

> Richard Huxton wrote

> You could split the table and build a view that UNIONs the
> individual tables,
> but that's not very elegant and 10 million records isn't much.
I think that union solution would be quite a bit slower than simply
searching through the single large table.

> Things to check for:
>  - Have you run VACUUM FULL or VACUUM ANALYSE recently?
>  - Are your "where" data-types the same as the index data-types?
>  - See if the indexes are used in the EXPLAIN ANALYSE output.
And of course check whether the hard-disk parameters are tuned in the case of
an IDE disk, whether there is sufficient memory allocated for the shared
buffers, etc.

It may also be wise to have multiple indexes. If you, for instance, search
very often on a combination of values and also very often on a single value,
two indexes (the combined one and a separate single-column one) could improve
both queries. The combined index would be used in both cases if it were the
only index, but it isn't the most efficient index for the single-value
lookups, AFAIK.
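
A rough sketch of what that could look like (table and column names are
invented here for illustration):

  -- combined index for queries that filter on both columns
  CREATE INDEX orders_cust_prod_idx ON orders (customer, product);

  -- a separate, narrower index for the frequent single-column lookups;
  -- it is smaller than the combined index, so it is cheaper to scan
  CREATE INDEX orders_cust_idx ON orders (customer);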

Arjen

Re: splitting a table?

From: "Johnson, Shaunn"

--thanks for the reply.

--pretty slow i guess is relative to other
--applications.  there are other factors
--that i didn't put in (sorry).

* clients are using MS Access to grab
  the data

* the server's load average is pretty
  high (in our book, anything over 4 is
  high)

--i have been *told* by my boss that the
--query is using the indexed columns in
--her where clause.  i can only go by
--that - i will get a copy of her query
--and test using 'explain analyze'.

--i started the vacuum verbose about 20
--minutes ago - i will see if it is any
--better after that.

--haven't done a vacuumdb / analyze
--on the entire db lately because of
--the errors i get

[snip from log]

FATAL 2:  open of /var/lib/pgsql/data/pg_clog/0003 failed: No such file or
directory
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost

[/snip from log]

--thanks again!

-X

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]

On Friday 20 Jun 2003 3:02 pm, Johnson, Shaunn wrote:
> Howdy:
>
> Running PostgreSQL 7.2.1 on RedHat Linux 7.2.
>
> I have a table with about 10 million records and
> even though it's indexed on three key columns,
> it's still pretty slow, I believe, because of the
> size.

Depends what you mean by "pretty slow" - can you provide us with an example
(EXPLAIN ANALYSE SELECT ...) and details of the schema?

> Is there a way to split the data so that I
> can access all the records, but maintain
> smaller tables?

You could split the table and build a view that UNIONs the individual tables,
but that's not very elegant and 10 million records isn't much.

Things to check for:
 - Have you run VACUUM FULL or VACUUM ANALYSE recently?
 - Are your "where" data-types the same as the index data-types?
 - See if the indexes are used in the EXPLAIN ANALYSE output.

--
  Richard Huxton

Re: splitting a table?

From: "Johnson, Shaunn"

--howdy:

--let me be sure i understand this:

--how can i calculate to see if i have enough
--memory allocated for the shared buffers?
--for example, we have a gig of ram ... what
--should the ratio look like?

--the indexes i have for the table look like
--this:

[snip index]
test_db=> \d ref_dcg_pkey

   Index "ref_dcg_pkey"
  Column  |     Type
----------+---------------
 contract | character(12)
 mbr_num  | character(2)
 id       | smallint
unique btree (primary key)

[/snip index]

--are you saying that i should create
--a separate one as well for each column
--that i may query against (say, i search
--for contract more often than the combo
--i have above)?
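
--(if i understand you right, that would be
--something like this, with the existing
--ref_dcg_pkey left in place; the index name
--is just made up:)

CREATE INDEX ref_dcg_contract_idx ON ref_dcg (contract);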

-----Original Message-----
From: Arjen van der Meijden [mailto:acm@tweakers.net]

[snip some stuff]

> Things to check for:
>  - Have you run VACUUM FULL or VACUUM ANALYSE recently?
>  - Are your "where" data-types the same as the index data-types?
>  - See if the indexes are used in the EXPLAIN ANALYSE output.

And of course whether the hd-parameters are set in case of an ide disk,
whether there is sufficient memory allocated for the shared buffers,
etc.

It may also be wise to have multiple indexes. If you, for instance,
search very often on a combination of values and very often on a single
value two indexes (the combined one and the separate one) could improve
both queries. Although the combined index would be used in both cases if
that were the only index, it isn't the most efficient index to use in
that case, afaik.

Arjen

Re: splitting a table?

From: Tom Lane

"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --haven't done a vacuumdb / analyze
> --on the entire db lately because of
> --the errors i get

> FATAL 2:  open of /var/lib/pgsql/data/pg_clog/0003 failed: No such file or
> directory

You need to be dealing with that, not ignoring it.  An update to 7.2.4
might help.

            regards, tom lane

Re: splitting a table?

From: "Johnson, Shaunn"

--i know!  i'm having problems doing pg_dumps now
--... i'm getting 'corrupt memory' errors more and
--more when i dump the tables ... and there are the
--ones where it can't find the index, which aborts
--the dump.

--that and the fact that i don't have much
--disk space to work with ... (i know ... no
--excuse!)

--guilty.

-X

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, June 20, 2003 11:05 AM
To: Johnson, Shaunn
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] splitting a table?

"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --haven't done a vacuumdb / analyze
> --on the entire db lately because of
> --the errors i get

> FATAL 2:  open of /var/lib/pgsql/data/pg_clog/0003 failed: No such file or
> directory

You need to be dealing with that, not ignoring it.  An update to 7.2.4
might help.

                        regards, tom lane

Re: splitting a table?

From: "Johnson, Shaunn"

--howdy:

--this is what i was talking about when doing
--things like 'vacuum' and 'pg_dump'.

**vacuum error**
[snip]

test_db=> vacuum verbose analyze ref_dcg ;
NOTICE:  --Relation ref_dcg--
NOTICE:  Index ref_dcg_pkey: Pages 290047; Tuples 43600255: Deleted 1397955.
        CPU 45.71s/74.86u sec elapsed 7755.21 sec.
FATAL 1:  This connection has been terminated by the administrator.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
test_db=>

[/snip]

**pg_dump error**
[snip]

Backing up schema for debborah_dev_bh_claimsum_4q02 ...
pg_dump: query to obtain list of indexes failed: ERROR:  syscache lookup for index 4185182890 failed

[/snip]

--i'm looking for ways to move the data, save and
--repopulate the (new) database without losing
--much, if anything.  if i could get some clean
--dumps, that would be cool ...

--i even have to export to MS Access for some
--smaller tables just to clean up some of the
--mess.

--anyway ... thanks.

-X

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, June 20, 2003 11:05 AM
To: Johnson, Shaunn
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] splitting a table?

"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --haven't done a vacuumdb / analyze
> --on the entire db lately because of
> --the errors i get

> FATAL 2:  open of /var/lib/pgsql/data/pg_clog/0003 failed: No such file or
> directory

You need to be dealing with that, not ignoring it.  An update to 7.2.4
might help.

                        regards, tom lane

Re: splitting a table?

From: Richard Huxton

On Friday 20 Jun 2003 7:22 pm, Johnson, Shaunn wrote:
> --howdy:
>
> --this is what i was talking about when doing
> --things like 'vacuum' and 'pg_dump'.
[snip]
> Backing up schema for debborah_dev_bh_claimsum_4q02 ...
> pg_dump: query to obtain list of indexes failed: ERROR:  syscache lookup
> for index 4185182890 failed

As Tom said, if you're on 7.2.1, upgrade to 7.2.4 immediately. Stop PG and
take a backup of the entire data/base directory tree. Being paranoid, I'd set
it up in parallel on a separate machine or with a different PGDATA (and port
etc). Then see if pg_dump for 7.2.4 can cope. There were a number of bugfixes
between 7.2.1 and 7.2.4 - details in the release notes.

--
  Richard Huxton

Re: splitting a table?

From: Jeff Eckermann

Here's a link to a relevant thread that also shows how
to fix the problem, at least sufficiently to
successfully vacuum and/or dump the database.

This problem seems to be highly correlated with
versions in the 7.2.0 to 7.2.2 range, although AFAIK
no-one has ever traced it to a specific bug.  We had
the same problem with our 7.2.1 installation, but have
been running fine with 7.2.4 for some months now.

--- Richard Huxton <dev@archonet.com> wrote:
> On Friday 20 Jun 2003 7:22 pm, Johnson, Shaunn wrote:
> > --howdy:
> >
> > --this is what i was talking about when doing
> > --things like 'vacuum' and 'pg_dump'.
> [snip]
> > Backing up schema for debborah_dev_bh_claimsum_4q02 ...
> > pg_dump: query to obtain list of indexes failed: ERROR:  syscache lookup
> > for index 4185182890 failed
>
> As Tom said, if you're on 7.2.1, upgrade to 7.2.4 immediately. Stop PG and
> take a backup of the entire data/base directory tree. Being paranoid, I'd set
> it up in parallel on a separate machine or with a different PGDATA (and port
> etc). Then see if pg_dump for 7.2.4 can cope. There were a number of bugfixes
> between 7.2.1 and 7.2.4 - details in the release notes.
>
> --
>   Richard Huxton

