Re: pg_upgrade --jobs - Mailing list pgsql-general

From senor
Subject Re: pg_upgrade --jobs
Date
Msg-id BYAPR01MB3701DCD876A3BB05640A8092F7530@BYAPR01MB3701.prod.exchangelabs.com
In response to Re: pg_upgrade --jobs  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: pg_upgrade --jobs  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Thank you Adrian. I'm not sure I can provide as much as you'd need for a definite answer, but I'll give you what I have.

The original scheduled downtime for one installation was 24 hours. At the 21-hour mark the schema-only pg_dump still had not completed, so the system was returned to operation.
The amount of data per table varies widely. Some daily tables are 100-200 GB, and the thousands of reports tables with stats are much smaller; I'm not connected to check right now, but I'd guess 1 GB max. We chose the --link option partly because some servers do not have the disk space to copy, and the time needed to copy 1-2 TB was also going to be an issue.
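
(For reference, the kind of invocation under discussion looks roughly like the following; the paths and version numbers are placeholders only. Because --jobs in pg_upgrade parallelizes per database, and the file linking per tablespace, a single-database cluster still gets a single serial schema dump, which is the bottleneck described in the quoted messages below.)

    # Placeholder paths and versions, for illustration only:
    /usr/pgsql-11/bin/pg_upgrade \
        --old-bindir=/usr/pgsql-9.6/bin \
        --new-bindir=/usr/pgsql-11/bin \
        --old-datadir=/var/lib/pgsql/9.6/data \
        --new-datadir=/var/lib/pgsql/11/data \
        --link \
        --jobs=8 \
        --check    # run with --check first; drop it for the real upgrade
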
The vast majority of activity is current-day inserts and stats reports on that data. All previous days and existing reports are read-only.
As is all too common, the DB usage grew with no redesign, so it is a single database on a single machine with a single schema.
I get the impression there may be a way of taking the schema dump while still in service, but possibly not in this scenario. Plan B is to drop a lot of tables and deal with imports later.
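
(Sketch only, with a placeholder database name: a schema-only pg_dump takes just shared locks, so it can run while the system stays in service. That would only help a dump/restore migration path, though, since pg_upgrade performs its own internal schema dump rather than reusing an external file.)

    # Taken while live; ACCESS SHARE locks only ("reports_db" is a placeholder):
    pg_dump --schema-only --format=custom --file=schema_only.dump reports_db

    # Dump/restore path only -- load the schema into the new cluster later:
    pg_restore --schema-only --dbname=reports_db schema_only.dump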

I appreciate the help.

________________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, April 7, 2019 8:19 AM
To: senor; pgsql-general@lists.postgresql.org
Subject: Re: pg_upgrade --jobs

On 4/6/19 5:47 PM, senor wrote:
> Thanks Tom for the explanation. I assumed it was my ignorance of how the schema was handled that was making this
> look like a problem that had already been solved and I was missing something.
>
> I fully expected the "You're Doing It Wrong" part. That is out of my control but not beyond my influence.
>
> I suspect I know the answer to this but have to ask. Using a simplified example where there are 100K sets of 4
> tables, each representing the output of a single job, are there any shortcuts to upgrading that would circumvent
> exporting the entire schema? I'm sure a different DB design would be better, but that's not what I'm working with.

An answer is going to depend on more information:

1) What is the time frame for moving from one version to another?
Both the setup and the actual downtime.

2) There are 500,000+ tables, but what is the amount of data involved?

3) Are all the tables active?

4) How are the tables distributed across databases in the cluster and
schemas in each database?
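
For what it's worth, 2) through 4) can be answered roughly with a few catalog
queries along these lines (illustrative only; "yourdb" is a placeholder):

    # 2) Amount of data per database:
    psql -d postgres -c "SELECT datname, pg_size_pretty(pg_database_size(datname))
                         FROM pg_database ORDER BY pg_database_size(datname) DESC;"

    # 3) Tables with any write activity since the stats were last reset:
    psql -d yourdb -c "SELECT count(*) FROM pg_stat_user_tables
                       WHERE n_tup_ins + n_tup_upd + n_tup_del > 0;"

    # 4) Table counts per schema:
    psql -d yourdb -c "SELECT schemaname, count(*) FROM pg_tables
                       GROUP BY schemaname ORDER BY count(*) DESC;"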


>
> Thanks
>
> ________________________________________
> From: Ron <ronljohnsonjr@gmail.com>
> Sent: Saturday, April 6, 2019 4:57 PM
> To: pgsql-general@lists.postgresql.org
> Subject: Re: pg_upgrade --jobs
>
> On 4/6/19 6:50 PM, Tom Lane wrote:
>
> senor <frio_cervesa@hotmail.com> writes:
>
>
> [snip]
>
> The --link option to pg_upgrade would be so much more useful if it
> weren't still bound to serially dumping the schemas of half a million
> tables.
>
>
>
> To be perfectly blunt, if you've got a database with half a million
> tables, You're Doing It Wrong.
>
> Heavy (really heavy) partitioning?
>
> --
> Angular momentum makes the world go 'round.
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


