| Subject: | [p4] Backup procedure for the journalfile? |
|---|---|
| From: | Jeff A. Bowles (ja...@piccoloeng.com) |
| Date: | 03/01/2002 09:29:54 AM |
| List: | com.perforce.perforce-user |
At 06:57 AM 3/1/2002 -0800, Russell C. Jackson wrote:
Don't backup the journal file, it is always open by Perforce. If you have to restore from your backup tape, you would only have your data up to the point of the checkpoint, so the current journal entries wouldn't do you any good anyway.
Not so fast there, son! ;-)
While you are correct that a Perforce installation that's recovered from the 5 AM checkpoint and the 5 AM depot/* tree, and then has the 5:04 PM journal applied to the db.* files, isn't usable (it has db.* entries made during the day and no depot/* files to deliver on what the metadata promised)...
Wait, that sounds a lot like several recent Presidential elections. But I digress...
... anyhow, those more recent journal entries ARE handy to have lying around in the face of a disaster. (Just don't use them for updating your restored production server's db.* files.) Here's my approach:

1. Make checkpoints frequently. At least once a week, and if you're doing loads of submissions and the like, perhaps more frequently. (I like "every night" if possible.)

2. Back up, using your reliable tar/dump/backup programs, your checkpoints, depot/* trees, and the journal as of the instant of the checkpoint.

   [Aside: Some of you might be on platforms that will put a lock on the journal file during your backup, causing Perforce writes to the journal to fail. That's undesirable, and you might want to either "copy/cp" the journal to another filename (if you can do so without hitting that problem) before the backup and not back up "journal" itself, or run "p4d -h" to see if there are other options available. "p4d -h" shows me, on version 2001.1, that there's a "p4d -jj" option to copy/truncate the journal without incurring the overhead of a checkpoint. I'd do that, but I'd try to never need more than ONE checkpoint and ONE journal to restore a system completely. (I've always been nervous about restoring from multiple backup sets. Get the order wrong or something, oh, it's messy.) So, back up checkpoints and depot/* trees and journals. "p4d -jj" might be very helpful, but I wouldn't overuse it instead of running "p4 admin checkpoint", because the act of making a checkpoint walks every db.* data structure, giving you an important safety check: no pathological corruption in the db.* trees. Such corruption doesn't happen in my experience unless you have hardware failing (or a very infrequent occurrence of a bug, but I cite that just to avoid having someone mention "but what about the time..."), but the check is helpful.]

   a. Consider copying your P4ROOT files (db.*, journal, depot/*) to a file server as an initial backup, and letting the IS folks back up the file server. That way, if your Perforce server machine explodes but the file server's okay, you don't have to bug IS to get your files back. (IS departments, well, take time to find those backup tapes. In fact, see the next point.)

   b. In fact, find OUT how long it takes IS to pull files off backup tapes.

      b1. Send the following e-mail to them, "I need to recover a copy of /u2/perforce from machine 'xyzzy' from this morning's backup, could you please pull it down and put it somewhere on machine 'plover' for me?" and look at the clock. Write down the time. Every time you get e-mail on the subject in the thread, write down the time. Record how long it takes to get the files back.

      b2. Do the same thing, but try to recover a file from the backup 9 days ago.

      b3. Do the same thing, but go for 70 days ago.

      If you find that it takes DAYS or WEEKS to get those files back from your IS department, you pretty much have no choice: back up to a fileserver, let them back that up. If you lose both the Perforce server and your backups on the fileserver, because the building fell over in an earthquake, then you're still relying on IS. But for most other cases, you can have your backups from the file server faster than IS can even READ the mail asking them to find a backup tape/disk.

3. On an unrelated machine, every so often, read in the backup (checkpoint+depot, not journal) and start a test server on that backup. Run "p4 verify //depot/..." to make sure that all depot/* revisions, referred to in the db.* metadata, are there. (You might need a spare license file for that backup server; the folks at Perforce might be able to help on that front.)

4. When the big day comes, and you are royally screwed if these backups aren't good (the dweeb in the GUI group decided to rewire his computer and shorted out every machine on the net - since the network is wireless, this took some planning, I'd say ;-), you engage two machines: the machine you're restoring to, and a second one for a copy of that data. (If you need to do surgery, you have a place to do it for experimentation. Of course, you also have those backups.)

   a. Restore checkpoint+depot to your server. Promise yourself that you'll run no commands that modify data, until you're satisfied that everything's "fine". (Perhaps you should start that server on a port number other than 1666 for this phase, to keep users from connecting and thinking it's up?)

      [Aside: if I have data on the production machine's server area and I'm restoring a checkpoint and depot and so on, I always make a copy of the original db.* trees somewhere "to be safe". That's just me.]

   b. Restore checkpoint+depot+journal onto a test area. Many commands will fail against this test area: there are several hours of new submissions recorded in the metadata but not actually stored in the depot/* tree. You can find out EXACTLY which files were submitted using commands like "p4 files @1239,@now" (and "p4 filelog" and so on are handy), even if the contents of the revisions are gone. From these file lists, and from "p4 describe" output, you can know WHO changed WHAT and from WHERE (which client workspace), so you can quickly generate e-mail that looks like this:

      Jack, you submitted changelist 1245 between the time that we did our backups and the time that the machines died. You modified these files:

          //depot/src/lib/x.c
          //depot/src/lib/makefile

      You probably have your updates in your workspace still, and they need to be resubmitted. To do that, please run:

          p4 edit //depot/src/lib/x.c
          p4 edit //depot/src/lib/makefile

      and then see if "p4 diff" tells you that your changes are on the local copy of the file but not in the server's copy. (That's what we expect.) If that's the case, please submit the changes again; if not, come find me or Sue and we'll get that sucker checked back in for you.
In other words, you can use the journal up to the instant of the backup, or better if it's up to the instant of the crash, to tell you the extent of the damage. (There's another mail you'll be sending to people who did 'p4 sync' operations during the same period, to get them to do "p4 diff -se" commands to make sure that the database idea of what they have ('p4 have' data) and reality don't disagree. Those people might need to resync if there's a problem, with 'p4 sync #have' as the first step.)
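If you end up with more than a handful of lost changelists, you probably won't want to type that "please resubmit" mail by hand each time. Here's a minimal Python sketch of how it might be generated. It assumes you've already collected the submitter, changelist number and file list yourself (from the "p4 describe" output on the test area, say); the LostChange class and resubmit_mail function are names made up for illustration, not anything Perforce ships.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LostChange:
    """A changelist the journal knows about but the restored depot lacks."""
    user: str         # submitter's name, as you'd address them in mail
    change: int       # changelist number
    files: List[str]  # depot paths modified by that changelist


def resubmit_mail(lost: LostChange, contacts: str = "me or Sue") -> str:
    """Return the body of the 'please resubmit' mail for one changelist."""
    file_list = "\n".join("    " + path for path in lost.files)
    edit_cmds = "\n".join("    p4 edit " + path for path in lost.files)
    return (
        f"{lost.user}, you submitted changelist {lost.change} between the "
        "time that we did our backups and the time that the machines died.\n"
        "You modified these files:\n"
        f"{file_list}\n"
        "You probably have your updates in your workspace still, and they "
        "need to be resubmitted. To do that, please run:\n"
        f"{edit_cmds}\n"
        'and then see if "p4 diff" tells you that your changes are on the '
        "local copy of the file but not in the server's copy. If that's the "
        f"case, please submit the changes again; if not, come find {contacts}.\n"
    )


# Example using the changelist from the sample mail.
jack = LostChange("Jack", 1245,
                  ["//depot/src/lib/x.c", "//depot/src/lib/makefile"])
print(resubmit_mail(jack))
```

Keeping the gathering of the data separate from the formatting means you can eyeball the list of lost changelists before any mail goes out.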
This doesn't give you "to the second" backups, which aren't quite possible, but it can get you closer and closer, especially if you take that fileserver approach and do incremental copies to it during the day.
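For those incremental copies during the day, something along these lines might do. It's only a sketch, with assumptions: that your checkpoints and rotated journals sit in P4ROOT under names like "checkpoint.N" and "journal.N" (check what your own server produces), and that the file server is mounted as an ordinary directory. The function name and both paths are hypothetical. It deliberately leaves the live "journal" file alone, because of the locking worry mentioned earlier, and it doesn't touch the depot/* trees, which you'd copy with whatever tool you already trust.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def copy_new_backup_files(
    p4root: Path,
    fileserver_dir: Path,
    patterns: Sequence[str] = ("checkpoint.*", "journal.*"),
) -> List[str]:
    """Copy checkpoint/rotated-journal files that are missing or stale on
    the file server. Returns the names of the files that were copied."""
    fileserver_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        # "journal.*" needs a dot, so the live "journal" file is never matched.
        for src in sorted(p4root.glob(pattern)):
            if not src.is_file():
                continue
            dst = fileserver_dir / src.name
            stale = (
                not dst.exists()
                or dst.stat().st_size != src.stat().st_size
                or int(dst.stat().st_mtime) < int(src.stat().st_mtime)
            )
            if stale:
                shutil.copy2(src, dst)  # copy2 keeps the modification time
                copied.append(src.name)
    return copied
```

Run from cron (or by hand) as, for example, copy_new_backup_files(Path("/u2/perforce"), Path("/mnt/fileserver/p4backup")); a second run right after the first should copy nothing.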
[Aside: There are several things in here that I mention but don't spend lots of time on. One is that you need to really know the weak points of your recovery process before you need it; it does NOT do to find out that the IS folks don't keep backups for more than two weeks, or that they take four days to respond to restore requests, when you have people breathing down YOUR neck to get the server back. Know those exposures. (Another is that "p4d -jj" trick. There are other ways into that sort of thing, which others will elaborate on.) Lastly, if you have multiple depots, you'll have to read "all depot trees on the server" for each time I wrote "depot/*" above.]
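On knowing those exposures: if you do run the "ask IS for a restore and watch the clock" drill, the arithmetic on the times you wrote down is trivial, but here's a small Python sketch anyway. The timestamp format, the function names and the sample dates are all assumptions of mine for illustration; use whatever you actually jotted down.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

STAMP = "%Y-%m-%d %H:%M"  # format used when noting times (an assumption)


def turnaround(log: List[Tuple[str, str]]) -> timedelta:
    """Time from the first entry (request sent) to the last (files received)."""
    if len(log) < 2:
        raise ValueError("need at least the request and the delivery entries")
    start = datetime.strptime(log[0][0], STAMP)
    end = datetime.strptime(log[-1][0], STAMP)
    return end - start


def too_slow(log: List[Tuple[str, str]], limit: timedelta) -> bool:
    """True when the restore took longer than you can afford to wait."""
    return turnaround(log) > limit


# Hypothetical log of one restore request.
log = [
    ("2002-03-01 09:00", "sent restore request to IS"),
    ("2002-03-01 15:30", "IS acknowledged the request"),
    ("2002-03-05 09:00", "files arrived on plover"),
]
print(turnaround(log), too_slow(log, timedelta(days=1)))
```

Keep one such log per drill (this morning's backup, 9 days ago, 70 days ago) and compare them against how long your users will put up with a dead server.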
Longwindedly yours,
Jeff Bowles San Francisco




