When a “root” crashes

On the ec2-for-poets list, Michael Helleson writes:

“I’ve been getting errors with config.root during the nightly backup operations. It hasn’t been able to successfully back it up in a week or more. The size of the file in Guest Databases is over 700 megabytes.

“I attempted to do the filemenu.savecopy(“c:\config.root”) command and got this error.

A picture named crash.gif

“I checked in c: and there is no copy of config.root so I’m assuming the backup didn’t happen.

“I’m willing to let it go like this since River2/Blork still works just fine but I think if config.root gets much bigger it will be bad.”

What to do

This is a serious situation. Do not let it go on. It may appear that everything is fine, but eventually you will lose the database.

1. Take your server off the air. Quit the OPML Editor.

2. If you’re running a Windows server, in the Task Manager app, force flaunch.exe to quit. This app may have been automatically launched by opml.exe. It periodically checks if opml.exe is running, and if not, it launches it. If you’re slow at this step you may have to repeat step 1.

3. Make a physical copy of the file that’s reporting the problem, in case something goes wrong with the recovery. Put the copy in a safe place, somewhere outside the OPML folder. The Desktop is where I usually put these files.

4. Re-launch OPML.

5. Bring the database with the problem to the front. Assume it’s config.root, as in Michael’s example.

6. Choose the Quick Script command from the Misc menu. This opens a small window that you can enter a script into.

7. Enter this into the Quick Script window: filemenu.savecopy ("C:\\config.root") and press Enter. Wait.

8. If the script finishes, great. It was able to recover the database. But it probably didn’t.

9. If you got an error, it should have given you an idea where the problem is. In Michael’s case it was a sub-table of config.river2.feeds. Carefully navigate to that location in the database, and put the cursor on the problem table.

A picture named badspot.gif

10. Choose Copy Address from the Table menu.

11. Choose Quick Script from the Misc menu.

12. Type table.jettison (@ then paste the address you copied in step 10, then type the closing ). Count to 10, then press Enter.

A picture named jettison.gif

This is an undocumented verb, used only in drastic situations like this one. Do not write scripts using it, because it doesn’t reclaim the space used by the object that is jettisoned. That is also why it’s safe to call in this situation: it drops the damaged object without trying to reclaim its space.

13. The bad bit disappears instantly.

14. Still in the Quick Script window, enter filemenu.savecopy ("C:\\config.root") and press Enter. This should take a minute or so to run, and if that was the only problem in the database, it will run to completion without error. You can tell that it did because there’s a value in the bottom of the Quick Script window (screen shot) and there’s a file at the top level of the C disk called config.root.

15. Repeat steps 1 and 2 to quit the OPML Editor and flaunch.exe if it’s running.

16. In the filesystem, navigate to the location of the crashed database, and change its name from config.root to configCrashed.root. (If you get an error that the file is busy, this probably means that OPML has re-launched. Repeat steps 1 and 2 until this works.)

17. Copy the saved file from the top level of disk C into the location of the crashed file. It should be considerably smaller. This is because all the unused space in the file has been “squeezed” out.

18. Relaunch the OPML Editor.
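
Steps 16 and 17 happen in the filesystem while the OPML Editor is not running, so they can't be done from the Quick Script window. As a minimal sketch, here is how the same rename-and-copy could be scripted in Python. The function name and the folder path in the usage comment are assumptions for illustration, not part of the OPML Editor.

```python
import shutil
from pathlib import Path


def swap_in_recovered(live: Path, recovered: Path) -> Path:
    """Rename the crashed database aside, then copy the recovered file into its place.

    Returns the path of the renamed crashed file (for example configCrashed.root).
    """
    if not recovered.is_file():
        raise FileNotFoundError(f"no recovered copy at {recovered}")

    # config.root becomes configCrashed.root, as in step 16.
    crashed = live.with_name(live.stem + "Crashed" + live.suffix)
    if crashed.exists():
        raise FileExistsError(f"{crashed} already exists; move it away first")

    # On Windows this raises PermissionError if the file is still open,
    # which usually means the OPML Editor has re-launched (repeat steps 1 and 2).
    live.rename(crashed)

    # Step 17: put the saved copy where the crashed file used to be.
    shutil.copy2(recovered, live)
    return crashed


# Hypothetical usage; replace the placeholder folder with the real location:
#   live = Path(r"C:\path\to\Guest Databases\config.root")
#   crashed = swap_in_recovered(live, Path(r"C:\config.root"))
#   print(crashed.stat().st_size, "bytes before,", live.stat().st_size, "bytes after")
```

The crashed file is renamed rather than deleted, so if the recovered copy turns out to be bad you still have the original to go back to.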

When this happens

You might wonder what causes a database to go bad.

It doesn’t happen very often, because of the way the underlying code is written: it does all its work with a database quickly and doesn’t leave the file on disk in an invalid state. So when the program crashes, or if you force-quit it, almost always the data in the files are safe.

But occasionally the program crashes while it’s in the middle of writing out a critical resource, like the “avail list” or the items in a table. The database software that’s built into the OPML Editor complains if it finds something out of place. A “free” node where there should only be used ones. Or vice versa. This is an indication that a serious problem is coming soon. The data isn’t yet lost, but if you continue, it will be.

That’s why the first thing you must do when you notice the problem is to take a snapshot of the database, so if the problem gets worse you can try to recover from something better.
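
If you'd rather not make the snapshot by hand, a small script can do it. This is only a sketch in Python, assuming the OPML Editor and flaunch.exe have already been quit. The function name and the timestamped file name are my own choices for illustration.

```python
import shutil
import time
from pathlib import Path


def snapshot(database: Path, safe_dir: Path) -> Path:
    """Copy `database` into `safe_dir` under a timestamped name and return the copy's path."""
    safe_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the name keeps earlier snapshots from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    copy = safe_dir / f"{database.stem}-{stamp}{database.suffix}"
    if copy.exists():
        raise FileExistsError(f"{copy} already exists")

    shutil.copy2(database, copy)

    # Cheap sanity check: the copy should be exactly as large as the original.
    if copy.stat().st_size != database.stat().st_size:
        raise OSError("snapshot size differs from the original")
    return copy
```

Point `safe_dir` somewhere outside the OPML folder, such as a folder on the Desktop, so the snapshot can't be touched by anything the recovery does.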

Knock wood, it doesn’t happen very often. But when it does, it pays to be prepared.
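
To picture what “a free node where there should only be used ones” means, here is a toy model in Python. It is not the OPML Editor's actual file format or code, which isn't described here. The block states, the avail list and the table layout are all simplified assumptions made up for the illustration.

```python
def find_inconsistencies(blocks, avail_list, tables):
    """Return a list of problems found in a toy database.

    blocks:     dict mapping a block id to "free" or "used"
    avail_list: list of block ids the database believes are free
    tables:     dict mapping a table name to the block ids of its items
    """
    problems = []

    # Everything on the avail list must really be free.
    for block_id in avail_list:
        if blocks.get(block_id) != "free":
            problems.append(f"avail list points at block {block_id}, which is not free")

    # Everything a table refers to must really be in use.
    for name, items in tables.items():
        for block_id in items:
            if blocks.get(block_id) != "used":
                problems.append(f"table {name} points at block {block_id}, which is not in use")

    return problems


# A healthy toy database, and one where a table item points at a free block.
blocks = {1: "used", 2: "used", 3: "free"}
healthy = find_inconsistencies(blocks, [3], {"feeds": [1, 2]})
damaged = find_inconsistencies(blocks, [3], {"feeds": [1, 3]})
```

In this sketch `healthy` comes back empty and `damaged` contains one complaint about the `feeds` table. That is the situation in the toy model where a save would stop with an error rather than write out something it can't trust.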

Bookmark this page

In all the years that the OPML Editor and its ancestors have been around, I’ve always meant to write one of these howtos. Now I have. When someone says they have this problem, point them to this page.
