JP Vossen on 12 Dec 2010 15:06:42 -0800



Re: [PLUG] 3-way data mirror?


Why is it that no matter how many times I read, re-read, and proofread these things, I always manage to forget some important details?


Date: Sun, 12 Dec 2010 00:52:00 -0500
From: JP Vossen<jp@jpsdomain.org>

I have a situation where I have 2 labs that need to keep ~150G of data
in sync, but they can't talk to each other.  They can talk to a 3rd
machine, which is also backed up, so...

What I think I'd like to do is have:

	Lab1<-->  COLO<-->  Lab2

The problem is that lab1 and lab2 are both read-write, so there is a
real possibility of stepping on changes.  COLO is read-only.

I'd try to just relay through the COLO, but that's also where the backup
will happen.  And I can't get the FW rules changed.

Any better ideas?

Any suggestions on tools?  I'd strongly prefer something already in the
CentOS-5 or EPEL repos.

Stuff I forgot:
1) I have only limited control over the COLO server. I can probably get something in a repo installed, more than that is iffy. This is an important server used for other things, and I'm only allowed to use it because it's there and has connectivity and space.
2) The COLO server is 32-bit RHEL5, the 2 Lab servers are 64-bit CentOS-5.5 and CentOS-5.4 (I can upgrade that one).
3) AFAIK, the only connectivity is SSH and I can't change that.
4) The WAN links are very slow, relative to, say, FiOS.
5) The data is a mix of large (CD & DVD ISOs) and small (docs, configs, RPMs, etc.).

I also thought about DRBD, but I'm pretty sure IT would shoot that down for the COLO server. And I'm not sure if/how that would work 3-way.


Date: Sun, 12 Dec 2010 01:02:59 -0500
From: Doug Stewart<zamoose@gmail.com>
Have you looked into Unison?

http://www.cis.upenn.edu/~bcpierce/unison/

I use Unison to sync my laptop & server when traveling. I considered it for this, but a) forgot to mention that and b) dismissed it because I need something non-interactive. (I always use the GUI when I use it.)

But now that you mention it, I think it can run scripted over SSH, which is what I need to do. I'll need to re-read the docs on this and figure out how it handles conflicts when running non-interactively.

And 'unison227' is in EPEL for CentOS-5.  :-)
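
A rough sketch of what a scripted, non-interactive Unison run from one lab to the COLO might look like, written as a small Python helper that only builds the command line. Everything specific here is an assumption for illustration: the host name, both paths, and the binary name (the EPEL 'unison227' package may install it under a different name). The one Unison behaviour it leans on is the -batch option, which asks no questions, propagates non-conflicting changes, and skips conflicts rather than resolving them.

```python
import shlex


def unison_batch_command(local_root, colo_host, colo_path, binary="unison"):
    """Build an argv list for a non-interactive Unison run over SSH.

    All arguments are hypothetical placeholders. `binary` is an assumption:
    check what the installed package actually calls the executable.
    """
    # Unison's SSH root syntax is ssh://host//absolute/path; the second
    # slash comes from the leading "/" of an absolute colo_path.
    remote_root = f"ssh://{colo_host}/{colo_path}"
    # -batch: ask no questions; conflicting changes are skipped, not merged.
    return [binary, local_root, remote_root, "-batch"]


if __name__ == "__main__":
    # Example values only; nothing is executed, the command is just printed.
    cmd = unison_batch_command("/data/mirror", "colo.example.com", "/srv/mirror")
    print(shlex.join(cmd))
```

Each lab box would run its own copy of this against the COLO (for example from cron), giving the Lab1<--> COLO<--> Lab2 star layout. Files changed on both sides would be left alone and would still need a human to sort out.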



Date: Sun, 12 Dec 2010 03:33:29 -0500
From: Brian Stempin <brian.stempin@gmail.com>

Perhaps something like this?
http://fak3r.com/2009/09/14/howto-build-your-own-open-source-dropbox-clone/

Interesting. Needs more thought. I *think* I can still see changes getting stepped on if I change something on Lab1 and someone else changes the same thing on Lab2.
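
To make the "stepped on" worry concrete, here is a minimal sketch of the three-way comparison any such sync has to do. It is not taken from the linked article or from any particular tool; the function name and the manifest format (relative path mapped to a content hash, taken at the last successful sync and then on each lab) are assumptions made up for the example.

```python
def classify_changes(base, lab1, lab2):
    """Compare two replicas against the state at the last sync.

    Each argument is a dict mapping a relative path to a content hash;
    a missing key means the file does not exist in that snapshot.
    Returns a dict mapping each path to one of:
    "in sync", "take lab1", "take lab2", "conflict".
    """
    result = {}
    for path in sorted(set(base) | set(lab1) | set(lab2)):
        old, one, two = base.get(path), lab1.get(path), lab2.get(path)
        if one == two:
            result[path] = "in sync"      # same content (or both deleted)
        elif one == old:
            result[path] = "take lab2"    # only Lab2 changed it
        elif two == old:
            result[path] = "take lab1"    # only Lab1 changed it
        else:
            result[path] = "conflict"     # both changed it, differently
    return result


if __name__ == "__main__":
    base = {"a.iso": "h1", "notes.txt": "h1", "site.conf": "h1"}
    lab1 = {"a.iso": "h1", "notes.txt": "h2", "site.conf": "h2"}
    lab2 = {"a.iso": "h1", "notes.txt": "h1", "site.conf": "h3"}
    for path, verdict in classify_changes(base, lab1, lab2).items():
        print(path, verdict)
```

The "conflict" case is exactly the one described above: a plain one-way copy in either direction would silently overwrite one lab's edit.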



Date: Sun, 12 Dec 2010 10:54:31 -0500
From: "K.S. Bhaskar" <bhaskar@bhaskars.com>

Might an OpenAFS file system (http://openafs.org) be an option?  Perhaps
with SELinux used to disable updates at COLO that are not replicated from
Lab1 and Lab2?

Trickier. One of the things I forgot to note was that I have limited control over the COLO server, and the connections are probably limited to SSH only. Also, the WAN links are relatively slow.



Date: Sun, 12 Dec 2010 11:51:23 -0500
From: "Gavin W. Burris" <bug@sas.upenn.edu>

I would consider using revision control, like SVN, if your data files
aren't too large, especially if the raw data doesn't change much.  The
problem is that changes in binary files do not benefit from diffs, with
each change requiring a complete upload/sync of the file.
http://subversion.apache.org/

Yup, I use CVS, SVN and BZR in various places. (I require CVS at work, but you can nest BZR under CVS for local revisions and just "publish" to CVS when done. A tad clunky, but it works. :)

However, much of the data is large binary files (CD and DVD ISOs).


Another option that would apply to future, more data intensive
collaboration, would be iRODS.  This is a more comprehensive solution
that builds a "data grid" for distributed projects.
https://www.irods.org/index.php/What_is_iRODS%3F

Wow, that's pretty cool. It sounds like overkill, and I may not be able to do it due to the above SSH and limited control issues, but I'll need to look deeper.


Thanks,
JP
----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug