How To Share Files Between EC2 Instances
Scaling your web servers to handle more traffic, setting up a redundant web hosting infrastructure and building a reliable mail server are a few reasons to share files between EC2 instances. When choosing how to share files between your EC2 instances, consider the following factors:
- Ease-of-use: how much setup does it take?
- Performance: how quickly will all EC2 instances get the updated data?
- Reliability: how safe is your data?
- Maintenance overhead: how much ongoing maintenance is required?
Here are a few common ways to share files between EC2 instances.
1. rsync + cron
rsync is a standard tool that you can use to copy files between your EC2 instances. It copies files in one direction: from a source instance to a destination instance. If two EC2 instances have each received different updates, you need to run rsync twice (once in each direction) to synchronize them.
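Here is a minimal Python sketch of the two-direction case. It assumes rsync is installed and SSH access between the instances is already configured. The host name `web2` and the `/var/www/` path are hypothetical. The `--update` flag tells rsync to skip files that are newer on the receiving side. This sketch does not handle deletions or conflicting edits to the same file.

```python
import subprocess


def build_rsync_command(source, destination):
    """Return the argument list for a one-way rsync from source to destination.

    A trailing slash on the source means "copy the directory contents".
    """
    # -a: archive mode (recursive, preserves permissions and timestamps)
    # -z: compress data during transfer
    # --update: skip files that are newer on the receiving side
    return ["rsync", "-az", "--update", source, destination]


def sync_both_directions(local_dir, remote_dir, run=subprocess.run):
    """Run two one-way rsyncs, one in each direction.

    `run` is injectable so the function can be exercised without rsync.
    """
    push = build_rsync_command(local_dir, remote_dir)
    pull = build_rsync_command(remote_dir, local_dir)
    for command in (push, pull):
        run(command, check=True)  # raises CalledProcessError on failure
    return [push, pull]


if __name__ == "__main__":
    # Hypothetical host and path; prints the commands instead of running them.
    for cmd in sync_both_directions(
        "/var/www/", "web2:/var/www/", run=lambda c, check: None
    ):
        print(" ".join(cmd))
```

Running the push before the pull is an arbitrary choice here. With `--update`, whichever side has the newer copy of a file wins in either order.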
In a primary-replica configuration, where a single primary instance handles both reads and writes and all other instances are read-only replicas, you only need to set up rsync from the primary instance to each replica instance. The primary instance can become a performance bottleneck when there is a large number of writes, and it is also a single point of failure.
Sharing files with rsync for a primary-replica configuration
In a peer-to-peer configuration, where all EC2 instances handle both reads and writes to maximize the upload bandwidth and to avoid a single point of failure, the rsync setup is more complicated. You will need to make sure updates from each EC2 instance are propagated to all other EC2 instances.
Sharing files with rsync for a peer-to-peer configuration
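To get a feel for how the two configurations differ in setup effort, the sketch below counts the one-way rsync jobs each one needs. It assumes a full mesh for the peer-to-peer case, where every instance syncs directly to every other instance. That is only one possible topology. The instance names are hypothetical.

```python
from itertools import permutations


def primary_replica_jobs(primary, replicas):
    """One-way sync jobs (source, destination) from the primary to each replica."""
    return [(primary, replica) for replica in replicas]


def peer_to_peer_jobs(instances):
    """One-way sync jobs for a full mesh: every ordered pair of distinct instances."""
    return list(permutations(instances, 2))


if __name__ == "__main__":
    instances = ["web1", "web2", "web3", "web4"]  # hypothetical instance names
    print(len(primary_replica_jobs(instances[0], instances[1:])), "jobs for primary-replica")
    print(len(peer_to_peer_jobs(instances)), "jobs for peer-to-peer")
```

With n instances, primary-replica needs n - 1 jobs, while a full mesh needs n × (n - 1). The peer-to-peer setup therefore grows much faster as you add instances.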
One lesser-known drawback of rsync is that it does not propagate a directory rename as an atomic operation. A directory move from A to B on the source is carried out at the destination in three steps: creating a new directory B, copying the files from A into B, and deleting the old directory A (the deletion happens only if rsync is run with the --delete option). If any file is added or updated during the move, you may not see the new files at the destination, because the three steps do not happen as a single unit.
You can set up cron jobs to run rsync periodically between your EC2 instances. However, there is a lag between an update on one instance and the next run of the cron job. In the meantime, all your other EC2 instances will still be serving the old data.
One way to reduce the lag is to increase the cron job frequency or to trigger rsync whenever there is a change. This will increase your network bandwidth usage, especially if you have a peer-to-peer setup between your EC2 instances.
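The trade-off between cron frequency and lag can be sketched as follows. The helper names, the host `web2` and the path are hypothetical. The lag estimate is a rough upper bound: it assumes a change lands just after a run has started, so it waits one full interval for the next run and then for the transfer itself.

```python
import shlex


def crontab_line(interval_minutes, command):
    """Build a crontab entry that runs `command` every `interval_minutes`.

    Only intervals that divide 60 evenly are accepted, because a step such as
    */7 restarts at the top of each hour and gives uneven gaps.
    """
    if not 1 <= interval_minutes <= 59 or 60 % interval_minutes != 0:
        raise ValueError("interval must be between 1 and 59 and divide 60 evenly")
    return f"*/{interval_minutes} * * * * {shlex.join(command)}"


def worst_case_lag_seconds(interval_minutes, transfer_seconds):
    """Rough upper bound on how stale a replica can be, in seconds."""
    return interval_minutes * 60 + transfer_seconds


if __name__ == "__main__":
    rsync = ["rsync", "-az", "/var/www/", "web2:/var/www/"]  # hypothetical target
    print(crontab_line(5, rsync))
    print(worst_case_lag_seconds(5, 20), "seconds worst-case lag")
```

Shortening the interval lowers the bound, but each extra run costs at least a directory scan and some network traffic on both ends, even when nothing has changed.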
An alternative to using rsync is to use a file system. With a file system, updates on your EC2 instance can be seen by all other EC2 instances without the lag time issue seen in the rsync with cron job approach. Next, we will review different types of file systems to share files between EC2 instances.
2. File system: NFS
Sharing files with NFS
NFS is a common shared file system that you can use to share files between EC2 instances. You will need to set up and maintain a dedicated server to run the NFS server and store your files. You will also need to decide in advance how much storage this server needs, so that you don’t run out of disk space. The NFS server handles all reads and writes, and all EC2 instances connect to it as clients.
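Because the storage on the NFS server is sized in advance, it helps to keep an eye on how full it is. Below is a minimal sketch of such a check using Python's standard library. The 80% threshold and the `/srv/nfs` export path are assumptions for illustration, not recommendations.

```python
import shutil


def usage_fraction(total_bytes, used_bytes):
    """Fraction of the disk that is in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_more_storage(path, threshold=0.8):
    """Return True when the file system holding `path` is at or above `threshold`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage_fraction(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    export_path = "/srv/nfs"  # hypothetical NFS export directory
    try:
        if needs_more_storage(export_path):
            print(f"{export_path} is running low on disk space")
    except FileNotFoundError:
        print(f"{export_path} does not exist on this machine")
```

A check like this could run on the NFS server itself from a cron job, so you have time to add storage before clients start seeing write errors.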
Since the single NFS server stores and serves all your files, it can be a performance bottleneck and won’t scale up easily as the number of EC2 instances increases. It is also a single point of failure. So, if the NFS server is down, you will not be able to access any files from any of your EC2 instances.
3. Distributed file system: GlusterFS
Sharing files with GlusterFS
You can also use a distributed file system, like GlusterFS, to share files between your EC2 instances. As with NFS, you will need to set up and maintain the servers in a GlusterFS storage cluster, and determine in advance how much storage you need. By having multiple servers in your GlusterFS storage cluster, you get increased redundancy, scalability and performance compared to NFS.
GlusterFS has a high storage cluster management overhead. Besides maintaining multiple storage servers (called ‘bricks’ in GlusterFS), you also need to add storage and rebalance data between ‘bricks’ when you run low on disk space. When GlusterFS gets into the occasional split-brain state, you also have to deal with the related file healing process.
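To give an idea of the steps involved, the sketch below builds the `gluster` commands typically used to add a brick, rebalance the volume, and list files in split-brain. The volume name `gv0` and the brick `server3:/data/brick1` are hypothetical. The commands are only constructed here, not executed. Treat the exact syntax as an assumption to check against the documentation for your GlusterFS version. Replicated volumes, for example, require bricks to be added in multiples of the replica count.

```python
def expand_volume_commands(volume, new_bricks):
    """Commands to add bricks to a volume and then spread existing data onto them."""
    return [
        ["gluster", "volume", "add-brick", volume, *new_bricks],
        ["gluster", "volume", "rebalance", volume, "start"],
    ]


def split_brain_info_command(volume):
    """Command to list the files that are currently in split-brain on a volume."""
    return ["gluster", "volume", "heal", volume, "info", "split-brain"]


if __name__ == "__main__":
    # Hypothetical volume and brick names; print the commands for review.
    for cmd in expand_volume_commands("gv0", ["server3:/data/brick1"]):
        print(" ".join(cmd))
    print(" ".join(split_brain_info_command("gv0")))
```

Printing the commands for review, rather than running them directly, is a deliberate choice in this sketch: a rebalance moves data between servers and is worth scheduling for a quiet period.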
4. Cloud file system: ObjectiveFS
Sharing files with ObjectiveFS
You can use a shared cloud file system like ObjectiveFS to share files between your EC2 instances. ObjectiveFS runs directly on your EC2 instances and uses reliable and durable cloud object stores like AWS S3. So, you don’t need to maintain any extra servers or take care of a storage cluster.
ObjectiveFS also has strong end-to-end integrity checks to protect your data both at rest and in motion. By building on top of a scalable object store, you now have storage that scales automatically and will never run out of disk space again. You also don’t have to decide how much storage you need in advance.
ObjectiveFS utilizes the scalability of cloud object stores like AWS S3, so your performance scales when you have more EC2 instances and you won’t run into storage performance bottlenecks. The performance is similar to running on a local hard drive.
With no single point of failure, your EC2 instances can come and go, and your data remains accessible by all of your other EC2 instances.
We have reviewed four different ways to share files between EC2 instances. rsync with a cron job is easy to set up and maintain for a primary-replica configuration. The setup is more complicated for the more robust peer-to-peer synchronization. The main issue with the rsync and cron job approach is the delay in propagating updates to all of your EC2 instances.
Shared file systems, such as NFS, GlusterFS or ObjectiveFS, remove the update lag time. NFS requires you to run and maintain a server, which is a single point of failure and a potential performance bottleneck. GlusterFS requires you to run and maintain multiple servers, preferably across multiple availability zones, to keep your storage reliable and available.
ObjectiveFS removes the extra server/storage cluster management and takes advantage of the scalability of the cloud object store to give you performance and storage that scale with your EC2 instances. It presents a standard file system interface and also gives you storage that grows, so you don’t run out of disk space. You can also share the file system with your Linux and Mac OS X laptops and office servers, so we think this makes it the best solution when it comes to sharing files between EC2 instances.
by ObjectiveFS staff, June 29, 2015
ObjectiveFS is a shared file system for OS X and Linux that gives you cloud storage that scales automatically.