1. Using Your Production Assets in Development - A Hybrid File Storage Backend for Django

    We introduce our new approach to handling assets across different environments in Django projects.

    One of the problems developers face in maintaining a good development environment is how to make that environment an accurate reflection of production. It’s usually relatively straightforward to get a realistic database: take a dump of the production database, anonymise any user data and load it into your development system. There are caveats, of course (your production database could be prohibitively large, for example), but this solution works pretty well for a large number of cases.

    Site media is a different problem. In the case of onefinestay our production images (primarily photographs of our homes) take up around 100 gigabytes (a figure that’s quickly increasing), so it’s clearly impractical to copy them all when setting up a new environment (which we do very regularly when trying out new functionality in our numerous QA environments).

    The obvious solution is to use the production image server in our other environments, but this creates problems when it comes to handling file uploads. Should uploads go to the production server, or to an environment-specific storage such as the local filesystem? Hopefully you’ll agree that the second option is preferable, because the rest of this post is going to be about how we did it with Django.

    Enter Custom Storage Backends

    Django has supported custom file storage backends since version 1.0. In short, they allow you to swap out the underlying mechanism for file storage so that saving to SFTP, Amazon S3, MogileFS, etc. is no different from saving to the local filesystem (which is what the default storage backend does). There are a large number of third-party backends available. For our production assets we’re using the S3boto backend that comes as part of the excellent django-storages project.

    We came up with the idea of a hybrid storage backend, one that is capable of talking to multiple underlying methods of handling files. The key idea is that for each operation, a number (in most cases, just 2) of different storage backends are queried in sequence, and we decide which one to use depending on the result from each one. We’ll start out by illustrating what this looks like in our Django settings file:

    DEFAULT_FILE_STORAGE = 'onefinestay.utils.storages.HybridStorage'
    
    HYBRID_STORAGE_BACKENDS = (
        'django.core.files.storage.FileSystemStorage',
        'onefinestay.utils.storages.ProductionStorage',
    )
    

    Hopefully you can see that the HybridStorage backend is configured by providing a list (or tuple, for those of you with keen eyesight) of other backends. The order matters because the backends are accessed in the order you define them. In the simplest case you want your environment’s main storage to come first and the production one last.
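
    How HybridStorage turns those dotted paths into backend instances isn't spelled out here, so the following is only a minimal sketch of one way it could be done. The helper names `load_backend_class` and `build_backends` are hypothetical, and the sketch assumes every backend can be constructed without arguments. It uses only the standard library; inside a Django project, `django.utils.module_loading.import_string` does a similar job.

```python
from importlib import import_module


def load_backend_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object it names."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    module = import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"{module_path!r} has no attribute {class_name!r}"
        ) from exc


def build_backends(dotted_paths):
    """Instantiate each configured backend, preserving the configured order.

    Assumption: each backend class takes no constructor arguments.
    """
    return [load_backend_class(path)() for path in dotted_paths]
```

    Because the list is built in the same order as the setting, the first entry stays the environment's own storage and the last one stays production.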

    Don’t be confused by the ProductionStorage class. It’s simply a subclass of S3BotoStorage that’s preconfigured to point at our production S3 bucket, and we have a similar subclass for staging. We’ve done this so that it’s possible to interact with either bucket from any environment. But don’t worry: non-production environments are only given read-only access keys to the production bucket, and likewise for staging.

    For each operation a storage is expected to be able to perform (_open, _save, delete, exists, listdir, size, and url), the hybrid backend iterates over the configured backends in order, attempting the operation and moving on to the next backend if it fails. So (using the configuration above) to open a file based on its relative file path, we first ask FileSystemStorage to open the file. If that fails we try ProductionStorage, and if that fails too we finally raise the expected error.

    This pattern is repeated across all the operations except for listdir, which returns an aggregate of results across all storages, and delete, which we’ve left as a stub method. We’ve left delete empty for now because we don’t generally delete files through our application code, and we haven’t yet decided what the ideal delete behaviour should be. Even with the safety of read-only keys, we shouldn’t be trying to delete files from the production bucket.
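
    To make the fallback behaviour concrete, here is a small framework-free sketch of the logic described above. It is not our actual implementation: the class name `HybridStorageSketch` is hypothetical, it works on any objects that expose storage-like methods rather than subclassing Django's `Storage`, and it makes a few assumptions that are marked in the comments (a failed operation is signalled by an exception, uploads always go to the first backend, and url checks for existence first because building a URL can succeed even when the file is missing).

```python
class HybridStorageSketch:
    """Illustrates the fallback logic only; not tied to Django.

    `backends` is an ordered list of objects exposing storage-like
    methods: open, save, exists, size, url and listdir.
    """

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)

    def _first_success(self, operation, name):
        # Try each backend in order; re-raise the last error if all fail.
        last_error = None
        for backend in self.backends:
            try:
                return getattr(backend, operation)(name)
            except Exception as exc:  # assumption: failure == exception
                last_error = exc
        raise last_error

    def open(self, name):
        return self._first_success("open", name)

    def size(self, name):
        return self._first_success("size", name)

    def exists(self, name):
        return any(backend.exists(name) for backend in self.backends)

    def url(self, name):
        # Assumption: url() may not fail for a missing file, so pick the
        # first backend that actually has it, else defer to the last one.
        for backend in self.backends:
            if backend.exists(name):
                return backend.url(name)
        return self.backends[-1].url(name)

    def save(self, name, content):
        # Assumption: uploads always go to the first (local) backend.
        return self.backends[0].save(name, content)

    def listdir(self, path):
        # Aggregate directory and file names across every backend.
        dirs, files = set(), set()
        for backend in self.backends:
            try:
                backend_dirs, backend_files = backend.listdir(path)
            except Exception:
                continue  # e.g. the directory is missing in this backend
            dirs.update(backend_dirs)
            files.update(backend_files)
        return sorted(dirs), sorted(files)

    def delete(self, name):
        # Deliberately a no-op, mirroring the stub described above.
        pass
```

    In a Django project the same logic would live in a subclass of `django.core.files.storage.Storage`, with `_open` and `_save` taking the place of `open` and `save` here.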

    The nice thing about this approach is that it makes it easy to get up and running with a fresh copy of the production database quickly. Simply load in the new data, delete the contents of your local file storage and you’re good to go. Our integration test suite can operate in a similar manner, allowing us to run Selenium tests that include file uploads on an environment that closely mimics production.

    Here’s the full code for our implementation of the hybrid backend:

    Any drawbacks?

    The obvious issue is that we’re creating extra disk I/O, because every file that only exists in production first causes a miss against the local storage backend. This could have a noticeable effect on performance if you’re rendering a page with a lot of images. We consider this acceptable because it doesn’t affect our production environment (where we don’t use the hybrid storage at all), but you may feel differently.

    Another possibility is that you could upload a new file to your local file system and somebody else could upload a file with the same name to production. This shouldn’t normally pose an issue, because you won’t (yet) have the database record that’s expecting the production image rather than your local one, but it’s possible this could cause some weirdness that we haven’t covered here.

    Finally, we’d be remiss if we didn’t mention that because this is a tool that’s only used in non-production environments, it doesn’t have much in the way of test coverage yet. So feel free to use and adapt the code as you see fit, but do so with caution.

     