It’s day one at your new gig. You’re hopeful that setting up the development environment won’t take too long. How awesome would it be to get some code out to production on your first day?
You’re lucky if this is your first day at IFTTT. Using our Docker-based development environment, you can get set up in a snap. Having a complex codebase configured and ready to use in a few minutes feels amazing. What if I told you that setting up data in your development environment could be that easy too?
The Data Dilemma
Managing data across environments is a hard task, especially when your product relies on user-generated content. As your user base grows and your product becomes more complex, you start seeing a variety of new patterns in user data. Replicating these patterns in development becomes harder and harder.
A specific scenario on IFTTT is the addition of new Channels to our Developer Platform. Not every engineer owns all the cool new IoT devices or has signed up for all 235 services on IFTTT.
Here’s where the trouble starts. Engineers have to find ways to simulate user data in their development environment, which tends to lead to some well-known problems:
- Some UI elements will start to look weird in development
- Pages will look empty
- Engineers will start overlooking some specific validation scenarios (in forms, URLs, usernames…)
The worst consequence of all of this: engineers will begin to want to test things in production. 😱
Fixing Data in a Sad Rails Development Environment
Your development experience may already be looking pretty 😞 at this point. Good engineers will try to come up with ideas to fix these problems. Here are some of the most common fixes:
- Test things in production (not good)
- Build a staging environment
- Rails database migrations
- Custom CSVs or Google Spreadsheets
The problem with these solutions is that none of them actually solves the problem of re-creating real-world data in any environment. For example, creating a staging environment is no guarantee that you’ll be generating good user data; it just moves the problem to a different layer.
To solve this problem we created Polo.
Sample Database Snapshots
Polo is the tool we use to generate snapshots of our production database and export them to .sql files that our developers can import into any environment. It is named after Marco Polo, the famous explorer: we use Polo to explore our data model and return a representation of what it finds along the way.
Here are a few examples from the Polo GitHub repository.
Given an ActiveRecord::Base class and a record_id, Polo will transform that record into a SQL INSERT statement.
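To give a feel for that transformation without needing the gem or a database, here is a minimal plain-Ruby sketch. It is not Polo's implementation (in the gem itself, as far as the project's README shows, the entry point is `Polo.explore`). The `to_insert` and `sql_literal` helpers and the sample chef row are made up for illustration, and the quoting is deliberately simplistic.

```ruby
# Illustrative stand-in only: not Polo's code and not ActiveRecord.
# A "record" here is just a table name plus a Hash of column => value.

# Render a Ruby value as a SQL literal (simplified; real adapters do much more).
def sql_literal(value)
  case value
  when nil     then "NULL"
  when Numeric then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

# Build a single INSERT statement for one row.
def to_insert(table, attributes)
  columns = attributes.keys.map { |column| "`#{column}`" }.join(", ")
  values  = attributes.values.map { |value| sql_literal(value) }.join(", ")
  "INSERT INTO `#{table}` (#{columns}) VALUES (#{values})"
end

# Hypothetical sample row.
puts to_insert("chefs", { id: 1, name: "Netto" })
# prints: INSERT INTO `chefs` (`id`, `name`) VALUES (1, 'Netto')
```

The statement is plain SQL text, which is what makes the snapshot easy to save to a .sql file and replay in another environment.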
The real benefits start to show when you teach your data model to Polo, allowing the library to navigate your database’s dependency tree. Fear not, this is simpler than it sounds.
Given a data model described with ActiveRecord::Associations, you can tell Polo to export an ActiveRecord record’s dependencies, or even to load complex nested associations, and get back INSERTs for every object Polo visited on its journey.
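The traversal idea can be sketched in plain Ruby as well. Everything below is an assumption made for illustration: the in-memory `TABLES`, the `HAS_MANY` map, and the `explore`, `collect`, and `normalize` helpers are hypothetical stand-ins, the data model is simplified (each ingredient belongs to exactly one recipe), and the order of the statements is just whatever this depth-first walk produces. The dependency argument mimics the symbol, array, and nested-hash shapes that ActiveRecord's preloading accepts.

```ruby
# Illustrative stand-in only: in-memory tables instead of a real database.
TABLES = {
  chefs:       [{ id: 1, name: "Netto" }],
  recipes:     [{ id: 1, title: "Turkey Sandwich", chef_id: 1 },
                { id: 2, title: "Cheese Burger",   chef_id: 1 }],
  ingredients: [{ id: 1, name: "Turkey", recipe_id: 1 },
                { id: 2, name: "Cheese", recipe_id: 2 }]
}.freeze

# has_many relationships: parent table => { child table => foreign key }.
HAS_MANY = {
  chefs:   { recipes: :chef_id },
  recipes: { ingredients: :recipe_id }
}.freeze

# Build one INSERT statement for a row (simplified quoting).
def to_insert(table, row)
  columns = row.keys.map { |column| "`#{column}`" }.join(", ")
  values  = row.values.map do |value|
    value.is_a?(Numeric) ? value.to_s : "'" + value.to_s.gsub("'", "''") + "'"
  end.join(", ")
  "INSERT INTO `#{table}` (#{columns}) VALUES (#{values})"
end

# Turn :recipes, [:recipes], or { recipes: :ingredients } into nested Hashes.
def normalize(deps)
  case deps
  when nil   then {}
  when Hash  then deps.transform_values { |nested| normalize(nested) }
  when Array then deps.map { |dep| normalize(dep) }.reduce({}, :merge)
  else { deps => {} }
  end
end

# Emit the row's INSERT, then recurse into each requested association.
def collect(table, row, deps)
  inserts = [to_insert(table, row)]
  deps.each do |child_table, nested|
    foreign_key = HAS_MANY.fetch(table).fetch(child_table)
    TABLES.fetch(child_table).select { |child| child[foreign_key] == row[:id] }.each do |child|
      inserts.concat(collect(child_table, child, nested))
    end
  end
  inserts
end

# Entry point: start from one seed record and follow the declared dependencies.
def explore(table, id, deps = nil)
  row = TABLES.fetch(table).find { |candidate| candidate[:id] == id }
  row ? collect(table, row, normalize(deps)) : []
end

puts explore(:chefs, 1, { recipes: :ingredients })
```

Calling `explore(:chefs, 1)` returns only the chef's INSERT, `explore(:chefs, 1, :recipes)` adds the two recipes, and the nested form shown above also pulls in each recipe's ingredients.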
As long as you stick to this beautiful API we borrowed from ActiveRecord::Associations::Preloader, you should be able to define complex object associations and get back imports for every record involved.
Polo is Awesome!
We were pleasantly surprised when we discovered that we could cover a wide range of scenarios and generate a solid database snapshot with just 5 of our most active users as seeds. YMMV of course, but finding the right seeds and dependency graph for Polo shouldn’t be hard.
Our developers are much happier now, and everyone starts with nice looking data in development on their first day at IFTTT. Frontend Engineers can build great user experiences with much more confidence these days, and Backend Engineers find it easier to deal with complex data tasks.
You can read about some more advanced features of Polo, like obfuscation and blacklisting of attributes, on GitHub.
Come join us!