Like most of our SaaS clients, SIM was already doing well long before they met us, but that was kind of the problem. They were hosting their applications on Amazon Lightsail servers and managed databases – which in our humble opinion is a great way to start. But word soon spread about the awesome quizzes and games SIM produced, and their lead developer, Ian, started to harbour concerns about the scalability of the platform. That’s when he got in touch with us.
Like most good engagements, we started with a really informal chinwag about what Ian thought their problems were, and what a good outcome for SIM would look like. And like most developers, he didn’t want to be wrapped up in layers of AWS abstraction that he wouldn’t have the domain knowledge to support. Whatever we built needed to be robust and scalable, but it also needed to be familiar and developer-friendly.
So, we had our brief: scalable, but nothing wild! EC2 instances and RDS (Relational Database Service) it is.
We set about migrating their Lightsail instances over to EC2 (Elastic Compute Cloud), and stood a pair of them up behind an Elastic Load Balancer. We didn’t need to do much customisation to the server image, but one difference was issuing certificates via AWS Certificate Manager (ACM), and pinning them to HTTPS listeners on the load balancer. Certbot running on the EC2 instances wouldn’t be able to issue new certificates anymore, since the servers would be protected from direct Internet access in private subnets. That’s a nice little security benefit which came as a side-effect of the scalability work.
Next, we had to make the scaling happen automatically. For that, we needed an Auto Scaling group and a launch template. A launch template is exactly what it sounds like – a template that tells AWS what size server you need, where it should sit on the network, and what commands to run to bring it up and online. For our launch template, we needed to make sure that if the Auto Scaling group called for a server to be added to the group – because lots of new users suddenly accessed the platform – those new servers would be running the latest and greatest code. That means you can’t just take a snapshot of the server and use that.
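For the curious, here’s roughly what registering a launch template looks like with the AWS SDK for Python – a minimal sketch only, and the repository URL, build script and subnet ID are all hypothetical, not SIM’s real ones:

```python
import base64

# Hypothetical user-data script – the repo URL and build step are illustrative.
USER_DATA = """#!/bin/bash
set -euo pipefail
# Fetch and build the latest application code on first boot, so every
# new instance the Auto Scaling group launches runs current code.
git clone git@example.com:sim/kwizly.git /opt/app
cd /opt/app && ./build.sh
"""

def build_launch_template_data(instance_type: str, subnet_id: str) -> dict:
    """Assemble LaunchTemplateData; EC2 expects user data base64-encoded."""
    return {
        "InstanceType": instance_type,
        "NetworkInterfaces": [{"DeviceIndex": 0, "SubnetId": subnet_id}],
        "UserData": base64.b64encode(USER_DATA.encode()).decode(),
    }

def create_launch_template(name: str, instance_type: str, subnet_id: str):
    """Register the template so the Auto Scaling group can launch from it."""
    import boto3  # AWS SDK for Python; only needed for the real API call
    ec2 = boto3.client("ec2")
    return ec2.create_launch_template(
        LaunchTemplateName=name,
        LaunchTemplateData=build_launch_template_data(instance_type, subnet_id),
    )
```

The key idea is that the user-data script pulls fresh code at boot, rather than baking it into a snapshot.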
We built a script which securely connected to SIM’s source code repositories, downloaded the code which runs their apps, and ran all the build scripts to get things going. Part of this involves an environment file, which tells the app things like where the database is, and how to connect. That’s one example of what we call a “secret”. It’s something you’d be upset about an attacker getting their hands on. AWS Secrets Manager to the rescue!
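Fetching a secret at boot and turning it into an environment file might look something like this sketch with boto3 – the secret name and keys here are illustrative, not SIM’s real ones:

```python
import json

def secret_to_env_lines(secret: dict) -> str:
    """Render a secret's key/value pairs as KEY=value lines for a .env file."""
    return "\n".join(f"{k}={v}" for k, v in secret.items()) + "\n"

def fetch_and_write_env(secret_name: str, path: str) -> None:
    """Fetch a JSON secret from AWS Secrets Manager and write it out as .env.

    Assumes the instance role allows secretsmanager:GetSecretValue, so no
    credentials ever need to live on the server itself.
    """
    import boto3  # AWS SDK; imported here so the helper above works without it
    client = boto3.client("secretsmanager")
    resp = client.get_secret_value(SecretId=secret_name)
    secret = json.loads(resp["SecretString"])
    with open(path, "w") as f:
        f.write(secret_to_env_lines(secret))

# Typical call from the boot script (hypothetical secret name):
# fetch_and_write_env("kwizly/production", "/opt/app/.env")
```

Because the launch template runs this at boot, every new server fetches its own fresh copy of the secrets, and rotating a password becomes a one-place change.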
Okay, so now we had the app up and running, a little more secure than it was, and much more scalable. But we had two problems left: one to do with user uploads, and one to do with the scheduled jobs that lots of applications run in the background.
Kwizly allows its users to upload images to their profiles, and as part of the quizzes they create with the app. But when your app is running in an Auto Scaling group, it must be stateless. When we say stateless, we mean that it shouldn’t matter which server answers a user’s request. So let’s say you log into Kwizly just as lots of other users have finished using it. The way we built the platform means that EC2 instances are automatically retired when demand slows down, which saves SIM money. But what if your app experience was being served by the server we’re about to retire? In that case, the load balancer moves you over to one of the other servers that’s staying around. That means every server needs access to all user uploads, all the time – just in case it’s asked to take over.
Would you believe, AWS have a solution for this too. It’s called Elastic File System, or EFS. EFS presents a pool of shared storage, and we connect it up to the EC2 instances. Tell the code running on the application servers to store uploads in EFS instead of on the server itself, and bingo bango bongo, you’ve got yourself a stateless application server… well, nearly.
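The application-side change is small: point uploads at the shared mount instead of the local disk. A sketch, assuming EFS is mounted at /mnt/efs on every instance (via the launch template) – the paths are hypothetical:

```python
from pathlib import Path

# Hypothetical mount point: the EFS filesystem is assumed to be mounted
# at /mnt/efs on every instance by the launch template's user data.
UPLOAD_ROOT = Path("/mnt/efs/uploads")

def save_upload(filename: str, data: bytes, root: Path = UPLOAD_ROOT) -> Path:
    """Write an upload to shared storage so every instance can serve it.

    Any server in the group can later read the same file back, which is
    what makes retiring individual servers safe.
    """
    root.mkdir(parents=True, exist_ok=True)
    dest = root / filename
    dest.write_bytes(data)
    return dest
```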
Remember I said servers sometimes have tasks which run periodically? They can be things like clearing temporary data (if you’ve ever heard of a cache, that’s all it is), or doing some hard work ahead of time so that when a user presses a button in the app, they see instant results. But sometimes it’s very important to know that a job runs exactly once. So if you have eight servers running at a really busy time, you wouldn’t want all eight to decide: ooh, it’s 8 o’clock, better go and have a tidy around! The way this works in the real world is that you put a padlock around the big power button of a dangerous machine you’re about to work on, and only you have the key. If someone else comes along and wants to use the machine, they physically can’t press the button until you come back and unlock it. That’s what we do for servers too!
ElastiCache is AWS’s managed Redis service, and it’s perfect for this kind of locking – as well as other things the app does in the background. Again, ElastiCache is connected to the servers via the launch template (so all future EC2 instances get connected too), you tell the code running on the servers about it, and off you go!
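The padlock trick itself is just Redis’s SET command with the NX (“only if not exists”) flag and an expiry. A sketch using the redis-py client – the cluster endpoint and job name below are made up:

```python
def run_once(client, job_name: str, job, ttl_seconds: int = 300) -> bool:
    """Run `job` only if this server wins the lock; return whether it ran.

    SET with nx=True succeeds only if the key doesn't exist yet, so
    exactly one server out of the group acquires the lock. The expiry
    (ex=) means a crashed server can't hold the padlock forever.
    """
    lock_key = f"lock:{job_name}"
    if not client.set(lock_key, "locked", nx=True, ex=ttl_seconds):
        return False  # another server got there first
    job()
    return True

# Typical usage on a real server (hypothetical endpoint and job):
# import redis
# client = redis.Redis(host="my-cluster.cache.amazonaws.com", port=6379)
# run_once(client, "clear-cache", clear_temporary_data)
```

Whichever of the eight servers hits 8 o’clock first takes the padlock; the other seven see the key already exists and carry on serving users.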
After some testing by Ian, we were ready to book the go-live! In testament to the thorough testing Ian and his team conducted, the go-live was a success, and now Kwizly is ready for everyone who wants a go!
Like all software projects, this one will keep changing, and we can always make improvements. We have already engaged with Ian and the SIM team on other projects, but that’s a story for another time.