Lewdlad is a Discord chat bot I created which, at the time of writing, orchestrates multiple AWS EC2 servers that run different game servers (Minecraft, Reflex, Hexxit, etc.).
The first version of Lewdlad was a Python script that would literally pick a random image from a set of red boards on 4chan and send it to a random person.
Eventually I turned it into a bot that simply sent the random image to whoever invoked the command.
Being hosted on AWS meant the server had _burst capacity_, which basically means the CPU can boost temporarily to handle harder workloads.
This is similar to a magic ability in most games: it uses up some MP and has to recharge.
While idling, a vanilla Minecraft server sits very close to the _burst limit_, where the CPU starts using its _burst capacity_.
To make sure it didn't burst needlessly, I put Lewdlad on the same server as the Minecraft game files and put a `start-minecraft.sh` script behind a command for the bot to use.
The advantage of the structure above is that only directories containing that `discord-bot.json` configuration file would ever be picked up by the bot.
It was also really easy to set up, since you only needed a `start.sh` script and, optionally, a `stop.sh` script, because Lewdlad had its own _nuke-all-the-things-function_ in its back-end.
This meant a minimal configuration could look like:
```
{
  "name": "some server",
  "id": "name of parent directory",
  "script": "start.sh",
  // These below are added by the bot
  "pid": <processidhere>,
  "active": true|false
}
```
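
To make the role of the bot-added fields concrete, here is a minimal sketch of how a start command could launch the script and write `pid` and `active` back. This is not Lewdlad's actual code: `start_server` is a hypothetical helper, and it assumes the real file is plain JSON (no comments) sitting next to the script it names.

```python
import json
import subprocess
from pathlib import Path

CONFIG_NAME = "discord-bot.json"


def start_server(server_dir):
    """Run the configured start script and record its pid in the config."""
    server_dir = Path(server_dir)
    config_path = server_dir / CONFIG_NAME
    config = json.loads(config_path.read_text())

    # Assumption: "script" is a path relative to the server directory.
    process = subprocess.Popen(["sh", config["script"]], cwd=server_dir)

    # These are the two fields the bot adds on its own.
    config["pid"] = process.pid
    config["active"] = True
    config_path.write_text(json.dumps(config, indent=2))
    return process
```

A matching stop command would do the reverse: run `stop.sh` if it exists (or fall back to killing the recorded `pid`) and set `active` back to `false`.
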
Adding crash safety is trivially easy as recovering is a matter of checking configuration files and determining which are _falsely active_.
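
As a rough illustration of that recovery step, the sketch below scans a root directory for `discord-bot.json` files and resets any that claim to be active while their recorded process no longer exists. The names `recover` and `pid_is_alive` are hypothetical, the layout (one sub-directory per server) is an assumption, and the liveness check is POSIX-only.

```python
import json
import os
from pathlib import Path

CONFIG_NAME = "discord-bot.json"


def pid_is_alive(pid):
    """Return True if a process with this pid currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def recover(root):
    """Reset configs marked active whose process is gone; return their ids."""
    fixed = []
    for config_path in sorted(Path(root).glob(f"*/{CONFIG_NAME}")):
        config = json.loads(config_path.read_text())
        pid = config.get("pid")
        # Guard against missing or non-positive pids before calling os.kill.
        alive = isinstance(pid, int) and pid > 0 and pid_is_alive(pid)
        if config.get("active") and not alive:
            config["active"] = False
            config.pop("pid", None)
            config_path.write_text(json.dumps(config, indent=2))
            fixed.append(config_path.parent.name)
    return fixed
```

One caveat with this approach: the OS can reuse pids, so a stale `pid` might point at an unrelated process after a reboot. Checking the process name as well would make it sturdier.
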
* Maintenance requires more planning since there are more _moving parts_ than with a single server
* Bot commands are way slower because:
  * Lewdlad has to go through an API Gateway
  * That gateway then has to talk to Lambda
  * Finally, Lambda has to (probably) go through several internal AWS services to reach the EC2 instances
* Game services are located literally everywhere (Oregon, L.A., Chicago, etc.)
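
To give a feel for the Lambda hop in that chain, here is a minimal sketch of a handler that takes an API Gateway proxy event and starts or stops an instance. It is only an illustration under assumptions: the request body shape (`server`, `action`), the `INSTANCES` mapping, and the instance id are all made up, and I'm assuming a Python runtime with `boto3`. The EC2 client is passed in so the logic can be tried without AWS credentials.

```python
import json

# Hypothetical mapping from a game name to its EC2 instance id.
INSTANCES = {"minecraft": "i-0123456789abcdef0"}


def handle_command(event, ec2):
    """Start or stop the instance named in an API Gateway proxy event."""
    body = json.loads(event.get("body") or "{}")
    server = body.get("server")
    action = body.get("action")
    instance_id = INSTANCES.get(server)

    if instance_id is None or action not in ("start", "stop"):
        return {"statusCode": 400, "body": json.dumps({"error": "bad request"})}

    if action == "start":
        ec2.start_instances(InstanceIds=[instance_id])
    else:
        ec2.stop_instances(InstanceIds=[instance_id])
    return {"statusCode": 200, "body": json.dumps({"server": server, "action": action})}


def lambda_handler(event, context):
    """Entry point Lambda would call; builds the real EC2 client."""
    import boto3  # imported here so handle_command stays testable offline

    return handle_command(event, boto3.client("ec2"))
```

Every bot command pays for each hop in this path (Discord to the bot, the bot to API Gateway, API Gateway to Lambda, Lambda to the EC2 API), which is where the extra latency comes from.
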
With the tech out of the way, here's the _real_ reason why I did this.
The advantages of Lightsail during the "on-season" periods of high traffic are not worth the cost during the "off-season", when there's no traffic.
This new setup lets me have instances that can be elastically started and stopped based on their usage, which in turn reduces the monthly cost.
Regular stoppages can also be configured either from Lambda's side or from an EC2 instance, if we don't want to risk going over the 1000-request quota.
Honestly, this should never happen, since it would require something like 30 Lambda-based commands to be issued _every day_ for a month straight, which isn't likely given how often commands were issued before the migration.
At most I would get something like 10 commands a day, but that was never consistent and only happened a few times a month, like on weekends.
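
The back-of-the-envelope numbers work out as below. The 30-day month and the figure of 8 busy days (roughly every weekend day) are my own assumptions for a pessimistic estimate; only the 1000-request quota and the 10-commands-a-day peak come from above.

```python
MONTHLY_QUOTA = 1000         # requests per month, from the quota above
DAYS_PER_MONTH = 30          # assumption: a 30-day month

# How many commands per day it would take to exhaust the quota.
daily_budget = MONTHLY_QUOTA / DAYS_PER_MONTH

# Pessimistic usage: the 10-command peak on every weekend day.
peak_commands_per_day = 10
busy_days_per_month = 8      # assumption: about 4 weekends
worst_case_usage = peak_commands_per_day * busy_days_per_month

print(f"daily budget: {daily_budget:.1f} commands")
print(f"worst case: {worst_case_usage} of {MONTHLY_QUOTA} requests "
      f"({worst_case_usage / MONTHLY_QUOTA:.0%} of the quota)")
```

Even with that generous estimate, usage stays at a small fraction of the quota.
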
There is one thing I didn't mention, which is turning off the server given low CPU usage.
This could probably be fixed by just installing a script on each game server to occasionally log CPU time and check if the server has been "dead" for some time, prompting a shutdown to conserve cost.
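
A minimal sketch of such a script might look like the following. It assumes a Linux host (it reads `/proc/stat`), and the threshold, sample count, interval, and the `sudo shutdown -h now` command are all placeholder choices rather than anything I've deployed.

```python
import subprocess
import time


def read_cpu_times(path="/proc/stat"):
    """Return (busy, total) CPU ticks from the first line of /proc/stat."""
    with open(path) as f:
        # Format: cpu user nice system idle iowait irq softirq steal ...
        values = [int(v) for v in f.readline().split()[1:9]]
    idle = values[3] + values[4]  # idle + iowait
    total = sum(values)
    return total - idle, total


def cpu_usage(prev, curr):
    """Fraction of CPU time spent busy between two (busy, total) samples."""
    busy = curr[0] - prev[0]
    total = curr[1] - prev[1]
    return busy / total if total > 0 else 0.0


def is_dead(usages, threshold=0.05, needed=6):
    """True when the last `needed` samples are all below `threshold`."""
    recent = usages[-needed:]
    return len(recent) == needed and all(u < threshold for u in recent)


def monitor(interval=300, shutdown=None):
    """Sample CPU usage every `interval` seconds; shut down once dead."""
    if shutdown is None:
        # Placeholder command; needs passwordless sudo for the bot's user.
        shutdown = lambda: subprocess.run(["sudo", "shutdown", "-h", "now"])
    usages = []
    prev = read_cpu_times()
    while not is_dead(usages):
        time.sleep(interval)
        curr = read_cpu_times()
        usages.append(cpu_usage(prev, curr))
        prev = curr
    shutdown()
```

With these defaults the server would need 30 quiet minutes before shutting down. Since an idle game server can still use a fair amount of CPU, the threshold would need tuning per game, and counting connected players might turn out to be a better "dead" signal.
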
There's probably more that could be done, but at my current scale anything more feels too much like over-engineering, so I'll keep the architecture the way it is for now.