

title date draft
Lewdlad: The Little Chronjob that could 2021-07-25 false

Lewdlad: the little Chronjob that could

What is "Lewdlad"

Lewdlad is a Discord chat bot I created that (at the time of writing) orchestrates multiple AWS EC2 instances running different game servers (Minecraft, Reflex, Hexxit, etc.).

Some history

The first version of Lewdlad was a Python script that would literally pick a random image from a set of red boards on 4chan and send it to a random person. Eventually I turned it into a bot which just sent the random image to whoever invoked the command.

After some time came the Hanime module, which came from a joke of "I wish Lewdlad would send some fire hentai". After some research I realized querying the site wasn't going to be easy, since there wasn't a nicely made public API. So I opened up the site and did some reverse engineering to figure out how to spoof a browser request and get some results. Some reversing and trial-and-error later, I had a working request script which I could hook up to the bot. I slapped it all behind a command with some basic arguments and, just like that, .pron was born.

The beginnings of the server manager

Around this time I started hosting a Minecraft server for friends to play on. Being hosted on AWS meant the server had burst capacity, which basically means the CPU can boost temporarily to handle harder workloads. It works like a magic ability in most games: it uses up some MP and has to recharge. Even while idling, a vanilla Minecraft server sits very close to the limit where the CPU starts using its burst capability. To make sure it didn't burst needlessly, I only started a game server when the bot was asked to, which is why I gave Lewdlad a hook to run a start.sh and stop.sh script for each configured game. This behavior was then put behind a couple of Discord commands.
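A hook like that can be sketched in a few lines of Python. This is not Lewdlad's actual code: the `run_hook` name and the choice to invoke scripts through `sh` are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def run_hook(game_dir, script="start.sh"):
    """Launch a game's hook script from inside its directory; None if it is missing."""
    game_dir = Path(game_dir)
    if not (game_dir / script).is_file():
        return None  # e.g. a game that ships no stop.sh
    # Assumption: run through "sh" so the script needs no executable bit.
    return subprocess.Popen(["sh", script], cwd=game_dir)
```

The returned `Popen` object exposes the process id as `.pid`, which is the piece of information a bot would need to keep around in order to stop the server later.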

The architecture thus far ended up looking like this:

Lewdlad/
	<code and things>
Games/
	Minecraft/
		discord-bot.json
		start.sh
		<game files>
	ReflexArena/
		discord-bot.json
		start.sh
		stop.sh
		<game files>
	CS:Source/
		<game files>

The advantage of this structure is that only directories containing a discord-bot.json configuration file are ever picked up by the bot. It was also really easy to set up, since you only needed a start.sh script and, optionally, a stop.sh script: Lewdlad had its own nuke-all-the-things function in its back end. This meant a minimal configuration could look like

{
	"name": "some server",
	"id": "name of parent directory",
	"script": "start.sh",
	// These below are added by the bot
	"pid": <process id here>,
	"active": true|false
}
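To make the "only directories with a discord-bot.json get picked up" rule concrete, here is a minimal sketch of the discovery step. The function name `discover_games` and the default values are hypothetical, and it assumes the config files on disk are strict JSON (no comments or placeholders like the annotated example above).

```python
import json
from pathlib import Path

CONFIG_NAME = "discord-bot.json"  # file name taken from the post


def discover_games(games_root):
    """Return {directory name: config} for every sub-directory holding a config file."""
    games = {}
    for child in sorted(Path(games_root).iterdir()):
        config_path = child / CONFIG_NAME
        if not child.is_dir() or not config_path.is_file():
            continue  # e.g. CS:Source/ has no config, so it is ignored
        config = json.loads(config_path.read_text())
        # Hypothetical defaults for the fields the bot fills in later.
        config.setdefault("pid", None)
        config.setdefault("active", False)
        games[child.name] = config
    return games
```

Calling `discover_games("Games")` on the tree shown earlier would return entries for Minecraft and ReflexArena only.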

Adding crash safety is trivially easy, since recovering is a matter of checking the configuration files and determining which ones are falsely marked active.
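A rough sketch of that recovery check, assuming the bot runs on a POSIX host and that each config is a dict with the "pid" and "active" fields described above. The helper names are made up for this example.

```python
import os


def pid_is_alive(pid):
    """Best-effort check that a process with this pid exists (POSIX only)."""
    if not isinstance(pid, int) or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def recover(configs):
    """Reset configs marked active whose process is gone; return their ids."""
    stale = []
    for game_id, config in configs.items():
        if config.get("active") and not pid_is_alive(config.get("pid")):
            config["active"] = False
            config["pid"] = None
            stale.append(game_id)
    return stale
```

One caveat with this approach: the operating system can reuse pids, so a stale pid may occasionally point at an unrelated live process. For a handful of game servers that risk is probably acceptable.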

A new Architecture

The new architecture has a few goals in mind:

  • Reduce operating costs

  • Reduce impact of game failure

    • learned the hard way how badly Wine would throttle everything
  • More flexibility

With this new architecture:

  • Lewdlad lives on its own virtual private server

  • Each game server is now its own EC2 instance

    • Basically just an EC2 cluster
  • Loggerlad is whatever I decide to use for centralized logging

Pros

  • Only get charged for EC2 instances that are actually live/running

  • Elastic IPs are cheap as hell per month

  • Lots of free logging services that I could technically even host myself.

Cons

  • More complexity, more problems

  • Maintenance requires more planning since there are more moving parts than a single server

  • Bot commands are way slower because:

    • Lewdlad has to go through an API Gateway

    • That gateway then has to talk to Lambda

    • Finally Lambda has to (probably) go through several internal AWS services to reach the EC2 instances

    • Game services are located literally everywhere (Oregon, L.A., Chicago, etc.)
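The Lambda end of that chain might look something like the sketch below. The post doesn't show the real function, so everything here is an assumption: the event shape follows API Gateway's proxy integration (a JSON string under "body"), the `INSTANCES` mapping and the instance id are placeholders, and the EC2 client is passed in so the handler can be exercised without AWS credentials. The `start_instances` and `stop_instances` calls are the boto3 EC2 client methods.

```python
import json

# Hypothetical mapping from game id to EC2 instance id; real ids would differ.
INSTANCES = {"minecraft": "i-0123456789abcdef0"}


def make_handler(ec2):
    """Build a Lambda-style handler around an EC2 client (boto3 or a test fake)."""

    def handler(event, context=None):
        body = json.loads(event.get("body") or "{}")
        instance_id = INSTANCES.get(body.get("game"))
        action = body.get("action")
        if instance_id is None or action not in ("start", "stop"):
            return {"statusCode": 400, "body": json.dumps({"error": "bad request"})}
        if action == "start":
            ec2.start_instances(InstanceIds=[instance_id])
        else:
            ec2.stop_instances(InstanceIds=[instance_id])
        return {"statusCode": 200, "body": json.dumps({"game": body["game"], "action": action})}

    return handler
```

Inside Lambda this would be wired up as `handler = make_handler(boto3.client("ec2"))`. Since the game servers sit in different regions, a real version would likely need one client per region.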

With the tech out of the way, here's the real reason why I did this. The advantages of Lightsail during the "on-season" periods of high traffic are not worth the cost during the "off-season" when there's no traffic. This new setup lets me have instances that can be elastically started and stopped based on their usage, which in turn reduces the monthly cost.

Regular stoppages can also be configured either from Lambda's side or from an EC2 instance, if we don't want to risk going over the 1000-request quota. Honestly, this should never happen, since it would take something like 30 Lambda-based commands every day for a month straight, which isn't likely given how often commands were issued before the migration. At most I would get something like 10 commands in a day, but it was never consistent and only happened a few times a month, like on weekends.
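The back-of-the-envelope numbers behind that, assuming a 30-day month (the quota figure is the one quoted above):

```python
QUOTA = 1000         # request quota quoted in the post
DAYS_IN_MONTH = 30   # assumption: a 30-day month

# Commands per day, every day, needed to exhaust the quota.
daily_budget = QUOTA / DAYS_IN_MONTH

# Worst case from before the migration, if the busiest day repeated all month.
busy_month = 10 * DAYS_IN_MONTH
```

Even if the busiest observed day (about 10 commands) happened every single day, `busy_month` works out to 300 requests, well under the quota.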

Closing Thoughts and Remarks

There is one thing I didn't mention, which is turning off a server when CPU usage is low. This could probably be handled by installing a script on each game server that occasionally logs CPU time and checks whether the server has been "dead" for some time, prompting a shutdown to save on cost.
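One way such a script could decide that a server is "dead" is sketched below. It swaps raw CPU time for the 1-minute load average, which is simpler to read from Python, and the threshold and window size are arbitrary values picked for illustration rather than tuned numbers.

```python
import os
from collections import deque


class IdleWatchdog:
    """Track recent load samples and report when all of them are below a threshold."""

    def __init__(self, threshold=0.05, samples_needed=6):
        self.threshold = threshold
        self.samples = deque(maxlen=samples_needed)

    def record(self, load):
        """Store one sample; return True once the window is full and entirely idle."""
        self.samples.append(load)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s < self.threshold for s in self.samples)


def current_load():
    """1-minute load average divided by CPU count (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)
```

A long-running loop would call `record(current_load())` every few minutes and, once it returns True, trigger whatever shutdown path the instance uses. A single busy sample pushes an idle one out of the window, so the server has to stay quiet for the whole window before it is considered dead.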

There's probably more that could be done, but at my current scale anything more feels too much like over-engineering, so I'll basically keep the architecture the way it is now.