
League of Lenses

Computer Vision
AWS

I developed a tool that helps content creators easily edit landscape videos into vertical videos.

League of Lenses

Introduction

The input to the program is a landscape video (16:9 aspect ratio) of League of Legends gameplay, preferably under a minute due to the AWS Lambda 3008 MB memory constraint. The output is a vertical video (9:16 aspect ratio).

Example:

Default Center Crop

League of Lenses Output

Motivation

I had lots of clips saved up and wanted to upload them as YouTube Shorts, but I realized that converting from landscape to vertical format loses information, particularly along the X-axis. The camera (the player’s FOV) isn’t always centered on the player or on the action. Yet the viewer is really only interested in the action, and the clip maker (me) wants every frame to show the meat of the gameplay.

Solution

To tackle the challenge of focusing the camera where the enemies are, one obvious but computationally expensive solution would be to train a computer vision model to detect League of Legends characters on screen and output their locations. Perhaps an auto-regressive encoder-decoder architecture could handle variable output lengths. However, this approach uses far too much compute for such a simple problem.

Instead of tracking enemies on the player screen and finding a bounding box that includes most enemies while respecting vertical format constraints, we can simplify the problem by analyzing the minimap and reprojecting that information back to the player screen dimensions. On the minimap, we can determine both what the player is currently seeing (through contour detection and thresholding) and where enemies are located (through hand-crafted blob detection) relative to the current screen. This turns out to be all we really need.

Step 1: Locate the Minimap

We only need to find the minimap location once, using the first frame(s) of gameplay. This is done using SIFT (Scale-Invariant Feature Transform), which is ideal because the minimap can vary in size across different resolutions. To handle different map appearances, I only use the top-left corner of the map for SIFT matching, as that area remains consistent across all map skins. Since the minimap is always in the bottom-right corner, once the top-left corner is found, the minimap area is simply

minimap_area = frame[minimap_top_left_y:, minimap_top_left_x:]

All subsequent processing operates only on this cropped minimap region.
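
A minimal sketch of this step with OpenCV, assuming a saved reference crop of the minimap’s top-left corner; the filenames and the median-offset estimate are illustrative, not the exact implementation:

import cv2
import numpy as np

# Reference crop of the minimap's top-left corner and the first frame of gameplay.
# Both filenames are placeholders.
template = cv2.imread("minimap_corner.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("first_frame.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(template, None)
kp_f, des_f = sift.detectAndCompute(frame, None)

# Keep only distinctive matches using Lowe's ratio test.
matches = cv2.BFMatcher().knnMatch(des_t, des_f, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Each match votes for where the minimap's top-left corner sits in the frame:
# the keypoint's frame location minus its location inside the template crop.
offsets = [np.subtract(kp_f[m.trainIdx].pt, kp_t[m.queryIdx].pt) for m in good]
minimap_top_left_x, minimap_top_left_y = np.median(offsets, axis=0).astype(int)

# The minimap sits in the bottom-right, so everything below/right of this point is the minimap.
minimap_area = frame[minimap_top_left_y:, minimap_top_left_x:]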

Step 2: Find the Player’s Field of View

Next, we need to find the area of the minimap that represents what the player currently sees, since this is the domain we’ll reproject our bounding box back to. This is done by thresholding the minimap and finding the white rectangle that indicates the player’s FOV. Initially, I restricted contours to those with exactly 4 corners (rectangular shapes), but champion icons overlapping the white rectangle made this approach unreliable. Instead, gating by the contour’s area using prior knowledge about expected FOV size proved more effective.
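A rough sketch of that gating step; the threshold and area bounds below are illustrative placeholders, not the values used in the application:

import cv2

def find_player_fov(minimap_bgr, min_area=400, max_area=6000):
    # Threshold bright pixels so the white FOV rectangle stands out.
    gray = cv2.cvtColor(minimap_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Gate by area instead of corner count, since champion icons can clip the rectangle.
    candidates = [c for c in contours if min_area < cv2.contourArea(c) < max_area]
    if not candidates:
        return None
    return cv2.boundingRect(max(candidates, key=cv2.contourArea))  # (x, y, w, h)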

Step 3: Detect Enemies

We look for red circles (or orange in colorblind mode, though this isn’t fully supported yet) within the player FOV on the minimap. We find the centers of these circles and take the average of their X positions. We don’t need to consider Y positions because the vertical format includes all vertical content once the horizontal bounds are set.
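A sketch of the blob-detection idea using an HSV color mask; the red ranges are illustrative and would need tuning against the actual minimap colors:

import cv2

def average_enemy_x(fov_bgr):
    hsv = cv2.cvtColor(fov_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so combine two ranges.
    mask = cv2.inRange(hsv, (0, 120, 120), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 120, 120), (180, 255, 255))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    xs = []
    for c in contours:
        m = cv2.moments(c)
        if m["m00"] > 0:
            xs.append(m["m10"] / m["m00"])  # centroid X of each red blob
    return sum(xs) / len(xs) if xs else None  # average X, or None if no enemies visible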

Step 4: Update Camera Pan

We update the pan based on the last camera starting location, accounting for the minimap-to-screen ratio and applying smoothing to ensure natural movement. We then reproject this position back to frame dimensions (1920x1080) to determine the next camera starting location. Each frame is then cropped from this starting location out to a set width.

This width is computed from the original frame size. If the original landscape frame is 1920x1080, the vertical frame becomes 1080x1920. Since the Y dimension grows by almost double, the visible width is reduced to approximately 500 pixels (calculated precisely in the application). This Y-axis expansion creates a zoom-in effect, focusing on the action.
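A minimal sketch of that update, assuming the enemies’ average X and the FOV rectangle on the minimap are already known; the parameter names, the 500-pixel crop width, and the smoothing factor are illustrative rather than the exact values computed by the application:

def next_camera_left(prev_left, enemy_avg_x, fov_x, fov_w,
                     frame_w=1920, crop_w=500, smoothing=0.2):
    # Fraction of the way across the player's FOV where the action sits.
    rel = (enemy_avg_x - fov_x) / fov_w
    # Reproject to frame coordinates: the FOV rectangle maps onto the full frame width.
    action_x = rel * frame_w
    # Center the vertical crop on the action, clamped to the frame bounds.
    target_left = min(max(action_x - crop_w / 2, 0), frame_w - crop_w)
    # Exponential smoothing so the camera pans instead of jumping.
    return prev_left + smoothing * (target_left - prev_left)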

Improvements

  • Frame sampling: You don’t need to process every frame; checking every N frames is sufficient.

  • Look-ahead: Since the camera pans gradually, you want to look several frames ahead to anticipate future action and pan the camera proactively. Both ideas are sketched after this list.
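
A rough sketch of how both improvements could fit together; action_x_at stands in for the per-frame enemy detection above, and all parameter values are illustrative:

def plan_pan(num_frames, action_x_at, sample_every=5, look_ahead=30, smoothing=0.2):
    camera_x, plan = 0.0, []
    for i in range(0, num_frames, sample_every):
        # Aim at where the action will be, not where it is now.
        target = action_x_at(min(i + look_ahead, num_frames - 1))
        if target is not None:
            camera_x += smoothing * (target - camera_x)  # smoothed pan toward the target
        plan.append((i, camera_x))  # camera start X to use from frame i onward
    return plan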

AWS Related Implementation Details

  • AWS SAM to deploy the application. This involved using ECR to store container images, but only recently did I find out that once a Lambda function has been created from a container image, you can safely delete the image. This reduced my monthly cost from 10 cents to 5 cents!
  • AWS S3 to store user-uploaded files and processed files. These are automatically cleaned up after 3 days.
  • AWS SQS for Lambda functions to notify when they are done, and for clients to poll for progress (a minimal polling sketch follows this list).
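
On the client side, the polling loop is roughly shaped like the sketch below; the queue URL and message body are placeholders, not the app’s actual contract:

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.<region>.amazonaws.com/<account-id>/<progress-queue>"

def wait_for_result():
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)  # long polling
        for msg in resp.get("Messages", []):
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            return msg["Body"]  # e.g. the S3 key of the processed vertical video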