Database 101: How to Model Leaderboards for 1M Player's Game.
Daniel Reis

Daniel Reis @danielhe4rt

About: A Developer who likes to teach.

Location:
São Paulo
Joined:
Sep 13, 2020

Database 101: How to Model Leaderboards for 1M Player's Game.

Publish Date: Jan 29 '24
449 41

Ever wondered how a game like League of Legends, Fortnite, or even Rockband models its leaderboards? In this article, we'll find out how to properly model a schema to handle them in a monstrously performative way!

If you’re just getting started with databases in general or databases, you might want to start off by reading my initial post, Database 101: Data Consistency for Beginners for Beginners. That article captures my own exploration of how many database paradigms exist as I look far beyond my previous experience with just SQL and MySQL. I’m keeping track of my studies in this Database 101 series.

It has been almost a year since I published my first article in this series! Thank you for being here with me as I learn this subject. Your comments and thoughts are always very helpful!

1. Prologue

YARG Game Playthrough ScreenShot

Ever since I was a kid, like most average developers, I've been fascinated by games and how they're made. Speaking of which, let me introduce you to my favorite childhood game: Guitar Hero 3: Legends of Rock.

Well, more than a decade later, I decided to try to contribute with some games in the open source environment, like rust-ro (Rust Ragnarok Emulator) and also the main character of this article: the YARG (Yet Another Rhythm Game).

YARG is literally another rhythm game, but the difference about this project is that it is completely open source and they united legendary contributors in game development and also design to make this project works.

Suddenly the game was being picked up and played mostly by Guitar Hero/Rockband streamers on Twitch, and I thought: well, it's an open source project, so maybe I can use my database skills for something like creating a monstrously fast leaderboard or store the past games.

It started as a simple chat on their Discord, which turned into a long discussion about how to make this project grow faster.

Then I decided to talk to my boss and ask him if I could work with the YARG guys and the condition was to create something cool enough to implement ScyllaDB (NoSQL Wide-column Database) since I'm working as a Developer Advocate there. You won't believe how the simplicity and scalability brought by ScyllaDB perfectly fit the needs of YARG.in!

Anyway, talk is cheap. Let me show you some code and concepts!

2. QDD - Query Driven Data Modeling

NoSQL vs Relational

When we're talking about develop with NoSQL, mostly we should understand that depending on the paradigm (document, graph, wide-column etc) you should first understand which query you want to run.

While in MySQL the main goal is to understand the consistency, in Scylla you should focus on the query and create your schema based on that query.

At this project, we will handle two types of paradigm, which are:

  • Key-Value
  • Wide Column (Clusterization)

Now let's talk about the queries/features of our modeling.

2.1 Feature: Storing the matches

Submission Details YARG

Every time that you finish a YARG gameplay, the most interest thing would be submitting your scores together with many other in-game metrics.

Basically it will be a single query based in a main index and that's all.

SELECT score, stars, missed_notes, instrument, ...  
FROM leaderboard.submisisons  
WHERE submission_id = 'some-uuid-here-omg'
Enter fullscreen mode Exit fullscreen mode

2.2 Feature: Leaderboard

Leaderboard Figma File

And now our main goal: a super cool leaderboard that you don't need to care about after a good data modeling. The leaderboard is per song, so every time you play a specific song, your best score will be saved and ranked.

However there's a big point on this interface, which is having filters to know exactly "which" leaderboard to bring:

  • song_id: required
  • instrument: required
  • modifiers: required
  • difficulty: required
  • player_id: optional
  • score: optional

So imagine our query looks like this, and it returns the results sorted by score in descending order:

SELECT 
    player_id, score, ...
FROM 
    leaderboard.song_leaderboard 
WHERE 
    instrument = 'guitar' AND
    difficulty = 'expert' AND
    modifiers = {'none'} AND
    track_id = 'dani-california' 
LIMIT 
    100;

--   player_id  | score
----------------+-------
--        tzach | 12000
--  danielhe4rt | 10000
--     kadoodle |  9999
----------------+-------
Enter fullscreen mode Exit fullscreen mode

Now we know the features that will be worked here, but can you already imagine how will be the final schema?

No? Ok, lemme help you on that!

3. Data Modeling time!

Time to take a deep dive into data modeling with ScyllaDB and better understand how to scale it.

3.1 - Matches Modeling

End Game Screen

First, let us understand a little more about the game itself:

  • It's a rhythm game;
  • You play a certain song at a time;
  • You can activate "modifiers" to make your life easier or harder before the game;
  • You have to choose an instrument (e.g. guitar, drums, bass and microphone).
  • Every aspect of the gameplay is tracked, such as:
    • Score;
    • Missed notes;
    • Overdrive count;
    • Play speed (1.5x ~ 1.0x);
    • Date/time of gameplay;
    • And other cool stuff.

Thinking on that, we can easily start our data modeling, which will turn in something like this:

CREATE TABLE IF NOT EXISTS leaderboard.submissions (
    submission_id uuid,
    track_id text,
    player_id text,
    modifiers frozen<set<text>>,
    score int,
    difficulty text,
    instrument text,
    stars int,
    accuracy_percentage float,
    missed_count int,
    ghost_notes_count int,
    max_combo_count int,
    overdrive_count int,
    speed int,
    played_at timestamp,
    PRIMARY KEY (submission_id, played_at)
);
Enter fullscreen mode Exit fullscreen mode

Let's skip all the int/text values and jump to the set<text>.

The set type allows you to store a list of items of a particular type. I decided to use this list to store the modifiers because it's a perfect fit. Look at how the queries are executed:

INSERT INTO leaderboard.submissions (
    submission_id, 
    track_id,
    modifiers, 
    played_at
) VALUES (
    some-cool-uuid-here,
    'starlight-muse'
    {'all-taps', 'hell-mode', 'no-hopos'},
    '2024-01-01 00:00:00'
);
Enter fullscreen mode Exit fullscreen mode

With this type, you can easily store a list of items to retrieve later.

Another cool piece of information is that this query is a key-value like! What does that mean?

Since you will always query it by the submission_id only, it can be categorized as a key-value.

3.2 Leaderboard Modeling

Leaderboard filters Figma

This is the part of the article where you will learn some cool wide-column database concepts.

On our leaderboard query, as mentioned earlier, we will always need some dynamic values in the WHERE clauses, which means that these values will belong to the Partition Key while the Clustering Keys will have value that can be "optional".

A partition key is a hash based on a combination of fields that you added to identify a value. Did you get it? No? Well, it took me a while to understand that too, but let me show you something:

Let's imagine that you played Starlight - Muse 100x times. If you would query this information, would be 100x different results differentiated by Clustering Keys like score or player_id.

SELECT 
    player_id, score ---
FROM 
    leaderboard.song_leaderboard 
WHERE 
    track_id = 'starlight-muse' 
LIMIT 
    100;
Enter fullscreen mode Exit fullscreen mode

If 1.000.000 players play this song, your query will become slow and it will become a problem in the future, because your partition key consists of only one field, which is track_id.

However, if you add more fields to your Partition Key, like mandatory things before playing the game, maybe we can shrink these possibilities for a faster query. Now you see the big picture? Adding the fields like: Instrument, Difficulty and Modifiers will give you a way to split the information about that specific track evenly.

Let's imagine with some simple numbers:


-- Query Partition ID: '1'
SELECT 
    player_id, score, ...
FROM 
    leaderboard.song_leaderboard 
WHERE 
    instrument = 'guitar' AND
    difficulty = 'expert' AND 
    modifiers = {'none'} AND -- Modifiers Changed
    track_id = 'starlight-muse' 
LIMIT 
    100;

-- Query Partition ID: '2'
SELECT 
    player_id, score, ...
FROM 
    leaderboard.song_leaderboard 
WHERE 
    instrument = 'guitar' AND
    difficulty = 'expert' AND
    modifiers = {'all-hopos'} AND -- Modifiers Changed
    track_id = 'starlight-muse' 
LIMIT 
    100;
Enter fullscreen mode Exit fullscreen mode

So, if you build the query in a specific shape it will always look for a specific token and retrieve the data based on these specifics Partition Keys.

Let's take a look at the final modeling and talk about the clustering keys and the application layer:

CREATE TABLE IF NOT EXISTS leaderboard.song_leaderboard (
    submission_id uuid,
    track_id text,
    player_id text,
    modifiers frozen<set<text>>,
    score int,
    difficulty text,
    instrument text,
    stars int,
    accuracy_percentage float,
    missed_count int,
    ghost_notes_count int,
    max_combo_count int,
    overdrive_count int,
    speed int,
    played_at timestamp,
    PRIMARY KEY ((track_id, modifiers, difficulty, instrument), score, player_id)
) WITH CLUSTERING ORDER BY (score DESC, player_id ASC);
Enter fullscreen mode Exit fullscreen mode

The partition key was defined as mentioned above, consisting of our REQUIRED PARAMETERS such as: track_id, modifiers, difficulty and instrument. And on the Clustering Keys we added score and player_id.

Note that by default the clustering fields are ordered by score DESC and just in case a player has the same score, the criteria to choose the winner will be alphabetical ¯\(ツ)/¯.

First is good to understand that we will have only ONE SCORE PER PLAYER, but with this modeling like that if the player goes through the same track twice in different scores, it will generate two different entries.

INSERT INTO leaderboard.song_leaderboard  (
    track_id, 
    player_id,
    modifiers,
    score,
    difficulty,
    instrument,
    stars, 
    played_at
) VALUES (
    'starlight-muse',
    'daniel-reis', 
    {'none'}, 
    133700, 
    'expert', 
    'guitar', 
    '2023-11-23 00:00:00'
);

INSERT INTO leaderboard.song_leaderboard (
    track_id,
    player_id,
    modifiers,
    score,
    difficulty,
    instrument,
    stars, 
    played_at
) VALUES (
    'starlight-muse',
    'daniel-reis', 
    {'none'}, 
    123700, 
    'expert', 
    'guitar', 
    '2023-11-23 00:00:00'
);


SELECT 
    player_id, score
FROM 
    leaderboard.song_leaderboard 
WHERE 
    instrument = 'guitar' AND
    difficulty = 'expert' AND
    modifiers = {'none'} AND
    track_id = 'starlight-muse' 
LIMIT 
    2;

--   player_id  | score
----------------+-------
--  daniel-reis | 133700
--  daniel-reis | 123700
----------------+-------
Enter fullscreen mode Exit fullscreen mode

So how do we fix this problem? Well, it's not a problem per se. It's a feature! LOL

As a developer, you have to create your own business rules based on the project needs, and this is no different. What do I mean by that?

You can run a simple DELETE query before insert the new entry and guarantee that you will not have a specific data from the player_id with less than the new score inside that specific group of partition keys.

-- Before Insert the new Gampleplay

DELETE FROM 
    leaderboard.song_leaderboard 
WHERE 
    instrument = 'guitar' AND
    difficulty = 'expert' AND
    modifiers = {'none'} AND
    track_id = 'starlight-muse' AND
    player_id = 'daniel-reis' AND
    score <= 'your-new-score-here';

-- Now you can insert the new payload...
Enter fullscreen mode Exit fullscreen mode

And with that we finished our simple leaderboard system, the same one that runs in YARG and can also be used in games with MILLIONS of entries per second :D

4. How to Contribute to YARG

Here's the part of the text that I'll invite you to contribute on this wonderful open source project!

Today we're building a brand new platform for all the players using:

and we will need many developers and testers as possible to discuss future implementations of the game together with the main contributors!

YARG Discord

First, make sure to join their Discord Community. There is the place where all the technical discussions happen with the back of the community before going to the development board.

Also, outside of Discord, the YARG community is mostly focused on the EliteAsian (core contributor and project owner) Twitter account for development showcases. Be sure to follow him there as well.

And FYI, the Lead Artist of the game, aka Kadu is also a Broadcast Specialist and Product Innovation Developer at Elgato that worked with streamers like:

  • Ninja
  • Nadeshot
  • StoneMountain64
  • and the legendary DJ Marshmello.

Kadu also use his twitter to share some insights and early previews of new features and experimentations for YARG. So, don't forget to follow him on Twitter as well!

Here is some useful links to know more about the project:

Fun fact: YARG got noticed by Brian Bright, project lead on Guitar Hero, who liked the fact that the project was open source. Awesome, right?

5. Conclusion

Data modeling is sometimes challenging, and this study took 3 months of many new ScyllaDB concepts and a lot of testing together with my community at Twitch.

I have also published a Gaming Leaderboard Demo, where you can get some insights on how to implement the same project using NextJS and ScyllaDB!

Also, if you liked ScyllaDB and want to learn more about, I strongly suggest you to watch our free Masterclass Courses or vising ScyllaDB University!

Don't forget to like this post, follow me on the socials and fill your water bottle xD

See you in the next article!

Follow me on Twitter
Follow me on Github
Follow me on Github
Follow and Subscribe at my Twitch Channel

Comments 41 total

  • Lucas Porfirio
    Lucas PorfirioJan 29, 2024

    Great article as always Dan!

  • Nicolas Battisti
    Nicolas BattistiJan 29, 2024

    Awesome post :D thanks for sharing it.

  • Nadachi
    NadachiJan 29, 2024

    Nice post

  • Anthony Vinicius
    Anthony ViniciusJan 29, 2024

    Wow Daniel, stop blowing my mind 🤯

  • Henrique Benjamim
    Henrique BenjamimJan 29, 2024

    Amazing!

  • thalesmengue
    thalesmengueJan 29, 2024

    My mind exploded several times while reading your content.

    Your writing is so well structured, congratulations Daniel, your article is great, I appreciate that you shared this content!

  • Vinicius Prado
    Vinicius PradoJan 29, 2024

    Great article Dan!

  • gui martins
    gui martinsJan 29, 2024

    irado

  • Cherry Ramatis
    Cherry RamatisJan 29, 2024

    My knowledge of databases is really beginner, but your point of view really opened my eyes, especially with this "narrowing the results of a query to get fewer and more specific results."

    I'm learning a lot from this database series. Thanks! ❤️❤️

  • Renato Teixeira
    Renato TeixeiraJan 29, 2024

    this is QUALITY!

  • Mario Bezerra
    Mario BezerraJan 29, 2024

    Thanks for sharing bro! Nice article!

  • Henrique Leme de Brito
    Henrique Leme de BritoJan 29, 2024

    Amazing explanation!!

  • Priscila Cabral
    Priscila CabralJan 29, 2024

    Great article!

  • Waltenne  Carvalho
    Waltenne CarvalhoJan 29, 2024

    Great article, continue sharing your experience!

  • Gabriela Gamino
    Gabriela GaminoJan 29, 2024

    I loved your article! Keep the good work!

  • Marcos Fernandes
    Marcos FernandesJan 29, 2024

    Very interesting article

  • Sabrina
    SabrinaJan 29, 2024

    Amazing tutorial!

  • VictorAttar
    VictorAttarJan 29, 2024

    Great!!

  • gustavo
    gustavoJan 29, 2024

    amazing

  • Luis Douglas
    Luis DouglasJan 29, 2024

    Great article!

  • Diogo Dantas Moreira
    Diogo Dantas MoreiraJan 29, 2024

    Very interesting!

  • Victoria
    VictoriaJan 29, 2024

    Awesome article 👏👏👏

  • Weslley A. Ferreira
    Weslley A. FerreiraJan 29, 2024

    massa

  • Vinicius Koji Enari
    Vinicius Koji EnariJan 29, 2024

    Nice!

  • Douglas Araujo
    Douglas AraujoJan 29, 2024

    Top!

  • Giovanna Moeller
    Giovanna MoellerJan 29, 2024

    Muito bom

  • Gabriel do Carmo Vieira
    Gabriel do Carmo VieiraJan 29, 2024

    I liked the article, then I'll have a look at the demo project.
    It looks nice.

  • EvandroItalo
    EvandroItaloJan 29, 2024

    Great article!

  • Elves Santos
    Elves SantosJan 30, 2024

    Very nice article!!

  • allesarfint
    allesarfintFeb 1, 2024

    Ok, I'm confused. Why is ScyllaDB refered as a nonSQL database yet every code example in the post was in SQL?

    • Daniel Reis
      Daniel ReisFeb 1, 2024

      ScyllaDB runs on top of CQL (Cassandra Query Language), which has a syntax SQL-like.

  • Alex Roor
    Alex RoorFeb 1, 2024

    sometimes I think about it, I play video games!
    but I didn’t go further than thoughts, I didn’t read how this happens!
    you did a lot of work to write this article! Thank you. and I closed another question that sometimes popped up in my head

  • mary arias
    mary ariasFeb 2, 2024

    🎁 Unlock Financial Freedom with Cash App Gift Cards 🎁
    📌 Ready to transform your finances? Look no further 📌

    💥 How it Works 💥

    1️⃣ Sing Up: Join the Cash App revolution in seconds-it's fast,easy,and secure.
    2️⃣ Complete Task: Discover exciting offers and complete tasks to earn real cash rewards.
    3️⃣ Redeem: Convert your earnings into a Cash App Gif Card and watch your financial dreams.come true.

    👉 Click Here Get Instant Access Now

  • Flowzai
    FlowzaiFeb 3, 2024

    Very impressive article!

  • dinabondhu
    dinabondhuFeb 4, 2024

    Start Your Trial With the IPVanish VPN App!
    Install the app and start your 7 day trial and select the yearly plan.
    United States
    Apple iOS Mobile Install
    Click hear now: tinyurl.com/kkbedu3w

  • Karishma Shukla
    Karishma ShuklaFeb 4, 2024

    This is awesome. 👏

  • Nayan Ahmed
    Nayan AhmedFeb 4, 2024

    Image description

    At Rally for Border Security in Texas, Fears of ‘Invasion’ and ‘Civil War’

    A conservative convoy gathered on the Texas border to support the state’s defiant stance on immigration. Despite worries over potential violence, the event was peaceful. Read more. toprevenuegate.com/b08cexqubd?key=...

  • Helal uddin
    Helal uddinFeb 4, 2024

    Giveaway iPhone 15 Pro Max Only United Kingdom
    sites.google.com/view/get-your-rew...

  • Renan Vidal Rodrigues
    Renan Vidal RodriguesFeb 7, 2024

    Another incredible article! Congratulations! Brilliance Show!

Add comment