Deck Shuffle in O(1) space and time

Intro

The other day, a long time friend of mine who lives far away was back for a couple of weeks during summer. He invited me and some other friends to celebrate his birthday.

We had a real blast, it was like the old days.

Among many things we did that night, we also played a card game called Dixit.

For those of you that have never heard of it, it's a very simple yet fun game, here's the gist of it:

- There's a deck of cards, each one has a unique illustration representing something real or abstract - Each player is dealt 7 cards - Each turn one player, the storyteller says a "story" and lays one of his cards on the table, face down - All other players have to choose one of their cards and place it on the table face down as well - After that, the chosen cards are shuffled and then revealed - Players have to guess the storyteller's card based on the story, if they do so they get some points - If a player guesses incorrectly, the point will go to the owner of that card - If everyone guesses the storyteller card correctly, the storyteller doesn't get any points

This means that the story has to be neither too easy nor to complex.

Anyway, we had so much fun playing this game that I decided to code an online version so that we could play it remotely every once in a while. It looked like a fun side project, and it sure is, you'll soon see it live on this website.

I'm sure there already exist similar projects, but I don't care. The fun lies in the fact that, since it's mine, I can bend the rules and, most importantly, I can create some funny cards (maybe more on this in a future post).

Enough with the intro. Here's the interesting part: I would love for my version of the game to be played by many people, not just my friends. But we can't play all at the same time, hence it must support rooms: you create a room, make your friends join it, then start the game.

Problem

I plan to create huge decks, with hundreds, maybe thousands of cards. Also I'll need to shuffle them. Is it efficient to store such a deck, in memory, for each room?

Let's analyze this.

How are we representing cards?

Normal playing cards usually have colors, seeds and numbers. Dixit cards have an image, Yu-Gi-Oh! cards have stars, description, an image, attack and defense points. All card games might have some special card layout, but we are interested in a general solution.

Luckily for us, we can treat any deck of cards just as a list of numbers. If there are n cards in the deck, we will have card 0, 1, 2 and so on and so forth up to card n - 1. If we want to retrieve the special card info back, we just have to map these numbers back to the info.

For example, we might decide that the Ace of hearts is card 0, 2 of hearts is card 1, 3 of hearts is card 2 and so on up to the king of hearts which would be card 13. Then we might say that clubs are next, so ace of clubs is card 13, 2 of clubs is 14, and so on and so forth. I'm sure it would be relatively straightforward to create such kind of mapping algorithm.

Now that we know what we are dealing with (just some numbers), we can continue.

Assuming you'll never have more than a few hundred million cards in a deck, you can store each card as 32 bit number, which gives us plenty of room.

Thus a deck of n cards would take n*4 bytes. E.g. a 52 cards deck would take 208 bytes (manageable), a 10K cards deck would take 40KB ([grunting noise]) a 10M cards deck would take 40MB (sigh).

On the contrary, having just a list of numbers has a couple of advantages: For starters, it's very easy to shuffle the deck:

routine shuffleDeck(deck):
    for i from 0 to n-2:
        pick a random number j betwen i and n-1
        swap item i with item j in the deck

This is called a Fisher-Yates shuffle.

Second, it means we can deal cards very efficiently, we just need to do an array lookup:

routine dealCard(i):
    return deck[i]

Still, we want to do better in terms of space.

What if, instead of storing the entire deck, we just store an id for the deck, and then when we need to deal a card we reconstruct the deck on the fly and deal the card? Is this even possible?

Well, after a bit of research I discovered that it is indeed possible:

Short digression on factorials

In math terms, a deck shuffle is called a permutation. Which is, a way in which we can arrange items in a set.

Mathematicians represent the number of permutations of a set with n elements as n! (red as "n factorial"), which corresponds to n × n - 1 × n - 2 × n - 3 × ... × 1.

To see why this is the case imagine we want to shuffle a deck of n cards: Pick a random card and place it in the first spot - we could have chosen up to n different cards. Now, fill the remaining places by shuffling the remainder of the deck (n-1 cards). How do we do that? Easy: pick a random card, place it in the first spot (which now is actually the second spot of the original deck) - we could have chosen n-1 cards. Now fill the remaining places by shuffling the remainder of the deck (n-2 cards). And so on and so forth up to when you are left with just a deck of 1 card.

Basically, at the first iteration we can pick up to n different cards, at the second, we can pick up to n-2 different cards, then n-3 and so on. Thus leaving us with n × n - 1 × n - 2 × n - 3 × ... × 1 possibilities.

Back to the space problem

Now that factorials are clear, let's get back to our optimization problem.

If a deck has up to n! permutations, can't we just represent a particular shuffle with a number from 0 to n! -1 and then reconstruct the deck on the fly when needed? Yes we can! But... there's a big but. Factorials grow extremely fast:

 10!: 3628800
 100!: 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
 1000!: 402387260077093773543702433923003985719374864210714632543799910429938512398629020592044208486969404800479988610197196058631666872994808558901323829669944590997424504087073759918823627727188732519779505950995276120874975462497043601418278094646496291056393887437886487337119181045825783647849977012476632889835955735432513185323958463075557409114262417474349347553428646576611667797396668820291207379143853719588249808126867838374559731746136085379534524221586593201928090878297308431392844403281231558611036976801357304216168747609675871348312025478589320767169132448426236131412508780208000261683151027341827977704784635868170164365024153691398281264810213092761244896359928705114964975419909342221566832572080821333186116811553615836546984046708975602900950537616475847728421889679646244945160765353408198901385442487984959953319101723355556602139450399736280750137837615307127761926849034352625200015888535147331611702103968175921510907788019393178114194545257223865541461062892187960223838971476088506276862967146674697562911234082439208160153780889893964518263243671616762179168909779911903754031274622289988005195444414282012187361745992642956581746628302955570299024324153181617210465832036786906117260158783520751516284225540265170483304226143974286933061690897968482590125458327168226458066526769958652682272807075781391858178889652208164348344825993266043367660176999612831860788386150279465955131156552036093988180612138558600301435694527224206344631797460594682573103790084024432438465657245014402821885252470935190620929023136493273497565513958720559654228749774011413346962715422845862377387538230483865688976461927383814900140767310446640259899490222221765904339901886018566526485061799702356193897017860040811889729918311021171229845901641921068884387121855646124960798722908519296819372388642614839657382291123125024186649353143970137428531926649875337218940694281434118520158014123344828015051399694290153483077644569099073152433278288269864602789864321139083506217095002597389863554277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

So we might need a good amount of bits to be able to represent a given permutation, ceil(log2(n!)) to be exact, which is superlinear. We have made things worse.

But let's indulge for a moment. We still need a way to reconstruct the deck back. There actually is an algorithm called Lehmer code that does that using factoriadic numbers. I won't go into much details here because I don't really understand the process well, and I encourage you to go and look for yourself, but the gist of it is that this algorithm is able to take a permutation of a deck (with 0 to n-1 cards) and transform it into a factoriadic number (a numbering system where numbers instead of being based on powers of 2 (like binary numbers) or powers of 10 (like decimal numbers) are actually based on factorials), i.e. an alternative representation, and then reconstruct the original deck back from this factoriadic number.

The problem however is that this still requires O(n) space, an improvement over O(log2(n!)), but still not better than the simple list representation. Moreover it also require O(nlog(n)) time complexity.

So we are back to the start.

Introducing Feistel Cipher

After a little bit of more research, I discovered the Feistel cipher. This is a simple yet effective algorithm used as a building block in many other encryption algorithms. It's purpose is to scramble bits given as input such that, if you are only given the output, you can't reconstruct the original input if you don't posses a secret key.

It sort of works like this:

- Split the input into two halves, L and R
- Repeat for as many turns as you want:
    - Compute F(Ki and R)
    - Xor the result into L
    - L becomes the new R, R becomes the new L
- Merge the two halves back and return the result

Where F is called "the round function", which can be whatever you want as long as it satisfies some conditions, and "Ki" is the turn key.

We are interested in this algorithm because it has a bunch of interesting properties, as per Wikipedia:

...if the round function is a cryptographically secure pseudorandom function, with Ki used as the seed, then 3 rounds are sufficient to make the block cipher a pseudorandom permutation, while 4 rounds are sufficient to make it a "strong" pseudorandom permutation (which means that it remains pseudorandom even to an adversary who gets oracle access to its inverse permutation).

Ah! Translated, this means that if we manage to implement this algorithm, we can:

  • represent our deck shuffle using the set of keys for each round
  • then we can just input a deck position, like card 0, card 1 and so on, and the cypher would output the card in that position
  • if we do everything right, even if players see the order in which cards are dealt, they cannot predict what the next card will be.

Yes! Like magic!

The block cipher being a pseudorandom permutation means that if we input all 32 bit numbers from 0 to the maximum one after the other, we get back all 32 bit numbers, but shuffled, depending on the keys, without repetitions.

If we are smart about how we design F and how big keys are, we can get a very nice result.

Introducing the ChaCha algorithm family

Finally something I'm already familiar with!

Before starting this project I didn't know anything about Fisher-Yates shuffle, Lehmer codes, factoriadic numbers and Feistel Ciphers. It's been a fun ride so far. Now it's time for me to combine this new knowledge with something already cemented in me. Hold on with me for the last piece of the puzzle.

ChaCha is a family of cryptographic algorithms, there are many variants which are usually indicated with a number which dictates how many internal rounds the algorithm performs (similar to Feistel' rounds). E.g. we have ChaCha8, ChaCha12 ChaCha20 (there also are some other variants that depend on the size of the nonce and other stuff, but we don't really care at the moment).

ChaChas are stream cyphers, they can be used to encrypt and decrypt a stream of data, i.e. they don't need the entire data at once, instead they work with blocks of a certain size, e.g. 64 bytes for ChaCha20. It's basic mode of operation is similar to this:

- Generate a secure random Key (this needs to be private)
- Generate a random Nonce (doesn't need to be private)
- For each chunk i of input data, use the key, the nonce and i (the counter) to generate a corresponding chunk of pseudorandom data
- Xor the data with the generated bytes
- That's it, you've got your cypher text

And to decrypt you simply have to regenerate chunks and xor them back with the cypher to get the plain text.

ChaCha has many properties, and I encourage you to research more if your not familiar with it. We don't actually need it to encrypt data, we are just interested in its key expansion capabilities. In particular, after we have a key and a nonce, we can generate many random blocks of data by just increasing a counter from 0 up to its maximum value.

Yes, you guessed it, we are going to use it to generate round keys.

Actually, even better, we'll use it to do the heavy lifting of our F function. In fact, recall that we need F to be a cryptographically secure pseudorandom function, and ChaCha is just that, however, if we only use it for the round keys, we would still need to merge R into the result somehow, while making sure that it's still cryptographically secure. Not easy.

Instead, we get R, and the current round number, pack their bits into a counter, pass the counter to ChaCha, et voilà, just read back the output and xor it with L.

In other words, the round number becomes the round key, and we're relying on ChaCha to get cryptographically secure pseudo random data.

I'm sure there might be other cryptographic pseudo random number generators, but I chose ChaCha because I'm already familiar with it and it felt natural, moreover, it's very fast.

A few consideration

Time and Space Complexity

I don't care about generating the whole deck, I just care about the complexity of dealing one card:

Space: 
we just need to store the chacha key and nonce, which are fixed size, hence O(1) space regardless of deck size

Time: 
to draw a card we simply need to run the Feistel cypher wich runs ChaCha for a constant number of rounds. ChaCha is very fast, and also constant time (yes, even for high counter numbers, still constant time), hence O(1) time regardless of deck size.

Entropy

Can we generate all possible shuffled decks of size n this way? No!

We saw how incredibly big do factorials get. On the contrary, if we use ChaCha20 (which is the strongest one) we have a 256bit long key. If we also consider the nonce length (96 bits), we get a total of 352 bits of entropy, or this many different key streams / shuffles

9173994463960286046443283581208347763186259956673124494950355357547691504353939232280074212440502746218496

Very huge numbers, yet a speckle of dust compared to 1000!. But still huge numbers, much bigger than the number of atoms in the observable universe. So I reckon 352 bits of entropy are good enough for my toy project.

I guess the only way to make a shuffler that could potentially create all possible permutations of a deck would be to use the Fisher-Yates in combination with a true random source.

Technical implementation details

  1. I chose ChaCha20 because it's the strongest
  2. I implemented Feistel with 32bit numbers in mind, plenty enough to represent decks with billions of cards.
  3. What if you want a deck size that is not a power of two? When you permute a card position you risk getting a value outside the deck. Luckily, if that happens, just reapply the process recursively until you land back again inside the deck.
  4. Feistel cyphers are a bit spoiled. They need to be balanced i.e. they need to work on an even number of bits so that it can be split in two equal parts. My understanding is that if you have an umbalanced cypher the permutation property doesn't old anymore. So what I do to set the deck bit size to the smallest possibile even number of bits, e.g. a deck of 120 cards needs 8 bits (4 for L and 4 for R), not 7 even though you can represent numbers up to 127 with 7 bits, otherwsier L and R are imbalanced.
  5. I chose 8 Feistel rounds, why? Need at least 4 and 8 doesn't hurt

Here's a NodeJS implementation

This implementation can deal hundreds of thousands of cards per second on my Macbook Air M1 laptop.

import crypto from "node:crypto"
import { stream } from "@stablelib/chacha"

export class ChaCha20PRNG {
    key: Buffer<ArrayBufferLike>
    nonce: Buffer<ArrayBufferLike>
    static BLOCK_LENGTH = 64
    static KEY_LENGTH = 32
    static NONCE_START_PADDING = 4
    static NONCE_LENGTH = 4 + 12

    constructor(key?: Buffer<ArrayBufferLike>, nonce?: Buffer<ArrayBufferLike>) {
        this.key = key ?? crypto.randomBytes(ChaCha20PRNG.KEY_LENGTH);
        // first 4 bytes for counter, then 96-bit nonce
        this.nonce = nonce ?? crypto.randomBytes(ChaCha20PRNG.NONCE_LENGTH);
        this.nonce.writeUInt32LE(0, 0)
    }

    randomChunk(ctr: number, dst: Uint8Array) {
        // the stream function expects the ctr to be written in the 
        // first 4 bytes of the nonce
        this.nonce.writeUInt32LE(ctr, 0)
        stream(this.key, this.nonce, dst, ChaCha20PRNG.NONCE_START_PADDING)
        return this.nonce.readUInt32LE(0)
    }
}

export class FeistelNetworkPermutator {
    deckSize: number
    rounds: number
    blockBits: number

    leftBits: number
    rightBits: number
    leftMask: number
    rightMask: number
    chacha: ChaCha20PRNG
    chachaOut: Buffer
    lastChachaBlockCtr: number
    rightHalfMask: number
    leftHalfMask: number
    constructor(chacha: ChaCha20PRNG, deckSize: number, rounds = 8) {
        this.chacha = chacha;
        this.deckSize = deckSize;

        this.rounds = rounds;

        this.blockBits = Math.ceil(Math.log2(deckSize));
        // if we don't use an even power of two, we don't get any guarantees
        if (this.blockBits % 2 != 0) {
            this.blockBits += 1
        }

        this.rightBits = Math.floor(this.blockBits / 2);
        this.leftBits = this.blockBits - this.rightBits;

        this.rightHalfMask = (1 << this.rightBits) - 1;
        this.leftHalfMask = (1 << this.leftBits) - 1;

        this.rightMask = this.rightHalfMask;
        this.leftMask = this.leftHalfMask << this.rightBits;


        this.chachaOut = Buffer.alloc(ChaCha20PRNG.BLOCK_LENGTH, 0);
        this.lastChachaBlockCtr = -1;
    }

    // Split card index into left/right halves
    private split(cardIndex: number) {
        const left = (cardIndex & this.leftMask) >>> this.rightBits;
        const right = cardIndex & this.rightMask;
        return { left, right };
    }

    // Combine left/right halves back to card index
    private combine(left: number, right: number) {
        return ((left << this.rightBits) & this.leftMask) | (right & this.rightMask);
    }

    // F function
    roundFunction(right: number, round: number): number {

        // Use (right, round) as the counter
        const ctr = (round << 20) | right; // pack round+right (ensure no overlap with right bits)
        if (ctr !== this.lastChachaBlockCtr) {
            this.chacha.randomChunk(ctr >>> 0, this.chachaOut);
            this.lastChachaBlockCtr = ctr;
        }
        const x = this.chachaOut.readUInt32LE(0); // any 32 bit word is fine
        return x & this.leftHalfMask;
    }

    // Feistel algorithm
    permuteCard(cardIndex: number): number {
        let { left, right } = this.split(cardIndex);

        for (let round = 0; round < this.rounds; round++) {
            const f = this.roundFunction(right, round);
            const newLeft = right & this.rightHalfMask;
            const newRight = (left ^ f) & this.leftHalfMask;
            left = newLeft;
            right = newRight;
        }

        const result = this.combine(left, right);
        return result >= this.deckSize ? this.permuteCard(result) : result;
    }
}


export class Deck {
    cardCount: number
    seed: ChaCha20PRNG
    drawed: number
    permutator: FeistelNetworkPermutator
    constructor(cardCount: number, seed?: ChaCha20PRNG) {
        this.cardCount = cardCount
        this.seed = seed ?? new ChaCha20PRNG()
        this.drawed = 0
        this.permutator = new FeistelNetworkPermutator(this.seed, cardCount, 8)
    }

    remainingCards() {
        return Math.max(this.cardCount - this.drawed, 0)
    }

    draw() {
        const x = this.permutator.permuteCard(this.drawed)
        this.drawed += 1
        return x
    }

    drawMany(n: number) {
        return new Array(n).fill(0).map(_ => this.draw())
    }

    // test if we can draw the entire deck without card duplicates
    testDeck() {
        let arr = new Array(this.cardCount).fill(-1)

        for (let i = 0; i < arr.length; i++) {
            const card = this.permutator.permuteCard(i);
            if (arr[i] != -1) {
                return false
            }
            arr[i] = card
        }

        arr.sort((a, b) => a - b)
        for (let i = 0; i < arr.length - 1; i++) {
            if (arr[0] - arr[1] !== -1) {
                return false
            }
        }

        if (arr[0] != 0 || arr[arr.length - 1] != arr.length - 1) {
            return false
        }

        return true
    }
}

And test it with this

import { Deck } from "./decks";

const d = new Deck(1_000_000);

if (!d.testDeck()) {
    throw new Error("Invalid deck")
}


const start = Date.now();
while(d.remainingCards() > 0) {
    d.draw()
}
const elapsed = Date.now() - start;
const speed = d.cardCount / elapsed * 1000
console.log(`Drawed ${speed.toFixed(0)} cards/s`)

Output:

Drawed 383730 cards/s

Was all this trouble worth it for a simple toy project that nobody will use anyway?

Of course not, but then we wouldn't have had all this fun.