Have you ever used rand() or Math.Random() or System.Random or java.util.Random? You could do a lot better. Did you know that these use one of the slowest and least random algorithms available? In this article, I provide C/C++, Java, and C# implementations of two fast, high-quality random number generators, the Mersenne Twister and R250/521


I'm always appalled when a new language comes on the scene (like Java or C#) and it provides nothing better than a linear congruential random number generator. Or when applications that need good random numbers (like games) use this same generator. Should we all go back to using punch cards and Fortran, too?

Like most software algorithms, random number generation continues to be researched and improved on. Thanks to the internet, it doesn't take much work to find out about these innovations and stay current with the state-of-the-art.

There are as many different random number generators as there are ways to search a string for a substring. However, two of my favorites are the Mersenne Twister and R250/521. These two random number generators take up very little space, have very long periods and other useful statistical properties, and are extremely fast — much, much faster than the standard random number generator that comes with your platform.

R250/521

First described in 1981, R250 is a relatively old random number generator, but it's still one of the best. It was invented/discovered by Kirkpatrick and Stoll, in the article A Very Fast Shift-Register Sequence Random Number Generator, J. Computational Physics, vol 40, pp. 517-526.. If you're interested in how it works, you can read more here or here (PDF).

R250 is what's known as a generalized feedback shift register, or GFSR. GFSRs are determined by two parameters, a length and an offset. R250 is actually GFSR(250,103), indicating a length of 250 and an offset of 103. R250 has a period of almost 2^250.

I've been pretty satisfied with R250 by itself, but several recent papers have suggested it has some statistical defects and should be combined with GFSR(521,168) to overcome them. Combining the two slows down the implementation, but improves the statistical quality of the random number generator. Choose your poison, I guess.

The Mersenne Twister

The Mersenne Twister is a new random number generator, invented/discovered in 1996 by Matsumora and Nishimura. If you want to know how it works, read more here.

MT is a twisted GFSR(624,397), similar in spirit to R250 and in my tests it's comparable in speed to R250 (slightly slower than R250 alone, but faster than R250/521 combined). It takes up more space than R250 or R521, but less than the two combined. MT has an amazing period of 2^19937-1. Overall, I'd have to say that the Mersenne Twister is my current favorite random number generator.

Implementations

In the implementations below, each RNG has two functions, one for initializing the RNG and one for getting the next random 32-bit integer. It's simple to then generate other kinds of random numbers (such as random floating point numbers within a particular interval).

Both random number generators require an initial random seed. I've just used the standard system random number generator; feel free to choose something more appropriate to your environment.

You can also tease out the separate implementations of R250 and R521 and use them individually, instead of combined. This will improve perf and reduce the space requirement, but reduces the statistical effectiveness of the generator.

Legal crap

I hereby place this sofware in the public domain.

C/C++

I've carefully hand-tuned the following implementations under g++ version 3.3 targeting PPC. On a Powerbook G4, these take around 32 cycles per random number. If you wish to suggest improvements for other compilers/architectures, I'm all ears.

#define R250_LEN     250
#define R521_LEN     521

int r250_index;
int r521_index;
unsigned long r250_buffer[R250_LEN];
unsigned long r521_buffer[R521_LEN];

void r250_521_init() {
    int i = R521_LEN;
    unsigned long mask1 = 1;
    unsigned long mask2 = 0xFFFFFFFF;
	
    while (i-- > R250_LEN) {
        r521_buffer[i] = rand();
    }
    while (i-- > 31) {
        r250_buffer[i] = rand();
        r521_buffer[i] = rand();
    }
    
    /*
    Establish linear independence of the bit columns
    by setting the diagonal bits and clearing all bits above
    */
    while (i-- > 0) {
        r250_buffer[i] = (rand() | mask1) & mask2;
        r521_buffer[i] = (rand() | mask1) & mask2;
        mask2 ^= mask1;
        mask1 >>= 1;
    }
    r250_buffer[0] = mask1;
    r521_buffer[0] = mask2;
    r250_index = 0;
    r521_index = 0;
}

#define R250_IA  (sizeof(unsigned long)*103)
#define R250_IB  (sizeof(unsigned long)*R250_LEN - R250_IA)
#define R521_IA  (sizeof(unsigned long)*168)
#define R521_IB  (sizeof(unsigned long)*R521_LEN - R521_IA)

unsigned long r250_521_random() {
    /*
    I prescale the indices by sizeof(unsigned long) to eliminate
    four shlwi instructions in the compiled code.  This minor optimization
    increased perf by about 12%.
    
    I also carefully arrange index increments and comparisons to minimize
    instructions.  gcc 3.3 seems a bit weak on instruction reordering. The
    j1/j2 branches are mispredicted, but nevertheless these optimizations
    increased perf by another 10%.
    */
    
    int i1 = r250_index;
    int i2 = r521_index;
    unsigned char * b1 = (unsigned char*)r250_buffer;
    unsigned char * b2 = (unsigned char*)r521_buffer;
    unsigned long * tmp1, * tmp2;
    unsigned long r, s;
    int j1, j2;
    
    j1 = i1 - R250_IB;
    if (j1 < 0)
        j1 = i1 + R250_IA;
    j2 = i2 - R521_IB;
    if (j2 < 0)
        j2 = i2 + R521_IA;
    
    tmp1 = (unsigned long *)(b1 + i1);
    r = (*(unsigned long *)(b1 + j1)) ^ (*tmp1);
    *tmp1 = r;
    tmp2 = (unsigned long *)(b2 + i2);
    s = (*(unsigned long *)(b2 + j2)) ^ (*tmp2);
    *tmp2 = s;
    
    i1 = (i1 != sizeof(unsigned long)*(R250_LEN-1)) ? (i1 + sizeof(unsigned long)) : 0;
    r250_index = i1;
    i2 = (i2 != sizeof(unsigned long)*(R521_LEN-1)) ? (i2 + sizeof(unsigned long)) : 0;
    r521_index = i2;
        
    return r ^ s;
}
#define MT_LEN       624

int mt_index;
unsigned long mt_buffer[MT_LEN];

void mt_init() {
    int i;
    for (i = 0; i < MT_LEN; i++)
        mt_buffer[i] = rng.random_integer();
    mt_index = 0;
}

#define MT_IA           397
#define MT_IB           (MT_LEN - MT_IA)
#define UPPER_MASK      0x80000000
#define LOWER_MASK      0x7FFFFFFF
#define MATRIX_A        0x9908B0DF
#define TWIST(b,i,j)    ((b)[i] & UPPER_MASK) | ((b)[j] & LOWER_MASK)
#define MAGIC(s)        (((s)&1)*MATRIX_A)

unsigned long mt_random() {
    unsigned long * b = mt_buffer;
    int idx = mt_index;
    unsigned long s;
    int i;
	
    if (idx == MT_LEN*sizeof(unsigned long))
    {
        idx = 0;
        i = 0;
        for (; i < MT_IB; i++) {
            s = TWIST(b, i, i+1);
            b[i] = b[i + MT_IA] ^ (s >> 1) ^ MAGIC(s);
        }
        for (; i < MT_LEN-1; i++) {
            s = TWIST(b, i, i+1);
            b[i] = b[i - MT_IB] ^ (s >> 1) ^ MAGIC(s);
        }
        
        s = TWIST(b, MT_LEN-1, 0);
        b[MT_LEN-1] = b[MT_IA-1] ^ (s >> 1) ^ MAGIC(s);
    }
    mt_index = idx + sizeof(unsigned long);
    return *(unsigned long *)((unsigned char *)b + idx);
    /*
    Matsumoto and Nishimura additionally confound the bits returned to the caller
    but this doesn't increase the randomness, and slows down the generator by
    as much as 25%.  So I omit these operations here.
    
    r ^= (r >> 11);
    r ^= (r << 7) & 0x9D2C5680;
    r ^= (r << 15) & 0xEFC60000;
    r ^= (r >> 18);
    */
}

Java

public final class R250_521 {
    private int r250_index;
    private int r521_index;
    private int[] r250_buffer = new int[250];
    private int[] r521_buffer = new int[521];

    public R250_521() {
        java.util.Random r = new java.util.Random();
        int i = 521;
        int mask1 = 1;
        int mask2 = 0xFFFFFFFF;
	
        while (i-- > 250) {
            r521_buffer[i] = r.nextInt();
        }
        while (i-- > 31) {
            r250_buffer[i] = r.nextInt();
            r521_buffer[i] = r.nextInt();
        }
    
        /*
        Establish linear independence of the bit columns
        by setting the diagonal bits and clearing all bits above
        */
        while (i-- > 0) {
            r250_buffer[i] = (r.nextInt() | mask1) & mask2;
            r521_buffer[i] = (r.nextInt() | mask1) & mask2;
            mask2 ^= mask1;
            mask1 >>= 1;
        }
        r250_buffer[0] = mask1;
        r521_buffer[0] = mask2;
        r250_index = 0;
        r521_index = 0;
    }
	
    public int random() {
        int i1 = r250_index;
        int i2 = r521_index;
    
        int j1 = i1 - (250-103);
        if (j1 < 0)
            j1 = i1 + 103;
        int j2 = i2 - (521-168);
        if (j2 < 0)
            j2 = i2 + 168;
    
        int r = r250_buffer[j1] ^ r250_buffer[i1];
        r250_buffer[i1] = r;
        int s = r521_buffer[j2] ^ r521_buffer[i2];
        r521_buffer[i2] = s;
    
        i1 = (i1 != 249) ? (i1 + 1) : 0;
        r250_index = i1;
        i2 = (i2 != 521) ? (i2 + 1) : 0;
        r521_index = i2;
        
        return r ^ s;
    }
}
public final class MT {
    private int mt_index;
    private int[] mt_buffer = new int[624];

    public MT() {
        java.util.Random r = new java.util.Random();
        for (int i = 0; i < 624; i++)
            mt_buffer[i] = r.nextInt();
        mt_index = 0;
    }

    public int random() {
        if (mt_index == 624)
        {
            mt_index = 0;
            int i = 0;
            int s;
            for (; i < 624 - 397; i++) {
                s = (mt_buffer[i] & 0x80000000) | (mt_buffer[i+1] & 0x7FFFFFFF);
                mt_buffer[i] = mt_buffer[i + 397] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
            }
            for (; i < 623; i++) {
                s = (mt_buffer[i] & 0x80000000) | (mt_buffer[i+1] & 0x7FFFFFFF);
                mt_buffer[i] = mt_buffer[i - (624 - 397)] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
            }
        
            s = (mt_buffer[623] & 0x80000000) | (mt_buffer[0] & 0x7FFFFFFF);
            mt_buffer[623] = mt_buffer[396] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
        }
        return mt_buffer[mt_index++];
    }
}

C#

public sealed class R250_521 {
    private int r250_index;
    private int r521_index;
    private int r250_buffer[250];
    private int r521_buffer[521];

    public R250_521() {
        Random r = new Random();
        int i = 521;
        int mask1 = 1;
        int mask2 = 0xFFFFFFFF;
	
        while (i-- > 250) {
            r521_buffer[i] = r.Next();
        }
        while (i-- > 31) {
            r250_buffer[i] = r.Next();
            r521_buffer[i] = r.Next();
        }
    
        /*
        Establish linear independence of the bit columns
        by setting the diagonal bits and clearing all bits above
        */
        while (i-- > 0) {
            r250_buffer[i] = (r.Next() | mask1) & mask2;
            r521_buffer[i] = (r.Next() | mask1) & mask2;
            mask2 ^= mask1;
            mask1 >>= 1;
        }
        r250_buffer[0] = mask1;
        r521_buffer[0] = mask2;
        r250_index = 0;
        r521_index = 0;
    }
	
    public int random() {
        int i1 = r250_index;
        int i2 = r521_index;
    
        int j1 = i1 - (250-103);
        if (j1 < 0)
            j1 = i1 + 103;
        int j2 = i2 - (521-168);
        if (j2 < 0)
            j2 = i2 + 168;
    
        int r = r250_buffer[j1] ^ r250_buffer[i1];
        r250_buffer[i1] = r;
        int s = r521_buffer[j2] ^ r521_buffer[i2];
        r521_buffer[i2] = s;
    
        i1 = (i1 != 249) ? (i1 + 1) : 0;
        r250_index = i1;
        i2 = (i2 != 521) ? (i2 + 1) : 0;
        r521_index = i2;
        
        return r ^ s;
    }
}
public sealed class MT {
    private int mt_index;
    private int mt_buffer[624];

    public MT() {
        Random r = new Random();
        for (int i = 0; i < 624; i++)
            mt_buffer[i] = r.Next();
        mt_index = 0;
    }

    public int Random() {
        if (mt_index == 624)
        {
            mt_index = 0;
            int i = 0;
            int s;
            for (; i < 624 - 397; i++) {
                s = (mt_buffer[i] & 0x80000000) | (mt_buffer[i+1] & 0x7FFFFFFF);
                mt_buffer[i] = mt_buffer[i + 397] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
            }
            for (; i < 623; i++) {
                s = (mt_buffer[i] & 0x80000000) | (mt_buffer[i+1] & 0x7FFFFFFF);
                mt_buffer[i] = mt_buffer[i - (624 - 397)] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
            }
        
            s = (mt_buffer[623] & 0x80000000) | (mt_buffer[0] & 0x7FFFFFFF);
            mt_buffer[623] = mt_buffer[396] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
        }
        return mt_buffer[mt_index++];
    }
}


Disclaimer and copyright information