Friday, April 27, 2007

Comments on GHC Random Number Generator

From the Glasgow Haskell Compiler 6.6 implementation of the module System.Random

instance Read StdGen where
  readsPrec _p = \ r ->
     case try_read r of
       r@[_] -> r
       _   -> [stdFromString r] -- because it shouldn't ever fail.
    where 
      try_read r = do

One can inject invalid state here (out of range) or 0 0.

         (s1, r1) <- readDec (dropWhile isSpace r)
         (s2, r2) <- readDec (dropWhile isSpace r1)
  return (StdGen s1 s2, r2)
{-
 If we cannot unravel the StdGen from a string, create
 one based on the string given.
-}

I wish this would use a real hash function, for example, SHA-256. It consumes only 6 characters! Since it has about 62 bits of state, it ought to in principle consume 8, though problems with overflow and the 3 multiplier might cause complications.

stdFromString         :: String -> (StdGen, String)
stdFromString s        = (mkStdGen num, rest)
 where (cs, rest) = splitAt 6 s
              num        = foldl (\a x -> x + 3 * a) 1 (map ord cs)
{- |
The function 'mkStdGen' provides an alternative way of producing an initial
generator, by mapping an 'Int' into a generator. Again, distinct arguments
should be likely to produce distinct generators.

Programmers may, of course, supply their own instances of 'RandomGen'.
-}
mkStdGen :: Int -> StdGen -- why not Integer ?

Yes, why not? Then the user could specify lots of initial entropy.

mkStdGen s
 | s < 0     = mkStdGen (-s)
 | otherwise = StdGen (s1+1) (s2+1)
      where
 (q, s1) = s `divMod` 2147483562
 s2      = q `mod` 2147483398

It should be commented that we use one less than the moduli to avoid the initial state being zero.

Look at all these hard coded numbers! Ideally they should be specified exactly once, in factored form, as in L'Ecuyer's paper. The factorization (showing no common factors between the moduli) demonstrates the long period.

Sadly, the function below is not exported.

createStdGen :: Integer -> StdGen
createStdGen s
 | s < 0     = createStdGen (-s)
 | otherwise = StdGen (fromInteger (s1+1)) (fromInteger (s2+1))
      where
 (q, s1) = s `divMod` 2147483562
 s2      = q `mod` 2147483398

-- FIXME: 1/2/3 below should be ** (vs@30082002) XXX

FIXME huh?

...


The lower 8 bits are used to create a random Char. (See the definition, especially the `mod` call in randomIvalInteger below.) The lower bits of RNG are suspect.

Even worse the lower 1 bits are used to create a random Bool.

instance Random Char where
  randomR (a,b) g = 
      case (randomIvalInteger (toInteger (ord a), toInteger (ord b)) g) of
        (x,g) -> (chr x, g)
  random g   = randomR (minBound,maxBound) g

instance Random Bool where
  randomR (a,b) g = 
      case (randomIvalInteger (toInteger (bool2Int a), toInteger (bool2Int b)) g) of
        (x, g) -> (int2Bool x, g)
       where
         bool2Int False = 0
         bool2Int True  = 1

  int2Bool 0 = False
  int2Bool _ = True

  random g   = randomR (minBound,maxBound) g
 

-- hah, so you thought you were saving cycles by using Float?

Yes, it sure would be nice. It would also be nice to avoid invoking GMP (via type Integer) when creating random numbers of the standard small types.

instance Random Float where
  random g        = randomIvalDouble (0::Double,1) realToFrac g
  randomR (a,b) g = randomIvalDouble (realToFrac a, realToFrac b) realToFrac g

mkStdRNG :: Integer -> IO StdGen
mkStdRNG o = do
    ct          <- getCPUTime
    (TOD sec _) <- getClockTime
    return (createStdGen (sec * 12345 + ct + o))

Is 12345 the optimal constant? Does it interact badly with the modulus 2147483562?

randomIvalInteger :: (RandomGen g, Num a) => (Integer, Integer) -> g -> (a, g)
randomIvalInteger (l,h) rng
 | l > h     = randomIvalInteger (h,l) rng
 | otherwise = case (f n 1 rng) of (v, rng') -> (fromInteger (l + v `mod` k), rng')

So iLogBase overshoots, then `mod` is used to bring it back into range. The wraparound overshoot makes it no longer uniform.

     where
       k = h - l + 1
       b = 2147483561

Shouldn't this be 2147483562? We are creating a big number in base 2147483562.

       n = iLogBase b k

       f 0 acc g = (acc, g)
       f n acc g = 
          let
    (x,g')   = next g

Might need to subtract 1 from x.

   in
   f (n-1) (fromIntegral x + acc * b) g'

randomIvalDouble :: (RandomGen g, Fractional a) => (Double, Double) -> (Double -> a) -> g -> (a, g)
randomIvalDouble (l,h) fromDouble rng 
  | l > h     = randomIvalDouble (h,l) fromDouble rng
  | otherwise =

Instead of choosing the pre-scaled range to be the same as the modulus (about 2 billion), we instead do from -2G to 2G, or 4 billion. This will require two calls to the RNG, then throwing away almost all the bits of the second call (which ought to be used for double-precision mantissa!)

       case (randomIvalInteger (toInteger (minBound::Int), toInteger (maxBound::Int)) rng) of
         (x, rng') -> 

Consumes only 32-ish bits of randomness to produce 52 bits of double-precision mantissa!

     let
      scaled_x = 
  fromDouble ((l+h)/2) +

The divide by 2 factor is because intRange goes from -2G to 2G.


                fromDouble ((h-l) / realToFrac intRange) *
  fromIntegral (x::Int)
     in
     (scaled_x, rng')

intRange returns approximately 4 * 109.

intRange :: Integer
intRange  = toInteger (maxBound::Int) - toInteger (minBound::Int)

This function iLogBase has quadratic running time.

iLogBase :: Integer -> Integer -> Integer
iLogBase b i = if i < b then 1 else 1 + iLogBase b (i `div` b)


stdRange :: (Int,Int)

The range is actually from 1.

stdRange = (0, 2147483562)

stdNext :: StdGen -> (Int, StdGen)
-- Returns values in the range stdRange
stdNext (StdGen s1 s2) = (z', StdGen s1'' s2'')
 where z'   = if z < 1 then z + 2147483562 else z

The value z' = 0 cannot be produced! The statistical independence between s1'' and s2'' is violated, so I'm not sure about the uniformity.

  z    = s1'' - s2''

  k    = s1 `quot` 53668
  s1'  = 40014 * (s1 - k * 53668) - k * 12211
  s1'' = if s1' < 0 then s1' + 2147483563 else s1'
    
  k'   = s2 `quot` 52774
  s2'  = 40692 * (s2 - k' * 52774) - k' * 3791
  s2'' = if s2' < 0 then s2' + 2147483399 else s2'

stdSplit            :: StdGen -> (StdGen, StdGen)
stdSplit std@(StdGen s1 s2)
                     = (left, right)
                       where
                        -- no statistical foundation for this!
                        left    = StdGen new_s1 t2
                        right   = StdGen t1 new_s2

                        new_s1 | s1 == 2147483562 = 1
                               | otherwise        = s1 + 1

                        new_s2 | s2 == 1          = 2147483398
                               | otherwise        = s2 - 1

                        StdGen t1 t2 = snd (next std)

No comments :