12 February 2009

Shuffle things in .Net

Shuffling is complicated. There are two nice pages on CodingHorror:

http://www.codinghorror.com/blog/archives/001008.html

http://www.codinghorror.com/blog/archives/001015.html

The second link shows a chart of the counts of the different possible draws from a  card pack when you do it a few hundred thousand times. This detects the bias of an algorithm.

 

How not to shuffle # 1

Here we go for the approach of picking a number at random, seeing if we have selected it already, adding it to the selected numbers if we haven't, and picking another if we have.

Public Class Form1

Private Shared rand As New Random
Private dic As New Dictionary(Of Integer, Integer)
Dim tb As New TextBox With {.Dock = DockStyle.Fill, .Multiline = True}

Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
Dim sw As Stopwatch = Stopwatch.StartNew
Dim shuffled(9999999)() As Integer
For
i As Integer = 0 To shuffled.GetUpperBound(0)
shuffled(i) = GetShuffledDeck(4)
Next
sw.Stop()
For i As Integer = 0 To shuffled.GetUpperBound(0)
Dim key As Integer = shuffled(i)(0) + shuffled(i)(1) * 10 + shuffled(i)(2) * 100 + shuffled(i)(3) * 1000
If dic.ContainsKey(key) Then
dic(key) = dic(key) + 1
Else
dic.Add(key, 1)
End If
Next
Dim
sb As New System.Text.StringBuilder
sb.AppendLine(sw.ElapsedMilliseconds.ToString & "ms.")
For Each kvp As KeyValuePair(Of Integer, Integer) In dic
sb.Append(kvp.ToString & " ")
Next
tb.Text = sb.ToString
Me.Controls.Add(tb)
End Sub

Function
GetShuffledDeck(ByVal numCards As Integer) As Integer()
Dim cards(numCards - 1) As Integer
For
index As Integer = 0 To cards.Length - 1 ' BTW (cards.Length - 1) will be optimized.
Dim card As Integer
Do
' We don't want two identical cards.
card = rand.Next(1, numCards + 1)
Loop While cards.Contains(card)
cards(index) = card
Next
Return
cards
End Function

End Class


This is slow, (O(n^2). My results:



29400ms.

[4312, 416350] [3241, 417266] [4231, 417358] [3214, 416887]


[2134, 416303] [3421, 416548] [4321, 415916] [1324, 415585]


[1234, 417627] [3412, 416762] [4132, 415979] [1432, 416423]


[2143, 416603] [3142, 416542] [1423, 417515] [2314, 416934]


[2341, 417028] [1243, 416388] [2431, 415852] [1342, 417022]


[2413, 416309] [4213, 418019] [3124, 417387] [4123, 415397]



They don't look biased. Standard deviation from the mean is 653.


Certainly not as much as those on the codinghorror site. Let's redo that one...



How not to shuffle # 2


Using the same code with a different shuffle function:



Function GetShuffledDeck(ByVal numCards As Integer) As Integer()
Dim cards(numCards - 1) As Integer
For
i As Integer = 0 To cards.Length - 1
cards(i) = i + 1
Next
For
i As Integer = 0 To cards.Length - 1
Dim n As Integer = rand.Next(cards.Length)
Dim temp As Integer = cards(i)
cards(i) = cards(n)
cards(n) = temp
Next
Return
cards
End Function



3618ms.

[1234, 390450] [1243, 390582] [1324, 312355] [1342, 429289]


[1423, 429646] [1432, 547601] [2134, 390203] [2143, 429178]


[2314, 351753] [2341, 351524] [2413, 429756] [2431, 545840]


[3124, 350694] [3142, 430600] [3214, 312704] [3241, 430164]


[3412, 586053] [3421, 391312] [4123, 351567] [4132, 546387]


[4213, 429804] [4231, 391291] [4312, 390192] [4321, 391055]



There's bias there. s.d. = 71880!!!



How to shuffle # 1 - Knuth shuffle / Fisher-Yates


Here I initialize the Random object with a cryptographically strong seed. Bit better than the default? The default uses the Environment.TickCount, which could be used to work out likely starting seed values.... Not likely to actually happen....

The main difference is the algorithm though, which no longer introduces bias.



Imports System.Security.Cryptography

Public Class Form1

Private Shared rand As Random
Private dic As New Dictionary(Of Integer, Integer)
Dim tb As New TextBox With {.Dock = DockStyle.Fill, .Multiline = True}

Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
' get a random seed rather than using the default Random constructor, which
' just uses the Environment.TickCount
Dim rng As New RNGCryptoServiceProvider
Dim seedBytes(3) As Byte
rng.GetBytes(seedBytes)
rand = New Random(BitConverter.ToInt32(seedBytes, 0))
Dim sw As Stopwatch = Stopwatch.StartNew
Dim shuffled(9999999)() As Integer
For
i As Integer = 0 To shuffled.GetUpperBound(0)
shuffled(i) = GetShuffledDeck(4)
Next
sw.Stop()
For i As Integer = 0 To shuffled.GetUpperBound(0)
Dim key As Integer = shuffled(i)(0) + shuffled(i)(1) * 10 + shuffled(i)(2) * 100 + shuffled(i)(3) * 1000
If dic.ContainsKey(key) Then
dic(key) = dic(key) + 1
Else
dic.Add(key, 1)
End If
Next
Dim
sb As New System.Text.StringBuilder
sb.AppendLine(sw.ElapsedMilliseconds.ToString & "ms.")
Dim keys As New List(Of Integer)(dic.Keys)
keys.Sort()
For Each key As Integer In keys
sb.AppendFormat("[{0}, {1}] ", key, dic(key))
Next
tb.Text = sb.ToString
Me.Controls.Add(tb)
End Sub

Function
GetShuffledDeck(ByVal numCards As Integer) As Integer()
Dim cards(numCards - 1) As Integer
For
i As Integer = 0 To cards.Length - 1
cards(i) = i + 1
Next
For
i As Integer = cards.Length - 1 To 0 Step -1
Dim n As Integer = rand.Next(i + 1)
Dim temp As Integer = cards(i)
cards(i) = cards(n)
cards(n) = temp
Next
Return
cards
End Function

End Class



3564ms.

[1234, 417700] [1243, 416060] [1324, 416157] [1342, 416886]


[1423, 416348] [1432, 416439] [2134, 416352] [2143, 417366]


[2314, 416459] [2341, 417555] [2413, 417086] [2431, 416996]


[3124, 415841] [3142, 417191] [3214, 416693] [3241, 416009]


[3412, 416599] [3421, 416026] [4123, 416222] [4132, 416616]


[4213, 416332] [4231, 417548] [4312, 417035] [4321, 416484]



Bias free. s.d. = 526



How to shuffle # 2 - Sorting with a GUID


Again we can just alter the function:



Function GetShuffledDeck(ByVal numCards As Integer) As Integer()
Dim cards = Enumerable.Range(1, numCards)
cards = cards.OrderBy(Function(x) Guid.NewGuid)
Return cards.ToArray
End Function


20129ms.

[1234, 417906] [1243, 416614] [1324, 416582] [1342, 416302]


[1423, 416529] [1432, 416558] [2134, 416695] [2143, 416061]


[2314, 417049] [2341, 415870] [2413, 416690] [2431, 417183]


[3124, 415954] [3142, 416735] [3214, 415955] [3241, 415697]


[3412, 417235] [3421, 416427] [4123, 417626] [4132, 417726]


[4213, 417010] [4231, 415691] [4312, 418397] [4321, 415508]



Slow again. s.d. = 733



How to shuffle # 2 - Sorting with a load of random Doubles


Option Infer On

Imports
System.Security.Cryptography

Public Class Form1

Private Shared rand As Random
Private dic As New Dictionary(Of Integer, Integer)
Private tb As New TextBox With {.Dock = DockStyle.Fill, .Multiline = True}
Private rng As New RNGCryptoServiceProvider

Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
Dim sw As Stopwatch = Stopwatch.StartNew
Dim shuffled(9999999)() As Integer
For
i As Integer = 0 To shuffled.GetUpperBound(0)
shuffled(i) = GetShuffledDeck(4)
Next
sw.Stop()
For i As Integer = 0 To shuffled.GetUpperBound(0)
Dim key As Integer = shuffled(i)(0) + shuffled(i)(1) * 10 + shuffled(i)(2) * 100 + shuffled(i)(3) * 1000
If dic.ContainsKey(key) Then
dic(key) = dic(key) + 1
Else
dic.Add(key, 1)
End If
Next
Dim
sb As New System.Text.StringBuilder
sb.AppendLine(sw.ElapsedMilliseconds.ToString & "ms.")
Dim keys As New List(Of Integer)(dic.Keys)
keys.Sort()
For Each key As Integer In keys
sb.AppendFormat("[{0}, {1}] ", key, dic(key))
Next
tb.Text = sb.ToString
Me.Controls.Add(tb)
End Sub

Function
GetShuffledDeck(ByVal numCards As Integer) As Integer()
Dim cards = Enumerable.Range(1, numCards)
cards = cards.OrderBy(Function(x) RandomDouble(x))
Return cards.ToArray
End Function

' This returns a function that creates a random double. The Integer bit is just ignored.
Private Function RandomDouble() As Func(Of Integer, Double)
' 8 bytes for the double.
Dim bytes(7) As Byte
' Get cryptographically strong random bytes into the array.
rng.GetBytes(bytes)
' The function uses the bytes to create a double. x is ignored.
Dim getDouble = Function(x As Integer) BitConverter.ToDouble(bytes, 0)
Return getDouble
End Function

End Class


137380ms.

[1234, 416945] [1243, 416314] [1324, 416742]


[1342, 416652] [1423, 417382] [1432, 416208]


[2134, 417577] [2143, 416464] [2314, 415925]


[2341, 416543] [2413, 416615] [2431, 418076]


[3124, 416493] [3142, 415963] [3214, 417219]


[3241, 415923] [3412, 416774] [3421, 417394]


[4123, 416466] [4132, 416706] [4213, 415900]


[4231, 416736] [4312, 415534] [4321, 417449]



Slooooooooooooooooooooooooooooooow. - The cryptographic random numbers are very expensive!



s.d. = 605



 



So there you go. But, it's interesting to note atma's comments about experienced card players - they don't like their cards shuffled by the computer, they prefer human shuffled cards and can tell the difference.




0 comments:

Post a Comment