A common misconception some (many?) have about recursive solutions is that they are not fast. There are many reasons to use recursion, both in code and in data, mainly that solutions based on recursion are easy to implement, understand, and maintain. For an introduction see Recursion (http://www.tushar-mehta.com/publish_train/book_vba/07_recursion.htm).
What is interesting is that recursive solutions can also provide an improvement over a non-recursive solution. This happened recently when I finally decided to tackle Project Euler problem 14 (http://projecteuler.net/index.php?section=problems&id=14). In the problem, one had to find the number under 1,000,000 that had the longest Collatz chain.
I had postponed addressing the problem because until earlier today I could think of only a “brute force” approach1. But, then, I realized something. If one started from 1 and worked their way up to 999,999, one could build a database of answers for all the numbers encountered in earlier calculations. Then, whenever one encountered a number already in the database, a simple look up of the associated would short circuit the need for any further computations.
For the recursive implementation as well as the intuitive approach see Project Euler – Problem 14 (http://www.tushar-mehta.com/misc_tutorials/project_euler/euler014.html)
1 A approach that should work but that I consider as somewhat unethical is to assume that the solution is a high number. So, work backwards from 999,999 to 900,000. Check this solution with the Project Euler system. If it rejects your answer, try the range 899,999 to 800,000. If that isn’t OK, try the next range down. Sooner or later you will find the correct answer. Just keep in mind that the PE website has a 20 minute wait before one can resubmit an answer.
Was the link to problem 14 supposed to be recursive? :)
I used a simple brute force solution that solved in about 11 seconds on my computer, compared with about 70 seconds for your recursive solution, and about twice as long for the non-recursive routine. One reason was that I only checked odd starting numbers, but checking every number only doubles the time to about 22 seconds.
I modified your non-recursive code as shown below, which reduced the solution time to about 7 seconds! It seems that the CDec() or mod functions were taking a lot of time. I still haven’t worked out why your code is so much faster than mine (which was essentially the same).
Sub Euler014s()
Dim I As Long
Dim ProcTime As Single, MaxRslt As Integer, MaxStartNbr As Long
ProcTime = Timer
MaxRslt = 1: MaxStartNbr = 1
For I = 3 To 1000000 - 1 Step 2
Dim Rslt As Double, ThisSeqLen As Long
' Rslt = CDec(I): ThisSeqLen = 1
Rslt = I: ThisSeqLen = 1
Do While Rslt 1
'If Right(Rslt, 1) Mod 2 = 0 Then Rslt = Rslt / 2 _
If Rslt / 2 - Int(Rslt / 2) = 0 Then Rslt = Rslt / 2 _
Else: Rslt = 3 * Rslt + 1
ThisSeqLen = ThisSeqLen + 1
Loop
If MaxRslt
The dreaded greater/less than strikes again.
The rest of the code is as shown at Tushar’s link.
I also meant to comment that making similar changes to the recursive routine does not seem to help significantly. I’m not sure why not.
Recursion isn’t the same thing as accumulating previous results.
Tushar Mehta’s lookup approach is slow due to the function calls to CheckOneNbr as well as error trapping accesses to the SeqCount collection. While it takes a reference in the VBA Project, WSH Dictionary objects fly compared to plain vanilla VBA collection objects.
My results agree with Doug Jenkins’s: brute force takes about 11 seconds.
The point about cheating by interating from 999999 to 2 is foolish. Unless one has a mathematical proof that the answer would be towards the high end of the range, one must try all values in the range either largest to smallest or smallest to largest.
Finally, Right(Rslt, 1) Mod 2 to avoid overflow is losing tactic when the goal is performance. For that matter,
If Rslt / 2 – Int(Rslt / 2) = 0 Then Rslt = Rslt / 2 _
Else: Rslt = 3 * Rslt + 1
(why the : after Else?) performs the Rslt / 2 test more often than needed. Better something like
Rslt = Rslt / 2#
If Int(Rslt) Rslt Then Rslt = 6# * Rslt + 1#
1 divide rather than 2 in every loop, no additional divide when Rslt starts off even, and the same number of multiply and add operations when Rslt starts off odd.
I think the move to make is to solve the problem in reverse. The recursive solution still does tons of calculations to come up with the answer.
This is a big hint:
Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
So, start at 1 and work backwards trying to find the longest chain.
The functions:
F1: n ? n/2 (n is even)
F2: n ? 3n + 1 (n is odd)
The functions in reverse are:
R1: n ? n*2
R2: n ? (n-1) / 3
The trick is to know when to choose R2 instead of R1. You don’t have to make this choice very often. I’ll work on this and post again later.
After playing around with this, my plan doesn’t work. Even with the correct decision path in hand, backing up from 1, I can only get the first eighteen numbers correct with simple decision criteria. My criteria for choosing R2 was: n mod 2 = 0 and R2(n) mod 3 = 2.
Even if there’s a reliable way to choose when to use R2 instead of R1, there’s not a great way of knowing at what point n will never drop back below 1,000,000.
I tried a recursive approach that stores known solutions to lesser seed values in an array and as expected it works orders of magnitude faster that the brute force method.
My problem is I am new to VBA and I am stuck with what will likely be an easy solution in the end. I have a problem with my code when the the Collatz chain sequence creates an overflow error for n (seed number 113383 is the first occurance)
As suggestions?
My code is:
‘ The following iterative sequence is defined for the set of positive integers:
‘
‘ n -> n/2 (n is even)
‘ n -> 3n + 1 (n is odd)
‘
‘ Using the rule above and starting with 13, we generate the following sequence:
‘
‘ 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
‘
‘ it can be seen that this sequence (starting at 13 and finishing at 1) contains
‘ 10 terms. Although it has not been proved yet (Collatz Problem), it is thought
‘ that all starting numbers finish at 1.
‘
‘ Which starting number, under one million, produces the longest chain?
‘
‘ NOTE: Once the chain starts the terms are allowed to go above one million.
On Error GoTo 0
Dim T As Single
Dim answer_seed As Double
Dim answer_max_step As Double
Dim step As Double
Dim seed_n As Double
Dim UpperBound As Double
Dim n As Double
Dim ChainLength() As Double
UpperBound = 100000
ReDim ChainLength(1 To UpperBound)
seed_n = 2
step = 1
T = Timer
Do
n = seed_n
Do While n 1
If n <= UpperBound Then
If ChainLength(n) 0 Then
ChainLength(seed_n) = ChainLength(n) + step
If ChainLength(seed_n) > answer_max_step Then
answer_max_step = ChainLength(seed_n)
answer_seed = seed_n
End If
Exit Do
End If
End If
If n Mod 2 = 0 Then
n = n / 2
ElseIf n Mod 2 0 Then
n = 3 * n + 1
End If
step = step + 1
Loop
If n = 1 Then
ChainLength(seed_n) = step
If ChainLength(seed_n) > answer_max_step Then
answer_max_step = ChainLength(seed_n)
answer_seed = seed_n
End If
End If
seed_n = seed_n + 1
step = 0
Loop Until seed_n = UpperBound
Debug.Print “Seed: “; answer_seed
Debug.Print “Steps:”; answer_max_step
Debug.Print “Time: “; Timer – T
End Sub
The problem is that the Mod function likes Longs, and n exceeds the limit of the Long data type. I have a couple of sugestions for you involving the “If n mod 2 = 0 Then” statement. First off, you don’t need the ElseIf part there; n mod 2 can only be 1 or 0 so a simple Else will suffice. That way you don’t have to waste time taking the Mod twice.
For your overflow problem, you can go back to the definition of the Mod function.
Mod(n,d) = n – d * INT(n/d)
Thus your statement would look like: If n – 2 * Int(n / 2) = 0 Then
This works for this particular problem and solves the problem in 1.4 seconds on my machine.
A Double will get you 15 digits of precision, but will also be vulnerable to floating point rounding errors. A common theme with PE is to require more than 15 digits of precision. A number of people here like the Currency data type. I prefer the Decimal data type (27 digits of precision and no floating point errors), though it is exceedingly slow. By comparison simply using the above formula with Decimal data took about 6.5 seconds.
Nicely done solution, and I hope this helps.
-Josh
Hi Tim –
And welcome to the Euler conversations.
Josh has helped you with your specifics. I’ll point out that there is a bug in posting code inside VB tags. Inside those tags, angle bracket pairs are treated as HTML tags, and everything between a starting “less than” and a following “greater than” is treated as unknown HTML and thrown away.
If you look at your code as it appears above, I’ll bet some money that the line
If ChainLength(n) 0 Then
really has a “not equals” angle bracket pair inside that got thrown away. Not a big deal here, but sometimes this bug can span lines of code. Does make it shorter ;-)
Something to watch for. You can see Doug complaining about it up top.
…mrt