Troubleshooting a Recursive Function in an R Script for Tableau Prep

You may have seen the blog earlier this week on getting started with R scripts in Tableau Prep. This blog builds on it by looking at how to adapt an R function so that it can be used in Tableau Prep.

Example: Calculate n factorial (n!)

A great, simple example of a recursive function is n factorial (n!). For anyone unfamiliar, this means multiplying a number by every positive integer smaller than it, i.e.

n! = n * (n-1) * (n-2) * ... * 2 * 1

e.g. 5! = 5 * 4 * 3 * 2 * 1 
        = 120
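As a quick sanity check on the arithmetic, base R can already do this. `prod()` multiplies the elements of a vector, and base R also ships a built-in `factorial()` function, so both should agree with the worked example:

```r
# Multiply the numbers 5 down to 1 directly
prod(5:1)      # 120

# Base R also has a built-in factorial() function
factorial(5)   # 120
```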

To write a function in R that mimics this behaviour, we can use a recursive function: a function that calls itself within its own definition, as follows:

n_factorial <- function(n) {
	if(n <= 1) {
		return(1)
	} else { 
		return(n * n_factorial(n-1))
	}
}

If we take the example of n_factorial(3), then we would be taken down the else route (since 3>1) and the output of n_factorial(3) would be 3*n_factorial(2). But the function would not stop there! It would go off and calculate n_factorial(2), which would return 2*n_factorial(1). Finally, n_factorial(1) would return 1 and so we would have our 3*2*1 = 6, which would be returned by n_factorial(3).
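To watch this chain of calls happen, here is a sketch of the same function with a trace message printed on entry and on return. The name `n_factorial_traced` and its extra `depth` argument are hypothetical additions for illustration only; `depth` just controls the indentation of the messages and is not part of the author's function:

```r
# Same logic as n_factorial(), plus a message on each call and each return.
# `depth` only indents the output so nested calls are easy to see.
n_factorial_traced <- function(n, depth = 0) {
	indent <- strrep("  ", depth)
	cat(indent, "n_factorial(", n, ")\n", sep = "")
	if (n <= 1) {
		result <- 1
	} else {
		result <- n * n_factorial_traced(n - 1, depth + 1)
	}
	cat(indent, "n_factorial(", n, ") returns ", result, "\n", sep = "")
	return(result)
}

n_factorial_traced(3)
# n_factorial(3)
#   n_factorial(2)
#     n_factorial(1)
#     n_factorial(1) returns 1
#   n_factorial(2) returns 2
# n_factorial(3) returns 6
```

Each call has to wait for the call below it to return before it can finish its own multiplication, which is why the "returns" lines appear in reverse order.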

Hopefully this makes sense, but if you’d like it explained a different way then check out this blog.

First Issue: The function works in R, but not in Tableau Prep

If we run the above script in R and then call the function in the console with an integer, it works perfectly. When we use it in Tableau Prep on a field containing the numbers 1 – 10, however, we get an error saying that the function did not return the expected number of columns.

This is confusing, since there were no errors when we ran the script in R, and we're not trying to add any columns. The cause is that Tableau Prep passes the field (here, the 1 – 10 field) to the function as a data frame, rather than as a single integer. So we'll need to adapt the function to take a data frame as its input:
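We can reproduce the problem outside Tableau Prep with a small data frame standing in for the field Prep sends. The name `prep_input` and the column name `n` are assumptions for this sketch; Prep passes the real field name. Comparing a whole data frame with 1 gives a logical matrix with one value per row, not the single TRUE/FALSE that `if()` needs:

```r
# Stand-in for the 1 - 10 field that Tableau Prep passes to the script
# (the column name "n" is an assumption for this example)
prep_input <- data.frame(n = 1:10)

# The condition used inside the first version of n_factorial()
condition <- prep_input <= 1
dim(condition)      # 10 1: one TRUE/FALSE per row
length(condition)   # 10, but if() needs a single value
```

Since R 4.2.0, calling `if()` with a condition longer than one value is an error. Older versions only warned and used the first element, which here is TRUE, so the first version returned the single number 1 instead of a data frame.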

n_factorial <- function(n) {
	for (i in 1:nrow(n)) {
		if(n[i,1] <= 1) {
			n[i,1]
		} else {
			n[i,1]<- n[i,1] * n_factorial(n[i,1]-1)
		}
	}
	return(n)
}

Firstly, we have introduced a for loop. In plain English, this line creates a temporary variable i that takes each integer value from 1 up to the number of rows in the data frame (nrow() returns the number of rows). This lets us work with one row at a time.
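One small caveat with `1:nrow(n)`: if a data frame ever has zero rows, `1:0` counts down and gives `c(1, 0)`, so the loop body would still run twice. A common, safer alternative is `seq_len()`, which gives an empty sequence in that case. The data frame `empty` below is purely illustrative:

```r
# An illustrative data frame with the right column but no rows
empty <- data.frame(n = integer(0))

1:nrow(empty)          # 1 0: the loop body would run twice
seq_len(nrow(empty))   # integer(0): the loop body is skipped
```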

Next, we need to make sure we're working with the right field in our data frame. On the next line of code you will notice n[i,1]. The square brackets pick out a particular value within the data frame: here, the i-th row of the 1st column. As i cycles through its values in the for loop, this moves us down the rows, picking out one value at a time.
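Here is a small illustration of that indexing on a base R data frame (a tibble would return a one-cell tibble from `[i, 1]` rather than a plain number). The data frame `example_df` and its values are made up for the example:

```r
# A small example data frame (values chosen for illustration)
example_df <- data.frame(n = c(4, 7, 9))

example_df[2, 1]   # 7: the value in row 2, column 1

# Stepping through the rows one at a time, as the for loop does
for (i in 1:nrow(example_df)) {
	print(example_df[i, 1])   # prints 4, then 7, then 9
}
```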

The rest of the function stays largely the same; it is just tucked inside the for loop, so that every row is processed before we return the updated data frame. So let's test the function with a data frame.

Second Issue: Now the function isn’t even working in R!

Now we are seeing errors in both R and Tableau Prep. The good news is that we're on the right track: Tableau Prep is no longer complaining that the function didn't return the expected number of columns. So we only need to focus on the error in R.

The error says that the argument we are working with has length 0, which is odd, because when we check the number of rows of our test data frame in the console, it returns 2. With for loops, it can be difficult to tell at which iteration the function is failing, so a neat trick is to add a line of code just before the for loop closes that prints the row number and the result for that row:

print(c(i,n[i,1]))

We can now see that the function works as intended on the first row, i.e. n_factorial(1) = 1, but there is an error on the second row, where we are trying to compute n_factorial(2), i.e. 2*n_factorial(1). This is puzzling: R seemed to handle n_factorial(1) without any trouble in the previous iteration, so why the error now?

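The error comes from the recursive call. On the first row the value is 1, so the if branch applies and n_factorial is never called again. On the second row, however, the function calls n_factorial(n[i,1]-1), i.e. n_factorial(1), and this time the input is a single number rather than a data frame. A single number has no rows, so nrow(n) returns NULL and 1:nrow(n) fails with the length 0 error. By changing our function to take a data frame as input, we have lost the ability to call it on a single integer. Therefore, the function needs to be adapted again to accept both an integer and a data frame as input.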
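We can confirm this diagnosis directly in the console. For a data frame, `nrow()` returns the row count, but for a single number it returns `NULL`, and `1:NULL` is exactly the "length 0" error seen above. The `tryCatch()` call just captures the error message so we can inspect it:

```r
nrow(data.frame(n = 1:10))   # 10
nrow(1)                      # NULL: a single number has no rows

# What happens inside the recursive call n_factorial(1)
msg <- tryCatch(1:nrow(1), error = function(e) conditionMessage(e))
msg                          # "argument of length 0"
```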

A solution that works for both

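n_factorial <- function(n) {
	if (is.null(nrow(n))) {
		# A single number (e.g. from the recursive call): nrow() is NULL
		if (n <= 1) {
			n <- 1
		} else {
			n <- n * n_factorial(n - 1)
		}
	} else {
		# A data frame (e.g. from Tableau Prep): update the first column row by row
		for (i in seq_len(nrow(n))) {
			if (n[i,1] <= 1) {
				n[i,1] <- 1
			} else {
				n[i,1] <- n[i,1] * n_factorial(n[i,1] - 1)
			}
		}
	}
	return(n)
}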
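A quick way to check the final version before going back to Tableau Prep is to call it on both kinds of input. This sketch repeats the function so that it runs on its own, using `n <= 1` as the base case so that a 0 returns 1 instead of recursing forever; the stand-in data frame and its column name `n` are assumptions:

```r
n_factorial <- function(n) {
	if (is.null(nrow(n))) {
		# Single number: plain recursion
		if (n <= 1) {
			n <- 1
		} else {
			n <- n * n_factorial(n - 1)
		}
	} else {
		# Data frame: apply the recursion to each value in the first column
		for (i in seq_len(nrow(n))) {
			if (n[i, 1] <= 1) {
				n[i, 1] <- 1
			} else {
				n[i, 1] <- n[i, 1] * n_factorial(n[i, 1] - 1)
			}
		}
	}
	return(n)
}

n_factorial(5)                              # 120
result <- n_factorial(data.frame(n = 1:5))
result$n                                    # 1 2 6 24 120
```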

Our final version of the n! function now starts with a check on whether the input is a single number or a data frame. For a single number, nrow returns NULL, so there is no need for a for loop and the function looks similar to our first version. Otherwise (i.e. when the input is a data frame, as it is when Tableau Prep calls the function), the version with the for loop comes into play.
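A different way to get the same behaviour, sketched here as an alternative rather than the approach used above, is to leave the original scalar function untouched and write a small wrapper that applies it to each value in the first column. The wrapper name `prep_factorial` is hypothetical; it is the function name you would then enter in the Tableau Prep Script step:

```r
# Original scalar recursive function, unchanged
n_factorial <- function(n) {
	if (n <= 1) {
		return(1)
	} else {
		return(n * n_factorial(n - 1))
	}
}

# Hypothetical wrapper: the function Tableau Prep would call with its data frame
prep_factorial <- function(df) {
	# Apply the scalar function to every value in the first column
	df[[1]] <- vapply(df[[1]], n_factorial, numeric(1))
	return(df)
}

prep_factorial(data.frame(n = 1:5))
```

Keeping the recursion on plain numbers and the data frame handling in a separate wrapper avoids the is.null(nrow(n)) check, at the cost of a second function to maintain.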

Finally, we are ready to take the function back into Tableau Prep:

Thank you for reading, and please do let me know if it was helpful to read about how you might troubleshoot issues in R/Tableau Prep! @JennyMartinDS14
