Funcation Programing.. This post is based on a talk I gave at the - TopicsExpress



          

Funcation Programing.. This post is based on a talk I gave at the Manchester R User Group on functional programming in R on May 2nd 2013. The original slides can be found here This post is about functional programming, why it is at the heart of the R language and how it can hopefully help you to write cleaner, faster and more bug-free R programs. I will discuss what functional programming is at a very abstract level as a means of the representation of some simplified model of reality on a computer. Then IâÃÂÃÂll talk about the elements that functional programming is comprised of and highlight the most important elements in programming in R. I will then go through a quick example demo of a FP-style generic bootstrap algorithm to sample linear models and return bootstrap confidence intervals. IâÃÂÃÂll compare this with a non-FP alternative version so you will hopefully clearly see the advantages of using an FP style. To wrap up, IâÃÂÃÂll make a few suggestions for places to go to if you want to learn more about functional programming in R. What is Functional programming? … Well, what is programming? When you write a program you are building an abstract representation of some tiny subset of reality on your computer, whether it is an experiment you have conducted or a model of some financial system or a collection of features of members of a population. There are obviously different ways to represent reality, and the different different methods of doing so programmatically can be thought of as the metaphysics of different styles of programming. Consider for a moment building a representation of a river on a computer, a model of a river system for example. In non-functional languages such as C++, Java and (to some extent) Python, the river is an object in itself, a thing that does things to other things and that may have various properties associated with it such as flow rate, depth and pollution levels. These properties may change over time but there is always this constant, the river, which somehow persists over time. In FP we look at things differently… The presocratic philosopher Hereclitus said âÃÂÃÂWe never step into the same river twiceâÃÂÃÂ, recognising that the thing we call a river is not really an object in itself, but something undergoing constant change through a variety of processes. In functional programming we are less concerned with the object of the river itself but rather the processes that it undergoes through time. Our river at any point in time is just a collection of values (say, particles and their positions). These values then feed into a process to generate the series of values at the next time point. So we have data flowing through processes of functions and that collection of data over time is what we call a river, which is really defined by the functions that the data flows through. This is a very different way of looking at things to that of imperative, object oriented programming. After this somewhat abstract and philosophical start, Ill talk about the more practical elements of functional programming (FP). FP has been around for a very long time and originally stems from Lisp, which was first implemented in the 1950âÃÂÃÂs. It is making something of a comeback of late for a variety of reasons, but mostly because it is so good at dealing with concurrent, multicore problems potentially over many computers. There are several elements that FP is generally considered to be comprised of. Different languages highlight different elements, depending on how strictly functional they are. Functions are first class citizens of the language This means that functions can be treated just like any other data type - They can be passed around as arguments to other functions, returned by other functions and stored in lists. This is the really big deal about functional programming and allows for higher-order functions (such as lapply) and closures, which Ill talk about later. This is the most fundamental functional concept and Id argue that a language has to have this property in order to be called a functional language, even if it has some of the other elements listed below. For example, Python has anonymous functions and supports declarative programming with list comprehensions and generators, but functions are fundamentally different from data-types such as lists so Python cannot really be described as a functional language in the same way as Scheme or R can be. Functional purity This is more of an element of good functional program design. Pure functions take arguments, return values and otherwise have no side effects - no I/O or global variable changes. This means that if you call a pure function twice with the same arguments, you know it will return the same value. This means programs are easily tested because you can test different elements in isolation and once you know they work, you can treat them like a black box, knowing that they will not change some other part of your code somewhere else. Some very strictly functional languages, like Haskell, insist on functional purity to the extent that in order to output data or read or write files you are forced to wrap your dirty functions in constructs called monads to preserve the purity of your code. R does not insist on functional purity, but it is often good practice to split your code into pure and impure functions. This means you can test your pure code easily and confine your I/O and random numbers etc to a small number of dirty functions. Vectorised functions Vectorised functions operate equally well on all elements of a vector as they do on a single number. They are very important in R programming to the point that much of the criticism of R as a really slow language can be put down to failing to properly understand vectorisation. This also includes the declarative style of programming, where you tell the language what you want, rather than how you want to get it. This is common in languages like SQL and in Python generators. Ill discuss this more later. Anonymous functions In FP, naming and applying a function are two separate operations, you dont need to give your functions names in order to call them. So, calling this function: powfun 1 + 2 ## [1] 3 … is exactly the same as… > `+`(1, 2) ## [1] 3 The + operator is really just “syntactic sugar” for a + function with the arguments 1 and 2 applied to it. Similarly, > 1:10 ## [1] 1 2 3 4 5 6 7 8 9 10 … is the same as… > `:`(1, 10) ## [1] 1 2 3 4 5 6 7 8 9 10 here again, to give the range of numbers between 1 and 10, : is really just a function in disguise. If you were to break down more complex expressions in this way, the result would be code that looks very Scheme-like indeed. I will now look in more depth at the functional concepts that are most important in R, Vectorised functions, higher order functions and closures. Vectorised functions Probably the best known FP concept in R is the vectorised function which automagically operates on a data structure like a vector in just the same way as on a single number. Some of the most important of these are the vector subsetting operations. In these, you take a declarative approach to programming: you tell R what you want, not how to get it. Because of this property of operating across vectors, proper use of vectorised functions will drastically reduce the number of loops you have to hand code into your scripts. They are still performing loops under the hood, but these loops are implemented in C, so are many times faster than a standard for loop in R. For example, when I was first using R for data management and analysis, I spent months writing code like this: > # Get all even numbers up to 200000 > # using S-style vector allocation: > x for(i in 1:200000){ > if(i %% 2 == 0){ > x } > } This is about the worst possible way of achieving the given task (here, getting all even numbers up to 200000). You are running a for loop, which is slow in itself, and testing if i is even on each iteration and then growing the x vector every time you find that it is. The code is ugly, slow and verbose (On my machine it took around 10 seconds). For me, writing vectorised code was a real revelation. To achieve the same goal as the code above in a vectorised style: > # FP style vectorised operation > a x FUNCTION -> list sapply : Any collection -> FUNCTION -> matrix/vector apply : Matrix/dataframe + margin -> FUNCTION -> matrix/vector Reduce : Any collection -> FUNCTION -> single element so if you want your output in a list, use lapply. If you want a vector or matrix, use sapply. If you want to calculate summaries of the rows or columns of a dataframe, use apply. If you want to condense your dataset down into a single summary number, use Reduce. There are several other functions in the same family, which all follow a similar pattern. Closures Closures are at the heart of all functional programming languages. Essentially a closure is a function to which has been added data via its arguments. The function closes over the data at the time the function was created and it is possible to access it at a later time. Compare this to the idea of an object in languages like C++ and Java, which are data with functions attached to them. You can use closures to build wrappers around functions with new default values and partially apply functions and even mimic Object-oriented style objects, but possibly most interestingly, you can build functions that return other functions. This is great if you want to call a function many times on the same dataset but with different parameters, such as in maximum-likelihood optimisation problems when you are seeking to minimise some cost function and also for randomisation and bootstrapping algorithms. To demonstrate the usefulness of this, I am now going to build a generic bootstrapping algorithm in a functional style that can be applied to any linear model. It will demonstrate not only functions returning functions, but higher-order functions (in this case, sapply), anonymous functions (in the mapping function to sapply) and vectorised functions. I will then compare this against a non-FP version of the algorithm and hopefully some of the advantages of writing in an FP style in R will become clear. Here is the code. I am doing a bootstrap of a simple linear model on the classic iris dataset: boot.lm
Posted on: Sun, 10 Aug 2014 05:34:35 +0000

Trending Topics



Recently Viewed Topics




© 2015