
Inspired by a friend’s prolific use of R for data analysis, over the past few days I have been trying to get the hang of the language. Here are some notes and thoughts.

Thinking of R as a programming language may subconsciously invite a notion of procedural logic. That was a path not worth going down. It's easier to assimilate the syntax if one considers it an expression evaluation system.

Not having worked with any kind of functional language (unless you count JavaScript), I had to find analogies in the procedural world to help grasp the fundamentals.

Here are my notes starting from the very basics to the point of being able to make a basic plot.

I am following the documentation here: https://cran.r-project.org/doc/manuals/R-intro.html. For the most part it is great, although some parts may seem tediously verbose, fretting over the smallest details.

RStudio is a pretty good place to start learning and using R. It features:

- REPL (interactive prompt)
- Notebook from which you can execute one or many statements in your REPL
- Package manager
- Graphics/output viewer
- Variable viewer

A decade ago in India, Python’s REPL let me learn the language from the convenience of the terminal, without an internet connection.

R’s REPL in RStudio takes it a notch further.

Getting help is quite easy. Simply executing `?func` will open the docs for that function.

```
> ?plot
```

The language is dynamically typed. A variable can be assigned a value of one type and later reassigned a value of another type.

```
> a = "foo"
> class(a)
[1] "character"
> a = 23
> class(a)
[1] "numeric"
```
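It can also help to know that `class()` is not the only way to ask about a value's type. A short sketch using base R's `typeof()`, which reports the underlying storage type; note that a plain number like `23` is stored as a double, and the `L` suffix makes an integer literal:

```r
a <- 23
class(a)    # "numeric"
typeof(a)   # "double": plain numeric literals are stored as doubles

a <- 23L    # the L suffix creates an integer literal
class(a)    # "integer"
typeof(a)   # "integer"

a <- "foo"
class(a)    # "character"
typeof(a)   # "character"
```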

Since R is primarily a statistical analysis language, its built-in data types and structures are quite diverse. A non-exhaustive list, to give an idea:

- numeric
- complex
- character (strings)
- vectors (think arrays)
- matrices
- factors
- data frames (think tables)
- etc.
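To see these in the REPL, `class()` can be called on one example value of each kind. A quick sketch (the values are arbitrary); note that a vector's class is simply the type of its elements, and that R 4.0 and later report a matrix as both `"matrix"` and `"array"`:

```r
class(3.14)                                       # "numeric"
class(2+3i)                                       # "complex"
class("hello")                                    # "character"
class(c(1, 2, 3))                                 # "numeric": a vector's class is its element type
class(matrix(1:6, nrow = 2))                      # "matrix" "array" (R >= 4.0)
class(factor(c("low", "high", "low")))            # "factor"
class(data.frame(id = 1:2, name = c("a", "b")))   # "data.frame"
```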

Variables are assigned from right to left with either `=` or `<-`. You can also assign from left to right using the `->` operator.

```
> a <- 23**2
> 23**2 -> a # assign expression into a
> print(a)
[1] 529
```
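One subtlety: `=` and `<-` are interchangeable at the top level, but not inside a function call. There `=` names an argument, while `<-` performs an assignment as a side effect, which is one reason R style guides commonly recommend `<-` for assignment. A small sketch (`my_vals`, `m1` and `m2` are arbitrary names, and the `FALSE` assumes a fresh session):

```r
m1 <- median(x = 1:10)        # passes 1:10 as median()'s argument `x`
exists("my_vals")             # FALSE: nothing named my_vals was created

m2 <- median(my_vals <- 1:10) # assigns 1:10 to my_vals, then passes it on
exists("my_vals")             # TRUE: my_vals now exists in the workspace
identical(m1, m2)             # TRUE: both calls compute 5.5
```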

For vectors, the closest (and insufficient) analogy is to think of them as arrays, but richer.

A vector can be initialised using the `c()` function:

```
> c(1,123,33,223,2) -> a
> print(a)
[1] 1 123 33 223 2
> sum(a)
[1] 382
```
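Besides `sum()`, the other common summary functions work the same way on a vector. A quick sketch with the same values:

```r
a <- c(1, 123, 33, 223, 2)
mean(a)    # 76.4
min(a)     # 1
max(a)     # 223
range(a)   # 1 223
sort(a)    # 1 2 33 123 223
rev(a)     # 2 223 33 123 1
```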

All elements of a vector must be of the same type. If all elements but one are numeric and the last one is a string, all of them are coerced to strings:

```
> c(1,2,3,"f") # all but one are numeric
[1] "1" "2" "3" "f" # all coerced to strings
> c(1,2,3,"f") -> a
> class(a)
[1] "character"
```
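The coercion follows a fixed order, roughly logical < integer < numeric (double) < complex < character, and `c()` converts everything to the highest type present. A sketch of that order, plus converting back explicitly (unparseable strings become `NA`, with a warning that is suppressed here):

```r
class(c(TRUE, 2L))       # "integer": TRUE becomes 1L
class(c(TRUE, 2.5))      # "numeric"
class(c(1, 2i))          # "complex"
class(c(1, TRUE, "f"))   # "character"
c(TRUE, 2L)              # 1 2

a <- c("1", "2", "3", "f")
suppressWarnings(as.numeric(a))   # 1 2 3 NA: "f" cannot be parsed as a number
```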

The `c()` function flattens nested vectors:

```
> c(a, c(1,2,3))
[1] "1" "2" "3" "f" "1" "2" "3"
```

The length can be computed using the `length()` function.
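A quick check with the vectors used so far:

```r
a <- c("1", "2", "3", "f")
length(a)                  # 4
length(1:10)               # 10
length(c(a, c(1, 2, 3)))   # 7: c() flattened everything into one vector
```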

Items are accessed by index, and indices start at 1 (not 0).

```
> a
[1] "1" "2" "3" "f"
> a[1]
[1] "1"
```

A range of items can be selected with `variable[from:to]`, where both ends are inclusive:

```
> a
[1] "1" "2" "3" "f"
> a[1:2]
[1] "1" "2"
```
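Two related rules for numeric indices, sketched below: index `0` selects nothing (which is why `a[0:2]` and `a[1:2]` return the same two elements), and a negative index *excludes* that position rather than counting from the end as it would in Python:

```r
a <- c("1", "2", "3", "f")
a[1:2]       # "1" "2"
a[0]         # character(0): index 0 selects nothing
a[0:2]       # "1" "2": the 0 is silently dropped
a[-1]        # "2" "3" "f": everything except the first element
a[-(1:2)]    # "3" "f"
a[c(4, 1)]   # "f" "1": any vector of indices works, in any order
```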

Numeric vectors can also be created using the range syntax

```
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
```
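The `:` operator only steps by 1. For other steps, or for repeated values, base R has `seq()` and `rep()`; a short sketch:

```r
class(1:10)                  # "integer"
10:1                         # counts down: 10 9 8 ... 1
seq(1, 10, by = 2)           # 1 3 5 7 9
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00
rep(1:2, times = 3)          # 1 2 1 2 1 2
rep(1:2, each = 2)           # 1 1 2 2
```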

Indices *can* go out of bounds. Out-of-bounds elements are returned as a *typed* `NA` value, i.e. an `NA` of the vector's own type.

```
> a
[1] "1" "2" "3" "f"
> a[1:10]
[1] "1" "2" "3" "f" NA NA NA NA NA NA
```
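The same holds for numeric vectors, and writing past the end behaves differently from reading: it grows the vector and fills the gap with `NA`. A small sketch (`x` is an arbitrary example vector):

```r
x <- c(10, 20, 30)
x[5]           # NA: reading past the end does not raise an error
class(x[5])    # "numeric": the NA keeps the vector's type

x[5] <- 50     # assigning past the end grows the vector
x              # 10 20 30 NA 50: the gap is filled with NA
length(x)      # 5
```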

The index in `a[<expression>]` can be any valid expression, not just a range: anything that evaluates to numeric positions, a logical vector or element names will do.

```
> b = c(1,2,3,NA,NA) # consider
> b[b>2] # elements of b > 2
[1] 3 NA NA
> b[ !is.na(b) ] # elements of b that are not `NA`
[1] 1 2 3
> b[ b > 2 & !is.na(b) ] # elements of b that are > 2 and not `NA`
[1] 3
```
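A related tool is `which()`: it returns the *positions* where a condition is `TRUE` and treats `NA` as not true, so it sidesteps the `NA`s above. A sketch with the same `b`:

```r
b <- c(1, 2, 3, NA, NA)
b > 2                       # FALSE FALSE TRUE NA NA
which(b > 2)                # 3: the position of the only TRUE
b[which(b > 2)]             # 3: same result as b[b > 2 & !is.na(b)]
sum(b > 2, na.rm = TRUE)    # 1: how many elements are greater than 2
```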

Basic math operations are applied to the whole vector, element by element, and the result is a vector of the same length.

```
> b = c(1,2,3) # assign
> b
[1] 1 2 3
> b + 1 # +1 all elements
[1] 2 3 4
> b*b # square all elements
[1] 1 4 9
```
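The same element-wise behaviour applies to most built-in math functions and to comparison operators, whereas summary functions such as `sum()` collapse the vector to a single value. A sketch:

```r
b <- c(1, 4, 9)
sqrt(b)    # 1 2 3: applied to each element
b %% 2     # 1 0 1: remainder, element by element
b > 3      # FALSE TRUE TRUE: comparisons are vectorised too
sum(b)     # 14: a summary function returns one value
```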

Adding two vectors of the same length adds them element by element.

```
> m <- 1:5 # m = c(1,2,3,4,5)
> n <- 6:10 # n = c(6,7,8,9,10)
> m + n
[1] 7 9 11 13 15 # 1+6, 2+7, ...
```

Vectors of unequal length *can* participate in binary operations; the catch is that the shorter vector is repeated (recycled) to match the longer one.

```
> 1:6 -> p # 1 2 3 4 5 6
> 1:3 -> q # 1 2 3
> p + q
[1] 2 4 6 5 7 9 # 1+1, 2+2, 3+3, 4+1, 5+2, 6+3
```
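Recycling is also why adding a plain number to a vector works: the number is just a vector of length 1. If the longer length is *not* a multiple of the shorter one, R still recycles but emits a warning. A sketch:

```r
1:6 + 1:3    # 2 4 6 5 7 9: 6 is a multiple of 3, so no warning
1:6 + 10     # 11 12 13 14 15 16: 10 is recycled to every element

# 5 is not a multiple of 2: the result is still computed (2 4 4 6 6),
# but R warns "longer object length is not a multiple of shorter object length"
1:5 + 1:2
```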

Elements of a vector *can be named*

```
> x = c(foo=1,2,3,4,5) # named vector element
> x
foo # element name
1 2 3 4 5
```

A named element can be accessed both by its index and by its name.

```
> x[1]
foo
1
> x["foo"]
foo
1
```

The names can be obtained with the `names()` function.

```
> names(x)
[1] "foo" "" "" "" ""
```
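Names can also be assigned after a vector is created, with `names()` on the left-hand side of an assignment, or in one step with `setNames()`. A sketch (the values and names are arbitrary):

```r
x <- c(10, 20, 30)
names(x) <- c("a", "b", "c")   # attach names after creation
x["b"]                          # b: 20
x[c("c", "a")]                  # select several elements by name
unname(x)                       # drop the names again: 10 20 30

y <- setNames(1:3, c("one", "two", "three"))   # create and name in one step
names(y)                        # "one" "two" "three"
```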

Now consider the vector `x` and its cubes:

```
> x = 1:5
> x
[1] 1 2 3 4 5
> x^3
[1] 1 8 27 64 125
```

Plotting `x` vs `x^3` is as simple as calling the `plot()` function:

```
> plot(x = x, y = x**3, type='b')
```

A note on the parameter `type`:

- ‘b’ - indicates “both” points and lines
- ‘p’ - will plot only points
- ‘l’ - will plot only lines
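`plot()` also accepts arguments for a title, axis labels and colour, and the plot can be written to a file instead of the viewer by opening a graphics device first. A minimal sketch, assuming a PDF is acceptable output (`pdf()` and `dev.off()` are base R; the file goes to a temporary path rather than any particular location):

```r
x <- 1:5
out <- tempfile(fileext = ".pdf")   # temporary file path for the plot
pdf(out)                            # send graphics to a PDF instead of the viewer
plot(x = x, y = x^3, type = "b",
     main = "x vs x^3",             # plot title
     xlab = "x", ylab = "x^3",      # axis labels
     col = "blue")                  # colour of points and lines
dev.off()                           # close the device so the file is written
file.exists(out)                    # TRUE
```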
