R cheatsheet

These notes collect commands I find useful in RStudio. I strongly recommend following this online book, R for Data Science, by Hadley Wickham and Garrett Grolemund.

Basics

Convert a data set into R code so that others can reproduce it:

dput(mtcars)

Then paste the printed output into your script and assign it to a variable:

df <- OUTPUT
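A minimal round trip on a small slice of mtcars, assuming only base R: dput() prints the code, and evaluating that code rebuilds an equal data frame. The names df_small, code and df_copy are only for illustration.

```r
# Toy data: the first three rows and two columns of the built-in mtcars data set
df_small <- head(mtcars[, c("mpg", "cyl")], 3)

# dput() prints R code that recreates the object; this is what you paste into a script
dput(df_small)

# The same round trip done programmatically: capture the printed code and evaluate it
code <- paste(capture.output(dput(df_small)), collapse = "\n")
df_copy <- eval(parse(text = code))
all.equal(df_small, df_copy)  # TRUE
```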

The notation to refer to a specific resource (function or dataframe) in a specific package is:

packagename::resource
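A short sketch using only packages that ship with R. The :: form is handy when two packages export functions with the same name (for example, dplyr::filter() masks stats::filter() once dplyr is attached), and it also works for installed packages that are not attached, such as tools here.

```r
# Call median() from the stats package explicitly
stats::median(c(4, 1, 7))      # 4

# tools is installed with R but not attached by default; :: still reaches it
tools::file_ext("report.pdf")  # "pdf"
```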

Get help about a resource foo:

?package::foo
?foo

In RStudio, press ALT+SHIFT+K to show the list of keyboard shortcuts.

ggplot2 fundamentals

Check out this awesome ggplot2 pdf cheatsheet.

Scatter plot of var1 vs var2 data from dataframe df:

ggplot(data = df) + 
  geom_point(mapping = aes(x = var2, y = var1))

Mapping a variable to an aesthetic:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

An aesthetic is a visual property of the objects in the plot, e.g. shape, size, color, alpha (transparency), stroke and fill; geom_point() also accepts group (see ?geom_point for the full list).

To set an aesthetic to a fixed value instead of mapping it to a variable, pass it to the geom outside aes():

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), color = "green")
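A sketch of the difference, assuming ggplot2 is installed (it also provides the mpg data set). layer_data() returns the data ggplot2 actually draws, so it shows which colour each point receives; p_mapped and p_set are illustrative names.

```r
library(ggplot2)

# Inside aes(): "blue" is treated as a constant variable, so the points get
# the default discrete colour and a legend entry labelled "blue"
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Outside aes(): the colour is set to a fixed value; points are blue, no legend
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

unique(layer_data(p_mapped)$colour)  # a single default colour, not "blue"
unique(layer_data(p_set)$colour)     # "blue"
```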

Plot with different colors depending on a condition:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, colour = displ < 5))
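A possible extension, assuming ggplot2: the condition creates a logical variable with the groups FALSE and TRUE, and scale_colour_manual() can choose the colour for each group. The colour names steelblue and firebrick are arbitrary choices.

```r
library(ggplot2)

# displ < 5 is a logical vector; each value (TRUE/FALSE) gets its own colour
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = displ < 5)) +
  scale_colour_manual(values = c("TRUE" = "steelblue", "FALSE" = "firebrick"))

# Count how many points are drawn in each colour
table(layer_data(p)$colour)
```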

Facet a plot, i.e. split the data into subplots, one for each value of a third discrete (categorical) variable:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

Above, ~ class is a formula, a data structure in R. To facet the plot on the combination of two discrete variables (rows ~ columns), use facet_grid():

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ class)
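A sketch, assuming ggplot2, of two related points: a formula is an ordinary R object, and a . on one side of the facet_grid() formula means "do not facet in this dimension". Counting panels through ggplot_build() is only for illustration.

```r
library(ggplot2)

# A formula can be stored and inspected like any other object
f <- drv ~ class
class(f)     # "formula"
all.vars(f)  # "drv" "class"

# "." on the left: one row of panels, one column per value of cyl
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

# The computed panel layout has one row per panel
nrow(ggplot_build(p)$layout$layout)  # 4: cyl is 4, 5, 6 or 8 in mpg
```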

Different geoms represent the same data in different ways: geom_point(), geom_smooth()… To overlay several geoms on the same data, pass the mapping once to ggplot():

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()

Mappings declared in ggplot() are global: they apply to every geom. A mapping given inside a geom is local to that layer; it adds to the global mapping and overrides any aesthetic of the same name, for that layer only:

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = class)) + 
  geom_smooth()
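Besides aesthetics, a layer can also override the global data. This sketch, assuming ggplot2, fits the smooth curve to subcompact cars only (base R subset() is used instead of dplyr to keep it short). Because color = class is local to geom_point(), geom_smooth() draws a single curve rather than one per class.

```r
library(ggplot2)

# Global mapping shared by both layers; each layer adds its own local settings
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +            # local aesthetic
  geom_smooth(data = subset(mpg, class == "subcompact"), # local data
              se = FALSE)
p
```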

Console

To assign a value and print it in a single line, wrap the assignment in parentheses:

(y <- seq(1, 10))
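Why the parentheses work, as a base R sketch: an assignment returns its value invisibly, while ( ) makes the value visible, and the console prints visible values. withVisible() reports which case applies.

```r
# Plain assignment: the value is returned invisibly, so nothing is printed
withVisible(y <- seq(1, 10))$visible    # FALSE

# Wrapped in parentheses: the value is visible, so the console prints it
withVisible((y <- seq(1, 10)))$visible  # TRUE
```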

dplyr basics

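Data can be transformed with the basic dplyr verbs, which all share the same syntax: the first argument is a data frame, the other arguments describe what to do, and the result is a new data frame: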

new_dataframe <- verb(old_dataframe, what to do options)

Filter:

jan1 <- filter(flights, month == 1, day == 1)

The comparison operators are >, >=, <, <=, != (not equal) and == (equal). Conditions can be combined with & “and”, | “or” and ! “not”; conditions passed to filter() as separate arguments are combined with “and”. When comparing floating-point numbers, use near(a, b) instead of a == b to avoid precision errors. The expression x %in% y keeps the rows where x takes one of the values in y:

nov_dec <- filter(flights, month %in% c(11, 12))
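A self-contained sketch on a hypothetical toy data frame (instead of flights), assuming dplyr is installed. It also shows that filter() drops rows where a condition evaluates to NA.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(x = c(1, 2, NA, 4), y = c(0.1 + 0.2, 0.3, 0.3, 0.5))

filter(df, y == 0.3)          # 2 rows: 0.1 + 0.2 is not exactly 0.3
filter(df, near(y, 0.3))      # 3 rows: near() tolerates rounding error
filter(df, x %in% c(2, 4))    # rows where x is 2 or 4
filter(df, x > 1, y < 0.4)    # separate conditions are combined with "and"
filter(df, x > 3 | is.na(x))  # keep missing values explicitly if you need them
```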

Sort rows with arrange(). Missing values (NA) are always placed at the end; to show them first instead, sort on desc(is.na()):

arrange(data_frame, desc(is.na(column_name)))
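A toy sketch, assuming dplyr, of where NA ends up. Adding the column itself as a second sort key also orders the non-missing values.

```r
library(dplyr)

df <- data.frame(x = c(3, NA, 1, 2))

arrange(df, x)$x                  # 1 2 3 NA
arrange(df, desc(x))$x            # 3 2 1 NA: NA stays last even with desc()
arrange(df, desc(is.na(x)), x)$x  # NA 1 2 3: missing values first, then ascending
```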

Select columns by variable name, dropping the rest:

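select(flights, year, month, day)  # columns by name
select(flights, year:day)          # all columns from year to day, inclusive
select(flights, -(year:day))       # all columns except those from year to day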

Helper functions such as starts_with(), ends_with(), contains() and matches() select variables by name pattern (see ?select). To move some variables to the front, keeping all the others after them, use everything():

select(flights, time_hour, air_time, everything())
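A sketch of the helpers on a hypothetical one-row data frame with flight-like column names, assuming dplyr.

```r
library(dplyr)

# Hypothetical data with flight-like column names
df <- data.frame(dep_time = 517, dep_delay = 2, arr_time = 830,
                 arr_delay = 11, carrier = "UA")

names(select(df, starts_with("dep")))     # "dep_time" "dep_delay"
names(select(df, ends_with("delay")))     # "dep_delay" "arr_delay"
names(select(df, contains("time")))       # "dep_time" "arr_time"
names(select(df, carrier, everything()))  # "carrier" first, then the rest in order
```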

Create new variables from existing ones with mutate:

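library(dplyr)
library(nycflights13)  # provides the flights data frame

# Keep only the columns needed below
flights_sml <- select(flights, 
  year:day, 
  ends_with("delay"), 
  distance, 
  air_time
)
# Append the new columns; a column created earlier in the same call
# (gain, hours) can be used right away
mutate(flights_sml,
  gain = dep_delay - arr_delay,
  speed = distance / air_time * 60,
  hours = air_time / 60,
  gain_per_hour = gain / hours
)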

To keep only the new variables and drop all the others, use transmute():

transmute(flights,
  gain = dep_delay - arr_delay,
  hours = air_time / 60,
  gain_per_hour = gain / hours
)
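A small sketch contrasting the two verbs on hypothetical toy data, assuming dplyr 1.0 or later. Recent dplyr releases (1.1.0 and later) describe transmute() as superseded by mutate(.keep = "none"), shown on the last lines.

```r
library(dplyr)

# Hypothetical toy data
trips <- data.frame(distance = c(100, 300), air_time = c(30, 60))

# mutate() keeps the existing columns and appends the new ones
kept <- mutate(trips, hours = air_time / 60, speed = distance / hours)
kept

# transmute() keeps only the new columns
only_new <- transmute(trips, hours = air_time / 60, speed = distance / hours)
only_new

# Equivalent with mutate(): drop every column that is not created here
only_new2 <- mutate(trips, hours = air_time / 60, speed = distance / hours,
                    .keep = "none")
only_new2
```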

Use summarise() together with group_by() to compute statistics for each group:

by_day <- group_by(flights, year, month, day)
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))
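A toy sketch of the same pattern, assuming dplyr; the delays data frame is hypothetical. n() is a dplyr helper that counts the rows in each group.

```r
library(dplyr)

# Hypothetical delays: two flights on day 1 (one missing), three on day 2
delays <- data.frame(day = c(1, 1, 2, 2, 2),
                     dep_delay = c(10, NA, 5, 15, 40))

by_day <- group_by(delays, day)
res <- summarise(by_day,
  delay = mean(dep_delay, na.rm = TRUE),  # NA ignored: day 1 -> 10, day 2 -> 20
  n = n()                                 # rows per group, including the NA one
)
res
```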

The same code can be written with the pipe operator %>%: x %>% f(y) %>% g(z) is equivalent to g(f(x, y), z), so each step passes its result as the first argument of the next.

flights %>% 
  group_by(year, month, day) %>% 
  summarise(mean = mean(dep_delay, na.rm = TRUE))
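A minimal sketch of the pipe rule with base R functions, assuming dplyr (which re-exports %>% from magrittr) is installed.

```r
library(dplyr)  # re-exports the %>% pipe from the magrittr package

x <- c(1, 4, 9)

# x %>% f(y) %>% g(z) is g(f(x, y), z)
x %>% head(2) %>% paste(collapse = " + ")  # "1 + 4"
paste(head(x, 2), collapse = " + ")        # same result without the pipe
```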

Utilities

Append strings

Use paste() to concatenate strings; sep = "" joins them without a separator:

wd <- getwd()  # current working directory
reports_dir <- paste(wd, "/reports", sep = "")
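A few related base R functions, as a quick sketch; the strings are placeholders.

```r
# paste() separates its arguments with a space by default
paste("monthly", "report")               # "monthly report"
# paste0() is the shortcut for paste(..., sep = "")
paste0("report_", 2024, ".csv")          # "report_2024.csv"
# collapse joins the elements of one vector into a single string
paste(c("a", "b", "c"), collapse = "-")  # "a-b-c"
# file.path() builds a path and inserts the separator for you
file.path("project", "reports")          # "project/reports"
```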

Export/import R objects such as strings and lists

Store the contents of an object in a local file:

# Mixing strings and a number in one vector coerces everything to character
TF_PRE <- c("DBUSER", "SID", "DNS", "PASSWD", 66600)
saveRDS(TF_PRE, file = paste(wd, "/TF_PRE.rds", sep = ""))

Import the contents of an object:

TF_PRE <- readRDS(file = paste(wd, "/TF_PRE.rds", sep = ""))
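A round-trip sketch in base R. It writes to a temporary file so it does not touch the working directory, and stores a list (hypothetical values) rather than a vector, so each element keeps its own type.

```r
# Temporary file, so the example does not write into the working directory
path <- tempfile(fileext = ".rds")

# A list keeps each element's type; c("DBUSER", 66600) would turn the number into text
settings <- list(user = "DBUSER", port = 66600)  # hypothetical values

saveRDS(settings, file = path)
restored <- readRDS(path)
identical(settings, restored)  # TRUE
is.numeric(restored$port)      # TRUE
```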

Connection to Oracle database
