Several cheat sheets on different topics in .md format. Check out the GitHub Pages version.
These notes collect several commands that are useful in RStudio. I strongly recommend following this online book by Hadley Wickham and Garrett Grolemund.
Turn data into code so that others can reproduce it:
dput(mtcars)
The output of the previous command can then be pasted into a script and assigned to a variable:
df <- OUTPUT
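As a minimal sketch of that round trip, the snippet below captures the text that dput() prints and evaluates it again. In practice you would simply paste the printed text into the script; capture.output() and eval(parse()) are used here only to demonstrate the idea in one run.

```r
# Capture the text that dput() prints for the built-in mtcars data
code_text <- capture.output(dput(mtcars))

# Evaluating that text rebuilds the object, just as pasting it into a script would
df <- eval(parse(text = paste(code_text, collapse = "\n")))

# Compare the rebuilt data frame with the original
all.equal(df, mtcars)
```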
The notation to refer to a specific resource (function or dataframe) in a specific package is:
packagename::resource
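A short sketch of why the qualified form is useful: it keeps working even when another object with the same name is in scope. The local median function below is a hypothetical stand-in for any masking definition.

```r
# Call a function through its namespace without attaching the package
stats::median(c(1, 5, 9))

# A local object with the same name masks the unqualified call...
median <- function(x) "masked"
local_result <- median(c(1, 5, 9))

# ...but the qualified call still reaches the stats package version
qualified_result <- stats::median(c(1, 5, 9))

# Datasets can be referenced the same way
head(datasets::mtcars, 3)
```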
Get help about a resource foo:
?package::foo
?foo
Press ALT+SHIFT+K to access the shortcuts.
Check out this awesome ggplot2 pdf cheatsheet.
Scatter plot of var1 vs var2 data from dataframe df:
ggplot(data = df) +
geom_point(mapping = aes(x = var2, y = var1))
Mapping a variable to an aesthetic:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
An aesthetic is a visual property of the plotted points, e.g., shape, size, color, alpha (transparency), stroke, group, or fill (see ?geom_point).
To set an aesthetic manually, pass it outside the aes() call:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "green")
Plot with different colors depending on a condition:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, colour = displ < 5))
Facet a plot, splitting the data into separate panels grouped by the value of a third discrete (categorical) variable:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
Above, ~ variable is a data structure called a formula in R. You can also facet the plot on two discrete variables:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ class)
Use different geom objects to represent data: geom_point, geom_smooth…
You can overlay different geoms on the same data by passing the mapping argument to ggplot():
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
Mappings declared this way are global. Any mapping introduced inside a geom is local to that layer, where it extends or overrides the global mapping:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
To assign a value and print it in a single line, wrap the assignment in parentheses:
(y <- seq(1, 10))
Data can be transformed using the basic dplyr functions, whose syntax is:
new_dataframe <- verb(old_dataframe, what to do options)
Filter:
jan1 <- filter(flights, month == 1, day == 1)
Comparison operators are >, >=, <, <=, != (not equal), and == (equal). Conditions can be combined using & “and”, | “or”, and ! “not”.
When comparing real numbers, near(a, b) may be more suitable than a == b, because it avoids floating-point precision errors.
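The sketch below shows the kind of failure meant here, using base R only. near_sketch is a hypothetical helper that compares within a small tolerance, which is the idea behind near(); the tolerance value is an assumption chosen for illustration.

```r
# sqrt(2)^2 is not stored as exactly 2, so == reports a difference
a <- sqrt(2)^2
b <- 2
exact_match <- a == b

# Hypothetical helper: treat values as equal when they differ by less than tol
near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}
tolerant_match <- near_sketch(a, b)

c(exact = exact_match, tolerant = tolerant_match)
```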
The x %in% y operator keeps the rows where x matches one of the values in y:
nov_dec <- filter(flights, month %in% c(11, 12))
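To see what %in% returns on its own, here is a small base R sketch on a hypothetical month vector. It yields one logical value per element, which is what filter() uses to keep or drop rows.

```r
# Hypothetical vector of month numbers
month <- c(1, 11, 12, 3, 11)

# One TRUE/FALSE per element: is it one of the listed values?
is_nov_dec <- month %in% c(11, 12)

# Same result as chaining == comparisons with |
same_as_or <- identical(is_nov_dec, month == 11 | month == 12)

# Use the logical vector to keep the matching elements
month[is_nov_dec]
```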
Rows are ordered with arrange. Rows with NA values are placed at the end; they can be placed first as follows:
arrange(data_frame, desc(is.na(column_name)))
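A minimal sketch on a hypothetical toy data frame (it assumes dplyr is installed). Adding the column itself as a second sort key keeps the non-missing rows in ascending order after the NA rows.

```r
library(dplyr)

# Hypothetical toy data with one missing score
scores <- data.frame(id = 1:4, score = c(3, NA, 1, 2))

# Default behaviour: ascending order, NA at the end
default_order <- arrange(scores, score)

# NA rows first, then the remaining rows in ascending order
na_first <- arrange(scores, desc(is.na(score)), score)

default_order$id
na_first$id
```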
Select columns by variable name and drop the rest:
select(flights, year, month, day)
select(flights, year:day)
select(flights, -(year:day))
Variables can be identified with helper functions such as starts_with and ends_with; see ?select. Change the order of some variables:
select(flights, time_hour, air_time, everything())
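The helpers can be tried on a small hypothetical data frame whose column names imitate a few of those in flights (this assumes dplyr is installed):

```r
library(dplyr)

# Hypothetical one-row data frame with flights-like column names
toy <- data.frame(
  year = 2013, month = 1, day = 1,
  dep_delay = 2, arr_delay = 11, air_time = 227
)

# Columns whose names end with "delay"
delay_cols <- names(select(toy, ends_with("delay")))

# Columns whose names start with "d"
d_cols <- names(select(toy, starts_with("d")))

# Move air_time to the front and keep everything else after it
reordered <- names(select(toy, air_time, everything()))
```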
Create new variables from existing ones with mutate:
# Create a new dataframe from existing one
flights_sml <- select(flights,
year:day,
ends_with("delay"),
distance,
air_time
)
# Append the new columns
mutate(flights_sml,
gain = dep_delay - arr_delay,
speed = distance / air_time * 60,
hours = air_time / 60,
gain_per_hour = gain / hours
)
Or, if you want to keep only the new variables:
transmute(flights,
gain = dep_delay - arr_delay,
hours = air_time / 60,
gain_per_hour = gain / hours
)
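The difference between the two verbs is easiest to see by comparing column names. The sketch below uses a hypothetical two-row data frame instead of flights and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data with the columns used in the examples above
trips <- data.frame(
  dep_delay = c(10, -5),
  arr_delay = c(4, -10),
  air_time  = c(120, 90)
)

# mutate() keeps the existing columns and appends the new ones
with_mutate <- mutate(trips,
  gain = dep_delay - arr_delay,
  hours = air_time / 60,
  gain_per_hour = gain / hours
)

# transmute() returns only the newly created columns
with_transmute <- transmute(trips,
  gain = dep_delay - arr_delay,
  hours = air_time / 60,
  gain_per_hour = gain / hours
)

names(with_mutate)
names(with_transmute)
```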
Use summarise together with group_by to obtain statistics group by group:
by_day <- group_by(flights, year, month, day)
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))
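A small sketch of what na.rm = TRUE changes, on a hypothetical data frame with one missing delay (assuming dplyr is installed). Without it, any group that contains an NA gets an NA mean.

```r
library(dplyr)

# Hypothetical toy data: two days, one missing delay on day 1
toy <- data.frame(
  day = c(1, 1, 2, 2),
  dep_delay = c(10, NA, 4, 6)
)

by_day <- group_by(toy, day)

# Missing values are dropped before averaging
daily <- summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))

# Missing values propagate into the group mean
daily_with_na <- summarise(by_day, delay = mean(dep_delay))

daily
daily_with_na
```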
The same code can be written with pipes (%>%). A pipeline such as x %>% f(y) %>% g(z) is equivalent to g(f(x, y), z).
flights %>%
group_by(year, month, day) %>%
summarise(mean = mean(dep_delay, na.rm = TRUE))
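Each step of a pipeline passes its result as the first argument of the next call, so x %>% f(y) %>% g(z) evaluates as g(f(x, y), z). A quick check with two hypothetical helper functions, assuming the magrittr package (which provides %>%) is installed:

```r
library(magrittr)

# Hypothetical helpers named after the f and g used in the text
f <- function(x, y) x + y
g <- function(x, z) x * z

# Piped form: 2 goes into f as its first argument, the result goes into g
piped <- 2 %>% f(3) %>% g(4)

# Equivalent nested form
nested <- g(f(2, 3), 4)

identical(piped, nested)
```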
Use paste() to concatenate strings:
reports_dir = paste(wd, "/reports", sep="")
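Two base R alternatives give the same result here and may read more clearly: paste0() is paste() with sep = "", and file.path() joins path components with "/". The value of wd below is a hypothetical placeholder.

```r
# Hypothetical working directory used only for this illustration
wd <- "/home/user/project"

# Form used in the notes
with_paste <- paste(wd, "/reports", sep = "")

# paste0() concatenates without a separator
with_paste0 <- paste0(wd, "/reports")

# file.path() inserts the "/" separator between components
with_file_path <- file.path(wd, "reports")

c(with_paste, with_paste0, with_file_path)
```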
Store the contents of an object in a local file:
TF_PRE=NULL
TF_PRE[1]="DBUSER"
TF_PRE[2]="SID"
TF_PRE[3]="DNS"
TF_PRE[4]="PASSWD"
TF_PRE[5]=66600
saveRDS(TF_PRE, file = paste(wd,"/TF_PRE.rds", sep=""))
Import the contents of an object:
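The counterpart of saveRDS() in base R is readRDS(), which returns the stored object so it can be assigned to any name. A minimal round-trip sketch, using a temporary directory as a stand-in for wd and a hypothetical variable name for the restored object:

```r
# Hypothetical stand-in for the working directory used above
wd <- tempdir()
rds_path <- paste(wd, "/TF_PRE.rds", sep = "")

# Same contents as TF_PRE above; the number is stored as text
# because the vector is of type character
TF_PRE <- c("DBUSER", "SID", "DNS", "PASSWD", "66600")
saveRDS(TF_PRE, file = rds_path)

# Read the object back and assign it to a new name
TF_PRE_restored <- readRDS(rds_path)

identical(TF_PRE_restored, TF_PRE)
```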