01/13/2024
Taking part in Veganuary? Plant-based meat alternatives have increased in popularity over the past decade; the latest edition of Periodic Graphics in C&EN looks at what they're made from and how chemistry helps improve their appearance and flavour: https://cen.acs.org/food/food-science/Periodic-Graphics-chemistry-meat-alternatives/102/i1
01/05/2024
Subset a dataset by top/bottom 5 values of a column:
mtcars %>%
  group_by(am) %>%
  slice_max(mpg, n = 5)
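The snippet above covers the "top" half. For the "bottom" half, a minimal sketch using dplyr's slice_min() could look like this. The name bottom5 and the with_ties = FALSE setting are illustrative choices, not part of the original post; by default both slice_max() and slice_min() keep tied rows, so a group can return more than n rows.

```r
library(dplyr)

# Bottom 5 cars by mpg within each transmission group (am = 0 or 1).
# with_ties = FALSE caps every group at exactly 5 rows, even if mpg values tie.
bottom5 <- mtcars %>%
  group_by(am) %>%
  slice_min(mpg, n = 5, with_ties = FALSE)

bottom5
```

Drop with_ties = FALSE if you would rather keep every car that shares the cut-off value.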
01/03/2024
Beauty of ggplot!
library(dplyr)
library(ggplot2)
library(scales)      # comma()
library(hrbrthemes)  # theme_ipsum()

mpg %>%
  count(class) %>%
  mutate(n = n * 2000) %>%
  ggplot(aes(reorder(class, n), n)) +
  geom_col(fill = ' ') +  # fill is blank here; set it to any valid colour before running
  geom_text(aes(label = comma(n)), hjust = 0, nudge_y = 2000) +
  scale_y_continuous(labels = comma, limits = c(0, 150000)) +
  coord_flip() +
  labs(
    x = "Fuel efficiency (mpg)",
    y = "Weight (tons)",
    title = "Fuel efficiency of different car models",
    subtitle = "All models are pre-tested",
    caption = "Data source: Beautiful-data"
  ) +
  theme_ipsum(grid = "Y")
01/03/2024
Calculate group summaries:
> gapminder %>% filter(year == 2007) %>%
+   group_by(continent) %>%
+   summarise(pop = mean(pop))
# A tibble: 5 × 2
  continent        pop
1 Africa     17875763.
2 Americas   35954847.
3 Asia      115513752.
4 Europe     19536618.
5 Oceania    12274974
01/03/2024
A cute dual-axis plot with a combination of line and area chart.
Here is the full code to reproduce the chart:
gapminder %>%
  group_by(year, continent) %>%
  summarise(pop = mean(pop), gdp = mean(gdpPercap)) %>%
  ggplot(aes(x = year, y = pop, fill = continent)) +
  geom_area() +
  geom_line(aes(y = gdp * 10000, color = continent)) +
  scale_y_continuous(
    sec.axis = sec_axis(~ . / 10000, name = "GDP")
  ) +
  theme_minimal() +
  theme(
    axis.title.y = element_text(color = 'salmon1', size = 13),
    axis.title.y.right = element_text(color = 'slategrey', size = 13)
  )
01/01/2024
Show all summary statistics on one plot using the iris data:
Get the full code here:
library(dplyr)
library(tidyr)
library(ggplot2)

iris %>%
  group_by(Species) %>%
  # mean, median and sd of every numeric column, named like Sepal.Length_mean
  summarise(across(everything(), list(mean = mean, median = median, sd = sd))) %>%
  pivot_longer(-Species) %>%
  separate(name, into = c("variable", "stat"), sep = "_") %>%
  ggplot(aes(x = Species, y = value, fill = variable)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = round(value, 2)), position = position_dodge(width = 0.9), size = 3, vjust = 2.3) +
  facet_wrap(vars(stat), scales = "free_y", ncol = 1) +
  theme_minimal() +
  # theme() tweaks go after theme_minimal(), otherwise the complete theme overrides them
  theme(
    plot.margin = margin(30, 10, 10, 10, "pt"),
    panel.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
01/01/2024
Here is today's killer bar chart using ggplot.
Happy New Year, friends!
Here is the code to reproduce the plot:
library(dplyr)
library(ggplot2)

diamonds %>%
  count(cut) %>%
  ggplot(aes(x = reorder(cut, n), y = n, fill = cut)) +
  geom_col() +
  # label each bar with its share of all diamonds
  geom_text(
    aes(label = scales::percent(n / sum(n))),
    hjust = -0.4,
    position = position_stack(vjust = 0.5),
    size = 7
  ) +
  theme(
    legend.position = "none",
    axis.text.y = element_text(size = 17),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    axis.line = element_line(color = "white"),
    axis.ticks.y = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.x = element_blank()
  ) +
  coord_flip() +
  labs(x = '', y = '', title = paste(names(diamonds["cut"]))) +
  expand_limits(y = 0)
12/30/2023
Today's cute plot in RStudio:
Run this code in an online R compiler:
https://www.mycompiler.io/new/r
matplot(iris[, 1:4], type = "l", lty = 1, col = 1:4,
        xlab = "Observation", ylab = "Value",
        main = "Numeric Column Plots")
text(x = 150, y = max(iris[, 1]), labels = names(iris)[1], col = 1)
text(x = 150, y = max(iris[, 2]), labels = names(iris)[2], col = 2)
text(x = 150, y = max(iris[, 3]), labels = names(iris)[3], col = 3)
text(x = 150, y = max(iris[, 4]), labels = names(iris)[4], col = 4)
12/24/2023
Many students struggle to find the highest values in a given column. Here are 2 cool ways to do it in R:
Method 1:
iris %>% top_n(2, wt = Sepal.Width)
Method 2:
iris %>% arrange(desc(Sepal.Width)) %>% head(2)
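A possible third way: in current dplyr, top_n() is superseded by slice_max(). Here is a minimal sketch on the same iris data. The name top2 and the with_ties = FALSE setting are illustrative choices, not from the original post.

```r
library(dplyr)

# Two rows with the largest Sepal.Width, returned in descending order.
# with_ties = FALSE guarantees exactly 2 rows even if values are tied.
top2 <- iris %>%
  slice_max(Sepal.Width, n = 2, with_ties = FALSE)

top2
```

The practical difference between the methods is tie handling: top_n() and slice_max() keep tied rows by default, while arrange() followed by head() always cuts off at the requested number of rows.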
02/24/2022
Data scientists are changing the world
RStudio - YouTube
RStudio’s mission is to create free and open-source software for data science, scientific research, and technical communication to enhance the production and...