knitr::opts_chunk$set(tidy = TRUE, tidy.opts = list(comment = FALSE))

With this dataset, the visuals seek to answer four questions:

  1. How does the category of ‘coffee’ compare to other categories of items in terms of caffeine content?

  2. What are the theoretical substitutes for coffee?

  3. Is coffee all that strong? (i.e. should you drink coffee to benefit from the stay-awake properties of caffeine)

  4. How much is too much for you?

After bringing in the data, several edits were made to streamline the analysis. First, regex string matching and a subsequent as.numeric() conversion were used to ensure that only numbers were left in the caffeine content column. To deal with the wide spread of caffeine content across items, the caffeine content column was then log-transformed (base 10). Next, a ‘serving size’ column was created: numbers were extracted from brackets where available, and a ‘none’ value was assigned otherwise. Finally, a third new column, ‘maxdaily’, was created from the recommended daily limit of 400 mg (Source: U.S. Food and Drug Administration); it expresses that limit as a number of servings of each item.
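The cleaning steps described above can be sketched in base R as follows. This is a minimal illustration on made-up rows, not the original preprocessing code: the toy items, the `caffeine_raw` and `serving_size` column names, and the exact regular expressions are assumptions, and `maxdaily` is assumed to be the 400 mg limit divided by the caffeine content per serving.

```r
# Toy stand-in for the raw table; items and values are made up for illustration.
raw <- data.frame(
  Item = c("Example Brew (12 oz)", "Example Cola (355 ml)", "Example Tablet"),
  caffeine_raw = c("150 mg", "34mg", "200"),
  stringsAsFactors = FALSE
)

# 1. Strip everything except digits and the decimal point, then convert to numeric.
raw$caffeine_mg <- as.numeric(gsub("[^0-9.]", "", raw$caffeine_raw))

# 2. Base-10 log to compress the wide range of caffeine values.
raw$logserving <- log10(raw$caffeine_mg)

# 3. Serving size: the number inside brackets, or "none" when no bracketed number exists.
has_number <- grepl("\\([^)]*[0-9]", raw$Item)
raw$serving_size <- ifelse(
  has_number,
  sub("^.*\\([^0-9)]*([0-9.]+).*$", "\\1", raw$Item),
  "none"
)

# 4. Servings needed to reach the assumed 400 mg daily limit.
daily_limit_mg <- 400
raw$maxdaily <- daily_limit_mg / raw$caffeine_mg

print(raw)
```

Keeping the limit in a named variable makes it easy to rerun the table with a different threshold.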

1a. How does coffee compare to other groups of items in terms of caffeine content?

caf_dist <- ggplot(caffeine, aes(x = category, y = logserving)) + ggtitle("Dotplots of drink categories based on log of caffeine content") + 
    theme(plot.title = element_text(hjust = 0.5))
caf_dist + geom_dotplot(aes(color = category, fill = category), binaxis = "y", 
    stackdir = "center")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

At first glance, coffee does not appear to have all that much caffeine on average, especially compared with the amounts observed in the listed medications, soft drinks and energy drinks.

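# Override the inherited "Dotplots ..." title so the heading matches the plot type
caf_dist + geom_boxplot(aes(fill = category)) + ggtitle("Boxplots of drink categories based on log of caffeine content")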

Using a boxplot, we obtain a better visual of the group-by-group comparison.

However, removing decaffeinated coffee might give us a better sense of caffeine content.

caffeine_nodecaf <- caffeine %>% filter(!grepl("decaf", Item))
caf_dist_nodecaf <- ggplot(caffeine_nodecaf, aes(x = category, y = logserving)) + 
    ggtitle("Boxplots of drink categories based on log of caffeine content (without decaf)") + 
    theme(plot.title = element_text(hjust = 0.5))
caf_dist_nodecaf + geom_boxplot(aes(fill = category))
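One thing worth checking with the filter above is letter case: `grepl("decaf", ...)` is case-sensitive, so an item spelled "Decaf" would be kept. The sketch below uses made-up item names to show the difference; whether the real table contains such spellings is not known here.

```r
# Made-up items, for illustration only.
toy <- data.frame(
  Item = c("Example Decaf Coffee", "Example decaf Tea", "Example Espresso"),
  stringsAsFactors = FALSE
)

# Case-sensitive match: "Decaf" with a capital D is not removed.
case_sensitive <- toy[!grepl("decaf", toy$Item), , drop = FALSE]

# Case-insensitive match: both spellings are removed.
case_insensitive <- toy[!grepl("decaf", toy$Item, ignore.case = TRUE), , drop = FALSE]

print(case_sensitive$Item)
print(case_insensitive$Item)
```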

In this case, we arrive at the more intuitive conclusion that the coffee category has one of the highest amounts of caffeine on average. However, this still does not explain one interesting observation: energy drinks have noticeably more caffeine than coffee. An inspection of the included items suggests that three of them (Chameleon Cold Brew Coffee, Starbucks Tall Coffee and Biggby Iced Coffee) are closer to the category of ‘Coffee’ than to ‘Energy drinks’. The report from which the table stems provides no clear rationale for the peculiar inclusion of these three drinks in the ‘Energy drinks’ category, or for how the items were categorised more generally.

I then repeat the analysis with those three items moved into the ‘Coffee’ category and the decaffeinated options removed.

# Move every item whose name contains "Coffee" into the 'Coffee' category;
# all other rows keep their existing category (vectorised form of the row-by-row loop)
is_coffee <- str_detect(caffeine_nodecaf$Item, "Coffee")
caffeine_nodecaf$category[is_coffee] <- "Coffee"
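The recategorisation above matches on the substring "Coffee", so it would also move any other item with that word in its name. If only the three named drinks should change category, an explicit lookup is one alternative. The sketch below uses a toy table; the fourth item and the starting categories are made up, and the three names are assumed to appear in the data exactly as written in the text.

```r
# Toy table; "Example Coffee Soda" is a hypothetical item that should NOT be moved.
toy <- data.frame(
  Item = c("Chameleon Cold Brew Coffee", "Starbucks Tall Coffee",
           "Biggby Iced Coffee", "Example Coffee Soda"),
  category = c("Energy drinks", "Energy drinks", "Energy drinks", "Soft drinks"),
  stringsAsFactors = FALSE
)

# Names of the items to move, spelled as in the text.
to_move <- c("Chameleon Cold Brew Coffee", "Starbucks Tall Coffee", "Biggby Iced Coffee")

# Exact-name lookup: only the listed items change category.
toy$category[toy$Item %in% to_move] <- "Coffee"

print(toy)
```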

1b. How does the distribution change?

caf_dist1 <- ggplot(caffeine_nodecaf, aes(x = category, y = logserving)) + ggtitle("Dotplots of drink categories-log of caffeine content (without decaf) & recategorised items") + 
    theme(plot.title = element_text(hjust = 0.2))
caf_dist1 + geom_dotplot(aes(color = category, fill = category), binaxis = "y", 
    stackdir = "center")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

1c. What about group averages?

caf_dist1 + geom_boxplot(aes(fill = category)) + ggtitle("Boxplot of drink categories-log of caffeine content(without decaf) & recategorised items") + 
    theme(plot.title = element_text(hjust = 0.16))

Coffee now appears to be on par with energy drinks, though the variance among energy drinks is still smaller than that among coffee items.
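A comparison like this can also be read off a small numeric summary rather than the boxplot alone. The sketch below computes a per-category median and standard deviation with `tapply()`; the values are made up to mimic the pattern described and are not taken from the dataset.

```r
# Made-up log10 caffeine values for two categories.
toy <- data.frame(
  category = c("Coffee", "Coffee", "Coffee",
               "Energy drinks", "Energy drinks", "Energy drinks"),
  logserving = c(1.8, 2.2, 2.6, 2.1, 2.2, 2.3),
  stringsAsFactors = FALSE
)

# Centre and spread of each category.
medians <- tapply(toy$logserving, toy$category, median)
spreads <- tapply(toy$logserving, toy$category, sd)

print(medians)
print(spreads)
```

On the real table, the same two calls on `caffeine_nodecaf` would give the figures behind the plot.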

2. Theoretical substitutes for coffee?

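Here, we look at non-coffee items with a maxdaily value between 3 and 5 servings, which corresponds to roughly 80 to 133 mg of caffeine per serving. That is close to a cup of coffee at the 100 mg baseline used in the code (4 servings per day).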

like_coffee <- caffeine %>% filter(between(maxdaily, 3, 5)) %>% select(-logserving) %>% 
    filter(category != "Coffee")
datatable(like_coffee)

The bulk of the substitutes for coffee appear to be energy drinks. One would be well-advised to watch out for the sugar content though!
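To see what the `between(maxdaily, 3, 5)` filter means in milligrams, the bounds can be converted back to caffeine per serving. This is a small arithmetic sketch that assumes maxdaily is the 400 mg daily limit divided by caffeine per serving.

```r
daily_limit_mg <- 400       # daily limit quoted in the text
servings <- c(3, 4, 5)      # lower bound, midpoint and upper bound of the filter

# Caffeine per serving implied by each number of servings.
mg_per_serving <- daily_limit_mg / servings

print(round(mg_per_serving, 1))
```

Under that assumption, the filter keeps items between 80 mg and about 133 mg per serving.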

3. Is coffee all that strong?

greater_coffee <- caffeine %>% filter(`caffeine (mg/serving)` > 100) %>% arrange(desc(`caffeine (mg/serving)`)) %>% 
    select(category, Item, `caffeine (mg/serving)`) %>% mutate(how_much_more = `caffeine (mg/serving)`/100)
datatable(greater_coffee)

We normalise the caffeine content of these drinks using a good ol’ cup of coffee as the baseline, taken as 100 mg in the code above, so how_much_more is the multiple of that baseline. Apparently, many things are stronger than coffee.

4. How much of something can you consume before it starts being bad for you?

Maximum allowable limits

Hover over a bar to see the quantity! You need just 0.18 cups of Chameleon Cold Brew to go over the limit (this might be a data entry error)!
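A quick way to judge whether the 0.18 figure is plausible is to back out the caffeine per serving it implies. The sketch assumes maxdaily is the 400 mg limit divided by caffeine per serving.

```r
daily_limit_mg <- 400   # daily limit quoted in the text
maxdaily <- 0.18        # servings quoted for Chameleon Cold Brew

# Caffeine per serving implied by that maxdaily value.
implied_mg <- daily_limit_mg / maxdaily

print(round(implied_mg))
```

That comes to a little over 2,200 mg in a single serving, which is why the entry deserves a second look.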

max_daily <- caffeine %>% select(category, Item, maxdaily) %>% arrange(desc(maxdaily))
max_daily$Item <- as.factor(max_daily$Item)
plot_ly(max_daily, y = ~Item, x = ~maxdaily, width = 900, height = 700, type = "bar", 
    orientation = "h", marker = list(color = max_daily$maxdaily, showscale = TRUE)) %>% 
    layout(title = "Maximum daily intake based on caffeine content (mg/serving) with scale (right)")
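One detail in the chunk above: `as.factor()` sorts factor levels alphabetically, so the order produced by `arrange(desc(maxdaily))` is not carried into the factor. If the bars should follow the sorted order, the levels can be set explicitly. The sketch below shows the difference on a toy table with made-up values; it assumes, as is generally the case in the R plotly package, that a categorical axis follows factor level order.

```r
# Toy table, already sorted by maxdaily in descending order (values are made up).
toy <- data.frame(
  Item = c("Example Tea", "Example Cola", "Example Brew"),
  maxdaily = c(10, 8, 2),
  stringsAsFactors = FALSE
)

# as.factor() orders the levels alphabetically, discarding the sort order.
alphabetical <- as.factor(toy$Item)
print(levels(alphabetical))

# Passing the levels explicitly keeps the rows' current order.
by_value <- factor(toy$Item, levels = toy$Item)
print(levels(by_value))
```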