# Introduction

This is an update of a previous article. Since data for the year 2016 is available, I updated this report.

Who does not know it, the Oktoberfest in Munich, Germany, or at least heard about it? In this brief analysis I would like to show some statistics and data visualizations about the Oktoberfest.

My motivation to perform the analysis is firstly to explore an interesting data set, but also to test my hopefully new publishing workflow involving (1) R Notebook and (2) blogdown. One goal is to make the analysis available on GitHub and also use the same file for a Blog post.

# Setup and data overview

The data is an open data set from the Munich Opendata portal.

# load the tidyverse and other libraries
library("tidyverse")
library("knitr")
library("scales")
library("ggthemes")

# constants
r_90_d <- theme(axis.text.x = element_text(angle = 90, hjust = 1))
caption <- "https://gresch.github.io/"
my_theme <- theme_hc()
cp <- c("#1d3a7b", "#b53fa8", "#224dce", "#529480", "#23f2c0") # color palette "Song of the Unexpected Color Palette" http://www.color-hex.com/color-palette/29752
# load the data

# translate variable names to English

data <- rename(data, year = jahr,
duration_days = dauer,
visitor_year = besucher_gesamt,
visitor_day = besucher_tag,
beer_price = bier_preis,
beer_sold = bier_konsum,
chicken_price = hendl_preis,
chicken_sold = hendl_konsum)

# unify measures

data$visitor_year <- data$visitor_year * 1000000
data$visitor_day <- data$visitor_day * 1000
data$beer_sold <- data$beer_sold * 100
# create table overview
kable(summary(data))
year duration_days visitor_year visitor_day beer_price beer_sold chicken_price chicken_sold
Min. :1985 Min. :16.00 Min. :5500000 Min. :329000 Min. : 3.200 Min. :4869800 Min. : 3.920 Min. :351705
1st Qu.:1993 1st Qu.:16.00 1st Qu.:5975000 1st Qu.:369000 1st Qu.: 4.638 1st Qu.:5249350 1st Qu.: 5.215 1st Qu.:487975
Median :2000 Median :16.00 Median :6400000 Median :394000 Median : 6.410 Median :5883400 Median : 7.975 Median :559201
Mean :2000 Mean :16.25 Mean :6318750 Mean :389188 Mean : 6.456 Mean :6071172 Mean : 7.203 Mean :583718
3rd Qu.:2008 3rd Qu.:16.00 3rd Qu.:6525000 3rd Qu.:406000 3rd Qu.: 8.320 3rd Qu.:6698125 3rd Qu.: 9.090 3rd Qu.:698493
Max. :2016 Max. :18.00 Max. :7100000 Max. :444000 Max. :10.570 Max. :7922500 Max. :11.100 Max. :807710

The data set spans 32 years (32 observations) and eight variables that include:

• The year of the Oktoberfest
• The duration of the fest in days (duration_days)
• Visitors for each year (visitor_year)
• Mean visitors per day for each year (visitor_day)
• Mean price for one beer (1 Liter) for each year (beer_price)
• Whole amount of beer sold for each year in Liter (beer_sold)
• Mean price for one chicken for each year (chicken_price)
• Whole amount of chickens sold for each year (chicken_sold)

# Data exploration and analysis

I explore the observations of some of the variables mentioned above to get a better understanding of the data. Usually, histograms are a perfect tool to explore data’s distribution and to find out interesting details. In this analysis I show how the data progressed over the years 1986-2015 using a barchart.

## Visitors per year

# show visitors per year
ggplot(data) +
geom_bar(aes(year, visitor_year),
stat = "identity",
fill = cp[1]) +
labs(
title = "Number of visitors to the Oktoberfest vary and are currently in decline since 2011",
subtitle = "Number of visitors to the Oktoberfest",
caption = caption,
x = "Year",
y = "Visitors per year"
) +
scale_y_continuous(
labels = scales::comma,
limits = c(0, 8000000),
breaks = seq(0, 8000000, by = 1000000)) +
scale_x_continuous(breaks = seq(1985, 2016, by = 1)) +
r_90_d + my_theme

Over the 31 years, on average, 6,341,935 people visited the Oktoberfest each year. What is interesting to note is that there was a major decline in visitors just after 9/11 in year 2001 – Oktoberfest starts usually end of September each year. The trend of yearly visitors looks the same as for the average daily visitors.

# show divergence of visitors per year from the mean
ggplot(data) +
geom_bar(aes(year, visitor_year - mean(visitor_year)),
stat = "identity",
fill = cp[1]) +
labs(
title = "Number of visitors to the Oktoberfest vary and are currently in decline since 2011",
subtitle = "Divergence of visitors compared to the mean ",
caption = caption,
x = "Year",
y = "Divergence from mean visitors per year"
) +
scale_y_continuous(
labels = scales::comma,
limits = c(-1000000, 1000000),
breaks = seq(-1000000, 1000000, by = 100000)
) +
scale_x_continuous(
breaks = seq(1985, 2016, by = 1)) +
r_90_d + my_theme

In this graph it is even better to see that in 1985 the most people visited the Oktoberfest. Then there is a stark decline of visitors with 1988 hitting the bottom with ~600,000 visitors less than average. After that the number of visitors increased until 2001 or nine-eleven.

## Beer price development

# show beer price development
ggplot(data) +
geom_bar(aes(year, beer_price),
stat = "identity",
fill = cp[2]) +
labs(
title = "Oktoberfest beer price increases 0.24 € per year, on average",
subtitle = "Overview of beer price development -- Oktoberfest",
caption = caption,
x = "Year",
y = "Price for one liter beer"
) +
scale_y_continuous(
labels = scales::dollar_format(suffix = "€", prefix = ""),
limits = c(0, 11.5),
breaks = seq(0, 11.5, by = 0.5)) +
scale_x_continuous(
breaks = seq(1985, 2016, by = 1)) +
r_90_d + my_theme

Not surprisingly, the beer price increased over the years. starting in 1985 with 3.20€ and hitting 10.75€ in 2015. That would be an average yearly increase of $(10.57 - 3.20) / 30 = 0.23774193548387096774193548387097$

# show beer price increase based on former year
data %>%
mutate(increase = beer_price - lag(beer_price, default = NA)) %>%
ggplot() +
geom_bar(aes(year, increase),
stat = "identity",
fill = cp[2]) +
labs(
title = "There were steep beer price increases in 1991, 2000, 2007, and 2008 compared to the former years",
subtitle = "Beer price increase compared to former year -- Oktoberfest",
caption = caption,
x = "Year",
y = "Beer price increase compared to year before"
) +
scale_y_continuous(
labels = scales::dollar_format(suffix = "€", prefix = ""),
breaks = seq(-10, +10, by = 0.05)) +
scale_x_continuous(
breaks = seq(1985, 2016, by = 1),
limits = c(1985, 2017)) +
r_90_d + my_theme

Interestingly, in 1991 there was, on average, a major increase of 0.44€ per beer compared to 1990. Why would that be? The following years the increase levels off until 1996 to rise again with the year 2000 showing a major increase of 0.55€ compared to 1999, which is more than double than the average increase over time.

## Beer sold over the years

ggplot(data) +
geom_bar(aes(year, beer_sold / duration_days),
stat = "identity",
fill = cp[3]) +
labs(
title = "Regardless of fewer visitors in recent years, the amount of sold beer is higher than in the nineties",
subtitle = "Overview of beer sold in Liter per day, on average -- Oktoberfest",
caption = caption,
x = "Year",
y = "Beer sold per day (Liter)"
) +
scale_y_continuous(
labels = scales::comma,
breaks = seq(0, 500000, by = 50000)) +
scale_x_continuous(
breaks = seq(1985, 2016, by = 1),
limits = c(1985, 2017)) +
r_90_d + my_theme

On average, each year on each day 372,837 liters of beer was sold. The bar chart looks multimodel with a peak in 1999 and a low in 2001 (where less people visited the Oktoberfest). Since then, the amount of beer sold is increasing.

ggplot(data) +
geom_bar(aes(year, beer_sold / duration_days / visitor_day),
stat = "identity",
fill = cp[3]) +
labs(
title = "On average, Oktoberfest visitors drink more beer -- over 1.2 Liter in 2015",
subtitle = "Overview of beer sold in Liter per day per visitor, on average",
caption = caption,
x = "Year",
y = "Beer sold per day (Liter)"
) +
scale_y_continuous(
labels = scales::comma,
breaks = seq(0, 1.5, by = 0.1)) +
scale_x_continuous(
breaks = seq(1985, 2016, by = 1),
limits = c(1984, 2017)) +
r_90_d + my_theme

## Chicken price development

# show Chicken price development
ggplot(data) +
geom_bar(aes(year, chicken_price),
stat = "identity",
fill = cp[4]) +
labs(
title = "Oktoberfest chicken price increases over the years with a steep increase in 2000",
subtitle = "Overview of chicken price",
caption = caption,
x = "Year",
y = "Chicken price"
) +
scale_y_continuous(
labels = scales::dollar_format(suffix = "€", prefix = ""),
limits = c(0, 11.5),
breaks = seq(0, 11.5, by = 1)) +
scale_x_continuous(
breaks = seq(1985, 2016, by = 1)) +
r_90_d + my_theme

The chicken price increased over the years until 1995 to fall slightly. However a huge increase was in the year 2000. The price was increased by 2.47€ more than twelve times the average price increase. Starting in 1985 with 4.77€ and hitting 11.00€ in 2016. That would be an average yearly increase of $(11.00 - 4.77) / 31 = 0.2009677$

# show chicken price increase based on former year
data %>%
mutate(increase = chicken_price - lag(chicken_price, default = NA)) %>%
ggplot() +
geom_bar(aes(year, increase),
stat = "identity",
fill = cp[4]) +
labs(
title = "Oktoberfest chicken price increased 46% in the year 2000",
subtitle = "Overview of chicken price compared to year before",
caption = caption,
x = "Year",
y = "Chicken price increase compared to year before"
) +
scale_y_continuous(
labels = scales::dollar_format(suffix = "€", prefix = ""),
breaks = seq(-10, +10, by = 0.25)) +
scale_x_continuous(
breaks = seq(1985, 2017, by = 1),
limits = c(1984, 2017)) +
r_90_d + my_theme

Interestingly, in 1986 the price for chicken dropped four times the average with then a steady increase until 1986. Then in the year 2000 the aforementioned rise of 2.47€. Also, in 2013 we saw a major price increase compared to 2012, namely 1.03€.

# Conlusion

This was an analysis of available data of the Oktoberfest by Munich Opendata portal. In brief I learned that

• the total numbers of visitors are in decline
• in the years were terrorism attacks happened the visitors dropped, esp. in 2001
• the price for one Liter of beer increases, esp. in the year 2000 with an increase of 0.55€
• the total amount of beer sold increases
• alarmingly, the amount of beer per visitor also increases after 1997; on average on visitor drinks over 1.2 liters of beer per day
• the price for chicken doubled compared to 1999
• that there was a 46% price increase for a chicken in the year 2000!

Of course this analysis is limited by the amount of data available for such analyses. Averaging over already aggregated data allow limited insights and statistics need to be interpreted carefully.

From an approach to data science (or data analysis with R) I learned that

• even well structured (ggplot) code can get messy
• an interesting question to begin with would streamline the analysis
• I am a long cry away from story telling with data
• I was unsuccessful to globally set figure width in the R Notebook

If you have any feedback on the analysis, the approach or tools, please feel free to contact me.