R Markdown - A Better Way of Communicating with Data

Danyang Dai

The University of Melbourne

August 24, 2020

https://rmarkdown-rladiesmelbourne.netlify.app

About Me

Graduated from Monash University with a Bachelor of Commerce in 2018
Currently a Master's student at the University of Melbourne

Find Me at

danyangd@student.unimelb.edu.au

https://dai.netlify.app

@Daidaidai2014

@DanyangDai

https://www.linkedin.com/in/danyang-dai-7529b4152/

Why R Markdown

Hypothesis testing

Bayesian Estimation and Graphical presentation

Demonstration of Reproducible report

Case Study - Hypothesis Testing

Example - yearly wage of 474 bank employees

$y$: natural logarithm of salary (LOGSAL)
$x_{1}$: the individual's number of completed years of schooling (EDUC)
$x_{2}$: the employee's gender (GENDER: 0 for females, 1 for males)
$x_{3}$: whether or not the employee belongs to a minority group (MINORITY: 0 for non-minority, 1 for minority)
$x_{4}$: a categorical variable indicating the nature of the employee's position (JOBCAT: 1 for administrative jobs, 2 for custodial jobs, and 3 for management jobs)
We are interested in testing hypotheses in the model
$y = β_{0} + β_{educ} x_{1} + β_{gender} x_{2} + β_{minority} x_{3} + β_{jobcat} x_{4} + u$

Data provided by Professor Chris Skeels in Econometrics 3 ECOM90013

Hypothesis Testing

Does Education affect annual salary?

$H_{0} : β_{educ} = 0$
$H_{1} : β_{educ} \neq 0$

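```r
## LM Test
# Restricted model: impose H0 by leaving EDUC out
lm0 <- lm(LOGSAL ~ GENDER + MINORITY + JOBCAT, data = wages)
e0 <- residuals(lm0)
# Auxiliary regression of the restricted residuals on all regressors
lm1 <- lm(e0 ~ EDUC + GENDER + MINORITY + JOBCAT, data = wages)
e1 <- summary(lm1)
e1rsq <- e1$r.squared
# LM statistic: n * R-squared, chi-squared with 1 degree of freedom under H0
test1 <- nrow(wages) * e1rsq
```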

```{r, echo = FALSE, results = 'asis'}
cat(
  "Under the null hypothesis, with 1 degree of freedom,",
  "the test statistic is", round(test1, 4),
  "and the critical value is", paste0(round(qchisq(0.95, 1), 4), ".")
)
```

Under the null hypothesis, with 1 degree of freedom, the test statistic is 125.7683 and the critical value is 3.8415.
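The three steps of the LM test (restricted fit, auxiliary regression, n times R-squared) can be tried end to end without the course data. The sketch below is only an illustration: the `sim` data frame and its coefficients are made-up stand-ins for the `wages` data, and JOBCAT is left out to keep it short.

```r
# Simulated stand-in for the wages data (hypothetical values, not the original data)
set.seed(1)
n <- 474
sim <- data.frame(
  EDUC     = sample(8:21, n, replace = TRUE),
  GENDER   = rbinom(n, 1, 0.5),
  MINORITY = rbinom(n, 1, 0.2)
)
sim$LOGSAL <- 9 + 0.05 * sim$EDUC + 0.2 * sim$GENDER - 0.1 * sim$MINORITY +
  rnorm(n, sd = 0.3)

# Step 1: estimate the restricted model (EDUC excluded) and keep its residuals
restricted <- lm(LOGSAL ~ GENDER + MINORITY, data = sim)
sim$e0 <- residuals(restricted)

# Step 2: auxiliary regression of those residuals on all regressors
auxiliary <- lm(e0 ~ EDUC + GENDER + MINORITY, data = sim)

# Step 3: LM statistic n * R^2, compared with a chi-squared(1) distribution
lm_stat  <- n * summary(auxiliary)$r.squared
crit_val <- qchisq(0.95, df = 1)
p_value  <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

c(statistic = lm_stat, critical = crit_val, p_value = p_value)
```

Reporting the p-value next to the critical value is optional, but it saves the reader from looking up the chi-squared table.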

```r
reject_h0 <- test1 > qchisq(0.95, 1)
```

Since the LM test statistic is `r if (reject_h0) "greater" else "smaller"` than the critical value, we `r if (reject_h0) "reject" else "cannot reject"` the null hypothesis and conclude that $β_{educ}$ is `r if (reject_h0) "" else "not "`significant at the 5% level.

Since the LM test statistic is greater than the critical value, we reject the null hypothesis and conclude that $β_{educ}$ is significant at the 5% level.
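When the same conclusion sentence is needed for several tests, the inline conditionals can be collected into one helper. This is a minimal sketch, not part of the original slides: `decision_sentence()` and its arguments are hypothetical names, and the statistic passed in is the value reported above.

```r
# Hypothetical helper: builds the conclusion sentence from a chi-squared test statistic
decision_sentence <- function(stat, df, coef_label, level = 0.05) {
  crit   <- qchisq(1 - level, df)
  reject <- stat > crit
  paste0(
    "Since the test statistic (", round(stat, 4), ") is ",
    if (reject) "greater" else "smaller",
    " than the critical value (", round(crit, 4), "), we ",
    if (reject) "reject" else "cannot reject",
    " the null hypothesis and conclude that ", coef_label, " is ",
    if (reject) "" else "not ",
    "significant at the ", 100 * level, "% level."
  )
}

sentence <- decision_sentence(125.7683, df = 1, coef_label = "the education coefficient")
print(sentence)
```

In an R Markdown document the call would sit in a single inline expression, so the wording only has to be maintained in one place.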

Easy? Let's do another one!

Do Minority and Job Category affect salary?

$H_{0} : β_{minority} = β_{jobcat} = 0$
$H_{1} : β_{minority} \neq 0$ or $β_{jobcat} \neq 0$

```r
# Restricted model: impose H0 by leaving MINORITY and JOBCAT out
lmrest <- lm(formula = LOGSAL ~ EDUC + GENDER, data = wages)
e2 <- residuals(lmrest)
# Auxiliary regression of the restricted residuals on all regressors
lme2 <- lm(e2 ~ EDUC + GENDER + MINORITY + JOBCAT, data = wages)
e2.sqr <- summary(lme2)$r.squared
test2 <- nrow(wages) * e2.sqr
print("Under the null hypothesis with 2 degrees of freedom")
## [1] "Under the null hypothesis with 2 degrees of freedom"
print(paste0("the test statistic is ", round(test2, 4)))
## [1] "the test statistic is 208.745"
print(paste0("The critical value is ", round(qchisq(0.95, 2), 4)))
## [1] "The critical value is 5.9915"
```

```r
reject_h0.2 <- test2 > qchisq(0.95, 2)
```

Since the LM test statistic is `r if (reject_h0.2) "greater" else "smaller"` than the critical value, we `r if (reject_h0.2) "reject" else "cannot reject"` the null hypothesis and conclude that `r if (reject_h0.2) "at least one of" else "neither of"` $β_{minority}$ and $β_{jobcat}$ is significant at the 5% level.

Since the LM test statistic is greater than the critical value, we reject the null hypothesis and conclude that at least one of $β_{minority}$ and $β_{jobcat}$ is significant at the 5% level.
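Both tests follow the same recipe and differ only in which regressors are restricted, so the recipe can be written once. The function below is a sketch under assumptions: `lm_test()` is a hypothetical name, the data are assumed to contain no missing values, and the built-in `mtcars` data set stands in for the wages data.

```r
# Hypothetical helper: LM test of the restrictions that turn `unrestricted` into `restricted`
lm_test <- function(restricted, unrestricted, data, level = 0.05) {
  fit0 <- lm(restricted, data = data)

  # Auxiliary regression: restricted residuals on the full set of regressors
  aux_data <- data
  aux_data$.resid0 <- residuals(fit0)
  aux <- lm(update(unrestricted, .resid0 ~ .), data = aux_data)

  # Degrees of freedom = number of coefficients set to zero under H0
  df   <- length(coef(aux)) - length(coef(fit0))
  stat <- nrow(aux_data) * summary(aux)$r.squared
  crit <- qchisq(1 - level, df)

  list(statistic = stat, df = df, critical = crit,
       p_value = pchisq(stat, df, lower.tail = FALSE),
       reject = stat > crit)
}

# Illustration on a built-in data set: are hp and qsec jointly relevant for mpg?
res <- lm_test(mpg ~ wt, mpg ~ wt + hp + qsec, data = mtcars)
str(res)
```

Deriving the degrees of freedom from the two fits avoids hard-coding 1 or 2 in the text and in `qchisq()`.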

Bayesian Approach - Prior Adjustments

Bayes' Rule: $p(θ | Y) \propto L(θ | Y) \, p(θ)$

The posterior distribution is proportional to the likelihood function times the prior distribution.
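The proportionality can be checked numerically on a toy problem. The sketch below is an illustration only, with made-up data: 20 observations from a normal distribution with known variance 1 and a hypothetical $N(0, 2^{2})$ prior on the mean. It multiplies likelihood and prior on a grid, normalises, and compares the resulting posterior mean with the conjugate closed form.

```r
set.seed(42)
y <- rnorm(20, mean = 1.5, sd = 1)   # hypothetical data, known variance 1

# Grid of candidate values for theta
theta_grid <- seq(-3, 5, length.out = 2001)
step <- theta_grid[2] - theta_grid[1]

# Log-likelihood and log-prior evaluated at each grid point
log_lik   <- sapply(theta_grid, function(th) sum(dnorm(y, mean = th, sd = 1, log = TRUE)))
log_prior <- dnorm(theta_grid, mean = 0, sd = 2, log = TRUE)

# Posterior kernel = likelihood * prior, then normalise so it integrates to 1
log_post <- log_lik + log_prior
post <- exp(log_post - max(log_post))
post <- post / sum(post * step)

grid_mean <- sum(theta_grid * post * step)

# Closed-form posterior mean for the normal-normal conjugate case
post_precision <- length(y) / 1 + 1 / 2^2
exact_mean <- sum(y) / post_precision

c(grid = grid_mean, exact = exact_mean)
```

Working on the log scale and subtracting the maximum before exponentiating keeps the kernel from underflowing.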

The data are a quarterly time series of Australian real GDP from the Australian Real-Time Macroeconomic Database, with $T = 230$ observations running from the third quarter of 1959 to the last quarter of 2016.

Data provided by Tomasz Wozniak in Macroeconometrics ECOM90007

Setting prior distribution parameters

Question: "Set the parameters of the natural-conjugate prior distribution and motivate the values that you choose."
Random walk with drift process: $\log GDP_{t} = μ_{0} + α \log GDP_{t-1} + u_{t}$
$α = 1$
$u_{t} \sim N(0, σ^{2})$
$P(σ^{2}) \sim IG_{2}(s, ν)$
Priors: $μ_{0}$, $α$, $σ^{2}$, $s$, $ν$
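To get a feel for the ingredients before choosing prior values, the process and the variance prior can both be simulated. This is a sketch under stated assumptions: the drift, error standard deviation and starting level are hypothetical, and `rig2()` assumes the inverse gamma 2 parameterisation in which $σ^{2} = s / χ^{2}_{ν}$, which may differ from the one used in the course.

```r
set.seed(123)

# Random walk with drift: log GDP_t = mu0 + alpha * log GDP_{t-1} + u_t, with alpha = 1
T_obs <- 230
mu0   <- 0.01   # hypothetical drift
alpha <- 1
sigma <- 0.01   # hypothetical error standard deviation
log_gdp <- numeric(T_obs)
log_gdp[1] <- 11   # hypothetical starting level
for (i in 2:T_obs) {
  log_gdp[i] <- mu0 + alpha * log_gdp[i - 1] + rnorm(1, mean = 0, sd = sigma)
}

# With alpha = 1 the average first difference estimates the drift mu0
drift_hat <- mean(diff(log_gdp))

# Draws from IG2(s, nu), assuming sigma^2 = s / chi-squared(nu)
rig2 <- function(n, s, nu) s / rchisq(n, df = nu)
prior_draws <- rig2(1e5, s = 1, nu = 10)

# Under this parameterisation the mean is s / (nu - 2) for nu > 2
prior_mean_mc    <- mean(prior_draws)
prior_mean_exact <- 1 / (10 - 2)

c(drift_hat = drift_hat, mc = prior_mean_mc, exact = prior_mean_exact)
```

Plotting `prior_draws` for a few choices of `s` and `nu` is a quick way to see how informative a candidate prior for $σ^{2}$ is.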


First set of priors testing

$P\left(β = \begin{bmatrix} μ_{0} \\ α \end{bmatrix} \middle| σ^{2}\right) \sim N\left(\begin{bmatrix} 0.01 \\ 1 \end{bmatrix}, σ^{2} \begin{bmatrix} 1 & 0 \\ 0 & 10 \end{bmatrix}\right)$

The sample mean of $μ_{0}$ with 5000 draws is 0.0148564 and the variance is 0.011913.
The sample mean of $α$ with 5000 draws is 0.999454 and the variance is 0.000082.
The sample mean of $σ^{2}$ with 5000 draws is 0.017256 and the variance is 0.0000026.
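Summaries like these can be produced by drawing from the normal-inverse gamma 2 posterior of the regression. The sketch below is one possible way to do it and is not the code behind the slides: the series is simulated, the values of `s0` and `nu0` are assumptions (the slides do not state them), and the IG2 draws assume $σ^{2} = s / χ^{2}_{ν}$. The column names `V1`, `V2` and `sigmasq` mirror those used for `blogau` in the slides.

```r
set.seed(2020)

# Simulated stand-in for the log real GDP series (hypothetical values only)
T_obs   <- 230
growth  <- rnorm(T_obs - 1, mean = 0.008, sd = 0.01)
log_gdp <- 11 + c(0, cumsum(growth))

y <- log_gdp[-1]                 # log GDP_t
X <- cbind(1, log_gdp[-T_obs])   # intercept and log GDP_{t-1}

# Prior: beta | sigma^2 ~ N(beta0, sigma^2 * V_prior), sigma^2 ~ IG2(s0, nu0)
beta0   <- c(0.01, 1)
V_prior <- diag(c(1, 10))
s0      <- 0.01   # assumed value
nu0     <- 3      # assumed value

# Posterior parameters of the natural-conjugate model
V_prior_inv <- solve(V_prior)
V_post      <- solve(V_prior_inv + crossprod(X))
beta1       <- as.numeric(V_post %*% (V_prior_inv %*% beta0 + crossprod(X, y)))
nu1         <- nu0 + length(y)
resid1      <- y - as.numeric(X %*% beta1)
gap         <- beta1 - beta0
s1          <- s0 + sum(resid1^2) + as.numeric(t(gap) %*% V_prior_inv %*% gap)

# 5000 joint draws: sigma^2 first, then beta given sigma^2
n_draws <- 5000
sigmasq <- s1 / rchisq(n_draws, df = nu1)
L <- t(chol(V_post))                           # L %*% t(L) equals V_post
z <- matrix(rnorm(2 * n_draws), nrow = 2)
beta_draws <- t(beta1 + (L %*% z) * rep(sqrt(sigmasq), each = 2))

draws <- data.frame(V1 = beta_draws[, 1], V2 = beta_draws[, 2], sigmasq = sigmasq)
rbind(mean = sapply(draws, mean), variance = sapply(draws, var))
```

Changing `beta0` and `V_prior` and rerunning shows directly how the posterior summaries respond to the prior.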

Adjust prior parameters

$P\left(β = \begin{bmatrix} μ_{0} \\ α \end{bmatrix} \middle| σ^{2}\right) \sim N\left(\begin{bmatrix} 0.01 \\ 1 \end{bmatrix}, σ^{2} \begin{bmatrix} 0.1 & 0 \\ 0 & 1 \end{bmatrix}\right)$

The sample mean of $μ_{0}$ with 5000 draws is 0.0024582 and the variance is 0.001686.
The sample mean of $α$ with 5000 draws is 1.00048 and the variance is 0.000012.
The sample mean of $σ^{2}$ with 5000 draws is 0.017258 and the variance is 0.0000026.

Adjust prior parameters

$P\left(β = \begin{bmatrix} μ_{0} \\ α \end{bmatrix} \middle| σ^{2}\right) \sim N\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}, σ^{2} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right)$

The sample mean of $μ_{0}$ with 5000 draws is 0.0114882 and the variance is 0.011913.
The sample mean of $α$ with 5000 draws is 0.999733 and the variance is 0.000082.
The sample mean of $σ^{2}$ with 5000 draws is 0.017257 and the variance is 0.0000026.

Behind the Scenes

The sample mean of $μ_{0}$ with 5000 draws is `r round(mean(blogau$V1), 8)` and the variance is `r round(var(blogau$V1), 6)`.
The sample mean of $α$ with 5000 draws is `r round(mean(blogau$V2), 6)` and the variance is `r round(var(blogau$V2), 8)`.
The sample mean of $σ^{2}$ with 5000 draws is `r round(mean(blogau$sigmasq), 6)` and the variance is `r round(var(blogau$sigmasq), 8)`.

shh, witchcraft here. Why do I need this chunk to advance to next slide?
@yihui, please send help!

Outputting Plots

R Script

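```r
library(ggplot2)

pdf(file = "mu0plot.pdf", height = 12, width = 9)
# print() ensures the plot is drawn even when the script is run with source()
print(
  ggplot(data = blogau, aes(x = V1)) +
    geom_histogram(binwidth = 0.01, colour = "black", fill = "white") +
    ggtitle("Distribution of mu0") +
    xlab("mu0")
)
dev.off()
```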

R Markdown

```{r,echo=FALSE,fig.height=12,fig.width=9,dev="pdf"}
ggplot(data=blogau, aes(x=V1)) +
    geom_histogram(binwidth=0.01, colour="black", fill="white")+
    ggtitle("Distribution of mu0")+
    xlab("mu0") 
```
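The script route requires opening and closing the graphics device by hand, which the chunk options `fig.height`, `fig.width` and `dev` take care of in R Markdown. As a rough sketch of that bookkeeping in base R, assuming made-up draws in place of `blogau$V1` and a hypothetical helper named `save_hist_pdf()`:

```r
# Hypothetical helper: write a histogram to a PDF and always close the device
save_hist_pdf <- function(x, file, height = 12, width = 9,
                          main = "Distribution of mu0", xlab = "mu0") {
  pdf(file = file, height = height, width = width)
  on.exit(dev.off(), add = TRUE)   # runs even if hist() fails
  hist(x, breaks = 30, main = main, xlab = xlab)
  invisible(file)
}

set.seed(7)
fake_draws <- rnorm(5000, mean = 0.01, sd = 0.1)   # made-up stand-in for blogau$V1
out_file <- save_hist_pdf(fake_draws, file.path(tempdir(), "mu0plot.pdf"))
file.exists(out_file)
```

Using `on.exit()` avoids leaving a device open when the plotting call errors, a common source of empty or corrupt PDF files in scripts.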

Demonstrations

Reference

Alison Hill, June 2019, R-Ladies xaringan theme:

Professor Chris Skeels, S1 2020, Econometrics ECOM90013, University of Melbourne

Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi:10.21105/joss.02376.

Tomasz Wozniak, S1 2020, Macroeconometrics ECOM90007, University of Melbourne

Sources

R Markdown Cheat Sheet

R Markdown: The Definitive Guide

Stack Overflow

RStudio Community

Workshops: Communicating with Data via R Markdown by Emi Tanaka

Recent talks about R Markdown at the 2020 RStudio Conference:

One R Markdown Document, Fourteen Demos by Yihui Xie

How Rmarkdown changed my life by Professor Rob J Hyndman

These slides!

Questions?

Stay in Touch

danyangd@student.unimelb.edu.au

https://www.linkedin.com/in/danyang-dai-7529b4152/
