install.packages("rmarkdown")
install.packages("tidyverse")
2 Using Quarto
No R functions are introduced here, but Quarto syntax is explained.
2.1 Introduction
This chapter will teach you how to use Quarto to create quality documents that incorporate text and R code seamlessly.
First, though, let’s make sure you are set up in your project in RStudio.
2.1.1 Are you in your project?
If you followed the directions at the end of the last chapter, you should have created/named a project called intro_stats
. Let’s make sure you’re in that project.
Look at either the top (Posit Cloud) or the upper right corner of the RStudio screen. Does it say intro_stats
? If so, congratulations! You are in your project.
If you’re not in the intro_stats
project,
- in Posit Cloud, click on “Your Workspace” and click on the “intro_stats” project you created in the last chapter.
- for local installations, click on whatever it does say in the upper right corner (probably
Project: (None)
). You can click “Open Project” but it’s likely that theintro_stats
project appears in the drop-down menu in your list of recently accessed projects. So click on the projectintro_stats
.
2.1.2 Install new packages
Type the following commands at the Console:1
It may take a few minutes to install these packages, especially tidyverse
, so be patient.
2.1.3 Download the Quarto file
You need to download this chapter as an Quarto (.qmd
) file. Please click the following link to do so:
https://vectorposse.github.io/intro_stats/chapter_downloads/02-using_quarto.qmd
The file is now likely sitting in a Downloads folder on your machine (or wherever you have set up for web files to download).
- If you are logged into RStudio on a web browser, go to the Files tab and click the “Upload” button. From there, leave the first box alone (“Target directory”). Click the “Choose File” button and navigate to the folder on your machine containing the file
02_using-quarto.qmd
. Select that file and click “OK” to upload the file. Then you will be able to open the file in RStudio simply by clicking on it in the Files pane. - If you using RStudio on your own machine, you will need to move the file from your Downloads folder into the
intro_stats
project directory you created at the end of the last chapter. (Again, if you haven’t created theintro_stats
project, please go back to Chapter 1 and follow the directions for doing that.) Moving files around is most easily done using File Explorer in Windows or the Finder in MacOS.
At this point, you should be reading this text while looking at it in the left pane of RStudio, after opening up the Quarto (.qmd
) file and scrolling down until you reach this spot. If you are reading this text while still browsing the book online, be aware that there are several instructions below that won’t make any sense because you’re not looking at the plain text file with all the code in it. Much of the material in this book can be read and enjoyed online, but the real learning comes from downloading the chapter files (starting with Chapter 2—this one) and working through them in RStudio.
2.2 What is Quarto?
Before we can talk about Quarto documents, we need to talk about markdown.
Markdown is a way of using plain text with simple characters to indicate formatting choices in a document. For example, in a markdown file, one can make headers by using number signs (or hashtags as the kids are calling them these days2). The Quarto file itself is just a plain text file. To see the formatting, the file has to be converted to HTML, which is the format used for web pages. (This process is described below.)
Quarto documents are just text files that use markdown. They also allow you to include R code alongside the text. Here’s an example of a “code chunk”:
1 + 1
[1] 2
Click the little dark green, right-facing arrow in the upper-right corner of the code chunk. (The icon I’m referring to is next to a lighter green icon with a downward-facing arrow.) When you “run” the code chunk like this, R produces the appropriate output. We’ll say more about code chunks later in this document.
This document—with text and code chunks together—is called an Quarto document.
2.3 Rendering a document
There is a button in the toolbar right above the text that says “Render”. It’s next to a big, blue arrow. Go ahead and click it. See what happens.
Once the pretty output is generated, take a few moments to look back and forth between it and the original Quarto document (the plain text in RStudio). You can see some tricks that we won’t need much (embedding web links, making lists, etc.) and some tricks that we will use in every chapter (like R code chunks).
At first, you’ll want to work back and forth between the Quarto document and the HTML file to get used to how the formatting in the plain text file get translated to output in the HTML file. You can hit the Render button any time you want to update the HTML output with your most recent changes. After a while, you will look at the HTML file less often and work mostly in the Quarto file, only rendering the final HTML output occasionally, most importantly when you are finished and ready to produce your final draft.
One additional benefit of rendering often is that it also automatically saves your progress. If you choose to render less often, you’ll want to save your document manually from time to time just so you don’t lose any work in case of a computer malfunction. (The easiest way to save things in pretty much every piece of software anywhere is to use the keyboard shortcut Ctrl-S in Windows or Cmd-S in MacOS.)
2.4 Literate programming
Quarto is one way to implement a “literate programming” paradigm. The concept of literate programming was famously described by Donald Knuth, an eminent computer scientist. The idea is that computer programs should not appear in sterile files that are full of hard-to-read, abstruse lines of computer code. Instead, functional computer code should appear interspersed with writing that explains the code.
2.5 Reproducible research
One huge benefit of organizing your work into Quarto documents is that it makes your work reproducible. This means that anyone with access to your data and your Quarto document should be able to re-create the exact same analysis you did.
This is a far cry from what generally happens in research. For example, if a person does all their work in Microsoft Excel, they make a series of choices in how they format and analyze their data, and all those choices take the form of menu commands that they point and click with a mouse. There is no record of the exact sequence of clicks that took them from point A to B all the way to Z. All they have to show for their work is the “clean” spreadsheet and anything they’ve written down or communicated about their results. If there were any errors along the way, they would be very hard to track down.3
Reproducibility should be a minimum prerequisite for all statistical analysis. Sadly, that is not the case in most of the research world. We are training you to be better.
2.6 Structure of a Quarto document
Let’s start from the top. Look at the very beginning of the plain Quarto document. (If you’re in RStudio, you are looking at the Quarto document. If you are looking at the pretty HTML file, you’ll need to go back to RStudio.) The section at the very top of the file that starts and ends with three hyphens is called the YAML header. (Search for YAML on the internet it if you really care why.) The title of the document appears already, but you’ll need to substitute your name and date in the obvious place. Scroll up and do that now. Come back here when you’re done.
You’ve made changes to the document, so you’ll need to push the “Render” button again. Once that’s done, look at the resulting HTML document. The YAML header has been converted into a nicely formatted document header with the new information you’ve provided.
Next, you’ll see three colons followed by the text {.callout-note}
. Then there is some text that lists R functions that will be introduced in the chapter. (There aren’t any in this chapter, but there usually will be in future chapters.) Then there are three more colons. It’s very important that you do not modify the three colons or any of the text that appears on the same line in the the curly braces next to the colons! This will mess up your ability to Render the document.
Another place you will see the three-colon “tags” will be when providing answers to homework questions. It will start with three colons next to {.answer}
and end with three colons. While you should be careful not to modify those tags, you can and should modify the plain text that lies between the colon tag fences. Let’s try it:
Replace this text here with something else. (Anything you want to type here is fine.) Then render the document and see how it appears in the HTML file.
To be clear, the colorful answer boxes are not part of the standard Quarto tool set. The custom code to create those answer boxes is imported from the book’s website.
We also have section headers throughout, which in the Quarto file look like:
2.7 Section header
The hashtags are markdown code for formatting headers. Additional hashtags will create subsections:
2.7.1 Not quite as big
We could actually use a single number sign, but #
makes a header as big as the title, which is too big and is reserved for the main title of the whole document. Therefore, we will use ##
for section headers and ###
for subsections.
You do need to make sure that there is a blank line before and after each section header. To see why, look at the HTML document at this spot: ## Is this a new section? Do you see the problem?
Put a blank line before and after the line above that says “Is this a new section?” Render one more time and make sure that the line now shows up as a proper section header.
2.8 Other formatting tricks
You can make text italic or bold by using asterisks. (Don’t forget to look at the HTML to see the result.)
You can make bullet-point lists. These can be made with asterisks, but you’ll need to start after a blank line, then put asterisks at the beginning of each new line, followed by a space, as follows:
- First item
- Second item
If you want sub-items, indent at least two spaces and use another asterisk followed by a space.
- Item
- Sub-item
- Sub-item
- Item
- Item
Or you can make ordered lists. Just use numbers and Quarto will do all the work for you. (Again, make sure you’re starting after a blank line and that there is a space after the periods.)
- First Item
- Second Item
- Third Item
We can make horizontal rules. There are lots of ways of doing this, but we prefer a bunch of asterisks in a row (at least three—we’ve used five for more visual impact in the Quarto document):
There are many more formatting tricks available. For a good resource on all Quarto stuff, click on this link. And notice in the previous sentence the syntax for including hyperlinks in your document.
2.9 R code chunks
The most powerful feature of Quarto is the ability to do data analysis right inside the document. This is accomplished by including R code chunks. An R code chunk doesn’t just show you the R code in your output file; it also runs that code and generates output that appears right below the code chunk.
An R code chunk starts with three “backticks” followed by the letter r enclosed in braces, and it ends with three more backticks. (The backtick is usually in the upper-left corner of your keyboard, next to the number 1 and sharing a key with the tilde ~.)
In RStudio, click the little dark green, right-facing arrow in the upper-right corner of the code chunk below, just as you did earlier.
# Here's some sample R code
<- c(1, 2, 3, 4)
test sum(test)
[1] 10
After pushing the dark green arrow, you should notice that the output of the R code appeared like magic. If you Render the HTML output, you should see the same output appear. If you hover your mouse over the dark green arrow, you should see the words “Run Current Chunk”. We’ll call this the Run button for short.
We need to address something here that always confuses people new to R and Quarto. A number sign (aka “hashtag”) in a Quarto document gives us section headers. In R code, however, a number sign indicates a “comment” line. In the R code above, the line # Here's some sample R code
is not executed as R code. But you can clearly see that the two lines following were executed as R code. So be careful! Number signs inside and outside R code chunks behave very differently.
Typically, the first code chunk that appears in our document will load any packages we need. We will be using a package called tidyverse
(which is really a collection of lots of different packages) throughout the course. We load it now. Click on the Run button (the dark green, right-facing arrow) in the code chunk below. (If you get an error that says “there is no package called tidyverse” that means you failed to install it earlier. Go back and use install.packages
as instructed at the top of this document.)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The output here consists of a bunch of information generated when trying to load the package. These are not errors, even though one section is labeled “Conflicts”. Usually, errors appear with the word “Error”, so it’s typically clear when something just didn’t work. Also note that once you’ve loaded a package, you don’t need to load it again until you restart your R session. For example, if you go back and try to run the code chunk above one more time, the output will disappear. That’s because tidyverse
is already loaded, so the second “run” doesn’t actually generate output anymore.
Okay, let’s do something interesting now. We’ll revisit the penguins
data set we introduced in the previous chapter. Remember, though, that this data set also lives in a package that needs to be loaded. Run the code chunk below to load the palmerpenguins
package. (There may not be any output. But the command worked silently in the background to load the palmerpenguins
package.)
library(palmerpenguins)
Let’s see what happens when we try to run multiple commands in one code chunk:
head(penguins)
tail(penguins)
str(penguins)
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
$ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
$ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
$ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
$ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
$ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
$ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
$ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
$ year : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
If you’re looking at this in RStudio, it’s a bit of a mess. RStudio did its best to give you what you asked for, but there are three separate commands here. The first two (head
and tail
) print some of the data, so the first two boxes of output are tables showing you the head and the tail of the data. The next one (str
) normally just prints some information to the Console. So RStudio gave you an R Console box with the output of this command.
If you look at the HTML file, you can see the situation isn’t as bad. Each command and its corresponding output appear nicely separated there.
Nevertheless, it will be good practice and a good habit to get into to put multiple output-generating commands in their own R code chunks. Run the following code chunks and compare the output to the mess you saw above:
head(penguins)
tail(penguins)
str(penguins)
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
$ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
$ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
$ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
$ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
$ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
$ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
$ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
$ year : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
This won’t look any different in the HTML file, but it sure looks a lot cleaner in RStudio.
What about the two lines of the first code chunk we ran above?
<- c(1, 2, 3, 4)
test sum(test)
[1] 10
Should these two lines be separated into two code chunks? If you run it, you’ll see only one piece of output. That’s because the line test <- c(1, 2, 3, 4)
works invisibly in the background. The vector test
gets assigned, but no output is produced. Try it and see (push the Run button):
<- c(1, 2, 3, 4) test
So while there’s no harm in separating these lines and putting them in their own chunks, it’s not strictly necessary. You really only need to separate lines when they produce output. (And even then, if you forget, RStudio will kindly give you multiple boxes of output.)
Suppose we define a new variable called test2
in a code chunk. FOR PURPOSES OF THIS EXERCISE, DO NOT HIT THE RUN BUTTON YET! But do go look at the HTML file.
<- c("a", "b", "c")
test2 test2
[1] "a" "b" "c"
The first line defines test2
invisibly. The second line asks R to print the value of test2
, and the HTML document shows the vector of three letters printed in the output below the code chunk. DON’T HIT THE RUN BUTTON YET!
Okay, now go to the Console in RStudio (in the lower left corner of the screen). Try typing test2. You should get an “Error: object ‘test2’ not found.”
Why does this happen? The Global Environment doesn’t know about it yet. Look in the upper right corner of the screen, under the “Environment” tab. You should see test
, but not test2
.
Okay, NOW GO BACK AND CLICK THE RUN BUTTON IN THE LAST CHUNK ABOVE. The output appears in RStudio below the code chunk and the Global Environment has been updated.
The take home message is this:
Be sure to run all your code chunks in RStudio!
In RStudio, look in the toolbar above this document, toward the right. You should see the word “Run” with a little drop-down menu next to it. Click on that drop-down menu and select “Run All”. Do you see what happened? All the code chunks ran again, and that means that anything in the Global Environment will now be updated to reflect the definitions made in the Quarto document.
It’s a good idea to “Run All” when you first open a new Quarto document. This will ensure that all your code chunks have their output below them (meaning you don’t have to go through and click the Run button manually for each chunk, one at a time) and the Global Environment will accurately reflect the variables you are using.
You can “Run All” from time to time, but it’s easier just to “Run All” once at the beginning, and then Run individual R code chunks manually as you create them.
Now go back to the Environment tab and find the icon with the little broom on it. Click it. You will get a popup warning you that you about to “remove all objects from the environment”. Click “Yes”. Now the Global Environment is empty. Go back to the “Run” menu and select “Run All”. All the objects you defined in the Quarto document are back.
Clearing out your environment can be useful from time to time. Maybe you’ve been working on a chapter for a while and you’ve tried a bunch of stuff that didn’t work, or you went back and changed a bunch of code. Eventually, all that junk accumulates in your Global Environment and it can mess up your Quarto document. For example, let’s define a variable called my_variable
.
<- 42 my_variable
Then, let’s do some calculation with my_variable
.
* 2 my_variable
[1] 84
Perhaps later you decide you don’t really need my_variable
. Put a hashtag in front of the code my_variable <- 42
to comment it out so that it will no longer run, but don’t touch the next code chunk where you multiply it by 2. Now try running the code chunk with my_variable * 2
again. Note that my_variable
is still sitting in your Global Environment, so you don’t get any error messages. R can still see and access my_variable
.
Now go to the “Run” menu and select “Restart R and Run All Chunks”. This clears the Global Environment and runs all the R code starting from the top of the Quarto document. This time you will get an error message: object 'my_variable' not found
. You’ve tried to calculate with a variable called my_variable
that doesn’t exist anymore. (The line in which it was defined has been commented out.)
It’s best to make sure all your code chunks will run when loaded from a clean R session. The “Restart R and Run All Chunks” option is an easy way to both clear your environment and re-run all code chunks. You can do this as often as you want, but you will definitely want to do this one last time when you are done. At the end of the chapter, when you are ready to prepare the final draft, please select “Restart R and Run All Chunks”. Make sure everything still works!
To get rid of the error above, uncomment the line my_variable <- 42
by removing the hashtag you added earlier.
2.10 Inline R commands
You don’t need a standalone R code chunk to do computations. One neat feature is the ability to use R to calculate things right in the middle of your text.
Here’s an example. Suppose we wanted to compute the mean body mass (in grams) for the penguins in the penguins
data set. We could do this:
mean(penguins$body_mass_g, na.rm = TRUE)
[1] 4201.754
(The na.rm = TRUE
part is necessary because two of the penguins are missing body mass data. More on missing data in future chapters.)
But we can also do this inline by using backticks and putting the letter r
inside the first backtick. Go to the HTML document to see how the following sentence appears:
The mean body mass for penguins in the penguins
data set is 4201.754386 grams.
You can (and should) check to make sure your inline R code is working by checking the HTML output, but you don’t necessarily need to go to the HTML file to find out. In RStudio, click so that the cursor is somewhere in the middle of the inline code chunk in the paragraph above. (In other words, click on some part of the text in between the backticks.) Now hit Ctrl-Enter or Cmd-Enter (PC or Mac respectively). A little box should pop up that shows you the answer!
Notice that, in addition to the inline R command that calculated the mean, we also enclosed penguins
in backticks to make it stand out in the output. We’ll continue to do that for all computer commands and R functions. But to be clear, putting a word in backticks is just a formatting trick. If you want inline R code, you also need the letter r
followed by a space inside the backticks.
2.11 Inline R command errors
Working with inline code is trickier than working with code chunks. When a code chunk fails, the error message appears right there below the chunk. When inline code fails, it manifests in more subtle ways.
Observe the simple inline code below:
5
This code works just fine. (Test this yourself by clicking somewhere between the backticks and hitting Ctrl-Enter or Cmd-Enter.)
Hit the Render button and scroll down in the HTML document to see what appears here.
Now go up and change the inline code by adding a percent sign (%) after the 3 (so it will look like 2+3%).
Click somewhere between the backticks and hit Ctrl-Enter or Cmd-Enter. You should get an error.
Next, go to the Run menu and select “Restart R and Run All Chunks.” You get no errors. This makes sense because all the code chunks are working fine. The broken code is inline, not inside a code chunk.
Now try to hit the Render button. The “Background Jobs” tab will stay open with a bunch of error text. Scroll up a little in that error text until you see
Failed to parse the inline R code
That’s the only hint we have as to where the problem might live. We happen to know where to find the offending code, but imagine this code were buried in the middle of some long document.
The easiest solution is to use Ctrl-F or Cmd-F. This will open the “Find” tool. Start typing “2+3%” and it will take you to the problematic spot.
Fix the error by deleting the percent sign. Now hit the Render button again to make sure the problem really got fixed and that the output now appears correctly in the HTML document.
2.12 Copying and pasting
In future chapters, you will be shown how to run statistical analyses using R. Each chapter will give extensive explanations of the statistical concepts and demonstrations of the necessary R code. Afterwards, there will be one or more exercises that ask you to apply your new-found knowledge to run similar analyses on your own with different data.
The idea is that you should be able to copy and paste the R code from the previously worked examples. But you must be thoughtful about how you do this. The code cannot just be copied and pasted blindly. It must be modified so that it applies to the exercises with new data. This requires that you understand what the code is doing. You cannot effectively modify the code if you don’t know which parts to modify.
There will also be exercises in which you are asked to provide your own explanations and interpretations of your analyses. These should not be copied and pasted from any previous work. These exercises are designed to help you understand the statistical concepts, so they must be in your own words, using your own understanding.
In order to be successful in these chapters, you must do the following:
- Read every part of the chapter carefully!
- It will be tempting to skim over the paragraphs quickly and just jump from code chunk to code chunk. This will be highly detrimental to your ability to gain the necessary understanding—not just to complete the chapter, but to succeed in statistics overall.
- Copy and paste thoughtfully!
- Not every piece of code from the early part of the chapter will necessarily apply to the later exercises. And the code that does apply will need to be modified (sometimes quite heavily) to be able to run new analyses. Your job is to understand how the code works so that you can make changes to it without breaking things. If you don’t understand a piece of code, don’t copy and paste it until you’ve read and re-read the earlier exposition that explains how the code works.
One final note about copying and pasting. Sometimes, people will try to copy and paste code from the HTML output file. This is a bad idea. The HTML document uses special characters to make the output look pretty, but these characters don’t actually work as plain text in a Quarto document. The same applies to things copied and pasted from a Word document or another website. If you need to copy and paste code, be sure to find the plain text Quarto file (the one with the .qmd extension here in RStudio) and copy and paste from that.
2.13 Conclusion
That’s it! There wasn’t too much you were asked to do for this assignment that will actually show up in the HTML output. (Make sure you did do the three things that were asked of you, however: one was adding your name and the date to the YAML header, one was typing something in the blue answer box, and the last was to make a section header appear properly.) As you gain confidence and as we move into more serious stats material, you will be asked to do a lot more.
2.13.1 Preparing and submitting your assignment
If you look in your project folder, you should see (at least) three files:
[a file with a .Rproj extension, either project.Rproj or intro_stats.Rproj]
02-using_quarto.qmd
02-using_quarto.html
The first file (with extension .Rproj
) you were instructed never to touch.
The next file has extension .qmd
. This is your Quarto file. It’s the file you’re looking at right now. It is really nothing more than a plain text file, although when you open it in RStudio, some magic allows you to see the output from the code chunks you run.
Finally, you have a file with extension .html
. This is the pretty output file generated when you hit the “Render” button. This is the “final product” of your work.
(If you happen to see other files or folders in your project folder, you should ignore those and not mess with them.)
There are several steps that you should follow at the end of each of every chapter.
- From the “Run” menu, select “Restart R and Run All Chunks”.
- Deal with any code errors that crop up. Repeat steps 1–2 until there are no more code errors.
- Spell check your document by clicking the icon with “ABC” and a check mark.
- Hit the “Render” button one last time to generate the final draft of the HTML file. (If there are errors here, you may need to go back and fix broken inline code or other markdown issues.)
- Proofread the HTML file carefully. If there are errors, go back and fix them, then repeat steps 1–5 again.
If you have completed this chapter as part of a statistics course, follow the directions you receive from your professor to submit your assignment.
If you get an authorization error, that means you are accessing RStudio through a managed local server and you’ll need your admin to install the package for you.↩︎
Also called “pound signs” or “octothorpes”. This is also an example of formatting a footnote!↩︎
If you think these errors are trivial, Google ``Reinhart and Rogoff Excel error’’ to read about the catastrophic consequences of seemingly trivial Excel mistakes.↩︎