11 Hypothesis testing with randomization, Part 2

2.0

Functions introduced in this chapter

factor

11.1 Introduction

Now that we have learned about hypothesis testing, we’ll explore a different example. Although the rubric for performing the hypothesis test will not change, the individual steps will be implemented in a different way due to the research question we’re asking and the type of data used to answer it.

11.1.1 Install new packages

If you are using RStudio Workbench, you do not need to install any packages. (Any packages you need should already be installed by the server administrators.)

If you are using R and RStudio on your own machine instead of accessing RStudio Workbench through a browser, you’ll need to type the following command at the Console:

install.packages("MASS")

11.1.2 Download the R notebook file

Check the upper-right corner in RStudio to make sure you’re in your intro_stats project. Then click on the following link to download this chapter as an R notebook file (.Rmd).

https://vectorposse.github.io/intro_stats/chapter_downloads/11-hypothesis_testing_with_randomization_2.Rmd

Once the file is downloaded, move it to your project folder in RStudio and open it there.

11.1.3 Restart R and run all chunks

In RStudio, select “Restart R and Run All Chunks” from the “Run” menu.

11.2 Load packages

In additional to tidyverse and janitor, we load the MASS package to access the Melanoma data on patients in Denmark with malignant melanoma, and the infer package for inference tools.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(MASS)

## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

library(infer)

11.3 Our research question

We know that certain types of cancer are more common among females or males. Is there a sex bias among patients with malignant melanoma?

Let’s jump into the “Exploratory data analysis” part of the rubric first.

11.4 Exploratory data analysis

11.4.1 Use data documentation (help files, code books, Google, etc.) to determine as much as possible about the data provenance and structure.

You can look at the help file by typing ?Melanoma at the Console. However, do not put that command here in a code chunk. The R Notebook has no way of displaying a help file when it’s processed. Be careful: there’s another data set called melanoma with a lower-case “m”. Make sure you are using an uppercase “M”.

There is a reference at the bottom of the help file.

Exercise 1

Using the reference in the help file, do an internet search to find the source of this data. How can you tell that this reference is not, in fact, a reference to a study of cancer patients in Denmark?

Please write up your answer here.

From the exercise above, we can see that it will be very difficult, if not impossible, to discover anything useful about the true provenance of the data (unless you happen to have a copy of that textbook, which in theory provided another more primary source). We will not know, for example, how the data was collected and if the patients consented to having their data shared publicly. The data is suitably anonymized, though, so we don’t have any serious concerns about the privacy of the data. Having said that, if a condition is rare enough, a dedicated research can often “de-anonymize” data by cross-referencing information in the data to other kinds of public records. But melanoma is not particularly rare. At any rate, all we can do is assume that the textbook authors obtained the data from a source that used proper procedures for collecting and handling the data.

We print the data frame:

Melanoma

Use glimpse to examine the structure of the data:

glimpse(Melanoma)

## Rows: 205
## Columns: 7
## $ time      <int> 10, 30, 35, 99, 185, 204, 210, 232, 232, 279, 295, 355, 386,…
## $ status    <int> 3, 3, 2, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, …
## $ sex       <int> 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, …
## $ age       <int> 76, 56, 41, 71, 52, 28, 77, 60, 49, 68, 53, 64, 68, 63, 14, …
## $ year      <int> 1972, 1968, 1977, 1968, 1965, 1971, 1972, 1974, 1968, 1971, …
## $ thickness <dbl> 6.76, 0.65, 1.34, 2.90, 12.08, 4.84, 5.16, 3.22, 12.88, 7.41…
## $ ulcer     <int> 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …

11.4.2 Prepare the data for analysis.

It appears that sex is coded as an integer. You will recall that we need to convert it to a factor variable since it is categorical, not numerical.

Exercise 2

According to the help file, which number corresponds to which sex?

Please write up your answer here.

We can convert a numerical variable a couple of different ways. In Chapter 3, we used the as_factor command. That command works fine, but it doesn’t give you a way to change the levels of the variable. In other words, if we used as_factor here, we would get a factor variable that still contained zeroes and ones.

Instead, we will use the factor command. It allows us to manually relabel the levels. The levels argument requires a vector (with c) of the current levels, and the labels argument requires a vector listing the new names you want to assign, as follows:

Melanoma <- Melanoma %>%
    mutate(sex_fct = factor(sex, levels = c(0, 1), labels = c("female", "male")))
glimpse(Melanoma)

## Rows: 205
## Columns: 8
## $ time      <int> 10, 30, 35, 99, 185, 204, 210, 232, 232, 279, 295, 355, 386,…
## $ status    <int> 3, 3, 2, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, …
## $ sex       <int> 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, …
## $ age       <int> 76, 56, 41, 71, 52, 28, 77, 60, 49, 68, 53, 64, 68, 63, 14, …
## $ year      <int> 1972, 1968, 1977, 1968, 1965, 1971, 1972, 1974, 1968, 1971, …
## $ thickness <dbl> 6.76, 0.65, 1.34, 2.90, 12.08, 4.84, 5.16, 3.22, 12.88, 7.41…
## $ ulcer     <int> 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ sex_fct   <fct> male, male, male, female, male, male, male, female, male, fe…

You should check to make sure the first few entries of sex_fct agree with the numbers in the sex variable according to the labels explained in the help file. (If not, it means that you put the levels in one order and the labels in a different order.)

11.4.3 Make tables or plots to explore the data visually.

We only have one categorical variable, so we only need a frequency table. Since we are concerned with proportions, we’ll also look at a relative frequency table which the tabyl command provides for free.

tabyl(Melanoma, sex_fct) %>%
    adorn_totals()

11.5 The logic of inference and randomization

This is a good place to pause and remember why statistical inference is important. There are certainly more females than males in this data set. So why don’t we just show the table above, declare females are more likely to have malignant melanoma, and then go home?

Think back to coin flips. Even though there was a 50% chance of seeing heads, did that mean that exactly half of our flips came up heads? No. We have to acknowledge sampling variability: even if the truth were 50%, when w sample, we could accidentally get more or less than 50%, just by pure chance alone. Perhaps these 205 patients just happen to have more females than average.

The key, then, is to figure out if 61.5% is significantly larger than 50%, or if a number like 61.5% (or one even more extreme) could easily come about from random chance.

As we know from the last chapter, we can run a formal hypothesis test to find out. As we do so, make note of the things that are the same and the things that have changed from the last hypothesis tests you ran. For example, we are not comparing two groups anymore. We have one group of patients, and all we’re doing is measuring the percentage of this group that is female. It’s tempting to think that we’re comparing males and females, but that’s not the case. We are not using sex to divide our data into two groups for the purpose of exploring whether some other variable differs between men and women. We just have one sample. “Female” and “Male” are simply categories in a single categorical variable. Also, because we are only asking about one variable (sex_fct), the mathematical form of the hypotheses will look a little different.

Because this is no longer a question about two variables being independent or associated, the “permuting” idea we’ve been using no longer makes sense. So what does make sense?

It helps to start by figuring out what our null hypothesis is. Remember, our question of interest is whether there is a sex bias in malignant melanoma. In other words, are there more or fewer females than males with malignant melanoma? As this is our research question, it will be the alternative hypothesis. So what is the null? What is the “default” situation in which nothing interesting is going on? Well, there would be no sex bias. In other words, there would be the same number of females and males with malignant melanoma. Or another way of saying that—with respect to the “success” condition of being female that we discussed earlier—is that females comprise 50% of all patients with malignant melanoma.

Okay, given our philosophy about the null hypothesis, let’s take the skeptical position and assume that, indeed, 50% of all malignant melanoma patients in our population are female. Then let’s take a sample of 205 patients. We can’t get exactly 50% females from a sample of 205 (that would be 102.5 females!), so what numbers can we get?

Randomization will tell us. What kind of randomization? As we come across each patient in our sample, there is a 50% chance of them being female. So instead of sampling real patients, what if we just flipped a coin? A simulated coin flip will come up heads just as often as our patients will be female under the assumption of the null.

This brings us full circle, back to the first randomization idea we explored. We can simulate coin flips, graph our results, and calculate a P-value. More specifically, we’ll flip a coin 205 times to represent sampling 205 patients. Then we’ll repeat this procedure a bunch of times and establish a range of plausible percentages that can come about by chance from this procedure. Instead of doing coin flips with the rflip command as we did then, however, we’ll use our new favorite friend, the infer package.

Let’s dive back into the remaining steps of the formal hypothesis test.

11.6 Hypotheses

11.6.1 Identify the sample (or samples) and a reasonable population (or populations) of interest.

The sample consists of 205 patients from Denmark with malignant melanoma. Our population is presumably all patients with malignant melanoma, although in checking conditions below, we’ll take care to discuss whether patients in Denmark are representative of patients elsewhere.

11.6.2 Express the null and alternative hypotheses as contextually meaningful full sentences.

\(H_{0}:\) Half of malignant melanoma patients are female.

\(H_{A}:\) There is a sex bias among patients with malignant melanoma (meaning that females are either over-represented or under-represented).

11.6.3 Express the null and alternative hypotheses in symbols (when possible).

\(H_{0}: p_{female} = 0.5\)

\(H_{A}: p_{female} \neq 0.5\)

11.7 Model

11.7.1 Identify the sampling distribution model.

We will randomize to simulate the sampling distribution.

11.7.2 Check the relevant conditions to ensure that model assumptions are met.

Random
- As mentioned above, these 205 patients are not a random sample of all people with malignant melanoma. We don’t even have any evidence that they are a random sample of melanoma patients in Denmark. Without such evidence, we have to hope that these 205 patients are representative of all patients who have malignant melanoma. Unless there’s something special about Danes in terms of their genetics or diet or something like that, one could imagine that their physiology makes them just as susceptible to melanoma as anyone else. More specifically, though, our question is about females and males getting malignant melanoma. Perhaps there are more female sunbathers in Denmark than in other countries. That might make Danes unrepresentative in terms of the gender balance among melanoma patients. We should be cautious in interpreting any conclusion we might reach in light of these doubts.
10%
- Whether in Denmark or not, given that melanoma is a fairly common form of cancer, I assume 205 is less than 10% of all patients with malignant melanoma.

11.8 Mechanics

11.8.1 Compute the test statistic.

female_prop <- Melanoma %>%
    observe(response = sex_fct, success = "female",
            stat = "prop")
female_prop

Note: Pay close attention to the difference in the observe command above. Unlike in the last chapter, we don’t have any tildes. That’s because there are not two variables involved. There is only one variable, which observe needs to see as the “response” variable. (Don’t forget to use the factor version sex_fct and not sex!) We still have to specify a “success” condition. Since the hypotheses are about measuring females, we have to tell observe to calculate the proportion of females. Finally, the stat is no longer “diff in props” There are not two proportions with which to find a difference. There is just one proportion, hence, “prop”.

11.8.2 Report the test statistic in context (when possible).

The observed percentage of females with melanoma in our sample is 61.4634146%.

Note: As explained in the last chapter, we have to use pull to pull out the number from the female_prop tibble.

11.8.3 Plot the null distribution.

Since this is the first step for which we need the simulated values, it will be convenient to run the simulation here. We’ll need to set the seed as well.

set.seed(42)
melanoma_test <- Melanoma %>%
    specify(response = sex_fct, success = "female") %>%
    hypothesize(null = "point", p = 0.5) %>%
    generate(reps = 1000, type = "draw") %>%
    calculate(stat = "prop")
melanoma_test

This list of proportions is the sampling distribution. It represents possible sample proportions of females with melanoma under the assumption that the null is true. In other words, even if the “true” proportion of female melanoma patients were 0.5, these are all values that can result from random samples.

In the hypothesize command, we use “point” to tell infer that we want the null to be centered at the point 0.5. In the generate command, we need to specify the type as “draw” instead of “permute”. We are not shuffling any values here; we are “drawing” values from a probability distribution like coin flips. Everything else in the command is pretty self-explanatory.

The value of our test statistic, female_prop, is 0.6146341. It appears in the right tail:

melanoma_test %>%
    visualize() +
    shade_p_value(obs_stat = female_prop, direction = "two-sided")

Although the line only appears on the right, keep in mind that we are conducting a two-sided test, so we are interested in values more extreme than the red line on the right, but also more extreme than a similarly placed line on the left.

Exercise 3

The red line sits at about 0.615. If you were to draw a red line on the above histogram that represented a value equally distant from 0.5, but on the left instead of the right, where would that line be? Do a little arithmetic to figure it out and show your work.

Please write up your answer here.

11.8.4 Calculate the P-value.

melanoma_test %>%
    get_p_value(obs_stat = female_prop, direction = "two-sided")

## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.

The P-value appears to be zero. Indeed, among the 1000 simulated values, we saw none that exceeded 0.615 and none that were less than 0.385. However, a true P-value can never be zero. If you did millions or billions of simulations (please don’t try!), surely there would be one or two with even more extreme values. In cases when the P-value is really, really tiny, it is traditional to report \(P < 0.001\). It is incorrect to say \(P = 0\).

11.8.5 Interpret the P-value as a probability given the null.

\(P < 0.001\). If there were no sex bias in malignant melanoma patients, there would be less than a 0.1% chance of seeing a percentage of females at least as extreme as the one we saw in our data.

Note: Don’t forget to interpret the P-value in a contextually meaningful way. The P-value is the probability under the assumption of the null hypothesis of seeing data at least as extreme as the data we saw. In this context, that means that if we assume 50% of patients are female, it would be extremely rare to see more than 61.5% or less than 38.5% females in a sample of size 205.

11.9 Conclusion

11.9.1 State the statistical conclusion.

We reject the null hypothesis.

11.9.2 State (but do not overstate) a contextually meaningful conclusion.

There is sufficient evidence that there is a sex bias in patients who suffer from malignant melanoma.

11.9.3 Express reservations or uncertainty about the generalizability of the conclusion.

We have no idea how these patients were sampled. Are these all the patients in Denmark with malignant melanoma over a certain period of time? Were they part of a convenience sample? As a result of our uncertainly about the sampling process, we can’t be sure if the results generalize to a larger population, either in Denmark or especially outside of Denmark.

Exercise 4

Can you find on the internet any evidence that females do indeed suffer from malignant melanoma more often than males (not just in Denmark, but anywhere)?

Please write up your answer here.

11.9.4 Identify the possibility of either a Type I or Type II error and state what making such an error means in the context of the hypotheses.

As we rejected the null, we run the risk of making a Type I error. If we have made such an error, that would mean that patients with malignant melanoma are equally likely to be male or female, but that we got a sample with an unusual number of female patients.

11.10 Your turn

Determine if the percentage of patients in Denmark with malignant melanoma who also have an ulcerated tumor (measured with the ulcer variable) is significantly different from 50%.

As before, you have the outline of the rubric for inference below. Some of the steps will be the same or similar to steps in the example above. It is perfectly okay to copy and paste R code, making the necessary changes. It is not okay to copy and paste text. You need to put everything into your own words.

The template below is exactly the same as in the appendix (Rubric for inference) up to the part about confidence intervals which we haven’t learned yet.

Exploratory data analysis

Use data documentation (help files, code books, Google, etc.) to determine as much as possible about the data provenance and structure.

# Add code here to understand the data.

Prepare the data for analysis. [Not always necessary.]

# Add code here to prepare the data for analysis.

Make tables or plots to explore the data visually.

# Add code here to make tables or plots.

Hypotheses

Identify the sample (or samples) and a reasonable population (or populations) of interest.

Please write up your answer here.

Express the null and alternative hypotheses as contextually meaningful full sentences.

\(H_{0}:\) Null hypothesis goes here.

\(H_{A}:\) Alternative hypothesis goes here.

Express the null and alternative hypotheses in symbols (when possible).

\(H_{0}: math\)

\(H_{A}: math\)

Model

Identify the sampling distribution model.

Please write up your answer here.

Check the relevant conditions to ensure that model assumptions are met.

Please write up your answer here. (Some conditions may require R code as well.)

Mechanics

Compute the test statistic.

# Add code here to compute the test statistic.

Report the test statistic in context (when possible).

Please write up your answer here.

Plot the null distribution.

set.seed(42)
# Add code here to simulate the null distribution.
# Run 1000 simulations like in the earlier example.

# Add code here to plot the null distribution.

Calculate the P-value.

# Add code here to calculate the P-value.

Interpret the P-value as a probability given the null.

Please write up your answer here.

Conclusion

State the statistical conclusion.

Please write up your answer here.

State (but do not overstate) a contextually meaningful conclusion.

Please write up your answer here.

Express reservations or uncertainty about the generalizability of the conclusion.

Please write up your answer here.

Identify the possibility of either a Type I or Type II error and state what making such an error means in the context of the hypotheses.

Please write up your answer here.

11.11 Conclusion

Now you have seen two fully-worked examples of hypothesis tests using randomization, and you have created two more examples on your own. Hopefully, the logic of inference and the process of running a formal hypothesis test are starting to make sense.

Keep in mind that the outline of steps will not change. However, the way each step is carried out will vary from problem to problem. Not only does the context change (one example involved sex discrimination, the other melanoma patients), but the statistics you compute also change (one example compared proportions from two samples and the other only had one proportion from a single sample). Pay close attention to the research question and the data that will be used to answer that question. That will be the only information you have to help you know which hypothesis test applies.

11.11.1 Preparing and submitting your assignment

From the “Run” menu, select “Restart R and Run All Chunks”.
Deal with any code errors that crop up. Repeat steps 1–-2 until there are no more code errors.
Spell check your document by clicking the icon with “ABC” and a check mark.
Hit the “Preview” button one last time to generate the final draft of the .nb.html file.
Proofread the HTML file carefully. If there are errors, go back and fix them, then repeat steps 1–5 again.

If you have completed this chapter as part of a statistics course, follow the directions you receive from your professor to submit your assignment.

10 Hypothesis testing with randomization, Part 1

12 Confidence intervals