diff --git a/dev/articles/anova.html b/dev/articles/anova.html
index 127e7b31..5b9b59b5 100644
--- a/dev/articles/anova.html
+++ b/dev/articles/anova.html
@@ -215,11 +215,11 @@
## # A tibble: 1 × 1
 ##   p_value
 ##     <dbl>
-## 1   0.077
+## 1 0.056

Thus, if there were really no relationship between age and political party affiliation, our approximation of the probability that we would see a statistic as or more extreme than 2.4842 is approximately
-0.077.
+0.056.

To calculate the p-value using the true \(F\) distribution, we can use the pf function from base R. This function allows us to situate the test statistic we calculated previously in the \(F\) distribution with the appropriate
diff --git a/dev/articles/anova_files/figure-html/visualize-f-1.png b/dev/articles/anova_files/figure-html/visualize-f-1.png
index 860b6a64..14662ca5 100644
Binary files a/dev/articles/anova_files/figure-html/visualize-f-1.png and b/dev/articles/anova_files/figure-html/visualize-f-1.png differ
diff --git a/dev/articles/anova_files/figure-html/visualize-indep-both-1.png b/dev/articles/anova_files/figure-html/visualize-indep-both-1.png
index 0cbe3695..2d59f642 100644
Binary files a/dev/articles/anova_files/figure-html/visualize-indep-both-1.png and b/dev/articles/anova_files/figure-html/visualize-indep-both-1.png differ
diff --git a/dev/articles/chi_squared_files/figure-html/visualize-indep-1.png b/dev/articles/chi_squared_files/figure-html/visualize-indep-1.png
index d6864384..80e1c809 100644
Binary files a/dev/articles/chi_squared_files/figure-html/visualize-indep-1.png and b/dev/articles/chi_squared_files/figure-html/visualize-indep-1.png differ
diff --git a/dev/articles/chi_squared_files/figure-html/visualize-indep-both-1.png b/dev/articles/chi_squared_files/figure-html/visualize-indep-both-1.png
index eb1044f9..744c3740 100644
Binary files a/dev/articles/chi_squared_files/figure-html/visualize-indep-both-1.png and b/dev/articles/chi_squared_files/figure-html/visualize-indep-both-1.png differ
diff --git a/dev/articles/chi_squared_files/figure-html/visualize-indep-gof-1.png b/dev/articles/chi_squared_files/figure-html/visualize-indep-gof-1.png
index d7ffd9c4..0e6b0126 100644
Binary files a/dev/articles/chi_squared_files/figure-html/visualize-indep-gof-1.png and b/dev/articles/chi_squared_files/figure-html/visualize-indep-gof-1.png differ
diff --git a/dev/articles/observed_stat_examples.html b/dev/articles/observed_stat_examples.html
index 1b7f5cff..a3c99705 100644
--- a/dev/articles/observed_stat_examples.html
+++ b/dev/articles/observed_stat_examples.html
@@ -172,7 +172,7 @@

One numerical variable (mean)
## # A tibble: 1 × 1
 ##   p_value
 ##     <dbl>
-## 1   0.032
+## 1   0.042

One numerical variable (standardized mean \(t\))
@@ -229,7 +229,7 @@

One numerical variable (stan
## # A tibble: 1 × 1
 ##   p_value
 ##     <dbl>
-## 1    0.04
+## 1 0.038

Alternatively, using the t_test wrapper:

 gss %>%
@@ -275,7 +275,7 @@ 

One numerical variable (median)
## # A tibble: 1 × 1
 ##   p_value
 ##     <dbl>
-## 1    0.01
+## 1   0.008

One numerical variable (paired)
diff --git a/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-11-1.png b/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-11-1.png
index 93019287..4ed869ba 100644
Binary files a/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-11-1.png and b/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-11-1.png differ
diff --git a/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-13-1.png b/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-13-1.png
index 6922be16..fe47fff2 100644
Binary files a/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-13-1.png and b/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-13-1.png differ
diff --git a/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-19-1.png b/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-19-1.png
index e9464e0d..c4457d66 100644
Binary files a/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-19-1.png and b/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-19-1.png differ
diff --git a/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-5-1.png b/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-5-1.png
index 96881df5..c9529150 100644
Binary files a/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-5-1.png and b/dev/articles/observed_stat_examples_files/figure-html/unnamed-chunk-5-1.png differ
diff --git a/dev/articles/t_test.html b/dev/articles/t_test.html
index d038e85a..f7dafa4f 100644
--- a/dev/articles/t_test.html
+++ b/dev/articles/t_test.html
@@ -202,10 +202,10 @@

1-Sample t-Test
## # A tibble: 1 × 1
 ##   p_value
 ##     <dbl>
-## 1    0.04
+## 1   0.034

Thus, if the true mean number of hours worked per week was really 40, our approximation of the probability that we would see a test statistic
-as or more extreme than 41.382 is approximately 0.04.
+as or more extreme than 41.382 is approximately 0.034.

Analogously to the steps shown above, the package supplies a wrapper function, t_test, to carry out 1-sample \(t\)-tests on tidy data. Rather than using randomization, the wrappers carry out the theory-based \(t\)-test. The syntax looks like this:

@@ -332,11 +332,11 @@

2-Sample t-Test
## # A tibble: 1 × 1
 ##   p_value
 ##     <dbl>
-## 1    0.25
+## 1   0.284

Thus, if there were really no relationship between the number of hours worked a week and whether one has a college degree, the probability that we would see a statistic as or more extreme than 1.5384
-is approximately 0.25.
+is approximately 0.284.

Note that, similarly to the steps shown above, the package supplies a wrapper function, t_test, to carry out 2-sample \(t\)-tests on tidy data. The syntax looks like this:

diff --git a/dev/articles/t_test_files/figure-html/visualize-1-sample-1.png b/dev/articles/t_test_files/figure-html/visualize-1-sample-1.png
index 0f2eb4da..5687f07e 100644
Binary files a/dev/articles/t_test_files/figure-html/visualize-1-sample-1.png and b/dev/articles/t_test_files/figure-html/visualize-1-sample-1.png differ
diff --git a/dev/articles/t_test_files/figure-html/visualize-2-sample-1.png b/dev/articles/t_test_files/figure-html/visualize-2-sample-1.png
index 3591147d..38320173 100644
Binary files a/dev/articles/t_test_files/figure-html/visualize-2-sample-1.png and b/dev/articles/t_test_files/figure-html/visualize-2-sample-1.png differ
diff --git a/dev/pkgdown.yml b/dev/pkgdown.yml
index 7a6f9d4e..2e0299ab 100644
--- a/dev/pkgdown.yml
+++ b/dev/pkgdown.yml
@@ -8,7 +8,7 @@ articles:
   observed_stat_examples: observed_stat_examples.html
   paired: paired.html
   t_test: t_test.html
-last_built: 2024-03-25T14:59Z
+last_built: 2024-03-25T15:07Z
 urls:
   reference: https://infer.tidymodels.org/reference
   article: https://infer.tidymodels.org/articles
diff --git a/dev/search.json b/dev/search.json
index 8ef9de37..0ee588a9 100644
--- a/dev/search.json
+++ b/dev/search.json
@@ -1 +1 @@
-[{"path":[]},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. 
pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional setting","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. 
Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. complaints reviewed investigated promptly fairly. community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. public apology may requested.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. 
includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired [Mozilla’s code conduct enforcement ladder][https://github.com/mozilla/inclusion]. answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. 
Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing","title":"Contributing","text":"Contributions infer whether form bug fixes, issue reports, new code documentation improvements encouraged welcome. welcome novices may never contributed package well friendly veterans looking help us improve package users. eager include accepting contributions everyone meets code conduct guidelines. Please use GitHub issues. pull request, please link open corresponding issue GitHub issues. Please ensure notifications turned respond questions, comments needed changes promptly.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"tests","dir":"","previous_headings":"","what":"Tests","title":"Contributing","text":"infer uses testthat testing. Please try provide 100% test coverage submitted code always check existing tests continue pass. beginner need help writing test, mention issue try help. ’s also helpful run goodpractice::gp() ensure lines code 80 characters lines code tests written. Please prior submitting pull request fix suggestions . Reach us need assistance .","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"","what":"Code style","title":"Contributing","text":"Please use snake case (rep_sample_n) function names. 
Besides , general follow tidyverse style R.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing","text":"contributing infer package must follow code conduct defined CONDUCT.","code":""},{"path":"https://infer.tidymodels.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2021 infer authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy Chi-Squared Tests with infer","text":"vignette, ’ll walk conducting \\(\\chi^2\\) (chi-squared) test independence chi-squared goodness fit test using infer. ’ll start chi-squared test independence, can used test association two categorical variables. , ’ll move chi-squared goodness fit test, tests well distribution one categorical variable can approximated theoretical distribution. Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. 
Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. data looks like :","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"test-of-independence","dir":"Articles","previous_headings":"","what":"Test of Independence","title":"Tidy Chi-Squared Tests with infer","text":"carry chi-squared test independence, ’ll examine association income educational attainment United States. college categorical variable values degree degree, indicating whether respondent college degree (including community college), finrela gives respondent’s self-identification family income—either far average, average, average, average, far average, DK (don’t know). relationship looks like sample data: relationship, expect see purple bars reaching height, regardless income class. differences see , though, just due random noise? First, calculate observed statistic, can use specify() calculate(). observed \\(\\chi^2\\) statistic 30.6825. 
Now, want compare statistic null distribution, generated assumption variables actually related, get sense likely us see observed statistic actually association education income. can generate null distribution one two ways—using randomization theory-based methods. randomization approach approximates null distribution permuting response explanatory variables, person’s educational attainment matched random income sample order break association two. Note , line specify(college ~ finrela) , use equivalent syntax specify(response = college, explanatory = finrela). goes code , generates null distribution using theory-based methods instead randomization. get sense distributions look like, observed statistic falls, can use visualize(): also visualize observed statistic theoretical null distribution. , use assume() verb define theoretical null distribution pass visualize() like null distribution outputted generate() calculate(). visualize randomization-based theoretical null distributions get sense two relate, can pipe randomization-based null distribution visualize(), provide method = \"\". Either way, looks like observed test statistic quite unlikely actually association education income. exactly, can approximate p-value get_p_value: Thus, really relationship education income, approximation probability see statistic extreme 30.6825 approximately 0. calculate p-value using true \\(\\chi^2\\) distribution, can use pchisq function base R. function allows us situate test statistic calculated previously \\(\\chi^2\\) distribution appropriate degrees freedom. Note , equivalently theory-based approach shown , package supplies wrapper function, chisq_test, carry Chi-Squared tests independence tidy data. 
syntax goes like :","code":"# calculate the observed statistic observed_indep_statistic <- gss %>% specify(college ~ finrela) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"Chisq\") # generate the null distribution using randomization null_dist_sim <- gss %>% specify(college ~ finrela) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"Chisq\") # generate the null distribution by theoretical approximation null_dist_theory <- gss %>% specify(college ~ finrela) %>% assume(distribution = \"Chisq\") # visualize the null distribution and test statistic! null_dist_sim %>% visualize() + shade_p_value(observed_indep_statistic, direction = \"greater\") # visualize the theoretical null distribution and test statistic! gss %>% specify(college ~ finrela) %>% assume(distribution = \"Chisq\") %>% visualize() + shade_p_value(observed_indep_statistic, direction = \"greater\") # visualize both null distributions and the test statistic! null_dist_sim %>% visualize(method = \"both\") + shade_p_value(observed_indep_statistic, direction = \"greater\") # calculate the p value from the observed statistic and null distribution p_value_independence <- null_dist_sim %>% get_p_value(obs_stat = observed_indep_statistic, direction = \"greater\") p_value_independence ## # A tibble: 1 × 1 ## p_value ## ## 1 0 pchisq(observed_indep_statistic$stat, 5, lower.tail = FALSE) ## X-squared ## 1.082e-05 chisq_test(gss, college ~ finrela) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 30.7 5 0.0000108"},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"goodness-of-fit","dir":"Articles","previous_headings":"","what":"Goodness of Fit","title":"Tidy Chi-Squared Tests with infer","text":"Now, moving chi-squared goodness fit test, ’ll take look self-identified income class survey respondents. Suppose null hypothesis finrela follows uniform distribution (.e. 
’s actually equal number people describe income far average, average, average, average, far average, don’t know income.) graph represents hypothesis: seems like uniform distribution may appropriate description data–many people describe income average options. Lets now test whether difference distributions statistically significant. First, carry hypothesis test, calculate observed statistic. observed statistic 487.984. Now, generating null distribution, just dropping call generate(): , get sense distributions look like, observed statistic falls, can use visualize(): statistic seems like quite unlikely income class self-identification actually followed uniform distribution! unlikely, though? Calculating p-value: Thus, self-identified income class equally likely occur, approximation probability see distribution like one approximately 0. calculate p-value using true \\(\\chi^2\\) distribution, can use pchisq function base R. function allows us situate test statistic calculated previously \\(\\chi^2\\) distribution appropriate degrees freedom. , equivalently theory-based approach shown , package supplies wrapper function, chisq_test, carry Chi-Squared goodness fit tests tidy data. syntax goes like :","code":"# calculating the null distribution observed_gof_statistic <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% calculate(stat = \"Chisq\") # generating a null distribution, assuming each income class is equally likely null_dist_gof <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"Chisq\") # visualize the null distribution and test statistic! 
null_dist_gof %>% visualize() + shade_p_value(observed_gof_statistic, direction = \"greater\") # calculate the p-value p_value_gof <- null_dist_gof %>% get_p_value(observed_gof_statistic, direction = \"greater\") p_value_gof ## # A tibble: 1 × 1 ## p_value ## ## 1 0 pchisq(observed_gof_statistic$stat, 5, lower.tail = FALSE) ## [1] 3.131e-103 chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 488. 5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Getting to Know infer","text":"infer implements expressive grammar perform statistical inference coheres tidyverse design framework. Rather providing methods specific statistical tests, package consolidates principles shared among common hypothesis tests set 4 main verbs (functions), supplemented many utilities visualize extract value outputs. Regardless hypothesis test ’re using, ’re still asking kind question: effect/difference observed data real, due chance? answer question, start assuming observed data came world “nothing going ” (.e. observed effect simply due random chance), call assumption null hypothesis. (reality, might believe null hypothesis —null hypothesis opposition alternate hypothesis, supposes effect present observed data actually due fact “something going .”) calculate test statistic data describes observed effect. can use test statistic calculate p-value, giving probability observed data come null hypothesis true. probability pre-defined significance level \\(\\alpha\\), can reject null hypothesis. workflow package designed around idea. Starting dataset, specify() allows specify variable, relationship variables, ’re interested . hypothesize() allows declare null hypothesis. 
generate() allows generate data reflecting null hypothesis. calculate() allows calculate distribution statistics generated data form null distribution. Throughout vignette, make use gss, dataset supplied infer containing sample 500 observations 11 variables General Social Survey. row individual survey response, containing basic demographic information respondent well additional variables. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults.","code":"# load in the dataset data(gss) # take a look at its structure dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"specify-specifying-response-and-explanatory-variables","dir":"Articles","previous_headings":"","what":"specify(): Specifying Response (and Explanatory) Variables","title":"Getting to Know infer","text":"specify function can used specify variables dataset ’re interested . 
’re interested , say, age respondents, might write: front-end, output specify just looks like selects columns dataframe ’ve specified. Checking class object, though: can see infer class appended top dataframe classes–new class stores extra metadata. ’re interested two variables–age partyid, example–can specify relationship one two (equivalent) ways: ’re inference one proportion difference proportions, need use success argument specify level response variable success. instance, ’re interested proportion population college degree, might use following code:","code":"gss %>% specify(response = age) ## Response: age (numeric) ## # A tibble: 500 × 1 ## age ## ## 1 36 ## 2 34 ## 3 24 ## 4 42 ## 5 31 ## 6 32 ## 7 48 ## 8 36 ## 9 30 ## 10 33 ## # ℹ 490 more rows gss %>% specify(response = age) %>% class() ## [1] \"infer\" \"tbl_df\" \"tbl\" \"data.frame\" # as a formula gss %>% specify(age ~ partyid) ## Response: age (numeric) ## Explanatory: partyid (factor) ## # A tibble: 500 × 2 ## age partyid ## ## 1 36 ind ## 2 34 rep ## 3 24 ind ## 4 42 ind ## 5 31 rep ## 6 32 rep ## 7 48 dem ## 8 36 ind ## 9 30 rep ## 10 33 dem ## # ℹ 490 more rows # with the named arguments gss %>% specify(response = age, explanatory = partyid) ## Response: age (numeric) ## Explanatory: partyid (factor) ## # A tibble: 500 × 2 ## age partyid ## ## 1 36 ind ## 2 34 rep ## 3 24 ind ## 4 42 ind ## 5 31 rep ## 6 32 rep ## 7 48 dem ## 8 36 ind ## 9 30 rep ## 10 33 dem ## # ℹ 490 more rows # specifying for inference on proportions gss %>% specify(response = college, success = \"degree\") ## Response: college (factor) ## # A tibble: 500 × 1 ## college ## ## 1 degree ## 2 no degree ## 3 degree ## 4 no degree ## 5 degree ## 6 no degree ## 7 no degree ## 8 degree ## 9 degree ## 10 no degree ## # ℹ 490 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"hypothesize-declaring-the-null-hypothesis","dir":"Articles","previous_headings":"","what":"hypothesize(): Declaring the Null 
Hypothesis","title":"Getting to Know infer","text":"next step infer pipeline often declare null hypothesis using hypothesize(). first step supply one “independence” “point” null argument. null hypothesis assumes independence two variables, need supply hypothesize(): ’re inference point estimate, also need provide one p (true proportion successes, 0 1), mu (true mean), med (true median), sigma (true standard deviation). instance, null hypothesis mean number hours worked per week population 40, write: , front-end, dataframe outputted hypothesize() looks almost exactly came specify(), infer now “knows” null hypothesis.","code":"gss %>% specify(college ~ partyid, success = \"degree\") %>% hypothesize(null = \"independence\") ## Response: college (factor) ## Explanatory: partyid (factor) ## Null Hypothesis: independence ## # A tibble: 500 × 2 ## college partyid ## ## 1 degree ind ## 2 no degree rep ## 3 degree ind ## 4 no degree ind ## 5 degree rep ## 6 no degree rep ## 7 no degree dem ## 8 degree ind ## 9 degree rep ## 10 no degree dem ## # ℹ 490 more rows gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 500 × 1 ## hours ## ## 1 50 ## 2 31 ## 3 40 ## 4 40 ## 5 40 ## 6 53 ## 7 32 ## 8 20 ## 9 40 ## 10 40 ## # ℹ 490 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"generate-generating-the-null-distribution","dir":"Articles","previous_headings":"","what":"generate(): Generating the Null Distribution","title":"Getting to Know infer","text":"’ve asserted null hypothesis using hypothesize(), can construct null distribution based hypothesis. can using one several methods, supplied type argument: bootstrap: bootstrap sample drawn replicate, sample size equal input sample size drawn (replacement) input sample data. permute: replicate, input value randomly reassigned (without replacement) new output value sample. 
draw: value sampled theoretical distribution parameters specified hypothesize() replicate. option currently only applicable testing point estimates. generation type previously called \"simulate\", superseded. Continuing example , average number hours worked week, might write: example, take 1000 bootstrap samples form null distribution. Note , generate()ing, ’ve set seed random number generation set.seed() function. using infer package research, cases exact reproducibility priority, good practice. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. generate null distribution independence two variables, also randomly reshuffle pairings explanatory response variables break existing association. instance, generate 1000 replicates can used create null distribution assumption political party affiliation not affected age:","code":"set.seed(1) gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 500,000 × 2 ## # Groups: replicate [1,000] ## replicate hours ## ## 1 1 46.6 ## 2 1 43.6 ## 3 1 38.6 ## 4 1 28.6 ## 5 1 38.6 ## 6 1 38.6 ## 7 1 6.62 ## 8 1 78.6 ## 9 1 38.6 ## 10 1 38.6 ## # ℹ 499,990 more rows gss %>% specify(partyid ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") ## Response: partyid (factor) ## Explanatory: age (numeric) ## Null Hypothesis: independence ## # A tibble: 500,000 × 3 ## # Groups: replicate [1,000] ## partyid age replicate ## ## 1 rep 36 1 ## 2 rep 34 1 ## 3 dem 24 1 ## 4 dem 42 1 ## 5 dem 31 1 ## 6 ind 32 1 ## 7 ind 48 1 ## 8 rep 36 1 ## 9 dem 30 1 ## 10 rep 33 1 ## # ℹ 499,990 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"calculate-calculating-summary-statistics","dir":"Articles","previous_headings":"","what":"calculate(): Calculating Summary Statistics","title":"Getting to 
Know infer","text":"calculate() calculates summary statistics output infer core functions. function takes stat argument, currently one “mean”, “median”, “sum”, “sd”, “prop”, “count”, “diff means”, “diff medians”, “diff props”, “Chisq”, “F”, “t”, “z”, “slope”, “correlation”. example, continuing example calculate null distribution mean hours worked per week: output calculate() shows us sample statistic (case, mean) 1000 replicates. ’re carrying inference differences means, medians, proportions, t z statistics, need supply order argument, giving order explanatory variables subtracted. instance, find difference mean age college degree don’t, might write:","code":"gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 39.2 ## 2 2 39.1 ## 3 3 39.0 ## 4 4 39.8 ## 5 5 41.4 ## 6 6 39.4 ## 7 7 39.8 ## 8 8 40.4 ## 9 9 41.5 ## 10 10 40.9 ## # ℹ 990 more rows gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 -2.35 ## 2 2 -0.902 ## 3 3 0.403 ## 4 4 -0.426 ## 5 5 0.482 ## 6 6 -0.196 ## 7 7 1.33 ## 8 8 -1.07 ## 9 9 1.68 ## 10 10 0.888 ## # ℹ 990 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"other-utilities","dir":"Articles","previous_headings":"","what":"Other Utilities","title":"Getting to Know infer","text":"infer also offers several utilities extract meaning summary statistics distributions—package provides functions visualize statistic relative distribution (visualize()), calculate p-values (get_p_value()), calculate confidence intervals 
(get_confidence_interval()). illustrate, ’ll go back example determining whether mean number hours worked per week 40 hours. point estimate 41.382 seems pretty close 40, little bit different. might wonder difference just due random chance, mean number hours worked per week population really isn’t 40. initially just visualize null distribution. sample’s observed statistic lie distribution? can use obs_stat argument specify . Notice infer also shaded regions null distribution () extreme observed statistic. (Also, note now use + operator apply shade_p_value function. visualize outputs plot object ggplot2 instead data frame, + operator needed add p-value layer plot object.) red bar looks like ’s slightly far right tail null distribution, observing sample mean 41.382 hours somewhat unlikely mean actually 40 hours. unlikely, though? looks like p-value 0.032, pretty small; if true mean number hours worked per week actually 40, probability sample mean far (1.382 hours) 40 0.032. may or may not be statistically significantly different, depending significance level \\(\\alpha\\) decided ran analysis. set \\(\\alpha = .05\\), difference statistically significant, set \\(\\alpha = .01\\), would not be. get confidence interval around estimate, can write: can see, 40 hours per week is not contained in the interval, aligns previous conclusion finding significant at significance level \\(\\alpha = .05\\). 
see interval represented visually, can use shade_confidence_interval() utility:","code":"# find the point estimate obs_mean <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # generate a null distribution null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") null_dist %>% visualize() null_dist %>% visualize() + shade_p_value(obs_stat = obs_mean, direction = \"two-sided\") # get a two-tailed p-value p_value <- null_dist %>% get_p_value(obs_stat = obs_mean, direction = \"two-sided\") p_value ## # A tibble: 1 × 1 ## p_value ## ## 1 0.032 # generate a distribution like the null distribution, # though exclude the null hypothesis from the pipeline boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") # start with the bootstrap distribution ci <- boot_dist %>% # calculate the confidence interval around the point estimate get_confidence_interval(point_estimate = obs_mean, # at the 95% confidence level level = .95, # using the standard error type = \"se\") ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 boot_dist %>% visualize() + shade_confidence_interval(endpoints = ci)"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"theoretical-methods","dir":"Articles","previous_headings":"","what":"Theoretical Methods","title":"Getting to Know infer","text":"{infer} also provides functionality use theoretical methods \"Chisq\", \"F\", \"t\" \"z\" distributions. Generally, find null distribution using theory-based methods, use code use find observed statistic elsewhere, replacing calls calculate() assume(). example, calculate observed \\(t\\) statistic (standardized mean): , define theoretical \\(t\\) distribution, write: , theoretical distribution interfaces way simulation-based null distributions . 
example, interface p-values: Confidence intervals lie scale data rather standardized scale theoretical distribution, sure use unstandardized observed statistic working confidence intervals. visualized, \\(t\\) distribution recentered rescaled align scale observed data.","code":"# calculate an observed t statistic obs_t <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # switch out calculate with assume to define a distribution t_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") # visualize the theoretical null distribution visualize(t_dist) + shade_p_value(obs_stat = obs_t, direction = \"greater\") # more exactly, calculate the p-value get_p_value(t_dist, obs_t, \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.0188 # find the theory-based confidence interval theor_ci <- get_confidence_interval( x = t_dist, level = .95, point_estimate = obs_mean ) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 # visualize the theoretical sampling distribution visualize(t_dist) + shade_confidence_interval(theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"multiple-regression","dir":"Articles","previous_headings":"","what":"Multiple regression","title":"Getting to Know infer","text":"accommodate randomization-based inference multiple explanatory variables, package implements alternative workflow based model fitting. Rather calculate()ing statistics resampled data, side package allows fit() linear models data resampled according null hypothesis, supplying model coefficients explanatory variable. part, can just switch calculate() fit() calculate()-based workflows. example, suppose want fit hours worked per week using respondent age college completion status. first begin fitting linear model observed data. Now, generate null distributions terms, can fit 1000 models resamples gss dataset, response hours permuted . 
Note code except addition hypothesize generate step. permute variables response variable, variables argument generate() allows choose columns data permute. Note derived effects depend columns (e.g., interaction effects) also affected. Beyond point, observed fits distributions null fits interface exactly like analogous outputs calculate(). instance, can use following code calculate 95% confidence interval objects. , can shade p-values observed regression coefficients observed data.","code":"observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits ## # A tibble: 3,000 × 3 ## # Groups: replicate [1,000] ## replicate term estimate ## ## 1 1 intercept 40.3 ## 2 1 age 0.0166 ## 3 1 collegedegree 1.20 ## 4 2 intercept 41.3 ## 5 2 age 0.00664 ## 6 2 collegedegree -0.407 ## 7 3 intercept 42.9 ## 8 3 age -0.0371 ## 9 3 collegedegree 0.00431 ## 10 4 intercept 42.7 ## # ℹ 2,990 more rows get_confidence_interval( null_fits, point_estimate = observed_fit, level = .95 ) ## # A tibble: 3 × 3 ## term lower_ci upper_ci ## ## 1 age -0.0948 0.0987 ## 2 collegedegree -2.57 2.72 ## 3 intercept 37.4 45.5 visualize(null_fits) + shade_p_value(observed_fit, direction = \"both\") ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. 
## ℹ Did you mean to use `annotate()`?"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"conclusion","dir":"Articles","previous_headings":"","what":"Conclusion","title":"Getting to Know infer","text":"That’s it! This vignette covers key functionality infer. See help(package = \"infer\") full list functions vignettes.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Full infer Pipeline Examples","text":"vignette intended provide set examples nearly exhaustively demonstrate functionalities provided infer. Commentary examples limited; discussion intuition behind package, see “Getting to Know infer” vignette, accessible calling vignette(\"infer\"). Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. Note data (examples ) demonstration purposes only, not necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. 
data looks like :","code":"# load in the dataset data(gss) # take a look at its structure dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-mean","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (mean)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"x_bar <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") x_bar <- gss %>% observe(response = hours, stat = \"mean\") null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000) %>% calculate(stat = \"mean\") visualize(null_dist) + shade_p_value(obs_stat = x_bar, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_bar, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 
0.032"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-standardized-mean-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (standardized mean \\(t\\))","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using both null distributions, Note code makes use randomization-based null distribution. Calculating p-value null distribution observed statistic, Alternatively, using t_test wrapper: infer does not support testing one numerical variable via z distribution.","code":"t_bar <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") t_bar <- gss %>% observe(response = hours, null = \"point\", mu = 40, stat = \"t\") null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000) %>% calculate(stat = \"t\") null_dist_theory <- gss %>% specify(response = hours) %>% assume(\"t\") visualize(null_dist) + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = t_bar, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.04 gss %>% t_test(response = hours, mu = 40) ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 2.09 499 0.0376 two.sided 41.4 40.1 
42.7"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-median","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (median)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"x_tilde <- gss %>% specify(response = age) %>% calculate(stat = \"median\") x_tilde <- gss %>% observe(response = age, stat = \"median\") null_dist <- gss %>% specify(response = age) %>% hypothesize(null = \"point\", med = 40) %>% generate(reps = 1000) %>% calculate(stat = \"median\") visualize(null_dist) + shade_p_value(obs_stat = x_tilde, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_tilde, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.01"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-paired","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (paired)","title":"Full infer Pipeline Examples","text":"example header compatible stats \"mean\", \"median\", \"sum\", \"sd\". Suppose survey respondents provided number hours worked per week surveyed 5 years prior, encoded hours_previous. ’d like test null hypothesis \"mean\" hours worked per week change sampled time five years prior. infer supports paired hypothesis testing via null = \"paired independence\" argument hypothesize(). Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Note diff column permuted, rather signs values column. 
Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"set.seed(1) gss_paired <- gss %>% mutate( hours_previous = hours + 5 - rpois(nrow(.), 4.8), diff = hours - hours_previous ) gss_paired %>% select(hours, hours_previous, diff) ## # A tibble: 500 × 3 ## hours hours_previous diff ## ## 1 50 52 -2 ## 2 31 32 -1 ## 3 40 40 0 ## 4 40 37 3 ## 5 40 42 -2 ## 6 53 50 3 ## 7 32 28 4 ## 8 20 19 1 ## 9 40 40 0 ## 10 40 43 -3 ## # ℹ 490 more rows x_tilde <- gss_paired %>% specify(response = diff) %>% calculate(stat = \"mean\") x_tilde <- gss_paired %>% observe(response = diff, stat = \"mean\") null_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"mean\") visualize(null_dist) + shade_p_value(obs_stat = x_tilde, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_tilde, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.028"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-one-proportion","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical (one proportion)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, Note logical variables coerced factors:","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% calculate(stat = \"prop\") p_hat <- gss %>% observe(response = sex, success = \"female\", stat = \"prop\") null_dist <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000) %>% calculate(stat = \"prop\") visualize(null_dist) + shade_p_value(obs_stat = 
p_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = p_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.276 null_dist <- gss %>% dplyr::mutate(is_female = (sex == \"female\")) %>% specify(response = is_female, success = \"TRUE\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000) %>% calculate(stat = \"prop\")"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-variable-standardized-proportion-z","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical variable (standardized proportion \\(z\\))","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, package also supplies wrapper around prop.test tests single proportion tidy data. infer does not support testing two means via z distribution.","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% calculate(stat = \"z\") p_hat <- gss %>% observe(response = sex, success = \"female\", null = \"point\", p = .5, stat = \"z\") null_dist <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"z\") visualize(null_dist) + shade_p_value(obs_stat = p_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = p_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.252 prop_test(gss, college ~ NULL, p = .2) ## # A tibble: 1 × 4 ## statistic chisq_df p_value alternative ## ## 1 636. 
1 2.98e-140 two.sided"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-variables","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (2 level) variables","title":"Full infer Pipeline Examples","text":"infer package provides several statistics work data type. One statistic difference proportions. Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, infer also provides functionality calculate ratios proportions. workflow looks similar diff props. Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, addition, package provides functionality calculate odds ratios. workflow also looks similar diff props. 
Calculating observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) d_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"diff in props\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 1 r_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"ratio of props\", order = c(\"female\", \"male\")) r_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"ratio of props\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"ratio of props\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = r_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = r_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 1 or_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"odds ratio\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"odds ratio\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = or_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = or_hat, 
direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.984"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-variables-z","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (2 level) variables (z)","title":"Full infer Pipeline Examples","text":"Finding standardized observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. Calculating p-value null distribution observed statistic, Note similarities plot previous one. package also supplies wrapper around prop.test allow tests equality proportions tidy data.","code":"z_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) z_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"z\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) null_dist_theory <- gss %>% specify(college ~ sex, success = \"no degree\") %>% assume(\"z\") visualize(null_dist) + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = z_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## 
## 1 0.98 prop_test(gss, college ~ sex, order = c(\"female\", \"male\")) ## # A tibble: 1 × 6 ## statistic chisq_df p_value alternative lower_ci upper_ci ## ## 1 0.0000204 1 0.996 two.sided -0.0918 0.0834"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-2-level---gof","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical (>2 level) - GoF","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Note need add hypothesized values compute observed statistic. Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. 
Calculating p-value null distribution observed statistic, Alternatively, using chisq_test wrapper:","code":"Chisq_hat <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% calculate(stat = \"Chisq\") Chisq_hat <- gss %>% observe(response = finrela, null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6), stat = \"Chisq\") null_dist <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"Chisq\") null_dist_theory <- gss %>% specify(response = finrela) %>% assume(\"Chisq\") visualize(null_dist) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist_theory, method = \"both\") + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = Chisq_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0 chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 488. 
5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-chi-squared-test-of-independence","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (>2 level): Chi-squared test of independence","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. Calculating p-value null distribution observed statistic, Alternatively, using wrapper carry test,","code":"Chisq_hat <- gss %>% specify(formula = finrela ~ sex) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"Chisq\") Chisq_hat <- gss %>% observe(formula = finrela ~ sex, stat = \"Chisq\") null_dist <- gss %>% specify(finrela ~ sex) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"Chisq\") null_dist_theory <- gss %>% specify(finrela ~ sex) %>% assume(distribution = \"Chisq\") visualize(null_dist) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = Chisq_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.118 gss %>% chisq_test(formula = finrela ~ sex) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 9.11 5 
0.105"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-means","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (diff in means)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(age ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(age ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.46"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (t)","title":"Full infer Pipeline Examples","text":"Finding standardized observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use 
randomization-based null distribution. Calculating p-value null distribution observed statistic, Note similarities plot previous one.","code":"t_hat <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) t_hat <- gss %>% observe(age ~ college, stat = \"t\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) null_dist_theory <- gss %>% specify(age ~ college) %>% assume(\"t\") visualize(null_dist) + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = t_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.442"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-medians","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (diff in medians)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(age ~ college) %>% calculate(stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(age ~ college, stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% # alt: response = age, explanatory = college hypothesize(null = \"independence\") %>%
generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.172"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-categorical-2-levels---anova","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical, one categorical (>2 levels) - ANOVA","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. 
Calculating p-value null distribution observed statistic,","code":"F_hat <- gss %>% specify(age ~ partyid) %>% calculate(stat = \"F\") F_hat <- gss %>% observe(age ~ partyid, stat = \"F\") null_dist <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"F\") null_dist_theory <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% assume(distribution = \"F\") visualize(null_dist) + shade_p_value(obs_stat = F_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = F_hat, direction = \"greater\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = F_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = F_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.045"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - SLR","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"slope_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"slope\") slope_hat <- gss %>% observe(hours ~ age, stat = \"slope\") null_dist <- gss %>% specify(hours ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"slope\") visualize(null_dist) + shade_p_value(obs_stat = slope_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = slope_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 
0.902"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---correlation","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - correlation","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"correlation_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"correlation\") correlation_hat <- gss %>% observe(hours ~ age, stat = \"correlation\") null_dist <- gss %>% specify(hours ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"correlation\") visualize(null_dist) + shade_p_value(obs_stat = correlation_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = correlation_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.878"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - SLR (t)","title":"Full infer Pipeline Examples","text":"currently implemented since \\(t\\) refer standardized slope standardized correlation.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"multiple-explanatory-variables","dir":"Articles","previous_headings":"Hypothesis tests","what":"Multiple explanatory variables","title":"Full infer Pipeline Examples","text":"Calculating observed fit, Generating distribution fits response variable permuted, Generating distribution fits explanatory variable permuted independently, Visualizing observed fit alongside null fits, Calculating p-values null distribution observed fit, Note fit()-based workflow can applied use cases 
differing numbers explanatory variables explanatory variable types.","code":"obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() null_dist <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_dist2 <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\", variables = c(age, college)) %>% fit() visualize(null_dist) + shade_p_value(obs_stat = obs_fit, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = obs_fit, direction = \"two-sided\") ## # A tibble: 3 × 2 ## term p_value ## ## 1 age 0.914 ## 2 collegedegree 0.266 ## 3 intercept 0.734"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-mean","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical (one mean)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note t distribution recentered rescaled lie scale observed data. 
infer support confidence intervals means via z distribution.","code":"x_bar <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") x_bar <- gss %>% observe(response = hours, stat = \"mean\") boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- get_ci(boot_dist, type = \"se\", point_estimate = x_bar) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") theor_ci <- get_ci(sampling_dist, point_estimate = x_bar) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-mean---standardized","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical (one mean - standardized)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (one mean) theory-based approach. 
Note infer support confidence intervals means via z distribution.","code":"t_hat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") t_hat <- gss %>% observe(response = hours, null = \"point\", mu = 40, stat = \"t\") boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"t\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = t_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-one-proportion-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One categorical (one proportion)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note z distribution recentered rescaled lie scale observed data. 
infer support confidence intervals means via z distribution.","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% calculate(stat = \"prop\") p_hat <- gss %>% observe(response = sex, success = \"female\", stat = \"prop\") boot_dist <- gss %>% specify(response = sex, success = \"female\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"prop\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = p_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(response = sex, success = \"female\") %>% assume(distribution = \"z\") theor_ci <- get_ci(sampling_dist, point_estimate = p_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 0.430 0.518 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-variable-standardized-proportion-z-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One categorical variable (standardized proportion \\(z\\))","title":"Full infer Pipeline Examples","text":"See subsection (one proportion) theory-based approach.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-means-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical variable, one categorical (2 levels) (diff in means)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval 
using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note t distribution recentered rescaled lie scale observed data. infer also provides functionality calculate ratios means. workflow looks similar diff means. Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"d_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(hours ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(hours ~ college) %>% assume(distribution = \"t\") theor_ci <- get_ci(sampling_dist, point_estimate = d_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -1.16 4.24 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci) d_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(hours ~ college, stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% 
calculate(stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-t-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical variable, one categorical (2 levels) (t)","title":"Full infer Pipeline Examples","text":"Finding standardized point estimate, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (diff means) theory-based approach. 
infer support confidence intervals means via z distribution.","code":"t_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) t_hat <- gss %>% observe(hours ~ college, stat = \"t\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = t_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-variables-diff-in-proportions","dir":"Articles","previous_headings":"Confidence intervals","what":"Two categorical variables (diff in proportions)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note z distribution recentered rescaled lie scale observed data.","code":"d_hat <- gss %>% specify(college ~ sex, success = \"degree\") %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) d_hat <- gss %>% observe(college ~ sex, success = \"degree\", stat = \"diff in props\", order = c(\"female\", \"male\")) boot_dist <- gss %>% specify(college ~ sex, success = \"degree\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"diff 
in props\", order = c(\"female\", \"male\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(college ~ sex, success = \"degree\") %>% assume(distribution = \"z\") theor_ci <- get_ci(sampling_dist, point_estimate = d_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.0794 0.0878 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-variables-z","dir":"Articles","previous_headings":"Confidence intervals","what":"Two categorical variables (z)","title":"Full infer Pipeline Examples","text":"Finding standardized point estimate, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (diff props) theory-based approach.","code":"z_hat <- gss %>% specify(college ~ sex, success = \"degree\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) z_hat <- gss %>% observe(college ~ sex, success = \"degree\", stat = \"z\", order = c(\"female\", \"male\")) boot_dist <- gss %>% specify(college ~ sex, success = \"degree\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = z_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = 
standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - SLR","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"slope_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"slope\") slope_hat <- gss %>% observe(hours ~ age, stat = \"slope\") boot_dist <- gss %>% specify(hours ~ age) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"slope\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = slope_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---correlation-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - correlation","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"correlation_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"correlation\") correlation_hat <- gss %>% observe(hours ~ age, stat = \"correlation\") boot_dist <- gss %>% specify(hours ~ age) %>% generate(reps = 1000, type = 
\"bootstrap\") %>% calculate(stat = \"correlation\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = correlation_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---t","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - t","title":"Full infer Pipeline Examples","text":"currently implemented since \\(t\\) refer standardized slope standardized correlation.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"multiple-explanatory-variables-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Multiple explanatory variables","title":"Full infer Pipeline Examples","text":"Calculating observed fit, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Note fit()-based workflow can applied use cases differing numbers explanatory variables explanatory variable types.","code":"obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() boot_dist <- gss %>% specify(hours ~ age + college) %>% generate(reps = 1000, type = \"bootstrap\") %>% fit() conf_ints <- get_confidence_interval( boot_dist, level = .95, point_estimate = obs_fit ) visualize(boot_dist) + shade_confidence_interval(endpoints = conf_ints)"},{"path":"https://infer.tidymodels.org/dev/articles/paired.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy inference for paired data","text":"vignette, ’ll walk conducting randomization-based paired test independence infer. Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. 
See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. data looks like : Two sets observations paired observation one column special correspondence connection exactly one observation . purposes vignette, ’ll simulate additional data variable natural pairing: suppose survey respondents provided number hours worked per week surveyed 5 years prior, encoded hours_previous. number hours worked per week particular respondent special correspondence number hours worked 5 years prior hours_previous respondent. ’d like test null hypothesis \"mean\" hours worked per week change sampled time five years prior. carry inference paired data infer, pre-compute difference paired values beginning analysis, use differences values interest. , pre-compute difference paired observations diff. distribution diff observed data looks like : looks distribution, respondents worked similar number hours worked per week 5 years prior, though seems like may slight decline number hours worked per week aggregate. (know true effect -.2 since ’ve simulated data.) calculate observed statistic paired setting way outside paired setting. Using specify() calculate(): observed statistic -0.202. Now, want compare statistic null distribution, generated assumption true difference actually zero, get sense likely us see observed difference truly change hours worked per week population. Tests paired data carried via null = \"paired independence\" argument hypothesize(). replicate, generate() carries type = \"permute\" null = \"paired independence\" : Randomly sampling vector signs (.e. 
-1 1), probability .5 either, length equal input data, Multiplying response variable vector signs, “flipping” observed values random subset value replicate get sense distribution looks like, observed statistic falls, can use visualize(): looks like observed mean -0.202 relatively unlikely truly change mean number hours worked per week time period. exactly, can calculate p-value: Thus, change mean number hours worked per week time period truly zero, approximation probability see test statistic extreme -0.202 approximately 0.028. can also generate bootstrap confidence interval mean paired difference using type = \"bootstrap\" generate(). , use pre-computed differences generating bootstrap resamples: Note , unlike null distribution test statistics generated earlier type = \"permute\", distribution centered observed_statistic. Calculating confidence interval: default, get_confidence_interval() constructs lower upper bounds taking observations \\((1 - .95) / 2\\) \\(1 - ((1-.95) / 2)\\)th percentiles. 
instead build confidence interval using standard error bootstrap distribution, can write: learn randomization-based inference paired observations, see relevant chapter Introduction Modern Statistics.","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, … set.seed(1) gss_paired <- gss %>% mutate( hours_previous = hours + 5 - rpois(nrow(.), 4.8), diff = hours - hours_previous ) gss_paired %>% select(hours, hours_previous, diff) ## # A tibble: 500 × 3 ## hours hours_previous diff ## ## 1 50 52 -2 ## 2 31 32 -1 ## 3 40 40 0 ## 4 40 37 3 ## 5 40 42 -2 ## 6 53 50 3 ## 7 32 28 4 ## 8 20 19 1 ## 9 40 40 0 ## 10 40 43 -3 ## # ℹ 490 more rows # calculate the observed statistic observed_statistic <- gss_paired %>% specify(response = diff) %>% calculate(stat = \"mean\") # generate the null distribution null_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"mean\") null_dist ## Response: diff (numeric) ## Null Hypothesis: paired independence ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 -0.146 ## 2 2 0.19 ## 3 3 0.042 ## 4 4 0.034 ## 5 5 -0.138 ## 6 6 -0.03 ## 7 7 0.174 ## 8 8 0.066 ## 9 9 0.01 ## 10 10 0.13 
## # ℹ 990 more rows # visualize the null distribution and test statistic null_dist %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? # calculate the p value from the test statistic and null distribution p_value <- null_dist %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value ## # A tibble: 1 × 1 ## p_value ## ## 1 0.028 # generate a bootstrap distribution boot_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") visualize(boot_dist) # calculate the confidence interval from the bootstrap distribution confidence_interval <- boot_dist %>% get_confidence_interval(level = .95) confidence_interval ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.390 -0.022 boot_dist %>% get_confidence_interval(type = \"se\", point_estimate = observed_statistic, level = .95) ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.383 -0.0210"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy t-Tests with infer","text":"vignette, ’ll walk conducting \\(t\\)-tests randomization-based analogue using infer. ’ll start 1-sample \\(t\\)-test, compares sample mean hypothesized true mean value. , ’ll discuss paired \\(t\\)-tests, special use case 1-sample \\(t\\)-tests, evaluate whether differences paired values (e.g. measure taken person experiment) differ 0. Finally, ’ll wrap 2-sample \\(t\\)-tests, testing difference means two populations using sample data drawn . Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. 
Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. data looks like :","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"sample-t-test","dir":"Articles","previous_headings":"","what":"1-Sample t-Test","title":"Tidy t-Tests with infer","text":"1-sample \\(t\\)-test can used test whether sample continuous data plausibly come population specified mean. example, ’ll test whether average American adult works 40 hours week using data gss. , make use hours variable, giving number hours respondents reported worked previous week. distribution hours observed data looks like : looks like respondents reported worked 40 hours, ’s quite bit variability. Let’s test whether evidence true mean number hours Americans work per week 40. infer’s randomization-based analogue 1-sample \\(t\\)-test 1-sample mean test. ’ll start showcasing test demonstrating carry theory-based \\(t\\)-test package. First, calculate observed statistic, can use specify() calculate(). observed statistic 41.382. 
Now, want compare statistic null distribution, generated assumption mean actually 40, get sense likely us see observed mean true number hours worked per week population really 40. can generate null distribution using bootstrap. bootstrap, replicate, sample size equal input sample size drawn (replacement) input sample data. allows us get sense much variability ’d expect see entire population can understand unlikely sample mean . get sense distributions look like, observed statistic falls, can use visualize(): looks like observed mean 41.382 relatively unlikely true mean actually 40 hours week. exactly, can calculate p-value: Thus, true mean number hours worked per week really 40, approximation probability see test statistic extreme 41.382 approximately 0.04. Analogously steps shown , package supplies wrapper function, t_test, carry 1-sample \\(t\\)-tests tidy data. Rather using randomization, wrappers carry theory-based \\(t\\)-test. syntax looks like : alternative approach t_test() wrapper calculate observed statistic infer pipeline supply pt function base R. Note pipeline calculate observed statistic includes call hypothesize() since \\(t\\) statistic requires hypothesized mean value. , juxtaposing \\(t\\) statistic associated distribution using pt function: Note resulting \\(t\\)-statistics two theory-based approaches .","code":"# calculate the observed statistic observed_statistic <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # generate the null distribution null_dist_1_sample <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") # visualize the null distribution and test statistic! 
null_dist_1_sample %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") # calculate the p value from the test statistic and null distribution p_value_1_sample <- null_dist_1_sample %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value_1_sample ## # A tibble: 1 × 1 ## p_value ## ## 1 0.04 t_test(gss, response = hours, mu = 40) ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 2.09 499 0.0376 two.sided 41.4 40.1 42.7 # calculate the observed statistic observed_statistic <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") %>% dplyr::pull() pt(unname(observed_statistic), df = nrow(gss) - 1, lower.tail = FALSE)*2 ## [1] 0.03756"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"sample-t-test-1","dir":"Articles","previous_headings":"","what":"2-Sample t-Test","title":"Tidy t-Tests with infer","text":"2-Sample \\(t\\)-tests evaluate difference mean values two populations using data randomly-sampled population approximately follows normal distribution. example, ’ll test Americans work number hours week regardless whether college degree using data gss. college hours variables allow us : looks like distributions centered near 40 hours week, distribution degree slightly right skewed. , note warning missing values—many respondents’ values missing. actually carrying hypothesis test, might look data collected; ’s possible whether value either columns missing related value . infer’s randomization-based analogue 2-sample \\(t\\)-test difference means test. ’ll start showcasing test demonstrating carry theory-based \\(t\\)-test package. one-sample test, calculate observed difference means, can use specify() calculate(). Note , line specify(hours ~ college), swapped syntax specify(response = hours, explanatory = college)! 
order argument calculate line gives order subtract mean values : case, ’re taking mean number hours worked degree minus mean number hours worked without degree; positive difference, , mean people degrees worked without degree. Now, want compare difference means null distribution, generated assumption number hours worked week relationship whether one college degree, get sense likely us see observed difference means really relationship two variables. can generate null distribution using permutation, , replicate, value degree status randomly reassigned (without replacement) new number hours worked per week sample order break association two. , note , lines specify(hours ~ college) chunk, used syntax specify(response = hours, explanatory = college) instead! get sense distributions look like, observed statistic falls, can use visualize(). looks like observed statistic 1.5384 unlikely truly relationship degree status number hours worked. exactly, can calculate p-value; theoretical p-values yet supported, ’ll use randomization-based null distribution calculate p-value. Thus, really relationship number hours worked week whether one college degree, probability see statistic extreme 1.5384 approximately 0.25. Note , similarly steps shown , package supplies wrapper function, t_test, carry 2-sample \\(t\\)-tests tidy data. syntax looks like : example, specified relationship syntax formula = hours ~ college; also written response = hours, explanatory = college. alternative approach t_test() wrapper calculate observed statistic infer pipeline supply pt function base R. can calculate statistic , switching stat = \"diff means\" argument stat = \"t\". Note pipeline calculate observed statistic includes hypothesize() since \\(t\\) statistic requires hypothesized mean value. 
, juxtaposing \\(t\\) statistic associated distribution using pt function: Note results two theory-based approaches nearly .","code":"# calculate the observed statistic observed_statistic <- gss %>% specify(hours ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) observed_statistic ## Response: hours (numeric) ## Explanatory: college (factor) ## # A tibble: 1 × 1 ## stat ## ## 1 1.54 # generate the null distribution with randomization null_dist_2_sample <- gss %>% specify(hours ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) # visualize the randomization-based null distribution and test statistic! null_dist_2_sample %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") # calculate the p value from the randomization-based null # distribution and the observed statistic p_value_2_sample <- null_dist_2_sample %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value_2_sample ## # A tibble: 1 × 1 ## p_value ## ## 1 0.25 t_test(x = gss, formula = hours ~ college, order = c(\"degree\", \"no degree\"), alternative = \"two-sided\") ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 1.12 366. 0.264 two.sided 1.54 -1.16 4.24 # calculate the observed statistic observed_statistic <- gss %>% specify(hours ~ college) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) %>% dplyr::pull() observed_statistic ## t ## 1.119 pt(unname(observed_statistic), df = nrow(gss) - 2, lower.tail = FALSE)*2 ## [1] 0.2635"},{"path":"https://infer.tidymodels.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Andrew Bray. Author. Chester Ismay. Author. Evgeni Chasnovski. Author. Simon Couch. Author, maintainer. Ben Baumer. 
Author. Mine Cetinkaya-Rundel. Author. Ted Laderas. Contributor. Nick Solomon. Contributor. Johanna Hardin. Contributor. Albert Y. Kim. Contributor. Neal Fultz. Contributor. Doug Friedman. Contributor. Richie Cotton. Contributor. Brian Fannin. Contributor.","code":""},{"path":"https://infer.tidymodels.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Couch et al., (2021). infer: R package tidyverse-friendly statistical inference. Journal Open Source Software, 6(65), 3661, https://doi.org/10.21105/joss.03661","code":"@Article{, title = {{infer}: An {R} package for tidyverse-friendly statistical inference}, author = {Simon P. Couch and Andrew P. Bray and Chester Ismay and Evgeni Chasnovski and Benjamin S. Baumer and Mine Çetinkaya-Rundel}, journal = {Journal of Open Source Software}, year = {2021}, volume = {6}, number = {65}, pages = {3661}, doi = {10.21105/joss.03661}, }"},{"path":"https://infer.tidymodels.org/dev/index.html","id":"infer-r-package-","dir":"","previous_headings":"","what":"Tidy Statistical Inference","title":"Tidy Statistical Inference","text":"objective package perform statistical inference using expressive statistical grammar coheres tidyverse design framework. package centered around 4 main verbs, supplemented many utilities visualize extract value outputs. specify() allows specify variable, relationship variables, ’re interested . hypothesize() allows declare null hypothesis. generate() allows generate data reflecting null hypothesis. calculate() allows calculate distribution statistics generated data form null distribution. learn principles underlying package design, see vignette(\"infer\"). 
’re interested learning randomization-based statistical inference generally, including applied examples package, recommend checking Statistical Inference Via Data Science: ModernDive R Tidyverse Introduction Modern Statistics.","code":""},{"path":"https://infer.tidymodels.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tidy Statistical Inference","text":"install current stable version infer CRAN: install developmental stable version infer, make sure install pak first. pkgdown website version infer.tidymodels.org.","code":"install.packages(\"infer\") # install.packages(\"pak\") pak::pak(\"tidymodels/infer\")"},{"path":"https://infer.tidymodels.org/dev/index.html","id":"contributing","dir":"","previous_headings":"","what":"Contributing","title":"Tidy Statistical Inference","text":"welcome others helping us make package user-friendly efficient possible. Please review contributing conduct guidelines. participating project agree abide terms. questions discussions tidymodels packages, modeling, machine learning, please post Posit Community. think encountered bug, please submit issue. Either way, learn create share reprex (minimal, reproducible example), clearly communicate code. Check details contributing guidelines tidymodels packages get help.","code":""},{"path":"https://infer.tidymodels.org/dev/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Tidy Statistical Inference","text":"examples pulled “Full infer Pipeline Examples” vignette, accessible calling vignette(\"observed_stat_examples\"). make use gss dataset supplied package, providing sample data General Social Survey. data looks like : example, ’ll run analysis variance age partyid, testing whether age respondent independent political party affiliation. 
Calculating observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, Note formula non-formula interfaces (.e. age ~ partyid vs. response = age, explanatory = partyid) work implemented inference procedures infer. Use whatever natural . modeling using functions like lm() glm(), though, recommend begin use formula y ~ x notation soon possible. resources available package vignettes! See vignette(\"observed_stat_examples\") examples like one , vignette(\"infer\") discussion underlying principles package design.","code":"# load in the dataset data(gss) # take a glimpse at it str(gss) ## tibble [500 × 11] (S3: tbl_df/tbl/data.frame) ## $ year : num [1:500] 2014 1994 1998 1996 1994 ... ## $ age : num [1:500] 36 34 24 42 31 32 48 36 30 33 ... ## $ sex : Factor w/ 2 levels \"male\",\"female\": 1 2 1 1 1 2 2 2 2 2 ... ## $ college: Factor w/ 2 levels \"no degree\",\"degree\": 2 1 2 1 2 1 1 2 2 1 ... ## $ partyid: Factor w/ 5 levels \"dem\",\"ind\",\"rep\",..: 2 3 2 2 3 3 1 2 3 1 ... ## $ hompop : num [1:500] 3 4 1 4 2 4 2 1 5 2 ... ## $ hours : num [1:500] 50 31 40 40 40 53 32 20 40 40 ... ## $ income : Ord.factor w/ 12 levels \"lt $1000\"<\"$1000 to 2999\"<..: 12 11 12 12 12 12 12 12 12 10 ... ## $ class : Factor w/ 6 levels \"lower class\",..: 3 2 2 2 3 3 2 3 3 2 ... ## $ finrela: Factor w/ 6 levels \"far below average\",..: 2 2 2 4 4 3 2 4 3 1 ... ## $ weight : num [1:500] 0.896 1.083 0.55 1.086 1.083 ... 
F_hat <- gss %>% specify(age ~ partyid) %>% calculate(stat = \"F\") null_dist <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"F\") visualize(null_dist) + shade_p_value(obs_stat = F_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = F_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.06"},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":null,"dir":"Reference","previous_headings":"","what":"Define a theoretical distribution — assume","title":"Define a theoretical distribution — assume","text":"function allows user define null distribution based theoretical methods. many infer pipelines, assume() can used place generate() calculate() create null distribution. Rather outputting data frame containing distribution test statistics calculated resamples observed data, assume() outputs abstract type object just containing distributional details supplied distribution df arguments. However, assume() output can passed visualize(), get_p_value(), get_confidence_interval() way simulation-based distributions can. define theoretical null distribution (use hypothesis testing), sure provide null hypothesis via hypothesize(). define theoretical sampling distribution (use confidence intervals), provide output specify(). 
Sampling distributions (implemented t z) lie scale data, recentered rescaled match corresponding stat given calculate() calculate observed statistic.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Define a theoretical distribution — assume","text":"","code":"assume(x, distribution, df = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Define a theoretical distribution — assume","text":"x output specify() hypothesize(), giving observed data, variable(s) interest, (optionally) null hypothesis. distribution distribution question, string. One \"F\", \"Chisq\", \"t\", \"z\". df Optional. degrees freedom parameter(s) distribution supplied, numeric vector. distribution = \"F\", length two (e.g. c(10, 3)). distribution = \"Chisq\" distribution = \"t\", length one. distribution = \"z\", argument required. package supply message supplied df argument different recognized values. See Details section information. ... Currently ignored.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Define a theoretical distribution — assume","text":"infer theoretical distribution can passed helpers like visualize(), get_p_value(), get_confidence_interval().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Define a theoretical distribution — assume","text":"Note assumption expressed , use theory-based inference, extends distributional assumptions: null distribution question parameters. Statistical inference infer, whether carried via simulation (.e. based pipelines using generate() calculate()) theory (.e. 
assume()), always involves condition observations independent. infer supports theoretical tests one two means via t distribution one two proportions via z. tests comparing two means, n1 group size one level explanatory variable, n2 level, infer recognize following degrees freedom (df) arguments: min(n1 - 1, n2 - 1); n1 + n2 - 2; \"parameter\" entry analogous stats::t.test() call; \"parameter\" entry analogous stats::t.test() call var.equal = TRUE. default, package use \"parameter\" entry analogous stats::t.test() call var.equal = FALSE (default).","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Define a theoretical distribution — assume","text":"","code":"# construct theoretical distributions --------------------------------- # F distribution # with the `partyid` explanatory variable gss %>% specify(age ~ partyid) %>% assume(distribution = \"F\") #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. #> An F distribution with 3 and 496 degrees of freedom. # Chi-squared goodness of fit distribution # on the `finrela` variable gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% assume(\"Chisq\") #> A Chi-squared distribution with 5 degrees of freedom. # Chi-squared test of independence # on the `finrela` and `sex` variables gss %>% specify(formula = finrela ~ sex) %>% assume(distribution = \"Chisq\") #> A Chi-squared distribution with 5 degrees of freedom. # T distribution gss %>% specify(age ~ college) %>% assume(\"t\") #> A T distribution with 423 degrees of freedom. # Z distribution gss %>% specify(response = sex, success = \"female\") %>% assume(\"z\") #> A Z distribution. 
if (FALSE) { # each of these distributions can be passed to infer helper # functions alongside observed statistics! # for example, a 1-sample t-test ------------------------------------- # calculate the observed statistic obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # construct a null distribution null_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # juxtapose them visually visualize(null_dist) + shade_p_value(obs_stat, direction = \"both\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"both\") # or, an F test ------------------------------------------------------ # calculate the observed statistic obs_stat <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"F\") # construct a null distribution null_dist <- gss %>% specify(age ~ partyid) %>% assume(distribution = \"F\") # juxtapose them visually visualize(null_dist) + shade_p_value(obs_stat, direction = \"both\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"both\") }"},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate summary statistics — calculate","title":"Calculate summary statistics — calculate","text":"Given output specify() /hypothesize(), function return observed statistic specified stat argument. test statistics, Chisq, t, z, require null hypothesis. provided output generate(), function calculate supplied stat replicate. 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate summary statistics — calculate","text":"","code":"calculate( x, stat = c(\"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff in means\", \"diff in medians\", \"diff in props\", \"Chisq\", \"F\", \"slope\", \"correlation\", \"t\", \"z\", \"ratio of props\", \"odds ratio\", \"ratio of means\"), order = NULL, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate summary statistics — calculate","text":"x output generate() computation-based inference output hypothesize() piped theory-based inference. stat string giving type statistic calculate. Current options include \"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff means\", \"diff medians\", \"diff props\", \"Chisq\" (\"chisq\"), \"F\" (\"f\"), \"t\", \"z\", \"ratio props\", \"slope\", \"odds ratio\", \"ratio means\", \"correlation\". infer supports theoretical tests one two means via \"t\" distribution one two proportions via \"z\". order string vector specifying order levels explanatory variable ordered subtraction (division ratio-based statistics), order = c(\"first\", \"second\") means (\"first\" - \"second\"), analogue ratios. Needed inference difference means, medians, proportions, ratios, t, z statistics. ... pass options like na.rm = TRUE functions like mean(), sd(), etc. 
Can also used supply hypothesized null values \"t\" statistic additional arguments stats::chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate summary statistics — calculate","text":"tibble containing stat column calculated statistics.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"missing-levels-in-small-samples","dir":"Reference","previous_headings":"","what":"Missing levels in small samples","title":"Calculate summary statistics — calculate","text":"cases, bootstrapping small samples, generated bootstrap samples one level explanatory variable present. test statistics, calculated statistic cases NaN. package omit non-finite values visualizations (warning) raise error p-value calculations.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Calculate summary statistics — calculate","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 5 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. 
Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate summary statistics — calculate","text":"","code":"# calculate a null distribution of hours worked per week under # the null hypothesis that the mean is 40 gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 200, type = \"bootstrap\") %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 39.2 #> 2 2 39.4 #> 3 3 40.1 #> 4 4 39.6 #> 5 5 40.8 #> 6 6 39.9 #> 7 7 39.9 #> 8 8 40.8 #> 9 9 39.6 #> 10 10 41.0 #> # ℹ 190 more rows # calculate the corresponding observed statistic gss %>% specify(response = hours) %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # calculate a null distribution assuming independence between age # of respondent and whether they have a college degree gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") 
%>% generate(reps = 200, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> Explanatory: college (factor) #> Null Hypothesis: independence #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 -2.48 #> 2 2 -0.699 #> 3 3 -0.0113 #> 4 4 0.579 #> 5 5 0.553 #> 6 6 1.84 #> 7 7 -2.31 #> 8 8 -0.320 #> 9 9 -0.00250 #> 10 10 -1.78 #> # ℹ 190 more rows # calculate the corresponding observed statistic gss %>% specify(age ~ college) %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # some statistics require a null hypothesis gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy chi-squared test statistic — chisq_stat","title":"Tidy chi-squared test statistic — chisq_stat","text":"@description","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy chi-squared test statistic — chisq_stat","text":"","code":"chisq_stat(x, formula, response = NULL, explanatory = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy chi-squared test statistic — chisq_stat","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. 
explanatory variable name x serve explanatory variable. alternative using formula argument. ... Additional arguments chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tidy chi-squared test statistic — chisq_stat","text":"shortcut wrapper function get observed test statistic chisq test. Uses chisq.test(), applies continuity correction. function deprecated favor general observe().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy chi-squared test statistic — chisq_stat","text":"","code":"# chi-squared test statistic for test of independence # of college completion status depending on one's # self-identified income class chisq_stat(gss, college ~ finrela) #> Warning: The chisq_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. #> X-squared #> 30.68252 # chi-squared test statistic for a goodness of fit # test on whether self-identified income class # follows a uniform distribution chisq_stat(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) #> Warning: The chisq_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. 
#> X-squared #> 487.984"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy chi-squared test — chisq_test","title":"Tidy chi-squared test — chisq_test","text":"tidier version chisq.test() goodness fit tests tests independence.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy chi-squared test — chisq_test","text":"","code":"chisq_test(x, formula, response = NULL, explanatory = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy chi-squared test — chisq_test","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. ... 
Additional arguments chisq.test().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy chi-squared test — chisq_test","text":"","code":"# chi-squared test of independence for college completion # status depending on one's self-identified income class chisq_test(gss, college ~ finrela) #> Warning: Chi-squared approximation may be incorrect #> # A tibble: 1 × 3 #> statistic chisq_df p_value #> #> 1 30.7 5 0.0000108 # chi-squared goodness of fit test on whether self-identified # income class follows a uniform distribution chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) #> # A tibble: 1 × 3 #> statistic chisq_df p_value #> #> 1 488. 5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"Deprecated functions and objects — deprecated","title":"Deprecated functions and objects — deprecated","text":"functions objects longer used. removed future release infer.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Deprecated functions and objects — deprecated","text":"","code":"conf_int(x, level = 0.95, type = \"percentile\", point_estimate = NULL) p_value(x, obs_stat, direction)"},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Deprecated functions and objects — deprecated","text":"x See non-deprecated function. level See non-deprecated function. type See non-deprecated function. point_estimate See non-deprecated function. obs_stat See non-deprecated function. 
direction See non-deprecated function.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":null,"dir":"Reference","previous_headings":"","what":"Fit linear models to infer objects — fit.infer","title":"Fit linear models to infer objects — fit.infer","text":"Given output infer core function, function fit linear model using stats::glm() according formula data supplied earlier pipeline. passed output specify() hypothesize(), function fit one model. passed output generate(), fit model data resample, denoted replicate column. family fitted model depends type response variable. response numeric, fit() use family = \"gaussian\" (linear regression). response 2-level factor character, fit() use family = \"binomial\" (logistic regression). fit character factor response variables two levels, recommend parsnip::multinom_reg(). infer provides fit \"method\" infer objects, way carrying model fitting applied infer output. \"generic,\" imported generics package re-exported package, provides general form fit() points infer's method called infer object. generic also documented . Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fit linear models to infer objects — fit.infer","text":"","code":"# S3 method for infer fit(object, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fit linear models to infer objects — fit.infer","text":"object Output infer function---likely generate() specify()---specifies formula data fit model . ... optional arguments pass along model fitting function. 
See stats::glm() information.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fit linear models to infer objects — fit.infer","text":"tibble containing following columns: replicate: supplied input object previously passed generate(). number corresponding resample original data set model fitted . term: explanatory variable (intercept) question. estimate: model coefficient given resample (replicate) explanatory variable (term).","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fit linear models to infer objects — fit.infer","text":"Randomization-based statistical inference multiple explanatory variables requires careful consideration null hypothesis question implications permutation procedures. Inference partial regression coefficients via permutation method implemented generate() multiple explanatory variables, consistent meaning elsewhere package, subject additional distributional assumptions beyond required one explanatory variable. Namely, distribution response variable must similar distribution errors null hypothesis' specification fixed effect explanatory variables. (null hypothesis reflected variables argument generate(). default, explanatory variables treated fixed.) general rule thumb , large outliers distributions explanatory variables, distributional assumption satisfied; response variable permuted, (presumably outlying) value response longer paired outlier explanatory variable, causing outsize effect resulting slope coefficient explanatory variable. sophisticated methods outside scope package requiring fewer---less strict---distributional assumptions exist. overview, see \"Permutation tests univariate multivariate analysis variance regression\" (Marti J. 
Anderson, 2001), doi:10.1139/cjfas-58-3-626 .","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Fit linear models to infer objects — fit.infer","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 5 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fit linear models to infer objects — fit.infer","text":"","code":"# fit a linear model predicting number of hours worked per # week using respondent age and degree status. 
observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 43.4 #> 2 1 age -0.0457 #> 3 1 collegedegree -0.481 #> 4 2 intercept 41.2 #> 5 2 age 0.00565 #> 6 2 collegedegree -0.212 #> 7 3 intercept 40.3 #> 8 3 age 0.0314 #> 9 3 collegedegree -0.510 #> 10 4 intercept 40.5 #> # ℹ 290 more rows # for logistic regression, just supply a binary response variable! # (this can also be made explicit via the `family` argument in ...) gss %>% specify(college ~ age + hours) %>% fit() #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept -1.13 #> 2 age 0.00527 #> 3 hours 0.00698 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate resamples, permutations, or simulations — generate","title":"Generate resamples, permutations, or simulations — generate","text":"Generation creates simulated distribution specify(). context confidence intervals, bootstrap distribution based result specify(). context hypothesis testing, null distribution based result specify() hypothesize(). 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate resamples, permutations, or simulations — generate","text":"","code":"generate(x, reps = 1, type = NULL, variables = !!response_expr(x), ...)"},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate resamples, permutations, or simulations — generate","text":"x data frame can coerced tibble. reps number resamples generate. type method used generate resamples observed data reflecting null hypothesis. Currently one \"bootstrap\", \"permute\", \"draw\" (see ). variables type = \"permute\", set unquoted column names data permute (independently ). Defaults response variable. Note derived effects depend columns (e.g., interaction effects) also affected. ... Currently ignored.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate resamples, permutations, or simulations — generate","text":"tibble containing reps generated datasets, indicated replicate column.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"generation-types","dir":"Reference","previous_headings":"","what":"Generation Types","title":"Generate resamples, permutations, or simulations — generate","text":"type argument determines method used create null distribution. bootstrap: bootstrap sample drawn replicate, sample size equal input sample size drawn (replacement) input sample data. permute: replicate, input value randomly reassigned (without replacement) new output value sample. draw: value sampled theoretical distribution parameter p specified hypothesize() replicate. option currently applicable testing one proportion. 
generation type previously called \"simulate\", superseded.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Generate resamples, permutations, or simulations — generate","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 5 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate resamples, permutations, or simulations — generate","text":"","code":"# generate a null distribution by taking 200 bootstrap samples gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 
40) %>% generate(reps = 200, type = \"bootstrap\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 100,000 × 2 #> # Groups: replicate [200] #> replicate hours #> #> 1 1 48.6 #> 2 1 38.6 #> 3 1 38.6 #> 4 1 8.62 #> 5 1 38.6 #> 6 1 38.6 #> 7 1 18.6 #> 8 1 38.6 #> 9 1 38.6 #> 10 1 58.6 #> # ℹ 99,990 more rows # generate a null distribution for the independence of # two variables by permuting their values 200 times gss %>% specify(partyid ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 200, type = \"permute\") #> Dropping unused factor levels DK from the supplied response variable #> 'partyid'. #> Response: partyid (factor) #> Explanatory: age (numeric) #> Null Hypothesis: independence #> # A tibble: 100,000 × 3 #> # Groups: replicate [200] #> partyid age replicate #> #> 1 rep 36 1 #> 2 ind 34 1 #> 3 dem 24 1 #> 4 dem 42 1 #> 5 ind 31 1 #> 6 dem 32 1 #> 7 ind 48 1 #> 8 rep 36 1 #> 9 ind 30 1 #> 10 ind 33 1 #> # ℹ 99,990 more rows # generate a null distribution via sampling from a # binomial distribution 200 times gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 200, type = \"draw\") %>% calculate(stat = \"z\") #> Response: sex (factor) #> Null Hypothesis: point #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 0.537 #> 2 2 0.447 #> 3 3 -0.447 #> 4 4 -0.984 #> 5 5 1.70 #> 6 6 1.52 #> 7 7 0.0894 #> 8 8 -1.25 #> 9 9 -0.268 #> 10 10 -0.805 #> # ℹ 190 more rows # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute confidence interval — get_confidence_interval","title":"Compute confidence interval — get_confidence_interval","text":"Compute confidence interval around summary statistic. 
simulation-based theoretical methods supported, though type = \"se\" supported theoretical methods. Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute confidence interval — get_confidence_interval","text":"","code":"get_confidence_interval(x, level = 0.95, type = NULL, point_estimate = NULL) get_ci(x, level = 0.95, type = NULL, point_estimate = NULL)"},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute confidence interval — get_confidence_interval","text":"x distribution. simulation-based inference, data frame containing distribution calculate()d statistics fit()ted coefficient estimates. object passed generate() supplied calculate() fit(). theory-based inference, output assume(). Distributions confidence intervals require null hypothesis via hypothesize(). level numerical value 0 1 giving confidence level. Default value 0.95. type string giving method used creating confidence interval. default \"percentile\" \"se\" corresponding (multiplier * standard error) \"bias-corrected\" bias-corrected interval options. point_estimate data frame containing observed statistic (calculate()-based workflow) observed fit (fit()-based workflow). object likely output calculate() fit() need passed generate(). Set NULL default. Must provided type \"se\" \"bias-corrected\".","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute confidence interval — get_confidence_interval","text":"tibble containing following columns: term: explanatory variable (intercept) question. supplied input previously passed fit(). 
lower_ci, upper_ci: lower upper bounds confidence interval, respectively.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute confidence interval — get_confidence_interval","text":"null hypothesis required compute confidence interval. However, including hypothesize() pipeline leading get_confidence_interval() break anything. can useful computing confidence interval using distribution used compute p-value. Theoretical confidence intervals (.e. calculated supplying output assume() x argument) require point estimate lies scale data. distribution defined assume() recentered rescaled align point estimate, can shown output visualize() paired shade_confidence_interval(). Confidence intervals implemented following distributions point estimates: distribution = \"t\": point_estimate output calculate() stat = \"mean\" stat = \"diff means\" distribution = \"z\": point_estimate output calculate() stat = \"prop\" stat = \"diff props\"","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"aliases","dir":"Reference","previous_headings":"","what":"Aliases","title":"Compute confidence interval — get_confidence_interval","text":"get_ci() alias get_confidence_interval(). 
conf_int() deprecated alias get_confidence_interval().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute confidence interval — get_confidence_interval","text":"","code":"boot_dist <- gss %>% # We're interested in the number of hours worked per week specify(response = hours) %>% # Generate bootstrap samples generate(reps = 1000, type = \"bootstrap\") %>% # Calculate mean of each bootstrap sample calculate(stat = \"mean\") boot_dist %>% # Calculate the confidence interval around the point estimate get_confidence_interval( # At the 95% confidence level; percentile method level = 0.95 ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.2 42.7 # for type = \"se\" or type = \"bias-corrected\" we need a point estimate sample_mean <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") boot_dist %>% get_confidence_interval( point_estimate = sample_mean, # At the 95% confidence level level = 0.95, # Using the standard error method type = \"se\" ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.1 42.7 # using a theoretical distribution ----------------------------------- # define a sampling distribution sampling_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # get the confidence interval---note that the # point estimate is required here get_confidence_interval( sampling_dist, level = .95, point_estimate = sample_mean ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.1 42.7 # using a model fitting workflow ----------------------- # fit a linear model predicting number of hours worked per # week using respondent age and degree status. 
observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 44.2 #> 2 1 age -0.0765 #> 3 1 collegedegree 0.676 #> 4 2 intercept 41.5 #> 5 2 age -0.000968 #> 6 2 collegedegree -0.329 #> 7 3 intercept 41.4 #> 8 3 age 0.0131 #> 9 3 collegedegree -1.50 #> 10 4 intercept 42.0 #> # ℹ 290 more rows get_confidence_interval( null_fits, point_estimate = observed_fit, level = .95 ) #> # A tibble: 3 × 3 #> term lower_ci upper_ci #> #> 1 age -0.0846 0.0856 #> 2 collegedegree -2.10 2.81 #> 3 intercept 38.1 44.7 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute p-value — get_p_value","title":"Compute p-value — get_p_value","text":"Compute p-value null distribution observed statistic. 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute p-value — get_p_value","text":"","code":"get_p_value(x, obs_stat, direction) # S3 method for default get_p_value(x, obs_stat, direction) get_pvalue(x, obs_stat, direction) # S3 method for infer_dist get_p_value(x, obs_stat, direction)"},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute p-value — get_p_value","text":"x null distribution. simulation-based inference, data frame containing distribution calculate()d statistics fit()ted coefficient estimates. object passed generate() supplied calculate() fit(). theory-based inference, output assume(). obs_stat data frame containing observed statistic (calculate()-based workflow) observed fit (fit()-based workflow). object likely output calculate() fit() need passed generate(). direction character string. Options \"less\", \"greater\", \"two-sided\". Can also use \"left\", \"right\", \"\", \"two_sided\", \"two sided\", \"two.sided\".","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute p-value — get_p_value","text":"tibble containing following columns: term: explanatory variable (intercept) question. supplied input previously passed fit(). p_value: value [0, 1] giving probability statistic/coefficient extreme observed statistic/coefficient occur null hypothesis true.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"aliases","dir":"Reference","previous_headings":"","what":"Aliases","title":"Compute p-value — get_p_value","text":"get_pvalue() alias get_p_value(). 
p_value deprecated alias get_p_value().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"zero-p-value","dir":"Reference","previous_headings":"","what":"Zero p-value","title":"Compute p-value — get_p_value","text":"Though true p-value 0 impossible, get_p_value() may return 0 cases. due simulation-based nature {infer} package; output function approximation based number reps chosen generate() step. observed statistic unlikely given null hypothesis, small number reps generated form null distribution, possible observed statistic extreme every test statistic generated form null distribution, resulting approximate p-value 0. case, true p-value small value likely less 3/reps (based poisson approximation). case p-value zero reported, warning message raised caution user reporting p-value exactly equal 0.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute p-value — get_p_value","text":"","code":"# using a simulation-based null distribution ------------------------------ # find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # starting with the gss dataset gss %>% # ...we're interested in the number of hours worked per week specify(response = hours) %>% # hypothesizing that the mean is 40 hypothesize(null = \"point\", mu = 40) %>% # generating data points for a null distribution generate(reps = 1000, type = \"bootstrap\") %>% # finding the null distribution calculate(stat = \"mean\") %>% get_p_value(obs_stat = point_estimate, direction = \"two-sided\") #> # A tibble: 1 × 1 #> p_value #> #> 1 0.032 # using a theoretical null distribution ----------------------------------- # calculate the observed statistic obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% 
calculate(stat = \"t\") # define a null distribution null_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"both\") #> # A tibble: 1 × 1 #> p_value #> #> 1 0.0376 # using a model fitting workflow ----------------------------------------- # fit a linear model predicting number of hours worked per # week using respondent age and degree status. observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 40.7 #> 2 1 age -0.00753 #> 3 1 collegedegree 2.78 #> 4 2 intercept 41.8 #> 5 2 age -0.000256 #> 6 2 collegedegree -1.08 #> 7 3 intercept 42.7 #> 8 3 age -0.0426 #> 9 3 collegedegree 1.23 #> 10 4 intercept 42.6 #> # ℹ 290 more rows get_p_value(null_fits, obs_stat = observed_fit, direction = \"two-sided\") #> # A tibble: 3 × 2 #> term p_value #> #> 1 age 0.92 #> 2 collegedegree 0.26 #> 3 intercept 0.68 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of data from the General Social Survey (GSS). — gss","title":"Subset of data from the General Social Survey (GSS). — gss","text":"General Social Survey high-quality survey gathers data American society opinions, conducted since 1972. 
data set sample 500 entries GSS, spanning years 1973-2018, including demographic markers economic variables. Note data included demonstration , assumed provide accurate estimates relating GSS. However, due high quality GSS, unweighted data approximate weighted data analyses.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of data from the General Social Survey (GSS). — gss","text":"","code":"gss"},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of data from the General Social Survey (GSS). — gss","text":"tibble 500 rows 11 variables: year year respondent surveyed age age time survey, truncated 89 sex respondent's sex (self-identified) college whether respondent college degree, including junior/community college partyid political party affiliation hompop number persons household hours number hours worked week survey, truncated 89 income total family income class subjective socioeconomic class identification finrela opinion family income weight survey weight","code":""},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of data from the General Social Survey (GSS). — gss","text":"https://gss.norc.org","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":null,"dir":"Reference","previous_headings":"","what":"Declare a null hypothesis — hypothesize","title":"Declare a null hypothesis — hypothesize","text":"Declare null hypothesis variables selected specify(). 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Declare a null hypothesis — hypothesize","text":"","code":"hypothesize(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL) hypothesise(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL)"},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Declare a null hypothesis — hypothesize","text":"x data frame can coerced tibble. null null hypothesis. Options include \"independence\", \"point\", \"paired independence\". independence: used response explanatory variable. Indicates values specified response variable independent associated values explanatory. point: used response variable. Indicates point estimate based values response associated parameter. Sometimes requires supplying one p, mu, med, sigma. paired independence: used response variable giving pre-computed difference paired observations. Indicates order subtraction paired values affect resulting distribution. p true proportion successes (number 0 1). used point null hypotheses specified response variable categorical. mu true mean (numerical value). used point null hypotheses specified response variable continuous. med true median (numerical value). used point null hypotheses specified response variable continuous. sigma true standard deviation (numerical value). 
used point null hypotheses.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Declare a null hypothesis — hypothesize","text":"tibble containing response (explanatory, specified) variable data parameter information stored well.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Declare a null hypothesis — hypothesize","text":"","code":"# hypothesize independence of two variables gss %>% specify(college ~ partyid, success = \"degree\") %>% hypothesize(null = \"independence\") #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. #> Response: college (factor) #> Explanatory: partyid (factor) #> Null Hypothesis: independence #> # A tibble: 500 × 2 #> college partyid #> #> 1 degree ind #> 2 no degree rep #> 3 degree ind #> 4 no degree ind #> 5 degree rep #> 6 no degree rep #> 7 no degree dem #> 8 degree ind #> 9 degree rep #> 10 no degree dem #> # ℹ 490 more rows # hypothesize a mean number of hours worked per week of 40 gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 500 × 1 #> hours #> #> 1 50 #> 2 31 #> 3 40 #> 4 40 #> 5 40 #> 6 53 #> 7 32 #> 8 20 #> 9 40 #> 10 40 #> # ℹ 490 more rows # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":null,"dir":"Reference","previous_headings":"","what":"infer: a grammar for statistical inference — infer","title":"infer: a grammar for statistical inference — infer","text":"objective package perform statistical inference using grammar illustrates underlying concepts format coheres 
tidyverse.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"infer: a grammar for statistical inference — infer","text":"overview use core functionality, see vignette(\"infer\")","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"infer: a grammar for statistical inference — infer","text":"Maintainer: Simon Couch simon.couch@posit.co (ORCID) Authors: Andrew Bray abray@reed.edu Chester Ismay chester.ismay@gmail.com (ORCID) Evgeni Chasnovski evgeni.chasnovski@gmail.com (ORCID) Ben Baumer ben.baumer@gmail.com (ORCID) Mine Cetinkaya-Rundel mine@stat.duke.edu (ORCID) contributors: Ted Laderas tedladeras@gmail.com (ORCID) [contributor] Nick Solomon nick.solomon@datacamp.com [contributor] Johanna Hardin Jo.Hardin@pomona.edu [contributor] Albert Y. Kim albert.ys.kim@gmail.com (ORCID) [contributor] Neal Fultz nfultz@gmail.com [contributor] Doug Friedman doug.nhp@gmail.com [contributor] Richie Cotton richie@datacamp.com (ORCID) [contributor] Brian Fannin captain@pirategrunt.com [contributor]","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate observed statistics — observe","title":"Calculate observed statistics — observe","text":"function wrapper calls specify(), hypothesize(), calculate() consecutively can used calculate observed statistics data. hypothesize() called point null hypothesis parameter supplied. 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate observed statistics — observe","text":"","code":"observe( x, formula, response = NULL, explanatory = NULL, success = NULL, null = NULL, p = NULL, mu = NULL, med = NULL, sigma = NULL, stat = c(\"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff in means\", \"diff in medians\", \"diff in props\", \"Chisq\", \"F\", \"slope\", \"correlation\", \"t\", \"z\", \"ratio of props\", \"odds ratio\"), order = NULL, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate observed statistics — observe","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. success level response considered success, string. Needed inference one proportion, difference proportions, corresponding z stats. null null hypothesis. Options include \"independence\", \"point\", \"paired independence\". independence: used response explanatory variable. Indicates values specified response variable independent associated values explanatory. point: used response variable. Indicates point estimate based values response associated parameter. Sometimes requires supplying one p, mu, med, sigma. paired independence: used response variable giving pre-computed difference paired observations. Indicates order subtraction paired values affect resulting distribution. p true proportion successes (number 0 1). used point null hypotheses specified response variable categorical. mu true mean (numerical value). 
used point null hypotheses specified response variable continuous. med true median (numerical value). used point null hypotheses specified response variable continuous. sigma true standard deviation (numerical value). used point null hypotheses. stat string giving type statistic calculate. Current options include \"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff means\", \"diff medians\", \"diff props\", \"Chisq\" (\"chisq\"), \"F\" (\"f\"), \"t\", \"z\", \"ratio props\", \"slope\", \"odds ratio\", \"ratio means\", \"correlation\". infer supports theoretical tests one two means via \"t\" distribution one two proportions via \"z\". order string vector specifying order levels explanatory variable ordered subtraction (division ratio-based statistics), order = c(\"first\", \"second\") means (\"first\" - \"second\"), analogue ratios. Needed inference difference means, medians, proportions, ratios, t, z statistics. ... pass options like na.rm = TRUE functions like mean(), sd(), etc. 
Can also used supply hypothesized null values \"t\" statistic additional arguments stats::chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate observed statistics — observe","text":"1-column tibble containing calculated statistic stat.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate observed statistics — observe","text":"","code":"# calculating the observed mean number of hours worked per week gss %>% observe(hours ~ NULL, stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # calculating a t statistic for hypothesized mu = 40 hours worked/week gss %>% observe(hours ~ NULL, stat = \"t\", null = \"point\", mu = 40) #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # similarly for a difference in means in age based on whether # the respondent has a college degree observe( gss, age ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\") ) #> Response: age (numeric) #> Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # equivalently, calculating the same statistic with the core verbs gss %>% specify(age ~ college) %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> 
Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # for a more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe — %>%","title":"Pipe — %>%","text":"Like {dplyr}, {infer} also uses pipe (%>%) function magrittr turn function composition series iterative statements.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/pipe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pipe — %>%","text":"lhs, rhs Inference functions initial data frame.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":null,"dir":"Reference","previous_headings":"","what":"Print methods — print.infer","title":"Print methods — print.infer","text":"Print methods","code":""},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print methods — print.infer","text":"","code":"# S3 method for infer print(x, ...) # S3 method for infer_layer print(x, ...) # S3 method for infer_dist print(x, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print methods — print.infer","text":"x object class infer, .e. output specify() hypothesize(), class infer_layer, .e. output shade_p_value() shade_confidence_interval(). ... 
Arguments passed methods.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy proportion test — prop_test","title":"Tidy proportion test — prop_test","text":"tidier version prop.test() equal given proportions.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy proportion test — prop_test","text":"","code":"prop_test( x, formula, response = NULL, explanatory = NULL, p = NULL, order = NULL, alternative = \"two-sided\", conf_int = TRUE, conf_level = 0.95, success = NULL, correct = NULL, z = FALSE, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy proportion test — prop_test","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. p numeric vector giving hypothesized null proportion success group. order string vector specifying order proportions subtracted, order = c(\"first\", \"second\") means \"first\" - \"second\". Ignored one-sample tests, optional two sample tests. alternative Character string giving direction alternative hypothesis. Options \"two-sided\" (default), \"greater\", \"less\". used testing null single proportion equals given value, two proportions equal; ignored otherwise. conf_int logical value whether include confidence interval . TRUE default. conf_level numeric value 0 1. Default value 0.95. success level response considered success, string. used testing null single proportion equals given value, two proportions equal; ignored otherwise. 
correct logical indicating whether Yates' continuity correction applied possible. z = TRUE, correct argument overwritten FALSE. Otherwise defaults correct = TRUE. z logical value whether report statistic standard normal deviate Pearson's chi-square statistic. \\(z^2\\) distributed chi-square 1 degree freedom, though note user likely need turn Yates' continuity correction setting correct = FALSE see connection. ... Additional arguments prop.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tidy proportion test — prop_test","text":"testing explanatory variable two levels, order argument used package longer well-defined. function thus raise warning ignore value supplied non-NULL order argument. columns present output depend output prop.test() broom::glance.htest(). See latter's documentation column definitions; columns renamed following mapping: chisq_df = parameter p_value = p.value lower_ci = conf.low upper_ci = conf.high","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy proportion test — prop_test","text":"","code":"# two-sample proportion test for difference in proportions of # college completion by respondent sex prop_test(gss, college ~ sex, order = c(\"female\", \"male\")) #> # A tibble: 1 × 6 #> statistic chisq_df p_value alternative lower_ci upper_ci #> #> 1 0.0000204 1 0.996 two.sided -0.0918 0.0834 # one-sample proportion test for hypothesized null # proportion of college completion of .2 prop_test(gss, college ~ NULL, p = .2) #> # A tibble: 1 × 4 #> statistic chisq_df p_value alternative #> #> 1 636. 
1 2.98e-140 two.sided # report as a z-statistic rather than chi-square # and specify the success level of the response prop_test(gss, college ~ NULL, success = \"degree\", p = .2, z = TRUE) #> # A tibble: 1 × 3 #> statistic p_value alternative #> #> 1 8.27 1.30e-16 two.sided"},{"path":"https://infer.tidymodels.org/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. generics fit ggplot2 ggplot_add","code":""},{"path":"https://infer.tidymodels.org/dev/reference/reexports.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Objects exported from other packages — reexports","text":"Read infer's fit function running ?fit.infer console.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":null,"dir":"Reference","previous_headings":"","what":"Perform repeated sampling — rep_sample_n","title":"Perform repeated sampling — rep_sample_n","text":"functions extend functionality dplyr::sample_n() dplyr::slice_sample() allowing repeated sampling data. operation especially helpful creating sampling distributions—see examples !","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Perform repeated sampling — rep_sample_n","text":"","code":"rep_sample_n(tbl, size, replace = FALSE, reps = 1, prob = NULL) rep_slice_sample( .data, n = NULL, prop = NULL, replace = FALSE, weight_by = NULL, reps = 1 )"},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Perform repeated sampling — rep_sample_n","text":"tbl, .data Data frame population sample. size, n, prop size n refer sample size sample. 
size argument rep_sample_n() required, rep_slice_sample() sample size defaults 1 specified. prop, argument rep_slice_sample(), refers proportion rows sample sample, rounded case prop * nrow(.data) integer. using rep_slice_sample(), please supply one n prop. replace samples taken replacement? reps Number samples take. prob, weight_by vector sampling weights rows .data—must length equal nrow(.data). weight_by, may also unquoted column name .data.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Perform repeated sampling — rep_sample_n","text":"tibble size reps * n rows corresponding reps samples size n .data, grouped replicate.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Perform repeated sampling — rep_sample_n","text":"rep_sample_n() rep_slice_sample() designed behave similar dplyr counterparts. , least following differences: case replace = FALSE size bigger number data rows rep_sample_n() give error. rep_slice_sample() n prop > 1 give warning output sample size set number rows data. 
Note dplyr::sample_n() function superseded dplyr::slice_sample().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Perform repeated sampling — rep_sample_n","text":"","code":"library(dplyr) #> #> Attaching package: ‘dplyr’ #> The following objects are masked from ‘package:stats’: #> #> filter, lag #> The following objects are masked from ‘package:base’: #> #> intersect, setdiff, setequal, union library(ggplot2) library(tibble) # take 1000 samples of size n = 50, without replacement slices <- gss %>% rep_slice_sample(n = 50, reps = 1000) slices #> # A tibble: 50,000 × 12 #> # Groups: replicate [1,000] #> replicate year age sex college partyid hompop hours income class #> #> 1 1 1994 34 female no degr… rep 4 31 $2000… work… #> 2 1 1976 21 female no degr… ind 2 40 $7000… midd… #> 3 1 1989 18 male no degr… rep 2 21 $2000… midd… #> 4 1 1996 32 female no degr… rep 4 53 $2500… midd… #> 5 1 1991 39 female no degr… dem 4 40 $2500… midd… #> 6 1 2010 57 male degree rep 3 60 $2500… midd… #> 7 1 2004 51 male degree rep 2 50 $2500… midd… #> 8 1 1998 35 male no degr… ind 6 45 $2500… midd… #> 9 1 1994 49 female no degr… ind 4 40 $2500… midd… #> 10 1 1985 51 female no degr… dem 4 28 $2500… midd… #> # ℹ 49,990 more rows #> # ℹ 2 more variables: finrela , weight # compute the proportion of respondents with a college # degree in each replicate p_hats <- slices %>% group_by(replicate) %>% summarize(prop_college = mean(college == \"degree\")) # plot sampling distribution ggplot(p_hats, aes(x = prop_college)) + geom_density() + labs( x = \"p_hat\", y = \"Number of samples\", title = \"Sampling distribution of p_hat\" ) # sampling with probability weights. 
Note probabilities are automatically # renormalized to sum to 1 df <- tibble( id = 1:5, letter = factor(c(\"a\", \"b\", \"c\", \"d\", \"e\")) ) rep_slice_sample(df, n = 2, reps = 5, weight_by = c(.5, .4, .3, .2, .1)) #> # A tibble: 10 × 3 #> # Groups: replicate [5] #> replicate id letter #> #> 1 1 3 c #> 2 1 5 e #> 3 2 5 e #> 4 2 3 c #> 5 3 1 a #> 6 3 3 c #> 7 4 1 a #> 8 4 2 b #> 9 5 1 a #> 10 5 4 d # alternatively, pass an unquoted column name in `.data` as `weight_by` df <- df %>% mutate(wts = c(.5, .4, .3, .2, .1)) rep_slice_sample(df, n = 2, reps = 5, weight_by = wts) #> # A tibble: 10 × 4 #> # Groups: replicate [5] #> replicate id letter wts #> #> 1 1 3 c 0.3 #> 2 1 1 a 0.5 #> 3 2 2 b 0.4 #> 4 2 1 a 0.5 #> 5 3 5 e 0.1 #> 6 3 3 c 0.3 #> 7 4 3 c 0.3 #> 8 4 1 a 0.5 #> 9 5 3 c 0.3 #> 10 5 4 d 0.2"},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":null,"dir":"Reference","previous_headings":"","what":"Add information about confidence interval — shade_confidence_interval","title":"Add information about confidence interval — shade_confidence_interval","text":"shade_confidence_interval() plots confidence interval region top visualize() output. output ggplot2 layer can added +. function shorter alias, shade_ci(). Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add information about confidence interval — shade_confidence_interval","text":"","code":"shade_confidence_interval( endpoints, color = \"mediumaquamarine\", fill = \"turquoise\", ... 
) shade_ci(endpoints, color = \"mediumaquamarine\", fill = \"turquoise\", ...)"},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add information about confidence interval — shade_confidence_interval","text":"endpoints lower upper bounds interval plotted. Likely, output get_confidence_interval(). calculate()-based workflows, 2-element vector 1 x 2 data frame containing lower upper values plotted. fit()-based workflows, (p + 1) x 3 data frame columns term, lower_ci, upper_ci, giving upper lower bounds regression term. use visualizations assume() output, must output get_confidence_interval(). color character hex string specifying color end points vertical lines plot. fill character hex string specifying color shade confidence interval. NULL shading actually done. ... arguments passed along ggplot2 functions.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add information about confidence interval — shade_confidence_interval","text":"added existing infer visualization, ggplot2 object displaying supplied intervals top corresponding distribution. 
Otherwise, infer_layer list.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add information about confidence interval — shade_confidence_interval","text":"","code":"# find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # ...and a bootstrap distribution boot_dist <- gss %>% # ...we're interested in the number of hours worked per week specify(response = hours) %>% # generating data points generate(reps = 1000, type = \"bootstrap\") %>% # finding the distribution from the generated data calculate(stat = \"mean\") # find a confidence interval around the point estimate ci <- boot_dist %>% get_confidence_interval(point_estimate = point_estimate, # at the 95% confidence level level = .95, # using the standard error method type = \"se\") # and plot it! boot_dist %>% visualize() + shade_confidence_interval(ci) # or just plot the bounds boot_dist %>% visualize() + shade_confidence_interval(ci, fill = NULL) # you can shade confidence intervals on top of # theoretical distributions, too---the theoretical # distribution will be recentered and rescaled to # align with the confidence interval sampling_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") visualize(sampling_dist) + shade_confidence_interval(ci) # \\donttest{ # to visualize distributions of coefficients for multiple # explanatory variables, use a `fit()`-based workflow # fit 1000 linear models with the `hours` variable permuted null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits #> # A tibble: 3,000 × 3 #> # Groups: replicate [1,000] #> replicate term estimate #> #> 1 1 intercept 40.8 #> 2 1 age 0.0153 #> 3 1 collegedegree -0.0626 #> 4 2 intercept 
40.3 #> 5 2 age 0.0278 #> 6 2 collegedegree -0.0655 #> 7 3 intercept 42.8 #> 8 3 age -0.0348 #> 9 3 collegedegree 0.0726 #> 10 4 intercept 40.7 #> # ℹ 2,990 more rows # fit a linear model to the observed data obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() obs_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # get confidence intervals for each term conf_ints <- get_confidence_interval( null_fits, point_estimate = obs_fit, level = .95 ) # visualize distributions of coefficients # generated under the null visualize(null_fits) # add a confidence interval shading layer to juxtapose # the null fits with the observed fit for each term visualize(null_fits) + shade_confidence_interval(conf_ints) # } # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":null,"dir":"Reference","previous_headings":"","what":"Shade histogram area beyond an observed statistic — shade_p_value","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"shade_p_value() plots p-value region top visualize() output. output ggplot2 layer can added +. function shorter alias, shade_pvalue(). Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"","code":"shade_p_value(obs_stat, direction, color = \"red2\", fill = \"pink\", ...) shade_pvalue(obs_stat, direction, color = \"red2\", fill = \"pink\", ...)"},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"obs_stat observed statistic estimate. 
calculate()-based workflows, 1-element numeric vector 1 x 1 data frame containing observed statistic. fit()-based workflows, (p + 1) x 2 data frame columns term estimate giving observed estimate term. direction string specifying direction shading occur. Options \"less\", \"greater\", \"two-sided\". Can also give \"left\", \"right\", \"\", \"two_sided\", \"two sided\", \"two.sided\". NULL, function shade area. color character hex string specifying color observed statistic vertical line plot. fill character hex string specifying color shade p-value region. NULL, function shade area. ... arguments passed along ggplot2 functions. expert use .","code":""},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"added existing infer visualization, ggplot2 object displaying supplied statistic top corresponding distribution. Otherwise, infer_layer list.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"","code":"# find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # ...and a null distribution null_dist <- gss %>% # ...we're interested in the number of hours worked per week specify(response = hours) %>% # hypothesizing that the mean is 40 hypothesize(null = \"point\", mu = 40) %>% # generating data points for a null distribution generate(reps = 1000, type = \"bootstrap\") %>% # estimating the null distribution calculate(stat = \"t\") # shade the p-value of the point estimate null_dist %>% visualize() + shade_p_value(obs_stat = point_estimate, direction = \"two-sided\") #> 
Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? # you can shade p-values on top of # theoretical distributions, too! null_dist_theory <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") null_dist_theory %>% visualize() + shade_p_value(obs_stat = point_estimate, direction = \"two-sided\") # \\donttest{ # to visualize distributions of coefficients for multiple # explanatory variables, use a `fit()`-based workflow # fit 1000 linear models with the `hours` variable permuted null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits #> # A tibble: 3,000 × 3 #> # Groups: replicate [1,000] #> replicate term estimate #> #> 1 1 intercept 42.3 #> 2 1 age -0.0191 #> 3 1 collegedegree -0.303 #> 4 2 intercept 37.2 #> 5 2 age 0.105 #> 6 2 collegedegree -0.0498 #> 7 3 intercept 40.3 #> 8 3 age 0.0240 #> 9 3 collegedegree 0.379 #> 10 4 intercept 41.0 #> # ℹ 2,990 more rows # fit a linear model to the observed data obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() obs_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # visualize distributions of coefficients # generated under the null visualize(null_fits) # add a p-value shading layer to juxtapose the null # fits with the observed fit for each term visualize(null_fits) + shade_p_value(obs_fit, direction = \"both\") #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? 
# the direction argument will be applied # to the plot for each term visualize(null_fits) + shade_p_value(obs_fit, direction = \"left\") #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? # } # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":null,"dir":"Reference","previous_headings":"","what":"Specify response and explanatory variables — specify","title":"Specify response and explanatory variables — specify","text":"specify() used specify columns supplied data frame relevant response (, applicable, explanatory) variables. Note character variables converted factors. Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Specify response and explanatory variables — specify","text":"","code":"specify(x, formula, response = NULL, explanatory = NULL, success = NULL)"},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Specify response and explanatory variables — specify","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. success level response considered success, string. 
Needed inference one proportion, difference proportions, corresponding z stats.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Specify response and explanatory variables — specify","text":"tibble containing response (explanatory, specified) variable data.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Specify response and explanatory variables — specify","text":"","code":"# specifying for a point estimate on one variable gss %>% specify(response = age) #> Response: age (numeric) #> # A tibble: 500 × 1 #> age #> #> 1 36 #> 2 34 #> 3 24 #> 4 42 #> 5 31 #> 6 32 #> 7 48 #> 8 36 #> 9 30 #> 10 33 #> # ℹ 490 more rows # specify a relationship between variables as a formula... gss %>% specify(age ~ partyid) #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. #> Response: age (numeric) #> Explanatory: partyid (factor) #> # A tibble: 500 × 2 #> age partyid #> #> 1 36 ind #> 2 34 rep #> 3 24 ind #> 4 42 ind #> 5 31 rep #> 6 32 rep #> 7 48 dem #> 8 36 ind #> 9 30 rep #> 10 33 dem #> # ℹ 490 more rows # ...or with named arguments! gss %>% specify(response = age, explanatory = partyid) #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. 
#> Response: age (numeric) #> Explanatory: partyid (factor) #> # A tibble: 500 × 2 #> age partyid #> #> 1 36 ind #> 2 34 rep #> 3 24 ind #> 4 42 ind #> 5 31 rep #> 6 32 rep #> 7 48 dem #> 8 36 ind #> 9 30 rep #> 10 33 dem #> # ℹ 490 more rows # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/t_stat.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy t-test statistic — t_stat","title":"Tidy t-test statistic — t_stat","text":"shortcut wrapper function get observed test statistic t test. function deprecated favor general observe().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/t_stat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy t-test statistic — t_stat","text":"","code":"t_stat( x, formula, response = NULL, explanatory = NULL, order = NULL, alternative = \"two-sided\", mu = 0, conf_int = FALSE, conf_level = 0.95, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/t_stat.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy t-test statistic — t_stat","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. order string vector specifying order levels explanatory variable ordered subtraction, order = c(\"first\", \"second\") means (\"first\" - \"second\"). alternative Character string giving direction alternative hypothesis. Options \"two-sided\" (default), \"greater\", \"less\". mu numeric value giving hypothesized null mean value one sample test hypothesized difference two sample test. conf_int logical value whether include confidence interval . TRUE default. 
conf_level numeric value 0 1. Default value 0.95. ... Pass arguments infer functions.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/t_stat.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy t-test statistic — t_stat","text":"","code":"library(tidyr) # t test statistic for true mean number of hours worked # per week of 40 gss %>% t_stat(response = hours, mu = 40) #> Warning: The t_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. #> t #> 2.085191 # t test statistic for number of hours worked per week # by college degree status gss %>% tidyr::drop_na(college) %>% t_stat(formula = hours ~ college, order = c(\"degree\", \"no degree\"), alternative = \"two-sided\") #> Warning: The t_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. #> t #> 1.11931"},{"path":"https://infer.tidymodels.org/dev/reference/t_test.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy t-test — t_test","title":"Tidy t-test — t_test","text":"tidier version t.test() two sample tests.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/t_test.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy t-test — t_test","text":"","code":"t_test( x, formula, response = NULL, explanatory = NULL, order = NULL, alternative = \"two-sided\", mu = 0, conf_int = TRUE, conf_level = 0.95, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/t_test.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy t-test — t_test","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. 
alternative using formula argument. order string vector specifying order levels explanatory variable ordered subtraction, order = c(\"first\", \"second\") means (\"first\" - \"second\"). alternative Character string giving direction alternative hypothesis. Options \"two-sided\" (default), \"greater\", \"less\". mu numeric value giving hypothesized null mean value one sample test hypothesized difference two sample test. conf_int logical value whether include confidence interval . TRUE default. conf_level numeric value 0 1. Default value 0.95. ... passing arguments t.test().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/t_test.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy t-test — t_test","text":"","code":"library(tidyr) # t test for number of hours worked per week # by college degree status gss %>% tidyr::drop_na(college) %>% t_test(formula = hours ~ college, order = c(\"degree\", \"no degree\"), alternative = \"two-sided\") #> # A tibble: 1 × 7 #> statistic t_df p_value alternative estimate lower_ci upper_ci #> #> 1 1.12 366. 0.264 two.sided 1.54 -1.16 4.24 # see vignette(\"infer\") for more explanation of the # intuition behind the infer package, and vignette(\"t_test\") # for more examples of t-tests using infer"},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":null,"dir":"Reference","previous_headings":"","what":"Visualize statistical inference — visualize","title":"Visualize statistical inference — visualize","text":"Visualize distribution simulation-based inferential statistics theoretical distribution (!). Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Visualize statistical inference — visualize","text":"","code":"visualize(data, bins = 15, method = \"simulation\", dens_color = \"black\", ...) 
visualise(data, bins = 15, method = \"simulation\", dens_color = \"black\", ...)"},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Visualize statistical inference — visualize","text":"data distribution. simulation-based inference, data frame containing distribution calculate()d statistics fit()ted coefficient estimates. object passed generate() supplied calculate() fit(). theory-based inference, output assume(). bins number bins histogram. method string giving method display. Options \"simulation\", \"theoretical\", \"\" \"\" corresponding \"simulation\" \"theoretical\". data output assume(), argument ignored default \"theoretical\". dens_color character hex string specifying color theoretical density curve. ... Additional arguments passed along functions ggplot2. method = \"simulation\", stat_bin(), method = \"theoretical\", geom_path(). values may overwritten infer internally.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Visualize statistical inference — visualize","text":"calculate()-based workflows, ggplot showing simulation-based distribution histogram bar graph. Can also used display theoretical distributions. assume()-based workflows, ggplot showing theoretical distribution. fit()-based workflows, patchwork object showing simulation-based distributions histogram bar graph. interface adjust plot options themes bit different patchwork plots ggplot2 plots. 
examples highlight biggest differences , see patchwork::plot_annotation() patchwork::&.gg details.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Visualize statistical inference — visualize","text":"order make visualization workflow straightforward explicit, visualize() now used plot distributions statistics directly. number arguments related shading p-values confidence intervals now deprecated visualize() now passed shade_p_value() shade_confidence_interval(), respectively. visualize() raise warning deprecated arguments supplied.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Visualize statistical inference — visualize","text":"","code":"# generate a null distribution null_dist <- gss %>% # we're interested in the number of hours worked per week specify(response = hours) %>% # hypothesizing that the mean is 40 hypothesize(null = \"point\", mu = 40) %>% # generating data points for a null distribution generate(reps = 1000, type = \"bootstrap\") %>% # calculating a distribution of means calculate(stat = \"mean\") # or a bootstrap distribution, omitting the hypothesize() step, # for use in confidence intervals boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") # we can easily plot the null distribution by piping into visualize null_dist %>% visualize() # we can add layers to the plot as in ggplot, as well... 
# find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # find a confidence interval around the point estimate ci <- boot_dist %>% get_confidence_interval(point_estimate = point_estimate, # at the 95% confidence level level = .95, # using the standard error method type = \"se\") # display a shading of the area beyond the p-value on the plot null_dist %>% visualize() + shade_p_value(obs_stat = point_estimate, direction = \"two-sided\") #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? # ...or within the bounds of the confidence interval null_dist %>% visualize() + shade_confidence_interval(ci) # plot a theoretical sampling distribution by creating # a theory-based distribution with `assume()` sampling_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") visualize(sampling_dist) # you can shade confidence intervals on top of # theoretical distributions, too---the theoretical # distribution will be recentered and rescaled to # align with the confidence interval visualize(sampling_dist) + shade_confidence_interval(ci) # to plot both a theory-based and simulation-based null distribution, # use a theorized statistic (i.e. one of t, z, F, or Chisq) # and supply the simulation-based null distribution null_dist_t <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"t\") obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") visualize(null_dist_t, method = \"both\") #> Warning: Check to make sure the conditions have been met for the theoretical #> method. infer currently does not check these for you. 
visualize(null_dist_t, method = \"both\") + shade_p_value(obs_stat, \"both\") #> Warning: Check to make sure the conditions have been met for the theoretical #> method. infer currently does not check these for you. #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? # \\donttest{ # to visualize distributions of coefficients for multiple # explanatory variables, use a `fit()`-based workflow # fit 1000 models with the `hours` variable permuted null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits #> # A tibble: 3,000 × 3 #> # Groups: replicate [1,000] #> replicate term estimate #> #> 1 1 intercept 39.5 #> 2 1 age 0.0515 #> 3 1 collegedegree -0.687 #> 4 2 intercept 40.5 #> 5 2 age 0.0209 #> 6 2 collegedegree -0.0149 #> 7 3 intercept 39.8 #> 8 3 age 0.0305 #> 9 3 collegedegree 1.16 #> 10 4 intercept 39.9 #> # ℹ 2,990 more rows # visualize distributions of resulting coefficients visualize(null_fits) # the interface to add themes and other elements to patchwork # plots (outputted by `visualize` when the inputted data # is from the `fit()` function) is a bit different than adding # them to ggplot2 plots. library(ggplot2) # to add a ggplot2 theme to a `calculate()`-based visualization, use `+` null_dist %>% visualize() + theme_dark() # to add a ggplot2 theme to a `fit()`-based visualization, use `&` null_fits %>% visualize() & theme_dark() # } # More in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v106","dir":"Changelog","previous_headings":"","what":"infer v1.0.6","title":"infer v1.0.6","text":"CRAN release: 2024-01-31 Updated infrastructure errors, warnings, messages (#513). changes visible users, though: Many longer error messages now broken several lines. 
references help-files, users can now click error message’s text navigate cited documentation. Various improvements documentation (#501, #504, #508, #512). Fixed bug get_confidence_interval() error uninformatively supplied distribution estimates contained missing values. function now warn return confidence interval calculated using non-missing estimates (#521). Fixed bug generate() used without first specify()ing variables, even cases specification affect resampling/simulation (#448).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v105","dir":"Changelog","previous_headings":"","what":"infer v1.0.5","title":"infer v1.0.5","text":"CRAN release: 2023-09-06 Implemented support permutation hypothesis tests paired data via argument value null = \"paired independence\" hypothesize() (#487). weight_by argument rep_slice_sample() can now passed either vector numeric weights unquoted column name .data (#480). Newly accommodates variables spaces names wrapper functions t_test() prop_test() (#472). Fixed bug two-sample prop_test() response explanatory variable passed place prop.test(). enables using prop_test() explanatory variables greater 2 levels , process, addresses bug prop_test() collapsed levels success response variable 2 levels.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v104","dir":"Changelog","previous_headings":"","what":"infer v1.0.4","title":"infer v1.0.4","text":"CRAN release: 2022-12-01 Fixed bug p-value shading shaded regions longer correctly overlaid histogram bars. 
Addressed deprecation warning ahead upcoming dplyr release.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v103","dir":"Changelog","previous_headings":"","what":"infer v1.0.3","title":"infer v1.0.3","text":"CRAN release: 2022-08-22 Fix R-devel HTML5 NOTEs.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v102","dir":"Changelog","previous_headings":"","what":"infer v1.0.2","title":"infer v1.0.2","text":"CRAN release: 2022-06-10 Fix p-value shading calculated statistic falls exactly boundaries histogram bin (#424). Fix generate() errors columns named x (#431). Fix error visualize passed generate()d infer_dist objects passed hypothesize() (#432). Update visual checks visualize output align R 4.1.0+ graphics engine (#438). specify() wrapper functions now appropriately handle ordered factors (#439). Clarify error incompatible statistics hypotheses supplied (#441). Updated generate() unexpected type warnings permissive—warning raised less often type = \"bootstrap\" (#425). Allow passing additional arguments stats::chisq.test via ... calculate(). Ellipses now always passed applicable base R hypothesis testing function, applicable (#414)! package now set levels logical variables conversion factor first level (regarded success default) TRUE. Core verbs warned without explicit success value already, change makes behavior consistent functions wrapped shorthand test wrappers (#440). Added new statistic stat = \"ratio means\" (#452).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v101-github-only","dir":"Changelog","previous_headings":"","what":"infer v1.0.1 (GitHub Only)","title":"infer v1.0.1 (GitHub Only)","text":"release reflects infer version accepted Journal Open Source Software. Re-licensed package CC0 MIT. See LICENSE LICENSE.md files. Contributed paper Journal Open Source Software, draft available /figs/paper. 
Various improvements documentation (#417, #418).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-100","dir":"Changelog","previous_headings":"","what":"infer 1.0.0","title":"infer 1.0.0","text":"CRAN release: 2021-08-13 v1.0.0 first major release {infer} package! large, core verbs specify(), hypothesize(), generate(), calculate() interface . release makes several improvements behavioral consistency package introduces support theory-based inference well randomization-based inference multiple explanatory variables.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"behavioral-consistency-1-0-0","dir":"Changelog","previous_headings":"","what":"Behavioral consistency","title":"infer 1.0.0","text":"major change package release set standards behavioral consistency calculate() (#356). Namely, package now supply consistent error supplied stat argument isn’t well-defined variables specify()d supply consistent message user supplies unneeded information via hypothesize() calculate() observed statistic supply consistent warning assume reasonable null value user supply sufficient information calculate observed statistic accommodate behavior, number new calculate methods added improved. Namely: Implemented standardized proportion z statistic one categorical variable Extended calculate() stat = \"t\" passing mu calculate() method stat = \"t\" allow calculation t statistics one numeric variable hypothesized mean Extended calculate() allow lowercase aliases stat arguments (#373). Fixed bugs calculate() allow programmatic calculation statistics behavioral consistency also allowed implementation observe(), wrapper function around specify(), hypothesize(), calculate(), calculate observed statistics. 
function provides shorthand alternative calculating observed statistics data: don’t anticipate changes “breaking” sense code previously worked continue , though may now message warn way used error different (hopefully informative) message.","code":"gss %>% specify(response = hours) %>% calculate(stat = \"diff in means\") #> Error: A difference in means is not well-defined for a #> numeric response variable (hours) and no explanatory variable. gss %>% specify(college ~ partyid, success = \"degree\") %>% calculate(stat = \"diff in props\") #> Error: A difference in proportions is not well-defined for a dichotomous categorical #> response variable (college) and a multinomial categorical explanatory variable (partyid). # supply mu = 40 when it's not needed gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"mean\") #> Message: The point null hypothesis `mu = 40` does not inform calculation of #> the observed statistic (a mean) and will be ignored. #> # A tibble: 1 x 1 #> stat #> #> 1 41.4 # don't hypothesize `p` when it's needed gss %>% specify(response = sex, success = \"female\") %>% calculate(stat = \"z\") #> # A tibble: 1 x 1 #> stat #> #> 1 -1.16 #> Warning message: #> A z statistic requires a null hypothesis to calculate the observed statistic. #> Output assumes the following null value: `p = .5`. # don't hypothesize `p` when it's needed gss %>% specify(response = partyid) %>% calculate(stat = \"Chisq\") #> # A tibble: 1 x 1 #> stat #> #> 1 334. #> Warning message: #> A chi-square statistic requires a null hypothesis to calculate the observed statistic. #> Output assumes the following null values: `p = c(dem = 0.2, ind = 0.2, rep = 0.2, other = 0.2, DK = 0.2)`. 
# calculating the observed mean number of hours worked per week gss %>% observe(hours ~ NULL, stat = \"mean\") #> # A tibble: 1 x 1 #> stat #> #> 1 41.4 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% calculate(stat = \"mean\") #> # A tibble: 1 x 1 #> stat #> #> 1 41.4 # calculating a t statistic for hypothesized mu = 40 hours worked/week gss %>% observe(hours ~ NULL, stat = \"t\", null = \"point\", mu = 40) #> # A tibble: 1 x 1 #> stat #> #> 1 2.09 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") #> # A tibble: 1 x 1 #> stat #> #> 1 2.09"},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"a-framework-for-theoretical-inference-1-0-0","dir":"Changelog","previous_headings":"","what":"A framework for theoretical inference","title":"infer 1.0.0","text":"release also introduces complete principled interface theoretical inference. package previously supplied methods visualization theory-based curves, interface provide object explicitly “null distribution” supplied helper functions like get_p_value() get_confidence_interval(). new interface based new verb, assume(), returns null distribution can interfaced way simulation-based null distributions can interfaced . example, ’ll work full infer pipeline inference mean using infer’s gss dataset. Supposed believe true mean number hours worked Americans past week 40. First, calculating observed t-statistic: code define null distribution similar required calculate theorized observed statistic, switching calculate() assume() replacing arguments needed. null distribution can now interfaced way simulation-based null distribution elsewhere package. 
example, calculating p-value juxtaposing observed statistic null distribution: …visualizing null distribution alone: …juxtaposing two visually: Confidence intervals lie data space rather standardized scale theoretical distributions. Calculating mean rather standardized t-statistic: null distribution just defines spread standard error calculation. Visualizing confidence interval results theoretical distribution recentered rescaled align scale observed data: Previous methods interfacing theoretical distributions superseded—continue supported, though documentation forefront assume() interface.","code":"obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") obs_stat #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 x 1 #> stat #> #> 1 2.09 null_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") null_dist #> A T distribution with 499 degrees of freedom. get_p_value(null_dist, obs_stat, direction = \"both\") #> # A tibble: 1 x 1 #> p_value #> #> 1 0.0376 visualize(null_dist) visualize(null_dist) + shade_p_value(obs_stat, direction = \"both\") obs_mean <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") ci <- get_confidence_interval( null_dist, level = .95, point_estimate = obs_mean ) ci #> # A tibble: 1 x 2 #> lower_ci upper_ci #> #> 1 40.1 42.7 visualize(null_dist) + shade_confidence_interval(ci)"},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"support-for-multiple-regression-1-0-0","dir":"Changelog","previous_headings":"","what":"Support for multiple regression","title":"infer 1.0.0","text":"2016 “Guidelines Assessment Instruction Statistics Education” [1] state , introductory statistics courses, “[s]tudents gain experience statistical models, including multivariable models, used.” line recommendation, introduce support randomization-based inference multiple explanatory variables via new fit.infer core verb. 
passed infer object, method parse formula formula response explanatory arguments, pass data stats::glm call. Note function returns model coefficients estimate rather associated t-statistics stat. passed generate()d object, model fitted replicate. type = \"permute\", set unquoted column names data permute (independently ) can passed via variables argument generate. defaults response variable. feature allows detailed exploration effect disrupting correlation structure among explanatory variables outputted model coefficients. auxillary functions get_p_value(), get_confidence_interval(), visualize(), shade_p_value(), shade_confidence_interval() methods handle fit() output! See help-files example usage. Note shade_* functions now delay evaluation added existing ggplot (e.g. outputted visualize()) +.","code":"gss %>% specify(hours ~ age + college) %>% fit() #> # A tibble: 3 x 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() #> # A tibble: 300 x 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 44.4 #> 2 1 age -0.0767 #> 3 1 collegedegree 0.121 #> 4 2 intercept 41.8 #> 5 2 age 0.00344 #> 6 2 collegedegree -1.59 #> 7 3 intercept 38.3 #> 8 3 age 0.0761 #> 9 3 collegedegree 0.136 #> 10 4 intercept 43.1 #> # … with 290 more rows gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\", variables = c(age, college)) %>% fit() #> # A tibble: 300 x 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 39.4 #> 2 1 age 0.0748 #> 3 1 collegedegree -2.98 #> 4 2 intercept 42.8 #> 5 2 age -0.0190 #> 6 2 collegedegree -1.83 #> 7 3 intercept 40.4 #> 8 3 age 0.0354 #> 9 3 collegedegree -1.31 #> 10 4 intercept 40.9 #> # … with 290 more 
rows"},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"improvements-1-0-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"infer 1.0.0","text":"Following extensive discussion, generate() type type = \"simulate\" renamed evocative type = \"draw\". continue support type = \"simulate\" indefinitely, though supplying argument now prompt message notifying user preferred alias. (#233, #390) Fixed several bugs related factors unused levels. specify() now drop unused factor levels message done . (#374, #375, #397, #380) Added two.sided acceptable alias two_sided direction argument get_p_value() shade_p_value(). (#355) Various improvements documentation, including extending example sections help-files, re-organizing function reference {pkgdown} site, linking extensively among help-files.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"breaking-changes-1-0-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"infer 1.0.0","text":"don’t anticipate changes made release “breaking” sense code previously worked continue , though may now message warn way used error different (hopefully informative) message. currently teach research infer, recommend re-running materials noting changes messaging warning. Move forward number planned deprecations. Namely, GENERATION_TYPES object now fully deprecated, arguments relocated visualize() shade_p_value() shade_confidence_interval() now fully deprecated visualize(). supplied deprecated argument, visualize() warn user ignore argument. Added prop argument rep_slice_sample() alternative n argument specifying proportion rows supplied data sample per replicate (#361, #362, #363). changes order arguments rep_slice_sample() (order aligned dplyr::slice_sample()) might break code didn’t use named arguments (like rep_slice_sample(df, 5, TRUE)). 
fix , use named arguments (like rep_slice_sample(df, 5, replicate = TRUE)).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"other-1-0-0","dir":"Changelog","previous_headings":"","what":"Other","title":"infer 1.0.0","text":"Added Simon P. Couch author. Long deserved reliable maintenance improvements package. [1]: GAISE College Report ASA Revision Committee, “Guidelines Assessment Instruction Statistics Education College Report 2016,” http://www.amstat.org/education/gaise.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-054","dir":"Changelog","previous_headings":"","what":"infer 0.5.4","title":"infer 0.5.4","text":"CRAN release: 2021-01-13 rep_sample_n() longer errors supplied prob argument (#279) Added rep_slice_sample(), light wrapper around rep_sample_n(), closely resembles dplyr::slice_sample() (function supersedes dplyr::sample_n()) (#325) Added success, correct, z argument prop_test() (#343, #347, #353) Implemented observed statistic calculation standardized proportion z statistic (#351, #353) Various bug fixes improvements documentation errors.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-053","dir":"Changelog","previous_headings":"","what":"infer 0.5.3","title":"infer 0.5.3","text":"CRAN release: 2020-07-14","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"breaking-changes-0-5-3","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"infer 0.5.3","text":"get_confidence_interval() now uses column names (‘lower_ci’ ‘upper_ci’) output consistent infer functionality (#317).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"new-functionality-0-5-3","dir":"Changelog","previous_headings":"","what":"New functionality","title":"infer 0.5.3","text":"get_confidence_interval() can now produce bias-corrected confidence intervals setting type = \"bias-corrected\". 
Thanks @davidbaniadam initial implementation (#237, #318)!","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"other-0-5-3","dir":"Changelog","previous_headings":"","what":"Other","title":"infer 0.5.3","text":"Fix CRAN check failures related long double errors.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-052","dir":"Changelog","previous_headings":"","what":"infer 0.5.2","title":"infer 0.5.2","text":"CRAN release: 2020-06-14 Warn user p-value 0 reported (#257, #273) Added new vignettes: chi_squared anova (#268) Updates documentation existing vignettes (#268) Add alias hypothesize() (hypothesise()) (#271) Subtraction order longer required difference-based tests–warning raised case user doesn’t supply order argument (#275, #281) Add new messages common errors (#277) Increase coverage theoretical methods documentation (#278, #280) Drop missing values reduce size gss dataset used examples (#282) Add stat = \"ratio props\" stat = \"odds ratio\" calculate (#285) Add prop_test(), tidy interface prop.test() (#284, #287) Updates visualize() compatibility ggplot2 v3.3.0 (#289) Fix error bootstrapping small samples raise warnings/errors appropriate (#239, #244, #291) Fix unit test failures resulting breaking changes dplyr v1.0.0 Fix error generate() response variable named x (#299) Add two-sided two sided aliases two_sided direction argument get_p_value() shade_p_value() (#302) Fix t_test() t_stat() ignoring order argument (#310)","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-051","dir":"Changelog","previous_headings":"","what":"infer 0.5.1","title":"infer 0.5.1","text":"CRAN release: 2019-11-19 Updates documentation tweaks","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-050","dir":"Changelog","previous_headings":"","what":"infer 0.5.0","title":"infer 0.5.0","text":"CRAN release: 
2019-09-27","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"breaking-changes-0-5-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"infer 0.5.0","text":"shade_confidence_interval() now plots vertical lines starting zero (previously - bottom plot) (#234). shade_p_value() now uses “area curve” approach shading (#229).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"other-0-5-0","dir":"Changelog","previous_headings":"","what":"Other","title":"infer 0.5.0","text":"Updated chisq_test() take arguments response/explanatory format, perform goodness fit tests, default approximation approach (#241). Updated chisq_stat() goodness fit (#241). Make interface hypothesize() clearer adding options point null parameters function signature (#242). Manage infer class systematically (#219). Use vdiffr plot testing (#221).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-041","dir":"Changelog","previous_headings":"","what":"infer 0.4.1","title":"infer 0.4.1","text":"Added Evgeni Chasnovski author incredible work refactoring package providing excellent support.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-040","dir":"Changelog","previous_headings":"","what":"infer 0.4.0","title":"infer 0.4.0","text":"CRAN release: 2018-11-15","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"breaking-changes-0-4-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"infer 0.4.0","text":"Changed method computing two-sided p-value conventional one. also makes get_pvalue() visualize() aligned (#205).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"deprecation-changes-0-4-0","dir":"Changelog","previous_headings":"","what":"Deprecation changes","title":"infer 0.4.0","text":"Deprecated p_value() (use get_p_value() instead) (#180). 
Deprecated conf_int() (use get_confidence_interval() instead) (#180). Deprecated (via warnings) plotting p-value confidence interval visualize() (use new functions shade_p_value() shade_confidence_interval() instead) (#178).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"new-functions-0-4-0","dir":"Changelog","previous_headings":"","what":"New functions","title":"infer 0.4.0","text":"shade_p_value() - {ggplot2}-like layer function add information p-value region visualize() output. alias shade_pvalue(). shade_confidence_interval() - {ggplot2}-like layer function add information confidence interval region visualize() output. alias shade_ci().","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"other-0-4-0","dir":"Changelog","previous_headings":"","what":"Other","title":"infer 0.4.0","text":"Account NULL value left hand side formula specify() (#156) type generate() (#157). Update documentation code follow tidyverse style guide (#159). Remove help page internal set_params() (#165). Fully use {tibble} (#166). Fix calculate() depend order p type = \"simulate\" (#122). Reduce code duplication (#173). Make transparency visualize() depend method data volume. Make visualize() work “One sample t” theoretical type method = \"\". Add stat = \"sum\" stat = \"count\" options calculate() (#50).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-031","dir":"Changelog","previous_headings":"","what":"infer 0.3.1","title":"infer 0.3.1","text":"CRAN release: 2018-08-06 Stop using package {assertive} favor custom type checks (#149) Fixed t_stat() use ... 
var.equal works help @echasnovski, fixed var.equal = TRUE specify() %>% calculate(stat = \"t\") Use custom functions error, warning, message, paste() handling (#155)","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-030","dir":"Changelog","previous_headings":"","what":"infer 0.3.0","title":"infer 0.3.0","text":"CRAN release: 2018-07-11 Added conf_int logical argument conf_level argument t_test() Switched shade_color argument visualize() pvalue_fill instead since fill color confidence intervals also added now Green default color CI red p-values direction = \"\" get green shading Currently working simulation-based methods get_ci() get_confidence_interval() aliases conf_int() Converted longer confidence interval calculation code vignettes use get_ci() instead get_pvalue() alias p_value() Converted longer p-value calculation code vignettes use get_pvalue() instead Implemented Chi-square Goodness Fit observed stat depending params set hypothesize specify() %>% calculate() shortcut Removed “standardized” slope t since formula different “standardized” correlation way currently give one Implemented correlation bootstrap CI permutation hypothesis test Added message type given differently expected visualize() works either 1x1 data frame vector obs_stat argument Got stat = \"t\" working Refactored calculate() smaller functions reduce complexity Produced error mu given hypothesize() stat = \"median\" provided calculate() similar mis-specifications work one sample two sample cases providing formula Added order argument t_stat() Added implementation one sample t_test() passing mu argument t.test hypothesize() Tweaked pkgdown page include ToDo’s using {dplyr} example","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-020","dir":"Changelog","previous_headings":"","what":"infer 0.2.0","title":"infer 0.2.0","text":"CRAN release: 2018-05-15 Switched !! 
instead UQ() since UQ() deprecated {rlang} 0.2.0 Added many new files: CONDUCT.md, CONTRIBUTING.md, -.md Updated README file development information Added wrapper functions t_test() chisq_test() use formula interface provide intuitive wrapper t.test() chisq.test() Created stat = \"z\" stat = \"t\" options Added many new arguments visualize() prescribe colors shade use observed statistics theoretical density curves Added check bar graph created visualize() number unique values generated statistics small Added shading method = \"theoretical\" Use percentiles determine two-tailed shading Changed method = \"randomization\" method = \"simulation\" Added warning theoretical distribution used assumptions checked Two sample t ANOVA F One proportion z Two proportion z Chi-square test independence Chi-square Goodness Fit test Standardized slope (t)","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-011","dir":"Changelog","previous_headings":"","what":"infer 0.1.1","title":"infer 0.1.1","text":"CRAN release: 2018-01-22 Added additional tests Added order argument calculate() Fixed bugs post-CRAN release Automated travis build pkgdown gh-pages branch","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-010","dir":"Changelog","previous_headings":"","what":"infer 0.1.0","title":"infer 0.1.0","text":"CRAN release: 2018-01-08 Altered way successes indicated infer pipeline. now live specify(). 
Updated documentation examples Deployed https://infer.tidymodels.org/","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-001","dir":"Changelog","previous_headings":"","what":"infer 0.0.1","title":"infer 0.0.1","text":"Implemented “intro stats” examples randomization methods","code":""}] +[{"path":[]},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional 
setting","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. complaints reviewed investigated promptly fairly. 
community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. public apology may requested.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. 
public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired [Mozilla’s code conduct enforcement ladder][https://github.com/mozilla/inclusion]. answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing","title":"Contributing","text":"Contributions infer whether form bug fixes, issue reports, new code documentation improvements encouraged welcome. welcome novices may never contributed package well friendly veterans looking help us improve package users. eager include accepting contributions everyone meets code conduct guidelines. Please use GitHub issues. pull request, please link open corresponding issue GitHub issues. 
Please ensure notifications turned respond questions, comments needed changes promptly.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"tests","dir":"","previous_headings":"","what":"Tests","title":"Contributing","text":"infer uses testthat testing. Please try provide 100% test coverage submitted code always check existing tests continue pass. beginner need help writing test, mention issue try help. ’s also helpful run goodpractice::gp() ensure lines code 80 characters lines code tests written. Please prior submitting pull request fix suggestions . Reach us need assistance .","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"","what":"Code style","title":"Contributing","text":"Please use snake case (rep_sample_n) function names. Besides , general follow tidyverse style R.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing","text":"contributing infer package must follow code conduct defined CONDUCT.","code":""},{"path":"https://infer.tidymodels.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2021 infer authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. 
EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy Chi-Squared Tests with infer","text":"vignette, ’ll walk conducting \\(\\chi^2\\) (chi-squared) test independence chi-squared goodness fit test using infer. ’ll start chi-squared test independence, can used test association two categorical variables. , ’ll move chi-squared goodness fit test, tests well distribution one categorical variable can approximated theoretical distribution. Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. 
data looks like :","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"test-of-independence","dir":"Articles","previous_headings":"","what":"Test of Independence","title":"Tidy Chi-Squared Tests with infer","text":"carry chi-squared test independence, ’ll examine association income educational attainment United States. college categorical variable values degree degree, indicating whether respondent college degree (including community college), finrela gives respondent’s self-identification family income—either far average, average, average, average, far average, DK (don’t know). relationship looks like sample data: relationship, expect see purple bars reaching height, regardless income class. differences see , though, just due random noise? First, calculate observed statistic, can use specify() calculate(). observed \\(\\chi^2\\) statistic 30.6825. Now, want compare statistic null distribution, generated assumption variables actually related, get sense likely us see observed statistic actually association education income. can generate null distribution one two ways—using randomization theory-based methods. 
randomization approach approximates null distribution permuting response explanatory variables, person’s educational attainment matched random income sample order break association two. Note , line specify(college ~ finrela) , use equivalent syntax specify(response = college, explanatory = finrela). goes code , generates null distribution using theory-based methods instead randomization. get sense distributions look like, observed statistic falls, can use visualize(): also visualize observed statistic theoretical null distribution. , use assume() verb define theoretical null distribution pass visualize() like null distribution outputted generate() calculate(). visualize randomization-based theoretical null distributions get sense two relate, can pipe randomization-based null distribution visualize(), provide method = \"\". Either way, looks like observed test statistic quite unlikely actually association education income. exactly, can approximate p-value get_p_value: Thus, really relationship education income, approximation probability see statistic extreme 30.6825 approximately 0. calculate p-value using true \\(\\chi^2\\) distribution, can use pchisq function base R. function allows us situate test statistic calculated previously \\(\\chi^2\\) distribution appropriate degrees freedom. Note , equivalently theory-based approach shown , package supplies wrapper function, chisq_test, carry Chi-Squared tests independence tidy data. 
syntax goes like :","code":"# calculate the observed statistic observed_indep_statistic <- gss %>% specify(college ~ finrela) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"Chisq\") # generate the null distribution using randomization null_dist_sim <- gss %>% specify(college ~ finrela) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"Chisq\") # generate the null distribution by theoretical approximation null_dist_theory <- gss %>% specify(college ~ finrela) %>% assume(distribution = \"Chisq\") # visualize the null distribution and test statistic! null_dist_sim %>% visualize() + shade_p_value(observed_indep_statistic, direction = \"greater\") # visualize the theoretical null distribution and test statistic! gss %>% specify(college ~ finrela) %>% assume(distribution = \"Chisq\") %>% visualize() + shade_p_value(observed_indep_statistic, direction = \"greater\") # visualize both null distributions and the test statistic! null_dist_sim %>% visualize(method = \"both\") + shade_p_value(observed_indep_statistic, direction = \"greater\") # calculate the p value from the observed statistic and null distribution p_value_independence <- null_dist_sim %>% get_p_value(obs_stat = observed_indep_statistic, direction = \"greater\") p_value_independence ## # A tibble: 1 × 1 ## p_value ## ## 1 0 pchisq(observed_indep_statistic$stat, 5, lower.tail = FALSE) ## X-squared ## 1.082e-05 chisq_test(gss, college ~ finrela) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 30.7 5 0.0000108"},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"goodness-of-fit","dir":"Articles","previous_headings":"","what":"Goodness of Fit","title":"Tidy Chi-Squared Tests with infer","text":"Now, moving chi-squared goodness fit test, ’ll take look self-identified income class survey respondents. Suppose null hypothesis finrela follows uniform distribution (.e. 
’s actually equal number people describe income far average, average, average, average, far average, don’t know income.) graph represents hypothesis: seems like uniform distribution may appropriate description data–many people describe income average options. Lets now test whether difference distributions statistically significant. First, carry hypothesis test, calculate observed statistic. observed statistic 487.984. Now, generating null distribution, just dropping call generate(): , get sense distributions look like, observed statistic falls, can use visualize(): statistic seems like quite unlikely income class self-identification actually followed uniform distribution! unlikely, though? Calculating p-value: Thus, self-identified income class equally likely occur, approximation probability see distribution like one approximately 0. calculate p-value using true \\(\\chi^2\\) distribution, can use pchisq function base R. function allows us situate test statistic calculated previously \\(\\chi^2\\) distribution appropriate degrees freedom. , equivalently theory-based approach shown , package supplies wrapper function, chisq_test, carry Chi-Squared goodness fit tests tidy data. syntax goes like :","code":"# calculating the null distribution observed_gof_statistic <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% calculate(stat = \"Chisq\") # generating a null distribution, assuming each income class is equally likely null_dist_gof <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"Chisq\") # visualize the null distribution and test statistic! 
null_dist_gof %>% visualize() + shade_p_value(observed_gof_statistic, direction = \"greater\") # calculate the p-value p_value_gof <- null_dist_gof %>% get_p_value(observed_gof_statistic, direction = \"greater\") p_value_gof ## # A tibble: 1 × 1 ## p_value ## ## 1 0 pchisq(observed_gof_statistic$stat, 5, lower.tail = FALSE) ## [1] 3.131e-103 chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 488. 5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Getting to Know infer","text":"infer implements expressive grammar perform statistical inference coheres tidyverse design framework. Rather providing methods specific statistical tests, package consolidates principles shared among common hypothesis tests set 4 main verbs (functions), supplemented many utilities visualize extract value outputs. Regardless hypothesis test ’re using, ’re still asking kind question: effect/difference observed data real, due chance? answer question, start assuming observed data came world “nothing going ” (.e. observed effect simply due random chance), call assumption null hypothesis. (reality, might believe null hypothesis —null hypothesis opposition alternate hypothesis, supposes effect present observed data actually due fact “something going .”) calculate test statistic data describes observed effect. can use test statistic calculate p-value, giving probability observed data come null hypothesis true. probability pre-defined significance level \\(\\alpha\\), can reject null hypothesis. workflow package designed around idea. Starting dataset, specify() allows specify variable, relationship variables, ’re interested . hypothesize() allows declare null hypothesis. 
generate() allows generate data reflecting null hypothesis. calculate() allows calculate distribution statistics generated data form null distribution. Throughout vignette, make use gss, dataset supplied infer containing sample 500 observations 11 variables General Social Survey. row individual survey response, containing basic demographic information respondent well additional variables. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults.","code":"# load in the dataset data(gss) # take a look at its structure dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"specify-specifying-response-and-explanatory-variables","dir":"Articles","previous_headings":"","what":"specify(): Specifying Response (and Explanatory) Variables","title":"Getting to Know infer","text":"specify function can used specify variables dataset ’re interested . 
’re interested , say, age respondents, might write: front-end, output specify just looks like selects columns dataframe ’ve specified. Checking class object, though: can see infer class appended top dataframe classes–new class stores extra metadata. ’re interested two variables–age partyid, example–can specify relationship one two (equivalent) ways: ’re inference one proportion difference proportions, need use success argument specify level response variable success. instance, ’re interested proportion population college degree, might use following code:","code":"gss %>% specify(response = age) ## Response: age (numeric) ## # A tibble: 500 × 1 ## age ## ## 1 36 ## 2 34 ## 3 24 ## 4 42 ## 5 31 ## 6 32 ## 7 48 ## 8 36 ## 9 30 ## 10 33 ## # ℹ 490 more rows gss %>% specify(response = age) %>% class() ## [1] \"infer\" \"tbl_df\" \"tbl\" \"data.frame\" # as a formula gss %>% specify(age ~ partyid) ## Response: age (numeric) ## Explanatory: partyid (factor) ## # A tibble: 500 × 2 ## age partyid ## ## 1 36 ind ## 2 34 rep ## 3 24 ind ## 4 42 ind ## 5 31 rep ## 6 32 rep ## 7 48 dem ## 8 36 ind ## 9 30 rep ## 10 33 dem ## # ℹ 490 more rows # with the named arguments gss %>% specify(response = age, explanatory = partyid) ## Response: age (numeric) ## Explanatory: partyid (factor) ## # A tibble: 500 × 2 ## age partyid ## ## 1 36 ind ## 2 34 rep ## 3 24 ind ## 4 42 ind ## 5 31 rep ## 6 32 rep ## 7 48 dem ## 8 36 ind ## 9 30 rep ## 10 33 dem ## # ℹ 490 more rows # specifying for inference on proportions gss %>% specify(response = college, success = \"degree\") ## Response: college (factor) ## # A tibble: 500 × 1 ## college ## ## 1 degree ## 2 no degree ## 3 degree ## 4 no degree ## 5 degree ## 6 no degree ## 7 no degree ## 8 degree ## 9 degree ## 10 no degree ## # ℹ 490 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"hypothesize-declaring-the-null-hypothesis","dir":"Articles","previous_headings":"","what":"hypothesize(): Declaring the Null 
Hypothesis","title":"Getting to Know infer","text":"next step infer pipeline often declare null hypothesis using hypothesize(). first step supply one “independence” “point” null argument. null hypothesis assumes independence two variables, need supply hypothesize(): ’re inference point estimate, also need provide one p (true proportion successes, 0 1), mu (true mean), med (true median), sigma (true standard deviation). instance, null hypothesis mean number hours worked per week population 40, write: , front-end, dataframe outputted hypothesize() looks almost exactly came specify(), infer now “knows” null hypothesis.","code":"gss %>% specify(college ~ partyid, success = \"degree\") %>% hypothesize(null = \"independence\") ## Response: college (factor) ## Explanatory: partyid (factor) ## Null Hypothesis: independence ## # A tibble: 500 × 2 ## college partyid ## ## 1 degree ind ## 2 no degree rep ## 3 degree ind ## 4 no degree ind ## 5 degree rep ## 6 no degree rep ## 7 no degree dem ## 8 degree ind ## 9 degree rep ## 10 no degree dem ## # ℹ 490 more rows gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 500 × 1 ## hours ## ## 1 50 ## 2 31 ## 3 40 ## 4 40 ## 5 40 ## 6 53 ## 7 32 ## 8 20 ## 9 40 ## 10 40 ## # ℹ 490 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"generate-generating-the-null-distribution","dir":"Articles","previous_headings":"","what":"generate(): Generating the Null Distribution","title":"Getting to Know infer","text":"’ve asserted null hypothesis using hypothesize(), can construct null distribution based hypothesis. can using one several methods, supplied type argument: bootstrap: bootstrap sample drawn replicate, sample size equal input sample size drawn (replacement) input sample data. permute: replicate, input value randomly reassigned (without replacement) new output value sample. 
draw: value sampled theoretical distribution parameters specified hypothesize() replicate. option currently applicable testing point estimates. generation type previously called \"simulate\", superseded. Continuing example , average number hours worked week, might write: example, take 1000 bootstrap samples form null distribution. Note , generate()ing, ’ve set seed random number generation set.seed() function. using infer package research, cases exact reproducibility priority, good practice. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. generate null distribution independence two variables, also randomly reshuffle pairings explanatory response variables break existing association. instance, generate 1000 replicates can used create null distribution assumption political party affiliation affected age:","code":"set.seed(1) gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 500,000 × 2 ## # Groups: replicate [1,000] ## replicate hours ## ## 1 1 46.6 ## 2 1 43.6 ## 3 1 38.6 ## 4 1 28.6 ## 5 1 38.6 ## 6 1 38.6 ## 7 1 6.62 ## 8 1 78.6 ## 9 1 38.6 ## 10 1 38.6 ## # ℹ 499,990 more rows gss %>% specify(partyid ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") ## Response: partyid (factor) ## Explanatory: age (numeric) ## Null Hypothesis: independence ## # A tibble: 500,000 × 3 ## # Groups: replicate [1,000] ## partyid age replicate ## ## 1 rep 36 1 ## 2 rep 34 1 ## 3 dem 24 1 ## 4 dem 42 1 ## 5 dem 31 1 ## 6 ind 32 1 ## 7 ind 48 1 ## 8 rep 36 1 ## 9 dem 30 1 ## 10 rep 33 1 ## # ℹ 499,990 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"calculate-calculating-summary-statistics","dir":"Articles","previous_headings":"","what":"calculate(): Calculating Summary Statistics","title":"Getting to 
Know infer","text":"calculate() calculates summary statistics output infer core functions. function takes stat argument, currently one “mean”, “median”, “sum”, “sd”, “prop”, “count”, “diff means”, “diff medians”, “diff props”, “Chisq”, “F”, “t”, “z”, “slope”, “correlation”. example, continuing example calculate null distribution mean hours worked per week: output calculate() shows us sample statistic (case, mean) 1000 replicates. ’re carrying inference differences means, medians, proportions, t z statistics, need supply order argument, giving order explanatory variables subtracted. instance, find difference mean age college degree don’t, might write:","code":"gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 39.2 ## 2 2 39.1 ## 3 3 39.0 ## 4 4 39.8 ## 5 5 41.4 ## 6 6 39.4 ## 7 7 39.8 ## 8 8 40.4 ## 9 9 41.5 ## 10 10 40.9 ## # ℹ 990 more rows gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 -2.35 ## 2 2 -0.902 ## 3 3 0.403 ## 4 4 -0.426 ## 5 5 0.482 ## 6 6 -0.196 ## 7 7 1.33 ## 8 8 -1.07 ## 9 9 1.68 ## 10 10 0.888 ## # ℹ 990 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"other-utilities","dir":"Articles","previous_headings":"","what":"Other Utilities","title":"Getting to Know infer","text":"infer also offers several utilities extract meaning summary statistics distributions—package provides functions visualize statistic relative distribution (visualize()), calculate p-values (get_p_value()), calculate confidence intervals 
(get_confidence_interval()). illustrate, ’ll go back example determining whether mean number hours worked per week 40 hours. point estimate 41.382 seems pretty close 40, little bit different. might wonder difference just due random chance, mean number hours worked per week population really isn’t 40. initially just visualize null distribution. sample’s observed statistic lie distribution? can use obs_stat argument specify . Notice infer also shaded regions null distribution () extreme observed statistic. (Also, note now use + operator apply shade_p_value function. visualize outputs plot object ggplot2 instead data frame, + operator needed add p-value layer plot object.) red bar looks like ’s slightly far right tail null distribution, observing sample mean 41.382 hours somewhat unlikely mean actually 40 hours. unlikely, though? looks like p-value 0.032, pretty small—true mean number hours worked per week actually 40, probability sample mean far (1.382 hours) 40 0.032. may may statistically significantly different, depending significance level \\(\\alpha\\) decided ran analysis. set \\(\\alpha = .05\\), difference statistically significant, set \\(\\alpha = .01\\), . get confidence interval around estimate, can write: can see, 40 hours per week contained interval, aligns previous conclusion finding significant confidence level \\(\\alpha = .05\\). 
see interval represented visually, can use shade_confidence_interval() utility:","code":"# find the point estimate obs_mean <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # generate a null distribution null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") null_dist %>% visualize() null_dist %>% visualize() + shade_p_value(obs_stat = obs_mean, direction = \"two-sided\") # get a two-tailed p-value p_value <- null_dist %>% get_p_value(obs_stat = obs_mean, direction = \"two-sided\") p_value ## # A tibble: 1 × 1 ## p_value ## ## 1 0.032 # generate a distribution like the null distribution, # though exclude the null hypothesis from the pipeline boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") # start with the bootstrap distribution ci <- boot_dist %>% # calculate the confidence interval around the point estimate get_confidence_interval(point_estimate = obs_mean, # at the 95% confidence level level = .95, # using the standard error type = \"se\") ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 boot_dist %>% visualize() + shade_confidence_interval(endpoints = ci)"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"theoretical-methods","dir":"Articles","previous_headings":"","what":"Theoretical Methods","title":"Getting to Know infer","text":"{infer} also provides functionality use theoretical methods \"Chisq\", \"F\", \"t\" \"z\" distributions. Generally, find null distribution using theory-based methods, use code use find observed statistic elsewhere, replacing calls calculate() assume(). example, calculate observed \\(t\\) statistic (standardized mean): , define theoretical \\(t\\) distribution, write: , theoretical distribution interfaces way simulation-based null distributions . 
example, interface p-values: Confidence intervals lie scale data rather standardized scale theoretical distribution, sure use unstandardized observed statistic working confidence intervals. visualized, \\(t\\) distribution recentered rescaled align scale observed data.","code":"# calculate an observed t statistic obs_t <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # switch out calculate with assume to define a distribution t_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") # visualize the theoretical null distribution visualize(t_dist) + shade_p_value(obs_stat = obs_t, direction = \"greater\") # more exactly, calculate the p-value get_p_value(t_dist, obs_t, \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.0188 # find the theory-based confidence interval theor_ci <- get_confidence_interval( x = t_dist, level = .95, point_estimate = obs_mean ) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 # visualize the theoretical sampling distribution visualize(t_dist) + shade_confidence_interval(theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"multiple-regression","dir":"Articles","previous_headings":"","what":"Multiple regression","title":"Getting to Know infer","text":"accommodate randomization-based inference multiple explanatory variables, package implements alternative workflow based model fitting. Rather calculate()ing statistics resampled data, side package allows fit() linear models data resampled according null hypothesis, supplying model coefficients explanatory variable. part, can just switch calculate() fit() calculate()-based workflows. example, suppose want fit hours worked per week using respondent age college completion status. first begin fitting linear model observed data. Now, generate null distributions terms, can fit 1000 models resamples gss dataset, response hours permuted . 
Note code except addition hypothesize generate step. permute variables response variable, variables argument generate() allows choose columns data permute. Note derived effects depend columns (e.g., interaction effects) also affected. Beyond point, observed fits distributions null fits interface exactly like analogous outputs calculate(). instance, can use following code calculate 95% confidence interval objects. , can shade p-values observed regression coefficients observed data.","code":"observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits ## # A tibble: 3,000 × 3 ## # Groups: replicate [1,000] ## replicate term estimate ## ## 1 1 intercept 40.3 ## 2 1 age 0.0166 ## 3 1 collegedegree 1.20 ## 4 2 intercept 41.3 ## 5 2 age 0.00664 ## 6 2 collegedegree -0.407 ## 7 3 intercept 42.9 ## 8 3 age -0.0371 ## 9 3 collegedegree 0.00431 ## 10 4 intercept 42.7 ## # ℹ 2,990 more rows get_confidence_interval( null_fits, point_estimate = observed_fit, level = .95 ) ## # A tibble: 3 × 3 ## term lower_ci upper_ci ## ## 1 age -0.0948 0.0987 ## 2 collegedegree -2.57 2.72 ## 3 intercept 37.4 45.5 visualize(null_fits) + shade_p_value(observed_fit, direction = \"both\") ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. 
## ℹ Did you mean to use `annotate()`?"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"conclusion","dir":"Articles","previous_headings":"","what":"Conclusion","title":"Getting to Know infer","text":"’s ! vignette covers key functionality infer. See help(package = \"infer\") full list functions vignettes.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Full infer Pipeline Examples","text":"vignette intended provide set examples nearly exhaustively demonstrate functionalities provided infer. Commentary examples limited—discussion intuition behind package, see “Getting Know infer” vignette, accessible calling vignette(\"infer\"). Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. 
data looks like :","code":"# load in the dataset data(gss) # take a look at its structure dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-mean","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (mean)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"x_bar <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") x_bar <- gss %>% observe(response = hours, stat = \"mean\") null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000) %>% calculate(stat = \"mean\") visualize(null_dist) + shade_p_value(obs_stat = x_bar, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_bar, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 
0.042"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-standardized-mean-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (standardized mean \\(t\\))","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. Calculating p-value null distribution observed statistic, Alternatively, using t_test wrapper: infer support testing one numerical variable via z distribution.","code":"t_bar <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") t_bar <- gss %>% observe(response = hours, null = \"point\", mu = 40, stat = \"t\") null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000) %>% calculate(stat = \"t\") null_dist_theory <- gss %>% specify(response = hours) %>% assume(\"t\") visualize(null_dist) + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = t_bar, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.038 gss %>% t_test(response = hours, mu = 40) ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 2.09 499 0.0376 two.sided 41.4 40.1 
42.7"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-median","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (median)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"x_tilde <- gss %>% specify(response = age) %>% calculate(stat = \"median\") x_tilde <- gss %>% observe(response = age, stat = \"median\") null_dist <- gss %>% specify(response = age) %>% hypothesize(null = \"point\", med = 40) %>% generate(reps = 1000) %>% calculate(stat = \"median\") visualize(null_dist) + shade_p_value(obs_stat = x_tilde, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_tilde, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.008"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-paired","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (paired)","title":"Full infer Pipeline Examples","text":"example header compatible stats \"mean\", \"median\", \"sum\", \"sd\". Suppose survey respondents provided number hours worked per week surveyed 5 years prior, encoded hours_previous. ’d like test null hypothesis \"mean\" hours worked per week change sampled time five years prior. infer supports paired hypothesis testing via null = \"paired independence\" argument hypothesize(). Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Note diff column permuted, rather signs values column. 
Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"set.seed(1) gss_paired <- gss %>% mutate( hours_previous = hours + 5 - rpois(nrow(.), 4.8), diff = hours - hours_previous ) gss_paired %>% select(hours, hours_previous, diff) ## # A tibble: 500 × 3 ## hours hours_previous diff ## ## 1 50 52 -2 ## 2 31 32 -1 ## 3 40 40 0 ## 4 40 37 3 ## 5 40 42 -2 ## 6 53 50 3 ## 7 32 28 4 ## 8 20 19 1 ## 9 40 40 0 ## 10 40 43 -3 ## # ℹ 490 more rows x_tilde <- gss_paired %>% specify(response = diff) %>% calculate(stat = \"mean\") x_tilde <- gss_paired %>% observe(response = diff, stat = \"mean\") null_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"mean\") visualize(null_dist) + shade_p_value(obs_stat = x_tilde, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_tilde, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.028"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-one-proportion","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical (one proportion)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, Note logical variables coerced factors:","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% calculate(stat = \"prop\") p_hat <- gss %>% observe(response = sex, success = \"female\", stat = \"prop\") null_dist <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000) %>% calculate(stat = \"prop\") visualize(null_dist) + shade_p_value(obs_stat = 
p_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = p_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.276 null_dist <- gss %>% dplyr::mutate(is_female = (sex == \"female\")) %>% specify(response = is_female, success = \"TRUE\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000) %>% calculate(stat = \"prop\")"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-variable-standardized-proportion-z","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical variable (standardized proportion \\(z\\))","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, package also supplies wrapper around prop.test tests single proportion tidy data. infer support testing two means via z distribution.","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% calculate(stat = \"z\") p_hat <- gss %>% observe(response = sex, success = \"female\", null = \"point\", p = .5, stat = \"z\") null_dist <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"z\") visualize(null_dist) + shade_p_value(obs_stat = p_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = p_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.252 prop_test(gss, college ~ NULL, p = .2) ## # A tibble: 1 × 4 ## statistic chisq_df p_value alternative ## ## 1 636. 
1 2.98e-140 two.sided"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-variables","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (2 level) variables","title":"Full infer Pipeline Examples","text":"infer package provides several statistics work data type. One statistic difference proportions. Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, infer also provides functionality calculate ratios proportions. workflow looks similar diff props. Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, addition, package provides functionality calculate odds ratios. workflow also looks similar diff props. 
Calculating observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) d_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"diff in props\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 1 r_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"ratio of props\", order = c(\"female\", \"male\")) r_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"ratio of props\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"ratio of props\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = r_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = r_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 1 or_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"odds ratio\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"odds ratio\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = or_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = or_hat, 
direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.984"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-variables-z","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (2 level) variables (z)","title":"Full infer Pipeline Examples","text":"Finding standardized observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. Calculating p-value null distribution observed statistic, Note similarities plot previous one. package also supplies wrapper around prop.test allow tests equality proportions tidy data.","code":"z_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) z_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"z\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) null_dist_theory <- gss %>% specify(college ~ sex, success = \"no degree\") %>% assume(\"z\") visualize(null_dist) + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = z_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## 
## 1 0.98 prop_test(gss, college ~ sex, order = c(\"female\", \"male\")) ## # A tibble: 1 × 6 ## statistic chisq_df p_value alternative lower_ci upper_ci ## ## 1 0.0000204 1 0.996 two.sided -0.0918 0.0834"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-2-level---gof","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical (>2 level) - GoF","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Note need add hypothesized values compute observed statistic. Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. 
Calculating p-value null distribution observed statistic, Alternatively, using chisq_test wrapper:","code":"Chisq_hat <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% calculate(stat = \"Chisq\") Chisq_hat <- gss %>% observe(response = finrela, null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6), stat = \"Chisq\") null_dist <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"Chisq\") null_dist_theory <- gss %>% specify(response = finrela) %>% assume(\"Chisq\") visualize(null_dist) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = Chisq_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0 chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 488. 
5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-chi-squared-test-of-independence","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (>2 level): Chi-squared test of independence","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. Calculating p-value null distribution observed statistic, Alternatively, using wrapper carry test,","code":"Chisq_hat <- gss %>% specify(formula = finrela ~ sex) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"Chisq\") Chisq_hat <- gss %>% observe(formula = finrela ~ sex, stat = \"Chisq\") null_dist <- gss %>% specify(finrela ~ sex) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"Chisq\") null_dist_theory <- gss %>% specify(finrela ~ sex) %>% assume(distribution = \"Chisq\") visualize(null_dist) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = Chisq_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.118 gss %>% chisq_test(formula = finrela ~ sex) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 9.11 5 
0.105"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-means","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (diff in means)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(age ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(age ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.46"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (t)","title":"Full infer Pipeline Examples","text":"Finding standardized observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use 
randomization-based null distribution. Calculating p-value null distribution observed statistic, Note similarities plot previous one.","code":"t_hat <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) t_hat <- gss %>% observe(age ~ college, stat = \"t\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) null_dist_theory <- gss %>% specify(age ~ college) %>% assume(\"t\") visualize(null_dist) + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = t_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.442"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-medians","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (diff in medians)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(age ~ college) %>% calculate(stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(age ~ college, stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% # alt: response = age, explanatory = college hypothesize(null = \"independence\") %>% 
generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.172"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-categorical-2-levels---anova","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical, one categorical (>2 levels) - ANOVA","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. 
Calculating p-value null distribution observed statistic,","code":"F_hat <- gss %>% specify(age ~ partyid) %>% calculate(stat = \"F\") F_hat <- gss %>% observe(age ~ partyid, stat = \"F\") null_dist <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"F\") null_dist_theory <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% assume(distribution = \"F\") visualize(null_dist) + shade_p_value(obs_stat = F_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = F_hat, direction = \"greater\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = F_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = F_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.045"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - SLR","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"slope_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"slope\") slope_hat <- gss %>% observe(hours ~ age, stat = \"slope\") null_dist <- gss %>% specify(hours ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"slope\") visualize(null_dist) + shade_p_value(obs_stat = slope_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = slope_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 
0.902"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---correlation","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - correlation","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"correlation_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"correlation\") correlation_hat <- gss %>% observe(hours ~ age, stat = \"correlation\") null_dist <- gss %>% specify(hours ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"correlation\") visualize(null_dist) + shade_p_value(obs_stat = correlation_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = correlation_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.878"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - SLR (t)","title":"Full infer Pipeline Examples","text":"currently implemented since \\(t\\) refer standardized slope standardized correlation.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"multiple-explanatory-variables","dir":"Articles","previous_headings":"Hypothesis tests","what":"Multiple explanatory variables","title":"Full infer Pipeline Examples","text":"Calculating observed fit, Generating distribution fits response variable permuted, Generating distribution fits explanatory variable permuted independently, Visualizing observed fit alongside null fits, Calculating p-values null distribution observed fit, Note fit()-based workflow can applied use cases 
differing numbers explanatory variables explanatory variable types.","code":"obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() null_dist <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_dist2 <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\", variables = c(age, college)) %>% fit() visualize(null_dist) + shade_p_value(obs_stat = obs_fit, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = obs_fit, direction = \"two-sided\") ## # A tibble: 3 × 2 ## term p_value ## ## 1 age 0.914 ## 2 collegedegree 0.266 ## 3 intercept 0.734"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-mean","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical (one mean)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note t distribution recentered rescaled lie scale observed data. 
infer support confidence intervals means via z distribution.","code":"x_bar <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") x_bar <- gss %>% observe(response = hours, stat = \"mean\") boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- get_ci(boot_dist, type = \"se\", point_estimate = x_bar) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") theor_ci <- get_ci(sampling_dist, point_estimate = x_bar) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-mean---standardized","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical (one mean - standardized)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (one mean) theory-based approach. 
Note infer support confidence intervals means via z distribution.","code":"t_hat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") t_hat <- gss %>% observe(response = hours, null = \"point\", mu = 40, stat = \"t\") boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"t\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = t_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-one-proportion-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One categorical (one proportion)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note z distribution recentered rescaled lie scale observed data. 
infer support confidence intervals means via z distribution.","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% calculate(stat = \"prop\") p_hat <- gss %>% observe(response = sex, success = \"female\", stat = \"prop\") boot_dist <- gss %>% specify(response = sex, success = \"female\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"prop\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = p_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(response = sex, success = \"female\") %>% assume(distribution = \"z\") theor_ci <- get_ci(sampling_dist, point_estimate = p_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 0.430 0.518 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-variable-standardized-proportion-z-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One categorical variable (standardized proportion \\(z\\))","title":"Full infer Pipeline Examples","text":"See subsection (one proportion) theory-based approach.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-means-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical variable, one categorical (2 levels) (diff in means)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval 
using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note t distribution recentered rescaled lie scale observed data. infer also provides functionality calculate ratios means. workflow looks similar diff means. Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"d_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(hours ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(hours ~ college) %>% assume(distribution = \"t\") theor_ci <- get_ci(sampling_dist, point_estimate = d_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -1.16 4.24 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci) d_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(hours ~ college, stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% 
calculate(stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-t-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical variable, one categorical (2 levels) (t)","title":"Full infer Pipeline Examples","text":"Finding standardized point estimate, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (diff means) theory-based approach. 
infer support confidence intervals means via z distribution.","code":"t_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) t_hat <- gss %>% observe(hours ~ college, stat = \"t\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = t_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-variables-diff-in-proportions","dir":"Articles","previous_headings":"Confidence intervals","what":"Two categorical variables (diff in proportions)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note z distribution recentered rescaled lie scale observed data.","code":"d_hat <- gss %>% specify(college ~ sex, success = \"degree\") %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) d_hat <- gss %>% observe(college ~ sex, success = \"degree\", stat = \"diff in props\", order = c(\"female\", \"male\")) boot_dist <- gss %>% specify(college ~ sex, success = \"degree\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"diff 
in props\", order = c(\"female\", \"male\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(college ~ sex, success = \"degree\") %>% assume(distribution = \"z\") theor_ci <- get_ci(sampling_dist, point_estimate = d_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.0794 0.0878 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-variables-z","dir":"Articles","previous_headings":"Confidence intervals","what":"Two categorical variables (z)","title":"Full infer Pipeline Examples","text":"Finding standardized point estimate, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (diff props) theory-based approach.","code":"z_hat <- gss %>% specify(college ~ sex, success = \"degree\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) z_hat <- gss %>% observe(college ~ sex, success = \"degree\", stat = \"z\", order = c(\"female\", \"male\")) boot_dist <- gss %>% specify(college ~ sex, success = \"degree\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = z_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = 
standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - SLR","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"slope_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"slope\") slope_hat <- gss %>% observe(hours ~ age, stat = \"slope\") boot_dist <- gss %>% specify(hours ~ age) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"slope\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = slope_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---correlation-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - correlation","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"correlation_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"correlation\") correlation_hat <- gss %>% observe(hours ~ age, stat = \"correlation\") boot_dist <- gss %>% specify(hours ~ age) %>% generate(reps = 1000, type = 
\"bootstrap\") %>% calculate(stat = \"correlation\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = correlation_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---t","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - t","title":"Full infer Pipeline Examples","text":"currently implemented since \\(t\\) refer standardized slope standardized correlation.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"multiple-explanatory-variables-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Multiple explanatory variables","title":"Full infer Pipeline Examples","text":"Calculating observed fit, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Note fit()-based workflow can applied use cases differing numbers explanatory variables explanatory variable types.","code":"obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() boot_dist <- gss %>% specify(hours ~ age + college) %>% generate(reps = 1000, type = \"bootstrap\") %>% fit() conf_ints <- get_confidence_interval( boot_dist, level = .95, point_estimate = obs_fit ) visualize(boot_dist) + shade_confidence_interval(endpoints = conf_ints)"},{"path":"https://infer.tidymodels.org/dev/articles/paired.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy inference for paired data","text":"vignette, ’ll walk conducting randomization-based paired test independence infer. Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. 
See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. data looks like : Two sets observations paired observation one column special correspondence connection exactly one observation . purposes vignette, ’ll simulate additional data variable natural pairing: suppose survey respondents provided number hours worked per week surveyed 5 years prior, encoded hours_previous. number hours worked per week particular respondent special correspondence number hours worked 5 years prior hours_previous respondent. ’d like test null hypothesis \"mean\" hours worked per week change sampled time five years prior. carry inference paired data infer, pre-compute difference paired values beginning analysis, use differences values interest. , pre-compute difference paired observations diff. distribution diff observed data looks like : looks distribution, respondents worked similar number hours worked per week 5 years prior, though seems like may slight decline number hours worked per week aggregate. (know true effect -.2 since ’ve simulated data.) calculate observed statistic paired setting way outside paired setting. Using specify() calculate(): observed statistic -0.202. Now, want compare statistic null distribution, generated assumption true difference actually zero, get sense likely us see observed difference truly change hours worked per week population. Tests paired data carried via null = \"paired independence\" argument hypothesize(). replicate, generate() carries type = \"permute\" null = \"paired independence\" : Randomly sampling vector signs (.e. 
-1 1), probability .5 either, length equal input data, Multiplying response variable vector signs, “flipping” observed values random subset value replicate get sense distribution looks like, observed statistic falls, can use visualize(): looks like observed mean -0.202 relatively unlikely truly change mean number hours worked per week time period. exactly, can calculate p-value: Thus, change mean number hours worked per week time period truly zero, approximation probability see test statistic extreme -0.202 approximately 0.028. can also generate bootstrap confidence interval mean paired difference using type = \"bootstrap\" generate(). , use pre-computed differences generating bootstrap resamples: Note , unlike null distribution test statistics generated earlier type = \"permute\", distribution centered observed_statistic. Calculating confidence interval: default, get_confidence_interval() constructs lower upper bounds taking observations \\((1 - .95) / 2\\) \\(1 - ((1-.95) / 2)\\)th percentiles. 
instead build confidence interval using standard error bootstrap distribution, can write: learn randomization-based inference paired observations, see relevant chapter Introduction Modern Statistics.","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, … set.seed(1) gss_paired <- gss %>% mutate( hours_previous = hours + 5 - rpois(nrow(.), 4.8), diff = hours - hours_previous ) gss_paired %>% select(hours, hours_previous, diff) ## # A tibble: 500 × 3 ## hours hours_previous diff ## ## 1 50 52 -2 ## 2 31 32 -1 ## 3 40 40 0 ## 4 40 37 3 ## 5 40 42 -2 ## 6 53 50 3 ## 7 32 28 4 ## 8 20 19 1 ## 9 40 40 0 ## 10 40 43 -3 ## # ℹ 490 more rows # calculate the observed statistic observed_statistic <- gss_paired %>% specify(response = diff) %>% calculate(stat = \"mean\") # generate the null distribution null_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"mean\") null_dist ## Response: diff (numeric) ## Null Hypothesis: paired independence ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 -0.146 ## 2 2 0.19 ## 3 3 0.042 ## 4 4 0.034 ## 5 5 -0.138 ## 6 6 -0.03 ## 7 7 0.174 ## 8 8 0.066 ## 9 9 0.01 ## 10 10 0.13 
## # ℹ 990 more rows # visualize the null distribution and test statistic null_dist %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? # calculate the p value from the test statistic and null distribution p_value <- null_dist %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value ## # A tibble: 1 × 1 ## p_value ## ## 1 0.028 # generate a bootstrap distribution boot_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") visualize(boot_dist) # calculate the confidence from the bootstrap distribution confidence_interval <- boot_dist %>% get_confidence_interval(level = .95) confidence_interval ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.390 -0.022 boot_dist %>% get_confidence_interval(type = \"se\", point_estimate = observed_statistic, level = .95) ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.383 -0.0210"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy t-Tests with infer","text":"vignette, ’ll walk conducting \\(t\\)-tests randomization-based analogue using infer. ’ll start 1-sample \\(t\\)-test, compares sample mean hypothesized true mean value. , ’ll discuss paired \\(t\\)-tests, special use case 1-sample \\(t\\)-tests, evaluate whether differences paired values (e.g. measure taken person experiment) differ 0. Finally, ’ll wrap 2-sample \\(t\\)-tests, testing difference means two populations using sample data drawn . Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. 
Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. data looks like :","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"sample-t-test","dir":"Articles","previous_headings":"","what":"1-Sample t-Test","title":"Tidy t-Tests with infer","text":"1-sample \\(t\\)-test can used test whether sample continuous data plausibly come population specified mean. example, ’ll test whether average American adult works 40 hours week using data gss. , make use hours variable, giving number hours respondents reported worked previous week. distribution hours observed data looks like : looks like respondents reported worked 40 hours, ’s quite bit variability. Let’s test whether evidence true mean number hours Americans work per week 40. infer’s randomization-based analogue 1-sample \\(t\\)-test 1-sample mean test. ’ll start showcasing test demonstrating carry theory-based \\(t\\)-test package. First, calculate observed statistic, can use specify() calculate(). observed statistic 41.382. 
Now, want compare statistic null distribution, generated assumption mean actually 40, get sense likely us see observed mean true number hours worked per week population really 40. can generate null distribution using bootstrap. bootstrap, replicate, sample size equal input sample size drawn (replacement) input sample data. allows us get sense much variability ’d expect see entire population can understand unlikely sample mean . get sense distributions look like, observed statistic falls, can use visualize(): looks like observed mean 41.382 relatively unlikely true mean actually 40 hours week. exactly, can calculate p-value: Thus, true mean number hours worked per week really 40, approximation probability see test statistic extreme 41.382 approximately 0.034. Analogously steps shown , package supplies wrapper function, t_test, carry 1-sample \\(t\\)-tests tidy data. Rather using randomization, wrappers carry theory-based \\(t\\)-test. syntax looks like : alternative approach t_test() wrapper calculate observed statistic infer pipeline supply pt function base R. Note pipeline calculate observed statistic includes call hypothesize() since \\(t\\) statistic requires hypothesized mean value. , juxtaposing \\(t\\) statistic associated distribution using pt function: Note resulting \\(t\\)-statistics two theory-based approaches .","code":"# calculate the observed statistic observed_statistic <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # generate the null distribution null_dist_1_sample <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") # visualize the null distribution and test statistic! 
null_dist_1_sample %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") # calculate the p value from the test statistic and null distribution p_value_1_sample <- null_dist_1_sample %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value_1_sample ## # A tibble: 1 × 1 ## p_value ## ## 1 0.034 t_test(gss, response = hours, mu = 40) ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 2.09 499 0.0376 two.sided 41.4 40.1 42.7 # calculate the observed statistic observed_statistic <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") %>% dplyr::pull() pt(unname(observed_statistic), df = nrow(gss) - 1, lower.tail = FALSE)*2 ## [1] 0.03756"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"sample-t-test-1","dir":"Articles","previous_headings":"","what":"2-Sample t-Test","title":"Tidy t-Tests with infer","text":"2-Sample \\(t\\)-tests evaluate difference mean values two populations using data randomly-sampled population approximately follows normal distribution. example, ’ll test Americans work number hours week regardless whether college degree using data gss. college hours variables allow us : looks like distributions centered near 40 hours week, distribution degree slightly right skewed. , note warning missing values—many respondents’ values missing. actually carrying hypothesis test, might look data collected; ’s possible whether value either columns missing related value . infer’s randomization-based analogue 2-sample \\(t\\)-test difference means test. ’ll start showcasing test demonstrating carry theory-based \\(t\\)-test package. one-sample test, calculate observed difference means, can use specify() calculate(). Note , line specify(hours ~ college), swapped syntax specify(response = hours, explanatory = college)! 
order argument calculate line gives order subtract mean values : case, ’re taking mean number hours worked degree minus mean number hours worked without degree; positive difference, , mean people degrees worked without degree. Now, want compare difference means null distribution, generated assumption number hours worked week relationship whether one college degree, get sense likely us see observed difference means really relationship two variables. can generate null distribution using permutation, , replicate, value degree status randomly reassigned (without replacement) new number hours worked per week sample order break association two. , note , lines specify(hours ~ college) chunk, used syntax specify(response = hours, explanatory = college) instead! get sense distributions look like, observed statistic falls, can use visualize(). looks like observed statistic 1.5384 unlikely truly relationship degree status number hours worked. exactly, can calculate p-value; theoretical p-values yet supported, ’ll use randomization-based null distribution calculate p-value. Thus, really relationship number hours worked week whether one college degree, probability see statistic extreme 1.5384 approximately 0.284. Note , similarly steps shown , package supplies wrapper function, t_test, carry 2-sample \\(t\\)-tests tidy data. syntax looks like : example, specified relationship syntax formula = hours ~ college; also written response = hours, explanatory = college. alternative approach t_test() wrapper calculate observed statistic infer pipeline supply pt function base R. can calculate statistic , switching stat = \"diff means\" argument stat = \"t\". Note pipeline calculate observed statistic includes hypothesize() since \\(t\\) statistic requires hypothesized mean value. 
, juxtaposing \\(t\\) statistic associated distribution using pt function: Note results two theory-based approaches nearly .","code":"# calculate the observed statistic observed_statistic <- gss %>% specify(hours ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) observed_statistic ## Response: hours (numeric) ## Explanatory: college (factor) ## # A tibble: 1 × 1 ## stat ## ## 1 1.54 # generate the null distribution with randomization null_dist_2_sample <- gss %>% specify(hours ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) # visualize the randomization-based null distribution and test statistic! null_dist_2_sample %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") # calculate the p value from the randomization-based null # distribution and the observed statistic p_value_2_sample <- null_dist_2_sample %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value_2_sample ## # A tibble: 1 × 1 ## p_value ## ## 1 0.284 t_test(x = gss, formula = hours ~ college, order = c(\"degree\", \"no degree\"), alternative = \"two-sided\") ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 1.12 366. 0.264 two.sided 1.54 -1.16 4.24 # calculate the observed statistic observed_statistic <- gss %>% specify(hours ~ college) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) %>% dplyr::pull() observed_statistic ## t ## 1.119 pt(unname(observed_statistic), df = nrow(gss) - 2, lower.tail = FALSE)*2 ## [1] 0.2635"},{"path":"https://infer.tidymodels.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Andrew Bray. Author. Chester Ismay. Author. Evgeni Chasnovski. Author. Simon Couch. Author, maintainer. Ben Baumer. 
Author. Mine Cetinkaya-Rundel. Author. Ted Laderas. Contributor. Nick Solomon. Contributor. Johanna Hardin. Contributor. Albert Y. Kim. Contributor. Neal Fultz. Contributor. Doug Friedman. Contributor. Richie Cotton. Contributor. Brian Fannin. Contributor.","code":""},{"path":"https://infer.tidymodels.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Couch et al., (2021). infer: R package tidyverse-friendly statistical inference. Journal Open Source Software, 6(65), 3661, https://doi.org/10.21105/joss.03661","code":"@Article{, title = {{infer}: An {R} package for tidyverse-friendly statistical inference}, author = {Simon P. Couch and Andrew P. Bray and Chester Ismay and Evgeni Chasnovski and Benjamin S. Baumer and Mine Çetinkaya-Rundel}, journal = {Journal of Open Source Software}, year = {2021}, volume = {6}, number = {65}, pages = {3661}, doi = {10.21105/joss.03661}, }"},{"path":"https://infer.tidymodels.org/dev/index.html","id":"infer-r-package-","dir":"","previous_headings":"","what":"Tidy Statistical Inference","title":"Tidy Statistical Inference","text":"objective package perform statistical inference using expressive statistical grammar coheres tidyverse design framework. package centered around 4 main verbs, supplemented many utilities visualize extract value outputs. specify() allows specify variable, relationship variables, ’re interested . hypothesize() allows declare null hypothesis. generate() allows generate data reflecting null hypothesis. calculate() allows calculate distribution statistics generated data form null distribution. learn principles underlying package design, see vignette(\"infer\"). 
’re interested learning randomization-based statistical inference generally, including applied examples package, recommend checking Statistical Inference Via Data Science: ModernDive R Tidyverse Introduction Modern Statistics.","code":""},{"path":"https://infer.tidymodels.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tidy Statistical Inference","text":"install current stable version infer CRAN: install developmental stable version infer, make sure install pak first. pkgdown website version infer.tidymodels.org.","code":"install.packages(\"infer\") # install.packages(\"pak\") pak::pak(\"tidymodels/infer\")"},{"path":"https://infer.tidymodels.org/dev/index.html","id":"contributing","dir":"","previous_headings":"","what":"Contributing","title":"Tidy Statistical Inference","text":"welcome others helping us make package user-friendly efficient possible. Please review contributing conduct guidelines. participating project agree abide terms. questions discussions tidymodels packages, modeling, machine learning, please post Posit Community. think encountered bug, please submit issue. Either way, learn create share reprex (minimal, reproducible example), clearly communicate code. Check details contributing guidelines tidymodels packages get help.","code":""},{"path":"https://infer.tidymodels.org/dev/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Tidy Statistical Inference","text":"examples pulled “Full infer Pipeline Examples” vignette, accessible calling vignette(\"observed_stat_examples\"). make use gss dataset supplied package, providing sample data General Social Survey. data looks like : example, ’ll run analysis variance age partyid, testing whether age respondent independent political party affiliation. 
Calculating observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, Note formula non-formula interfaces (.e. age ~ partyid vs. response = age, explanatory = partyid) work implemented inference procedures infer. Use whatever natural . modeling using functions like lm() glm(), though, recommend begin use formula y ~ x notation soon possible. resources available package vignettes! See vignette(\"observed_stat_examples\") examples like one , vignette(\"infer\") discussion underlying principles package design.","code":"# load in the dataset data(gss) # take a glimpse at it str(gss) ## tibble [500 × 11] (S3: tbl_df/tbl/data.frame) ## $ year : num [1:500] 2014 1994 1998 1996 1994 ... ## $ age : num [1:500] 36 34 24 42 31 32 48 36 30 33 ... ## $ sex : Factor w/ 2 levels \"male\",\"female\": 1 2 1 1 1 2 2 2 2 2 ... ## $ college: Factor w/ 2 levels \"no degree\",\"degree\": 2 1 2 1 2 1 1 2 2 1 ... ## $ partyid: Factor w/ 5 levels \"dem\",\"ind\",\"rep\",..: 2 3 2 2 3 3 1 2 3 1 ... ## $ hompop : num [1:500] 3 4 1 4 2 4 2 1 5 2 ... ## $ hours : num [1:500] 50 31 40 40 40 53 32 20 40 40 ... ## $ income : Ord.factor w/ 12 levels \"lt $1000\"<\"$1000 to 2999\"<..: 12 11 12 12 12 12 12 12 12 10 ... ## $ class : Factor w/ 6 levels \"lower class\",..: 3 2 2 2 3 3 2 3 3 2 ... ## $ finrela: Factor w/ 6 levels \"far below average\",..: 2 2 2 4 4 3 2 4 3 1 ... ## $ weight : num [1:500] 0.896 1.083 0.55 1.086 1.083 ... 
F_hat <- gss %>% specify(age ~ partyid) %>% calculate(stat = \"F\") null_dist <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"F\") visualize(null_dist) + shade_p_value(obs_stat = F_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = F_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.06"},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":null,"dir":"Reference","previous_headings":"","what":"Define a theoretical distribution — assume","title":"Define a theoretical distribution — assume","text":"function allows user define null distribution based theoretical methods. many infer pipelines, assume() can used place generate() calculate() create null distribution. Rather outputting data frame containing distribution test statistics calculated resamples observed data, assume() outputs abstract type object just containing distributional details supplied distribution df arguments. However, assume() output can passed visualize(), get_p_value(), get_confidence_interval() way simulation-based distributions can. define theoretical null distribution (use hypothesis testing), sure provide null hypothesis via hypothesize(). define theoretical sampling distribution (use confidence intervals), provide output specify(). 
Sampling distributions (implemented t z) lie scale data, recentered rescaled match corresponding stat given calculate() calculate observed statistic.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Define a theoretical distribution — assume","text":"","code":"assume(x, distribution, df = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Define a theoretical distribution — assume","text":"x output specify() hypothesize(), giving observed data, variable(s) interest, (optionally) null hypothesis. distribution distribution question, string. One \"F\", \"Chisq\", \"t\", \"z\". df Optional. degrees freedom parameter(s) distribution supplied, numeric vector. distribution = \"F\", length two (e.g. c(10, 3)). distribution = \"Chisq\" distribution = \"t\", length one. distribution = \"z\", argument required. package supply message supplied df argument different recognized values. See Details section information. ... Currently ignored.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Define a theoretical distribution — assume","text":"infer theoretical distribution can passed helpers like visualize(), get_p_value(), get_confidence_interval().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Define a theoretical distribution — assume","text":"Note assumption expressed , use theory-based inference, extends distributional assumptions: null distribution question parameters. Statistical inference infer, whether carried via simulation (.e. based pipelines using generate() calculate()) theory (.e. 
assume()), always involves condition observations independent . infer supports theoretical tests one two means via t distribution one two proportions via z. tests comparing two means, n1 group size one level explanatory variable, n2 level, infer recognize following degrees freedom (df) arguments: min(n1 - 1, n2 - 1) n1 + n2 - 2 \"parameter\" entry analogous stats::t.test() call \"parameter\" entry analogous stats::t.test() call var.equal = TRUE default, package use \"parameter\" entry analogous stats::t.test() call var.equal = FALSE (default).","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Define a theoretical distribution — assume","text":"","code":"# construct theoretical distributions --------------------------------- # F distribution # with the `partyid` explanatory variable gss %>% specify(age ~ partyid) %>% assume(distribution = \"F\") #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. #> An F distribution with 3 and 496 degrees of freedom. # Chi-squared goodness of fit distribution # on the `finrela` variable gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% assume(\"Chisq\") #> A Chi-squared distribution with 5 degrees of freedom. # Chi-squared test of independence # on the `finrela` and `sex` variables gss %>% specify(formula = finrela ~ sex) %>% assume(distribution = \"Chisq\") #> A Chi-squared distribution with 5 degrees of freedom. # T distribution gss %>% specify(age ~ college) %>% assume(\"t\") #> A T distribution with 423 degrees of freedom. # Z distribution gss %>% specify(response = sex, success = \"female\") %>% assume(\"z\") #> A Z distribution. 
if (FALSE) { # each of these distributions can be passed to infer helper # functions alongside observed statistics! # for example, a 1-sample t-test ------------------------------------- # calculate the observed statistic obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # construct a null distribution null_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # juxtapose them visually visualize(null_dist) + shade_p_value(obs_stat, direction = \"both\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"both\") # or, an F test ------------------------------------------------------ # calculate the observed statistic obs_stat <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"F\") # construct a null distribution null_dist <- gss %>% specify(age ~ partyid) %>% assume(distribution = \"F\") # juxtapose them visually (F tests are right-tailed) visualize(null_dist) + shade_p_value(obs_stat, direction = \"greater\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"greater\") }"},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate summary statistics — calculate","title":"Calculate summary statistics — calculate","text":"Given output specify() /hypothesize(), function return observed statistic specified stat argument. test statistics, Chisq, t, z, require null hypothesis. provided output generate(), function calculate supplied stat replicate. 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate summary statistics — calculate","text":"","code":"calculate( x, stat = c(\"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff in means\", \"diff in medians\", \"diff in props\", \"Chisq\", \"F\", \"slope\", \"correlation\", \"t\", \"z\", \"ratio of props\", \"odds ratio\", \"ratio of means\"), order = NULL, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate summary statistics — calculate","text":"x output generate() computation-based inference output hypothesize() piped theory-based inference. stat string giving type statistic calculate. Current options include \"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff means\", \"diff medians\", \"diff props\", \"Chisq\" (\"chisq\"), \"F\" (\"f\"), \"t\", \"z\", \"ratio props\", \"slope\", \"odds ratio\", \"ratio means\", \"correlation\". infer supports theoretical tests one two means via \"t\" distribution one two proportions via \"z\". order string vector specifying order levels explanatory variable ordered subtraction (division ratio-based statistics), order = c(\"first\", \"second\") means (\"first\" - \"second\"), analogue ratios. Needed inference difference means, medians, proportions, ratios, t, z statistics. ... pass options like na.rm = TRUE functions like mean(), sd(), etc. 
Can also used supply hypothesized null values \"t\" statistic additional arguments stats::chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate summary statistics — calculate","text":"tibble containing stat column calculated statistics.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"missing-levels-in-small-samples","dir":"Reference","previous_headings":"","what":"Missing levels in small samples","title":"Calculate summary statistics — calculate","text":"cases, bootstrapping small samples, generated bootstrap samples one level explanatory variable present. test statistics, calculated statistic cases NaN. package omit non-finite values visualizations (warning) raise error p-value calculations.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Calculate summary statistics — calculate","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 5 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. 
Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate summary statistics — calculate","text":"","code":"# calculate a null distribution of hours worked per week under # the null hypothesis that the mean is 40 gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 200, type = \"bootstrap\") %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 39.2 #> 2 2 39.4 #> 3 3 40.1 #> 4 4 39.6 #> 5 5 40.8 #> 6 6 39.9 #> 7 7 39.9 #> 8 8 40.8 #> 9 9 39.6 #> 10 10 41.0 #> # ℹ 190 more rows # calculate the corresponding observed statistic gss %>% specify(response = hours) %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # calculate a null distribution assuming independence between age # of respondent and whether they have a college degree gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") 
%>% generate(reps = 200, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> Explanatory: college (factor) #> Null Hypothesis: independence #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 -2.48 #> 2 2 -0.699 #> 3 3 -0.0113 #> 4 4 0.579 #> 5 5 0.553 #> 6 6 1.84 #> 7 7 -2.31 #> 8 8 -0.320 #> 9 9 -0.00250 #> 10 10 -1.78 #> # ℹ 190 more rows # calculate the corresponding observed statistic gss %>% specify(age ~ college) %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # some statistics require a null hypothesis gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy chi-squared test statistic — chisq_stat","title":"Tidy chi-squared test statistic — chisq_stat","text":"@description","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy chi-squared test statistic — chisq_stat","text":"","code":"chisq_stat(x, formula, response = NULL, explanatory = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy chi-squared test statistic — chisq_stat","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. 
explanatory variable name x serve explanatory variable. alternative using formula argument. ... Additional arguments chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tidy chi-squared test statistic — chisq_stat","text":"shortcut wrapper function get observed test statistic chisq test. Uses chisq.test(), applies continuity correction. function deprecated favor general observe().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy chi-squared test statistic — chisq_stat","text":"","code":"# chi-squared test statistic for test of independence # of college completion status depending on one's # self-identified income class chisq_stat(gss, college ~ finrela) #> Warning: The chisq_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. #> X-squared #> 30.68252 # chi-squared test statistic for a goodness of fit # test on whether self-identified income class # follows a uniform distribution chisq_stat(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) #> Warning: The chisq_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. 
#> X-squared #> 487.984"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy chi-squared test — chisq_test","title":"Tidy chi-squared test — chisq_test","text":"tidier version chisq.test() goodness fit tests tests independence.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy chi-squared test — chisq_test","text":"","code":"chisq_test(x, formula, response = NULL, explanatory = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy chi-squared test — chisq_test","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. ... 
Additional arguments chisq.test().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy chi-squared test — chisq_test","text":"","code":"# chi-squared test of independence for college completion # status depending on one's self-identified income class chisq_test(gss, college ~ finrela) #> Warning: Chi-squared approximation may be incorrect #> # A tibble: 1 × 3 #> statistic chisq_df p_value #> #> 1 30.7 5 0.0000108 # chi-squared goodness of fit test on whether self-identified # income class follows a uniform distribution chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) #> # A tibble: 1 × 3 #> statistic chisq_df p_value #> #> 1 488. 5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"Deprecated functions and objects — deprecated","title":"Deprecated functions and objects — deprecated","text":"functions objects longer used. removed future release infer.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Deprecated functions and objects — deprecated","text":"","code":"conf_int(x, level = 0.95, type = \"percentile\", point_estimate = NULL) p_value(x, obs_stat, direction)"},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Deprecated functions and objects — deprecated","text":"x See non-deprecated function. level See non-deprecated function. type See non-deprecated function. point_estimate See non-deprecated function. obs_stat See non-deprecated function. 
direction See non-deprecated function.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":null,"dir":"Reference","previous_headings":"","what":"Fit linear models to infer objects — fit.infer","title":"Fit linear models to infer objects — fit.infer","text":"Given output infer core function, function fit linear model using stats::glm() according formula data supplied earlier pipeline. passed output specify() hypothesize(), function fit one model. passed output generate(), fit model data resample, denoted replicate column. family fitted model depends type response variable. response numeric, fit() use family = \"gaussian\" (linear regression). response 2-level factor character, fit() use family = \"binomial\" (logistic regression). fit character factor response variables two levels, recommend parsnip::multinom_reg(). infer provides fit \"method\" infer objects, way carrying model fitting applied infer output. \"generic,\" imported generics package re-exported package, provides general form fit() points infer's method called infer object. generic also documented . Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fit linear models to infer objects — fit.infer","text":"","code":"# S3 method for infer fit(object, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fit linear models to infer objects — fit.infer","text":"object Output infer function---likely generate() specify()---specifies formula data fit model . ... optional arguments pass along model fitting function. 
See stats::glm() information.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fit linear models to infer objects — fit.infer","text":"tibble containing following columns: replicate: supplied input object previously passed generate(). number corresponding resample original data set model fitted . term: explanatory variable (intercept) question. estimate: model coefficient given resample (replicate) explanatory variable (term).","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fit linear models to infer objects — fit.infer","text":"Randomization-based statistical inference multiple explanatory variables requires careful consideration null hypothesis question implications permutation procedures. Inference partial regression coefficients via permutation method implemented generate() multiple explanatory variables, consistent meaning elsewhere package, subject additional distributional assumptions beyond required one explanatory variable. Namely, distribution response variable must similar distribution errors null hypothesis' specification fixed effect explanatory variables. (null hypothesis reflected variables argument generate(). default, explanatory variables treated fixed.) general rule thumb , large outliers distributions explanatory variables, distributional assumption satisfied; response variable permuted, (presumably outlying) value response longer paired outlier explanatory variable, causing outsize effect resulting slope coefficient explanatory variable. sophisticated methods outside scope package requiring fewer---less strict---distributional assumptions exist. overview, see \"Permutation tests univariate multivariate analysis variance regression\" (Marti J. 
Anderson, 2001), doi:10.1139/cjfas-58-3-626.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Fit linear models to infer objects — fit.infer","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 5 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fit linear models to infer objects — fit.infer","text":"","code":"# fit a linear model predicting number of hours worked per # week using respondent age and degree status. 
observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 43.4 #> 2 1 age -0.0457 #> 3 1 collegedegree -0.481 #> 4 2 intercept 41.2 #> 5 2 age 0.00565 #> 6 2 collegedegree -0.212 #> 7 3 intercept 40.3 #> 8 3 age 0.0314 #> 9 3 collegedegree -0.510 #> 10 4 intercept 40.5 #> # ℹ 290 more rows # for logistic regression, just supply a binary response variable! # (this can also be made explicit via the `family` argument in ...) gss %>% specify(college ~ age + hours) %>% fit() #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept -1.13 #> 2 age 0.00527 #> 3 hours 0.00698 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate resamples, permutations, or simulations — generate","title":"Generate resamples, permutations, or simulations — generate","text":"Generation creates simulated distribution specify(). context confidence intervals, bootstrap distribution based result specify(). context hypothesis testing, null distribution based result specify() hypothesize(). 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate resamples, permutations, or simulations — generate","text":"","code":"generate(x, reps = 1, type = NULL, variables = !!response_expr(x), ...)"},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate resamples, permutations, or simulations — generate","text":"x data frame can coerced tibble. reps number resamples generate. type method used generate resamples observed data reflecting null hypothesis. Currently one \"bootstrap\", \"permute\", \"draw\" (see ). variables type = \"permute\", set unquoted column names data permute (independently ). Defaults response variable. Note derived effects depend columns (e.g., interaction effects) also affected. ... Currently ignored.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate resamples, permutations, or simulations — generate","text":"tibble containing reps generated datasets, indicated replicate column.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"generation-types","dir":"Reference","previous_headings":"","what":"Generation Types","title":"Generate resamples, permutations, or simulations — generate","text":"type argument determines method used create null distribution. bootstrap: bootstrap sample drawn replicate, sample size equal input sample size drawn (replacement) input sample data. permute: replicate, input value randomly reassigned (without replacement) new output value sample. draw: value sampled theoretical distribution parameter p specified hypothesize() replicate. option currently applicable testing one proportion. 
generation type previously called \"simulate\", superseded.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Generate resamples, permutations, or simulations — generate","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 5 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate resamples, permutations, or simulations — generate","text":"","code":"# generate a null distribution by taking 200 bootstrap samples gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 
40) %>% generate(reps = 200, type = \"bootstrap\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 100,000 × 2 #> # Groups: replicate [200] #> replicate hours #> #> 1 1 48.6 #> 2 1 38.6 #> 3 1 38.6 #> 4 1 8.62 #> 5 1 38.6 #> 6 1 38.6 #> 7 1 18.6 #> 8 1 38.6 #> 9 1 38.6 #> 10 1 58.6 #> # ℹ 99,990 more rows # generate a null distribution for the independence of # two variables by permuting their values 200 times gss %>% specify(partyid ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 200, type = \"permute\") #> Dropping unused factor levels DK from the supplied response variable #> 'partyid'. #> Response: partyid (factor) #> Explanatory: age (numeric) #> Null Hypothesis: independence #> # A tibble: 100,000 × 3 #> # Groups: replicate [200] #> partyid age replicate #> #> 1 rep 36 1 #> 2 ind 34 1 #> 3 dem 24 1 #> 4 dem 42 1 #> 5 ind 31 1 #> 6 dem 32 1 #> 7 ind 48 1 #> 8 rep 36 1 #> 9 ind 30 1 #> 10 ind 33 1 #> # ℹ 99,990 more rows # generate a null distribution via sampling from a # binomial distribution 200 times gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 200, type = \"draw\") %>% calculate(stat = \"z\") #> Response: sex (factor) #> Null Hypothesis: point #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 0.537 #> 2 2 0.447 #> 3 3 -0.447 #> 4 4 -0.984 #> 5 5 1.70 #> 6 6 1.52 #> 7 7 0.0894 #> 8 8 -1.25 #> 9 9 -0.268 #> 10 10 -0.805 #> # ℹ 190 more rows # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute confidence interval — get_confidence_interval","title":"Compute confidence interval — get_confidence_interval","text":"Compute confidence interval around summary statistic. 
simulation-based theoretical methods supported, though type = \"se\" supported theoretical methods. Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute confidence interval — get_confidence_interval","text":"","code":"get_confidence_interval(x, level = 0.95, type = NULL, point_estimate = NULL) get_ci(x, level = 0.95, type = NULL, point_estimate = NULL)"},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute confidence interval — get_confidence_interval","text":"x distribution. simulation-based inference, data frame containing distribution calculate()d statistics fit()ted coefficient estimates. object passed generate() supplied calculate() fit(). theory-based inference, output assume(). Distributions confidence intervals require null hypothesis via hypothesize(). level numerical value 0 1 giving confidence level. Default value 0.95. type string giving method used creating confidence interval. default \"percentile\" \"se\" corresponding (multiplier * standard error) \"bias-corrected\" bias-corrected interval options. point_estimate data frame containing observed statistic (calculate()-based workflow) observed fit (fit()-based workflow). object likely output calculate() fit() need passed generate(). Set NULL default. Must provided type \"se\" \"bias-corrected\".","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute confidence interval — get_confidence_interval","text":"tibble containing following columns: term: explanatory variable (intercept) question. supplied input previously passed fit(). 
lower_ci, upper_ci: lower upper bounds confidence interval, respectively.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute confidence interval — get_confidence_interval","text":"null hypothesis not required compute confidence interval. However, including hypothesize() pipeline leading get_confidence_interval() will not break anything. can useful computing confidence interval using distribution used compute p-value. Theoretical confidence intervals (i.e. calculated supplying output assume() x argument) require point estimate lies scale data. distribution defined assume() recentered rescaled align point estimate, can shown output visualize() paired shade_confidence_interval(). Confidence intervals implemented following distributions point estimates: distribution = \"t\": point_estimate output calculate() stat = \"mean\" stat = \"diff in means\" distribution = \"z\": point_estimate output calculate() stat = \"prop\" stat = \"diff in props\"","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"aliases","dir":"Reference","previous_headings":"","what":"Aliases","title":"Compute confidence interval — get_confidence_interval","text":"get_ci() alias get_confidence_interval(). 
conf_int() deprecated alias get_confidence_interval().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute confidence interval — get_confidence_interval","text":"","code":"boot_dist <- gss %>% # We're interested in the number of hours worked per week specify(response = hours) %>% # Generate bootstrap samples generate(reps = 1000, type = \"bootstrap\") %>% # Calculate mean of each bootstrap sample calculate(stat = \"mean\") boot_dist %>% # Calculate the confidence interval around the point estimate get_confidence_interval( # At the 95% confidence level; percentile method level = 0.95 ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.2 42.7 # for type = \"se\" or type = \"bias-corrected\" we need a point estimate sample_mean <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") boot_dist %>% get_confidence_interval( point_estimate = sample_mean, # At the 95% confidence level level = 0.95, # Using the standard error method type = \"se\" ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.1 42.7 # using a theoretical distribution ----------------------------------- # define a sampling distribution sampling_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # get the confidence interval---note that the # point estimate is required here get_confidence_interval( sampling_dist, level = .95, point_estimate = sample_mean ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.1 42.7 # using a model fitting workflow ----------------------- # fit a linear model predicting number of hours worked per # week using respondent age and degree status. 
observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 44.2 #> 2 1 age -0.0765 #> 3 1 collegedegree 0.676 #> 4 2 intercept 41.5 #> 5 2 age -0.000968 #> 6 2 collegedegree -0.329 #> 7 3 intercept 41.4 #> 8 3 age 0.0131 #> 9 3 collegedegree -1.50 #> 10 4 intercept 42.0 #> # ℹ 290 more rows get_confidence_interval( null_fits, point_estimate = observed_fit, level = .95 ) #> # A tibble: 3 × 3 #> term lower_ci upper_ci #> #> 1 age -0.0846 0.0856 #> 2 collegedegree -2.10 2.81 #> 3 intercept 38.1 44.7 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute p-value — get_p_value","title":"Compute p-value — get_p_value","text":"Compute p-value null distribution observed statistic. 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute p-value — get_p_value","text":"","code":"get_p_value(x, obs_stat, direction) # S3 method for default get_p_value(x, obs_stat, direction) get_pvalue(x, obs_stat, direction) # S3 method for infer_dist get_p_value(x, obs_stat, direction)"},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute p-value — get_p_value","text":"x null distribution. simulation-based inference, data frame containing distribution calculate()d statistics fit()ted coefficient estimates. object passed generate() supplied calculate() fit(). theory-based inference, output assume(). obs_stat data frame containing observed statistic (calculate()-based workflow) observed fit (fit()-based workflow). object likely output calculate() fit() need passed generate(). direction character string. Options \"less\", \"greater\", \"two-sided\". Can also use \"left\", \"right\", \"both\", \"two_sided\", \"two sided\", \"two.sided\".","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute p-value — get_p_value","text":"tibble containing following columns: term: explanatory variable (intercept) question. supplied input previously passed fit(). p_value: value [0, 1] giving probability statistic/coefficient extreme observed statistic/coefficient occur null hypothesis true.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"aliases","dir":"Reference","previous_headings":"","what":"Aliases","title":"Compute p-value — get_p_value","text":"get_pvalue() alias get_p_value(). 
p_value deprecated alias get_p_value().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"zero-p-value","dir":"Reference","previous_headings":"","what":"Zero p-value","title":"Compute p-value — get_p_value","text":"Though true p-value 0 impossible, get_p_value() may return 0 cases. due simulation-based nature {infer} package; output function approximation based number reps chosen generate() step. observed statistic unlikely given null hypothesis, small number reps generated form null distribution, possible observed statistic extreme every test statistic generated form null distribution, resulting approximate p-value 0. case, true p-value small value likely less 3/reps (based poisson approximation). case p-value zero reported, warning message raised caution user reporting p-value exactly equal 0.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute p-value — get_p_value","text":"","code":"# using a simulation-based null distribution ------------------------------ # find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # starting with the gss dataset gss %>% # ...we're interested in the number of hours worked per week specify(response = hours) %>% # hypothesizing that the mean is 40 hypothesize(null = \"point\", mu = 40) %>% # generating data points for a null distribution generate(reps = 1000, type = \"bootstrap\") %>% # finding the null distribution calculate(stat = \"mean\") %>% get_p_value(obs_stat = point_estimate, direction = \"two-sided\") #> # A tibble: 1 × 1 #> p_value #> #> 1 0.032 # using a theoretical null distribution ----------------------------------- # calculate the observed statistic obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% 
calculate(stat = \"t\") # define a null distribution null_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"both\") #> # A tibble: 1 × 1 #> p_value #> #> 1 0.0376 # using a model fitting workflow ----------------------------------------- # fit a linear model predicting number of hours worked per # week using respondent age and degree status. observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 40.7 #> 2 1 age -0.00753 #> 3 1 collegedegree 2.78 #> 4 2 intercept 41.8 #> 5 2 age -0.000256 #> 6 2 collegedegree -1.08 #> 7 3 intercept 42.7 #> 8 3 age -0.0426 #> 9 3 collegedegree 1.23 #> 10 4 intercept 42.6 #> # ℹ 290 more rows get_p_value(null_fits, obs_stat = observed_fit, direction = \"two-sided\") #> # A tibble: 3 × 2 #> term p_value #> #> 1 age 0.92 #> 2 collegedegree 0.26 #> 3 intercept 0.68 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of data from the General Social Survey (GSS). — gss","title":"Subset of data from the General Social Survey (GSS). — gss","text":"General Social Survey high-quality survey gathers data American society opinions, conducted since 1972. 
data set sample 500 entries GSS, spanning years 1973-2018, including demographic markers economic variables. Note data included demonstration only, not assumed provide accurate estimates relating GSS. However, due high quality GSS, unweighted data approximate weighted data analyses.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of data from the General Social Survey (GSS). — gss","text":"","code":"gss"},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of data from the General Social Survey (GSS). — gss","text":"tibble 500 rows 11 variables: year year respondent surveyed age age time survey, truncated 89 sex respondent's sex (self-identified) college whether respondent college degree, including junior/community college partyid political party affiliation hompop number persons household hours number hours worked week survey, truncated 89 income total family income class subjective socioeconomic class identification finrela opinion family income weight survey weight","code":""},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of data from the General Social Survey (GSS). — gss","text":"https://gss.norc.org","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":null,"dir":"Reference","previous_headings":"","what":"Declare a null hypothesis — hypothesize","title":"Declare a null hypothesis — hypothesize","text":"Declare null hypothesis variables selected specify(). 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Declare a null hypothesis — hypothesize","text":"","code":"hypothesize(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL) hypothesise(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL)"},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Declare a null hypothesis — hypothesize","text":"x data frame can coerced tibble. null null hypothesis. Options include \"independence\", \"point\", \"paired independence\". independence: used response explanatory variable. Indicates values specified response variable independent associated values explanatory. point: used response variable. Indicates point estimate based values response associated parameter. Sometimes requires supplying one p, mu, med, sigma. paired independence: used response variable giving pre-computed difference paired observations. Indicates order subtraction paired values affect resulting distribution. p true proportion successes (number 0 1). used point null hypotheses specified response variable categorical. mu true mean (numerical value). used point null hypotheses specified response variable continuous. med true median (numerical value). used point null hypotheses specified response variable continuous. sigma true standard deviation (numerical value). 
used point null hypotheses.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Declare a null hypothesis — hypothesize","text":"tibble containing response (explanatory, specified) variable data parameter information stored well.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Declare a null hypothesis — hypothesize","text":"","code":"# hypothesize independence of two variables gss %>% specify(college ~ partyid, success = \"degree\") %>% hypothesize(null = \"independence\") #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. #> Response: college (factor) #> Explanatory: partyid (factor) #> Null Hypothesis: independence #> # A tibble: 500 × 2 #> college partyid #> #> 1 degree ind #> 2 no degree rep #> 3 degree ind #> 4 no degree ind #> 5 degree rep #> 6 no degree rep #> 7 no degree dem #> 8 degree ind #> 9 degree rep #> 10 no degree dem #> # ℹ 490 more rows # hypothesize a mean number of hours worked per week of 40 gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 500 × 1 #> hours #> #> 1 50 #> 2 31 #> 3 40 #> 4 40 #> 5 40 #> 6 53 #> 7 32 #> 8 20 #> 9 40 #> 10 40 #> # ℹ 490 more rows # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":null,"dir":"Reference","previous_headings":"","what":"infer: a grammar for statistical inference — infer","title":"infer: a grammar for statistical inference — infer","text":"objective package perform statistical inference using grammar illustrates underlying concepts format coheres 
tidyverse.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"infer: a grammar for statistical inference — infer","text":"overview use core functionality, see vignette(\"infer\")","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"infer: a grammar for statistical inference — infer","text":"Maintainer: Simon Couch simon.couch@posit.co (ORCID) Authors: Andrew Bray abray@reed.edu Chester Ismay chester.ismay@gmail.com (ORCID) Evgeni Chasnovski evgeni.chasnovski@gmail.com (ORCID) Ben Baumer ben.baumer@gmail.com (ORCID) Mine Cetinkaya-Rundel mine@stat.duke.edu (ORCID) contributors: Ted Laderas tedladeras@gmail.com (ORCID) [contributor] Nick Solomon nick.solomon@datacamp.com [contributor] Johanna Hardin Jo.Hardin@pomona.edu [contributor] Albert Y. Kim albert.ys.kim@gmail.com (ORCID) [contributor] Neal Fultz nfultz@gmail.com [contributor] Doug Friedman doug.nhp@gmail.com [contributor] Richie Cotton richie@datacamp.com (ORCID) [contributor] Brian Fannin captain@pirategrunt.com [contributor]","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate observed statistics — observe","title":"Calculate observed statistics — observe","text":"function wrapper calls specify(), hypothesize(), calculate() consecutively can used calculate observed statistics data. hypothesize() called point null hypothesis parameter supplied. 
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate observed statistics — observe","text":"","code":"observe( x, formula, response = NULL, explanatory = NULL, success = NULL, null = NULL, p = NULL, mu = NULL, med = NULL, sigma = NULL, stat = c(\"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff in means\", \"diff in medians\", \"diff in props\", \"Chisq\", \"F\", \"slope\", \"correlation\", \"t\", \"z\", \"ratio of props\", \"odds ratio\"), order = NULL, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate observed statistics — observe","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. success level response considered success, string. Needed inference one proportion, difference proportions, corresponding z stats. null null hypothesis. Options include \"independence\", \"point\", \"paired independence\". independence: used response explanatory variable. Indicates values specified response variable independent associated values explanatory. point: used response variable. Indicates point estimate based values response associated parameter. Sometimes requires supplying one p, mu, med, sigma. paired independence: used response variable giving pre-computed difference paired observations. Indicates order subtraction paired values affect resulting distribution. p true proportion successes (number 0 1). used point null hypotheses specified response variable categorical. mu true mean (numerical value). 
used point null hypotheses specified response variable continuous. med true median (numerical value). used point null hypotheses specified response variable continuous. sigma true standard deviation (numerical value). used point null hypotheses. stat string giving type statistic calculate. Current options include \"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff in means\", \"diff in medians\", \"diff in props\", \"Chisq\" (\"chisq\"), \"F\" (\"f\"), \"t\", \"z\", \"ratio of props\", \"slope\", \"odds ratio\", \"ratio of means\", \"correlation\". infer supports theoretical tests one two means via \"t\" distribution one two proportions via \"z\". order string vector specifying order levels explanatory variable ordered subtraction (division ratio-based statistics), order = c(\"first\", \"second\") means (\"first\" - \"second\"), analogue ratios. Needed inference difference means, medians, proportions, ratios, t, z statistics. ... pass options like na.rm = TRUE functions like mean(), sd(), etc. 
Can also used supply hypothesized null values \"t\" statistic additional arguments stats::chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate observed statistics — observe","text":"1-column tibble containing calculated statistic stat.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate observed statistics — observe","text":"","code":"# calculating the observed mean number of hours worked per week gss %>% observe(hours ~ NULL, stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # calculating a t statistic for hypothesized mu = 40 hours worked/week gss %>% observe(hours ~ NULL, stat = \"t\", null = \"point\", mu = 40) #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # similarly for a difference in means in age based on whether # the respondent has a college degree observe( gss, age ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\") ) #> Response: age (numeric) #> Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # equivalently, calculating the same statistic with the core verbs gss %>% specify(age ~ college) %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> 
Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # for a more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe — %>%","title":"Pipe — %>%","text":"Like {dplyr}, {infer} also uses pipe (%>%) function magrittr turn function composition series iterative statements.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/pipe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pipe — %>%","text":"lhs, rhs Inference functions initial data frame.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":null,"dir":"Reference","previous_headings":"","what":"Print methods — print.infer","title":"Print methods — print.infer","text":"Print methods","code":""},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print methods — print.infer","text":"","code":"# S3 method for infer print(x, ...) # S3 method for infer_layer print(x, ...) # S3 method for infer_dist print(x, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print methods — print.infer","text":"x object class infer, .e. output specify() hypothesize(), class infer_layer, .e. output shade_p_value() shade_confidence_interval(). ... 
Arguments passed methods.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy proportion test — prop_test","title":"Tidy proportion test — prop_test","text":"tidier version prop.test() equal given proportions.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy proportion test — prop_test","text":"","code":"prop_test( x, formula, response = NULL, explanatory = NULL, p = NULL, order = NULL, alternative = \"two-sided\", conf_int = TRUE, conf_level = 0.95, success = NULL, correct = NULL, z = FALSE, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy proportion test — prop_test","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. p numeric vector giving hypothesized null proportion success group. order string vector specifying order proportions subtracted, order = c(\"first\", \"second\") means \"first\" - \"second\". Ignored one-sample tests, optional two sample tests. alternative Character string giving direction alternative hypothesis. Options \"two-sided\" (default), \"greater\", \"less\". used testing null single proportion equals given value, two proportions equal; ignored otherwise. conf_int logical value whether include confidence interval . TRUE default. conf_level numeric value 0 1. Default value 0.95. success level response considered success, string. used testing null single proportion equals given value, two proportions equal; ignored otherwise. 
correct logical indicating whether Yates' continuity correction applied possible. z = TRUE, correct argument overwritten FALSE. Otherwise defaults correct = TRUE. z logical value whether report statistic standard normal deviate Pearson's chi-square statistic. \\(z^2\\) distributed chi-square 1 degree freedom, though note user likely need turn Yates' continuity correction setting correct = FALSE see connection. ... Additional arguments prop.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tidy proportion test — prop_test","text":"testing explanatory variable two levels, order argument used package longer well-defined. function thus raise warning ignore value supplied non-NULL order argument. columns present output depend output prop.test() broom::glance.htest(). See latter's documentation column definitions; columns renamed following mapping: chisq_df = parameter p_value = p.value lower_ci = conf.low upper_ci = conf.high","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy proportion test — prop_test","text":"","code":"# two-sample proportion test for difference in proportions of # college completion by respondent sex prop_test(gss, college ~ sex, order = c(\"female\", \"male\")) #> # A tibble: 1 × 6 #> statistic chisq_df p_value alternative lower_ci upper_ci #> #> 1 0.0000204 1 0.996 two.sided -0.0918 0.0834 # one-sample proportion test for hypothesized null # proportion of college completion of .2 prop_test(gss, college ~ NULL, p = .2) #> # A tibble: 1 × 4 #> statistic chisq_df p_value alternative #> #> 1 636. 
1 2.98e-140 two.sided # report as a z-statistic rather than chi-square # and specify the success level of the response prop_test(gss, college ~ NULL, success = \"degree\", p = .2, z = TRUE) #> # A tibble: 1 × 3 #> statistic p_value alternative #> #> 1 8.27 1.30e-16 two.sided"},{"path":"https://infer.tidymodels.org/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. generics fit ggplot2 ggplot_add","code":""},{"path":"https://infer.tidymodels.org/dev/reference/reexports.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Objects exported from other packages — reexports","text":"Read infer's fit function running ?fit.infer console.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":null,"dir":"Reference","previous_headings":"","what":"Perform repeated sampling — rep_sample_n","title":"Perform repeated sampling — rep_sample_n","text":"functions extend functionality dplyr::sample_n() dplyr::slice_sample() allowing repeated sampling data. operation especially helpful creating sampling distributions—see examples !","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Perform repeated sampling — rep_sample_n","text":"","code":"rep_sample_n(tbl, size, replace = FALSE, reps = 1, prob = NULL) rep_slice_sample( .data, n = NULL, prop = NULL, replace = FALSE, weight_by = NULL, reps = 1 )"},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Perform repeated sampling — rep_sample_n","text":"tbl, .data Data frame population sample. size, n, prop size n refer sample size sample. 
size argument rep_sample_n() required, rep_slice_sample() sample size defaults 1 specified. prop, argument rep_slice_sample(), refers proportion rows sample sample, rounded case prop * nrow(.data) integer. using rep_slice_sample(), please supply one n prop. replace samples taken replacement? reps Number samples take. prob, weight_by vector sampling weights rows .data—must length equal nrow(.data). weight_by, may also unquoted column name .data.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Perform repeated sampling — rep_sample_n","text":"tibble size reps * n rows corresponding reps samples size n .data, grouped replicate.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Perform repeated sampling — rep_sample_n","text":"rep_sample_n() rep_slice_sample() designed behave similar dplyr counterparts. , least following differences: case replace = FALSE size bigger number data rows rep_sample_n() give error. rep_slice_sample() n prop > 1 give warning output sample size set number rows data. 
Note dplyr::sample_n() function superseded dplyr::slice_sample().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/rep_sample_n.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Perform repeated sampling — rep_sample_n","text":"","code":"library(dplyr) #> #> Attaching package: ‘dplyr’ #> The following objects are masked from ‘package:stats’: #> #> filter, lag #> The following objects are masked from ‘package:base’: #> #> intersect, setdiff, setequal, union library(ggplot2) library(tibble) # take 1000 samples of size n = 50, without replacement slices <- gss %>% rep_slice_sample(n = 50, reps = 1000) slices #> # A tibble: 50,000 × 12 #> # Groups: replicate [1,000] #> replicate year age sex college partyid hompop hours income class #> #> 1 1 1994 34 female no degr… rep 4 31 $2000… work… #> 2 1 1976 21 female no degr… ind 2 40 $7000… midd… #> 3 1 1989 18 male no degr… rep 2 21 $2000… midd… #> 4 1 1996 32 female no degr… rep 4 53 $2500… midd… #> 5 1 1991 39 female no degr… dem 4 40 $2500… midd… #> 6 1 2010 57 male degree rep 3 60 $2500… midd… #> 7 1 2004 51 male degree rep 2 50 $2500… midd… #> 8 1 1998 35 male no degr… ind 6 45 $2500… midd… #> 9 1 1994 49 female no degr… ind 4 40 $2500… midd… #> 10 1 1985 51 female no degr… dem 4 28 $2500… midd… #> # ℹ 49,990 more rows #> # ℹ 2 more variables: finrela , weight # compute the proportion of respondents with a college # degree in each replicate p_hats <- slices %>% group_by(replicate) %>% summarize(prop_college = mean(college == \"degree\")) # plot sampling distribution ggplot(p_hats, aes(x = prop_college)) + geom_density() + labs( x = \"p_hat\", y = \"Number of samples\", title = \"Sampling distribution of p_hat\" ) # sampling with probability weights. 
Note probabilities are automatically # renormalized to sum to 1 df <- tibble( id = 1:5, letter = factor(c(\"a\", \"b\", \"c\", \"d\", \"e\")) ) rep_slice_sample(df, n = 2, reps = 5, weight_by = c(.5, .4, .3, .2, .1)) #> # A tibble: 10 × 3 #> # Groups: replicate [5] #> replicate id letter #> #> 1 1 3 c #> 2 1 5 e #> 3 2 5 e #> 4 2 3 c #> 5 3 1 a #> 6 3 3 c #> 7 4 1 a #> 8 4 2 b #> 9 5 1 a #> 10 5 4 d # alternatively, pass an unquoted column name in `.data` as `weight_by` df <- df %>% mutate(wts = c(.5, .4, .3, .2, .1)) rep_slice_sample(df, n = 2, reps = 5, weight_by = wts) #> # A tibble: 10 × 4 #> # Groups: replicate [5] #> replicate id letter wts #> #> 1 1 3 c 0.3 #> 2 1 1 a 0.5 #> 3 2 2 b 0.4 #> 4 2 1 a 0.5 #> 5 3 5 e 0.1 #> 6 3 3 c 0.3 #> 7 4 3 c 0.3 #> 8 4 1 a 0.5 #> 9 5 3 c 0.3 #> 10 5 4 d 0.2"},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":null,"dir":"Reference","previous_headings":"","what":"Add information about confidence interval — shade_confidence_interval","title":"Add information about confidence interval — shade_confidence_interval","text":"shade_confidence_interval() plots confidence interval region top visualize() output. output ggplot2 layer can added +. function shorter alias, shade_ci(). Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add information about confidence interval — shade_confidence_interval","text":"","code":"shade_confidence_interval( endpoints, color = \"mediumaquamarine\", fill = \"turquoise\", ... 
) shade_ci(endpoints, color = \"mediumaquamarine\", fill = \"turquoise\", ...)"},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add information about confidence interval — shade_confidence_interval","text":"endpoints lower upper bounds interval plotted. Likely, output get_confidence_interval(). calculate()-based workflows, 2-element vector 1 x 2 data frame containing lower upper values plotted. fit()-based workflows, (p + 1) x 3 data frame columns term, lower_ci, upper_ci, giving upper lower bounds regression term. use visualizations assume() output, must output get_confidence_interval(). color character hex string specifying color end points vertical lines plot. fill character hex string specifying color shade confidence interval. NULL shading actually done. ... arguments passed along ggplot2 functions.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add information about confidence interval — shade_confidence_interval","text":"added existing infer visualization, ggplot2 object displaying supplied intervals top corresponding distribution. 
Otherwise, infer_layer list.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/shade_confidence_interval.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add information about confidence interval — shade_confidence_interval","text":"","code":"# find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # ...and a bootstrap distribution boot_dist <- gss %>% # ...we're interested in the number of hours worked per week specify(response = hours) %>% # generating data points generate(reps = 1000, type = \"bootstrap\") %>% # finding the distribution from the generated data calculate(stat = \"mean\") # find a confidence interval around the point estimate ci <- boot_dist %>% get_confidence_interval(point_estimate = point_estimate, # at the 95% confidence level level = .95, # using the standard error method type = \"se\") # and plot it! boot_dist %>% visualize() + shade_confidence_interval(ci) # or just plot the bounds boot_dist %>% visualize() + shade_confidence_interval(ci, fill = NULL) # you can shade confidence intervals on top of # theoretical distributions, too---the theoretical # distribution will be recentered and rescaled to # align with the confidence interval sampling_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") visualize(sampling_dist) + shade_confidence_interval(ci) # \\donttest{ # to visualize distributions of coefficients for multiple # explanatory variables, use a `fit()`-based workflow # fit 1000 linear models with the `hours` variable permuted null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits #> # A tibble: 3,000 × 3 #> # Groups: replicate [1,000] #> replicate term estimate #> #> 1 1 intercept 40.8 #> 2 1 age 0.0153 #> 3 1 collegedegree -0.0626 #> 4 2 intercept 
40.3 #> 5 2 age 0.0278 #> 6 2 collegedegree -0.0655 #> 7 3 intercept 42.8 #> 8 3 age -0.0348 #> 9 3 collegedegree 0.0726 #> 10 4 intercept 40.7 #> # ℹ 2,990 more rows # fit a linear model to the observed data obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() obs_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # get confidence intervals for each term conf_ints <- get_confidence_interval( null_fits, point_estimate = obs_fit, level = .95 ) # visualize distributions of coefficients # generated under the null visualize(null_fits) # add a confidence interval shading layer to juxtapose # the null fits with the observed fit for each term visualize(null_fits) + shade_confidence_interval(conf_ints) # } # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":null,"dir":"Reference","previous_headings":"","what":"Shade histogram area beyond an observed statistic — shade_p_value","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"shade_p_value() plots p-value region top visualize() output. output ggplot2 layer can added +. function shorter alias, shade_pvalue(). Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"","code":"shade_p_value(obs_stat, direction, color = \"red2\", fill = \"pink\", ...) shade_pvalue(obs_stat, direction, color = \"red2\", fill = \"pink\", ...)"},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"obs_stat observed statistic estimate. 
calculate()-based workflows, 1-element numeric vector 1 x 1 data frame containing observed statistic. fit()-based workflows, (p + 1) x 2 data frame columns term estimate giving observed estimate term. direction string specifying direction shading occur. Options \"less\", \"greater\", \"two-sided\". Can also give \"left\", \"right\", \"\", \"two_sided\", \"two sided\", \"two.sided\". NULL, function shade area. color character hex string specifying color observed statistic vertical line plot. fill character hex string specifying color shade p-value region. NULL, function shade area. ... arguments passed along ggplot2 functions. expert use .","code":""},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"added existing infer visualization, ggplot2 object displaying supplied statistic top corresponding distribution. Otherwise, infer_layer list.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/shade_p_value.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Shade histogram area beyond an observed statistic — shade_p_value","text":"","code":"# find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # ...and a null distribution null_dist <- gss %>% # ...we're interested in the number of hours worked per week specify(response = hours) %>% # hypothesizing that the mean is 40 hypothesize(null = \"point\", mu = 40) %>% # generating data points for a null distribution generate(reps = 1000, type = \"bootstrap\") %>% # estimating the null distribution calculate(stat = \"t\") # shade the p-value of the point estimate null_dist %>% visualize() + shade_p_value(obs_stat = point_estimate, direction = \"two-sided\") #> 
Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? # you can shade confidence intervals on top of # theoretical distributions, too! null_dist_theory <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") null_dist_theory %>% visualize() + shade_p_value(obs_stat = point_estimate, direction = \"two-sided\") # \\donttest{ # to visualize distributions of coefficients for multiple # explanatory variables, use a `fit()`-based workflow # fit 1000 linear models with the `hours` variable permuted null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits #> # A tibble: 3,000 × 3 #> # Groups: replicate [1,000] #> replicate term estimate #> #> 1 1 intercept 42.3 #> 2 1 age -0.0191 #> 3 1 collegedegree -0.303 #> 4 2 intercept 37.2 #> 5 2 age 0.105 #> 6 2 collegedegree -0.0498 #> 7 3 intercept 40.3 #> 8 3 age 0.0240 #> 9 3 collegedegree 0.379 #> 10 4 intercept 41.0 #> # ℹ 2,990 more rows # fit a linear model to the observed data obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() obs_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # visualize distributions of coefficients # generated under the null visualize(null_fits) # add a p-value shading layer to juxtapose the null # fits with the observed fit for each term visualize(null_fits) + shade_p_value(obs_fit, direction = \"both\") #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? 
# the direction argument will be applied # to the plot for each term visualize(null_fits) + shade_p_value(obs_fit, direction = \"left\") #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? # } # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":null,"dir":"Reference","previous_headings":"","what":"Specify response and explanatory variables — specify","title":"Specify response and explanatory variables — specify","text":"specify() used specify columns supplied data frame relevant response (, applicable, explanatory) variables. Note character variables converted factors. Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Specify response and explanatory variables — specify","text":"","code":"specify(x, formula, response = NULL, explanatory = NULL, success = NULL)"},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Specify response and explanatory variables — specify","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. success level response considered success, string. 
Needed inference one proportion, difference proportions, corresponding z stats.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Specify response and explanatory variables — specify","text":"tibble containing response (explanatory, specified) variable data.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/specify.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Specify response and explanatory variables — specify","text":"","code":"# specifying for a point estimate on one variable gss %>% specify(response = age) #> Response: age (numeric) #> # A tibble: 500 × 1 #> age #> #> 1 36 #> 2 34 #> 3 24 #> 4 42 #> 5 31 #> 6 32 #> 7 48 #> 8 36 #> 9 30 #> 10 33 #> # ℹ 490 more rows # specify a relationship between variables as a formula... gss %>% specify(age ~ partyid) #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. #> Response: age (numeric) #> Explanatory: partyid (factor) #> # A tibble: 500 × 2 #> age partyid #> #> 1 36 ind #> 2 34 rep #> 3 24 ind #> 4 42 ind #> 5 31 rep #> 6 32 rep #> 7 48 dem #> 8 36 ind #> 9 30 rep #> 10 33 dem #> # ℹ 490 more rows # ...or with named arguments! gss %>% specify(response = age, explanatory = partyid) #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. 
#> Response: age (numeric) #> Explanatory: partyid (factor) #> # A tibble: 500 × 2 #> age partyid #> #> 1 36 ind #> 2 34 rep #> 3 24 ind #> 4 42 ind #> 5 31 rep #> 6 32 rep #> 7 48 dem #> 8 36 ind #> 9 30 rep #> 10 33 dem #> # ℹ 490 more rows # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/t_stat.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy t-test statistic — t_stat","title":"Tidy t-test statistic — t_stat","text":"shortcut wrapper function get observed test statistic t test. function deprecated favor general observe().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/t_stat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy t-test statistic — t_stat","text":"","code":"t_stat( x, formula, response = NULL, explanatory = NULL, order = NULL, alternative = \"two-sided\", mu = 0, conf_int = FALSE, conf_level = 0.95, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/t_stat.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy t-test statistic — t_stat","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. order string vector specifying order levels explanatory variable ordered subtraction, order = c(\"first\", \"second\") means (\"first\" - \"second\"). alternative Character string giving direction alternative hypothesis. Options \"two-sided\" (default), \"greater\", \"less\". mu numeric value giving hypothesized null mean value one sample test hypothesized difference two sample test. conf_int logical value whether include confidence interval . TRUE default. 
conf_level numeric value 0 1. Default value 0.95. ... Pass arguments infer functions.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/t_stat.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy t-test statistic — t_stat","text":"","code":"library(tidyr) # t test statistic for true mean number of hours worked # per week of 40 gss %>% t_stat(response = hours, mu = 40) #> Warning: The t_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. #> t #> 2.085191 # t test statistic for number of hours worked per week # by college degree status gss %>% tidyr::drop_na(college) %>% t_stat(formula = hours ~ college, order = c(\"degree\", \"no degree\"), alternative = \"two-sided\") #> Warning: The t_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. #> t #> 1.11931"},{"path":"https://infer.tidymodels.org/dev/reference/t_test.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy t-test — t_test","title":"Tidy t-test — t_test","text":"tidier version t.test() two sample tests.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/t_test.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy t-test — t_test","text":"","code":"t_test( x, formula, response = NULL, explanatory = NULL, order = NULL, alternative = \"two-sided\", mu = 0, conf_int = TRUE, conf_level = 0.95, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/t_test.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy t-test — t_test","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. 
alternative using formula argument. order string vector specifying order levels explanatory variable ordered subtraction, order = c(\"first\", \"second\") means (\"first\" - \"second\"). alternative Character string giving direction alternative hypothesis. Options \"two-sided\" (default), \"greater\", \"less\". mu numeric value giving hypothesized null mean value one sample test hypothesized difference two sample test. conf_int logical value whether include confidence interval . TRUE default. conf_level numeric value 0 1. Default value 0.95. ... passing arguments t.test().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/t_test.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy t-test — t_test","text":"","code":"library(tidyr) # t test for number of hours worked per week # by college degree status gss %>% tidyr::drop_na(college) %>% t_test(formula = hours ~ college, order = c(\"degree\", \"no degree\"), alternative = \"two-sided\") #> # A tibble: 1 × 7 #> statistic t_df p_value alternative estimate lower_ci upper_ci #> #> 1 1.12 366. 0.264 two.sided 1.54 -1.16 4.24 # see vignette(\"infer\") for more explanation of the # intuition behind the infer package, and vignette(\"t_test\") # for more examples of t-tests using infer"},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":null,"dir":"Reference","previous_headings":"","what":"Visualize statistical inference — visualize","title":"Visualize statistical inference — visualize","text":"Visualize distribution simulation-based inferential statistics theoretical distribution (!). Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Visualize statistical inference — visualize","text":"","code":"visualize(data, bins = 15, method = \"simulation\", dens_color = \"black\", ...) 
visualise(data, bins = 15, method = \"simulation\", dens_color = \"black\", ...)"},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Visualize statistical inference — visualize","text":"data distribution. simulation-based inference, data frame containing distribution calculate()d statistics fit()ted coefficient estimates. object passed generate() supplied calculate() fit(). theory-based inference, output assume(). bins number bins histogram. method string giving method display. Options \"simulation\", \"theoretical\", \"\" \"\" corresponding \"simulation\" \"theoretical\". data output assume(), argument ignored default \"theoretical\". dens_color character hex string specifying color theoretical density curve. ... Additional arguments passed along functions ggplot2. method = \"simulation\", stat_bin(), method = \"theoretical\", geom_path(). values may overwritten infer internally.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Visualize statistical inference — visualize","text":"calculate()-based workflows, ggplot showing simulation-based distribution histogram bar graph. Can also used display theoretical distributions. assume()-based workflows, ggplot showing theoretical distribution. fit()-based workflows, patchwork object showing simulation-based distributions histogram bar graph. interface adjust plot options themes bit different patchwork plots ggplot2 plots. 
examples highlight biggest differences , see patchwork::plot_annotation() patchwork::&.gg details.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Visualize statistical inference — visualize","text":"order make visualization workflow straightforward explicit, visualize() now used plot distributions statistics directly. number arguments related shading p-values confidence intervals now deprecated visualize() now passed shade_p_value() shade_confidence_interval(), respectively. visualize() raise warning deprecated arguments supplied.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/visualize.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Visualize statistical inference — visualize","text":"","code":"# generate a null distribution null_dist <- gss %>% # we're interested in the number of hours worked per week specify(response = hours) %>% # hypothesizing that the mean is 40 hypothesize(null = \"point\", mu = 40) %>% # generating data points for a null distribution generate(reps = 1000, type = \"bootstrap\") %>% # calculating a distribution of means calculate(stat = \"mean\") # or a bootstrap distribution, omitting the hypothesize() step, # for use in confidence intervals boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") # we can easily plot the null distribution by piping into visualize null_dist %>% visualize() # we can add layers to the plot as in ggplot, as well... 
# find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # find a confidence interval around the point estimate ci <- boot_dist %>% get_confidence_interval(point_estimate = point_estimate, # at the 95% confidence level level = .95, # using the standard error method type = \"se\") # display a shading of the area beyond the p-value on the plot null_dist %>% visualize() + shade_p_value(obs_stat = point_estimate, direction = \"two-sided\") #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? # ...or within the bounds of the confidence interval null_dist %>% visualize() + shade_confidence_interval(ci) # plot a theoretical sampling distribution by creating # a theory-based distribution with `assume()` sampling_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") visualize(sampling_dist) # you can shade confidence intervals on top of # theoretical distributions, too---the theoretical # distribution will be recentered and rescaled to # align with the confidence interval visualize(sampling_dist) + shade_confidence_interval(ci) # to plot both a theory-based and simulation-based null distribution, # use a theorized statistic (i.e. one of t, z, F, or Chisq) # and supply the simulation-based null distribution null_dist_t <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"t\") obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") visualize(null_dist_t, method = \"both\") #> Warning: Check to make sure the conditions have been met for the theoretical #> method. infer currently does not check these for you. 
visualize(null_dist_t, method = \"both\") + shade_p_value(obs_stat, \"both\") #> Warning: Check to make sure the conditions have been met for the theoretical #> method. infer currently does not check these for you. #> Warning: All aesthetics have length 1, but the data has 1000 rows. #> ℹ Did you mean to use `annotate()`? # \\donttest{ # to visualize distributions of coefficients for multiple # explanatory variables, use a `fit()`-based workflow # fit 1000 models with the `hours` variable permuted null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits #> # A tibble: 3,000 × 3 #> # Groups: replicate [1,000] #> replicate term estimate #> #> 1 1 intercept 39.5 #> 2 1 age 0.0515 #> 3 1 collegedegree -0.687 #> 4 2 intercept 40.5 #> 5 2 age 0.0209 #> 6 2 collegedegree -0.0149 #> 7 3 intercept 39.8 #> 8 3 age 0.0305 #> 9 3 collegedegree 1.16 #> 10 4 intercept 39.9 #> # ℹ 2,990 more rows # visualize distributions of resulting coefficients visualize(null_fits) # the interface to add themes and other elements to patchwork # plots (outputted by `visualize` when the inputted data # is from the `fit()` function) is a bit different than adding # them to ggplot2 plots. library(ggplot2) # to add a ggplot2 theme to a `calculate()`-based visualization, use `+` null_dist %>% visualize() + theme_dark() # to add a ggplot2 theme to a `fit()`-based visualization, use `&` null_fits %>% visualize() & theme_dark() # } # More in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v106","dir":"Changelog","previous_headings":"","what":"infer v1.0.6","title":"infer v1.0.6","text":"CRAN release: 2024-01-31 Updated infrastructure errors, warnings, messages (#513). changes visible users, though: Many longer error messages now broken several lines. 
references help-files, users can now click error message’s text navigate cited documentation. Various improvements documentation (#501, #504, #508, #512). Fixed bug get_confidence_interval() error uninformatively supplied distribution estimates contained missing values. function now warn return confidence interval calculated using non-missing estimates (#521). Fixed bug generate() used without first specify()ing variables, even cases specification affect resampling/simulation (#448).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v105","dir":"Changelog","previous_headings":"","what":"infer v1.0.5","title":"infer v1.0.5","text":"CRAN release: 2023-09-06 Implemented support permutation hypothesis tests paired data via argument value null = \"paired independence\" hypothesize() (#487). weight_by argument rep_slice_sample() can now passed either vector numeric weights unquoted column name .data (#480). Newly accommodates variables spaces names wrapper functions t_test() prop_test() (#472). Fixed bug two-sample prop_test() response explanatory variable passed place prop.test(). enables using prop_test() explanatory variables greater 2 levels , process, addresses bug prop_test() collapsed levels success response variable 2 levels.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v104","dir":"Changelog","previous_headings":"","what":"infer v1.0.4","title":"infer v1.0.4","text":"CRAN release: 2022-12-01 Fixed bug p-value shading shaded regions longer correctly overlaid histogram bars. 
Addressed deprecation warning ahead upcoming dplyr release.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v103","dir":"Changelog","previous_headings":"","what":"infer v1.0.3","title":"infer v1.0.3","text":"CRAN release: 2022-08-22 Fix R-devel HTML5 NOTEs.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v102","dir":"Changelog","previous_headings":"","what":"infer v1.0.2","title":"infer v1.0.2","text":"CRAN release: 2022-06-10 Fix p-value shading calculated statistic falls exactly boundaries histogram bin (#424). Fix generate() errors columns named x (#431). Fix error visualize passed generate()d infer_dist objects passed hypothesize() (#432). Update visual checks visualize output align R 4.1.0+ graphics engine (#438). specify() wrapper functions now appropriately handle ordered factors (#439). Clarify error incompatible statistics hypotheses supplied (#441). Updated generate() unexpected type warnings permissive—warning raised less often type = \"bootstrap\" (#425). Allow passing additional arguments stats::chisq.test via ... calculate(). Ellipses now always passed applicable base R hypothesis testing function, applicable (#414)! package now set levels logical variables conversion factor first level (regarded success default) TRUE. Core verbs warned without explicit success value already, change makes behavior consistent functions wrapped shorthand test wrappers (#440). Added new statistic stat = \"ratio means\" (#452).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-v101-github-only","dir":"Changelog","previous_headings":"","what":"infer v1.0.1 (GitHub Only)","title":"infer v1.0.1 (GitHub Only)","text":"release reflects infer version accepted Journal Open Source Software. Re-licensed package CC0 MIT. See LICENSE LICENSE.md files. Contributed paper Journal Open Source Software, draft available /figs/paper. 
Various improvements documentation (#417, #418).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-100","dir":"Changelog","previous_headings":"","what":"infer 1.0.0","title":"infer 1.0.0","text":"CRAN release: 2021-08-13 v1.0.0 first major release {infer} package! large, core verbs specify(), hypothesize(), generate(), calculate() interface . release makes several improvements behavioral consistency package introduces support theory-based inference well randomization-based inference multiple explanatory variables.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"behavioral-consistency-1-0-0","dir":"Changelog","previous_headings":"","what":"Behavioral consistency","title":"infer 1.0.0","text":"major change package release set standards behavioral consistency calculate() (#356). Namely, package now supply consistent error supplied stat argument isn’t well-defined variables specify()d supply consistent message user supplies unneeded information via hypothesize() calculate() observed statistic supply consistent warning assume reasonable null value user supply sufficient information calculate observed statistic accommodate behavior, number new calculate methods added improved. Namely: Implemented standardized proportion z statistic one categorical variable Extended calculate() stat = \"t\" passing mu calculate() method stat = \"t\" allow calculation t statistics one numeric variable hypothesized mean Extended calculate() allow lowercase aliases stat arguments (#373). Fixed bugs calculate() allow programmatic calculation statistics behavioral consistency also allowed implementation observe(), wrapper function around specify(), hypothesize(), calculate(), calculate observed statistics. 
function provides shorthand alternative calculating observed statistics data: don’t anticipate changes “breaking” sense code previously worked continue , though may now message warn way used error different (hopefully informative) message.","code":"gss %>% specify(response = hours) %>% calculate(stat = \"diff in means\") #> Error: A difference in means is not well-defined for a #> numeric response variable (hours) and no explanatory variable. gss %>% specify(college ~ partyid, success = \"degree\") %>% calculate(stat = \"diff in props\") #> Error: A difference in proportions is not well-defined for a dichotomous categorical #> response variable (college) and a multinomial categorical explanatory variable (partyid). # supply mu = 40 when it's not needed gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"mean\") #> Message: The point null hypothesis `mu = 40` does not inform calculation of #> the observed statistic (a mean) and will be ignored. #> # A tibble: 1 x 1 #> stat #> #> 1 41.4 # don't hypothesize `p` when it's needed gss %>% specify(response = sex, success = \"female\") %>% calculate(stat = \"z\") #> # A tibble: 1 x 1 #> stat #> #> 1 -1.16 #> Warning message: #> A z statistic requires a null hypothesis to calculate the observed statistic. #> Output assumes the following null value: `p = .5`. # don't hypothesize `p` when it's needed gss %>% specify(response = partyid) %>% calculate(stat = \"Chisq\") #> # A tibble: 1 x 1 #> stat #> #> 1 334. #> Warning message: #> A chi-square statistic requires a null hypothesis to calculate the observed statistic. #> Output assumes the following null values: `p = c(dem = 0.2, ind = 0.2, rep = 0.2, other = 0.2, DK = 0.2)`. 
# calculating the observed mean number of hours worked per week gss %>% observe(hours ~ NULL, stat = \"mean\") #> # A tibble: 1 x 1 #> stat #> #> 1 41.4 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% calculate(stat = \"mean\") #> # A tibble: 1 x 1 #> stat #> #> 1 41.4 # calculating a t statistic for hypothesized mu = 40 hours worked/week gss %>% observe(hours ~ NULL, stat = \"t\", null = \"point\", mu = 40) #> # A tibble: 1 x 1 #> stat #> #> 1 2.09 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") #> # A tibble: 1 x 1 #> stat #> #> 1 2.09"},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"a-framework-for-theoretical-inference-1-0-0","dir":"Changelog","previous_headings":"","what":"A framework for theoretical inference","title":"infer 1.0.0","text":"release also introduces complete principled interface theoretical inference. package previously supplied methods visualization theory-based curves, interface provide object explicitly “null distribution” supplied helper functions like get_p_value() get_confidence_interval(). new interface based new verb, assume(), returns null distribution can interfaced way simulation-based null distributions can interfaced . example, ’ll work full infer pipeline inference mean using infer’s gss dataset. Supposed believe true mean number hours worked Americans past week 40. First, calculating observed t-statistic: code define null distribution similar required calculate theorized observed statistic, switching calculate() assume() replacing arguments needed. null distribution can now interfaced way simulation-based null distribution elsewhere package. 
example, calculating p-value juxtaposing observed statistic null distribution: …visualizing null distribution alone: …juxtaposing two visually: Confidence intervals lie data space rather standardized scale theoretical distributions. Calculating mean rather standardized t-statistic: null distribution just defines spread standard error calculation. Visualizing confidence interval results theoretical distribution recentered rescaled align scale observed data: Previous methods interfacing theoretical distributions superseded—continue supported, though documentation forefront assume() interface.","code":"obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") obs_stat #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 x 1 #> stat #> #> 1 2.09 null_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") null_dist #> A T distribution with 499 degrees of freedom. get_p_value(null_dist, obs_stat, direction = \"both\") #> # A tibble: 1 x 1 #> p_value #> #> 1 0.0376 visualize(null_dist) visualize(null_dist) + shade_p_value(obs_stat, direction = \"both\") obs_mean <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") ci <- get_confidence_interval( null_dist, level = .95, point_estimate = obs_mean ) ci #> # A tibble: 1 x 2 #> lower_ci upper_ci #> #> 1 40.1 42.7 visualize(null_dist) + shade_confidence_interval(ci)"},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"support-for-multiple-regression-1-0-0","dir":"Changelog","previous_headings":"","what":"Support for multiple regression","title":"infer 1.0.0","text":"2016 “Guidelines Assessment Instruction Statistics Education” [1] state , introductory statistics courses, “[s]tudents gain experience statistical models, including multivariable models, used.” line recommendation, introduce support randomization-based inference multiple explanatory variables via new fit.infer core verb. 
passed infer object, method parse formula formula response explanatory arguments, pass data stats::glm call. Note function returns model coefficients estimate rather associated t-statistics stat. passed generate()d object, model fitted replicate. type = \"permute\", set unquoted column names data permute (independently ) can passed via variables argument generate. defaults response variable. feature allows detailed exploration effect disrupting correlation structure among explanatory variables outputted model coefficients. auxiliary functions get_p_value(), get_confidence_interval(), visualize(), shade_p_value(), shade_confidence_interval() methods handle fit() output! See help-files example usage. Note shade_* functions now delay evaluation added existing ggplot (e.g. outputted visualize()) +.","code":"gss %>% specify(hours ~ age + college) %>% fit() #> # A tibble: 3 x 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() #> # A tibble: 300 x 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 44.4 #> 2 1 age -0.0767 #> 3 1 collegedegree 0.121 #> 4 2 intercept 41.8 #> 5 2 age 0.00344 #> 6 2 collegedegree -1.59 #> 7 3 intercept 38.3 #> 8 3 age 0.0761 #> 9 3 collegedegree 0.136 #> 10 4 intercept 43.1 #> # … with 290 more rows gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\", variables = c(age, college)) %>% fit() #> # A tibble: 300 x 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 39.4 #> 2 1 age 0.0748 #> 3 1 collegedegree -2.98 #> 4 2 intercept 42.8 #> 5 2 age -0.0190 #> 6 2 collegedegree -1.83 #> 7 3 intercept 40.4 #> 8 3 age 0.0354 #> 9 3 collegedegree -1.31 #> 10 4 intercept 40.9 #> # … with 290 more 
rows"},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"improvements-1-0-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"infer 1.0.0","text":"Following extensive discussion, generate() type type = \"simulate\" renamed evocative type = \"draw\". continue support type = \"simulate\" indefinitely, though supplying argument now prompt message notifying user preferred alias. (#233, #390) Fixed several bugs related factors unused levels. specify() now drop unused factor levels message done . (#374, #375, #397, #380) Added two.sided acceptable alias two_sided direction argument get_p_value() shade_p_value(). (#355) Various improvements documentation, including extending example sections help-files, re-organizing function reference {pkgdown} site, linking extensively among help-files.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"breaking-changes-1-0-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"infer 1.0.0","text":"don’t anticipate changes made release “breaking” sense code previously worked continue , though may now message warn way used error different (hopefully informative) message. currently teach research infer, recommend re-running materials noting changes messaging warning. Move forward number planned deprecations. Namely, GENERATION_TYPES object now fully deprecated, arguments relocated visualize() shade_p_value() shade_confidence_interval() now fully deprecated visualize(). supplied deprecated argument, visualize() warn user ignore argument. Added prop argument rep_slice_sample() alternative n argument specifying proportion rows supplied data sample per replicate (#361, #362, #363). changes order arguments rep_slice_sample() (order aligned dplyr::slice_sample()) might break code didn’t use named arguments (like rep_slice_sample(df, 5, TRUE)). 
fix , use named arguments (like rep_slice_sample(df, 5, replicate = TRUE)).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"other-1-0-0","dir":"Changelog","previous_headings":"","what":"Other","title":"infer 1.0.0","text":"Added Simon P. Couch author. Long deserved reliable maintenance improvements package. [1]: GAISE College Report ASA Revision Committee, “Guidelines Assessment Instruction Statistics Education College Report 2016,” http://www.amstat.org/education/gaise.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-054","dir":"Changelog","previous_headings":"","what":"infer 0.5.4","title":"infer 0.5.4","text":"CRAN release: 2021-01-13 rep_sample_n() longer errors supplied prob argument (#279) Added rep_slice_sample(), light wrapper around rep_sample_n(), closely resembles dplyr::slice_sample() (function supersedes dplyr::sample_n()) (#325) Added success, correct, z argument prop_test() (#343, #347, #353) Implemented observed statistic calculation standardized proportion z statistic (#351, #353) Various bug fixes improvements documentation errors.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-053","dir":"Changelog","previous_headings":"","what":"infer 0.5.3","title":"infer 0.5.3","text":"CRAN release: 2020-07-14","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"breaking-changes-0-5-3","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"infer 0.5.3","text":"get_confidence_interval() now uses column names (‘lower_ci’ ‘upper_ci’) output consistent infer functionality (#317).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"new-functionality-0-5-3","dir":"Changelog","previous_headings":"","what":"New functionality","title":"infer 0.5.3","text":"get_confidence_interval() can now produce bias-corrected confidence intervals setting type = \"bias-corrected\". 
Thanks @davidbaniadam initial implementation (#237, #318)!","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"other-0-5-3","dir":"Changelog","previous_headings":"","what":"Other","title":"infer 0.5.3","text":"Fix CRAN check failures related long double errors.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-052","dir":"Changelog","previous_headings":"","what":"infer 0.5.2","title":"infer 0.5.2","text":"CRAN release: 2020-06-14 Warn user p-value 0 reported (#257, #273) Added new vignettes: chi_squared anova (#268) Updates documentation existing vignettes (#268) Add alias hypothesize() (hypothesise()) (#271) Subtraction order longer required difference-based tests–warning raised case user doesn’t supply order argument (#275, #281) Add new messages common errors (#277) Increase coverage theoretical methods documentation (#278, #280) Drop missing values reduce size gss dataset used examples (#282) Add stat = \"ratio props\" stat = \"odds ratio\" calculate (#285) Add prop_test(), tidy interface prop.test() (#284, #287) Updates visualize() compatibility ggplot2 v3.3.0 (#289) Fix error bootstrapping small samples raise warnings/errors appropriate (#239, #244, #291) Fix unit test failures resulting breaking changes dplyr v1.0.0 Fix error generate() response variable named x (#299) Add two-sided two sided aliases two_sided direction argument get_p_value() shade_p_value() (#302) Fix t_test() t_stat() ignoring order argument (#310)","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-051","dir":"Changelog","previous_headings":"","what":"infer 0.5.1","title":"infer 0.5.1","text":"CRAN release: 2019-11-19 Updates documentation tweaks","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-050","dir":"Changelog","previous_headings":"","what":"infer 0.5.0","title":"infer 0.5.0","text":"CRAN release: 
2019-09-27","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"breaking-changes-0-5-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"infer 0.5.0","text":"shade_confidence_interval() now plots vertical lines starting zero (previously - bottom plot) (#234). shade_p_value() now uses “area curve” approach shading (#229).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"other-0-5-0","dir":"Changelog","previous_headings":"","what":"Other","title":"infer 0.5.0","text":"Updated chisq_test() take arguments response/explanatory format, perform goodness fit tests, default approximation approach (#241). Updated chisq_stat() goodness fit (#241). Make interface hypothesize() clearer adding options point null parameters function signature (#242). Manage infer class systematically (#219). Use vdiffr plot testing (#221).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-041","dir":"Changelog","previous_headings":"","what":"infer 0.4.1","title":"infer 0.4.1","text":"Added Evgeni Chasnovski author incredible work refactoring package providing excellent support.","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-040","dir":"Changelog","previous_headings":"","what":"infer 0.4.0","title":"infer 0.4.0","text":"CRAN release: 2018-11-15","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"breaking-changes-0-4-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"infer 0.4.0","text":"Changed method computing two-sided p-value conventional one. also makes get_pvalue() visualize() aligned (#205).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"deprecation-changes-0-4-0","dir":"Changelog","previous_headings":"","what":"Deprecation changes","title":"infer 0.4.0","text":"Deprecated p_value() (use get_p_value() instead) (#180). 
Deprecated conf_int() (use get_confidence_interval() instead) (#180). Deprecated (via warnings) plotting p-value confidence interval visualize() (use new functions shade_p_value() shade_confidence_interval() instead) (#178).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"new-functions-0-4-0","dir":"Changelog","previous_headings":"","what":"New functions","title":"infer 0.4.0","text":"shade_p_value() - {ggplot2}-like layer function add information p-value region visualize() output. alias shade_pvalue(). shade_confidence_interval() - {ggplot2}-like layer function add information confidence interval region visualize() output. alias shade_ci().","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"other-0-4-0","dir":"Changelog","previous_headings":"","what":"Other","title":"infer 0.4.0","text":"Account NULL value left hand side formula specify() (#156) type generate() (#157). Update documentation code follow tidyverse style guide (#159). Remove help page internal set_params() (#165). Fully use {tibble} (#166). Fix calculate() depend order p type = \"simulate\" (#122). Reduce code duplication (#173). Make transparency visualize() depend method data volume. Make visualize() work “One sample t” theoretical type method = \"\". Add stat = \"sum\" stat = \"count\" options calculate() (#50).","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-031","dir":"Changelog","previous_headings":"","what":"infer 0.3.1","title":"infer 0.3.1","text":"CRAN release: 2018-08-06 Stop using package {assertive} favor custom type checks (#149) Fixed t_stat() use ... 
var.equal works help @echasnovski, fixed var.equal = TRUE specify() %>% calculate(stat = \"t\") Use custom functions error, warning, message, paste() handling (#155)","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-030","dir":"Changelog","previous_headings":"","what":"infer 0.3.0","title":"infer 0.3.0","text":"CRAN release: 2018-07-11 Added conf_int logical argument conf_level argument t_test() Switched shade_color argument visualize() pvalue_fill instead since fill color confidence intervals also added now Green default color CI red p-values direction = \"\" get green shading Currently working simulation-based methods get_ci() get_confidence_interval() aliases conf_int() Converted longer confidence interval calculation code vignettes use get_ci() instead get_pvalue() alias p_value() Converted longer p-value calculation code vignettes use get_pvalue() instead Implemented Chi-square Goodness Fit observed stat depending params set hypothesize specify() %>% calculate() shortcut Removed “standardized” slope t since formula different “standardized” correlation way currently give one Implemented correlation bootstrap CI permutation hypothesis test Added message type given differently expected visualize() works either 1x1 data frame vector obs_stat argument Got stat = \"t\" working Refactored calculate() smaller functions reduce complexity Produced error mu given hypothesize() stat = \"median\" provided calculate() similar mis-specifications work one sample two sample cases providing formula Added order argument t_stat() Added implementation one sample t_test() passing mu argument t.test hypothesize() Tweaked pkgdown page include ToDo’s using {dplyr} example","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-020","dir":"Changelog","previous_headings":"","what":"infer 0.2.0","title":"infer 0.2.0","text":"CRAN release: 2018-05-15 Switched !! 
instead UQ() since UQ() deprecated {rlang} 0.2.0 Added many new files: CONDUCT.md, CONTRIBUTING.md, -.md Updated README file development information Added wrapper functions t_test() chisq_test() use formula interface provide intuitive wrapper t.test() chisq.test() Created stat = \"z\" stat = \"t\" options Added many new arguments visualize() prescribe colors shade use observed statistics theoretical density curves Added check bar graph created visualize() number unique values generated statistics small Added shading method = \"theoretical\" Use percentiles determine two-tailed shading Changed method = \"randomization\" method = \"simulation\" Added warning theoretical distribution used assumptions checked Two sample t ANOVA F One proportion z Two proportion z Chi-square test independence Chi-square Goodness Fit test Standardized slope (t)","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-011","dir":"Changelog","previous_headings":"","what":"infer 0.1.1","title":"infer 0.1.1","text":"CRAN release: 2018-01-22 Added additional tests Added order argument calculate() Fixed bugs post-CRAN release Automated travis build pkgdown gh-pages branch","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-010","dir":"Changelog","previous_headings":"","what":"infer 0.1.0","title":"infer 0.1.0","text":"CRAN release: 2018-01-08 Altered way successes indicated infer pipeline. now live specify(). Updated documentation examples Deployed https://infer.tidymodels.org/","code":""},{"path":"https://infer.tidymodels.org/dev/news/index.html","id":"infer-001","dir":"Changelog","previous_headings":"","what":"infer 0.0.1","title":"infer 0.0.1","text":"Implemented “intro stats” examples randomization methods","code":""}]