## # A tibble: 1 × 1
## p_value
## <dbl>
-## 1 0.25
+## 1 0.284
Thus, if there were really no relationship between the number of
hours worked per week and whether one has a college degree, the
probability that we would see a statistic as or more extreme than 1.5384
-is approximately 0.25.
+is approximately 0.284.
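For context, a p-value like the one above typically comes from piping a permutation-based null distribution into get_p_value(). The pipeline below is a sketch assembled from the package's other examples, not the exact chunk from this article; the hours ~ college formula, the use of the \(t\) statistic, and reps = 1000 are assumptions based on the test being described.

# sketch: observed statistic and permutation null distribution
# (formula, stat, and reps are assumed, not copied from this diff)
library(infer)
data(gss)

observed_statistic <- gss %>%
  specify(hours ~ college) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "t", order = c("degree", "no degree"))

null_dist <- gss %>%
  specify(hours ~ college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "t", order = c("degree", "no degree"))

# two-sided p-value for the observed statistic against the null distribution
null_dist %>%
  get_p_value(obs_stat = observed_statistic, direction = "two-sided")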
Note that, similarly to the steps shown above, the package supplies a
wrapper function, t_test, to carry out 2-sample \(t\)-tests on tidy
data. The syntax looks like this:
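The rendered code chunk itself is cut off at this point in the diff, so the call below is only a sketch of the wrapper's syntax; the formula, order, and alternative arguments are assumed from the test described above rather than copied from the article.

# sketch: the same 2-sample t-test via the t_test() wrapper
# (arguments assumed, not copied from the rendered article)
library(infer)

gss %>%
  t_test(
    formula = hours ~ college,
    order = c("degree", "no degree"),
    alternative = "two-sided"
  )

Like the other wrappers in the package, t_test() returns a one-row tibble with the test statistic, degrees of freedom, p-value, and a confidence interval.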
diff --git a/dev/articles/t_test_files/figure-html/visualize-1-sample-1.png b/dev/articles/t_test_files/figure-html/visualize-1-sample-1.png
index 0f2eb4da..5687f07e 100644
Binary files a/dev/articles/t_test_files/figure-html/visualize-1-sample-1.png and b/dev/articles/t_test_files/figure-html/visualize-1-sample-1.png differ
diff --git a/dev/articles/t_test_files/figure-html/visualize-2-sample-1.png b/dev/articles/t_test_files/figure-html/visualize-2-sample-1.png
index 3591147d..38320173 100644
Binary files a/dev/articles/t_test_files/figure-html/visualize-2-sample-1.png and b/dev/articles/t_test_files/figure-html/visualize-2-sample-1.png differ
diff --git a/dev/pkgdown.yml b/dev/pkgdown.yml
index 7a6f9d4e..2e0299ab 100644
--- a/dev/pkgdown.yml
+++ b/dev/pkgdown.yml
@@ -8,7 +8,7 @@ articles:
observed_stat_examples: observed_stat_examples.html
paired: paired.html
t_test: t_test.html
-last_built: 2024-03-25T14:59Z
+last_built: 2024-03-25T15:07Z
urls:
reference: https://infer.tidymodels.org/reference
article: https://infer.tidymodels.org/articles
diff --git a/dev/search.json b/dev/search.json
index 8ef9de37..0ee588a9 100644
--- a/dev/search.json
+++ b/dev/search.json
@@ -1 +1 @@
-[{"path":[]},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional setting","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. complaints reviewed investigated promptly fairly. 
community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. public apology may requested.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://infer.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired [Mozilla’s code conduct enforcement ladder][https://github.com/mozilla/inclusion]. answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing","title":"Contributing","text":"Contributions infer whether form bug fixes, issue reports, new code documentation improvements encouraged welcome. 
welcome novices may never contributed package well friendly veterans looking help us improve package users. eager include accepting contributions everyone meets code conduct guidelines. Please use GitHub issues. pull request, please link open corresponding issue GitHub issues. Please ensure notifications turned respond questions, comments needed changes promptly.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"tests","dir":"","previous_headings":"","what":"Tests","title":"Contributing","text":"infer uses testthat testing. Please try provide 100% test coverage submitted code always check existing tests continue pass. beginner need help writing test, mention issue try help. ’s also helpful run goodpractice::gp() ensure lines code 80 characters lines code tests written. Please prior submitting pull request fix suggestions . Reach us need assistance .","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"","what":"Code style","title":"Contributing","text":"Please use snake case (rep_sample_n) function names. Besides , general follow tidyverse style R.","code":""},{"path":"https://infer.tidymodels.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing","text":"contributing infer package must follow code conduct defined CONDUCT.","code":""},{"path":"https://infer.tidymodels.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2021 infer authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy Chi-Squared Tests with infer","text":"vignette, ’ll walk conducting \\(\\chi^2\\) (chi-squared) test independence chi-squared goodness fit test using infer. ’ll start chi-squared test independence, can used test association two categorical variables. , ’ll move chi-squared goodness fit test, tests well distribution one categorical variable can approximated theoretical distribution. Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. 
data looks like :","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"test-of-independence","dir":"Articles","previous_headings":"","what":"Test of Independence","title":"Tidy Chi-Squared Tests with infer","text":"carry chi-squared test independence, ’ll examine association income educational attainment United States. college categorical variable values degree degree, indicating whether respondent college degree (including community college), finrela gives respondent’s self-identification family income—either far average, average, average, average, far average, DK (don’t know). relationship looks like sample data: relationship, expect see purple bars reaching height, regardless income class. differences see , though, just due random noise? First, calculate observed statistic, can use specify() calculate(). observed \\(\\chi^2\\) statistic 30.6825. Now, want compare statistic null distribution, generated assumption variables actually related, get sense likely us see observed statistic actually association education income. can generate null distribution one two ways—using randomization theory-based methods. randomization approach approximates null distribution permuting response explanatory variables, person’s educational attainment matched random income sample order break association two. Note , line specify(college ~ finrela) , use equivalent syntax specify(response = college, explanatory = finrela). goes code , generates null distribution using theory-based methods instead randomization. get sense distributions look like, observed statistic falls, can use visualize(): also visualize observed statistic theoretical null distribution. , use assume() verb define theoretical null distribution pass visualize() like null distribution outputted generate() calculate(). visualize randomization-based theoretical null distributions get sense two relate, can pipe randomization-based null distribution visualize(), provide method = \"\". Either way, looks like observed test statistic quite unlikely actually association education income. exactly, can approximate p-value get_p_value: Thus, really relationship education income, approximation probability see statistic extreme 30.6825 approximately 0. calculate p-value using true \\(\\chi^2\\) distribution, can use pchisq function base R. function allows us situate test statistic calculated previously \\(\\chi^2\\) distribution appropriate degrees freedom. Note , equivalently theory-based approach shown , package supplies wrapper function, chisq_test, carry Chi-Squared tests independence tidy data. 
syntax goes like :","code":"# calculate the observed statistic observed_indep_statistic <- gss %>% specify(college ~ finrela) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"Chisq\") # generate the null distribution using randomization null_dist_sim <- gss %>% specify(college ~ finrela) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"Chisq\") # generate the null distribution by theoretical approximation null_dist_theory <- gss %>% specify(college ~ finrela) %>% assume(distribution = \"Chisq\") # visualize the null distribution and test statistic! null_dist_sim %>% visualize() + shade_p_value(observed_indep_statistic, direction = \"greater\") # visualize the theoretical null distribution and test statistic! gss %>% specify(college ~ finrela) %>% assume(distribution = \"Chisq\") %>% visualize() + shade_p_value(observed_indep_statistic, direction = \"greater\") # visualize both null distributions and the test statistic! null_dist_sim %>% visualize(method = \"both\") + shade_p_value(observed_indep_statistic, direction = \"greater\") # calculate the p value from the observed statistic and null distribution p_value_independence <- null_dist_sim %>% get_p_value(obs_stat = observed_indep_statistic, direction = \"greater\") p_value_independence ## # A tibble: 1 × 1 ## p_value ## ## 1 0 pchisq(observed_indep_statistic$stat, 5, lower.tail = FALSE) ## X-squared ## 1.082e-05 chisq_test(gss, college ~ finrela) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 30.7 5 0.0000108"},{"path":"https://infer.tidymodels.org/dev/articles/chi_squared.html","id":"goodness-of-fit","dir":"Articles","previous_headings":"","what":"Goodness of Fit","title":"Tidy Chi-Squared Tests with infer","text":"Now, moving chi-squared goodness fit test, ’ll take look self-identified income class survey respondents. Suppose null hypothesis finrela follows uniform distribution (.e. ’s actually equal number people describe income far average, average, average, average, far average, don’t know income.) graph represents hypothesis: seems like uniform distribution may appropriate description data–many people describe income average options. Lets now test whether difference distributions statistically significant. First, carry hypothesis test, calculate observed statistic. observed statistic 487.984. Now, generating null distribution, just dropping call generate(): , get sense distributions look like, observed statistic falls, can use visualize(): statistic seems like quite unlikely income class self-identification actually followed uniform distribution! unlikely, though? Calculating p-value: Thus, self-identified income class equally likely occur, approximation probability see distribution like one approximately 0. calculate p-value using true \\(\\chi^2\\) distribution, can use pchisq function base R. function allows us situate test statistic calculated previously \\(\\chi^2\\) distribution appropriate degrees freedom. , equivalently theory-based approach shown , package supplies wrapper function, chisq_test, carry Chi-Squared goodness fit tests tidy data. 
syntax goes like :","code":"# calculating the null distribution observed_gof_statistic <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% calculate(stat = \"Chisq\") # generating a null distribution, assuming each income class is equally likely null_dist_gof <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"Chisq\") # visualize the null distribution and test statistic! null_dist_gof %>% visualize() + shade_p_value(observed_gof_statistic, direction = \"greater\") # calculate the p-value p_value_gof <- null_dist_gof %>% get_p_value(observed_gof_statistic, direction = \"greater\") p_value_gof ## # A tibble: 1 × 1 ## p_value ## ## 1 0 pchisq(observed_gof_statistic$stat, 5, lower.tail = FALSE) ## [1] 3.131e-103 chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 488. 5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Getting to Know infer","text":"infer implements expressive grammar perform statistical inference coheres tidyverse design framework. Rather providing methods specific statistical tests, package consolidates principles shared among common hypothesis tests set 4 main verbs (functions), supplemented many utilities visualize extract value outputs. Regardless hypothesis test ’re using, ’re still asking kind question: effect/difference observed data real, due chance? answer question, start assuming observed data came world “nothing going ” (.e. observed effect simply due random chance), call assumption null hypothesis. (reality, might believe null hypothesis —null hypothesis opposition alternate hypothesis, supposes effect present observed data actually due fact “something going .”) calculate test statistic data describes observed effect. can use test statistic calculate p-value, giving probability observed data come null hypothesis true. probability pre-defined significance level \\(\\alpha\\), can reject null hypothesis. workflow package designed around idea. Starting dataset, specify() allows specify variable, relationship variables, ’re interested . hypothesize() allows declare null hypothesis. generate() allows generate data reflecting null hypothesis. calculate() allows calculate distribution statistics generated data form null distribution. Throughout vignette, make use gss, dataset supplied infer containing sample 500 observations 11 variables General Social Survey. row individual survey response, containing basic demographic information respondent well additional variables. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. 
examples, let’s suppose dataset representative sample population want learn : American adults.","code":"# load in the dataset data(gss) # take a look at its structure dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"specify-specifying-response-and-explanatory-variables","dir":"Articles","previous_headings":"","what":"specify(): Specifying Response (and Explanatory) Variables","title":"Getting to Know infer","text":"specify function can used specify variables dataset ’re interested . ’re interested , say, age respondents, might write: front-end, output specify just looks like selects columns dataframe ’ve specified. Checking class object, though: can see infer class appended top dataframe classes–new class stores extra metadata. ’re interested two variables–age partyid, example–can specify relationship one two (equivalent) ways: ’re inference one proportion difference proportions, need use success argument specify level response variable success. instance, ’re interested proportion population college degree, might use following code:","code":"gss %>% specify(response = age) ## Response: age (numeric) ## # A tibble: 500 × 1 ## age ## ## 1 36 ## 2 34 ## 3 24 ## 4 42 ## 5 31 ## 6 32 ## 7 48 ## 8 36 ## 9 30 ## 10 33 ## # ℹ 490 more rows gss %>% specify(response = age) %>% class() ## [1] \"infer\" \"tbl_df\" \"tbl\" \"data.frame\" # as a formula gss %>% specify(age ~ partyid) ## Response: age (numeric) ## Explanatory: partyid (factor) ## # A tibble: 500 × 2 ## age partyid ## ## 1 36 ind ## 2 34 rep ## 3 24 ind ## 4 42 ind ## 5 31 rep ## 6 32 rep ## 7 48 dem ## 8 36 ind ## 9 30 rep ## 10 33 dem ## # ℹ 490 more rows # with the named arguments gss %>% specify(response = age, explanatory = partyid) ## Response: age (numeric) ## Explanatory: partyid (factor) ## # A tibble: 500 × 2 ## age partyid ## ## 1 36 ind ## 2 34 rep ## 3 24 ind ## 4 42 ind ## 5 31 rep ## 6 32 rep ## 7 48 dem ## 8 36 ind ## 9 30 rep ## 10 33 dem ## # ℹ 490 more rows # specifying for inference on proportions gss %>% specify(response = college, success = \"degree\") ## Response: college (factor) ## # A tibble: 500 × 1 ## college ## ## 1 degree ## 2 no degree ## 3 degree ## 4 no degree ## 5 degree ## 6 no degree ## 7 no degree ## 8 degree ## 9 degree ## 10 no degree ## # ℹ 490 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"hypothesize-declaring-the-null-hypothesis","dir":"Articles","previous_headings":"","what":"hypothesize(): Declaring the Null Hypothesis","title":"Getting to Know infer","text":"next step infer pipeline often declare null hypothesis using hypothesize(). first step supply one “independence” “point” null argument. 
null hypothesis assumes independence two variables, need supply hypothesize(): ’re inference point estimate, also need provide one p (true proportion successes, 0 1), mu (true mean), med (true median), sigma (true standard deviation). instance, null hypothesis mean number hours worked per week population 40, write: , front-end, dataframe outputted hypothesize() looks almost exactly came specify(), infer now “knows” null hypothesis.","code":"gss %>% specify(college ~ partyid, success = \"degree\") %>% hypothesize(null = \"independence\") ## Response: college (factor) ## Explanatory: partyid (factor) ## Null Hypothesis: independence ## # A tibble: 500 × 2 ## college partyid ## ## 1 degree ind ## 2 no degree rep ## 3 degree ind ## 4 no degree ind ## 5 degree rep ## 6 no degree rep ## 7 no degree dem ## 8 degree ind ## 9 degree rep ## 10 no degree dem ## # ℹ 490 more rows gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 500 × 1 ## hours ## ## 1 50 ## 2 31 ## 3 40 ## 4 40 ## 5 40 ## 6 53 ## 7 32 ## 8 20 ## 9 40 ## 10 40 ## # ℹ 490 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"generate-generating-the-null-distribution","dir":"Articles","previous_headings":"","what":"generate(): Generating the Null Distribution","title":"Getting to Know infer","text":"’ve asserted null hypothesis using hypothesize(), can construct null distribution based hypothesis. can using one several methods, supplied type argument: bootstrap: bootstrap sample drawn replicate, sample size equal input sample size drawn (replacement) input sample data. permute: replicate, input value randomly reassigned (without replacement) new output value sample. draw: value sampled theoretical distribution parameters specified hypothesize() replicate. option currently applicable testing point estimates. generation type previously called \"simulate\", superseded. Continuing example , average number hours worked week, might write: example, take 1000 bootstrap samples form null distribution. Note , generate()ing, ’ve set seed random number generation set.seed() function. using infer package research, cases exact reproducibility priority, good practice. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. generate null distribution independence two variables, also randomly reshuffle pairings explanatory response variables break existing association. 
instance, generate 1000 replicates can used create null distribution assumption political party affiliation affected age:","code":"set.seed(1) gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 500,000 × 2 ## # Groups: replicate [1,000] ## replicate hours ## ## 1 1 46.6 ## 2 1 43.6 ## 3 1 38.6 ## 4 1 28.6 ## 5 1 38.6 ## 6 1 38.6 ## 7 1 6.62 ## 8 1 78.6 ## 9 1 38.6 ## 10 1 38.6 ## # ℹ 499,990 more rows gss %>% specify(partyid ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") ## Response: partyid (factor) ## Explanatory: age (numeric) ## Null Hypothesis: independence ## # A tibble: 500,000 × 3 ## # Groups: replicate [1,000] ## partyid age replicate ## ## 1 rep 36 1 ## 2 rep 34 1 ## 3 dem 24 1 ## 4 dem 42 1 ## 5 dem 31 1 ## 6 ind 32 1 ## 7 ind 48 1 ## 8 rep 36 1 ## 9 dem 30 1 ## 10 rep 33 1 ## # ℹ 499,990 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"calculate-calculating-summary-statistics","dir":"Articles","previous_headings":"","what":"calculate(): Calculating Summary Statistics","title":"Getting to Know infer","text":"calculate() calculates summary statistics output infer core functions. function takes stat argument, currently one “mean”, “median”, “sum”, “sd”, “prop”, “count”, “diff means”, “diff medians”, “diff props”, “Chisq”, “F”, “t”, “z”, “slope”, “correlation”. example, continuing example calculate null distribution mean hours worked per week: output calculate() shows us sample statistic (case, mean) 1000 replicates. ’re carrying inference differences means, medians, proportions, t z statistics, need supply order argument, giving order explanatory variables subtracted. instance, find difference mean age college degree don’t, might write:","code":"gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") ## Response: hours (numeric) ## Null Hypothesis: point ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 39.2 ## 2 2 39.1 ## 3 3 39.0 ## 4 4 39.8 ## 5 5 41.4 ## 6 6 39.4 ## 7 7 39.8 ## 8 8 40.4 ## 9 9 41.5 ## 10 10 40.9 ## # ℹ 990 more rows gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 -2.35 ## 2 2 -0.902 ## 3 3 0.403 ## 4 4 -0.426 ## 5 5 0.482 ## 6 6 -0.196 ## 7 7 1.33 ## 8 8 -1.07 ## 9 9 1.68 ## 10 10 0.888 ## # ℹ 990 more rows"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"other-utilities","dir":"Articles","previous_headings":"","what":"Other Utilities","title":"Getting to Know infer","text":"infer also offers several utilities extract meaning summary statistics distributions—package provides functions visualize statistic relative distribution (visualize()), calculate p-values (get_p_value()), calculate confidence intervals (get_confidence_interval()). illustrate, ’ll go back example determining whether mean number hours worked per week 40 hours. point estimate 41.382 seems pretty close 40, little bit different. might wonder difference just due random chance, mean number hours worked per week population really isn’t 40. 
initially just visualize null distribution. sample’s observed statistic lie distribution? can use obs_stat argument specify . Notice infer also shaded regions null distribution () extreme observed statistic. (Also, note now use + operator apply shade_p_value function. visualize outputs plot object ggplot2 instead data frame, + operator needed add p-value layer plot object.) red bar looks like ’s slightly far right tail null distribution, observing sample mean 41.382 hours somewhat unlikely mean actually 40 hours. unlikely, though? looks like p-value 0.032, pretty small—true mean number hours worked per week actually 40, probability sample mean far (1.382 hours) 40 0.032. may may statistically significantly different, depending significance level \\(\\alpha\\) decided ran analysis. set \\(\\alpha = .05\\), difference statistically significant, set \\(\\alpha = .01\\), . get confidence interval around estimate, can write: can see, 40 hours per week contained interval, aligns previous conclusion finding significant confidence level \\(\\alpha = .05\\). see interval represented visually, can use shade_confidence_interval() utility:","code":"# find the point estimate obs_mean <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # generate a null distribution null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") null_dist %>% visualize() null_dist %>% visualize() + shade_p_value(obs_stat = obs_mean, direction = \"two-sided\") # get a two-tailed p-value p_value <- null_dist %>% get_p_value(obs_stat = obs_mean, direction = \"two-sided\") p_value ## # A tibble: 1 × 1 ## p_value ## ## 1 0.032 # generate a distribution like the null distribution, # though exclude the null hypothesis from the pipeline boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") # start with the bootstrap distribution ci <- boot_dist %>% # calculate the confidence interval around the point estimate get_confidence_interval(point_estimate = obs_mean, # at the 95% confidence level level = .95, # using the standard error type = \"se\") ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 boot_dist %>% visualize() + shade_confidence_interval(endpoints = ci)"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"theoretical-methods","dir":"Articles","previous_headings":"","what":"Theoretical Methods","title":"Getting to Know infer","text":"{infer} also provides functionality use theoretical methods \"Chisq\", \"F\", \"t\" \"z\" distributions. Generally, find null distribution using theory-based methods, use code use find observed statistic elsewhere, replacing calls calculate() assume(). example, calculate observed \\(t\\) statistic (standardized mean): , define theoretical \\(t\\) distribution, write: , theoretical distribution interfaces way simulation-based null distributions . example, interface p-values: Confidence intervals lie scale data rather standardized scale theoretical distribution, sure use unstandardized observed statistic working confidence intervals. 
visualized, \\(t\\) distribution recentered rescaled align scale observed data.","code":"# calculate an observed t statistic obs_t <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # switch out calculate with assume to define a distribution t_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") # visualize the theoretical null distribution visualize(t_dist) + shade_p_value(obs_stat = obs_t, direction = \"greater\") # more exactly, calculate the p-value get_p_value(t_dist, obs_t, \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.0188 # find the theory-based confidence interval theor_ci <- get_confidence_interval( x = t_dist, level = .95, point_estimate = obs_mean ) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 # visualize the theoretical sampling distribution visualize(t_dist) + shade_confidence_interval(theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"multiple-regression","dir":"Articles","previous_headings":"","what":"Multiple regression","title":"Getting to Know infer","text":"accommodate randomization-based inference multiple explanatory variables, package implements alternative workflow based model fitting. Rather calculate()ing statistics resampled data, side package allows fit() linear models data resampled according null hypothesis, supplying model coefficients explanatory variable. part, can just switch calculate() fit() calculate()-based workflows. example, suppose want fit hours worked per week using respondent age college completion status. first begin fitting linear model observed data. Now, generate null distributions terms, can fit 1000 models resamples gss dataset, response hours permuted . Note code except addition hypothesize generate step. permute variables response variable, variables argument generate() allows choose columns data permute. Note derived effects depend columns (e.g., interaction effects) also affected. Beyond point, observed fits distributions null fits interface exactly like analogous outputs calculate(). instance, can use following code calculate 95% confidence interval objects. , can shade p-values observed regression coefficients observed data.","code":"observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_fits ## # A tibble: 3,000 × 3 ## # Groups: replicate [1,000] ## replicate term estimate ## ## 1 1 intercept 40.3 ## 2 1 age 0.0166 ## 3 1 collegedegree 1.20 ## 4 2 intercept 41.3 ## 5 2 age 0.00664 ## 6 2 collegedegree -0.407 ## 7 3 intercept 42.9 ## 8 3 age -0.0371 ## 9 3 collegedegree 0.00431 ## 10 4 intercept 42.7 ## # ℹ 2,990 more rows get_confidence_interval( null_fits, point_estimate = observed_fit, level = .95 ) ## # A tibble: 3 × 3 ## term lower_ci upper_ci ## ## 1 age -0.0948 0.0987 ## 2 collegedegree -2.57 2.72 ## 3 intercept 37.4 45.5 visualize(null_fits) + shade_p_value(observed_fit, direction = \"both\") ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? 
## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`?"},{"path":"https://infer.tidymodels.org/dev/articles/infer.html","id":"conclusion","dir":"Articles","previous_headings":"","what":"Conclusion","title":"Getting to Know infer","text":"’s ! vignette covers key functionality infer. See help(package = \"infer\") full list functions vignettes.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Full infer Pipeline Examples","text":"vignette intended provide set examples nearly exhaustively demonstrate functionalities provided infer. Commentary examples limited—discussion intuition behind package, see “Getting Know infer” vignette, accessible calling vignette(\"infer\"). Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. data looks like :","code":"# load in the dataset data(gss) # take a look at its structure dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-mean","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (mean)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"x_bar <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") x_bar <- gss %>% observe(response = hours, stat = \"mean\") null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000) %>% calculate(stat = \"mean\") visualize(null_dist) + shade_p_value(obs_stat = x_bar, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_bar, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.032"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-standardized-mean-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (standardized mean \\(t\\))","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, 
Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. Calculating p-value null distribution observed statistic, Alternatively, using t_test wrapper: infer support testing one numerical variable via z distribution.","code":"t_bar <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") t_bar <- gss %>% observe(response = hours, null = \"point\", mu = 40, stat = \"t\") null_dist <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000) %>% calculate(stat = \"t\") null_dist_theory <- gss %>% specify(response = hours) %>% assume(\"t\") visualize(null_dist) + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = t_bar, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = t_bar, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.04 gss %>% t_test(response = hours, mu = 40) ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 2.09 499 0.0376 two.sided 41.4 40.1 42.7"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-median","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (median)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"x_tilde <- gss %>% specify(response = age) %>% calculate(stat = \"median\") x_tilde <- gss %>% observe(response = age, stat = \"median\") null_dist <- gss %>% specify(response = age) %>% hypothesize(null = \"point\", med = 40) %>% generate(reps = 1000) %>% calculate(stat = \"median\") visualize(null_dist) + shade_p_value(obs_stat = x_tilde, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_tilde, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.01"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-paired","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable (paired)","title":"Full infer Pipeline Examples","text":"example header compatible stats \"mean\", \"median\", \"sum\", \"sd\". Suppose survey respondents provided number hours worked per week surveyed 5 years prior, encoded hours_previous. ’d like test null hypothesis \"mean\" hours worked per week change sampled time five years prior. infer supports paired hypothesis testing via null = \"paired independence\" argument hypothesize(). Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Note diff column permuted, rather signs values column. 
Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"set.seed(1) gss_paired <- gss %>% mutate( hours_previous = hours + 5 - rpois(nrow(.), 4.8), diff = hours - hours_previous ) gss_paired %>% select(hours, hours_previous, diff) ## # A tibble: 500 × 3 ## hours hours_previous diff ## ## 1 50 52 -2 ## 2 31 32 -1 ## 3 40 40 0 ## 4 40 37 3 ## 5 40 42 -2 ## 6 53 50 3 ## 7 32 28 4 ## 8 20 19 1 ## 9 40 40 0 ## 10 40 43 -3 ## # ℹ 490 more rows x_tilde <- gss_paired %>% specify(response = diff) %>% calculate(stat = \"mean\") x_tilde <- gss_paired %>% observe(response = diff, stat = \"mean\") null_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"mean\") visualize(null_dist) + shade_p_value(obs_stat = x_tilde, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = x_tilde, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.028"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-one-proportion","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical (one proportion)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, Note logical variables coerced factors:","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% calculate(stat = \"prop\") p_hat <- gss %>% observe(response = sex, success = \"female\", stat = \"prop\") null_dist <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000) %>% calculate(stat = \"prop\") visualize(null_dist) + shade_p_value(obs_stat = p_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = p_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.276 null_dist <- gss %>% dplyr::mutate(is_female = (sex == \"female\")) %>% specify(response = is_female, success = \"TRUE\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000) %>% calculate(stat = \"prop\")"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-variable-standardized-proportion-z","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical variable (standardized proportion \\(z\\))","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, package also supplies wrapper around prop.test tests single proportion tidy data. 
infer support testing two means via z distribution.","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% calculate(stat = \"z\") p_hat <- gss %>% observe(response = sex, success = \"female\", null = \"point\", p = .5, stat = \"z\") null_dist <- gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"z\") visualize(null_dist) + shade_p_value(obs_stat = p_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = p_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.252 prop_test(gss, college ~ NULL, p = .2) ## # A tibble: 1 × 4 ## statistic chisq_df p_value alternative ## ## 1 636. 1 2.98e-140 two.sided"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-variables","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (2 level) variables","title":"Full infer Pipeline Examples","text":"infer package provides several statistics work data type. One statistic difference proportions. Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, infer also provides functionality calculate ratios proportions. workflow looks similar diff props. Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, addition, package provides functionality calculate odds ratios. workflow also looks similar diff props. 
Calculating observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) d_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"diff in props\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 1 r_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"ratio of props\", order = c(\"female\", \"male\")) r_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"ratio of props\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"ratio of props\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = r_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = r_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 1 or_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% calculate(stat = \"odds ratio\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"odds ratio\", order = c(\"female\", \"male\")) visualize(null_dist) + shade_p_value(obs_stat = or_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = or_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.984"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-variables-z","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (2 level) variables (z)","title":"Full infer Pipeline Examples","text":"Finding standardized observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. Calculating p-value null distribution observed statistic, Note similarities plot previous one. 
package also supplies wrapper around prop.test allow tests equality proportions tidy data.","code":"z_hat <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) z_hat <- gss %>% observe(college ~ sex, success = \"no degree\", stat = \"z\", order = c(\"female\", \"male\")) null_dist <- gss %>% specify(college ~ sex, success = \"no degree\") %>% hypothesize(null = \"independence\") %>% generate(reps = 1000) %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) null_dist_theory <- gss %>% specify(college ~ sex, success = \"no degree\") %>% assume(\"z\") visualize(null_dist) + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = z_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = z_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.98 prop_test(gss, college ~ sex, order = c(\"female\", \"male\")) ## # A tibble: 1 × 6 ## statistic chisq_df p_value alternative lower_ci upper_ci ## ## 1 0.0000204 1 0.996 two.sided -0.0918 0.0834"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-2-level---gof","dir":"Articles","previous_headings":"Hypothesis tests","what":"One categorical (>2 level) - GoF","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Note need add hypothesized values compute observed statistic. Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. 
Calculating p-value null distribution observed statistic, Alternatively, using chisq_test wrapper:","code":"Chisq_hat <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% calculate(stat = \"Chisq\") Chisq_hat <- gss %>% observe(response = finrela, null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6), stat = \"Chisq\") null_dist <- gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% generate(reps = 1000, type = \"draw\") %>% calculate(stat = \"Chisq\") null_dist_theory <- gss %>% specify(response = finrela) %>% assume(\"Chisq\") visualize(null_dist) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist_theory, method = \"both\") + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = Chisq_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0 chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 488. 5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-2-level-chi-squared-test-of-independence","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two categorical (>2 level): Chi-squared test of independence","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. 
Calculating p-value null distribution observed statistic, Alternatively, using wrapper carry test,","code":"Chisq_hat <- gss %>% specify(formula = finrela ~ sex) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"Chisq\") Chisq_hat <- gss %>% observe(formula = finrela ~ sex, stat = \"Chisq\") null_dist <- gss %>% specify(finrela ~ sex) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"Chisq\") null_dist_theory <- gss %>% specify(finrela ~ sex) %>% assume(distribution = \"Chisq\") visualize(null_dist) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = Chisq_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = Chisq_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.118 gss %>% chisq_test(formula = finrela ~ sex) ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## ## 1 9.11 5 0.105"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-means","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (diff in means)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(age ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(age ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.46"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (t)","title":"Full infer Pipeline Examples","text":"Finding standardized observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. 
Calculating p-value null distribution observed statistic, Note similarities plot previous one.","code":"t_hat <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) t_hat <- gss %>% observe(age ~ college, stat = \"t\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) null_dist_theory <- gss %>% specify(age ~ college) %>% assume(\"t\") visualize(null_dist) + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") visualize(null_dist_theory) + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = t_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = t_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.442"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-medians","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical variable, one categorical (2 levels) (diff in medians)","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"d_hat <- gss %>% specify(age ~ college) %>% calculate(stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(age ~ college, stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) null_dist <- gss %>% specify(age ~ college) %>% # alt: response = age, explanatory = season hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in medians\", order = c(\"degree\", \"no degree\")) visualize(null_dist) + shade_p_value(obs_stat = d_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = d_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.172"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-categorical-2-levels---anova","dir":"Articles","previous_headings":"Hypothesis tests","what":"One numerical, one categorical (>2 levels) - ANOVA","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Alternatively, finding null distribution using theoretical methods using assume() verb, Visualizing observed statistic alongside null distribution, Alternatively, visualizing observed statistic using theory-based null distribution, Alternatively, visualizing observed statistic using null distributions, Note code makes use randomization-based null distribution. 
Calculating p-value null distribution observed statistic,","code":"F_hat <- gss %>% specify(age ~ partyid) %>% calculate(stat = \"F\") F_hat <- gss %>% observe(age ~ partyid, stat = \"F\") null_dist <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"F\") null_dist_theory <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% assume(distribution = \"F\") visualize(null_dist) + shade_p_value(obs_stat = F_hat, direction = \"greater\") visualize(null_dist_theory) + shade_p_value(obs_stat = F_hat, direction = \"greater\") visualize(null_dist, method = \"both\") + shade_p_value(obs_stat = F_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = F_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.045"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - SLR","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"slope_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"slope\") slope_hat <- gss %>% observe(hours ~ age, stat = \"slope\") null_dist <- gss %>% specify(hours ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"slope\") visualize(null_dist) + shade_p_value(obs_stat = slope_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = slope_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.902"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---correlation","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - correlation","title":"Full infer Pipeline Examples","text":"Calculating observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic,","code":"correlation_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"correlation\") correlation_hat <- gss %>% observe(hours ~ age, stat = \"correlation\") null_dist <- gss %>% specify(hours ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"correlation\") visualize(null_dist) + shade_p_value(obs_stat = correlation_hat, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = correlation_hat, direction = \"two-sided\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.878"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr-t","dir":"Articles","previous_headings":"Hypothesis tests","what":"Two numerical vars - SLR (t)","title":"Full infer Pipeline Examples","text":"currently implemented since \\(t\\) refer standardized slope standardized correlation.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"multiple-explanatory-variables","dir":"Articles","previous_headings":"Hypothesis tests","what":"Multiple explanatory variables","title":"Full infer Pipeline 
Examples","text":"Calculating observed fit, Generating distribution fits response variable permuted, Generating distribution fits explanatory variable permuted independently, Visualizing observed fit alongside null fits, Calculating p-values null distribution observed fit, Note fit()-based workflow can applied use cases differing numbers explanatory variables explanatory variable types.","code":"obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() null_dist <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% fit() null_dist2 <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\", variables = c(age, college)) %>% fit() visualize(null_dist) + shade_p_value(obs_stat = obs_fit, direction = \"two-sided\") null_dist %>% get_p_value(obs_stat = obs_fit, direction = \"two-sided\") ## # A tibble: 3 × 2 ## term p_value ## ## 1 age 0.914 ## 2 collegedegree 0.266 ## 3 intercept 0.734"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-mean","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical (one mean)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note t distribution recentered rescaled lie scale observed data. infer support confidence intervals means via z distribution.","code":"x_bar <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") x_bar <- gss %>% observe(response = hours, stat = \"mean\") boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- get_ci(boot_dist, type = \"se\", point_estimate = x_bar) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(response = hours) %>% assume(distribution = \"t\") theor_ci <- get_ci(sampling_dist, point_estimate = x_bar) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 40.1 42.7 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-one-mean---standardized","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical (one mean - standardized)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (one mean) theory-based approach. 
Note infer support confidence intervals means via z distribution.","code":"t_hat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") t_hat <- gss %>% observe(response = hours, null = \"point\", mu = 40, stat = \"t\") boot_dist <- gss %>% specify(response = hours) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"t\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = t_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-one-proportion-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One categorical (one proportion)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note z distribution recentered rescaled lie scale observed data. infer support confidence intervals means via z distribution.","code":"p_hat <- gss %>% specify(response = sex, success = \"female\") %>% calculate(stat = \"prop\") p_hat <- gss %>% observe(response = sex, success = \"female\", stat = \"prop\") boot_dist <- gss %>% specify(response = sex, success = \"female\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"prop\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = p_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(response = sex, success = \"female\") %>% assume(distribution = \"z\") theor_ci <- get_ci(sampling_dist, point_estimate = p_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 0.430 0.518 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-categorical-variable-standardized-proportion-z-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One categorical variable (standardized proportion \\(z\\))","title":"Full infer Pipeline Examples","text":"See subsection (one proportion) theory-based approach.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-diff-in-means-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical variable, one categorical (2 levels) (diff in means)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, 
Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note t distribution recentered rescaled lie scale observed data. infer also provides functionality calculate ratios means. workflow looks similar diff means. Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"d_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(hours ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(hours ~ college) %>% assume(distribution = \"t\") theor_ci <- get_ci(sampling_dist, point_estimate = d_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -1.16 4.24 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci) d_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) d_hat <- gss %>% observe(hours ~ college, stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"ratio of means\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"one-numerical-variable-one-categorical-2-levels-t-1","dir":"Articles","previous_headings":"Confidence intervals","what":"One numerical variable, one categorical (2 levels) (t)","title":"Full infer Pipeline Examples","text":"Finding standardized point estimate, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (diff means) theory-based approach. 
infer support confidence intervals means via z distribution.","code":"t_hat <- gss %>% specify(hours ~ college) %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) t_hat <- gss %>% observe(hours ~ college, stat = \"t\", order = c(\"degree\", \"no degree\")) boot_dist <- gss %>% specify(hours ~ college) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = t_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-variables-diff-in-proportions","dir":"Articles","previous_headings":"Confidence intervals","what":"Two categorical variables (diff in proportions)","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, Instead simulation-based bootstrap distribution, can also define theory-based sampling distribution, Visualization calculation confidence intervals interfaces way simulation-based distribution, Note z distribution recentered rescaled lie scale observed data.","code":"d_hat <- gss %>% specify(college ~ sex, success = \"degree\") %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) d_hat <- gss %>% observe(college ~ sex, success = \"degree\", stat = \"diff in props\", order = c(\"female\", \"male\")) boot_dist <- gss %>% specify(college ~ sex, success = \"degree\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"diff in props\", order = c(\"female\", \"male\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = d_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci) sampling_dist <- gss %>% specify(college ~ sex, success = \"degree\") %>% assume(distribution = \"z\") theor_ci <- get_ci(sampling_dist, point_estimate = d_hat) theor_ci ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.0794 0.0878 visualize(sampling_dist) + shade_confidence_interval(endpoints = theor_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-categorical-variables-z","dir":"Articles","previous_headings":"Confidence intervals","what":"Two categorical variables (z)","title":"Full infer Pipeline Examples","text":"Finding standardized point estimate, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error, See subsection (diff props) theory-based approach.","code":"z_hat <- gss %>% specify(college ~ sex, success = \"degree\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) z_hat <- gss %>% observe(college ~ sex, success = \"degree\", stat = \"z\", order = c(\"female\", \"male\")) boot_dist <- gss %>% 
specify(college ~ sex, success = \"degree\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"z\", order = c(\"female\", \"male\")) percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = z_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---slr-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - SLR","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"slope_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"slope\") slope_hat <- gss %>% observe(hours ~ age, stat = \"slope\") boot_dist <- gss %>% specify(hours ~ age) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"slope\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = slope_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---correlation-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - correlation","title":"Full infer Pipeline Examples","text":"Finding observed statistic, Alternatively, using observe() wrapper calculate observed statistic, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Alternatively, use bootstrap distribution find confidence interval using standard error,","code":"correlation_hat <- gss %>% specify(hours ~ age) %>% calculate(stat = \"correlation\") correlation_hat <- gss %>% observe(hours ~ age, stat = \"correlation\") boot_dist <- gss %>% specify(hours ~ age) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"correlation\") percentile_ci <- get_ci(boot_dist) visualize(boot_dist) + shade_confidence_interval(endpoints = percentile_ci) standard_error_ci <- boot_dist %>% get_ci(type = \"se\", point_estimate = correlation_hat) visualize(boot_dist) + shade_confidence_interval(endpoints = standard_error_ci)"},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"two-numerical-vars---t","dir":"Articles","previous_headings":"Confidence intervals","what":"Two numerical vars - t","title":"Full infer Pipeline Examples","text":"currently implemented since \\(t\\) refer standardized slope standardized correlation.","code":""},{"path":"https://infer.tidymodels.org/dev/articles/observed_stat_examples.html","id":"multiple-explanatory-variables-1","dir":"Articles","previous_headings":"Confidence intervals","what":"Multiple explanatory variables","title":"Full infer Pipeline Examples","text":"Calculating observed fit, , generating bootstrap distribution, Use bootstrap distribution find confidence interval, Visualizing observed statistic alongside distribution, Note fit()-based workflow can 
applied use cases differing numbers explanatory variables explanatory variable types.","code":"obs_fit <- gss %>% specify(hours ~ age + college) %>% fit() boot_dist <- gss %>% specify(hours ~ age + college) %>% generate(reps = 1000, type = \"bootstrap\") %>% fit() conf_ints <- get_confidence_interval( boot_dist, level = .95, point_estimate = obs_fit ) visualize(boot_dist) + shade_confidence_interval(endpoints = conf_ints)"},{"path":"https://infer.tidymodels.org/dev/articles/paired.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy inference for paired data","text":"vignette, ’ll walk conducting randomization-based paired test independence infer. Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. data looks like : Two sets observations paired observation one column special correspondence connection exactly one observation . purposes vignette, ’ll simulate additional data variable natural pairing: suppose survey respondents provided number hours worked per week surveyed 5 years prior, encoded hours_previous. number hours worked per week particular respondent special correspondence number hours worked 5 years prior hours_previous respondent. ’d like test null hypothesis \"mean\" hours worked per week change sampled time five years prior. carry inference paired data infer, pre-compute difference paired values beginning analysis, use differences values interest. , pre-compute difference paired observations diff. distribution diff observed data looks like : looks distribution, respondents worked similar number hours worked per week 5 hours prior, though seems like may slight decline number hours worked per week aggregate. (know true effect -.2 since ’ve simulated data.) calculate observed statistic paired setting way outside paired setting. Using specify() calculate(): observed statistic -0.202. Now, want compare statistic null distribution, generated assumption true difference actually zero, get sense likely us see observed difference truly change hours worked per week population. Tests paired data carried via null = \"paired independence\" argument hypothesize(). replicate, generate() carries type = \"permute\" null = \"paired independence\" : Randomly sampling vector signs (.e. -1 1), probability .5 either, length equal input data, Multiplying response variable vector signs, “flipping” observed values random subset value replicate get sense distribution looks like, observed statistic falls, can use visualize(): looks like observed mean -0.202 relatively unlikely truly change mean number hours worked per week time period. exactly, can calculate p-value: Thus, change mean number hours worked per week time period truly zero, approximation probability see test statistic extreme -0.202 approximately 0.028. can also generate bootstrap confidence interval mean paired difference using type = \"bootstrap\" generate(). , use pre-computed differences generating bootstrap resamples: Note , unlike null distribution test statistics generated earlier type = \"permute\", distribution centered observed_statistic. 
Calculating confidence interval: default, get_confidence_interval() constructs lower upper bounds taking observations \\((1 - .95) / 2\\) \\(1 - ((1-.95) / 2)\\)th percentiles. instead build confidence interval using standard error bootstrap distribution, can write: learn randomization-based inference paired observations, see relevant chapter Introduction Modern Statistics.","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, … set.seed(1) gss_paired <- gss %>% mutate( hours_previous = hours + 5 - rpois(nrow(.), 4.8), diff = hours - hours_previous ) gss_paired %>% select(hours, hours_previous, diff) ## # A tibble: 500 × 3 ## hours hours_previous diff ## ## 1 50 52 -2 ## 2 31 32 -1 ## 3 40 40 0 ## 4 40 37 3 ## 5 40 42 -2 ## 6 53 50 3 ## 7 32 28 4 ## 8 20 19 1 ## 9 40 40 0 ## 10 40 43 -3 ## # ℹ 490 more rows # calculate the observed statistic observed_statistic <- gss_paired %>% specify(response = diff) %>% calculate(stat = \"mean\") # generate the null distribution null_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"mean\") null_dist ## Response: diff (numeric) ## Null Hypothesis: paired independence ## # A tibble: 1,000 × 2 ## replicate stat ## ## 1 1 -0.146 ## 2 2 0.19 ## 3 3 0.042 ## 4 4 0.034 ## 5 5 -0.138 ## 6 6 -0.03 ## 7 7 0.174 ## 8 8 0.066 ## 9 9 0.01 ## 10 10 0.13 ## # ℹ 990 more rows # visualize the null distribution and test statistic null_dist %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") ## Warning in (function (mapping = NULL, data = NULL, stat = \"identity\", position = \"identity\", : All aesthetics have length 1, but the data has 1000 rows. ## ℹ Did you mean to use `annotate()`? 
# calculate the p value from the test statistic and null distribution p_value <- null_dist %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value ## # A tibble: 1 × 1 ## p_value ## ## 1 0.028 # generate a bootstrap distribution boot_dist <- gss_paired %>% specify(response = diff) %>% hypothesize(null = \"paired independence\") %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") visualize(boot_dist) # calculate the confidence from the bootstrap distribution confidence_interval <- boot_dist %>% get_confidence_interval(level = .95) confidence_interval ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.390 -0.022 boot_dist %>% get_confidence_interval(type = \"se\", point_estimate = observed_statistic, level = .95) ## # A tibble: 1 × 2 ## lower_ci upper_ci ## ## 1 -0.383 -0.0210"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Tidy t-Tests with infer","text":"vignette, ’ll walk conducting \\(t\\)-tests randomization-based analogue using infer. ’ll start 1-sample \\(t\\)-test, compares sample mean hypothesized true mean value. , ’ll discuss paired \\(t\\)-tests, special use case 1-sample \\(t\\)-tests, evaluate whether differences paired values (e.g. measure taken person experiment) differ 0. Finally, ’ll wrap 2-sample \\(t\\)-tests, testing difference means two populations using sample data drawn . Throughout vignette, ’ll make use gss dataset supplied infer, contains sample data General Social Survey. See ?gss information variables included source. Note data (examples ) demonstration purposes , necessarily provide accurate estimates unless weighted properly. examples, let’s suppose dataset representative sample population want learn : American adults. data looks like :","code":"dplyr::glimpse(gss) ## Rows: 500 ## Columns: 11 ## $ year 2014, 1994, 1998, 1996, 1994, 1996, 1990, 2016, 2000, 19… ## $ age 36, 34, 24, 42, 31, 32, 48, 36, 30, 33, 21, 30, 38, 49, … ## $ sex male, female, male, male, male, female, female, female, … ## $ college degree, no degree, degree, no degree, degree, no degree,… ## $ partyid ind, rep, ind, ind, rep, rep, dem, ind, rep, dem, dem, i… ## $ hompop 3, 4, 1, 4, 2, 4, 2, 1, 5, 2, 4, 3, 4, 4, 2, 2, 3, 2, 1,… ## $ hours 50, 31, 40, 40, 40, 53, 32, 20, 40, 40, 23, 52, 38, 72, … ## $ income $25000 or more, $20000 - 24999, $25000 or more, $25000 o… ## $ class middle class, working class, working class, working clas… ## $ finrela below average, below average, below average, above avera… ## $ weight 0.8960, 1.0825, 0.5501, 1.0864, 1.0825, 1.0864, 1.0627, …"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"sample-t-test","dir":"Articles","previous_headings":"","what":"1-Sample t-Test","title":"Tidy t-Tests with infer","text":"1-sample \\(t\\)-test can used test whether sample continuous data plausibly come population specified mean. example, ’ll test whether average American adult works 40 hours week using data gss. , make use hours variable, giving number hours respondents reported worked previous week. distribution hours observed data looks like : looks like respondents reported worked 40 hours, ’s quite bit variability. Let’s test whether evidence true mean number hours Americans work per week 40. infer’s randomization-based analogue 1-sample \\(t\\)-test 1-sample mean test. ’ll start showcasing test demonstrating carry theory-based \\(t\\)-test package. 
First, calculate observed statistic, can use specify() calculate(). observed statistic 41.382. Now, want compare statistic null distribution, generated assumption mean actually 40, get sense likely us see observed mean true number hours worked per week population really 40. can generate null distribution using bootstrap. bootstrap, replicate, sample size equal input sample size drawn (replacement) input sample data. allows us get sense much variability ’d expect see entire population can understand unlikely sample mean . get sense distributions look like, observed statistic falls, can use visualize(): looks like observed mean 41.382 relatively unlikely true mean actually 40 hours week. exactly, can calculate p-value: Thus, true mean number hours worked per week really 40, approximation probability see test statistic extreme 41.382 approximately 0.04. Analogously steps shown , package supplies wrapper function, t_test, carry 1-sample \\(t\\)-tests tidy data. Rather using randomization, wrappers carry theory-based \\(t\\)-test. syntax looks like : alternative approach t_test() wrapper calculate observed statistic infer pipeline supply pt function base R. Note pipeline calculate observed statistic includes call hypothesize() since \\(t\\) statistic requires hypothesized mean value. , juxtaposing \\(t\\) statistic associated distribution using pt function: Note resulting \\(t\\)-statistics two theory-based approaches .","code":"# calculate the observed statistic observed_statistic <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # generate the null distribution null_dist_1_sample <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 1000, type = \"bootstrap\") %>% calculate(stat = \"mean\") # visualize the null distribution and test statistic! null_dist_1_sample %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") # calculate the p value from the test statistic and null distribution p_value_1_sample <- null_dist_1_sample %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value_1_sample ## # A tibble: 1 × 1 ## p_value ## ## 1 0.04 t_test(gss, response = hours, mu = 40) ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 2.09 499 0.0376 two.sided 41.4 40.1 42.7 # calculate the observed statistic observed_statistic <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") %>% dplyr::pull() pt(unname(observed_statistic), df = nrow(gss) - 1, lower.tail = FALSE)*2 ## [1] 0.03756"},{"path":"https://infer.tidymodels.org/dev/articles/t_test.html","id":"sample-t-test-1","dir":"Articles","previous_headings":"","what":"2-Sample t-Test","title":"Tidy t-Tests with infer","text":"2-Sample \\(t\\)-tests evaluate difference mean values two populations using data randomly-sampled population approximately follows normal distribution. example, ’ll test Americans work number hours week regardless whether college degree using data gss. college hours variables allow us : looks like distributions centered near 40 hours week, distribution degree slightly right skewed. , note warning missing values—many respondents’ values missing. actually carrying hypothesis test, might look data collected; ’s possible whether value either columns missing related value . infer’s randomization-based analogue 2-sample \\(t\\)-test difference means test. 
’ll start showcasing test demonstrating carry theory-based \\(t\\)-test package. one-sample test, calculate observed difference means, can use specify() calculate(). Note , line specify(hours ~ college), swapped syntax specify(response = hours, explanatory = college)! order argument calculate line gives order subtract mean values : case, ’re taking mean number hours worked degree minus mean number hours worked without degree; positive difference, , mean people degrees worked without degree. Now, want compare difference means null distribution, generated assumption number hours worked week relationship whether one college degree, get sense likely us see observed difference means really relationship two variables. can generate null distribution using permutation, , replicate, value degree status randomly reassigned (without replacement) new number hours worked per week sample order break association two. , note , lines specify(hours ~ college) chunk, used syntax specify(response = hours, explanatory = college) instead! get sense distributions look like, observed statistic falls, can use visualize(). looks like observed statistic 1.5384 unlikely truly relationship degree status number hours worked. exactly, can calculate p-value; theoretical p-values yet supported, ’ll use randomization-based null distribution calculate p-value. Thus, really relationship number hours worked week whether one college degree, probability see statistic extreme 1.5384 approximately 0.25. Note , similarly steps shown , package supplies wrapper function, t_test, carry 2-sample \\(t\\)-tests tidy data. syntax looks like : example, specified relationship syntax formula = hours ~ college; also written response = hours, explanatory = college. alternative approach t_test() wrapper calculate observed statistic infer pipeline supply pt function base R. can calculate statistic , switching stat = \"diff means\" argument stat = \"t\". Note pipeline calculate observed statistic includes hypothesize() since \\(t\\) statistic requires hypothesized mean value. , juxtaposing \\(t\\) statistic associated distribution using pt function: Note results two theory-based approaches nearly .","code":"# calculate the observed statistic observed_statistic <- gss %>% specify(hours ~ college) %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) observed_statistic ## Response: hours (numeric) ## Explanatory: college (factor) ## # A tibble: 1 × 1 ## stat ## ## 1 1.54 # generate the null distribution with randomization null_dist_2_sample <- gss %>% specify(hours ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"diff in means\", order = c(\"degree\", \"no degree\")) # visualize the randomization-based null distribution and test statistic! null_dist_2_sample %>% visualize() + shade_p_value(observed_statistic, direction = \"two-sided\") # calculate the p value from the randomization-based null # distribution and the observed statistic p_value_2_sample <- null_dist_2_sample %>% get_p_value(obs_stat = observed_statistic, direction = \"two-sided\") p_value_2_sample ## # A tibble: 1 × 1 ## p_value ## ## 1 0.25 t_test(x = gss, formula = hours ~ college, order = c(\"degree\", \"no degree\"), alternative = \"two-sided\") ## # A tibble: 1 × 7 ## statistic t_df p_value alternative estimate lower_ci upper_ci ## ## 1 1.12 366. 
0.264 two.sided 1.54 -1.16 4.24 # calculate the observed statistic observed_statistic <- gss %>% specify(hours ~ college) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\", order = c(\"degree\", \"no degree\")) %>% dplyr::pull() observed_statistic ## t ## 1.119 pt(unname(observed_statistic), df = nrow(gss) - 2, lower.tail = FALSE)*2 ## [1] 0.2635"},{"path":"https://infer.tidymodels.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Andrew Bray. Author. Chester Ismay. Author. Evgeni Chasnovski. Author. Simon Couch. Author, maintainer. Ben Baumer. Author. Mine Cetinkaya-Rundel. Author. Ted Laderas. Contributor. Nick Solomon. Contributor. Johanna Hardin. Contributor. Albert Y. Kim. Contributor. Neal Fultz. Contributor. Doug Friedman. Contributor. Richie Cotton. Contributor. Brian Fannin. Contributor.","code":""},{"path":"https://infer.tidymodels.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Couch et al., (2021). infer: R package tidyverse-friendly statistical inference. Journal Open Source Software, 6(65), 3661, https://doi.org/10.21105/joss.03661","code":"@Article{, title = {{infer}: An {R} package for tidyverse-friendly statistical inference}, author = {Simon P. Couch and Andrew P. Bray and Chester Ismay and Evgeni Chasnovski and Benjamin S. Baumer and Mine Çetinkaya-Rundel}, journal = {Journal of Open Source Software}, year = {2021}, volume = {6}, number = {65}, pages = {3661}, doi = {10.21105/joss.03661}, }"},{"path":"https://infer.tidymodels.org/dev/index.html","id":"infer-r-package-","dir":"","previous_headings":"","what":"Tidy Statistical Inference","title":"Tidy Statistical Inference","text":"objective package perform statistical inference using expressive statistical grammar coheres tidyverse design framework. package centered around 4 main verbs, supplemented many utilities visualize extract value outputs. specify() allows specify variable, relationship variables, ’re interested . hypothesize() allows declare null hypothesis. generate() allows generate data reflecting null hypothesis. calculate() allows calculate distribution statistics generated data form null distribution. learn principles underlying package design, see vignette(\"infer\"). ’re interested learning randomization-based statistical inference generally, including applied examples package, recommend checking Statistical Inference Via Data Science: ModernDive R Tidyverse Introduction Modern Statistics.","code":""},{"path":"https://infer.tidymodels.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tidy Statistical Inference","text":"install current stable version infer CRAN: install developmental stable version infer, make sure install remotes first. pkgdown website version infer.tidymodels.org.","code":"install.packages(\"infer\") # install.packages(\"pak\") pak::pak(\"tidymodels/infer\")"},{"path":"https://infer.tidymodels.org/dev/index.html","id":"contributing","dir":"","previous_headings":"","what":"Contributing","title":"Tidy Statistical Inference","text":"welcome others helping us make package user-friendly efficient possible. Please review contributing conduct guidelines. participating project agree abide terms. questions discussions tidymodels packages, modeling, machine learning, please post Posit Community. think encountered bug, please submit issue. 
Either way, learn create share reprex (minimal, reproducible example), clearly communicate code. Check details contributing guidelines tidymodels packages get help.","code":""},{"path":"https://infer.tidymodels.org/dev/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Tidy Statistical Inference","text":"examples pulled “Full infer Pipeline Examples” vignette, accessible calling vignette(\"observed_stat_examples\"). make use gss dataset supplied package, providing sample data General Social Survey. data looks like : example, ’ll run analysis variance age partyid, testing whether age respondent independent political party affiliation. Calculating observed statistic, , generating null distribution, Visualizing observed statistic alongside null distribution, Calculating p-value null distribution observed statistic, Note formula non-formula interfaces (.e. age ~ partyid vs. response = age, explanatory = partyid) work implemented inference procedures infer. Use whatever natural . modeling using functions like lm() glm(), though, recommend begin use formula y ~ x notation soon possible. resources available package vignettes! See vignette(\"observed_stat_examples\") examples like one , vignette(\"infer\") discussion underlying principles package design.","code":"# load in the dataset data(gss) # take a glimpse at it str(gss) ## tibble [500 × 11] (S3: tbl_df/tbl/data.frame) ## $ year : num [1:500] 2014 1994 1998 1996 1994 ... ## $ age : num [1:500] 36 34 24 42 31 32 48 36 30 33 ... ## $ sex : Factor w/ 2 levels \"male\",\"female\": 1 2 1 1 1 2 2 2 2 2 ... ## $ college: Factor w/ 2 levels \"no degree\",\"degree\": 2 1 2 1 2 1 1 2 2 1 ... ## $ partyid: Factor w/ 5 levels \"dem\",\"ind\",\"rep\",..: 2 3 2 2 3 3 1 2 3 1 ... ## $ hompop : num [1:500] 3 4 1 4 2 4 2 1 5 2 ... ## $ hours : num [1:500] 50 31 40 40 40 53 32 20 40 40 ... ## $ income : Ord.factor w/ 12 levels \"lt $1000\"<\"$1000 to 2999\"<..: 12 11 12 12 12 12 12 12 12 10 ... ## $ class : Factor w/ 6 levels \"lower class\",..: 3 2 2 2 3 3 2 3 3 2 ... ## $ finrela: Factor w/ 6 levels \"far below average\",..: 2 2 2 4 4 3 2 4 3 1 ... ## $ weight : num [1:500] 0.896 1.083 0.55 1.086 1.083 ... F_hat <- gss %>% specify(age ~ partyid) %>% calculate(stat = \"F\") null_dist <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% generate(reps = 1000, type = \"permute\") %>% calculate(stat = \"F\") visualize(null_dist) + shade_p_value(obs_stat = F_hat, direction = \"greater\") null_dist %>% get_p_value(obs_stat = F_hat, direction = \"greater\") ## # A tibble: 1 × 1 ## p_value ## ## 1 0.06"},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":null,"dir":"Reference","previous_headings":"","what":"Define a theoretical distribution — assume","title":"Define a theoretical distribution — assume","text":"function allows user define null distribution based theoretical methods. many infer pipelines, assume() can used place generate() calculate() create null distribution. Rather outputting data frame containing distribution test statistics calculated resamples observed data, assume() outputs abstract type object just containing distributional details supplied distribution df arguments. However, assume() output can passed visualize(), get_p_value(), get_confidence_interval() way simulation-based distributions can. define theoretical null distribution (use hypothesis testing), sure provide null hypothesis via hypothesize(). 
define theoretical sampling distribution (use confidence intervals), provide output specify(). Sampling distributions (implemented t z) lie scale data, recentered rescaled match corresponding stat given calculate() calculate observed statistic.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Define a theoretical distribution — assume","text":"","code":"assume(x, distribution, df = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Define a theoretical distribution — assume","text":"x output specify() hypothesize(), giving observed data, variable(s) interest, (optionally) null hypothesis. distribution distribution question, string. One \"F\", \"Chisq\", \"t\", \"z\". df Optional. degrees freedom parameter(s) distribution supplied, numeric vector. distribution = \"F\", length two (e.g. c(10, 3)). distribution = \"Chisq\" distribution = \"t\", length one. distribution = \"z\", argument required. package supply message supplied df argument different recognized values. See Details section information. ... Currently ignored.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Define a theoretical distribution — assume","text":"infer theoretical distribution can passed helpers like visualize(), get_p_value(), get_confidence_interval().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Define a theoretical distribution — assume","text":"Note assumption expressed , use theory-based inference, extends distributional assumptions: null distribution question parameters. Statistical inference infer, whether carried via simulation (.e. based pipelines using generate() calculate()) theory (.e. assume()), always involves condition observations independent . infer supports theoretical tests one two means via t distribution one two proportions via z. tests comparing two means, n1 group size one level explanatory variable, n2 level, infer recognize following degrees freedom (df) arguments: min(n1 - 1, n2 - 1) n1 + n2 - 2 \"parameter\" entry analogous stats::t.test() call \"parameter\" entry analogous stats::t.test() call var.equal = TRUE default, package use \"parameter\" entry analogous stats::t.test() call var.equal = FALSE (default).","code":""},{"path":"https://infer.tidymodels.org/dev/reference/assume.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Define a theoretical distribution — assume","text":"","code":"# construct theoretical distributions --------------------------------- # F distribution # with the `partyid` explanatory variable gss %>% specify(age ~ partyid) %>% assume(distribution = \"F\") #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. #> An F distribution with 3 and 496 degrees of freedom. # Chi-squared goodness of fit distribution # on the `finrela` variable gss %>% specify(response = finrela) %>% hypothesize(null = \"point\", p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) %>% assume(\"Chisq\") #> A Chi-squared distribution with 5 degrees of freedom. 
# Chi-squared test of independence # on the `finrela` and `sex` variables gss %>% specify(formula = finrela ~ sex) %>% assume(distribution = \"Chisq\") #> A Chi-squared distribution with 5 degrees of freedom. # T distribution gss %>% specify(age ~ college) %>% assume(\"t\") #> A T distribution with 423 degrees of freedom. # Z distribution gss %>% specify(response = sex, success = \"female\") %>% assume(\"z\") #> A Z distribution. if (FALSE) { # each of these distributions can be passed to infer helper # functions alongside observed statistics! # for example, a 1-sample t-test ------------------------------------- # calculate the observed statistic obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # construct a null distribution null_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # juxtapose them visually visualize(null_dist) + shade_p_value(obs_stat, direction = \"both\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"both\") # or, an F test ------------------------------------------------------ # calculate the observed statistic obs_stat <- gss %>% specify(age ~ partyid) %>% hypothesize(null = \"independence\") %>% calculate(stat = \"F\") # construct a null distribution null_dist <- gss %>% specify(age ~ partyid) %>% assume(distribution = \"F\") # juxtapose them visually visualize(null_dist) + shade_p_value(obs_stat, direction = \"both\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"both\") }"},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate summary statistics — calculate","title":"Calculate summary statistics — calculate","text":"Given output specify() /hypothesize(), function return observed statistic specified stat argument. test statistics, Chisq, t, z, require null hypothesis. provided output generate(), function calculate supplied stat replicate. Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate summary statistics — calculate","text":"","code":"calculate( x, stat = c(\"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff in means\", \"diff in medians\", \"diff in props\", \"Chisq\", \"F\", \"slope\", \"correlation\", \"t\", \"z\", \"ratio of props\", \"odds ratio\", \"ratio of means\"), order = NULL, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate summary statistics — calculate","text":"x output generate() computation-based inference output hypothesize() piped theory-based inference. stat string giving type statistic calculate. Current options include \"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff means\", \"diff medians\", \"diff props\", \"Chisq\" (\"chisq\"), \"F\" (\"f\"), \"t\", \"z\", \"ratio props\", \"slope\", \"odds ratio\", \"ratio means\", \"correlation\". infer supports theoretical tests one two means via \"t\" distribution one two proportions via \"z\". order string vector specifying order levels explanatory variable ordered subtraction (division ratio-based statistics), order = c(\"first\", \"second\") means (\"first\" - \"second\"), analogue ratios. Needed inference difference means, medians, proportions, ratios, t, z statistics. ... 
pass options like na.rm = TRUE functions like mean(), sd(), etc. Can also used supply hypothesized null values \"t\" statistic additional arguments stats::chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate summary statistics — calculate","text":"tibble containing stat column calculated statistics.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"missing-levels-in-small-samples","dir":"Reference","previous_headings":"","what":"Missing levels in small samples","title":"Calculate summary statistics — calculate","text":"cases, bootstrapping small samples, generated bootstrap samples one level explanatory variable present. test statistics, calculated statistic cases NaN. package omit non-finite values visualizations (warning) raise error p-value calculations.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Calculate summary statistics — calculate","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 10 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/calculate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate summary statistics — calculate","text":"","code":"# calculate a null distribution of hours worked per week under # the null hypothesis that the mean is 40 gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 200, type = \"bootstrap\") %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 39.2 #> 2 2 39.4 #> 3 3 40.1 #> 4 4 39.6 #> 5 5 40.8 #> 6 6 39.9 #> 7 7 39.9 #> 8 8 40.8 #> 9 9 39.6 #> 10 10 41.0 #> # ℹ 190 more rows # calculate the corresponding observed statistic gss %>% specify(response = hours) %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # calculate a null distribution assuming independence between age # of respondent and whether they have a college degree gss %>% specify(age ~ college) %>% 
hypothesize(null = \"independence\") %>% generate(reps = 200, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> Explanatory: college (factor) #> Null Hypothesis: independence #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 -2.48 #> 2 2 -0.699 #> 3 3 -0.0113 #> 4 4 0.579 #> 5 5 0.553 #> 6 6 1.84 #> 7 7 -2.31 #> 8 8 -0.320 #> 9 9 -0.00250 #> 10 10 -1.78 #> # ℹ 190 more rows # calculate the corresponding observed statistic gss %>% specify(age ~ college) %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # some statistics require a null hypothesis gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy chi-squared test statistic — chisq_stat","title":"Tidy chi-squared test statistic — chisq_stat","text":"@description","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy chi-squared test statistic — chisq_stat","text":"","code":"chisq_stat(x, formula, response = NULL, explanatory = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy chi-squared test statistic — chisq_stat","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. ... Additional arguments chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tidy chi-squared test statistic — chisq_stat","text":"shortcut wrapper function get observed test statistic chisq test. Uses chisq.test(), applies continuity correction. function deprecated favor general observe().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/chisq_stat.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy chi-squared test statistic — chisq_stat","text":"","code":"# chi-squared test statistic for test of independence # of college completion status depending and one's # self-identified income class chisq_stat(gss, college ~ finrela) #> Warning: The chisq_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. #> X-squared #> 30.68252 # chi-squared test statistic for a goodness of fit # test on whether self-identified income class # follows a uniform distribution chisq_stat(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) #> Warning: The chisq_stat() wrapper has been deprecated in favor of the more general observe(). Please use that function instead. 
#> X-squared #> 487.984"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy chi-squared test — chisq_test","title":"Tidy chi-squared test — chisq_test","text":"tidier version chisq.test() goodness fit tests tests independence.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy chi-squared test — chisq_test","text":"","code":"chisq_test(x, formula, response = NULL, explanatory = NULL, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy chi-squared test — chisq_test","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. ... Additional arguments chisq.test().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/chisq_test.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy chi-squared test — chisq_test","text":"","code":"# chi-squared test of independence for college completion # status depending on one's self-identified income class chisq_test(gss, college ~ finrela) #> Warning: Chi-squared approximation may be incorrect #> # A tibble: 1 × 3 #> statistic chisq_df p_value #> #> 1 30.7 5 0.0000108 # chi-squared goodness of fit test on whether self-identified # income class follows a uniform distribution chisq_test(gss, response = finrela, p = c(\"far below average\" = 1/6, \"below average\" = 1/6, \"average\" = 1/6, \"above average\" = 1/6, \"far above average\" = 1/6, \"DK\" = 1/6)) #> # A tibble: 1 × 3 #> statistic chisq_df p_value #> #> 1 488. 5 3.13e-103"},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"Deprecated functions and objects — deprecated","title":"Deprecated functions and objects — deprecated","text":"functions objects longer used. removed future release infer.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Deprecated functions and objects — deprecated","text":"","code":"conf_int(x, level = 0.95, type = \"percentile\", point_estimate = NULL) p_value(x, obs_stat, direction)"},{"path":"https://infer.tidymodels.org/dev/reference/deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Deprecated functions and objects — deprecated","text":"x See non-deprecated function. level See non-deprecated function. type See non-deprecated function. point_estimate See non-deprecated function. obs_stat See non-deprecated function. direction See non-deprecated function.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":null,"dir":"Reference","previous_headings":"","what":"Fit linear models to infer objects — fit.infer","title":"Fit linear models to infer objects — fit.infer","text":"Given output infer core function, function fit linear model using stats::glm() according formula data supplied earlier pipeline. 
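For orientation before the full argument listing, a minimal hedged sketch of that pipeline shape, assuming infer is attached; the explicit family = binomial() is only an illustration of the ... pass-through to stats::glm() noted in the Examples (a two-level factor response would select the binomial family by default anyway).

library(infer)

# logistic regression via fit(); the family argument is handed to stats::glm()
gss %>%
  specify(college ~ age + hours) %>%
  fit(family = binomial())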
passed output specify() hypothesize(), function fit one model. passed output generate(), fit model data resample, denoted replicate column. family fitted model depends type response variable. response numeric, fit() use family = \"gaussian\" (linear regression). response 2-level factor character, fit() use family = \"binomial\" (logistic regression). fit character factor response variables two levels, recommend parsnip::multinom_reg(). infer provides fit \"method\" infer objects, way carrying model fitting applied infer output. \"generic,\" imported generics package re-exported package, provides general form fit() points infer's method called infer object. generic also documented . Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fit linear models to infer objects — fit.infer","text":"","code":"# S3 method for infer fit(object, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fit linear models to infer objects — fit.infer","text":"object Output infer function---likely generate() specify()---specifies formula data fit model . ... optional arguments pass along model fitting function. See stats::glm() information.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fit linear models to infer objects — fit.infer","text":"tibble containing following columns: replicate: supplied input object previously passed generate(). number corresponding resample original data set model fitted . term: explanatory variable (intercept) question. estimate: model coefficient given resample (replicate) explanatory variable (term).","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Fit linear models to infer objects — fit.infer","text":"Randomization-based statistical inference multiple explanatory variables requires careful consideration null hypothesis question implications permutation procedures. Inference partial regression coefficients via permutation method implemented generate() multiple explanatory variables, consistent meaning elsewhere package, subject additional distributional assumptions beyond required one explanatory variable. Namely, distribution response variable must similar distribution errors null hypothesis' specification fixed effect explanatory variables. (null hypothesis reflected variables argument generate(). default, explanatory variables treated fixed.) general rule thumb , large outliers distributions explanatory variables, distributional assumption satisfied; response variable permuted, (presumably outlying) value response longer paired outlier explanatory variable, causing outsize effect resulting slope coefficient explanatory variable. sophisticated methods outside scope package requiring fewer---less strict---distributional assumptions exist. overview, see \"Permutation tests univariate multivariate analysis variance regression\" (Marti J. 
Anderson, 2001), doi:10.1139/cjfas-58-3-626 .","code":""},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Fit linear models to infer objects — fit.infer","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 10 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":"https://infer.tidymodels.org/dev/reference/fit.infer.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Fit linear models to infer objects — fit.infer","text":"","code":"# fit a linear model predicting number of hours worked per # week using respondent age and degree status. observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 43.4 #> 2 1 age -0.0457 #> 3 1 collegedegree -0.481 #> 4 2 intercept 41.2 #> 5 2 age 0.00565 #> 6 2 collegedegree -0.212 #> 7 3 intercept 40.3 #> 8 3 age 0.0314 #> 9 3 collegedegree -0.510 #> 10 4 intercept 40.5 #> # ℹ 290 more rows # for logistic regression, just supply a binary response variable! # (this can also be made explicit via the `family` argument in ...) gss %>% specify(college ~ age + hours) %>% fit() #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept -1.13 #> 2 age 0.00527 #> 3 hours 0.00698 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate resamples, permutations, or simulations — generate","title":"Generate resamples, permutations, or simulations — generate","text":"Generation creates simulated distribution specify(). 
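For instance, assuming infer is attached, a minimal hedged sketch of that idea: each replicate below is one bootstrap resample of hours, and calculate() then reduces every replicate to a single statistic (the median here, to vary from the examples further down).

library(infer)

gss %>%
  specify(response = hours) %>%
  generate(reps = 5, type = "bootstrap") %>%  # 5 resamples, tagged by `replicate`
  calculate(stat = "median")                  # one median per replicate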
context confidence intervals, bootstrap distribution based result specify(). context hypothesis testing, null distribution based result specify() hypothesize(). Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate resamples, permutations, or simulations — generate","text":"","code":"generate(x, reps = 1, type = NULL, variables = !!response_expr(x), ...)"},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate resamples, permutations, or simulations — generate","text":"x data frame can coerced tibble. reps number resamples generate. type method used generate resamples observed data reflecting null hypothesis. Currently one \"bootstrap\", \"permute\", \"draw\" (see ). variables type = \"permute\", set unquoted column names data permute (independently ). Defaults response variable. Note derived effects depend columns (e.g., interaction effects) also affected. ... Currently ignored.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate resamples, permutations, or simulations — generate","text":"tibble containing reps generated datasets, indicated replicate column.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"generation-types","dir":"Reference","previous_headings":"","what":"Generation Types","title":"Generate resamples, permutations, or simulations — generate","text":"type argument determines method used create null distribution. bootstrap: bootstrap sample drawn replicate, sample size equal input sample size drawn (replacement) input sample data. permute: replicate, input value randomly reassigned (without replacement) new output value sample. draw: value sampled theoretical distribution parameter p specified hypothesize() replicate. option currently applicable testing one proportion. generation type previously called \"simulate\", superseded.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"reproducibility","dir":"Reference","previous_headings":"","what":"Reproducibility","title":"Generate resamples, permutations, or simulations — generate","text":"using infer package research, cases exact reproducibility priority, sure set seed R’s random number generator. infer respect random seed specified set.seed() function, returning result generate()ing data given identical seed. instance, can calculate difference mean age college degree status using gss dataset 10 versions gss resampled permutation using following code. Setting seed value rerunning code produce result. 
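A hedged sketch tying the variables argument described above to the seed-setting advice, assuming infer is attached and that the unquoted c(age, college) form is used to permute both explanatory columns rather than the default response:

library(infer)

set.seed(2024)  # any fixed value; set only so reruns reproduce the same resamples

gss %>%
  specify(hours ~ age + college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 50, type = "permute", variables = c(age, college)) %>%
  fit()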
Please keep mind writing infer code utilizes resampling generate().","code":"set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350 # set the seed set.seed(1) gss %>% specify(age ~ college) %>% hypothesize(null = \"independence\") %>% generate(reps = 5, type = \"permute\") %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) ## Response: age (numeric) ## Explanatory: college (factor) ## Null Hypothesis: independence ## # A tibble: 5 x 2 ## replicate stat ## ## 1 1 -0.531 ## 2 2 -2.35 ## 3 3 0.764 ## 4 4 0.280 ## 5 5 0.350"},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/generate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate resamples, permutations, or simulations — generate","text":"","code":"# generate a null distribution by taking 200 bootstrap samples gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% generate(reps = 200, type = \"bootstrap\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 100,000 × 2 #> # Groups: replicate [200] #> replicate hours #> #> 1 1 48.6 #> 2 1 38.6 #> 3 1 38.6 #> 4 1 8.62 #> 5 1 38.6 #> 6 1 38.6 #> 7 1 18.6 #> 8 1 38.6 #> 9 1 38.6 #> 10 1 58.6 #> # ℹ 99,990 more rows # generate a null distribution for the independence of # two variables by permuting their values 200 times gss %>% specify(partyid ~ age) %>% hypothesize(null = \"independence\") %>% generate(reps = 200, type = \"permute\") #> Dropping unused factor levels DK from the supplied response variable #> 'partyid'. #> Response: partyid (factor) #> Explanatory: age (numeric) #> Null Hypothesis: independence #> # A tibble: 100,000 × 3 #> # Groups: replicate [200] #> partyid age replicate #> #> 1 rep 36 1 #> 2 ind 34 1 #> 3 dem 24 1 #> 4 dem 42 1 #> 5 ind 31 1 #> 6 dem 32 1 #> 7 ind 48 1 #> 8 rep 36 1 #> 9 ind 30 1 #> 10 ind 33 1 #> # ℹ 99,990 more rows # generate a null distribution via sampling from a # binomial distribution 200 times gss %>% specify(response = sex, success = \"female\") %>% hypothesize(null = \"point\", p = .5) %>% generate(reps = 200, type = \"draw\") %>% calculate(stat = \"z\") #> Response: sex (factor) #> Null Hypothesis: point #> # A tibble: 200 × 2 #> replicate stat #> #> 1 1 0.537 #> 2 2 0.447 #> 3 3 -0.447 #> 4 4 -0.984 #> 5 5 1.70 #> 6 6 1.52 #> 7 7 0.0894 #> 8 8 -1.25 #> 9 9 -0.268 #> 10 10 -0.805 #> # ℹ 190 more rows # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute confidence interval — get_confidence_interval","title":"Compute confidence interval — get_confidence_interval","text":"Compute confidence interval around summary statistic. simulation-based theoretical methods supported, though type = \"se\" supported theoretical methods. 
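As a hedged sketch of the theoretical route (spelled out in the Details section further down on this page), assuming infer is attached: an assume()d z distribution plus a calculate()d proportion yields an interval for the share of female respondents.

library(infer)

p_hat <- gss %>%
  specify(response = sex, success = "female") %>%
  calculate(stat = "prop")

gss %>%
  specify(response = sex, success = "female") %>%
  assume("z") %>%
  get_confidence_interval(level = 0.95, point_estimate = p_hat)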
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute confidence interval — get_confidence_interval","text":"","code":"get_confidence_interval(x, level = 0.95, type = NULL, point_estimate = NULL) get_ci(x, level = 0.95, type = NULL, point_estimate = NULL)"},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute confidence interval — get_confidence_interval","text":"x distribution. simulation-based inference, data frame containing distribution calculate()d statistics fit()ted coefficient estimates. object passed generate() supplied calculate() fit(). theory-based inference, output assume(). Distributions confidence intervals require null hypothesis via hypothesize(). level numerical value 0 1 giving confidence level. Default value 0.95. type string giving method used creating confidence interval. default \"percentile\" \"se\" corresponding (multiplier * standard error) \"bias-corrected\" bias-corrected interval options. point_estimate data frame containing observed statistic (calculate()-based workflow) observed fit (fit()-based workflow). object likely output calculate() fit() need passed generate(). Set NULL default. Must provided type \"se\" \"bias-corrected\".","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute confidence interval — get_confidence_interval","text":"tibble containing following columns: term: explanatory variable (intercept) question. supplied input previously passed fit(). lower_ci, upper_ci: lower upper bounds confidence interval, respectively.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute confidence interval — get_confidence_interval","text":"null hypothesis required compute confidence interval. However, including hypothesize() pipeline leading get_confidence_interval() break anything. can useful computing confidence interval using distribution used compute p-value. Theoretical confidence intervals (.e. calculated supplying output assume() x argument) require point estimate lies scale data. distribution defined assume() recentered rescaled align point estimate, can shown output visualize() paired shade_confidence_interval(). Confidence intervals implemented following distributions point estimates: distribution = \"t\": point_estimate output calculate() stat = \"mean\" stat = \"diff means\" distribution = \"z\": point_estimate output calculate() stat = \"prop\" stat = \"diff props\"","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"aliases","dir":"Reference","previous_headings":"","what":"Aliases","title":"Compute confidence interval — get_confidence_interval","text":"get_ci() alias get_confidence_interval(). 
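A hedged sketch of the remaining interval type listed under Arguments, "bias-corrected", which, like "se", requires a point estimate; assuming infer is attached:

library(infer)

sample_mean <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")

gss %>%
  specify(response = hours) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_confidence_interval(
    point_estimate = sample_mean,
    level = 0.95,
    type = "bias-corrected"
  )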
conf_int() deprecated alias get_confidence_interval().","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/get_confidence_interval.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute confidence interval — get_confidence_interval","text":"","code":"boot_dist <- gss %>% # We're interested in the number of hours worked per week specify(response = hours) %>% # Generate bootstrap samples generate(reps = 1000, type = \"bootstrap\") %>% # Calculate mean of each bootstrap sample calculate(stat = \"mean\") boot_dist %>% # Calculate the confidence interval around the point estimate get_confidence_interval( # At the 95% confidence level; percentile method level = 0.95 ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.2 42.7 # for type = \"se\" or type = \"bias-corrected\" we need a point estimate sample_mean <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") boot_dist %>% get_confidence_interval( point_estimate = sample_mean, # At the 95% confidence level level = 0.95, # Using the standard error method type = \"se\" ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.1 42.7 # using a theoretical distribution ----------------------------------- # define a sampling distribution sampling_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # get the confidence interval---note that the # point estimate is required here get_confidence_interval( sampling_dist, level = .95, point_estimate = sample_mean ) #> # A tibble: 1 × 2 #> lower_ci upper_ci #> #> 1 40.1 42.7 # using a model fitting workflow ----------------------- # fit a linear model predicting number of hours worked per # week using respondent age and degree status. observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 44.2 #> 2 1 age -0.0765 #> 3 1 collegedegree 0.676 #> 4 2 intercept 41.5 #> 5 2 age -0.000968 #> 6 2 collegedegree -0.329 #> 7 3 intercept 41.4 #> 8 3 age 0.0131 #> 9 3 collegedegree -1.50 #> 10 4 intercept 42.0 #> # ℹ 290 more rows get_confidence_interval( null_fits, point_estimate = observed_fit, level = .95 ) #> # A tibble: 3 × 3 #> term lower_ci upper_ci #> #> 1 age -0.0846 0.0856 #> 2 collegedegree -2.10 2.81 #> 3 intercept 38.1 44.7 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute p-value — get_p_value","title":"Compute p-value — get_p_value","text":"Compute p-value null distribution observed statistic. 
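A hedged visual follow-on to the confidence-interval examples above, assuming infer is attached and that shade_confidence_interval() (mentioned under Details there) accepts the interval tibble as its endpoints:

library(infer)

boot_dist <- gss %>%
  specify(response = hours) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

percentile_ci <- get_confidence_interval(boot_dist, level = 0.95)

visualize(boot_dist) +
  shade_confidence_interval(endpoints = percentile_ci)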
Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute p-value — get_p_value","text":"","code":"get_p_value(x, obs_stat, direction) # S3 method for default get_p_value(x, obs_stat, direction) get_pvalue(x, obs_stat, direction) # S3 method for infer_dist get_p_value(x, obs_stat, direction)"},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute p-value — get_p_value","text":"x null distribution. simulation-based inference, data frame containing distribution calculate()d statistics fit()ted coefficient estimates. object passed generate() supplied calculate() fit(). theory-based inference, output assume(). obs_stat data frame containing observed statistic (calculate()-based workflow) observed fit (fit()-based workflow). object likely output calculate() fit() need passed generate(). direction character string. Options \"less\", \"greater\", \"two-sided\". Can also use \"left\", \"right\", \"\", \"two_sided\", \"two sided\", \"two.sided\".","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute p-value — get_p_value","text":"tibble containing following columns: term: explanatory variable (intercept) question. supplied input previously passed fit(). p_value: value [0, 1] giving probability statistic/coefficient extreme observed statistic/coefficient occur null hypothesis true.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"aliases","dir":"Reference","previous_headings":"","what":"Aliases","title":"Compute p-value — get_p_value","text":"get_pvalue() alias get_p_value(). p_value deprecated alias get_p_value().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"zero-p-value","dir":"Reference","previous_headings":"","what":"Zero p-value","title":"Compute p-value — get_p_value","text":"Though true p-value 0 impossible, get_p_value() may return 0 cases. due simulation-based nature {infer} package; output function approximation based number reps chosen generate() step. observed statistic unlikely given null hypothesis, small number reps generated form null distribution, possible observed statistic extreme every test statistic generated form null distribution, resulting approximate p-value 0. case, true p-value small value likely less 3/reps (based poisson approximation). 
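A hedged illustration of that caveat, assuming infer is attached: with a deliberately extreme point null (mu = 30) and only 100 reps, every resampled mean may fall below the observed one, so the reported p-value can bottom out at 0 and is better read as p < 3/reps than as exactly zero.

library(infer)

set.seed(1)

obs_mean <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")

gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 30) %>%   # far from the observed mean on purpose
  generate(reps = 100, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_p_value(obs_stat = obs_mean, direction = "greater")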
case p-value zero reported, warning message raised caution user reporting p-value exactly equal 0.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/get_p_value.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute p-value — get_p_value","text":"","code":"# using a simulation-based null distribution ------------------------------ # find the point estimate---mean number of hours worked per week point_estimate <- gss %>% specify(response = hours) %>% calculate(stat = \"mean\") # starting with the gss dataset gss %>% # ...we're interested in the number of hours worked per week specify(response = hours) %>% # hypothesizing that the mean is 40 hypothesize(null = \"point\", mu = 40) %>% # generating data points for a null distribution generate(reps = 1000, type = \"bootstrap\") %>% # finding the null distribution calculate(stat = \"mean\") %>% get_p_value(obs_stat = point_estimate, direction = \"two-sided\") #> # A tibble: 1 × 1 #> p_value #> #> 1 0.032 # using a theoretical null distribution ----------------------------------- # calculate the observed statistic obs_stat <- gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") # define a null distribution null_dist <- gss %>% specify(response = hours) %>% assume(\"t\") # calculate a p-value get_p_value(null_dist, obs_stat, direction = \"both\") #> # A tibble: 1 × 1 #> p_value #> #> 1 0.0376 # using a model fitting workflow ----------------------------------------- # fit a linear model predicting number of hours worked per # week using respondent age and degree status. observed_fit <- gss %>% specify(hours ~ age + college) %>% fit() observed_fit #> # A tibble: 3 × 2 #> term estimate #> #> 1 intercept 40.6 #> 2 age 0.00596 #> 3 collegedegree 1.53 # fit 100 models to resamples of the gss dataset, where the response # `hours` is permuted in each. note that this code is the same as # the above except for the addition of the `generate` step. null_fits <- gss %>% specify(hours ~ age + college) %>% hypothesize(null = \"independence\") %>% generate(reps = 100, type = \"permute\") %>% fit() null_fits #> # A tibble: 300 × 3 #> # Groups: replicate [100] #> replicate term estimate #> #> 1 1 intercept 40.7 #> 2 1 age -0.00753 #> 3 1 collegedegree 2.78 #> 4 2 intercept 41.8 #> 5 2 age -0.000256 #> 6 2 collegedegree -1.08 #> 7 3 intercept 42.7 #> 8 3 age -0.0426 #> 9 3 collegedegree 1.23 #> 10 4 intercept 42.6 #> # ℹ 290 more rows get_p_value(null_fits, obs_stat = observed_fit, direction = \"two-sided\") #> # A tibble: 3 × 2 #> term p_value #> #> 1 age 0.92 #> 2 collegedegree 0.26 #> 3 intercept 0.68 # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of data from the General Social Survey (GSS). — gss","title":"Subset of data from the General Social Survey (GSS). — gss","text":"General Social Survey high-quality survey gathers data American society opinions, conducted since 1972. data set sample 500 entries GSS, spanning years 1973-2018, including demographic markers economic variables. Note data included demonstration , assumed provide accurate estimates relating GSS. 
However, due high quality GSS, unweighted data approximate weighted data analyses.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of data from the General Social Survey (GSS). — gss","text":"","code":"gss"},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of data from the General Social Survey (GSS). — gss","text":"tibble 500 rows 11 variables: year year respondent surveyed age age time survey, truncated 89 sex respondent's sex (self-identified) college whether respondent college degree, including junior/community college partyid political party affiliation hompop number persons household hours number hours worked week survey, truncated 89 income total family income class subjective socioeconomic class identification finrela opinion family income weight survey weight","code":""},{"path":"https://infer.tidymodels.org/dev/reference/gss.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of data from the General Social Survey (GSS). — gss","text":"https://gss.norc.org","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":null,"dir":"Reference","previous_headings":"","what":"Declare a null hypothesis — hypothesize","title":"Declare a null hypothesis — hypothesize","text":"Declare null hypothesis variables selected specify(). Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Declare a null hypothesis — hypothesize","text":"","code":"hypothesize(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL) hypothesise(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL)"},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Declare a null hypothesis — hypothesize","text":"x data frame can coerced tibble. null null hypothesis. Options include \"independence\", \"point\", \"paired independence\". independence: used response explanatory variable. Indicates values specified response variable independent associated values explanatory. point: used response variable. Indicates point estimate based values response associated parameter. Sometimes requires supplying one p, mu, med, sigma. paired independence: used response variable giving pre-computed difference paired observations. Indicates order subtraction paired values affect resulting distribution. p true proportion successes (number 0 1). used point null hypotheses specified response variable categorical. mu true mean (numerical value). used point null hypotheses specified response variable continuous. med true median (numerical value). used point null hypotheses specified response variable continuous. sigma true standard deviation (numerical value). 
used point null hypotheses.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Declare a null hypothesis — hypothesize","text":"tibble containing response (explanatory, specified) variable data parameter information stored well.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/hypothesize.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Declare a null hypothesis — hypothesize","text":"","code":"# hypothesize independence of two variables gss %>% specify(college ~ partyid, success = \"degree\") %>% hypothesize(null = \"independence\") #> Dropping unused factor levels DK from the supplied explanatory variable #> 'partyid'. #> Response: college (factor) #> Explanatory: partyid (factor) #> Null Hypothesis: independence #> # A tibble: 500 × 2 #> college partyid #> #> 1 degree ind #> 2 no degree rep #> 3 degree ind #> 4 no degree ind #> 5 degree rep #> 6 no degree rep #> 7 no degree dem #> 8 degree ind #> 9 degree rep #> 10 no degree dem #> # ℹ 490 more rows # hypothesize a mean number of hours worked per week of 40 gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 500 × 1 #> hours #> #> 1 50 #> 2 31 #> 3 40 #> 4 40 #> 5 40 #> 6 53 #> 7 32 #> 8 20 #> 9 40 #> 10 40 #> # ℹ 490 more rows # more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":null,"dir":"Reference","previous_headings":"","what":"infer: a grammar for statistical inference — infer","title":"infer: a grammar for statistical inference — infer","text":"objective package perform statistical inference using grammar illustrates underlying concepts format coheres tidyverse.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"infer: a grammar for statistical inference — infer","text":"overview use core functionality, see vignette(\"infer\")","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/infer.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"infer: a grammar for statistical inference — infer","text":"Maintainer: Simon Couch simon.couch@posit.co (ORCID) Authors: Andrew Bray abray@reed.edu Chester Ismay chester.ismay@gmail.com (ORCID) Evgeni Chasnovski evgeni.chasnovski@gmail.com (ORCID) Ben Baumer ben.baumer@gmail.com (ORCID) Mine Cetinkaya-Rundel mine@stat.duke.edu (ORCID) contributors: Ted Laderas tedladeras@gmail.com (ORCID) [contributor] Nick Solomon nick.solomon@datacamp.com [contributor] Johanna Hardin Jo.Hardin@pomona.edu [contributor] Albert Y. Kim albert.ys.kim@gmail.com (ORCID) [contributor] Neal Fultz nfultz@gmail.com [contributor] Doug Friedman doug.nhp@gmail.com [contributor] Richie Cotton richie@datacamp.com (ORCID) [contributor] Brian Fannin captain@pirategrunt.com [contributor]","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate observed statistics — observe","title":"Calculate observed statistics — observe","text":"function wrapper calls specify(), hypothesize(), calculate() consecutively can used calculate observed statistics data. 
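A hedged one-line version of that chaining, assuming infer is attached: observe() collapses the three verbs, here for the observed standard deviation of hours, shown alongside the equivalent spelled-out pipeline.

library(infer)

# wrapper form
gss %>% observe(response = hours, stat = "sd")

# equivalent core-verb form
gss %>%
  specify(response = hours) %>%
  calculate(stat = "sd")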
hypothesize() called point null hypothesis parameter supplied. Learn vignette(\"infer\").","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate observed statistics — observe","text":"","code":"observe( x, formula, response = NULL, explanatory = NULL, success = NULL, null = NULL, p = NULL, mu = NULL, med = NULL, sigma = NULL, stat = c(\"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff in means\", \"diff in medians\", \"diff in props\", \"Chisq\", \"F\", \"slope\", \"correlation\", \"t\", \"z\", \"ratio of props\", \"odds ratio\"), order = NULL, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate observed statistics — observe","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. success level response considered success, string. Needed inference one proportion, difference proportions, corresponding z stats. null null hypothesis. Options include \"independence\", \"point\", \"paired independence\". independence: used response explanatory variable. Indicates values specified response variable independent associated values explanatory. point: used response variable. Indicates point estimate based values response associated parameter. Sometimes requires supplying one p, mu, med, sigma. paired independence: used response variable giving pre-computed difference paired observations. Indicates order subtraction paired values affect resulting distribution. p true proportion successes (number 0 1). used point null hypotheses specified response variable categorical. mu true mean (numerical value). used point null hypotheses specified response variable continuous. med true median (numerical value). used point null hypotheses specified response variable continuous. sigma true standard deviation (numerical value). used point null hypotheses. stat string giving type statistic calculate. Current options include \"mean\", \"median\", \"sum\", \"sd\", \"prop\", \"count\", \"diff means\", \"diff medians\", \"diff props\", \"Chisq\" (\"chisq\"), \"F\" (\"f\"), \"t\", \"z\", \"ratio props\", \"slope\", \"odds ratio\", \"ratio means\", \"correlation\". infer supports theoretical tests one two means via \"t\" distribution one two proportions via \"z\". order string vector specifying order levels explanatory variable ordered subtraction (division ratio-based statistics), order = c(\"first\", \"second\") means (\"first\" - \"second\"), analogue ratios. Needed inference difference means, medians, proportions, ratios, t, z statistics. ... pass options like na.rm = TRUE functions like mean(), sd(), etc. 
Can also used supply hypothesized null values \"t\" statistic additional arguments stats::chisq.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate observed statistics — observe","text":"1-column tibble containing calculated statistic stat.","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/observe.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate observed statistics — observe","text":"","code":"# calculating the observed mean number of hours worked per week gss %>% observe(hours ~ NULL, stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% calculate(stat = \"mean\") #> Response: hours (numeric) #> # A tibble: 1 × 1 #> stat #> #> 1 41.4 # calculating a t statistic for hypothesized mu = 40 hours worked/week gss %>% observe(hours ~ NULL, stat = \"t\", null = \"point\", mu = 40) #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # equivalently, calculating the same statistic with the core verbs gss %>% specify(response = hours) %>% hypothesize(null = \"point\", mu = 40) %>% calculate(stat = \"t\") #> Response: hours (numeric) #> Null Hypothesis: point #> # A tibble: 1 × 1 #> stat #> #> 1 2.09 # similarly for a difference in means in age based on whether # the respondent has a college degree observe( gss, age ~ college, stat = \"diff in means\", order = c(\"degree\", \"no degree\") ) #> Response: age (numeric) #> Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # equivalently, calculating the same statistic with the core verbs gss %>% specify(age ~ college) %>% calculate(\"diff in means\", order = c(\"degree\", \"no degree\")) #> Response: age (numeric) #> Explanatory: college (factor) #> # A tibble: 1 × 1 #> stat #> #> 1 0.941 # for a more in-depth explanation of how to use the infer package if (FALSE) { vignette(\"infer\") }"},{"path":"https://infer.tidymodels.org/dev/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe — %>%","title":"Pipe — %>%","text":"Like {dplyr}, {infer} also uses pipe (%>%) function magrittr turn function composition series iterative statements.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/pipe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pipe — %>%","text":"lhs, rhs Inference functions initial data frame.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":null,"dir":"Reference","previous_headings":"","what":"Print methods — print.infer","title":"Print methods — print.infer","text":"Print methods","code":""},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print methods — print.infer","text":"","code":"# S3 method for infer print(x, ...) # S3 method for infer_layer print(x, ...) # S3 method for infer_dist print(x, ...)"},{"path":"https://infer.tidymodels.org/dev/reference/print.infer.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print methods — print.infer","text":"x object class infer, .e. output specify() hypothesize(), class infer_layer, .e. 
output shade_p_value() shade_confidence_interval(). ... Arguments passed methods.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy proportion test — prop_test","title":"Tidy proportion test — prop_test","text":"tidier version prop.test() equal given proportions.","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy proportion test — prop_test","text":"","code":"prop_test( x, formula, response = NULL, explanatory = NULL, p = NULL, order = NULL, alternative = \"two-sided\", conf_int = TRUE, conf_level = 0.95, success = NULL, correct = NULL, z = FALSE, ... )"},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy proportion test — prop_test","text":"x data frame can coerced tibble. formula formula response variable left explanatory right. Alternatively, response explanatory argument can supplied. response variable name x serve response. alternative using formula argument. explanatory variable name x serve explanatory variable. alternative using formula argument. p numeric vector giving hypothesized null proportion success group. order string vector specifying order proportions subtracted, order = c(\"first\", \"second\") means \"first\" - \"second\". Ignored one-sample tests, optional two sample tests. alternative Character string giving direction alternative hypothesis. Options \"two-sided\" (default), \"greater\", \"less\". used testing null single proportion equals given value, two proportions equal; ignored otherwise. conf_int logical value whether include confidence interval . TRUE default. conf_level numeric value 0 1. Default value 0.95. success level response considered success, string. used testing null single proportion equals given value, two proportions equal; ignored otherwise. correct logical indicating whether Yates' continuity correction applied possible. z = TRUE, correct argument overwritten FALSE. Otherwise defaults correct = TRUE. z logical value whether report statistic standard normal deviate Pearson's chi-square statistic. \\(z^2\\) distributed chi-square 1 degree freedom, though note user likely need turn Yates' continuity correction setting correct = FALSE see connection. ... Additional arguments prop.test().","code":""},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tidy proportion test — prop_test","text":"testing explanatory variable two levels, order argument used package longer well-defined. function thus raise warning ignore value supplied non-NULL order argument. columns present output depend output prop.test() broom::glance.htest(). 
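A hedged sketch of the one-sample form described under Arguments above (p supplies the hypothesized proportion of successes; z = TRUE reports the standard-normal statistic and, per the note above, switches off Yates' correction):

library(infer)

prop_test(
  gss,
  response = college,
  success = "degree",
  p = 0.5,
  z = TRUE
)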
See latter's documentation column definitions; columns renamed following mapping: chisq_df = parameter p_value = p.value lower_ci = conf.low upper_ci = conf.high","code":""},{"path":[]},{"path":"https://infer.tidymodels.org/dev/reference/prop_test.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy proportion test — prop_test","text":"","code":"# two-sample proportion test for difference in proportions of # college completion by respondent sex prop_test(gss, college ~ sex, order = c(\"female\", \"male\")) #> # A tibble: 1 × 6 #> statistic chisq_df p_value alternative lower_ci upper_ci #>