Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding LFC values in pairwise test with structural zeros. Misinterpretation or bug? #298

Open
kevinbretscher opened this issue Dec 9, 2024 · 0 comments

Comments

@kevinbretscher
Copy link

kevinbretscher commented Dec 9, 2024

Dear Authors,

First of all, thank your work on ANCOM. I love how everything is neatly wrapped in a single function.

This is my first time working with the new pairwise comparison and I run into an issue. I am not sure if I'm interpreting the output wrong or that there is a bug. I provided a minimum working example to recreate the issue. I am by no means an expert on normalization and statistics so please correct me if I am wrong.

It seems to me that when doing a pairwise comparison on data that contains structural zeros, some of the lfc values make no sense.

Let's create a fake OTU table with structural zeros.


library(dplyr)
library(phyloseq)
library(ANCOMBC)

####Create a fake OTU table

otumat = matrix(sample(1:100, 100, replace = TRUE), nrow = 10, ncol = 10)
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))

### introduce strucutural zero 

otumat[1,c(8,9,10)] = 0
otumat[2,c(8,9,10)] = 0

otumat

taxmat = matrix(sample(letters, 70, replace = TRUE), nrow = nrow(otumat), ncol = 7)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")

OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)

physeq = phyloseq(OTU, TAX)

### add fake metadata so that OTU1 and OTU2 have 0 counts in location C

sampledata = sample_data(data.frame(
  Location = c("A","A","A","B","B","B","B","C","C","C"),
  row.names=sample_names(physeq),
  stringsAsFactors=FALSE
))

physeq1 = merge_phyloseq(physeq, sampledata)


Let's run ancom with struc_zero = TRUE (Non-default behaviour).

out_struc_zero <- ancombc2(data = physeq1, fix_formula = "Location", group = "Location", struc_zero = TRUE,
                p_adj_method = "fdr", prv_cut = 0, lib_cut = 0, verbose = FALSE, 
                n_cl = 4, global = TRUE, pairwise = TRUE, mdfdr_control = list(fwer_ctrl_method = "holm", B = 100))

out_struc_zero$res_pair

taxon lfc_LocationB lfc_LocationC lfc_LocationC_LocationB se_LocationB se_LocationC se_LocationC_LocationB W_LocationB W_LocationC
1 OTU3 -1.1834893 -1.15479205 0.02869728 0.7140427 0.4822120 0.8282884 -1.6574490 -2.3947806
2 OTU4 0.1363115 0.78924163 0.65293009 0.7371565 0.5582272 0.8353284 0.1849153 1.4138358
3 OTU5 -0.2155298 -0.82505945 -0.60952967 0.7874220 1.2469554 1.3372344 -0.2737157 -0.6616592

taxon structural_zero (Location = A) structural_zero (Location = B) structural_zero (Location = C)
1 OTU1 FALSE FALSE TRUE
2 OTU2 FALSE FALSE TRUE

OTU 1 and 2 are not included. This seem to be in line with the documentation. However I am still interested in the comparison between location A and B. And I want the P values to be corrected for pairwise comparison. So I choose to turn structural zeros FALSE (default behaviour).

out_struc_zero_FALSE <- ancombc2(data = physeq1, fix_formula = "Location", group = "Location", struc_zero = FALSE,
                           p_adj_method = "fdr", prv_cut = 0, lib_cut = 0, verbose = FALSE, 
                           n_cl = 4, global = TRUE, pairwise = TRUE, mdfdr_control = list(fwer_ctrl_method = "holm", B = 100))

out_struc_zero_FALSE$res_pair
out_struc_zero_FALSE$zero_ind

taxon lfc_LocationB lfc_LocationC lfc_LocationC_LocationB se_LocationB se_LocationC se_LocationC_LocationB W_LocationB W_LocationC
1 OTU1 0.6320237 0.00000000 -0.63202374 1.2631016 1.2380156 0.9517277 0.5003744 0.0000000
2 OTU2 -0.9690525 0.00000000 0.96905250 0.6529718 0.6289003 0.7439026 -1.4840648 0.0000000
3 OTU3 -1.1834893 -1.15479205 0.02869728 0.7217923 0.4903441 0.8397010 -1.6396537 -2.3550645

out_struc_zero_FALSE$zero_ind
NULL

In this case OTU1 and OTU2 are included as expected. The lfc for A-C is zero, this is understandable as no comparison can be done. However it is slightly misleading as it might be interpreted as no change between A and C.

More strangely is the LFC value for C - B for OTU 1 and 2. It seems to be the inverse of A-B. Which to me makes no sense to me as it would indicate the Location C has somehow more counts then B.

The biggest problem with this is when interpreting or visualizing the data. Showing a lfc change where there is none. Ideally I think here a NA value should be reported.

As there is no zero_ind table available when struc_zero = FALSE it can also not be cross-referenced to see which OTUs have structural zeros.

Now the solution could be to add a pseudocount. However this is not ideal as explained in your documentation.


 out_struc_zero_FALSE.pseudo <- ancombc2(data = physeq1, fix_formula = "Location", group = "Location", struc_zero = FALSE, pseudo = 1,
                               p_adj_method = "fdr", prv_cut = 0, lib_cut = 0, verbose = FALSE, 
                                 n_cl = 4, global = TRUE, pairwise = TRUE, mdfdr_control = list(fwer_ctrl_method = "holm", B = 100))
 out_struc_zero_FALSE.pseudo$res_pair

out_struc_zero_FALSE.pseudo$res_pair
taxon lfc_LocationB lfc_LocationC lfc_LocationC_LocationB se_LocationB se_LocationC se_LocationC_LocationB W_LocationB W_LocationC
1 OTU1 0.40557307 -2.99590963 -3.40148270 1.0217534 1.0467730 0.8672596 0.3969383 -2.86204341
2 OTU2 -0.96549660 -3.97509307 -3.00959647 0.4584745 0.4723326 0.6267913 -2.1058893 -8.41587664
3 OTU3 -1.15524647 -1.09262766 0.06261882 0.6394919 0.4721751 0.7664153 -1.8065069 -2.31403059

To me it seems the when struc-zero = FALSE and no pseudocount is added the LFC values make no sense for C-B. I think the problem here is that when using ANCOM with default setting a researcher might assume these lfc values to be true if they do not check further.

MWE_ANCOM_Issue.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant