DeepBGC failed with ValueError: Grouper for 'sequence_id' not 1-dimensional #100

Open
marlaux opened this issue Jul 29, 2024 · 0 comments
Hello,
Could you please help me with the following error: DeepBGC failed with ValueError: Grouper for 'sequence_id' not 1-dimensional
Thank you very much!

Commands:

deepbgc prepare --output-tsv cat_saxi_clusters_refs.prepared.tsv cat_saxi_clusters_refs_genomic.fasta
deepbgc train --model deepbgc.json --output SaxiDetector.pkl --config PFAM2VEC ./pfam2vec.csv Saxi_Positives.incluster.pfam.tsv Fake_negatives.pfam.tsv
ERROR 29/07 18:14:08 DeepBGC failed with ValueError: Grouper for 'sequence_id' not 1-dimensional

cat_saxi_clusters_refs_genomic.fasta contains six nucleotide sequences of one BGC, taken from six species.

Using GeneSwap_Negatives.pfam.tsv as a template, I edited cat_saxi_clusters_refs.prepared.tsv so that it has the same columns, including 'sequence_id'. In the positive file, 'sequence_id' holds six BGC identifiers; in the edited Fake_negatives.pfam.tsv it holds the single value 'NEG_FAKE_CLUSTER'.
Both files have these columns:
sequence_id|contig_id|protein_id|gene_start|gene_end|gene_strand|pfam_id|domain_start|domain_end|bitscore|in_cluster

Saxi_Positives.incluster.pfam.tsv has in_cluster = 1 and six sequence_id values to group by during training.
Fake_negatives.pfam.tsv has in_cluster = 0 and a single sequence_id value.
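
Since both files were edited by hand, a quick sanity check of their header rows may help narrow things down. The sketch below uses only the Python standard library and reads the raw header line, so it reports the column names exactly as they are stored in the file. The function name `check_header` is hypothetical, and the expected column list is copied from the list above, not from the DeepBGC source. It assumes the files are tab-separated; if they actually use `|` as the delimiter, every expected column will show up as missing.

```python
import csv
import sys
from collections import Counter

# Column names copied from the list in this issue (not from the DeepBGC source).
EXPECTED_COLUMNS = [
    "sequence_id", "contig_id", "protein_id", "gene_start", "gene_end",
    "gene_strand", "pfam_id", "domain_start", "domain_end", "bitscore",
    "in_cluster",
]


def check_header(path, expected=EXPECTED_COLUMNS):
    """Report duplicated, missing and unexpected names in a TSV header row."""
    with open(path, newline="") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle, delimiter="\t"), [])
    counts = Counter(header)
    return {
        "duplicated": sorted(name for name, n in counts.items() if n > 1),
        "missing": [name for name in expected if name not in counts],
        "unexpected": [name for name in header if name not in expected],
    }


if __name__ == "__main__":
    # Pass one or more TSV paths on the command line.
    for tsv_path in sys.argv[1:]:
        print(tsv_path, check_header(tsv_path))
```

Running it on Saxi_Positives.incluster.pfam.tsv and Fake_negatives.pfam.tsv would show whether the two headers really match the column list and each other.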

I downloaded deepbgc.json and pfam2vec.csv from GitHub.

Complete error message:
Traceback (most recent call last):
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
    run(argv)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
    args.func.run(**args_dict)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/command/train.py", line 60, in run
    train_samples, train_y = util.read_samples(inputs, target)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/util.py", line 561, in read_samples
    samples = [sample for sample_id, sample in domains.groupby('sequence_id')]
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/generic.py", line 7632, in groupby
    observed=observed, **kwargs)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2110, in groupby
    return klass(obj, by, **kwds)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 360, in __init__
    mutated=self.mutated)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 602, in _get_grouper
    if not isinstance(gpr, Grouping) else gpr)
  File "/home/marlaux/anaconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 322, in __init__
    "Grouper for '{}' not 1-dimensional".format(t))
ValueError: Grouper for 'sequence_id' not 1-dimensional
ERROR 29/07 18:14:08 ================================================================================
ERROR 29/07 18:14:08 DeepBGC failed with ValueError: Grouper for 'sequence_id' not 1-dimensional
ERROR 29/07 18:14:08 ================================================================================
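
For context, one situation in which pandas raises this exact message is when the label passed to `groupby` appears more than once among the DataFrame's columns, so that selecting it gives a 2-D DataFrame and not a 1-D Series. The minimal sketch below reproduces that with made-up data. The rename of `contig_id` to `sequence_id` is purely an assumption used to create the duplicate label: the traceback does not show that DeepBGC does this, and it does not confirm that a duplicate label is the cause here.

```python
import pandas as pd

# Toy table with both 'sequence_id' and 'contig_id', mirroring the column
# list reported in this issue. All values are made up.
domains = pd.DataFrame({
    "sequence_id": ["BGC_A", "BGC_A", "BGC_B"],
    "contig_id": ["c1", "c1", "c2"],
    "pfam_id": ["PF00001", "PF00002", "PF00003"],
})

# Hypothetical rename step (an assumption, not confirmed DeepBGC behaviour).
# It leaves the frame with two columns that are both labelled 'sequence_id'.
renamed = domains.rename(columns={"contig_id": "sequence_id"})

# Selecting a duplicated label returns a DataFrame, so ndim is 2, not 1.
selection_ndim = renamed["sequence_id"].ndim

try:
    list(renamed.groupby("sequence_id"))
    error_message = None
except ValueError as err:
    error_message = str(err)

print("ndim of renamed['sequence_id']:", selection_ndim)
print("groupby error:", error_message)
```

If this turns out to be the cause, keeping only one of the two identifier columns in the hand-edited files would be a reasonable thing to try.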
