Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeError: cannot perform reduce with flexible type #9

Open
ramadatta opened this issue Sep 13, 2017 · 10 comments
Open

TypeError: cannot perform reduce with flexible type #9

ramadatta opened this issue Sep 13, 2017 · 10 comments

Comments

@ramadatta
Copy link

ramadatta commented Sep 13, 2017

Hi Vipin,

Thank you very much for the GFFtools scripts. However, when I run the command,

python gff_to_bed.py ref.gff3 > out2.bed

while converting gff3 to bed 12 format, I receive an error below:

Traceback (most recent call last):
  File "gff_to_bed.py", line 115, in <module>
    __main__() 
  File "gff_to_bed.py", line 112, in __main__
    writeBED(Transcriptdb)
  File "gff_to_bed.py", line 86, in writeBED
    score = ent1['score'][0] if ent1['score'].any() else score
  File "/home/.linuxbrew/Cellar/python/2.7.13/lib/python2.7/site-packages/numpy/core/_methods.py", line 38, in _any
    return umr_any(a, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type

Could I please know how to fix this error?

@vipints
Copy link
Owner

vipints commented Sep 13, 2017

Hi @ramadatta, Can I have first few lines from your GFF3 input file?

@ramadatta
Copy link
Author

Hi Vipin,

Please find it here:

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ASM185804v2
#!genome-build-accession NCBI_Assembly:GCF_001858045.1
#!annotation-date 5 December 2016
#!annotation-source NCBI Oreochromis niloticus Annotation Release 103
##sequence-region NC_031965.1 1 38372991
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=8128
NC_031965.1     RefSeq  region  1       38372991        .       +       .       ID=id0;Dbxref=taxon:8128;Name=LG1;dev-stage=adult;gbkey=Src;genome=chromosome;isolate=F11D_XX;linkage-group=LG1;mol_type=genomic DNA;sex=female;tissue-type=blood
NC_031965.1     Gnomon  gene    11049   13118   .       -       .       ID=gene0;Dbxref=GeneID:102075536;Name=LOC102075536;gbkey=Gene;gene=LOC102075536;gene_biotype=lncRNA
NC_031965.1     Gnomon  lnc_RNA 11049   13118   .       -       .       ID=rna0;Parent=gene0;Dbxref=GeneID:102075536,Genbank:XR_266817.3;Name=XR_266817.3;gbkey=ncRNA;gene=LOC102075536;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=uncharacterized LOC102075536;transcript_id=XR_266817.3
NC_031965.1     Gnomon  exon    13071   13118   .       -       .       ID=id1;Parent=rna0;Dbxref=GeneID:102075536,Genbank:XR_266817.3;gbkey=ncRNA;gene=LOC102075536;product=uncharacterized LOC102075536;transcript_id=XR_266817.3
NC_031965.1     Gnomon  exon    12125   12153   .       -       .       ID=id2;Parent=rna0;Dbxref=GeneID:102075536,Genbank:XR_266817.3;gbkey=ncRNA;gene=LOC102075536;product=uncharacterized LOC102075536;transcript_id=XR_266817.3
NC_031965.1     Gnomon  exon    11049   11689   .       -       .       ID=id3;Parent=rna0;Dbxref=GeneID:102075536,Genbank:XR_266817.3;gbkey=ncRNA;gene=LOC102075536;product=uncharacterized LOC102075536;transcript_id=XR_266817.3
NC_031965.1     Gnomon  gene    31634   150960  .       -       .       ID=gene1;Dbxref=GeneID:100690862;Name=LOC100690862;gbkey=Gene;gene=LOC100690862;gene_biotype=protein_coding

Prakki Rama

@vipints
Copy link
Owner

vipints commented Sep 13, 2017

@ramadatta the input gff3 file lines looks good to me. From the above error message, I am seeing little difference the current version of the code:
https://github.com/vipints/GFFtools-GX/blob/master/gff_to_bed.py#L87. Can you test with this recent version of the script.

@ramadatta
Copy link
Author

Hi Vipin,

Thank you for quick replies. I have got the gff3 file from NCBI. The following is the result when I used the latest version of your script:

$ python gff_to_bed_latest.py ref_ASM185804v2_top_level.gff3 >out3.bed
Traceback (most recent call last):
  File "gff_to_bed_latest.py", line 112, in <module>
    __main__() 
  File "gff_to_bed_latest.py", line 109, in __main__
    writeBED(Transcriptdb)
  File "gff_to_bed_latest.py", line 66, in writeBED
    score = ent1['transcript_score'][idx] if ent1['transcript_score'].any() else score ## getting the transcript score 
ValueError: no field of name transcript_score

@vipints
Copy link
Owner

vipints commented Sep 14, 2017

@ramadatta can you provide me the ncbi link where you downloaded the gff3 file.

@ramadatta
Copy link
Author

Hi Vipin,

Here is the annotation file:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Oreochromis_niloticus/GFF/ref_ASM185804v2_top_level.gff3.gz

@vipints
Copy link
Owner

vipints commented Sep 14, 2017

Hi @ramadatta,
I have downloaded the file from the above link. There are many features available in this GFF file.
cut -f 2,3 ref_ASM185804v2_top_level.gff3 | sort | uniq -c
The file looks good with the features available. Now I found an issue with feature mappings in the file:
NW_017615909.1 Gnomon gene 431704 433832 . + . ID=gene36112;Dbxref=GeneID:100700432;Name=LOC100700432;gbkey=Gene;gene=LOC100700432;gene_biotype=pseudogene;pseudo=true
NW_017615909.1 Gnomon exon 431704 431773 . + . ID=id778571;Parent=gene36112;Dbxref=GeneID:100700432;gbkey=exon;gene=LOC100700432;model_evidence=Supporting evidence includes similarity to: 4 Proteins
NW_017615909.1 Gnomon exon 433129 433832 . + . ID=id778572;Parent=gene36112;Dbxref=GeneID:100700432;gbkey=exon;gene=LOC100700432;model_evidence=Supporting evidence includes similarity to: 4 Proteins

The exon feature is directly mapped to gene, in this mapping there missing a additional layer of mapping exon -> transcript -> gene. I would say as per the GFF specification even if it is a single transcript gene, the mapping description should go as mentioned before. (gene -> transcript -> exon). May be it is good to contact the genome annotation center (ncbi) where you downloaded the file and inform them that the mapping in GFF is not valid.

@vipints
Copy link
Owner

vipints commented Sep 14, 2017

in BED format we are representing the transcript feature in a single line.

@vipints
Copy link
Owner

vipints commented Sep 14, 2017

I could do something that I can kick out these non properly matched annotation features. But I m not sure whether this will clean all single exon gene annotation from this file. I think then the resulting bed format will not contain any single exon gene. That will not be a good fix to this problem. Also you cannot define the missing transcript status whether a mRNA, lnRNA, or any other type of RNA feature, this has to be defined by the annotation center.

@ramadatta
Copy link
Author

ramadatta commented Sep 15, 2017

Hi @vipints

That appears true. Some transcripts are missing the status. Any other possible solution to think. May be I will try to approach the NCBI annotation team about this. Thanks so much for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants