ChunkedEncodingError: search may be incomplete #4

Open
Moh-Pou opened this issue May 18, 2023 · 2 comments

Comments

@Moh-Pou
Moh-Pou commented May 18, 2023

I get an error when searching for aspirin in REAL. It suggests that the search does not cover the whole of REAL, so the code cannot find aspirin:

Python 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 16:01:55)
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import pandas as pd
from smallworld_api import SmallWorld
aspirin = 'O=C(C)Oc1ccccc1C(=O)O'
sw = SmallWorld()
results : pd.DataFrame = sw.search(aspirin, dist=0, db=sw.REAL_dataset)
ChunkedEncodingError: search may be incomplete
results['dist'].values
array([2, 2, 2, 2, 2, 3, 3, 3, 3, 3])
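A quick way to flag this situation in code is to check whether any hit sits at the requested distance. The sketch below is a minimal example, assuming only the `dist` values printed above; `has_exact_hit` is a hypothetical helper, not part of `smallworld_api`.

```python
def has_exact_hit(distances, wanted=0):
    """Return True if at least one hit is at the wanted distance."""
    return any(d == wanted for d in distances)


# 'dist' values copied from the session above
reported = [2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
print(has_exact_hit(reported))  # False: no exact match came back
```

With the real DataFrame, `results['dist'].tolist()` could be passed in place of the hard-coded list.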

@David-Araripe

This seems to be a problem on Small World's side; there is an earlier discussion of it in issue #2.
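If the failure is transient (which this thread does not establish), one possible workaround is to repeat the search until the result passes a completeness check. The sketch below is an assumption-laden illustration: `retry_search` is a hypothetical helper, not part of `smallworld_api`, and the search is replaced by canned values so the block runs on its own.

```python
import time


def retry_search(search, is_complete, attempts=3, wait=0.0):
    """Call search() until is_complete(result) holds or attempts run out.

    Returns (last_result, ok), where ok says whether the check passed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    for attempt in range(attempts):
        result = search()
        if is_complete(result):
            return result, True
        if attempt < attempts - 1:
            time.sleep(wait)  # pause before the next try
    return result, False


# Stand-in for a server call: first answer lacks the exact hit, second has it
canned = iter([[2, 2, 3], [0, 2, 3]])
result, ok = retry_search(lambda: next(canned), lambda dists: 0 in dists)
print(result, ok)  # [0, 2, 3] True
```

With the real client, `search` could be `lambda: sw.search(aspirin, dist=0, db=sw.REAL_dataset)` and `is_complete` could test `(df['dist'] == 0).any()`.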

@matteoferla
Owner

matteoferla commented Jul 17, 2023

The chunking error in SW (not in Arthor) is a concern.

But the aspirin in REAL is weird, and its oddity spills over to Arthor. I have never cared enough to actually enquire or dig deep into it.
At first I assumed Lipinski rules don't play well with it, and then I was told that REAL BB is allegedly not a subset of REAL DB, a misconception I held for a long time.

However, aspirin (ZINC0053) is in Enamine REAL DB (Z104474430) and Enamine BB (EN300-19606, $24/g 😆). The e-store link doesn't work as of July 23, but it ought to.

So my guess is that the fingerprint heuristics struggle to find such an unremarkable compound, which is just a benzene scaffold with a carboxylate and an ester. Going one step further, benzoate (O=C(O)c1ccccc1) in Arthor against BB works, but it shows what's going on under the hood, with a deluge of benzenyl and carboxyl compounds of diverse sizes...
