Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can ANARCI support hmm search with a specific chain? #74

Open
yzhougnf opened this issue Dec 14, 2023 · 5 comments
Open

Can ANARCI support hmm search with a specific chain? #74

yzhougnf opened this issue Dec 14, 2023 · 5 comments

Comments

@yzhougnf
Copy link

We often need to explicitly search CDRs for a heavy chain or a light chain. E.g., we have a nanobody sequence (with heavy and light chain fused), we don't want to get heavy chain CDRs or light chain CDRs randomly. We prefer to use anarci to first scan for heavy chain CDRs, and then scan for light chain CDRs.

This means we should modify the following function to consider an optional chain type argument.

_parse_hmmer_query(query, bit_score_threshold=80, hmmer_species=None, chain_type=None):

            if chain_type is None: return hsp_list
            hsp_list=[hsp for hsp in hsp_list if hsp.hit_id.endswith(f"{chain_type}")]

Thanks for the consideration.

@ywilke
Copy link

ywilke commented Dec 19, 2023

anarci(sequences, scheme="imgt", database="ALL", output=False, outfile=None, csv=False, allow=set(["H","K","L","A","B","G","D"]), 
           hmmerpath="", ncpu=None, assign_germline=False, allowed_species=['human','mouse'], bit_score_threshold=80)

If you want only heavy chain you could do allow=set(["H"])

@yzhougnf
Copy link
Author

yzhougnf commented Dec 19, 2023

It seems like this is a bug. the reason is chain_type in number_seqquce_from_alignment is not enforced.

When I set allow to set(["L"]), it returns both "K" and "L" based on the code:

       elif chain_type in "KL":
            return number_chothia_light(state_vector, sequence)

In my case the "K" hit happens to be the one with higher bit_score, so it returns the "K" result instead of the "L" as specified. Could this be fixed? Thanks.

@ywilke
Copy link

ywilke commented Dec 19, 2023

The K and L stand for the Kappa and Lambda chain which are both light chains. https://en.wikipedia.org/wiki/Immunoglobulin_light_chain

I am not a developer by the way.

@yzhougnf
Copy link
Author

I check further and I think I found the root cause. My sequence matches both K and L, K is the best hit and L is the second hit.

In _parse_hmmer_query returns "top_descriptions". However, the chain_type is set to the chain type of the top hit. I.e., in my case, althoug
h we have two hits K and L. The chain type is set to "K".

Later in number_sequences_from_alignment, we have
'''
if state_vector and details["chain_type"] in allow:
'''
But since details["chain_type"] is "K" and the allow list is ("L"), the hit was dropped.
So I ended up with no hits, when I set allow=set(["L"]).

The L-chain hit was seconded to the K-chain hit, and it's chain type was not used. So the L-chain hit did not survive in the "allow" filter.

To me, it seems one solution is in my first post, i.e., to consider "allow" logic within _parse_hmmer_query, so that "K" top hit will be filtered out. Then the "L" hit can survive.

@yzhougnf
Copy link
Author

Or the document should say "K" and "L" should not be used alone, they must be used together as set(["K","L"]). I am doing that as the workaround. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants