Protein function prediction with GO - Part 3 #64

Status: Draft. Wants to merge 37 commits into dev (from the protein_prediction branch, per the merge commits below).

Commits (37)
bdba442
script to evaluate go predictions
aditya0by0 Nov 4, 2024
264bd94
Merge branch 'dev' into protein_prediction
aditya0by0 Nov 4, 2024
6c0fce1
add fmax to evaluation script
aditya0by0 Nov 4, 2024
154e827
Merge branch 'dev' into protein_prediction
aditya0by0 Nov 4, 2024
58ae92d
add base code for deep_go data migration
aditya0by0 Nov 5, 2024
78a38de
vary Fmax threshold as per paper (see the Fmax sketch after this commit list)
aditya0by0 Nov 5, 2024
3a4e007
go_uniprot: add sequence len to docstring
aditya0by0 Nov 5, 2024
227a014
update experimental evidence codes as per DeepGO-SE
aditya0by0 Nov 6, 2024
33436e8
Merge branch 'dev' into protein_prediction
aditya0by0 Nov 6, 2024
c6d60cd
consider `X` as a valid amino acid as per DeepGO-SE
aditya0by0 Nov 6, 2024
ca5461f
deepgo se migration: add class to migrate
aditya0by0 Nov 6, 2024
af54954
Merge branch 'dev' into protein_prediction
aditya0by0 Nov 6, 2024
dfb9430
migration: rectify errors
aditya0by0 Nov 7, 2024
085b13b
protein trigram tokens containing `X`
aditya0by0 Nov 7, 2024
3e0bae0
protein unigram tokens contain `X`
aditya0by0 Nov 7, 2024
99b5af1
add migration for deepgo1 - 2018 paper
aditya0by0 Nov 11, 2024
a15d492
deepgo1: create non-exclusive val set as a placeholder
aditya0by0 Nov 12, 2024
e0a8524
deepgo1: further split train set into train and val for
aditya0by0 Nov 13, 2024
093be28
migration script update
aditya0by0 Nov 13, 2024
14db9d6
add classes to use migrated deepgo data
aditya0by0 Nov 13, 2024
8922d4d
deepgo: minor code change
aditya0by0 Nov 13, 2024
796356c
modify prints to display actual file name
aditya0by0 Nov 13, 2024
3c11a69
create subdir for deepgo dataset and move related files
aditya0by0 Nov 17, 2024
2b571c5
update imports as per new deepGO dir
aditya0by0 Nov 17, 2024
f75e30b
update import dir for pretrain test
aditya0by0 Nov 17, 2024
1b8b270
migration fix: truncate sequences and save data with labels
aditya0by0 Dec 4, 2024
bcda11c
Delete protein_protein_interactions.py
aditya0by0 Dec 4, 2024
85c47a0
migration: replace invalid amino acid with "X" notation
aditya0by0 Dec 4, 2024
fbb5c58
update deepgo configs
aditya0by0 Dec 4, 2024
272446d
add esm2 reader for deepGO
aditya0by0 Dec 9, 2024
a12354b
increase electra vocab size
Dec 9, 2024
66732a7
fix: print right name of missing file
aditya0by0 Dec 9, 2024
e7b3d80
migration: add ESM2 embeddings
aditya0by0 Dec 9, 2024
862c8ef
scope dataset: add scope abstract code
aditya0by0 Jan 5, 2025
7da8963
base: make _name property an abstract method
aditya0by0 Jan 5, 2025
976f2b8
add simple Feed-forward network (for ESM2->chebi task)
sfluegel05 Jan 10, 2025
3b17487
reformat using Black
sfluegel05 Jan 10, 2025
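
Commits bdba442, 6c0fce1, and 78a38de above add and refine an Fmax evaluation. For context: Fmax is the protein-centric metric from CAFA that the DeepGO papers report. For each decision threshold t, precision is averaged over the proteins that have at least one predicted term with score >= t, recall is averaged over all proteins, and Fmax is the maximum F1 over the threshold sweep. Below is a minimal NumPy sketch of the metric, not the PR's actual evaluation script; the 0.01 threshold step follows the DeepGO convention.

import numpy as np

def fmax_score(y_true: np.ndarray, y_scores: np.ndarray) -> float:
    """Protein-centric Fmax over a sweep of decision thresholds.

    y_true:   (n_proteins, n_terms) binary annotation matrix
    y_scores: (n_proteins, n_terms) predicted scores in [0, 1]
    """
    best = 0.0
    for t in np.arange(0.01, 1.00, 0.01):
        pred = y_scores >= t
        tp = np.logical_and(pred, y_true == 1).sum(axis=1)
        covered = pred.sum(axis=1) > 0  # proteins with >= 1 predicted term
        if not covered.any():
            continue
        # precision only over covered proteins, recall over all proteins
        precision = float(np.mean(tp[covered] / pred[covered].sum(axis=1)))
        recall = float(np.mean(tp / np.maximum(y_true.sum(axis=1), 1)))
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best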
2 changes: 1 addition & 1 deletion chebai/models/electra.py
@@ -329,7 +329,7 @@ def forward(self, data: Dict[str, Tensor], **kwargs: Any) -> Dict[str, Any]:
         except RuntimeError as e:
             print(f"RuntimeError at forward: {e}")
             print(f'data[features]: {data["features"]}')
-            raise Exception
+            raise e
         inp = self.word_dropout(inp)
         electra = self.electra(inputs_embeds=inp, **kwargs)
         d = electra.last_hidden_state[:, 0, :]
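
The one-line fix above is worth a note: `raise Exception` discards the caught `RuntimeError` and surfaces a bare, message-less `Exception`, while `raise e` propagates the original error after the debug prints. A standalone illustration (not repo code; the failing function is a stand-in):

def failing_lookup():
    # stand-in for the embedding lookup that fails inside forward()
    raise RuntimeError("index out of range in embedding lookup")

try:
    failing_lookup()
except RuntimeError as e:
    print(f"RuntimeError at forward: {e}")
    # `raise Exception` would hide the actual error from the caller;
    # `raise e` (or a bare `raise`) propagates the original RuntimeError.
    raise e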
61 changes: 61 additions & 0 deletions chebai/models/ffn.py
@@ -0,0 +1,61 @@
from typing import Dict, Any, Tuple

from chebai.models import ChebaiBaseNet
import torch
from torch import Tensor


class FFN(ChebaiBaseNet):

    NAME = "FFN"

    def __init__(
        self,
        input_size: int = 1000,
        num_hidden_layers: int = 3,
        hidden_size: int = 128,
        **kwargs
    ):
        super().__init__(**kwargs)

        self.layers = torch.nn.ModuleList()
        self.layers.append(torch.nn.Linear(input_size, hidden_size))
        for _ in range(num_hidden_layers):
            self.layers.append(torch.nn.Linear(hidden_size, hidden_size))
        self.layers.append(torch.nn.Linear(hidden_size, self.out_dim))

    def _get_prediction_and_labels(self, data, labels, model_output):
        d = model_output["logits"]
        loss_kwargs = data.get("loss_kwargs", dict())
        if "non_null_labels" in loss_kwargs:
            n = loss_kwargs["non_null_labels"]
            d = data[n]
        return torch.sigmoid(d), labels.int() if labels is not None else None

    def _process_for_loss(
        self,
        model_output: Dict[str, Tensor],
        labels: Tensor,
        loss_kwargs: Dict[str, Any],
    ) -> Tuple[Tensor, Tensor, Dict[str, Any]]:
        """
        Process the model output for calculating the loss.

        Args:
            model_output (Dict[str, Tensor]): The output of the model.
            labels (Tensor): The target labels.
            loss_kwargs (Dict[str, Any]): Additional loss arguments.

        Returns:
            tuple: A tuple containing the processed model output, labels, and loss arguments.
        """
        kwargs_copy = dict(loss_kwargs)
        if labels is not None:
            labels = labels.float()
        return model_output["logits"], labels, kwargs_copy

    def forward(self, data, **kwargs):
        x = data["features"]
        for layer in self.layers:
            x = torch.relu(layer(x))
        return {"logits": x}
1 change: 1 addition & 0 deletions chebai/preprocessing/bin/protein_token/tokens.txt
@@ -18,3 +18,4 @@ W
 E
 V
 H
+X
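
Adding `X` to the protein token vocabulary matches commits c6d60cd and 85c47a0: `X` (unknown residue) is treated as a valid amino acid as in DeepGO-SE, and the migration replaces any invalid residue with `X` before tokenization. A minimal sketch of that normalization step, assuming the standard 20-letter canonical alphabet (the function name is illustrative, not from the PR):

# the 20 canonical amino acids; everything else maps to "X"
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")

def normalize_sequence(seq: str) -> str:
    """Replace every non-canonical residue (B, J, O, U, Z, ...) with 'X'."""
    return "".join(aa if aa in CANONICAL else "X" for aa in seq.upper())

print(normalize_sequence("MKVLAZBU"))  # -> MKVLAXXX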