This repository has been archived by the owner on Feb 3, 2023. It is now read-only.

Regression testing after merging smart ner with master #49

Open
wants to merge 793 commits into
base: master
793 commits
ba210f4
merged hedging into one function
mithunpaul08 Sep 4, 2018
60c365e
added code for refuting body
mithunpaul08 Sep 4, 2018
986c39b
changed refuting_value_body_matrix size
mithunpaul08 Sep 4, 2018
eed13a4
removed tqdm for load_embed
mithunpaul08 Sep 4, 2018
d79cbec
Merge pull request #30 from mithunpaul08/add_refute_headline
mithunpaul08 Sep 4, 2018
31806c5
Merge pull request #31 from mithunpaul08/hedging_single_fn
mithunpaul08 Sep 4, 2018
54a3812
will print word overlap 3 features and quit
Sep 4, 2018
c6c8d0f
fixed indentation
Sep 4, 2018
6057dc2
fixed overlap counter
Sep 4, 2018
a887dd8
switched to dot info
Sep 4, 2018
4208762
will print common words also
Sep 4, 2018
a8425b3
removed sys.exit
Sep 4, 2018
7f3db31
will print cv and exit
Sep 4, 2018
222afb9
change debug info
Sep 4, 2018
d12f71e
changed overlap features to not list
Sep 4, 2018
dc76551
removed []
Sep 4, 2018
acc32e6
removed sys.exit
Sep 4, 2018
f2b6792
syntax
Sep 4, 2018
f897ba8
Merge pull request #32 from mithunpaul08/word_overlap3
mithunpaul08 Sep 10, 2018
010ad56
will write first entry in snli format
Sep 10, 2018
3311f8a
fixed extra arguments
Sep 10, 2018
f386dbf
will exit if evidence>1
Sep 11, 2018
85fba55
more log file
Sep 11, 2018
0fbd6bb
added if evidences>1 then take that
Sep 11, 2018
04144a8
more lgo statements
Sep 11, 2018
4e61e31
did inside_eve[0] to get inside one list of ev
Sep 11, 2018
9a4678d
will print all evidences
Sep 11, 2018
78c4fbe
will write snli file for 10
Sep 11, 2018
040a390
wont quit
Sep 11, 2018
4e1b1c8
delete if exists else append
Sep 11, 2018
a58a9be
adds a newline
Sep 11, 2018
8ca9c54
added out .write
Sep 11, 2018
f909e0d
Merge pull request #33 from mithunpaul08/for_allennlp_attn
mithunpaul08 Sep 11, 2018
475d4f6
will annotate instead of snli
Sep 11, 2018
97e0c9f
will write dictionary and quit at index=3
Sep 11, 2018
473b640
delete file if it exists
Sep 12, 2018
90fc09b
wont exit
Sep 12, 2018
66b88b5
will exit when same page line found
Sep 12, 2018
d68c114
will print dict
Sep 12, 2018
c0273cf
.items()
Sep 12, 2018
9101680
more dict prints
Sep 12, 2018
670d758
2
Sep 12, 2018
1a83483
prints found page not in list
Sep 12, 2018
7588e7f
tab page not in dict
Sep 12, 2018
3987632
replaced with set
Sep 12, 2018
3b8004b
will exit when length of unqiue is diff
Sep 12, 2018
c5a1848
wont exit will keep annotating
Sep 12, 2018
e41fce0
will train on folder train_chain1
mithunpaul08 Sep 12, 2018
59e36e7
replaced hardcoded path with train)chain2
mithunpaul08 Sep 12, 2018
5ef8bcf
will exit if new label
Sep 12, 2018
cbd3a3b
moved inside
Sep 12, 2018
a361097
fixed syntax
Sep 12, 2018
06fc036
annotation for dev
mithunpaul08 Sep 13, 2018
82a2060
commented out annotation code in uofa_train and
mithunpaul08 Sep 13, 2018
a61fe8a
Merge pull request #34 from mithunpaul08/chain_evidence_bugfix
mithunpaul08 Sep 16, 2018
7d19339
Merge branch 'master' into for_allennlp_attn
mithunpaul08 Sep 16, 2018
03234d0
Merge pull request #35 from mithunpaul08/for_allennlp_attn
mithunpaul08 Sep 16, 2018
3abd9be
this is a dummy push. but this is the version that
mithunpaul08 Sep 26, 2018
0931cce
added annotation code also
mithunpaul08 Sep 26, 2018
515954b
this version has annotator code and runs on jenny
mithunpaul08 Sep 27, 2018
d4abf3b
added annotation code, second time
mithunpaul08 Sep 27, 2018
4b21200
added new folder src/rte/mithun
mithunpaul08 Sep 27, 2018
5dfcf4f
added scorer folder
mithunpaul08 Sep 27, 2018
b968db6
added pushgit. also the function deleteifexists
mithunpaul08 Sep 27, 2018
bc99121
da to rte.mithun.ds
mithunpaul08 Sep 27, 2018
8ec0209
added import os
mithunpaul08 Sep 27, 2018
a4ac191
added small to all train and validation paths
mithunpaul08 Sep 27, 2018
96e1f46
confirmed that this version runs training with ann
mithunpaul08 Sep 27, 2018
735d25b
will print annotated hyp and premise and exit
mithunpaul08 Sep 28, 2018
8b0c42b
.lemmas
mithunpaul08 Sep 28, 2018
3b68b89
syntax error
mithunpaul08 Sep 28, 2018
6730c2d
will print entities
mithunpaul08 Sep 28, 2018
e0057a9
will print doc.sentences[0]
mithunpaul08 Sep 28, 2018
e1e7aab
will print doc1 also
mithunpaul08 Sep 28, 2018
876dac3
gave up on doing classification on the fly...will
mithunpaul08 Sep 28, 2018
c3cecc0
fixed syntax error
mithunpaul08 Sep 28, 2018
e9693ae
should write separate files for head and body
mithunpaul08 Sep 28, 2018
d6938a6
bug fix. was writing only the last entry. moved de
mithunpaul08 Sep 28, 2018
6739beb
syntax error fix
mithunpaul08 Sep 28, 2018
528cb24
will annotate all 145k and 19k entries
mithunpaul08 Sep 28, 2018
29cb7d1
will quit before training starts
mithunpaul08 Sep 28, 2018
2e28b2a
will try to print doc1.entities again
mithunpaul08 Sep 28, 2018
8061753
will print entities.
mithunpaul08 Sep 28, 2018
aa387ac
fixed syntax error
mithunpaul08 Sep 28, 2018
37b8693
sentences[0]
mithunpaul08 Sep 28, 2018
948c3dd
in the middle of coding ner replacement
mithunpaul08 Sep 29, 2018
1b30289
will run ner for hypothesis and premise
mithunpaul08 Sep 29, 2018
2d6cd47
will print entities of hyp and premise adn exit
mithunpaul08 Sep 29, 2018
b62b9d7
added space
mithunpaul08 Sep 29, 2018
b8ea9cc
should train on all 10 hopefully
mithunpaul08 Sep 29, 2018
3c430de
should print the NER filled version and exit
mithunpaul08 Sep 29, 2018
a1beb14
fixed 7 arg issue
mithunpaul08 Oct 1, 2018
584c0ff
removed self
mithunpaul08 Oct 1, 2018
5b46107
removed .data
mithunpaul08 Oct 1, 2018
5af8525
added print
mithunpaul08 Oct 1, 2018
ec7a6e0
no more splitting
mithunpaul08 Oct 1, 2018
85bed87
added space between words
mithunpaul08 Oct 1, 2018
1935bf3
wont exit. will train on 10 with ner
mithunpaul08 Oct 1, 2018
7bacd5d
run_name =dev
mithunpaul08 Oct 1, 2018
e17744a
validation path =empty
mithunpaul08 Oct 1, 2018
9c2dc76
commented out validation path
mithunpaul08 Oct 1, 2018
6d2e98e
added a single file to run it all
mithunpaul08 Oct 1, 2018
c93a09f
uses .small for all runs
mithunpaul08 Oct 1, 2018
a827f5c
good to go to train and test on all 145k entries
mithunpaul08 Oct 1, 2018
a9783d3
in the middle of coding to read from annotated data
mithunpaul08 Oct 2, 2018
64ccd88
fixed extra arugments error
mithunpaul08 Oct 2, 2018
165969e
replaced i UOFADataReader with objUofaTrainTest
mithunpaul08 Oct 2, 2018
a605a89
objUofaTrainTest
mithunpaul08 Oct 2, 2018
d230589
same bug objconvert_NER_form_per_sent
mithunpaul08 Oct 2, 2018
7cc28ce
more printf statements
mithunpaul08 Oct 2, 2018
1c4e398
more print statements
mithunpaul08 Oct 2, 2018
734af1e
replaced read_json_with_id with read_json
mithunpaul08 Oct 2, 2018
ed16e35
commented out logging line
mithunpaul08 Oct 2, 2018
a63cbe6
removed extra argument in fn read_jsno
mithunpaul08 Oct 2, 2018
529fe53
added self to reads_json
mithunpaul08 Oct 2, 2018
3d0f977
will print len of lemma arrays
mithunpaul08 Oct 2, 2018
82024c1
will print hl length and exit
mithunpaul08 Oct 2, 2018
64d360d
will go through each entry in annotated data
mithunpaul08 Oct 2, 2018
3eae1b9
removed the sys.exit inside convert_NER_form_per_sent
mithunpaul08 Oct 2, 2018
83e94aa
will print type of hl
mithunpaul08 Oct 2, 2018
fb2a3a2
splits all entites etc into a list
mithunpaul08 Oct 2, 2018
5366060
should read label
mithunpaul08 Oct 2, 2018
16bdad3
will keep going for small training
mithunpaul08 Oct 2, 2018
86c3c53
added len to tqdm
mithunpaul08 Oct 2, 2018
d2f859b
changed the run all file to delete logs/
mithunpaul08 Oct 2, 2018
a0c1ad8
fixed the read in dev data
mithunpaul08 Oct 2, 2018
6fc9bb3
will print premise and hyp of last entry in training
mithunpaul08 Oct 2, 2018
1a33609
removed all stops. should go full throttle on trai-
mithunpaul08 Oct 2, 2018
d601587
added pickling to eval_da i.e dev data
mithunpaul08 Oct 3, 2018
8277748
will start logging after counter ==150
mithunpaul08 Oct 6, 2018
857a0ab
will create emb cosine sim if lenght of both vectors>0
mithunpaul08 Oct 7, 2018
8e3d05f
turned annotation on for dev
mithunpaul08 Oct 7, 2018
1a75d2c
commented out annotation. will do plain dev evaluation
mithunpaul08 Oct 7, 2018
1c12d4f
add just ir.py as command
mithunpaul08 Oct 7, 2018
d1e5308
ADD /data to dockerfile
mithunpaul08 Oct 7, 2018
4806153
commented out entry point in dockerfile
mithunpaul08 Oct 7, 2018
1835ec2
removed method and logger from uofa_dev
mithunpaul08 Oct 7, 2018
5af5fd8
removed logger and method from all uofa functions
mithunpaul08 Oct 7, 2018
3973b48
added combined_vector.pkl
Oct 7, 2018
6e723ef
version which runs out of the box and provides eval
Oct 7, 2018
2b17139
manualy removed all processor dependencies
Oct 7, 2018
ad77753
prints and premise hypothesis
mithunpaul08 Oct 9, 2018
4ed4775
Merge pull request #36 from mithunpaul08/chain_evidence_bugfix
mithunpaul08 Oct 10, 2018
8aee2b2
added code for smart ner replacement
mithunpaul08 Oct 20, 2018
a14cf58
added self.get_new_name
mithunpaul08 Oct 20, 2018
665262c
subsumes both direction
mithunpaul08 Oct 20, 2018
e811af1
will print he be quit
mithunpaul08 Oct 20, 2018
64e8f84
evidence_words_list
mithunpaul08 Oct 20, 2018
583897d
does subsumptions both directions of set
mithunpaul08 Oct 20, 2018
ac58c45
commented out sys.exit after printing hw
mithunpaul08 Oct 20, 2018
b5290ce
if(counter%10=0):
mithunpaul08 Oct 20, 2018
a124193
fixed syntax error
mithunpaul08 Oct 20, 2018
21677b5
removed yet another sys.exit. should quit at 10th entry
mithunpaul08 Oct 20, 2018
4f71162
added quit inside load annotation
mithunpaul08 Oct 20, 2018
99c653a
fixed syntax error
mithunpaul08 Oct 20, 2018
9baafd3
no sys.exit
mithunpaul08 Oct 20, 2018
e054a3f
removed more printf
mithunpaul08 Oct 20, 2018
a1561a1
will quit after 2nd entry
mithunpaul08 Oct 20, 2018
8a1d96e
should exit after first one
mithunpaul08 Oct 20, 2018
e4ffecb
nobody is in combat
mithunpaul08 Oct 20, 2018
fd46c3f
syntax
mithunpaul08 Oct 20, 2018
5d3e14f
more 2
mithunpaul08 Oct 20, 2018
ab5a924
fixed syntax
mithunpaul08 Oct 20, 2018
1b9ac6d
more syntax shit
mithunpaul08 Oct 20, 2018
e741c1e
2
mithunpaul08 Oct 20, 2018
d2d166e
will quit after 2
mithunpaul08 Oct 20, 2018
b758033
will print words
mithunpaul08 Oct 20, 2018
6a80571
will quit at 13
mithunpaul08 Oct 20, 2018
9f0e234
synta
mithunpaul08 Oct 20, 2018
d56ab95
will quit at roman atwood
mithunpaul08 Oct 20, 2018
6e83501
4
mithunpaul08 Oct 20, 2018
f5cf681
quits at 8
mithunpaul08 Oct 20, 2018
a6a2eb2
removed sys.exit
mithunpaul08 Oct 20, 2018
77904c8
commented more print statements
mithunpaul08 Oct 20, 2018
e4ceb49
back to non smart ner
mithunpaul08 Oct 22, 2018
efb25b3
will exit after loading annotated data
mithunpaul08 Oct 22, 2018
37c797d
fixed syntax error
mithunpaul08 Oct 22, 2018
de5538d
replaced logger.info with print
mithunpaul08 Oct 22, 2018
c2d0be4
will print the first premise and hyp and then exit
mithunpaul08 Oct 22, 2018
3a9ba25
will run over smaller training set
mithunpaul08 Oct 22, 2018
57e4175
will exit after printing print("validation path is
mithunpaul08 Oct 22, 2018
566f864
will print vocab+exit
mithunpaul08 Oct 22, 2018
54d28ea
Went through entire training data with a toothcomb.
mithunpaul08 Oct 22, 2018
e1298d6
commented out print statements in loop
mithunpaul08 Oct 22, 2018
0e17226
will train on small100
mithunpaul08 Oct 22, 2018
323db9d
will exit dev after printing first entry
mithunpaul08 Oct 22, 2018
850cac5
will print first hyp with and without join
mithunpaul08 Oct 22, 2018
2615f0d
more log statements
mithunpaul08 Oct 22, 2018
32e2db6
removed print
mithunpaul08 Oct 22, 2018
27e9854
will print premis and hypothesis for very first entry
mithunpaul08 Oct 22, 2018
ed7f583
changed to logger.info
mithunpaul08 Oct 22, 2018
4c672a9
will quit training after 1st entry
mithunpaul08 Oct 22, 2018
c7e1f18
eval. will pritn label
mithunpaul08 Oct 22, 2018
ce062b7
removed sys.exit
mithunpaul08 Oct 22, 2018
c184c98
added data folder path to ora_Sent
mithunpaul08 Oct 23, 2018
2fa2ab2
added a json file for same folder also
mithunpaul08 Oct 23, 2018
e66c4fa
commented out print statements
mithunpaul08 Oct 23, 2018
dc84412
going back to non smart ner. will print first value and quit
mithunpaul08 Oct 23, 2018
b10c75d
will get inside read and quit
mithunpaul08 Oct 23, 2018
f66801f
ii
mithunpaul08 Oct 23, 2018
51c1a4f
only logging
mithunpaul08 Oct 23, 2018
65a1105
should print 1st entry of non smart ner
mithunpaul08 Oct 23, 2018
a6f4d9b
removed extra premise
mithunpaul08 Oct 23, 2018
df9105a
non smart ner. wont exit
mithunpaul08 Oct 23, 2018
9ddc4a1
non smart ner
mithunpaul08 Oct 23, 2018
cd8bf2a
should do annotate dev
mithunpaul08 Oct 23, 2018
900d69a
uncommented method class
mithunpaul08 Oct 23, 2018
c7fe770
fixed syntax
mithunpaul08 Oct 23, 2018
91e6183
passed loggers all over
mithunpaul08 Oct 23, 2018
19f210e
added pyprocessors reference back
mithunpaul08 Oct 23, 2018
dfe2a25
ADDED PROCESSORS BASEAPI instead of the java version. must be for docker
mithunpaul08 Oct 23, 2018
acd555c
fixed syntax self.
mithunpaul08 Oct 23, 2018
dc1cef5
will annotate nei also
mithunpaul08 Oct 23, 2018
d14aff9
fixed readme with the right command after nei evidence picker code
mithunpaul08 Oct 23, 2018
4fe964d
changed logging to include mode passed from command line
mithunpaul08 Oct 23, 2018
c150d04
will exit after gettiing into read_claims_annotate
mithunpaul08 Oct 23, 2018
8d61a5b
wont print on screen only into log file
mithunpaul08 Oct 23, 2018
bfce2b8
will exit if label is not enough info
mithunpaul08 Oct 23, 2018
fd1c341
will print all evidences for NEI
mithunpaul08 Oct 23, 2018
eb9a28e
fixed syntax
mithunpaul08 Oct 23, 2018
c8ce8f9
syntax error again
mithunpaul08 Oct 23, 2018
3ec661b
exit after line 20
mithunpaul08 Oct 23, 2018
c4be9b6
will print label also
mithunpaul08 Oct 23, 2018
c20f876
will keep going and annotate everythign
mithunpaul08 Oct 23, 2018
3860a15
will run smartner on dev and train
mithunpaul08 Oct 24, 2018
47bfef1
will run non smart ner for support and refute
mithunpaul08 Oct 24, 2018
da342e5
will run smartner
mithunpaul08 Oct 24, 2018
6cdb6ba
will run non smart ner
mithunpaul08 Oct 24, 2018
8c2a436
Will Run training +annotation with SMART NER
mithunpaul08 Oct 24, 2018
3be5249
will exit at the first nei data point
mithunpaul08 Oct 24, 2018
e5bfca7
removed .join(premise_ann)
mithunpaul08 Oct 24, 2018
0a64c0f
will exit after first 50 entries plus will print a
mithunpaul08 Oct 24, 2018
0306959
explicity print statemetn
mithunpaul08 Oct 24, 2018
71c0cca
swapped hypothesis with premise in smartner
mithunpaul08 Oct 24, 2018
bde28e5
will print the first premise hypothesis exit
mithunpaul08 Oct 24, 2018
b2e9161
sys.exit was at a wrong place
mithunpaul08 Oct 24, 2018
2daa986
will print annotated one
mithunpaul08 Oct 24, 2018
de9cdaa
will print hypothesis and premise before and after annotation
mithunpaul08 Oct 24, 2018
acdea4b
will exit at counter==5
mithunpaul08 Oct 24, 2018
e6c3165
plain ner prints first 5
mithunpaul08 Oct 24, 2018
7b085de
will exit after first NEI but with do_annotation_o
mithunpaul08 Oct 24, 2018
d81ea60
will count cls
mithunpaul08 Oct 31, 2018
9645853
commented out extra print lines
mithunpaul08 Oct 31, 2018
b956a5c
exit removd
mithunpaul08 Oct 31, 2018
087ac8b
Merge branch 'master' into chain_evidence_bugfix
mithunpaul08 Oct 31, 2018
69cbceb
Merge pull request #37 from mithunpaul08/chain_evidence_bugfix
mithunpaul08 Oct 31, 2018
07a3a37
commented out extra print statements
mithunpaul08 Oct 31, 2018
d0c762b
Merge branch 'master' into smartner
mithunpaul08 Oct 31, 2018
6f93878
Merge pull request #38 from mithunpaul08/smartner
mithunpaul08 Oct 31, 2018
f3e473f
fixed indentation issues
mithunpaul08 Oct 31, 2018
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
*.csv
.idea/
.DS_Store
__pycache__
11 changes: 9 additions & 2 deletions Dockerfile
@@ -1,6 +1,6 @@
FROM continuumio/miniconda3

ENTRYPOINT ["/bin/bash"]
#ENTRYPOINT ["/bin/bash"]

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
@@ -12,10 +12,14 @@ RUN mkdir /fever/scripts

VOLUME /fever/

RUN bash scripts/download-glove.sh
RUN bash scripts/download-data.sh

ADD requirements.txt /fever/
ADD src /fever/src/
ADD config /fever/config/
ADD scripts /fever/scripts/
ADD data /fever/data/

RUN apt-get update
RUN apt-get install -y --no-install-recommends \
@@ -39,6 +43,9 @@ RUN conda create -q -n fever python=3.6

WORKDIR /fever/
RUN . activate fever
RUN conda install -y pytorch=0.3.1 torchvision -c pytorch
#RUN conda install pytorch torchvision -c pytorch
RUN conda install cython nltk scikit-learn
RUN pip install -r requirements.txt
RUN python src/scripts/prepare_nltk.py
ENV PYTHONPATH src
CMD ["python", "src/scripts/retrieval/ir.py", "--model", "data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz", "--in-file", "data/fever-data/dev.jsonl", "--out-file", "data/fever/dev.sentences.p5.s5.jsonl", "--mode", "dev", "--lmode", "WARNING"]
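
Docker's exec form of `CMD` passes each array element to the process as a single argument, so the script path and each of its flags need to be separate elements. The sketch below, which is only an illustration and not part of this PR, uses the Python standard library to split the command string from this Dockerfile into such a list.

```python
import json
import shlex

# The command string as it appears in the Dockerfile's CMD instruction.
command = (
    "python src/scripts/retrieval/ir.py "
    "--model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz "
    "--in-file data/fever-data/dev.jsonl "
    "--out-file data/fever/dev.sentences.p5.s5.jsonl "
    "--mode dev --lmode WARNING"
)

# Split on shell-style whitespace so every flag and value is its own element.
argv = shlex.split(command)

# Render the list as a JSON array, which is the syntax the exec form expects.
cmd_line = "CMD " + json.dumps(argv)
print(cmd_line)
```

With all arguments in one string, `python` would be asked to open a file whose name contains the flags; with one element per argument, the flags reach `ir.py` as intended.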
73 changes: 73 additions & 0 deletions README.md
@@ -1,5 +1,78 @@

# UOFA- Fact Extraction and VERification
## Smart NER: replace tokens with NER tags, checking whether they exist in the claim

To run training and evaluation with smart NER, either run `./run_all_train_test.sh`
or use the commands below on the server jenny:

`rm -rf logs/`

`PYTHONPATH=src python src/scripts/rte/da/train_da.py data/fever/fever.db config/fever_nn_ora_sent.json logs/da_nn_sent --cuda-device $CUDA_DEVICE`

`mkdir -p data/models`

`cp logs/da_nn_sent/model.tar.gz data/models/decomposable_attention.tar.gz`

`PYTHONPATH=src python src/scripts/rte/da/eval_da.py data/fever/fever.db data/models/decomposable_attention.tar.gz data/fever/dev.ns.pages.p1.jsonl`

This assumes that the data folder is inside the current folder. If your data folder is somewhere else, use these commands instead:

for training:
`PYTHONPATH=src python src/scripts/rte/da/train_da.py /net/kate/storage/work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db config/fever_nn_ora_sent.json logs/da_nn_sent --cuda-device $CUDA_DEVICE`
for dev:
`PYTHONPATH=src python src/scripts/rte/da/eval_da.py /net/kate/storage/work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db data/models/decomposable_attention.tar.gz /net/kate/storage/work/mithunpaul/fever/my_fork/fever-baselines/data/fever/dev.ns.pages.p1.jsonl`






`source activate fever`
`PYTHONPATH=src python src/scripts/rte/da/eval_da.py data/fever/fever.db data/models/decomposable_attention.tar.gz data/fever/dev.ns.pages.p1.jsonl`
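
Smart NER is described in this README only by its heading, so the following is a minimal sketch of one possible reading, not the code in this PR. It assumes the claim and the evidence arrive as lists of `(token, NER tag)` pairs with `"O"` for non-entities, that a replacement label looks like `PERSON-c1`, and that an evidence mention is linked to a claim mention when one mention's words are a subset of the other's. The names `mentions` and `smart_ner_replace`, the label format, and the toy input are all hypothetical.

```python
from itertools import groupby


def mentions(tagged):
    """Group consecutive tokens that share an NER tag into (tag, tokens) spans."""
    return [(tag, [tok for tok, _ in group])
            for tag, group in groupby(tagged, key=lambda pair: pair[1])]


def smart_ner_replace(claim, evidence):
    """Replace entity mentions with NER labels shared between claim and evidence."""
    claim_out, evidence_out = [], []
    claim_entities = []  # (tag, lowercased word set, replacement label)
    counters = {}

    def new_label(tag, side):
        # Number the labels per tag and per side: PERSON-c1, PERSON-c2, LOCATION-e1 ...
        counters[(tag, side)] = counters.get((tag, side), 0) + 1
        return f"{tag}-{side}{counters[(tag, side)]}"

    for tag, tokens in mentions(claim):
        if tag == "O":
            claim_out.extend(tokens)
            continue
        label = new_label(tag, "c")
        claim_entities.append((tag, {t.lower() for t in tokens}, label))
        claim_out.append(label)

    for tag, tokens in mentions(evidence):
        if tag == "O":
            evidence_out.extend(tokens)
            continue
        words = {t.lower() for t in tokens}
        # Reuse the claim's label when either mention's words subsume the other's.
        match = next((label for ctag, cwords, label in claim_entities
                      if ctag == tag and (words <= cwords or cwords <= words)), None)
        evidence_out.append(match or new_label(tag, "e"))

    return claim_out, evidence_out


# Toy input, tagged by hand for illustration only.
claim = [("Roman", "PERSON"), ("Atwood", "PERSON"), ("is", "O"),
         ("a", "O"), ("vlogger", "O")]
evidence = [("Atwood", "PERSON"), ("was", "O"), ("born", "O"),
            ("in", "O"), ("Ohio", "LOCATION")]

new_claim, new_evidence = smart_ner_replace(claim, evidence)
print(new_claim)
print(new_evidence)
```

In this sketch the entity shared by both sentences receives the same label in each, while an entity that occurs only in the evidence gets its own evidence-side label. Adjacent mentions with the same tag are merged into one span, which is a simplification.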

# Fact Extraction and VERification


- To annotate data, once you have Docker, pull the pyprocessors server image using: `docker pull myedibleenso/processors-server:latest`

- Then run this image using: `docker run -d -e _JAVA_OPTIONS="-Xmx3G" -p 127.0.0.1:8886:8888 --name procserv myedibleenso/processors-server`

Note: the `docker run` command is only needed the first time, when the container is created. From the second time onwards, use: `docker start procserv`

- `source activate fever`

## to run training from my_fork folder on jenny
`PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/train.jsonl --out-file data/fever/train.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode train --lmode WARNING`


## to run training from another folder on jenny
`PYTHONPATH=src python src/scripts/retrieval/ir.py --db /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db --model /work/mithunpaul/fever/my_fork/fever-baselines/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever-data/train.jsonl --out-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/train.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode train --lmode WARNING`

## to run training on a smaller data set from another folder on jenny
`PYTHONPATH=src python src/scripts/retrieval/ir.py --db /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db --model /work/mithunpaul/fever/my_fork/fever-baselines/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever-data/train.jsonl --out-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/train.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode small --dynamic_cv True`


## To run our entailment trainer on training data alone :

data_root="/work/mithunpaul/fever/my_fork/fever-baselines/data"

## To run on dev

`PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/dev.jsonl --out-file data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode dev --lmode WARNING`

## to run dev in a folder branch_myfork on the server but feeding from the same data folder
`PYTHONPATH=src python src/scripts/retrieval/ir.py --db /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db --model /work/mithunpaul/fever/my_fork/fever-baselines/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever-data/dev.jsonl --out-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode dev --lmode INFO`

## to run testing
`PYTHONPATH=src python src/scripts/retrieval/ir.py --db /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db --model /work/mithunpaul/fever/my_fork/fever-baselines/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever-data/dev.jsonl --out-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode test --dynamic_cv True`

## to run dev after running the nearest neighbors algo for not enough info class (note that this assumes that you have run the NEI code mentioned below by sheffield)
`PYTHONPATH=src python src/scripts/retrieval/ir.py --db /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db --model /work/mithunpaul/fever/my_fork/fever-baselines/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/dev.ns.pages.p1.jsonl --out-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode dev --lmode INFO`
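
The `ir.py` commands above differ mainly in the data root, the split, and the mode. As a rough sketch, the argument list could be built from those three values instead of being copied by hand. The helper `ir_command` and its parameters are hypothetical; the flags and file names are the ones used in the commands above, and the variants that pass `--dynamic_cv True` instead of `--lmode` are not covered.

```python
import shlex
from pathlib import PurePosixPath


def ir_command(data_root, split, mode, lmode="WARNING", in_file=None):
    """Build the argument list for ir.py from a data root (hypothetical helper)."""
    root = PurePosixPath(data_root)
    # By default read <root>/fever-data/<split>.jsonl, unless another input is given.
    in_path = PurePosixPath(in_file) if in_file else root / "fever-data" / f"{split}.jsonl"
    return [
        "python", "src/scripts/retrieval/ir.py",
        "--db", str(root / "fever" / "fever.db"),
        "--model", str(root / "index" / "fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz"),
        "--in-file", str(in_path),
        "--out-file", str(root / "fever" / f"{split}.sentences.p5.s5.jsonl"),
        "--max-page", "5", "--max-sent", "5",
        "--mode", mode, "--lmode", lmode,
    ]


# Rebuild the "run on dev" command for a data folder inside the current folder.
cmd = ir_command("data", split="dev", mode="dev", lmode="INFO")
print("PYTHONPATH=src " + " ".join(shlex.quote(part) for part in cmd))
```

Passing an absolute `data_root` such as the `/work/...` path used above yields the "another folder" variants, and `in_file` can point at a different input such as `dev.ns.pages.p1.jsonl`.
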


## Copy of instructions from Sheffield: might not be up to date. Use their instructions [page](https://github.com/sheffieldnlp/fever-baselines#evaluation)
This is the PyTorch implementation of the FEVER pipeline baseline described in the NAACL2018 paper: [FEVER: A large-scale dataset for Fact Extraction and VERification.]()

> Unlike other tasks and despite recent interest, research in textual claim verification has been hindered by the lack of large-scale manually annotated datasets. In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,441 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss κ. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach using both baseline and state-of-the-art components and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources
5 changes: 5 additions & 0 deletions alt_folder_runner.sh
@@ -0,0 +1,5 @@
PYTHONPATH=src python src/scripts/retrieval/ir.py --db /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db --model /work/mithunpaul/fever/my_fork/fever-baselines/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever-data/train.jsonl --out-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/train.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode train --lmode WARNING
PYTHONPATH=src python src/scripts/retrieval/ir.py --db /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/fever.db --model /work/mithunpaul/fever/my_fork/fever-baselines/data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever-data/dev.jsonl --out-file /work/mithunpaul/fever/my_fork/fever-baselines/data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode dev --lmode WARNING



1 change: 1 addition & 0 deletions app.py
@@ -0,0 +1 @@
print("hello world")
Binary file added combined_vector.pkl
Binary file not shown.
2 changes: 1 addition & 1 deletion config/fever_nn_ora_sent.json
@@ -24,7 +24,7 @@
}
},
"train_data_path": "data/fever/train.ns.pages.p1.jsonl",
"validation_data_path": "data/fever/dev.ns.pages.p1.jsonl",
//"validation_data_path": none,
"model": {
"type": "decomposable_attention",
"text_field_embedder": {
83 changes: 83 additions & 0 deletions config/fever_nn_ora_sent_diff_folder.json
@@ -0,0 +1,83 @@
{
"dataset_reader": {
"type": "fever",
"sentence_level":true,
"token_indexers": {
"tokens": {
"type": "single_id",
"lowercase_tokens": true
}
},
"wiki_tokenizer": {
"type":"word",
"word_splitter": {
"type": "just_spaces"
},
"end_tokens":["@@END@@"]
},
"claim_tokenizer": {
"type":"word",
"word_splitter": {
"type": "simple"
},
"end_tokens":["@@END@@"]
}
},
"train_data_path": "/net/kate/storage/work/mithunpaul/fever/my_fork/fever-baselines/data/fever/train.ns.pages.p1.jsonl",
//"validation_data_path": none,
"model": {
"type": "decomposable_attention",
"text_field_embedder": {
"tokens": {
"type": "embedding",
"projection_dim": 200,
"pretrained_file": "data/glove/glove.6B.300d.txt.gz",
"embedding_dim": 300,
"trainable": false
}
},
"attend_feedforward": {
"input_dim": 200,
"num_layers": 2,
"hidden_dims": 200,
"activations": "relu",
"dropout": 0.2
},
"similarity_function": {"type": "dot_product"},
"compare_feedforward": {
"input_dim": 400,
"num_layers": 2,
"hidden_dims": 200,
"activations": "relu",
"dropout": 0.2
},
"aggregate_feedforward": {
"input_dim": 400,
"num_layers": 2,
"hidden_dims": [200, 3],
"activations": ["relu", "linear"],
"dropout": [0.2, 0.0]
},
"initializer": [
[".*linear_layers.*weight", {"type": "xavier_normal"}],
[".*token_embedder_tokens\._projection.*weight", {"type": "xavier_normal"}]
]
},
"iterator": {
"type": "bucket",
"sorting_keys": [["premise", "num_tokens"], ["hypothesis", "num_tokens"]],
"batch_size": 32
},

"trainer": {
"num_epochs": 140,
"patience": 20,
"cuda_device": 0,
"grad_clipping": 5.0,
"validation_metric": "+accuracy",
"no_tqdm": true,
"optimizer": {
"type": "adagrad"
}
}
}
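
The layer sizes in this config have to agree with one another. The sketch below checks them under the usual decomposable attention wiring, which is an assumption here: the attend step sees the projected embedding, the compare step sees a token concatenated with its aligned counterpart, and the aggregate step sees the two pooled compare outputs concatenated and ends in one unit per label. The function `check_decomposable_attention_dims` is hypothetical and is not part of AllenNLP or this repository; the numbers are copied from the config above.

```python
def check_decomposable_attention_dims(model, num_labels=3):
    """Return a list of dimension mismatches found in a model config dict."""
    proj = model["text_field_embedder"]["tokens"]["projection_dim"]
    attend = model["attend_feedforward"]
    compare = model["compare_feedforward"]
    aggregate = model["aggregate_feedforward"]

    def last(dims):
        # hidden_dims may be one int (same for every layer) or a per-layer list.
        return dims[-1] if isinstance(dims, list) else dims

    problems = []
    if attend["input_dim"] != proj:
        problems.append("attend input_dim should equal projection_dim")
    if compare["input_dim"] != 2 * proj:
        problems.append("compare input_dim should be 2 * projection_dim")
    if aggregate["input_dim"] != 2 * last(compare["hidden_dims"]):
        problems.append("aggregate input_dim should be 2 * compare output size")
    if last(aggregate["hidden_dims"]) != num_labels:
        problems.append("last aggregate layer should have one unit per label")
    return problems


# Only the size-related fields of the "model" section are reproduced here.
model = {
    "text_field_embedder": {"tokens": {"projection_dim": 200, "embedding_dim": 300}},
    "attend_feedforward": {"input_dim": 200, "num_layers": 2, "hidden_dims": 200},
    "compare_feedforward": {"input_dim": 400, "num_layers": 2, "hidden_dims": 200},
    "aggregate_feedforward": {"input_dim": 400, "num_layers": 2, "hidden_dims": [200, 3]},
}

problems = check_decomposable_attention_dims(model)
print(problems or "dimensions are consistent")
```

The three output units correspond to the three FEVER labels (Supported, Refuted, NotEnoughInfo) named in the README.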
83 changes: 83 additions & 0 deletions config/fever_nn_ora_sent_same_folder.json
@@ -0,0 +1,83 @@
{
"dataset_reader": {
"type": "fever",
"sentence_level":true,
"token_indexers": {
"tokens": {
"type": "single_id",
"lowercase_tokens": true
}
},
"wiki_tokenizer": {
"type":"word",
"word_splitter": {
"type": "just_spaces"
},
"end_tokens":["@@END@@"]
},
"claim_tokenizer": {
"type":"word",
"word_splitter": {
"type": "simple"
},
"end_tokens":["@@END@@"]
}
},
"train_data_path": "data/fever/train.ns.pages.p1.jsonl",
//"validation_data_path": none,
"model": {
"type": "decomposable_attention",
"text_field_embedder": {
"tokens": {
"type": "embedding",
"projection_dim": 200,
"pretrained_file": "data/glove/glove.6B.300d.txt.gz",
"embedding_dim": 300,
"trainable": false
}
},
"attend_feedforward": {
"input_dim": 200,
"num_layers": 2,
"hidden_dims": 200,
"activations": "relu",
"dropout": 0.2
},
"similarity_function": {"type": "dot_product"},
"compare_feedforward": {
"input_dim": 400,
"num_layers": 2,
"hidden_dims": 200,
"activations": "relu",
"dropout": 0.2
},
"aggregate_feedforward": {
"input_dim": 400,
"num_layers": 2,
"hidden_dims": [200, 3],
"activations": ["relu", "linear"],
"dropout": [0.2, 0.0]
},
"initializer": [
[".*linear_layers.*weight", {"type": "xavier_normal"}],
[".*token_embedder_tokens\._projection.*weight", {"type": "xavier_normal"}]
]
},
"iterator": {
"type": "bucket",
"sorting_keys": [["premise", "num_tokens"], ["hypothesis", "num_tokens"]],
"batch_size": 32
},

"trainer": {
"num_epochs": 140,
"patience": 20,
"cuda_device": 0,
"grad_clipping": 5.0,
"validation_metric": "+accuracy",
"no_tqdm": true,
"optimizer": {
"type": "adagrad"
}
}
}
28 changes: 28 additions & 0 deletions log_fever.txt
@@ -0,0 +1,28 @@
10-07 13:17 root WARNING got inside uofa_dev
10-07 13:17 root INFO going to load combined vector from disk
10-07 13:17 root INFO done with generating feature vectors. Model loading and predicting next
10-07 13:17 root INFO shape of cv:(13332, 118)
10-07 13:17 root INFO number of rows in label list is is:13332
10-07 13:17 root INFO above two must match
10-07 13:17 root INFO all value of combined_vector is:[[0.38095238 0.38095238 1. ... 0. 0. 0.82806861]
[0.05128205 0.05194805 0.8 ... 0. 0. 0.82942629]
[0.09756098 0.10810811 0.5 ... 0. 0. 0.76864988]
...
[0.13333333 0.16666667 0.4 ... 0. 0. 0.65084118]
[0.22727273 0.25 0.71428571 ... 0. 0. 0.90638053]
[0.07317073 0.07692308 0.6 ... 0. 0. 0.6764158 ]]
10-07 13:17 root INFO going to predict...
10-07 13:17 root WARNING done testing. and the accuracy is:
10-07 13:17 root WARNING 59.4959495949595%
10-07 13:17 root INFO               precision    recall  f1-score   support

         0.0       0.56      0.95      0.70      6666
         1.0       0.82      0.24      0.38      6666

   micro avg       0.59      0.59      0.59     13332
   macro avg       0.69      0.59      0.54     13332
weighted avg       0.69      0.59      0.54     13332

10-07 13:17 root INFO [[6307 359]
[5041 1625]]
10-07 13:17 root INFO done with testing. going to exit
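
The accuracy and the per-class scores in this log can be recomputed from the confusion matrix printed at its end. The sketch below assumes the common convention that rows are true labels and columns are predicted labels, which is consistent with both row sums being 6666, the support reported for each class.

```python
# Confusion matrix copied from the log: rows = true class, columns = predicted class.
confusion = [[6307, 359],
             [5041, 1625]]

total = sum(sum(row) for row in confusion)
accuracy = (confusion[0][0] + confusion[1][1]) / total


def precision(cls):
    """Correct predictions of cls divided by all predictions of cls."""
    predicted = confusion[0][cls] + confusion[1][cls]
    return confusion[cls][cls] / predicted


def recall(cls):
    """Correct predictions of cls divided by all true members of cls."""
    return confusion[cls][cls] / sum(confusion[cls])


def f1(cls):
    """Harmonic mean of precision and recall for cls."""
    p, r = precision(cls), recall(cls)
    return 2 * p * r / (p + r)


print(f"accuracy: {accuracy:.4%}")
for cls in (0, 1):
    print(f"class {cls}: precision={precision(cls):.2f} "
          f"recall={recall(cls):.2f} f1={f1(cls):.2f}")
```

The low recall for class 1 comes from the 5041 class-1 examples that were predicted as class 0, so the model labels most inputs as class 0.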
Binary file added model_trained.pkl
Binary file not shown.
Empty file added old_log.log
Empty file.
1 change: 1 addition & 0 deletions paper/fever.tex
@@ -0,0 +1 @@
\documentclass[10pt,a4paper]{article}
Binary file added predicted_results.pkl
Binary file not shown.
5 changes: 5 additions & 0 deletions pushgit.sh
@@ -0,0 +1,5 @@

git add --all
git commit

git push
2 changes: 1 addition & 1 deletion requirements.txt
@@ -5,7 +5,7 @@ typing
overrides
tqdm
nltk
allennlp==0.2.3
#allennlp==0.2.3
pytz
tensorboard-pytorch
git+git://github.com/j6mes/drqa@parallel
9 changes: 9 additions & 0 deletions run_all_train_test.sh
@@ -0,0 +1,9 @@
rm -rf logs/
PYTHONPATH=src python src/scripts/rte/da/train_da.py data/fever/fever.db config/fever_nn_ora_sent.json logs/da_nn_sent --cuda-device $CUDA_DEVICE
mkdir -p data/models
cp logs/da_nn_sent/model.tar.gz data/models/decomposable_attention.tar.gz
PYTHONPATH=src python src/scripts/rte/da/eval_da.py data/fever/fever.db data/models/decomposable_attention.tar.gz data/fever/dev.ns.pages.p1.small100.jsonl




5 changes: 5 additions & 0 deletions runner.sh
@@ -0,0 +1,5 @@
PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/train.jsonl --out-file data/fever/train.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode train --lmode WARNING

PYTHONPATH=src python src/scripts/retrieval/ir.py --db data/fever/fever.db --model data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz --in-file data/fever-data/dev.jsonl --out-file data/fever/dev.sentences.p5.s5.jsonl --max-page 5 --max-sent 5 --mode dev --lmode INFO


1 change: 1 addition & 0 deletions scripts/download-data.sh
@@ -1,6 +1,7 @@
#!/bin/bash
mkdir -p data
mkdir -p data/fever-data

wget -O data/fever-data/train.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/train.jsonl
wget -O data/fever-data/dev.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/shared_task_dev.jsonl
wget -O data/fever-data/test.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/shared_task_test.jsonl
Binary file added src/common/__init__.pyc
Binary file not shown.
Binary file added src/common/util/__init__.pyc
Binary file not shown.