diff --git a/examples/colab/Training/binary_text_classification/NLU_training_negation_classifier_demo_biological_texts.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_negation_classifier_demo_biological_texts.ipynb index 770608c4..afaa8883 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_negation_classifier_demo_biological_texts.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_negation_classifier_demo_biological_texts.ipynb @@ -1 +1,3124 @@ -{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_negation_classifier_demo_biological_texts.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_negation_classifier_demo_biological_texts.ipynb)\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2-Class Biological Negation Classifier Training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on multi-class text classification problems \n","\n","This notebook showcases the following features: \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (enables NLU offline mode)\n","\n","You can achieve these results or even better on this dataset with training data: \n","\n","
\n","\n","![image.png]()\n","\n","\n","You can achieve these results or even better on this dataset with test data : \n","\n","
\n","\n","\n","![Screenshot 2021-02-25 140123.png]()\n","\n","\n","\n","\n","\n","\n","\n","\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620187471950,"user_tz":-120,"elapsed":121254,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"4abf85d5-354d-4b91-dac8-96cf4fabc546"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 04:02:31-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","\r- 0%[ ] 0 --.-KB/s \r- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 04:02:31 (36.7 MB/s) - written to stdout [1671/1671]\n","\n","Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","\u001b[K |████████████████████████████████| 204.8MB 72kB/s \n","\u001b[K |████████████████████████████████| 153kB 51.4MB/s \n","\u001b[K |████████████████████████████████| 204kB 22.1MB/s \n","\u001b[K |████████████████████████████████| 204kB 49.6MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download the Biological Negation Texts dataset \n","https://www.kaggle.com/ma7555/bioscope-corpus-negation-annotated\n","#Context\n","The BioScope corpus consists of medical and biological texts annotated for negation and its linguistic scope. This was done to allow a comparison between the development of systems for negation/hedge detection and scope resolution.\n","The corpus is publicly available for research purposes.\n","\n","You can use this corpus to fine-tune a BERT-like model for negation detection.\n","\n","This dataset was created in this format during the COVID-19 crisis as a training set for detecting negations regarding treatment of specific drugs in the released research papers.\n","\n","Creators of the original dataset: MTA-SZTE Research Group on Artificial Intelligence - RGAI\n","https://rgai.inf.u-szeged.hu/node/105\n"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620187472933,"user_tz":-120,"elapsed":122227,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c4e16951-2733-4f4f-c59b-a6b517dc3774"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/02/bioscope_abstract.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 04:04:31-- http://ckl-it.de/wp-content/uploads/2021/02/bioscope_abstract.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 802898 (784K) [text/csv]\n","Saving to: ‘bioscope_abstract.csv’\n","\n","bioscope_abstract.c 100%[===================>] 784.08K 1.21MB/s in 0.6s \n","\n","2021-05-05 04:04:32 (1.21 MB/s) - ‘bioscope_abstract.csv’ saved [802898/802898]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":419},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620187474073,"user_tz":-120,"elapsed":123361,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"d0700a4a-7d4e-4fe9-8d96-84ecb1ef03ea"},"source":["import pandas as pd\n","train_path = '/content/bioscope_abstract.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","columns=['text','y']\n","train_df = train_df[columns]\n","train_df = train_df.dropna()\n","train_df = train_df.sample(frac=1).reset_index(drop=True)\n","from sklearn.model_selection import train_test_split\n","\n","train_df, test_df = train_test_split(train_df, test_size=0.2)\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
| | text | y |
| 85 | The mAb failed to induce NF-kappa B/Rel nuclea... | positive |
| 937 | Because these induced gene products have NF-ka... | negative |
| 1707 | H2O2-induced NF-kappaB activation in Wurzburg ... | positive |
| 1029 | The carboxyl-terminal cytoplasmic domain of CD... | negative |
| 258 | Pretreatment with actinomycin D and cyclohexim... | negative |
| ... | ... | ... |
| 516 | The finding that dexamethasone has no effect o... | positive |
| 1870 | IL-4 secreted by activated T cells is a pleiot... | negative |
| 380 | In contrast to wild-type B cells, neither of t... | positive |
| 684 | The IL-12 nonresponsiveness of the Th2 clones ... | positive |
| 1830 | Synergism between two distinct elements of the... | negative |
\n","

1600 rows × 2 columns

\n","
"],"text/plain":[" text y\n","85 The mAb failed to induce NF-kappa B/Rel nuclea... positive\n","937 Because these induced gene products have NF-ka... negative\n","1707 H2O2-induced NF-kappaB activation in Wurzburg ... positive\n","1029 The carboxyl-terminal cytoplasmic domain of CD... negative\n","258 Pretreatment with actinomycin D and cyclohexim... negative\n","... ... ...\n","516 The finding that dexamethasone has no effect o... positive\n","1870 IL-4 secreted by activated T cells is a pleiot... negative\n","380 In contrast to wild-type B cells, neither of t... positive\n","684 The IL-12 nonresponsiveness of the Th2 clones ... positive\n","1830 Synergism between two distinct elements of the... negative\n","\n","[1600 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","Your dataset's label column should be named 'y' and the feature column with the text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620188018837,"user_tz":-120,"elapsed":10641,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"1d2af806-3dc3-4a6e-839b-892fc20497a2"},"source":["import nlu \n","from sklearn.metrics import classification_report\n","\n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a dataset with label and text columns\n","# by default, Universal Sentence Encoder (USE) sentence embeddings are used as input features\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","# the sentence detector that is part of the pipe generates some NaNs; let's drop them first\n","preds.dropna(inplace=True)\n","# the prediction column is named 'trained_sentiment' in the output shown below\n","print(classification_report(preds['y'], preds['trained_sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 24\n"," positive 0.52 1.00 0.68 26\n","\n"," accuracy 0.52 50\n"," macro avg 0.26 0.50 0.34 50\n","weighted avg 0.27 0.52 0.36 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
| | y | text | trained_sentiment | sentence | origin_index | document | sentence_embedding_use | trained_sentiment_confidence |
| 0 | positive | The mAb failed to induce NF-kappa B/Rel nuclea... | positive | [The mAb failed to induce NF-kappa B/Rel nucle... | 85 | The mAb failed to induce NF-kappa B/Rel nuclea... | [-0.021044651046395302, -0.012281016446650028,... | 0.718423 |
| 1 | negative | Because these induced gene products have NF-ka... | positive | [Because these induced gene products have NF-k... | 937 | Because these induced gene products have NF-ka... | [-0.018397482112050056, -0.0002952178183477372... | 0.656633 |
| 2 | positive | H2O2-induced NF-kappaB activation in Wurzburg ... | positive | [H2O2-induced NF-kappaB activation in Wurzburg... | 1707 | H2O2-induced NF-kappaB activation in Wurzburg ... | [0.04317500814795494, 0.00023781821073498577, ... | 0.721148 |
| 3 | negative | The carboxyl-terminal cytoplasmic domain of CD... | positive | [The carboxyl-terminal cytoplasmic domain of C... | 1029 | The carboxyl-terminal cytoplasmic domain of CD... | [0.0399024672806263, 0.06178785115480423, -0.0... | 0.702527 |
| 4 | negative | Pretreatment with actinomycin D and cyclohexim... | positive | [Pretreatment with actinomycin D and cyclohexi... | 258 | Pretreatment with actinomycin D and cyclohexim... | [0.03217654675245285, -0.031980931758880615, 0... | 0.716568 |
| 5 | positive | On the other hand, phorbol ester-induced produ... | positive | [On the other hand, phorbol ester-induced prod... | 1839 | On the other hand, phorbol ester-induced produ... | [0.05933626368641853, 0.06317632645368576, 0.0... | 0.712259 |
| 6 | positive | Activated Rac-1 could mimic activated p21ras t... | positive | [Activated Rac-1 could mimic activated p21ras ... | 1552 | Activated Rac-1 could mimic activated p21ras t... | [0.012313981540501118, -0.05240459367632866, -... | 0.694476 |
| 7 | negative | After a 2-day incubation in LDL, the binding o... | positive | [After a 2-day incubation in LDL, the binding ... | 942 | After a 2-day incubation in LDL, the binding o... | [0.01922391913831234, -0.05796779692173004, -0... | 0.673653 |
| 8 | negative | As evidenced by electro mobility shift assay (... | positive | [As evidenced by electro mobility shift assay ... | 383 | As evidenced by electro mobility shift assay (... | [0.004417878575623035, -0.020263247191905975, ... | 0.698439 |
| 9 | negative | Mutations at the B2 site abolish this transcri... | positive | [Mutations at the B2 site abolish this transcr... | 1515 | Mutations at the B2 site abolish this transcri... | [0.021449951454997063, 0.017173701897263527, -... | 0.679992 |
| 10 | positive | Electrophoretic mobility-shift assays of HUVEC... | positive | [Electrophoretic mobility-shift assays of HUVE... | 1313 | Electrophoretic mobility-shift assays of HUVEC... | [0.008786042220890522, 0.03473477438092232, -0... | 0.750811 |
| 11 | positive | In comparison to other activators of NF-kappa ... | positive | [In comparison to other activators of NF-kappa... | 1972 | In comparison to other activators of NF-kappa ... | [0.05494653061032295, 0.05886928364634514, 0.0... | 0.707760 |
| 12 | negative | Here we show that the I alpha1 promoter contai... | positive | [Here we show that the I alpha1 promoter conta... | 1279 | Here we show that the I alpha1 promoter contai... | [0.024258537217974663, 0.003928740043193102, 0... | 0.672414 |
| 13 | positive | Thus these data suggest that the phorbol myris... | positive | [Thus these data suggest that the phorbol myri... | 1094 | Thus these data suggest that the phorbol myris... | [0.04496181383728981, 0.006568090058863163, -0... | 0.738647 |
| 14 | positive | A 40-fold stimulation of chloramphenicol acety... | positive | [A 40-fold stimulation of chloramphenicol acet... | 1528 | A 40-fold stimulation of chloramphenicol acety... | [0.02669776789844036, 0.0031935900915414095, 0... | 0.702648 |
| 15 | negative | Third, a coimmunoprecipitation assay showed th... | positive | [Third, a coimmunoprecipitation assay showed t... | 1195 | Third, a coimmunoprecipitation assay showed th... | [0.03080531395971775, -0.01305992528796196, 0.... | 0.627124 |
| 16 | negative | Phosphorylation of Jak2 in tax transformed cel... | positive | [Phosphorylation of Jak2 in tax transformed ce... | 1725 | Phosphorylation of Jak2 in tax transformed cel... | [0.05668512359261513, -0.014115522615611553, 0... | 0.666193 |
| 17 | positive | This peptide potently inhibited NFAT activatio... | positive | [This peptide potently inhibited NFAT activati... | 1924 | This peptide potently inhibited NFAT activatio... | [0.051138270646333694, 0.019551211968064308, 0... | 0.726400 |
| 18 | positive | Overexpression of Bcl-2 in tumor cells blocks ... | positive | [Overexpression of Bcl-2 in tumor cells blocks... | 527 | Overexpression of Bcl-2 in tumor cells blocks ... | [0.029882941395044327, 0.031224790960550308, -... | 0.700843 |
| 19 | negative | Nuclear factor kappa B (NF-kappa B) is a pleio... | positive | [Nuclear factor kappa B (NF-kappa B) is a plei... | 284 | Nuclear factor kappa B (NF-kappa B) is a pleio... | [0.04925459995865822, 0.030116116628050804, -0... | 0.664960 |
| 20 | positive | An NFAT oligonucleotide carrying mutations in ... | positive | [An NFAT oligonucleotide carrying mutations in... | 524 | An NFAT oligonucleotide carrying mutations in ... | [0.014317388646304607, 0.01018354669213295, 0.... | 0.684760 |
| 21 | negative | In addition Spi-B as well as PU.1 were able to... | positive | [In addition Spi-B as well as PU.1 were able t... | 521 | In addition Spi-B as well as PU.1 were able to... | [0.044076837599277496, -0.007634499575942755, ... | 0.632185 |
| 22 | positive | Nuclear run-on assays demonstrate that : ( 1 )... | positive | [Nuclear run-on assays demonstrate that : ( 1 ... | 1064 | Nuclear run-on assays demonstrate that : ( 1 )... | [0.03107193484902382, 0.020781097933650017, 0.... | 0.722295 |
| 23 | positive | Finally, we conclude that this effect of CIITA... | positive | [Finally, we conclude that this effect of CIIT... | 1539 | Finally, we conclude that this effect of CIITA... | [0.033310066908597946, -0.03267408534884453, 0... | 0.678016 |
| 24 | positive | We found that ALD induce a transient activatio... | positive | [We found that ALD induce a transient activati... | 1994 | We found that ALD induce a transient activatio... | [0.049336668103933334, -0.016328582540154457, ... | 0.725506 |
| 25 | positive | AFR behaves like ascorbate, while DHA and asco... | positive | [AFR behaves like ascorbate, while DHA and asc... | 203 | AFR behaves like ascorbate, while DHA and asco... | [0.06031205505132675, 0.00172607006970793, 0.0... | 0.738096 |
| 26 | positive | It has been shown recently that in wild-type C... | positive | [It has been shown recently that in wild-type ... | 428 | It has been shown recently that in wild-type C... | [0.06434260308742523, 0.035497453063726425, -0... | 0.714807 |
| 27 | positive | Oncogenic forms of NOTCH1 lacking either the p... | positive | [Oncogenic forms of NOTCH1 lacking either the ... | 112 | Oncogenic forms of NOTCH1 lacking either the p... | [0.036650002002716064, -0.005419247783720493, ... | 0.724903 |
| 28 | positive | Rhabdomyosarcomas do not contain mutations in ... | positive | [Rhabdomyosarcomas do not contain mutations in... | 1258 | Rhabdomyosarcomas do not contain mutations in ... | [0.04726873338222504, -0.012257322669029236, -... | 0.685728 |
| 29 | negative | Characterization of CD40 signaling determinant... | positive | [Characterization of CD40 signaling determinan... | 151 | Characterization of CD40 signaling determinant... | [0.005249931011348963, -0.01369913388043642, 0... | 0.699257 |
| 30 | negative | Thus, a component of LDL-enhanced endothelial ... | positive | [Thus, a component of LDL-enhanced endothelial... | 1669 | Thus, a component of LDL-enhanced endothelial ... | [0.06682103872299194, -0.043387044221162796, -... | 0.651579 |
| 31 | positive | However, stimulation with MBP did not produce ... | positive | [However, stimulation with MBP did not produce... | 1627 | However, stimulation with MBP did not produce ... | [0.07646981626749039, 0.05336544290184975, -0.... | 0.723428 |
| 32 | negative | We analyzed the activity of the enhancer, the ... | positive | [We analyzed the activity of the enhancer, the... | 1819 | We analyzed the activity of the enhancer, the ... | [0.058565203100442886, 0.017934346571564674, 0... | 0.655187 |
| 33 | negative | Death-inducing ligands (DILs) such as tumor ne... | positive | [Death-inducing ligands (DILs) such as tumor n... | 148 | Death-inducing ligands (DILs) such as tumor ne... | [0.018583133816719055, 0.05835242569446564, 0.... | 0.707814 |
| 34 | positive | Using electrophoretic mobility shift assays, w... | positive | [Using electrophoretic mobility shift assays, ... | 857 | Using electrophoretic mobility shift assays, w... | [0.01857064664363861, -1.7747517631505616e-05,... | 0.719099 |
| 35 | negative | cAMP signaling inhibited Stat1 at several diff... | positive | [cAMP signaling inhibited Stat1 at several dif... | 285 | cAMP signaling inhibited Stat1 at several diff... | [0.0070323762483894825, -0.036651283502578735,... | 0.661534 |
| 36 | positive | In addition, JNK activation by PMA plus ionoph... | positive | [In addition, JNK activation by PMA plus ionop... | 36 | In addition, JNK activation by PMA plus ionoph... | [0.04152128845453262, 0.014748011715710163, 0.... | 0.734088 |
| 37 | negative | The viral protein Tax induces the activation a... | positive | [The viral protein Tax induces the activation ... | 679 | The viral protein Tax induces the activation a... | [0.06768079847097397, -0.028039786964654922, 0... | 0.682207 |
| 38 | negative | Furthermore, OFT-1 appeared to have approximat... | positive | [Furthermore, OFT-1 appeared to have approxima... | 1527 | Furthermore, OFT-1 appeared to have approximat... | [0.07009289413690567, 0.03128473833203316, -0.... | 0.687950 |
| 39 | negative | A principal objective of the present study was... | positive | [A principal objective of the present study wa... | 479 | A principal objective of the present study was... | [0.017783455550670624, -0.026092059910297394, ... | 0.659265 |
| 40 | positive | We conclude that downregulation of WT1 during ... | positive | [We conclude that downregulation of WT1 during... | 1561 | We conclude that downregulation of WT1 during ... | [0.00441081915050745, 0.021023083478212357, -0... | 0.698446 |
| 41 | negative | In these transfected CL-01 cells, CD40:CD40L e... | positive | [In these transfected CL-01 cells, CD40:CD40L ... | 1188 | In these transfected CL-01 cells, CD40:CD40L e... | [0.010555184446275234, -0.005831445567309856, ... | 0.679171 |
| 42 | negative | Recent studies indicate that mutations in the ... | positive | [Recent studies indicate that mutations in the... | 667 | Recent studies indicate that mutations in the ... | [0.05200810357928276, -0.0026498467195779085, ... | 0.651673 |
| 43 | negative | To confirm the importance of the three cis-act... | positive | [To confirm the importance of the three cis-ac... | 734 | To confirm the importance of the three cis-act... | [0.05337677150964737, -0.028393635526299477, 0... | 0.667999 |
| 44 | positive | No significant difference was detected in the ... | positive | [No significant difference was detected in the... | 1805 | No significant difference was detected in the ... | [0.05019189417362213, 0.01725945621728897, 0.0... | 0.748773 |
| 45 | positive | In contrast, anti-CD3 and anti-CD28 stimulated... | positive | [In contrast, anti-CD3 and anti-CD28 stimulate... | 1473 | In contrast, anti-CD3 and anti-CD28 stimulated... | [0.053164657205343246, 0.015389195643365383, 0... | 0.712844 |
| 46 | negative | In this study we analyzed the effect of CD40 s... | positive | [In this study we analyzed the effect of CD40 ... | 418 | In this study we analyzed the effect of CD40 s... | [0.016681713983416557, 0.06071694567799568, -0... | 0.657744 |
| 47 | positive | A dramatic increase in the intracellular level... | positive | [A dramatic increase in the intracellular leve... | 689 | A dramatic increase in the intracellular level... | [0.032944243401288986, -0.005736679304391146, ... | 0.704365 |
| 48 | positive | Qualitatively different effects were observed ... | positive | [Qualitatively different effects were observed... | 1228 | Qualitatively different effects were observed ... | [0.06803048402070999, 0.039302267134189606, 0.... | 0.734736 |
| 49 | negative | A third tyrosine within the amino-terminal reg... | positive | [A third tyrosine within the amino-terminal re... | 1899 | A third tyrosine within the amino-terminal reg... | [0.08541002124547958, 0.0034846188500523567, 0... | 0.687449 |
\n","
"],"text/plain":[" y ... trained_sentiment_confidence\n","0 positive ... 0.718423\n","1 negative ... 0.656633\n","2 positive ... 0.721148\n","3 negative ... 0.702527\n","4 negative ... 0.716568\n","5 positive ... 0.712259\n","6 positive ... 0.694476\n","7 negative ... 0.673653\n","8 negative ... 0.698439\n","9 negative ... 0.679992\n","10 positive ... 0.750811\n","11 positive ... 0.707760\n","12 negative ... 0.672414\n","13 positive ... 0.738647\n","14 positive ... 0.702648\n","15 negative ... 0.627124\n","16 negative ... 0.666193\n","17 positive ... 0.726400\n","18 positive ... 0.700843\n","19 negative ... 0.664960\n","20 positive ... 0.684760\n","21 negative ... 0.632185\n","22 positive ... 0.722295\n","23 positive ... 0.678016\n","24 positive ... 0.725506\n","25 positive ... 0.738096\n","26 positive ... 0.714807\n","27 positive ... 0.724903\n","28 positive ... 0.685728\n","29 negative ... 0.699257\n","30 negative ... 0.651579\n","31 positive ... 0.723428\n","32 negative ... 0.655187\n","33 negative ... 0.707814\n","34 positive ... 0.719099\n","35 negative ... 0.661534\n","36 positive ... 0.734088\n","37 negative ... 0.682207\n","38 negative ... 0.687950\n","39 negative ... 0.659265\n","40 positive ... 0.698446\n","41 negative ... 0.679171\n","42 negative ... 0.651673\n","43 negative ... 0.667999\n","44 positive ... 0.748773\n","45 positive ... 0.712844\n","46 negative ... 0.657744\n","47 positive ... 0.704365\n","48 positive ... 0.734736\n","49 negative ... 0.687449\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# 4. 
Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":80},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620188242044,"user_tz":-120,"elapsed":1496,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"a201bda6-d5e8-460d-d53d-af9416e5d519"},"source":["fitted_pipe.predict(\"The virus had a direct impact on the nervous system\")"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
| | trained_sentiment | sentence | origin_index | document | sentence_embedding_use | trained_sentiment_confidence |
| 0 | positive | [The virus had a direct impact on the nervous ... | 0 | The virus had a direct impact on the nervous s... | [0.005800435319542885, 0.02561130002140999, -0... | 0.67287 |
\n","
"],"text/plain":[" trained_sentiment ... trained_sentiment_confidence\n","0 positive ... 0.67287\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":8}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## 5. Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620188242045,"user_tz":-120,"elapsed":1004,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"9fc40e4e-8b18-45d7-d003-3be9a40c7481"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@1ff4ede2) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@1ff4ede2\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## 6. Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620188276333,"user_tz":-120,"elapsed":4993,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"0ccd6fea-1606-44f8-8dd0-632fbb3bee1b"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 24\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.76 0.96 0.85 26\n","\n"," accuracy 0.50 50\n"," macro avg 0.25 0.32 0.28 50\n","weighted avg 0.39 0.50 0.44 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
|  | y | text | trained_sentiment | sentence | origin_index | document | sentence_embedding_use | trained_sentiment_confidence |
| 0 | positive | The mAb failed to induce NF-kappa B/Rel nuclea... | positive | [The mAb failed to induce NF-kappa B/Rel nucle... | 85 | The mAb failed to induce NF-kappa B/Rel nuclea... | [-0.021044651046395302, -0.012281016446650028,... | 0.750314 |
| 1 | negative | Because these induced gene products have NF-ka... | neutral | [Because these induced gene products have NF-k... | 937 | Because these induced gene products have NF-ka... | [-0.018397482112050056, -0.0002952178183477372... | 0.524412 |
| 2 | positive | H2O2-induced NF-kappaB activation in Wurzburg ... | positive | [H2O2-induced NF-kappaB activation in Wurzburg... | 1707 | H2O2-induced NF-kappaB activation in Wurzburg ... | [0.04317500814795494, 0.00023781821073498577, ... | 0.771501 |
| 3 | negative | The carboxyl-terminal cytoplasmic domain of CD... | positive | [The carboxyl-terminal cytoplasmic domain of C... | 1029 | The carboxyl-terminal cytoplasmic domain of CD... | [0.0399024672806263, 0.06178785115480423, -0.0... | 0.703551 |
| 4 | negative | Pretreatment with actinomycin D and cyclohexim... | positive | [Pretreatment with actinomycin D and cyclohexi... | 258 | Pretreatment with actinomycin D and cyclohexim... | [0.03217654675245285, -0.031980931758880615, 0... | 0.695075 |
| 5 | positive | On the other hand, phorbol ester-induced produ... | positive | [On the other hand, phorbol ester-induced prod... | 1839 | On the other hand, phorbol ester-induced produ... | [0.05933626368641853, 0.06317632645368576, 0.0... | 0.776027 |
| 6 | positive | Activated Rac-1 could mimic activated p21ras t... | positive | [Activated Rac-1 could mimic activated p21ras ... | 1552 | Activated Rac-1 could mimic activated p21ras t... | [0.012313981540501118, -0.05240459367632866, -... | 0.816325 |
| 7 | negative | After a 2-day incubation in LDL, the binding o... | positive | [After a 2-day incubation in LDL, the binding ... | 942 | After a 2-day incubation in LDL, the binding o... | [0.01922391913831234, -0.05796779692173004, -0... | 0.645191 |
| 8 | negative | As evidenced by electro mobility shift assay (... | positive | [As evidenced by electro mobility shift assay ... | 383 | As evidenced by electro mobility shift assay (... | [0.004417878575623035, -0.020263247191905975, ... | 0.670496 |
| 9 | negative | Mutations at the B2 site abolish this transcri... | neutral | [Mutations at the B2 site abolish this transcr... | 1515 | Mutations at the B2 site abolish this transcri... | [0.021449951454997063, 0.017173701897263527, -... | 0.548251 |
| 10 | positive | Electrophoretic mobility-shift assays of HUVEC... | positive | [Electrophoretic mobility-shift assays of HUVE... | 1313 | Electrophoretic mobility-shift assays of HUVEC... | [0.008786042220890522, 0.03473477438092232, -0... | 0.838780 |
| 11 | positive | In comparison to other activators of NF-kappa ... | positive | [In comparison to other activators of NF-kappa... | 1972 | In comparison to other activators of NF-kappa ... | [0.05494653061032295, 0.05886928364634514, 0.0... | 0.709446 |
| 12 | negative | Here we show that the I alpha1 promoter contai... | neutral | [Here we show that the I alpha1 promoter conta... | 1279 | Here we show that the I alpha1 promoter contai... | [0.024258537217974663, 0.003928740043193102, 0... | 0.514196 |
| 13 | positive | Thus these data suggest that the phorbol myris... | positive | [Thus these data suggest that the phorbol myri... | 1094 | Thus these data suggest that the phorbol myris... | [0.04496181383728981, 0.006568090058863163, -0... | 0.803803 |
| 14 | positive | A 40-fold stimulation of chloramphenicol acety... | positive | [A 40-fold stimulation of chloramphenicol acet... | 1528 | A 40-fold stimulation of chloramphenicol acety... | [0.02669776789844036, 0.0031935900915414095, 0... | 0.719408 |
| 15 | negative | Third, a coimmunoprecipitation assay showed th... | neutral | [Third, a coimmunoprecipitation assay showed t... | 1195 | Third, a coimmunoprecipitation assay showed th... | [0.03080531395971775, -0.01305992528796196, 0.... | 0.577207 |
| 16 | negative | Phosphorylation of Jak2 in tax transformed cel... | neutral | [Phosphorylation of Jak2 in tax transformed ce... | 1725 | Phosphorylation of Jak2 in tax transformed cel... | [0.05668512359261513, -0.014115522615611553, 0... | 0.535941 |
| 17 | positive | This peptide potently inhibited NFAT activatio... | positive | [This peptide potently inhibited NFAT activati... | 1924 | This peptide potently inhibited NFAT activatio... | [0.051138270646333694, 0.019551211968064308, 0... | 0.761979 |
| 18 | positive | Overexpression of Bcl-2 in tumor cells blocks ... | positive | [Overexpression of Bcl-2 in tumor cells blocks... | 527 | Overexpression of Bcl-2 in tumor cells blocks ... | [0.029882941395044327, 0.031224790960550308, -... | 0.715134 |
| 19 | negative | Nuclear factor kappa B (NF-kappa B) is a pleio... | neutral | [Nuclear factor kappa B (NF-kappa B) is a plei... | 284 | Nuclear factor kappa B (NF-kappa B) is a pleio... | [0.04925459995865822, 0.030116116628050804, -0... | 0.522261 |
| 20 | positive | An NFAT oligonucleotide carrying mutations in ... | positive | [An NFAT oligonucleotide carrying mutations in... | 524 | An NFAT oligonucleotide carrying mutations in ... | [0.014317388646304607, 0.01018354669213295, 0.... | 0.605652 |
| 21 | negative | In addition Spi-B as well as PU.1 were able to... | neutral | [In addition Spi-B as well as PU.1 were able t... | 521 | In addition Spi-B as well as PU.1 were able to... | [0.044076837599277496, -0.007634499575942755, ... | 0.559043 |
| 22 | positive | Nuclear run-on assays demonstrate that : ( 1 )... | positive | [Nuclear run-on assays demonstrate that : ( 1 ... | 1064 | Nuclear run-on assays demonstrate that : ( 1 )... | [0.03107193484902382, 0.020781097933650017, 0.... | 0.723425 |
| 23 | positive | Finally, we conclude that this effect of CIITA... | neutral | [Finally, we conclude that this effect of CIIT... | 1539 | Finally, we conclude that this effect of CIITA... | [0.033310066908597946, -0.03267408534884453, 0... | 0.566672 |
| 24 | positive | We found that ALD induce a transient activatio... | positive | [We found that ALD induce a transient activati... | 1994 | We found that ALD induce a transient activatio... | [0.049336668103933334, -0.016328582540154457, ... | 0.777087 |
| 25 | positive | AFR behaves like ascorbate, while DHA and asco... | positive | [AFR behaves like ascorbate, while DHA and asc... | 203 | AFR behaves like ascorbate, while DHA and asco... | [0.06031205505132675, 0.00172607006970793, 0.0... | 0.827045 |
| 26 | positive | It has been shown recently that in wild-type C... | positive | [It has been shown recently that in wild-type ... | 428 | It has been shown recently that in wild-type C... | [0.06434260308742523, 0.035497453063726425, -0... | 0.763702 |
| 27 | positive | Oncogenic forms of NOTCH1 lacking either the p... | positive | [Oncogenic forms of NOTCH1 lacking either the ... | 112 | Oncogenic forms of NOTCH1 lacking either the p... | [0.036650002002716064, -0.005419247783720493, ... | 0.761244 |
| 28 | positive | Rhabdomyosarcomas do not contain mutations in ... | positive | [Rhabdomyosarcomas do not contain mutations in... | 1258 | Rhabdomyosarcomas do not contain mutations in ... | [0.04726873338222504, -0.012257322669029236, -... | 0.607541 |
| 29 | negative | Characterization of CD40 signaling determinant... | positive | [Characterization of CD40 signaling determinan... | 151 | Characterization of CD40 signaling determinant... | [0.005249931011348963, -0.01369913388043642, 0... | 0.617174 |
| 30 | negative | Thus, a component of LDL-enhanced endothelial ... | neutral | [Thus, a component of LDL-enhanced endothelial... | 1669 | Thus, a component of LDL-enhanced endothelial ... | [0.06682103872299194, -0.043387044221162796, -... | 0.532349 |
| 31 | positive | However, stimulation with MBP did not produce ... | positive | [However, stimulation with MBP did not produce... | 1627 | However, stimulation with MBP did not produce ... | [0.07646981626749039, 0.05336544290184975, -0.... | 0.803395 |
| 32 | negative | We analyzed the activity of the enhancer, the ... | neutral | [We analyzed the activity of the enhancer, the... | 1819 | We analyzed the activity of the enhancer, the ... | [0.058565203100442886, 0.017934346571564674, 0... | 0.525206 |
| 33 | negative | Death-inducing ligands (DILs) such as tumor ne... | positive | [Death-inducing ligands (DILs) such as tumor n... | 148 | Death-inducing ligands (DILs) such as tumor ne... | [0.018583133816719055, 0.05835242569446564, 0.... | 0.721663 |
| 34 | positive | Using electrophoretic mobility shift assays, w... | positive | [Using electrophoretic mobility shift assays, ... | 857 | Using electrophoretic mobility shift assays, w... | [0.01857064664363861, -1.7747517631505616e-05,... | 0.762466 |
| 35 | negative | cAMP signaling inhibited Stat1 at several diff... | neutral | [cAMP signaling inhibited Stat1 at several dif... | 285 | cAMP signaling inhibited Stat1 at several diff... | [0.0070323762483894825, -0.036651283502578735,... | 0.517350 |
| 36 | positive | In addition, JNK activation by PMA plus ionoph... | positive | [In addition, JNK activation by PMA plus ionop... | 36 | In addition, JNK activation by PMA plus ionoph... | [0.04152128845453262, 0.014748011715710163, 0.... | 0.800704 |
| 37 | negative | The viral protein Tax induces the activation a... | neutral | [The viral protein Tax induces the activation ... | 679 | The viral protein Tax induces the activation a... | [0.06768079847097397, -0.028039786964654922, 0... | 0.559131 |
| 38 | negative | Furthermore, OFT-1 appeared to have approximat... | positive | [Furthermore, OFT-1 appeared to have approxima... | 1527 | Furthermore, OFT-1 appeared to have approximat... | [0.07009289413690567, 0.03128473833203316, -0.... | 0.662930 |
| 39 | negative | A principal objective of the present study was... | neutral | [A principal objective of the present study wa... | 479 | A principal objective of the present study was... | [0.017783455550670624, -0.026092059910297394, ... | 0.569664 |
| 40 | positive | We conclude that downregulation of WT1 during ... | positive | [We conclude that downregulation of WT1 during... | 1561 | We conclude that downregulation of WT1 during ... | [0.00441081915050745, 0.021023083478212357, -0... | 0.700463 |
| 41 | negative | In these transfected CL-01 cells, CD40:CD40L e... | neutral | [In these transfected CL-01 cells, CD40:CD40L ... | 1188 | In these transfected CL-01 cells, CD40:CD40L e... | [0.010555184446275234, -0.005831445567309856, ... | 0.548869 |
| 42 | negative | Recent studies indicate that mutations in the ... | neutral | [Recent studies indicate that mutations in the... | 667 | Recent studies indicate that mutations in the ... | [0.05200810357928276, -0.0026498467195779085, ... | 0.535857 |
| 43 | negative | To confirm the importance of the three cis-act... | neutral | [To confirm the importance of the three cis-ac... | 734 | To confirm the importance of the three cis-act... | [0.05337677150964737, -0.028393635526299477, 0... | 0.538683 |
| 44 | positive | No significant difference was detected in the ... | positive | [No significant difference was detected in the... | 1805 | No significant difference was detected in the ... | [0.05019189417362213, 0.01725945621728897, 0.0... | 0.838974 |
| 45 | positive | In contrast, anti-CD3 and anti-CD28 stimulated... | positive | [In contrast, anti-CD3 and anti-CD28 stimulate... | 1473 | In contrast, anti-CD3 and anti-CD28 stimulated... | [0.053164657205343246, 0.015389195643365383, 0... | 0.727595 |
| 46 | negative | In this study we analyzed the effect of CD40 s... | neutral | [In this study we analyzed the effect of CD40 ... | 418 | In this study we analyzed the effect of CD40 s... | [0.016681713983416557, 0.06071694567799568, -0... | 0.515776 |
| 47 | positive | A dramatic increase in the intracellular level... | positive | [A dramatic increase in the intracellular leve... | 689 | A dramatic increase in the intracellular level... | [0.032944243401288986, -0.005736679304391146, ... | 0.699241 |
| 48 | positive | Qualitatively different effects were observed ... | positive | [Qualitatively different effects were observed... | 1228 | Qualitatively different effects were observed ... | [0.06803048402070999, 0.039302267134189606, 0.... | 0.795765 |
| 49 | negative | A third tyrosine within the amino-terminal reg... | positive | [A third tyrosine within the amino-terminal re... | 1899 | A third tyrosine within the amino-terminal reg... | [0.08541002124547958, 0.0034846188500523567, 0... | 0.600163 |
\n","
"],"text/plain":[" y ... trained_sentiment_confidence\n","0 positive ... 0.750314\n","1 negative ... 0.524412\n","2 positive ... 0.771501\n","3 negative ... 0.703551\n","4 negative ... 0.695075\n","5 positive ... 0.776027\n","6 positive ... 0.816325\n","7 negative ... 0.645191\n","8 negative ... 0.670496\n","9 negative ... 0.548251\n","10 positive ... 0.838780\n","11 positive ... 0.709446\n","12 negative ... 0.514196\n","13 positive ... 0.803803\n","14 positive ... 0.719408\n","15 negative ... 0.577207\n","16 negative ... 0.535941\n","17 positive ... 0.761979\n","18 positive ... 0.715134\n","19 negative ... 0.522261\n","20 positive ... 0.605652\n","21 negative ... 0.559043\n","22 positive ... 0.723425\n","23 positive ... 0.566672\n","24 positive ... 0.777087\n","25 positive ... 0.827045\n","26 positive ... 0.763702\n","27 positive ... 0.761244\n","28 positive ... 0.607541\n","29 negative ... 0.617174\n","30 negative ... 0.532349\n","31 positive ... 0.803395\n","32 negative ... 0.525206\n","33 negative ... 0.721663\n","34 positive ... 0.762466\n","35 negative ... 0.517350\n","36 positive ... 0.800704\n","37 negative ... 0.559131\n","38 negative ... 0.662930\n","39 negative ... 0.569664\n","40 positive ... 0.700463\n","41 negative ... 0.548869\n","42 negative ... 0.535857\n","43 negative ... 0.538683\n","44 positive ... 0.838974\n","45 positive ... 0.727595\n","46 negative ... 0.515776\n","47 positive ... 0.699241\n","48 positive ... 0.795765\n","49 negative ... 0.600163\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":11}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# 7. 
Try training with different Embeddings"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nxWFzQOhjWC8","executionInfo":{"status":"ok","timestamp":1620188276334,"user_tz":-120,"elapsed":4701,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"b4d5e2b4-e0a5-4431-8ecd-6adf5f8a718c"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP 
model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns 
Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model 
sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"IKK_Ii_gjJfF","executionInfo":{"status":"ok","timestamp":1620189387010,"user_tz":-120,"elapsed":1115278,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"6b006f1b-dc1b-454d-f65a-95d87fd62f07"},"source":["trainable_pipe = nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# Non-USE sentence embeddings usually need longer training and a smaller learning rate\n","# We could tune the hyperparameters further with hyperparameter tuning methods like grid search\n","# Longer training also tends to improve accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.95 0.86 0.90 791\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.91 0.92 0.92 809\n","\n"," accuracy 0.89 1600\n"," macro avg 0.62 0.59 0.61 1600\n","weighted avg 0.93 0.89 0.91 1600\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_1jxw3GnVGlI"},"source":["# 7.1 Evaluate on test data"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Fxx4yNkNVGFl","executionInfo":{"status":"ok","timestamp":1620189890877,"user_tz":-120,"elapsed":85461,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"ce8e9318-9a86-4c78-d5de-cd73ba3de282"},"source":["preds = fitted_pipe.predict(test_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.92 0.81 0.86 209\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.85 0.89 0.87 191\n","\n"," accuracy 0.85 400\n"," macro avg 0.59 0.57 0.58 400\n","weighted avg 0.89 0.85 0.86 400\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 8. 
Let's save the model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eLex095goHwm","executionInfo":{"status":"ok","timestamp":1620190077725,"user_tz":-120,"elapsed":272045,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"35f50714-cad7-4123-d117-f6aea7095f79"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 9. Let's load the model from HDD.\n","This makes offline NLU usage possible! \n","Call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":80},"id":"SO4uz45MoRgp","executionInfo":{"status":"ok","timestamp":1620190092860,"user_tz":-120,"elapsed":286890,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c891d7d9-52c8-456c-a354-0bcc81af43eb"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('The virus had a direct impact on the nervous system')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
|  | sentiment_confidence | sentence | text | sentence_embedding_from_disk | sentiment | document | origin_index |
| 0 | [0.9973864] | [The virus had a direct impact on the nervous ... | The virus had a direct impact on the nervous s... | [[0.19975340366363525, 0.40417489409446716, 0.... | [negative] | The virus had a direct impact on the nervous s... | 8589934592 |
\n","
"],"text/plain":[" sentiment_confidence ... origin_index\n","0 [0.9973864] ... 8589934592\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":17}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"e0CVlkk9v6Qi","executionInfo":{"status":"ok","timestamp":1620190092863,"user_tz":-120,"elapsed":286783,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"9f4b2eb0-6144-4e6b-d6c6-c782ddd192eb"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3a4ced43) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3a4ced43\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"CtOuwWgAvqXw"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_negation_classifier_demo_biological_texts.ipynb)\n", + "\n", + "\n", + "# Training a Sentiment Analysis Classifier with NLU\n", + "## 2 Class Biological Negation Classifier Training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on many multi-class text classification problems\n", + "\n", + "This notebook showcases the following features:\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (enables NLU offline mode)\n", + "\n", + "You can achieve these results or even better on this dataset with training data:\n", + "\n", + "
\n", + "\n", + "![image.png]()\n", + "\n", + "\n", + "You can achieve these results or even better on this dataset with test data :\n", + "\n", + "
\n", + "\n", + "\n", + "![Screenshot 2021-02-25 140123.png]()\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Colab Setup" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "!pip install -q johnsnowlabs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download Negation Biological Texts dataset\n", + "https://www.kaggle.com/ma7555/bioscope-corpus-negation-annotated\n", + "## Context\n", + "The BioScope corpus consists of medical and biological texts annotated for negation and its linguistic scope. This was done to allow a comparison between the development of systems for negation/hedge detection and scope resolution.\n", + "The corpus is publicly available for research purposes.\n", + "\n", + "You can use this corpus to fine-tune a BERT-like model for negation detection.\n", + "\n", + "This dataset was created in this format during the COVID-19 crisis as a training set for detecting negations regarding treatment of specific drugs in the released research papers.\n", + "\n", + "Creators of the original dataset: MTA-SZTE Research Group on Artificial Intelligence - RGAI\n", + "https://rgai.inf.u-szeged.hu/node/105\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "source": [ + "! 
wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/bioscope_abstract/bioscope_abstract.csv\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "f8f31c71-7046-474a-fd36-fb21ac09e091" + }, + "source": [ + "import pandas as pd\n", + "train_path = '/content/bioscope_abstract.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text' and label column should be named 'y' or 'label' or 'labels'\n", + "columns=['text','y']\n", + "train_df = train_df[columns]\n", + "train_df = train_df.dropna()\n", + "train_df = train_df.sample(frac=1).reset_index(drop=True)\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_df, test_df = train_test_split(train_df, test_size=0.2)\n", + "train_df" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " text y\n", + "7142 These apparent inconsistencies reflect the com... positive\n", + "2305 Different glucocorticoid hormones (GCH) show d... positive\n", + "1203 Northern blot analysis of RNA purified from B ... positive\n", + "2093 These results indicate that E3 is a hematopoie... positive\n", + "7594 During recent years, studies of insulin-gene r... positive\n", + "... ... ...\n", + "3894 It can also be distinguished from other previo... positive\n", + "2990 We tested the effects of BHA, a phenolic, lipi... positive\n", + "11027 Sequence analyses of pCD41 indicate that there... positive\n", + "9537 Over a 72-hr period of activation, the express... positive\n", + "7699 The B cell NFAT complex, however, was not func... negative\n", + "\n", + "[9594 rows x 2 columns]" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. Train Deep Learning Classifier using nlp.load('train.sentiment')\n", + "\n", + "Your dataset label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "bdfad334-78a0-49f2-9271-1a727ddaa17a" + }, + "source": [ + "from johnsnowlabs import nlp\n", + "from sklearn.metrics import classification_report\n", + "\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# sentence embeddings are generated as features for the classifier; in this run sent_small_bert_L2_128 is downloaded, not the Universal Sentence Encoder (USE)\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the fitted pipeline on the same 50 training rows and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "# the sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 6\n", + " positive 0.88 1.00 0.94 44\n", + "\n", + " accuracy 0.88 50\n", + " macro avg 0.44 0.50 0.47 50\n", + "weighted avg 0.77 0.88 0.82 50\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 These apparent inconsistencies reflect the com... \n", + "1 Different glucocorticoid hormones (GCH) show d... \n", + "2 Northern blot analysis of RNA purified from B ... \n", + "3 These results indicate that E3 is a hematopoie... \n", + "4 During recent years, studies of insulin-gene r... \n", + "5 To our knowledge, this constitutes the first i... \n", + "6 In contrast, gp41 failed to stimulate NF-kappa... \n", + "7 Data obtained from studies in our laboratories... \n", + "8 RESULTS: Interleukin-6 protein and mRNA produc... \n", + "9 When submitted to an in vitro CD4 cross-linkin... \n", + "10 Treatment of human resting T cells with phorbo... \n", + "11 Distinct DNase-I hypersensitive sites are asso... \n", + "12 The biological activity of 11 alpha-methyl-1 a... \n", + "13 These observations indicate that the monovalen... \n", + "14 The M-CSF receptor was first detectable in the... \n", + "15 TNFRI has been recently shown to activate NF-k... \n", + "16 The two mutations R217A and R294A caused an in... \n", + "17 Cipro did not affect the nuclear transcription... \n", + "18 Conversely, diurnal rhythmicity persisted in a... \n", + "19 The immunoglobulin heavy chain (IgH) class swi... 
\n", + "20 Four regions in this DNA fragment interact wit... \n", + "21 Previous characterization of the GPIX promoter... \n", + "22 Neutrophil maturation was impaired in PEBP2bet... \n", + "23 Treatment of normal monocytes with 12-0-tetrad... \n", + "24 We have cloned the gene for a new ets-related ... \n", + "25 Expression and genomic configuration of GM-CSF... \n", + "26 Consequently, cytosolic activation, nuclear tr... \n", + "27 We propose that Jun plays a bifunctional role ... \n", + "28 In B lymphoid cells, deltaspi-B and spi-B mRNA... \n", + "29 In order to study CD14 gene regulation, the hu... \n", + "30 E3 transcripts were RA-inducible in HL60 cells... \n", + "31 IL-7 also delayed the decreases in the levels ... \n", + "32 A marked activation of gamma-promoter activity... \n", + "33 These results demonstrate that the transcripti... \n", + "34 The A6H monoclonal antibody (mAb) recognizes a... \n", + "35 The highly specific granulomonocyte-associated... \n", + "36 Compared to benign tumor or mammary reduction ... \n", + "37 Sequence-specific DNA-binding small molecules ... \n", + "38 Structure function analysis of vitamin D analo... \n", + "39 To determine the mechanisms responsible for th... \n", + "40 DNA band-shift analysis reveals NF-kappa B bin... \n", + "41 RESULTS: Both all-trans and 9-cis RA inhibited... \n", + "42 Negative selection in the cortex appears to be... \n", + "43 Nitric oxide-stimulated guanine nucleotide exc... \n", + "44 We show that the ability to detect NF-ATp in a... \n", + "45 C3/5 cells developed specific proliferation an... \n", + "46 Recent reports demonstrated that ionizing radi... \n", + "47 Here we examine molecular mechanisms controlli... \n", + "48 These results demonstrate that multiple X1 box... \n", + "49 IL-2 and IL-7 were equivalent in their ability... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.540424644947052, 0.6268006563186646, -0.67... 
positive \n", + "1 [-0.7596609592437744, -0.5880932807922363, -0.... positive \n", + "2 [-0.5203543305397034, 0.033645179122686386, -0... positive \n", + "3 [-0.7046567797660828, 0.33753958344459534, -0.... positive \n", + "4 [-0.7543756365776062, 0.4511456787586212, -0.9... positive \n", + "5 [-1.0517436265945435, -0.0811738669872284, -0.... positive \n", + "6 [-0.8322800397872925, 0.5296803116798401, -0.5... positive \n", + "7 [-0.7395010590553284, 0.5016824007034302, -0.5... positive \n", + "8 [-0.06874295324087143, 0.3614305853843689, -0.... positive \n", + "9 [-0.24975398182868958, 0.007427605800330639, -... positive \n", + "10 [-0.7139838933944702, -0.15548910200595856, -0... positive \n", + "11 [0.16768714785575867, -0.4549502730369568, -0.... positive \n", + "12 [-0.6323443055152893, -0.3998635411262512, -0.... positive \n", + "13 [-0.5919932723045349, 0.1734667867422104, -0.8... positive \n", + "14 [-0.526080310344696, 0.1735551655292511, -0.53... positive \n", + "15 [-0.7427854537963867, -0.3252595067024231, -0.... positive \n", + "16 [-0.5198838710784912, 0.6203998327255249, -0.6... positive \n", + "17 [-0.010745533742010593, 0.36891499161720276, -... positive \n", + "18 [-0.6558191180229187, -0.4102998673915863, -0.... positive \n", + "19 [-0.5788812637329102, 0.16702984273433685, -0.... positive \n", + "20 [0.4348486065864563, -0.4957105219364166, -0.9... positive \n", + "21 [-1.4378761053085327, 0.35956352949142456, -0.... positive \n", + "22 [-0.9040020704269409, -0.3321942389011383, -0.... positive \n", + "23 [-0.38281142711639404, -0.09264473617076874, -... positive \n", + "24 [-0.06995158642530441, 0.0628473237156868, -0.... positive \n", + "25 [-0.2674736976623535, 0.6871048212051392, -0.5... positive \n", + "26 [-0.11669447273015976, -0.41803038120269775, -... positive \n", + "27 [-0.3439536988735199, -0.11270888149738312, -0... positive \n", + "28 [-0.026912417262792587, -0.07306383550167084, ... 
positive \n", + "29 [-0.6894456744194031, 0.6227385997772217, -0.4... positive \n", + "30 [0.055561263114213943, 0.29671669006347656, -0... positive \n", + "31 [-1.2081276178359985, -0.4925746023654938, -0.... positive \n", + "32 [-0.7283461093902588, -0.00887372437864542, -0... positive \n", + "33 [-0.8078993558883667, 0.024193232879042625, -0... positive \n", + "34 [-0.6170817017555237, 0.176478773355484, -0.77... positive \n", + "35 [-0.5863608717918396, 0.10614630579948425, -0.... positive \n", + "36 [0.0020447946153581142, 0.1792808622121811, -0... positive \n", + "37 [-0.2069147676229477, 0.016566185280680656, -0... positive \n", + "38 [-0.4320926070213318, 0.74772709608078, -0.559... positive \n", + "39 [-0.37579116225242615, 0.5168997049331665, -0.... positive \n", + "40 [-0.04819030314683914, 0.04168706387281418, -0... positive \n", + "41 [-0.5226595401763916, -0.33986032009124756, -0... positive \n", + "42 [-0.022223301231861115, -0.22735534608364105, ... positive \n", + "43 [-0.3036770224571228, -0.13188859820365906, -0... positive \n", + "44 [-0.6237607598304749, 0.3612492084503174, -0.4... positive \n", + "45 [-0.26343587040901184, -0.09341304749250412, -... positive \n", + "46 [-0.5906685590744019, 0.20001085102558136, -0.... positive \n", + "47 [-0.4508644938468933, 0.2742282450199127, -0.0... positive \n", + "48 [-0.3715269863605499, 0.3266661465167999, -0.3... positive \n", + "49 [-0.7310452461242676, -0.2004045844078064, -0.... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 1.0 These apparent inconsistencies reflect the com... \n", + "1 3.0 Different glucocorticoid hormones (GCH) show d... \n", + "2 2.0 Northern blot analysis of RNA purified from B ... \n", + "3 4.0 These results indicate that E3 is a hematopoie... \n", + "4 4.0 During recent years, studies of insulin-gene r... \n", + "5 3.0 To our knowledge, this constitutes the first i... \n", + "6 2.0 In contrast, gp41 failed to stimulate NF-kappa... 
\n", + "7 1.0 Data obtained from studies in our laboratories... \n", + "8 8.0 RESULTS: Interleukin-6 protein and mRNA produc... \n", + "9 2.0 When submitted to an in vitro CD4 cross-linkin... \n", + "10 5.0 Treatment of human resting T cells with phorbo... \n", + "11 1.0 Distinct DNase-I hypersensitive sites are asso... \n", + "12 4.0 The biological activity of 11 alpha-methyl-1 a... \n", + "13 3.0 These observations indicate that the monovalen... \n", + "14 9.0 The M-CSF receptor was first detectable in the... \n", + "15 9.0 TNFRI has been recently shown to activate NF-k... \n", + "16 3.0 The two mutations R217A and R294A caused an in... \n", + "17 8.0 Cipro did not affect the nuclear transcription... \n", + "18 8.0 Conversely, diurnal rhythmicity persisted in a... \n", + "19 2.0 The immunoglobulin heavy chain (IgH) class swi... \n", + "20 1.0 Four regions in this DNA fragment interact wit... \n", + "21 4.0 Previous characterization of the GPIX promoter... \n", + "22 8.0 Neutrophil maturation was impaired in PEBP2bet... \n", + "23 1.0 Treatment of normal monocytes with 12-0-tetrad... \n", + "24 5.0 We have cloned the gene for a new ets-related ... \n", + "25 4.0 Expression and genomic configuration of GM-CSF... \n", + "26 5.0 Consequently, cytosolic activation, nuclear tr... \n", + "27 3.0 We propose that Jun plays a bifunctional role ... \n", + "28 1.0 In B lymphoid cells, deltaspi-B and spi-B mRNA... \n", + "29 8.0 In order to study CD14 gene regulation, the hu... \n", + "30 9.0 E3 transcripts were RA-inducible in HL60 cells... \n", + "31 3.0 IL-7 also delayed the decreases in the levels ... \n", + "32 1.0 A marked activation of gamma-promoter activity... \n", + "33 6.0 These results demonstrate that the transcripti... \n", + "34 2.0 The A6H monoclonal antibody (mAb) recognizes a... \n", + "35 1.0 The highly specific granulomonocyte-associated... \n", + "36 1.0 Compared to benign tumor or mammary reduction ... 
\n", + "37 1.0 Sequence-specific DNA-binding small molecules ... \n", + "38 1.0 Structure function analysis of vitamin D analo... \n", + "39 5.0 To determine the mechanisms responsible for th... \n", + "40 7.0 DNA band-shift analysis reveals NF-kappa B bin... \n", + "41 5.0 RESULTS: Both all-trans and 9-cis RA inhibited... \n", + "42 5.0 Negative selection in the cortex appears to be... \n", + "43 7.0 Nitric oxide-stimulated guanine nucleotide exc... \n", + "44 2.0 We show that the ability to detect NF-ATp in a... \n", + "45 1.0 C3/5 cells developed specific proliferation an... \n", + "46 4.0 Recent reports demonstrated that ionizing radi... \n", + "47 2.0 Here we examine molecular mechanisms controlli... \n", + "48 2.0 These results demonstrate that multiple X1 box... \n", + "49 3.0 IL-2 and IL-7 were equivalent in their ability... \n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 positive \n", + "5 positive \n", + "6 negative \n", + "7 positive \n", + "8 positive \n", + "9 positive \n", + "10 positive \n", + "11 positive \n", + "12 positive \n", + "13 positive \n", + "14 positive \n", + "15 positive \n", + "16 negative \n", + "17 negative \n", + "18 positive \n", + "19 positive \n", + "20 positive \n", + "21 positive \n", + "22 positive \n", + "23 positive \n", + "24 positive \n", + "25 positive \n", + "26 positive \n", + "27 positive \n", + "28 positive \n", + "29 positive \n", + "30 negative \n", + "31 positive \n", + "32 positive \n", + "33 positive \n", + "34 positive \n", + "35 positive \n", + "36 positive \n", + "37 positive \n", + "38 positive \n", + "39 positive \n", + "40 positive \n", + "41 negative \n", + "42 negative \n", + "43 positive \n", + "44 positive \n", + "45 positive \n", + "46 positive \n", + "47 positive \n", + "48 positive \n", + "49 positive " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# 4. Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "0ecf1df7-d5ff-4ed4-e993-9092c7870aad" + }, + "source": [ + "fitted_pipe.predict(\"The virus had a direct impact on the nervous system\")" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence \\\n", + "0 The virus had a direct impact on the nervous s... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.4990377724170685, 0.34958764910697937, -0.... positive \n", + "\n", + " sentiment_confidence \n", + "0 1.0 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sentencesentence_embedding_small_bert_L2_128sentimentsentiment_confidence
0The virus had a direct impact on the nervous s...[-0.4990377724170685, 0.34958764910697937, -0....positive1.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## 5. Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "3d34ed83-2b77-48ab-960d-054ed7937903" + }, + "source": [ + "trainable_pipe.print_info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## 6. 
Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "mptfvHx-MMMX", + "outputId": "1088e9d9-1d05-4b63-d07b-cb23113d9277" + }, + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 2\n", + " positive 0.96 1.00 0.98 48\n", + "\n", + " accuracy 0.96 50\n", + " macro avg 0.48 0.50 0.49 50\n", + "weighted avg 0.92 0.96 0.94 50\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Based on nucleotide sequence requirements and ... \n", + "1 TINUR belongs to the NGFI-B/nur77 family of th... \n", + "2 In selenium-deprived Jurkat and ESb-L T lympho... \n", + "3 These findings demonstrate that IFNs inhibit I... \n", + "4 These data reveal the presence of distinct com... \n", + "5 The translated protein showed weak DNA binding... \n", + "6 All tumor cell lines from the B-cell lineage a... 
\n", + "7 GABP factors bind to a distal interleukin 2 (I... \n", + "8 In addition, Tax also stimulates the transcrip... \n", + "9 Mutation of the TCF-1 alpha binding site dimin... \n", + "10 They involve phosphorylation and proteolytic r... \n", + "11 Activation of the transcription factor NF-kapp... \n", + "12 Neutrophil accumulation and development of lun... \n", + "13 Mutations in these binding sites can interfere... \n", + "14 To understand the molecular mechanisms of func... \n", + "15 Collectively, these results suggest that HOCl ... \n", + "16 However, mutation of the AP-1 site markedly di... \n", + "17 A novel HIV-1 isolate containing alterations a... \n", + "18 These findings suggest that Zp responds direct... \n", + "19 Several distinct roles for hsp90 in modulating... \n", + "20 In addition to activation of phospholipase C g... \n", + "21 Also in the current study, binding activity to... \n", + "22 These alterations of transcription factors are... \n", + "23 Epstein-Barr virus nuclear antigen 2 and laten... \n", + "24 Here, we present the isolation and characteriz... \n", + "25 The signaling capabilities of the IL-10R for a... \n", + "26 To identify potential cellular homologues of c... \n", + "27 In addition, no CIITA protein is detectable in... \n", + "28 The expression of AP-1 depended on calcium mob... \n", + "29 Nuclear accumulation of NFAT4 opposed by the J... \n", + "30 This resistance to apoptosis is reversed by an... \n", + "31 During recent years, studies of insulin-gene r... \n", + "32 The NFAT protein migrated more slowly in a sod... \n", + "33 The effect of DM on expression of IL-2R alpha ... \n", + "34 We conclude that TNF-alpha bioavailability and... \n", + "35 The two isozymes show little amino acid identi... \n", + "36 Intriguingly, surface expression of LT-alpha1b... \n", + "37 Binding of the drug inhibits isomerase activit... \n", + "38 These genes may then play a role in altering t... \n", + "39 Copyright 1999 Academic Press. 
\n", + "40 Direct exposure to 10 nM 2,3,7,8-TCDD caused a... \n", + "41 However, activation of the T cell lines leadin... \n", + "42 The defensin sensitivities of Salmonella typhi... \n", + "43 Receptors for the Fc portion of immunoglobulin... \n", + "44 We have isolated a novel cDNA clone encoding i... \n", + "45 IL-10 inhibitory activity is exerted on T lymp... \n", + "46 Furthermore, CD40 ligation of a HLA-A2+, Melan... \n", + "47 We conclude that interactions between TAFII32 ... \n", + "48 These cDNA were 2343 bp long and their transcr... \n", + "49 In vitro studies using pure recombinant p21ras... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.49277549982070923, 0.09530887007713318, -0... positive \n", + "1 [-0.10481061786413193, 0.1171066015958786, -0.... positive \n", + "2 [-1.0812174081802368, 0.5883667469024658, -0.4... positive \n", + "3 [-0.9547467231750488, -0.15689292550086975, -0... positive \n", + "4 [-0.4628618657588959, 0.06154884025454521, -0.... positive \n", + "5 [-0.3139030635356903, -0.15748938918113708, -0... positive \n", + "6 [0.20084746181964874, -0.4846010208129883, -0.... positive \n", + "7 [-0.26659855246543884, 0.2846565246582031, -1.... positive \n", + "8 [-0.822009801864624, 0.6354378461837769, -0.34... positive \n", + "9 [-0.6489402651786804, 0.1254355013370514, -0.5... positive \n", + "10 [-0.8970018029212952, -0.5171773433685303, -0.... positive \n", + "11 [-0.5248922109603882, 0.24680814146995544, -0.... positive \n", + "12 [-1.0332255363464355, 0.5459337830543518, -1.1... positive \n", + "13 [-0.11175256222486496, 0.06691665202379227, -0... positive \n", + "14 [-0.4407936632633209, 0.782042920589447, -0.50... positive \n", + "15 [-0.6647999882698059, 0.3089727759361267, -0.4... positive \n", + "16 [-0.9864672422409058, 0.05116121098399162, -0.... positive \n", + "17 [-0.18733267486095428, 0.06821992248296738, -0... positive \n", + "18 [-0.7554842829704285, -0.054939232766628265, -... 
positive \n", + "19 [-0.12544415891170502, 0.2649071216583252, -0.... positive \n", + "20 [-0.5602053999900818, -0.5014104843139648, -0.... positive \n", + "21 [-0.9772286415100098, 0.048676762729883194, -0... positive \n", + "22 [-0.3631213307380676, 0.2459881603717804, -0.8... positive \n", + "23 [-0.1847376972436905, -0.007672342471778393, -... positive \n", + "24 [-0.14775823056697845, -0.014818106777966022, ... positive \n", + "25 [-0.506297767162323, 0.44844114780426025, -0.5... positive \n", + "26 [0.001409502699971199, 0.13168536126613617, -0... positive \n", + "27 [0.09807377308607101, -0.04232050105929375, -0... positive \n", + "28 [-0.9711195826530457, -0.15560954809188843, -0... positive \n", + "29 [-0.22911998629570007, 0.4173714518547058, -0.... positive \n", + "30 [-0.3486934304237366, 0.22132794559001923, -0.... positive \n", + "31 [-0.7870786786079407, 0.4678671956062317, -0.9... positive \n", + "32 [0.048784464597702026, -0.1957472562789917, -0... positive \n", + "33 [-0.37844932079315186, 0.12598302960395813, -0... positive \n", + "34 [-0.5289692878723145, 0.7431911826133728, -0.7... positive \n", + "35 [0.14430226385593414, -0.674239993095398, -0.7... positive \n", + "36 [-0.2223060429096222, -0.15909825265407562, -0... positive \n", + "37 [-0.6680501699447632, 0.34882932901382446, -0.... positive \n", + "38 [-0.8830984830856323, 0.1207333579659462, -0.6... positive \n", + "39 [-1.021697759628296, 0.9163565635681152, -0.25... positive \n", + "40 [-0.8829392194747925, 0.09077997505664825, -0.... positive \n", + "41 [-0.2961618900299072, -0.22415119409561157, -0... positive \n", + "42 [-0.4785139262676239, 0.172796368598938, -0.22... positive \n", + "43 [-0.44332364201545715, -0.3984912037849426, -0... positive \n", + "44 [0.11249734461307526, -0.1219027042388916, -0.... positive \n", + "45 [-0.07616603374481201, 0.030630899593234062, -... positive \n", + "46 [-0.3785442113876343, 0.29912295937538147, -0.... 
positive \n", + "47 [-0.4170002043247223, -0.09449539333581924, -0... positive \n", + "48 [-0.9243519306182861, 0.7070343494415283, -0.3... positive \n", + "49 [-0.6335657238960266, 0.15302662551403046, -0.... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 1.0 Based on nucleotide sequence requirements and ... \n", + "1 1.0 TINUR belongs to the NGFI-B/nur77 family of th... \n", + "2 1.0 In selenium-deprived Jurkat and ESb-L T lympho... \n", + "3 1.0 These findings demonstrate that IFNs inhibit I... \n", + "4 1.0 These data reveal the presence of distinct com... \n", + "5 1.0 The translated protein showed weak DNA binding... \n", + "6 1.0 All tumor cell lines from the B-cell lineage a... \n", + "7 1.0 GABP factors bind to a distal interleukin 2 (I... \n", + "8 1.0 In addition, Tax also stimulates the transcrip... \n", + "9 1.0 Mutation of the TCF-1 alpha binding site dimin... \n", + "10 1.0 They involve phosphorylation and proteolytic r... \n", + "11 1.0 Activation of the transcription factor NF-kapp... \n", + "12 1.0 Neutrophil accumulation and development of lun... \n", + "13 1.0 Mutations in these binding sites can interfere... \n", + "14 1.0 To understand the molecular mechanisms of func... \n", + "15 1.0 Collectively, these results suggest that HOCl ... \n", + "16 1.0 However, mutation of the AP-1 site markedly di... \n", + "17 1.0 A novel HIV-1 isolate containing alterations a... \n", + "18 1.0 These findings suggest that Zp responds direct... \n", + "19 1.0 Several distinct roles for hsp90 in modulating... \n", + "20 1.0 In addition to activation of phospholipase C g... \n", + "21 1.0 Also in the current study, binding activity to... \n", + "22 1.0 These alterations of transcription factors are... \n", + "23 1.0 Epstein-Barr virus nuclear antigen 2 and laten... \n", + "24 1.0 Here, we present the isolation and characteriz... \n", + "25 1.0 The signaling capabilities of the IL-10R for a... 
\n", + "26 1.0 To identify potential cellular homologues of c... \n", + "27 1.0 In addition, no CIITA protein is detectable in... \n", + "28 1.0 The expression of AP-1 depended on calcium mob... \n", + "29 1.0 Nuclear accumulation of NFAT4 opposed by the J... \n", + "30 1.0 This resistance to apoptosis is reversed by an... \n", + "31 1.0 During recent years, studies of insulin-gene r... \n", + "32 1.0 The NFAT protein migrated more slowly in a sod... \n", + "33 1.0 The effect of DM on expression of IL-2R alpha ... \n", + "34 1.0 We conclude that TNF-alpha bioavailability and... \n", + "35 1.0 The two isozymes show little amino acid identi... \n", + "36 1.0 Intriguingly, surface expression of LT-alpha1b... \n", + "37 1.0 Binding of the drug inhibits isomerase activit... \n", + "38 1.0 These genes may then play a role in altering t... \n", + "39 1.0 Copyright 1999 Academic Press. \n", + "40 1.0 Direct exposure to 10 nM 2,3,7,8-TCDD caused a... \n", + "41 1.0 However, activation of the T cell lines leadin... \n", + "42 1.0 The defensin sensitivities of Salmonella typhi... \n", + "43 1.0 Receptors for the Fc portion of immunoglobulin... \n", + "44 1.0 We have isolated a novel cDNA clone encoding i... \n", + "45 1.0 IL-10 inhibitory activity is exerted on T lymp... \n", + "46 1.0 Furthermore, CD40 ligation of a HLA-A2+, Melan... \n", + "47 1.0 We conclude that interactions between TAFII32 ... \n", + "48 1.0 These cDNA were 2343 bp long and their transcr... \n", + "49 1.0 In vitro studies using pure recombinant p21ras... 
\n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 positive \n", + "5 positive \n", + "6 positive \n", + "7 positive \n", + "8 positive \n", + "9 positive \n", + "10 positive \n", + "11 positive \n", + "12 positive \n", + "13 positive \n", + "14 positive \n", + "15 positive \n", + "16 positive \n", + "17 positive \n", + "18 positive \n", + "19 positive \n", + "20 positive \n", + "21 positive \n", + "22 positive \n", + "23 positive \n", + "24 positive \n", + "25 positive \n", + "26 positive \n", + "27 negative \n", + "28 positive \n", + "29 positive \n", + "30 positive \n", + "31 positive \n", + "32 positive \n", + "33 positive \n", + "34 positive \n", + "35 positive \n", + "36 positive \n", + "37 negative \n", + "38 positive \n", + "39 positive \n", + "40 positive \n", + "41 positive \n", + "42 positive \n", + "43 positive \n", + "44 positive \n", + "45 positive \n", + "46 positive \n", + "47 positive \n", + "48 positive \n", + "49 positive " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0Based on nucleotide sequence requirements and ...[-0.49277549982070923, 0.09530887007713318, -0...positive1.0Based on nucleotide sequence requirements and ...positive
1TINUR belongs to the NGFI-B/nur77 family of th...[-0.10481061786413193, 0.1171066015958786, -0....positive1.0TINUR belongs to the NGFI-B/nur77 family of th...positive
2In selenium-deprived Jurkat and ESb-L T lympho...[-1.0812174081802368, 0.5883667469024658, -0.4...positive1.0In selenium-deprived Jurkat and ESb-L T lympho...positive
3These findings demonstrate that IFNs inhibit I...[-0.9547467231750488, -0.15689292550086975, -0...positive1.0These findings demonstrate that IFNs inhibit I...positive
4These data reveal the presence of distinct com...[-0.4628618657588959, 0.06154884025454521, -0....positive1.0These data reveal the presence of distinct com...positive
5The translated protein showed weak DNA binding...[-0.3139030635356903, -0.15748938918113708, -0...positive1.0The translated protein showed weak DNA binding...positive
6All tumor cell lines from the B-cell lineage a...[0.20084746181964874, -0.4846010208129883, -0....positive1.0All tumor cell lines from the B-cell lineage a...positive
7GABP factors bind to a distal interleukin 2 (I...[-0.26659855246543884, 0.2846565246582031, -1....positive1.0GABP factors bind to a distal interleukin 2 (I...positive
8In addition, Tax also stimulates the transcrip...[-0.822009801864624, 0.6354378461837769, -0.34...positive1.0In addition, Tax also stimulates the transcrip...positive
9Mutation of the TCF-1 alpha binding site dimin...[-0.6489402651786804, 0.1254355013370514, -0.5...positive1.0Mutation of the TCF-1 alpha binding site dimin...positive
10They involve phosphorylation and proteolytic r...[-0.8970018029212952, -0.5171773433685303, -0....positive1.0They involve phosphorylation and proteolytic r...positive
11Activation of the transcription factor NF-kapp...[-0.5248922109603882, 0.24680814146995544, -0....positive1.0Activation of the transcription factor NF-kapp...positive
12Neutrophil accumulation and development of lun...[-1.0332255363464355, 0.5459337830543518, -1.1...positive1.0Neutrophil accumulation and development of lun...positive
13Mutations in these binding sites can interfere...[-0.11175256222486496, 0.06691665202379227, -0...positive1.0Mutations in these binding sites can interfere...positive
14To understand the molecular mechanisms of func...[-0.4407936632633209, 0.782042920589447, -0.50...positive1.0To understand the molecular mechanisms of func...positive
15Collectively, these results suggest that HOCl ...[-0.6647999882698059, 0.3089727759361267, -0.4...positive1.0Collectively, these results suggest that HOCl ...positive
16However, mutation of the AP-1 site markedly di...[-0.9864672422409058, 0.05116121098399162, -0....positive1.0However, mutation of the AP-1 site markedly di...positive
17A novel HIV-1 isolate containing alterations a...[-0.18733267486095428, 0.06821992248296738, -0...positive1.0A novel HIV-1 isolate containing alterations a...positive
18These findings suggest that Zp responds direct...[-0.7554842829704285, -0.054939232766628265, -...positive1.0These findings suggest that Zp responds direct...positive
19Several distinct roles for hsp90 in modulating...[-0.12544415891170502, 0.2649071216583252, -0....positive1.0Several distinct roles for hsp90 in modulating...positive
20In addition to activation of phospholipase C g...[-0.5602053999900818, -0.5014104843139648, -0....positive1.0In addition to activation of phospholipase C g...positive
21Also in the current study, binding activity to...[-0.9772286415100098, 0.048676762729883194, -0...positive1.0Also in the current study, binding activity to...positive
22These alterations of transcription factors are...[-0.3631213307380676, 0.2459881603717804, -0.8...positive1.0These alterations of transcription factors are...positive
23Epstein-Barr virus nuclear antigen 2 and laten...[-0.1847376972436905, -0.007672342471778393, -...positive1.0Epstein-Barr virus nuclear antigen 2 and laten...positive
24Here, we present the isolation and characteriz...[-0.14775823056697845, -0.014818106777966022, ...positive1.0Here, we present the isolation and characteriz...positive
25The signaling capabilities of the IL-10R for a...[-0.506297767162323, 0.44844114780426025, -0.5...positive1.0The signaling capabilities of the IL-10R for a...positive
26To identify potential cellular homologues of c...[0.001409502699971199, 0.13168536126613617, -0...positive1.0To identify potential cellular homologues of c...positive
27In addition, no CIITA protein is detectable in...[0.09807377308607101, -0.04232050105929375, -0...positive1.0In addition, no CIITA protein is detectable in...negative
28The expression of AP-1 depended on calcium mob...[-0.9711195826530457, -0.15560954809188843, -0...positive1.0The expression of AP-1 depended on calcium mob...positive
29Nuclear accumulation of NFAT4 opposed by the J...[-0.22911998629570007, 0.4173714518547058, -0....positive1.0Nuclear accumulation of NFAT4 opposed by the J...positive
30This resistance to apoptosis is reversed by an...[-0.3486934304237366, 0.22132794559001923, -0....positive1.0This resistance to apoptosis is reversed by an...positive
31During recent years, studies of insulin-gene r...[-0.7870786786079407, 0.4678671956062317, -0.9...positive1.0During recent years, studies of insulin-gene r...positive
32The NFAT protein migrated more slowly in a sod...[0.048784464597702026, -0.1957472562789917, -0...positive1.0The NFAT protein migrated more slowly in a sod...positive
33The effect of DM on expression of IL-2R alpha ...[-0.37844932079315186, 0.12598302960395813, -0...positive1.0The effect of DM on expression of IL-2R alpha ...positive
34We conclude that TNF-alpha bioavailability and...[-0.5289692878723145, 0.7431911826133728, -0.7...positive1.0We conclude that TNF-alpha bioavailability and...positive
35The two isozymes show little amino acid identi...[0.14430226385593414, -0.674239993095398, -0.7...positive1.0The two isozymes show little amino acid identi...positive
36Intriguingly, surface expression of LT-alpha1b...[-0.2223060429096222, -0.15909825265407562, -0...positive1.0Intriguingly, surface expression of LT-alpha1b...positive
37Binding of the drug inhibits isomerase activit...[-0.6680501699447632, 0.34882932901382446, -0....positive1.0Binding of the drug inhibits isomerase activit...negative
38These genes may then play a role in altering t...[-0.8830984830856323, 0.1207333579659462, -0.6...positive1.0These genes may then play a role in altering t...positive
39Copyright 1999 Academic Press.[-1.021697759628296, 0.9163565635681152, -0.25...positive1.0Copyright 1999 Academic Press.positive
40Direct exposure to 10 nM 2,3,7,8-TCDD caused a...[-0.8829392194747925, 0.09077997505664825, -0....positive1.0Direct exposure to 10 nM 2,3,7,8-TCDD caused a...positive
41However, activation of the T cell lines leadin...[-0.2961618900299072, -0.22415119409561157, -0...positive1.0However, activation of the T cell lines leadin...positive
42The defensin sensitivities of Salmonella typhi...[-0.4785139262676239, 0.172796368598938, -0.22...positive1.0The defensin sensitivities of Salmonella typhi...positive
43Receptors for the Fc portion of immunoglobulin...[-0.44332364201545715, -0.3984912037849426, -0...positive1.0Receptors for the Fc portion of immunoglobulin...positive
44We have isolated a novel cDNA clone encoding i...[0.11249734461307526, -0.1219027042388916, -0....positive1.0We have isolated a novel cDNA clone encoding i...positive
45IL-10 inhibitory activity is exerted on T lymp...[-0.07616603374481201, 0.030630899593234062, -...positive1.0IL-10 inhibitory activity is exerted on T lymp...positive
46Furthermore, CD40 ligation of a HLA-A2+, Melan...[-0.3785442113876343, 0.29912295937538147, -0....positive1.0Furthermore, CD40 ligation of a HLA-A2+, Melan...positive
47We conclude that interactions between TAFII32 ...[-0.4170002043247223, -0.09449539333581924, -0...positive1.0We conclude that interactions between TAFII32 ...positive
48These cDNA were 2343 bp long and their transcr...[-0.9243519306182861, 0.7070343494415283, -0.3...positive1.0These cDNA were 2343 bp long and their transcr...positive
49In vitro studies using pure recombinant p21ras...[-0.6335657238960266, 0.15302662551403046, -0....positive1.0In vitro studies using pure recombinant p21ras...positive
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# 7. Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxWFzQOhjWC8", + "outputId": "38055466-53e0-462c-91da-be2baf02b48f" + }, + "source": [ + "# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj 
sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP 
model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj 
sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj 
sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", 
+ "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + 
"nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IKK_Ii_gjJfF", + "outputId": "6c32478b-c999-499f-ca4a-38a992c4e950" + }, + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_128 train.sentiment')\n", + "# We usually need to train longer and use a smaller learning rate for sentence embeddings that are not USE-based\n", + "# The hyperparameters could be tuned further with methods such as grid search\n", + "# Longer training also tends to improve accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df[:1000])\n", + "\n", + "# Predict with the fitted pipeline on the training data and get predictions\n", + "preds = fitted_pipe.predict(train_df[:1000],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_128 download started this may take some time.\n", + "Approximate size to download 23.4 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 132\n", + " positive 0.87 1.00 0.93 868\n", + "\n", + " accuracy 0.87 1000\n", + " macro avg 0.43 0.50 0.46 1000\n", + "weighted avg 0.75 0.87 0.81 1000\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1jxw3GnVGlI" + }, + "source": [ + "# 7.1 Evaluate on Test Data" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Fxx4yNkNVGFl", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d01316a0-0e7c-4155-e8b0-8b28715aa921" + }, + "source": [ + "preds = fitted_pipe.predict(test_df,output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 348\n", + " positive 0.85 1.00 0.92 2051\n", + "\n", + " accuracy 0.85 2399\n", + " macro avg 0.43 0.50 0.46 2399\n", + "weighted avg 0.73 0.85 0.79 2399\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 8. 
Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 9. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "Call nlp.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "id": "SO4uz45MoRgp", + "outputId": "e4464877-4bd1-469f-cb1b-651aea1e9bec" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('The virus had a direct impact on the nervous system')\n", + "preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 The virus had a direct impact on the nervous s... \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [0.6362331509590149, 0.006696224212646484, 0.2... positive \n", + "\n", + " sentiment_confidence \n", + "0 4.0 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_from_disksentimentsentiment_confidence
0The virus had a direct impact on the nervous s...[0.6362331509590149, 0.006696224212646484, 0.2...positive4.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 37 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "1151c44d-6bc8-4c03-95b9-f495cd366a94" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_128'].setStorageRef('sent_small_bert_L12_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_128\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_128'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_128'].setStorageRef('sent_small_bert_L12_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_128\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CtOuwWgAvqXw" + }, + "source": [], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sarcasam_classifier_demo_news_headlines.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sarcasam_classifier_demo_news_headlines.ipynb index 869452af..87316f62 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sarcasam_classifier_demo_news_headlines.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sarcasam_classifier_demo_news_headlines.ipynb @@ -1 +1,3130 @@ 
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sarcasam_classifier_demo_news_headlines.ipynb","provenance":[],"collapsed_sections":["zkufh760uvF3"]},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sarcasam_classifier_demo_news_headlines.ipynb)\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 Class News Headlines Sarcasam Training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n","You can achieve these results or even better on this dataset with training data:\n","\n","\n","
\n","\n","![img.png]()\n","\n","You can achieve these results or even better on this dataset with test data:\n","\n","\n","
\n","\n","![Screenshot 2021-02-25 150812.png]()\n","\n","\n","\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620187463238,"user_tz":-120,"elapsed":101354,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"1ab394d0-dce6-49cd-939b-c96e47fde49b"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 04:02:42-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","\r- 0%[ ] 0 --.-KB/s \r- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 04:02:42 (33.7 MB/s) - written to stdout [1671/1671]\n","\n","Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","\u001b[K |████████████████████████████████| 204.8MB 74kB/s \n","\u001b[K |████████████████████████████████| 153kB 68.2MB/s \n","\u001b[K |████████████████████████████████| 204kB 20.5MB/s \n","\u001b[K |████████████████████████████████| 204kB 79.6MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download News Headlines Sarcsam dataset \n","https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection\n","#Context\n","Past studies in Sarcasm Detection mostly make use of Twitter datasets collected using hashtag based supervision but such datasets are noisy in terms of labels and language. Furthermore, many tweets are replies to other tweets and detecting sarcasm in these requires the availability of contextual tweets.\n","\n","To overcome the limitations related to noise in Twitter datasets, this News Headlines dataset for Sarcasm Detection is collected from two news website. TheOnion aims at producing sarcastic versions of current events and we collected all the headlines from News in Brief and News in Photos categories (which are sarcastic). We collect real (and non-sarcastic) news headlines from HuffPost.\n","\n","This new dataset has following advantages over the existing Twitter datasets:\n","\n","Since news headlines are written by professionals in a formal manner, there are no spelling mistakes and informal usage. This reduces the sparsity and also increases the chance of finding pre-trained embeddings.\n","\n","Furthermore, since the sole purpose of TheOnion is to publish sarcastic news, we get high-quality labels with much less noise as compared to Twitter datasets.\n","\n","Unlike tweets which are replies to other tweets, the news headlines we obtained are self-contained. This would help us in teasing apart the real sarcastic elements.\n"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620187463651,"user_tz":-120,"elapsed":101758,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"4b81d821-074f-4409-ef50-7a401f6706be"},"source":["! 
wget http://ckl-it.de/wp-content/uploads/2021/02/Sarcasm_Headlines_Dataset_v2.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 04:04:22-- http://ckl-it.de/wp-content/uploads/2021/02/Sarcasm_Headlines_Dataset_v2.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 2381880 (2.3M) [text/csv]\n","Saving to: ‘Sarcasm_Headlines_Dataset_v2.csv’\n","\n","Sarcasm_Headlines_D 100%[===================>] 2.27M --.-KB/s in 0.1s \n","\n","2021-05-05 04:04:23 (15.7 MB/s) - ‘Sarcasm_Headlines_Dataset_v2.csv’ saved [2381880/2381880]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":419},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620187464360,"user_tz":-120,"elapsed":102461,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"f3e8cdcc-271a-4fa6-e8c4-4aed44da42dc"},"source":["import pandas as pd\n","test_path = '/content/Sarcasm_Headlines_Dataset_v2.csv'\n","train_df = pd.read_csv(test_path,sep=\",\")\n","cols = [\"y\",\"text\"]\n","train_df = train_df[cols]\n","from sklearn.model_selection import train_test_split\n","\n","train_df, test_df = train_test_split(train_df, test_size=0.2)\n","train_df\n","\n"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
ytext
1463positivered cross installs blood drop-off bins for don...
9543positivepresident-elect edwards seen entering chinatow...
2930positivecrowd at trump rally realizes they've been cha...
644positivekerry captures bin laden one week too late
2390negative'new hampshire' episode 4: not just for old, w...
.........
9034negativedeputy interior secretary met with lobbyist fo...
6330negativeparents of kidnapped girls make desperate plea
7127negativeimage vs. substance in your self-made journey
7289positive'secretary clinton is a different person than ...
9077positivecnn producer on hunt for saddest-looking fuck ...
\n","

8000 rows × 2 columns

\n","
"],"text/plain":[" y text\n","1463 positive red cross installs blood drop-off bins for don...\n","9543 positive president-elect edwards seen entering chinatow...\n","2930 positive crowd at trump rally realizes they've been cha...\n","644 positive kerry captures bin laden one week too late\n","2390 negative 'new hampshire' episode 4: not just for old, w...\n","... ... ...\n","9034 negative deputy interior secretary met with lobbyist fo...\n","6330 negative parents of kidnapped girls make desperate plea\n","7127 negative image vs. substance in your self-made journey\n","7289 positive 'secretary clinton is a different person than ...\n","9077 positive cnn producer on hunt for saddest-looking fuck ...\n","\n","[8000 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620188373251,"user_tz":-120,"elapsed":10920,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"0f669bc1-ee54-423d-81ff-d52fd9b53808"},"source":["import nlu \n","from sklearn.metrics import classification_report\n","\n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a dataset with label and text columns\n","# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","# the sentence detector that is part of the pipe generates some NaNs; let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 24\n"," neutral 0.00 0.00 0.00 0\n"," positive 1.00 0.23 0.38 26\n","\n"," accuracy 0.12 50\n"," macro avg 0.33 0.08 0.12 50\n","weighted avg 0.52 0.12 0.20 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
trained_sentimentdocumentsentence_embedding_usetexttrained_sentiment_confidencesentenceyorigin_index
0neutralred cross installs blood drop-off bins for don...[-0.07407097518444061, 0.020259270444512367, -...red cross installs blood drop-off bins for don...0.586574[red cross installs blood drop-off bins for do...positive1463
1neutralpresident-elect edwards seen entering chinatow...[0.07900547981262207, 0.06132232025265694, 0.0...president-elect edwards seen entering chinatow...0.539660[president-elect edwards seen entering chinato...positive9543
2positivecrowd at trump rally realizes they've been cha...[0.00032250594813376665, -0.022321783006191254...crowd at trump rally realizes they've been cha...0.624888[crowd at trump rally realizes they've been ch...positive2930
3neutralkerry captures bin laden one week too late[-0.004850792698562145, 0.017207739874720573, ...kerry captures bin laden one week too late0.594691[kerry captures bin laden one week too late]positive644
4neutral'new hampshire' episode 4: not just for old, w...[-0.04294964671134949, -0.07175017148256302, -...'new hampshire' episode 4: not just for old, w...0.546094['new hampshire' episode 4: not just for old, ...negative2390
5neutral12 indie spots in hong kong[-0.036156971007585526, -0.014244569465517998,...12 indie spots in hong kong0.509936[12 indie spots in hong kong]negative5450
6neutralformer refugee fights for her dream to abolish...[-0.0035593388602137566, 0.008986725471913815,...former refugee fights for her dream to abolish...0.578656[former refugee fights for her dream to abolis...negative7739
7neutralwatch out, sephora. h&m beauty is coming for you.[-0.02806154452264309, -0.04734545946121216, 0...watch out, sephora. h&m beauty is coming for you.0.503327[watch out, sephora., h&m beauty is coming for...negative4370
8neutralcomputer scientists say ai's underdeveloped et...[-0.007209080271422863, -0.01665995828807354, ...computer scientists say ai's underdeveloped et...0.564115[computer scientists say ai's underdeveloped e...positive9418
9neutralpast armageddon and on to zippori, one of isra...[-0.0013639864046126604, -0.05516374856233597,...past armageddon and on to zippori, one of isra...0.528413[past armageddon and on to zippori, one of isr...negative4357
10neutralobama caves to girl scout lobby, wears tiara i...[-0.019499918445944786, -0.01798412762582302, ...obama caves to girl scout lobby, wears tiara i...0.580572[obama caves to girl scout lobby, wears tiara ...negative2418
11neutralisrael passes law cementing itself as exclusiv...[0.05780275911092758, -0.04909253492951393, -0...israel passes law cementing itself as exclusiv...0.587168[israel passes law cementing itself as exclusi...positive7882
12positivelocal man's fear of snakes increases with each...[0.056381646543741226, -0.057368289679288864, ...local man's fear of snakes increases with each...0.611751[local man's fear of snakes increases with eac...positive6117
13neutralmedia ethics: whose standards?[0.0406915545463562, 0.03338606283068657, -0.0...media ethics: whose standards?0.522510[media ethics: whose standards?]negative7992
14neutralrare species of frog may hold cure to...ah, ne...[0.079066202044487, 0.021108830347657204, -0.0...rare species of frog may hold cure to...ah, ne...0.583740[rare species of frog may hold cure to., ..ah,...positive6445
15neutralpolice officers waving everyone over to take a...[-0.011674991808831692, -0.07644280791282654, ...police officers waving everyone over to take a...0.551790[police officers waving everyone over to take ...positive8222
16neutralnate berkus and jeremiah brent are married![0.03597276285290718, 0.05780172348022461, -0....nate berkus and jeremiah brent are married!0.521706[nate berkus and jeremiah brent are married!]negative3847
17neutralsummer camp hierarchy thrown into chaos after ...[-0.07574271410703659, 0.037431202828884125, 0...summer camp hierarchy thrown into chaos after ...0.547273[summer camp hierarchy thrown into chaos after...positive253
18neutraliraqi officer under saddam masterminded the ri...[0.02313143201172352, -0.05338258296251297, -0...iraqi officer under saddam masterminded the ri...0.570464[iraqi officer under saddam masterminded the r...negative2504
19neutralcop crashed cruiser into ditch after this owl ...[-0.0011760082561522722, -0.07277680933475494,...cop crashed cruiser into ditch after this owl ...0.539377[cop crashed cruiser into ditch after this owl...negative6992
20positivebush elected president of iraq[-0.02786075882613659, 0.044202402234077454, -...bush elected president of iraq0.607670[bush elected president of iraq]positive5882
21neutralsenate does equifax a favor as a former execut...[0.048467256128787994, -0.020521214231848717, ...senate does equifax a favor as a former execut...0.546467[senate does equifax a favor as a former execu...negative9721
22neutraltexan feels emotionally empty after chili cook...[-0.03747135400772095, 0.021678565070033073, -...texan feels emotionally empty after chili cook...0.593196[texan feels emotionally empty after chili coo...positive8362
23neutrallgbtq activists organizing massive dance prote...[0.03160602226853371, -0.031953465193510056, -...lgbtq activists organizing massive dance prote...0.582981[lgbtq activists organizing massive dance prot...negative1073
24neutralsamsung slashes profit forecast after pulling ...[0.06239231675863266, -0.03570358455181122, -0...samsung slashes profit forecast after pulling ...0.528766[samsung slashes profit forecast after pulling...negative7620
25neutralunpublished twain autobiography rails against ...[-0.030312832444906235, 0.06083317846059799, 0...unpublished twain autobiography rails against ...0.546941[unpublished twain autobiography rails against...positive6476
26neutralvegetarian begins sad, private routine of scan...[-0.07400622963905334, -0.04156332463026047, -...vegetarian begins sad, private routine of scan...0.557931[vegetarian begins sad, private routine of sca...positive6011
27neutraldown with cutesy cleaning supplies![-0.05738799646496773, 0.06619091331958771, -0...down with cutesy cleaning supplies!0.523498[down with cutesy cleaning supplies!]negative7466
28neutralhillary clinton sets personal single rep squat...[0.05143097788095474, 0.03707878664135933, -0....hillary clinton sets personal single rep squat...0.586224[hillary clinton sets personal single rep squa...positive2975
29neutraltraveling tips for families with special diet[-0.05301835760474205, -0.04789021238684654, -...traveling tips for families with special diet0.544159[traveling tips for families with special diet]negative9610
30neutraltearful biden carefully takes down blacklight ...[0.02181081660091877, -0.05245676264166832, 0....tearful biden carefully takes down blacklight ...0.578549[tearful biden carefully takes down blacklight...positive9186
31neutral8-year-old girl dies after drinking boiling wa...[-0.07436364144086838, -0.042152464389801025, ...8-year-old girl dies after drinking boiling wa...0.536114[8-year-old girl dies after drinking boiling w...negative5053
32neutralwoody harrelson applies to open a marijuana di...[0.0774756669998169, -0.06551078706979752, -0....woody harrelson applies to open a marijuana di...0.520218[woody harrelson applies to open a marijuana d...negative2457
33neutralinterview with elaine jung[-0.0419117696583271, -0.02243213728070259, 0....interview with elaine jung0.504458[interview with elaine jung]negative4830
34neutralhow your sleep changes with the moon[0.005925372242927551, 0.0024853269569575787, ...how your sleep changes with the moon0.505398[how your sleep changes with the moon]negative9055
35neutraldr. scholl's introduces new cartilage inserts ...[0.05181374028325081, 0.061040107160806656, -0...dr. scholl's introduces new cartilage inserts ...0.577065[dr. scholl's introduces new cartilage inserts...positive1651
36positivenation's dogs dangerously underpetted, say dogs[-0.01876039244234562, -0.03308898210525513, -...nation's dogs dangerously underpetted, say dogs0.600809[nation's dogs dangerously underpetted, say dogs]positive5842
37neutralsean bean's most memorable death wasn't 'lord ...[-0.013742960058152676, 0.05260717123746872, -...sean bean's most memorable death wasn't 'lord ...0.530136[sean bean's most memorable death wasn't 'lord...negative3328
38neutralwife too busy videotaping elk attack to save h...[-0.02963477373123169, -0.024700473994016647, ...wife too busy videotaping elk attack to save h...0.575398[wife too busy videotaping elk attack to save ...positive3869
39positiveobama, romney remain about equally powerful[0.046225905418395996, 0.020606394857168198, -...obama, romney remain about equally powerful0.609472[obama, romney remain about equally powerful]positive6087
40neutralman arriving late forced to use excuse he was ...[0.007267913315445185, 0.012945346534252167, -...man arriving late forced to use excuse he was ...0.545002[man arriving late forced to use excuse he was...positive6329
41neutralnews roundup for june 19, 2017[0.006106187589466572, -0.07620107382535934, -...news roundup for june 19, 20170.519457[news roundup for june 19, 2017]negative4470
42neutralreport: 1 in 5 air ducts contains person looki...[-0.03163425996899605, 0.012088724412024021, -...report: 1 in 5 air ducts contains person looki...0.572511[report: 1 in 5 air ducts contains person look...positive4423
43neutralarrested but innocent? the internet still thin...[0.0008291222620755434, -0.07605188339948654, ...arrested but innocent? the internet still thin...0.505108[arrested but innocent?, the internet still t...negative7363
44neutralthis desperate dad is trying to ward off the t...[-0.009455329738557339, -0.0632706880569458, -...this desperate dad is trying to ward off the t...0.539598[this desperate dad is trying to ward off the ...negative6400
45neutralin defense of the promposal[0.04038773849606514, 0.06097205728292465, -0....in defense of the promposal0.511992[in defense of the promposal]negative9190
46neutralreason man turning to religion later in life m...[0.024827271699905396, -0.00032991680200211704...reason man turning to religion later in life m...0.574382[reason man turning to religion later in life ...positive234
47neutralbunch of hick nobodies sue for toxic-waste exp...[0.02218099869787693, 0.0022194429766386747, 0...bunch of hick nobodies sue for toxic-waste exp...0.570060[bunch of hick nobodies sue for toxic-waste ex...positive5098
48neutralno one in family sure who trip to arboretum is...[-0.03585259988903999, 0.017737973481416702, -...no one in family sure who trip to arboretum is...0.573483[no one in family sure who trip to arboretum i...positive2552
49positivevoice coming from dnc sound system during sand...[-0.009395286440849304, -0.0020206505432724953...voice coming from dnc sound system during sand...0.602834[voice coming from dnc sound system during san...positive1449
\n","
"],"text/plain":[" trained_sentiment ... origin_index\n","0 neutral ... 1463\n","1 neutral ... 9543\n","2 positive ... 2930\n","3 neutral ... 644\n","4 neutral ... 2390\n","5 neutral ... 5450\n","6 neutral ... 7739\n","7 neutral ... 4370\n","8 neutral ... 9418\n","9 neutral ... 4357\n","10 neutral ... 2418\n","11 neutral ... 7882\n","12 positive ... 6117\n","13 neutral ... 7992\n","14 neutral ... 6445\n","15 neutral ... 8222\n","16 neutral ... 3847\n","17 neutral ... 253\n","18 neutral ... 2504\n","19 neutral ... 6992\n","20 positive ... 5882\n","21 neutral ... 9721\n","22 neutral ... 8362\n","23 neutral ... 1073\n","24 neutral ... 7620\n","25 neutral ... 6476\n","26 neutral ... 6011\n","27 neutral ... 7466\n","28 neutral ... 2975\n","29 neutral ... 9610\n","30 neutral ... 9186\n","31 neutral ... 5053\n","32 neutral ... 2457\n","33 neutral ... 4830\n","34 neutral ... 9055\n","35 neutral ... 1651\n","36 positive ... 5842\n","37 neutral ... 3328\n","38 neutral ... 3869\n","39 positive ... 6087\n","40 neutral ... 6329\n","41 neutral ... 4470\n","42 neutral ... 4423\n","43 neutral ... 7363\n","44 neutral ... 6400\n","45 neutral ... 9190\n","46 neutral ... 234\n","47 neutral ... 5098\n","48 neutral ... 2552\n","49 positive ... 1449\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# 4. 
Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":80},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620188373652,"user_tz":-120,"elapsed":10618,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"3740cfa7-b6c4-41f0-eb9c-d4b101438602"},"source":["fitted_pipe.predict('Aliens are immortal!')\n"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
trained_sentimentdocumentsentence_embedding_usetrained_sentiment_confidencesentenceorigin_index
0neutralAliens are immortal![-0.0700131505727768, -0.06706050038337708, -0...0.549292[Aliens are immortal!]0
\n","
"],"text/plain":[" trained_sentiment document ... sentence origin_index\n","0 neutral Aliens are immortal! ... [Aliens are immortal!] 0\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":8}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## 5. Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620188373653,"user_tz":-120,"elapsed":10552,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"e43b4908-42f7-46fb-d110-210ae33b8ff9"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@9b0d688) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@9b0d688\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## 6. Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620188376889,"user_tz":-120,"elapsed":13708,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"86a3b364-8312-4dc0-beb0-37293e3ce167"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","\n","# the sentence detector that is part of the pipe generates some NaNs; let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 1.00 0.88 0.93 24\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.96 0.92 0.94 26\n","\n"," accuracy 0.90 50\n"," macro avg 0.65 0.60 0.62 50\n","weighted avg 0.98 0.90 0.94 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
trained_sentimentdocumentsentence_embedding_usetexttrained_sentiment_confidencesentenceyorigin_index
0positivered cross installs blood drop-off bins for don...[-0.07407097518444061, 0.020259270444512367, -...red cross installs blood drop-off bins for don...0.918347[red cross installs blood drop-off bins for do...positive1463
1neutralpresident-elect edwards seen entering chinatow...[0.07900547981262207, 0.06132232025265694, 0.0...president-elect edwards seen entering chinatow...0.591700[president-elect edwards seen entering chinato...positive9543
2positivecrowd at trump rally realizes they've been cha...[0.00032250594813376665, -0.022321783006191254...crowd at trump rally realizes they've been cha...0.963672[crowd at trump rally realizes they've been ch...positive2930
3positivekerry captures bin laden one week too late[-0.004850792698562145, 0.017207739874720573, ...kerry captures bin laden one week too late0.846780[kerry captures bin laden one week too late]positive644
4negative'new hampshire' episode 4: not just for old, w...[-0.04294964671134949, -0.07175017148256302, -...'new hampshire' episode 4: not just for old, w...0.888615['new hampshire' episode 4: not just for old, ...negative2390
5negative12 indie spots in hong kong[-0.036156971007585526, -0.014244569465517998,...12 indie spots in hong kong0.889380[12 indie spots in hong kong]negative5450
6neutralformer refugee fights for her dream to abolish...[-0.0035593388602137566, 0.008986725471913815,...former refugee fights for her dream to abolish...0.516011[former refugee fights for her dream to abolis...negative7739
7negativewatch out, sephora. h&m beauty is coming for you.[-0.02806154452264309, -0.04734545946121216, 0...watch out, sephora. h&m beauty is coming for you.0.904426[watch out, sephora., h&m beauty is coming for...negative4370
8positivecomputer scientists say ai's underdeveloped et...[-0.007209080271422863, -0.01665995828807354, ...computer scientists say ai's underdeveloped et...0.881709[computer scientists say ai's underdeveloped e...positive9418
9negativepast armageddon and on to zippori, one of isra...[-0.0013639864046126604, -0.05516374856233597,...past armageddon and on to zippori, one of isra...0.887740[past armageddon and on to zippori, one of isr...negative4357
10positiveobama caves to girl scout lobby, wears tiara i...[-0.019499918445944786, -0.01798412762582302, ...obama caves to girl scout lobby, wears tiara i...0.654486[obama caves to girl scout lobby, wears tiara ...negative2418
11positiveisrael passes law cementing itself as exclusiv...[0.05780275911092758, -0.04909253492951393, -0...israel passes law cementing itself as exclusiv...0.803852[israel passes law cementing itself as exclusi...positive7882
12 | positive | local man's fear of snakes increases with each... | [0.056381646543741226, -0.057368289679288864, ... | local man's fear of snakes increases with each... | 0.973072 | [local man's fear of snakes increases with eac... | positive | 6117
13 | negative | media ethics: whose standards? | [0.0406915545463562, 0.03338606283068657, -0.0... | media ethics: whose standards? | 0.884770 | [media ethics: whose standards?] | negative | 7992
14 | positive | rare species of frog may hold cure to...ah, ne... | [0.079066202044487, 0.021108830347657204, -0.0... | rare species of frog may hold cure to...ah, ne... | 0.942009 | [rare species of frog may hold cure to., ..ah,... | positive | 6445
15 | positive | police officers waving everyone over to take a... | [-0.011674991808831692, -0.07644280791282654, ... | police officers waving everyone over to take a... | 0.616271 | [police officers waving everyone over to take ... | positive | 8222
16 | negative | nate berkus and jeremiah brent are married! | [0.03597276285290718, 0.05780172348022461, -0.... | nate berkus and jeremiah brent are married! | 0.904110 | [nate berkus and jeremiah brent are married!] | negative | 3847
17 | neutral | summer camp hierarchy thrown into chaos after ... | [-0.07574271410703659, 0.037431202828884125, 0... | summer camp hierarchy thrown into chaos after ... | 0.586359 | [summer camp hierarchy thrown into chaos after... | positive | 253
18 | negative | iraqi officer under saddam masterminded the ri... | [0.02313143201172352, -0.05338258296251297, -0... | iraqi officer under saddam masterminded the ri... | 0.712658 | [iraqi officer under saddam masterminded the r... | negative | 2504
19 | negative | cop crashed cruiser into ditch after this owl ... | [-0.0011760082561522722, -0.07277680933475494,... | cop crashed cruiser into ditch after this owl ... | 0.835989 | [cop crashed cruiser into ditch after this owl... | negative | 6992
20 | positive | bush elected president of iraq | [-0.02786075882613659, 0.044202402234077454, -... | bush elected president of iraq | 0.928596 | [bush elected president of iraq] | positive | 5882
21 | negative | senate does equifax a favor as a former execut... | [0.048467256128787994, -0.020521214231848717, ... | senate does equifax a favor as a former execut... | 0.908127 | [senate does equifax a favor as a former execu... | negative | 9721
22 | positive | texan feels emotionally empty after chili cook... | [-0.03747135400772095, 0.021678565070033073, -... | texan feels emotionally empty after chili cook... | 0.953778 | [texan feels emotionally empty after chili coo... | positive | 8362
23 | neutral | lgbtq activists organizing massive dance prote... | [0.03160602226853371, -0.031953465193510056, -... | lgbtq activists organizing massive dance prote... | 0.548180 | [lgbtq activists organizing massive dance prot... | negative | 1073
24 | negative | samsung slashes profit forecast after pulling ... | [0.06239231675863266, -0.03570358455181122, -0... | samsung slashes profit forecast after pulling ... | 0.912397 | [samsung slashes profit forecast after pulling... | negative | 7620
25 | positive | unpublished twain autobiography rails against ... | [-0.030312832444906235, 0.06083317846059799, 0... | unpublished twain autobiography rails against ... | 0.709892 | [unpublished twain autobiography rails against... | positive | 6476
26 | positive | vegetarian begins sad, private routine of scan... | [-0.07400622963905334, -0.04156332463026047, -... | vegetarian begins sad, private routine of scan... | 0.845192 | [vegetarian begins sad, private routine of sca... | positive | 6011
27 | negative | down with cutesy cleaning supplies! | [-0.05738799646496773, 0.06619091331958771, -0... | down with cutesy cleaning supplies! | 0.853556 | [down with cutesy cleaning supplies!] | negative | 7466
28 | positive | hillary clinton sets personal single rep squat... | [0.05143097788095474, 0.03707878664135933, -0.... | hillary clinton sets personal single rep squat... | 0.908901 | [hillary clinton sets personal single rep squa... | positive | 2975
29 | negative | traveling tips for families with special diet | [-0.05301835760474205, -0.04789021238684654, -... | traveling tips for families with special diet | 0.788320 | [traveling tips for families with special diet] | negative | 9610
30 | positive | tearful biden carefully takes down blacklight ... | [0.02181081660091877, -0.05245676264166832, 0.... | tearful biden carefully takes down blacklight ... | 0.836097 | [tearful biden carefully takes down blacklight... | positive | 9186
31 | negative | 8-year-old girl dies after drinking boiling wa... | [-0.07436364144086838, -0.042152464389801025, ... | 8-year-old girl dies after drinking boiling wa... | 0.888045 | [8-year-old girl dies after drinking boiling w... | negative | 5053
32 | negative | woody harrelson applies to open a marijuana di... | [0.0774756669998169, -0.06551078706979752, -0.... | woody harrelson applies to open a marijuana di... | 0.916998 | [woody harrelson applies to open a marijuana d... | negative | 2457
33 | negative | interview with elaine jung | [-0.0419117696583271, -0.02243213728070259, 0.... | interview with elaine jung | 0.909372 | [interview with elaine jung] | negative | 4830
34 | negative | how your sleep changes with the moon | [0.005925372242927551, 0.0024853269569575787, ... | how your sleep changes with the moon | 0.908831 | [how your sleep changes with the moon] | negative | 9055
35 | positive | dr. scholl's introduces new cartilage inserts ... | [0.05181374028325081, 0.061040107160806656, -0... | dr. scholl's introduces new cartilage inserts ... | 0.901216 | [dr. scholl's introduces new cartilage inserts... | positive | 1651
36 | positive | nation's dogs dangerously underpetted, say dogs | [-0.01876039244234562, -0.03308898210525513, -... | nation's dogs dangerously underpetted, say dogs | 0.959039 | [nation's dogs dangerously underpetted, say dogs] | positive | 5842
37 | negative | sean bean's most memorable death wasn't 'lord ... | [-0.013742960058152676, 0.05260717123746872, -... | sean bean's most memorable death wasn't 'lord ... | 0.888762 | [sean bean's most memorable death wasn't 'lord... | negative | 3328
38 | positive | wife too busy videotaping elk attack to save h... | [-0.02963477373123169, -0.024700473994016647, ... | wife too busy videotaping elk attack to save h... | 0.892977 | [wife too busy videotaping elk attack to save ... | positive | 3869
39 | positive | obama, romney remain about equally powerful | [0.046225905418395996, 0.020606394857168198, -... | obama, romney remain about equally powerful | 0.950219 | [obama, romney remain about equally powerful] | positive | 6087
40 | positive | man arriving late forced to use excuse he was ... | [0.007267913315445185, 0.012945346534252167, -... | man arriving late forced to use excuse he was ... | 0.854663 | [man arriving late forced to use excuse he was... | positive | 6329
41 | negative | news roundup for june 19, 2017 | [0.006106187589466572, -0.07620107382535934, -... | news roundup for june 19, 2017 | 0.926237 | [news roundup for june 19, 2017] | negative | 4470
42 | positive | report: 1 in 5 air ducts contains person looki... | [-0.03163425996899605, 0.012088724412024021, -... | report: 1 in 5 air ducts contains person looki... | 0.952198 | [report: 1 in 5 air ducts contains person look... | positive | 4423
43 | negative | arrested but innocent? the internet still thin... | [0.0008291222620755434, -0.07605188339948654, ... | arrested but innocent? the internet still thin... | 0.936171 | [arrested but innocent?, the internet still t... | negative | 7363
44 | negative | this desperate dad is trying to ward off the t... | [-0.009455329738557339, -0.0632706880569458, -... | this desperate dad is trying to ward off the t... | 0.873377 | [this desperate dad is trying to ward off the ... | negative | 6400
45 | negative | in defense of the promposal | [0.04038773849606514, 0.06097205728292465, -0.... | in defense of the promposal | 0.866386 | [in defense of the promposal] | negative | 9190
46 | positive | reason man turning to religion later in life m... | [0.024827271699905396, -0.00032991680200211704... | reason man turning to religion later in life m... | 0.901040 | [reason man turning to religion later in life ... | positive | 234
47 | positive | bunch of hick nobodies sue for toxic-waste exp... | [0.02218099869787693, 0.0022194429766386747, 0... | bunch of hick nobodies sue for toxic-waste exp... | 0.874898 | [bunch of hick nobodies sue for toxic-waste ex... | positive | 5098
48 | positive | no one in family sure who trip to arboretum is... | [-0.03585259988903999, 0.017737973481416702, -... | no one in family sure who trip to arboretum is... | 0.859806 | [no one in family sure who trip to arboretum i... | positive | 2552
49 | positive | voice coming from dnc sound system during sand... | [-0.009395286440849304, -0.0020206505432724953... | voice coming from dnc sound system during sand... | 0.943251 | [voice coming from dnc sound system during san... | positive | 1449
\n","
"],"text/plain":[" trained_sentiment ... origin_index\n","0 positive ... 1463\n","1 neutral ... 9543\n","2 positive ... 2930\n","3 positive ... 644\n","4 negative ... 2390\n","5 negative ... 5450\n","6 neutral ... 7739\n","7 negative ... 4370\n","8 positive ... 9418\n","9 negative ... 4357\n","10 positive ... 2418\n","11 positive ... 7882\n","12 positive ... 6117\n","13 negative ... 7992\n","14 positive ... 6445\n","15 positive ... 8222\n","16 negative ... 3847\n","17 neutral ... 253\n","18 negative ... 2504\n","19 negative ... 6992\n","20 positive ... 5882\n","21 negative ... 9721\n","22 positive ... 8362\n","23 neutral ... 1073\n","24 negative ... 7620\n","25 positive ... 6476\n","26 positive ... 6011\n","27 negative ... 7466\n","28 positive ... 2975\n","29 negative ... 9610\n","30 positive ... 9186\n","31 negative ... 5053\n","32 negative ... 2457\n","33 negative ... 4830\n","34 negative ... 9055\n","35 positive ... 1651\n","36 positive ... 5842\n","37 negative ... 3328\n","38 positive ... 3869\n","39 positive ... 6087\n","40 positive ... 6329\n","41 negative ... 4470\n","42 positive ... 4423\n","43 negative ... 7363\n","44 negative ... 6400\n","45 negative ... 9190\n","46 positive ... 234\n","47 positive ... 5098\n","48 positive ... 2552\n","49 positive ... 1449\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":10}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# 7. 
Try training with different Embeddings"]},{"cell_type":"code","metadata":{"id":"nxWFzQOhjWC8","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620188377296,"user_tz":-120,"elapsed":14031,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"504a2809-2e9f-45b1-812d-4cd953323d9f"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP 
model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns 
Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model 
sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"IKK_Ii_gjJfF","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620190483755,"user_tz":-120,"elapsed":2120443,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"f6ff250f-0e4f-4425-a92e-18f08d6f1334"},"source":["trainable_pipe = nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# We usually need to train longer and use a smaller LR for non-USE sentence embeddings\n","# We could tune the hyperparameters further with hyperparameter tuning methods like grid search\n","# Longer training also tends to give higher accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# Predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.90 0.88 0.89 4023\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.90 0.88 0.89 3977\n","\n"," accuracy 0.88 8000\n"," macro avg 0.60 0.58 0.59 8000\n","weighted avg 0.90 0.88 0.89 8000\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_1jxw3GnVGlI"},"source":["# 7.1 Evaluate on Test Data"]},{"cell_type":"code","metadata":{"id":"Fxx4yNkNVGFl","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620190639161,"user_tz":-120,"elapsed":2275776,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"d7f7c06c-aae2-43fe-f7c6-34a9fc87482b"},"source":["preds = fitted_pipe.predict(test_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.84 0.82 0.83 977\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.86 0.81 0.83 1023\n","\n"," accuracy 0.82 2000\n"," macro avg 0.57 0.55 0.56 2000\n","weighted avg 0.85 0.82 0.83 2000\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 8. 
Let's save the model"]},{"cell_type":"code","metadata":{"id":"eLex095goHwm","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620190792311,"user_tz":-120,"elapsed":2428848,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"5f8bc338-d089-4a82-e4b6-0eb286d14b0d"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 9. Let's load the model from HDD.\n","This makes offline NLU usage possible! \n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"id":"SO4uz45MoRgp","colab":{"base_uri":"https://localhost:8080/","height":80},"executionInfo":{"status":"ok","timestamp":1620190810537,"user_tz":-120,"elapsed":2446986,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"2ad4167d-9332-4880-d27b-6a1679256f41"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('Aliens are immortal!')\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
documenttextsentimentorigin_indexsentencesentiment_confidencesentence_embedding_from_disk
0Aliens are immortal!Aliens are immortal![negative]8589934592[Aliens are immortal!][0.9999138][[0.30930545926094055, 0.12947328388690948, 0....
\n","
"],"text/plain":[" document ... sentence_embedding_from_disk\n","0 Aliens are immortal! ... [[0.30930545926094055, 0.12947328388690948, 0....\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":15}]},{"cell_type":"code","metadata":{"id":"e0CVlkk9v6Qi","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620190810538,"user_tz":-120,"elapsed":2446902,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"4f7f8f8f-477e-430a-c17c-5b84c3ebe8fb"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@6cf015ef) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@6cf015ef\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"aq3RCRU4wHsv"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sarcasam_classifier_demo_news_headlines.ipynb)\n", + "\n", + "\n", + "# Training a 
Sentiment Analysis Classifier with NLU\n", + "## 2 Class News Headlines Sarcasm Training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on any multi-class text classification problem\n", + "\n", + "This notebook showcases the following features:\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (enables NLU offline mode)\n", + "\n", + "You can achieve these results or even better on this dataset with training data:\n", + "\n", + "\n", + "
\n", + "\n", + "![img.png]()\n", + "\n", + "You can achieve these results or even better on this dataset with test data:\n", + "\n", + "\n", + "
\n", + "\n", + "![Screenshot 2021-02-25 150812.png]()\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Colab Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hFGnBCHavltY" + }, + "outputs": [], + "source": [ + "! pip install -q johnsnowlabs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download News Headlines Sarcasm dataset\n", + "https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection\n", + "## Context\n", + "Past studies in Sarcasm Detection mostly make use of Twitter datasets collected using hashtag-based supervision, but such datasets are noisy in terms of labels and language. Furthermore, many tweets are replies to other tweets, and detecting sarcasm in these requires the availability of contextual tweets.\n", + "\n", + "To overcome the limitations related to noise in Twitter datasets, this News Headlines dataset for Sarcasm Detection is collected from two news websites. TheOnion aims at producing sarcastic versions of current events and we collected all the headlines from News in Brief and News in Photos categories (which are sarcastic). We collect real (and non-sarcastic) news headlines from HuffPost.\n", + "\n", + "This new dataset has the following advantages over the existing Twitter datasets:\n", + "\n", + "Since news headlines are written by professionals in a formal manner, there are no spelling mistakes and informal usage. This reduces the sparsity and also increases the chance of finding pre-trained embeddings.\n", + "\n", + "Furthermore, since the sole purpose of TheOnion is to publish sarcastic news, we get high-quality labels with much less noise as compared to Twitter datasets.\n", + "\n", + "Unlike tweets which are replies to other tweets, the news headlines we obtained are self-contained. 
This would help us in teasing apart the real sarcastic elements.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "outputs": [], + "source": [ + "! wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/Sarcasm_Headlines/Sarcasm_Headlines_Dataset_v2.csv\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "3b062b61-bd33-4f03-902e-066900be06d1" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
index | y | text
1875 | negative | this t. rex dominates 'american ninja warrior'...
11997 | negative | computer science in vietnam: counting down to ...
315 | positive | dhs announces racial profiling free-for-all th...
16252 | negative | the best rent the runway dresses for bridesmaids
12178 | negative | abe's visit will remind americans china's powe...
... | ... | ...
10908 | positive | sudden death of aunt creates rupture in family...
17451 | negative | over 50% of lgbtq youths struggle with eating ...
16032 | negative | gop senator: my family went from 'cotton to co...
17823 | negative | martin o'malley fails to make ohio's president...
14495 | positive | nra says parkland students should be grateful ...
\n", + "

22895 rows × 2 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "text/plain": [ + " y text\n", + "1875 negative this t. rex dominates 'american ninja warrior'...\n", + "11997 negative computer science in vietnam: counting down to ...\n", + "315 positive dhs announces racial profiling free-for-all th...\n", + "16252 negative the best rent the runway dresses for bridesmaids\n", + "12178 negative abe's visit will remind americans china's powe...\n", + "... ... ...\n", + "10908 positive sudden death of aunt creates rupture in family...\n", + "17451 negative over 50% of lgbtq youths struggle with eating ...\n", + "16032 negative gop senator: my family went from 'cotton to co...\n", + "17823 negative martin o'malley fails to make ohio's president...\n", + "14495 positive nra says parkland students should be grateful ...\n", + "\n", + "[22895 rows x 2 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "test_path = '/content/Sarcasm_Headlines_Dataset_v2.csv'\n", + "train_df = pd.read_csv(test_path,sep=\",\")\n", + "cols = [\"y\",\"text\"]\n", + "train_df = train_df[cols]\n", + "from sklearn.model_selection import train_test_split\n", + "train_df, test_df = train_test_split(train_df, test_size=0.2)\n", + "train_df\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. 
Train Deep Learning Classifier using nlu.load('train.sentiment')\n", + "\n", + "Your dataset's label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "d880ec91-023e-4f75-a687-94825d7ea1b5" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 22\n", + " positive 0.56 1.00 0.72 28\n", + "\n", + " accuracy 0.56 50\n", + " macro avg 0.28 0.50 0.36 50\n", + "weighted avg 0.31 0.56 0.40 50\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
| | document | sentence_embedding_small_bert_L2_128 | sentiment | sentiment_confidence | text | y |
| 0 | kids in bus accident mocked by kids in passing... | [-0.7924436926841736, -0.07141707092523575, -0... | positive | 0.0 | kids in bus accident mocked by kids in passing... | positive |
| 1 | these two words are stealing your freedom | [-1.028885006904602, -0.7214630246162415, 0.12... | positive | 0.0 | these two words are stealing your freedom | negative |
| 2 | bush vows to do 'that thing gore just said, on... | [-1.7108917236328125, 0.2942552864551544, -0.3... | positive | 0.0 | bush vows to do 'that thing gore just said, on... | positive |
| 3 | scotland's parliament backs new independence r... | [0.14012272655963898, 0.7012102603912354, -0.0... | positive | 0.0 | scotland's parliament backs new independence r... | negative |
| 4 | joe biden: being vice president is 'a bitch' | [-0.571045458316803, 0.5463784337043762, -0.17... | positive | 0.0 | joe biden: being vice president is 'a bitch' | negative |
| 5 | u.s. army now just chasing single remaining is... | [-0.9116203188896179, -0.3868906497955322, -0.... | positive | 0.0 | u.s. army now just chasing single remaining is... | positive |
| 6 | sheryl crow's freshness date expires | [-1.388686180114746, 0.5200713872909546, -0.43... | positive | 0.0 | sheryl crow's freshness date expires | positive |
| 7 | new ebola quarantine protocol seen as barrier ... | [-0.6295406222343445, 0.43973761796951294, -0.... | positive | 0.0 | new ebola quarantine protocol seen as barrier ... | negative |
| 8 | the secret to building a successful business t... | [-0.1442088484764099, 0.5362765192985535, -0.5... | positive | 0.0 | the secret to building a successful business t... | negative |
| 9 | school for crime | [-1.4345587491989136, 0.35326844453811646, -1.... | positive | 0.0 | school for crime | negative |
| 10 | want to increase trust? increase your say/do r... | [-1.2950726747512817, 1.153881311416626, -0.36... | positive | 0.0 | want to increase trust? increase your say/do r... | negative |
| 11 | when autocorrect and sexting collide | [-1.1433782577514648, 0.25382623076438904, -0.... | positive | 0.0 | when autocorrect and sexting collide | negative |
| 12 | the house science committee doesn't seem to un... | [-0.24199619889259338, 0.8295924663543701, -0.... | positive | 0.0 | the house science committee doesn't seem to un... | negative |
| 13 | eric clapton wows audience with even slower ve... | [-0.12445597350597382, -0.2018168419599533, 0.... | positive | 0.0 | eric clapton wows audience with even slower ve... | positive |
| 14 | rick perry returning to iowa | [-1.3457694053649902, 0.5665305852890015, -0.5... | positive | 0.0 | rick perry returning to iowa | negative |
| 15 | one beer can't do local alcoholic any harm | [-1.3927576541900635, 0.6284745335578918, -0.5... | positive | 0.0 | one beer can't do local alcoholic any harm | positive |
| 16 | couple at point where they're comfortable usin... | [-1.5534732341766357, 0.2146364003419876, -0.6... | positive | 0.0 | couple at point where they're comfortable usin... | positive |
| 17 | watch how mountains of trash spread across the... | [-1.8412539958953857, 0.1492537260055542, -0.0... | positive | 0.0 | watch how mountains of trash spread across the... | negative |
| 18 | lowly mortal opens portal to hell | [-0.8720283508300781, -1.011899709701538, -0.3... | positive | 0.0 | lowly mortal opens portal to hell | positive |
| 19 | stephen hawking reportedly working on juicy te... | [-0.7349460124969482, -0.011331072077155113, 0... | positive | 0.0 | stephen hawking reportedly working on juicy te... | positive |
| 20 | offshorers demand: no taxes, no risk | [-1.0823032855987549, 1.29856276512146, -0.915... | positive | 0.0 | offshorers demand: no taxes, no risk | negative |
| 21 | troubling study finds majority of americans wh... | [-0.7015470862388611, -0.17110034823417664, -0... | positive | 0.0 | troubling study finds majority of americans wh... | positive |
| 22 | annoying man more annoying after skydiving | [-1.4149224758148193, -1.0473958253860474, 0.1... | positive | 6.0 | annoying man more annoying after skydiving | positive |
| 23 | robin hood foundation | [-0.057901788502931595, 0.8722413778305054, -0... | positive | 0.0 | robin hood foundation | negative |
| 24 | passenger glued to airplane window like it fuc... | [-0.33463752269744873, 0.7636221051216125, -0.... | positive | 0.0 | passenger glued to airplane window like it fuc... | positive |
| 25 | amazingly humanlike robot able to commit thous... | [-1.2185980081558228, -0.9673603177070618, -0.... | positive | 0.0 | amazingly humanlike robot able to commit thous... | positive |
| 26 | tourists describe scenes of horror in tunisian... | [-1.1378378868103027, -1.2116812467575073, -0.... | positive | 0.0 | tourists describe scenes of horror in tunisian... | negative |
| 27 | is your christmas present spying on you? | [-0.8687025308609009, -0.0924399271607399, -0.... | positive | 0.0 | is your christmas present spying on you? | negative |
| 28 | area man busts his ass all day, and for what? | [-2.376340627670288, -0.3160030245780945, -0.4... | positive | 0.0 | area man busts his ass all day, and for what? | positive |
| 29 | department of interior employee caught embezzl... | [-1.1687604188919067, 0.5776483416557312, -0.6... | positive | 0.0 | department of interior employee caught embezzl... | positive |
| 30 | new software yellows neglected digital photos ... | [-0.773508608341217, 0.5193011164665222, -0.54... | positive | 0.0 | new software yellows neglected digital photos ... | positive |
| 31 | read live updates on the cnn democratic debate | [-1.1500324010849, -0.5574671626091003, 0.3618... | positive | 0.0 | read live updates on the cnn democratic debate | negative |
| 32 | new poultry stripe gum hardly tastes like goos... | [-1.2104802131652832, 0.2168104648590088, -0.0... | positive | 0.0 | new poultry stripe gum hardly tastes like goos... | positive |
| 33 | new film takes an honest look at life with a t... | [-0.27279043197631836, -0.44589897990226746, -... | positive | 0.0 | new film takes an honest look at life with a t... | negative |
| 34 | vice president pence pushes expansive nato and... | [0.06816250830888748, 0.562971830368042, -0.57... | positive | 0.0 | vice president pence pushes expansive nato and... | negative |
| 35 | mom tucks handwritten guide on how to use netf... | [-1.1668932437896729, 0.6492086052894592, -0.3... | positive | 0.0 | mom tucks handwritten guide on how to use netf... | positive |
| 36 | man insists facebook friend actually reads 'wh... | [-0.41535136103630066, -0.6858690977096558, -0... | positive | 0.0 | man insists facebook friend actually reads 'wh... | positive |
| 37 | internet explorer makes desperate overture to ... | [-0.058107126504182816, -0.19895373284816742, ... | positive | 0.0 | internet explorer makes desperate overture to ... | positive |
| 38 | rock song takes pro-rock stance | [-0.8466075658798218, -0.350190132856369, 0.33... | positive | 0.0 | rock song takes pro-rock stance | positive |
| 39 | college still looking for absolute saddest pla... | [-0.8797492384910583, -0.09284910559654236, -0... | positive | 0.0 | college still looking for absolute saddest pla... | positive |
| 40 | alton sterling's family demands action from ba... | [-1.123725175857544, 0.27766698598861694, -0.4... | positive | 0.0 | alton sterling's family demands action from ba... | negative |
| 41 | ronda rousey wants to show you how ripped she ... | [-0.704077959060669, 0.4832143783569336, -0.43... | positive | 0.0 | ronda rousey wants to show you how ripped she ... | negative |
| 42 | the workplace revolution: adding company cultu... | [-0.6472877860069275, 0.819389283657074, -0.50... | positive | 0.0 | the workplace revolution: adding company cultu... | negative |
| 43 | wolf blitzer walks into middle of olive garden... | [-1.089509129524231, -0.010750504210591316, -0... | positive | 0.0 | wolf blitzer walks into middle of olive garden... | positive |
| 44 | grandchild, grandfather equally dreading colla... | [-0.9178588390350342, -0.05760730057954788, -0... | positive | 0.0 | grandchild, grandfather equally dreading colla... | positive |
| 45 | congress reassures nervous zuckerberg they won... | [-0.723938524723053, 0.5092206597328186, -0.02... | positive | 0.0 | congress reassures nervous zuckerberg they won... | positive |
| 46 | senators lured back to emergency session by pr... | [-2.088550567626953, 0.19368097186088562, -0.7... | positive | 0.0 | senators lured back to emergency session by pr... | positive |
| 47 | powerball officials remove plastic balls from ... | [-1.236419916152954, 0.6744992733001709, -0.52... | positive | 9.0 | powerball officials remove plastic balls from ... | positive |
| 48 | rude guy unfortunately says something funny | [-0.9946603775024414, -0.6593665480613708, 0.3... | positive | 0.0 | rude guy unfortunately says something funny | positive |
| 49 | the room i carry with me | [-1.4270148277282715, 0.47338584065437317, -0.... | positive | 0.0 | the room i carry with me | negative |
\n", + "
\n" + ], + "text/plain": [ + " document \\\n", + "0 kids in bus accident mocked by kids in passing... \n", + "1 these two words are stealing your freedom \n", + "2 bush vows to do 'that thing gore just said, on... \n", + "3 scotland's parliament backs new independence r... \n", + "4 joe biden: being vice president is 'a bitch' \n", + "5 u.s. army now just chasing single remaining is... \n", + "6 sheryl crow's freshness date expires \n", + "7 new ebola quarantine protocol seen as barrier ... \n", + "8 the secret to building a successful business t... \n", + "9 school for crime \n", + "10 want to increase trust? increase your say/do r... \n", + "11 when autocorrect and sexting collide \n", + "12 the house science committee doesn't seem to un... \n", + "13 eric clapton wows audience with even slower ve... \n", + "14 rick perry returning to iowa \n", + "15 one beer can't do local alcoholic any harm \n", + "16 couple at point where they're comfortable usin... \n", + "17 watch how mountains of trash spread across the... \n", + "18 lowly mortal opens portal to hell \n", + "19 stephen hawking reportedly working on juicy te... \n", + "20 offshorers demand: no taxes, no risk \n", + "21 troubling study finds majority of americans wh... \n", + "22 annoying man more annoying after skydiving \n", + "23 robin hood foundation \n", + "24 passenger glued to airplane window like it fuc... \n", + "25 amazingly humanlike robot able to commit thous... \n", + "26 tourists describe scenes of horror in tunisian... \n", + "27 is your christmas present spying on you? \n", + "28 area man busts his ass all day, and for what? \n", + "29 department of interior employee caught embezzl... \n", + "30 new software yellows neglected digital photos ... \n", + "31 read live updates on the cnn democratic debate \n", + "32 new poultry stripe gum hardly tastes like goos... \n", + "33 new film takes an honest look at life with a t... \n", + "34 vice president pence pushes expansive nato and... 
\n", + "35 mom tucks handwritten guide on how to use netf... \n", + "36 man insists facebook friend actually reads 'wh... \n", + "37 internet explorer makes desperate overture to ... \n", + "38 rock song takes pro-rock stance \n", + "39 college still looking for absolute saddest pla... \n", + "40 alton sterling's family demands action from ba... \n", + "41 ronda rousey wants to show you how ripped she ... \n", + "42 the workplace revolution: adding company cultu... \n", + "43 wolf blitzer walks into middle of olive garden... \n", + "44 grandchild, grandfather equally dreading colla... \n", + "45 congress reassures nervous zuckerberg they won... \n", + "46 senators lured back to emergency session by pr... \n", + "47 powerball officials remove plastic balls from ... \n", + "48 rude guy unfortunately says something funny \n", + "49 the room i carry with me \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.7924436926841736, -0.07141707092523575, -0... positive \n", + "1 [-1.028885006904602, -0.7214630246162415, 0.12... positive \n", + "2 [-1.7108917236328125, 0.2942552864551544, -0.3... positive \n", + "3 [0.14012272655963898, 0.7012102603912354, -0.0... positive \n", + "4 [-0.571045458316803, 0.5463784337043762, -0.17... positive \n", + "5 [-0.9116203188896179, -0.3868906497955322, -0.... positive \n", + "6 [-1.388686180114746, 0.5200713872909546, -0.43... positive \n", + "7 [-0.6295406222343445, 0.43973761796951294, -0.... positive \n", + "8 [-0.1442088484764099, 0.5362765192985535, -0.5... positive \n", + "9 [-1.4345587491989136, 0.35326844453811646, -1.... positive \n", + "10 [-1.2950726747512817, 1.153881311416626, -0.36... positive \n", + "11 [-1.1433782577514648, 0.25382623076438904, -0.... positive \n", + "12 [-0.24199619889259338, 0.8295924663543701, -0.... positive \n", + "13 [-0.12445597350597382, -0.2018168419599533, 0.... positive \n", + "14 [-1.3457694053649902, 0.5665305852890015, -0.5... 
positive \n", + "15 [-1.3927576541900635, 0.6284745335578918, -0.5... positive \n", + "16 [-1.5534732341766357, 0.2146364003419876, -0.6... positive \n", + "17 [-1.8412539958953857, 0.1492537260055542, -0.0... positive \n", + "18 [-0.8720283508300781, -1.011899709701538, -0.3... positive \n", + "19 [-0.7349460124969482, -0.011331072077155113, 0... positive \n", + "20 [-1.0823032855987549, 1.29856276512146, -0.915... positive \n", + "21 [-0.7015470862388611, -0.17110034823417664, -0... positive \n", + "22 [-1.4149224758148193, -1.0473958253860474, 0.1... positive \n", + "23 [-0.057901788502931595, 0.8722413778305054, -0... positive \n", + "24 [-0.33463752269744873, 0.7636221051216125, -0.... positive \n", + "25 [-1.2185980081558228, -0.9673603177070618, -0.... positive \n", + "26 [-1.1378378868103027, -1.2116812467575073, -0.... positive \n", + "27 [-0.8687025308609009, -0.0924399271607399, -0.... positive \n", + "28 [-2.376340627670288, -0.3160030245780945, -0.4... positive \n", + "29 [-1.1687604188919067, 0.5776483416557312, -0.6... positive \n", + "30 [-0.773508608341217, 0.5193011164665222, -0.54... positive \n", + "31 [-1.1500324010849, -0.5574671626091003, 0.3618... positive \n", + "32 [-1.2104802131652832, 0.2168104648590088, -0.0... positive \n", + "33 [-0.27279043197631836, -0.44589897990226746, -... positive \n", + "34 [0.06816250830888748, 0.562971830368042, -0.57... positive \n", + "35 [-1.1668932437896729, 0.6492086052894592, -0.3... positive \n", + "36 [-0.41535136103630066, -0.6858690977096558, -0... positive \n", + "37 [-0.058107126504182816, -0.19895373284816742, ... positive \n", + "38 [-0.8466075658798218, -0.350190132856369, 0.33... positive \n", + "39 [-0.8797492384910583, -0.09284910559654236, -0... positive \n", + "40 [-1.123725175857544, 0.27766698598861694, -0.4... positive \n", + "41 [-0.704077959060669, 0.4832143783569336, -0.43... positive \n", + "42 [-0.6472877860069275, 0.819389283657074, -0.50... 
positive \n", + "43 [-1.089509129524231, -0.010750504210591316, -0... positive \n", + "44 [-0.9178588390350342, -0.05760730057954788, -0... positive \n", + "45 [-0.723938524723053, 0.5092206597328186, -0.02... positive \n", + "46 [-2.088550567626953, 0.19368097186088562, -0.7... positive \n", + "47 [-1.236419916152954, 0.6744992733001709, -0.52... positive \n", + "48 [-0.9946603775024414, -0.6593665480613708, 0.3... positive \n", + "49 [-1.4270148277282715, 0.47338584065437317, -0.... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 0.0 kids in bus accident mocked by kids in passing... \n", + "1 0.0 these two words are stealing your freedom \n", + "2 0.0 bush vows to do 'that thing gore just said, on... \n", + "3 0.0 scotland's parliament backs new independence r... \n", + "4 0.0 joe biden: being vice president is 'a bitch' \n", + "5 0.0 u.s. army now just chasing single remaining is... \n", + "6 0.0 sheryl crow's freshness date expires \n", + "7 0.0 new ebola quarantine protocol seen as barrier ... \n", + "8 0.0 the secret to building a successful business t... \n", + "9 0.0 school for crime \n", + "10 0.0 want to increase trust? increase your say/do r... \n", + "11 0.0 when autocorrect and sexting collide \n", + "12 0.0 the house science committee doesn't seem to un... \n", + "13 0.0 eric clapton wows audience with even slower ve... \n", + "14 0.0 rick perry returning to iowa \n", + "15 0.0 one beer can't do local alcoholic any harm \n", + "16 0.0 couple at point where they're comfortable usin... \n", + "17 0.0 watch how mountains of trash spread across the... \n", + "18 0.0 lowly mortal opens portal to hell \n", + "19 0.0 stephen hawking reportedly working on juicy te... \n", + "20 0.0 offshorers demand: no taxes, no risk \n", + "21 0.0 troubling study finds majority of americans wh... \n", + "22 6.0 annoying man more annoying after skydiving \n", + "23 0.0 robin hood foundation \n", + "24 0.0 passenger glued to airplane window like it fuc... 
\n", + "25 0.0 amazingly humanlike robot able to commit thous... \n", + "26 0.0 tourists describe scenes of horror in tunisian... \n", + "27 0.0 is your christmas present spying on you? \n", + "28 0.0 area man busts his ass all day, and for what? \n", + "29 0.0 department of interior employee caught embezzl... \n", + "30 0.0 new software yellows neglected digital photos ... \n", + "31 0.0 read live updates on the cnn democratic debate \n", + "32 0.0 new poultry stripe gum hardly tastes like goos... \n", + "33 0.0 new film takes an honest look at life with a t... \n", + "34 0.0 vice president pence pushes expansive nato and... \n", + "35 0.0 mom tucks handwritten guide on how to use netf... \n", + "36 0.0 man insists facebook friend actually reads 'wh... \n", + "37 0.0 internet explorer makes desperate overture to ... \n", + "38 0.0 rock song takes pro-rock stance \n", + "39 0.0 college still looking for absolute saddest pla... \n", + "40 0.0 alton sterling's family demands action from ba... \n", + "41 0.0 ronda rousey wants to show you how ripped she ... \n", + "42 0.0 the workplace revolution: adding company cultu... \n", + "43 0.0 wolf blitzer walks into middle of olive garden... \n", + "44 0.0 grandchild, grandfather equally dreading colla... \n", + "45 0.0 congress reassures nervous zuckerberg they won... \n", + "46 0.0 senators lured back to emergency session by pr... \n", + "47 9.0 powerball officials remove plastic balls from ... 
\n", + "48 0.0 rude guy unfortunately says something funny \n", + "49 0.0 the room i carry with me \n", + "\n", + " y \n", + "0 positive \n", + "1 negative \n", + "2 positive \n", + "3 negative \n", + "4 negative \n", + "5 positive \n", + "6 positive \n", + "7 negative \n", + "8 negative \n", + "9 negative \n", + "10 negative \n", + "11 negative \n", + "12 negative \n", + "13 positive \n", + "14 negative \n", + "15 positive \n", + "16 positive \n", + "17 negative \n", + "18 positive \n", + "19 positive \n", + "20 negative \n", + "21 positive \n", + "22 positive \n", + "23 negative \n", + "24 positive \n", + "25 positive \n", + "26 negative \n", + "27 negative \n", + "28 positive \n", + "29 positive \n", + "30 positive \n", + "31 negative \n", + "32 positive \n", + "33 negative \n", + "34 negative \n", + "35 positive \n", + "36 positive \n", + "37 positive \n", + "38 positive \n", + "39 positive \n", + "40 negative \n", + "41 negative \n", + "42 negative \n", + "43 positive \n", + "44 positive \n", + "45 positive \n", + "46 positive \n", + "47 positive \n", + "48 positive \n", + "49 negative " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from johnsnowlabs import nlp\n", + "from sklearn.metrics import classification_report\n", + "\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# by default, Universal Sentence Encoder (USE) sentence embeddings are used\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "# the sentence detector that is part of the pipe generates some NaNs. 
let's drop them first\n", + "preds.dropna(inplace=True)\n", + "\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# 4. Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "dd1009b1-8146-4cfa-8be5-b9ab14e444ec" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
| | sentence | sentence_embedding_small_bert_L2_128 | sentiment | sentiment_confidence |
| 0 | Aliens are immortal! | [-0.6534902453422546, -1.4232430458068848, -0.... | positive | 0.989048 |
\n", + "
\n" + ], + "text/plain": [ + " sentence sentence_embedding_small_bert_L2_128 \\\n", + "0 Aliens are immortal! [-0.6534902453422546, -1.4232430458068848, -0.... \n", + "\n", + " sentiment sentiment_confidence \n", + "0 positive 0.989048 " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fitted_pipe.predict('Aliens are immortal!')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## 5. Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "9977460e-1f6e-497a-ca9c-2d78957f7aa2" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ], + "source": [ + "trainable_pipe.print_info()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## 6. Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "mptfvHx-MMMX", + "outputId": "2a8a4c62-02a1-48d4-8331-990e1268c030" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.44 1.00 0.61 22\n", + " positive 0.00 0.00 0.00 28\n", + "\n", + " accuracy 0.44 50\n", + " macro avg 0.22 0.50 0.31 50\n", + "weighted avg 0.19 0.44 0.27 50\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
| | document | sentence_embedding_small_bert_L2_128 | sentiment | sentiment_confidence | text | y |
| 0 | kids in bus accident mocked by kids in passing... | [-0.7924436926841736, -0.07141707092523575, -0... | negative | 7.0 | kids in bus accident mocked by kids in passing... | positive |
| 1 | these two words are stealing your freedom | [-1.028885006904602, -0.7214630246162415, 0.12... | negative | 1.0 | these two words are stealing your freedom | negative |
| 2 | bush vows to do 'that thing gore just said, on... | [-1.7108917236328125, 0.2942552864551544, -0.3... | negative | 1.0 | bush vows to do 'that thing gore just said, on... | positive |
| 3 | scotland's parliament backs new independence r... | [0.14012272655963898, 0.7012102603912354, -0.0... | negative | 5.0 | scotland's parliament backs new independence r... | negative |
| 4 | joe biden: being vice president is 'a bitch' | [-0.571045458316803, 0.5463784337043762, -0.17... | negative | 2.0 | joe biden: being vice president is 'a bitch' | negative |
| 5 | u.s. army now just chasing single remaining is... | [-0.9116203188896179, -0.3868906497955322, -0.... | negative | 1.0 | u.s. army now just chasing single remaining is... | positive |
| 6 | sheryl crow's freshness date expires | [-1.388686180114746, 0.5200713872909546, -0.43... | negative | 1.0 | sheryl crow's freshness date expires | positive |
| 7 | new ebola quarantine protocol seen as barrier ... | [-0.6295406222343445, 0.43973761796951294, -0.... | negative | 1.0 | new ebola quarantine protocol seen as barrier ... | negative |
| 8 | the secret to building a successful business t... | [-0.1442088484764099, 0.5362765192985535, -0.5... | negative | 4.0 | the secret to building a successful business t... | negative |
| 9 | school for crime | [-1.4345587491989136, 0.35326844453811646, -1.... | negative | 1.0 | school for crime | negative |
| 10 | want to increase trust? increase your say/do r... | [-1.2950726747512817, 1.153881311416626, -0.36... | negative | 1.0 | want to increase trust? increase your say/do r... | negative |
| 11 | when autocorrect and sexting collide | [-1.1433782577514648, 0.25382623076438904, -0.... | negative | 1.0 | when autocorrect and sexting collide | negative |
| 12 | the house science committee doesn't seem to un... | [-0.24199619889259338, 0.8295924663543701, -0.... | negative | 7.0 | the house science committee doesn't seem to un... | negative |
| 13 | eric clapton wows audience with even slower ve... | [-0.12445597350597382, -0.2018168419599533, 0.... | negative | 6.0 | eric clapton wows audience with even slower ve... | positive |
| 14 | rick perry returning to iowa | [-1.3457694053649902, 0.5665305852890015, -0.5... | negative | 1.0 | rick perry returning to iowa | negative |
| 15 | one beer can't do local alcoholic any harm | [-1.3927576541900635, 0.6284745335578918, -0.5... | negative | 6.0 | one beer can't do local alcoholic any harm | positive |
| 16 | couple at point where they're comfortable usin... | [-1.5534732341766357, 0.2146364003419876, -0.6... | negative | 2.0 | couple at point where they're comfortable usin... | positive |
| 17 | watch how mountains of trash spread across the... | [-1.8412539958953857, 0.1492537260055542, -0.0... | negative | 4.0 | watch how mountains of trash spread across the... | negative |
| 18 | lowly mortal opens portal to hell | [-0.8720283508300781, -1.011899709701538, -0.3... | negative | 6.0 | lowly mortal opens portal to hell | positive |
| 19 | stephen hawking reportedly working on juicy te... | [-0.7349460124969482, -0.011331072077155113, 0... | negative | 5.0 | stephen hawking reportedly working on juicy te... | positive |
| 20 | offshorers demand: no taxes, no risk | [-1.0823032855987549, 1.29856276512146, -0.915... | negative | 2.0 | offshorers demand: no taxes, no risk | negative |
| 21 | troubling study finds majority of americans wh... | [-0.7015470862388611, -0.17110034823417664, -0... | negative | 5.0 | troubling study finds majority of americans wh... | positive |
| 22 | annoying man more annoying after skydiving | [-1.4149224758148193, -1.0473958253860474, 0.1... | negative | 1.0 | annoying man more annoying after skydiving | positive |
| 23 | robin hood foundation | [-0.057901788502931595, 0.8722413778305054, -0... | negative | 6.0 | robin hood foundation | negative |
| 24 | passenger glued to airplane window like it fuc... | [-0.33463752269744873, 0.7636221051216125, -0.... | negative | 1.0 | passenger glued to airplane window like it fuc... | positive |
| 25 | amazingly humanlike robot able to commit thous... | [-1.2185980081558228, -0.9673603177070618, -0.... | negative | 3.0 | amazingly humanlike robot able to commit thous... | positive |
| 26 | tourists describe scenes of horror in tunisian... | [-1.1378378868103027, -1.2116812467575073, -0.... | negative | 5.0 | tourists describe scenes of horror in tunisian... | negative |
| 27 | is your christmas present spying on you? | [-0.8687025308609009, -0.0924399271607399, -0.... | negative | 3.0 | is your christmas present spying on you? | negative |
| 28 | area man busts his ass all day, and for what? | [-2.376340627670288, -0.3160030245780945, -0.4... | negative | 5.0 | area man busts his ass all day, and for what? | positive |
| 29 | department of interior employee caught embezzl... | [-1.1687604188919067, 0.5776483416557312, -0.6... | negative | 2.0 | department of interior employee caught embezzl... | positive |
| 30 | new software yellows neglected digital photos ... | [-0.773508608341217, 0.5193011164665222, -0.54... | negative | 9.0 | new software yellows neglected digital photos ... | positive |
| 31 | read live updates on the cnn democratic debate | [-1.1500324010849, -0.5574671626091003, 0.3618... | negative | 8.0 | read live updates on the cnn democratic debate | negative |
| 32 | new poultry stripe gum hardly tastes like goos... | [-1.2104802131652832, 0.2168104648590088, -0.0... | negative | 1.0 | new poultry stripe gum hardly tastes like goos... | positive |
| 33 | new film takes an honest look at life with a t... | [-0.27279043197631836, -0.44589897990226746, -... | negative | 4.0 | new film takes an honest look at life with a t... | negative |
| 34 | vice president pence pushes expansive nato and... | [0.06816250830888748, 0.562971830368042, -0.57... | negative | 3.0 | vice president pence pushes expansive nato and... | negative |
| 35 | mom tucks handwritten guide on how to use netf... | [-1.1668932437896729, 0.6492086052894592, -0.3... | negative | 2.0 | mom tucks handwritten guide on how to use netf... | positive |
| 36 | man insists facebook friend actually reads 'wh... | [-0.41535136103630066, -0.6858690977096558, -0... | negative | 7.0 | man insists facebook friend actually reads 'wh... | positive |
| 37 | internet explorer makes desperate overture to ... | [-0.058107126504182816, -0.19895373284816742, ... | negative | 2.0 | internet explorer makes desperate overture to ... | positive |
| 38 | rock song takes pro-rock stance | [-0.8466075658798218, -0.350190132856369, 0.33... | negative | 4.0 | rock song takes pro-rock stance | positive |
| 39 | college still looking for absolute saddest pla... | [-0.8797492384910583, -0.09284910559654236, -0... | negative | 9.0 | college still looking for absolute saddest pla... | positive |
| 40 | alton sterling's family demands action from ba... | [-1.123725175857544, 0.27766698598861694, -0.4... | negative | 3.0 | alton sterling's family demands action from ba... | negative |
| 41 | ronda rousey wants to show you how ripped she ... | [-0.704077959060669, 0.4832143783569336, -0.43... | negative | 1.0 | ronda rousey wants to show you how ripped she ... | negative |
| 42 | the workplace revolution: adding company cultu... | [-0.6472877860069275, 0.819389283657074, -0.50... | negative | 2.0 | the workplace revolution: adding company cultu... | negative |
| 43 | wolf blitzer walks into middle of olive garden... | [-1.089509129524231, -0.010750504210591316, -0... | negative | 1.0 | wolf blitzer walks into middle of olive garden... | positive |
| 44 | grandchild, grandfather equally dreading colla... | [-0.9178588390350342, -0.05760730057954788, -0... | negative | 4.0 | grandchild, grandfather equally dreading colla... | positive |
| 45 | congress reassures nervous zuckerberg they won... | [-0.723938524723053, 0.5092206597328186, -0.02... | negative | 1.0 | congress reassures nervous zuckerberg they won... | positive |
| 46 | senators lured back to emergency session by pr... | [-2.088550567626953, 0.19368097186088562, -0.7... | negative | 3.0 | senators lured back to emergency session by pr... | positive |
| 47 | powerball officials remove plastic balls from ... | [-1.236419916152954, 0.6744992733001709, -0.52... | negative | 1.0 | powerball officials remove plastic balls from ... | positive |
| 48 | rude guy unfortunately says something funny | [-0.9946603775024414, -0.6593665480613708, 0.3... | negative | 6.0 | rude guy unfortunately says something funny | positive |
| 49 | the room i carry with me | [-1.4270148277282715, 0.47338584065437317, -0.... | negative | 4.0 | the room i carry with me | negative |
\n", + "
\n" + ], + "text/plain": [ + " document \\\n", + "0 kids in bus accident mocked by kids in passing... \n", + "1 these two words are stealing your freedom \n", + "2 bush vows to do 'that thing gore just said, on... \n", + "3 scotland's parliament backs new independence r... \n", + "4 joe biden: being vice president is 'a bitch' \n", + "5 u.s. army now just chasing single remaining is... \n", + "6 sheryl crow's freshness date expires \n", + "7 new ebola quarantine protocol seen as barrier ... \n", + "8 the secret to building a successful business t... \n", + "9 school for crime \n", + "10 want to increase trust? increase your say/do r... \n", + "11 when autocorrect and sexting collide \n", + "12 the house science committee doesn't seem to un... \n", + "13 eric clapton wows audience with even slower ve... \n", + "14 rick perry returning to iowa \n", + "15 one beer can't do local alcoholic any harm \n", + "16 couple at point where they're comfortable usin... \n", + "17 watch how mountains of trash spread across the... \n", + "18 lowly mortal opens portal to hell \n", + "19 stephen hawking reportedly working on juicy te... \n", + "20 offshorers demand: no taxes, no risk \n", + "21 troubling study finds majority of americans wh... \n", + "22 annoying man more annoying after skydiving \n", + "23 robin hood foundation \n", + "24 passenger glued to airplane window like it fuc... \n", + "25 amazingly humanlike robot able to commit thous... \n", + "26 tourists describe scenes of horror in tunisian... \n", + "27 is your christmas present spying on you? \n", + "28 area man busts his ass all day, and for what? \n", + "29 department of interior employee caught embezzl... \n", + "30 new software yellows neglected digital photos ... \n", + "31 read live updates on the cnn democratic debate \n", + "32 new poultry stripe gum hardly tastes like goos... \n", + "33 new film takes an honest look at life with a t... \n", + "34 vice president pence pushes expansive nato and... 
\n", + "35 mom tucks handwritten guide on how to use netf... \n", + "36 man insists facebook friend actually reads 'wh... \n", + "37 internet explorer makes desperate overture to ... \n", + "38 rock song takes pro-rock stance \n", + "39 college still looking for absolute saddest pla... \n", + "40 alton sterling's family demands action from ba... \n", + "41 ronda rousey wants to show you how ripped she ... \n", + "42 the workplace revolution: adding company cultu... \n", + "43 wolf blitzer walks into middle of olive garden... \n", + "44 grandchild, grandfather equally dreading colla... \n", + "45 congress reassures nervous zuckerberg they won... \n", + "46 senators lured back to emergency session by pr... \n", + "47 powerball officials remove plastic balls from ... \n", + "48 rude guy unfortunately says something funny \n", + "49 the room i carry with me \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.7924436926841736, -0.07141707092523575, -0... negative \n", + "1 [-1.028885006904602, -0.7214630246162415, 0.12... negative \n", + "2 [-1.7108917236328125, 0.2942552864551544, -0.3... negative \n", + "3 [0.14012272655963898, 0.7012102603912354, -0.0... negative \n", + "4 [-0.571045458316803, 0.5463784337043762, -0.17... negative \n", + "5 [-0.9116203188896179, -0.3868906497955322, -0.... negative \n", + "6 [-1.388686180114746, 0.5200713872909546, -0.43... negative \n", + "7 [-0.6295406222343445, 0.43973761796951294, -0.... negative \n", + "8 [-0.1442088484764099, 0.5362765192985535, -0.5... negative \n", + "9 [-1.4345587491989136, 0.35326844453811646, -1.... negative \n", + "10 [-1.2950726747512817, 1.153881311416626, -0.36... negative \n", + "11 [-1.1433782577514648, 0.25382623076438904, -0.... negative \n", + "12 [-0.24199619889259338, 0.8295924663543701, -0.... negative \n", + "13 [-0.12445597350597382, -0.2018168419599533, 0.... negative \n", + "14 [-1.3457694053649902, 0.5665305852890015, -0.5... 
negative \n", + "15 [-1.3927576541900635, 0.6284745335578918, -0.5... negative \n", + "16 [-1.5534732341766357, 0.2146364003419876, -0.6... negative \n", + "17 [-1.8412539958953857, 0.1492537260055542, -0.0... negative \n", + "18 [-0.8720283508300781, -1.011899709701538, -0.3... negative \n", + "19 [-0.7349460124969482, -0.011331072077155113, 0... negative \n", + "20 [-1.0823032855987549, 1.29856276512146, -0.915... negative \n", + "21 [-0.7015470862388611, -0.17110034823417664, -0... negative \n", + "22 [-1.4149224758148193, -1.0473958253860474, 0.1... negative \n", + "23 [-0.057901788502931595, 0.8722413778305054, -0... negative \n", + "24 [-0.33463752269744873, 0.7636221051216125, -0.... negative \n", + "25 [-1.2185980081558228, -0.9673603177070618, -0.... negative \n", + "26 [-1.1378378868103027, -1.2116812467575073, -0.... negative \n", + "27 [-0.8687025308609009, -0.0924399271607399, -0.... negative \n", + "28 [-2.376340627670288, -0.3160030245780945, -0.4... negative \n", + "29 [-1.1687604188919067, 0.5776483416557312, -0.6... negative \n", + "30 [-0.773508608341217, 0.5193011164665222, -0.54... negative \n", + "31 [-1.1500324010849, -0.5574671626091003, 0.3618... negative \n", + "32 [-1.2104802131652832, 0.2168104648590088, -0.0... negative \n", + "33 [-0.27279043197631836, -0.44589897990226746, -... negative \n", + "34 [0.06816250830888748, 0.562971830368042, -0.57... negative \n", + "35 [-1.1668932437896729, 0.6492086052894592, -0.3... negative \n", + "36 [-0.41535136103630066, -0.6858690977096558, -0... negative \n", + "37 [-0.058107126504182816, -0.19895373284816742, ... negative \n", + "38 [-0.8466075658798218, -0.350190132856369, 0.33... negative \n", + "39 [-0.8797492384910583, -0.09284910559654236, -0... negative \n", + "40 [-1.123725175857544, 0.27766698598861694, -0.4... negative \n", + "41 [-0.704077959060669, 0.4832143783569336, -0.43... negative \n", + "42 [-0.6472877860069275, 0.819389283657074, -0.50... 
negative \n", + "43 [-1.089509129524231, -0.010750504210591316, -0... negative \n", + "44 [-0.9178588390350342, -0.05760730057954788, -0... negative \n", + "45 [-0.723938524723053, 0.5092206597328186, -0.02... negative \n", + "46 [-2.088550567626953, 0.19368097186088562, -0.7... negative \n", + "47 [-1.236419916152954, 0.6744992733001709, -0.52... negative \n", + "48 [-0.9946603775024414, -0.6593665480613708, 0.3... negative \n", + "49 [-1.4270148277282715, 0.47338584065437317, -0.... negative \n", + "\n", + " sentiment_confidence text \\\n", + "0 7.0 kids in bus accident mocked by kids in passing... \n", + "1 1.0 these two words are stealing your freedom \n", + "2 1.0 bush vows to do 'that thing gore just said, on... \n", + "3 5.0 scotland's parliament backs new independence r... \n", + "4 2.0 joe biden: being vice president is 'a bitch' \n", + "5 1.0 u.s. army now just chasing single remaining is... \n", + "6 1.0 sheryl crow's freshness date expires \n", + "7 1.0 new ebola quarantine protocol seen as barrier ... \n", + "8 4.0 the secret to building a successful business t... \n", + "9 1.0 school for crime \n", + "10 1.0 want to increase trust? increase your say/do r... \n", + "11 1.0 when autocorrect and sexting collide \n", + "12 7.0 the house science committee doesn't seem to un... \n", + "13 6.0 eric clapton wows audience with even slower ve... \n", + "14 1.0 rick perry returning to iowa \n", + "15 6.0 one beer can't do local alcoholic any harm \n", + "16 2.0 couple at point where they're comfortable usin... \n", + "17 4.0 watch how mountains of trash spread across the... \n", + "18 6.0 lowly mortal opens portal to hell \n", + "19 5.0 stephen hawking reportedly working on juicy te... \n", + "20 2.0 offshorers demand: no taxes, no risk \n", + "21 5.0 troubling study finds majority of americans wh... \n", + "22 1.0 annoying man more annoying after skydiving \n", + "23 6.0 robin hood foundation \n", + "24 1.0 passenger glued to airplane window like it fuc... 
\n", + "25 3.0 amazingly humanlike robot able to commit thous... \n", + "26 5.0 tourists describe scenes of horror in tunisian... \n", + "27 3.0 is your christmas present spying on you? \n", + "28 5.0 area man busts his ass all day, and for what? \n", + "29 2.0 department of interior employee caught embezzl... \n", + "30 9.0 new software yellows neglected digital photos ... \n", + "31 8.0 read live updates on the cnn democratic debate \n", + "32 1.0 new poultry stripe gum hardly tastes like goos... \n", + "33 4.0 new film takes an honest look at life with a t... \n", + "34 3.0 vice president pence pushes expansive nato and... \n", + "35 2.0 mom tucks handwritten guide on how to use netf... \n", + "36 7.0 man insists facebook friend actually reads 'wh... \n", + "37 2.0 internet explorer makes desperate overture to ... \n", + "38 4.0 rock song takes pro-rock stance \n", + "39 9.0 college still looking for absolute saddest pla... \n", + "40 3.0 alton sterling's family demands action from ba... \n", + "41 1.0 ronda rousey wants to show you how ripped she ... \n", + "42 2.0 the workplace revolution: adding company cultu... \n", + "43 1.0 wolf blitzer walks into middle of olive garden... \n", + "44 4.0 grandchild, grandfather equally dreading colla... \n", + "45 1.0 congress reassures nervous zuckerberg they won... \n", + "46 3.0 senators lured back to emergency session by pr... \n", + "47 1.0 powerball officials remove plastic balls from ... 
\n", + "48 6.0 rude guy unfortunately says something funny \n", + "49 4.0 the room i carry with me \n", + "\n", + " y \n", + "0 positive \n", + "1 negative \n", + "2 positive \n", + "3 negative \n", + "4 negative \n", + "5 positive \n", + "6 positive \n", + "7 negative \n", + "8 negative \n", + "9 negative \n", + "10 negative \n", + "11 negative \n", + "12 negative \n", + "13 positive \n", + "14 negative \n", + "15 positive \n", + "16 positive \n", + "17 negative \n", + "18 positive \n", + "19 positive \n", + "20 negative \n", + "21 positive \n", + "22 positive \n", + "23 negative \n", + "24 positive \n", + "25 positive \n", + "26 negative \n", + "27 negative \n", + "28 positive \n", + "29 positive \n", + "30 positive \n", + "31 negative \n", + "32 positive \n", + "33 negative \n", + "34 negative \n", + "35 positive \n", + "36 positive \n", + "37 positive \n", + "38 positive \n", + "39 positive \n", + "40 negative \n", + "41 negative \n", + "42 negative \n", + "43 positive \n", + "44 positive \n", + "45 positive \n", + "46 positive \n", + "47 positive \n", + "48 positive \n", + "49 negative " + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# 7. 
Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxWFzQOhjWC8", + "outputId": "c87f2bce-fbf5-45d5-e6f2-1204f1897958" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + 
"nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') 
returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + "nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + 
"nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') 
returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + 
"nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + 
"nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj 
sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ], + "source": [ + "# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IKK_Ii_gjJfF", + "outputId": "14ac0c42-1969-435b-a834-d48601c15b2c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_768 download started this may take some time.\n", + "Approximate size to download 392.9 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.92 0.86 0.89 542\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.89 0.85 0.87 458\n", + "\n", + " accuracy 0.86 1000\n", + " macro avg 0.60 0.57 0.59 1000\n", + "weighted avg 0.91 0.86 0.88 1000\n", + "\n" + ] + } + ], + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n", + "# We usually need to train longer and use a smaller LR for non-USE sentence embeddings\n", + "# We could tune the 
hyperparameters further with hyperparameter tuning methods like grid search\n", + "# Longer training also tends to improve accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df[:1000])\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df[:1000],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1jxw3GnVGlI" + }, + "source": [ + "# 7.1 Evaluate on Test Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Fxx4yNkNVGFl", + "outputId": "542c6589-d1f1-4310-c0b3-0717bdd4111e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.76 0.79 0.78 53\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.82 0.60 0.69 47\n", + "\n", + " accuracy 0.70 100\n", + " macro avg 0.53 0.46 0.49 100\n", + "weighted avg 0.79 0.70 0.74 100\n", + "\n" + ] + } + ], + "source": [ + "preds = fitted_pipe.predict(test_df[:100],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 8. 
Let's save the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eLex095goHwm" + }, + "outputs": [], + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 9. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "id": "SO4uz45MoRgp", + "outputId": "a37045b7-f9d1-4fbc-fbd2-8ab0731d2614" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_from_disksentimentsentiment_confidence
0Aliens are immortal![0.3093056380748749, 0.1294729858636856, 0.065...negative0.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "text/plain": [ + " document sentence_embedding_from_disk \\\n", + "0 Aliens are immortal! [0.3093056380748749, 0.1294729858636856, 0.065... \n", + "\n", + " sentiment sentiment_confidence \n", + "0 negative 0.0 " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('Aliens are immortal!')\n", + "\n", + "preds" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "4edc51a7-f104-4c0e-a480-58f5a2f62d6e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: 
Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ], + "source": [ + "hdd_pipe.print_info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "aq3RCRU4wHsv" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [ + "zkufh760uvF3" + ], + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { 
+ "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo.ipynb index bd71d0e9..dc1fe968 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo.ipynb @@ -1 +1,2652 @@ -{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo.ipynb)\n","\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","With the [ClassifierDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#classifierdl-multi-class-text-classification) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. 
Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620215193255,"user_tz":-120,"elapsed":126309,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"ede615d6-a5e6-41a3-aa1e-1f0c3c841234"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 11:44:27-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","- 0%[ ] 0 --.-KB/s Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","- 100%[===================>] 1.63K --.-KB/s in 0.001s \n","\n","2021-05-05 11:44:27 (1.49 MB/s) - written to stdout [1671/1671]\n","\n","\u001b[K |████████████████████████████████| 204.8MB 58kB/s \n","\u001b[K |████████████████████████████████| 153kB 46.3MB/s \n","\u001b[K |████████████████████████████████| 204kB 18.2MB/s \n","\u001b[K |████████████████████████████████| 204kB 52.9MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download Stock Market Sentiment dataset \n","https://www.kaggle.com/yash612/stockmarket-sentiment-dataset"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620215194079,"user_tz":-120,"elapsed":127118,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"f161f492-1d88-4652-aeb8-1a0f67e0272c"},"source":["! wget http://ckl-it.de/wp-content/uploads/2020/11/stock_data.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 11:46:32-- http://ckl-it.de/wp-content/uploads/2020/11/stock_data.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 479973 (469K) [text/csv]\n","Saving to: ‘stock_data.csv’\n","\n","stock_data.csv 100%[===================>] 468.72K 846KB/s in 0.6s \n","\n","2021-05-05 11:46:33 (846 KB/s) - ‘stock_data.csv’ saved [479973/479973]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"uDGIOASY_fRj","executionInfo":{"status":"ok","timestamp":1620215245366,"user_tz":-120,"elapsed":178400,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"fcf975df-6fe9-4a6a-f7fd-10439b2b541c"},"source":["import nlu\n","sentiment = nlu.load('sentiment')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["analyze_sentiment download started this may take some time.\n","Approx size to download 4.9 
MB\n","[OK!]\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":76},"id":"U0ENiuMc_kyb","executionInfo":{"status":"ok","timestamp":1620215255157,"user_tz":-120,"elapsed":188186,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"b77e7958-75b4-4bf7-aff8-e416450b168a"},"source":["sentiment.predict(\"I'm very very not at all happy\")"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
tokensentimentsentenceorigin_indexspelldocumentsentiment_confidencetext
0[I'm, very, very, not, at, all, happy][positive][I'm very very not at all happy]8589934592[I'm, very, very, not, at, all, happy]I'm very very not at all happy[0.3043]I'm very very not at all happy
\n","
"],"text/plain":[" token ... text\n","0 [I'm, very, very, not, at, all, happy] ... I'm very very not at all happy\n","\n","[1 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":391},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620215255159,"user_tz":-120,"elapsed":188185,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"42f2b380-7742-49ee-f640-c62d4c969fce"},"source":["import pandas as pd\n","train_path = '/content/stock_data.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","# the label column must have name 'y' name be of type str\n","train_df.columns=['text','y']\n","train_df.y = train_df.y.astype(str)\n","train_df.y = train_df.y.str.replace('-1','negative')\n","train_df.y = train_df.y.str.replace('1','positive')\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
texty
0Kickers on my watchlist XIDE TIT SOQ PNK CPW B...positive
1user: AAP MOVIE. 55% return for the FEA/GEED i...positive
2user I'd be afraid to short AMZN - they are lo...positive
3MNTA Over 12.00positive
4OI Over 21.37positive
.........
5786Industry body CII said #discoms are likely to ...negative
5787#Gold prices slip below Rs 46,000 as #investor...negative
5788Workers at Bajaj Auto have agreed to a 10% wag...positive
5789#Sharemarket LIVE: Sensex off day’s high, up 6...positive
5790#Sensex, #Nifty climb off day's highs, still u...positive
\n","

5791 rows × 2 columns

\n","
"],"text/plain":[" text y\n","0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... positive\n","1 user: AAP MOVIE. 55% return for the FEA/GEED i... positive\n","2 user I'd be afraid to short AMZN - they are lo... positive\n","3 MNTA Over 12.00 positive\n","4 OI Over 21.37 positive\n","... ... ...\n","5786 Industry body CII said #discoms are likely to ... negative\n","5787 #Gold prices slip below Rs 46,000 as #investor... negative\n","5788 Workers at Bajaj Auto have agreed to a 10% wag... positive\n","5789 #Sharemarket LIVE: Sensex off day’s high, up 6... positive\n","5790 #Sensex, #Nifty climb off day's highs, still u... positive\n","\n","[5791 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":651},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620215420196,"user_tz":-120,"elapsed":353218,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"0eea644b-5461-4347-b47e-c300e441be80"},"source":["from sklearn.metrics import classification_report\n","import nlu \n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a datset with label and text columns\n","# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df)\n","\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","#sentence detector that is part of the pipe generates sone NaNs. lets drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.61 0.58 0.59 2106\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.79 0.71 0.75 3685\n","\n"," accuracy 0.66 5791\n"," macro avg 0.47 0.43 0.45 5791\n","weighted avg 0.73 0.66 0.69 5791\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentenceorigin_indexydocumenttrained_sentiment_confidencetexttrained_sentimentsentence_embedding_use
0[Kickers on my watchlist XIDE TIT SOQ PNK CPW ...0positiveKickers on my watchlist XIDE TIT SOQ PNK CPW B...0.985571Kickers on my watchlist XIDE TIT SOQ PNK CPW B...positive[0.006487144622951746, -0.042024899274110794, ...
1[user: AAP MOVIE. 55% return for the FEA/GEED ...1positiveuser: AAP MOVIE. 55% return for the FEA/GEED i...0.720251user: AAP MOVIE. 55% return for the FEA/GEED i...positive[0.04891366884112358, -0.07381151616573334, -0...
2[user I'd be afraid to short AMZN - they are l...2positiveuser I'd be afraid to short AMZN - they are lo...0.580390user I'd be afraid to short AMZN - they are lo...neutral[0.05556508153676987, -0.016491785645484924, 0...
3[MNTA Over 12.00]3positiveMNTA Over 12.000.941032MNTA Over 12.00positive[-0.010976563207805157, -0.029801178723573685,...
4[OI Over 21.37]4positiveOI Over 21.370.535012OI Over 21.37neutral[0.024849383160471916, 0.04679657891392708, -0...
...........................
5786[Industry body CII said #discoms are likely to...5786negativeIndustry body CII said #discoms are likely to ...0.939032Industry body CII said #discoms are likely to ...negative[0.020985640585422516, -0.03145354241132736, -...
5787[#Gold prices slip below Rs 46,000 as #investo...5787negative#Gold prices slip below Rs 46,000 as #investor...0.983991#Gold prices slip below Rs 46,000 as #investor...negative[0.05627664923667908, 0.012842322699725628, -0...
5788[Workers at Bajaj Auto have agreed to a 10% wa...5788positiveWorkers at Bajaj Auto have agreed to a 10% wag...0.918838Workers at Bajaj Auto have agreed to a 10% wag...negative[0.019935883581638336, -0.031780488789081573, ...
5789[#Sharemarket LIVE: Sensex off day’s high, up ...5789positive#Sharemarket LIVE: Sensex off day’s high, up 6...0.761864#Sharemarket LIVE: Sensex off day’s high, up 6...positive[0.0031773506198078394, -0.04296385496854782, ...
5790[#Sensex, #Nifty climb off day's highs, still ...5790positive#Sensex, #Nifty climb off day's highs, still u...0.904347#Sensex, #Nifty climb off day's highs, still u...positive[0.04823731631040573, -0.012027987278997898, -...
\n","

5791 rows × 8 columns

\n","
"],"text/plain":[" sentence ... sentence_embedding_use\n","0 [Kickers on my watchlist XIDE TIT SOQ PNK CPW ... ... [0.006487144622951746, -0.042024899274110794, ...\n","1 [user: AAP MOVIE. 55% return for the FEA/GEED ... ... [0.04891366884112358, -0.07381151616573334, -0...\n","2 [user I'd be afraid to short AMZN - they are l... ... [0.05556508153676987, -0.016491785645484924, 0...\n","3 [MNTA Over 12.00] ... [-0.010976563207805157, -0.029801178723573685,...\n","4 [OI Over 21.37] ... [0.024849383160471916, 0.04679657891392708, -0...\n","... ... ... ...\n","5786 [Industry body CII said #discoms are likely to... ... [0.020985640585422516, -0.03145354241132736, -...\n","5787 [#Gold prices slip below Rs 46,000 as #investo... ... [0.05627664923667908, 0.012842322699725628, -0...\n","5788 [Workers at Bajaj Auto have agreed to a 10% wa... ... [0.019935883581638336, -0.031780488789081573, ...\n","5789 [#Sharemarket LIVE: Sensex off day’s high, up ... ... [0.0031773506198078394, -0.04296385496854782, ...\n","5790 [#Sensex, #Nifty climb off day's highs, still ... ... [0.04823731631040573, -0.012027987278997898, -...\n","\n","[5791 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# Test the fitted pipe on new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":76},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620215420883,"user_tz":-120,"elapsed":353902,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"4ebad73c-0dc4-40cd-d173-a8134f269886"},"source":["fitted_pipe.predict(\"Bitcoin is going to the moon!\")"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentenceorigin_indexdocumenttrained_sentiment_confidencetrained_sentimentsentence_embedding_use
0[Bitcoin is going to the moon!]0Bitcoin is going to the moon!0.713436positive[0.06468033790588379, -0.040837567299604416, -...
\n","
"],"text/plain":[" sentence ... sentence_embedding_use\n","0 [Bitcoin is going to the moon!] ... [0.06468033790588379, -0.040837567299604416, -...\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620215420884,"user_tz":-120,"elapsed":353900,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"0a8cbe1d-a0be-47d9-e9d4-fac1f907b178"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@4ae24199) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@4ae24199\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":553},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620215480821,"user_tz":-120,"elapsed":413834,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"a07fb934-0fcb-4213-f693-118b873437c5"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","#sentence detector that is part of the pipe generates sone NaNs. 
lets drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.83 0.38 0.52 2106\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.75 0.94 0.83 3685\n","\n"," accuracy 0.74 5791\n"," macro avg 0.53 0.44 0.45 5791\n","weighted avg 0.78 0.74 0.72 5791\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentenceorigin_indexydocumenttrained_sentiment_confidencetexttrained_sentimentsentence_embedding_use
0[Kickers on my watchlist XIDE TIT SOQ PNK CPW ...0positiveKickers on my watchlist XIDE TIT SOQ PNK CPW B...1.000000Kickers on my watchlist XIDE TIT SOQ PNK CPW B...positive[0.006487144622951746, -0.042024899274110794, ...
1[user: AAP MOVIE. 55% return for the FEA/GEED ...1positiveuser: AAP MOVIE. 55% return for the FEA/GEED i...0.934187user: AAP MOVIE. 55% return for the FEA/GEED i...positive[0.04891366884112358, -0.07381151616573334, -0...
2[user I'd be afraid to short AMZN - they are l...2positiveuser I'd be afraid to short AMZN - they are lo...0.625671user I'd be afraid to short AMZN - they are lo...negative[0.05556508153676987, -0.016491785645484924, 0...
3[MNTA Over 12.00]3positiveMNTA Over 12.000.999983MNTA Over 12.00positive[-0.010976563207805157, -0.029801178723573685,...
4[OI Over 21.37]4positiveOI Over 21.370.985523OI Over 21.37positive[0.024849383160471916, 0.04679657891392708, -0...
...........................
5786[Industry body CII said #discoms are likely to...5786negativeIndustry body CII said #discoms are likely to ...0.733400Industry body CII said #discoms are likely to ...negative[0.020985640585422516, -0.03145354241132736, -...
5787[#Gold prices slip below Rs 46,000 as #investo...5787negative#Gold prices slip below Rs 46,000 as #investor...0.967702#Gold prices slip below Rs 46,000 as #investor...negative[0.05627664923667908, 0.012842322699725628, -0...
5788[Workers at Bajaj Auto have agreed to a 10% wa...5788positiveWorkers at Bajaj Auto have agreed to a 10% wag...0.778937Workers at Bajaj Auto have agreed to a 10% wag...negative[0.019935883581638336, -0.031780488789081573, ...
5789[#Sharemarket LIVE: Sensex off day’s high, up ...5789positive#Sharemarket LIVE: Sensex off day’s high, up 6...0.999009#Sharemarket LIVE: Sensex off day’s high, up 6...positive[0.0031773506198078394, -0.04296385496854782, ...
5790[#Sensex, #Nifty climb off day's highs, still ...5790positive#Sensex, #Nifty climb off day's highs, still u...0.999243#Sensex, #Nifty climb off day's highs, still u...positive[0.04823731631040573, -0.012027987278997898, -...
\n","

5791 rows × 8 columns

\n","
"],"text/plain":[" sentence ... sentence_embedding_use\n","0 [Kickers on my watchlist XIDE TIT SOQ PNK CPW ... ... [0.006487144622951746, -0.042024899274110794, ...\n","1 [user: AAP MOVIE. 55% return for the FEA/GEED ... ... [0.04891366884112358, -0.07381151616573334, -0...\n","2 [user I'd be afraid to short AMZN - they are l... ... [0.05556508153676987, -0.016491785645484924, 0...\n","3 [MNTA Over 12.00] ... [-0.010976563207805157, -0.029801178723573685,...\n","4 [OI Over 21.37] ... [0.024849383160471916, 0.04679657891392708, -0...\n","... ... ... ...\n","5786 [Industry body CII said #discoms are likely to... ... [0.020985640585422516, -0.03145354241132736, -...\n","5787 [#Gold prices slip below Rs 46,000 as #investo... ... [0.05627664923667908, 0.012842322699725628, -0...\n","5788 [Workers at Bajaj Auto have agreed to a 10% wa... ... [0.019935883581638336, -0.031780488789081573, ...\n","5789 [#Sharemarket LIVE: Sensex off day’s high, up ... ... [0.0031773506198078394, -0.04296385496854782, ...\n","5790 [#Sensex, #Nifty climb off day's highs, still ... ... [0.04823731631040573, -0.012027987278997898, -...\n","\n","[5791 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# Try training with different Embeddings"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nxWFzQOhjWC8","executionInfo":{"status":"ok","timestamp":1620215481023,"user_tz":-120,"elapsed":414033,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"69e55a4f-67b2-48e2-a5f1-395f5486dc31"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possibler sentence embedding we could use. 
Lets use bert!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model 
sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model 
sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":651},"id":"IKK_Ii_gjJfF","executionInfo":{"status":"ok","timestamp":1620215663304,"user_tz":-120,"elapsed":596311,"user":{"displayName":"Christian Kasim 
Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"776428c8-8135-4f41-edfd-358ee5572e2f"},"source":["trainable_pipe = nlu.load('embed_sentence.bert train.sentiment')\n","# We usually need to train longer and use a smaller LR for NON-USE based sentence embeddings\n","# We could tune the hyperparameters further with hyperparameter tuning methods like gridsearch\n","# Also longer training gives more accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(40) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L2_128 download started this may take some time.\n","Approximate size to download 16.1 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.68 0.24 0.35 2106\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.72 0.84 0.77 3685\n","\n"," accuracy 0.62 5791\n"," macro avg 0.47 0.36 0.37 5791\n","weighted avg 0.71 0.62 0.62 5791\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentenceorigin_indexydocumenttrained_sentiment_confidencetrained_sentimenttextsentence_embedding_bert
0[Kickers on my watchlist XIDE TIT SOQ PNK CPW ...0positiveKickers on my watchlist XIDE TIT SOQ PNK CPW B...0.864774positiveKickers on my watchlist XIDE TIT SOQ PNK CPW B...[-0.9207566380500793, 0.21013399958610535, 0.1...
1[user: AAP MOVIE. 55% return for the FEA/GEED ...1positiveuser: AAP MOVIE. 55% return for the FEA/GEED i...0.648291positiveuser: AAP MOVIE. 55% return for the FEA/GEED i...[-0.43004706501960754, 0.5101228952407837, -0....
2[user I'd be afraid to short AMZN - they are l...2positiveuser I'd be afraid to short AMZN - they are lo...0.793143positiveuser I'd be afraid to short AMZN - they are lo...[0.3040030300617218, 0.22862930595874786, -0.5...
3[MNTA Over 12.00]3positiveMNTA Over 12.000.964940positiveMNTA Over 12.00[-1.8103482723236084, -0.4799136519432068, -0....
4[OI Over 21.37]4positiveOI Over 21.370.959243positiveOI Over 21.37[-2.4639298915863037, 0.3879586458206177, -0.6...
...........................
5786[Industry body CII said #discoms are likely to...5786negativeIndustry body CII said #discoms are likely to ...0.753365negativeIndustry body CII said #discoms are likely to ...[-0.09503882378339767, 0.6293947696685791, 0.0...
5787[#Gold prices slip below Rs 46,000 as #investo...5787negative#Gold prices slip below Rs 46,000 as #investor...0.724050negative#Gold prices slip below Rs 46,000 as #investor...[-0.12879370152950287, 0.28170245885849, 0.028...
5788[Workers at Bajaj Auto have agreed to a 10% wa...5788positiveWorkers at Bajaj Auto have agreed to a 10% wag...0.781417negativeWorkers at Bajaj Auto have agreed to a 10% wag...[-0.3395586907863617, 0.9124063849449158, -0.3...
5789[#Sharemarket LIVE: Sensex off day’s high, up ...5789positive#Sharemarket LIVE: Sensex off day’s high, up 6...0.520319neutral#Sharemarket LIVE: Sensex off day’s high, up 6...[-0.6081282496452332, 0.2732301652431488, 0.25...
5790[#Sensex, #Nifty climb off day's highs, still ...5790positive#Sensex, #Nifty climb off day's highs, still u...0.542672neutral#Sensex, #Nifty climb off day's highs, still u...[-0.4486270248889923, 0.43264666199684143, 0.0...
\n","

5791 rows × 8 columns

\n","
"],"text/plain":[" sentence ... sentence_embedding_bert\n","0 [Kickers on my watchlist XIDE TIT SOQ PNK CPW ... ... [-0.9207566380500793, 0.21013399958610535, 0.1...\n","1 [user: AAP MOVIE. 55% return for the FEA/GEED ... ... [-0.43004706501960754, 0.5101228952407837, -0....\n","2 [user I'd be afraid to short AMZN - they are l... ... [0.3040030300617218, 0.22862930595874786, -0.5...\n","3 [MNTA Over 12.00] ... [-1.8103482723236084, -0.4799136519432068, -0....\n","4 [OI Over 21.37] ... [-2.4639298915863037, 0.3879586458206177, -0.6...\n","... ... ... ...\n","5786 [Industry body CII said #discoms are likely to... ... [-0.09503882378339767, 0.6293947696685791, 0.0...\n","5787 [#Gold prices slip below Rs 46,000 as #investo... ... [-0.12879370152950287, 0.28170245885849, 0.028...\n","5788 [Workers at Bajaj Auto have agreed to a 10% wa... ... [-0.3395586907863617, 0.9124063849449158, -0.3...\n","5789 [#Sharemarket LIVE: Sensex off day’s high, up ... ... [-0.6081282496452332, 0.2732301652431488, 0.25...\n","5790 [#Sensex, #Nifty climb off day's highs, still ... ... [-0.4486270248889923, 0.43264666199684143, 0.0...\n","\n","[5791 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":11}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 5. 
Let's save the model"]},{"cell_type":"code","metadata":{"id":"eLex095goHwm","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620215683754,"user_tz":-120,"elapsed":616758,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"d32a77d0-d25b-414d-e849-d6fee69b169c"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 6. Let's load the model from HDD.\n","This makes offline NLU usage possible! \n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"id":"SO4uz45MoRgp","colab":{"base_uri":"https://localhost:8080/","height":76},"executionInfo":{"status":"ok","timestamp":1620215688728,"user_tz":-120,"elapsed":621729,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"d8ab883e-ae6e-442f-e92c-c2ce7147e3ee"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('Tesla plans to invest 10M into the ML sector')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentimentsentenceorigin_indexdocumentsentence_embedding_from_disksentiment_confidencetext
0[positive][Tesla plans to invest 10M into the ML sector]8589934592Tesla plans to invest 10M into the ML sector[[-0.07111598551273346, 0.9532928466796875, -1...[0.9114534]Tesla plans to invest 10M into the ML sector
\n","
"],"text/plain":[" sentiment ... text\n","0 [positive] ... Tesla plans to invest 10M into the ML sector\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":13}]},{"cell_type":"code","metadata":{"id":"e0CVlkk9v6Qi","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620215688730,"user_tz":-120,"elapsed":621726,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"e5f153e0-cb42-4779-dfa2-ccefb4f1fba4"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@63c37ce4) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@63c37ce4\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L2_128'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 
128\n","pipe['bert_sentence@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n",">>> pipe['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L2_128'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"73rQbUy-KLpb"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo.ipynb)\n", + "\n", + "\n", + "\n", + "# Training a Sentiment Analysis 
Classifier with NLU\n", + "With the [ClassifierDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#classifierdl-multi-class-text-classification) from Spark NLP you can achieve state-of-the-art results on any multi-class text classification problem\n", + "\n", + "This notebook showcases the following features:\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (Enables NLU offline mode)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Install Java 8 and NLU" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hFGnBCHavltY" + }, + "outputs": [], + "source": [ + "!pip install -q johnsnowlabs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download Stock Market Sentiment dataset\n", + "https://www.kaggle.com/yash612/stockmarket-sentiment-dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "outputs": [], + "source": [ + "! 
wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/stock_data/stock_data.csv\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "uDGIOASY_fRj", + "outputId": "bdd53b41-7b1e-47e8-9783-b4d8e3315d84" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sentimentdl_glove_imdb download started this may take some time.\n", + "Approximate size to download 8.7 MB\n", + "[OK!]\n", + "glove_100d download started this may take some time.\n", + "Approximate size to download 145.3 MB\n", + "[OK!]\n" + ] + } + ], + "source": [ + "from johnsnowlabs import nlp\n", + "sentiment = nlp.load('sentiment')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "U0ENiuMc_kyb", + "outputId": "5d5d2dc8-7481-468c-ff8c-0c45fdc0b0c7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sentencesentence_embedding_convertersentimentsentiment_confidenceword_embedding_glove
0I'm very very not at all happy[-0.2865465581417084, 0.25398728251457214, 0.2...pos0.999995[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "text/plain": [ + " sentence \\\n", + "0 I'm very very not at all happy \n", + "\n", + " sentence_embedding_converter sentiment \\\n", + "0 [-0.2865465581417084, 0.25398728251457214, 0.2... pos \n", + "\n", + " sentiment_confidence word_embedding_glove \n", + "0 0.999995 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sentiment.predict(\"I'm very very not at all happy\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "c486e2d0-708b-4519-b43c-7427900236cf" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
texty
0Kickers on my watchlist XIDE TIT SOQ PNK CPW B...positive
1user: AAP MOVIE. 55% return for the FEA/GEED i...positive
2user I'd be afraid to short AMZN - they are lo...positive
3MNTA Over 12.00positive
4OI Over 21.37positive
.........
5786Industry body CII said #discoms are likely to ...negative
5787#Gold prices slip below Rs 46,000 as #investor...negative
5788Workers at Bajaj Auto have agreed to a 10% wag...positive
5789#Sharemarket LIVE: Sensex off day’s high, up 6...positive
5790#Sensex, #Nifty climb off day's highs, still u...positive
\n", + "

5791 rows × 2 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
 \n" + ], + "text/plain": [ + " text y\n", + "0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... positive\n", + "1 user: AAP MOVIE. 55% return for the FEA/GEED i... positive\n", + "2 user I'd be afraid to short AMZN - they are lo... positive\n", + "3 MNTA Over 12.00 positive\n", + "4 OI Over 21.37 positive\n", + "... ... ...\n", + "5786 Industry body CII said #discoms are likely to ... negative\n", + "5787 #Gold prices slip below Rs 46,000 as #investor... negative\n", + "5788 Workers at Bajaj Auto have agreed to a 10% wag... positive\n", + "5789 #Sharemarket LIVE: Sensex off day’s high, up 6... positive\n", + "5790 #Sensex, #Nifty climb off day's highs, still u... positive\n", + "\n", + "[5791 rows x 2 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "train_path = '/content/stock_data.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text'\n", + "# the label column must be named 'y' and be of type str\n", + "train_df.columns=['text','y']\n", + "train_df.y = train_df.y.astype(str)\n", + "train_df.y = train_df.y.str.replace('-1','negative')\n", + "train_df.y = train_df.y.str.replace('1','positive')\n", + "train_df" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. 
Train Deep Learning Classifier using nlu.load('train.sentiment')\n", + "\n", + "Your dataset's label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 806 + }, + "id": "3ZIPkRkWftBG", + "outputId": "91874f31-7e1a-4180-c354-5317930d5c4c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 2106\n", + " positive 0.64 1.00 0.78 3685\n", + "\n", + " accuracy 0.64 5791\n", + " macro avg 0.32 0.50 0.39 5791\n", + "weighted avg 0.40 0.64 0.49 5791\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0Kickers on my watchlist XIDE TIT SOQ PNK CPW B...[-0.9530864357948303, 0.2135828286409378, 0.10...positive1.0Kickers on my watchlist XIDE TIT SOQ PNK CPW B...positive
1user: AAP MOVIE. 55% return for the FEA/GEED i...[-0.4725969433784485, 0.5354134440422058, -0.2...positive1.0user: AAP MOVIE. 55% return for the FEA/GEED i...positive
2user I'd be afraid to short AMZN - they are lo...[0.30400288105010986, 0.22862982749938965, -0....positive1.0user I'd be afraid to short AMZN - they are lo...positive
3MNTA Over 12.00[-1.707902193069458, -0.48472753167152405, -0....positive1.0MNTA Over 12.00positive
4OI Over 21.37[-2.3011534214019775, 0.2649511396884918, -0.4...positive1.0OI Over 21.37positive
.....................
5786Industry body CII said #discoms are likely to ...[-0.21655204892158508, 0.6153537631034851, 0.0...positive1.0Industry body CII said #discoms are likely to ...negative
5787#Gold prices slip below Rs 46,000 as #investor...[-0.19915254414081573, 0.2607441842556, 0.0032...positive1.0#Gold prices slip below Rs 46,000 as #investor...negative
5788Workers at Bajaj Auto have agreed to a 10% wag...[-0.4361518919467926, 0.9346759915351868, -0.3...positive1.0Workers at Bajaj Auto have agreed to a 10% wag...positive
5789#Sharemarket LIVE: Sensex off day’s high, up 6...[-0.6081278920173645, 0.2732301354408264, 0.25...positive1.0#Sharemarket LIVE: Sensex off day’s high, up 6...positive
5790#Sensex, #Nifty climb off day's highs, still u...[-0.5274896621704102, 0.4326432943344116, 0.06...positive1.0#Sensex, #Nifty climb off day's highs, still u...positive
\n", + "

5791 rows × 6 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "text/plain": [ + " document \\\n", + "0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... \n", + "1 user: AAP MOVIE. 55% return for the FEA/GEED i... \n", + "2 user I'd be afraid to short AMZN - they are lo... \n", + "3 MNTA Over 12.00 \n", + "4 OI Over 21.37 \n", + "... ... \n", + "5786 Industry body CII said #discoms are likely to ... \n", + "5787 #Gold prices slip below Rs 46,000 as #investor... \n", + "5788 Workers at Bajaj Auto have agreed to a 10% wag... \n", + "5789 #Sharemarket LIVE: Sensex off day’s high, up 6... \n", + "5790 #Sensex, #Nifty climb off day's highs, still u... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.9530864357948303, 0.2135828286409378, 0.10... positive \n", + "1 [-0.4725969433784485, 0.5354134440422058, -0.2... positive \n", + "2 [0.30400288105010986, 0.22862982749938965, -0.... positive \n", + "3 [-1.707902193069458, -0.48472753167152405, -0.... positive \n", + "4 [-2.3011534214019775, 0.2649511396884918, -0.4... positive \n", + "... ... ... \n", + "5786 [-0.21655204892158508, 0.6153537631034851, 0.0... positive \n", + "5787 [-0.19915254414081573, 0.2607441842556, 0.0032... positive \n", + "5788 [-0.4361518919467926, 0.9346759915351868, -0.3... positive \n", + "5789 [-0.6081278920173645, 0.2732301354408264, 0.25... positive \n", + "5790 [-0.5274896621704102, 0.4326432943344116, 0.06... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 1.0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... \n", + "1 1.0 user: AAP MOVIE. 55% return for the FEA/GEED i... \n", + "2 1.0 user I'd be afraid to short AMZN - they are lo... \n", + "3 1.0 MNTA Over 12.00 \n", + "4 1.0 OI Over 21.37 \n", + "... ... ... \n", + "5786 1.0 Industry body CII said #discoms are likely to ... \n", + "5787 1.0 #Gold prices slip below Rs 46,000 as #investor... \n", + "5788 1.0 Workers at Bajaj Auto have agreed to a 10% wag... \n", + "5789 1.0 #Sharemarket LIVE: Sensex off day’s high, up 6... 
\n", + "5790 1.0 #Sensex, #Nifty climb off day's highs, still u... \n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 positive \n", + "... ... \n", + "5786 negative \n", + "5787 negative \n", + "5788 positive \n", + "5789 positive \n", + "5790 positive \n", + "\n", + "[5791 rows x 6 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.metrics import classification_report\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# sentence embeddings are generated automatically; in this run the cell output shows sent_small_bert_L2_128 being downloaded\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "# the sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "d68ec313-a55c-4d59-c162-4b985b238ebc" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sentencesentence_embedding_small_bert_L2_128sentimentsentiment_confidence
0Bitcoin is going to the moon![-1.0531491041183472, -0.2827455699443817, -0....positive1.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "text/plain": [ + " sentence \\\n", + "0 Bitcoin is going to the moon! \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-1.0531491041183472, -0.2827455699443817, -0.... positive \n", + "\n", + " sentiment_confidence \n", + "0 1.0 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fitted_pipe.predict(\"Bitcoin is going to the moon!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "8b700f48-678d-4e18-fab7-b03a3eba15b0" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ], + "source": [ + "trainable_pipe.print_info()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 806 + }, + "id": "mptfvHx-MMMX", + "outputId": "57b240b3-3c7d-4fd1-a1e7-3d0f6212a02a" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 2106\n", + " positive 0.64 1.00 0.78 3685\n", + "\n", + " accuracy 0.64 5791\n", + " macro avg 0.32 0.50 0.39 5791\n", + "weighted avg 0.40 0.64 0.49 5791\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0Kickers on my watchlist XIDE TIT SOQ PNK CPW B...[-0.9530864357948303, 0.2135828286409378, 0.10...positive1.0Kickers on my watchlist XIDE TIT SOQ PNK CPW B...positive
1user: AAP MOVIE. 55% return for the FEA/GEED i...[-0.4725969433784485, 0.5354134440422058, -0.2...positive1.0user: AAP MOVIE. 55% return for the FEA/GEED i...positive
2user I'd be afraid to short AMZN - they are lo...[0.30400288105010986, 0.22862982749938965, -0....positive1.0user I'd be afraid to short AMZN - they are lo...positive
3MNTA Over 12.00[-1.707902193069458, -0.48472753167152405, -0....positive1.0MNTA Over 12.00positive
4OI Over 21.37[-2.3011534214019775, 0.2649511396884918, -0.4...positive1.0OI Over 21.37positive
.....................
5786Industry body CII said #discoms are likely to ...[-0.21655204892158508, 0.6153537631034851, 0.0...positive1.0Industry body CII said #discoms are likely to ...negative
5787#Gold prices slip below Rs 46,000 as #investor...[-0.19915254414081573, 0.2607441842556, 0.0032...positive1.0#Gold prices slip below Rs 46,000 as #investor...negative
5788Workers at Bajaj Auto have agreed to a 10% wag...[-0.4361518919467926, 0.9346759915351868, -0.3...positive1.0Workers at Bajaj Auto have agreed to a 10% wag...positive
5789#Sharemarket LIVE: Sensex off day’s high, up 6...[-0.6081278920173645, 0.2732301354408264, 0.25...positive1.0#Sharemarket LIVE: Sensex off day’s high, up 6...positive
5790#Sensex, #Nifty climb off day's highs, still u...[-0.5274896621704102, 0.4326432943344116, 0.06...positive1.0#Sensex, #Nifty climb off day's highs, still u...positive
\n", + "

5791 rows × 6 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "text/plain": [ + " document \\\n", + "0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... \n", + "1 user: AAP MOVIE. 55% return for the FEA/GEED i... \n", + "2 user I'd be afraid to short AMZN - they are lo... \n", + "3 MNTA Over 12.00 \n", + "4 OI Over 21.37 \n", + "... ... \n", + "5786 Industry body CII said #discoms are likely to ... \n", + "5787 #Gold prices slip below Rs 46,000 as #investor... \n", + "5788 Workers at Bajaj Auto have agreed to a 10% wag... \n", + "5789 #Sharemarket LIVE: Sensex off day’s high, up 6... \n", + "5790 #Sensex, #Nifty climb off day's highs, still u... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.9530864357948303, 0.2135828286409378, 0.10... positive \n", + "1 [-0.4725969433784485, 0.5354134440422058, -0.2... positive \n", + "2 [0.30400288105010986, 0.22862982749938965, -0.... positive \n", + "3 [-1.707902193069458, -0.48472753167152405, -0.... positive \n", + "4 [-2.3011534214019775, 0.2649511396884918, -0.4... positive \n", + "... ... ... \n", + "5786 [-0.21655204892158508, 0.6153537631034851, 0.0... positive \n", + "5787 [-0.19915254414081573, 0.2607441842556, 0.0032... positive \n", + "5788 [-0.4361518919467926, 0.9346759915351868, -0.3... positive \n", + "5789 [-0.6081278920173645, 0.2732301354408264, 0.25... positive \n", + "5790 [-0.5274896621704102, 0.4326432943344116, 0.06... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 1.0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... \n", + "1 1.0 user: AAP MOVIE. 55% return for the FEA/GEED i... \n", + "2 1.0 user I'd be afraid to short AMZN - they are lo... \n", + "3 1.0 MNTA Over 12.00 \n", + "4 1.0 OI Over 21.37 \n", + "... ... ... \n", + "5786 1.0 Industry body CII said #discoms are likely to ... \n", + "5787 1.0 #Gold prices slip below Rs 46,000 as #investor... \n", + "5788 1.0 Workers at Bajaj Auto have agreed to a 10% wag... \n", + "5789 1.0 #Sharemarket LIVE: Sensex off day’s high, up 6... 
\n", + "5790 1.0 #Sensex, #Nifty climb off day's highs, still u... \n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 positive \n", + "... ... \n", + "5786 negative \n", + "5787 negative \n", + "5788 positive \n", + "5789 positive \n", + "5790 positive \n", + "\n", + "[5791 rows x 6 columns]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "\n", + "# the sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxWFzQOhjWC8", + "outputId": "34e045c7-7402-448a-d52e-cb6ce06c9a00" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the 
following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj 
sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + 
"nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + "nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + 
"nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') 
returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark 
NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + 
"nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU 
provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ], + "source": [ + "# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. 
Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 858 + }, + "id": "IKK_Ii_gjJfF", + "outputId": "2650912d-7378-4149-9265-8d5e662ebeb8" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.69 0.24 0.36 2106\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.72 0.85 0.78 3685\n", + "\n", + " accuracy 0.63 5791\n", + " macro avg 0.47 0.36 0.38 5791\n", + "weighted avg 0.71 0.63 0.63 5791\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_bertsentimentsentiment_confidencetexty
0Kickers on my watchlist XIDE TIT SOQ PNK CPW B...[-0.9530864357948303, 0.2135828286409378, 0.10...positive0.0Kickers on my watchlist XIDE TIT SOQ PNK CPW B...positive
1user: AAP MOVIE. 55% return for the FEA/GEED i...[-0.4725969433784485, 0.5354134440422058, -0.2...positive0.0user: AAP MOVIE. 55% return for the FEA/GEED i...positive
2user I'd be afraid to short AMZN - they are lo...[0.30400288105010986, 0.22862982749938965, -0....positive0.0user I'd be afraid to short AMZN - they are lo...positive
3MNTA Over 12.00[-1.707902193069458, -0.48472753167152405, -0....positive0.0MNTA Over 12.00positive
4OI Over 21.37[-2.3011534214019775, 0.2649511396884918, -0.4...positive0.0OI Over 21.37positive
.....................
5786Industry body CII said #discoms are likely to ...[-0.21655204892158508, 0.6153537631034851, 0.0...negative0.0Industry body CII said #discoms are likely to ...negative
5787#Gold prices slip below Rs 46,000 as #investor...[-0.19915254414081573, 0.2607441842556, 0.0032...negative0.0#Gold prices slip below Rs 46,000 as #investor...negative
5788Workers at Bajaj Auto have agreed to a 10% wag...[-0.4361518919467926, 0.9346759915351868, -0.3...negative0.0Workers at Bajaj Auto have agreed to a 10% wag...positive
5789#Sharemarket LIVE: Sensex off day’s high, up 6...[-0.6081278920173645, 0.2732301354408264, 0.25...neutral0.0#Sharemarket LIVE: Sensex off day’s high, up 6...positive
5790#Sensex, #Nifty climb off day's highs, still u...[-0.5274896621704102, 0.4326432943344116, 0.06...neutral0.0#Sensex, #Nifty climb off day's highs, still u...positive
\n", + "

5791 rows × 6 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "text/plain": [ + " document \\\n", + "0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... \n", + "1 user: AAP MOVIE. 55% return for the FEA/GEED i... \n", + "2 user I'd be afraid to short AMZN - they are lo... \n", + "3 MNTA Over 12.00 \n", + "4 OI Over 21.37 \n", + "... ... \n", + "5786 Industry body CII said #discoms are likely to ... \n", + "5787 #Gold prices slip below Rs 46,000 as #investor... \n", + "5788 Workers at Bajaj Auto have agreed to a 10% wag... \n", + "5789 #Sharemarket LIVE: Sensex off day’s high, up 6... \n", + "5790 #Sensex, #Nifty climb off day's highs, still u... \n", + "\n", + " sentence_embedding_bert sentiment \\\n", + "0 [-0.9530864357948303, 0.2135828286409378, 0.10... positive \n", + "1 [-0.4725969433784485, 0.5354134440422058, -0.2... positive \n", + "2 [0.30400288105010986, 0.22862982749938965, -0.... positive \n", + "3 [-1.707902193069458, -0.48472753167152405, -0.... positive \n", + "4 [-2.3011534214019775, 0.2649511396884918, -0.4... positive \n", + "... ... ... \n", + "5786 [-0.21655204892158508, 0.6153537631034851, 0.0... negative \n", + "5787 [-0.19915254414081573, 0.2607441842556, 0.0032... negative \n", + "5788 [-0.4361518919467926, 0.9346759915351868, -0.3... negative \n", + "5789 [-0.6081278920173645, 0.2732301354408264, 0.25... neutral \n", + "5790 [-0.5274896621704102, 0.4326432943344116, 0.06... neutral \n", + "\n", + " sentiment_confidence text \\\n", + "0 0.0 Kickers on my watchlist XIDE TIT SOQ PNK CPW B... \n", + "1 0.0 user: AAP MOVIE. 55% return for the FEA/GEED i... \n", + "2 0.0 user I'd be afraid to short AMZN - they are lo... \n", + "3 0.0 MNTA Over 12.00 \n", + "4 0.0 OI Over 21.37 \n", + "... ... ... \n", + "5786 0.0 Industry body CII said #discoms are likely to ... \n", + "5787 0.0 #Gold prices slip below Rs 46,000 as #investor... \n", + "5788 0.0 Workers at Bajaj Auto have agreed to a 10% wag... \n", + "5789 0.0 #Sharemarket LIVE: Sensex off day’s high, up 6... 
\n", + "5790 0.0 #Sensex, #Nifty climb off day's highs, still u... \n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 positive \n", + "... ... \n", + "5786 negative \n", + "5787 negative \n", + "5788 positive \n", + "5789 positive \n", + "5790 positive \n", + "\n", + "[5791 rows x 6 columns]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trainable_pipe = nlp.load('embed_sentence.bert train.sentiment')\n", + "# We usually need to train longer and use a smaller LR for NON-USE based sentence embeddings\n", + "# We could tune the hyperparameters further with hyperparameter tuning methods like grid search\n", + "# Longer training also tends to improve accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(40)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "\n", + "# the sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 5. Let's save the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eLex095goHwm" + }, + "outputs": [], + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 6. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "id": "SO4uz45MoRgp", + "outputId": "f914162e-6c41-4355-ad11-20c92fd155d1" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_from_disksentimentsentiment_confidence
0Tesla plans to invest 10M into the ML sector[-0.07111673802137375, 0.9532930850982666, -1....positive0.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "text/plain": [ + " document \\\n", + "0 Tesla plans to invest 10M into the ML sector \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [-0.07111673802137375, 0.9532930850982666, -1.... positive \n", + "\n", + " sentiment_confidence \n", + "0 0.0 " + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('Tesla plans to invest 10M into the ML sector')\n", + "preds" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "20c1ef9d-6769-4a30-ebae-cb37c2ec9f1d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + 
"component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ], + "source": [ + "hdd_pipe.print_info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "73rQbUy-KLpb" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_IMDB.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_IMDB.ipynb index 2c362a88..da0e6868 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_IMDB.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_IMDB.ipynb @@ -1 +1,3115 @@ -{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo_IMDB.ipynb","provenance":[],"collapsed_sections":["zkufh760uvF3"],"toc_visible":true},"kernelspec":{"display_name":"Python 
3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_IMDB.ipynb)\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 class IMDB Movie sentiment classifier training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n","You can achieve these results or even better on this dataset with training data:\n","\n","\n","
\n","\n","\n","![image.png]()\n","\n","\n","\n","You can achieve these results or even better on this dataset with test data:\n","\n","\n","
\n","\n","\n","![image.png]()\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620193225748,"user_tz":-120,"elapsed":116140,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"9ac3d81e-7a53-4711-fa24-e7320fd99723"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:38:30-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","\r- 0%[ ] 0 --.-KB/s \rInstalling NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 05:38:30 (63.2 MB/s) - written to stdout [1671/1671]\n","\n","\u001b[K |████████████████████████████████| 204.8MB 67kB/s \n","\u001b[K |████████████████████████████████| 153kB 73.9MB/s \n","\u001b[K |████████████████████████████████| 204kB 23.9MB/s \n","\u001b[K |████████████████████████████████| 204kB 65.5MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download IMDB dataset\n","https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews\n","\n","IMDB dataset having 50K movie reviews for natural language processing or Text analytics.\n","This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. So, predict the number of positive and negative reviews using either classification or deep learning algorithms.\n","For more dataset information, please go through the following link,\n","http://ai.stanford.edu/~amaas/data/sentiment/"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620193226173,"user_tz":-120,"elapsed":116555,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"827e3c03-e794-491f-ef27-a5951ecfa716"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/01/IMDB-Dataset.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:40:25-- http://ckl-it.de/wp-content/uploads/2021/01/IMDB-Dataset.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 3288450 (3.1M) [text/csv]\n","Saving to: ‘IMDB-Dataset.csv’\n","\n","IMDB-Dataset.csv 100%[===================>] 3.14M 20.6MB/s in 0.2s \n","\n","2021-05-05 05:40:25 (20.6 MB/s) - ‘IMDB-Dataset.csv’ saved [3288450/3288450]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":406},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620193227021,"user_tz":-120,"elapsed":117397,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"91412e47-6ce0-4d25-b00d-e2238146895b"},"source":["import pandas as pd\n","train_path = '/content/IMDB-Dataset.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","# the label column must have name 'y' name be of type str\n","columns=['text','y']\n","train_df = train_df[columns]\n","from sklearn.model_selection import train_test_split\n","\n","train_df, test_df = train_test_split(train_df, test_size=0.2)\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
texty
1459Absolutely wonderful drama and Ros is top notc...positive
2428I was totally surprised just how good this mov...positive
2093This is a fine musical with a timeless score b...positive
2370Saw it yesterday night at the Midnight Slam of...negative
2064This show was Fabulous. It was intricate and w...positive
.........
353This film held my interest enough to watch it ...positive
2322After reading the other tepid reviews and comm...positive
1997This movie, despite its list of B, C, and D li...negative
1585One of the most magnificent movies ever made. ...positive
237This movie is not worth anything. I mean, if y...negative
\n","

2000 rows × 2 columns

\n","
"],"text/plain":[" text y\n","1459 Absolutely wonderful drama and Ros is top notc... positive\n","2428 I was totally surprised just how good this mov... positive\n","2093 This is a fine musical with a timeless score b... positive\n","2370 Saw it yesterday night at the Midnight Slam of... negative\n","2064 This show was Fabulous. It was intricate and w... positive\n","... ... ...\n","353 This film held my interest enough to watch it ... positive\n","2322 After reading the other tepid reviews and comm... positive\n","1997 This movie, despite its list of B, C, and D li... negative\n","1585 One of the most magnificent movies ever made. ... positive\n","237 This movie is not worth anything. I mean, if y... negative\n","\n","[2000 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620195366745,"user_tz":-120,"elapsed":11481,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"f826ae3a-a84b-4aab-f612-edb38026ac36"},"source":["import nlu \n","from sklearn.metrics import classification_report\n","\n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a datset with label and text columns\n","# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","#sentence detector that is part of the pipe generates sone NaNs. lets drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.82 0.88 0.85 26\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.85 0.71 0.77 24\n","\n"," accuracy 0.80 50\n"," macro avg 0.56 0.53 0.54 50\n","weighted avg 0.84 0.80 0.81 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentence_embedding_usetrained_sentiment_confidencetextdocumenttrained_sentimentsentenceyorigin_index
0[0.06351275742053986, 0.040804557502269745, -2...0.971989Absolutely wonderful drama and Ros is top notc...Absolutely wonderful drama and Ros is top notc...negative[Absolutely wonderful drama and Ros is top not...positive1459
1[0.06992700695991516, 0.0012252129381522536, -...0.972120I was totally surprised just how good this mov...I was totally surprised just how good this mov...positive[I was totally surprised just how good this mo...positive2428
2[0.052442848682403564, -0.017640504986047745, ...0.989897This is a fine musical with a timeless score b...This is a fine musical with a timeless score b...positive[This is a fine musical with a timeless score ...positive2093
3[0.04445473477244377, 0.018118631094694138, -0...0.999111Saw it yesterday night at the Midnight Slam of...Saw it yesterday night at the Midnight Slam of...negative[Saw it yesterday night at the Midnight Slam o...negative2370
4[0.01753336377441883, 0.05620148777961731, -0....0.954813This show was Fabulous. It was intricate and w...This show was Fabulous. It was intricate and w...positive[This show was Fabulous., It was intricate and...positive2064
5[0.02791544236242771, -0.028735769912600517, -...0.506074Jean Dujardin gets Connery's mannerisms down p...Jean Dujardin gets Connery's mannerisms down p...neutral[Jean Dujardin gets Connery's mannerisms down ...positive1988
6[0.06690913438796997, 0.03236781805753708, -0....0.982164This is a tongue in cheek movie from the very ...This is a tongue in cheek movie from the very ...positive[This is a tongue in cheek movie from the very...positive1483
7[0.05750656872987747, -0.059703629463911057, -...0.989845I have seen most, if not all of the Laurel & H...I have seen most, if not all of the Laurel & H...negative[I have seen most, if not all of the Laurel & ...negative207
8[-0.04340153560042381, -0.015529784373939037, ...0.991958Although there is melodrama at the center or r...Although there is melodrama at the center or r...positive[Although there is melodrama at the center or ...positive2030
9[-0.032444775104522705, -0.018744193017482758,...0.986264\"Hey Babu Riba\" is a film about a young woman,...\"Hey Babu Riba\" is a film about a young woman,...positive[\"Hey Babu Riba\" is a film about a young woman...positive397
10[0.04818158969283104, 0.006010068114846945, 0....0.991307I have to say the first I watched this film wa...I have to say the first I watched this film wa...negative[I have to say the first I watched this film w...negative1079
11[-0.011373912915587425, 0.033458177000284195, ...0.997292Well, there's no real plot to speak of, it's j...Well, there's no real plot to speak of, it's j...negative[Well, there's no real plot to speak of, it's ...negative308
12[0.059722382575273514, -0.04133075103163719, -...0.999755Why can't a movie be rated a zero? Or even a n...Why can't a movie be rated a zero? Or even a n...negative[Why can't a movie be rated a zero?, Or even a...negative175
13[-0.0024044793099164963, 0.01555465068668127, ...0.915433PREY <br /><br />Aspect ratio: 1.37:1<br /><br...PREY <br /><br />Aspect ratio: 1.37:1<br /><br...positive[PREY <br /><br />Aspect ratio: 1.37:1<br /><b...negative610
14[0.014560171402990818, 0.07181794941425323, 0....0.954957Don't bother trying to watch this terrible min...Don't bother trying to watch this terrible min...negative[Don't bother trying to watch this terrible mi...negative1516
15[0.047474540770053864, -0.01399887353181839, -...0.992878This is one of the most ridiculous westerns th...This is one of the most ridiculous westerns th...negative[This is one of the most ridiculous westerns t...negative1558
16[0.04194203019142151, 0.014816542156040668, -0...0.994948Please don't waste your time. This movie rehas...Please don't waste your time. This movie rehas...negative[Please don't waste your time., This movie reh...negative1691
17[-0.01830967515707016, 0.0020676907151937485, ...0.835862Nice character development in a pretty cool mi...Nice character development in a pretty cool mi...positive[Nice character development in a pretty cool m...positive227
18[-0.028767548501491547, -0.04782490059733391, ...0.994877The Elegant Documentary -<br /><br />Don't wat...The Elegant Documentary -<br /><br />Don't wat...positive[The Elegant Documentary -<br /><br />Don't wa...positive172
19[0.020619677379727364, -0.0368206612765789, -0...0.993641The script was VERY weak w/o enough character ...The script was VERY weak w/o enough character ...negative[The script was VERY weak w/o enough character...negative1687
20[0.025141961872577667, 0.03661772608757019, -0...0.972840Though structured totally different from the b...Though structured totally different from the b...positive[Though structured totally different from the ...positive2496
21[0.05255172401666641, 0.0014638856519013643, -...0.971560I am so excited that Greek is back! This seaso...I am so excited that Greek is back! This seaso...positive[I am so excited that Greek is back!, This sea...positive2162
22[0.05090215429663658, 0.04202255234122276, -0....0.986716Petter Mattei's \"Love in the Time of Money\" is...Petter Mattei's \"Love in the Time of Money\" is...positive[Petter Mattei's \"Love in the Time of Money\" i...positive4
23[0.028773412108421326, 0.025388378649950027, -...0.998700Most Lorne Michaels films seem to fail because...Most Lorne Michaels films seem to fail because...negative[Most Lorne Michaels films seem to fail becaus...negative654
24[0.0029449418652802706, -0.054734643548727036,...0.999298\"Revolt of the Zombies\" proves that having the...\"Revolt of the Zombies\" proves that having the...negative[\"Revolt of the Zombies\" proves that having th...negative126
25[0.045008499175310135, 0.06566429883241653, -0...0.760545Necessarily ridiculous film version the litera...Necessarily ridiculous film version the litera...positive[Necessarily ridiculous film version the liter...negative2316
26[0.03256276622414589, -0.041013315320014954, -...0.964842Supposedly a \"social commentary\" on racism and...Supposedly a \"social commentary\" on racism and...negative[Supposedly a \"social commentary\" on racism an...negative250
27[0.037413761019706726, -0.041003260761499405, ...0.996598I rented the DVD in a video store, as an alter...I rented the DVD in a video store, as an alter...negative[I rented the DVD in a video store, as an alte...negative644
28[-0.0654454156756401, 0.005620448384433985, -0...0.561247If you like original gut wrenching laughter yo...If you like original gut wrenching laughter yo...neutral[If you like original gut wrenching laughter y...positive9
29[0.027594028040766716, -0.024396954104304314, ...0.945314Doctor Mordrid is one of those rare films that...Doctor Mordrid is one of those rare films that...negative[Doctor Mordrid is one of those rare films tha...positive376
30[0.017458664253354073, 0.05214596167206764, -0...0.973781Back in 1982 a little film called MAKING LOVE ...Back in 1982 a little film called MAKING LOVE ...positive[Back in 1982 a little film called MAKING LOVE...positive746
31[0.04889684543013573, 0.029793666675686836, 0....0.9689981936 was the most prolific year for Astaire an...1936 was the most prolific year for Astaire an...positive[1936 was the most prolific year for Astaire a...positive2293
32[0.04028377681970596, 0.05781426653265953, -0....0.989890Tim Taylor is an abusive acholoic drug addict....Tim Taylor is an abusive acholoic drug addict....negative[Tim Taylor is an abusive acholoic drug addict...negative2398
33[0.033708442002534866, -0.06806037575006485, -...0.997308First of all yes I'm white, so I try to tread ...First of all yes I'm white, so I try to tread ...negative[First of all yes I'm white, so I try to tread...negative784
34[-0.05339590832591057, 0.03226976469159126, 0....0.971397I have to say that the events of 9/11 didn't h...I have to say that the events of 9/11 didn't h...positive[I have to say that the events of 9/11 didn't ...positive713
35[0.034553565084934235, -0.002318630227819085, ...0.999211This movie features Charlie Spradling dancing ...This movie features Charlie Spradling dancing ...negative[This movie features Charlie Spradling dancing...negative358
36[-0.03914014622569084, -0.04544996842741966, -...0.991993. . . and that is only if you like the sight o.... . . and that is only if you like the sight o...negative[. . . and that is only if you like the sight ...negative1377
37[-0.037124212831258774, 0.04466667398810387, -...0.890090An interesting movie with Jordana Brewster as ...An interesting movie with Jordana Brewster as ...positive[An interesting movie with Jordana Brewster as...positive1902
38[0.05217211693525314, -0.045366983860731125, -...0.998053Next to the slasher films of the 1970s and 80s...Next to the slasher films of the 1970s and 80s...negative[Next to the slasher films of the 1970s and 80...negative2028
39[0.054036945104599, -0.0035783019848167896, -0...0.994295I saw this movie in a theater while on vacatio...I saw this movie in a theater while on vacatio...negative[I saw this movie in a theater while on vacati...positive929
40[0.004495782777667046, 0.0002409428561804816, ...0.999238Did the first travesty actually make money? Th...Did the first travesty actually make money? Th...negative[Did the first travesty actually make money?, ...negative659
41[0.05336875841021538, 0.01685612089931965, -0....0.979532Jay Craven's criminally ignored film is a sobe...Jay Craven's criminally ignored film is a sobe...positive[Jay Craven's criminally ignored film is a sob...positive2286
42[0.06096196547150612, -0.018674220889806747, -...0.936765This is a great film with an amazing cast. Cri...This is a great film with an amazing cast. Cri...positive[This is a great film with an amazing cast. Cr...positive1198
43[-0.01874159276485443, -0.015269572846591473, ...0.945504I had a hard time staying awake for the two ho...I had a hard time staying awake for the two ho...positive[I had a hard time staying awake for the two h...negative468
44[0.0465228296816349, -0.03501952812075615, -0....0.980367Not the film to see if you want to be intellec...Not the film to see if you want to be intellec...negative[Not the film to see if you want to be intelle...positive1335
45[-0.022962188348174095, -0.05519339069724083, ...0.996134I bought this (it was only $3, ok?) under the ...I bought this (it was only $3, ok?) under the ...negative[I bought this (it was only $3, ok?) under the...negative1322
46[0.05667755752801895, -0.004784214776009321, 0...0.983472I saw it tonight and fell asleep in the movie....I saw it tonight and fell asleep in the movie....negative[I saw it tonight and fell asleep in the movie...negative2141
47[-0.045412369072437286, -0.010548366233706474,...0.998861The opening scene keeps me from rating at abso...The opening scene keeps me from rating at abso...negative[The opening scene keeps me from rating at abs...negative1065
48[0.005430039018392563, -0.021274806931614876, ...0.984353If you are a fan of Zorro, Indiana Jones, or a...If you are a fan of Zorro, Indiana Jones, or a...negative[If you are a fan of Zorro, Indiana Jones, or ...positive2103
49[-0.012310817837715149, 0.036387305706739426, ...0.960394Stan Laurel and Oliver Hardy are the most famo...Stan Laurel and Oliver Hardy are the most famo...negative[Stan Laurel and Oliver Hardy are the most fam...negative588
\n","
"],"text/plain":[" sentence_embedding_use ... origin_index\n","0 [0.06351275742053986, 0.040804557502269745, -2... ... 1459\n","1 [0.06992700695991516, 0.0012252129381522536, -... ... 2428\n","2 [0.052442848682403564, -0.017640504986047745, ... ... 2093\n","3 [0.04445473477244377, 0.018118631094694138, -0... ... 2370\n","4 [0.01753336377441883, 0.05620148777961731, -0.... ... 2064\n","5 [0.02791544236242771, -0.028735769912600517, -... ... 1988\n","6 [0.06690913438796997, 0.03236781805753708, -0.... ... 1483\n","7 [0.05750656872987747, -0.059703629463911057, -... ... 207\n","8 [-0.04340153560042381, -0.015529784373939037, ... ... 2030\n","9 [-0.032444775104522705, -0.018744193017482758,... ... 397\n","10 [0.04818158969283104, 0.006010068114846945, 0.... ... 1079\n","11 [-0.011373912915587425, 0.033458177000284195, ... ... 308\n","12 [0.059722382575273514, -0.04133075103163719, -... ... 175\n","13 [-0.0024044793099164963, 0.01555465068668127, ... ... 610\n","14 [0.014560171402990818, 0.07181794941425323, 0.... ... 1516\n","15 [0.047474540770053864, -0.01399887353181839, -... ... 1558\n","16 [0.04194203019142151, 0.014816542156040668, -0... ... 1691\n","17 [-0.01830967515707016, 0.0020676907151937485, ... ... 227\n","18 [-0.028767548501491547, -0.04782490059733391, ... ... 172\n","19 [0.020619677379727364, -0.0368206612765789, -0... ... 1687\n","20 [0.025141961872577667, 0.03661772608757019, -0... ... 2496\n","21 [0.05255172401666641, 0.0014638856519013643, -... ... 2162\n","22 [0.05090215429663658, 0.04202255234122276, -0.... ... 4\n","23 [0.028773412108421326, 0.025388378649950027, -... ... 654\n","24 [0.0029449418652802706, -0.054734643548727036,... ... 126\n","25 [0.045008499175310135, 0.06566429883241653, -0... ... 2316\n","26 [0.03256276622414589, -0.041013315320014954, -... ... 250\n","27 [0.037413761019706726, -0.041003260761499405, ... ... 644\n","28 [-0.0654454156756401, 0.005620448384433985, -0... ... 
9\n","29 [0.027594028040766716, -0.024396954104304314, ... ... 376\n","30 [0.017458664253354073, 0.05214596167206764, -0... ... 746\n","31 [0.04889684543013573, 0.029793666675686836, 0.... ... 2293\n","32 [0.04028377681970596, 0.05781426653265953, -0.... ... 2398\n","33 [0.033708442002534866, -0.06806037575006485, -... ... 784\n","34 [-0.05339590832591057, 0.03226976469159126, 0.... ... 713\n","35 [0.034553565084934235, -0.002318630227819085, ... ... 358\n","36 [-0.03914014622569084, -0.04544996842741966, -... ... 1377\n","37 [-0.037124212831258774, 0.04466667398810387, -... ... 1902\n","38 [0.05217211693525314, -0.045366983860731125, -... ... 2028\n","39 [0.054036945104599, -0.0035783019848167896, -0... ... 929\n","40 [0.004495782777667046, 0.0002409428561804816, ... ... 659\n","41 [0.05336875841021538, 0.01685612089931965, -0.... ... 2286\n","42 [0.06096196547150612, -0.018674220889806747, -... ... 1198\n","43 [-0.01874159276485443, -0.015269572846591473, ... ... 468\n","44 [0.0465228296816349, -0.03501952812075615, -0.... ... 1335\n","45 [-0.022962188348174095, -0.05519339069724083, ... ... 1322\n","46 [0.05667755752801895, -0.004784214776009321, 0... ... 2141\n","47 [-0.045412369072437286, -0.010548366233706474,... ... 1065\n","48 [0.005430039018392563, -0.021274806931614876, ... ... 2103\n","49 [-0.012310817837715149, 0.036387305706739426, ... ... 588\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# 4. 
Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":77},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620195367250,"user_tz":-120,"elapsed":11297,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"42030190-589e-46f3-d227-e4a8eebc69a2"},"source":["fitted_pipe.predict('It was one of the best films i have ever watched in my entire life !!')"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentence_embedding_usetrained_sentiment_confidencedocumenttrained_sentimentsentenceorigin_index
0[-0.022810865193605423, 0.015739120543003082, ...0.833494It was one of the best films i have ever watch...negative[It was one of the best films i have ever watc...0
\n","
"],"text/plain":[" sentence_embedding_use ... origin_index\n","0 [-0.022810865193605423, 0.015739120543003082, ... ... 0\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## 5. Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620195367250,"user_tz":-120,"elapsed":11235,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"40bfeffd-ac62-4ddb-9bfb-dbc13a32ab76"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@260a728d) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@260a728d\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## 6. Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620195371203,"user_tz":-120,"elapsed":15108,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c558685c-7b89-42f6-f5d2-75bda38aebef"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['trained_sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.92 0.92 0.92 26\n"," neutral 0.00 0.00 0.00 0\n"," positive 1.00 0.75 0.86 24\n","\n"," accuracy 0.84 50\n"," macro avg 0.64 0.56 0.59 50\n","weighted avg 0.96 0.84 0.89 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentence_embedding_usetrained_sentiment_confidencetextdocumenttrained_sentimentsentenceyorigin_index
0[0.06351275742053986, 0.040804557502269745, -2...0.564160Absolutely wonderful drama and Ros is top notc...Absolutely wonderful drama and Ros is top notc...neutral[Absolutely wonderful drama and Ros is top not...positive1459
1[0.06992700695991516, 0.0012252129381522536, -...0.853955I was totally surprised just how good this mov...I was totally surprised just how good this mov...positive[I was totally surprised just how good this mo...positive2428
2[0.052442848682403564, -0.017640504986047745, ...0.866541This is a fine musical with a timeless score b...This is a fine musical with a timeless score b...positive[This is a fine musical with a timeless score ...positive2093
3[0.04445473477244377, 0.018118631094694138, -0...0.894689Saw it yesterday night at the Midnight Slam of...Saw it yesterday night at the Midnight Slam of...negative[Saw it yesterday night at the Midnight Slam o...negative2370
4[0.01753336377441883, 0.05620148777961731, -0....0.680551This show was Fabulous. It was intricate and w...This show was Fabulous. It was intricate and w...positive[This show was Fabulous., It was intricate and...positive2064
5[0.02791544236242771, -0.028735769912600517, -...0.566901Jean Dujardin gets Connery's mannerisms down p...Jean Dujardin gets Connery's mannerisms down p...neutral[Jean Dujardin gets Connery's mannerisms down ...positive1988
6[0.06690913438796997, 0.03236781805753708, -0....0.870670This is a tongue in cheek movie from the very ...This is a tongue in cheek movie from the very ...positive[This is a tongue in cheek movie from the very...positive1483
7[0.05750656872987747, -0.059703629463911057, -...0.811313I have seen most, if not all of the Laurel & H...I have seen most, if not all of the Laurel & H...negative[I have seen most, if not all of the Laurel & ...negative207
8[-0.04340153560042381, -0.015529784373939037, ...0.885475Although there is melodrama at the center or r...Although there is melodrama at the center or r...positive[Although there is melodrama at the center or ...positive2030
9[-0.032444775104522705, -0.018744193017482758,...0.880005\"Hey Babu Riba\" is a film about a young woman,...\"Hey Babu Riba\" is a film about a young woman,...positive[\"Hey Babu Riba\" is a film about a young woman...positive397
10[0.04818158969283104, 0.006010068114846945, 0....0.832985I have to say the first I watched this film wa...I have to say the first I watched this film wa...negative[I have to say the first I watched this film w...negative1079
11[-0.011373912915587425, 0.033458177000284195, ...0.881889Well, there's no real plot to speak of, it's j...Well, there's no real plot to speak of, it's j...negative[Well, there's no real plot to speak of, it's ...negative308
12[0.059722382575273514, -0.04133075103163719, -...0.939842Why can't a movie be rated a zero? Or even a n...Why can't a movie be rated a zero? Or even a n...negative[Why can't a movie be rated a zero?, Or even a...negative175
13[-0.0024044793099164963, 0.01555465068668127, ...0.566536PREY <br /><br />Aspect ratio: 1.37:1<br /><br...PREY <br /><br />Aspect ratio: 1.37:1<br /><br...neutral[PREY <br /><br />Aspect ratio: 1.37:1<br /><b...negative610
14[0.014560171402990818, 0.07181794941425323, 0....0.835540Don't bother trying to watch this terrible min...Don't bother trying to watch this terrible min...negative[Don't bother trying to watch this terrible mi...negative1516
15[0.047474540770053864, -0.01399887353181839, -...0.720724This is one of the most ridiculous westerns th...This is one of the most ridiculous westerns th...negative[This is one of the most ridiculous westerns t...negative1558
16[0.04194203019142151, 0.014816542156040668, -0...0.881261Please don't waste your time. This movie rehas...Please don't waste your time. This movie rehas...negative[Please don't waste your time., This movie reh...negative1691
17[-0.01830967515707016, 0.0020676907151937485, ...0.744092Nice character development in a pretty cool mi...Nice character development in a pretty cool mi...positive[Nice character development in a pretty cool m...positive227
18[-0.028767548501491547, -0.04782490059733391, ...0.875401The Elegant Documentary -<br /><br />Don't wat...The Elegant Documentary -<br /><br />Don't wat...positive[The Elegant Documentary -<br /><br />Don't wa...positive172
19[0.020619677379727364, -0.0368206612765789, -0...0.839296The script was VERY weak w/o enough character ...The script was VERY weak w/o enough character ...negative[The script was VERY weak w/o enough character...negative1687
20[0.025141961872577667, 0.03661772608757019, -0...0.849500Though structured totally different from the b...Though structured totally different from the b...positive[Though structured totally different from the ...positive2496
21[0.05255172401666641, 0.0014638856519013643, -...0.840122I am so excited that Greek is back! This seaso...I am so excited that Greek is back! This seaso...positive[I am so excited that Greek is back!, This sea...positive2162
22[0.05090215429663658, 0.04202255234122276, -0....0.874497Petter Mattei's \"Love in the Time of Money\" is...Petter Mattei's \"Love in the Time of Money\" is...positive[Petter Mattei's \"Love in the Time of Money\" i...positive4
23[0.028773412108421326, 0.025388378649950027, -...0.912337Most Lorne Michaels films seem to fail because...Most Lorne Michaels films seem to fail because...negative[Most Lorne Michaels films seem to fail becaus...negative654
24[0.0029449418652802706, -0.054734643548727036,...0.894369\"Revolt of the Zombies\" proves that having the...\"Revolt of the Zombies\" proves that having the...negative[\"Revolt of the Zombies\" proves that having th...negative126
25[0.045008499175310135, 0.06566429883241653, -0...0.562989Necessarily ridiculous film version the litera...Necessarily ridiculous film version the litera...neutral[Necessarily ridiculous film version the liter...negative2316
26[0.03256276622414589, -0.041013315320014954, -...0.753462Supposedly a \"social commentary\" on racism and...Supposedly a \"social commentary\" on racism and...negative[Supposedly a \"social commentary\" on racism an...negative250
27[0.037413761019706726, -0.041003260761499405, ...0.898136I rented the DVD in a video store, as an alter...I rented the DVD in a video store, as an alter...negative[I rented the DVD in a video store, as an alte...negative644
28[-0.0654454156756401, 0.005620448384433985, -0...0.765227If you like original gut wrenching laughter yo...If you like original gut wrenching laughter yo...positive[If you like original gut wrenching laughter y...positive9
29[0.027594028040766716, -0.024396954104304314, ...0.581867Doctor Mordrid is one of those rare films that...Doctor Mordrid is one of those rare films that...neutral[Doctor Mordrid is one of those rare films tha...positive376
30[0.017458664253354073, 0.05214596167206764, -0...0.814803Back in 1982 a little film called MAKING LOVE ...Back in 1982 a little film called MAKING LOVE ...positive[Back in 1982 a little film called MAKING LOVE...positive746
31[0.04889684543013573, 0.029793666675686836, 0....0.7119661936 was the most prolific year for Astaire an...1936 was the most prolific year for Astaire an...positive[1936 was the most prolific year for Astaire a...positive2293
32[0.04028377681970596, 0.05781426653265953, -0....0.823739Tim Taylor is an abusive acholoic drug addict....Tim Taylor is an abusive acholoic drug addict....negative[Tim Taylor is an abusive acholoic drug addict...negative2398
33[0.033708442002534866, -0.06806037575006485, -...0.880034First of all yes I'm white, so I try to tread ...First of all yes I'm white, so I try to tread ...negative[First of all yes I'm white, so I try to tread...negative784
34[-0.05339590832591057, 0.03226976469159126, 0....0.859471I have to say that the events of 9/11 didn't h...I have to say that the events of 9/11 didn't h...positive[I have to say that the events of 9/11 didn't ...positive713
35[0.034553565084934235, -0.002318630227819085, ...0.873029This movie features Charlie Spradling dancing ...This movie features Charlie Spradling dancing ...negative[This movie features Charlie Spradling dancing...negative358
36[-0.03914014622569084, -0.04544996842741966, -...0.919768. . . and that is only if you like the sight o.... . . and that is only if you like the sight o...negative[. . . and that is only if you like the sight ...negative1377
37[-0.037124212831258774, 0.04466667398810387, -...0.825637An interesting movie with Jordana Brewster as ...An interesting movie with Jordana Brewster as ...positive[An interesting movie with Jordana Brewster as...positive1902
38[0.05217211693525314, -0.045366983860731125, -...0.879427Next to the slasher films of the 1970s and 80s...Next to the slasher films of the 1970s and 80s...negative[Next to the slasher films of the 1970s and 80...negative2028
39[0.054036945104599, -0.0035783019848167896, -0...0.673891I saw this movie in a theater while on vacatio...I saw this movie in a theater while on vacatio...negative[I saw this movie in a theater while on vacati...positive929
40[0.004495782777667046, 0.0002409428561804816, ...0.870754Did the first travesty actually make money? Th...Did the first travesty actually make money? Th...negative[Did the first travesty actually make money?, ...negative659
41[0.05336875841021538, 0.01685612089931965, -0....0.839090Jay Craven's criminally ignored film is a sobe...Jay Craven's criminally ignored film is a sobe...positive[Jay Craven's criminally ignored film is a sob...positive2286
42[0.06096196547150612, -0.018674220889806747, -...0.880842This is a great film with an amazing cast. Cri...This is a great film with an amazing cast. Cri...positive[This is a great film with an amazing cast. Cr...positive1198
43[-0.01874159276485443, -0.015269572846591473, ...0.653215I had a hard time staying awake for the two ho...I had a hard time staying awake for the two ho...negative[I had a hard time staying awake for the two h...negative468
44[0.0465228296816349, -0.03501952812075615, -0....0.520477Not the film to see if you want to be intellec...Not the film to see if you want to be intellec...neutral[Not the film to see if you want to be intelle...positive1335
45[-0.022962188348174095, -0.05519339069724083, ...0.916282I bought this (it was only $3, ok?) under the ...I bought this (it was only $3, ok?) under the ...negative[I bought this (it was only $3, ok?) under the...negative1322
46[0.05667755752801895, -0.004784214776009321, 0...0.629980I saw it tonight and fell asleep in the movie....I saw it tonight and fell asleep in the movie....negative[I saw it tonight and fell asleep in the movie...negative2141
47[-0.045412369072437286, -0.010548366233706474,...0.944924The opening scene keeps me from rating at abso...The opening scene keeps me from rating at abso...negative[The opening scene keeps me from rating at abs...negative1065
48[0.005430039018392563, -0.021274806931614876, ...0.716253If you are a fan of Zorro, Indiana Jones, or a...If you are a fan of Zorro, Indiana Jones, or a...negative[If you are a fan of Zorro, Indiana Jones, or ...positive2103
49[-0.012310817837715149, 0.036387305706739426, ...0.831511Stan Laurel and Oliver Hardy are the most famo...Stan Laurel and Oliver Hardy are the most famo...negative[Stan Laurel and Oliver Hardy are the most fam...negative588
\n","
"],"text/plain":[" sentence_embedding_use ... origin_index\n","0 [0.06351275742053986, 0.040804557502269745, -2... ... 1459\n","1 [0.06992700695991516, 0.0012252129381522536, -... ... 2428\n","2 [0.052442848682403564, -0.017640504986047745, ... ... 2093\n","3 [0.04445473477244377, 0.018118631094694138, -0... ... 2370\n","4 [0.01753336377441883, 0.05620148777961731, -0.... ... 2064\n","5 [0.02791544236242771, -0.028735769912600517, -... ... 1988\n","6 [0.06690913438796997, 0.03236781805753708, -0.... ... 1483\n","7 [0.05750656872987747, -0.059703629463911057, -... ... 207\n","8 [-0.04340153560042381, -0.015529784373939037, ... ... 2030\n","9 [-0.032444775104522705, -0.018744193017482758,... ... 397\n","10 [0.04818158969283104, 0.006010068114846945, 0.... ... 1079\n","11 [-0.011373912915587425, 0.033458177000284195, ... ... 308\n","12 [0.059722382575273514, -0.04133075103163719, -... ... 175\n","13 [-0.0024044793099164963, 0.01555465068668127, ... ... 610\n","14 [0.014560171402990818, 0.07181794941425323, 0.... ... 1516\n","15 [0.047474540770053864, -0.01399887353181839, -... ... 1558\n","16 [0.04194203019142151, 0.014816542156040668, -0... ... 1691\n","17 [-0.01830967515707016, 0.0020676907151937485, ... ... 227\n","18 [-0.028767548501491547, -0.04782490059733391, ... ... 172\n","19 [0.020619677379727364, -0.0368206612765789, -0... ... 1687\n","20 [0.025141961872577667, 0.03661772608757019, -0... ... 2496\n","21 [0.05255172401666641, 0.0014638856519013643, -... ... 2162\n","22 [0.05090215429663658, 0.04202255234122276, -0.... ... 4\n","23 [0.028773412108421326, 0.025388378649950027, -... ... 654\n","24 [0.0029449418652802706, -0.054734643548727036,... ... 126\n","25 [0.045008499175310135, 0.06566429883241653, -0... ... 2316\n","26 [0.03256276622414589, -0.041013315320014954, -... ... 250\n","27 [0.037413761019706726, -0.041003260761499405, ... ... 644\n","28 [-0.0654454156756401, 0.005620448384433985, -0... ... 
9\n","29 [0.027594028040766716, -0.024396954104304314, ... ... 376\n","30 [0.017458664253354073, 0.05214596167206764, -0... ... 746\n","31 [0.04889684543013573, 0.029793666675686836, 0.... ... 2293\n","32 [0.04028377681970596, 0.05781426653265953, -0.... ... 2398\n","33 [0.033708442002534866, -0.06806037575006485, -... ... 784\n","34 [-0.05339590832591057, 0.03226976469159126, 0.... ... 713\n","35 [0.034553565084934235, -0.002318630227819085, ... ... 358\n","36 [-0.03914014622569084, -0.04544996842741966, -... ... 1377\n","37 [-0.037124212831258774, 0.04466667398810387, -... ... 1902\n","38 [0.05217211693525314, -0.045366983860731125, -... ... 2028\n","39 [0.054036945104599, -0.0035783019848167896, -0... ... 929\n","40 [0.004495782777667046, 0.0002409428561804816, ... ... 659\n","41 [0.05336875841021538, 0.01685612089931965, -0.... ... 2286\n","42 [0.06096196547150612, -0.018674220889806747, -... ... 1198\n","43 [-0.01874159276485443, -0.015269572846591473, ... ... 468\n","44 [0.0465228296816349, -0.03501952812075615, -0.... ... 1335\n","45 [-0.022962188348174095, -0.05519339069724083, ... ... 1322\n","46 [0.05667755752801895, -0.004784214776009321, 0... ... 2141\n","47 [-0.045412369072437286, -0.010548366233706474,... ... 1065\n","48 [0.005430039018392563, -0.021274806931614876, ... ... 2103\n","49 [-0.012310817837715149, 0.036387305706739426, ... ... 588\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# 7. 
Try training with different Embeddings"]},{"cell_type":"code","metadata":{"id":"nxWFzQOhjWC8","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620195371204,"user_tz":-120,"elapsed":15026,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"cfc1e505-40b8-4021-e75d-228cae3cfe4c"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possibler sentence embedding we could use. Lets use bert!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP 
model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns 
Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model 
sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"IKK_Ii_gjJfF","executionInfo":{"status":"ok","timestamp":1620201627380,"user_tz":-120,"elapsed":6271143,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"40fc246a-ca8b-4b81-f02d-30a189cd26f2"},"source":["trainable_pipe = nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# We need to train longer and user smaller LR for NON-USE based sentence embeddings usually\n","# We could tune the hyperparameters further with hyperparameter tuning methods like gridsearch\n","# Also longer training gives more accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","#sentence detector that is part of the pipe generates sone NaNs. 
lets drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.87 0.77 0.82 988\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.85 0.83 0.84 1012\n","\n"," accuracy 0.80 2000\n"," macro avg 0.57 0.53 0.55 2000\n","weighted avg 0.86 0.80 0.83 2000\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_1jxw3GnVGlI"},"source":["# 7.1 evaluate on Test Data"]},{"cell_type":"code","metadata":{"id":"Fxx4yNkNVGFl","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620201800537,"user_tz":-120,"elapsed":6444231,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"4cd26bf6-b452-4f97-d27e-125ba4adba24"},"source":["preds = fitted_pipe.predict(test_df,output_level='document')\n","\n","#sentence detector that is part of the pipe generates sone NaNs. lets drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.85 0.75 0.80 246\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.84 0.81 0.83 254\n","\n"," accuracy 0.78 500\n"," macro avg 0.56 0.52 0.54 500\n","weighted avg 0.85 0.78 0.81 500\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 8. 
Lets save the model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eLex095goHwm","executionInfo":{"status":"ok","timestamp":1620201947576,"user_tz":-120,"elapsed":6591195,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"2722dd4c-b2c6-453c-9a76-b59de2034549"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 9. Lets load the model from HDD.\n","This makes Offlien NLU usage possible! \n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":76},"id":"SO4uz45MoRgp","executionInfo":{"status":"ok","timestamp":1620201958142,"user_tz":-120,"elapsed":6601685,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"e7af7ac1-5347-4bf3-d23b-a1f65df85c97"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('It was one of the best films i have ever watched in my entire life !!')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentiment_confidencesentence_embedding_from_disktextdocumentsentenceorigin_indexsentiment
0[0.9986149, 0.9986149][[0.030373765155673027, 0.05577867478132248, 0...It was one of the best films i have ever watch...It was one of the best films i have ever watch...[It was one of the best films i have ever watc...8589934592[positive, positive]
\n","
"],"text/plain":[" sentiment_confidence ... sentiment\n","0 [0.9986149, 0.9986149] ... [positive, positive]\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":14}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"e0CVlkk9v6Qi","executionInfo":{"status":"ok","timestamp":1620201958143,"user_tz":-120,"elapsed":6601641,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"5dfa5379-198e-4796-c27b-18f7c92a6ffa"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@2350f35a) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@2350f35a\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"mtDcALorKHIx"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "collapsed_sections": [ + "zkufh760uvF3" + ] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_IMDB.ipynb)\n", + "\n", + "\n", + "# Training a Sentiment Analysis Classifier with NLU\n", + "## 2 class IMDB Movie sentiment classifier training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem\n", + "\n", + "This notebook showcases the following features :\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (Enables NLU offline mode)\n", + "\n", + "You can achieve these results or even better on this dataset with training data:\n", + "\n", + "\n", + "
\n", + "\n", + "\n", + "![image.png]()\n", + "\n", + "\n", + "\n", + "You can achieve these results or even better on this dataset with test data:\n", + "\n", + "\n", + "
\n", + "\n", + "\n", + "![image.png]()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Install Java 8 and NLU" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "!pip install -q johnsnowlabs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download IMDB dataset\n", + "https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews\n", + "\n", + "IMDB dataset having 50K movie reviews for natural language processing or Text analytics.\n", + "This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. So, predict the number of positive and negative reviews using either classification or deep learning algorithms.\n", + "For more dataset information, please go through the following link,\n", + "http://ai.stanford.edu/~amaas/data/sentiment/" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "source": [ + "! 
wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/IMDB/IMDB-Dataset.csv\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "dd2c7a9f-1f4c-4c30-98ee-9cf666d4d44b" + }, + "source": [ + "import pandas as pd\n", + "from johnsnowlabs import nlp\n", + "train_path = '/content/IMDB-Dataset.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text'\n", + "# the label column must be named 'y' and be of type str\n", + "columns=['text','y']\n", + "train_df = train_df[columns]\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_df, test_df = train_test_split(train_df, test_size=0.2)\n", + "train_df" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " text y\n", + "11147 I have not seen and heard the original version... negative\n", + "5176 Ghost Story has an interesting feminist reveng... negative\n", + "23853 Christ. A sequel to one of the most cloying fi... negative\n", + "12990 Mendez and Marichal have provided us with a se... positive\n", + "28039 \"Bend It Like Beckham\" is a film that got very... positive\n", + "... ... ...\n", + "30425 This movie is a lot better than the asylums ve... positive\n", + "6508 I concur with everyone above who said anything... negative\n", + "2432 The \"Wrinkle in Time\" book series is my favori... negative\n", + "12347 Clint Eastwood scores big in this thriller fro... positive\n", + "3332 This is the one major problem with this film, ... negative\n", + "\n", + "[39999 rows x 2 columns]" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
texty
11147I have not seen and heard the original version...negative
5176Ghost Story has an interesting feminist reveng...negative
23853Christ. A sequel to one of the most cloying fi...negative
12990Mendez and Marichal have provided us with a se...positive
28039\"Bend It Like Beckham\" is a film that got very...positive
.........
30425This movie is a lot better than the asylums ve...positive
6508I concur with everyone above who said anything...negative
2432The \"Wrinkle in Time\" book series is my favori...negative
12347Clint Eastwood scores big in this thriller fro...positive
3332This is the one major problem with this film, ...negative
\n", + "

39999 rows × 2 columns

\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 2 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n", + "\n", + "Your dataset label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "a1d1177f-d728-4338-cb76-aff8f37c6a65" + }, + "source": [ + "from johnsnowlabs import nlp\n", + "from sklearn.metrics import classification_report\n", + "\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "#sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 23\n", + " positive 0.54 1.00 0.70 27\n", + "\n", + " accuracy 0.54 50\n", + " macro avg 0.27 0.50 0.35 50\n", + "weighted avg 0.29 0.54 0.38 50\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 This film has to be viewed in the right frame ... \n", + "1 This is just a long advertisement for the movi... \n", + "2 This movie was physically painful to sit throu... \n", + "3 This movie was one of a handful that actually ... \n", + "4 I was pleasantly pleased with the ending. I ju... \n", + "5 I would purchase this and \"Thirty Seconds Over... \n", + "6 In any number of films, you can find Nicholas ... \n", + "7 This film is quite boring. There are snippets ... \n", + "8 Let's start this review out on a positive note... \n", + "9 Payback is the game being played in this drama... \n", + "10 Anthony Quinn is a master at capturing our hea... \n", + "11 Someone, some day, should do a study of archit... \n", + "12 Thats My Bush is first of all a very entertain... \n", + "13 The Last American Virgin (1982) was one of the... \n", + "14 I experienced Nightbreed for the first time on... \n", + "15 When I checked out the review for this film af... \n", + "16 This movie is a very realistic view of a polic... \n", + "17 The fact that after 50 years, it is still a hi... 
\n", + "18 One of the great mysteries of life, suffered f... \n", + "19 This movie is the Latino Godfather. An unlikel... \n", + "20 This is one fine movie, I can watch it any tim... \n", + "21 I love Meatballs! Terrific characters and poig... \n", + "22 Clocking in at an interminable three hours and... \n", + "23 Sudden Impact is the 4th of the Dirty Harry fi... \n", + "24 In a genre by itself, this film has a limited ... \n", + "25 The One and only was a great film. I had just ... \n", + "26 If you are 10 years old and never seen a movie... \n", + "27 Even if it were remotely funny, this mouldy wa... \n", + "28 This is one of the best bond games i have ever... \n", + "29 The performances rate better than the rating I... \n", + "30 Chinese Ghost Story III is a totally superfluo... \n", + "31 Fred Williamson, one of the two or three top b... \n", + "32 Vince Lombardi High School has a new principal... \n", + "33 It's hard to comment on this movie. It's one o... \n", + "34 I commend pictures that try something differen... \n", + "35 It's not just that this is a bad movie; it's n... \n", + "36 In order to enjoy 'Fur - An imaginary portrait... \n", + "37 I became a fan of the TV series `Homicide: Lif... \n", + "38 A terminally dull mystery-thriller, which may ... \n", + "39 this movie, while it could be considered an al... \n", + "40 This is the \"Battlefield Earth\" of mini series... \n", + "41 DVD has become the equivalent of the old late ... \n", + "42 I saw this movie when I was very young living ... \n", + "43 This may not be a memorable classic, but it is... \n", + "44 This is a great film for pure entertainment, n... \n", + "45 When I saw this on TV I was nervous...whats if... \n", + "46 This film without doubt is one of the worst I ... \n", + "47 \"Challenge to be Free\" was one of the first fi... \n", + "48 I bought this movie a few days ago, and though... \n", + "49 This movie is not as good as all think. the ac... 
\n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.1513771265745163, 0.3099152743816376, -0.2... positive \n", + "1 [-0.3328755795955658, 0.2796784043312073, 0.10... positive \n", + "2 [-0.6589022278785706, 0.09297071397304535, 0.0... positive \n", + "3 [-0.5372501015663147, 0.5361205339431763, -0.0... positive \n", + "4 [-0.3981836140155792, 0.3446210026741028, -0.0... positive \n", + "5 [-0.4655883014202118, 0.7156240940093994, -0.1... positive \n", + "6 [-0.366023987531662, 0.2559768557548523, -0.03... positive \n", + "7 [-0.4195595383644104, 0.37842151522636414, -0.... positive \n", + "8 [-0.5557488203048706, -0.04156230762600899, 0.... positive \n", + "9 [-0.48163551092147827, 0.21267670392990112, -0... positive \n", + "10 [-0.710480809211731, 0.029913106933236122, -0.... positive \n", + "11 [-0.6811888813972473, -0.06688129156827927, -0... positive \n", + "12 [-0.5853302478790283, 0.4730848968029022, 0.01... positive \n", + "13 [-0.4013202488422394, 0.19056788086891174, -0.... positive \n", + "14 [-0.22270944714546204, 0.22281427681446075, 0.... positive \n", + "15 [-0.39890941977500916, 0.403655081987381, -0.0... positive \n", + "16 [-0.5352460145950317, 0.13765230774879456, -0.... positive \n", + "17 [-0.2797639071941376, -0.11143724620342255, -0... positive \n", + "18 [-0.5399389863014221, 0.385494589805603, -0.17... positive \n", + "19 [-0.37433841824531555, 0.10099871456623077, 0.... positive \n", + "20 [-1.3702389001846313, 0.30410271883010864, -0.... positive \n", + "21 [-0.9990127086639404, 0.07002592086791992, -0.... positive \n", + "22 [-0.27705344557762146, -0.5027639865875244, 0.... positive \n", + "23 [-0.5442849397659302, 0.2992677688598633, -0.2... positive \n", + "24 [-0.14087404310703278, 0.12531475722789764, -0... positive \n", + "25 [-0.68660968542099, 0.14108557999134064, 0.091... positive \n", + "26 [-0.6596466302871704, 0.1337697058916092, -0.1... 
positive \n", + "27 [-0.6116098761558533, 0.49599868059158325, 0.0... positive \n", + "28 [-0.03306262567639351, -0.01435087714344263, -... positive \n", + "29 [-0.9807755947113037, 0.5014723539352417, -0.4... positive \n", + "30 [0.17701220512390137, -0.3180699050426483, -0.... positive \n", + "31 [-0.3707915246486664, 0.044643059372901917, -0... positive \n", + "32 [-0.49639755487442017, 0.48042890429496765, -0... positive \n", + "33 [-0.3472699224948883, 0.6240038275718689, -0.2... positive \n", + "34 [-0.3393455743789673, 0.5790384411811829, -0.1... positive \n", + "35 [-0.3814673125743866, -0.26899996399879456, -0... positive \n", + "36 [-0.03743020445108414, 0.3995753526687622, -0.... positive \n", + "37 [-0.4682881832122803, -0.018485404551029205, -... positive \n", + "38 [-0.18775352835655212, 0.06152409315109253, -0... positive \n", + "39 [-0.507961094379425, 0.4713260531425476, -0.16... positive \n", + "40 [-0.4353094696998596, 0.027452869340777397, -0... positive \n", + "41 [-0.1517794281244278, 0.2896312475204468, -0.2... positive \n", + "42 [-0.14374561607837677, 0.10458363592624664, -0... positive \n", + "43 [-0.6834889650344849, 0.5808060169219971, -0.2... positive \n", + "44 [-0.5870745778083801, 0.007088684476912022, 0.... positive \n", + "45 [-0.8343449234962463, 0.2768908441066742, -0.5... positive \n", + "46 [-0.48633041977882385, 0.6339375972747803, -0.... positive \n", + "47 [-0.30197039246559143, 0.4937012493610382, -0.... positive \n", + "48 [-0.6582587361335754, 0.1842564195394516, -0.2... positive \n", + "49 [-0.33901163935661316, 0.04764261469244957, 0.... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 0.0 This film has to be viewed in the right frame ... \n", + "1 0.0 This is just a long advertisement for the movi... \n", + "2 0.0 This movie was physically painful to sit throu... \n", + "3 0.0 This movie was one of a handful that actually ... \n", + "4 0.0 I was pleasantly pleased with the ending. I ju... 
\n", + "5 0.0 I would purchase this and \"Thirty Seconds Over... \n", + "6 0.0 In any number of films, you can find Nicholas ... \n", + "7 0.0 This film is quite boring. There are snippets ... \n", + "8 0.0 Let's start this review out on a positive note... \n", + "9 0.0 Payback is the game being played in this drama... \n", + "10 0.0 Anthony Quinn is a master at capturing our hea... \n", + "11 0.0 Someone, some day, should do a study of archit... \n", + "12 0.0 Thats My Bush is first of all a very entertain... \n", + "13 0.0 The Last American Virgin (1982) was one of the... \n", + "14 0.0 I experienced Nightbreed for the first time on... \n", + "15 0.0 When I checked out the review for this film af... \n", + "16 0.0 This movie is a very realistic view of a polic... \n", + "17 0.0 The fact that after 50 years, it is still a hi... \n", + "18 0.0 One of the great mysteries of life, suffered f... \n", + "19 0.0 This movie is the Latino Godfather. An unlikel... \n", + "20 0.0 This is one fine movie, I can watch it any tim... \n", + "21 0.0 I love Meatballs! Terrific characters and poig... \n", + "22 0.0 Clocking in at an interminable three hours and... \n", + "23 0.0 Sudden Impact is the 4th of the Dirty Harry fi... \n", + "24 0.0 In a genre by itself, this film has a limited ... \n", + "25 0.0 The One and only was a great film. I had just ... \n", + "26 0.0 If you are 10 years old and never seen a movie... \n", + "27 0.0 Even if it were remotely funny, this mouldy wa... \n", + "28 0.0 This is one of the best bond games i have ever... \n", + "29 0.0 The performances rate better than the rating I... \n", + "30 0.0 Chinese Ghost Story III is a totally superfluo... \n", + "31 0.0 Fred Williamson, one of the two or three top b... \n", + "32 0.0 Vince Lombardi High School has a new principal... \n", + "33 0.0 It's hard to comment on this movie. It's one o... \n", + "34 0.0 I commend pictures that try something differen... 
\n", + "35 0.0 It's not just that this is a bad movie; it's n... \n", + "36 0.0 In order to enjoy 'Fur - An imaginary portrait... \n", + "37 0.0 I became a fan of the TV series `Homicide: Lif... \n", + "38 0.0 A terminally dull mystery-thriller, which may ... \n", + "39 0.0 this movie, while it could be considered an al... \n", + "40 0.0 This is the \"Battlefield Earth\" of mini series... \n", + "41 0.0 DVD has become the equivalent of the old late ... \n", + "42 0.0 I saw this movie when I was very young living ... \n", + "43 0.0 This may not be a memorable classic, but it is... \n", + "44 0.0 This is a great film for pure entertainment, n... \n", + "45 0.0 When I saw this on TV I was nervous...whats if... \n", + "46 0.0 This film without doubt is one of the worst I ... \n", + "47 0.0 \"Challenge to be Free\" was one of the first fi... \n", + "48 0.0 I bought this movie a few days ago, and though... \n", + "49 0.0 This movie is not as good as all think. the ac... \n", + "\n", + " y \n", + "0 positive \n", + "1 negative \n", + "2 negative \n", + "3 negative \n", + "4 negative \n", + "5 positive \n", + "6 positive \n", + "7 negative \n", + "8 positive \n", + "9 negative \n", + "10 positive \n", + "11 negative \n", + "12 positive \n", + "13 positive \n", + "14 positive \n", + "15 negative \n", + "16 positive \n", + "17 positive \n", + "18 positive \n", + "19 positive \n", + "20 positive \n", + "21 positive \n", + "22 negative \n", + "23 positive \n", + "24 positive \n", + "25 positive \n", + "26 negative \n", + "27 negative \n", + "28 positive \n", + "29 negative \n", + "30 negative \n", + "31 negative \n", + "32 positive \n", + "33 negative \n", + "34 positive \n", + "35 negative \n", + "36 negative \n", + "37 positive \n", + "38 negative \n", + "39 negative \n", + "40 negative \n", + "41 negative \n", + "42 positive \n", + "43 positive \n", + "44 positive \n", + "45 positive \n", + "46 negative \n", + "47 positive \n", + "48 positive \n", + "49 negative " + ], + 
"text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# 4. Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "b04708fa-2610-4b7d-fe94-3caf9811a6f7" + }, + "source": [ + "fitted_pipe.predict('It was one of the best films i have ever watched in my entire life !')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence \\\n", + "0 It was one of the best films i have ever watch... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.6158236265182495, -0.5645654201507568, -0.... positive \n", + "\n", + " sentiment_confidence \n", + "0 0.997638 " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## 5. Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "3a0f86bb-f26c-4500-fa1b-1bde63e945df" + }, + "source": [ + "trainable_pipe.print_info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## 6. 
Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "mptfvHx-MMMX", + "outputId": "b48d685f-0ad3-4b22-b79f-a0897eaaf8e2" + }, + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "\n", + "# the sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.56 1.00 0.72 28\n", + " positive 0.00 0.00 0.00 22\n", + "\n", + " accuracy 0.56 50\n", + " macro avg 0.28 0.50 0.36 50\n", + "weighted avg 0.31 0.56 0.40 50\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Set in 1945, Skenbart follows a failed Swedish... \n", + "1 I can't believe I watched this whole movie. An... \n", + "2 I'm beginning to see a pattern in the movies I... \n", + "3 Dan, the widowed father of three girls, has hi... \n", + "4 David Webb Peoples meets Paul Anderson...if it... \n", + "5 I love MIDNIGHT COWBOY and have it in my video... \n", + "6 I have NEVER EVER seen such a bad movie before... 
\n", + "7 I absolutely could not believe the levels of i... \n", + "8 A brash, self-centered Army cadet arrives at W... \n", + "9 The film is a remake of a 1956 BBC serial call... \n", + "10 Though Frank Loesser's songs are some of the f... \n", + "11 Rented(free rental thank goodness) this as sup... \n", + "12 This is one of those movies that make better t... \n", + "13 What can be said of this independent effort be... \n", + "14 Wow. I thought this might be insipid but it wa... \n", + "15 There is part of one sequence where some water... \n", + "16 i got a copy from the writer of this movie on ... \n", + "17 ... or was Honest Iago actually smirking at th... \n", + "18 First of all, I'd like to say that I love the ... \n", + "19 What an embarassment...This doesnt do justice ... \n", + "20 One of the worst movies ever made. Let's start... \n", + "21 This movie is humorous, charming, and easily b... \n", + "22 The movie starts something like a less hyper-k... \n", + "23 I won't say this movie was bad, but it wasn't ... \n", + "24 Wow! In my opinion, THE NET is an excellent, n... \n", + "25 Falsely accused, skirt-chasing chums John Wayn... \n", + "26 The Education of Little Tree is just not as go... \n", + "27 Interesting film about an actual event that to... \n", + "28 I have been a Mario fan for as long as I can r... \n", + "29 This woman never stops talking throughout the ... \n", + "30

I still can't belive Louis Gossett... \n", + "31 Florence Chadwick was actually the far more ac... \n", + "32 This may be one of the best movies I have ever... \n", + "33 Othello is set to burn the eyes of the viewers... \n", + "34 The statistics in this movie were well researc... \n", + "35 Following their daughter's brutal murder,Julie... \n", + "36 Compelling and Innovative! At the beginning of... \n", + "37 I saw this movie when it aired on Lifetime bac... \n", + "38 I really wanted to like this movie. I absolute... \n", + "39 This comment discusses \"North and South Book I... \n", + "40 Rarely have I seen an action/suspense movie th... \n", + "41 This is a 100% improvement over the dross of a... \n", + "42 I liked it better than House Party 2 & 3. The ... \n", + "43 A friend and I went to see this movie. We have... \n", + "44 Whattt was with the sound? It sounded like it ... \n", + "45 My college theater just had a special screenin... \n", + "46 Kingdom County, Vermont, 1927. Noel Lord (Rip ... \n", + "47 Man, I really enjoyed this, if only for Fred W... \n", + "48 You've seen the same tired, worn out clichéd s... \n", + "49 This movie is really genuine and random. It's ... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.37176743149757385, 0.28505513072013855, -0... negative \n", + "1 [-0.6678956151008606, 0.4565986394882202, -0.3... negative \n", + "2 [-0.4640607237815857, 0.13232995569705963, -0.... negative \n", + "3 [-1.204692006111145, 0.2007242888212204, -0.27... negative \n", + "4 [-0.23281101882457733, 0.1650732308626175, 0.1... negative \n", + "5 [-0.731963574886322, 0.055591657757759094, -0.... negative \n", + "6 [-0.8489042520523071, -0.11029214411973953, -0... negative \n", + "7 [-0.7605423331260681, 0.3872695565223694, -0.2... negative \n", + "8 [-0.6972024440765381, 0.3831547200679779, -0.2... negative \n", + "9 [-0.19118931889533997, 0.3001491129398346, 0.0... 
negative \n", + "10 [-0.5250385999679565, 0.13566239178180695, -0.... negative \n", + "11 [-0.83491051197052, 0.5083938837051392, -0.224... negative \n", + "12 [-0.2737158536911011, 0.3766928017139435, -0.2... negative \n", + "13 [-0.18508094549179077, 0.3635772466659546, -0.... negative \n", + "14 [-0.4717271327972412, 0.4512611925601959, -0.3... negative \n", + "15 [-0.3119819164276123, 0.24397613108158112, -0.... negative \n", + "16 [-0.4513140022754669, 0.25351178646087646, -0.... negative \n", + "17 [-0.448510080575943, 0.3488628566265106, 0.424... negative \n", + "18 [-0.7053700685501099, -0.1855289489030838, 0.1... negative \n", + "19 [-0.9799928069114685, 0.37281569838523865, -0.... negative \n", + "20 [-0.63435959815979, -0.03198552504181862, -0.3... negative \n", + "21 [-0.5476221442222595, 0.45070168375968933, 0.0... negative \n", + "22 [-0.550375759601593, -0.21221719682216644, -0.... negative \n", + "23 [-0.5449793934822083, 0.3829023241996765, -0.2... negative \n", + "24 [-0.9690271615982056, 0.5579360127449036, -0.3... negative \n", + "25 [-0.7523209452629089, 0.8658801913261414, 0.18... negative \n", + "26 [-0.775317370891571, 0.23664450645446777, -0.0... negative \n", + "27 [-0.7987304329872131, 0.3676035404205322, 0.11... negative \n", + "28 [-0.25044935941696167, -0.36489105224609375, -... negative \n", + "29 [-0.7558593153953552, 0.573503315448761, -0.11... negative \n", + "30 [-0.5051725506782532, 0.4716736972332001, 0.07... negative \n", + "31 [-0.8047178387641907, 0.6103856563568115, -0.2... negative \n", + "32 [-0.9392787218093872, -0.002228498924523592, -... negative \n", + "33 [-0.5233349800109863, -0.0218301210552454, 0.0... negative \n", + "34 [-0.4964343309402466, -0.1613437831401825, 0.0... negative \n", + "35 [-0.45342516899108887, 0.025125373154878616, -... negative \n", + "36 [-0.39057832956314087, 0.07514691352844238, 0.... negative \n", + "37 [-0.4824381172657013, -0.024186618626117706, -... 
negative \n", + "38 [-0.5213059782981873, 0.08636035025119781, 0.0... negative \n", + "39 [-1.0346614122390747, 0.5499882698059082, -0.0... negative \n", + "40 [0.08839510381221771, -0.08421049267053604, -0... negative \n", + "41 [-0.4159573018550873, 0.1026173010468483, -0.1... negative \n", + "42 [-0.37537163496017456, 0.2604108452796936, 0.0... negative \n", + "43 [-0.30783846974372864, 0.0511380173265934, -0.... negative \n", + "44 [-0.591052234172821, 0.5442489385604858, 0.109... negative \n", + "45 [-0.47453969717025757, 0.33813729882240295, -0... negative \n", + "46 [-0.7206570506095886, 0.63383948802948, -0.231... negative \n", + "47 [-0.9212082624435425, 0.2386074960231781, 0.20... negative \n", + "48 [-0.6829520463943481, 0.18268251419067383, 0.1... negative \n", + "49 [-0.2603294849395752, -0.09567182511091232, -0... negative \n", + "\n", + " sentiment_confidence text \\\n", + "0 3.0 Set in 1945, Skenbart follows a failed Swedish... \n", + "1 2.0 I can't believe I watched this whole movie. An... \n", + "2 3.0 I'm beginning to see a pattern in the movies I... \n", + "3 1.0 Dan, the widowed father of three girls, has hi... \n", + "4 2.0 David Webb Peoples meets Paul Anderson...if it... \n", + "5 3.0 I love MIDNIGHT COWBOY and have it in my video... \n", + "6 2.0 I have NEVER EVER seen such a bad movie before... \n", + "7 2.0 I absolutely could not believe the levels of i... \n", + "8 3.0 A brash, self-centered Army cadet arrives at W... \n", + "9 4.0 The film is a remake of a 1956 BBC serial call... \n", + "10 5.0 Though Frank Loesser's songs are some of the f... \n", + "11 3.0 Rented(free rental thank goodness) this as sup... \n", + "12 3.0 This is one of those movies that make better t... \n", + "13 2.0 What can be said of this independent effort be... \n", + "14 1.0 Wow. I thought this might be insipid but it wa... \n", + "15 1.0 There is part of one sequence where some water... \n", + "16 1.0 i got a copy from the writer of this movie on ... 
\n", + "17 3.0 ... or was Honest Iago actually smirking at th... \n", + "18 3.0 First of all, I'd like to say that I love the ... \n", + "19 1.0 What an embarassment...This doesnt do justice ... \n", + "20 3.0 One of the worst movies ever made. Let's start... \n", + "21 5.0 This movie is humorous, charming, and easily b... \n", + "22 6.0 The movie starts something like a less hyper-k... \n", + "23 2.0 I won't say this movie was bad, but it wasn't ... \n", + "24 5.0 Wow! In my opinion, THE NET is an excellent, n... \n", + "25 6.0 Falsely accused, skirt-chasing chums John Wayn... \n", + "26 3.0 The Education of Little Tree is just not as go... \n", + "27 3.0 Interesting film about an actual event that to... \n", + "28 7.0 I have been a Mario fan for as long as I can r... \n", + "29 1.0 This woman never stops talking throughout the ... \n", + "30 9.0

I still can't belive Louis Gossett... \n", + "31 9.0 Florence Chadwick was actually the far more ac... \n", + "32 3.0 This may be one of the best movies I have ever... \n", + "33 2.0 Othello is set to burn the eyes of the viewers... \n", + "34 2.0 The statistics in this movie were well researc... \n", + "35 6.0 Following their daughter's brutal murder,Julie... \n", + "36 9.0 Compelling and Innovative! At the beginning of... \n", + "37 2.0 I saw this movie when it aired on Lifetime bac... \n", + "38 2.0 I really wanted to like this movie. I absolute... \n", + "39 9.0 This comment discusses \"North and South Book I... \n", + "40 3.0 Rarely have I seen an action/suspense movie th... \n", + "41 2.0 This is a 100% improvement over the dross of a... \n", + "42 3.0 I liked it better than House Party 2 & 3. The ... \n", + "43 4.0 A friend and I went to see this movie. We have... \n", + "44 1.0 Whattt was with the sound? It sounded like it ... \n", + "45 2.0 My college theater just had a special screenin... \n", + "46 1.0 Kingdom County, Vermont, 1927. Noel Lord (Rip ... \n", + "47 3.0 Man, I really enjoyed this, if only for Fred W... \n", + "48 4.0 You've seen the same tired, worn out clichéd s... \n", + "49 2.0 This movie is really genuine and random. It's ... 
\n", + "\n", + " y \n", + "0 negative \n", + "1 negative \n", + "2 negative \n", + "3 positive \n", + "4 positive \n", + "5 positive \n", + "6 negative \n", + "7 negative \n", + "8 positive \n", + "9 positive \n", + "10 negative \n", + "11 negative \n", + "12 positive \n", + "13 negative \n", + "14 negative \n", + "15 negative \n", + "16 negative \n", + "17 positive \n", + "18 negative \n", + "19 negative \n", + "20 negative \n", + "21 positive \n", + "22 positive \n", + "23 negative \n", + "24 positive \n", + "25 negative \n", + "26 negative \n", + "27 negative \n", + "28 positive \n", + "29 negative \n", + "30 negative \n", + "31 negative \n", + "32 positive \n", + "33 negative \n", + "34 negative \n", + "35 positive \n", + "36 positive \n", + "37 positive \n", + "38 negative \n", + "39 positive \n", + "40 negative \n", + "41 positive \n", + "42 positive \n", + "43 negative \n", + "44 negative \n", + "45 positive \n", + "46 positive \n", + "47 positive \n", + "48 negative \n", + "49 positive " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 34 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# 7. Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nxWFzQOhjWC8", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d940310d-dfc6-45ae-c5dc-50d0141c836f" + }, + "source": [ + "# We can use nlp.nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj 
sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP 
model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj 
sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj 
sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", 
+ "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + 
"nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IKK_Ii_gjJfF", + "outputId": "53181dbd-eea1-4b84-ead4-ea49a27d903d" + }, + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n", + "# We usually need to train longer and use a smaller LR for non-USE-based sentence embeddings\n", + "# We could tune the hyperparameters further with tuning methods such as grid search\n", + "# Longer training can also improve accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df[:100])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df[:100],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 1.00 0.55 0.71 44\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.98 0.88 0.92 56\n", + "\n", + " accuracy 0.73 100\n", + " macro avg 0.66 0.47 0.54 100\n", + "weighted avg 0.99 0.73 0.83 100\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1jxw3GnVGlI" + }, + "source": [ + "# 7.1 Evaluate on test data" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Fxx4yNkNVGFl", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b9ff1dc4-a29d-4975-9167-0304dfe71278" + }, + "source": [ + "preds = fitted_pipe.predict(test_df[:100],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.94 0.34 0.50 50\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.76 0.64 0.70 50\n", + "\n", + " accuracy 0.49 100\n", + " macro avg 0.57 0.33 0.40 100\n", + "weighted avg 0.85 0.49 0.60 100\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 8. 
Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 9. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlp.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "id": "SO4uz45MoRgp", + "outputId": "a1c68191-a229-4990-e847-a8e9e18d4618" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('It was one of the best films i have ever watched in my entire life !!')\n", + "preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 It was one of the best films i have ever watch... \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [0.09222032874822617, 0.1172066256403923, 0.19... positive \n", + "\n", + " sentiment_confidence \n", + "0 0.0 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_from_disksentimentsentiment_confidence
0It was one of the best films i have ever watch...[0.09222032874822617, 0.1172066256403923, 0.19...positive0.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 8 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "651f1a33-cb6b-413e-db68-6e11338b3077" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "mtDcALorKHIx" + }, + "source": [], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_apple_twitter.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_apple_twitter.ipynb index a949d4ee..98d1aebe 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_apple_twitter.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_apple_twitter.ipynb @@ -1 +1,2598 @@ 
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo_apple_twitter.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"RIV-9vEqxTBB"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_apple_twitter.ipynb)\n","\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 class Apple Tweets sentiment classifier training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n","You can achieve these results or even better on this dataset with training data:\n","\n","
\n","\n","![image.png]()\n","\n","You can achieve these results or even better on this dataset with test data:\n","\n","\n","\n","
\n","\n","\n","![image.png]()\n","\n"]},{"cell_type":"code","metadata":{"id":"05-mAOF6ol-0","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620187468464,"user_tz":-120,"elapsed":103699,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c1058df3-e272-4d87-f61e-0589cfd22b84"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 04:02:45-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","\r- 0%[ ] 0 --.-KB/s \r- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 04:02:45 (39.1 MB/s) - written to stdout [1671/1671]\n","\n","Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","\u001b[K |████████████████████████████████| 204.8MB 73kB/s \n","\u001b[K |████████████████████████████████| 153kB 46.7MB/s \n","\u001b[K |████████████████████████████████| 204kB 21.9MB/s \n","\u001b[K |████████████████████████████████| 204kB 51.4MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download appple twitter Sentiment dataset \n","https://www.kaggle.com/seriousran/appletwittersentimenttexts\n","\n","this dataset contains tweets made towards apple and today we are going to train our model to predict whether the tweet contains sentiment!\n"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620187468785,"user_tz":-120,"elapsed":104015,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c6049d62-5149-4669-8f6e-8d4f0f0b7c63"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/01/apple-twitter-sentiment-texts.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 04:04:28-- http://ckl-it.de/wp-content/uploads/2021/01/apple-twitter-sentiment-texts.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 31678 (31K) [text/csv]\n","Saving to: ‘apple-twitter-sentiment-texts.csv’\n","\n","apple-twitter-senti 100%[===================>] 30.94K --.-KB/s in 0.1s \n","\n","2021-05-05 04:04:28 (250 KB/s) - ‘apple-twitter-sentiment-texts.csv’ saved [31678/31678]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":419},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620187470342,"user_tz":-120,"elapsed":105566,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"dcd0fde1-7df8-462f-bf4f-85174437b1a8"},"source":["import pandas as pd\n","train_path = '/content/apple-twitter-sentiment-texts.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","# the label column must have name 'y' name be of type str\n","columns=['text','y']\n","train_df = train_df[columns]\n","train_df = train_df[~train_df[\"y\"].isin([\"neuteral\"])]\n","from sklearn.model_selection import train_test_split\n","\n","train_df, test_df = train_test_split(train_df, test_size=0.2)\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
texty
36Theyre not RT @Naivana_: You gotta be kidding ...negative
7Thank you @Apple for fixing the #Swift sourcek...positive
272Washed my earphones by accident, still works. ...positive
146@AppleOfficialll @apple I can't say enough abo...positive
237@OneRepublic @Apple So amazingpositive
.........
247Got to hear about the new patent by apple ... ...positive
157fuck you @applenegative
93#apple earns more #profit in on quarter than #...positive
165This is why I moved over to @Apple... http://t...positive
52I don't know what you're trying to do @apple b...negative
\n","

228 rows × 2 columns

\n","
"],"text/plain":[" text y\n","36 Theyre not RT @Naivana_: You gotta be kidding ... negative\n","7 Thank you @Apple for fixing the #Swift sourcek... positive\n","272 Washed my earphones by accident, still works. ... positive\n","146 @AppleOfficialll @apple I can't say enough abo... positive\n","237 @OneRepublic @Apple So amazing positive\n",".. ... ...\n","247 Got to hear about the new patent by apple ... ... positive\n","157 fuck you @apple negative\n","93 #apple earns more #profit in on quarter than #... positive\n","165 This is why I moved over to @Apple... http://t... positive\n","52 I don't know what you're trying to do @apple b... negative\n","\n","[228 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620188334902,"user_tz":-120,"elapsed":10389,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"5fb3d813-68c4-4fc1-fd1a-d19b98dd828c"},"source":["from sklearn.metrics import classification_report\n","import nlu \n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a datset with label and text columns\n","# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","#sentence detector that is part of the pipe generates sone NaNs. lets drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 23.0\n"," neutral 0.00 0.00 0.00 0.0\n"," positive 0.00 0.00 0.00 27.0\n","\n"," accuracy 0.00 50.0\n"," macro avg 0.00 0.00 0.00 50.0\n","weighted avg 0.00 0.00 0.00 50.0\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
textsentence_embedding_usesentenceyorigin_indexdocumenttrained_sentimenttrained_sentiment_confidence
0Theyre not RT @Naivana_: You gotta be kidding ...[0.053627125918865204, 0.03388773649930954, 0....[Theyre not RT @Naivana_:, You gotta be kiddi...negative36Theyre not RT @Naivana_: You gotta be kidding ...neutral0.519284
1Thank you @Apple for fixing the #Swift sourcek...[0.01400269940495491, 0.04662228375673294, 0.0...[Thank you @Apple for fixing the #Swift source...positive7Thank you @Apple for fixing the #Swift sourcek...neutral0.526175
2Washed my earphones by accident, still works. ...[0.028794147074222565, 0.060343481600284576, -...[Washed my earphones by accident, still works....positive272Washed my earphones by accident, still works. ...neutral0.546108
3@AppleOfficialll @apple I can't say enough abo...[0.04390808567404747, 0.0024722537491470575, -...[@AppleOfficialll @apple I can't say enough ab...positive146@AppleOfficialll @apple I can't say enough abo...neutral0.535401
4@OneRepublic @Apple So amazing[-0.0420607253909111, 0.05555792152881622, -0....[@OneRepublic @Apple So amazing]positive237@OneRepublic @Apple So amazingneutral0.562668
5The 10 biggest differences between #Mac and #P...[0.046186886727809906, 0.0493854284286499, 0.0...[The 10 biggest differences between #Mac and #...positive213The 10 biggest differences between #Mac and #P...neutral0.541398
6@SamJam Agreed--have to give props to @Apple f...[0.06067856401205063, -0.014465369284152985, -...[@SamJam Agreed--have to give props to @Apple ...positive77@SamJam Agreed--have to give props to @Apple f...neutral0.553645
7@OneRepublic @Apple that looks amazing, don't ...[0.016945000737905502, -0.019095521420240402, ...[@OneRepublic @Apple that looks amazing, don't...positive56@OneRepublic @Apple that looks amazing, don't ...neutral0.548541
8Finally! Brooklyn Is Getting Its Own #Apple St...[0.06617175042629242, -0.018538057804107666, -...[Finally!, Brooklyn Is Getting Its Own #Apple...positive234Finally! Brooklyn Is Getting Its Own #Apple St...neutral0.531661
9RT @ehsonakbary: THE WORLD'S FIRST MURDER VIA ...[0.07017797976732254, 0.03393903002142906, -0....[RT @ehsonakbary: THE WORLD'S FIRST MURDER VIA...negative103RT @ehsonakbary: THE WORLD'S FIRST MURDER VIA ...neutral0.541381
10@SwiftKey @Apple Oh In wish you guys would por...[0.04137858748435974, -0.03665991127490997, -0...[@SwiftKey @Apple Oh In wish you guys would po...negative208@SwiftKey @Apple Oh In wish you guys would por...neutral0.532717
11The king of the phablets! Apple's iPhone 6 plu...[0.07199912518262863, 0.0021514107938855886, -...[The king of the phablets!, Apple's iPhone 6 ...positive250The king of the phablets! Apple's iPhone 6 plu...neutral0.556155
12iPhone6 fell 2 ft. Screen shattered like it wa...[0.0498758926987648, 0.02919364534318447, -0.0...[iPhone6 fell 2 ft., Screen shattered like it ...negative263iPhone6 fell 2 ft. Screen shattered like it wa...neutral0.525769
13Shockingly, iMessage on the desktop is fucked ...[0.056961141526699066, -0.02297591045498848, -...[Shockingly, iMessage on the desktop is fucked...negative107Shockingly, iMessage on the desktop is fucked ...neutral0.515020
14Yeeaaayyy....awesome OS X Yosemite 10.10.1 roc...[0.03625739365816116, 0.0035043705720454454, -...[Yeeaaayyy., ., ., .awesome OS X Yosemite 10.1...positive285Yeeaaayyy....awesome OS X Yosemite 10.10.1 roc...neutral0.542655
15@OneRepublic @Apple Thanks for sharing pics. A...[0.020150689408183098, -0.03949593007564545, 0...[@OneRepublic @Apple Thanks for sharing pics.,...positive138@OneRepublic @Apple Thanks for sharing pics. A...neutral0.568121
16@apple @iBooks is awesome. Thank You![0.0218979399651289, 0.005902431905269623, -0....[@apple @iBooks is awesome., Thank You!]positive198@apple @iBooks is awesome. Thank You!neutral0.559235
17Why I Hate Apple - http://t.co/IbpyJXaOuW #AAP...[0.055701617151498795, 0.04886789247393608, -0...[Why I Hate Apple - http://t.co/IbpyJXaOuW #AA...negative109Why I Hate Apple - http://t.co/IbpyJXaOuW #AAP...neutral0.525435
18yo @Apple , why yall make it so hard to grip t...[0.07655678689479828, 0.0038530321326106787, -...[yo @Apple , why yall make it so hard to grip ...negative225yo @Apple , why yall make it so hard to grip t...neutral0.526321
19@apple Why is your NYC Grand Central store so ...[0.06858145445585251, 0.052669648081064224, 0....[@apple Why is your NYC Grand Central store so...negative131@apple Why is your NYC Grand Central store so ...neutral0.518131
20Love the @Apple is supporting #HourOfCode with...[0.01483983639627695, -0.00843955110758543, -0...[Love the @Apple is supporting #HourOfCode wit...positive219Love the @Apple is supporting #HourOfCode with...neutral0.543223
21Companies i admire : @3QDigital @vaynermedia @...[0.015711167827248573, 0.064913809299469, 0.01...[Companies i admire : @3QDigital @vaynermedia ...positive19Companies i admire : @3QDigital @vaynermedia @...neutral0.539861
22These Damn @Apple Commercials Are Getting Wors...[0.00820611510425806, 0.004215271212160587, -0...[These Damn @Apple Commercials Are Getting Wor...negative142These Damn @Apple Commercials Are Getting Wors...neutral0.513193
23Free s/o @apple for this nice iPad[0.05512142553925514, 0.011973769403994083, -0...[Free s/o @apple for this nice iPad]positive57Free s/o @apple for this nice iPadneutral0.543708
24if I tweet about NANDOS my phone puts it in ca...[0.021839885041117668, 0.07290177792310715, -0...[if I tweet about NANDOS my phone puts it in c...positive140if I tweet about NANDOS my phone puts it in ca...neutral0.538120
25RT @saigeist: the most offensive thing is the ...[0.043079257011413574, -0.045617833733558655, ...[RT @saigeist: the most offensive thing is the...negative178RT @saigeist: the most offensive thing is the ...neutral0.512527
26.@tim_cook That rage when dealing @Apple Geniu...[0.05030369386076927, 0.023012423887848854, -0...[., @tim_cook That rage when dealing @Apple Ge...negative96.@tim_cook That rage when dealing @Apple Geniu...neutral0.517721
27I hate my MacBook now. Fuck this update and fu...[0.04070044681429863, 0.02998330444097519, 0.0...[I hate my MacBook now., Fuck this update and...negative155I hate my MacBook now. Fuck this update and fu...neutral0.510436
28Kantar: iPhone 6 helps Apple gain share over A...[0.06512415409088135, -0.009421378374099731, -...[Kantar: iPhone 6 helps Apple gain share over ...positive41Kantar: iPhone 6 helps Apple gain share over A...neutral0.550157
29@tehhGOAT @Apple do you by accident have me bl...[-0.019428884610533714, 0.055587127804756165, ...[@tehhGOAT @Apple do you by accident have me b...negative17@tehhGOAT @Apple do you by accident have me bl...neutral0.506774
30@fullcircleone ThanX! Big @Apple ThanX goes 2 ...[0.0020813194569200277, -0.025028454139828682,...[@fullcircleone ThanX!, Big @Apple ThanX goes...positive194@fullcircleone ThanX! Big @Apple ThanX goes 2 ...neutral0.553299
31my phone keeps restarting on its own @apple do...[-0.0074605816043913364, 0.026783859357237816,...[my phone keeps restarting on its own @apple d...negative135my phone keeps restarting on its own @apple do...neutral0.513831
32my dad called now my musics arent playing jesu...[0.008138233795762062, -0.026052597910165787, ...[my dad called now my musics arent playing jes...negative187my dad called now my musics arent playing jesu...neutral0.519269
33@apple #greatservice thanks for helping me out...[0.02643176168203354, 0.015369059517979622, -0...[@apple #greatservice thanks for helping me ou...positive201@apple #greatservice thanks for helping me out...neutral0.540207
34@Apple fix addis' phone so we can text each ot...[0.041290853172540665, 0.037915393710136414, -...[@Apple fix addis' phone so we can text each o...negative261@Apple fix addis' phone so we can text each ot...neutral0.521058
35Sorry @samsung but I will be taking my smartph...[0.07281249761581421, 0.0376746729016304, 0.01...[Sorry @samsung but I will be taking my smartp...positive5Sorry @samsung but I will be taking my smartph...neutral0.554886
36@LittleWordBites @TravlandLeisure @Apple &gt;...[0.0581418015062809, -0.042348094284534454, -0...[@LittleWordBites @TravlandLeisure @Apple &gt;...positive195@LittleWordBites @TravlandLeisure @Apple &gt; ...neutral0.571530
37RT @_emilymahon: @nosheenh2oo9 @Apple this is ...[0.05870470777153969, 0.061276875436306, -0.02...[RT @_emilymahon: @nosheenh2oo9 @Apple this is...positive8RT @_emilymahon: @nosheenh2oo9 @Apple this is ...neutral0.546566
38@FaZeNikan @Apple lol iPhone,weak. Get on that...[0.07184943556785583, -0.025483926758170128, -...[@FaZeNikan @Apple lol iPhone,weak., Get on t...negative191@FaZeNikan @Apple lol iPhone,weak. Get on that...neutral0.526558
39RT @hsmoghul: My @apple autocorrect changes Mu...[0.031838081777095795, 0.009894482791423798, 0...[RT @hsmoghul:, My @apple autocorrect changes...positive149RT @hsmoghul: My @apple autocorrect changes Mu...neutral0.541348
40RT @_iamGambino: Thank you @Apple[0.018735762685537338, 0.07813401520252228, -0...[RT @_iamGambino: Thank you @Apple]positive212RT @_iamGambino: Thank you @Appleneutral0.538737
41@lanadelreystan KILL YOURSELF @apple[-0.018729694187641144, 0.05311402678489685, -...[@lanadelreystan KILL YOURSELF @apple]negative245@lanadelreystan KILL YOURSELF @appleneutral0.526904
42Great time had @Apple store on Friday. @Russel...[0.08689077943563461, 0.0028835677076131105, -...[Great time had @Apple store on Friday., @Rus...positive72Great time had @Apple store on Friday. @Russel...neutral0.557536
43@CharlesJMeyer @Apple @Appy_Geek Hasn't Apple ...[0.05127181485295296, 0.03388502821326256, -0....[@CharlesJMeyer @Apple @Appy_Geek Hasn't Apple...negative181@CharlesJMeyer @Apple @Appy_Geek Hasn't Apple ...neutral0.518989
44Left the hoos we 100% now got 50 iPhones are d...[0.009334878996014595, 0.06441717594861984, 0....[Left the hoos we 100% now got 50 iPhones are ...negative86Left the hoos we 100% now got 50 iPhones are d...neutral0.526055
45@apple why is it that your iPhone's alarm fail...[0.013754838146269321, 0.05925201624631882, -0...[@apple why is it that your iPhone's alarm fai...negative69@apple why is it that your iPhone's alarm fail...neutral0.515201
46Changing words that aren't even misspelled lik...[0.005594303365796804, 0.03357389196753502, -0...[Changing words that aren't even misspelled li...negative50Changing words that aren't even misspelled lik...neutral0.510925
47that yosemite update is so annoying i regret i...[0.011035434901714325, -0.046652231365442276, ...[that yosemite update is so annoying i regret ...negative159that yosemite update is so annoying i regret i...neutral0.512430
48Photo: Amazing customer service today @Apple. ...[0.0008898481610231102, 0.025655869394540787, ...[Photo: Amazing customer service today @Apple....positive256Photo: Amazing customer service today @Apple. ...neutral0.563457
49Awesome! @Apple invented peanut-butter-sandwic...[0.0439443401992321, 0.043788887560367584, -0....[Awesome!, @Apple invented peanut-butter-sandw...positive20Awesome! @Apple invented peanut-butter-sandwic...neutral0.547759
\n","
"],"text/plain":[" text ... trained_sentiment_confidence\n","0 Theyre not RT @Naivana_: You gotta be kidding ... ... 0.519284\n","1 Thank you @Apple for fixing the #Swift sourcek... ... 0.526175\n","2 Washed my earphones by accident, still works. ... ... 0.546108\n","3 @AppleOfficialll @apple I can't say enough abo... ... 0.535401\n","4 @OneRepublic @Apple So amazing ... 0.562668\n","5 The 10 biggest differences between #Mac and #P... ... 0.541398\n","6 @SamJam Agreed--have to give props to @Apple f... ... 0.553645\n","7 @OneRepublic @Apple that looks amazing, don't ... ... 0.548541\n","8 Finally! Brooklyn Is Getting Its Own #Apple St... ... 0.531661\n","9 RT @ehsonakbary: THE WORLD'S FIRST MURDER VIA ... ... 0.541381\n","10 @SwiftKey @Apple Oh In wish you guys would por... ... 0.532717\n","11 The king of the phablets! Apple's iPhone 6 plu... ... 0.556155\n","12 iPhone6 fell 2 ft. Screen shattered like it wa... ... 0.525769\n","13 Shockingly, iMessage on the desktop is fucked ... ... 0.515020\n","14 Yeeaaayyy....awesome OS X Yosemite 10.10.1 roc... ... 0.542655\n","15 @OneRepublic @Apple Thanks for sharing pics. A... ... 0.568121\n","16 @apple @iBooks is awesome. Thank You! ... 0.559235\n","17 Why I Hate Apple - http://t.co/IbpyJXaOuW #AAP... ... 0.525435\n","18 yo @Apple , why yall make it so hard to grip t... ... 0.526321\n","19 @apple Why is your NYC Grand Central store so ... ... 0.518131\n","20 Love the @Apple is supporting #HourOfCode with... ... 0.543223\n","21 Companies i admire : @3QDigital @vaynermedia @... ... 0.539861\n","22 These Damn @Apple Commercials Are Getting Wors... ... 0.513193\n","23 Free s/o @apple for this nice iPad ... 0.543708\n","24 if I tweet about NANDOS my phone puts it in ca... ... 0.538120\n","25 RT @saigeist: the most offensive thing is the ... ... 0.512527\n","26 .@tim_cook That rage when dealing @Apple Geniu... ... 0.517721\n","27 I hate my MacBook now. Fuck this update and fu... ... 
0.510436\n","28 Kantar: iPhone 6 helps Apple gain share over A... ... 0.550157\n","29 @tehhGOAT @Apple do you by accident have me bl... ... 0.506774\n","30 @fullcircleone ThanX! Big @Apple ThanX goes 2 ... ... 0.553299\n","31 my phone keeps restarting on its own @apple do... ... 0.513831\n","32 my dad called now my musics arent playing jesu... ... 0.519269\n","33 @apple #greatservice thanks for helping me out... ... 0.540207\n","34 @Apple fix addis' phone so we can text each ot... ... 0.521058\n","35 Sorry @samsung but I will be taking my smartph... ... 0.554886\n","36 @LittleWordBites @TravlandLeisure @Apple >... ... 0.571530\n","37 RT @_emilymahon: @nosheenh2oo9 @Apple this is ... ... 0.546566\n","38 @FaZeNikan @Apple lol iPhone,weak. Get on that... ... 0.526558\n","39 RT @hsmoghul: My @apple autocorrect changes Mu... ... 0.541348\n","40 RT @_iamGambino: Thank you @Apple ... 0.538737\n","41 @lanadelreystan KILL YOURSELF @apple ... 0.526904\n","42 Great time had @Apple store on Friday. @Russel... ... 0.557536\n","43 @CharlesJMeyer @Apple @Appy_Geek Hasn't Apple ... ... 0.518989\n","44 Left the hoos we 100% now got 50 iPhones are d... ... 0.526055\n","45 @apple why is it that your iPhone's alarm fail... ... 0.515201\n","46 Changing words that aren't even misspelled lik... ... 0.510925\n","47 that yosemite update is so annoying i regret i... ... 0.512430\n","48 Photo: Amazing customer service today @Apple. ... ... 0.563457\n","49 Awesome! @Apple invented peanut-butter-sandwic... ... 0.547759\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["#4. 
Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"id":"qdCUg2MR0PD2","colab":{"base_uri":"https://localhost:8080/","height":80},"executionInfo":{"status":"ok","timestamp":1620188335247,"user_tz":-120,"elapsed":10456,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"31d1c18f-b30b-4507-cd1c-4aa13c0bec6d"},"source":["fitted_pipe.predict('I hate the newest update')"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentence_embedding_usesentenceorigin_indexdocumenttrained_sentimenttrained_sentiment_confidence
0[-0.023322951048612595, -0.04157407209277153, ...[I hate the newest update]0I hate the newest updateneutral0.509875
\n","
"],"text/plain":[" sentence_embedding_use ... trained_sentiment_confidence\n","0 [-0.023322951048612595, -0.04157407209277153, ... ... 0.509875\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["##5. Configure pipe training parameters"]},{"cell_type":"code","metadata":{"id":"UtsAUGTmOTms","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620188335248,"user_tz":-120,"elapsed":10187,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"f0702b13-6ac0-4a3d-87eb-57924a6760b1"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@41fb48fb) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@41fb48fb\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## 6. Retrain with new parameters"]},{"cell_type":"code","metadata":{"id":"mptfvHx-MMMX","colab":{"base_uri":"https://localhost:8080/","height":759},"executionInfo":{"status":"ok","timestamp":1620188339611,"user_tz":-120,"elapsed":14279,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c580024a-2436-4b70-b7bb-c121cd3d6e80"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:100])\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:100],output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.94 0.94 0.94 47\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.98 0.91 0.94 53\n","\n"," accuracy 0.92 100\n"," macro avg 0.64 0.61 0.63 100\n","weighted avg 0.96 0.92 0.94 100\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
textsentence_embedding_usesentenceyorigin_indexdocumenttrained_sentimenttrained_sentiment_confidence
0Theyre not RT @Naivana_: You gotta be kidding ...[0.053627125918865204, 0.03388773649930954, 0....[Theyre not RT @Naivana_:, You gotta be kiddi...negative36Theyre not RT @Naivana_: You gotta be kidding ...negative0.947914
1Thank you @Apple for fixing the #Swift sourcek...[0.01400269940495491, 0.04662228375673294, 0.0...[Thank you @Apple for fixing the #Swift source...positive7Thank you @Apple for fixing the #Swift sourcek...positive0.926184
2Washed my earphones by accident, still works. ...[0.028794147074222565, 0.060343481600284576, -...[Washed my earphones by accident, still works....positive272Washed my earphones by accident, still works. ...positive0.867718
3@AppleOfficialll @apple I can't say enough abo...[0.04390808567404747, 0.0024722537491470575, -...[@AppleOfficialll @apple I can't say enough ab...positive146@AppleOfficialll @apple I can't say enough abo...negative0.707370
4@OneRepublic @Apple So amazing[-0.0420607253909111, 0.05555792152881622, -0....[@OneRepublic @Apple So amazing]positive237@OneRepublic @Apple So amazingpositive0.989958
...........................
95Apple Inc. CEO Donates $291K To Pennsylvania S...[0.036090489476919174, 0.033749453723430634, -...[Apple Inc. CEO Donates $291K To Pennsylvania ...positive4Apple Inc. CEO Donates $291K To Pennsylvania S...neutral0.598268
96Photo: Yaaass. Shoutout to @apple.holidays and...[0.062088269740343094, -0.0338711179792881, -0...[Photo:, Yaaass., Shoutout to @apple., holiday...positive275Photo: Yaaass. Shoutout to @apple.holidays and...positive0.998650
97RT @hypebot: Steve Job's Deposition in #iPod L...[0.06204485893249512, 0.05829763785004616, -0....[RT @hypebot: Steve Job's Deposition in #iPod ...negative141RT @hypebot: Steve Job's Deposition in #iPod L...neutral0.581826
98@apple I have been on hold for 30 minutes than...[0.02501610852777958, 0.04794774204492569, -0....[@apple I have been on hold for 30 minutes tha...negative128@apple I have been on hold for 30 minutes than...negative0.933449
99Wow. Yall needa step it up @Apple RT @heynyla:...[0.027452174574136734, -0.004120523110032082, ...[Wow. Yall needa step it up @Apple RT @heynyla...negative1Wow. Yall needa step it up @Apple RT @heynyla:...negative0.729602
\n","

100 rows × 8 columns

\n","
"],"text/plain":[" text ... trained_sentiment_confidence\n","0 Theyre not RT @Naivana_: You gotta be kidding ... ... 0.947914\n","1 Thank you @Apple for fixing the #Swift sourcek... ... 0.926184\n","2 Washed my earphones by accident, still works. ... ... 0.867718\n","3 @AppleOfficialll @apple I can't say enough abo... ... 0.707370\n","4 @OneRepublic @Apple So amazing ... 0.989958\n",".. ... ... ...\n","95 Apple Inc. CEO Donates $291K To Pennsylvania S... ... 0.598268\n","96 Photo: Yaaass. Shoutout to @apple.holidays and... ... 0.998650\n","97 RT @hypebot: Steve Job's Deposition in #iPod L... ... 0.581826\n","98 @apple I have been on hold for 30 minutes than... ... 0.933449\n","99 Wow. Yall needa step it up @Apple RT @heynyla:... ... 0.729602\n","\n","[100 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["#7. Try training with different Embeddings"]},{"cell_type":"code","metadata":{"id":"nxWFzQOhjWC8","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620188339931,"user_tz":-120,"elapsed":14332,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"b6acf5d2-2f7e-4200-a6eb-24c9c4b05592"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possibler sentence embedding we could use. 
Lets use bert!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model 
sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model 
sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"eLex095goHwm","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620188495786,"user_tz":-120,"elapsed":170049,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"5043ddfe-4d7a-472d-c7b9-8cf34cc1bb06"},"source":["trainable_pipe = 
nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# We usually need to train longer and use a smaller LR for non-USE sentence embeddings\n","# We could tune the hyperparameters further with hyperparameter tuning methods like grid search\n","# Longer training also tends to improve accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(110) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.97 0.83 0.90 114\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.88 0.93 0.90 114\n","\n"," accuracy 0.88 228\n"," macro avg 0.62 0.59 0.60 228\n","weighted avg 0.92 0.88 0.90 228\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_1jxw3GnVGlI"},"source":["# 7.1 Evaluate on test data"]},{"cell_type":"code","metadata":{"id":"Fxx4yNkNVGFl","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620188505405,"user_tz":-120,"elapsed":179404,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"35e0f9c4-8486-4b24-b00e-387c5f7af5dd"},"source":["preds = 
fitted_pipe.predict(test_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.94 0.59 0.72 29\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.79 0.93 0.86 29\n","\n"," accuracy 0.76 58\n"," macro avg 0.58 0.51 0.53 58\n","weighted avg 0.87 0.76 0.79 58\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 8. Let's save the model"]},{"cell_type":"code","metadata":{"id":"bZZpObLOtqo8","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620188675266,"user_tz":-120,"elapsed":348933,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"776368a9-02df-4524-e80d-b004a10c4306"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 9. Let's load the model from HDD.\n","This makes offline NLU usage possible! 
\n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"id":"SO4uz45MoRgp","colab":{"base_uri":"https://localhost:8080/","height":80},"executionInfo":{"status":"ok","timestamp":1620188686839,"user_tz":-120,"elapsed":360106,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"32ac7000-0278-40e0-fcc2-5eef5cb56094"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('I hate the newest update')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
textsentimentsentenceorigin_indexdocumentsentence_embedding_from_disksentiment_confidence
0I hate the newest update[negative][I hate the newest update]8589934592I hate the newest update[[-0.3084237277507782, -0.11030610650777817, 0...[0.67131346]
\n","
"],"text/plain":[" text ... sentiment_confidence\n","0 I hate the newest update ... [0.67131346]\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":14}]},{"cell_type":"code","metadata":{"id":"e0CVlkk9v6Qi","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620188686840,"user_tz":-120,"elapsed":359738,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"b7915040-d049-4447-c1aa-21dcb0eb27db"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3102445a) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3102445a\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"cbwS4bE0uT7d"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "RIV-9vEqxTBB" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_apple_twitter.ipynb)\n", + "\n", + "\n", + "\n", + "# Training a Sentiment Analysis Classifier with NLU\n", + "## 2 class Apple Tweets sentiment classifier training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem\n", + "\n", + "This notebook showcases the following features :\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (Enables NLU offline mode)\n", + "\n", + "You can achieve these results or even better on this dataset with training data:\n", + "\n", + "
\n", + "\n", + "![image.png]()\n", + "\n", + "You can achieve these results or even better on this dataset with test data:\n", + "\n", + "\n", + "\n", + "
\n", + "\n", + "\n", + "![image.png]()\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "05-mAOF6ol-0" + }, + "source": [ + "!pip install -q johnsnowlabs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download the Apple Twitter sentiment dataset\n", + "https://www.kaggle.com/seriousran/appletwittersentimenttexts\n", + "\n", + "This dataset contains tweets directed at Apple. We will train a model to predict whether a tweet's sentiment is positive or negative.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "source": [ + "! wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/apple-twitter/apple-twitter-sentiment-texts.csv\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "fd429b0f-a052-4017-e497-aa73d3566f3e" + }, + "source": [ + "import pandas as pd\n", + "train_path = '/content/apple-twitter-sentiment-texts.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text'\n", + "# the label column must be named 'y' and be of type str\n", + "columns=['text','y']\n", + "train_df = train_df[columns]\n", + "train_df = train_df[~train_df[\"y\"].isin([\"neutral\"])]\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_df, test_df = train_test_split(train_df, test_size=0.2)\n", + "train_df" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " text y\n", + "44 I'm surprised there isn't more talk about what... negative\n", + "691 over 1 hour on hold with @apple customer servi... 
negative\n", + "1371 I'm still mad @apple negative\n", + "870 @jokigenki @Apple I think it's like 2011? Can'... negative\n", + "1226 @apple #ios8 The lack of true keyboard integra... negative\n", + "... ... ...\n", + "1392 itunes is awful & is ruining my life fix y... negative\n", + "733 Happy Monday! My camera on my fancy @Apple #iP... negative\n", + "503 Phone just died while it was plug in. @apple w... negative\n", + "634 Whoever downgrades from a iphone 6 to a 5S obv... negative\n", + "891 I can't queue songs on an iPhone @apple @timco... negative\n", + "\n", + "[663 rows x 2 columns]" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
texty
44I'm surprised there isn't more talk about what...negative
691over 1 hour on hold with @apple customer servi...negative
1371I'm still mad @applenegative
870@jokigenki @Apple I think it's like 2011? Can'...negative
1226@apple #ios8 The lack of true keyboard integra...negative
.........
1392itunes is awful &amp; is ruining my life fix y...negative
733Happy Monday! My camera on my fancy @Apple #iP...negative
503Phone just died while it was plug in. @apple w...negative
634Whoever downgrades from a iphone 6 to a 5S obv...negative
891I can't queue songs on an iPhone @apple @timco...negative
\n", + "

663 rows × 2 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. Train Deep Learning Classifier using nlp.load('train.sentiment')\n", + "\n", + "Your dataset's label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "463803c9-0441-4007-c5c1-133b15f038b4" + }, + "source": [ + "from sklearn.metrics import classification_report\n", + "from johnsnowlabs import nlp\n", + "\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# sentence embeddings are generated with a default model (sent_small_bert_L2_128 in the recorded output below)\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "# the sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.82 1.00 0.90 41\n", + " positive 0.00 0.00 0.00 9\n", + "\n", + " accuracy 0.82 50\n", + " macro avg 0.41 0.50 0.45 50\n", + "weighted avg 0.67 0.82 0.74 50\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 @apple fucking let everyone name the group cha... \n", + "1 As a die hard @Apple customer, I must say I am... \n", + "2 RT @_iamGambino: Thank you @Apple \n", + "3 YO YOU AINT SHIT @apple \n", + "4 Theyre not RT @Naivana_: You gotta be kidding ... \n", + "5 My MacBook Pro is now as annoying as my ASUS W... \n", + "6 It's just dawned on me that I've probably spen... \n", + "7 Hey @apple @sprint I'm not a fan of your lates... \n", + "8 @apple y'all shitty \n", + "9 YO I DIDNT TOUCH MY PHONE AT ALL AND RIGHT WHE... \n", + "10 FUCK @apple \n", + "11 my iphone6 plus is impossible to hold without ... \n", + "12 @apple last time I checked I thought I bought ... \n", + "13 just need @apple to come out with a charger co... \n", + "14 RT @peterpham: Bought my @AugustSmartLock at t... \n", + "15 Apple's iPhone 6 Plus Amazingly Captures 41% o... \n", + "16 @SamJam Agreed--have to give props to @Apple f... \n", + "17 @jakeflem @Apple Yes It seems to fix it good t... \n", + "18 It shouldn't take a whole week to replace my h... \n", + "19 @nigxnog @Apple bruh that means u type that a ... \n", + "20 I don't undestand how @SYFNews #CreditCare web... 
\n", + "21 Steve Jobs Predicted Future Of E-Commerce Back... \n", + "22 These Damn @Apple Commercials Are Getting Wors... \n", + "23 Those** PICK UP THE SLACK YOU FUCK BOYS @Apple \n", + "24 hey @apple can I catch a fucking maverick of a... \n", + "25 @apple#ipad #irig For the price to connect my ... \n", + "26 fucking @apple are memer FAGGOTS http://t.co/w... \n", + "27 Safari just crashed on me and I didn't have an... \n", + "28 @Apple deleted users' non-#iTunes music and di... \n", + "29 @PCAudioLabs is in and @Apple is out at Emanon... \n", + "30 @jokigenki @Apple I think it's like 2011? Can'... \n", + "31 RT @tschwettman: hey @apple why won't you let ... \n", + "32 @wastwater1 l agree with you, they're about as... \n", + "33 How do I log into iCloud on my phone? Not the ... \n", + "34 Shout out to @Apple for making crappy iPhone a... \n", + "35 We're so excited to be named to @Apple's 'App ... \n", + "36 my iPhone is fucked. Thanks @apple and @EE wha... \n", + "37 iPhone6 fell 2 ft. Screen shattered like it wa... \n", + "38 CNBCTV: #Chromecast beats #AppleTV #aapl http:... \n", + "39 It makes you smarter. Elevate is @apple app of... \n", + "40 IPhone6 has too many issues, why tf is the ear... \n", + "41 Great time had @Apple store on Friday. @Russel... \n", + "42 Hey @apple are you even thinking about fixing ... \n", + "43 . @apple I don't think the '59' should be so c... \n", + "44 .@apple why do your computers like to crash on... \n", + "45 Ha, poor @apple trying to make its iPad and Ma... \n", + "46 You'll be back in my life soon @apple \n", + "47 iTunes is pissing me tf off @apple \n", + "48 RT @bchmura12: .@apple you suck \n", + "49 Updated to Yosemite on two machines. Both are ... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-1.5767942667007446, -0.2661866843700409, 0.1... negative \n", + "1 [-0.1864045411348343, 0.37810075283050537, -0.... negative \n", + "2 [-0.39863264560699463, -0.018525924533605576, ... 
negative \n", + "3 [-1.3272020816802979, -0.6504784226417542, -0.... negative \n", + "4 [-0.26906388998031616, 0.16873443126678467, 0.... negative \n", + "5 [-0.7138646841049194, -0.03021504543721676, 0.... negative \n", + "6 [-0.08956478536128998, -0.09104951471090317, -... negative \n", + "7 [-0.6237524747848511, 0.10997672379016876, 0.6... negative \n", + "8 [-1.721300482749939, -1.328816533088684, 0.083... negative \n", + "9 [-0.7379449605941772, 0.2959931790828705, -0.4... negative \n", + "10 [-1.0854371786117554, -0.38098257780075073, 0.... negative \n", + "11 [-0.06848365068435669, 0.4246528148651123, 0.1... negative \n", + "12 [-1.1634764671325684, 0.5120655298233032, -0.0... negative \n", + "13 [-0.5657770037651062, 0.8361220359802246, -0.1... negative \n", + "14 [-0.43666037917137146, 0.7088866829872131, 0.4... negative \n", + "15 [-0.1403159201145172, 0.65235835313797, 0.1092... negative \n", + "16 [-0.2358071506023407, 0.060792725533246994, -0... negative \n", + "17 [-1.220468521118164, 0.09469269961118698, -0.0... negative \n", + "18 [-1.0816304683685303, 0.6302781701087952, -0.4... negative \n", + "19 [-1.056034803390503, 0.15630772709846497, 0.51... negative \n", + "20 [-0.6713889241218567, 0.5271422266960144, 0.07... negative \n", + "21 [-0.3053719401359558, 0.9345525503158569, 0.21... negative \n", + "22 [-0.8299529552459717, -0.4301386773586273, -0.... negative \n", + "23 [-1.3581013679504395, -0.15301916003227234, 0.... negative \n", + "24 [-0.6373146772384644, 0.5904020667076111, 0.28... negative \n", + "25 [-0.5737035870552063, 0.6071469783782959, 0.06... negative \n", + "26 [-0.04430821165442467, 0.44264474511146545, 0.... negative \n", + "27 [-0.32811102271080017, 0.6601451635360718, -0.... negative \n", + "28 [-0.3346312940120697, 0.2459762543439865, 0.54... negative \n", + "29 [-0.05351848155260086, 0.6132703423500061, 0.0... negative \n", + "30 [-0.9171913862228394, -0.03775791823863983, 0.... 
negative \n", + "31 [-0.8461720943450928, 0.45761561393737793, 0.1... negative \n", + "32 [-0.5620263814926147, 0.1833726316690445, 0.10... negative \n", + "33 [-1.0261833667755127, 0.39792588353157043, -0.... negative \n", + "34 [-1.0023715496063232, 0.22417372465133667, 0.0... negative \n", + "35 [-0.3373744487762451, 0.1699923425912857, 0.39... negative \n", + "36 [-0.6871265769004822, 0.16453255712985992, -0.... negative \n", + "37 [-0.38180163502693176, 0.26194843649864197, -0... negative \n", + "38 [0.25390008091926575, 0.2027512639760971, 0.71... negative \n", + "39 [-0.3650900721549988, -0.056419190019369125, 0... negative \n", + "40 [-0.5127007961273193, 0.68586665391922, -0.193... negative \n", + "41 [-0.8063774704933167, 0.4269622266292572, 0.51... negative \n", + "42 [-0.9981245994567871, -0.04270806908607483, 0.... negative \n", + "43 [-1.193355679512024, 0.2766774296760559, -0.14... negative \n", + "44 [-1.2451688051223755, -0.2248333841562271, -0.... negative \n", + "45 [-0.7224618196487427, 0.03269221633672714, -0.... negative \n", + "46 [-1.220320701599121, -0.1481868475675583, -0.3... negative \n", + "47 [-1.0455917119979858, 0.08097667992115021, 0.3... negative \n", + "48 [-0.9056015014648438, -0.7452876567840576, 0.6... negative \n", + "49 [-0.5752751231193542, 0.05041218921542168, -0.... negative \n", + "\n", + " sentiment_confidence text \\\n", + "0 7.0 @apple fucking let everyone name the group cha... \n", + "1 5.0 As a die hard @Apple customer, I must say I am... \n", + "2 9.0 RT @_iamGambino: Thank you @Apple \n", + "3 1.0 YO YOU AINT SHIT @apple \n", + "4 3.0 Theyre not RT @Naivana_: You gotta be kidding ... \n", + "5 1.0 My MacBook Pro is now as annoying as my ASUS W... \n", + "6 3.0 It's just dawned on me that I've probably spen... \n", + "7 3.0 Hey @apple @sprint I'm not a fan of your lates... \n", + "8 1.0 @apple y'all shitty \n", + "9 8.0 YO I DIDNT TOUCH MY PHONE AT ALL AND RIGHT WHE... 
\n", + "10 2.0 FUCK @apple \n", + "11 1.0 my iphone6 plus is impossible to hold without ... \n", + "12 1.0 @apple last time I checked I thought I bought ... \n", + "13 3.0 just need @apple to come out with a charger co... \n", + "14 1.0 RT @peterpham: Bought my @AugustSmartLock at t... \n", + "15 2.0 Apple's iPhone 6 Plus Amazingly Captures 41% o... \n", + "16 4.0 @SamJam Agreed--have to give props to @Apple f... \n", + "17 2.0 @jakeflem @Apple Yes It seems to fix it good t... \n", + "18 1.0 It shouldn't take a whole week to replace my h... \n", + "19 1.0 @nigxnog @Apple bruh that means u type that a ... \n", + "20 1.0 I don't undestand how @SYFNews #CreditCare web... \n", + "21 6.0 Steve Jobs Predicted Future Of E-Commerce Back... \n", + "22 4.0 These Damn @Apple Commercials Are Getting Wors... \n", + "23 5.0 Those** PICK UP THE SLACK YOU FUCK BOYS @Apple \n", + "24 4.0 hey @apple can I catch a fucking maverick of a... \n", + "25 6.0 @apple#ipad #irig For the price to connect my... \n", + "26 2.0 fucking @apple are memer FAGGOTS http://t.co/w... \n", + "27 6.0 Safari just crashed on me and I didn't have an... \n", + "28 1.0 @Apple deleted users' non-#iTunes music and di... \n", + "29 8.0 @PCAudioLabs is in and @Apple is out at Emanon... \n", + "30 9.0 @jokigenki @Apple I think it's like 2011? Can'... \n", + "31 3.0 RT @tschwettman: hey @apple why won't you let ... \n", + "32 1.0 @wastwater1 l agree with you, they're about as... \n", + "33 6.0 How do I log into iCloud on my phone? Not the ... \n", + "34 3.0 Shout out to @Apple for making crappy iPhone a... \n", + "35 3.0 We're so excited to be named to @Apple's 'App ... \n", + "36 2.0 my iPhone is fucked. Thanks @apple and @EE wha... \n", + "37 1.0 iPhone6 fell 2 ft. Screen shattered like it wa... \n", + "38 1.0 CNBCTV: #Chromecast beats #AppleTV #aapl http... \n", + "39 4.0 It makes you smarter. Elevate is @apple app o... \n", + "40 5.0 IPhone6 has too many issues, why tf is the ea... 
\n", + "41 4.0 Great time had @Apple store on Friday. @Russel... \n", + "42 1.0 Hey @apple are you even thinking about fixing ... \n", + "43 1.0 . @apple I don't think the '59' should be so c... \n", + "44 5.0 .@apple why do your computers like to crash on... \n", + "45 5.0 Ha, poor @apple trying to make its iPad and Ma... \n", + "46 6.0 You'll be back in my life soon @apple \n", + "47 1.0 iTunes is pissing me tf off @apple \n", + "48 2.0 RT @bchmura12: .@apple you suck \n", + "49 1.0 Updated to Yosemite on two machines. Both are ... \n", + "\n", + " y \n", + "0 negative \n", + "1 negative \n", + "2 positive \n", + "3 negative \n", + "4 negative \n", + "5 negative \n", + "6 negative \n", + "7 negative \n", + "8 negative \n", + "9 negative \n", + "10 negative \n", + "11 negative \n", + "12 negative \n", + "13 negative \n", + "14 positive \n", + "15 positive \n", + "16 positive \n", + "17 negative \n", + "18 negative \n", + "19 negative \n", + "20 negative \n", + "21 positive \n", + "22 negative \n", + "23 negative \n", + "24 negative \n", + "25 negative \n", + "26 negative \n", + "27 negative \n", + "28 negative \n", + "29 negative \n", + "30 negative \n", + "31 negative \n", + "32 negative \n", + "33 negative \n", + "34 negative \n", + "35 positive \n", + "36 negative \n", + "37 negative \n", + "38 negative \n", + "39 positive \n", + "40 negative \n", + "41 positive \n", + "42 negative \n", + "43 negative \n", + "44 negative \n", + "45 negative \n", + "46 positive \n", + "47 negative \n", + "48 negative \n", + "49 negative " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0@apple fucking let everyone name the group cha...[-1.5767942667007446, -0.2661866843700409, 0.1...negative7.0@apple fucking let everyone name the group cha...negative
1As a die hard @Apple customer, I must say I am...[-0.1864045411348343, 0.37810075283050537, -0....negative5.0As a die hard @Apple customer, I must say I am...negative
2RT @_iamGambino: Thank you @Apple[-0.39863264560699463, -0.018525924533605576, ...negative9.0RT @_iamGambino: Thank you @Applepositive
3YO YOU AINT SHIT @apple[-1.3272020816802979, -0.6504784226417542, -0....negative1.0YO YOU AINT SHIT @applenegative
4Theyre not RT @Naivana_: You gotta be kidding ...[-0.26906388998031616, 0.16873443126678467, 0....negative3.0Theyre not RT @Naivana_: You gotta be kidding ...negative
5My MacBook Pro is now as annoying as my ASUS W...[-0.7138646841049194, -0.03021504543721676, 0....negative1.0My MacBook Pro is now as annoying as my ASUS W...negative
6It's just dawned on me that I've probably spen...[-0.08956478536128998, -0.09104951471090317, -...negative3.0It's just dawned on me that I've probably spen...negative
7Hey @apple @sprint I'm not a fan of your lates...[-0.6237524747848511, 0.10997672379016876, 0.6...negative3.0Hey @apple @sprint I'm not a fan of your lates...negative
8@apple y'all shitty[-1.721300482749939, -1.328816533088684, 0.083...negative1.0@apple y'all shittynegative
9YO I DIDNT TOUCH MY PHONE AT ALL AND RIGHT WHE...[-0.7379449605941772, 0.2959931790828705, -0.4...negative8.0YO I DIDNT TOUCH MY PHONE AT ALL AND RIGHT WHE...negative
10FUCK @apple[-1.0854371786117554, -0.38098257780075073, 0....negative2.0FUCK @applenegative
11my iphone6 plus is impossible to hold without ...[-0.06848365068435669, 0.4246528148651123, 0.1...negative1.0my iphone6 plus is impossible to hold without ...negative
12@apple last time I checked I thought I bought ...[-1.1634764671325684, 0.5120655298233032, -0.0...negative1.0@apple last time I checked I thought I bought ...negative
13just need @apple to come out with a charger co...[-0.5657770037651062, 0.8361220359802246, -0.1...negative3.0just need @apple to come out with a charger co...negative
14RT @peterpham: Bought my @AugustSmartLock at t...[-0.43666037917137146, 0.7088866829872131, 0.4...negative1.0RT @peterpham: Bought my @AugustSmartLock at t...positive
15Apple's iPhone 6 Plus Amazingly Captures 41% o...[-0.1403159201145172, 0.65235835313797, 0.1092...negative2.0Apple's iPhone 6 Plus Amazingly Captures 41% o...positive
16@SamJam Agreed--have to give props to @Apple f...[-0.2358071506023407, 0.060792725533246994, -0...negative4.0@SamJam Agreed--have to give props to @Apple f...positive
17@jakeflem @Apple Yes It seems to fix it good t...[-1.220468521118164, 0.09469269961118698, -0.0...negative2.0@jakeflem @Apple Yes It seems to fix it good t...negative
18It shouldn't take a whole week to replace my h...[-1.0816304683685303, 0.6302781701087952, -0.4...negative1.0It shouldn't take a whole week to replace my h...negative
19@nigxnog @Apple bruh that means u type that a ...[-1.056034803390503, 0.15630772709846497, 0.51...negative1.0@nigxnog @Apple bruh that means u type that a ...negative
20I don't undestand how @SYFNews #CreditCare web...[-0.6713889241218567, 0.5271422266960144, 0.07...negative1.0I don't undestand how @SYFNews #CreditCare web...negative
21Steve Jobs Predicted Future Of E-Commerce Back...[-0.3053719401359558, 0.9345525503158569, 0.21...negative6.0Steve Jobs Predicted Future Of E-Commerce Back...positive
22These Damn @Apple Commercials Are Getting Wors...[-0.8299529552459717, -0.4301386773586273, -0....negative4.0These Damn @Apple Commercials Are Getting Wors...negative
23Those** PICK UP THE SLACK YOU FUCK BOYS @Apple[-1.3581013679504395, -0.15301916003227234, 0....negative5.0Those** PICK UP THE SLACK YOU FUCK BOYS @Applenegative
24hey @apple can I catch a fucking maverick of a...[-0.6373146772384644, 0.5904020667076111, 0.28...negative4.0hey @apple can I catch a fucking maverick of a...negative
25@apple#ipad #irig For the price to connect my ...[-0.5737035870552063, 0.6071469783782959, 0.06...negative6.0@apple#ipad #irig For the price to connect my...negative
26fucking @apple are memer FAGGOTS http://t.co/w...[-0.04430821165442467, 0.44264474511146545, 0....negative2.0fucking @apple are memer FAGGOTS http://t.co/w...negative
27Safari just crashed on me and I didn't have an...[-0.32811102271080017, 0.6601451635360718, -0....negative6.0Safari just crashed on me and I didn't have an...negative
28@Apple deleted users' non-#iTunes music and di...[-0.3346312940120697, 0.2459762543439865, 0.54...negative1.0@Apple deleted users' non-#iTunes music and di...negative
29@PCAudioLabs is in and @Apple is out at Emanon...[-0.05351848155260086, 0.6132703423500061, 0.0...negative8.0@PCAudioLabs is in and @Apple is out at Emanon...negative
30@jokigenki @Apple I think it's like 2011? Can'...[-0.9171913862228394, -0.03775791823863983, 0....negative9.0@jokigenki @Apple I think it's like 2011? Can'...negative
31RT @tschwettman: hey @apple why won't you let ...[-0.8461720943450928, 0.45761561393737793, 0.1...negative3.0RT @tschwettman: hey @apple why won't you let ...negative
32@wastwater1 l agree with you, they're about as...[-0.5620263814926147, 0.1833726316690445, 0.10...negative1.0@wastwater1 l agree with you, they're about as...negative
33How do I log into iCloud on my phone? Not the ...[-1.0261833667755127, 0.39792588353157043, -0....negative6.0How do I log into iCloud on my phone? Not the ...negative
34Shout out to @Apple for making crappy iPhone a...[-1.0023715496063232, 0.22417372465133667, 0.0...negative3.0Shout out to @Apple for making crappy iPhone a...negative
35We're so excited to be named to @Apple's 'App ...[-0.3373744487762451, 0.1699923425912857, 0.39...negative3.0We're so excited to be named to @Apple's 'App ...positive
36my iPhone is fucked. Thanks @apple and @EE wha...[-0.6871265769004822, 0.16453255712985992, -0....negative2.0my iPhone is fucked. Thanks @apple and @EE wha...negative
37iPhone6 fell 2 ft. Screen shattered like it wa...[-0.38180163502693176, 0.26194843649864197, -0...negative1.0iPhone6 fell 2 ft. Screen shattered like it wa...negative
38CNBCTV: #Chromecast beats #AppleTV #aapl http:...[0.25390008091926575, 0.2027512639760971, 0.71...negative1.0CNBCTV: #Chromecast beats #AppleTV #aapl http...negative
39It makes you smarter. Elevate is @apple app of...[-0.3650900721549988, -0.056419190019369125, 0...negative4.0It makes you smarter. Elevate is @apple app o...positive
40IPhone6 has too many issues, why tf is the ear...[-0.5127007961273193, 0.68586665391922, -0.193...negative5.0IPhone6 has too many issues, why tf is the ea...negative
41Great time had @Apple store on Friday. @Russel...[-0.8063774704933167, 0.4269622266292572, 0.51...negative4.0Great time had @Apple store on Friday. @Russel...positive
42Hey @apple are you even thinking about fixing ...[-0.9981245994567871, -0.04270806908607483, 0....negative1.0Hey @apple are you even thinking about fixing ...negative
43. @apple I don't think the '59' should be so c...[-1.193355679512024, 0.2766774296760559, -0.14...negative1.0. @apple I don't think the '59' should be so c...negative
44.@apple why do your computers like to crash on...[-1.2451688051223755, -0.2248333841562271, -0....negative5.0.@apple why do your computers like to crash on...negative
45Ha, poor @apple trying to make its iPad and Ma...[-0.7224618196487427, 0.03269221633672714, -0....negative5.0Ha, poor @apple trying to make its iPad and Ma...negative
46You'll be back in my life soon @apple[-1.220320701599121, -0.1481868475675583, -0.3...negative6.0You'll be back in my life soon @applepositive
47iTunes is pissing me tf off @apple[-1.0455917119979858, 0.08097667992115021, 0.3...negative1.0iTunes is pissing me tf off @applenegative
48RT @bchmura12: .@apple you suck[-0.9056015014648438, -0.7452876567840576, 0.6...negative2.0RT @bchmura12: .@apple you sucknegative
49Updated to Yosemite on two machines. Both are ...[-0.5752751231193542, 0.05041218921542168, -0....negative1.0Updated to Yosemite on two machines. Both are ...negative
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# 4. Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qdCUg2MR0PD2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "outputId": "40c2a76d-9bf9-4b99-e4fe-6db107e98c60" + }, + "source": [ + "fitted_pipe.predict('I hate the newest update')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence \\\n", + "0 I hate the newest update \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-1.3662102222442627, 0.10369864851236343, -0.... negative \n", + "\n", + " sentiment_confidence \n", + "0 1.0 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sentencesentence_embedding_small_bert_L2_128sentimentsentiment_confidence
0I hate the newest update[-1.3662102222442627, 0.10369864851236343, -0....negative1.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## 5. Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UtsAUGTmOTms", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b831ec66-c65f-40f8-af41-e5c00fc87ee9" + }, + "source": [ + "trainable_pipe.print_info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "##6. 
Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "mptfvHx-MMMX", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 736 + }, + "outputId": "a465590a-6409-4699-93df-734cc3ecde79" + }, + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:100])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:100],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.82 1.00 0.90 82\n", + " positive 0.00 0.00 0.00 18\n", + "\n", + " accuracy 0.82 100\n", + " macro avg 0.41 0.50 0.45 100\n", + "weighted avg 0.67 0.82 0.74 100\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 @apple fucking let everyone name the group cha... \n", + "1 As a die hard @Apple customer, I must say I am... \n", + "2 RT @_iamGambino: Thank you @Apple \n", + "3 YO YOU AINT SHIT @apple \n", + "4 Theyre not RT @Naivana_: You gotta be kidding ... \n", + ".. ... \n", + "95 @Apple you need to sort your phones out. \n", + "96 Hey @apple, fuck you for thinking I want text ... 
\n", + "97 @Apple honestly sucks \n", + "98 How long does it really take for a phone to sh... \n", + "99 @whereiscooldude @Cyrus_T_Virus @Apple wait no... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-1.5767942667007446, -0.2661866843700409, 0.1... negative \n", + "1 [-0.1864045411348343, 0.37810075283050537, -0.... negative \n", + "2 [-0.39863264560699463, -0.018525924533605576, ... negative \n", + "3 [-1.3272020816802979, -0.6504784226417542, -0.... negative \n", + "4 [-0.26906388998031616, 0.16873443126678467, 0.... negative \n", + ".. ... ... \n", + "95 [-0.788460910320282, 0.194553405046463, 0.0300... negative \n", + "96 [-0.7153223752975464, -0.0814858004450798, 0.2... negative \n", + "97 [-1.2007137537002563, -1.1623204946517944, 0.1... negative \n", + "98 [-0.9548358917236328, 0.6761127710342407, -0.1... negative \n", + "99 [-1.06529700756073, -0.23610278964042664, -0.4... negative \n", + "\n", + " sentiment_confidence text \\\n", + "0 1.0 @apple fucking let everyone name the group cha... \n", + "1 1.0 As a die hard @Apple customer, I must say I am... \n", + "2 1.0 RT @_iamGambino: Thank you @Apple \n", + "3 1.0 YO YOU AINT SHIT @apple \n", + "4 1.0 Theyre not RT @Naivana_: You gotta be kidding ... \n", + ".. ... ... \n", + "95 1.0 @Apple you need to sort your phones out. \n", + "96 1.0 Hey @apple, fuck you for thinking I want text ... \n", + "97 1.0 @Apple honestly sucks \n", + "98 1.0 How long does it really take for a phone to sh... \n", + "99 1.0 @whereiscooldude @Cyrus_T_Virus @Apple wait no... \n", + "\n", + " y \n", + "0 negative \n", + "1 negative \n", + "2 positive \n", + "3 negative \n", + "4 negative \n", + ".. ... \n", + "95 negative \n", + "96 negative \n", + "97 negative \n", + "98 negative \n", + "99 negative \n", + "\n", + "[100 rows x 6 columns]" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0@apple fucking let everyone name the group cha...[-1.5767942667007446, -0.2661866843700409, 0.1...negative1.0@apple fucking let everyone name the group cha...negative
1As a die hard @Apple customer, I must say I am...[-0.1864045411348343, 0.37810075283050537, -0....negative1.0As a die hard @Apple customer, I must say I am...negative
2RT @_iamGambino: Thank you @Apple[-0.39863264560699463, -0.018525924533605576, ...negative1.0RT @_iamGambino: Thank you @Applepositive
3YO YOU AINT SHIT @apple[-1.3272020816802979, -0.6504784226417542, -0....negative1.0YO YOU AINT SHIT @applenegative
4Theyre not RT @Naivana_: You gotta be kidding ...[-0.26906388998031616, 0.16873443126678467, 0....negative1.0Theyre not RT @Naivana_: You gotta be kidding ...negative
.....................
95@Apple you need to sort your phones out.[-0.788460910320282, 0.194553405046463, 0.0300...negative1.0@Apple you need to sort your phones out.negative
96Hey @apple, fuck you for thinking I want text ...[-0.7153223752975464, -0.0814858004450798, 0.2...negative1.0Hey @apple, fuck you for thinking I want text ...negative
97@Apple honestly sucks[-1.2007137537002563, -1.1623204946517944, 0.1...negative1.0@Apple honestly sucksnegative
98How long does it really take for a phone to sh...[-0.9548358917236328, 0.6761127710342407, -0.1...negative1.0How long does it really take for a phone to sh...negative
99@whereiscooldude @Cyrus_T_Virus @Apple wait no...[-1.06529700756073, -0.23610278964042664, -0.4...negative1.0@whereiscooldude @Cyrus_T_Virus @Apple wait no...negative
\n", + "

100 rows × 6 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "#7. Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nxWFzQOhjWC8", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "4df8af5e-3076-43eb-cb0e-d2e0799f602e" + }, + "source": [ + "# We can use nlu.print_components(action='embed_sentence') to see every possibler sentence embedding we could use. Lets use bert!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj 
sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP 
model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj 
sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj 
sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", 
+ "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + 
"nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b860cf26-d169-410a-931c-57d98997255f" + }, + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n", + "# We need to train longer and user smaller LR for NON-USE based sentence embeddings usually\n", + "# We could tune the hyperparameters further with hyperparameter tuning methods like gridsearch\n", + "# Also longer training gives more accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(110)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "\n", + "#sentence detector that is part of the pipe generates sone NaNs. 
Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_768 download started this may take some time.\n", + "Approximate size to download 392.9 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.83 1.00 0.91 551\n", + " positive 0.00 0.00 0.00 112\n", + "\n", + " accuracy 0.83 663\n", + " macro avg 0.42 0.50 0.45 663\n", + "weighted avg 0.69 0.83 0.75 663\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1jxw3GnVGlI" + }, + "source": [ + "# 7.1 Evaluate on test data" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Fxx4yNkNVGFl", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "073133f0-e3be-4d87-c114-bc0c0a6d21e1" + }, + "source": [ + "preds = fitted_pipe.predict(test_df,output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.81 1.00 0.90 135\n", + " positive 0.00 0.00 0.00 31\n", + "\n", + " accuracy 0.81 166\n", + " macro avg 0.41 0.50 0.45 166\n", + "weighted avg 0.66 0.81 0.73 166\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 8. 
Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bZZpObLOtqo8" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 9. Let's load the model from disk.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlp.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SO4uz45MoRgp", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "outputId": "63658ab4-ed77-46fb-843f-47d0e8eb4bfa" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('I hate the newest update')\n", + "preds" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 I hate the newest update \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [-0.3084234893321991, -0.1103060245513916, 0.1... negative \n", + "\n", + " sentiment_confidence \n", + "0 2.0 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_from_disksentimentsentiment_confidence
0I hate the newest update[-0.3084234893321991, -0.1103060245513916, 0.1...negative2.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e0CVlkk9v6Qi", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "60215391-eb4e-42c0-94ff-51e4dedbe5fe" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cbwS4bE0uT7d" + }, + "source": [], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_covid_19.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_covid_19.ipynb index d55e2c17..89061c68 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_covid_19.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_covid_19.ipynb @@ -1 +1,4135 @@ 
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo_covid_19.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_covid_19.ipynb)\n","\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 Class COVID-19 Sentiment Classifier Training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on any multi-class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n","\n","You can achieve these results or even better on this dataset with training data:\n","\n","\n","
\n","\n","![image.png]()\n","\n","\n","You can achieve these results or even better on this dataset with test data:\n","\n","\n","
\n","\n","![Screenshot 2021-02-25 190003.png]()"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"hFGnBCHavltY","executionInfo":{"status":"ok","timestamp":1620188963860,"user_tz":-120,"elapsed":108208,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"0258b3b7-e331-4f13-8667-00f107198737"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 04:27:36-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","\r- 0%[ ] 0 --.-KB/s \r- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 04:27:36 (35.7 MB/s) - written to stdout [1671/1671]\n","\n","Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","\u001b[K |████████████████████████████████| 204.8MB 64kB/s \n","\u001b[K |████████████████████████████████| 153kB 46.6MB/s \n","\u001b[K |████████████████████████████████| 204kB 24.2MB/s \n","\u001b[K |████████████████████████████████| 204kB 59.8MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download the COVID-19 NLP Text Sentiment Classification dataset \n","https://www.kaggle.com/datatattle/covid-19-nlp-text-classification\n","#Context\n","\n","This is a dataset of tweets about COVID-19 "]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620188965300,"user_tz":-120,"elapsed":109641,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"2e5ecb0f-1fac-4dce-9275-e7a34ec3ff99"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/02/Corona_NLP_train.csv"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 04:29:23-- http://ckl-it.de/wp-content/uploads/2021/02/Corona_NLP_train.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 5293639 (5.0M) [text/csv]\n","Saving to: ‘Corona_NLP_train.csv’\n","\n","Corona_NLP_train.cs 100%[===================>] 5.05M 4.58MB/s in 1.1s \n","\n","2021-05-05 04:29:25 (4.58 MB/s) - ‘Corona_NLP_train.csv’ saved [5293639/5293639]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":406},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620188966112,"user_tz":-120,"elapsed":110448,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"0946f3a9-aa1e-490d-9763-c34787c6f6b9"},"source":["import pandas as pd\n","train_path = '/content/Corona_NLP_train.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","columns=['text','y']\n","train_df = train_df[columns]\n","from sklearn.model_selection import train_test_split\n","\n","train_df, test_df = train_test_split(train_df, test_size=0.2)\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
texty
4204When I was a kid I always wanted to go on the ...positive
5622The selfish morons who panic bought toilet rol...negative
6545Profitero studied keyword search patterns and ...positive
9049From listed funds to builders to landlords, @k...negative
8568Against the back drop of NHS workers, civil se...positive
.........
9513Crude prices could go negative while Alberta's...negative
5209Thank you so much to all of the amazing Health...positive
8606Crime in the COVID 19 era says a man s charged...negative
3066I need a sugar daddy fast! Cause this covid-19...positive
5140#auspol \\r\\r\\r\\n#StayAtHome\\r\\r\\r\\n#COVID?19\\r...negative
\n","

8000 rows × 2 columns

\n","
"],"text/plain":[" text y\n","4204 When I was a kid I always wanted to go on the ... positive\n","5622 The selfish morons who panic bought toilet rol... negative\n","6545 Profitero studied keyword search patterns and ... positive\n","9049 From listed funds to builders to landlords, @k... negative\n","8568 Against the back drop of NHS workers, civil se... positive\n","... ... ...\n","9513 Crude prices could go negative while Alberta's... negative\n","5209 Thank you so much to all of the amazing Health... positive\n","8606 Crime in the COVID 19 era says a man s charged... negative\n","3066 I need a sugar daddy fast! Cause this covid-19... positive\n","5140 #auspol \\r\\r\\r\\n#StayAtHome\\r\\r\\r\\n#COVID?19\\r... negative\n","\n","[8000 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620189094731,"user_tz":-120,"elapsed":239061,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"58819459-f54c-4139-82fa-1de8e437d848"},"source":["import nlu \n","from sklearn.metrics import classification_report\n","\n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a dataset with label and text columns\n","# by default, Universal Sentence Encoder (USE) sentence embeddings are used as features\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","# the sentence detector that is part of the pipe generates some NaNs; drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['trainable_sentiment_dl']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 19\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.65 1.00 0.78 31\n","\n"," accuracy 0.62 50\n"," macro avg 0.22 0.33 0.26 50\n","weighted avg 0.40 0.62 0.49 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentence_embedding_usetextyorigin_indexdocumenttrained_sentiment_confidencesentencetrained_sentiment
0[-0.07024706155061722, -0.047160204499959946, ...When I was a kid I always wanted to go on the ...positive4204When I was a kid I always wanted to go on the ...0.782395[When I was a kid I always wanted to go on the...positive
1[-0.012395520694553852, -0.020829271525144577,...The selfish morons who panic bought toilet rol...negative5622The selfish morons who panic bought toilet rol...0.628907[The selfish morons who panic bought toilet ro...positive
2[-0.02502020262181759, -0.08026497811079025, 0...Profitero studied keyword search patterns and ...positive6545Profitero studied keyword search patterns and ...0.716436[Profitero studied keyword search patterns and...positive
3[0.06621277332305908, 0.011142794042825699, -0...From listed funds to builders to landlords, @k...negative9049From listed funds to builders to landlords, @k...0.613169[From listed funds to builders to landlords, @...positive
4[-0.024583933874964714, 0.06747899204492569, 0...Against the back drop of NHS workers, civil se...positive8568Against the back drop of NHS workers, civil se...0.753840[Against the back drop of NHS workers, civil s...positive
5[-0.030639173462986946, 0.07562850415706635, -...\"Your going to lose people to the flu but you'...negative1645\"Your going to lose people to the flu but you'...0.642136[\"Your going to lose people to the flu but you...positive
6[0.020836224779486656, -0.06270623952150345, -...As the COVID 19 outbreak continues to spread f...negative4190As the COVID 19 outbreak continues to spread f...0.636920[As the COVID 19 outbreak continues to spread ...positive
7[0.04782669618725777, -0.0003077308356296271, ...We re tracking daily changes in economic uncer...negative1930We re tracking daily changes in economic uncer...0.618729[We re tracking daily changes in economic unce...positive
8[-0.04952622205018997, -0.024686245247721672, ...I'm slightly confused! All for social distanc...positive4717I'm slightly confused! All for social distanci...0.748444[I'm slightly confused!, All for social distan...positive
9[0.015434695407748222, -0.0009663645178079605,...The knock-on effects of COVID-19 are having a...positive3860The knock-on effects of COVID-19 are having a...0.778461[The knock-on effects of COVID-19 are having ...positive
10[0.01979624293744564, -0.006749877706170082, -...Praise #coronavirus! God Bless the little bug...positive7223Praise #coronavirus! God Bless the little bugs...0.820170[Praise #coronavirus!, God Bless the little bu...positive
11[0.06731504201889038, 0.06525817513465881, -0....For those that think #COVID—19 #Coronavirus ...positive1540For those that think #COVID—19 #Coronavirus i...0.854121[For those that think #COVID—19 #Coronavirus ...positive
12[-0.03877340257167816, -0.04483730345964432, -...Just managed to book a delivery with Sainsbury...negative1003Just managed to book a delivery with Sainsbury...0.658872[Just managed to book a delivery with Sainsbur...positive
13[0.012202774174511433, 0.04586584493517876, -0...Can’t get any in the supermarket but can rece...positive9630Can’t get any in the supermarket but can rece...0.777700[Can’t get any in the supermarket but can rec...positive
14[0.03157752379775047, -0.03581700101494789, -0...Scary. Gun sales up in #USA. Due to food short...negative9278Scary. Gun sales up in #USA. Due to food short...0.597093[Scary. Gun sales up in #USA., Due to food sho...neutral
15[0.07137540727853775, 0.004993322305381298, -0...VR headset companies, now is the time to slash...positive6960VR headset companies, now is the time to slash...0.786807[VR headset companies, now is the time to slas...positive
16[0.07225924730300903, 0.002497387118637562, 0....31% Can’t Pay the Rent: ‘It’s Only Going to...negative377131% Can’t Pay the Rent: ‘It’s Only Going to...0.643341[31% Can’t Pay the Rent: ‘It’s Only Going t...positive
17[0.04617633670568466, 0.03097780980169773, -0....COVID-19 is unprecedented. And to manage the c...negative2605COVID-19 is unprecedented. And to manage the c...0.652922[COVID-19 is unprecedented., And to manage the...positive
18[0.02162887528538704, -0.07251725345849991, -0...Even as government officials have appealed for...negative9976Even as government officials have appealed for...0.594582[Even as government officials have appealed fo...neutral
19[-0.004043699707835913, -0.05827523022890091, ...I'm just wondering where all these panic buyer...negative6191I'm just wondering where all these panic buyer...0.604813[I'm just wondering where all these panic buye...positive
20[-0.006520306225866079, 0.04672585800290108, -...To address the increased demand on our communi...positive4213To address the increased demand on our communi...0.754619[To address the increased demand on our commun...positive
21[0.09055875241756439, -0.03801679238677025, -0...Important phone conversation with @ReedHasting...positive4180Important phone conversation with @ReedHasting...0.824278[Important phone conversation with @ReedHastin...positive
22[0.06983352452516556, 0.045526593923568726, 0....@JeffreeStar hey,I hope you see this tweet,jus...positive4393@JeffreeStar hey,I hope you see this tweet,jus...0.712092[@JeffreeStar hey,I hope you see this tweet,ju...positive
23[0.010531553998589516, -0.041734274476766586, ...In addition to masks we re now also banning ha...positive7906In addition to masks we re now also banning ha...0.804328[In addition to masks we re now also banning h...positive
24[-0.0046012867242097855, -0.00663403794169426,...It'll make bump into April bumpier, drawing do...positive2178It'll make bump into April bumpier, drawing do...0.680699[It'll make bump into April bumpier, drawing d...positive
25[0.02057724818587303, 0.02416279725730419, -0....#ScamAlert! There are some awful folks out the...negative3000#ScamAlert! There are some awful folks out the...0.717378[#ScamAlert!, There are some awful folks out ...positive
26[-0.0029535864014178514, 0.06930557638406754, ...It's April and I'm on #ESA. So how has @GOVUK ...positive8967It's April and I'm on #ESA. So how has @GOVUK ...0.729225[It's April and I'm on #ESA., So how has @GOVU...positive
27[-0.022838125005364418, 0.035331811755895615, ...#THANKYOU times a million to all those on the ...positive8202#THANKYOU times a million to all those on the ...0.817317[#THANKYOU times a million to all those on the...positive
28[0.0058709485456347466, 0.007139457855373621, ...Online shopping doubles down during coronaviru...negative4932Online shopping doubles down during coronaviru...0.656595[Online shopping doubles down during coronavir...positive
29[0.038493022322654724, -0.04178141430020332, 0...Covid-19 is changing the world as we know it. ...positive525Covid-19 is changing the world as we know it. ...0.836061[Covid-19 is changing the world as we know it....positive
30[0.001453789067454636, 0.005407411605119705, -...The whole supermarket delivery model is broken...negative1389The whole supermarket delivery model is broken...0.656143[The whole supermarket delivery model is broke...positive
31[0.07007955759763718, 0.02776280976831913, -0....For brands interested in building long term eq...positive146For brands interested in building long term eq...0.717087[For brands interested in building long term e...positive
32[0.014691045507788658, 0.052066899836063385, 0...@masrour_barzani It is worth mentioning your e...positive2989@masrour_barzani It is worth mentioning your e...0.750604[@masrour_barzani It is worth mentioning your ...positive
33[0.02123432233929634, 0.052282460033893585, -0...When you realize health care workers grocery s...positive9206When you realize health care workers grocery s...0.808270[When you realize health care workers grocery ...positive
34[-0.016208922490477562, 0.041428711265325546, ...\"Sales of shelf-stable, fresh, and frozen seaf...positive5297\"Sales of shelf-stable, fresh, and frozen seaf...0.683432[\"Sales of shelf-stable, fresh, and frozen sea...positive
35[0.0038951593451201916, 0.012588057667016983, ...@HolidayInn @ChoiceHotels @OmniHotels all majo...positive7155@HolidayInn @ChoiceHotels @OmniHotels all majo...0.792782[@HolidayInn @ChoiceHotels @OmniHotels all maj...positive
36[0.027244137600064278, 0.058916911482810974, -...Some silver linings during this covid-19 crisi...negative3295Some silver linings during this covid-19 crisi...0.639766[Some silver linings during this covid-19 cris...positive
37[0.013577781617641449, 0.004573680926114321, 0...Ok people CAN WE STOP BUYING ALL THE HAND SANI...positive1259Ok people CAN WE STOP BUYING ALL THE HAND SANI...0.784130[Ok people CAN WE STOP BUYING ALL THE HAND SAN...positive
38[0.011083973571658134, -0.004668612964451313, ...Your well being is our top priority!\\r\\r\\r\\nWh...positive3510Your well being is our top priority! While oth...0.712732[Your well being is our top priority!, While o...positive
39[0.010227790102362633, 0.03293986618518829, 0....In #France, supermarket chains agree to a one-...positive9662In #France, supermarket chains agree to a one-...0.762242[In #France, supermarket chains agree to a one...positive
40[-0.03131600469350815, 0.054168395698070526, 0...In one week health care workers, truck drivers...positive8688In one week health care workers, truck drivers...0.787212[In one week health care workers, truck driver...positive
41[-0.01420750841498375, 0.031081417575478554, 0...Try your best to support local businesses duri...positive8111Try your best to support local businesses duri...0.765113[Try your best to support local businesses dur...positive
42[0.023995142430067062, -0.014616046100854874, ...Consumer Hero is wishing all businesses and co...positive6005Consumer Hero is wishing all businesses and co...0.732366[Consumer Hero is wishing all businesses and c...positive
43[0.03285776078701019, 0.007155246566981077, -0...The no longer hidden crisis in middle and poor...negative3965The no longer hidden crisis in middle and poor...0.626646[The no longer hidden crisis in middle and poo...positive
44[0.061222951859235764, 0.04071924835443497, -0...@osaeB @markets Two economic positives from CO...positive561@osaeB @markets Two economic positives from CO...0.702125[@osaeB @markets Two economic positives from C...positive
45[0.027163468301296234, -0.029017480090260506, ...Well the #Coronavirus food panic buying isn’t...negative8352Well the #Coronavirus food panic buying isn’t...0.610110[Well the #Coronavirus food panic buying isn’...positive
46[0.04085322096943855, 0.04066244512796402, -0....@Complex This is unacceptable! Where’s Peta? ...positive4370@Complex This is unacceptable! Where’s Peta? ...0.818300[@Complex This is unacceptable!, Where’s Peta...positive
47[0.06088094785809517, 0.02243555523455143, -0....STOP THE PANIC BUYING \\r\\r\\r\\nThere is no shor...negative8241STOP THE PANIC BUYING There is no shortage of ...0.603424[STOP THE PANIC BUYING There is no shortage of...positive
48[-0.05397384986281395, 0.04183822497725487, -0...We often rightly hear about the great work tha...positive5160We often rightly hear about the great work tha...0.831636[We often rightly hear about the great work th...positive
49[-0.009696043096482754, 0.09022427350282669, 0...In these tough and stressful times with 100 s ...negative5159In these tough and stressful times with 100 s ...0.751247[In these tough and stressful times with 100 s...positive
\n","
"],"text/plain":[" sentence_embedding_use ... trained_sentiment\n","0 [-0.07024706155061722, -0.047160204499959946, ... ... positive\n","1 [-0.012395520694553852, -0.020829271525144577,... ... positive\n","2 [-0.02502020262181759, -0.08026497811079025, 0... ... positive\n","3 [0.06621277332305908, 0.011142794042825699, -0... ... positive\n","4 [-0.024583933874964714, 0.06747899204492569, 0... ... positive\n","5 [-0.030639173462986946, 0.07562850415706635, -... ... positive\n","6 [0.020836224779486656, -0.06270623952150345, -... ... positive\n","7 [0.04782669618725777, -0.0003077308356296271, ... ... positive\n","8 [-0.04952622205018997, -0.024686245247721672, ... ... positive\n","9 [0.015434695407748222, -0.0009663645178079605,... ... positive\n","10 [0.01979624293744564, -0.006749877706170082, -... ... positive\n","11 [0.06731504201889038, 0.06525817513465881, -0.... ... positive\n","12 [-0.03877340257167816, -0.04483730345964432, -... ... positive\n","13 [0.012202774174511433, 0.04586584493517876, -0... ... positive\n","14 [0.03157752379775047, -0.03581700101494789, -0... ... neutral\n","15 [0.07137540727853775, 0.004993322305381298, -0... ... positive\n","16 [0.07225924730300903, 0.002497387118637562, 0.... ... positive\n","17 [0.04617633670568466, 0.03097780980169773, -0.... ... positive\n","18 [0.02162887528538704, -0.07251725345849991, -0... ... neutral\n","19 [-0.004043699707835913, -0.05827523022890091, ... ... positive\n","20 [-0.006520306225866079, 0.04672585800290108, -... ... positive\n","21 [0.09055875241756439, -0.03801679238677025, -0... ... positive\n","22 [0.06983352452516556, 0.045526593923568726, 0.... ... positive\n","23 [0.010531553998589516, -0.041734274476766586, ... ... positive\n","24 [-0.0046012867242097855, -0.00663403794169426,... ... positive\n","25 [0.02057724818587303, 0.02416279725730419, -0.... ... positive\n","26 [-0.0029535864014178514, 0.06930557638406754, ... ... positive\n","27 [-0.022838125005364418, 0.035331811755895615, ... 
... positive\n","28 [0.0058709485456347466, 0.007139457855373621, ... ... positive\n","29 [0.038493022322654724, -0.04178141430020332, 0... ... positive\n","30 [0.001453789067454636, 0.005407411605119705, -... ... positive\n","31 [0.07007955759763718, 0.02776280976831913, -0.... ... positive\n","32 [0.014691045507788658, 0.052066899836063385, 0... ... positive\n","33 [0.02123432233929634, 0.052282460033893585, -0... ... positive\n","34 [-0.016208922490477562, 0.041428711265325546, ... ... positive\n","35 [0.0038951593451201916, 0.012588057667016983, ... ... positive\n","36 [0.027244137600064278, 0.058916911482810974, -... ... positive\n","37 [0.013577781617641449, 0.004573680926114321, 0... ... positive\n","38 [0.011083973571658134, -0.004668612964451313, ... ... positive\n","39 [0.010227790102362633, 0.03293986618518829, 0.... ... positive\n","40 [-0.03131600469350815, 0.054168395698070526, 0... ... positive\n","41 [-0.01420750841498375, 0.031081417575478554, 0... ... positive\n","42 [0.023995142430067062, -0.014616046100854874, ... ... positive\n","43 [0.03285776078701019, 0.007155246566981077, -0... ... positive\n","44 [0.061222951859235764, 0.04071924835443497, -0... ... positive\n","45 [0.027163468301296234, -0.029017480090260506, ... ... positive\n","46 [0.04085322096943855, 0.04066244512796402, -0.... ... positive\n","47 [0.06088094785809517, 0.02243555523455143, -0.... ... positive\n","48 [-0.05397384986281395, 0.04183822497725487, -0... ... positive\n","49 [-0.009696043096482754, 0.09022427350282669, 0... ... positive\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# 4. 
Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":77},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620189095590,"user_tz":-120,"elapsed":239915,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"76211d7f-06a9-4d29-8d51-64593429993c"},"source":["fitted_pipe.predict(\"Everything is under control !\")"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentence_embedding_useorigin_indexdocumenttrained_sentiment_confidencesentencetrained_sentiment
0[0.027917474508285522, -0.06684374064207077, -...0Everything is under control !0.770841[Everything is under control !]positive
\n","
"],"text/plain":[" sentence_embedding_use ... trained_sentiment\n","0 [0.027917474508285522, -0.06684374064207077, -... ... positive\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## 5. Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620189095591,"user_tz":-120,"elapsed":239911,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"7d9e3f28-7bdc-4a4f-e1cb-2329d5b26dd1"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@d4b5f4f) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@d4b5f4f\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## 6. Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620189100229,"user_tz":-120,"elapsed":244544,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"b7a16d53-fde9-4d97-bb0e-ac3be96d722d"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","# Predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","\n",
"# The sentence detector that is part of the pipe generates some NaNs; let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["              precision    recall  f1-score   support\n","\n","    negative       0.00      0.00      0.00        19\n","     neutral       0.00      0.00      0.00         0\n","    positive       0.97      1.00      0.98        31\n","\n","    accuracy                           0.62        50\n","   macro avg       0.32      0.33      0.33        50\n","weighted avg       0.60      0.62      0.61        50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>sentence_embedding_use</th><th>text</th><th>y</th><th>origin_index</th><th>document</th><th>trained_sentiment_confidence</th><th>sentence</th><th>trained_sentiment</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>[-0.07024706155061722, -0.047160204499959946, ...</td><td>When I was a kid I always wanted to go on the ...</td><td>positive</td><td>4204</td><td>When I was a kid I always wanted to go on the ...</td><td>0.978457</td><td>[When I was a kid I always wanted to go on the...</td><td>positive</td></tr>\n",
"    <tr><th>1</th><td>[-0.012395520694553852, -0.020829271525144577,...</td><td>The selfish morons who panic bought toilet rol...</td><td>negative</td><td>5622</td><td>The selfish morons who panic bought toilet rol...</td><td>0.530837</td><td>[The selfish morons who panic bought toilet ro...</td><td>neutral</td></tr>\n",
"    <tr><th>2</th><td>[-0.02502020262181759, -0.08026497811079025, 0...</td><td>Profitero studied keyword search patterns and ...</td><td>positive</td><td>6545</td><td>Profitero studied keyword search patterns and ...</td><td>0.845908</td><td>[Profitero studied keyword search patterns and...</td><td>positive</td></tr>\n",
"    <tr><th>3</th><td>[0.06621277332305908, 0.011142794042825699, -0...</td><td>From listed funds to builders to landlords, @k...</td><td>negative</td><td>9049</td><td>From listed funds to builders to landlords, @k...</td><td>0.541567</td><td>[From listed funds to builders to landlords, @...</td><td>neutral</td></tr>\n",
"    <tr><th>4</th><td>[-0.024583933874964714, 0.06747899204492569, 0...</td><td>Against the back drop of NHS workers, civil se...</td><td>positive</td><td>8568</td><td>Against the back drop of NHS workers, civil se...</td><td>0.961466</td><td>[Against the back drop of NHS workers, civil s...</td><td>positive</td></tr>\n",
"    <tr><th>5</th><td>[-0.030639173462986946, 0.07562850415706635, -...</td><td>\"Your going to lose people to the flu but you'...</td><td>negative</td><td>1645</td><td>\"Your going to lose people to the flu but you'...</td><td>0.508357</td><td>[\"Your going to lose people to the flu but you...</td><td>neutral</td></tr>\n",
"    <tr><th>6</th><td>[0.020836224779486656, -0.06270623952150345, -...</td><td>As the COVID 19 outbreak continues to spread f...</td><td>negative</td><td>4190</td><td>As the COVID 19 outbreak continues to spread f...</td><td>0.526309</td><td>[As the COVID 19 outbreak continues to spread ...</td><td>neutral</td></tr>\n",
"    <tr><th>7</th><td>[0.04782669618725777, -0.0003077308356296271, ...</td><td>We re tracking daily changes in economic uncer...</td><td>negative</td><td>1930</td><td>We re tracking daily changes in economic uncer...</td><td>0.542942</td><td>[We re tracking daily changes in economic unce...</td><td>neutral</td></tr>\n",
"    <tr><th>8</th><td>[-0.04952622205018997, -0.024686245247721672, ...</td><td>I'm slightly confused! All for social distanc...</td><td>positive</td><td>4717</td><td>I'm slightly confused! All for social distanci...</td><td>0.953167</td><td>[I'm slightly confused!, All for social distan...</td><td>positive</td></tr>\n",
"    <tr><th>9</th><td>[0.015434695407748222, -0.0009663645178079605,...</td><td>The knock-on effects of COVID-19 are having a...</td><td>positive</td><td>3860</td><td>The knock-on effects of COVID-19 are having a...</td><td>0.974030</td><td>[The knock-on effects of COVID-19 are having ...</td><td>positive</td></tr>\n",
"    <tr><th>10</th><td>[0.01979624293744564, -0.006749877706170082, -...</td><td>Praise #coronavirus! God Bless the little bug...</td><td>positive</td><td>7223</td><td>Praise #coronavirus! God Bless the little bugs...</td><td>0.983924</td><td>[Praise #coronavirus!, God Bless the little bu...</td><td>positive</td></tr>\n",
"    <tr><th>11</th><td>[0.06731504201889038, 0.06525817513465881, -0....</td><td>For those that think #COVID—19 #Coronavirus ...</td><td>positive</td><td>1540</td><td>For those that think #COVID—19 #Coronavirus i...</td><td>0.994959</td><td>[For those that think #COVID—19 #Coronavirus ...</td><td>positive</td></tr>\n",
"    <tr><th>12</th><td>[-0.03877340257167816, -0.04483730345964432, -...</td><td>Just managed to book a delivery with Sainsbury...</td><td>negative</td><td>1003</td><td>Just managed to book a delivery with Sainsbury...</td><td>0.519039</td><td>[Just managed to book a delivery with Sainsbur...</td><td>neutral</td></tr>\n",
"    <tr><th>13</th><td>[0.012202774174511433, 0.04586584493517876, -0...</td><td>Can’t get any in the supermarket but can rece...</td><td>positive</td><td>9630</td><td>Can’t get any in the supermarket but can rece...</td><td>0.976869</td><td>[Can’t get any in the supermarket but can rec...</td><td>positive</td></tr>\n",
"    <tr><th>14</th><td>[0.03157752379775047, -0.03581700101494789, -0...</td><td>Scary. Gun sales up in #USA. Due to food short...</td><td>negative</td><td>9278</td><td>Scary. Gun sales up in #USA. Due to food short...</td><td>0.534066</td><td>[Scary. Gun sales up in #USA., Due to food sho...</td><td>neutral</td></tr>\n",
"    <tr><th>15</th><td>[0.07137540727853775, 0.004993322305381298, -0...</td><td>VR headset companies, now is the time to slash...</td><td>positive</td><td>6960</td><td>VR headset companies, now is the time to slash...</td><td>0.949657</td><td>[VR headset companies, now is the time to slas...</td><td>positive</td></tr>\n",
"    <tr><th>16</th><td>[0.07225924730300903, 0.002497387118637562, 0....</td><td>31% Can’t Pay the Rent: ‘It’s Only Going to...</td><td>negative</td><td>3771</td><td>31% Can’t Pay the Rent: ‘It’s Only Going to...</td><td>0.517149</td><td>[31% Can’t Pay the Rent: ‘It’s Only Going t...</td><td>neutral</td></tr>\n",
"    <tr><th>17</th><td>[0.04617633670568466, 0.03097780980169773, -0....</td><td>COVID-19 is unprecedented. And to manage the c...</td><td>negative</td><td>2605</td><td>COVID-19 is unprecedented. And to manage the c...</td><td>0.502484</td><td>[COVID-19 is unprecedented., And to manage the...</td><td>neutral</td></tr>\n",
"    <tr><th>18</th><td>[0.02162887528538704, -0.07251725345849991, -0...</td><td>Even as government officials have appealed for...</td><td>negative</td><td>9976</td><td>Even as government officials have appealed for...</td><td>0.543885</td><td>[Even as government officials have appealed fo...</td><td>neutral</td></tr>\n",
"    <tr><th>19</th><td>[-0.004043699707835913, -0.05827523022890091, ...</td><td>I'm just wondering where all these panic buyer...</td><td>negative</td><td>6191</td><td>I'm just wondering where all these panic buyer...</td><td>0.528756</td><td>[I'm just wondering where all these panic buye...</td><td>neutral</td></tr>\n",
"    <tr><th>20</th><td>[-0.006520306225866079, 0.04672585800290108, -...</td><td>To address the increased demand on our communi...</td><td>positive</td><td>4213</td><td>To address the increased demand on our communi...</td><td>0.955490</td><td>[To address the increased demand on our commun...</td><td>positive</td></tr>\n",
"    <tr><th>21</th><td>[0.09055875241756439, -0.03801679238677025, -0...</td><td>Important phone conversation with @ReedHasting...</td><td>positive</td><td>4180</td><td>Important phone conversation with @ReedHasting...</td><td>0.983021</td><td>[Important phone conversation with @ReedHastin...</td><td>positive</td></tr>\n",
"    <tr><th>22</th><td>[0.06983352452516556, 0.045526593923568726, 0....</td><td>@JeffreeStar hey,I hope you see this tweet,jus...</td><td>positive</td><td>4393</td><td>@JeffreeStar hey,I hope you see this tweet,jus...</td><td>0.886496</td><td>[@JeffreeStar hey,I hope you see this tweet,ju...</td><td>positive</td></tr>\n",
"    <tr><th>23</th><td>[0.010531553998589516, -0.041734274476766586, ...</td><td>In addition to masks we re now also banning ha...</td><td>positive</td><td>7906</td><td>In addition to masks we re now also banning ha...</td><td>0.980952</td><td>[In addition to masks we re now also banning h...</td><td>positive</td></tr>\n",
"    <tr><th>24</th><td>[-0.0046012867242097855, -0.00663403794169426,...</td><td>It'll make bump into April bumpier, drawing do...</td><td>positive</td><td>2178</td><td>It'll make bump into April bumpier, drawing do...</td><td>0.780364</td><td>[It'll make bump into April bumpier, drawing d...</td><td>positive</td></tr>\n",
"    <tr><th>25</th><td>[0.02057724818587303, 0.02416279725730419, -0....</td><td>#ScamAlert! There are some awful folks out the...</td><td>negative</td><td>3000</td><td>#ScamAlert! There are some awful folks out the...</td><td>0.547257</td><td>[#ScamAlert!, There are some awful folks out ...</td><td>neutral</td></tr>\n",
"    <tr><th>26</th><td>[-0.0029535864014178514, 0.06930557638406754, ...</td><td>It's April and I'm on #ESA. So how has @GOVUK ...</td><td>positive</td><td>8967</td><td>It's April and I'm on #ESA. So how has @GOVUK ...</td><td>0.844140</td><td>[It's April and I'm on #ESA., So how has @GOVU...</td><td>positive</td></tr>\n",
"    <tr><th>27</th><td>[-0.022838125005364418, 0.035331811755895615, ...</td><td>#THANKYOU times a million to all those on the ...</td><td>positive</td><td>8202</td><td>#THANKYOU times a million to all those on the ...</td><td>0.984350</td><td>[#THANKYOU times a million to all those on the...</td><td>positive</td></tr>\n",
"    <tr><th>28</th><td>[0.0058709485456347466, 0.007139457855373621, ...</td><td>Online shopping doubles down during coronaviru...</td><td>negative</td><td>4932</td><td>Online shopping doubles down during coronaviru...</td><td>0.517497</td><td>[Online shopping doubles down during coronavir...</td><td>neutral</td></tr>\n",
"    <tr><th>29</th><td>[0.038493022322654724, -0.04178141430020332, 0...</td><td>Covid-19 is changing the world as we know it. ...</td><td>positive</td><td>525</td><td>Covid-19 is changing the world as we know it. ...</td><td>0.994486</td><td>[Covid-19 is changing the world as we know it....</td><td>positive</td></tr>\n",
"    <tr><th>30</th><td>[0.001453789067454636, 0.005407411605119705, -...</td><td>The whole supermarket delivery model is broken...</td><td>negative</td><td>1389</td><td>The whole supermarket delivery model is broken...</td><td>0.505307</td><td>[The whole supermarket delivery model is broke...</td><td>neutral</td></tr>\n",
"    <tr><th>31</th><td>[0.07007955759763718, 0.02776280976831913, -0....</td><td>For brands interested in building long term eq...</td><td>positive</td><td>146</td><td>For brands interested in building long term eq...</td><td>0.887487</td><td>[For brands interested in building long term e...</td><td>positive</td></tr>\n",
"    <tr><th>32</th><td>[0.014691045507788658, 0.052066899836063385, 0...</td><td>@masrour_barzani It is worth mentioning your e...</td><td>positive</td><td>2989</td><td>@masrour_barzani It is worth mentioning your e...</td><td>0.946808</td><td>[@masrour_barzani It is worth mentioning your ...</td><td>positive</td></tr>\n",
"    <tr><th>33</th><td>[0.02123432233929634, 0.052282460033893585, -0...</td><td>When you realize health care workers grocery s...</td><td>positive</td><td>9206</td><td>When you realize health care workers grocery s...</td><td>0.988752</td><td>[When you realize health care workers grocery ...</td><td>positive</td></tr>\n",
"    <tr><th>34</th><td>[-0.016208922490477562, 0.041428711265325546, ...</td><td>\"Sales of shelf-stable, fresh, and frozen seaf...</td><td>positive</td><td>5297</td><td>\"Sales of shelf-stable, fresh, and frozen seaf...</td><td>0.841267</td><td>[\"Sales of shelf-stable, fresh, and frozen sea...</td><td>positive</td></tr>\n",
"    <tr><th>35</th><td>[0.0038951593451201916, 0.012588057667016983, ...</td><td>@HolidayInn @ChoiceHotels @OmniHotels all majo...</td><td>positive</td><td>7155</td><td>@HolidayInn @ChoiceHotels @OmniHotels all majo...</td><td>0.972513</td><td>[@HolidayInn @ChoiceHotels @OmniHotels all maj...</td><td>positive</td></tr>\n",
"    <tr><th>36</th><td>[0.027244137600064278, 0.058916911482810974, -...</td><td>Some silver linings during this covid-19 crisi...</td><td>negative</td><td>3295</td><td>Some silver linings during this covid-19 crisi...</td><td>0.500515</td><td>[Some silver linings during this covid-19 cris...</td><td>neutral</td></tr>\n",
"    <tr><th>37</th><td>[0.013577781617641449, 0.004573680926114321, 0...</td><td>Ok people CAN WE STOP BUYING ALL THE HAND SANI...</td><td>positive</td><td>1259</td><td>Ok people CAN WE STOP BUYING ALL THE HAND SANI...</td><td>0.953883</td><td>[Ok people CAN WE STOP BUYING ALL THE HAND SAN...</td><td>positive</td></tr>\n",
"    <tr><th>38</th><td>[0.011083973571658134, -0.004668612964451313, ...</td><td>Your well being is our top priority!\\r\\r\\r\\nWh...</td><td>positive</td><td>3510</td><td>Your well being is our top priority! While oth...</td><td>0.848192</td><td>[Your well being is our top priority!, While o...</td><td>positive</td></tr>\n",
"    <tr><th>39</th><td>[0.010227790102362633, 0.03293986618518829, 0....</td><td>In #France, supermarket chains agree to a one-...</td><td>positive</td><td>9662</td><td>In #France, supermarket chains agree to a one-...</td><td>0.961013</td><td>[In #France, supermarket chains agree to a one...</td><td>positive</td></tr>\n",
"    <tr><th>40</th><td>[-0.03131600469350815, 0.054168395698070526, 0...</td><td>In one week health care workers, truck drivers...</td><td>positive</td><td>8688</td><td>In one week health care workers, truck drivers...</td><td>0.983875</td><td>[In one week health care workers, truck driver...</td><td>positive</td></tr>\n",
"    <tr><th>41</th><td>[-0.01420750841498375, 0.031081417575478554, 0...</td><td>Try your best to support local businesses duri...</td><td>positive</td><td>8111</td><td>Try your best to support local businesses duri...</td><td>0.963043</td><td>[Try your best to support local businesses dur...</td><td>positive</td></tr>\n",
"    <tr><th>42</th><td>[0.023995142430067062, -0.014616046100854874, ...</td><td>Consumer Hero is wishing all businesses and co...</td><td>positive</td><td>6005</td><td>Consumer Hero is wishing all businesses and co...</td><td>0.903110</td><td>[Consumer Hero is wishing all businesses and c...</td><td>positive</td></tr>\n",
"    <tr><th>43</th><td>[0.03285776078701019, 0.007155246566981077, -0...</td><td>The no longer hidden crisis in middle and poor...</td><td>negative</td><td>3965</td><td>The no longer hidden crisis in middle and poor...</td><td>0.530567</td><td>[The no longer hidden crisis in middle and poo...</td><td>neutral</td></tr>\n",
"    <tr><th>44</th><td>[0.061222951859235764, 0.04071924835443497, -0...</td><td>@osaeB @markets Two economic positives from CO...</td><td>positive</td><td>561</td><td>@osaeB @markets Two economic positives from CO...</td><td>0.764335</td><td>[@osaeB @markets Two economic positives from C...</td><td>positive</td></tr>\n",
"    <tr><th>45</th><td>[0.027163468301296234, -0.029017480090260506, ...</td><td>Well the #Coronavirus food panic buying isn’t...</td><td>negative</td><td>8352</td><td>Well the #Coronavirus food panic buying isn’t...</td><td>0.536124</td><td>[Well the #Coronavirus food panic buying isn’...</td><td>neutral</td></tr>\n",
"    <tr><th>46</th><td>[0.04085322096943855, 0.04066244512796402, -0....</td><td>@Complex This is unacceptable! Where’s Peta? ...</td><td>positive</td><td>4370</td><td>@Complex This is unacceptable! Where’s Peta? ...</td><td>0.983527</td><td>[@Complex This is unacceptable!, Where’s Peta...</td><td>positive</td></tr>\n",
"    <tr><th>47</th><td>[0.06088094785809517, 0.02243555523455143, -0....</td><td>STOP THE PANIC BUYING \\r\\r\\r\\nThere is no shor...</td><td>negative</td><td>8241</td><td>STOP THE PANIC BUYING There is no shortage of ...</td><td>0.538391</td><td>[STOP THE PANIC BUYING There is no shortage of...</td><td>neutral</td></tr>\n",
"    <tr><th>48</th><td>[-0.05397384986281395, 0.04183822497725487, -0...</td><td>We often rightly hear about the great work tha...</td><td>positive</td><td>5160</td><td>We often rightly hear about the great work tha...</td><td>0.989714</td><td>[We often rightly hear about the great work th...</td><td>positive</td></tr>\n",
"    <tr><th>49</th><td>[-0.009696043096482754, 0.09022427350282669, 0...</td><td>In these tough and stressful times with 100 s ...</td><td>negative</td><td>5159</td><td>In these tough and stressful times with 100 s ...</td><td>0.822190</td><td>[In these tough and stressful times with 100 s...</td><td>positive</td></tr>\n",
"  </tbody>\n",
"</table>
\n","
"],"text/plain":[" sentence_embedding_use ... trained_sentiment\n","0 [-0.07024706155061722, -0.047160204499959946, ... ... positive\n","1 [-0.012395520694553852, -0.020829271525144577,... ... neutral\n","2 [-0.02502020262181759, -0.08026497811079025, 0... ... positive\n","3 [0.06621277332305908, 0.011142794042825699, -0... ... neutral\n","4 [-0.024583933874964714, 0.06747899204492569, 0... ... positive\n","5 [-0.030639173462986946, 0.07562850415706635, -... ... neutral\n","6 [0.020836224779486656, -0.06270623952150345, -... ... neutral\n","7 [0.04782669618725777, -0.0003077308356296271, ... ... neutral\n","8 [-0.04952622205018997, -0.024686245247721672, ... ... positive\n","9 [0.015434695407748222, -0.0009663645178079605,... ... positive\n","10 [0.01979624293744564, -0.006749877706170082, -... ... positive\n","11 [0.06731504201889038, 0.06525817513465881, -0.... ... positive\n","12 [-0.03877340257167816, -0.04483730345964432, -... ... neutral\n","13 [0.012202774174511433, 0.04586584493517876, -0... ... positive\n","14 [0.03157752379775047, -0.03581700101494789, -0... ... neutral\n","15 [0.07137540727853775, 0.004993322305381298, -0... ... positive\n","16 [0.07225924730300903, 0.002497387118637562, 0.... ... neutral\n","17 [0.04617633670568466, 0.03097780980169773, -0.... ... neutral\n","18 [0.02162887528538704, -0.07251725345849991, -0... ... neutral\n","19 [-0.004043699707835913, -0.05827523022890091, ... ... neutral\n","20 [-0.006520306225866079, 0.04672585800290108, -... ... positive\n","21 [0.09055875241756439, -0.03801679238677025, -0... ... positive\n","22 [0.06983352452516556, 0.045526593923568726, 0.... ... positive\n","23 [0.010531553998589516, -0.041734274476766586, ... ... positive\n","24 [-0.0046012867242097855, -0.00663403794169426,... ... positive\n","25 [0.02057724818587303, 0.02416279725730419, -0.... ... neutral\n","26 [-0.0029535864014178514, 0.06930557638406754, ... ... positive\n","27 [-0.022838125005364418, 0.035331811755895615, ... ... 
positive\n","28 [0.0058709485456347466, 0.007139457855373621, ... ... neutral\n","29 [0.038493022322654724, -0.04178141430020332, 0... ... positive\n","30 [0.001453789067454636, 0.005407411605119705, -... ... neutral\n","31 [0.07007955759763718, 0.02776280976831913, -0.... ... positive\n","32 [0.014691045507788658, 0.052066899836063385, 0... ... positive\n","33 [0.02123432233929634, 0.052282460033893585, -0... ... positive\n","34 [-0.016208922490477562, 0.041428711265325546, ... ... positive\n","35 [0.0038951593451201916, 0.012588057667016983, ... ... positive\n","36 [0.027244137600064278, 0.058916911482810974, -... ... neutral\n","37 [0.013577781617641449, 0.004573680926114321, 0... ... positive\n","38 [0.011083973571658134, -0.004668612964451313, ... ... positive\n","39 [0.010227790102362633, 0.03293986618518829, 0.... ... positive\n","40 [-0.03131600469350815, 0.054168395698070526, 0... ... positive\n","41 [-0.01420750841498375, 0.031081417575478554, 0... ... positive\n","42 [0.023995142430067062, -0.014616046100854874, ... ... positive\n","43 [0.03285776078701019, 0.007155246566981077, -0... ... neutral\n","44 [0.061222951859235764, 0.04071924835443497, -0... ... positive\n","45 [0.027163468301296234, -0.029017480090260506, ... ... neutral\n","46 [0.04085322096943855, 0.04066244512796402, -0.... ... positive\n","47 [0.06088094785809517, 0.02243555523455143, -0.... ... neutral\n","48 [-0.05397384986281395, 0.04183822497725487, -0... ... positive\n","49 [-0.009696043096482754, 0.09022427350282669, 0... ... positive\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# 7. 
Try training with different Embeddings"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nxWFzQOhjWC8","executionInfo":{"status":"ok","timestamp":1620189100230,"user_tz":-120,"elapsed":244540,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"04526125-5b61-4628-e7a4-9778daa6da8f"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP 
model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns 
Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model 
sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"IKK_Ii_gjJfF","executionInfo":{"status":"ok","timestamp":1620199760733,"user_tz":-120,"elapsed":6530,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"bb105ac7-fef2-4e67-d5ae-c48d6dd7107c"},"source":["trainable_pipe = nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# We usually need to train longer and use a smaller learning rate for non-USE sentence embeddings\n","# The hyperparameters could be tuned further with methods such as grid search\n","# Longer training also tends to improve accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# Predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n",
"# The sentence detector that is part of the pipe generates some NaNs; let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n","              precision    recall  f1-score   support\n","\n","    negative       0.89      0.86      0.87      3982\n","     neutral       0.00      0.00      0.00         0\n","    positive       0.90      0.85      0.87      4018\n","\n","    accuracy                           0.85      8000\n","   macro avg       0.60      0.57      0.58      8000\n","weighted avg       0.90      0.85      0.87      8000\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_1jxw3GnVGlI"},"source":["# 7.1 Evaluate on test data"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Fxx4yNkNVGFl","executionInfo":{"status":"ok","timestamp":1620200260263,"user_tz":-120,"elapsed":491614,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"eff0c644-2484-4180-8c2d-0295eca4d1c5"},"source":["preds = fitted_pipe.predict(test_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs; let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["              precision    recall  f1-score   support\n","\n","    negative       0.88      0.81      0.85      1018\n","     neutral       0.00      0.00      0.00         0\n","    positive       0.86      0.85      0.85       982\n","\n","    accuracy                           0.83      2000\n","   macro avg       0.58      0.55      0.57      2000\n","weighted avg       0.87      0.83      0.85      2000\n","\n"],"name":"stdout"}]},
{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 8. Let's save the model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eLex095goHwm","executionInfo":{"status":"ok","timestamp":1620200475138,"user_tz":-120,"elapsed":705445,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"0223e739-965d-4882-cf48-ebc6d1fd45d4"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 9. Let's load the model from HDD.\n","This makes offline NLU usage possible! \n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":76},"id":"SO4uz45MoRgp","executionInfo":{"status":"ok","timestamp":1620200488257,"user_tz":-120,"elapsed":716709,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"4347e200-966d-415b-d719-090e2e2efcd7"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('Everything is under control !')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
textsentimentorigin_indexdocumentsentiment_confidencesentence_embedding_from_disksentence
0Everything is under control ![negative]8589934592Everything is under control ![0.7562715][[0.3778035342693329, 0.29955393075942993, 0.1...[Everything is under control !]
\n","
"],"text/plain":[" text ... sentence\n","0 Everything is under control ! ... [Everything is under control !]\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":13}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"e0CVlkk9v6Qi","executionInfo":{"status":"ok","timestamp":1620200488259,"user_tz":-120,"elapsed":715702,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"a1f2e70d-ac08-428c-9e05-0c83c78f2db6"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@551e1ab0) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@551e1ab0\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"-S888rwObpEs"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_covid_19.ipynb)\n", + "\n", + "\n", + "\n", + "# Training a Sentiment Analysis Classifier with NLU\n", + "## 2 Class COVID-19 Sentiment Classifier Training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on any multi-class text classification problem\n", + "\n", + "This notebook showcases the following features:\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (enables NLU offline mode)\n", + "\n", + "\n", + "You can achieve these results or even better on this dataset with training data:\n", + "\n", + "\n", + "
\n", + "\n", + "![image.png]()\n", + "\n", + "\n", + "You can achieve these results or even better on this dataset with test data:\n", + "\n", + "\n", + "
\n", + "\n", + "![Screenshot 2021-02-25 190003.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Install Java 8 and NLU" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "!pip install -q johnsnowlabs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download COVID-19 NLP Text Sentiment Classification dataset\n", + "https://www.kaggle.com/datatattle/covid-19-nlp-text-classification\n", + "#Context\n", + "\n", + "This is a dataset made of tweets about COVID-19" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "source": [ + "! wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/corona_nlp/Corona_NLP_train.csv" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "eaa9efdb-08ed-444f-e53d-c261efcaf812" + }, + "source": [ + "import pandas as pd\n", + "train_path = '/content/Corona_NLP_train.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text'\n", + "columns=['text','y']\n", + "train_df = train_df[~train_df[\"y\"].isin([\"neutral\"])]\n", + "\n", + "train_df = train_df[columns]\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_df, test_df = train_test_split(train_df, test_size=0.2)\n", + "train_df" + ], + "execution_count": 1, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " text y\n", + "20175 What’s the world most dangerous job?\\r\\r\\nSup... negative\n", + "32442 Check out how Coronavirus is transforming Cons... positive\n", + "4860 this was an HOUR after opened My dad was shook... 
negative\n", + "12605 ScriptCo members have their medications delive... positive\n", + "6444 As we continue to monitor the situation concer... positive\n", + "... ... ...\n", + "34114 Cracking Down on Retail COVID 19 Profiteers in... positive\n", + "36475 @SethAbramson Work. I am very fortunate to hav... positive\n", + "29755 Create a free Amazon Business account to save ... positive\n", + "39000 Best online stores to buy things from Tokyo an... positive\n", + "11120 Bring back the ration book to stop this Panic ... negative\n", + "\n", + "[26758 rows x 2 columns]" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 1 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. Train Deep Learning Classifier using nlp.load('train.sentiment')\n", + "\n", + "Your dataset label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "5177ed67-1321-4eec-a83c-e035e5d28f08" + }, + "source": [ + "from johnsnowlabs import nlp\n", + "from sklearn.metrics import classification_report\n", + "\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "#sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 33, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/pipeline.py:149: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " dataset.y = dataset.y.apply(str)\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": 
[ + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 21\n", + " positive 0.58 1.00 0.73 29\n", + "\n", + " accuracy 0.58 50\n", + " macro avg 0.29 0.50 0.37 50\n", + "weighted avg 0.34 0.58 0.43 50\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Over 45 tar sands projects that have already s... 
\n", + "1 I just got super excited because I found toile... \n", + "2 BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE... \n", + "3 A lady fro @CheckersSA Brits tells me to hide ... \n", + "4 Just done an online shop and all the food esse... \n", + "5 @GavinNewsom so let me get this straight crowd... \n", + "6 NY: 18- to 49-year-olds make up more than half... \n", + "7 Emerging market debt prices reflect worst case... \n", + "8 do you clean your #supermarket items due to th... \n", + "9 @flipkartsupport @rsprasad @PMOIndia In his cr... \n", + "10 Several Guernsey women have set up a group of ... \n", + "11 I took Plaquenil for 4 years. Now it's the hot... \n", + "12 Can’t let this die. A little humor in these t... \n", + "13 I’ve been to the grocery store three times si... \n", + "14 That maybe there was a rioter out there who th... \n", + "15 First they release and spread the virus Strip ... \n", + "16 For about a year I have been toying with the i... \n", + "17 Creation Consumer Finance in Belfast are now p... \n", + "18 \"Your going to lose people to the flu but you'... \n", + "19 If you are a #RetailWorker and your store is s... \n", + "20 I love this social distancing.. No one stand c... \n", + "21 #Facebook Ads Fails to Reject COVID-19 Misinfo... \n", + "22 Lease prices decline in March as as the corona... \n", + "23 Complete Shutdown of Karnataka W.E.F 24th Marc... \n", + "24 @MobilePunch Another deceitful palliative in m... \n", + "25 I am anyways under lockdown since 6th March be... \n", + "26 @KFCSA @yumbrands was at your store , the air ... \n", + "27 How to support and save local businesses durin... \n", + "28 This is the latest current information on supe... \n", + "29 If anyone witnesses any shops hiking their pri... \n", + "30 It's about time @eBay_UK stopped allowing peop... \n", + "31 sounds like a good idea - LA market offers spe... \n", + "32 No matter how they're ramping up efforts at #d... 
\n", + "33 I was wondering if a lovely supermarket would ... \n", + "34 Telangana people, Please dont go out unless it... \n", + "35 Idiots are panic purchasing food in bulk that ... \n", + "36 Following @BBVA_USA's initial offers last week... \n", + "37 Please appreciate your supermarket delivery dr... \n", + "38 When asked, “Why does ADI continue to operate... \n", + "39 The long-term fallout of the #coronavirus lock... \n", + "40 HELP US PACK THE PANTRIES I m live at 12 15 on... \n", + "41 Seeing what’s being left behind at grocery st... \n", + "42 Everyone don t forget condoms when you are hoa... \n", + "43 If you come across any shops charging inflated... \n", + "44 New @SA_Update #consumer insights affecting ve... \n", + "45 How is COVID-19 affecting the pork and commodi... \n", + "46 Unbelievable https://t.co/fBrPYaxNH6 \n", + "47 How prepared are UK supermarkets to fulfill co... \n", + "48 Quakertown Distiller Providing Hand Sanitizer ... \n", + "49 Trump's disastrous dumbfuckery about #coronavi... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.040878042578697205, 0.7422170639038086, -1... positive \n", + "1 [-0.8823249340057373, 0.2473045289516449, -0.2... positive \n", + "2 [-0.5860391855239868, 0.45967912673950195, -0.... positive \n", + "3 [-1.0504486560821533, 0.03315277025103569, -0.... positive \n", + "4 [-0.7228847742080688, 0.12011835724115372, -0.... positive \n", + "5 [-0.7876735329627991, -0.0005504372529685497, ... positive \n", + "6 [-0.7093901634216309, 0.15827353298664093, -0.... positive \n", + "7 [-0.38020092248916626, 0.469475656747818, -1.0... positive \n", + "8 [-0.9434630870819092, 0.15352500975131989, -0.... positive \n", + "9 [-0.5607839226722717, 0.01385028287768364, -0.... positive \n", + "10 [-0.5054810047149658, 0.3916030824184418, -0.1... positive \n", + "11 [-0.7918602228164673, 0.6628591418266296, -0.4... positive \n", + "12 [-0.6068148016929626, 0.4092712998390198, 0.18... 
positive \n", + "13 [-1.2009631395339966, 0.18674977123737335, -0.... positive \n", + "14 [-0.9833343029022217, 0.002539500128477812, -0... positive \n", + "15 [-0.903867781162262, 0.3173466622829437, -0.74... positive \n", + "16 [-0.7246012091636658, 0.4053008258342743, -0.2... positive \n", + "17 [-0.08034359663724899, 0.9084559082984924, -0.... positive \n", + "18 [-1.6253035068511963, 0.23702141642570496, -0.... positive \n", + "19 [-0.18891872465610504, 1.0232254266738892, -0.... positive \n", + "20 [-0.7862111330032349, 0.17004938423633575, -0.... positive \n", + "21 [0.09989737719297409, 0.21870799362659454, 0.2... positive \n", + "22 [-0.21414655447006226, 0.6170880794525146, -0.... positive \n", + "23 [-1.0250476598739624, 0.7916482090950012, -0.4... positive \n", + "24 [-0.967275083065033, 0.39687538146972656, -0.3... positive \n", + "25 [-0.8778604865074158, 0.4535127580165863, -0.8... positive \n", + "26 [-1.004093050956726, 0.5079272985458374, -0.48... positive \n", + "27 [-0.7925586700439453, 0.2502853572368622, -0.0... positive \n", + "28 [-1.155573844909668, 1.2174267768859863, -0.72... positive \n", + "29 [-1.0366969108581543, 0.9876224398612976, 0.21... positive \n", + "30 [-0.6213818788528442, 0.8505221009254456, -0.0... positive \n", + "31 [-0.8883647918701172, 0.09247803688049316, 0.0... positive \n", + "32 [-0.6616833209991455, 0.4456593692302704, -0.1... positive \n", + "33 [-0.7988455891609192, 0.3940075635910034, -0.2... positive \n", + "34 [-0.859315812587738, 0.18060781061649323, -0.3... positive \n", + "35 [-1.1440715789794922, 0.10256315022706985, -0.... positive \n", + "36 [-0.4633345901966095, 0.6127039790153503, -0.0... positive \n", + "37 [-0.8110732436180115, 0.9440520405769348, -0.2... positive \n", + "38 [-0.7378478050231934, 1.1133265495300293, -0.6... positive \n", + "39 [-0.4056348204612732, 0.8604288101196289, -0.2... positive \n", + "40 [-1.0248322486877441, 0.3955387771129608, -0.5... 
positive \n", + "41 [-0.7818620800971985, 0.35503625869750977, 0.1... positive \n", + "42 [-1.7940707206726074, 0.559665322303772, -0.49... positive \n", + "43 [-0.878898024559021, 0.7265058755874634, 0.008... positive \n", + "44 [0.0033321159426122904, 0.9780365228652954, -0... positive \n", + "45 [-0.6315991282463074, 0.5254122018814087, -0.3... positive \n", + "46 [-0.4117434322834015, 0.6890383362770081, 0.51... positive \n", + "47 [-0.5095536112785339, 0.48363006114959717, 0.0... positive \n", + "48 [-1.3492352962493896, 0.669184684753418, -0.38... positive \n", + "49 [-0.7200496792793274, 0.3299054205417633, -0.4... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 4.0 Over 45 tar sands projects that have already s... \n", + "1 1.0 I just got super excited because I found toile... \n", + "2 3.0 BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE... \n", + "3 2.0 A lady fro @CheckersSA Brits tells me to hide ... \n", + "4 3.0 Just done an online shop and all the food esse... \n", + "5 2.0 @GavinNewsom so let me get this straight crow... \n", + "6 2.0 NY: 18- to 49-year-olds make up more than half... \n", + "7 1.0 Emerging market debt prices reflect worst case... \n", + "8 3.0 do you clean your #supermarket items due to t... \n", + "9 3.0 @flipkartsupport @rsprasad\\r\\r\\n@PMOIndia In h... \n", + "10 5.0 Several Guernsey women have set up a group of ... \n", + "11 3.0 I took Plaquenil for 4 years. Now it's the hot... \n", + "12 2.0 Can’t let this die. A little humor in these t... \n", + "13 1.0 I’ve been to the grocery store three times si... \n", + "14 1.0 That maybe there was a rioter out there who th... \n", + "15 3.0 First they release and spread the virus Strip ... \n", + "16 9.0 For about a year I have been toying with the i... \n", + "17 1.0 Creation Consumer Finance in Belfast are now p... \n", + "18 4.0 \"Your going to lose people to the flu but you'... \n", + "19 2.0 If you are a #RetailWorker and your store is s... 
\n", + "20 8.0 I love this social distancing..\\r\\r\\n\\r\\r\\nNo ... \n", + "21 1.0 #Facebook Ads Fails to Reject COVID-19 Misinfo... \n", + "22 1.0 Lease prices decline in March as as the corona... \n", + "23 5.0 Complete Shutdown of Karnataka W.E.F 24th Marc... \n", + "24 1.0 @MobilePunch Another deceitful palliative in m... \n", + "25 4.0 I am anyways under lockdown since 6th March be... \n", + "26 1.0 @KFCSA @yumbrands was at your store , the air ... \n", + "27 1.0 How to support and save local businesses durin... \n", + "28 7.0 This is the latest current information on supe... \n", + "29 6.0 If anyone witnesses any shops hiking their pri... \n", + "30 1.0 It's about time @eBay_UK stopped allowing peop... \n", + "31 2.0 sounds like a good idea - LA market offers spe... \n", + "32 2.0 No matter how they're ramping up efforts at #d... \n", + "33 1.0 I was wondering if a lovely supermarket would ... \n", + "34 1.0 Telangana people, Please dont go out unless it... \n", + "35 1.0 Idiots are panic purchasing food in bulk that ... \n", + "36 2.0 Following @BBVA_USA's initial offers last week... \n", + "37 3.0 Please appreciate your supermarket delivery dr... \n", + "38 9.0 When asked, “Why does ADI continue to operate... \n", + "39 2.0 The long-term fallout of the #coronavirus lock... \n", + "40 4.0 HELP US PACK THE PANTRIES I m live at 12 15 on... \n", + "41 4.0 Seeing what’s being left behind at grocery st... \n", + "42 2.0 Everyone don t forget condoms when you are hoa... \n", + "43 1.0 If you come across any shops charging inflated... \n", + "44 6.0 New @SA_Update #consumer insights affecting v... \n", + "45 4.0 How is COVID-19 affecting the pork and commodi... \n", + "46 4.0 Unbelievable https://t.co/fBrPYaxNH6 \n", + "47 1.0 How prepared are UK supermarkets to fulfill co... \n", + "48 1.0 Quakertown Distiller Providing Hand Sanitizer ... \n", + "49 1.0 Trump's disastrous dumbfuckery about #coronavi... 
\n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 negative \n", + "5 negative \n", + "6 negative \n", + "7 negative \n", + "8 positive \n", + "9 negative \n", + "10 negative \n", + "11 positive \n", + "12 negative \n", + "13 negative \n", + "14 positive \n", + "15 negative \n", + "16 positive \n", + "17 negative \n", + "18 negative \n", + "19 positive \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 negative \n", + "24 negative \n", + "25 positive \n", + "26 negative \n", + "27 positive \n", + "28 positive \n", + "29 positive \n", + "30 negative \n", + "31 positive \n", + "32 positive \n", + "33 positive \n", + "34 positive \n", + "35 negative \n", + "36 negative \n", + "37 positive \n", + "38 positive \n", + "39 positive \n", + "40 positive \n", + "41 negative \n", + "42 positive \n", + "43 positive \n", + "44 positive \n", + "45 negative \n", + "46 positive \n", + "47 positive \n", + "48 positive \n", + "49 negative " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0Over 45 tar sands projects that have already s...[-0.040878042578697205, 0.7422170639038086, -1...positive4.0Over 45 tar sands projects that have already s...positive
1I just got super excited because I found toile...[-0.8823249340057373, 0.2473045289516449, -0.2...positive1.0I just got super excited because I found toile...positive
2BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE...[-0.5860391855239868, 0.45967912673950195, -0....positive3.0BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE...positive
3A lady fro @CheckersSA Brits tells me to hide ...[-1.0504486560821533, 0.03315277025103569, -0....positive2.0A lady fro @CheckersSA Brits tells me to hide ...positive
4Just done an online shop and all the food esse...[-0.7228847742080688, 0.12011835724115372, -0....positive3.0Just done an online shop and all the food esse...negative
5@GavinNewsom so let me get this straight crowd...[-0.7876735329627991, -0.0005504372529685497, ...positive2.0@GavinNewsom so let me get this straight crow...negative
6NY: 18- to 49-year-olds make up more than half...[-0.7093901634216309, 0.15827353298664093, -0....positive2.0NY: 18- to 49-year-olds make up more than half...negative
7Emerging market debt prices reflect worst case...[-0.38020092248916626, 0.469475656747818, -1.0...positive1.0Emerging market debt prices reflect worst case...negative
8do you clean your #supermarket items due to th...[-0.9434630870819092, 0.15352500975131989, -0....positive3.0do you clean your #supermarket items due to t...positive
9@flipkartsupport @rsprasad @PMOIndia In his cr...[-0.5607839226722717, 0.01385028287768364, -0....positive3.0@flipkartsupport @rsprasad\\r\\r\\n@PMOIndia In h...negative
10Several Guernsey women have set up a group of ...[-0.5054810047149658, 0.3916030824184418, -0.1...positive5.0Several Guernsey women have set up a group of ...negative
11I took Plaquenil for 4 years. Now it's the hot...[-0.7918602228164673, 0.6628591418266296, -0.4...positive3.0I took Plaquenil for 4 years. Now it's the hot...positive
12Can’t let this die. A little humor in these t...[-0.6068148016929626, 0.4092712998390198, 0.18...positive2.0Can’t let this die. A little humor in these t...negative
13I’ve been to the grocery store three times si...[-1.2009631395339966, 0.18674977123737335, -0....positive1.0I’ve been to the grocery store three times si...negative
14That maybe there was a rioter out there who th...[-0.9833343029022217, 0.002539500128477812, -0...positive1.0That maybe there was a rioter out there who th...positive
15First they release and spread the virus Strip ...[-0.903867781162262, 0.3173466622829437, -0.74...positive3.0First they release and spread the virus Strip ...negative
16For about a year I have been toying with the i...[-0.7246012091636658, 0.4053008258342743, -0.2...positive9.0For about a year I have been toying with the i...positive
17Creation Consumer Finance in Belfast are now p...[-0.08034359663724899, 0.9084559082984924, -0....positive1.0Creation Consumer Finance in Belfast are now p...negative
18\"Your going to lose people to the flu but you'...[-1.6253035068511963, 0.23702141642570496, -0....positive4.0\"Your going to lose people to the flu but you'...negative
19If you are a #RetailWorker and your store is s...[-0.18891872465610504, 1.0232254266738892, -0....positive2.0If you are a #RetailWorker and your store is s...positive
20I love this social distancing.. No one stand c...[-0.7862111330032349, 0.17004938423633575, -0....positive8.0I love this social distancing..\\r\\r\\n\\r\\r\\nNo ...positive
21#Facebook Ads Fails to Reject COVID-19 Misinfo...[0.09989737719297409, 0.21870799362659454, 0.2...positive1.0#Facebook Ads Fails to Reject COVID-19 Misinfo...negative
22Lease prices decline in March as as the corona...[-0.21414655447006226, 0.6170880794525146, -0....positive1.0Lease prices decline in March as as the corona...positive
23Complete Shutdown of Karnataka W.E.F 24th Marc...[-1.0250476598739624, 0.7916482090950012, -0.4...positive5.0Complete Shutdown of Karnataka W.E.F 24th Marc...negative
24@MobilePunch Another deceitful palliative in m...[-0.967275083065033, 0.39687538146972656, -0.3...positive1.0@MobilePunch Another deceitful palliative in m...negative
25I am anyways under lockdown since 6th March be...[-0.8778604865074158, 0.4535127580165863, -0.8...positive4.0I am anyways under lockdown since 6th March be...positive
26@KFCSA @yumbrands was at your store , the air ...[-1.004093050956726, 0.5079272985458374, -0.48...positive1.0@KFCSA @yumbrands was at your store , the air ...negative
27How to support and save local businesses durin...[-0.7925586700439453, 0.2502853572368622, -0.0...positive1.0How to support and save local businesses durin...positive
28This is the latest current information on supe...[-1.155573844909668, 1.2174267768859863, -0.72...positive7.0This is the latest current information on supe...positive
29If anyone witnesses any shops hiking their pri...[-1.0366969108581543, 0.9876224398612976, 0.21...positive6.0If anyone witnesses any shops hiking their pri...positive
30It's about time @eBay_UK stopped allowing peop...[-0.6213818788528442, 0.8505221009254456, -0.0...positive1.0It's about time @eBay_UK stopped allowing peop...negative
31sounds like a good idea - LA market offers spe...[-0.8883647918701172, 0.09247803688049316, 0.0...positive2.0sounds like a good idea - LA market offers spe...positive
32No matter how they're ramping up efforts at #d...[-0.6616833209991455, 0.4456593692302704, -0.1...positive2.0No matter how they're ramping up efforts at #d...positive
33I was wondering if a lovely supermarket would ...[-0.7988455891609192, 0.3940075635910034, -0.2...positive1.0I was wondering if a lovely supermarket would ...positive
34Telangana people, Please dont go out unless it...[-0.859315812587738, 0.18060781061649323, -0.3...positive1.0Telangana people, Please dont go out unless it...positive
35Idiots are panic purchasing food in bulk that ...[-1.1440715789794922, 0.10256315022706985, -0....positive1.0Idiots are panic purchasing food in bulk that ...negative
36Following @BBVA_USA's initial offers last week...[-0.4633345901966095, 0.6127039790153503, -0.0...positive2.0Following @BBVA_USA's initial offers last week...negative
37Please appreciate your supermarket delivery dr...[-0.8110732436180115, 0.9440520405769348, -0.2...positive3.0Please appreciate your supermarket delivery dr...positive
38When asked, “Why does ADI continue to operate...[-0.7378478050231934, 1.1133265495300293, -0.6...positive9.0When asked, “Why does ADI continue to operate...positive
39The long-term fallout of the #coronavirus lock...[-0.4056348204612732, 0.8604288101196289, -0.2...positive2.0The long-term fallout of the #coronavirus lock...positive
40HELP US PACK THE PANTRIES I m live at 12 15 on...[-1.0248322486877441, 0.3955387771129608, -0.5...positive4.0HELP US PACK THE PANTRIES I m live at 12 15 on...positive
41Seeing what’s being left behind at grocery st...[-0.7818620800971985, 0.35503625869750977, 0.1...positive4.0Seeing what’s being left behind at grocery st...negative
42Everyone don t forget condoms when you are hoa...[-1.7940707206726074, 0.559665322303772, -0.49...positive2.0Everyone don t forget condoms when you are hoa...positive
43If you come across any shops charging inflated...[-0.878898024559021, 0.7265058755874634, 0.008...positive1.0If you come across any shops charging inflated...positive
44New @SA_Update #consumer insights affecting ve...[0.0033321159426122904, 0.9780365228652954, -0...positive6.0New @SA_Update #consumer insights affecting v...positive
45How is COVID-19 affecting the pork and commodi...[-0.6315991282463074, 0.5254122018814087, -0.3...positive4.0How is COVID-19 affecting the pork and commodi...negative
46Unbelievable https://t.co/fBrPYaxNH6[-0.4117434322834015, 0.6890383362770081, 0.51...positive4.0Unbelievable https://t.co/fBrPYaxNH6positive
47How prepared are UK supermarkets to fulfill co...[-0.5095536112785339, 0.48363006114959717, 0.0...positive1.0How prepared are UK supermarkets to fulfill co...positive
48Quakertown Distiller Providing Hand Sanitizer ...[-1.3492352962493896, 0.669184684753418, -0.38...positive1.0Quakertown Distiller Providing Hand Sanitizer ...positive
49Trump's disastrous dumbfuckery about #coronavi...[-0.7200496792793274, 0.3299054205417633, -0.4...positive1.0Trump's disastrous dumbfuckery about #coronavi...negative
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 33 + } + ] + }, + { + "cell_type": "code", + "source": [ + "preds" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "UbEWiXSUO0Qb", + "outputId": "b7ae6af2-3b84-4c6d-a997-d41163e474e0" + }, + "execution_count": 34, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Over 45 tar sands projects that have already s... \n", + "1 I just got super excited because I found toile... \n", + "2 BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE... \n", + "3 A lady fro @CheckersSA Brits tells me to hide ... \n", + "4 Just done an online shop and all the food esse... \n", + "5 @GavinNewsom so let me get this straight crowd... \n", + "6 NY: 18- to 49-year-olds make up more than half... \n", + "7 Emerging market debt prices reflect worst case... \n", + "8 do you clean your #supermarket items due to th... \n", + "9 @flipkartsupport @rsprasad @PMOIndia In his cr... \n", + "10 Several Guernsey women have set up a group of ... \n", + "11 I took Plaquenil for 4 years. Now it's the hot... \n", + "12 Can’t let this die. A little humor in these t... \n", + "13 I’ve been to the grocery store three times si... \n", + "14 That maybe there was a rioter out there who th... \n", + "15 First they release and spread the virus Strip ... \n", + "16 For about a year I have been toying with the i... \n", + "17 Creation Consumer Finance in Belfast are now p... \n", + "18 \"Your going to lose people to the flu but you'... \n", + "19 If you are a #RetailWorker and your store is s... \n", + "20 I love this social distancing.. No one stand c... \n", + "21 #Facebook Ads Fails to Reject COVID-19 Misinfo... \n", + "22 Lease prices decline in March as as the corona... \n", + "23 Complete Shutdown of Karnataka W.E.F 24th Marc... \n", + "24 @MobilePunch Another deceitful palliative in m... \n", + "25 I am anyways under lockdown since 6th March be... 
\n", + "26 @KFCSA @yumbrands was at your store , the air ... \n", + "27 How to support and save local businesses durin... \n", + "28 This is the latest current information on supe... \n", + "29 If anyone witnesses any shops hiking their pri... \n", + "30 It's about time @eBay_UK stopped allowing peop... \n", + "31 sounds like a good idea - LA market offers spe... \n", + "32 No matter how they're ramping up efforts at #d... \n", + "33 I was wondering if a lovely supermarket would ... \n", + "34 Telangana people, Please dont go out unless it... \n", + "35 Idiots are panic purchasing food in bulk that ... \n", + "36 Following @BBVA_USA's initial offers last week... \n", + "37 Please appreciate your supermarket delivery dr... \n", + "38 When asked, “Why does ADI continue to operate... \n", + "39 The long-term fallout of the #coronavirus lock... \n", + "40 HELP US PACK THE PANTRIES I m live at 12 15 on... \n", + "41 Seeing what’s being left behind at grocery st... \n", + "42 Everyone don t forget condoms when you are hoa... \n", + "43 If you come across any shops charging inflated... \n", + "44 New @SA_Update #consumer insights affecting ve... \n", + "45 How is COVID-19 affecting the pork and commodi... \n", + "46 Unbelievable https://t.co/fBrPYaxNH6 \n", + "47 How prepared are UK supermarkets to fulfill co... \n", + "48 Quakertown Distiller Providing Hand Sanitizer ... \n", + "49 Trump's disastrous dumbfuckery about #coronavi... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.040878042578697205, 0.7422170639038086, -1... positive \n", + "1 [-0.8823249340057373, 0.2473045289516449, -0.2... positive \n", + "2 [-0.5860391855239868, 0.45967912673950195, -0.... positive \n", + "3 [-1.0504486560821533, 0.03315277025103569, -0.... positive \n", + "4 [-0.7228847742080688, 0.12011835724115372, -0.... positive \n", + "5 [-0.7876735329627991, -0.0005504372529685497, ... positive \n", + "6 [-0.7093901634216309, 0.15827353298664093, -0.... 
positive \n", + "7 [-0.38020092248916626, 0.469475656747818, -1.0... positive \n", + "8 [-0.9434630870819092, 0.15352500975131989, -0.... positive \n", + "9 [-0.5607839226722717, 0.01385028287768364, -0.... positive \n", + "10 [-0.5054810047149658, 0.3916030824184418, -0.1... positive \n", + "11 [-0.7918602228164673, 0.6628591418266296, -0.4... positive \n", + "12 [-0.6068148016929626, 0.4092712998390198, 0.18... positive \n", + "13 [-1.2009631395339966, 0.18674977123737335, -0.... positive \n", + "14 [-0.9833343029022217, 0.002539500128477812, -0... positive \n", + "15 [-0.903867781162262, 0.3173466622829437, -0.74... positive \n", + "16 [-0.7246012091636658, 0.4053008258342743, -0.2... positive \n", + "17 [-0.08034359663724899, 0.9084559082984924, -0.... positive \n", + "18 [-1.6253035068511963, 0.23702141642570496, -0.... positive \n", + "19 [-0.18891872465610504, 1.0232254266738892, -0.... positive \n", + "20 [-0.7862111330032349, 0.17004938423633575, -0.... positive \n", + "21 [0.09989737719297409, 0.21870799362659454, 0.2... positive \n", + "22 [-0.21414655447006226, 0.6170880794525146, -0.... positive \n", + "23 [-1.0250476598739624, 0.7916482090950012, -0.4... positive \n", + "24 [-0.967275083065033, 0.39687538146972656, -0.3... positive \n", + "25 [-0.8778604865074158, 0.4535127580165863, -0.8... positive \n", + "26 [-1.004093050956726, 0.5079272985458374, -0.48... positive \n", + "27 [-0.7925586700439453, 0.2502853572368622, -0.0... positive \n", + "28 [-1.155573844909668, 1.2174267768859863, -0.72... positive \n", + "29 [-1.0366969108581543, 0.9876224398612976, 0.21... positive \n", + "30 [-0.6213818788528442, 0.8505221009254456, -0.0... positive \n", + "31 [-0.8883647918701172, 0.09247803688049316, 0.0... positive \n", + "32 [-0.6616833209991455, 0.4456593692302704, -0.1... positive \n", + "33 [-0.7988455891609192, 0.3940075635910034, -0.2... positive \n", + "34 [-0.859315812587738, 0.18060781061649323, -0.3... 
positive \n", + "35 [-1.1440715789794922, 0.10256315022706985, -0.... positive \n", + "36 [-0.4633345901966095, 0.6127039790153503, -0.0... positive \n", + "37 [-0.8110732436180115, 0.9440520405769348, -0.2... positive \n", + "38 [-0.7378478050231934, 1.1133265495300293, -0.6... positive \n", + "39 [-0.4056348204612732, 0.8604288101196289, -0.2... positive \n", + "40 [-1.0248322486877441, 0.3955387771129608, -0.5... positive \n", + "41 [-0.7818620800971985, 0.35503625869750977, 0.1... positive \n", + "42 [-1.7940707206726074, 0.559665322303772, -0.49... positive \n", + "43 [-0.878898024559021, 0.7265058755874634, 0.008... positive \n", + "44 [0.0033321159426122904, 0.9780365228652954, -0... positive \n", + "45 [-0.6315991282463074, 0.5254122018814087, -0.3... positive \n", + "46 [-0.4117434322834015, 0.6890383362770081, 0.51... positive \n", + "47 [-0.5095536112785339, 0.48363006114959717, 0.0... positive \n", + "48 [-1.3492352962493896, 0.669184684753418, -0.38... positive \n", + "49 [-0.7200496792793274, 0.3299054205417633, -0.4... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 4.0 Over 45 tar sands projects that have already s... \n", + "1 1.0 I just got super excited because I found toile... \n", + "2 3.0 BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE... \n", + "3 2.0 A lady fro @CheckersSA Brits tells me to hide ... \n", + "4 3.0 Just done an online shop and all the food esse... \n", + "5 2.0 @GavinNewsom so let me get this straight crow... \n", + "6 2.0 NY: 18- to 49-year-olds make up more than half... \n", + "7 1.0 Emerging market debt prices reflect worst case... \n", + "8 3.0 do you clean your #supermarket items due to t... \n", + "9 3.0 @flipkartsupport @rsprasad\\r\\r\\n@PMOIndia In h... \n", + "10 5.0 Several Guernsey women have set up a group of ... \n", + "11 3.0 I took Plaquenil for 4 years. Now it's the hot... \n", + "12 2.0 Can’t let this die. A little humor in these t... 
\n", + "13 1.0 I’ve been to the grocery store three times si... \n", + "14 1.0 That maybe there was a rioter out there who th... \n", + "15 3.0 First they release and spread the virus Strip ... \n", + "16 9.0 For about a year I have been toying with the i... \n", + "17 1.0 Creation Consumer Finance in Belfast are now p... \n", + "18 4.0 \"Your going to lose people to the flu but you'... \n", + "19 2.0 If you are a #RetailWorker and your store is s... \n", + "20 8.0 I love this social distancing..\\r\\r\\n\\r\\r\\nNo ... \n", + "21 1.0 #Facebook Ads Fails to Reject COVID-19 Misinfo... \n", + "22 1.0 Lease prices decline in March as as the corona... \n", + "23 5.0 Complete Shutdown of Karnataka W.E.F 24th Marc... \n", + "24 1.0 @MobilePunch Another deceitful palliative in m... \n", + "25 4.0 I am anyways under lockdown since 6th March be... \n", + "26 1.0 @KFCSA @yumbrands was at your store , the air ... \n", + "27 1.0 How to support and save local businesses durin... \n", + "28 7.0 This is the latest current information on supe... \n", + "29 6.0 If anyone witnesses any shops hiking their pri... \n", + "30 1.0 It's about time @eBay_UK stopped allowing peop... \n", + "31 2.0 sounds like a good idea - LA market offers spe... \n", + "32 2.0 No matter how they're ramping up efforts at #d... \n", + "33 1.0 I was wondering if a lovely supermarket would ... \n", + "34 1.0 Telangana people, Please dont go out unless it... \n", + "35 1.0 Idiots are panic purchasing food in bulk that ... \n", + "36 2.0 Following @BBVA_USA's initial offers last week... \n", + "37 3.0 Please appreciate your supermarket delivery dr... \n", + "38 9.0 When asked, “Why does ADI continue to operate... \n", + "39 2.0 The long-term fallout of the #coronavirus lock... \n", + "40 4.0 HELP US PACK THE PANTRIES I m live at 12 15 on... \n", + "41 4.0 Seeing what’s being left behind at grocery st... \n", + "42 2.0 Everyone don t forget condoms when you are hoa... 
\n", + "43 1.0 If you come across any shops charging inflated... \n", + "44 6.0 New @SA_Update #consumer insights affecting v... \n", + "45 4.0 How is COVID-19 affecting the pork and commodi... \n", + "46 4.0 Unbelievable https://t.co/fBrPYaxNH6 \n", + "47 1.0 How prepared are UK supermarkets to fulfill co... \n", + "48 1.0 Quakertown Distiller Providing Hand Sanitizer ... \n", + "49 1.0 Trump's disastrous dumbfuckery about #coronavi... \n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 negative \n", + "5 negative \n", + "6 negative \n", + "7 negative \n", + "8 positive \n", + "9 negative \n", + "10 negative \n", + "11 positive \n", + "12 negative \n", + "13 negative \n", + "14 positive \n", + "15 negative \n", + "16 positive \n", + "17 negative \n", + "18 negative \n", + "19 positive \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 negative \n", + "24 negative \n", + "25 positive \n", + "26 negative \n", + "27 positive \n", + "28 positive \n", + "29 positive \n", + "30 negative \n", + "31 positive \n", + "32 positive \n", + "33 positive \n", + "34 positive \n", + "35 negative \n", + "36 negative \n", + "37 positive \n", + "38 positive \n", + "39 positive \n", + "40 positive \n", + "41 negative \n", + "42 positive \n", + "43 positive \n", + "44 positive \n", + "45 negative \n", + "46 positive \n", + "47 positive \n", + "48 positive \n", + "49 negative " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0Over 45 tar sands projects that have already s...[-0.040878042578697205, 0.7422170639038086, -1...positive4.0Over 45 tar sands projects that have already s...positive
1I just got super excited because I found toile...[-0.8823249340057373, 0.2473045289516449, -0.2...positive1.0I just got super excited because I found toile...positive
2BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE...[-0.5860391855239868, 0.45967912673950195, -0....positive3.0BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE...positive
3A lady fro @CheckersSA Brits tells me to hide ...[-1.0504486560821533, 0.03315277025103569, -0....positive2.0A lady fro @CheckersSA Brits tells me to hide ...positive
4Just done an online shop and all the food esse...[-0.7228847742080688, 0.12011835724115372, -0....positive3.0Just done an online shop and all the food esse...negative
5@GavinNewsom so let me get this straight crowd...[-0.7876735329627991, -0.0005504372529685497, ...positive2.0@GavinNewsom so let me get this straight crow...negative
6NY: 18- to 49-year-olds make up more than half...[-0.7093901634216309, 0.15827353298664093, -0....positive2.0NY: 18- to 49-year-olds make up more than half...negative
7Emerging market debt prices reflect worst case...[-0.38020092248916626, 0.469475656747818, -1.0...positive1.0Emerging market debt prices reflect worst case...negative
8do you clean your #supermarket items due to th...[-0.9434630870819092, 0.15352500975131989, -0....positive3.0do you clean your #supermarket items due to t...positive
9@flipkartsupport @rsprasad @PMOIndia In his cr...[-0.5607839226722717, 0.01385028287768364, -0....positive3.0@flipkartsupport @rsprasad\\r\\r\\n@PMOIndia In h...negative
10Several Guernsey women have set up a group of ...[-0.5054810047149658, 0.3916030824184418, -0.1...positive5.0Several Guernsey women have set up a group of ...negative
11I took Plaquenil for 4 years. Now it's the hot...[-0.7918602228164673, 0.6628591418266296, -0.4...positive3.0I took Plaquenil for 4 years. Now it's the hot...positive
12Can’t let this die. A little humor in these t...[-0.6068148016929626, 0.4092712998390198, 0.18...positive2.0Can’t let this die. A little humor in these t...negative
13I’ve been to the grocery store three times si...[-1.2009631395339966, 0.18674977123737335, -0....positive1.0I’ve been to the grocery store three times si...negative
14That maybe there was a rioter out there who th...[-0.9833343029022217, 0.002539500128477812, -0...positive1.0That maybe there was a rioter out there who th...positive
15First they release and spread the virus Strip ...[-0.903867781162262, 0.3173466622829437, -0.74...positive3.0First they release and spread the virus Strip ...negative
16For about a year I have been toying with the i...[-0.7246012091636658, 0.4053008258342743, -0.2...positive9.0For about a year I have been toying with the i...positive
17Creation Consumer Finance in Belfast are now p...[-0.08034359663724899, 0.9084559082984924, -0....positive1.0Creation Consumer Finance in Belfast are now p...negative
18\"Your going to lose people to the flu but you'...[-1.6253035068511963, 0.23702141642570496, -0....positive4.0\"Your going to lose people to the flu but you'...negative
19If you are a #RetailWorker and your store is s...[-0.18891872465610504, 1.0232254266738892, -0....positive2.0If you are a #RetailWorker and your store is s...positive
20I love this social distancing.. No one stand c...[-0.7862111330032349, 0.17004938423633575, -0....positive8.0I love this social distancing..\\r\\r\\n\\r\\r\\nNo ...positive
21#Facebook Ads Fails to Reject COVID-19 Misinfo...[0.09989737719297409, 0.21870799362659454, 0.2...positive1.0#Facebook Ads Fails to Reject COVID-19 Misinfo...negative
22Lease prices decline in March as as the corona...[-0.21414655447006226, 0.6170880794525146, -0....positive1.0Lease prices decline in March as as the corona...positive
23Complete Shutdown of Karnataka W.E.F 24th Marc...[-1.0250476598739624, 0.7916482090950012, -0.4...positive5.0Complete Shutdown of Karnataka W.E.F 24th Marc...negative
24@MobilePunch Another deceitful palliative in m...[-0.967275083065033, 0.39687538146972656, -0.3...positive1.0@MobilePunch Another deceitful palliative in m...negative
25I am anyways under lockdown since 6th March be...[-0.8778604865074158, 0.4535127580165863, -0.8...positive4.0I am anyways under lockdown since 6th March be...positive
26@KFCSA @yumbrands was at your store , the air ...[-1.004093050956726, 0.5079272985458374, -0.48...positive1.0@KFCSA @yumbrands was at your store , the air ...negative
27How to support and save local businesses durin...[-0.7925586700439453, 0.2502853572368622, -0.0...positive1.0How to support and save local businesses durin...positive
28This is the latest current information on supe...[-1.155573844909668, 1.2174267768859863, -0.72...positive7.0This is the latest current information on supe...positive
29If anyone witnesses any shops hiking their pri...[-1.0366969108581543, 0.9876224398612976, 0.21...positive6.0If anyone witnesses any shops hiking their pri...positive
30It's about time @eBay_UK stopped allowing peop...[-0.6213818788528442, 0.8505221009254456, -0.0...positive1.0It's about time @eBay_UK stopped allowing peop...negative
31sounds like a good idea - LA market offers spe...[-0.8883647918701172, 0.09247803688049316, 0.0...positive2.0sounds like a good idea - LA market offers spe...positive
32No matter how they're ramping up efforts at #d...[-0.6616833209991455, 0.4456593692302704, -0.1...positive2.0No matter how they're ramping up efforts at #d...positive
33I was wondering if a lovely supermarket would ...[-0.7988455891609192, 0.3940075635910034, -0.2...positive1.0I was wondering if a lovely supermarket would ...positive
34Telangana people, Please dont go out unless it...[-0.859315812587738, 0.18060781061649323, -0.3...positive1.0Telangana people, Please dont go out unless it...positive
35Idiots are panic purchasing food in bulk that ...[-1.1440715789794922, 0.10256315022706985, -0....positive1.0Idiots are panic purchasing food in bulk that ...negative
36Following @BBVA_USA's initial offers last week...[-0.4633345901966095, 0.6127039790153503, -0.0...positive2.0Following @BBVA_USA's initial offers last week...negative
37Please appreciate your supermarket delivery dr...[-0.8110732436180115, 0.9440520405769348, -0.2...positive3.0Please appreciate your supermarket delivery dr...positive
38When asked, “Why does ADI continue to operate...[-0.7378478050231934, 1.1133265495300293, -0.6...positive9.0When asked, “Why does ADI continue to operate...positive
39The long-term fallout of the #coronavirus lock...[-0.4056348204612732, 0.8604288101196289, -0.2...positive2.0The long-term fallout of the #coronavirus lock...positive
40HELP US PACK THE PANTRIES I m live at 12 15 on...[-1.0248322486877441, 0.3955387771129608, -0.5...positive4.0HELP US PACK THE PANTRIES I m live at 12 15 on...positive
41Seeing what’s being left behind at grocery st...[-0.7818620800971985, 0.35503625869750977, 0.1...positive4.0Seeing what’s being left behind at grocery st...negative
42Everyone don t forget condoms when you are hoa...[-1.7940707206726074, 0.559665322303772, -0.49...positive2.0Everyone don t forget condoms when you are hoa...positive
43If you come across any shops charging inflated...[-0.878898024559021, 0.7265058755874634, 0.008...positive1.0If you come across any shops charging inflated...positive
44New @SA_Update #consumer insights affecting ve...[0.0033321159426122904, 0.9780365228652954, -0...positive6.0New @SA_Update #consumer insights affecting v...positive
45How is COVID-19 affecting the pork and commodi...[-0.6315991282463074, 0.5254122018814087, -0.3...positive4.0How is COVID-19 affecting the pork and commodi...negative
46Unbelievable https://t.co/fBrPYaxNH6[-0.4117434322834015, 0.6890383362770081, 0.51...positive4.0Unbelievable https://t.co/fBrPYaxNH6positive
47How prepared are UK supermarkets to fulfill co...[-0.5095536112785339, 0.48363006114959717, 0.0...positive1.0How prepared are UK supermarkets to fulfill co...positive
48Quakertown Distiller Providing Hand Sanitizer ...[-1.3492352962493896, 0.669184684753418, -0.38...positive1.0Quakertown Distiller Providing Hand Sanitizer ...positive
49Trump's disastrous dumbfuckery about #coronavi...[-0.7200496792793274, 0.3299054205417633, -0.4...positive1.0Trump's disastrous dumbfuckery about #coronavi...negative
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 34 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# 4. Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "3c6d4f10-f2ef-4fbf-8488-bf8be3857f92" + }, + "source": [ + "fitted_pipe.predict(\"Everything is under control !\")" + ], + "execution_count": 35, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence \\\n", + "0 Everything is under control ! \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-1.6309462785720825, 0.7175763845443726, -0.7... positive \n", + "\n", + " sentiment_confidence \n", + "0 0.999999 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sentencesentence_embedding_small_bert_L2_128sentimentsentiment_confidence
0Everything is under control ![-1.6309462785720825, 0.7175763845443726, -0.7...positive0.999999
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 35 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## 5. Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "b535fdb8-eee3-4825-e3d5-74d4dd7b42a5" + }, + "source": [ + "trainable_pipe.print_info()" + ], + "execution_count": 36, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## 6. Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "mptfvHx-MMMX", + "outputId": "c647f370-bd30-43ef-d2a9-b7ffc1948686" + }, + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 37, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/pipeline.py:149: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " dataset.y = dataset.y.apply(str)\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in
the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 21\n", + " positive 0.58 1.00 0.73 29\n", + "\n", + " accuracy 0.58 50\n", + " macro avg 0.29 0.50 0.37 50\n", + "weighted avg 0.34 0.58 0.43 50\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. 
Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Over 45 tar sands projects that have already s... \n", + "1 I just got super excited because I found toile... \n", + "2 BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE... \n", + "3 A lady fro @CheckersSA Brits tells me to hide ... \n", + "4 Just done an online shop and all the food esse... \n", + "5 @GavinNewsom so let me get this straight crowd... \n", + "6 NY: 18- to 49-year-olds make up more than half... \n", + "7 Emerging market debt prices reflect worst case... \n", + "8 do you clean your #supermarket items due to th... \n", + "9 @flipkartsupport @rsprasad @PMOIndia In his cr... \n", + "10 Several Guernsey women have set up a group of ... \n", + "11 I took Plaquenil for 4 years. Now it's the hot... \n", + "12 Can’t let this die. A little humor in these t... \n", + "13 I’ve been to the grocery store three times si... \n", + "14 That maybe there was a rioter out there who th... \n", + "15 First they release and spread the virus Strip ... \n", + "16 For about a year I have been toying with the i... \n", + "17 Creation Consumer Finance in Belfast are now p... 
\n", + "18 \"Your going to lose people to the flu but you'... \n", + "19 If you are a #RetailWorker and your store is s... \n", + "20 I love this social distancing.. No one stand c... \n", + "21 #Facebook Ads Fails to Reject COVID-19 Misinfo... \n", + "22 Lease prices decline in March as as the corona... \n", + "23 Complete Shutdown of Karnataka W.E.F 24th Marc... \n", + "24 @MobilePunch Another deceitful palliative in m... \n", + "25 I am anyways under lockdown since 6th March be... \n", + "26 @KFCSA @yumbrands was at your store , the air ... \n", + "27 How to support and save local businesses durin... \n", + "28 This is the latest current information on supe... \n", + "29 If anyone witnesses any shops hiking their pri... \n", + "30 It's about time @eBay_UK stopped allowing peop... \n", + "31 sounds like a good idea - LA market offers spe... \n", + "32 No matter how they're ramping up efforts at #d... \n", + "33 I was wondering if a lovely supermarket would ... \n", + "34 Telangana people, Please dont go out unless it... \n", + "35 Idiots are panic purchasing food in bulk that ... \n", + "36 Following @BBVA_USA's initial offers last week... \n", + "37 Please appreciate your supermarket delivery dr... \n", + "38 When asked, “Why does ADI continue to operate... \n", + "39 The long-term fallout of the #coronavirus lock... \n", + "40 HELP US PACK THE PANTRIES I m live at 12 15 on... \n", + "41 Seeing what’s being left behind at grocery st... \n", + "42 Everyone don t forget condoms when you are hoa... \n", + "43 If you come across any shops charging inflated... \n", + "44 New @SA_Update #consumer insights affecting ve... \n", + "45 How is COVID-19 affecting the pork and commodi... \n", + "46 Unbelievable https://t.co/fBrPYaxNH6 \n", + "47 How prepared are UK supermarkets to fulfill co... \n", + "48 Quakertown Distiller Providing Hand Sanitizer ... \n", + "49 Trump's disastrous dumbfuckery about #coronavi... 
\n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.040878042578697205, 0.7422170639038086, -1... positive \n", + "1 [-0.8823249340057373, 0.2473045289516449, -0.2... positive \n", + "2 [-0.5860391855239868, 0.45967912673950195, -0.... positive \n", + "3 [-1.0504486560821533, 0.03315277025103569, -0.... positive \n", + "4 [-0.7228847742080688, 0.12011835724115372, -0.... positive \n", + "5 [-0.7876735329627991, -0.0005504372529685497, ... positive \n", + "6 [-0.7093901634216309, 0.15827353298664093, -0.... positive \n", + "7 [-0.38020092248916626, 0.469475656747818, -1.0... positive \n", + "8 [-0.9434630870819092, 0.15352500975131989, -0.... positive \n", + "9 [-0.5607839226722717, 0.01385028287768364, -0.... positive \n", + "10 [-0.5054810047149658, 0.3916030824184418, -0.1... positive \n", + "11 [-0.7918602228164673, 0.6628591418266296, -0.4... positive \n", + "12 [-0.6068148016929626, 0.4092712998390198, 0.18... positive \n", + "13 [-1.2009631395339966, 0.18674977123737335, -0.... positive \n", + "14 [-0.9833343029022217, 0.002539500128477812, -0... positive \n", + "15 [-0.903867781162262, 0.3173466622829437, -0.74... positive \n", + "16 [-0.7246012091636658, 0.4053008258342743, -0.2... positive \n", + "17 [-0.08034359663724899, 0.9084559082984924, -0.... positive \n", + "18 [-1.6253035068511963, 0.23702141642570496, -0.... positive \n", + "19 [-0.18891872465610504, 1.0232254266738892, -0.... positive \n", + "20 [-0.7862111330032349, 0.17004938423633575, -0.... positive \n", + "21 [0.09989737719297409, 0.21870799362659454, 0.2... positive \n", + "22 [-0.21414655447006226, 0.6170880794525146, -0.... positive \n", + "23 [-1.0250476598739624, 0.7916482090950012, -0.4... positive \n", + "24 [-0.967275083065033, 0.39687538146972656, -0.3... positive \n", + "25 [-0.8778604865074158, 0.4535127580165863, -0.8... positive \n", + "26 [-1.004093050956726, 0.5079272985458374, -0.48... 
positive \n", + "27 [-0.7925586700439453, 0.2502853572368622, -0.0... positive \n", + "28 [-1.155573844909668, 1.2174267768859863, -0.72... positive \n", + "29 [-1.0366969108581543, 0.9876224398612976, 0.21... positive \n", + "30 [-0.6213818788528442, 0.8505221009254456, -0.0... positive \n", + "31 [-0.8883647918701172, 0.09247803688049316, 0.0... positive \n", + "32 [-0.6616833209991455, 0.4456593692302704, -0.1... positive \n", + "33 [-0.7988455891609192, 0.3940075635910034, -0.2... positive \n", + "34 [-0.859315812587738, 0.18060781061649323, -0.3... positive \n", + "35 [-1.1440715789794922, 0.10256315022706985, -0.... positive \n", + "36 [-0.4633345901966095, 0.6127039790153503, -0.0... positive \n", + "37 [-0.8110732436180115, 0.9440520405769348, -0.2... positive \n", + "38 [-0.7378478050231934, 1.1133265495300293, -0.6... positive \n", + "39 [-0.4056348204612732, 0.8604288101196289, -0.2... positive \n", + "40 [-1.0248322486877441, 0.3955387771129608, -0.5... positive \n", + "41 [-0.7818620800971985, 0.35503625869750977, 0.1... positive \n", + "42 [-1.7940707206726074, 0.559665322303772, -0.49... positive \n", + "43 [-0.878898024559021, 0.7265058755874634, 0.008... positive \n", + "44 [0.0033321159426122904, 0.9780365228652954, -0... positive \n", + "45 [-0.6315991282463074, 0.5254122018814087, -0.3... positive \n", + "46 [-0.4117434322834015, 0.6890383362770081, 0.51... positive \n", + "47 [-0.5095536112785339, 0.48363006114959717, 0.0... positive \n", + "48 [-1.3492352962493896, 0.669184684753418, -0.38... positive \n", + "49 [-0.7200496792793274, 0.3299054205417633, -0.4... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 3.0 Over 45 tar sands projects that have already s... \n", + "1 2.0 I just got super excited because I found toile... \n", + "2 2.0 BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE... \n", + "3 7.0 A lady fro @CheckersSA Brits tells me to hide ... \n", + "4 1.0 Just done an online shop and all the food esse... 
\n", + "5 7.0 @GavinNewsom so let me get this straight crow... \n", + "6 3.0 NY: 18- to 49-year-olds make up more than half... \n", + "7 4.0 Emerging market debt prices reflect worst case... \n", + "8 3.0 do you clean your #supermarket items due to t... \n", + "9 1.0 @flipkartsupport @rsprasad\\r\\r\\n@PMOIndia In h... \n", + "10 6.0 Several Guernsey women have set up a group of ... \n", + "11 2.0 I took Plaquenil for 4 years. Now it's the hot... \n", + "12 1.0 Can’t let this die. A little humor in these t... \n", + "13 1.0 I’ve been to the grocery store three times si... \n", + "14 8.0 That maybe there was a rioter out there who th... \n", + "15 2.0 First they release and spread the virus Strip ... \n", + "16 7.0 For about a year I have been toying with the i... \n", + "17 2.0 Creation Consumer Finance in Belfast are now p... \n", + "18 5.0 \"Your going to lose people to the flu but you'... \n", + "19 5.0 If you are a #RetailWorker and your store is s... \n", + "20 2.0 I love this social distancing..\\r\\r\\n\\r\\r\\nNo ... \n", + "21 1.0 #Facebook Ads Fails to Reject COVID-19 Misinfo... \n", + "22 1.0 Lease prices decline in March as as the corona... \n", + "23 4.0 Complete Shutdown of Karnataka W.E.F 24th Marc... \n", + "24 1.0 @MobilePunch Another deceitful palliative in m... \n", + "25 3.0 I am anyways under lockdown since 6th March be... \n", + "26 2.0 @KFCSA @yumbrands was at your store , the air ... \n", + "27 1.0 How to support and save local businesses durin... \n", + "28 4.0 This is the latest current information on supe... \n", + "29 2.0 If anyone witnesses any shops hiking their pri... \n", + "30 1.0 It's about time @eBay_UK stopped allowing peop... \n", + "31 2.0 sounds like a good idea - LA market offers spe... \n", + "32 1.0 No matter how they're ramping up efforts at #d... \n", + "33 2.0 I was wondering if a lovely supermarket would ... \n", + "34 1.0 Telangana people, Please dont go out unless it... 
\n", + "35 1.0 Idiots are panic purchasing food in bulk that ... \n", + "36 8.0 Following @BBVA_USA's initial offers last week... \n", + "37 6.0 Please appreciate your supermarket delivery dr... \n", + "38 1.0 When asked, “Why does ADI continue to operate... \n", + "39 8.0 The long-term fallout of the #coronavirus lock... \n", + "40 2.0 HELP US PACK THE PANTRIES I m live at 12 15 on... \n", + "41 2.0 Seeing what’s being left behind at grocery st... \n", + "42 2.0 Everyone don t forget condoms when you are hoa... \n", + "43 2.0 If you come across any shops charging inflated... \n", + "44 3.0 New @SA_Update #consumer insights affecting v... \n", + "45 2.0 How is COVID-19 affecting the pork and commodi... \n", + "46 1.0 Unbelievable https://t.co/fBrPYaxNH6 \n", + "47 3.0 How prepared are UK supermarkets to fulfill co... \n", + "48 4.0 Quakertown Distiller Providing Hand Sanitizer ... \n", + "49 4.0 Trump's disastrous dumbfuckery about #coronavi... \n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 negative \n", + "5 negative \n", + "6 negative \n", + "7 negative \n", + "8 positive \n", + "9 negative \n", + "10 negative \n", + "11 positive \n", + "12 negative \n", + "13 negative \n", + "14 positive \n", + "15 negative \n", + "16 positive \n", + "17 negative \n", + "18 negative \n", + "19 positive \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 negative \n", + "24 negative \n", + "25 positive \n", + "26 negative \n", + "27 positive \n", + "28 positive \n", + "29 positive \n", + "30 negative \n", + "31 positive \n", + "32 positive \n", + "33 positive \n", + "34 positive \n", + "35 negative \n", + "36 negative \n", + "37 positive \n", + "38 positive \n", + "39 positive \n", + "40 positive \n", + "41 negative \n", + "42 positive \n", + "43 positive \n", + "44 positive \n", + "45 negative \n", + "46 positive \n", + "47 positive \n", + "48 positive \n", + "49 negative " + ], + "text/html": [ + "\n", 
+ "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0Over 45 tar sands projects that have already s...[-0.040878042578697205, 0.7422170639038086, -1...positive3.0Over 45 tar sands projects that have already s...positive
1I just got super excited because I found toile...[-0.8823249340057373, 0.2473045289516449, -0.2...positive2.0I just got super excited because I found toile...positive
2BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE...[-0.5860391855239868, 0.45967912673950195, -0....positive2.0BE AFRAID, BE VERY AFRAID—INSLEE PROMOTES THE...positive
3A lady fro @CheckersSA Brits tells me to hide ...[-1.0504486560821533, 0.03315277025103569, -0....positive7.0A lady fro @CheckersSA Brits tells me to hide ...positive
4Just done an online shop and all the food esse...[-0.7228847742080688, 0.12011835724115372, -0....positive1.0Just done an online shop and all the food esse...negative
5@GavinNewsom so let me get this straight crowd...[-0.7876735329627991, -0.0005504372529685497, ...positive7.0@GavinNewsom so let me get this straight crow...negative
6NY: 18- to 49-year-olds make up more than half...[-0.7093901634216309, 0.15827353298664093, -0....positive3.0NY: 18- to 49-year-olds make up more than half...negative
7Emerging market debt prices reflect worst case...[-0.38020092248916626, 0.469475656747818, -1.0...positive4.0Emerging market debt prices reflect worst case...negative
8do you clean your #supermarket items due to th...[-0.9434630870819092, 0.15352500975131989, -0....positive3.0do you clean your #supermarket items due to t...positive
9@flipkartsupport @rsprasad @PMOIndia In his cr...[-0.5607839226722717, 0.01385028287768364, -0....positive1.0@flipkartsupport @rsprasad\\r\\r\\n@PMOIndia In h...negative
10Several Guernsey women have set up a group of ...[-0.5054810047149658, 0.3916030824184418, -0.1...positive6.0Several Guernsey women have set up a group of ...negative
11I took Plaquenil for 4 years. Now it's the hot...[-0.7918602228164673, 0.6628591418266296, -0.4...positive2.0I took Plaquenil for 4 years. Now it's the hot...positive
12Can’t let this die. A little humor in these t...[-0.6068148016929626, 0.4092712998390198, 0.18...positive1.0Can’t let this die. A little humor in these t...negative
13I’ve been to the grocery store three times si...[-1.2009631395339966, 0.18674977123737335, -0....positive1.0I’ve been to the grocery store three times si...negative
14That maybe there was a rioter out there who th...[-0.9833343029022217, 0.002539500128477812, -0...positive8.0That maybe there was a rioter out there who th...positive
15First they release and spread the virus Strip ...[-0.903867781162262, 0.3173466622829437, -0.74...positive2.0First they release and spread the virus Strip ...negative
16For about a year I have been toying with the i...[-0.7246012091636658, 0.4053008258342743, -0.2...positive7.0For about a year I have been toying with the i...positive
17Creation Consumer Finance in Belfast are now p...[-0.08034359663724899, 0.9084559082984924, -0....positive2.0Creation Consumer Finance in Belfast are now p...negative
18\"Your going to lose people to the flu but you'...[-1.6253035068511963, 0.23702141642570496, -0....positive5.0\"Your going to lose people to the flu but you'...negative
19If you are a #RetailWorker and your store is s...[-0.18891872465610504, 1.0232254266738892, -0....positive5.0If you are a #RetailWorker and your store is s...positive
20I love this social distancing.. No one stand c...[-0.7862111330032349, 0.17004938423633575, -0....positive2.0I love this social distancing..\\r\\r\\n\\r\\r\\nNo ...positive
21#Facebook Ads Fails to Reject COVID-19 Misinfo...[0.09989737719297409, 0.21870799362659454, 0.2...positive1.0#Facebook Ads Fails to Reject COVID-19 Misinfo...negative
22Lease prices decline in March as as the corona...[-0.21414655447006226, 0.6170880794525146, -0....positive1.0Lease prices decline in March as as the corona...positive
23Complete Shutdown of Karnataka W.E.F 24th Marc...[-1.0250476598739624, 0.7916482090950012, -0.4...positive4.0Complete Shutdown of Karnataka W.E.F 24th Marc...negative
24@MobilePunch Another deceitful palliative in m...[-0.967275083065033, 0.39687538146972656, -0.3...positive1.0@MobilePunch Another deceitful palliative in m...negative
25I am anyways under lockdown since 6th March be...[-0.8778604865074158, 0.4535127580165863, -0.8...positive3.0I am anyways under lockdown since 6th March be...positive
26@KFCSA @yumbrands was at your store , the air ...[-1.004093050956726, 0.5079272985458374, -0.48...positive2.0@KFCSA @yumbrands was at your store , the air ...negative
27How to support and save local businesses durin...[-0.7925586700439453, 0.2502853572368622, -0.0...positive1.0How to support and save local businesses durin...positive
28This is the latest current information on supe...[-1.155573844909668, 1.2174267768859863, -0.72...positive4.0This is the latest current information on supe...positive
29If anyone witnesses any shops hiking their pri...[-1.0366969108581543, 0.9876224398612976, 0.21...positive2.0If anyone witnesses any shops hiking their pri...positive
30It's about time @eBay_UK stopped allowing peop...[-0.6213818788528442, 0.8505221009254456, -0.0...positive1.0It's about time @eBay_UK stopped allowing peop...negative
31sounds like a good idea - LA market offers spe...[-0.8883647918701172, 0.09247803688049316, 0.0...positive2.0sounds like a good idea - LA market offers spe...positive
32No matter how they're ramping up efforts at #d...[-0.6616833209991455, 0.4456593692302704, -0.1...positive1.0No matter how they're ramping up efforts at #d...positive
33I was wondering if a lovely supermarket would ...[-0.7988455891609192, 0.3940075635910034, -0.2...positive2.0I was wondering if a lovely supermarket would ...positive
34Telangana people, Please dont go out unless it...[-0.859315812587738, 0.18060781061649323, -0.3...positive1.0Telangana people, Please dont go out unless it...positive
35Idiots are panic purchasing food in bulk that ...[-1.1440715789794922, 0.10256315022706985, -0....positive1.0Idiots are panic purchasing food in bulk that ...negative
36Following @BBVA_USA's initial offers last week...[-0.4633345901966095, 0.6127039790153503, -0.0...positive8.0Following @BBVA_USA's initial offers last week...negative
37Please appreciate your supermarket delivery dr...[-0.8110732436180115, 0.9440520405769348, -0.2...positive6.0Please appreciate your supermarket delivery dr...positive
38 | When asked, “Why does ADI continue to operate... | [-0.7378478050231934, 1.1133265495300293, -0.6... | positive | 1.0 | When asked, “Why does ADI continue to operate... | positive
39 | The long-term fallout of the #coronavirus lock... | [-0.4056348204612732, 0.8604288101196289, -0.2... | positive | 8.0 | The long-term fallout of the #coronavirus lock... | positive
40 | HELP US PACK THE PANTRIES I m live at 12 15 on... | [-1.0248322486877441, 0.3955387771129608, -0.5... | positive | 2.0 | HELP US PACK THE PANTRIES I m live at 12 15 on... | positive
41 | Seeing what’s being left behind at grocery st... | [-0.7818620800971985, 0.35503625869750977, 0.1... | positive | 2.0 | Seeing what’s being left behind at grocery st... | negative
42 | Everyone don t forget condoms when you are hoa... | [-1.7940707206726074, 0.559665322303772, -0.49... | positive | 2.0 | Everyone don t forget condoms when you are hoa... | positive
43 | If you come across any shops charging inflated... | [-0.878898024559021, 0.7265058755874634, 0.008... | positive | 2.0 | If you come across any shops charging inflated... | positive
44 | New @SA_Update #consumer insights affecting ve... | [0.0033321159426122904, 0.9780365228652954, -0... | positive | 3.0 | New @SA_Update #consumer insights affecting v... | positive
45 | How is COVID-19 affecting the pork and commodi... | [-0.6315991282463074, 0.5254122018814087, -0.3... | positive | 2.0 | How is COVID-19 affecting the pork and commodi... | negative
46 | Unbelievable https://t.co/fBrPYaxNH6 | [-0.4117434322834015, 0.6890383362770081, 0.51... | positive | 1.0 | Unbelievable https://t.co/fBrPYaxNH6 | positive
47 | How prepared are UK supermarkets to fulfill co... | [-0.5095536112785339, 0.48363006114959717, 0.0... | positive | 3.0 | How prepared are UK supermarkets to fulfill co... | positive
48 | Quakertown Distiller Providing Hand Sanitizer ... | [-1.3492352962493896, 0.669184684753418, -0.38... | positive | 4.0 | Quakertown Distiller Providing Hand Sanitizer ... | positive
49 | Trump's disastrous dumbfuckery about #coronavi... | [-0.7200496792793274, 0.3299054205417633, -0.4... | positive | 4.0 | Trump's disastrous dumbfuckery about #coronavi... | negative
\n" + ] + }, + "metadata": {}, + "execution_count": 37 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# 7. Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxWFzQOhjWC8", + "outputId": "d5465274-e8a1-4443-f2b4-950f2387cfbc" + }, + "source": [ + "# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ], + "execution_count": 39, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj 
sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP 
model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj 
sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj 
sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", 
+ "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + 
"nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IKK_Ii_gjJfF", + "outputId": "45dea83e-2a0c-4d09-a723-f877ea7527dd" + }, + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n", + "# We need to train longer and user smaller LR for NON-USE based sentence embeddings usually\n", + "# We could tune the hyperparameters further with hyperparameter tuning methods like gridsearch\n", + "# Also longer training gives more accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df[:500])\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df[:500],output_level='document')\n", + "\n", + "#sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_768 download started this may take some time.\n", + "Approximate size to download 392.9 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.84 0.56 0.67 229\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.86 0.71 0.78 271\n", + "\n", + " accuracy 0.64 500\n", + " macro avg 0.57 0.42 0.48 500\n", + "weighted avg 0.85 0.64 0.73 500\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1jxw3GnVGlI" + }, + "source": [ + "# 7.1 Evaluate on Test Data" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Fxx4yNkNVGFl", + "outputId": "0f64b81d-884b-46b8-d571-6e32cc0e1a64" + }, + "source": [ + "preds = fitted_pipe.predict(test_df[:100],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.68 0.35 0.47 48\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.77 0.63 0.69 52\n", + "\n", + " accuracy 0.50 100\n", + " macro avg 0.48 0.33 0.39 100\n", + "weighted avg 0.73 0.50 0.58 100\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 8. 
Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": 6, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 9. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "id": "SO4uz45MoRgp", + "outputId": "12ba7df2-7ee4-432a-c5bd-52014e1b26ea" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('Everything is under control !')\n", + "preds" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Everything is under control ! \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [0.37780365347862244, 0.29955413937568665, 0.1... negative \n", + "\n", + " sentiment_confidence \n", + "0 0.0 " + ], + "text/html": [ + "\n", + "
 | document | sentence_embedding_from_disk | sentiment | sentiment_confidence
0 | Everything is under control ! | [0.37780365347862244, 0.29955413937568665, 0.1... | negative | 0.0
\n" + ] + }, + "metadata": {}, + "execution_count": 7 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "3faff34f-7934-4ea7-f0ee-dc6fcf274b15" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-S888rwObpEs" + }, + "source": [], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_finanical_news.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_finanical_news.ipynb index ba58febd..28ac9124 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_finanical_news.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_finanical_news.ipynb @@ -1 +1,2600 
@@ -{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo_finanical_news.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_finanical_news.ipynb)\n","\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 class Finance News sentiment classifier training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n","You can achieve these results or even better on this dataset with training data:\n","\n","
\n","\n","![image.png]()\n","\n","\n","\n","\n","You can achieve these results or even better on this dataset with test data:\n","\n","\n","
\n","\n","\n","![image.png]()"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620191453470,"user_tz":-120,"elapsed":108117,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"128e625c-21cf-4c76-969d-d58ba57f0b9f"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:09:06-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","- 0%[ ] 0 --.-KB/s Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","- 100%[===================>] 1.63K --.-KB/s in 0.001s \n","\n","2021-05-05 05:09:06 (1.82 MB/s) - written to stdout [1671/1671]\n","\n","\u001b[K |████████████████████████████████| 204.8MB 64kB/s \n","\u001b[K |████████████████████████████████| 153kB 46.5MB/s \n","\u001b[K |████████████████████████████████| 204kB 21.5MB/s \n","\u001b[K |████████████████████████████████| 204kB 50.1MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download Financial News Sentiment dataset \n","https://www.kaggle.com/ankurzing/sentiment-analysis-for-financial-news\n","\n","This dataset contains the sentiments for financial news headlines from the perspective of a retail investor. Further details about the dataset can be found in: Malo, P., Sinha, A., Takala, P., Korhonen, P. and Wallenius, J. (2014): “Good debt or bad debt: Detecting semantic orientations in economic texts.” Journal of the American Society for Information Science and Technology."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620191454511,"user_tz":-120,"elapsed":109150,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"6b206aaa-ecb1-43a7-8d4a-1eae242989d4"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/01/all-data.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:10:53-- http://ckl-it.de/wp-content/uploads/2021/01/all-data.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 704799 (688K) [text/csv]\n","Saving to: ‘all-data.csv’\n","\n","all-data.csv 100%[===================>] 688.28K 1.22MB/s in 0.5s \n","\n","2021-05-05 05:10:54 (1.22 MB/s) - ‘all-data.csv’ saved [704799/704799]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":406},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620191455016,"user_tz":-120,"elapsed":109649,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"a2eb3af1-eb9a-413f-ffb8-eedd69c82cd6"},"source":["import pandas as pd\n","train_path = '/content/all-data.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","# the label column must be named 'y' and be of type str\n","columns=['text','y']\n","train_df = train_df[columns]\n","train_df = train_df[~train_df[\"y\"].isin([\"neutral\"])]\n","from sklearn.model_selection import train_test_split\n","\n","train_df, test_df = train_test_split(train_df, test_size=0.2)\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
|  | text | y |
| 951 | Finnish software developer Basware Oyj said on... | positive |
| 829 | Comptel , a vendor of dynamic Operations Suppo... | positive |
| 760 | Espoon kaupunki awarded contracts for personal... | positive |
| 770 | Net cash flow from operations is expected to r... | positive |
| 811 | The shopping center to be opened in St. Peters... | positive |
| ... | ... | ... |
| 12 | Finnish Talentum reports its operating profit ... | positive |
| 981 | The financial impact is estimated to be an ann... | positive |
| 4773 | `` I am extremely delighted with this project ... | positive |
| 4080 | Cost cutting measures , which have produced ar... | positive |
| 1944 | Paper maker Stora Enso Oyj said Friday it has ... | positive |
\n","

1573 rows × 2 columns

\n","
"],"text/plain":[" text y\n","951 Finnish software developer Basware Oyj said on... positive\n","829 Comptel , a vendor of dynamic Operations Suppo... positive\n","760 Espoon kaupunki awarded contracts for personal... positive\n","770 Net cash flow from operations is expected to r... positive\n","811 The shopping center to be opened in St. Peters... positive\n","... ... ...\n","12 Finnish Talentum reports its operating profit ... positive\n","981 The financial impact is estimated to be an ann... positive\n","4773 `` I am extremely delighted with this project ... positive\n","4080 Cost cutting measures , which have produced ar... positive\n","1944 Paper maker Stora Enso Oyj said Friday it has ... positive\n","\n","[1573 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620191586887,"user_tz":-120,"elapsed":241515,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"da31ed6e-519a-420d-9bdd-3dd95b8d0c78"},"source":["import nlu \n","from sklearn.metrics import classification_report\n","\n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a dataset with label and text columns\n","# by default, Universal Sentence Encoder (USE) sentence embeddings are used to embed the text\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","# the predicted label is in the 'trained_sentiment' column (see the output table)\n","print(classification_report(preds['y'], preds['trained_sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 14\n"," positive 0.72 1.00 0.84 36\n","\n"," accuracy 0.72 50\n"," macro avg 0.36 0.50 0.42 50\n","weighted avg 0.52 0.72 0.60 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
|  | origin_index | trained_sentiment_confidence | y | text | document | trained_sentiment | sentence | sentence_embedding_use |
| 0 | 951 | 0.985986 | positive | Finnish software developer Basware Oyj said on... | Finnish software developer Basware Oyj said on... | positive | [Finnish software developer Basware Oyj said o... | [0.015306072309613228, -0.002737046219408512, ... |
| 1 | 829 | 0.989200 | positive | Comptel , a vendor of dynamic Operations Suppo... | Comptel , a vendor of dynamic Operations Suppo... | positive | [Comptel , a vendor of dynamic Operations Supp... | [0.04115989804267883, -0.013678845949470997, -... |
| 2 | 760 | 0.968155 | positive | Espoon kaupunki awarded contracts for personal... | Espoon kaupunki awarded contracts for personal... | positive | [Espoon kaupunki awarded contracts for persona... | [0.050158627331256866, 0.011978656984865665, 0... |
| 3 | 770 | 0.983851 | positive | Net cash flow from operations is expected to r... | Net cash flow from operations is expected to r... | positive | [Net cash flow from operations is expected to ... | [0.06134718284010887, 0.020957626402378082, -0... |
| 4 | 811 | 0.974798 | positive | The shopping center to be opened in St. Peters... | The shopping center to be opened in St. Peters... | positive | [The shopping center to be opened in St. Peter... | [0.030753908678889275, -0.009971680119633675, ... |
| 5 | 4442 | 0.965111 | negative | Cash flow from operations in January-December ... | Cash flow from operations in January-December ... | positive | [Cash flow from operations in January-December... | [0.05352102592587471, 0.06253479421138763, -0.... |
| 6 | 25 | 0.984061 | positive | Nordea Group 's operating profit increased in ... | Nordea Group 's operating profit increased in ... | positive | [Nordea Group 's operating profit increased in... | [0.0497022308409214, 0.023793146014213562, -0.... |
| 7 | 378 | 0.989835 | positive | The Finnish supplier of BSS-OSS and VAS for te... | The Finnish supplier of BSS-OSS and VAS for te... | positive | [The Finnish supplier of BSS-OSS and VAS for t... | [0.03325969725847244, -0.004926654975861311, -... |
| 8 | 4096 | 0.973341 | positive | Following the transaction , Lundbeck has world... | Following the transaction , Lundbeck has world... | positive | [Following the transaction , Lundbeck has worl... | [0.07471615076065063, -0.04718644917011261, -0... |
| 9 | 169 | 0.981174 | positive | Both operating profit and net sales for the th... | Both operating profit and net sales for the th... | positive | [Both operating profit and net sales for the t... | [0.03235051780939102, 0.03737620636820793, -0.... |
| 10 | 145 | 0.985590 | positive | Finnish Okmetic that manufactures and processe... | Finnish Okmetic that manufactures and processe... | positive | [Finnish Okmetic that manufactures and process... | [0.05688846856355667, -0.01949388161301613, -0... |
| 11 | 941 | 0.966510 | positive | Cargotec Germany GmbH has been awarded a contr... | Cargotec Germany GmbH has been awarded a contr... | positive | [Cargotec Germany GmbH has been awarded a cont... | [0.041768044233322144, -0.019827384501695633, ... |
| 12 | 700 | 0.984256 | positive | For the current year , Raute expects its net s... | For the current year , Raute expects its net s... | positive | [For the current year , Raute expects its net ... | [0.07800783962011337, 0.009446296840906143, -0... |
| 13 | 4417 | 0.973687 | negative | Operating profit , excluding non-recurring ite... | Operating profit , excluding non-recurring ite... | positive | [Operating profit , excluding non-recurring it... | [0.038493648171424866, 0.03324933722615242, -0... |
| 14 | 4032 | 0.970608 | negative | Operating loss of the Pulp & Paper Machinery u... | Operating loss of the Pulp & Paper Machinery u... | positive | [Operating loss of the Pulp & Paper Machinery ... | [0.03237569332122803, 0.05145535245537758, -0.... |
| 15 | 370 | 0.977541 | positive | The airline has ordered nine Airbus A350-900 a... | The airline has ordered nine Airbus A350-900 a... | positive | [The airline has ordered nine Airbus A350-900 ... | [0.011570269241929054, 0.03636825457215309, -0... |
| 16 | 2003 | 0.981171 | positive | Airbus has 100 firm orders for the A350 and 89... | Airbus has 100 firm orders for the A350 and 89... | positive | [Airbus has 100 firm orders for the A350 and 8... | [0.038183026015758514, 0.03844863548874855, -0... |
| 17 | 610 | 0.973718 | positive | In addition , Cramo and Peab have signed exclu... | In addition , Cramo and Peab have signed exclu... | positive | [In addition , Cramo and Peab have signed excl... | [0.0708921030163765, -0.003263903083279729, -0... |
| 18 | 107 | 0.974965 | positive | In the fourth quarter of 2008 , net sales incr... | In the fourth quarter of 2008 , net sales incr... | positive | [In the fourth quarter of 2008 , net sales inc... | [0.05488097295165062, 0.057823579758405685, -0... |
| 19 | 1601 | 0.981420 | positive | Tekla will implement the renewal in software v... | Tekla will implement the renewal in software v... | positive | [Tekla will implement the renewal in software ... | [0.007560136262327433, -0.07031573355197906, -... |
| 20 | 4460 | 0.977390 | positive | Technical indicators for the stock are bullish... | Technical indicators for the stock are bullish... | positive | [Technical indicators for the stock are bullis... | [0.02970636636018753, -0.03946929797530174, -0... |
| 21 | 4842 | 0.978522 | negative | Operating profit fell to EUR 35.4 mn from EUR ... | Operating profit fell to EUR 35.4 mn from EUR ... | positive | [Operating profit fell to EUR 35.4 mn from EUR... | [0.04256516322493553, 0.04907069727778435, -0.... |
| 22 | 1863 | 0.988105 | positive | The Basware Connectivity services allow compan... | The Basware Connectivity services allow compan... | positive | [The Basware Connectivity services allow compa... | [0.029763543978333473, 0.010379478335380554, 0... |
| 23 | 648 | 0.982175 | positive | In the first nine months of 2010 , the company... | In the first nine months of 2010 , the company... | positive | [In the first nine months of 2010 , the compan... | [0.01850833371281624, 0.04868901148438454, -0.... |
| 24 | 3994 | 0.952864 | negative | Return on investment ROI was 4.1 % compared to... | Return on investment ROI was 4.1 % compared to... | positive | [Return on investment ROI was 4.1 % compared t... | [0.0815257877111435, 0.05799625441431999, -0.0... |
| 25 | 4643 | 0.965377 | negative | Revenue for the quarter totaled 27.4 billion ,... | Revenue for the quarter totaled 27.4 billion ,... | positive | [Revenue for the quarter totaled 27.4 billion ... | [0.02374003455042839, 0.031523674726486206, -0... |
| 26 | 45 | 0.989241 | positive | The agreement strengthens our long-term partne... | The agreement strengthens our long-term partne... | positive | [The agreement strengthens our long-term partn... | [0.06433788686990738, 0.027824176475405693, -0... |
| 27 | 276 | 0.976970 | positive | Ruukki has signed a contract to deliver and in... | Ruukki has signed a contract to deliver and in... | positive | [Ruukki has signed a contract to deliver and i... | [0.07258088141679764, 0.018949447199702263, -0... |
| 28 | 4359 | 0.957311 | negative | The total number of filling stations has been ... | The total number of filling stations has been ... | positive | [The total number of filling stations has been... | [-0.08472520112991333, 0.025925278663635254, -... |
| 29 | 131 | 0.961769 | positive | The major breweries increased their domestic b... | The major breweries increased their domestic b... | positive | [The major breweries increased their domestic ... | [-0.006784140598028898, -0.0337284579873085, -... |
| 30 | 302 | 0.983410 | positive | The transactions would increase earnings per s... | The transactions would increase earnings per s... | positive | [The transactions would increase earnings per ... | [0.03781255707144737, -0.05154890567064285, -0... |
| 31 | 420 | 0.951963 | negative | Compared with the FTSE 100 index , which rose ... | Compared with the FTSE 100 index , which rose ... | positive | [Compared with the FTSE 100 index , which rose... | [0.033254701644182205, 0.03514326363801956, -0... |
| 32 | 4410 | 0.957335 | negative | Nordic banks have already had to write off siz... | Nordic banks have already had to write off siz... | positive | [Nordic banks have already had to write off si... | [0.059580087661743164, 0.029068391770124435, -... |
| 33 | 1213 | 0.955761 | positive | Employees are also better prepared to answer c... | Employees are also better prepared to answer c... | positive | [Employees are also better prepared to answer ... | [-0.06203755363821983, 0.001560033531859517, -... |
| 34 | 594 | 0.975450 | positive | The company recorded revenues of E658 .1 milli... | The company recorded revenues of E658 .1 milli... | positive | [The company recorded revenues of E658 .1 mill... | [-0.00406982097774744, 0.07419776916503906, -0... |
| 35 | 781 | 0.981393 | positive | Thanksto improvements in demand and the adjust... | Thanksto improvements in demand and the adjust... | positive | [Thanksto improvements in demand and the adjus... | [0.06293173134326935, 0.018011141568422318, -0... |
| 36 | 646 | 0.971967 | positive | For the first nine months of 2010 , Talvivaara... | For the first nine months of 2010 , Talvivaara... | positive | [For the first nine months of 2010 , Talvivaar... | [0.05364197865128517, 0.025862835347652435, -0... |
| 37 | 86 | 0.985699 | positive | The acquisition will considerably increase Kem... | The acquisition will considerably increase Kem... | positive | [The acquisition will considerably increase Ke... | [0.07686954736709595, 0.008272827602922916, -0... |
| 38 | 408 | 0.983062 | positive | To our members and partners , the use of IT wi... | To our members and partners , the use of IT wi... | positive | [To our members and partners , the use of IT w... | [0.03138357028365135, -0.008651087991893291, -... |
| 39 | 4736 | 0.933318 | negative | In food trade , sales amounted to EUR320 .1 m ... | In food trade , sales amounted to EUR320 .1 m ... | positive | [In food trade , sales amounted to EUR320 .1 m... | [0.06947962194681168, 0.015812069177627563, -0... |
| 40 | 17 | 0.988272 | positive | Incap Contract Manufacturing Services Pvt Ltd ... | Incap Contract Manufacturing Services Pvt Ltd ... | positive | [Incap Contract Manufacturing Services Pvt Ltd... | [0.05365738272666931, -0.055247869342565536, -... |
| 41 | 741 | 0.980143 | positive | ` Our strategic cooperation with Rentakran bri... | ` Our strategic cooperation with Rentakran bri... | positive | [` Our strategic cooperation with Rentakran br... | [0.020993076264858246, 0.004286216571927071, -... |
| 42 | 194 | 0.988024 | positive | Finnish messaging solutions developer Tecnomen... | Finnish messaging solutions developer Tecnomen... | positive | [Finnish messaging solutions developer Tecnome... | [-0.00443086214363575, -0.01589011400938034, -... |
| 43 | 184 | 0.954769 | positive | EBIT margin was up from 1.4 % to 5.1 % . | EBIT margin was up from 1.4 % to 5.1 % . | positive | [EBIT margin was up from 1.4 % to 5.1 % .] | [0.04300238937139511, -0.08182621747255325, -0... |
| 44 | 662 | 0.981895 | positive | According to Sepp+Ænen , the new technology UM... | According to Sepp+Ænen , the new technology UM... | positive | [According to Sepp+Ænen , the new technology U... | [0.060580458492040634, 0.02758965454995632, -0... |
| 45 | 4657 | 0.985490 | negative | Finnish Scanfil , a systems supplier and contr... | Finnish Scanfil , a systems supplier and contr... | positive | [Finnish Scanfil , a systems supplier and cont... | [0.056254126131534576, 0.05756135284900665, -0... |
| 46 | 4397 | 0.970407 | negative | Frost sold shares for $ 19 million at $ 6.06-7... | Frost sold shares for $ 19 million at $ 6.06-7... | positive | [Frost sold shares for $ 19 million at $ 6.06-... | [0.07948038727045059, 0.01797480694949627, -0.... |
| 47 | 4283 | 0.876012 | negative | This is bad news for the barbeque season . | This is bad news for the barbeque season . | positive | [This is bad news for the barbeque season .] | [0.033248767256736755, -0.0037416787818074226,... |
| 48 | 44 | 0.981696 | positive | Sales increased due to growing market rates an... | Sales increased due to growing market rates an... | positive | [Sales increased due to growing market rates a... | [0.047733016312122345, 0.010620158165693283, 0... |
| 49 | 4036 | 0.971800 | negative | Operating loss totalled EUR 3.2 mn , compared ... | Operating loss totalled EUR 3.2 mn , compared ... | positive | [Operating loss totalled EUR 3.2 mn , compared... | [0.056506961584091187, 0.03850569948554039, -0... |
\n","
"],"text/plain":[" origin_index ... sentence_embedding_use\n","0 951 ... [0.015306072309613228, -0.002737046219408512, ...\n","1 829 ... [0.04115989804267883, -0.013678845949470997, -...\n","2 760 ... [0.050158627331256866, 0.011978656984865665, 0...\n","3 770 ... [0.06134718284010887, 0.020957626402378082, -0...\n","4 811 ... [0.030753908678889275, -0.009971680119633675, ...\n","5 4442 ... [0.05352102592587471, 0.06253479421138763, -0....\n","6 25 ... [0.0497022308409214, 0.023793146014213562, -0....\n","7 378 ... [0.03325969725847244, -0.004926654975861311, -...\n","8 4096 ... [0.07471615076065063, -0.04718644917011261, -0...\n","9 169 ... [0.03235051780939102, 0.03737620636820793, -0....\n","10 145 ... [0.05688846856355667, -0.01949388161301613, -0...\n","11 941 ... [0.041768044233322144, -0.019827384501695633, ...\n","12 700 ... [0.07800783962011337, 0.009446296840906143, -0...\n","13 4417 ... [0.038493648171424866, 0.03324933722615242, -0...\n","14 4032 ... [0.03237569332122803, 0.05145535245537758, -0....\n","15 370 ... [0.011570269241929054, 0.03636825457215309, -0...\n","16 2003 ... [0.038183026015758514, 0.03844863548874855, -0...\n","17 610 ... [0.0708921030163765, -0.003263903083279729, -0...\n","18 107 ... [0.05488097295165062, 0.057823579758405685, -0...\n","19 1601 ... [0.007560136262327433, -0.07031573355197906, -...\n","20 4460 ... [0.02970636636018753, -0.03946929797530174, -0...\n","21 4842 ... [0.04256516322493553, 0.04907069727778435, -0....\n","22 1863 ... [0.029763543978333473, 0.010379478335380554, 0...\n","23 648 ... [0.01850833371281624, 0.04868901148438454, -0....\n","24 3994 ... [0.0815257877111435, 0.05799625441431999, -0.0...\n","25 4643 ... [0.02374003455042839, 0.031523674726486206, -0...\n","26 45 ... [0.06433788686990738, 0.027824176475405693, -0...\n","27 276 ... [0.07258088141679764, 0.018949447199702263, -0...\n","28 4359 ... [-0.08472520112991333, 0.025925278663635254, -...\n","29 131 ... 
[-0.006784140598028898, -0.0337284579873085, -...\n","30 302 ... [0.03781255707144737, -0.05154890567064285, -0...\n","31 420 ... [0.033254701644182205, 0.03514326363801956, -0...\n","32 4410 ... [0.059580087661743164, 0.029068391770124435, -...\n","33 1213 ... [-0.06203755363821983, 0.001560033531859517, -...\n","34 594 ... [-0.00406982097774744, 0.07419776916503906, -0...\n","35 781 ... [0.06293173134326935, 0.018011141568422318, -0...\n","36 646 ... [0.05364197865128517, 0.025862835347652435, -0...\n","37 86 ... [0.07686954736709595, 0.008272827602922916, -0...\n","38 408 ... [0.03138357028365135, -0.008651087991893291, -...\n","39 4736 ... [0.06947962194681168, 0.015812069177627563, -0...\n","40 17 ... [0.05365738272666931, -0.055247869342565536, -...\n","41 741 ... [0.020993076264858246, 0.004286216571927071, -...\n","42 194 ... [-0.00443086214363575, -0.01589011400938034, -...\n","43 184 ... [0.04300238937139511, -0.08182621747255325, -0...\n","44 662 ... [0.060580458492040634, 0.02758965454995632, -0...\n","45 4657 ... [0.056254126131534576, 0.05756135284900665, -0...\n","46 4397 ... [0.07948038727045059, 0.01797480694949627, -0....\n","47 4283 ... [0.033248767256736755, -0.0037416787818074226,...\n","48 44 ... [0.047733016312122345, 0.010620158165693283, 0...\n","49 4036 ... [0.056506961584091187, 0.03850569948554039, -0...\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# 4. 
Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":97},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620191587668,"user_tz":-120,"elapsed":242291,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"06353129-c349-4f77-dfcf-efda0464c20e"},"source":["fitted_pipe.predict('According to the most recent update there has been a major decrese in the rate of oil')"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
|  | origin_index | trained_sentiment_confidence | document | trained_sentiment | sentence | sentence_embedding_use |
| 0 | 0 | 0.94103 | According to the most recent update there has ... | positive | [According to the most recent update there has... | [0.009911456145346165, 0.04162858799099922, -0... |
\n","
"],"text/plain":[" origin_index ... sentence_embedding_use\n","0 0 ... [0.009911456145346165, 0.04162858799099922, -0...\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## 5. Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620191587670,"user_tz":-120,"elapsed":242288,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"75f6cff2-4c64-4eb3-a5a3-cf4d142940c8"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3933547a) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3933547a\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## 6. Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":759},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620191593284,"user_tz":-120,"elapsed":247897,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"eea9eda8-11d0-4568-c277-46105770062f"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:100])\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:100],output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","# the predicted label is in the 'trained_sentiment' column (see the output table)\n","print(classification_report(preds['y'], preds['trained_sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 31\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.86 0.99 0.92 69\n","\n"," accuracy 0.68 100\n"," macro avg 0.29 0.33 0.31 100\n","weighted avg 0.59 0.68 0.63 100\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
|  | origin_index | trained_sentiment_confidence | y | text | document | trained_sentiment | sentence | sentence_embedding_use |
| 0 | 951 | 0.999660 | positive | Finnish software developer Basware Oyj said on... | Finnish software developer Basware Oyj said on... | positive | [Finnish software developer Basware Oyj said o... | [0.015306072309613228, -0.002737046219408512, ... |
| 1 | 829 | 0.999917 | positive | Comptel , a vendor of dynamic Operations Suppo... | Comptel , a vendor of dynamic Operations Suppo... | positive | [Comptel , a vendor of dynamic Operations Supp... | [0.04115989804267883, -0.013678845949470997, -... |
| 2 | 760 | 0.997053 | positive | Espoon kaupunki awarded contracts for personal... | Espoon kaupunki awarded contracts for personal... | positive | [Espoon kaupunki awarded contracts for persona... | [0.050158627331256866, 0.011978656984865665, 0... |
| 3 | 770 | 0.958511 | positive | Net cash flow from operations is expected to r... | Net cash flow from operations is expected to r... | positive | [Net cash flow from operations is expected to ... | [0.06134718284010887, 0.020957626402378082, -0... |
| 4 | 811 | 0.978840 | positive | The shopping center to be opened in St. Peters... | The shopping center to be opened in St. Peters... | positive | [The shopping center to be opened in St. Peter... | [0.030753908678889275, -0.009971680119633675, ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 689 | 0.897502 | positive | Finnish Sampo Bank , of Danish Danske Bank gro... | Finnish Sampo Bank , of Danish Danske Bank gro... | positive | [Finnish Sampo Bank , of Danish Danske Bank gr... | [0.06391362845897675, 0.028690673410892487, -0... |
| 96 | 1998 | 0.979267 | positive | Talentum expects that the net sales of its cor... | Talentum expects that the net sales of its cor... | positive | [Talentum expects that the net sales of its co... | [0.07739534974098206, 0.037561606615781784, -0... |
| 97 | 4458 | 0.997499 | positive | efficiency improvement measures 20 January 201... | efficiency improvement measures 20 January 201... | positive | [efficiency improvement measures 20 January 20... | [0.056025125086307526, 0.054798685014247894, -... |
| 98 | 4415 | 0.518980 | negative | Operating profit , excluding non-recurring ite... | Operating profit , excluding non-recurring ite... | neutral | [Operating profit , excluding non-recurring it... | [0.031145866960287094, 0.051556773483753204, -... |
| 99 | 4752 | 0.516453 | negative | Operating profit totaled EUR 9.4 mn , down fro... | Operating profit totaled EUR 9.4 mn , down fro... | neutral | [Operating profit totaled EUR 9.4 mn , down fr... | [0.04533316195011139, 0.06103566288948059, -0.... |
\n","

100 rows × 8 columns

\n","
"],"text/plain":[" origin_index ... sentence_embedding_use\n","0 951 ... [0.015306072309613228, -0.002737046219408512, ...\n","1 829 ... [0.04115989804267883, -0.013678845949470997, -...\n","2 760 ... [0.050158627331256866, 0.011978656984865665, 0...\n","3 770 ... [0.06134718284010887, 0.020957626402378082, -0...\n","4 811 ... [0.030753908678889275, -0.009971680119633675, ...\n",".. ... ... ...\n","95 689 ... [0.06391362845897675, 0.028690673410892487, -0...\n","96 1998 ... [0.07739534974098206, 0.037561606615781784, -0...\n","97 4458 ... [0.056025125086307526, 0.054798685014247894, -...\n","98 4415 ... [0.031145866960287094, 0.051556773483753204, -...\n","99 4752 ... [0.04533316195011139, 0.06103566288948059, -0....\n","\n","[100 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["#7. Try training with different Embeddings"]},{"cell_type":"code","metadata":{"id":"nxWFzQOhjWC8","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620191593285,"user_tz":-120,"elapsed":247893,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c9c63c03-471b-4c63-bf5f-4bf1107c521b"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possibler sentence embedding we could use. 
Lets use bert!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model 
sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model 
sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"IKK_Ii_gjJfF","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620192376887,"user_tz":-120,"elapsed":1031491,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"3beccb4a-b2c7-437a-8725-753d99011c4e"},"source":["trainable_pipe = 
nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# We usually need to train longer and use a smaller LR for non-USE sentence embeddings\n","# We could tune the hyperparameters further with hyperparameter tuning methods like grid search\n","# Longer training also tends to improve accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(70) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.87 0.84 0.86 488\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.96 0.92 0.94 1085\n","\n"," accuracy 0.90 1573\n"," macro avg 0.61 0.59 0.60 1573\n","weighted avg 0.94 0.90 0.92 1573\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_1jxw3GnVGlI"},"source":["# 7.1 Evaluate on Test Data"]},{"cell_type":"code","metadata":{"id":"Fxx4yNkNVGFl","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620192438491,"user_tz":-120,"elapsed":1093091,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"2ca09f36-0316-41af-9eba-61d6aeaf1f54"},"source":["preds = 
fitted_pipe.predict(test_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.84 0.76 0.80 116\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.95 0.90 0.93 278\n","\n"," accuracy 0.86 394\n"," macro avg 0.60 0.55 0.57 394\n","weighted avg 0.92 0.86 0.89 394\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 8. Let's save the model"]},{"cell_type":"code","metadata":{"id":"eLex095goHwm","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620192662123,"user_tz":-120,"elapsed":1316719,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"a91b2529-2d79-400d-ce51-107c2afa2195"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 9. Let's load the model from HDD.\n","This makes offline NLU usage possible! 
\n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"id":"SO4uz45MoRgp","colab":{"base_uri":"https://localhost:8080/","height":94},"executionInfo":{"status":"ok","timestamp":1620192674556,"user_tz":-120,"elapsed":1329148,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"864c1c25-cebd-4f7b-c5cd-06ff11db537b"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('According to the most recent update there has been a major decrese in the rate of oil')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
origin_indexsentimentsentence_embedding_from_disktextsentiment_confidencedocumentsentence
08589934592[negative][[-0.021685948595404625, 0.13073018193244934, ...According to the most recent update there has ...[0.86657214]According to the most recent update there has ...[According to the most recent update there has...
\n","
"],"text/plain":[" origin_index ... sentence\n","0 8589934592 ... [According to the most recent update there has...\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":12}]},{"cell_type":"code","metadata":{"id":"e0CVlkk9v6Qi","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620192674557,"user_tz":-120,"elapsed":1329144,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"53e0a6b8-7742-4428-d203-cfa5c1c499b0"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@dcd6682) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@dcd6682\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_finanical_news.ipynb)\n", + "\n", + "\n", + "\n", + "# Training a Sentiment Analysis Classifier with NLU\n", + "## 2 class Finance News sentiment classifier training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem\n", + "\n", + "This notebook showcases the following features :\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (Enables NLU offline mode)\n", + "\n", + "You can achieve these results or even better on this dataset with training data:\n", + "\n", + "
\n", + "\n", + "![image.png]()\n", + "\n", + "\n", + "\n", + "\n", + "You can achieve these results or even better on this dataset with test data:\n", + "\n", + "\n", + "
\n", + "\n", + "\n", + "![image.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Install Java 8 and NLU" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "!pip install -q johnsnowlabs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download Financial News Sentiment dataset\n", + "https://www.kaggle.com/ankurzing/sentiment-analysis-for-financial-news\n", + "\n", + "This dataset contains the sentiments for financial news headlines from the perspective of a retail investor. Further details about the dataset can be found in: Malo, P., Sinha, A., Takala, P., Korhonen, P. and Wallenius, J. (2014): “Good debt or bad debt: Detecting semantic orientations in economic texts.” Journal of the American Society for Information Science and Technology." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "source": [ + "! 
wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/financial_news/all-data.csv\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "4b2c74bd-7563-44ca-9201-d1e76f4899e3" + }, + "source": [ + "import pandas as pd\n", + "train_path = '/content/all-data.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text'\n", + "# the label column must be named 'y' and be of type str\n", + "columns=['text','y']\n", + "train_df = train_df[columns]\n", + "train_df = train_df[~train_df[\"y\"].isin([\"neutral\"])]\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_df, test_df = train_test_split(train_df, test_size=0.2)\n", + "train_df" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " text y\n", + "4631 The OMX Nordic 40 OMXN40 index , comprising th... negative\n", + "4512 Dubai Nokia has announced the launch of `` Com... positive\n", + "2860 credit 20 November 2009 - Finnish glass techno... positive\n", + "1780 HELSINKI AFX - Outokumpu said its technology u... positive\n", + "845 Helsingin Uutiset , Vantaan Sanomat and Lansiv... positive\n", + "... ... ...\n", + "186 EPS for the quarter was EUR0 .00 , as compared... positive\n", + "47 The company also estimates the already carried... positive\n", + "1492 A realignment of interests in the sector is cl... positive\n", + "2094 European traffic grew nearly 30 % . positive\n", + "4452 The period 's sales dropped to EUR30 .6 m from... negative\n", + "\n", + "[1573 rows x 2 columns]" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
texty
4631The OMX Nordic 40 OMXN40 index , comprising th...negative
4512Dubai Nokia has announced the launch of `` Com...positive
2860credit 20 November 2009 - Finnish glass techno...positive
1780HELSINKI AFX - Outokumpu said its technology u...positive
845Helsingin Uutiset , Vantaan Sanomat and Lansiv...positive
.........
186EPS for the quarter was EUR0 .00 , as compared...positive
47The company also estimates the already carried...positive
1492A realignment of interests in the sector is cl...positive
2094European traffic grew nearly 30 % .positive
4452The period 's sales dropped to EUR30 .6 m from...negative
\n", + "

1573 rows × 2 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. Train Deep Learning Classifier using nlp.load('train.sentiment')\n", + "\n", + "Your dataset label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "cd7d4cb2-deba-4161-c04f-c93016fee93c" + }, + "source": [ + "from johnsnowlabs import nlp\n", + "from sklearn.metrics import classification_report\n", + "\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "#sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 17\n", + " positive 0.66 1.00 0.80 33\n", + "\n", + " accuracy 0.66 50\n", + " macro avg 0.33 0.50 0.40 50\n", + "weighted avg 0.44 0.66 0.52 50\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 The OMX Nordic 40 OMXN40 index , comprising th... \n", + "1 Dubai Nokia has announced the launch of `` Com... \n", + "2 credit 20 November 2009 - Finnish glass techno... \n", + "3 HELSINKI AFX - Outokumpu said its technology u... \n", + "4 Helsingin Uutiset , Vantaan Sanomat and Lansiv... \n", + "5 Operating profit totaled EUR 5.5 mn , up from ... \n", + "6 Excluding non-recurring items , pre-tax profit... \n", + "7 1 February 2011 - Finnish textile and clothing... \n", + "8 `` There 's the issue of thieves stealing them... \n", + "9 Estonian telecoms company Elisa 's customer nu... \n", + "10 Net sales for the financial year 2006 are expe... \n", + "11 55 workers in +_m+_l will be affected by the c... \n", + "12 The Helsinki-based company , which also owns t... \n", + "13 YIT lodged counter claims against Neste Oil to... \n", + "14 Kaido Kaare , general director for Atria Eesti... \n", + "15 Mobile phone shipments jumped 26 percent to al... \n", + "16 `` After this purchase , Cramo will become the... \n", + "17 The court found TelecomInvest 's arguments con... \n", + "18 Operating profit for 2009 lower than outlook p... \n", + "19 Vanhanen said the strike would be `` extremely... 
\n", + "20 Construction volumes meanwhile grow at a rate ... \n", + "21 Operating profit for the six-month period decr... \n", + "22 Raisio 's malting capacity was in full use in ... \n", + "23 Both operating profit and turnover for the six... \n", + "24 Our superior customer centricity and expertise... \n", + "25 Commission income rose by 25.7 % to EUR 16.1 m... \n", + "26 The personnel reduction will be carried out in... \n", + "27 The result will also be burdened by increased ... \n", + "28 Employees are also better prepared to answer c... \n", + "29 It generated an operating loss of EUR 96.3 mn ... \n", + "30 In addition , the company will reduce a maximu... \n", + "31 Forestries were also higher , driven by yester... \n", + "32 In the second quarter of 2010 , the group 's n... \n", + "33 It is a solid credit that has been compared to... \n", + "34 He said he has been losing five families a mon... \n", + "35 In Finland , media group Talentum will start p... \n", + "36 Finnish pharmaceuticals company Orion 's net s... \n", + "37 `` The margarine business has been put into go... \n", + "38 Operating profit excluding non-recurring items... \n", + "39 MD Henning Bahr of Stockmann Gruppen praises t... \n", + "40 Based on the first quarter result , existing o... \n", + "41 Both operating profit and net sales for the 12... \n", + "42 In Sweden , operating profit for the period un... \n", + "43 To our members and partners , the use of IT wi... \n", + "44 Of the sales price , a sales gain of some 3.1 ... \n", + "45 It is a disappointment to see the plan folded . \n", + "46 Progress Group , QPR 's representative in Saud... \n", + "47 The new agreement , which expands a long-estab... \n", + "48 With CapMan as a partner , we will be able to ... \n", + "49 `` Overall , we 're pleased with the startup c... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.5881205201148987, 0.19063492119312286, -0.... 
positive \n", + "1 [-0.37779635190963745, -0.20781250298023224, -... positive \n", + "2 [-0.4517509937286377, 0.6735780239105225, -0.2... positive \n", + "3 [-0.10504205524921417, 0.31528422236442566, -0... positive \n", + "4 [-1.087294101715088, -0.1457926332950592, -0.1... positive \n", + "5 [-0.643149197101593, -0.04868743568658829, -0.... positive \n", + "6 [-0.7197213172912598, 0.31680017709732056, -0.... positive \n", + "7 [-0.31616416573524475, 0.5464960336685181, -0.... positive \n", + "8 [-1.2100399732589722, 0.25674229860305786, 0.3... positive \n", + "9 [-0.3009508550167084, 0.6154500842094421, -0.4... positive \n", + "10 [-0.7339814305305481, 0.9638439416885376, -0.6... positive \n", + "11 [-1.243334174156189, 0.8785222172737122, -0.93... positive \n", + "12 [-0.7551414370536804, 0.16214342415332794, -0.... positive \n", + "13 [-1.143906831741333, 1.0500915050506592, -0.84... positive \n", + "14 [-0.9822889566421509, 0.27106359601020813, -0.... positive \n", + "15 [-0.46120399236679077, 0.09638205915689468, -0... positive \n", + "16 [-0.7024871706962585, 0.37345775961875916, -0.... positive \n", + "17 [-0.8505982756614685, -0.07158830016851425, 0.... positive \n", + "18 [-0.6575804352760315, 0.68410724401474, -0.230... positive \n", + "19 [-0.7677791714668274, -0.44950076937675476, -0... positive \n", + "20 [-1.4539079666137695, 0.1873081475496292, -0.8... positive \n", + "21 [-1.0161586999893188, 0.548879861831665, -0.86... positive \n", + "22 [-1.2556798458099365, 0.37126457691192627, -1.... positive \n", + "23 [-0.8761054873466492, 0.2913503050804138, -0.8... positive \n", + "24 [-0.11308443546295166, 0.5551179647445679, -0.... positive \n", + "25 [-0.7847265601158142, 0.2198852002620697, -0.5... positive \n", + "26 [-0.8359718918800354, -0.18021883070468903, -0... positive \n", + "27 [-0.6881139278411865, 1.2389051914215088, -1.0... positive \n", + "28 [-1.3386954069137573, 0.24523791670799255, -0.... 
positive \n", + "29 [-0.7223557829856873, 0.23957425355911255, -0.... positive \n", + "30 [-0.9004239439964294, 1.6903105974197388, -0.9... positive \n", + "31 [-0.9628726840019226, 0.23581582307815552, 0.2... positive \n", + "32 [-0.9420221447944641, 0.34946343302726746, -0.... positive \n", + "33 [-0.0682043731212616, 0.5560981035232544, -0.8... positive \n", + "34 [-1.1294713020324707, 0.507931113243103, -0.73... positive \n", + "35 [-0.6348045468330383, 0.3473186492919922, -0.2... positive \n", + "36 [-0.8268173933029175, 0.19876690208911896, -0.... positive \n", + "37 [-0.6286064386367798, 0.48110127449035645, -0.... positive \n", + "38 [-0.7780808210372925, 0.021108699962496758, -0... positive \n", + "39 [-0.39700981974601746, 0.9596268534660339, -0.... positive \n", + "40 [-0.7781174182891846, 0.9358287453651428, -1.0... positive \n", + "41 [-0.7753643989562988, 0.8645752668380737, -0.6... positive \n", + "42 [-0.8275543451309204, 0.10950104892253876, -0.... positive \n", + "43 [-0.798611581325531, 0.7145741581916809, -0.40... positive \n", + "44 [-0.8446655869483948, 0.8184226751327515, -0.3... positive \n", + "45 [-1.0249158143997192, 0.8160240054130554, 0.03... positive \n", + "46 [-0.17222441732883453, 0.3021131455898285, -0.... positive \n", + "47 [-0.5851113796234131, 0.6877164244651794, -0.6... positive \n", + "48 [-0.2812011241912842, 1.1173006296157837, -0.3... positive \n", + "49 [-0.7001730799674988, 0.15320643782615662, 0.3... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 7.0 The OMX Nordic 40 OMXN40 index , comprising th... \n", + "1 2.0 Dubai Nokia has announced the launch of `` Com... \n", + "2 5.0 credit 20 November 2009 - Finnish glass techno... \n", + "3 1.0 HELSINKI AFX - Outokumpu said its technology u... \n", + "4 2.0 Helsingin Uutiset , Vantaan Sanomat and Lansiv... \n", + "5 1.0 Operating profit totaled EUR 5.5 mn , up from ... \n", + "6 5.0 Excluding non-recurring items , pre-tax profit... 
\n", + "7 8.0 1 February 2011 - Finnish textile and clothing... \n", + "8 6.0 `` There 's the issue of thieves stealing them... \n", + "9 6.0 Estonian telecoms company Elisa 's customer nu... \n", + "10 3.0 Net sales for the financial year 2006 are expe... \n", + "11 3.0 55 workers in +_m+_l will be affected by the c... \n", + "12 1.0 The Helsinki-based company , which also owns t... \n", + "13 1.0 YIT lodged counter claims against Neste Oil to... \n", + "14 4.0 Kaido Kaare , general director for Atria Eesti... \n", + "15 2.0 Mobile phone shipments jumped 26 percent to al... \n", + "16 3.0 `` After this purchase , Cramo will become the... \n", + "17 2.0 The court found TelecomInvest 's arguments con... \n", + "18 7.0 Operating profit for 2009 lower than outlook p... \n", + "19 4.0 Vanhanen said the strike would be `` extremely... \n", + "20 8.0 Construction volumes meanwhile grow at a rate ... \n", + "21 2.0 Operating profit for the six-month period decr... \n", + "22 9.0 Raisio 's malting capacity was in full use in ... \n", + "23 5.0 Both operating profit and turnover for the six... \n", + "24 3.0 Our superior customer centricity and expertise... \n", + "25 6.0 Commission income rose by 25.7 % to EUR 16.1 m... \n", + "26 8.0 The personnel reduction will be carried out in... \n", + "27 9.0 The result will also be burdened by increased ... \n", + "28 2.0 Employees are also better prepared to answer c... \n", + "29 2.0 It generated an operating loss of EUR 96.3 mn ... \n", + "30 2.0 In addition , the company will reduce a maximu... \n", + "31 4.0 Forestries were also higher , driven by yester... \n", + "32 3.0 In the second quarter of 2010 , the group 's n... \n", + "33 2.0 It is a solid credit that has been compared to... \n", + "34 9.0 He said he has been losing five families a mon... \n", + "35 3.0 In Finland , media group Talentum will start p... \n", + "36 1.0 Finnish pharmaceuticals company Orion 's net s... 
\n", + "37 2.0 `` The margarine business has been put into go... \n", + "38 2.0 Operating profit excluding non-recurring items... \n", + "39 1.0 MD Henning Bahr of Stockmann Gruppen praises t... \n", + "40 9.0 Based on the first quarter result , existing o... \n", + "41 3.0 Both operating profit and net sales for the 12... \n", + "42 3.0 In Sweden , operating profit for the period un... \n", + "43 3.0 To our members and partners , the use of IT wi... \n", + "44 1.0 Of the sales price , a sales gain of some 3.1 ... \n", + "45 1.0 It is a disappointment to see the plan folded . \n", + "46 9.0 Progress Group , QPR 's representative in Saud... \n", + "47 9.0 The new agreement , which expands a long-estab... \n", + "48 4.0 With CapMan as a partner , we will be able to ... \n", + "49 2.0 `` Overall , we 're pleased with the startup c... \n", + "\n", + " y \n", + "0 negative \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 positive \n", + "5 positive \n", + "6 positive \n", + "7 positive \n", + "8 negative \n", + "9 positive \n", + "10 negative \n", + "11 negative \n", + "12 positive \n", + "13 negative \n", + "14 positive \n", + "15 positive \n", + "16 positive \n", + "17 positive \n", + "18 negative \n", + "19 negative \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 positive \n", + "24 positive \n", + "25 positive \n", + "26 negative \n", + "27 negative \n", + "28 positive \n", + "29 negative \n", + "30 negative \n", + "31 positive \n", + "32 positive \n", + "33 positive \n", + "34 negative \n", + "35 negative \n", + "36 positive \n", + "37 positive \n", + "38 negative \n", + "39 positive \n", + "40 negative \n", + "41 positive \n", + "42 positive \n", + "43 positive \n", + "44 positive \n", + "45 negative \n", + "46 positive \n", + "47 positive \n", + "48 positive \n", + "49 positive " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# 4. Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "c19ba3ab-6cd4-472e-c800-0bdfd0d1c3c5" + }, + "source": [ + "fitted_pipe.predict('According to the most recent update there has been a major decrease in the rate of oil')" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence \\\n", + "0 According to the most recent update there has ... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.9835950136184692, 0.07015661895275116, -0.... positive \n", + "\n", + " sentiment_confidence \n", + "0 1.0 " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 5 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## 5. Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "e5c0181c-13c7-4bd9-9270-4fbbd2cc1ea6" + }, + "source": [ + "trainable_pipe.print_info()" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## 6. 
Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 719 + }, + "id": "mptfvHx-MMMX", + "outputId": "71701494-c69a-4ea6-a803-677b18ebaffd" + }, + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:100])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:100],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 30\n", + " positive 0.70 1.00 0.82 70\n", + "\n", + " accuracy 0.70 100\n", + " macro avg 0.35 0.50 0.41 100\n", + "weighted avg 0.49 0.70 0.58 100\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 The OMX Nordic 40 OMXN40 index , comprising th... \n", + "1 Dubai Nokia has announced the launch of `` Com... \n", + "2 credit 20 November 2009 - Finnish glass techno... \n", + "3 HELSINKI AFX - Outokumpu said its technology u... \n", + "4 Helsingin Uutiset , Vantaan Sanomat and Lansiv... \n", + ".. ... \n", + "95 Ruukki Group calculates that it has lost EUR 4... \n", + "96 This is the second successful effort for the f... 
\n", + "97 For the first nine months of 2010 , Talvivaara... \n", + "98 Nordea sees a return to positive growth for th... \n", + "99 In the third quarter , net sales increased by ... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.5881205201148987, 0.19063492119312286, -0.... positive \n", + "1 [-0.37779635190963745, -0.20781250298023224, -... positive \n", + "2 [-0.4517509937286377, 0.6735780239105225, -0.2... positive \n", + "3 [-0.10504205524921417, 0.31528422236442566, -0... positive \n", + "4 [-1.087294101715088, -0.1457926332950592, -0.1... positive \n", + ".. ... ... \n", + "95 [-1.1240153312683105, 0.029101694002747536, -0... positive \n", + "96 [-1.0914063453674316, 0.636502742767334, -0.13... positive \n", + "97 [-1.4264403581619263, -0.3568509817123413, -0.... positive \n", + "98 [-0.5386144518852234, -0.5268656611442566, -0.... positive \n", + "99 [-0.8460788726806641, 0.5203403830528259, -0.8... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 1.0 The OMX Nordic 40 OMXN40 index , comprising th... \n", + "1 1.0 Dubai Nokia has announced the launch of `` Com... \n", + "2 1.0 credit 20 November 2009 - Finnish glass techno... \n", + "3 1.0 HELSINKI AFX - Outokumpu said its technology u... \n", + "4 1.0 Helsingin Uutiset , Vantaan Sanomat and Lansiv... \n", + ".. ... ... \n", + "95 1.0 Ruukki Group calculates that it has lost EUR 4... \n", + "96 1.0 This is the second successful effort for the f... \n", + "97 1.0 For the first nine months of 2010 , Talvivaara... \n", + "98 1.0 Nordea sees a return to positive growth for th... \n", + "99 1.0 In the third quarter , net sales increased by ... \n", + "\n", + " y \n", + "0 negative \n", + "1 positive \n", + "2 positive \n", + "3 positive \n", + "4 positive \n", + ".. ... \n", + "95 negative \n", + "96 positive \n", + "97 positive \n", + "98 positive \n", + "99 positive \n", + "\n", + "[100 rows x 6 columns]" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# 7. Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nxWFzQOhjWC8", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "fae0ba7b-3630-493e-f3ed-74e126ab96d2" + }, + "source": [ + "# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj 
sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP 
model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj 
sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj 
sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", 
+ "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + 
"nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IKK_Ii_gjJfF", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "7e528684-a7d5-431a-ca4b-626adf50a7e9" + }, + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n", + "# We need to train longer and user smaller LR for NON-USE based sentence embeddings usually\n", + "# We could tune the hyperparameters further with hyperparameter tuning methods like gridsearch\n", + "# Also longer training gives more accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(70)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "\n", + "#sentence detector that is part of the pipe generates sone NaNs. 
let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_768 download started this may take some time.\n", + "Approximate size to download 392.9 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.88 0.82 0.85 482\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.96 0.92 0.94 1091\n", + "\n", + " accuracy 0.89 1573\n", + " macro avg 0.61 0.58 0.60 1573\n", + "weighted avg 0.94 0.89 0.91 1573\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1jxw3GnVGlI" + }, + "source": [ + "# 7.1 Evaluate on Test Data" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Fxx4yNkNVGFl", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "fdb96659-2395-48af-dd53-673b935bd767" + }, + "source": [ + "preds = fitted_pipe.predict(test_df,output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.81 0.75 0.78 122\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.92 0.89 0.90 272\n", + "\n", + " accuracy 0.85 394\n", + " macro avg 0.58 0.55 0.56 394\n", + "weighted avg 0.88 0.85 0.86 394\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 8. 
Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": 11, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 9. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SO4uz45MoRgp", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "outputId": "a59aec27-0b97-463f-a8a3-52de96bed3ee" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('According to the most recent update there has been a major decrese in the rate of oil')\n", + "preds" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 According to the most recent update there has ... \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [-0.02168591320514679, 0.13073040544986725, 0.... negative \n", + "\n", + " sentiment_confidence \n", + "0 0.0 " + ], + "text/html": [ + "\n", + "
| | document | sentence_embedding_from_disk | sentiment | sentiment_confidence |
|---|---|---|---|---|
| 0 | According to the most recent update there has ... | [-0.02168591320514679, 0.13073040544986725, 0.... | negative | 0.0 |
\n" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e0CVlkk9v6Qi", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0f3fa874-643e-4ff5-e0f4-0060f585fbfa" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_natural_disasters.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_natural_disasters.ipynb index 3d45e611..8393bb2e 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_natural_disasters.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_natural_disasters.ipynb @@ -1 +1,3547 @@ 
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo_natural_disasters.ipynb","provenance":[],"collapsed_sections":["zkufh760uvF3"]},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_natural_disasters.ipynb)\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 Class Natural Disasters Sentiment Classifier Training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on any multi-class text classification problem \n","\n","This notebook showcases the following features: \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n","You can achieve these results or even better on this dataset with training data:\n","\n","\n","
\n","\n","![image.png]()\n","\n","You can achieve these results or even better on this dataset with test data:\n","\n","\n","
\n","\n","\n","![Screenshot 2021-02-25 142700.png]()"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620193265134,"user_tz":-120,"elapsed":142818,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"52bc1bd1-c902-4d24-b470-579f3f4ab6d2"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:38:43-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 05:38:43 (30.6 MB/s) - written to stdout [1671/1671]\n","\n","Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","\u001b[K |████████████████████████████████| 204.8MB 73kB/s \n","\u001b[K |████████████████████████████████| 153kB 45.7MB/s \n","\u001b[K |████████████████████████████████| 204kB 19.8MB/s \n","\u001b[K |████████████████████████████████| 204kB 41.4MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download Disaster Sentiment dataset \n","https://www.kaggle.com/vstepanenko/disaster-tweets\n","#Context\n","\n","The file contains over 11,000 tweets associated with disaster keywords like “crash”, “quarantine”, and “bush fires” as well as the location and keyword itself. The data structure was inherited from Disasters on social media\n","\n","The tweets were collected on Jan 14th, 2020.\n","\n","Some of the topics people were tweeting:\n","\n","The eruption of Taal Volcano in Batangas, Philippines\n","Coronavirus\n","Bushfires in Australia\n","Iran downing of the airplane flight PS752\n","Disclaimer: The dataset contains text that may be considered profane, vulgar, or offensive."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620193268178,"user_tz":-120,"elapsed":145852,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"d989f5b9-0f5c-4afc-94d4-04eb3f7613a4"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/02/tweets.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:41:04-- http://ckl-it.de/wp-content/uploads/2021/02/tweets.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 1207952 (1.2M) [text/csv]\n","Saving to: ‘tweets.csv’\n","\n","tweets.csv 100%[===================>] 1.15M 654KB/s in 1.8s \n","\n","2021-05-05 05:41:07 (654 KB/s) - ‘tweets.csv’ saved [1207952/1207952]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":406},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620193269115,"user_tz":-120,"elapsed":146783,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"5f4be6e9-1900-4776-ee93-9ad537c2daf0"},"source":["import pandas as pd\n","train_path = '/content/tweets.csv'\n","\n","train_df = pd.read_csv(train_path,sep=\",\", encoding='latin-1')\n","# the text data to use for classification should be in a column named 'text'\n","columns=['text','y']\n","train_df = train_df.dropna()\n","positive = train_df[train_df['y']==(\"positive\")].iloc[:1500]\n","negative = train_df[train_df['y']==(\"negative\")].iloc[:1500]\n","positive = positive.append(negative, ignore_index = True)\n","positive = positive.sample(frac=1).reset_index(drop=True)\n","train_df = positive\n","from sklearn.model_selection import train_test_split\n","\n","train_df, test_df = train_test_split(train_df, test_size=0.2)\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
| | Unnamed: 0 | id | keyword | location | text | target | y |
|---|---|---|---|---|---|---|---|
| 2310 | 2375.0 | 6350.0 | heat%20wave | Chandigarh, India | Bihar District Magistrate Issues Order to Clos... | 1.0 | positive |
| 2044 | 1607.0 | 3937.0 | destroyed | Scotland, but like at the top | YOU THERE, PACHIRISU PUNK, PREPARE TO BE DESTR... | 0.0 | negative |
| 2473 | 809.0 | 272.0 | annihilated | USA | A devastating and thorough refutation of and N... | 0.0 | negative |
| 1369 | 152.0 | 3817.0 | desolate | United States Air Force | bubby finally gonna help with desolate while i... | 0.0 | negative |
| 1880 | 305.0 | 4609.0 | earthquake | Chile | Temblor: mb 4.5 SOUTHEAST OF EASTER ISLAND: Ma... | 1.0 | positive |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1531 | 2759.0 | 110.0 | aftershock | NYC | The magnitude 5.9 quake in #PuertoRico this mo... | 1.0 | positive |
| 903 | 182.0 | 3256.0 | debris | Republic of the Philippines | Hello everyone. Tito Sotto is wrong. Cloud see... | 0.0 | negative |
| 2621 | 287.0 | 4008.0 | destruction | Recife, Brasil | Bear witness to his destruction. West Coast, #... | 0.0 | negative |
| 1235 | 5131.0 | 2863.0 | curfew | Vehari | KASHMIR STILL UNDER CURFEW 🔒🔗ðÂ... | 1.0 | positive |
| 2264 | 1300.0 | 8294.0 | quarantined | ÃT: 45.246915,-76.163963 | Chinese woman with mystery virus quarantined i... | 1.0 | positive |

2400 rows × 7 columns
"],"text/plain":[" Unnamed: 0 id ... target y\n","2310 2375.0 6350.0 ... 1.0 positive\n","2044 1607.0 3937.0 ... 0.0 negative\n","2473 809.0 272.0 ... 0.0 negative\n","1369 152.0 3817.0 ... 0.0 negative\n","1880 305.0 4609.0 ... 1.0 positive\n","... ... ... ... ... ...\n","1531 2759.0 110.0 ... 1.0 positive\n","903 182.0 3256.0 ... 0.0 negative\n","2621 287.0 4008.0 ... 0.0 negative\n","1235 5131.0 2863.0 ... 1.0 positive\n","2264 1300.0 8294.0 ... 1.0 positive\n","\n","[2400 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620195355299,"user_tz":-120,"elapsed":13046,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"72a7378b-68c0-42ae-f41c-44d717e82f46"},"source":["import nlu \n","from sklearn.metrics import classification_report\n","# load a trainable pipeline by specifying the train. prefix and fit it on a datset with label and text columns\n","# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","#sentence detector that is part of the pipe generates sone NaNs. 
lets drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 1.00 0.20 0.33 25\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.86 0.96 0.91 25\n","\n"," accuracy 0.58 50\n"," macro avg 0.62 0.39 0.41 50\n","weighted avg 0.93 0.58 0.62 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
ytrained_sentimenttextsentenceidtargetdocumenttrained_sentiment_confidencekeywordorigin_indexsentence_embedding_useUnnamed: 0location
0positivepositiveBihar District Magistrate Issues Order to Clos...[Bihar District Magistrate Issues Order to Clo...6350.01.0Bihar District Magistrate Issues Order to Clos...0.714817heat%20wave2310[0.01817215047776699, -0.042255766689777374, -...2375.0Chandigarh, India
1negativenegativeYOU THERE, PACHIRISU PUNK, PREPARE TO BE DESTR...[YOU THERE, PACHIRISU PUNK, PREPARE TO BE DEST...3937.00.0YOU THERE, PACHIRISU PUNK, PREPARE TO BE DESTR...0.624722destroyed2044[0.01567729189991951, 0.00028937371098436415, ...1607.0Scotland, but like at the top
2negativeneutralA devastating and thorough refutation of and N...[A devastating and thorough refutation of and ...272.00.0A devastating and thorough refutation of and N...0.516126annihilated2473[0.030531354248523712, 0.018328234553337097, -...809.0USA
3negativenegativebubby finally gonna help with desolate while i...[bubby finally gonna help with desolate while ...3817.00.0bubby finally gonna help with desolate while i...0.635754desolate1369[-0.010298581793904305, -0.023482605814933777,...152.0United States Air Force
4positivepositiveTemblor: mb 4.5 SOUTHEAST OF EASTER ISLAND: Ma...[Temblor: mb 4.5 SOUTHEAST OF EASTER ISLAND:, ...4609.01.0Temblor: mb 4.5 SOUTHEAST OF EASTER ISLAND: Ma...0.688311earthquake1880[-0.07005153596401215, -0.042087431997060776, ...305.0Chile
5positivepositiveHuman body parts have been found in a bag on a...[Human body parts have been found in a bag on ...1333.01.0Human body parts have been found in a bag on a...0.672013body%20bag936[-0.01760876551270485, -0.07565091550350189, -...1765.0Dublin, Ireland
6negativeneutralJFC!!! Racist comment ! I guess all \"lazy\" may...[JFC!!!, Racist comment !, I guess all \"lazy\" ...1709.00.0JFC!!! Racist comment ! I guess all \"lazy\" may...0.577475buildings%20burning2032[0.023510590195655823, 0.002154689282178879, -...2141.0Southern OBX
7positivepositive14/17:27 EST Severe Thunderstorm Warning for p...[14/17:27 EST Severe Thunderstorm Warning for ...10191.01.014/17:27 EST Severe Thunderstorm Warning for p...0.727776thunderstorm647[-0.0552973710000515, 0.024097146466374397, -0...5108.0Queensland, Australia
8negativepositiveRoadworks are co-ordinated and approved by a g...[Roadworks are co-ordinated and approved by a ...4845.00.0Roadworks are co-ordinated and approved by a g...0.669093emergency%20services1081[-0.05318903550505638, 0.018101226538419724, -...2228.0Edinburgh
9positivepositiveIran Protests: 1,500 killed 5,000 Injured 7,00...[Iran Protests: 1,500 killed 5,000 Injured 7,0...6776.01.0Iran Protests: 1,500 killed 5,000 Injured 7,00...0.727720injured1759[0.03859921544790268, -0.010697060264647007, -...4651.0Ohio River Valley
10negativepositiveKwara State Governor, Mallam purchased Truck f...[Kwara State Governor, Mallam purchased Truck ...5632.00.0Kwara State Governor, Mallam purchased Truck f...0.653245fire%20truck1551[-0.013514711521565914, -0.004037567414343357,...2225.0Ilorin, Nigeria
11negativeneutralI’m wondering if framing these pictures t...[I’m wondering if framing these pictures ...5718.00.0I’m wondering if framing these pictures t...0.545631flattened237[0.03829433023929596, -0.08540338277816772, -0...2171.0ZellRepublik🌍
12negativeneutralTransparency makes everything better: individu...[Transparency makes everything better: individ...5503.00.0Transparency makes everything better: individu...0.597684fear593[0.0025042055640369654, 0.027974789962172508, ...1037.0North East England
13positiveneutralIf I didn't need my Crutch I would seriously w...[If I didn't need my Crutch I would seriously ...1280.01.0If I didn't need my Crutch I would seriously w...0.583970bloody914[0.043823741376399994, 0.057542573660612106, -...819.0Guildford
14negativepositiveImagine you get a phone call from the Israeli ...[Imagine you get a phone call from the Israeli...1552.00.0Imagine you get a phone call from the Israeli ...0.644661bombed596[-0.02263369970023632, -0.04711006581783295, -...2227.0Uganda, Nigeria
15negativepositive[Breaking] Nirbhaya Rape: Supreme Court dismis...[[Breaking] Nirbhaya Rape: Supreme Court dismi...3127.00.0[Breaking] Nirbhaya Rape: Supreme Court dismis...0.660057death2715[-0.014349669218063354, -0.07226337492465973, ...812.0India
16positivepositivethis storm is quite violent oh lord imagine ir...[this storm is quite violent oh lord imagine i...10739.01.0this storm is quite violent oh lord imagine ir...0.662823violent%20storm1010[-0.02750890888273716, 0.021164197474718094, -...5069.0ireland , 20
17negativeneutralYeah France is doing fantastic with all the mi...[Yeah France is doing fantastic with all the m...2347.00.0Yeah France is doing fantastic with all the mi...0.536393collapse2853[0.027280263602733612, 0.015566764399409294, -...399.0South Africa
18negativeneutralHe seems to be crumbling more rapidly now. The...[He seems to be crumbling more rapidly now., ...3350.00.0He seems to be crumbling more rapidly now. The...0.568037deluge1481[0.11189540475606918, -0.009536723606288433, 0...375.0Midwest Red State
19negativeneutral#coup 9/11 vs Cal Fires So is this an attack o...[#coup 9/11 vs Cal Fires So is this an attack ...1779.00.0#coup 9/11 vs Cal Fires So is this an attack o...0.583260buildings%20on%20fire429[-0.015312806703150272, 0.030779404565691948, ...2388.0OnFreedomRoad.com
20positivepositive6 months of videos of protesters smashing, bur...[6 months of videos of protesters smashing, bu...3599.01.06 months of videos of protesters smashing, bur...0.709840derail427[-0.02566157653927803, 0.009822700172662735, -...2900.0hk island, govt held, not K&NT
21positivepositiveMexican Border City on High Alert Over ‘S...[Mexican Border City on High Alert Over ‘...9696.01.0Mexican Border City on High Alert Over ‘S...0.687776suicide%20bomber2907[0.015122084878385067, -0.0711512491106987, -0...5733.0Colorado, USA
22negativeneutralAmazing light show or a battle for your health...[Amazing light show or a battle for your healt...3909.00.0Amazing light show or a battle for your health...0.530186destroy107[0.001234019990079105, -0.06394369900226593, 0...1805.0Brisbane, Queensland
23positivepositiveEmergency services called to a collision on th...[Emergency services called to a collision on t...4877.01.0Emergency services called to a collision on th...0.689470emergency%20services2844[-0.05413218215107918, -0.0021766768768429756,...2548.0United Kingdom
24positivepositiveHEADLINE: \"6.4 earthquake strikes Puerto Rico,...[HEADLINE: \"6.4 earthquake strikes Puerto Rico...9125.01.0HEADLINE: \"6.4 earthquake strikes Puerto Rico,...0.757030seismic1497[0.004592240322381258, -0.03824039548635483, -...6523.0Florida, USA
25negativenegative“You were like a breath of fresh air. It...[“You were like a breath of fresh air., I...4466.00.0“You were like a breath of fresh air. It...0.613734drowning1452[-0.03848513588309288, 0.07503059506416321, -0...2700.0Cairo
26negativeneutralThe arsonist turns up with a fire extinguisher...[The arsonist turns up with a fire extinguishe...510.00.0The arsonist turns up with a fire extinguisher...0.532903arsonist2801[-0.001973913051187992, -0.017796456813812256,...761.0NSW. Gumbaynggirr country.
27positivepositiveM7: all lanes have reopened inbound between J5...[M7:, all lanes have reopened inbound between ...2560.01.0M7: all lanes have reopened inbound between J5...0.689800collision1277[-0.04796949401497841, -0.07411075383424759, -...5460.0M50 Dublin
28positivepositiveHi Respected #Friends Hello #TweeterWorld #Nam...[Hi Respected #Friends Hello #TweeterWorld #Na...679.01.0Hi Respected #Friends Hello #TweeterWorld #Nam...0.642852avalanche2212[0.004341206979006529, 0.01691211760044098, 0....6874.0Another World♥️♦️♠...
29negativeneutrali wouldn’t even be surprised if they some...[i wouldn’t even be surprised if they som...2284.00.0i wouldn’t even be surprised if they some...0.580780cliff%20fall311[-0.0520143061876297, 0.0450257770717144, -0.0...674.0she/they ⚢
30positivepositive3.5 magnitude earthquake recorded in Bulgan ai...[3.5 magnitude earthquake recorded in Bulgan a...4638.01.03.5 magnitude earthquake recorded in Bulgan ai...0.725532earthquake2961[-0.01304821576923132, -0.023747781291604042, ...6976.0Jigjidjav St, Ulaanbaatar
31positivepositiveTrumps Presidency =2024 Oil spill in Alaska ha...[Trumps Presidency =2024 Oil spill in Alaska h...7931.01.0Trumps Presidency =2024 Oil spill in Alaska ha...0.603689oil%20spill2644[-0.01266513578593731, 0.07648883014917374, -0...1905.0Kelowna, British Columbia
32positivepositiveA fire outbreak has razed down 39 shops, ravag...[A fire outbreak has razed down 39 shops, rava...8394.01.0A fire outbreak has razed down 39 shops, ravag...0.679642razed1982[0.061840321868658066, 0.020564774051308632, -...3117.0Abuja, Nigeria
33negativenegativeWhen technical brains speak to a raging debate...[When technical brains speak to a raging debat...221.00.0When technical brains speak to a raging debate...0.609269ambulance2063[0.02085897885262966, 0.03478141874074936, 0.0...1296.0Ghana
34positivepositiveA wildlife rehab center in South Carolina says...[A wildlife rehab center in South Carolina say...8617.01.0A wildlife rehab center in South Carolina says...0.654906rescuers221[0.05315405875444412, -0.013145094737410545, -...442.0California, USA
35negativeneutralhey you former russian spy electrocute me[hey you former russian spy electrocute me]4657.00.0hey you former russian spy electrocute me0.573599electrocute156[-0.019302884116768837, -0.0029615701641887426...504.0𝚙𝚋&𝚓 𝚑𝚘...
36negativeneutralJKP cop Devinder Singh's sordid saga vindicate...[JKP cop Devinder Singh's sordid saga vindicat...3018.00.0JKP cop Devinder Singh's sordid saga vindicate...0.588211danger1676[-0.020209699869155884, -0.05051594600081444, ...1889.0DELHI, KASHMIR
37negativeneutralTime will tell when I fall off a cliff or high...[Time will tell when I fall off a cliff or hig...2277.00.0Time will tell when I fall off a cliff or high...0.521615cliff%20fall505[0.008356968872249126, 0.07969766110181808, 0....1335.0Sydney, Australia
38positivepositiveJUST IN: An earthquake was felt in parts of CA...[JUST IN: An earthquake was felt in parts of C...4622.01.0JUST IN: An earthquake was felt in parts of CA...0.761350earthquake2858[0.03384742885828018, -0.022578943520784378, -...6461.0Pasig City
39negativenegativeThis tweet blew up and this is where people pr...[This tweet blew up and this is where people p...1029.00.0This tweet blew up and this is where people pr...0.602517blew%20up1090[0.056242309510707855, -0.023219745606184006, ...1686.0ENFP / 4w3
40negativeneutralWe thrown him as debris should be out of the h...[We thrown him as debris should be out of the ...3277.00.0We thrown him as debris should be out of the h...0.566319debris1729[0.07086315006017685, -0.04039449617266655, -0...169.0Lahore, Pakistan
41negativeneutral“Key compoNts I appreciate as a transport...[“Key compoNts I appreciate as a transpor...5424.00.0“Key compoNts I appreciate as a transport...0.505895fatalities329[0.06842903792858124, -0.02750425413250923, -0...2833.0California, USA
42positivepositive#childsafety | Local flu death of boy is one o...[#childsafety | Local flu death of boy is one ...3193.01.0#childsafety | Local flu death of boy is one o...0.702334deaths2469[-0.023012271150946617, 0.03902702406048775, -...2076.0Los Angeles, Ca
43positivepositiveA fire outbreak has razed down shops, ravaging...[A fire outbreak has razed down shops, ravagin...8413.01.0A fire outbreak has razed down shops, ravaging...0.718965razed833[0.05622481927275658, 0.0063430992886424065, 0...1372.0Abuja
44positivepositiveDeath toll in Pakistan mosque suicide bombing ...[Death toll in Pakistan mosque suicide bombing...9707.01.0Death toll in Pakistan mosque suicide bombing ...0.765301suicide%20bombing1440[-0.021320270374417305, -0.04400596395134926, ...1737.0Deutschland 🇩🇪
45positivepositivehttps://t.co/CggAt6G2TK Blocked drains, 13th J...[https://t.co/CggAt6G2TK Blocked drains, 13th ...8770.01.0https://t.co/CggAt6G2TK Blocked drains, 13th J...0.664557rubble1282[0.04071395844221115, -0.021260187029838562, -...5288.0Cymru
46negativeneutralUpdate at 18:15 Tuesday to do list - Sleep in ...[Update at 18:15 Tuesday to do list - Sleep in...2268.00.0Update at 18:15 Tuesday to do list - Sleep in ...0.582058cliff%20fall2768[0.04768536612391472, 0.009061871096491814, 0....528.0Plotskiville
47positivepositiveChina sinkhole: Six people killed in Xining as...[China sinkhole: Six people killed in Xining a...3236.01.0China sinkhole: Six people killed in Xining as...0.776652deaths2167[-0.03895467147231102, -0.053079184144735336, ...2861.0Queens, NY
48positivepositiveLawyer who gave aid to anti-CAA protestors ele...[Lawyer who gave aid to anti-CAA protestors el...4727.01.0Lawyer who gave aid to anti-CAA protestors ele...0.691819electrocuted408[-0.016440408304333687, -0.03521612286567688, ...2992.0Bengaluru, India
49positivepositiveA local veterinarian takes care of the horses ...[A local veterinarian takes care of the horses...5121.01.0A local veterinarian takes care of the horses ...0.613162evacuation2140[-0.06955872476100922, 0.06343398988246918, -0...1479.0Republic of the Philippines
\n","
"],"text/plain":[" y ... location\n","0 positive ... Chandigarh, India\n","1 negative ... Scotland, but like at the top\n","2 negative ... USA\n","3 negative ... United States Air Force\n","4 positive ... Chile\n","5 positive ... Dublin, Ireland\n","6 negative ... Southern OBX\n","7 positive ... Queensland, Australia\n","8 negative ... Edinburgh\n","9 positive ... Ohio River Valley\n","10 negative ... Ilorin, Nigeria\n","11 negative ... ZellRepublik🌍\n","12 negative ... North East England\n","13 positive ... Guildford\n","14 negative ... Uganda, Nigeria\n","15 negative ... India\n","16 positive ... ireland , 20\n","17 negative ... South Africa\n","18 negative ... Midwest Red State\n","19 negative ... OnFreedomRoad.com\n","20 positive ... hk island, govt held, not K&NT\n","21 positive ... Colorado, USA\n","22 negative ... Brisbane, Queensland\n","23 positive ... United Kingdom\n","24 positive ... Florida, USA\n","25 negative ... Cairo\n","26 negative ... NSW. Gumbaynggirr country.\n","27 positive ... M50 Dublin\n","28 positive ... Another World♥️♦️♠ïÂ...\n","29 negative ... she/they ⚢\n","30 positive ... Jigjidjav St, Ulaanbaatar\n","31 positive ... Kelowna, British Columbia\n","32 positive ... Abuja, Nigeria\n","33 negative ... Ghana\n","34 positive ... California, USA\n","35 negative ... 𝚙𝚋&𝚓 𝚑𝚘ð...\n","36 negative ... DELHI, KASHMIR\n","37 negative ... Sydney, Australia\n","38 positive ... Pasig City\n","39 negative ... ENFP / 4w3\n","40 negative ... Lahore, Pakistan\n","41 negative ... California, USA\n","42 positive ... Los Angeles, Ca\n","43 positive ... Abuja\n","44 positive ... Deutschland 🇩🇪\n","45 positive ... Cymru\n","46 negative ... Plotskiville\n","47 positive ... Queens, NY\n","48 positive ... Bengaluru, India\n","49 positive ... Republic of the Philippines\n","\n","[50 rows x 13 columns]"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# 4. 
Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":77},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620195355927,"user_tz":-120,"elapsed":13003,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"40252d3c-3637-4c1d-e499-da32fb741d2d"},"source":["fitted_pipe.predict(\"All the buildings in the capital were destroyed\")"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
documenttrained_sentiment_confidenceorigin_indextrained_sentimentsentence_embedding_usesentence
0All the buildings in the capital were destroyed0.6979480positive[0.01043090783059597, 0.06007970869541168, -0....[All the buildings in the capital were destroyed]
\n","
"],"text/plain":[" document ... sentence\n","0 All the buildings in the capital were destroyed ... [All the buildings in the capital were destroyed]\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## 5. Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620195355927,"user_tz":-120,"elapsed":12923,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"3245d07d-0864-4e67-8b47-44378f110787"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@29bdbc21) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@29bdbc21\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## 6. Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620195360521,"user_tz":-120,"elapsed":17440,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"012313a9-d3a0-4c42-9b27-656e59001a1e"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","# Predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.96 0.96 0.96 25\n"," neutral 0.00 0.00 0.00 0\n"," positive 1.00 0.96 0.98 25\n","\n"," accuracy 0.96 50\n"," macro avg 0.65 0.64 0.65 50\n","weighted avg 0.98 0.96 0.97 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
| | y | trained_sentiment | text | sentence | id | target | document | trained_sentiment_confidence | keyword | origin_index | sentence_embedding_use | Unnamed: 0 | location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | positive | positive | Bihar District Magistrate Issues Order to Clos... | [Bihar District Magistrate Issues Order to Clo... | 6350.0 | 1.0 | Bihar District Magistrate Issues Order to Clos... | 0.970833 | heat%20wave | 2310 | [0.01817215047776699, -0.042255766689777374, -... | 2375.0 | Chandigarh, India |
| 1 | negative | negative | YOU THERE, PACHIRISU PUNK, PREPARE TO BE DESTR... | [YOU THERE, PACHIRISU PUNK, PREPARE TO BE DEST... | 3937.0 | 0.0 | YOU THERE, PACHIRISU PUNK, PREPARE TO BE DESTR... | 0.982484 | destroyed | 2044 | [0.01567729189991951, 0.00028937371098436415, ... | 1607.0 | Scotland, but like at the top |
| 2 | negative | negative | A devastating and thorough refutation of and N... | [A devastating and thorough refutation of and ... | 272.0 | 0.0 | A devastating and thorough refutation of and N... | 0.969792 | annihilated | 2473 | [0.030531354248523712, 0.018328234553337097, -... | 809.0 | USA |
| 3 | negative | negative | bubby finally gonna help with desolate while i... | [bubby finally gonna help with desolate while ... | 3817.0 | 0.0 | bubby finally gonna help with desolate while i... | 0.985924 | desolate | 1369 | [-0.010298581793904305, -0.023482605814933777,... | 152.0 | United States Air Force |
| 4 | positive | positive | Temblor: mb 4.5 SOUTHEAST OF EASTER ISLAND: Ma... | [Temblor: mb 4.5 SOUTHEAST OF EASTER ISLAND:, ... | 4609.0 | 1.0 | Temblor: mb 4.5 SOUTHEAST OF EASTER ISLAND: Ma... | 0.973546 | earthquake | 1880 | [-0.07005153596401215, -0.042087431997060776, ... | 305.0 | Chile |
| 5 | positive | positive | Human body parts have been found in a bag on a... | [Human body parts have been found in a bag on ... | 1333.0 | 1.0 | Human body parts have been found in a bag on a... | 0.974148 | body%20bag | 936 | [-0.01760876551270485, -0.07565091550350189, -... | 1765.0 | Dublin, Ireland |
| 6 | negative | negative | JFC!!! Racist comment ! I guess all \"lazy\" may... | [JFC!!!, Racist comment !, I guess all \"lazy\" ... | 1709.0 | 0.0 | JFC!!! Racist comment ! I guess all \"lazy\" may... | 0.960352 | buildings%20burning | 2032 | [0.023510590195655823, 0.002154689282178879, -... | 2141.0 | Southern OBX |
| 7 | positive | positive | 14/17:27 EST Severe Thunderstorm Warning for p... | [14/17:27 EST Severe Thunderstorm Warning for ... | 10191.0 | 1.0 | 14/17:27 EST Severe Thunderstorm Warning for p... | 0.981230 | thunderstorm | 647 | [-0.0552973710000515, 0.024097146466374397, -0... | 5108.0 | Queensland, Australia |
| 8 | negative | negative | Roadworks are co-ordinated and approved by a g... | [Roadworks are co-ordinated and approved by a ... | 4845.0 | 0.0 | Roadworks are co-ordinated and approved by a g... | 0.755662 | emergency%20services | 1081 | [-0.05318903550505638, 0.018101226538419724, -... | 2228.0 | Edinburgh |
| 9 | positive | positive | Iran Protests: 1,500 killed 5,000 Injured 7,00... | [Iran Protests: 1,500 killed 5,000 Injured 7,0... | 6776.0 | 1.0 | Iran Protests: 1,500 killed 5,000 Injured 7,00... | 0.982878 | injured | 1759 | [0.03859921544790268, -0.010697060264647007, -... | 4651.0 | Ohio River Valley |
| 10 | negative | negative | Kwara State Governor, Mallam purchased Truck f... | [Kwara State Governor, Mallam purchased Truck ... | 5632.0 | 0.0 | Kwara State Governor, Mallam purchased Truck f... | 0.745413 | fire%20truck | 1551 | [-0.013514711521565914, -0.004037567414343357,... | 2225.0 | Ilorin, Nigeria |
| 11 | negative | negative | I’m wondering if framing these pictures t... | [I’m wondering if framing these pictures ... | 5718.0 | 0.0 | I’m wondering if framing these pictures t... | 0.951121 | flattened | 237 | [0.03829433023929596, -0.08540338277816772, -0... | 2171.0 | ZellRepublik🌍 |
| 12 | negative | negative | Transparency makes everything better: individu... | [Transparency makes everything better: individ... | 5503.0 | 0.0 | Transparency makes everything better: individu... | 0.979479 | fear | 593 | [0.0025042055640369654, 0.027974789962172508, ... | 1037.0 | North East England |
| 13 | positive | negative | If I didn't need my Crutch I would seriously w... | [If I didn't need my Crutch I would seriously ... | 1280.0 | 1.0 | If I didn't need my Crutch I would seriously w... | 0.620800 | bloody | 914 | [0.043823741376399994, 0.057542573660612106, -... | 819.0 | Guildford |
| 14 | negative | negative | Imagine you get a phone call from the Israeli ... | [Imagine you get a phone call from the Israeli... | 1552.0 | 0.0 | Imagine you get a phone call from the Israeli ... | 0.894990 | bombed | 596 | [-0.02263369970023632, -0.04711006581783295, -... | 2227.0 | Uganda, Nigeria |
| 15 | negative | neutral | [Breaking] Nirbhaya Rape: Supreme Court dismis... | [[Breaking] Nirbhaya Rape: Supreme Court dismi... | 3127.0 | 0.0 | [Breaking] Nirbhaya Rape: Supreme Court dismis... | 0.502236 | death | 2715 | [-0.014349669218063354, -0.07226337492465973, ... | 812.0 | India |
| 16 | positive | positive | this storm is quite violent oh lord imagine ir... | [this storm is quite violent oh lord imagine i... | 10739.0 | 1.0 | this storm is quite violent oh lord imagine ir... | 0.917719 | violent%20storm | 1010 | [-0.02750890888273716, 0.021164197474718094, -... | 5069.0 | ireland , 20 |
| 17 | negative | negative | Yeah France is doing fantastic with all the mi... | [Yeah France is doing fantastic with all the m... | 2347.0 | 0.0 | Yeah France is doing fantastic with all the mi... | 0.935926 | collapse | 2853 | [0.027280263602733612, 0.015566764399409294, -... | 399.0 | South Africa |
| 18 | negative | negative | He seems to be crumbling more rapidly now. The... | [He seems to be crumbling more rapidly now., ... | 3350.0 | 0.0 | He seems to be crumbling more rapidly now. The... | 0.980057 | deluge | 1481 | [0.11189540475606918, -0.009536723606288433, 0... | 375.0 | Midwest Red State |
| 19 | negative | negative | #coup 9/11 vs Cal Fires So is this an attack o... | [#coup 9/11 vs Cal Fires So is this an attack ... | 1779.0 | 0.0 | #coup 9/11 vs Cal Fires So is this an attack o... | 0.926649 | buildings%20on%20fire | 429 | [-0.015312806703150272, 0.030779404565691948, ... | 2388.0 | OnFreedomRoad.com |
| 20 | positive | positive | 6 months of videos of protesters smashing, bur... | [6 months of videos of protesters smashing, bu... | 3599.0 | 1.0 | 6 months of videos of protesters smashing, bur... | 0.905132 | derail | 427 | [-0.02566157653927803, 0.009822700172662735, -... | 2900.0 | hk island, govt held, not K&NT |
| 21 | positive | positive | Mexican Border City on High Alert Over ‘S... | [Mexican Border City on High Alert Over ‘... | 9696.0 | 1.0 | Mexican Border City on High Alert Over ‘S... | 0.714104 | suicide%20bomber | 2907 | [0.015122084878385067, -0.0711512491106987, -0... | 5733.0 | Colorado, USA |
| 22 | negative | negative | Amazing light show or a battle for your health... | [Amazing light show or a battle for your healt... | 3909.0 | 0.0 | Amazing light show or a battle for your health... | 0.970722 | destroy | 107 | [0.001234019990079105, -0.06394369900226593, 0... | 1805.0 | Brisbane, Queensland |
| 23 | positive | positive | Emergency services called to a collision on th... | [Emergency services called to a collision on t... | 4877.0 | 1.0 | Emergency services called to a collision on th... | 0.953050 | emergency%20services | 2844 | [-0.05413218215107918, -0.0021766768768429756,... | 2548.0 | United Kingdom |
| 24 | positive | positive | HEADLINE: \"6.4 earthquake strikes Puerto Rico,... | [HEADLINE: \"6.4 earthquake strikes Puerto Rico... | 9125.0 | 1.0 | HEADLINE: \"6.4 earthquake strikes Puerto Rico,... | 0.992927 | seismic | 1497 | [0.004592240322381258, -0.03824039548635483, -... | 6523.0 | Florida, USA |
| 25 | negative | negative | “You were like a breath of fresh air. ItÃ... | [“You were like a breath of fresh air., I... | 4466.0 | 0.0 | “You were like a breath of fresh air. ItÃ... | 0.984222 | drowning | 1452 | [-0.03848513588309288, 0.07503059506416321, -0... | 2700.0 | Cairo |
| 26 | negative | negative | The arsonist turns up with a fire extinguisher... | [The arsonist turns up with a fire extinguishe... | 510.0 | 0.0 | The arsonist turns up with a fire extinguisher... | 0.971530 | arsonist | 2801 | [-0.001973913051187992, -0.017796456813812256,... | 761.0 | NSW. Gumbaynggirr country. |
| 27 | positive | positive | M7: all lanes have reopened inbound between J5... | [M7:, all lanes have reopened inbound between ... | 2560.0 | 1.0 | M7: all lanes have reopened inbound between J5... | 0.924938 | collision | 1277 | [-0.04796949401497841, -0.07411075383424759, -... | 5460.0 | M50 Dublin |
| 28 | positive | positive | Hi Respected #Friends Hello #TweeterWorld #Nam... | [Hi Respected #Friends Hello #TweeterWorld #Na... | 679.0 | 1.0 | Hi Respected #Friends Hello #TweeterWorld #Nam... | 0.817334 | avalanche | 2212 | [0.004341206979006529, 0.01691211760044098, 0.... | 6874.0 | Another World♥️♦️♠ïÂ... |
| 29 | negative | negative | i wouldn’t even be surprised if they some... | [i wouldn’t even be surprised if they som... | 2284.0 | 0.0 | i wouldn’t even be surprised if they some... | 0.974524 | cliff%20fall | 311 | [-0.0520143061876297, 0.0450257770717144, -0.0... | 674.0 | she/they ⚢ |
| 30 | positive | positive | 3.5 magnitude earthquake recorded in Bulgan ai... | [3.5 magnitude earthquake recorded in Bulgan a... | 4638.0 | 1.0 | 3.5 magnitude earthquake recorded in Bulgan ai... | 0.988485 | earthquake | 2961 | [-0.01304821576923132, -0.023747781291604042, ... | 6976.0 | Jigjidjav St, Ulaanbaatar |
| 31 | positive | positive | Trumps Presidency =2024 Oil spill in Alaska ha... | [Trumps Presidency =2024 Oil spill in Alaska h... | 7931.0 | 1.0 | Trumps Presidency =2024 Oil spill in Alaska ha... | 0.639585 | oil%20spill | 2644 | [-0.01266513578593731, 0.07648883014917374, -0... | 1905.0 | Kelowna, British Columbia |
| 32 | positive | positive | A fire outbreak has razed down 39 shops, ravag... | [A fire outbreak has razed down 39 shops, rava... | 8394.0 | 1.0 | A fire outbreak has razed down 39 shops, ravag... | 0.933097 | razed | 1982 | [0.061840321868658066, 0.020564774051308632, -... | 3117.0 | Abuja, Nigeria |
| 33 | negative | negative | When technical brains speak to a raging debate... | [When technical brains speak to a raging debat... | 221.0 | 0.0 | When technical brains speak to a raging debate... | 0.985238 | ambulance | 2063 | [0.02085897885262966, 0.03478141874074936, 0.0... | 1296.0 | Ghana |
| 34 | positive | positive | A wildlife rehab center in South Carolina says... | [A wildlife rehab center in South Carolina say... | 8617.0 | 1.0 | A wildlife rehab center in South Carolina says... | 0.941231 | rescuers | 221 | [0.05315405875444412, -0.013145094737410545, -... | 442.0 | California, USA |
| 35 | negative | negative | hey you former russian spy electrocute me | [hey you former russian spy electrocute me] | 4657.0 | 0.0 | hey you former russian spy electrocute me | 0.983450 | electrocute | 156 | [-0.019302884116768837, -0.0029615701641887426... | 504.0 | 𝚙𝚋&𝚓 𝚑𝚘ð... |
| 36 | negative | negative | JKP cop Devinder Singh's sordid saga vindicate... | [JKP cop Devinder Singh's sordid saga vindicat... | 3018.0 | 0.0 | JKP cop Devinder Singh's sordid saga vindicate... | 0.974994 | danger | 1676 | [-0.020209699869155884, -0.05051594600081444, ... | 1889.0 | DELHI, KASHMIR |
| 37 | negative | negative | Time will tell when I fall off a cliff or high... | [Time will tell when I fall off a cliff or hig... | 2277.0 | 0.0 | Time will tell when I fall off a cliff or high... | 0.960691 | cliff%20fall | 505 | [0.008356968872249126, 0.07969766110181808, 0.... | 1335.0 | Sydney, Australia |
| 38 | positive | positive | JUST IN: An earthquake was felt in parts of CA... | [JUST IN: An earthquake was felt in parts of C... | 4622.0 | 1.0 | JUST IN: An earthquake was felt in parts of CA... | 0.995373 | earthquake | 2858 | [0.03384742885828018, -0.022578943520784378, -... | 6461.0 | Pasig City |
| 39 | negative | negative | This tweet blew up and this is where people pr... | [This tweet blew up and this is where people p... | 1029.0 | 0.0 | This tweet blew up and this is where people pr... | 0.977069 | blew%20up | 1090 | [0.056242309510707855, -0.023219745606184006, ... | 1686.0 | ENFP / 4w3 |
| 40 | negative | negative | We thrown him as debris should be out of the h... | [We thrown him as debris should be out of the ... | 3277.0 | 0.0 | We thrown him as debris should be out of the h... | 0.972861 | debris | 1729 | [0.07086315006017685, -0.04039449617266655, -0... | 169.0 | Lahore, Pakistan |
| 41 | negative | negative | “Key compoNts I appreciate as a transport... | [“Key compoNts I appreciate as a transpor... | 5424.0 | 0.0 | “Key compoNts I appreciate as a transport... | 0.978197 | fatalities | 329 | [0.06842903792858124, -0.02750425413250923, -0... | 2833.0 | California, USA |
| 42 | positive | positive | #childsafety \| Local flu death of boy is one o... | [#childsafety \| Local flu death of boy is one ... | 3193.0 | 1.0 | #childsafety \| Local flu death of boy is one o... | 0.983754 | deaths | 2469 | [-0.023012271150946617, 0.03902702406048775, -... | 2076.0 | Los Angeles, Ca |
| 43 | positive | positive | A fire outbreak has razed down shops, ravaging... | [A fire outbreak has razed down shops, ravagin... | 8413.0 | 1.0 | A fire outbreak has razed down shops, ravaging... | 0.979082 | razed | 833 | [0.05622481927275658, 0.0063430992886424065, 0... | 1372.0 | Abuja |
| 44 | positive | positive | Death toll in Pakistan mosque suicide bombing ... | [Death toll in Pakistan mosque suicide bombing... | 9707.0 | 1.0 | Death toll in Pakistan mosque suicide bombing ... | 0.991969 | suicide%20bombing | 1440 | [-0.021320270374417305, -0.04400596395134926, ... | 1737.0 | Deutschland 🇩🇪 |
| 45 | positive | positive | https://t.co/CggAt6G2TK Blocked drains, 13th J... | [https://t.co/CggAt6G2TK Blocked drains, 13th ... | 8770.0 | 1.0 | https://t.co/CggAt6G2TK Blocked drains, 13th J... | 0.875209 | rubble | 1282 | [0.04071395844221115, -0.021260187029838562, -... | 5288.0 | Cymru |
| 46 | negative | negative | Update at 18:15 Tuesday to do list - Sleep in ... | [Update at 18:15 Tuesday to do list - Sleep in... | 2268.0 | 0.0 | Update at 18:15 Tuesday to do list - Sleep in ... | 0.935865 | cliff%20fall | 2768 | [0.04768536612391472, 0.009061871096491814, 0.... | 528.0 | Plotskiville |
| 47 | positive | positive | China sinkhole: Six people killed in Xining as... | [China sinkhole: Six people killed in Xining a... | 3236.0 | 1.0 | China sinkhole: Six people killed in Xining as... | 0.997117 | deaths | 2167 | [-0.03895467147231102, -0.053079184144735336, ... | 2861.0 | Queens, NY |
| 48 | positive | positive | Lawyer who gave aid to anti-CAA protestors ele... | [Lawyer who gave aid to anti-CAA protestors el... | 4727.0 | 1.0 | Lawyer who gave aid to anti-CAA protestors ele... | 0.802344 | electrocuted | 408 | [-0.016440408304333687, -0.03521612286567688, ... | 2992.0 | Bengaluru, India |
| 49 | positive | positive | A local veterinarian takes care of the horses ... | [A local veterinarian takes care of the horses... | 5121.0 | 1.0 | A local veterinarian takes care of the horses ... | 0.800290 | evacuation | 2140 | [-0.06955872476100922, 0.06343398988246918, -0... | 1479.0 | Republic of the Philippines |
\n","
"],"text/plain":[" y ... location\n","0 positive ... Chandigarh, India\n","1 negative ... Scotland, but like at the top\n","2 negative ... USA\n","3 negative ... United States Air Force\n","4 positive ... Chile\n","5 positive ... Dublin, Ireland\n","6 negative ... Southern OBX\n","7 positive ... Queensland, Australia\n","8 negative ... Edinburgh\n","9 positive ... Ohio River Valley\n","10 negative ... Ilorin, Nigeria\n","11 negative ... ZellRepublik🌍\n","12 negative ... North East England\n","13 positive ... Guildford\n","14 negative ... Uganda, Nigeria\n","15 negative ... India\n","16 positive ... ireland , 20\n","17 negative ... South Africa\n","18 negative ... Midwest Red State\n","19 negative ... OnFreedomRoad.com\n","20 positive ... hk island, govt held, not K&NT\n","21 positive ... Colorado, USA\n","22 negative ... Brisbane, Queensland\n","23 positive ... United Kingdom\n","24 positive ... Florida, USA\n","25 negative ... Cairo\n","26 negative ... NSW. Gumbaynggirr country.\n","27 positive ... M50 Dublin\n","28 positive ... Another World♥️♦️♠ïÂ...\n","29 negative ... she/they ⚢\n","30 positive ... Jigjidjav St, Ulaanbaatar\n","31 positive ... Kelowna, British Columbia\n","32 positive ... Abuja, Nigeria\n","33 negative ... Ghana\n","34 positive ... California, USA\n","35 negative ... 𝚙𝚋&𝚓 𝚑𝚘ð...\n","36 negative ... DELHI, KASHMIR\n","37 negative ... Sydney, Australia\n","38 positive ... Pasig City\n","39 negative ... ENFP / 4w3\n","40 negative ... Lahore, Pakistan\n","41 negative ... California, USA\n","42 positive ... Los Angeles, Ca\n","43 positive ... Abuja\n","44 positive ... Deutschland 🇩🇪\n","45 positive ... Cymru\n","46 negative ... Plotskiville\n","47 positive ... Queens, NY\n","48 positive ... Bengaluru, India\n","49 positive ... Republic of the Philippines\n","\n","[50 rows x 13 columns]"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# 7. 
Try training with different Embeddings"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nxWFzQOhjWC8","executionInfo":{"status":"ok","timestamp":1620195360523,"user_tz":-120,"elapsed":17367,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"a6f5b9b8-5045-4e08-8b35-82c76519b21a"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP 
model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns 
Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model 
sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"IKK_Ii_gjJfF","executionInfo":{"status":"ok","timestamp":1620197099679,"user_tz":-120,"elapsed":1756488,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"3d7b9ee2-2fd2-4d3b-ce7b-124efcd30af1"},"source":["trainable_pipe = nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# Non-USE sentence embeddings usually need longer training and a smaller learning rate\n","# The hyperparameters could be tuned further with methods such as grid search\n","# Longer training may also improve accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# Predict with the fitted pipeline on the training data and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.87 0.82 0.85 1206\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.87 0.84 0.86 1194\n","\n"," accuracy 0.83 2400\n"," macro avg 0.58 0.56 0.57 2400\n","weighted avg 0.87 0.83 0.85 2400\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_1jxw3GnVGlI"},"source":["# 7.1 Evaluate on Test Data"]},{"cell_type":"code","metadata":{"id":"Fxx4yNkNVGFl","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620197219831,"user_tz":-120,"elapsed":1876555,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"6725b153-10a5-43cc-fc81-b4e1ad6c235d"},"source":["preds = fitted_pipe.predict(test_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.84 0.82 0.83 294\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.86 0.80 0.83 306\n","\n"," accuracy 0.81 600\n"," macro avg 0.56 0.54 0.55 600\n","weighted avg 0.85 0.81 0.83 600\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 8. 
Let's save the model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eLex095goHwm","executionInfo":{"status":"ok","timestamp":1620197385034,"user_tz":-120,"elapsed":2041682,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"cd6b37e7-3b61-4489-975b-55303938873c"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 9. Let's load the model from HDD.\n","This makes offline NLU usage possible! \n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":95},"id":"SO4uz45MoRgp","executionInfo":{"status":"ok","timestamp":1620197397239,"user_tz":-120,"elapsed":2053806,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"dd9ecdc5-e223-4474-cc4e-a847f818f556"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('All the buildings in the capital were destroyed')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
| | text | sentiment | document | origin_index | sentiment_confidence | sentence | sentence_embedding_from_disk |
|---|---|---|---|---|---|---|---|
| 0 | All the buildings in the capital were destroyed | [positive] | All the buildings in the capital were destroyed | 8589934592 | [0.99480504] | [All the buildings in the capital were destroyed] | [[-0.393466055393219, 0.33815130591392517, -0.... |
\n","
"],"text/plain":[" text ... sentence_embedding_from_disk\n","0 All the buildings in the capital were destroyed ... [[-0.393466055393219, 0.33815130591392517, -0....\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":14}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"e0CVlkk9v6Qi","executionInfo":{"status":"ok","timestamp":1620197397239,"user_tz":-120,"elapsed":2051781,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"5c3d3b1b-32f4-48f1-c69f-2b7344657eba"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@103a8839) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@103a8839\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"F_mfqyyyKGkV"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "collapsed_sections": [ + "zkufh760uvF3" + ] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_natural_disasters.ipynb)\n", + "\n", + "\n", + "# Training a Sentiment Analysis Classifier with NLU\n", + "## 2 Class Natural Disasters Sentiment Classifier Training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on any multi-class text classification problem\n", + "\n", + "This notebook showcases the following features:\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (Enables NLU offline mode)\n", + "\n", + "You can achieve these results or even better on this dataset with training data:\n", + "\n", + "\n", + "
\n", + "\n", + "![image.png]()\n", + "\n", + "You can achieve these results or even better on this dataset with test data:\n", + "\n", + "\n", + "
\n", + "\n", + "\n", + "![Screenshot 2021-02-25 142700.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Install Java 8 and NLU" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "!pip install -q johnsnowlabs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download Disaster Sentiment dataset\n", + "https://www.kaggle.com/vstepanenko/disaster-tweets\n", + "#Context\n", + "\n", + "The file contains over 11,000 tweets associated with disaster keywords like “crash”, “quarantine”, and “bush fires” as well as the location and keyword itself. The data structure was inherited from Disasters on social media\n", + "\n", + "The tweets were collected on Jan 14th, 2020.\n", + "\n", + "Some of the topics people were tweeting:\n", + "\n", + "The eruption of Taal Volcano in Batangas, Philippines\n", + "Coronavirus\n", + "Bushfires in Australia\n", + "Iran downing of the airplane flight PS752\n", + "Disclaimer: The dataset contains text that may be considered profane, vulgar, or offensive." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "OrVb5ZMvvrQD", + "outputId": "aae15220-50fe-4645-9720-2b2e368f049b" + }, + "source": [ + "! wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/disaster_tweets/tweets.csv\n" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2023-11-03 13:51:47-- https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/disaster_tweets/tweets.csv\n", + "Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.197.160, 52.217.86.86, 52.217.175.88, ...\n", + "Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.197.160|:443... 
connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 1615005 (1.5M) [text/csv]\n", + "Saving to: ‘tweets.csv’\n", + "\n", + "tweets.csv 100%[===================>] 1.54M 3.54MB/s in 0.4s \n", + "\n", + "2023-11-03 13:51:48 (3.54 MB/s) - ‘tweets.csv’ saved [1615005/1615005]\n", + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "train_df" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "3Fu5MWAddMSF", + "outputId": "e4c88301-1f62-419f-8990-83340362379d" + }, + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id keyword location \\\n", + "2 2 ablaze New York City \n", + "3 3 ablaze Morgantown, WV \n", + "5 5 ablaze OC \n", + "6 6 ablaze London, England \n", + "7 7 ablaze Bharat \n", + "... ... ... ... \n", + "11362 11362 wrecked feuille d'érable \n", + "11365 11365 wrecked Blue State in a red sea \n", + "11366 11366 wrecked arohaonces \n", + "11367 11367 wrecked 🇵🇭 \n", + "11368 11368 wrecked auroraborealis \n", + "\n", + " text target \n", + "2 Arsonist sets cars ablaze at dealership https:... 1 \n", + "3 Arsonist sets cars ablaze at dealership https:... 1 \n", + "5 If this child was Chinese, this tweet would ha... 0 \n", + "6 Several houses have been set ablaze in Ngemsib... 1 \n", + "7 Asansol: A BJP office in Salanpur village was ... 1 \n", + "... ... ... \n", + "11362 Stell wrecked ako palagi sayo. Haha. #ALABTopS... 0 \n", + "11365 Media should have warned us well in advance. T... 0 \n", + "11366 i feel directly attacked 💀 i consider moonb... 0 \n", + "11367 i feel directly attacked 💀 i consider moonb... 0 \n", + "11368 ok who remember \"outcast\" nd the \"dora\" au?? T... 0 \n", + "\n", + "[7952 rows x 5 columns]" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 5 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 458 + }, + "id": "y4xSRWIhwT28", + "outputId": "d01f0b51-19d4-4d9c-f22d-4897c6b2bd20" + }, + "source": [ + "import pandas as pd\n", + "train_path = '/content/tweets.csv'\n", + "\n", + "train_df = pd.read_csv(train_path,sep=\",\", encoding='latin-1')\n", + "train_df.rename(columns={'target': 'y'}, inplace=True)\n", + "\n", + "# the text data to use for classification should be in a column named 'text'\n", + "columns=['text','y']\n", + "train_df = train_df.dropna()\n", + "\n", + "train_df = train_df[columns]\n", + "train_df = train_df[~train_df[\"y\"].isin([\"neutral\"])]\n", + "train_df['y'] = train_df['y'].replace({0: 'negative', 1: 'positive'})\n", + "\n", + "# balance the classes: keep the first 1500 rows of each label\n", + "positive = train_df[train_df['y']==(\"positive\")].iloc[:1500]\n", + "negative = train_df[train_df['y']==(\"negative\")].iloc[:1500]\n", + "# DataFrame.append was removed in pandas 2.0, so combine the frames with pd.concat\n", + "balanced = pd.concat([positive, negative], ignore_index=True)\n", + "# shuffle the rows before splitting\n", + "train_df = balanced.sample(frac=1).reset_index(drop=True)\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_df, test_df = train_test_split(train_df, test_size=0.2)\n", + "train_df\n" + ], + "execution_count": 19, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " text y\n", + "1786 Arsonist sets cars ablaze at dealership https:... positive\n", + "2465 Travis Manawa [about Brandon's group]: I think... negative\n", + "1318 If a scientist said if you jump off a cliff yo... negative\n", + "2177 #StormBrendon is also bringing high winds, so ... 
positive\n", + "293 Like , I'm really talking about blending gener... negative\n", + "... ... ...\n", + "666 US Troops Clear Rubble from Iraq Base Days Aft... positive\n", + "2843 can we create an anti-bioterrorism commission? negative\n", + "312 When cultures collide! #southernspain https://... negative\n", + "2072 A look inside a tree that has been struck by l... positive\n", + "2024 Two lanes have been closed while emergency ser... positive\n", + "\n", + "[2400 rows x 2 columns]" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 19 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. Train Deep Learning Classifier using nlp.load('train.sentiment')\n", + "\n", + "Your dataset label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "3e5f0825-bfd1-402a-e273-5e1656753bd9" + }, + "source": [ + "from johnsnowlabs import nlp\n", + "from sklearn.metrics import classification_report\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# sentence embeddings are generated by the pipeline's default embedding model (sent_small_bert_L2_128 in the run logged below)\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "# the sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 20, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.56 1.00 0.72 28\n", + " positive 0.00 0.00 0.00 22\n", + "\n", + " accuracy 0.56 50\n", + " macro avg 0.28 0.50 0.36 50\n", + "weighted avg 0.31 0.56 0.40 50\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Arsonist sets cars ablaze at dealership https:... \n", + "1 Travis Manawa [about Brandon's group]: I think... \n", + "2 If a scientist said if you jump off a cliff yo... \n", + "3 #StormBrendon is also bringing high winds, so ... \n", + "4 Like , I'm really talking about blending gener... \n", + "5 Seriously though... If that defender was taken... \n", + "6 Chemical Hazard - Advice for Cobram. For more ... \n", + "7 2,400 jobs are at stake should the deal fall t... \n", + "8 Western Cape blood stocks down to just four da... \n", + "9 Last night in Sweden 🇸🇪 2 BOMBINGS. - Th... \n", + "10 BREAKING: Ukrainian President Volodymyr Zelens... \n", + "11 Here's what you can learn from the conservativ... \n", + "12 Rajneeti News (Stardust: Oldest material on ea... \n", + "13 In 2008, Laskar and Gastineau simulated 2500 f... \n", + "14 Why are you still calling it a plane crash ðŸ§... \n", + "15 Report recieved of a 9 vehicle RTC on M66 betw... \n", + "16 ** Cleared ** The vehicles involved in the col... \n", + "17 105 is the number to call if you have a power ... \n", + "18 Stress is something that affects many of us. I... \n", + "19 Re: #AustraliaBushfires, a question for any #f... 
\n", + "20 Enormous exploding sinkhole in China swallows ... \n", + "21 This creature who’s soul is no longer claren... \n", + "22 In 70 CE Titus the son of the Roman Emperor Ve... \n", + "23 #BREAKING: Trudeau says the 57 Canadians kille... \n", + "24 airplane accident answers. The US designated t... \n", + "25 Unlike previous State of the nation addresses,... \n", + "26 Woodbury takes emergency action to address #wa... \n", + "27 Mudslide closes Kailua-bound lane of Pali High... \n", + "28 ᴏʀᴘʜᴀɴs... ᴛʜᴇʏ'ʀᴇ ɴᴏᴛ ... \n", + "29 Eduardo Degrano looks at the damage to his hom... \n", + "30 darinde...how i wish i could put these in hot ... \n", + "31 Earthquake Information No.1 Date and Time: 14 ... \n", + "32 “It’s a blight on the country as a whole.â... \n", + "33 I don't mind being your enemy if you're an ene... \n", + "34 A follow-up to yesterday's Pakistan post: In t... \n", + "35 “Passed away.” This euphemistic trash is p... \n", + "36 I wonder how many homes could have been saved ... \n", + "37 I just zoom it and took ss and feel attack.. H... \n", + "38 Human Body Parts Discovered In Bag In Dublin h... \n", + "39 It is not just an Australian problem. We need ... \n", + "40 If I didn't need my Crutch I would seriously w... \n", + "41 But it eventually will have to work without...... \n", + "42 ... #MAGA5G.LiVEViL+ my recommended read not f... \n", + "43 This earlier collision N'bound between J9 Red ... \n", + "44 I feel attacked. https://t.co/PrtvRimq6y \n", + "45 For the past several months, after imposing a ... \n", + "46 Yuck! Looks like she's wearing a body bag. May... \n", + "47 Such a loss to and the people of NE Fife. pays... \n", + "48 This rain going dumb, it’s flooding now \n", + "49 WEATHER ALERT: Severe Thunderstorm Warning inc... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.1667916625738144, 1.0302923917770386, 0.18... negative \n", + "1 [-0.9610550999641418, 0.13062980771064758, -0.... 
negative \n", + "2 [-0.6688402891159058, 0.640354335308075, 0.369... negative \n", + "3 [-1.0540131330490112, 0.8802893757820129, -0.6... negative \n", + "4 [-1.0963889360427856, -0.39644378423690796, 0.... negative \n", + "5 [-0.9136804938316345, 0.593141496181488, -0.13... negative \n", + "6 [-0.38091930747032166, 0.6411349177360535, 0.1... negative \n", + "7 [-0.44847720861434937, 0.5910513997077942, -0.... negative \n", + "8 [-0.5973049402236938, 0.3307306170463562, -0.1... negative \n", + "9 [-0.5309799313545227, -0.5059896111488342, -0.... negative \n", + "10 [-1.1355115175247192, -0.24506860971450806, -0... negative \n", + "11 [-1.1318162679672241, 0.4271548390388489, -0.1... negative \n", + "12 [-0.4446831941604614, -0.11513718217611313, 0.... negative \n", + "13 [-0.7734904885292053, -0.19835318624973297, -0... negative \n", + "14 [-0.12437181919813156, 1.112841010093689, 0.27... negative \n", + "15 [-0.38135823607444763, 1.1142768859863281, -0.... negative \n", + "16 [-0.2368348389863968, 0.9701917171478271, -0.3... negative \n", + "17 [-0.8241100907325745, 1.1454272270202637, -0.0... negative \n", + "18 [-0.9658268690109253, 0.5662633776664734, -0.2... negative \n", + "19 [-0.6997240781784058, 0.5168095827102661, 0.12... negative \n", + "20 [-0.5793707370758057, -0.18351595103740692, -0... negative \n", + "21 [-0.4663412868976593, 0.04406387358903885, 0.1... negative \n", + "22 [-0.8847564458847046, -0.1676793396472931, -0.... negative \n", + "23 [-0.45176926255226135, -0.1339559406042099, -0... negative \n", + "24 [-0.2922760844230652, 0.3953195810317993, -0.1... negative \n", + "25 [-0.8965665698051453, 0.9926007986068726, -0.3... negative \n", + "26 [-0.7830190658569336, 0.8649871349334717, 0.20... negative \n", + "27 [-0.42747634649276733, 0.13894236087799072, -0... negative \n", + "28 [-0.5791727304458618, 0.11972904205322266, 0.4... negative \n", + "29 [-1.1623308658599854, 0.569926917552948, -0.58... 
negative \n", + "30 [-1.5163923501968384, 0.6165133118629456, -0.5... negative \n", + "31 [-1.085410714149475, -0.15019290149211884, -0.... negative \n", + "32 [-0.5412101745605469, 0.611747682094574, -0.00... negative \n", + "33 [-0.6546711325645447, 0.43016937375068665, -0.... negative \n", + "34 [-1.091268539428711, -0.17232197523117065, -0.... negative \n", + "35 [-1.0350548028945923, 0.04470200464129448, -0.... negative \n", + "36 [-0.6520278453826904, 0.6858862042427063, -0.3... negative \n", + "37 [-0.9297969937324524, -0.12180554866790771, 0.... negative \n", + "38 [-0.6394882202148438, 0.8264853954315186, 0.34... negative \n", + "39 [-0.7464641332626343, 0.9418659806251526, -0.3... negative \n", + "40 [-1.1001498699188232, 1.0316743850708008, -0.3... negative \n", + "41 [-0.6016444563865662, 0.95436030626297, -0.232... negative \n", + "42 [-0.8835289478302002, 0.3090209364891052, 0.29... negative \n", + "43 [-0.19473275542259216, 0.8363125920295715, 0.0... negative \n", + "44 [-0.4955633580684662, 0.16522228717803955, 0.6... negative \n", + "45 [-0.5843112468719482, 0.04129549860954285, 0.2... negative \n", + "46 [-1.058443307876587, 0.23154138028621674, -0.4... negative \n", + "47 [-0.9249093532562256, -0.045939963310956955, -... negative \n", + "48 [-1.7112693786621094, 0.5982310175895691, -0.2... negative \n", + "49 [-0.7861663699150085, 0.22981716692447662, -0.... negative \n", + "\n", + " sentiment_confidence text \\\n", + "0 8.0 Arsonist sets cars ablaze at dealership https:... \n", + "1 2.0 Travis Manawa [about Brandon's group]: I think... \n", + "2 3.0 If a scientist said if you jump off a cliff yo... \n", + "3 4.0 #StormBrendon is also bringing high winds, so ... \n", + "4 1.0 Like , I'm really talking about blending gener... \n", + "5 4.0 Seriously though... If that defender was taken... \n", + "6 3.0 Chemical Hazard - Advice for Cobram. For more ... \n", + "7 5.0 2,400 jobs are at stake should the deal fall t... 
\n", + "8 8.0 Western Cape blood stocks down to just four da... \n", + "9 0.0 Last night in Sweden 🇸🇪 2 BOMBINGS. - Th... \n", + "10 0.0 BREAKING: Ukrainian President Volodymyr Zelens... \n", + "11 3.0 Here's what you can learn from the conservativ... \n", + "12 0.0 Rajneeti News (Stardust: Oldest material on ea... \n", + "13 8.0 In 2008, Laskar and Gastineau simulated 2500 f... \n", + "14 2.0 Why are you still calling it a plane crash ðŸ§... \n", + "15 0.0 Report recieved of a 9 vehicle RTC on M66 betw... \n", + "16 0.0 ** Cleared ** The vehicles involved in the col... \n", + "17 3.0 105 is the number to call if you have a power ... \n", + "18 6.0 Stress is something that affects many of us. I... \n", + "19 5.0 Re: #AustraliaBushfires, a question for any #f... \n", + "20 9.0 Enormous exploding sinkhole in China swallows ... \n", + "21 3.0 This creature who’s soul is no longer claren... \n", + "22 9.0 In 70 CE Titus the son of the Roman Emperor Ve... \n", + "23 0.0 #BREAKING: Trudeau says the 57 Canadians kille... \n", + "24 5.0 airplane accident answers. The US designated t... \n", + "25 4.0 Unlike previous State of the nation addresses,... \n", + "26 7.0 Woodbury takes emergency action to address #wa... \n", + "27 0.0 Mudslide closes Kailua-bound lane of Pali High... \n", + "28 9.0 ᴏʀᴘʜᴀɴs... ᴛʜᴇʏ'ʀᴇ ɴᴏᴛ ... \n", + "29 2.0 Eduardo Degrano looks at the damage to his hom... \n", + "30 1.0 darinde...how i wish i could put these in hot ... \n", + "31 0.0 Earthquake Information No.1 Date and Time: 14 ... \n", + "32 4.0 “It’s a blight on the country as a whole.â... \n", + "33 1.0 I don't mind being your enemy if you're an ene... \n", + "34 9.0 A follow-up to yesterday's Pakistan post: In t... \n", + "35 3.0 “Passed away.” This euphemistic trash is p... \n", + "36 3.0 I wonder how many homes could have been saved ... \n", + "37 2.0 I just zoom it and took ss and feel attack.. H... \n", + "38 6.0 Human Body Parts Discovered In Bag In Dublin h... 
\n", + "39 5.0 It is not just an Australian problem. We need ... \n", + "40 2.0 If I didn't need my Crutch I would seriously w... \n", + "41 2.0 But it eventually will have to work without...... \n", + "42 2.0 ... #MAGA5G.LiVEViL+ my recommended read not f... \n", + "43 8.0 This earlier collision N'bound between J9 Red ... \n", + "44 2.0 I feel attacked. https://t.co/PrtvRimq6y \n", + "45 8.0 For the past several months, after imposing a ... \n", + "46 1.0 Yuck! Looks like she's wearing a body bag. May... \n", + "47 1.0 Such a loss to and the people of NE Fife. pays... \n", + "48 2.0 This rain going dumb, it’s flooding now \n", + "49 0.0 WEATHER ALERT: Severe Thunderstorm Warning inc... \n", + "\n", + " y \n", + "0 positive \n", + "1 negative \n", + "2 negative \n", + "3 positive \n", + "4 negative \n", + "5 positive \n", + "6 positive \n", + "7 negative \n", + "8 negative \n", + "9 positive \n", + "10 positive \n", + "11 negative \n", + "12 negative \n", + "13 negative \n", + "14 negative \n", + "15 positive \n", + "16 positive \n", + "17 positive \n", + "18 negative \n", + "19 negative \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 positive \n", + "24 negative \n", + "25 negative \n", + "26 positive \n", + "27 positive \n", + "28 negative \n", + "29 negative \n", + "30 negative \n", + "31 positive \n", + "32 negative \n", + "33 negative \n", + "34 positive \n", + "35 negative \n", + "36 negative \n", + "37 negative \n", + "38 positive \n", + "39 negative \n", + "40 positive \n", + "41 negative \n", + "42 negative \n", + "43 positive \n", + "44 negative \n", + "45 positive \n", + "46 negative \n", + "47 negative \n", + "48 positive \n", + "49 positive " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0Arsonist sets cars ablaze at dealership https:...[-0.1667916625738144, 1.0302923917770386, 0.18...negative8.0Arsonist sets cars ablaze at dealership https:...positive
1Travis Manawa [about Brandon's group]: I think...[-0.9610550999641418, 0.13062980771064758, -0....negative2.0Travis Manawa [about Brandon's group]: I think...negative
2If a scientist said if you jump off a cliff yo...[-0.6688402891159058, 0.640354335308075, 0.369...negative3.0If a scientist said if you jump off a cliff yo...negative
3#StormBrendon is also bringing high winds, so ...[-1.0540131330490112, 0.8802893757820129, -0.6...negative4.0#StormBrendon is also bringing high winds, so ...positive
4Like , I'm really talking about blending gener...[-1.0963889360427856, -0.39644378423690796, 0....negative1.0Like , I'm really talking about blending gener...negative
5Seriously though... If that defender was taken...[-0.9136804938316345, 0.593141496181488, -0.13...negative4.0Seriously though... If that defender was taken...positive
6Chemical Hazard - Advice for Cobram. For more ...[-0.38091930747032166, 0.6411349177360535, 0.1...negative3.0Chemical Hazard - Advice for Cobram. For more ...positive
72,400 jobs are at stake should the deal fall t...[-0.44847720861434937, 0.5910513997077942, -0....negative5.02,400 jobs are at stake should the deal fall t...negative
8Western Cape blood stocks down to just four da...[-0.5973049402236938, 0.3307306170463562, -0.1...negative8.0Western Cape blood stocks down to just four da...negative
9Last night in Sweden 🇸🇪 2 BOMBINGS. - Th...[-0.5309799313545227, -0.5059896111488342, -0....negative0.0Last night in Sweden 🇸🇪 2 BOMBINGS. - Th...positive
10BREAKING: Ukrainian President Volodymyr Zelens...[-1.1355115175247192, -0.24506860971450806, -0...negative0.0BREAKING: Ukrainian President Volodymyr Zelens...positive
11Here's what you can learn from the conservativ...[-1.1318162679672241, 0.4271548390388489, -0.1...negative3.0Here's what you can learn from the conservativ...negative
12Rajneeti News (Stardust: Oldest material on ea...[-0.4446831941604614, -0.11513718217611313, 0....negative0.0Rajneeti News (Stardust: Oldest material on ea...negative
13In 2008, Laskar and Gastineau simulated 2500 f...[-0.7734904885292053, -0.19835318624973297, -0...negative8.0In 2008, Laskar and Gastineau simulated 2500 f...negative
14Why are you still calling it a plane crash ðŸ§...[-0.12437181919813156, 1.112841010093689, 0.27...negative2.0Why are you still calling it a plane crash ðŸ§...negative
15Report recieved of a 9 vehicle RTC on M66 betw...[-0.38135823607444763, 1.1142768859863281, -0....negative0.0Report recieved of a 9 vehicle RTC on M66 betw...positive
16** Cleared ** The vehicles involved in the col...[-0.2368348389863968, 0.9701917171478271, -0.3...negative0.0** Cleared ** The vehicles involved in the col...positive
17105 is the number to call if you have a power ...[-0.8241100907325745, 1.1454272270202637, -0.0...negative3.0105 is the number to call if you have a power ...positive
18Stress is something that affects many of us. I...[-0.9658268690109253, 0.5662633776664734, -0.2...negative6.0Stress is something that affects many of us. I...negative
19Re: #AustraliaBushfires, a question for any #f...[-0.6997240781784058, 0.5168095827102661, 0.12...negative5.0Re: #AustraliaBushfires, a question for any #f...negative
20Enormous exploding sinkhole in China swallows ...[-0.5793707370758057, -0.18351595103740692, -0...negative9.0Enormous exploding sinkhole in China swallows ...positive
21This creature who’s soul is no longer claren...[-0.4663412868976593, 0.04406387358903885, 0.1...negative3.0This creature who’s soul is no longer claren...negative
22In 70 CE Titus the son of the Roman Emperor Ve...[-0.8847564458847046, -0.1676793396472931, -0....negative9.0In 70 CE Titus the son of the Roman Emperor Ve...positive
23#BREAKING: Trudeau says the 57 Canadians kille...[-0.45176926255226135, -0.1339559406042099, -0...negative0.0#BREAKING: Trudeau says the 57 Canadians kille...positive
24airplane accident answers. The US designated t...[-0.2922760844230652, 0.3953195810317993, -0.1...negative5.0airplane accident answers. The US designated t...negative
25Unlike previous State of the nation addresses,...[-0.8965665698051453, 0.9926007986068726, -0.3...negative4.0Unlike previous State of the nation addresses,...negative
26Woodbury takes emergency action to address #wa...[-0.7830190658569336, 0.8649871349334717, 0.20...negative7.0Woodbury takes emergency action to address #wa...positive
27Mudslide closes Kailua-bound lane of Pali High...[-0.42747634649276733, 0.13894236087799072, -0...negative0.0Mudslide closes Kailua-bound lane of Pali High...positive
28ᴏʀᴘʜᴀɴs... ᴛʜᴇʏ'ʀᴇ ɴᴏᴛ ...[-0.5791727304458618, 0.11972904205322266, 0.4...negative9.0ᴏʀᴘʜᴀɴs... ᴛʜᴇʏ'ʀᴇ ɴᴏᴛ ...negative
29Eduardo Degrano looks at the damage to his hom...[-1.1623308658599854, 0.569926917552948, -0.58...negative2.0Eduardo Degrano looks at the damage to his hom...negative
30darinde...how i wish i could put these in hot ...[-1.5163923501968384, 0.6165133118629456, -0.5...negative1.0darinde...how i wish i could put these in hot ...negative
31Earthquake Information No.1 Date and Time: 14 ...[-1.085410714149475, -0.15019290149211884, -0....negative0.0Earthquake Information No.1 Date and Time: 14 ...positive
32“It’s a blight on the country as a whole.â...[-0.5412101745605469, 0.611747682094574, -0.00...negative4.0“It’s a blight on the country as a whole.â...negative
33I don't mind being your enemy if you're an ene...[-0.6546711325645447, 0.43016937375068665, -0....negative1.0I don't mind being your enemy if you're an ene...negative
34A follow-up to yesterday's Pakistan post: In t...[-1.091268539428711, -0.17232197523117065, -0....negative9.0A follow-up to yesterday's Pakistan post: In t...positive
35“Passed away.” This euphemistic trash is p...[-1.0350548028945923, 0.04470200464129448, -0....negative3.0“Passed away.” This euphemistic trash is p...negative
36I wonder how many homes could have been saved ...[-0.6520278453826904, 0.6858862042427063, -0.3...negative3.0I wonder how many homes could have been saved ...negative
37I just zoom it and took ss and feel attack.. H...[-0.9297969937324524, -0.12180554866790771, 0....negative2.0I just zoom it and took ss and feel attack.. H...negative
38Human Body Parts Discovered In Bag In Dublin h...[-0.6394882202148438, 0.8264853954315186, 0.34...negative6.0Human Body Parts Discovered In Bag In Dublin h...positive
39It is not just an Australian problem. We need ...[-0.7464641332626343, 0.9418659806251526, -0.3...negative5.0It is not just an Australian problem. We need ...negative
40If I didn't need my Crutch I would seriously w...[-1.1001498699188232, 1.0316743850708008, -0.3...negative2.0If I didn't need my Crutch I would seriously w...positive
41But it eventually will have to work without......[-0.6016444563865662, 0.95436030626297, -0.232...negative2.0But it eventually will have to work without......negative
42... #MAGA5G.LiVEViL+ my recommended read not f...[-0.8835289478302002, 0.3090209364891052, 0.29...negative2.0... #MAGA5G.LiVEViL+ my recommended read not f...negative
43This earlier collision N'bound between J9 Red ...[-0.19473275542259216, 0.8363125920295715, 0.0...negative8.0This earlier collision N'bound between J9 Red ...positive
44I feel attacked. https://t.co/PrtvRimq6y[-0.4955633580684662, 0.16522228717803955, 0.6...negative2.0I feel attacked. https://t.co/PrtvRimq6ynegative
45For the past several months, after imposing a ...[-0.5843112468719482, 0.04129549860954285, 0.2...negative8.0For the past several months, after imposing a ...positive
46Yuck! Looks like she's wearing a body bag. May...[-1.058443307876587, 0.23154138028621674, -0.4...negative1.0Yuck! Looks like she's wearing a body bag. May...negative
47Such a loss to and the people of NE Fife. pays...[-0.9249093532562256, -0.045939963310956955, -...negative1.0Such a loss to and the people of NE Fife. pays...negative
48This rain going dumb, it’s flooding now[-1.7112693786621094, 0.5982310175895691, -0.2...negative2.0This rain going dumb, it’s flooding nowpositive
49WEATHER ALERT: Severe Thunderstorm Warning inc...[-0.7861663699150085, 0.22981716692447662, -0....negative0.0WEATHER ALERT: Severe Thunderstorm Warning inc...positive
\n" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# 4. Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "26791371-6cec-4fc7-cd18-feee3ab9ff33" + }, + "source": [ + "fitted_pipe.predict(\"All the buildings in the capital were destroyed\")" + ], + "execution_count": 21, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence \\\n", + "0 All the buildings in the capital were destroyed \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.33511286973953247, 0.3084930181503296, -1.... negative \n", + "\n", + " sentiment_confidence \n", + "0 0.99924 " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 21 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## 5. Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "f3a5243e-8d0b-4d05-c78c-2432e227fb1f" + }, + "source": [ + "trainable_pipe.print_info()" + ], + "execution_count": 22, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## 6. 
Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "mptfvHx-MMMX", + "outputId": "57da7cc5-2818-4378-a208-e63a5368ce52" + }, + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs; let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 23, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.56 1.00 0.72 28\n", + " positive 0.00 0.00 0.00 22\n", + "\n", + " accuracy 0.56 50\n", + " macro avg 0.28 0.50 0.36 50\n", + "weighted avg 0.31 0.56 0.40 50\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Arsonist sets cars ablaze at dealership https:... \n", + "1 Travis Manawa [about Brandon's group]: I think... \n", + "2 If a scientist said if you jump off a cliff yo... \n", + "3 #StormBrendon is also bringing high winds, so ... \n", + "4 Like , I'm really talking about blending gener... \n", + "5 Seriously though... If that defender was taken... \n", + "6 Chemical Hazard - Advice for Cobram. For more ... 
\n", + "7 2,400 jobs are at stake should the deal fall t... \n", + "8 Western Cape blood stocks down to just four da... \n", + "9 Last night in Sweden 🇸🇪 2 BOMBINGS. - Th... \n", + "10 BREAKING: Ukrainian President Volodymyr Zelens... \n", + "11 Here's what you can learn from the conservativ... \n", + "12 Rajneeti News (Stardust: Oldest material on ea... \n", + "13 In 2008, Laskar and Gastineau simulated 2500 f... \n", + "14 Why are you still calling it a plane crash ... \n", + "15 Report recieved of a 9 vehicle RTC on M66 betw... \n", + "16 ** Cleared ** The vehicles involved in the col... \n", + "17 105 is the number to call if you have a power ... \n", + "18 Stress is something that affects many of us. I... \n", + "19 Re: #AustraliaBushfires, a question for any #f... \n", + "20 Enormous exploding sinkhole in China swallows ... \n", + "21 This creature who’s soul is no longer claren... \n", + "22 In 70 CE Titus the son of the Roman Emperor Ve... \n", + "23 #BREAKING: Trudeau says the 57 Canadians kille... \n", + "24 airplane accident answers. The US designated t... \n", + "25 Unlike previous State of the nation addresses,... \n", + "26 Woodbury takes emergency action to address #wa... \n", + "27 Mudslide closes Kailua-bound lane of Pali High... \n", + "28 ᴏʀᴘʜᴀɴs... ᴛʜᴇʏ'ʀᴇ ɴᴏᴛ ... \n", + "29 Eduardo Degrano looks at the damage to his hom... \n", + "30 darinde...how i wish i could put these in hot ... \n", + "31 Earthquake Information No.1 Date and Time: 14 ... \n", + "32 “It’s a blight on the country as a whole.... \n", + "33 I don't mind being your enemy if you're an ene... \n", + "34 A follow-up to yesterday's Pakistan post: In t... \n", + "35 “Passed away.” This euphemistic trash is p... \n", + "36 I wonder how many homes could have been saved ... \n", + "37 I just zoom it and took ss and feel attack.. H... \n", + "38 Human Body Parts Discovered In Bag In Dublin h... \n", + "39 It is not just an Australian problem. We need ... 
\n", + "40 If I didn't need my Crutch I would seriously w... \n", + "41 But it eventually will have to work without...... \n", + "42 ... #MAGA5G.LiVEViL+ my recommended read not f... \n", + "43 This earlier collision N'bound between J9 Red ... \n", + "44 I feel attacked. https://t.co/PrtvRimq6y \n", + "45 For the past several months, after imposing a ... \n", + "46 Yuck! Looks like she's wearing a body bag. May... \n", + "47 Such a loss to and the people of NE Fife. pays... \n", + "48 This rain going dumb, it’s flooding now \n", + "49 WEATHER ALERT: Severe Thunderstorm Warning inc... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.1667916625738144, 1.0302923917770386, 0.18... negative \n", + "1 [-0.9610550999641418, 0.13062980771064758, -0.... negative \n", + "2 [-0.6688402891159058, 0.640354335308075, 0.369... negative \n", + "3 [-1.0540131330490112, 0.8802893757820129, -0.6... negative \n", + "4 [-1.0963889360427856, -0.39644378423690796, 0.... negative \n", + "5 [-0.9136804938316345, 0.593141496181488, -0.13... negative \n", + "6 [-0.38091930747032166, 0.6411349177360535, 0.1... negative \n", + "7 [-0.44847720861434937, 0.5910513997077942, -0.... negative \n", + "8 [-0.5973049402236938, 0.3307306170463562, -0.1... negative \n", + "9 [-0.5309799313545227, -0.5059896111488342, -0.... negative \n", + "10 [-1.1355115175247192, -0.24506860971450806, -0... negative \n", + "11 [-1.1318162679672241, 0.4271548390388489, -0.1... negative \n", + "12 [-0.4446831941604614, -0.11513718217611313, 0.... negative \n", + "13 [-0.7734904885292053, -0.19835318624973297, -0... negative \n", + "14 [-0.12437181919813156, 1.112841010093689, 0.27... negative \n", + "15 [-0.38135823607444763, 1.1142768859863281, -0.... negative \n", + "16 [-0.2368348389863968, 0.9701917171478271, -0.3... negative \n", + "17 [-0.8241100907325745, 1.1454272270202637, -0.0... negative \n", + "18 [-0.9658268690109253, 0.5662633776664734, -0.2... 
negative \n", + "19 [-0.6997240781784058, 0.5168095827102661, 0.12... negative \n", + "20 [-0.5793707370758057, -0.18351595103740692, -0... negative \n", + "21 [-0.4663412868976593, 0.04406387358903885, 0.1... negative \n", + "22 [-0.8847564458847046, -0.1676793396472931, -0.... negative \n", + "23 [-0.45176926255226135, -0.1339559406042099, -0... negative \n", + "24 [-0.2922760844230652, 0.3953195810317993, -0.1... negative \n", + "25 [-0.8965665698051453, 0.9926007986068726, -0.3... negative \n", + "26 [-0.7830190658569336, 0.8649871349334717, 0.20... negative \n", + "27 [-0.42747634649276733, 0.13894236087799072, -0... negative \n", + "28 [-0.5791727304458618, 0.11972904205322266, 0.4... negative \n", + "29 [-1.1623308658599854, 0.569926917552948, -0.58... negative \n", + "30 [-1.5163923501968384, 0.6165133118629456, -0.5... negative \n", + "31 [-1.085410714149475, -0.15019290149211884, -0.... negative \n", + "32 [-0.5412101745605469, 0.611747682094574, -0.00... negative \n", + "33 [-0.6546711325645447, 0.43016937375068665, -0.... negative \n", + "34 [-1.091268539428711, -0.17232197523117065, -0.... negative \n", + "35 [-1.0350548028945923, 0.04470200464129448, -0.... negative \n", + "36 [-0.6520278453826904, 0.6858862042427063, -0.3... negative \n", + "37 [-0.9297969937324524, -0.12180554866790771, 0.... negative \n", + "38 [-0.6394882202148438, 0.8264853954315186, 0.34... negative \n", + "39 [-0.7464641332626343, 0.9418659806251526, -0.3... negative \n", + "40 [-1.1001498699188232, 1.0316743850708008, -0.3... negative \n", + "41 [-0.6016444563865662, 0.95436030626297, -0.232... negative \n", + "42 [-0.8835289478302002, 0.3090209364891052, 0.29... negative \n", + "43 [-0.19473275542259216, 0.8363125920295715, 0.0... negative \n", + "44 [-0.4955633580684662, 0.16522228717803955, 0.6... negative \n", + "45 [-0.5843112468719482, 0.04129549860954285, 0.2... negative \n", + "46 [-1.058443307876587, 0.23154138028621674, -0.4... 
negative \n", + "47 [-0.9249093532562256, -0.045939963310956955, -... negative \n", + "48 [-1.7112693786621094, 0.5982310175895691, -0.2... negative \n", + "49 [-0.7861663699150085, 0.22981716692447662, -0.... negative \n", + "\n", + " sentiment_confidence text \\\n", + "0 0.0 Arsonist sets cars ablaze at dealership https:... \n", + "1 8.0 Travis Manawa [about Brandon's group]: I think... \n", + "2 3.0 If a scientist said if you jump off a cliff yo... \n", + "3 8.0 #StormBrendon is also bringing high winds, so ... \n", + "4 1.0 Like , I'm really talking about blending gener... \n", + "5 5.0 Seriously though... If that defender was taken... \n", + "6 9.0 Chemical Hazard - Advice for Cobram. For more ... \n", + "7 6.0 2,400 jobs are at stake should the deal fall t... \n", + "8 0.0 Western Cape blood stocks down to just four da... \n", + "9 0.0 Last night in Sweden 🇸🇪 2 BOMBINGS. - Th... \n", + "10 0.0 BREAKING: Ukrainian President Volodymyr Zelens... \n", + "11 1.0 Here's what you can learn from the conservativ... \n", + "12 0.0 Rajneeti News (Stardust: Oldest material on ea... \n", + "13 0.0 In 2008, Laskar and Gastineau simulated 2500 f... \n", + "14 4.0 Why are you still calling it a plane crash ... \n", + "15 0.0 Report recieved of a 9 vehicle RTC on M66 betw... \n", + "16 0.0 ** Cleared ** The vehicles involved in the col... \n", + "17 6.0 105 is the number to call if you have a power ... \n", + "18 2.0 Stress is something that affects many of us. I... \n", + "19 0.0 Re: #AustraliaBushfires, a question for any #f... \n", + "20 0.0 Enormous exploding sinkhole in China swallows ... \n", + "21 3.0 This creature who’s soul is no longer claren... \n", + "22 0.0 In 70 CE Titus the son of the Roman Emperor Ve... \n", + "23 0.0 #BREAKING: Trudeau says the 57 Canadians kille... \n", + "24 0.0 airplane accident answers. The US designated t... \n", + "25 3.0 Unlike previous State of the nation addresses,... 
\n", + "26 0.0 Woodbury takes emergency action to address #wa... \n", + "27 0.0 Mudslide closes Kailua-bound lane of Pali High... \n", + "28 2.0 ᴏʀᴘʜᴀɴs... ᴛʜᴇʏ'ʀᴇ ɴᴏᴛ ... \n", + "29 7.0 Eduardo Degrano looks at the damage to his hom... \n", + "30 3.0 darinde...how i wish i could put these in hot ... \n", + "31 0.0 Earthquake Information No.1 Date and Time: 14 ... \n", + "32 6.0 “It’s a blight on the country as a whole.... \n", + "33 7.0 I don't mind being your enemy if you're an ene... \n", + "34 0.0 A follow-up to yesterday's Pakistan post: In t... \n", + "35 2.0 “Passed away.” This euphemistic trash is p... \n", + "36 4.0 I wonder how many homes could have been saved ... \n", + "37 1.0 I just zoom it and took ss and feel attack.. H... \n", + "38 0.0 Human Body Parts Discovered In Bag In Dublin h... \n", + "39 5.0 It is not just an Australian problem. We need ... \n", + "40 7.0 If I didn't need my Crutch I would seriously w... \n", + "41 3.0 But it eventually will have to work without...... \n", + "42 2.0 ... #MAGA5G.LiVEViL+ my recommended read not f... \n", + "43 0.0 This earlier collision N'bound between J9 Red ... \n", + "44 6.0 I feel attacked. https://t.co/PrtvRimq6y \n", + "45 0.0 For the past several months, after imposing a ... \n", + "46 2.0 Yuck! Looks like she's wearing a body bag. May... \n", + "47 5.0 Such a loss to and the people of NE Fife. pays... \n", + "48 2.0 This rain going dumb, it’s flooding now \n", + "49 0.0 WEATHER ALERT: Severe Thunderstorm Warning inc... 
\n", + "\n", + " y \n", + "0 positive \n", + "1 negative \n", + "2 negative \n", + "3 positive \n", + "4 negative \n", + "5 positive \n", + "6 positive \n", + "7 negative \n", + "8 negative \n", + "9 positive \n", + "10 positive \n", + "11 negative \n", + "12 negative \n", + "13 negative \n", + "14 negative \n", + "15 positive \n", + "16 positive \n", + "17 positive \n", + "18 negative \n", + "19 negative \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 positive \n", + "24 negative \n", + "25 negative \n", + "26 positive \n", + "27 positive \n", + "28 negative \n", + "29 negative \n", + "30 negative \n", + "31 positive \n", + "32 negative \n", + "33 negative \n", + "34 positive \n", + "35 negative \n", + "36 negative \n", + "37 negative \n", + "38 positive \n", + "39 negative \n", + "40 positive \n", + "41 negative \n", + "42 negative \n", + "43 positive \n", + "44 negative \n", + "45 positive \n", + "46 negative \n", + "47 negative \n", + "48 positive \n", + "49 positive " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 23 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# 7. Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxWFzQOhjWC8", + "outputId": "a05a0ef1-4d3e-4c7f-87b7-3ff1849608a3" + }, + "source": [ + "# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ], + "execution_count": 24, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj 
sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP 
model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj 
sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj 
sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", 
+ "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + 
"nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IKK_Ii_gjJfF", + "outputId": "ea0aed1f-ffd4-4e0c-f85d-d977fbecdcb0" + }, + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n", + "# We usually need to train longer and use a smaller learning rate for sentence embeddings that are not USE-based\n", + "# We could tune the hyperparameters further with methods like grid search\n", + "# Longer training also tends to improve accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "# Predict with the fitted pipeline on the training dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ], + "execution_count": 25, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_768 download started this may take some time.\n", + "Approximate size to download 392.9 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.88 0.85 0.86 1203\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.88 0.84 0.86 1197\n", + "\n", + " accuracy 0.85 2400\n", + " macro avg 0.59 0.56 0.58 2400\n", + "weighted avg 0.88 0.85 0.86 2400\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1jxw3GnVGlI" + }, + "source": [ + "# 7.1 Evaluate on test data" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Fxx4yNkNVGFl", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "33633053-c610-4438-cced-15052ff15ffa" + }, + "source": [ + "preds = fitted_pipe.predict(test_df,output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))" + ], + "execution_count": 26, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.85 0.83 0.84 297\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.85 0.81 0.83 303\n", + "\n", + " accuracy 0.82 600\n", + " macro avg 0.57 0.55 0.56 600\n", + "weighted avg 0.85 0.82 0.84 600\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 8. 
Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": 27, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 9. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlp.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "id": "SO4uz45MoRgp", + "outputId": "e9b8ab40-ee03-47b2-de2b-99a0cdb54b5b" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('All the buildings in the capital were destroyed')\n", + "preds" + ], + "execution_count": 28, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 All the buildings in the capital were destroyed \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [-0.39346593618392944, 0.33815115690231323, -0... positive \n", + "\n", + " sentiment_confidence \n", + "0 0.0 " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 28 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "adae61bf-6000-4e21-f24c-d8a547449cd1" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": 29, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "F_mfqyyyKGkV" + }, + "source": [], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_reddit.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_reddit.ipynb index 0afe36d8..ab2cc6cc 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_reddit.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_reddit.ipynb @@ -1 +1,3059 @@ 
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo_reddit.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_reddit.ipynb)\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 class Reddit comment sentiment classifier training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. 
Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620193274509,"user_tz":-120,"elapsed":122976,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"f1d2ff21-6e64-4c99-fcae-ae5d060d0530"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:39:12-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","\r- 0%[ ] 0 --.-KB/s \rInstalling NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 05:39:12 (39.8 MB/s) - written to stdout [1671/1671]\n","\n","\u001b[K |████████████████████████████████| 204.8MB 64kB/s \n","\u001b[K |████████████████████████████████| 153kB 47.9MB/s \n","\u001b[K |████████████████████████████████| 204kB 17.3MB/s \n","\u001b[K |████████████████████████████████| 204kB 49.5MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download Reddit Sentiment dataset \n","https://www.kaggle.com/cosmos98/twitter-and-reddit-sentimental-analysis-dataset\n","#Context\n","\n","This dataset was created as part of a university project on sentiment analysis of multi-source social media platforms using PySpark."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620193275331,"user_tz":-120,"elapsed":123790,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"b398d517-d3af-4205-83ac-7994eb80fffc"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/01/Reddit_Data.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:41:14-- http://ckl-it.de/wp-content/uploads/2021/01/Reddit_Data.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 153265 (150K) [text/csv]\n","Saving to: ‘Reddit_Data.csv’\n","\n","Reddit_Data.csv 100%[===================>] 149.67K 402KB/s in 0.4s \n","\n","2021-05-05 05:41:14 (402 KB/s) - ‘Reddit_Data.csv’ saved [153265/153265]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":406},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620193275333,"user_tz":-120,"elapsed":123787,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"34fa4591-c90f-4147-f536-038a9f383019"},"source":["import pandas as pd\n","train_path = '/content/Reddit_Data.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","columns=['text','y']\n","train_df = train_df[columns]\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" text y\n","0 its true they had cut the power what douchebag... positive\n","1 fuck giroud better finishing like this month positive\n","2 looks shit now but still proud made positive\n","3 pelor the burning hate the best evil god negative\n","4 can ask what you with something this powerful positive\n",".. ... ...\n","595 bangali desh bechne main sabse aage positive\n","596 national media channels were gaged not cover t... positive\n","597 been following these threads from the beginni... negative\n","598 pretty sure this sarcasm satire the news 1500... positive\n","599 much would love for namo our next hard imagin... positive\n","\n","[600 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","Your dataset's label column should be named 'y', and the feature column with the text data should be named 'text'."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620195831144,"user_tz":-120,"elapsed":11077,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"40e942ab-8923-4f23-acc6-e600cf4e7e16"},"source":["from sklearn.metrics import classification_report\n","import nlu \n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a dataset with label and text columns\n","# by default, Universal Sentence Encoder (USE) sentence embeddings are used as the input features\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 24\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.72 1.00 0.84 26\n","\n"," accuracy 0.52 50\n"," macro avg 0.24 0.33 0.28 50\n","weighted avg 0.38 0.52 0.44 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
origin_indextextysentencesentence_embedding_usetrained_sentimenttrained_sentiment_confidencedocument
00its true they had cut the power what douchebag...positive[its true they had cut the power what doucheba...[0.033111296594142914, 0.053994592279195786, -...positive0.626655its true they had cut the power what douchebag...
11fuck giroud better finishing like this monthpositive[fuck giroud better finishing like this month][0.0678204670548439, 0.01411951333284378, -0.0...positive0.653644fuck giroud better finishing like this month
22looks shit now but still proud madepositive[looks shit now but still proud made][0.03247417137026787, -0.09844466298818588, -0...positive0.660186looks shit now but still proud made
33pelor the burning hate the best evil godnegative[pelor the burning hate the best evil god][0.04032062366604805, 0.07666622847318649, -0....neutral0.578461pelor the burning hate the best evil god
44can ask what you with something this powerfulpositive[can ask what you with something this powerful][0.015518003143370152, -0.05116305500268936, -...positive0.691478can ask what you with something this powerful
55aap’ shazia ilmi from puram constituency lag...negative[aap’ shazia ilmi from puram constituency la...[0.02478150464594364, -0.06508146971464157, -0...positive0.612378aap’ shazia ilmi from puram constituency lag...
66fuck yeahnegative[fuck yeah][0.046024102717638016, -0.02504798397421837, -...neutral0.586349fuck yeah
77honestly really surprised alice ranked that lo...positive[honestly really surprised alice ranked that l...[-0.035716041922569275, -0.04127982258796692, ...positive0.654837honestly really surprised alice ranked that lo...
88didn care about politics before now hatenegative[didn care about politics before now hate][-0.006816443987190723, 0.06221264228224754, -...neutral0.581450didn care about politics before now hate
99hard nips and goosebumpsnegative[hard nips and goosebumps][-0.02919699251651764, -0.030449824407696724, ...neutral0.580563hard nips and goosebumps
1010varadabhai ndtv trying too well dilute bjp tre...negative[varadabhai ndtv trying too well dilute bjp tr...[0.04727796092629433, -0.06792476028203964, -0...positive0.621601varadabhai ndtv trying too well dilute bjp tre...
1111old man has lost his mindpositive[old man has lost his mind][0.039657335728406906, -0.04277808964252472, -...positive0.639790old man has lost his mind
1212why this being downvoted you might ask both mo...negative[why this being downvoted you might ask both m...[0.06581216305494308, -0.06079106032848358, -0...positive0.629655why this being downvoted you might ask both mo...
1313hasnt changed all apolitical before simply do...positive[hasnt changed all apolitical before simply do...[0.03509754315018654, -0.004639611579477787, -...positive0.624468hasnt changed all apolitical before simply don...
1414for one campaign pretty much just snatched the...negative[for one campaign pretty much just snatched th...[0.017386479303240776, 0.0443551279604435, -0....positive0.618989for one campaign pretty much just snatched the...
1515vajpayee managed forge much broader coalition ...positive[vajpayee managed forge much broader coalition...[0.0372871570289135, -0.051079731434583664, -0...positive0.661443vajpayee managed forge much broader coalition ...
1616lol this only proves how desperate they are ge...positive[lol this only proves how desperate they are g...[0.05233633145689964, -0.03147873282432556, 0....positive0.615814lol this only proves how desperate they are ge...
1717dont hate aap but your questions are example ...negative[dont hate aap but your questions are example ...[0.026356501504778862, -0.04044199362397194, -...positive0.614039dont hate aap but your questions are example w...
1818what were the other policies you discussed not...negative[what were the other policies you discussed no...[-0.07521010935306549, 0.008543566800653934, 0...neutral0.581745what were the other policies you discussed not...
1919wow lots favorites this bracket haqua tsukushi...positive[wow lots favorites this bracket haqua tsukush...[-0.069316066801548, -0.015458517707884312, -0...positive0.663116wow lots favorites this bracket haqua tsukushi...
2020sorry know this isn what you asked just ventin...negative[sorry know this isn what you asked just venti...[0.016777774319052696, -0.05478339642286301, -...positive0.618847sorry know this isn what you asked just ventin...
2121coming out strongly against gujarat chief min...positive[coming out strongly against gujarat chief min...[0.06856724619865417, -0.019821861758828163, -...positive0.640057coming out strongly against gujarat chief mini...
2222there one tool bjp can use their manifesto whi...positive[there one tool bjp can use their manifesto wh...[0.057847339659929276, -0.05365725979208946, -...positive0.648467there one tool bjp can use their manifesto whi...
2323jakiro spotted the middle top maybepositive[jakiro spotted the middle top maybe][-0.011690961197018623, -0.024473996832966805,...positive0.678376jakiro spotted the middle top maybe
2424family mormon have never tried explain them t...positive[family mormon have never tried explain them t...[0.03987010195851326, -0.0009543427731841803, ...positive0.678150family mormon have never tried explain them th...
2525with these results would have grudgingly accep...negative[with these results would have grudgingly acce...[0.034668292850255966, -0.05392604321241379, -...positive0.632179with these results would have grudgingly accep...
2626tea partier expresses support for namo after ...negative[tea partier expresses support for namo after ...[0.032365716993808746, -0.056087080389261246, ...neutral0.587468tea partier expresses support for namo after e...
2727politically would stupid move take stand right...negative[politically would stupid move take stand righ...[-0.00040777752292342484, -0.01262842211872339...neutral0.593538politically would stupid move take stand right...
2828wtf whynegative[wtf why][0.025807170197367668, -0.07080958038568497, -...neutral0.584124wtf why
2929have actually seen lot users views change dur...positive[have actually seen lot users views change dur...[-0.009333955124020576, 0.01388698909431696, -...positive0.649225have actually seen lot users views change duri...
3030truth told there not insignificant percentage ...positive[truth told there not insignificant percentage...[0.03927519917488098, -0.05597652122378349, -0...positive0.649703truth told there not insignificant percentage ...
3131was anti bjp and neutral cong became anti bjp ...positive[was anti bjp and neutral cong became anti bjp...[0.03805134445428848, -0.030298737809062004, -...positive0.639704was anti bjp and neutral cong became anti bjp ...
3232most religions have dogmatic orthodox well eso...positive[most religions have dogmatic orthodox well es...[0.03939439728856087, -0.02040349319577217, -0...positive0.684981most religions have dogmatic orthodox well eso...
3333laureatte sen said christian schools are perfe...positive[laureatte sen said christian schools are perf...[0.05267934128642082, 0.05836360529065132, 0.0...positive0.647901laureatte sen said christian schools are perfe...
3434need stop watching the garbage that you watch...positive[need stop watching the garbage that you watch...[-0.012382612563669682, 0.01988200470805168, 0...positive0.653579need stop watching the garbage that you watch ...
3535gandhi mandela hitler mao plato chandragupt ma...negative[gandhi mandela hitler mao plato chandragupt m...[0.027552243322134018, 0.013075066730380058, 0...neutral0.593626gandhi mandela hitler mao plato chandragupt ma...
3636hate aap for the other thread points such the...negative[hate aap for the other thread points such the...[0.014617362059652805, -0.038017578423023224, ...positive0.618540hate aap for the other thread points such the ...
3737absolutely agree with you subsidies the worst ...negative[absolutely agree with you subsidies the worst...[0.0109744006767869, 0.0033110964577645063, -0...positive0.623482absolutely agree with you subsidies the worst ...
3838are you corrupt mind have you benefited throu...negative[are you corrupt mind have you benefited throu...[0.03834373503923416, -0.06521473079919815, -0...neutral0.599771are you corrupt mind have you benefited throug...
3939congress needs bogeyman modi without the bad g...positive[congress needs bogeyman modi without the bad ...[0.03138439729809761, -0.06221967190504074, -0...positive0.649323congress needs bogeyman modi without the bad g...
4040protip don type uppercase text all caps harder...negative[protip don type uppercase text all caps harde...[0.044019922614097595, 0.025341013446450233, 0...neutral0.567544protip don type uppercase text all caps harder...
4141brother trog very wrathful indeed but his wil...positive[brother trog very wrathful indeed but his wil...[-0.024625714868307114, 0.06193268671631813, 0...positive0.654243brother trog very wrathful indeed but his will...
4242start off saying that the craftsmanship this ...positive[start off saying that the craftsmanship this ...[0.05780624598264694, -0.06291750818490982, -0...positive0.707086start off saying that the craftsmanship this p...
4343have made request unban namoarmy hell moron h...negative[have made request unban namoarmy hell moron h...[0.015555822290480137, -0.012748800218105316, ...neutral0.578515have made request unban namoarmy hell moron ho...
4444child modi worked his father’ tea shop and ...negative[child modi worked his father’ tea shop and ...[0.05774841457605362, -0.05956699699163437, -0...positive0.608462child modi worked his father’ tea shop and y...
4545namo tea yuupea horrible rhyme knownegative[namo tea yuupea horrible rhyme know][0.025534288957715034, 0.004176765214651823, -...neutral0.576838namo tea yuupea horrible rhyme know
4646great agility from akpom cut back and bendpositive[great agility from akpom cut back and bend][0.06865684688091278, -0.02164856530725956, -0...positive0.676614great agility from akpom cut back and bend
4747from undecided pro aap they are not perfect bu...positive[from undecided pro aap they are not perfect b...[0.01590304635465145, -0.0683458000421524, -0....positive0.651092from undecided pro aap they are not perfect bu...
4848woah there don insane with pray mean you don w...negative[woah there don insane with pray mean you don ...[0.050547026097774506, -0.01725909113883972, 0...neutral0.573394woah there don insane with pray mean you don w...
4949porngress wont announce their candidate cuz th...positive[porngress wont announce their candidate cuz t...[0.05935536324977875, -0.051609162241220474, -...positive0.662533porngress wont announce their candidate cuz th...
\n","
"],"text/plain":[" origin_index ... document\n","0 0 ... its true they had cut the power what douchebag...\n","1 1 ... fuck giroud better finishing like this month\n","2 2 ... looks shit now but still proud made\n","3 3 ... pelor the burning hate the best evil god\n","4 4 ... can ask what you with something this powerful\n","5 5 ... aap’ shazia ilmi from puram constituency lag...\n","6 6 ... fuck yeah\n","7 7 ... honestly really surprised alice ranked that lo...\n","8 8 ... didn care about politics before now hate\n","9 9 ... hard nips and goosebumps\n","10 10 ... varadabhai ndtv trying too well dilute bjp tre...\n","11 11 ... old man has lost his mind\n","12 12 ... why this being downvoted you might ask both mo...\n","13 13 ... hasnt changed all apolitical before simply don...\n","14 14 ... for one campaign pretty much just snatched the...\n","15 15 ... vajpayee managed forge much broader coalition ...\n","16 16 ... lol this only proves how desperate they are ge...\n","17 17 ... dont hate aap but your questions are example w...\n","18 18 ... what were the other policies you discussed not...\n","19 19 ... wow lots favorites this bracket haqua tsukushi...\n","20 20 ... sorry know this isn what you asked just ventin...\n","21 21 ... coming out strongly against gujarat chief mini...\n","22 22 ... there one tool bjp can use their manifesto whi...\n","23 23 ... jakiro spotted the middle top maybe\n","24 24 ... family mormon have never tried explain them th...\n","25 25 ... with these results would have grudgingly accep...\n","26 26 ... tea partier expresses support for namo after e...\n","27 27 ... politically would stupid move take stand right...\n","28 28 ... wtf why\n","29 29 ... have actually seen lot users views change duri...\n","30 30 ... truth told there not insignificant percentage ...\n","31 31 ... was anti bjp and neutral cong became anti bjp ...\n","32 32 ... most religions have dogmatic orthodox well eso...\n","33 33 ... 
laureatte sen said christian schools are perfe...\n","34 34 ... need stop watching the garbage that you watch ...\n","35 35 ... gandhi mandela hitler mao plato chandragupt ma...\n","36 36 ... hate aap for the other thread points such the ...\n","37 37 ... absolutely agree with you subsidies the worst ...\n","38 38 ... are you corrupt mind have you benefited throug...\n","39 39 ... congress needs bogeyman modi without the bad g...\n","40 40 ... protip don type uppercase text all caps harder...\n","41 41 ... brother trog very wrathful indeed but his will...\n","42 42 ... start off saying that the craftsmanship this p...\n","43 43 ... have made request unban namoarmy hell moron ho...\n","44 44 ... child modi worked his father’ tea shop and y...\n","45 45 ... namo tea yuupea horrible rhyme know\n","46 46 ... great agility from akpom cut back and bend\n","47 47 ... from undecided pro aap they are not perfect bu...\n","48 48 ... woah there don insane with pray mean you don w...\n","49 49 ... porngress wont announce their candidate cuz th...\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# Test the fitted pipe on new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":77},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620195831996,"user_tz":-120,"elapsed":11264,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"68ed8568-654b-41bc-fa22-0ea363e1fb2d"},"source":["fitted_pipe.predict(\"Indian prime minister was assinated!\")"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
origin_indexsentencesentence_embedding_usetrained_sentimenttrained_sentiment_confidencedocument
00[Indian prime minister was assinated!][0.012644989416003227, -0.04661174491047859, -...positive0.6117Indian prime minister was assinated!
\n","
"],"text/plain":[" origin_index ... document\n","0 0 ... Indian prime minister was assinated!\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620195831997,"user_tz":-120,"elapsed":11180,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"69dd4215-3f29-4231-9213-0fe956bc520f"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3350ae7a) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3350ae7a\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620195837227,"user_tz":-120,"elapsed":16321,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"a46b8835-74db-4f5e-dd91-87d9277cb5b3"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 1.00 0.83 0.91 24\n"," neutral 0.00 0.00 0.00 0\n"," positive 1.00 1.00 1.00 26\n","\n"," accuracy 0.92 50\n"," macro avg 0.67 0.61 0.64 50\n","weighted avg 1.00 0.92 0.96 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
origin_indextextysentencesentence_embedding_usetrained_sentimenttrained_sentiment_confidencedocument
00its true they had cut the power what douchebag...positive[its true they had cut the power what doucheba...[0.033111296594142914, 0.053994592279195786, -...positive0.761194its true they had cut the power what douchebag...
11fuck giroud better finishing like this monthpositive[fuck giroud better finishing like this month][0.0678204670548439, 0.01411951333284378, -0.0...positive0.938677fuck giroud better finishing like this month
22looks shit now but still proud madepositive[looks shit now but still proud made][0.03247417137026787, -0.09844466298818588, -0...positive0.954937looks shit now but still proud made
33pelor the burning hate the best evil godnegative[pelor the burning hate the best evil god][0.04032062366604805, 0.07666622847318649, -0....negative0.810980pelor the burning hate the best evil god
44can ask what you with something this powerfulpositive[can ask what you with something this powerful][0.015518003143370152, -0.05116305500268936, -...positive0.956043can ask what you with something this powerful
55aap’ shazia ilmi from puram constituency lag...negative[aap’ shazia ilmi from puram constituency la...[0.02478150464594364, -0.06508146971464157, -0...negative0.708917aap’ shazia ilmi from puram constituency lag...
66fuck yeahnegative[fuck yeah][0.046024102717638016, -0.02504798397421837, -...negative0.731940fuck yeah
77honestly really surprised alice ranked that lo...positive[honestly really surprised alice ranked that l...[-0.035716041922569275, -0.04127982258796692, ...positive0.966494honestly really surprised alice ranked that lo...
88didn care about politics before now hatenegative[didn care about politics before now hate][-0.006816443987190723, 0.06221264228224754, -...negative0.672320didn care about politics before now hate
99hard nips and goosebumpsnegative[hard nips and goosebumps][-0.02919699251651764, -0.030449824407696724, ...negative0.604969hard nips and goosebumps
1010varadabhai ndtv trying too well dilute bjp tre...negative[varadabhai ndtv trying too well dilute bjp tr...[0.04727796092629433, -0.06792476028203964, -0...negative0.639880varadabhai ndtv trying too well dilute bjp tre...
1111old man has lost his mindpositive[old man has lost his mind][0.039657335728406906, -0.04277808964252472, -...positive0.929136old man has lost his mind
1212why this being downvoted you might ask both mo...negative[why this being downvoted you might ask both m...[0.06581216305494308, -0.06079106032848358, -0...neutral0.546161why this being downvoted you might ask both mo...
1313hasnt changed all apolitical before simply do...positive[hasnt changed all apolitical before simply do...[0.03509754315018654, -0.004639611579477787, -...positive0.883017hasnt changed all apolitical before simply don...
1414for one campaign pretty much just snatched the...negative[for one campaign pretty much just snatched th...[0.017386479303240776, 0.0443551279604435, -0....negative0.636396for one campaign pretty much just snatched the...
1515vajpayee managed forge much broader coalition ...positive[vajpayee managed forge much broader coalition...[0.0372871570289135, -0.051079731434583664, -0...positive0.848566vajpayee managed forge much broader coalition ...
1616lol this only proves how desperate they are ge...positive[lol this only proves how desperate they are g...[0.05233633145689964, -0.03147873282432556, 0....positive0.819890lol this only proves how desperate they are ge...
1717dont hate aap but your questions are example ...negative[dont hate aap but your questions are example ...[0.026356501504778862, -0.04044199362397194, -...negative0.724538dont hate aap but your questions are example w...
1818what were the other policies you discussed not...negative[what were the other policies you discussed no...[-0.07521010935306549, 0.008543566800653934, 0...negative0.732422what were the other policies you discussed not...
1919wow lots favorites this bracket haqua tsukushi...positive[wow lots favorites this bracket haqua tsukush...[-0.069316066801548, -0.015458517707884312, -0...positive0.971349wow lots favorites this bracket haqua tsukushi...
2020sorry know this isn what you asked just ventin...negative[sorry know this isn what you asked just venti...[0.016777774319052696, -0.05478339642286301, -...negative0.623325sorry know this isn what you asked just ventin...
2121coming out strongly against gujarat chief min...positive[coming out strongly against gujarat chief min...[0.06856724619865417, -0.019821861758828163, -...positive0.736283coming out strongly against gujarat chief mini...
2222there one tool bjp can use their manifesto whi...positive[there one tool bjp can use their manifesto wh...[0.057847339659929276, -0.05365725979208946, -...positive0.870023there one tool bjp can use their manifesto whi...
2323jakiro spotted the middle top maybepositive[jakiro spotted the middle top maybe][-0.011690961197018623, -0.024473996832966805,...positive0.965604jakiro spotted the middle top maybe
2424family mormon have never tried explain them t...positive[family mormon have never tried explain them t...[0.03987010195851326, -0.0009543427731841803, ...positive0.964053family mormon have never tried explain them th...
2525with these results would have grudgingly accep...negative[with these results would have grudgingly acce...[0.034668292850255966, -0.05392604321241379, -...neutral0.521401with these results would have grudgingly accep...
2626tea partier expresses support for namo after ...negative[tea partier expresses support for namo after ...[0.032365716993808746, -0.056087080389261246, ...negative0.837552tea partier expresses support for namo after e...
2727politically would stupid move take stand right...negative[politically would stupid move take stand righ...[-0.00040777752292342484, -0.01262842211872339...neutral0.541656politically would stupid move take stand right...
2828wtf whynegative[wtf why][0.025807170197367668, -0.07080958038568497, -...negative0.747054wtf why
2929have actually seen lot users views change dur...positive[have actually seen lot users views change dur...[-0.009333955124020576, 0.01388698909431696, -...positive0.818759have actually seen lot users views change duri...
3030truth told there not insignificant percentage ...positive[truth told there not insignificant percentage...[0.03927519917488098, -0.05597652122378349, -0...positive0.776765truth told there not insignificant percentage ...
3131was anti bjp and neutral cong became anti bjp ...positive[was anti bjp and neutral cong became anti bjp...[0.03805134445428848, -0.030298737809062004, -...positive0.630857was anti bjp and neutral cong became anti bjp ...
3232most religions have dogmatic orthodox well eso...positive[most religions have dogmatic orthodox well es...[0.03939439728856087, -0.02040349319577217, -0...positive0.972607most religions have dogmatic orthodox well eso...
3333laureatte sen said christian schools are perfe...positive[laureatte sen said christian schools are perf...[0.05267934128642082, 0.05836360529065132, 0.0...positive0.911020laureatte sen said christian schools are perfe...
3434need stop watching the garbage that you watch...positive[need stop watching the garbage that you watch...[-0.012382612563669682, 0.01988200470805168, 0...positive0.954440need stop watching the garbage that you watch ...
3535gandhi mandela hitler mao plato chandragupt ma...negative[gandhi mandela hitler mao plato chandragupt m...[0.027552243322134018, 0.013075066730380058, 0...negative0.767667gandhi mandela hitler mao plato chandragupt ma...
3636hate aap for the other thread points such the...negative[hate aap for the other thread points such the...[0.014617362059652805, -0.038017578423023224, ...negative0.690414hate aap for the other thread points such the ...
3737absolutely agree with you subsidies the worst ...negative[absolutely agree with you subsidies the worst...[0.0109744006767869, 0.0033110964577645063, -0...neutral0.581476absolutely agree with you subsidies the worst ...
3838are you corrupt mind have you benefited throu...negative[are you corrupt mind have you benefited throu...[0.03834373503923416, -0.06521473079919815, -0...negative0.783217are you corrupt mind have you benefited throug...
3939congress needs bogeyman modi without the bad g...positive[congress needs bogeyman modi without the bad ...[0.03138439729809761, -0.06221967190504074, -0...positive0.764358congress needs bogeyman modi without the bad g...
4040protip don type uppercase text all caps harder...negative[protip don type uppercase text all caps harde...[0.044019922614097595, 0.025341013446450233, 0...negative0.738550protip don type uppercase text all caps harder...
4141brother trog very wrathful indeed but his wil...positive[brother trog very wrathful indeed but his wil...[-0.024625714868307114, 0.06193268671631813, 0...positive0.923871brother trog very wrathful indeed but his will...
4242start off saying that the craftsmanship this ...positive[start off saying that the craftsmanship this ...[0.05780624598264694, -0.06291750818490982, -0...positive0.985073start off saying that the craftsmanship this p...
4343have made request unban namoarmy hell moron h...negative[have made request unban namoarmy hell moron h...[0.015555822290480137, -0.012748800218105316, ...negative0.796430have made request unban namoarmy hell moron ho...
4444child modi worked his father’ tea shop and ...negative[child modi worked his father’ tea shop and ...[0.05774841457605362, -0.05956699699163437, -0...negative0.709697child modi worked his father’ tea shop and y...
4545namo tea yuupea horrible rhyme knownegative[namo tea yuupea horrible rhyme know][0.025534288957715034, 0.004176765214651823, -...negative0.851523namo tea yuupea horrible rhyme know
4646great agility from akpom cut back and bendpositive[great agility from akpom cut back and bend][0.06865684688091278, -0.02164856530725956, -0...positive0.966416great agility from akpom cut back and bend
4747from undecided pro aap they are not perfect bu...positive[from undecided pro aap they are not perfect b...[0.01590304635465145, -0.0683458000421524, -0....positive0.891286from undecided pro aap they are not perfect bu...
4848woah there don insane with pray mean you don w...negative[woah there don insane with pray mean you don ...[0.050547026097774506, -0.01725909113883972, 0...negative0.798072woah there don insane with pray mean you don w...
4949porngress wont announce their candidate cuz th...positive[porngress wont announce their candidate cuz t...[0.05935536324977875, -0.051609162241220474, -...positive0.858500porngress wont announce their candidate cuz th...
\n","
"],"text/plain":[" origin_index ... document\n","0 0 ... its true they had cut the power what douchebag...\n","1 1 ... fuck giroud better finishing like this month\n","2 2 ... looks shit now but still proud made\n","3 3 ... pelor the burning hate the best evil god\n","4 4 ... can ask what you with something this powerful\n","5 5 ... aap’ shazia ilmi from puram constituency lag...\n","6 6 ... fuck yeah\n","7 7 ... honestly really surprised alice ranked that lo...\n","8 8 ... didn care about politics before now hate\n","9 9 ... hard nips and goosebumps\n","10 10 ... varadabhai ndtv trying too well dilute bjp tre...\n","11 11 ... old man has lost his mind\n","12 12 ... why this being downvoted you might ask both mo...\n","13 13 ... hasnt changed all apolitical before simply don...\n","14 14 ... for one campaign pretty much just snatched the...\n","15 15 ... vajpayee managed forge much broader coalition ...\n","16 16 ... lol this only proves how desperate they are ge...\n","17 17 ... dont hate aap but your questions are example w...\n","18 18 ... what were the other policies you discussed not...\n","19 19 ... wow lots favorites this bracket haqua tsukushi...\n","20 20 ... sorry know this isn what you asked just ventin...\n","21 21 ... coming out strongly against gujarat chief mini...\n","22 22 ... there one tool bjp can use their manifesto whi...\n","23 23 ... jakiro spotted the middle top maybe\n","24 24 ... family mormon have never tried explain them th...\n","25 25 ... with these results would have grudgingly accep...\n","26 26 ... tea partier expresses support for namo after e...\n","27 27 ... politically would stupid move take stand right...\n","28 28 ... wtf why\n","29 29 ... have actually seen lot users views change duri...\n","30 30 ... truth told there not insignificant percentage ...\n","31 31 ... was anti bjp and neutral cong became anti bjp ...\n","32 32 ... most religions have dogmatic orthodox well eso...\n","33 33 ... 
laureatte sen said christian schools are perfe...\n","34 34 ... need stop watching the garbage that you watch ...\n","35 35 ... gandhi mandela hitler mao plato chandragupt ma...\n","36 36 ... hate aap for the other thread points such the ...\n","37 37 ... absolutely agree with you subsidies the worst ...\n","38 38 ... are you corrupt mind have you benefited throug...\n","39 39 ... congress needs bogeyman modi without the bad g...\n","40 40 ... protip don type uppercase text all caps harder...\n","41 41 ... brother trog very wrathful indeed but his will...\n","42 42 ... start off saying that the craftsmanship this p...\n","43 43 ... have made request unban namoarmy hell moron ho...\n","44 44 ... child modi worked his father’ tea shop and y...\n","45 45 ... namo tea yuupea horrible rhyme know\n","46 46 ... great agility from akpom cut back and bend\n","47 47 ... from undecided pro aap they are not perfect bu...\n","48 48 ... woah there don insane with pray mean you don w...\n","49 49 ... porngress wont announce their candidate cuz th...\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# Try training with different Embeddings"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nxWFzQOhjWC8","executionInfo":{"status":"ok","timestamp":1620195837230,"user_tz":-120,"elapsed":16237,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"2e5d8344-8b95-40fc-b43e-04c0b4aaaa45"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possibler sentence embedding we could use. 
Lets use bert!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model 
sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model 
sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"IKK_Ii_gjJfF","executionInfo":{"status":"ok","timestamp":1620196320027,"user_tz":-120,"elapsed":499014,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"4237cd32-fd39-444f-f6a9-9a6de947a62f"},"source":["trainable_pipe = 
nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# We usually need to train longer and use a smaller learning rate for non-USE sentence embeddings\n","# We could tune the hyperparameters further with methods such as grid search\n","# Longer training also tends to improve accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(70) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# Predict on the training dataset with the fitted pipeline\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs; drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.86 0.80 0.83 300\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.90 0.70 0.79 300\n","\n"," accuracy 0.75 600\n"," macro avg 0.59 0.50 0.54 600\n","weighted avg 0.88 0.75 0.81 600\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 5. 
Let's save the model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eLex095goHwm","executionInfo":{"status":"ok","timestamp":1620196504424,"user_tz":-120,"elapsed":683325,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"798b31d7-cec2-48c8-ae29-7d4577ed9097"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 6. Let's load the model from disk.\n","This makes offline NLU usage possible! \n","Call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":77},"id":"SO4uz45MoRgp","executionInfo":{"status":"ok","timestamp":1620196519636,"user_tz":-120,"elapsed":698457,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"662503b6-694f-4990-e121-80f6a9f2dcae"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('Indian prime minister was assinated')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
origin_indextextsentence_embedding_from_disksentencesentiment_confidencesentimentdocument
08589934592Indian prime minister was assinated[[-0.09739551693201065, 0.23939256370067596, 0...[Indian prime minister was assinated][0.81195][negative]Indian prime minister was assinated
\n","
"],"text/plain":[" origin_index ... document\n","0 8589934592 ... Indian prime minister was assinated\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":13}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"e0CVlkk9v6Qi","executionInfo":{"status":"ok","timestamp":1620196519638,"user_tz":-120,"elapsed":698413,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"10c659fe-32a8-479b-e994-6827d8240a64"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@2d9c5454) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@2d9c5454\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"2LJTK79JKF9-"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_reddit.ipynb)\n", + "\n", + "\n", + "# Training a 
Sentiment Analysis Classifier with NLU\n", + "## 2-class Reddit comment sentiment classifier training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on multi-class text classification problems\n", + "\n", + "This notebook showcases the following features:\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (enables NLU offline mode)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Install Java 8 and NLU" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hFGnBCHavltY" + }, + "outputs": [], + "source": [ + "!pip install -q johnsnowlabs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download Reddit Sentiment dataset\n", + "https://www.kaggle.com/cosmos98/twitter-and-reddit-sentimental-analysis-dataset\n", + "## Context\n", + "\n", + "This was a Dataset Created as a part of the university Project On Sentimental Analysis On Multi-Source Social Media Platforms using PySpark." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "outputs": [], + "source": [ + "! wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/reddit_twitter_sentiment/Reddit_Data.csv\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "4928f1b2-102c-4519-83f7-6bfa6205749a" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
texty
0title edit 56dd brend ambasittur for vidya sec...negative
1this reminds kunkka old dota loading screen ar...positive
2meanwhile the other news cms intenttargetnegative
3lovely finish there giroudpositive
4glad see kurisu made with surprising results d...positive
.........
595true india needs top universities and not fdi...positive
596why did miyaichi come off earlypositive
597indian media has lost its credibility now ital...negative
598malaysia french satellite find further debris ...negative
599tell many people you can vote for modi this go...positive
\n", + "

600 rows × 2 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "text/plain": [ + " text y\n", + "0 title edit 56dd brend ambasittur for vidya sec... negative\n", + "1 this reminds kunkka old dota loading screen ar... positive\n", + "2 meanwhile the other news cms intenttarget negative\n", + "3 lovely finish there giroud positive\n", + "4 glad see kurisu made with surprising results d... positive\n", + ".. ... ...\n", + "595 true india needs top universities and not fdi... positive\n", + "596 why did miyaichi come off early positive\n", + "597 indian media has lost its credibility now ital... negative\n", + "598 malaysia french satellite find further debris ... negative\n", + "599 tell many people you can vote for modi this go... positive\n", + "\n", + "[600 rows x 2 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "train_path = '/content/Reddit_Data.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text'\n", + "columns=['text','y']\n", + "train_df = train_df[columns]\n", + "train_df" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. 
Train Deep Learning Classifier using nlu.load('train.sentiment')\n", + "\n", + "Your dataset's label column should be named 'y' and the feature column with text data should be named 'text'." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "7896942b-7b84-4b21-ee47-a3c59de61166" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 20\n", + " positive 0.60 1.00 0.75 30\n", + "\n", + " accuracy 0.60 50\n", + " macro avg 0.30 0.50 0.37 50\n", + "weighted avg 0.36 0.60 0.45 50\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0title edit 56dd brend ambasittur for vidya secret[-0.13924618065357208, 0.03819392994046211, -0...positive2.0title edit 56dd brend ambasittur for vidya sec...negative
1this reminds kunkka old dota loading screen ar...[-0.05514780059456825, -0.02986345998942852, -...positive1.0this reminds kunkka old dota loading screen ar...positive
2meanwhile the other news cms intenttarget[-1.3784979581832886, -0.05633991211652756, 0....positive7.0meanwhile the other news cms intenttargetnegative
3lovely finish there giroud[-1.026403546333313, -0.5962088108062744, -0.5...positive4.0lovely finish there giroudpositive
4glad see kurisu made with surprising results d...[-1.3281804323196411, -0.3425588011741638, 0.2...positive2.0glad see kurisu made with surprising results d...positive
5wee jas lawful neutral god death magic and nec...[-1.090584635734558, -0.39823225140571594, -0....positive9.0wee jas lawful neutral god death magic and nec...positive
6holy fuck this amazing[-2.318075656890869, -0.10535052418708801, -0....positive7.0holy fuck this amazingpositive
7yougaiz yougaiz who interested parineeti chopr...[-0.7263230085372925, -0.9229655861854553, 0.1...positive9.0yougaiz yougaiz who interested parineeti chopr...positive
8not whole lot has changed except the fact that...[-1.1650828123092651, 0.09717072546482086, -0....positive8.0not whole lot has changed except the fact that...positive
9different times different cultures same point ...[-1.5615334510803223, -0.6228992938995361, -0....positive3.0different times different cultures same point ...positive
10too much attention[-1.473575234413147, 0.6288167834281921, -0.68...positive3.0too much attentionpositive
11its nice and all but your monitors make want b...[-0.8991073966026306, 0.8156149983406067, -0.2...positive1.0its nice and all but your monitors make want b...positive
12india turns into toi comments section the mome...[-0.7343364357948303, -0.12650255858898163, 0....positive3.0india turns into toi comments section the mom...positive
13nice build man pretty damn sexy can ask why ti...[-1.21924889087677, 0.21342627704143524, -0.40...positive1.0nice build man pretty damn sexy can ask why ti...positive
14rofl why are you asking permission you and you...[-0.8997104167938232, 0.45302870869636536, -0....positive7.0rofl why are you asking permission you and you...positive
15seriously did you infographic maker even consi...[-0.8621772527694702, 0.4609760046005249, -0.1...positive2.0seriously did you infographic maker even consi...negative
16cliche but you can wrong with cyric mean was m...[-1.3570650815963745, -0.10502084344625473, -0...positive1.0cliche but you can wrong with cyric mean was ...negative
17from whatever heard this the model modi speech...[-0.8209349513053894, 0.2195633053779602, -0.2...positive2.0from whatever heard this the model modi speech...positive
18heard there was direct line narendra modi whic...[-0.4502873718738556, -0.10798719525337219, -0...positive3.0heard there was direct line narendra modi whi...positive
19truth told there not insignificant percentage ...[-0.6922494173049927, 0.08739931136369705, -0....positive8.0truth told there not insignificant percentage ...positive
20many them fear namo being prime minister could...[-1.3202203512191772, -0.021619228646159172, -...positive2.0many them fear namo being prime minister coul...positive
21source will have accommodate hindus from bangl...[-0.6071749329566956, 0.21432138979434967, -0....positive1.0source will have accommodate hindus from bang...negative
22confirmed woman and this india 186 comments le...[-0.7401843070983887, -0.4919162094593048, -0....positive3.0confirmed woman and this india 186 comments l...positive
23prepared for downvotes but after watching seve...[-0.5390527844429016, 1.0651843547821045, -0.5...positive2.0prepared for downvotes but after watching sev...negative
24delhi not sleeping after long day see congress...[-1.3164680004119873, 0.059689976274967194, -0...positive4.0delhi not sleeping after long day see congres...negative
25would like bjp come out support scrapping sec ...[-0.39008525013923645, 0.6130499243736267, -0....positive2.0would like bjp come out support scrapping sec...negative
26update still found debris black boxes evidence...[-0.5251181721687317, 0.5843355059623718, -0.2...positive1.0update still found debris black boxes evidenc...negative
27there one tool bjp can use their manifesto whi...[-0.42395493388175964, 0.3328923285007477, -0....positive2.0there one tool bjp can use their manifesto whi...positive
28wtf why[-1.0021588802337646, 1.2250791788101196, 0.05...positive4.0wtf whynegative
29fantastic strike the[-0.6758270859718323, -0.2134172022342682, -0....positive2.0fantastic strike thepositive
30you mean the ruling coalition government hatch...[-0.5493559837341309, -0.2645500600337982, -0....positive1.0you mean the ruling coalition government hatch...negative
31before used anti rss result anti bjp very skep...[-0.11697939038276672, 0.26724380254745483, 0....positive7.0before used anti rss result anti bjp very ske...positive
32first understand that you are not anyway contr...[-1.1990007162094116, -0.23811395466327667, -0...positive1.0first understand that you are not anyway contr...positive
33personally for shooting muslims and internetof...[-1.314710259437561, -0.07083063572645187, -0....positive8.0personally for shooting muslims and internetof...negative
34nice build one thing caught eye absolute splur...[-0.3646998703479767, -0.6708006262779236, -0....positive8.0nice build one thing caught eye absolute splur...positive
35looks shit now but still proud made[-1.128523588180542, 0.31619158387184143, 0.02...positive1.0looks shit now but still proud madepositive
36what wrong with that another lame ass attempt ...[-1.9695912599563599, 0.27932408452033997, 0.0...positive1.0what wrong with that another lame ass attempt ...negative
37how difficult was get the length and bend the ...[-1.2320657968521118, 0.9790331721305847, -0.5...positive8.0how difficult was get the length and bend the ...positive
38very impressed with gnabry movement and linkin...[-0.3107994794845581, 0.3307916522026062, -0.1...positive4.0very impressed with gnabry movement and linki...positive
39here interesting tidbit what the like off pert...[-0.308876633644104, 0.49783819913864136, -0.2...positive2.0here interesting tidbit what the like off pert...negative
40false dichotomy either you are for for bjp you...[-0.6038466691970825, 0.2866456210613251, -0.3...positive3.0false dichotomy either you are for for bjp you...negative
41angry namo bhaktas the survival the angry namo...[-1.20363450050354, -1.119809627532959, -0.183...positive4.0angry namo bhaktas the survival the angry namo...negative
42this would issue these people were prosecuted ...[-0.853580892086029, 0.8160178661346436, 0.117...positive4.0this would issue these people were prosecuted ...negative
43hate free india would boring love hate relatio...[-0.28927552700042725, -0.9648609757423401, -0...positive2.0hate free india would boring love hate relati...negative
44this getting bit silly now[-1.1343425512313843, -0.030997686088085175, 0...positive5.0this getting bit silly nownegative
45giroud should done better but great ball rosicky[-1.308074951171875, -0.03950951248407364, -0....positive3.0giroud should done better but great ball rosickypositive
46jakiro spotted the middle top maybe[-2.2450177669525146, -0.5104915499687195, -0....positive2.0jakiro spotted the middle top maybepositive
47won vote for aap anymore that for sure though ...[-0.7843935489654541, -0.32822510600090027, 0....positive5.0won vote for aap anymore that for sure though...negative
48regardless the opposition all girouds goals ha...[-1.635783076286316, 0.30238792300224304, -0.7...positive2.0regardless the opposition all girouds goals ha...positive
49directly pleading people who oppose modi just ...[-0.7491450905799866, -0.4327720105648041, 0.0...positive1.0directly pleading people who oppose modi just...positive
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "text/plain": [ + " document \\\n", + "0 title edit 56dd brend ambasittur for vidya secret \n", + "1 this reminds kunkka old dota loading screen ar... \n", + "2 meanwhile the other news cms intenttarget \n", + "3 lovely finish there giroud \n", + "4 glad see kurisu made with surprising results d... \n", + "5 wee jas lawful neutral god death magic and nec... \n", + "6 holy fuck this amazing \n", + "7 yougaiz yougaiz who interested parineeti chopr... \n", + "8 not whole lot has changed except the fact that... \n", + "9 different times different cultures same point ... \n", + "10 too much attention \n", + "11 its nice and all but your monitors make want b... \n", + "12 india turns into toi comments section the mome... \n", + "13 nice build man pretty damn sexy can ask why ti... \n", + "14 rofl why are you asking permission you and you... \n", + "15 seriously did you infographic maker even consi... \n", + "16 cliche but you can wrong with cyric mean was m... \n", + "17 from whatever heard this the model modi speech... \n", + "18 heard there was direct line narendra modi whic... \n", + "19 truth told there not insignificant percentage ... \n", + "20 many them fear namo being prime minister could... \n", + "21 source will have accommodate hindus from bangl... \n", + "22 confirmed woman and this india 186 comments le... \n", + "23 prepared for downvotes but after watching seve... \n", + "24 delhi not sleeping after long day see congress... \n", + "25 would like bjp come out support scrapping sec ... \n", + "26 update still found debris black boxes evidence... \n", + "27 there one tool bjp can use their manifesto whi... \n", + "28 wtf why \n", + "29 fantastic strike the \n", + "30 you mean the ruling coalition government hatch... \n", + "31 before used anti rss result anti bjp very skep... \n", + "32 first understand that you are not anyway contr... \n", + "33 personally for shooting muslims and internetof... 
\n", + "34 nice build one thing caught eye absolute splur... \n", + "35 looks shit now but still proud made \n", + "36 what wrong with that another lame ass attempt ... \n", + "37 how difficult was get the length and bend the ... \n", + "38 very impressed with gnabry movement and linkin... \n", + "39 here interesting tidbit what the like off pert... \n", + "40 false dichotomy either you are for for bjp you... \n", + "41 angry namo bhaktas the survival the angry namo... \n", + "42 this would issue these people were prosecuted ... \n", + "43 hate free india would boring love hate relatio... \n", + "44 this getting bit silly now \n", + "45 giroud should done better but great ball rosicky \n", + "46 jakiro spotted the middle top maybe \n", + "47 won vote for aap anymore that for sure though ... \n", + "48 regardless the opposition all girouds goals ha... \n", + "49 directly pleading people who oppose modi just ... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.13924618065357208, 0.03819392994046211, -0... positive \n", + "1 [-0.05514780059456825, -0.02986345998942852, -... positive \n", + "2 [-1.3784979581832886, -0.05633991211652756, 0.... positive \n", + "3 [-1.026403546333313, -0.5962088108062744, -0.5... positive \n", + "4 [-1.3281804323196411, -0.3425588011741638, 0.2... positive \n", + "5 [-1.090584635734558, -0.39823225140571594, -0.... positive \n", + "6 [-2.318075656890869, -0.10535052418708801, -0.... positive \n", + "7 [-0.7263230085372925, -0.9229655861854553, 0.1... positive \n", + "8 [-1.1650828123092651, 0.09717072546482086, -0.... positive \n", + "9 [-1.5615334510803223, -0.6228992938995361, -0.... positive \n", + "10 [-1.473575234413147, 0.6288167834281921, -0.68... positive \n", + "11 [-0.8991073966026306, 0.8156149983406067, -0.2... positive \n", + "12 [-0.7343364357948303, -0.12650255858898163, 0.... positive \n", + "13 [-1.21924889087677, 0.21342627704143524, -0.40... 
positive \n", + "14 [-0.8997104167938232, 0.45302870869636536, -0.... positive \n", + "15 [-0.8621772527694702, 0.4609760046005249, -0.1... positive \n", + "16 [-1.3570650815963745, -0.10502084344625473, -0... positive \n", + "17 [-0.8209349513053894, 0.2195633053779602, -0.2... positive \n", + "18 [-0.4502873718738556, -0.10798719525337219, -0... positive \n", + "19 [-0.6922494173049927, 0.08739931136369705, -0.... positive \n", + "20 [-1.3202203512191772, -0.021619228646159172, -... positive \n", + "21 [-0.6071749329566956, 0.21432138979434967, -0.... positive \n", + "22 [-0.7401843070983887, -0.4919162094593048, -0.... positive \n", + "23 [-0.5390527844429016, 1.0651843547821045, -0.5... positive \n", + "24 [-1.3164680004119873, 0.059689976274967194, -0... positive \n", + "25 [-0.39008525013923645, 0.6130499243736267, -0.... positive \n", + "26 [-0.5251181721687317, 0.5843355059623718, -0.2... positive \n", + "27 [-0.42395493388175964, 0.3328923285007477, -0.... positive \n", + "28 [-1.0021588802337646, 1.2250791788101196, 0.05... positive \n", + "29 [-0.6758270859718323, -0.2134172022342682, -0.... positive \n", + "30 [-0.5493559837341309, -0.2645500600337982, -0.... positive \n", + "31 [-0.11697939038276672, 0.26724380254745483, 0.... positive \n", + "32 [-1.1990007162094116, -0.23811395466327667, -0... positive \n", + "33 [-1.314710259437561, -0.07083063572645187, -0.... positive \n", + "34 [-0.3646998703479767, -0.6708006262779236, -0.... positive \n", + "35 [-1.128523588180542, 0.31619158387184143, 0.02... positive \n", + "36 [-1.9695912599563599, 0.27932408452033997, 0.0... positive \n", + "37 [-1.2320657968521118, 0.9790331721305847, -0.5... positive \n", + "38 [-0.3107994794845581, 0.3307916522026062, -0.1... positive \n", + "39 [-0.308876633644104, 0.49783819913864136, -0.2... positive \n", + "40 [-0.6038466691970825, 0.2866456210613251, -0.3... positive \n", + "41 [-1.20363450050354, -1.119809627532959, -0.183... 
positive \n", + "42 [-0.853580892086029, 0.8160178661346436, 0.117... positive \n", + "43 [-0.28927552700042725, -0.9648609757423401, -0... positive \n", + "44 [-1.1343425512313843, -0.030997686088085175, 0... positive \n", + "45 [-1.308074951171875, -0.03950951248407364, -0.... positive \n", + "46 [-2.2450177669525146, -0.5104915499687195, -0.... positive \n", + "47 [-0.7843935489654541, -0.32822510600090027, 0.... positive \n", + "48 [-1.635783076286316, 0.30238792300224304, -0.7... positive \n", + "49 [-0.7491450905799866, -0.4327720105648041, 0.0... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 2.0 title edit 56dd brend ambasittur for vidya sec... \n", + "1 1.0 this reminds kunkka old dota loading screen ar... \n", + "2 7.0 meanwhile the other news cms intenttarget \n", + "3 4.0 lovely finish there giroud \n", + "4 2.0 glad see kurisu made with surprising results d... \n", + "5 9.0 wee jas lawful neutral god death magic and nec... \n", + "6 7.0 holy fuck this amazing \n", + "7 9.0 yougaiz yougaiz who interested parineeti chopr... \n", + "8 8.0 not whole lot has changed except the fact that... \n", + "9 3.0 different times different cultures same point ... \n", + "10 3.0 too much attention \n", + "11 1.0 its nice and all but your monitors make want b... \n", + "12 3.0 india turns into toi comments section the mom... \n", + "13 1.0 nice build man pretty damn sexy can ask why ti... \n", + "14 7.0 rofl why are you asking permission you and you... \n", + "15 2.0 seriously did you infographic maker even consi... \n", + "16 1.0 cliche but you can wrong with cyric mean was ... \n", + "17 2.0 from whatever heard this the model modi speech... \n", + "18 3.0 heard there was direct line narendra modi whi... \n", + "19 8.0 truth told there not insignificant percentage ... \n", + "20 2.0 many them fear namo being prime minister coul... \n", + "21 1.0 source will have accommodate hindus from bang... 
\n", + "22 3.0 confirmed woman and this india 186 comments l... \n", + "23 2.0 prepared for downvotes but after watching sev... \n", + "24 4.0 delhi not sleeping after long day see congres... \n", + "25 2.0 would like bjp come out support scrapping sec... \n", + "26 1.0 update still found debris black boxes evidenc... \n", + "27 2.0 there one tool bjp can use their manifesto whi... \n", + "28 4.0 wtf why \n", + "29 2.0 fantastic strike the \n", + "30 1.0 you mean the ruling coalition government hatch... \n", + "31 7.0 before used anti rss result anti bjp very ske... \n", + "32 1.0 first understand that you are not anyway contr... \n", + "33 8.0 personally for shooting muslims and internetof... \n", + "34 8.0 nice build one thing caught eye absolute splur... \n", + "35 1.0 looks shit now but still proud made \n", + "36 1.0 what wrong with that another lame ass attempt ... \n", + "37 8.0 how difficult was get the length and bend the ... \n", + "38 4.0 very impressed with gnabry movement and linki... \n", + "39 2.0 here interesting tidbit what the like off pert... \n", + "40 3.0 false dichotomy either you are for for bjp you... \n", + "41 4.0 angry namo bhaktas the survival the angry namo... \n", + "42 4.0 this would issue these people were prosecuted ... \n", + "43 2.0 hate free india would boring love hate relati... \n", + "44 5.0 this getting bit silly now \n", + "45 3.0 giroud should done better but great ball rosicky \n", + "46 2.0 jakiro spotted the middle top maybe \n", + "47 5.0 won vote for aap anymore that for sure though... \n", + "48 2.0 regardless the opposition all girouds goals ha... \n", + "49 1.0 directly pleading people who oppose modi just... 
\n", + "\n", + " y \n", + "0 negative \n", + "1 positive \n", + "2 negative \n", + "3 positive \n", + "4 positive \n", + "5 positive \n", + "6 positive \n", + "7 positive \n", + "8 positive \n", + "9 positive \n", + "10 positive \n", + "11 positive \n", + "12 positive \n", + "13 positive \n", + "14 positive \n", + "15 negative \n", + "16 negative \n", + "17 positive \n", + "18 positive \n", + "19 positive \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 negative \n", + "24 negative \n", + "25 negative \n", + "26 negative \n", + "27 positive \n", + "28 negative \n", + "29 positive \n", + "30 negative \n", + "31 positive \n", + "32 positive \n", + "33 negative \n", + "34 positive \n", + "35 positive \n", + "36 negative \n", + "37 positive \n", + "38 positive \n", + "39 negative \n", + "40 negative \n", + "41 negative \n", + "42 negative \n", + "43 negative \n", + "44 negative \n", + "45 positive \n", + "46 positive \n", + "47 negative \n", + "48 positive \n", + "49 positive " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.metrics import classification_report\n", + "from johnsnowlabs import nlp\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# the pipeline picks its default sentence embeddings; in this run they are sent_small_bert_L2_128 (see the output columns), not the Universal Sentence Encoder (USE)\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "# the sentence detector that is part of the pipe generates some NaNs. 
let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "ea8e288d-2016-4777-99f5-9dd729a25e3f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sentencesentence_embedding_small_bert_L2_128sentimentsentiment_confidence
0Indian prime minister was assinated![-0.7613918781280518, 0.7001779079437256, -0.1...positive0.999982
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "text/plain": [ + " sentence \\\n", + "0 Indian prime minister was assinated! \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.7613918781280518, 0.7001779079437256, -0.1... positive \n", + "\n", + " sentiment_confidence \n", + "0 0.999982 " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fitted_pipe.predict(\"Indian prime minister was assinated!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "12c0a69d-9fbc-41de-9de3-f923097e9b54" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ], + "source": [ + "trainable_pipe.print_info()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "mptfvHx-MMMX", + "outputId": "0905f3b3-babb-401a-f027-eb2e8475da37" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 20\n", + " positive 0.60 1.00 0.75 30\n", + "\n", + " accuracy 0.60 50\n", + " macro avg 0.30 0.50 0.37 50\n", + "weighted avg 0.36 0.60 0.45 50\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
<table>
<thead><tr><th></th><th>document</th><th>sentence_embedding_small_bert_L2_128</th><th>sentiment</th><th>sentiment_confidence</th><th>text</th><th>y</th></tr></thead>
<tbody>
<tr><th>0</th><td>title edit 56dd brend ambasittur for vidya secret</td><td>[-0.13924618065357208, 0.03819392994046211, -0...</td><td>positive</td><td>2.0</td><td>title edit 56dd brend ambasittur for vidya sec...</td><td>negative</td></tr>
<tr><th>1</th><td>this reminds kunkka old dota loading screen ar...</td><td>[-0.05514780059456825, -0.02986345998942852, -...</td><td>positive</td><td>2.0</td><td>this reminds kunkka old dota loading screen ar...</td><td>positive</td></tr>
<tr><th>2</th><td>meanwhile the other news cms intenttarget</td><td>[-1.3784979581832886, -0.05633991211652756, 0....</td><td>positive</td><td>4.0</td><td>meanwhile the other news cms intenttarget</td><td>negative</td></tr>
<tr><th>3</th><td>lovely finish there giroud</td><td>[-1.026403546333313, -0.5962088108062744, -0.5...</td><td>positive</td><td>9.0</td><td>lovely finish there giroud</td><td>positive</td></tr>
<tr><th>4</th><td>glad see kurisu made with surprising results d...</td><td>[-1.3281804323196411, -0.3425588011741638, 0.2...</td><td>positive</td><td>4.0</td><td>glad see kurisu made with surprising results d...</td><td>positive</td></tr>
<tr><th>5</th><td>wee jas lawful neutral god death magic and nec...</td><td>[-1.090584635734558, -0.39823225140571594, -0....</td><td>positive</td><td>4.0</td><td>wee jas lawful neutral god death magic and nec...</td><td>positive</td></tr>
<tr><th>6</th><td>holy fuck this amazing</td><td>[-2.318075656890869, -0.10535052418708801, -0....</td><td>positive</td><td>2.0</td><td>holy fuck this amazing</td><td>positive</td></tr>
<tr><th>7</th><td>yougaiz yougaiz who interested parineeti chopr...</td><td>[-0.7263230085372925, -0.9229655861854553, 0.1...</td><td>positive</td><td>2.0</td><td>yougaiz yougaiz who interested parineeti chopr...</td><td>positive</td></tr>
<tr><th>8</th><td>not whole lot has changed except the fact that...</td><td>[-1.1650828123092651, 0.09717072546482086, -0....</td><td>positive</td><td>3.0</td><td>not whole lot has changed except the fact that...</td><td>positive</td></tr>
<tr><th>9</th><td>different times different cultures same point ...</td><td>[-1.5615334510803223, -0.6228992938995361, -0....</td><td>positive</td><td>1.0</td><td>different times different cultures same point ...</td><td>positive</td></tr>
<tr><th>10</th><td>too much attention</td><td>[-1.473575234413147, 0.6288167834281921, -0.68...</td><td>positive</td><td>1.0</td><td>too much attention</td><td>positive</td></tr>
<tr><th>11</th><td>its nice and all but your monitors make want b...</td><td>[-0.8991073966026306, 0.8156149983406067, -0.2...</td><td>positive</td><td>3.0</td><td>its nice and all but your monitors make want b...</td><td>positive</td></tr>
<tr><th>12</th><td>india turns into toi comments section the mome...</td><td>[-0.7343364357948303, -0.12650255858898163, 0....</td><td>positive</td><td>1.0</td><td>india turns into toi comments section the mom...</td><td>positive</td></tr>
<tr><th>13</th><td>nice build man pretty damn sexy can ask why ti...</td><td>[-1.21924889087677, 0.21342627704143524, -0.40...</td><td>positive</td><td>1.0</td><td>nice build man pretty damn sexy can ask why ti...</td><td>positive</td></tr>
<tr><th>14</th><td>rofl why are you asking permission you and you...</td><td>[-0.8997104167938232, 0.45302870869636536, -0....</td><td>positive</td><td>1.0</td><td>rofl why are you asking permission you and you...</td><td>positive</td></tr>
<tr><th>15</th><td>seriously did you infographic maker even consi...</td><td>[-0.8621772527694702, 0.4609760046005249, -0.1...</td><td>positive</td><td>2.0</td><td>seriously did you infographic maker even consi...</td><td>negative</td></tr>
<tr><th>16</th><td>cliche but you can wrong with cyric mean was m...</td><td>[-1.3570650815963745, -0.10502084344625473, -0...</td><td>positive</td><td>3.0</td><td>cliche but you can wrong with cyric mean was ...</td><td>negative</td></tr>
<tr><th>17</th><td>from whatever heard this the model modi speech...</td><td>[-0.8209349513053894, 0.2195633053779602, -0.2...</td><td>positive</td><td>6.0</td><td>from whatever heard this the model modi speech...</td><td>positive</td></tr>
<tr><th>18</th><td>heard there was direct line narendra modi whic...</td><td>[-0.4502873718738556, -0.10798719525337219, -0...</td><td>positive</td><td>1.0</td><td>heard there was direct line narendra modi whi...</td><td>positive</td></tr>
<tr><th>19</th><td>truth told there not insignificant percentage ...</td><td>[-0.6922494173049927, 0.08739931136369705, -0....</td><td>positive</td><td>1.0</td><td>truth told there not insignificant percentage ...</td><td>positive</td></tr>
<tr><th>20</th><td>many them fear namo being prime minister could...</td><td>[-1.3202203512191772, -0.021619228646159172, -...</td><td>positive</td><td>7.0</td><td>many them fear namo being prime minister coul...</td><td>positive</td></tr>
<tr><th>21</th><td>source will have accommodate hindus from bangl...</td><td>[-0.6071749329566956, 0.21432138979434967, -0....</td><td>positive</td><td>4.0</td><td>source will have accommodate hindus from bang...</td><td>negative</td></tr>
<tr><th>22</th><td>confirmed woman and this india 186 comments le...</td><td>[-0.7401843070983887, -0.4919162094593048, -0....</td><td>positive</td><td>1.0</td><td>confirmed woman and this india 186 comments l...</td><td>positive</td></tr>
<tr><th>23</th><td>prepared for downvotes but after watching seve...</td><td>[-0.5390527844429016, 1.0651843547821045, -0.5...</td><td>positive</td><td>1.0</td><td>prepared for downvotes but after watching sev...</td><td>negative</td></tr>
<tr><th>24</th><td>delhi not sleeping after long day see congress...</td><td>[-1.3164680004119873, 0.059689976274967194, -0...</td><td>positive</td><td>7.0</td><td>delhi not sleeping after long day see congres...</td><td>negative</td></tr>
<tr><th>25</th><td>would like bjp come out support scrapping sec ...</td><td>[-0.39008525013923645, 0.6130499243736267, -0....</td><td>positive</td><td>1.0</td><td>would like bjp come out support scrapping sec...</td><td>negative</td></tr>
<tr><th>26</th><td>update still found debris black boxes evidence...</td><td>[-0.5251181721687317, 0.5843355059623718, -0.2...</td><td>positive</td><td>4.0</td><td>update still found debris black boxes evidenc...</td><td>negative</td></tr>
<tr><th>27</th><td>there one tool bjp can use their manifesto whi...</td><td>[-0.42395493388175964, 0.3328923285007477, -0....</td><td>positive</td><td>3.0</td><td>there one tool bjp can use their manifesto whi...</td><td>positive</td></tr>
<tr><th>28</th><td>wtf why</td><td>[-1.0021588802337646, 1.2250791788101196, 0.05...</td><td>positive</td><td>1.0</td><td>wtf why</td><td>negative</td></tr>
<tr><th>29</th><td>fantastic strike the</td><td>[-0.6758270859718323, -0.2134172022342682, -0....</td><td>positive</td><td>2.0</td><td>fantastic strike the</td><td>positive</td></tr>
<tr><th>30</th><td>you mean the ruling coalition government hatch...</td><td>[-0.5493559837341309, -0.2645500600337982, -0....</td><td>positive</td><td>2.0</td><td>you mean the ruling coalition government hatch...</td><td>negative</td></tr>
<tr><th>31</th><td>before used anti rss result anti bjp very skep...</td><td>[-0.11697939038276672, 0.26724380254745483, 0....</td><td>positive</td><td>1.0</td><td>before used anti rss result anti bjp very ske...</td><td>positive</td></tr>
<tr><th>32</th><td>first understand that you are not anyway contr...</td><td>[-1.1990007162094116, -0.23811395466327667, -0...</td><td>positive</td><td>7.0</td><td>first understand that you are not anyway contr...</td><td>positive</td></tr>
<tr><th>33</th><td>personally for shooting muslims and internetof...</td><td>[-1.314710259437561, -0.07083063572645187, -0....</td><td>positive</td><td>5.0</td><td>personally for shooting muslims and internetof...</td><td>negative</td></tr>
<tr><th>34</th><td>nice build one thing caught eye absolute splur...</td><td>[-0.3646998703479767, -0.6708006262779236, -0....</td><td>positive</td><td>4.0</td><td>nice build one thing caught eye absolute splur...</td><td>positive</td></tr>
<tr><th>35</th><td>looks shit now but still proud made</td><td>[-1.128523588180542, 0.31619158387184143, 0.02...</td><td>positive</td><td>4.0</td><td>looks shit now but still proud made</td><td>positive</td></tr>
<tr><th>36</th><td>what wrong with that another lame ass attempt ...</td><td>[-1.9695912599563599, 0.27932408452033997, 0.0...</td><td>positive</td><td>1.0</td><td>what wrong with that another lame ass attempt ...</td><td>negative</td></tr>
<tr><th>37</th><td>how difficult was get the length and bend the ...</td><td>[-1.2320657968521118, 0.9790331721305847, -0.5...</td><td>positive</td><td>4.0</td><td>how difficult was get the length and bend the ...</td><td>positive</td></tr>
<tr><th>38</th><td>very impressed with gnabry movement and linkin...</td><td>[-0.3107994794845581, 0.3307916522026062, -0.1...</td><td>positive</td><td>5.0</td><td>very impressed with gnabry movement and linki...</td><td>positive</td></tr>
<tr><th>39</th><td>here interesting tidbit what the like off pert...</td><td>[-0.308876633644104, 0.49783819913864136, -0.2...</td><td>positive</td><td>3.0</td><td>here interesting tidbit what the like off pert...</td><td>negative</td></tr>
<tr><th>40</th><td>false dichotomy either you are for for bjp you...</td><td>[-0.6038466691970825, 0.2866456210613251, -0.3...</td><td>positive</td><td>1.0</td><td>false dichotomy either you are for for bjp you...</td><td>negative</td></tr>
<tr><th>41</th><td>angry namo bhaktas the survival the angry namo...</td><td>[-1.20363450050354, -1.119809627532959, -0.183...</td><td>positive</td><td>3.0</td><td>angry namo bhaktas the survival the angry namo...</td><td>negative</td></tr>
<tr><th>42</th><td>this would issue these people were prosecuted ...</td><td>[-0.853580892086029, 0.8160178661346436, 0.117...</td><td>positive</td><td>1.0</td><td>this would issue these people were prosecuted ...</td><td>negative</td></tr>
<tr><th>43</th><td>hate free india would boring love hate relatio...</td><td>[-0.28927552700042725, -0.9648609757423401, -0...</td><td>positive</td><td>9.0</td><td>hate free india would boring love hate relati...</td><td>negative</td></tr>
<tr><th>44</th><td>this getting bit silly now</td><td>[-1.1343425512313843, -0.030997686088085175, 0...</td><td>positive</td><td>1.0</td><td>this getting bit silly now</td><td>negative</td></tr>
<tr><th>45</th><td>giroud should done better but great ball rosicky</td><td>[-1.308074951171875, -0.03950951248407364, -0....</td><td>positive</td><td>1.0</td><td>giroud should done better but great ball rosicky</td><td>positive</td></tr>
<tr><th>46</th><td>jakiro spotted the middle top maybe</td><td>[-2.2450177669525146, -0.5104915499687195, -0....</td><td>positive</td><td>5.0</td><td>jakiro spotted the middle top maybe</td><td>positive</td></tr>
<tr><th>47</th><td>won vote for aap anymore that for sure though ...</td><td>[-0.7843935489654541, -0.32822510600090027, 0....</td><td>positive</td><td>3.0</td><td>won vote for aap anymore that for sure though...</td><td>negative</td></tr>
<tr><th>48</th><td>regardless the opposition all girouds goals ha...</td><td>[-1.635783076286316, 0.30238792300224304, -0.7...</td><td>positive</td><td>3.0</td><td>regardless the opposition all girouds goals ha...</td><td>positive</td></tr>
<tr><th>49</th><td>directly pleading people who oppose modi just ...</td><td>[-0.7491450905799866, -0.4327720105648041, 0.0...</td><td>positive</td><td>1.0</td><td>directly pleading people who oppose modi just...</td><td>positive</td></tr>
</tbody>
</table>
\n" + ], + "text/plain": [ + " document \\\n", + "0 title edit 56dd brend ambasittur for vidya secret \n", + "1 this reminds kunkka old dota loading screen ar... \n", + "2 meanwhile the other news cms intenttarget \n", + "3 lovely finish there giroud \n", + "4 glad see kurisu made with surprising results d... \n", + "5 wee jas lawful neutral god death magic and nec... \n", + "6 holy fuck this amazing \n", + "7 yougaiz yougaiz who interested parineeti chopr... \n", + "8 not whole lot has changed except the fact that... \n", + "9 different times different cultures same point ... \n", + "10 too much attention \n", + "11 its nice and all but your monitors make want b... \n", + "12 india turns into toi comments section the mome... \n", + "13 nice build man pretty damn sexy can ask why ti... \n", + "14 rofl why are you asking permission you and you... \n", + "15 seriously did you infographic maker even consi... \n", + "16 cliche but you can wrong with cyric mean was m... \n", + "17 from whatever heard this the model modi speech... \n", + "18 heard there was direct line narendra modi whic... \n", + "19 truth told there not insignificant percentage ... \n", + "20 many them fear namo being prime minister could... \n", + "21 source will have accommodate hindus from bangl... \n", + "22 confirmed woman and this india 186 comments le... \n", + "23 prepared for downvotes but after watching seve... \n", + "24 delhi not sleeping after long day see congress... \n", + "25 would like bjp come out support scrapping sec ... \n", + "26 update still found debris black boxes evidence... \n", + "27 there one tool bjp can use their manifesto whi... \n", + "28 wtf why \n", + "29 fantastic strike the \n", + "30 you mean the ruling coalition government hatch... \n", + "31 before used anti rss result anti bjp very skep... \n", + "32 first understand that you are not anyway contr... \n", + "33 personally for shooting muslims and internetof... 
\n", + "34 nice build one thing caught eye absolute splur... \n", + "35 looks shit now but still proud made \n", + "36 what wrong with that another lame ass attempt ... \n", + "37 how difficult was get the length and bend the ... \n", + "38 very impressed with gnabry movement and linkin... \n", + "39 here interesting tidbit what the like off pert... \n", + "40 false dichotomy either you are for for bjp you... \n", + "41 angry namo bhaktas the survival the angry namo... \n", + "42 this would issue these people were prosecuted ... \n", + "43 hate free india would boring love hate relatio... \n", + "44 this getting bit silly now \n", + "45 giroud should done better but great ball rosicky \n", + "46 jakiro spotted the middle top maybe \n", + "47 won vote for aap anymore that for sure though ... \n", + "48 regardless the opposition all girouds goals ha... \n", + "49 directly pleading people who oppose modi just ... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.13924618065357208, 0.03819392994046211, -0... positive \n", + "1 [-0.05514780059456825, -0.02986345998942852, -... positive \n", + "2 [-1.3784979581832886, -0.05633991211652756, 0.... positive \n", + "3 [-1.026403546333313, -0.5962088108062744, -0.5... positive \n", + "4 [-1.3281804323196411, -0.3425588011741638, 0.2... positive \n", + "5 [-1.090584635734558, -0.39823225140571594, -0.... positive \n", + "6 [-2.318075656890869, -0.10535052418708801, -0.... positive \n", + "7 [-0.7263230085372925, -0.9229655861854553, 0.1... positive \n", + "8 [-1.1650828123092651, 0.09717072546482086, -0.... positive \n", + "9 [-1.5615334510803223, -0.6228992938995361, -0.... positive \n", + "10 [-1.473575234413147, 0.6288167834281921, -0.68... positive \n", + "11 [-0.8991073966026306, 0.8156149983406067, -0.2... positive \n", + "12 [-0.7343364357948303, -0.12650255858898163, 0.... positive \n", + "13 [-1.21924889087677, 0.21342627704143524, -0.40... 
positive \n", + "14 [-0.8997104167938232, 0.45302870869636536, -0.... positive \n", + "15 [-0.8621772527694702, 0.4609760046005249, -0.1... positive \n", + "16 [-1.3570650815963745, -0.10502084344625473, -0... positive \n", + "17 [-0.8209349513053894, 0.2195633053779602, -0.2... positive \n", + "18 [-0.4502873718738556, -0.10798719525337219, -0... positive \n", + "19 [-0.6922494173049927, 0.08739931136369705, -0.... positive \n", + "20 [-1.3202203512191772, -0.021619228646159172, -... positive \n", + "21 [-0.6071749329566956, 0.21432138979434967, -0.... positive \n", + "22 [-0.7401843070983887, -0.4919162094593048, -0.... positive \n", + "23 [-0.5390527844429016, 1.0651843547821045, -0.5... positive \n", + "24 [-1.3164680004119873, 0.059689976274967194, -0... positive \n", + "25 [-0.39008525013923645, 0.6130499243736267, -0.... positive \n", + "26 [-0.5251181721687317, 0.5843355059623718, -0.2... positive \n", + "27 [-0.42395493388175964, 0.3328923285007477, -0.... positive \n", + "28 [-1.0021588802337646, 1.2250791788101196, 0.05... positive \n", + "29 [-0.6758270859718323, -0.2134172022342682, -0.... positive \n", + "30 [-0.5493559837341309, -0.2645500600337982, -0.... positive \n", + "31 [-0.11697939038276672, 0.26724380254745483, 0.... positive \n", + "32 [-1.1990007162094116, -0.23811395466327667, -0... positive \n", + "33 [-1.314710259437561, -0.07083063572645187, -0.... positive \n", + "34 [-0.3646998703479767, -0.6708006262779236, -0.... positive \n", + "35 [-1.128523588180542, 0.31619158387184143, 0.02... positive \n", + "36 [-1.9695912599563599, 0.27932408452033997, 0.0... positive \n", + "37 [-1.2320657968521118, 0.9790331721305847, -0.5... positive \n", + "38 [-0.3107994794845581, 0.3307916522026062, -0.1... positive \n", + "39 [-0.308876633644104, 0.49783819913864136, -0.2... positive \n", + "40 [-0.6038466691970825, 0.2866456210613251, -0.3... positive \n", + "41 [-1.20363450050354, -1.119809627532959, -0.183... 
positive \n", + "42 [-0.853580892086029, 0.8160178661346436, 0.117... positive \n", + "43 [-0.28927552700042725, -0.9648609757423401, -0... positive \n", + "44 [-1.1343425512313843, -0.030997686088085175, 0... positive \n", + "45 [-1.308074951171875, -0.03950951248407364, -0.... positive \n", + "46 [-2.2450177669525146, -0.5104915499687195, -0.... positive \n", + "47 [-0.7843935489654541, -0.32822510600090027, 0.... positive \n", + "48 [-1.635783076286316, 0.30238792300224304, -0.7... positive \n", + "49 [-0.7491450905799866, -0.4327720105648041, 0.0... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 2.0 title edit 56dd brend ambasittur for vidya sec... \n", + "1 2.0 this reminds kunkka old dota loading screen ar... \n", + "2 4.0 meanwhile the other news cms intenttarget \n", + "3 9.0 lovely finish there giroud \n", + "4 4.0 glad see kurisu made with surprising results d... \n", + "5 4.0 wee jas lawful neutral god death magic and nec... \n", + "6 2.0 holy fuck this amazing \n", + "7 2.0 yougaiz yougaiz who interested parineeti chopr... \n", + "8 3.0 not whole lot has changed except the fact that... \n", + "9 1.0 different times different cultures same point ... \n", + "10 1.0 too much attention \n", + "11 3.0 its nice and all but your monitors make want b... \n", + "12 1.0 india turns into toi comments section the mom... \n", + "13 1.0 nice build man pretty damn sexy can ask why ti... \n", + "14 1.0 rofl why are you asking permission you and you... \n", + "15 2.0 seriously did you infographic maker even consi... \n", + "16 3.0 cliche but you can wrong with cyric mean was ... \n", + "17 6.0 from whatever heard this the model modi speech... \n", + "18 1.0 heard there was direct line narendra modi whi... \n", + "19 1.0 truth told there not insignificant percentage ... \n", + "20 7.0 many them fear namo being prime minister coul... \n", + "21 4.0 source will have accommodate hindus from bang... 
\n", + "22 1.0 confirmed woman and this india 186 comments l... \n", + "23 1.0 prepared for downvotes but after watching sev... \n", + "24 7.0 delhi not sleeping after long day see congres... \n", + "25 1.0 would like bjp come out support scrapping sec... \n", + "26 4.0 update still found debris black boxes evidenc... \n", + "27 3.0 there one tool bjp can use their manifesto whi... \n", + "28 1.0 wtf why \n", + "29 2.0 fantastic strike the \n", + "30 2.0 you mean the ruling coalition government hatch... \n", + "31 1.0 before used anti rss result anti bjp very ske... \n", + "32 7.0 first understand that you are not anyway contr... \n", + "33 5.0 personally for shooting muslims and internetof... \n", + "34 4.0 nice build one thing caught eye absolute splur... \n", + "35 4.0 looks shit now but still proud made \n", + "36 1.0 what wrong with that another lame ass attempt ... \n", + "37 4.0 how difficult was get the length and bend the ... \n", + "38 5.0 very impressed with gnabry movement and linki... \n", + "39 3.0 here interesting tidbit what the like off pert... \n", + "40 1.0 false dichotomy either you are for for bjp you... \n", + "41 3.0 angry namo bhaktas the survival the angry namo... \n", + "42 1.0 this would issue these people were prosecuted ... \n", + "43 9.0 hate free india would boring love hate relati... \n", + "44 1.0 this getting bit silly now \n", + "45 1.0 giroud should done better but great ball rosicky \n", + "46 5.0 jakiro spotted the middle top maybe \n", + "47 3.0 won vote for aap anymore that for sure though... \n", + "48 3.0 regardless the opposition all girouds goals ha... \n", + "49 1.0 directly pleading people who oppose modi just... 
\n", + "\n", + " y \n", + "0 negative \n", + "1 positive \n", + "2 negative \n", + "3 positive \n", + "4 positive \n", + "5 positive \n", + "6 positive \n", + "7 positive \n", + "8 positive \n", + "9 positive \n", + "10 positive \n", + "11 positive \n", + "12 positive \n", + "13 positive \n", + "14 positive \n", + "15 negative \n", + "16 negative \n", + "17 positive \n", + "18 positive \n", + "19 positive \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 negative \n", + "24 negative \n", + "25 negative \n", + "26 negative \n", + "27 positive \n", + "28 negative \n", + "29 positive \n", + "30 negative \n", + "31 positive \n", + "32 positive \n", + "33 negative \n", + "34 positive \n", + "35 positive \n", + "36 negative \n", + "37 positive \n", + "38 positive \n", + "39 negative \n", + "40 negative \n", + "41 negative \n", + "42 negative \n", + "43 negative \n", + "44 negative \n", + "45 positive \n", + "46 positive \n", + "47 negative \n", + "48 positive \n", + "49 positive " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "# Predict with the fitted pipeline on the same 50 training rows\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxWFzQOhjWC8", + "outputId": "29e977d1-d916-4b2c-db7f-3acad2f6bcaa" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj 
sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') 
returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + "nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", 
+ "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns 
Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + 
"nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP 
model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ], + "source": [ + "# We can use nlp.nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IKK_Ii_gjJfF", + "outputId": "c1ba963c-f065-4eed-88a8-35316200b992" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_768 download started this may take some time.\n", + "Approximate size to download 392.9 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.87 0.78 0.82 300\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.91 0.68 0.77 300\n", + "\n", + " accuracy 0.73 600\n", + " macro avg 0.59 0.49 0.53 600\n", + "weighted avg 0.89 0.73 0.80 600\n", + "\n" + ] + } + ], + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 
train.sentiment')\n", + "# We usually need to train longer and use a smaller LR for non-USE based sentence embeddings\n", + "# We could tune the hyperparameters further with hyperparameter tuning methods like grid search\n", + "# Longer training also tends to improve accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(70)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 5. Let's save the model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "eLex095goHwm" + }, + "outputs": [], + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 6. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlp.load(path=path_to_the_pipe) to load a model/pipeline from disk."
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "id": "SO4uz45MoRgp", + "outputId": "ac7bafb1-cd05-4082-86f8-403384f6edfa" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_from_disksentimentsentiment_confidence
0Indian prime minister was assinated[-0.09739536792039871, 0.23939242959022522, 0....negative0.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "text/plain": [ + " document \\\n", + "0 Indian prime minister was assinated \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [-0.09739536792039871, 0.23939242959022522, 0.... negative \n", + "\n", + " sentiment_confidence \n", + "0 0.0 " + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('Indian prime minister was assinated')\n", + "preds" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "b82bb85a-9063-49ca-8be2-51f4f735b1ea" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + 
"component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ], + "source": [ + "hdd_pipe.print_info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2LJTK79JKF9-" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_stock_market.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_stock_market.ipynb index 95dd396d..de01a31d 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_stock_market.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_stock_market.ipynb @@ -1 +1,3260 @@ -{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo_stock_market.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"display_name":"Python 
3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_stock_market.ipynb)\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 Class Demo Stock Market Sentiment Training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n","\n","You can achieve these results or even better on this dataset with training data:\n","\n","\n","
\n","\n","\n","![image.png]()\n","\n","\n","\n","\n","You can achieve these results or even better on this dataset with test data:\n","\n","\n","
\n","\n","![img.png]()\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620214514012,"user_tz":-120,"elapsed":124849,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"ef6ce0ba-22d9-46dc-f409-ed1c02167849"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 11:33:09-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 11:33:09 (53.5 MB/s) - written to stdout [1671/1671]\n","\n","\u001b[K |████████████████████████████████| 204.8MB 70kB/s \n","\u001b[K |████████████████████████████████| 153kB 48.1MB/s \n","\u001b[K |████████████████████████████████| 204kB 22.4MB/s \n","\u001b[K |████████████████████████████████| 204kB 49.2MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download Stock Market Sentiment dataset \n","https://www.kaggle.com/yash612/stockmarket-sentiment-dataset\n","#Context\n","\n","Stock news gathered from multiple Twitter handles covering economic news, divided into two classes: negative and positive."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620214515441,"user_tz":-120,"elapsed":126265,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"2bfba990-0ff7-4660-8ba5-8821eb540304"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/02/stock_data.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 11:35:13-- http://ckl-it.de/wp-content/uploads/2021/02/stock_data.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 758217 (740K) [text/csv]\n","Saving to: ‘stock_data.csv’\n","\n","stock_data.csv 100%[===================>] 740.45K 819KB/s in 0.9s \n","\n","2021-05-05 11:35:15 (819 KB/s) - ‘stock_data.csv’ saved [758217/758217]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":391},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620214516379,"user_tz":-120,"elapsed":127196,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"9495fbc4-9641-44c5-bfbc-85b564fb7b12"},"source":["import pandas as pd\n","train_path = '/content/stock_data.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","columns=['text','y']\n","train_df = train_df[columns]\n","from sklearn.model_selection import train_test_split\n","\n","train_df, test_df = train_test_split(train_df, test_size=0.2)\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
texty
3761P see what happened from October to Dec that s...negative
1428AAP Closed my short for positiveK Will short a...negative
3091AAP It may be wise to hold off on buying #AAP ...negative
620RT @DaveCBenoit: The banking system is not bui...negative
1611Sensex Slumps Over positive,000 Points From Da...negative
.........
1281The rodeo clown sent BK screaming into the S...negative
1895WMT breaking out of channel + All MAs are lini...positive
3458keep an eye on IDA presentation bout postive d...positive
3399DNDN Good place to lock in (some) profit at 7...positive
2303VVS breaking out of a flag setup. MACD cross-u...positive
\n","

3200 rows × 2 columns

\n","
"],"text/plain":[" text y\n","3761 P see what happened from October to Dec that s... negative\n","1428 AAP Closed my short for positiveK Will short a... negative\n","3091 AAP It may be wise to hold off on buying #AAP ... negative\n","620 RT @DaveCBenoit: The banking system is not bui... negative\n","1611 Sensex Slumps Over positive,000 Points From Da... negative\n","... ... ...\n","1281 The rodeo clown sent BK screaming into the S... negative\n","1895 WMT breaking out of channel + All MAs are lini... positive\n","3458 keep an eye on IDA presentation bout postive d... positive\n","3399 DNDN Good place to lock in (some) profit at 7... positive\n","2303 VVS breaking out of a flag setup. MACD cross-u... positive\n","\n","[3200 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620215080750,"user_tz":-120,"elapsed":569,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"2c6c1232-26f2-4727-e184-f343eb0d699f"},"source":["import nlu \n","from sklearn.metrics import classification_report\n","\n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a dataset with label and text columns\n","# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.00 0.00 0.00 22\n"," neutral 0.00 0.00 0.00 0\n"," positive 1.00 0.11 0.19 28\n","\n"," accuracy 0.06 50\n"," macro avg 0.33 0.04 0.06 50\n","weighted avg 0.56 0.06 0.11 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
ysentence_embedding_usetrained_sentiment_confidenceorigin_indextrained_sentimentdocumenttextsentence
0negative[-0.03631044551730156, -0.02809535153210163, -...0.5324203761neutralP see what happened from October to Dec that s...P see what happened from October to Dec that s...[P see what happened from October to Dec that ...
1negative[-0.019356619566679, 0.02282714657485485, 0.00...0.5286911428neutralAAP Closed my short for positiveK Will short a...AAP Closed my short for positiveK Will short a...[AAP Closed my short for positiveK Will short ...
2negative[0.03622237220406532, -0.06491724401712418, 0....0.5183083091neutralAAP It may be wise to hold off on buying #AAP ...AAP It may be wise to hold off on buying #AAP ...[AAP It may be wise to hold off on buying #AAP...
3negative[0.038874898105859756, 0.025729672983288765, -...0.538483620neutralRT @DaveCBenoit: The banking system is not bui...RT @DaveCBenoit: The banking system is not bui...[RT @DaveCBenoit:, The banking system is not ...
4negative[0.04230193793773651, -0.02588844671845436, -0...0.5263581611neutralSensex Slumps Over positive,000 Points From Da...Sensex Slumps Over positive,000 Points From Da...[Sensex Slumps Over positive,000 Points From D...
5negative[-0.000432533270213753, -0.009179827757179737,...0.530686876neutralSelling your home to a computer was supposed t...Selling your home to a computer was supposed t...[Selling your home to a computer was supposed ...
6positive[0.06600425392389297, -0.022004421800374985, -...0.5810161678neutraled Daily Triangle on HEO,..... pdating ong and...ed Daily Triangle on HEO,..... pdating ong an...[ed Daily Triangle on HEO,., .... pdating ong ...
7positive[-0.017160149291157722, -0.03617899864912033, ...0.549800482neutralP breaking out after expanding from Shark Patt...P breaking out after expanding from Shark Patt...[P breaking out after expanding from Shark Pat...
8positive[0.026538891717791557, -0.06604061275720596, -...0.5685792540neutralActing well above 26.positive9 flat base trigg...Acting well above 26.positive9 flat base trigg...[Acting well above 26.positive9 flat base trig...
9positive[0.08514805138111115, -0.024943925440311432, -...0.5240132183neutralCYBX broke out to all time highs accompained b...CYBX broke out to all time highs accompained b...[CYBX broke out to all time highs accompained ...
10positive[0.05149979144334793, -0.061731286346912384, -...0.5183143998neutralsolid day with MGM AAP SBX leading the way for...solid day with MGM AAP SBX leading the way for...[solid day with MGM AAP SBX leading the way fo...
11negative[0.03843201324343681, -0.0733688548207283, -0....0.51609119neutralAAP 465 is resistance and heading to 435 in th...AAP 465 is resistance and heading to 435 in th...[AAP 465 is resistance and heading to 435 in t...
12negative[0.02629232406616211, -0.014554189518094063, -...0.522850219neutralCO pointed out this wknd - huge Outside Day - ...CO pointed out this wknd - huge Outside Day - ...[CO pointed out this wknd - huge Outside Day -...
13positive[0.05966312810778618, -0.08997906744480133, -0...0.6118911041positiveVVS ong set up:VVS ong set up:[VVS ong set up:]
14positive[-0.003521521808579564, -0.011022194288671017,...0.5450922763neutralDDD come on 40! I got some calls at 0.positive...DDD come on 40! I got some calls at 0.positive...[DDD come on 40!, I got some calls at 0.posit...
15positive[0.07560215145349503, -0.009755599312484264, -...0.5360571683neutralWMB might fill this gap and reverse.WMB might fill this gap and reverse.[WMB might fill this gap and reverse.]
16negative[-0.0117484824731946, -0.007429660763591528, -...0.5360932565neutralHeard on the Street: Funeral providers would s...Heard on the Street: Funeral providers would s...[Heard on the Street: Funeral providers would ...
17positive[-0.03314466401934624, -0.04437091201543808, -...0.6207772065positiveuser welcome to the TK clubuser welcome to the TK club[user welcome to the TK club]
18positive[-0.018701869994401932, 0.03903575986623764, -...0.5894172308neutraled Weekly Triangle on DVAX,....Scaling Ped Weekly Triangle on DVAX,....Scaling P[ed Weekly Triangle on DVAX,., ., .., Scaling P]
19positive[-0.011695394292473793, 0.05609796568751335, -...0.5150303355neutralAAP What does Al Gore know that we don't?AAP What does Al Gore know that we don't?[AAP What does Al Gore know that we don't?]
20positive[0.04774581268429756, -0.03589113429188728, -0...0.5819802594neutraluser CSN, yesterday I bough at 25Kpositive.237...user CSN, yesterday I bough at 25Kpositive.237...[user CSN, yesterday I bough at 25Kpositive., ...
21positive[0.04126836359500885, -0.0008074558572843671, ...0.5546333099neutralFAO is deciding its direction #waitforitFAO is deciding its direction #waitforit[FAO is deciding its direction #waitforit]
22negative[0.017080938443541527, 0.021576160565018654, 0...0.5193002091neutralEN Take your profits and run. Nice run up,but ...EN Take your profits and run. Nice run up,but ...[EN Take your profits and run., Nice run up,bu...
23negative[0.04716206341981888, -0.021414492279291153, -...0.5340851819neutralGOOG reminds me so much of AAP in Sept bouncin...GOOG reminds me so much of AAP in Sept bounci...[GOOG reminds me so much of AAP in Sept bounci...
24positive[0.023834621533751488, -0.07077112793922424, -...0.5004513913neutralAAP doji being put in on 60 min after 7 down c...AAP doji being put in on 60 min after 7 down c...[AAP doji being put in on 60 min after 7 down ...
25negative[0.1026904433965683, 0.013861684128642082, -0....0.5103883117neutralJames Dinsmore attributes Bancroft Fundâ€â„...James Dinsmore attributes Bancroft Fundâ€â„...[James Dinsmore attributes Bancroft Fundâ€â...
26positive[0.006304154172539711, -0.0644269809126854, -0...0.518708695neutralSEV nicely green on red marketSEV nicely green on red market[SEV nicely green on red market]
27negative[0.06555721163749695, 0.023601243272423744, -0...0.5154422916neutralVS option trader closes out Feb 50C selling po...VS option trader closes out Feb 50C selling po...[VS option trader closes out Feb 50C selling p...
28positive[-0.0012494433904066682, -0.017743036150932312...0.596320692neutralAON Continuation on good volumeAON Continuation on good volume[AON Continuation on good volume]
| 29 | positive | [0.04852033779025078, 0.00623974809423089, -0.... | 0.513316 | 475 | neutral | Watch for SGY to break its downward trend line... | Watch for SGY to break its downward trend line... | [Watch for SGY to break its downward trend lin... |
| 30 | positive | [0.05204714462161064, 0.03000013716518879, -0.... | 0.556872 | 3159 | neutral | SXC - breaking above key downward channel leve... | SXC - breaking above key downward channel leve... | [SXC - breaking above key downward channel lev... |
| 31 | negative | [0.053190235048532486, -0.023420613259077072, ... | 0.524574 | 2102 | neutral | Government May Slash Borrowing From Market In ... | Government May Slash Borrowing From Market In ... | [Government May Slash Borrowing From Market In... |
| 32 | negative | [0.013695711269974709, 0.031249094754457474, -... | 0.501941 | 1689 | neutral | CS - To those that doubted me; today is you're... | CS - To those that doubted me; today is you're... | [CS - To those that doubted me; today is you'r... |
| 33 | positive | [0.03522776812314987, -0.00036802110844291747,... | 0.507491 | 2455 | neutral | PAB has resistance from June to Sept but Over ... | PAB has resistance from June to Sept but Over ... | [PAB has resistance from June to Sept but Over... |
| 34 | negative | [-0.05846807360649109, -0.027525130659341812, ... | 0.523526 | 2337 | neutral | BAC anyone think this might slush and fall bel... | BAC anyone think this might slush and fall bel... | [BAC anyone think this might slush and fall be... |
| 35 | positive | [0.05771354213356972, -0.03869543969631195, -0... | 0.503713 | 3529 | neutral | with little whips on pops and drops don't go a... | with little whips on pops and drops don't go a... | [with little whips on pops and drops don't go ... |
| 36 | negative | [-0.049992404878139496, -0.017947403714060783,... | 0.536274 | 2076 | neutral | Green Weekly Triangle on CYTX,....pdating | Green Weekly Triangle on CYTX,....pdating | [Green Weekly Triangle on CYTX,., ...pdating] |
| 37 | positive | [0.006133062299340963, -0.05417517572641373, -... | 0.546319 | 3012 | neutral | reports next Wed. user: ATW: Barclays starts a... | reports next Wed. user: ATW: Barclays starts ... | [reports next Wed. user: ATW:, Barclays start... |
| 38 | negative | [0.041309256106615067, 0.02620919793844223, -0... | 0.518853 | 3805 | neutral | NKD target positive50 - positive2 points to be... | NKD target positive50 - positive2 points to be... | [NKD target positive50 - positive2 points to b... |
| 39 | positive | [0.0385039821267128, -0.040522705763578415, -0... | 0.522853 | 3789 | neutral | Positive GOOG earnings pushed NQ_F higher and ... | Positive GOOG earnings pushed NQ_F higher and ... | [Positive GOOG earnings pushed NQ_F higher and... |
| 40 | positive | [0.05128449201583862, -0.04454624652862549, -0... | 0.552043 | 3625 | neutral | SSQ pulling back as suspected. uckily sold nea... | SSQ pulling back as suspected. uckily sold nea... | [SSQ pulling back as suspected., uckily sold n... |
| 41 | negative | [0.018300753086805344, -0.04134758561849594, -... | 0.519100 | 851 | neutral | NKE potential short for many reasons. Check F ... | NKE potential short for many reasons. Check F ... | [NKE potential short for many reasons., Check ... |
| 42 | negative | [0.09244795143604279, 0.015826726332306862, -0... | 0.534414 | 608 | neutral | Supply-chain finance has become popular in rec... | Supply-chain finance has become popular in rec... | [Supply-chain finance has become popular in re... |
| 43 | positive | [0.042959753423929214, 0.004588234703987837, -... | 0.536310 | 546 | neutral | AEG Over 28.23 | AEG Over 28.23 | [AEG Over 28.23] |
| 44 | negative | [0.004285166040062904, -0.07630395889282227, -... | 0.506102 | 1225 | neutral | AAP Free-falling Now to 457 | AAP Free-falling Now to 457 | [AAP Free-falling Now to 457] |
| 45 | positive | [0.017030321061611176, -0.022644072771072388, ... | 0.500568 | 1227 | neutral | BAC positivepositive.75 to positivepositive.80... | BAC positivepositive.75 to positivepositive.80... | [BAC positivepositive., 75 to positivepositive... |
| 46 | positive | [0.030308598652482033, 0.02380680851638317, -0... | 0.513642 | 3991 | neutral | NVDA not working so far still holding though. | NVDA not working so far still holding though. | [NVDA not working so far still holding though.] |
| 47 | negative | [0.05175251513719559, 0.05424535647034645, -0.... | 0.512999 | 1467 | neutral | CAT Once she loses 86.50 ookout Below!!! Strai... | CAT Once she loses 86.50 ookout Below!!! Strai... | [CAT Once she loses 86.50 ookout Below!, !!, S... |
| 48 | positive | [-0.022420072928071022, -0.06559137254953384, ... | 0.600178 | 1000 | positive | My setup alerts went bonkers today..One of man... | My setup alerts went bonkers today..One of man... | [My setup alerts went bonkers today., ., One o... |
| 49 | positive | [0.03310457989573479, -0.038539715111255646, -... | 0.554509 | 282 | neutral | SWHC Obama/Biden to speak tomorrow on gun cont... | SWHC Obama/Biden to speak tomorrow on gun cont... | [SWHC Obama/Biden to speak tomorrow on gun con... |
\n","
"],"text/plain":[" y ... sentence\n","0 negative ... [P see what happened from October to Dec that ...\n","1 negative ... [AAP Closed my short for positiveK Will short ...\n","2 negative ... [AAP It may be wise to hold off on buying #AAP...\n","3 negative ... [RT @DaveCBenoit:, The banking system is not ...\n","4 negative ... [Sensex Slumps Over positive,000 Points From D...\n","5 negative ... [Selling your home to a computer was supposed ...\n","6 positive ... [ed Daily Triangle on HEO,., .... pdating ong ...\n","7 positive ... [P breaking out after expanding from Shark Pat...\n","8 positive ... [Acting well above 26.positive9 flat base trig...\n","9 positive ... [CYBX broke out to all time highs accompained ...\n","10 positive ... [solid day with MGM AAP SBX leading the way fo...\n","11 negative ... [AAP 465 is resistance and heading to 435 in t...\n","12 negative ... [CO pointed out this wknd - huge Outside Day -...\n","13 positive ... [VVS ong set up:]\n","14 positive ... [DDD come on 40!, I got some calls at 0.posit...\n","15 positive ... [WMB might fill this gap and reverse.]\n","16 negative ... [Heard on the Street: Funeral providers would ...\n","17 positive ... [user welcome to the TK club]\n","18 positive ... [ed Weekly Triangle on DVAX,., ., .., Scaling P]\n","19 positive ... [AAP What does Al Gore know that we don't?]\n","20 positive ... [user CSN, yesterday I bough at 25Kpositive., ...\n","21 positive ... [FAO is deciding its direction #waitforit]\n","22 negative ... [EN Take your profits and run., Nice run up,bu...\n","23 negative ... [GOOG reminds me so much of AAP in Sept bounci...\n","24 positive ... [AAP doji being put in on 60 min after 7 down ...\n","25 negative ... [James Dinsmore attributes Bancroft Fundâ€â...\n","26 positive ... [SEV nicely green on red market]\n","27 negative ... [VS option trader closes out Feb 50C selling p...\n","28 positive ... [AON Continuation on good volume]\n","29 positive ... 
[Watch for SGY to break its downward trend lin...\n","30 positive ... [SXC - breaking above key downward channel lev...\n","31 negative ... [Government May Slash Borrowing From Market In...\n","32 negative ... [CS - To those that doubted me; today is you'r...\n","33 positive ... [PAB has resistance from June to Sept but Over...\n","34 negative ... [BAC anyone think this might slush and fall be...\n","35 positive ... [with little whips on pops and drops don't go ...\n","36 negative ... [Green Weekly Triangle on CYTX,., ...pdating]\n","37 positive ... [reports next Wed. user: ATW:, Barclays start...\n","38 negative ... [NKD target positive50 - positive2 points to b...\n","39 positive ... [Positive GOOG earnings pushed NQ_F higher and...\n","40 positive ... [SSQ pulling back as suspected., uckily sold n...\n","41 negative ... [NKE potential short for many reasons., Check ...\n","42 negative ... [Supply-chain finance has become popular in re...\n","43 positive ... [AEG Over 28.23]\n","44 negative ... [AAP Free-falling Now to 457]\n","45 positive ... [BAC positivepositive., 75 to positivepositive...\n","46 positive ... [NVDA not working so far still holding though.]\n","47 negative ... [CAT Once she loses 86.50 ookout Below!, !!, S...\n","48 positive ... [My setup alerts went bonkers today., ., One o...\n","49 positive ... [SWHC Obama/Biden to speak tomorrow on gun con...\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# 4. 
Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":76},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620215088558,"user_tz":-120,"elapsed":1570,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"54f6d037-8cbc-4273-8c9c-04e5599e76cf"},"source":["fitted_pipe.predict(\"Bitcoin dropped by 50 percent!\")"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","
|  | sentence_embedding_use | trained_sentiment_confidence | origin_index | trained_sentiment | document | sentence |
| 0 | [0.06509937345981598, -0.05708129703998566, -0... | 0.502021 | 0 | neutral | Bitcoin dropped by 50 percent! | [Bitcoin dropped by 50 percent!] |
\n","
"],"text/plain":[" sentence_embedding_use ... sentence\n","0 [0.06509937345981598, -0.05708129703998566, -0... ... [Bitcoin dropped by 50 percent!]\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## 5. Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620215088559,"user_tz":-120,"elapsed":1284,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"e6190008-9310-45be-f3ac-e47fc541a70f"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@45181116) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@45181116\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## 6. Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620215094052,"user_tz":-120,"elapsed":6547,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"b0386b8b-b130-4d65-d972-1ecb8cefbe17"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5) \n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 1.00 0.73 0.84 22\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.90 1.00 0.95 28\n","\n"," accuracy 0.88 50\n"," macro avg 0.63 0.58 0.60 50\n","weighted avg 0.95 0.88 0.90 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","
|  | y | sentence_embedding_use | trained_sentiment_confidence | origin_index | trained_sentiment | document | text | sentence |
| 0 | negative | [-0.03631044551730156, -0.02809535153210163, -... | 0.673180 | 3761 | negative | P see what happened from October to Dec that s... | P see what happened from October to Dec that s... | [P see what happened from October to Dec that ... |
| 1 | negative | [-0.019356619566679, 0.02282714657485485, 0.00... | 0.692276 | 1428 | negative | AAP Closed my short for positiveK Will short a... | AAP Closed my short for positiveK Will short a... | [AAP Closed my short for positiveK Will short ... |
| 2 | negative | [0.03622237220406532, -0.06491724401712418, 0.... | 0.704943 | 3091 | negative | AAP It may be wise to hold off on buying #AAP ... | AAP It may be wise to hold off on buying #AAP ... | [AAP It may be wise to hold off on buying #AAP... |
| 3 | negative | [0.038874898105859756, 0.025729672983288765, -... | 0.737828 | 620 | negative | RT @DaveCBenoit: The banking system is not bui... | RT @DaveCBenoit: The banking system is not bui... | [RT @DaveCBenoit:, The banking system is not ... |
| 4 | negative | [0.04230193793773651, -0.02588844671845436, -0... | 0.690660 | 1611 | negative | Sensex Slumps Over positive,000 Points From Da... | Sensex Slumps Over positive,000 Points From Da... | [Sensex Slumps Over positive,000 Points From D... |
| 5 | negative | [-0.000432533270213753, -0.009179827757179737,... | 0.674662 | 876 | negative | Selling your home to a computer was supposed t... | Selling your home to a computer was supposed t... | [Selling your home to a computer was supposed ... |
| 6 | positive | [0.06600425392389297, -0.022004421800374985, -... | 0.983413 | 1678 | positive | ed Daily Triangle on HEO,..... pdating ong and... | ed Daily Triangle on HEO,..... pdating ong an... | [ed Daily Triangle on HEO,., .... pdating ong ... |
| 7 | positive | [-0.017160149291157722, -0.03617899864912033, ... | 0.958774 | 482 | positive | P breaking out after expanding from Shark Patt... | P breaking out after expanding from Shark Patt... | [P breaking out after expanding from Shark Pat... |
| 8 | positive | [0.026538891717791557, -0.06604061275720596, -... | 0.945701 | 2540 | positive | Acting well above 26.positive9 flat base trigg... | Acting well above 26.positive9 flat base trigg... | [Acting well above 26.positive9 flat base trig... |
| 9 | positive | [0.08514805138111115, -0.024943925440311432, -... | 0.897408 | 2183 | positive | CYBX broke out to all time highs accompained b... | CYBX broke out to all time highs accompained b... | [CYBX broke out to all time highs accompained ... |
| 10 | positive | [0.05149979144334793, -0.061731286346912384, -... | 0.839700 | 3998 | positive | solid day with MGM AAP SBX leading the way for... | solid day with MGM AAP SBX leading the way for... | [solid day with MGM AAP SBX leading the way fo... |
| 11 | negative | [0.03843201324343681, -0.0733688548207283, -0.... | 0.649359 | 19 | negative | AAP 465 is resistance and heading to 435 in th... | AAP 465 is resistance and heading to 435 in th... | [AAP 465 is resistance and heading to 435 in t... |
| 12 | negative | [0.02629232406616211, -0.014554189518094063, -... | 0.696317 | 219 | negative | CO pointed out this wknd - huge Outside Day - ... | CO pointed out this wknd - huge Outside Day - ... | [CO pointed out this wknd - huge Outside Day -... |
| 13 | positive | [0.05966312810778618, -0.08997906744480133, -0... | 0.994020 | 1041 | positive | VVS ong set up: | VVS ong set up: | [VVS ong set up:] |
| 14 | positive | [-0.003521521808579564, -0.011022194288671017,... | 0.860149 | 2763 | positive | DDD come on 40! I got some calls at 0.positive... | DDD come on 40! I got some calls at 0.positive... | [DDD come on 40!, I got some calls at 0.posit... |
| 15 | positive | [0.07560215145349503, -0.009755599312484264, -... | 0.961940 | 1683 | positive | WMB might fill this gap and reverse. | WMB might fill this gap and reverse. | [WMB might fill this gap and reverse.] |
| 16 | negative | [-0.0117484824731946, -0.007429660763591528, -... | 0.703831 | 2565 | negative | Heard on the Street: Funeral providers would s... | Heard on the Street: Funeral providers would s... | [Heard on the Street: Funeral providers would ... |
| 17 | positive | [-0.03314466401934624, -0.04437091201543808, -... | 0.994179 | 2065 | positive | user welcome to the TK club | user welcome to the TK club | [user welcome to the TK club] |
| 18 | positive | [-0.018701869994401932, 0.03903575986623764, -... | 0.991520 | 2308 | positive | ed Weekly Triangle on DVAX,....Scaling P | ed Weekly Triangle on DVAX,....Scaling P | [ed Weekly Triangle on DVAX,., ., .., Scaling P] |
| 19 | positive | [-0.011695394292473793, 0.05609796568751335, -... | 0.725599 | 3355 | positive | AAP What does Al Gore know that we don't? | AAP What does Al Gore know that we don't? | [AAP What does Al Gore know that we don't?] |
| 20 | positive | [0.04774581268429756, -0.03589113429188728, -0... | 0.973669 | 2594 | positive | user CSN, yesterday I bough at 25Kpositive.237... | user CSN, yesterday I bough at 25Kpositive.237... | [user CSN, yesterday I bough at 25Kpositive., ... |
| 21 | positive | [0.04126836359500885, -0.0008074558572843671, ... | 0.986087 | 3099 | positive | FAO is deciding its direction #waitforit | FAO is deciding its direction #waitforit | [FAO is deciding its direction #waitforit] |
| 22 | negative | [0.017080938443541527, 0.021576160565018654, 0... | 0.675147 | 2091 | negative | EN Take your profits and run. Nice run up,but ... | EN Take your profits and run. Nice run up,but ... | [EN Take your profits and run., Nice run up,bu... |
| 23 | negative | [0.04716206341981888, -0.021414492279291153, -... | 0.759626 | 1819 | negative | GOOG reminds me so much of AAP in Sept bouncin... | GOOG reminds me so much of AAP in Sept bounci... | [GOOG reminds me so much of AAP in Sept bounci... |
| 24 | positive | [0.023834621533751488, -0.07077112793922424, -... | 0.693293 | 3913 | positive | AAP doji being put in on 60 min after 7 down c... | AAP doji being put in on 60 min after 7 down c... | [AAP doji being put in on 60 min after 7 down ... |
| 25 | negative | [0.1026904433965683, 0.013861684128642082, -0.... | 0.587268 | 3117 | neutral | James Dinsmore attributes Bancroft Fundâ€â„... | James Dinsmore attributes Bancroft Fundâ€â„... | [James Dinsmore attributes Bancroft Fundâ€â... |
| 26 | positive | [0.006304154172539711, -0.0644269809126854, -0... | 0.875854 | 695 | positive | SEV nicely green on red market | SEV nicely green on red market | [SEV nicely green on red market] |
| 27 | negative | [0.06555721163749695, 0.023601243272423744, -0... | 0.674148 | 2916 | negative | VS option trader closes out Feb 50C selling po... | VS option trader closes out Feb 50C selling po... | [VS option trader closes out Feb 50C selling p... |
| 28 | positive | [-0.0012494433904066682, -0.017743036150932312... | 0.984582 | 692 | positive | AON Continuation on good volume | AON Continuation on good volume | [AON Continuation on good volume] |
| 29 | positive | [0.04852033779025078, 0.00623974809423089, -0.... | 0.778999 | 475 | positive | Watch for SGY to break its downward trend line... | Watch for SGY to break its downward trend line... | [Watch for SGY to break its downward trend lin... |
| 30 | positive | [0.05204714462161064, 0.03000013716518879, -0.... | 0.983867 | 3159 | positive | SXC - breaking above key downward channel leve... | SXC - breaking above key downward channel leve... | [SXC - breaking above key downward channel lev... |
| 31 | negative | [0.053190235048532486, -0.023420613259077072, ... | 0.725456 | 2102 | negative | Government May Slash Borrowing From Market In ... | Government May Slash Borrowing From Market In ... | [Government May Slash Borrowing From Market In... |
| 32 | negative | [0.013695711269974709, 0.031249094754457474, -... | 0.577159 | 1689 | neutral | CS - To those that doubted me; today is you're... | CS - To those that doubted me; today is you're... | [CS - To those that doubted me; today is you'r... |
| 33 | positive | [0.03522776812314987, -0.00036802110844291747,... | 0.749848 | 2455 | positive | PAB has resistance from June to Sept but Over ... | PAB has resistance from June to Sept but Over ... | [PAB has resistance from June to Sept but Over... |
| 34 | negative | [-0.05846807360649109, -0.027525130659341812, ... | 0.683594 | 2337 | negative | BAC anyone think this might slush and fall bel... | BAC anyone think this might slush and fall bel... | [BAC anyone think this might slush and fall be... |
| 35 | positive | [0.05771354213356972, -0.03869543969631195, -0... | 0.776892 | 3529 | positive | with little whips on pops and drops don't go a... | with little whips on pops and drops don't go a... | [with little whips on pops and drops don't go ... |
| 36 | negative | [-0.049992404878139496, -0.017947403714060783,... | 0.915735 | 2076 | positive | Green Weekly Triangle on CYTX,....pdating | Green Weekly Triangle on CYTX,....pdating | [Green Weekly Triangle on CYTX,., ...pdating] |
| 37 | positive | [0.006133062299340963, -0.05417517572641373, -... | 0.925494 | 3012 | positive | reports next Wed. user: ATW: Barclays starts a... | reports next Wed. user: ATW: Barclays starts ... | [reports next Wed. user: ATW:, Barclays start... |
| 38 | negative | [0.041309256106615067, 0.02620919793844223, -0... | 0.725568 | 3805 | positive | NKD target positive50 - positive2 points to be... | NKD target positive50 - positive2 points to be... | [NKD target positive50 - positive2 points to b... |
| 39 | positive | [0.0385039821267128, -0.040522705763578415, -0... | 0.879851 | 3789 | positive | Positive GOOG earnings pushed NQ_F higher and ... | Positive GOOG earnings pushed NQ_F higher and ... | [Positive GOOG earnings pushed NQ_F higher and... |
| 40 | positive | [0.05128449201583862, -0.04454624652862549, -0... | 0.964138 | 3625 | positive | SSQ pulling back as suspected. uckily sold nea... | SSQ pulling back as suspected. uckily sold nea... | [SSQ pulling back as suspected., uckily sold n... |
| 41 | negative | [0.018300753086805344, -0.04134758561849594, -... | 0.644012 | 851 | positive | NKE potential short for many reasons. Check F ... | NKE potential short for many reasons. Check F ... | [NKE potential short for many reasons., Check ... |
| 42 | negative | [0.09244795143604279, 0.015826726332306862, -0... | 0.729759 | 608 | negative | Supply-chain finance has become popular in rec... | Supply-chain finance has become popular in rec... | [Supply-chain finance has become popular in re... |
| 43 | positive | [0.042959753423929214, 0.004588234703987837, -... | 0.985379 | 546 | positive | AEG Over 28.23 | AEG Over 28.23 | [AEG Over 28.23] |
| 44 | negative | [0.004285166040062904, -0.07630395889282227, -... | 0.604251 | 1225 | negative | AAP Free-falling Now to 457 | AAP Free-falling Now to 457 | [AAP Free-falling Now to 457] |
| 45 | positive | [0.017030321061611176, -0.022644072771072388, ... | 0.839384 | 1227 | positive | BAC positivepositive.75 to positivepositive.80... | BAC positivepositive.75 to positivepositive.80... | [BAC positivepositive., 75 to positivepositive... |
| 46 | positive | [0.030308598652482033, 0.02380680851638317, -0... | 0.888166 | 3991 | positive | NVDA not working so far still holding though. | NVDA not working so far still holding though. | [NVDA not working so far still holding though.] |
| 47 | negative | [0.05175251513719559, 0.05424535647034645, -0.... | 0.569252 | 1467 | neutral | CAT Once she loses 86.50 ookout Below!!! Strai... | CAT Once she loses 86.50 ookout Below!!! Strai... | [CAT Once she loses 86.50 ookout Below!, !!, S... |
| 48 | positive | [-0.022420072928071022, -0.06559137254953384, ... | 0.986396 | 1000 | positive | My setup alerts went bonkers today..One of man... | My setup alerts went bonkers today..One of man... | [My setup alerts went bonkers today., ., One o... |
| 49 | positive | [0.03310457989573479, -0.038539715111255646, -... | 0.925892 | 282 | positive | SWHC Obama/Biden to speak tomorrow on gun cont... | SWHC Obama/Biden to speak tomorrow on gun cont... | [SWHC Obama/Biden to speak tomorrow on gun con... |
\n","
"],"text/plain":[" y ... sentence\n","0 negative ... [P see what happened from October to Dec that ...\n","1 negative ... [AAP Closed my short for positiveK Will short ...\n","2 negative ... [AAP It may be wise to hold off on buying #AAP...\n","3 negative ... [RT @DaveCBenoit:, The banking system is not ...\n","4 negative ... [Sensex Slumps Over positive,000 Points From D...\n","5 negative ... [Selling your home to a computer was supposed ...\n","6 positive ... [ed Daily Triangle on HEO,., .... pdating ong ...\n","7 positive ... [P breaking out after expanding from Shark Pat...\n","8 positive ... [Acting well above 26.positive9 flat base trig...\n","9 positive ... [CYBX broke out to all time highs accompained ...\n","10 positive ... [solid day with MGM AAP SBX leading the way fo...\n","11 negative ... [AAP 465 is resistance and heading to 435 in t...\n","12 negative ... [CO pointed out this wknd - huge Outside Day -...\n","13 positive ... [VVS ong set up:]\n","14 positive ... [DDD come on 40!, I got some calls at 0.posit...\n","15 positive ... [WMB might fill this gap and reverse.]\n","16 negative ... [Heard on the Street: Funeral providers would ...\n","17 positive ... [user welcome to the TK club]\n","18 positive ... [ed Weekly Triangle on DVAX,., ., .., Scaling P]\n","19 positive ... [AAP What does Al Gore know that we don't?]\n","20 positive ... [user CSN, yesterday I bough at 25Kpositive., ...\n","21 positive ... [FAO is deciding its direction #waitforit]\n","22 negative ... [EN Take your profits and run., Nice run up,bu...\n","23 negative ... [GOOG reminds me so much of AAP in Sept bounci...\n","24 positive ... [AAP doji being put in on 60 min after 7 down ...\n","25 negative ... [James Dinsmore attributes Bancroft Fundâ€â...\n","26 positive ... [SEV nicely green on red market]\n","27 negative ... [VS option trader closes out Feb 50C selling p...\n","28 positive ... [AON Continuation on good volume]\n","29 positive ... 
[Watch for SGY to break its downward trend lin...\n","30 positive ... [SXC - breaking above key downward channel lev...\n","31 negative ... [Government May Slash Borrowing From Market In...\n","32 negative ... [CS - To those that doubted me; today is you'r...\n","33 positive ... [PAB has resistance from June to Sept but Over...\n","34 negative ... [BAC anyone think this might slush and fall be...\n","35 positive ... [with little whips on pops and drops don't go ...\n","36 negative ... [Green Weekly Triangle on CYTX,., ...pdating]\n","37 positive ... [reports next Wed. user: ATW:, Barclays start...\n","38 negative ... [NKD target positive50 - positive2 points to b...\n","39 positive ... [Positive GOOG earnings pushed NQ_F higher and...\n","40 positive ... [SSQ pulling back as suspected., uckily sold n...\n","41 negative ... [NKE potential short for many reasons., Check ...\n","42 negative ... [Supply-chain finance has become popular in re...\n","43 positive ... [AEG Over 28.23]\n","44 negative ... [AAP Free-falling Now to 457]\n","45 positive ... [BAC positivepositive., 75 to positivepositive...\n","46 positive ... [NVDA not working so far still holding though.]\n","47 negative ... [CAT Once she loses 86.50 ookout Below!, !!, S...\n","48 positive ... [My setup alerts went bonkers today., ., One o...\n","49 positive ... [SWHC Obama/Biden to speak tomorrow on gun con...\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":8}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# 7. 
Try training with different Embeddings"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nxWFzQOhjWC8","executionInfo":{"status":"ok","timestamp":1620215094053,"user_tz":-120,"elapsed":6283,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"f410569a-7055-427d-c7dd-1388947b8f37"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP 
model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns 
Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model 
sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"IKK_Ii_gjJfF","executionInfo":{"status":"ok","timestamp":1620216889890,"user_tz":-120,"elapsed":1801997,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"3ef316f8-15cb-47fa-e781-d26594a43df4"},"source":["trainable_pipe = nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# Non-USE sentence embeddings usually need longer training and a smaller learning rate (LR)\n","# We could tune the hyperparameters further with hyperparameter tuning methods like grid search\n","# Longer training also tends to improve accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# predict with the fitted pipeline on the training dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.81 0.68 0.74 1586\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.80 0.73 0.76 1614\n","\n"," accuracy 0.71 3200\n"," macro avg 0.54 0.47 0.50 3200\n","weighted avg 0.80 0.71 0.75 3200\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_1jxw3GnVGlI"},"source":["# 7.1 Evaluate on Test Data"]},{"cell_type":"code","metadata":{"id":"Fxx4yNkNVGFl","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620217006865,"user_tz":-120,"elapsed":1918670,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"41c4545d-ae7f-47e8-8d6a-19d1a555a10c"},"source":["preds = fitted_pipe.predict(test_df,output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.76 0.62 0.68 414\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.72 0.67 0.69 386\n","\n"," accuracy 0.64 800\n"," macro avg 0.49 0.43 0.46 800\n","weighted avg 0.74 0.64 0.69 800\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 8. 
Let's save the model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eLex095goHwm","executionInfo":{"status":"ok","timestamp":1620217179981,"user_tz":-120,"elapsed":2091513,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"0f025bcd-eab2-47ee-8440-0425138fea27"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 9. Let's load the model from disk.\n","This makes offline NLU usage possible! \n","Call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":76},"id":"SO4uz45MoRgp","executionInfo":{"status":"ok","timestamp":1620217194005,"user_tz":-120,"elapsed":2104313,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"004beb68-0b86-41b8-efb7-0c6382311f17"},"source":["hdd_pipe = nlu.load(path=\"./models/classifier_dl_trained\")\n","\n","preds = hdd_pipe.predict('Bitcoin dropped by 50 percent!!')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
| | sentiment | sentence_embedding_from_disk | sentiment_confidence | document | text | origin_index | sentence |\n","
| 0 | [negative, negative] | [[0.17410096526145935, 0.14491602778434753, 0.... | [0.7221757, 0.7221757] | Bitcoin dropped by 50 percent!! | Bitcoin dropped by 50 percent!! | 8589934592 | [Bitcoin dropped by 50 percent!, !] |\n","
"],"text/plain":[" sentiment ... sentence\n","0 [negative, negative] ... [Bitcoin dropped by 50 percent!, !]\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":13}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"e0CVlkk9v6Qi","executionInfo":{"status":"ok","timestamp":1620217194006,"user_tz":-120,"elapsed":2104171,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"206b4d33-57de-4b00-a731-8182c4359e00"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@160b9ba) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@160b9ba\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"-CdcbSd7WEpm"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_stock_market.ipynb)\n", + "\n", + "\n", + "# Training a Sentiment Analysis Classifier with NLU\n", + "## 2 Class Demo Stock Market Sentiment Training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem\n", + "\n", + "This notebook showcases the following features :\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (Enables NLU offline mode)\n", + "\n", + "\n", + "You can achieve these results or even better on this dataset with training data:\n", + "\n", + "\n", + "
\n", + "\n", + "\n", + "![image.png]()\n", + "\n", + "\n", + "\n", + "\n", + "You can achieve these results or even better on this dataset with test data:\n", + "\n", + "\n", + "
\n", + "\n", + "![img.png]()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Install Java 8 and NLU" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "!pip install -q johnsnowlabs\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download Stock Market Sentiment dataset\n", + "https://www.kaggle.com/yash612/stockmarket-sentiment-dataset\n", + "#Context\n", + "\n", + "Gathered Stock news from Multiple twitter Handles regarding Economic news dividing into two parts : Negative and positive." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "source": [ + "! wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/stock_data/stock_data.csv\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "2bb5e57a-472c-4812-dca5-aa29adad67ae" + }, + "source": [ + "import pandas as pd\n", + "train_path = '/content/stock_data.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text'\n", + "columns=['text','y']\n", + "train_df = train_df[columns]\n", + "train_df.y = train_df.y.astype(str)\n", + "train_df.y = train_df.y.str.replace('-1','negative')\n", + "train_df.y = train_df.y.str.replace('1','positive')\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_df, test_df = train_test_split(train_df, test_size=0.2)\n", + "train_df" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " text y\n", + "3971 Taking profits on CMG from 321, not time to a ... 
positive\n", + "4113 STI 5 min 1 - and 30 min opening range, added ... positive\n", + "1878 ong EN with stop arnd 39.40- entry 40.10 positive\n", + "3270 Adding short BWS to portfolio. I think next ye... negative\n", + "5073 The banks for years rode consumer spending and... negative\n", + "... ... ...\n", + "5591 Sensex, Nifty End Mixed Despite RBI's Steep Re... negative\n", + "2338 ZNGA Merrill upgraded this on 2/5 - just a ret... negative\n", + "984 AMAT reached very high overbought conditions. ... positive\n", + "2883 GOOG reminds me so much of AAP in Sept bounci... negative\n", + "4309 AAP 425 has been the BY target for the last fe... positive\n", + "\n", + "[4632 rows x 2 columns]" + ], + "text/html": [ + "\n", + "
| | text | y |\n", + "
| 3971 | Taking profits on CMG from 321, not time to a ... | positive |\n", + "
| 4113 | STI 5 min 1 - and 30 min opening range, added ... | positive |\n", + "
| 1878 | ong EN with stop arnd 39.40- entry 40.10 | positive |\n", + "
| 3270 | Adding short BWS to portfolio. I think next ye... | negative |\n", + "
| 5073 | The banks for years rode consumer spending and... | negative |\n", + "
| ... | ... | ... |\n", + "
| 5591 | Sensex, Nifty End Mixed Despite RBI's Steep Re... | negative |\n", + "
| 2338 | ZNGA Merrill upgraded this on 2/5 - just a ret... | negative |\n", + "
| 984 | AMAT reached very high overbought conditions. ... | positive |\n", + "
| 2883 | GOOG reminds me so much of AAP in Sept bounci... | negative |\n", + "
| 4309 | AAP 425 has been the BY target for the last fe... | positive |\n", + "
4632 rows × 2 columns
\n" + ] + }, + "metadata": {}, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. Train Deep Learning Classifier using nlp.load('train.sentiment')\n", + "\n", + "Your dataset's label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "91339d2a-72ec-490e-8531-2481e68d1447" + }, + "source": [ + "from johnsnowlabs import nlp\n", + "from sklearn.metrics import classification_report\n", + "\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# by default the Universal Sentence Encoder (USE) sentence embeddings are used\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the fitted pipeline on the same 50 training rows and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "# The sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/pipeline.py:149: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " dataset.y = dataset.y.apply(str)\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 
0.00 17\n", + " positive 0.66 1.00 0.80 33\n", + "\n", + " accuracy 0.66 50\n", + " macro avg 0.33 0.50 0.40 50\n", + "weighted avg 0.44 0.66 0.52 50\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Taking profits on CMG from 321, not time to a ... \n", + "1 STI 5 min 1 - and 30 min opening range, added ... 
\n", + "2 ong EN with stop arnd 39.40- entry 40.10 \n", + "3 Adding short BWS to portfolio. I think next ye... \n", + "4 The banks for years rode consumer spending and... \n", + "5 of some of the most watched biotechs AMGN look... \n", + "6 GOOG here is the leader of the pack for the ri... \n", + "7 user another good read. keep em coming and tha... \n", + "8 GTXI long 5.16, will take early assuming no ga... \n", + "9 Sen. Kelly Loeffler and her husband, New York ... \n", + "10 WMT if i had enough cash on hand i'd be shorti... \n", + "11 BAC Bank Of America Is The Best Bank Stock On ... \n", + "12 agree user: hedge funds sold AAP in Q4. We'll ... \n", + "13 user: Mr AAP is going to have to stop hanging ... \n", + "14 BAC Obama is slowing the rally... Ouch ! \n", + "15 A couple biotech stocks that are setting up we... \n", + "16 BAC next stop 10.50 \n", + "17 STEM continuing move up - possiby getting some... \n", + "18 CDX taking some off \n", + "19 AJ stopped out +6% for a nice gain \n", + "20 with FCX gapping well above ideal entry lookin... \n", + "21 Rupee Edges Lower To 76.43 Against Dollar Amid... \n", + "22 ACX Today's P reads very positive.Would like t... \n", + "23 KWK this one is heavily oversold here, think w... \n", + "24 user: GEVO the beginning of a new uptrend: \n", + "25 Health insurance stocks should hold up fairly ... \n", + "26 CBMX that is unbelievable!!!!!!! but I am happ... \n", + "27 Global markets rise following fresh signals th... \n", + "28 My SHOTS, various strats, AXDX CMCO DGIT D ECY... \n", + "29 this could be the last good chance to short AMZN \n", + "30 HFC - Great group. ooks good. More here -> \n", + "31 RT @WSJheard: Heard on the Street's @jackycwon... \n", + "32 MON How quickly investors forget the massive b... \n", + "33 EA if price doesn't hold above 9SMA then 16.71... \n", + "34 NKD well above the moving averages, look for s... \n", + "35 AAP looking to take some off around 429, shoul... 
\n", + "36 Selling ICE Short check out my video analysis \n", + "37 CAT Bingo, it is Bingo everywhere today. \n", + "38 HAO like it on a pop over 6 w vol \n", + "39 user: AAP nothing like firing of CEO to make i... \n", + "40 Coronavirus Crisis: GoAir Decides To Reduce Pa... \n", + "41 As the coronavirus pandemic intensifies, adher... \n", + "42 notable 52wk highs [20 > /sh < 50] AFCE AK ACO... \n", + "43 AAP PMI Manufacturing Index ---> In Few minute... \n", + "44 SPW pauses, SCTY continues its run, up > 20% i... \n", + "45 RT @josephttwallace: Glutted Oil Markets’ Ne... \n", + "46 GOOG keep in mind there are about 1,000 contra... \n", + "47 V and MA FYI: When they tagged the 50d's yeste... \n", + "48 GOOG holding up well \n", + "49 JPMorgan Chase Chief Executive James Dimon ret... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-1.1575689315795898, 0.2715361416339874, -0.2... positive \n", + "1 [-1.3090834617614746, -0.1740988940000534, -0.... positive \n", + "2 [-1.189386248588562, 0.4811112582683563, -0.66... positive \n", + "3 [-0.6736384630203247, 0.7296571731567383, -0.1... positive \n", + "4 [-0.3878522217273712, 0.5501695275306702, -0.0... positive \n", + "5 [-0.31379231810569763, 0.5748125910758972, -0.... positive \n", + "6 [-0.5645806193351746, 0.6001827716827393, 0.08... positive \n", + "7 [-1.0190978050231934, 0.7494890093803406, 0.07... positive \n", + "8 [-0.2726511061191559, 0.32721856236457825, -0.... positive \n", + "9 [-0.28392985463142395, 1.012831449508667, -0.0... positive \n", + "10 [-1.1887261867523193, 0.9961069226264954, -0.4... positive \n", + "11 [0.3251396119594574, 0.9818293452262878, -0.51... positive \n", + "12 [-0.5504413843154907, 1.0578975677490234, -0.3... positive \n", + "13 [-0.5393702983856201, 1.2257893085479736, -0.2... positive \n", + "14 [-1.0150749683380127, 0.10769235342741013, -0.... positive \n", + "15 [-0.09664952009916306, -0.05535533279180527, -... 
positive \n", + "16 [-1.4360231161117554, 0.030535195022821426, -0... positive \n", + "17 [-0.6773928999900818, 0.3930101990699768, -0.6... positive \n", + "18 [-1.3559353351593018, 0.2133549153804779, -0.5... positive \n", + "19 [-1.078188419342041, -0.23945243656635284, -0.... positive \n", + "20 [-0.7907727956771851, 0.7457559108734131, -0.3... positive \n", + "21 [-0.41530898213386536, 0.15534837543964386, -0... positive \n", + "22 [-0.8882211446762085, 0.13111452758312225, 0.1... positive \n", + "23 [-0.4115653336048126, 0.23945458233356476, 0.2... positive \n", + "24 [-0.8558018207550049, 0.20911763608455658, -0.... positive \n", + "25 [-0.6992183327674866, 0.873347282409668, -0.33... positive \n", + "26 [-0.7398191690444946, 0.0032342064660042524, 0... positive \n", + "27 [-0.46590617299079895, 0.5152156949043274, -0.... positive \n", + "28 [-0.9121986627578735, 0.18954770267009735, 0.5... positive \n", + "29 [-1.4936555624008179, -0.08102501928806305, -0... positive \n", + "30 [-1.282071590423584, 0.452540785074234, -0.020... positive \n", + "31 [-0.16953450441360474, 0.5467158555984497, 0.4... positive \n", + "32 [-1.0020204782485962, 0.06035422161221504, -0.... positive \n", + "33 [-0.6850847005844116, 0.7942832708358765, -0.2... positive \n", + "34 [-1.168450117111206, 0.5925496816635132, -0.27... positive \n", + "35 [-0.9121736884117126, 0.7181501388549805, -0.6... positive \n", + "36 [-1.0185582637786865, 0.23524264991283417, -0.... positive \n", + "37 [-1.0476768016815186, -0.31251490116119385, 0.... positive \n", + "38 [-0.397280752658844, -0.7803610563278198, -0.0... positive \n", + "39 [0.2457299381494522, 0.9597267508506775, -0.56... positive \n", + "40 [-0.18016083538532257, 0.4278823137283325, -0.... positive \n", + "41 [-0.7261321544647217, -0.19837094843387604, -0... positive \n", + "42 [-1.047921895980835, 0.6981227993965149, 0.153... positive \n", + "43 [-0.7834053039550781, 0.5567207336425781, -0.6... 
positive \n", + "44 [-0.8796350955963135, 0.21304339170455933, -0.... positive \n", + "45 [-0.24904966354370117, 0.5449734926223755, 0.1... positive \n", + "46 [-0.6386780738830566, 0.6693974137306213, -0.7... positive \n", + "47 [-1.3859288692474365, 0.1374266892671585, -0.2... positive \n", + "48 [-0.9325542449951172, 0.7117096185684204, -0.2... positive \n", + "49 [-0.5751636624336243, 0.5106868147850037, -0.1... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 3.0 Taking profits on CMG from 321, not time to a ... \n", + "1 3.0 STI 5 min 1 - and 30 min opening range, added ... \n", + "2 2.0 ong EN with stop arnd 39.40- entry 40.10 \n", + "3 8.0 Adding short BWS to portfolio. I think next ye... \n", + "4 6.0 The banks for years rode consumer spending and... \n", + "5 3.0 of some of the most watched biotechs AMGN look... \n", + "6 7.0 GOOG here is the leader of the pack for the ri... \n", + "7 5.0 user another good read. keep em coming and tha... \n", + "8 1.0 GTXI long 5.16, will take early assuming no ga... \n", + "9 1.0 Sen. Kelly Loeffler and her husband, New York ... \n", + "10 1.0 WMT if i had enough cash on hand i'd be shorti... \n", + "11 2.0 BAC Bank Of America Is The Best Bank Stock On ... \n", + "12 5.0 agree user: hedge funds sold AAP in Q4. We'll ... \n", + "13 2.0 user: Mr AAP is going to have to stop hanging ... \n", + "14 2.0 BAC Obama is slowing the rally... Ouch ! \n", + "15 2.0 A couple biotech stocks that are setting up we... \n", + "16 2.0 BAC next stop 10.50 \n", + "17 4.0 STEM continuing move up - possiby getting some... \n", + "18 5.0 CDX taking some off \n", + "19 1.0 AJ stopped out +6% for a nice gain \n", + "20 2.0 with FCX gapping well above ideal entry lookin... \n", + "21 1.0 Rupee Edges Lower To 76.43 Against Dollar Amid... \n", + "22 4.0 ACX Today's P reads very positive.Would like t... \n", + "23 2.0 KWK this one is heavily oversold here, think w... 
\n", + "24 7.0 user: GEVO the beginning of a new uptrend: \n", + "25 1.0 Health insurance stocks should hold up fairly ... \n", + "26 5.0 CBMX that is unbelievable!!!!!!! but I am hap... \n", + "27 5.0 Global markets rise following fresh signals th... \n", + "28 8.0 My SHOTS, various strats, AXDX CMCO DGIT D ECY... \n", + "29 5.0 this could be the last good chance to short AMZN \n", + "30 1.0 HFC - Great group. ooks good. More here -> \n", + "31 2.0 RT @WSJheard: Heard on the Street's @jackycwon... \n", + "32 2.0 MON How quickly investors forget the massive b... \n", + "33 2.0 EA if price doesn't hold above 9SMA then 16.71... \n", + "34 3.0 NKD well above the moving averages, look for s... \n", + "35 2.0 AAP looking to take some off around 429, shoul... \n", + "36 1.0 Selling ICE Short check out my video analysis \n", + "37 1.0 CAT Bingo, it is Bingo everywhere today. \n", + "38 7.0 HAO like it on a pop over 6 w vol \n", + "39 2.0 user: AAP nothing like firing of CEO to make i... \n", + "40 7.0 Coronavirus Crisis: GoAir Decides To Reduce Pa... \n", + "41 3.0 As the coronavirus pandemic intensifies, adher... \n", + "42 2.0 notable 52wk highs [20 > /sh < 50] AFCE AK ACO... \n", + "43 6.0 AAP PMI Manufacturing Index ---> In Few minute... \n", + "44 1.0 SPW pauses, SCTY continues its run, up > 20% i... \n", + "45 1.0 RT @josephttwallace: Glutted Oil Markets’ Ne... \n", + "46 3.0 GOOG keep in mind there are about 1,000 contra... \n", + "47 2.0 V and MA FYI: When they tagged the 50d's yeste... \n", + "48 4.0 GOOG holding up well \n", + "49 3.0 JPMorgan Chase Chief Executive James Dimon ret... 
\n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 negative \n", + "4 negative \n", + "5 positive \n", + "6 negative \n", + "7 negative \n", + "8 positive \n", + "9 positive \n", + "10 negative \n", + "11 positive \n", + "12 positive \n", + "13 positive \n", + "14 negative \n", + "15 positive \n", + "16 negative \n", + "17 positive \n", + "18 positive \n", + "19 positive \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 positive \n", + "24 positive \n", + "25 negative \n", + "26 positive \n", + "27 positive \n", + "28 negative \n", + "29 negative \n", + "30 positive \n", + "31 positive \n", + "32 positive \n", + "33 positive \n", + "34 positive \n", + "35 positive \n", + "36 negative \n", + "37 positive \n", + "38 positive \n", + "39 positive \n", + "40 negative \n", + "41 negative \n", + "42 positive \n", + "43 negative \n", + "44 positive \n", + "45 negative \n", + "46 negative \n", + "47 positive \n", + "48 positive \n", + "49 positive " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# 4. Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "3c226560-fe14-42eb-e537-0535e6d0819f" + }, + "source": [ + "fitted_pipe.predict(\"Bitcoin dropped by 50 percent!\")" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence \\\n", + "0 Bitcoin dropped by 50 percent! \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-1.7797279357910156, 0.3090762495994568, -0.2... positive \n", + "\n", + " sentiment_confidence \n", + "0 1.0 " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## 5. Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "f9404635-50b4-4278-b7e1-95f0c05692ff" + }, + "source": [ + "trainable_pipe.print_info()" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## 6. 
Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "mptfvHx-MMMX", + "outputId": "bf0f4962-bb67-4c9a-c92c-96b64cbfdf51" + }, + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/pipeline.py:149: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " dataset.y = dataset.y.apply(str)\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in
the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 17\n", + " positive 0.66 1.00 0.80 33\n", + "\n", + " accuracy 0.66 50\n", + " macro avg 0.33 0.50 0.40 50\n", + "weighted avg 0.44 0.66 0.52 50\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. 
Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Taking profits on CMG from 321, not time to a ... \n", + "1 STI 5 min 1 - and 30 min opening range, added ... \n", + "2 ong EN with stop arnd 39.40- entry 40.10 \n", + "3 Adding short BWS to portfolio. I think next ye... \n", + "4 The banks for years rode consumer spending and... \n", + "5 of some of the most watched biotechs AMGN look... \n", + "6 GOOG here is the leader of the pack for the ri... \n", + "7 user another good read. keep em coming and tha... \n", + "8 GTXI long 5.16, will take early assuming no ga... \n", + "9 Sen. Kelly Loeffler and her husband, New York ... \n", + "10 WMT if i had enough cash on hand i'd be shorti... \n", + "11 BAC Bank Of America Is The Best Bank Stock On ... \n", + "12 agree user: hedge funds sold AAP in Q4. We'll ... \n", + "13 user: Mr AAP is going to have to stop hanging ... \n", + "14 BAC Obama is slowing the rally... Ouch ! \n", + "15 A couple biotech stocks that are setting up we... \n", + "16 BAC next stop 10.50 \n", + "17 STEM continuing move up - possiby getting some... 
\n", + "18 CDX taking some off \n", + "19 AJ stopped out +6% for a nice gain \n", + "20 with FCX gapping well above ideal entry lookin... \n", + "21 Rupee Edges Lower To 76.43 Against Dollar Amid... \n", + "22 ACX Today's P reads very positive.Would like t... \n", + "23 KWK this one is heavily oversold here, think w... \n", + "24 user: GEVO the beginning of a new uptrend: \n", + "25 Health insurance stocks should hold up fairly ... \n", + "26 CBMX that is unbelievable!!!!!!! but I am happ... \n", + "27 Global markets rise following fresh signals th... \n", + "28 My SHOTS, various strats, AXDX CMCO DGIT D ECY... \n", + "29 this could be the last good chance to short AMZN \n", + "30 HFC - Great group. ooks good. More here -> \n", + "31 RT @WSJheard: Heard on the Street's @jackycwon... \n", + "32 MON How quickly investors forget the massive b... \n", + "33 EA if price doesn't hold above 9SMA then 16.71... \n", + "34 NKD well above the moving averages, look for s... \n", + "35 AAP looking to take some off around 429, shoul... \n", + "36 Selling ICE Short check out my video analysis \n", + "37 CAT Bingo, it is Bingo everywhere today. \n", + "38 HAO like it on a pop over 6 w vol \n", + "39 user: AAP nothing like firing of CEO to make i... \n", + "40 Coronavirus Crisis: GoAir Decides To Reduce Pa... \n", + "41 As the coronavirus pandemic intensifies, adher... \n", + "42 notable 52wk highs [20 > /sh < 50] AFCE AK ACO... \n", + "43 AAP PMI Manufacturing Index ---> In Few minute... \n", + "44 SPW pauses, SCTY continues its run, up > 20% i... \n", + "45 RT @josephttwallace: Glutted Oil Markets’ Ne... \n", + "46 GOOG keep in mind there are about 1,000 contra... \n", + "47 V and MA FYI: When they tagged the 50d's yeste... \n", + "48 GOOG holding up well \n", + "49 JPMorgan Chase Chief Executive James Dimon ret... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-1.1575689315795898, 0.2715361416339874, -0.2... 
positive \n", + "1 [-1.3090834617614746, -0.1740988940000534, -0.... positive \n", + "2 [-1.189386248588562, 0.4811112582683563, -0.66... positive \n", + "3 [-0.6736384630203247, 0.7296571731567383, -0.1... positive \n", + "4 [-0.3878522217273712, 0.5501695275306702, -0.0... positive \n", + "5 [-0.31379231810569763, 0.5748125910758972, -0.... positive \n", + "6 [-0.5645806193351746, 0.6001827716827393, 0.08... positive \n", + "7 [-1.0190978050231934, 0.7494890093803406, 0.07... positive \n", + "8 [-0.2726511061191559, 0.32721856236457825, -0.... positive \n", + "9 [-0.28392985463142395, 1.012831449508667, -0.0... positive \n", + "10 [-1.1887261867523193, 0.9961069226264954, -0.4... positive \n", + "11 [0.3251396119594574, 0.9818293452262878, -0.51... positive \n", + "12 [-0.5504413843154907, 1.0578975677490234, -0.3... positive \n", + "13 [-0.5393702983856201, 1.2257893085479736, -0.2... positive \n", + "14 [-1.0150749683380127, 0.10769235342741013, -0.... positive \n", + "15 [-0.09664952009916306, -0.05535533279180527, -... positive \n", + "16 [-1.4360231161117554, 0.030535195022821426, -0... positive \n", + "17 [-0.6773928999900818, 0.3930101990699768, -0.6... positive \n", + "18 [-1.3559353351593018, 0.2133549153804779, -0.5... positive \n", + "19 [-1.078188419342041, -0.23945243656635284, -0.... positive \n", + "20 [-0.7907727956771851, 0.7457559108734131, -0.3... positive \n", + "21 [-0.41530898213386536, 0.15534837543964386, -0... positive \n", + "22 [-0.8882211446762085, 0.13111452758312225, 0.1... positive \n", + "23 [-0.4115653336048126, 0.23945458233356476, 0.2... positive \n", + "24 [-0.8558018207550049, 0.20911763608455658, -0.... positive \n", + "25 [-0.6992183327674866, 0.873347282409668, -0.33... positive \n", + "26 [-0.7398191690444946, 0.0032342064660042524, 0... positive \n", + "27 [-0.46590617299079895, 0.5152156949043274, -0.... positive \n", + "28 [-0.9121986627578735, 0.18954770267009735, 0.5... 
positive \n", + "29 [-1.4936555624008179, -0.08102501928806305, -0... positive \n", + "30 [-1.282071590423584, 0.452540785074234, -0.020... positive \n", + "31 [-0.16953450441360474, 0.5467158555984497, 0.4... positive \n", + "32 [-1.0020204782485962, 0.06035422161221504, -0.... positive \n", + "33 [-0.6850847005844116, 0.7942832708358765, -0.2... positive \n", + "34 [-1.168450117111206, 0.5925496816635132, -0.27... positive \n", + "35 [-0.9121736884117126, 0.7181501388549805, -0.6... positive \n", + "36 [-1.0185582637786865, 0.23524264991283417, -0.... positive \n", + "37 [-1.0476768016815186, -0.31251490116119385, 0.... positive \n", + "38 [-0.397280752658844, -0.7803610563278198, -0.0... positive \n", + "39 [0.2457299381494522, 0.9597267508506775, -0.56... positive \n", + "40 [-0.18016083538532257, 0.4278823137283325, -0.... positive \n", + "41 [-0.7261321544647217, -0.19837094843387604, -0... positive \n", + "42 [-1.047921895980835, 0.6981227993965149, 0.153... positive \n", + "43 [-0.7834053039550781, 0.5567207336425781, -0.6... positive \n", + "44 [-0.8796350955963135, 0.21304339170455933, -0.... positive \n", + "45 [-0.24904966354370117, 0.5449734926223755, 0.1... positive \n", + "46 [-0.6386780738830566, 0.6693974137306213, -0.7... positive \n", + "47 [-1.3859288692474365, 0.1374266892671585, -0.2... positive \n", + "48 [-0.9325542449951172, 0.7117096185684204, -0.2... positive \n", + "49 [-0.5751636624336243, 0.5106868147850037, -0.1... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 4.0 Taking profits on CMG from 321, not time to a ... \n", + "1 7.0 STI 5 min 1 - and 30 min opening range, added ... \n", + "2 4.0 ong EN with stop arnd 39.40- entry 40.10 \n", + "3 7.0 Adding short BWS to portfolio. I think next ye... \n", + "4 4.0 The banks for years rode consumer spending and... \n", + "5 1.0 of some of the most watched biotechs AMGN look... \n", + "6 5.0 GOOG here is the leader of the pack for the ri... 
\n", + "7 1.0 user another good read. keep em coming and tha... \n", + "8 7.0 GTXI long 5.16, will take early assuming no ga... \n", + "9 1.0 Sen. Kelly Loeffler and her husband, New York ... \n", + "10 9.0 WMT if i had enough cash on hand i'd be shorti... \n", + "11 9.0 BAC Bank Of America Is The Best Bank Stock On ... \n", + "12 3.0 agree user: hedge funds sold AAP in Q4. We'll ... \n", + "13 2.0 user: Mr AAP is going to have to stop hanging ... \n", + "14 1.0 BAC Obama is slowing the rally... Ouch ! \n", + "15 5.0 A couple biotech stocks that are setting up we... \n", + "16 1.0 BAC next stop 10.50 \n", + "17 8.0 STEM continuing move up - possiby getting some... \n", + "18 1.0 CDX taking some off \n", + "19 5.0 AJ stopped out +6% for a nice gain \n", + "20 4.0 with FCX gapping well above ideal entry lookin... \n", + "21 4.0 Rupee Edges Lower To 76.43 Against Dollar Amid... \n", + "22 1.0 ACX Today's P reads very positive.Would like t... \n", + "23 2.0 KWK this one is heavily oversold here, think w... \n", + "24 8.0 user: GEVO the beginning of a new uptrend: \n", + "25 4.0 Health insurance stocks should hold up fairly ... \n", + "26 2.0 CBMX that is unbelievable!!!!!!! but I am hap... \n", + "27 4.0 Global markets rise following fresh signals th... \n", + "28 3.0 My SHOTS, various strats, AXDX CMCO DGIT D ECY... \n", + "29 1.0 this could be the last good chance to short AMZN \n", + "30 6.0 HFC - Great group. ooks good. More here -> \n", + "31 3.0 RT @WSJheard: Heard on the Street's @jackycwon... \n", + "32 2.0 MON How quickly investors forget the massive b... \n", + "33 2.0 EA if price doesn't hold above 9SMA then 16.71... \n", + "34 7.0 NKD well above the moving averages, look for s... \n", + "35 8.0 AAP looking to take some off around 429, shoul... \n", + "36 1.0 Selling ICE Short check out my video analysis \n", + "37 1.0 CAT Bingo, it is Bingo everywhere today. 
\n", + "38 2.0 HAO like it on a pop over 6 w vol \n", + "39 4.0 user: AAP nothing like firing of CEO to make i... \n", + "40 8.0 Coronavirus Crisis: GoAir Decides To Reduce Pa... \n", + "41 6.0 As the coronavirus pandemic intensifies, adher... \n", + "42 7.0 notable 52wk highs [20 > /sh < 50] AFCE AK ACO... \n", + "43 7.0 AAP PMI Manufacturing Index ---> In Few minute... \n", + "44 3.0 SPW pauses, SCTY continues its run, up > 20% i... \n", + "45 2.0 RT @josephttwallace: Glutted Oil Markets’ Ne... \n", + "46 4.0 GOOG keep in mind there are about 1,000 contra... \n", + "47 1.0 V and MA FYI: When they tagged the 50d's yeste... \n", + "48 2.0 GOOG holding up well \n", + "49 6.0 JPMorgan Chase Chief Executive James Dimon ret... \n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 positive \n", + "3 negative \n", + "4 negative \n", + "5 positive \n", + "6 negative \n", + "7 negative \n", + "8 positive \n", + "9 positive \n", + "10 negative \n", + "11 positive \n", + "12 positive \n", + "13 positive \n", + "14 negative \n", + "15 positive \n", + "16 negative \n", + "17 positive \n", + "18 positive \n", + "19 positive \n", + "20 positive \n", + "21 negative \n", + "22 positive \n", + "23 positive \n", + "24 positive \n", + "25 negative \n", + "26 positive \n", + "27 positive \n", + "28 negative \n", + "29 negative \n", + "30 positive \n", + "31 positive \n", + "32 positive \n", + "33 positive \n", + "34 positive \n", + "35 positive \n", + "36 negative \n", + "37 positive \n", + "38 positive \n", + "39 positive \n", + "40 negative \n", + "41 negative \n", + "42 positive \n", + "43 negative \n", + "44 positive \n", + "45 negative \n", + "46 negative \n", + "47 positive \n", + "48 positive \n", + "49 positive " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0Taking profits on CMG from 321, not time to a ...[-1.1575689315795898, 0.2715361416339874, -0.2...positive4.0Taking profits on CMG from 321, not time to a ...positive
1STI 5 min 1 - and 30 min opening range, added ...[-1.3090834617614746, -0.1740988940000534, -0....positive7.0STI 5 min 1 - and 30 min opening range, added ...positive
2ong EN with stop arnd 39.40- entry 40.10[-1.189386248588562, 0.4811112582683563, -0.66...positive4.0ong EN with stop arnd 39.40- entry 40.10positive
3Adding short BWS to portfolio. I think next ye...[-0.6736384630203247, 0.7296571731567383, -0.1...positive7.0Adding short BWS to portfolio. I think next ye...negative
4The banks for years rode consumer spending and...[-0.3878522217273712, 0.5501695275306702, -0.0...positive4.0The banks for years rode consumer spending and...negative
5of some of the most watched biotechs AMGN look...[-0.31379231810569763, 0.5748125910758972, -0....positive1.0of some of the most watched biotechs AMGN look...positive
6GOOG here is the leader of the pack for the ri...[-0.5645806193351746, 0.6001827716827393, 0.08...positive5.0GOOG here is the leader of the pack for the ri...negative
7user another good read. keep em coming and tha...[-1.0190978050231934, 0.7494890093803406, 0.07...positive1.0user another good read. keep em coming and tha...negative
8GTXI long 5.16, will take early assuming no ga...[-0.2726511061191559, 0.32721856236457825, -0....positive7.0GTXI long 5.16, will take early assuming no ga...positive
9Sen. Kelly Loeffler and her husband, New York ...[-0.28392985463142395, 1.012831449508667, -0.0...positive1.0Sen. Kelly Loeffler and her husband, New York ...positive
10WMT if i had enough cash on hand i'd be shorti...[-1.1887261867523193, 0.9961069226264954, -0.4...positive9.0WMT if i had enough cash on hand i'd be shorti...negative
11BAC Bank Of America Is The Best Bank Stock On ...[0.3251396119594574, 0.9818293452262878, -0.51...positive9.0BAC Bank Of America Is The Best Bank Stock On ...positive
12agree user: hedge funds sold AAP in Q4. We'll ...[-0.5504413843154907, 1.0578975677490234, -0.3...positive3.0agree user: hedge funds sold AAP in Q4. We'll ...positive
13user: Mr AAP is going to have to stop hanging ...[-0.5393702983856201, 1.2257893085479736, -0.2...positive2.0user: Mr AAP is going to have to stop hanging ...positive
14BAC Obama is slowing the rally... Ouch ![-1.0150749683380127, 0.10769235342741013, -0....positive1.0BAC Obama is slowing the rally... Ouch !negative
15A couple biotech stocks that are setting up we...[-0.09664952009916306, -0.05535533279180527, -...positive5.0A couple biotech stocks that are setting up we...positive
16BAC next stop 10.50[-1.4360231161117554, 0.030535195022821426, -0...positive1.0BAC next stop 10.50negative
17STEM continuing move up - possiby getting some...[-0.6773928999900818, 0.3930101990699768, -0.6...positive8.0STEM continuing move up - possiby getting some...positive
18CDX taking some off[-1.3559353351593018, 0.2133549153804779, -0.5...positive1.0CDX taking some offpositive
19AJ stopped out +6% for a nice gain[-1.078188419342041, -0.23945243656635284, -0....positive5.0AJ stopped out +6% for a nice gainpositive
20with FCX gapping well above ideal entry lookin...[-0.7907727956771851, 0.7457559108734131, -0.3...positive4.0with FCX gapping well above ideal entry lookin...positive
21Rupee Edges Lower To 76.43 Against Dollar Amid...[-0.41530898213386536, 0.15534837543964386, -0...positive4.0Rupee Edges Lower To 76.43 Against Dollar Amid...negative
22ACX Today's P reads very positive.Would like t...[-0.8882211446762085, 0.13111452758312225, 0.1...positive1.0ACX Today's P reads very positive.Would like t...positive
23KWK this one is heavily oversold here, think w...[-0.4115653336048126, 0.23945458233356476, 0.2...positive2.0KWK this one is heavily oversold here, think w...positive
24user: GEVO the beginning of a new uptrend:[-0.8558018207550049, 0.20911763608455658, -0....positive8.0user: GEVO the beginning of a new uptrend:positive
25Health insurance stocks should hold up fairly ...[-0.6992183327674866, 0.873347282409668, -0.33...positive4.0Health insurance stocks should hold up fairly ...negative
26CBMX that is unbelievable!!!!!!! but I am happ...[-0.7398191690444946, 0.0032342064660042524, 0...positive2.0CBMX that is unbelievable!!!!!!! but I am hap...positive
27Global markets rise following fresh signals th...[-0.46590617299079895, 0.5152156949043274, -0....positive4.0Global markets rise following fresh signals th...positive
28My SHOTS, various strats, AXDX CMCO DGIT D ECY...[-0.9121986627578735, 0.18954770267009735, 0.5...positive3.0My SHOTS, various strats, AXDX CMCO DGIT D ECY...negative
29this could be the last good chance to short AMZN[-1.4936555624008179, -0.08102501928806305, -0...positive1.0this could be the last good chance to short AMZNnegative
30HFC - Great group. ooks good. More here ->[-1.282071590423584, 0.452540785074234, -0.020...positive6.0HFC - Great group. ooks good. More here ->positive
31RT @WSJheard: Heard on the Street's @jackycwon...[-0.16953450441360474, 0.5467158555984497, 0.4...positive3.0RT @WSJheard: Heard on the Street's @jackycwon...positive
32MON How quickly investors forget the massive b...[-1.0020204782485962, 0.06035422161221504, -0....positive2.0MON How quickly investors forget the massive b...positive
33EA if price doesn't hold above 9SMA then 16.71...[-0.6850847005844116, 0.7942832708358765, -0.2...positive2.0EA if price doesn't hold above 9SMA then 16.71...positive
34NKD well above the moving averages, look for s...[-1.168450117111206, 0.5925496816635132, -0.27...positive7.0NKD well above the moving averages, look for s...positive
35AAP looking to take some off around 429, shoul...[-0.9121736884117126, 0.7181501388549805, -0.6...positive8.0AAP looking to take some off around 429, shoul...positive
36Selling ICE Short check out my video analysis[-1.0185582637786865, 0.23524264991283417, -0....positive1.0Selling ICE Short check out my video analysisnegative
37CAT Bingo, it is Bingo everywhere today.[-1.0476768016815186, -0.31251490116119385, 0....positive1.0CAT Bingo, it is Bingo everywhere today.positive
38HAO like it on a pop over 6 w vol[-0.397280752658844, -0.7803610563278198, -0.0...positive2.0HAO like it on a pop over 6 w volpositive
39user: AAP nothing like firing of CEO to make i...[0.2457299381494522, 0.9597267508506775, -0.56...positive4.0user: AAP nothing like firing of CEO to make i...positive
40Coronavirus Crisis: GoAir Decides To Reduce Pa...[-0.18016083538532257, 0.4278823137283325, -0....positive8.0Coronavirus Crisis: GoAir Decides To Reduce Pa...negative
41As the coronavirus pandemic intensifies, adher...[-0.7261321544647217, -0.19837094843387604, -0...positive6.0As the coronavirus pandemic intensifies, adher...negative
42notable 52wk highs [20 > /sh < 50] AFCE AK ACO...[-1.047921895980835, 0.6981227993965149, 0.153...positive7.0notable 52wk highs [20 > /sh < 50] AFCE AK ACO...positive
43AAP PMI Manufacturing Index ---> In Few minute...[-0.7834053039550781, 0.5567207336425781, -0.6...positive7.0AAP PMI Manufacturing Index ---> In Few minute...negative
44SPW pauses, SCTY continues its run, up > 20% i...[-0.8796350955963135, 0.21304339170455933, -0....positive3.0SPW pauses, SCTY continues its run, up > 20% i...positive
45RT @josephttwallace: Glutted Oil Markets’ Ne...[-0.24904966354370117, 0.5449734926223755, 0.1...positive2.0RT @josephttwallace: Glutted Oil Markets’ Ne...negative
46GOOG keep in mind there are about 1,000 contra...[-0.6386780738830566, 0.6693974137306213, -0.7...positive4.0GOOG keep in mind there are about 1,000 contra...negative
47V and MA FYI: When they tagged the 50d's yeste...[-1.3859288692474365, 0.1374266892671585, -0.2...positive1.0V and MA FYI: When they tagged the 50d's yeste...positive
48GOOG holding up well[-0.9325542449951172, 0.7117096185684204, -0.2...positive2.0GOOG holding up wellpositive
49JPMorgan Chase Chief Executive James Dimon ret...[-0.5751636624336243, 0.5106868147850037, -0.1...positive6.0JPMorgan Chase Chief Executive James Dimon ret...positive
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# 7. Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxWFzQOhjWC8", + "outputId": "017238a7-20f7-432a-ecff-d987390c1221" + }, + "source": [ + "# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj 
sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP 
model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj 
sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj 
sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", 
+ "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + 
"nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IKK_Ii_gjJfF", + "outputId": "34ce4144-5adf-404e-a0aa-aec86b7ff166" + }, + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n", + "# We usually need to train longer and use a smaller learning rate for non-USE sentence embeddings\n", + "# We could tune the hyperparameters further with methods such as grid search\n", + "# Longer training can also improve accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(120)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "# Predict on the training data with the fitted pipeline\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "\n", + "#sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_768 download started this may take some time.\n", + "Approximate size to download 392.9 MB\n", + "[OK!]\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.79 0.54 0.64 1671\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.82 0.86 0.84 2961\n", + "\n", + " accuracy 0.74 4632\n", + " macro avg 0.54 0.46 0.49 4632\n", + "weighted avg 0.81 0.74 0.76 4632\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. 
Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_1jxw3GnVGlI" + }, + "source": [ + "# 7.1 Evaluate on Test Data" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Fxx4yNkNVGFl", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c788a55c-8686-4835-e3d4-c8fa5551cae0" + }, + "source": [ + "preds = fitted_pipe.predict(test_df,output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.69 0.45 0.54 435\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.76 0.80 0.78 724\n", + "\n", + " accuracy 0.67 1159\n", + " macro avg 0.48 0.41 0.44 1159\n", + "weighted avg 0.73 0.67 0.69 1159\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. 
Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 8. Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": 14, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 9. Let's load the model from HDD.\n", + "This makes offline NLU usage possible! \n", + "Call nlu.load(path=path_to_the_pipe) to load a model or pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "id": "SO4uz45MoRgp", + "outputId": "d78b4bfd-544c-4add-9583-9356de5ebb7e" + }, + "source": [ + "hdd_pipe = nlp.load(path=\"./models/classifier_dl_trained\")\n", + "\n", + "preds = hdd_pipe.predict('Bitcoin dropped by 50 percent!!')\n", + "preds" + ], + "execution_count": 15, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 Bitcoin dropped by 50 percent!! \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [0.20597122609615326, 0.16840754449367523, 0.0... negative \n", + "\n", + " sentiment_confidence \n", + "0 0.0 " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 15 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "ae53e7f2-6371-4473-ca8e-5074f5341f31" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": 16, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-CdcbSd7WEpm" + }, + "source": [], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_twitter.ipynb b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_twitter.ipynb index baa09395..7b54e0b9 100644 --- a/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_twitter.ipynb +++ b/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_twitter.ipynb @@ -1 +1,3158 @@ 
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_sentiment_classifier_demo_twitter.ipynb","provenance":[],"collapsed_sections":["zkufh760uvF3"]},"kernelspec":{"display_name":"Python 3","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_twitter.ipynb)\n","\n","\n","\n","# Training a Sentiment Analysis Classifier with NLU \n","## 2 class twitter classifier training\n","With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve State Of the Art results on any multi class text classification problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. 
Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620214509349,"user_tz":-120,"elapsed":123909,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"faee9bf7-4d8f-4a48-be35-6966ac6ebd02"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 11:33:05-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","- 100%[===================>] 1.63K --.-KB/s in 0s \n","\n","2021-05-05 11:33:06 (47.6 MB/s) - written to stdout [1671/1671]\n","\n","\u001b[K |████████████████████████████████| 204.8MB 67kB/s \n","\u001b[K |████████████████████████████████| 153kB 44.1MB/s \n","\u001b[K |████████████████████████████████| 204kB 22.3MB/s \n","\u001b[K |████████████████████████████████| 204kB 54.6MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download twitter Sentiment dataset \n","https://www.kaggle.com/cosmos98/twitter-and-reddit-sentimental-analysis-dataset\n","#Context\n","\n","This is was a Dataset Created as a part of the university Project On Sentimental Analysis On Multi-Source Social Media Platforms using PySpark."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620214509351,"user_tz":-120,"elapsed":123894,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"7beb4eab-22ec-4f40-d7bf-6460b1fe261d"},"source":["! wget http://ckl-it.de/wp-content/uploads/2021/01/Twitter_Data.csv\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 11:35:08-- http://ckl-it.de/wp-content/uploads/2021/01/Twitter_Data.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 
200 OK\n","Length: 99657 (97K) [text/csv]\n","Saving to: ‘Twitter_Data.csv’\n","\n","Twitter_Data.csv 100%[===================>] 97.32K 310KB/s in 0.3s \n","\n","2021-05-05 11:35:08 (310 KB/s) - ‘Twitter_Data.csv’ saved [99657/99657]\n","\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":391},"id":"y4xSRWIhwT28","executionInfo":{"status":"ok","timestamp":1620214509356,"user_tz":-120,"elapsed":123892,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"5e36e540-7943-4bd4-85e2-956979503db7"},"source":["import pandas as pd\n","train_path = '/content/Twitter_Data.csv'\n","\n","train_df = pd.read_csv(train_path)\n","# the text data to use for classification should be in a column named 'text'\n","# the label column must have name 'y' name be of type str\n","columns=['text','y']\n","train_df = train_df[columns]\n","train_df"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
texty
0how narendra modi has almost killed the indian...negative
1you think was modi behind that accidentnegative
2kamal haasan takes chowkidar modi kamal haasan...negative
3connected name with surname not bcz religion c...negative
4anyone better than modi when nehruji expired s...positive
.........
595perception makes fool some call “foreign inv...negative
596when will see your tweet for justice for you a...negative
597haha congress going gaga over this after looti...positive
598this movie shows the life histiry narendra mod...negative
599modi left his year old wife and returned her r...positive
\n","

600 rows × 2 columns

\n","
"],"text/plain":[" text y\n","0 how narendra modi has almost killed the indian... negative\n","1 you think was modi behind that accident negative\n","2 kamal haasan takes chowkidar modi kamal haasan... negative\n","3 connected name with surname not bcz religion c... negative\n","4 anyone better than modi when nehruji expired s... positive\n",".. ... ...\n","595 perception makes fool some call “foreign inv... negative\n","596 when will see your tweet for justice for you a... negative\n","597 haha congress going gaga over this after looti... positive\n","598 this movie shows the life histiry narendra mod... negative\n","599 modi left his year old wife and returned her r... positive\n","\n","[600 rows x 2 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.sentiment')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620214645691,"user_tz":-120,"elapsed":260221,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c3d16617-f157-4732-8139-86dc380efcca"},"source":["from sklearn.metrics import classification_report\n","import nlu \n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a datset with label and text columns\n","# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n","trainable_pipe = nlu.load('train.sentiment')\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","#sentence detector that is part of the pipe generates sone NaNs. lets drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.54 1.00 0.70 27\n"," positive 0.00 0.00 0.00 23\n","\n"," accuracy 0.54 50\n"," macro avg 0.27 0.50 0.35 50\n","weighted avg 0.29 0.54 0.38 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentence_embedding_usetrained_sentimentyorigin_indextexttrained_sentiment_confidencedocumentsentence
0[0.060062434524297714, -0.05557167902588844, -...negativenegative0how narendra modi has almost killed the indian...0.781124how narendra modi has almost killed the indian...[how narendra modi has almost killed the india...
1[0.05362718552350998, -0.004547705873847008, -...negativenegative1you think was modi behind that accident0.772585you think was modi behind that accident[you think was modi behind that accident]
2[0.07274721562862396, -0.061593908816576004, -...negativenegative2kamal haasan takes chowkidar modi kamal haasan...0.778565kamal haasan takes chowkidar modi kamal haasan...[kamal haasan takes chowkidar modi kamal haasa...
3[0.06106054410338402, -0.060213156044483185, -...negativenegative3connected name with surname not bcz religion c...0.732274connected name with surname not bcz religion c...[connected name with surname not bcz religion ...
4[0.0737471729516983, 0.006071774289011955, -0....negativepositive4anyone better than modi when nehruji expired s...0.783452anyone better than modi when nehruji expired s...[anyone better than modi when nehruji expired ...
5[0.05888386443257332, -0.0646616667509079, -0....negativenegative5\\r\\nmodiji wont tired crying foul\\r\\nmain chow...0.788001modiji wont tired crying foul main chowkidar h...[modiji wont tired crying foul main chowkidar ...
6[0.058948416262865067, -0.029682165011763573, ...negativenegative6poor chap modi hasn’ given him anything can ...0.781363poor chap modi hasn’ given him anything can ...[poor chap modi hasn’ given him anything can...
7[0.051331546157598495, -0.06789953261613846, -...negativenegative7green underwear missing ive been doubting isi ...0.767519green underwear missing ive been doubting isi ...[green underwear missing ive been doubting isi...
8[0.044129759073257446, -0.06111813709139824, -...negativepositive8congress years wasnt able complete one rafale ...0.754464congress years wasnt able complete one rafale ...[congress years wasnt able complete one rafale...
9[0.03665374591946602, -0.03695330768823624, -0...negativenegative9asked learn from how treat minority well does ...0.693848asked learn from how treat minority well does ...[asked learn from how treat minority well does...
10[0.07035735249519348, -0.06952506303787231, -0...negativenegative10stop bull shitting worry about criminal vivek ...0.785139stop bull shitting worry about criminal vivek ...[stop bull shitting worry about criminal vivek...
11[0.013958276249468327, -0.030759528279304504, ...negativepositive11drswamys timesnow last year debate nearly mill...0.731118drswamys timesnow last year debate nearly mill...[drswamys timesnow last year debate nearly mil...
12[0.026277026161551476, -0.06238812580704689, -...negativenegative12asshole bahujan radical marxist grow brain kno...0.784451asshole bahujan radical marxist grow brain kno...[asshole bahujan radical marxist grow brain kn...
13[0.07457270473241806, -0.058670494705438614, -...negativepositive13from selling dreams 2014 selling tshirts 2019 ...0.761034from selling dreams 2014 selling tshirts 2019 ...[from selling dreams 2014 selling tshirts 2019...
14[0.061704088002443314, -0.04553354158997536, -...negativepositive14very true sir thats why they are against modi ...0.772205very true sir thats why they are against modi ...[very true sir thats why they are against modi...
15[0.053420260548591614, -0.0038897113408893347,...negativenegative15they are giving jobs citizen india what you ar...0.769769they are giving jobs citizen india what you ar...[they are giving jobs citizen india what you a...
16[0.027197618037462234, -0.036435648798942566, ...negativenegative16congress has always attempted empower people g...0.762557congress has always attempted empower people g...[congress has always attempted empower people ...
17[0.06601184606552124, -0.020045213401317596, -...negativepositive17have never said that modi succeed yet even als...0.779734have never said that modi succeed yet even als...[have never said that modi succeed yet even al...
18[0.046943631023168564, -0.06800007820129395, -...negativepositive18\\r\\nthe foundation for new india 2022 has alre...0.755853the foundation for new india 2022 has already ...[the foundation for new india 2022 has already...
19[0.05615750327706337, -0.002462629694491625, -...negativenegative19only rahul gandhis politics love can defeat th...0.770993only rahul gandhis politics love can defeat th...[only rahul gandhis politics love can defeat t...
20[0.030352214351296425, -0.06195472553372383, 0...negativenegative20one step time navigating thru looteyns when ev...0.758632one step time navigating thru looteyns when ev...[one step time navigating thru looteyns when e...
21[0.07535804808139801, -0.05643236264586449, -0...negativenegative21why sir mam shabana azami hate much that have ...0.778967why sir mam shabana azami hate much that have ...[why sir mam shabana azami hate much that have...
22[0.05986170098185539, -0.0674145296216011, -0....negativenegative22modi will remain for next 510 years and till t...0.782218modi will remain for next 510 years and till t...[modi will remain for next 510 years and till ...
23[0.023959942162036896, -0.01397246215492487, -...negativepositive23pledge your first vote for modi0.753846pledge your first vote for modi[pledge your first vote for modi]
24[0.04451165348291397, -0.06473662704229355, -0...negativepositive24why need modi lead bjp government again 2019 j...0.775316why need modi lead bjp government again 2019 j...[why need modi lead bjp government again 2019 ...
25[0.06561190634965897, -0.0614917054772377, -0....negativenegative25raghuram rajan sent list high profile bank fra...0.791679raghuram rajan sent list high profile bank fra...[raghuram rajan sent list high profile bank fr...
26[0.05217093229293823, -0.05785880237817764, -0...negativenegative26modi govts slashing indias education budget cl...0.764641modi govts slashing indias education budget cl...[modi govts slashing indias education budget c...
27[0.04579754173755646, -0.051767487078905106, -...negativepositive27why are you hell bent manoj tiwari just her ph...0.748229why are you hell bent manoj tiwari just her ph...[why are you hell bent manoj tiwari just her p...
28[0.047987841069698334, -0.050984784960746765, ...negativenegative28know going into dirty details nehru family its...0.768564know going into dirty details nehru family its...[know going into dirty details nehru family it...
<tr><th>29</th><td>[0.04509664326906204, -0.05019481107592583, -0...</td><td>negative</td><td>negative</td><td>29</td><td>momota begum will let her state become total s...</td><td>0.745878</td><td>momota begum will let her state become total s...</td><td>[momota begum will let her state become total ...</td></tr>
<tr><th>30</th><td>[0.04315190762281418, -0.04578147828578949, -0...</td><td>negative</td><td>positive</td><td>30</td><td>thanks anu sharma will vote and make sure peop...</td><td>0.768158</td><td>thanks anu sharma will vote and make sure peop...</td><td>[thanks anu sharma will vote and make sure peo...</td></tr>
<tr><th>31</th><td>[0.0144237345084548, -0.052222371101379395, -0...</td><td>negative</td><td>positive</td><td>31</td><td>those who themselves dont know how many father...</td><td>0.752480</td><td>those who themselves dont know how many father...</td><td>[those who themselves dont know how many fathe...</td></tr>
<tr><th>32</th><td>[0.02492097206413746, -0.0531931146979332, -0....</td><td>negative</td><td>positive</td><td>32</td><td>the star campaigner myth bjp lost more than as...</td><td>0.785754</td><td>the star campaigner myth bjp lost more than as...</td><td>[the star campaigner myth bjp lost more than a...</td></tr>
<tr><th>33</th><td>[0.040389616042375565, -0.06375984847545624, -...</td><td>negative</td><td>negative</td><td>33</td><td>modi also live for few years only like you not...</td><td>0.763549</td><td>modi also live for few years only like you not...</td><td>[modi also live for few years only like you no...</td></tr>
<tr><th>34</th><td>[0.06742898374795914, -0.060488566756248474, -...</td><td>negative</td><td>positive</td><td>34</td><td>narendra modi more brainy than all the drswamy...</td><td>0.741321</td><td>narendra modi more brainy than all the drswamy...</td><td>[narendra modi more brainy than all the drswam...</td></tr>
<tr><th>35</th><td>[0.06360629200935364, -0.06786973774433136, -0...</td><td>negative</td><td>negative</td><td>35</td><td>have started calling chowkidaar narendra modi ...</td><td>0.771774</td><td>have started calling chowkidaar narendra modi ...</td><td>[have started calling chowkidaar narendra modi...</td></tr>
<tr><th>36</th><td>[0.024233123287558556, -0.05243394151329994, -...</td><td>negative</td><td>positive</td><td>36</td><td>this the difference confident leaders call upo...</td><td>0.758803</td><td>this the difference confident leaders call upo...</td><td>[this the difference confident leaders call up...</td></tr>
<tr><th>37</th><td>[0.039280060678720474, -0.05146652087569237, -...</td><td>negative</td><td>negative</td><td>37</td><td>jawans killed the border\\r\\ncrimes against wom...</td><td>0.774826</td><td>jawans killed the border crimes against women ...</td><td>[jawans killed the border crimes against women...</td></tr>
<tr><th>38</th><td>[0.05051109194755554, -0.0660049319267273, 0.0...</td><td>negative</td><td>negative</td><td>38</td><td>tag this fast growing youtuber cared abt this ...</td><td>0.789172</td><td>tag this fast growing youtuber cared abt this ...</td><td>[tag this fast growing youtuber cared abt this...</td></tr>
<tr><th>39</th><td>[-0.010975896380841732, -0.059168506413698196,...</td><td>negative</td><td>positive</td><td>39</td><td>think hindus should back off and let them suff...</td><td>0.702905</td><td>think hindus should back off and let them suff...</td><td>[think hindus should back off and let them suf...</td></tr>
<tr><th>40</th><td>[0.02310813218355179, -0.027600247412919998, -...</td><td>negative</td><td>positive</td><td>40</td><td>yes cannot make any knee jerk moves drastic ac...</td><td>0.664961</td><td>yes cannot make any knee jerk moves drastic ac...</td><td>[yes cannot make any knee jerk moves drastic a...</td></tr>
<tr><th>41</th><td>[0.043231260031461716, -0.07101075351238251, -...</td><td>negative</td><td>negative</td><td>41</td><td>why picked chairman the devious aadhaar isnt h...</td><td>0.783011</td><td>why picked chairman the devious aadhaar isnt h...</td><td>[why picked chairman the devious aadhaar isnt ...</td></tr>
<tr><th>42</th><td>[0.04160398617386818, -0.06572042405605316, -0...</td><td>negative</td><td>positive</td><td>42</td><td>due automation and artificial intelligence fur...</td><td>0.748431</td><td>due automation and artificial intelligence fur...</td><td>[due automation and artificial intelligence fu...</td></tr>
<tr><th>43</th><td>[-0.00038854932063259184, -0.04599419981241226...</td><td>negative</td><td>positive</td><td>43</td><td>weak state capacity exacerbated excessive acco...</td><td>0.776327</td><td>weak state capacity exacerbated excessive acco...</td><td>[weak state capacity exacerbated excessive acc...</td></tr>
<tr><th>44</th><td>[-0.02063656784594059, -0.07548005133867264, -...</td><td>negative</td><td>positive</td><td>44</td><td>our narendra modi ordered indian air force tak...</td><td>0.734817</td><td>our narendra modi ordered indian air force tak...</td><td>[our narendra modi ordered indian air force ta...</td></tr>
<tr><th>45</th><td>[0.01779576577246189, -0.06789527088403702, -0...</td><td>negative</td><td>negative</td><td>45</td><td>why vote modi dynasty visionary 3no high level...</td><td>0.758230</td><td>why vote modi dynasty visionary 3no high level...</td><td>[why vote modi dynasty visionary 3no high leve...</td></tr>
<tr><th>46</th><td>[0.065566785633564, -0.04119298234581947, -0.0...</td><td>negative</td><td>negative</td><td>46</td><td>its modi chor corrupt maha thugbandhan janta w...</td><td>0.783157</td><td>its modi chor corrupt maha thugbandhan janta w...</td><td>[its modi chor corrupt maha thugbandhan janta ...</td></tr>
<tr><th>47</th><td>[0.03988223522901535, -0.04965453967452049, -0...</td><td>negative</td><td>positive</td><td>47</td><td>before modis arrival 2014 all supported him fo...</td><td>0.767342</td><td>before modis arrival 2014 all supported him fo...</td><td>[before modis arrival 2014 all supported him f...</td></tr>
<tr><th>48</th><td>[0.010842484422028065, 0.01363383699208498, -0...</td><td>negative</td><td>positive</td><td>48</td><td>think you forgot dollar india handled exceptio...</td><td>0.767096</td><td>think you forgot dollar india handled exceptio...</td><td>[think you forgot dollar india handled excepti...</td></tr>
<tr><th>49</th><td>[-0.01967957802116871, 0.05570048466324806, -0...</td><td>negative</td><td>positive</td><td>49</td><td>tulsi gabbard rejected interviews with tyt but...</td><td>0.677169</td><td>tulsi gabbard rejected interviews with tyt but...</td><td>[tulsi gabbard rejected interviews with tyt bu...</td></tr>
\n","
"],"text/plain":[" sentence_embedding_use ... sentence\n","0 [0.060062434524297714, -0.05557167902588844, -... ... [how narendra modi has almost killed the india...\n","1 [0.05362718552350998, -0.004547705873847008, -... ... [you think was modi behind that accident]\n","2 [0.07274721562862396, -0.061593908816576004, -... ... [kamal haasan takes chowkidar modi kamal haasa...\n","3 [0.06106054410338402, -0.060213156044483185, -... ... [connected name with surname not bcz religion ...\n","4 [0.0737471729516983, 0.006071774289011955, -0.... ... [anyone better than modi when nehruji expired ...\n","5 [0.05888386443257332, -0.0646616667509079, -0.... ... [modiji wont tired crying foul main chowkidar ...\n","6 [0.058948416262865067, -0.029682165011763573, ... ... [poor chap modi hasn’ given him anything can...\n","7 [0.051331546157598495, -0.06789953261613846, -... ... [green underwear missing ive been doubting isi...\n","8 [0.044129759073257446, -0.06111813709139824, -... ... [congress years wasnt able complete one rafale...\n","9 [0.03665374591946602, -0.03695330768823624, -0... ... [asked learn from how treat minority well does...\n","10 [0.07035735249519348, -0.06952506303787231, -0... ... [stop bull shitting worry about criminal vivek...\n","11 [0.013958276249468327, -0.030759528279304504, ... ... [drswamys timesnow last year debate nearly mil...\n","12 [0.026277026161551476, -0.06238812580704689, -... ... [asshole bahujan radical marxist grow brain kn...\n","13 [0.07457270473241806, -0.058670494705438614, -... ... [from selling dreams 2014 selling tshirts 2019...\n","14 [0.061704088002443314, -0.04553354158997536, -... ... [very true sir thats why they are against modi...\n","15 [0.053420260548591614, -0.0038897113408893347,... ... [they are giving jobs citizen india what you a...\n","16 [0.027197618037462234, -0.036435648798942566, ... ... [congress has always attempted empower people ...\n","17 [0.06601184606552124, -0.020045213401317596, -... ... 
[have never said that modi succeed yet even al...\n","18 [0.046943631023168564, -0.06800007820129395, -... ... [the foundation for new india 2022 has already...\n","19 [0.05615750327706337, -0.002462629694491625, -... ... [only rahul gandhis politics love can defeat t...\n","20 [0.030352214351296425, -0.06195472553372383, 0... ... [one step time navigating thru looteyns when e...\n","21 [0.07535804808139801, -0.05643236264586449, -0... ... [why sir mam shabana azami hate much that have...\n","22 [0.05986170098185539, -0.0674145296216011, -0.... ... [modi will remain for next 510 years and till ...\n","23 [0.023959942162036896, -0.01397246215492487, -... ... [pledge your first vote for modi]\n","24 [0.04451165348291397, -0.06473662704229355, -0... ... [why need modi lead bjp government again 2019 ...\n","25 [0.06561190634965897, -0.0614917054772377, -0.... ... [raghuram rajan sent list high profile bank fr...\n","26 [0.05217093229293823, -0.05785880237817764, -0... ... [modi govts slashing indias education budget c...\n","27 [0.04579754173755646, -0.051767487078905106, -... ... [why are you hell bent manoj tiwari just her p...\n","28 [0.047987841069698334, -0.050984784960746765, ... ... [know going into dirty details nehru family it...\n","29 [0.04509664326906204, -0.05019481107592583, -0... ... [momota begum will let her state become total ...\n","30 [0.04315190762281418, -0.04578147828578949, -0... ... [thanks anu sharma will vote and make sure peo...\n","31 [0.0144237345084548, -0.052222371101379395, -0... ... [those who themselves dont know how many fathe...\n","32 [0.02492097206413746, -0.0531931146979332, -0.... ... [the star campaigner myth bjp lost more than a...\n","33 [0.040389616042375565, -0.06375984847545624, -... ... [modi also live for few years only like you no...\n","34 [0.06742898374795914, -0.060488566756248474, -... ... [narendra modi more brainy than all the drswam...\n","35 [0.06360629200935364, -0.06786973774433136, -0... ... 
[have started calling chowkidaar narendra modi...\n","36 [0.024233123287558556, -0.05243394151329994, -... ... [this the difference confident leaders call up...\n","37 [0.039280060678720474, -0.05146652087569237, -... ... [jawans killed the border crimes against women...\n","38 [0.05051109194755554, -0.0660049319267273, 0.0... ... [tag this fast growing youtuber cared abt this...\n","39 [-0.010975896380841732, -0.059168506413698196,... ... [think hindus should back off and let them suf...\n","40 [0.02310813218355179, -0.027600247412919998, -... ... [yes cannot make any knee jerk moves drastic a...\n","41 [0.043231260031461716, -0.07101075351238251, -... ... [why picked chairman the devious aadhaar isnt ...\n","42 [0.04160398617386818, -0.06572042405605316, -0... ... [due automation and artificial intelligence fu...\n","43 [-0.00038854932063259184, -0.04599419981241226... ... [weak state capacity exacerbated excessive acc...\n","44 [-0.02063656784594059, -0.07548005133867264, -... ... [our narendra modi ordered indian air force ta...\n","45 [0.01779576577246189, -0.06789527088403702, -0... ... [why vote modi dynasty visionary 3no high leve...\n","46 [0.065566785633564, -0.04119298234581947, -0.0... ... [its modi chor corrupt maha thugbandhan janta ...\n","47 [0.03988223522901535, -0.04965453967452049, -0... ... [before modis arrival 2014 all supported him f...\n","48 [0.010842484422028065, 0.01363383699208498, -0... ... [think you forgot dollar india handled excepti...\n","49 [-0.01967957802116871, 0.05570048466324806, -0... ... 
[tulsi gabbard rejected interviews with tyt bu...\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"markdown","metadata":{"id":"lVyOE2wV0fw_"},"source":["# Test the fitted pipe on a new example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":76},"id":"qdCUg2MR0PD2","executionInfo":{"status":"ok","timestamp":1620214646472,"user_tz":-120,"elapsed":260997,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"7fe8bbe8-421a-4231-971e-8527629b7bc8"},"source":["fitted_pipe.predict('the president of india just died')"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
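<table>
<thead>
<tr><th></th><th>sentence_embedding_use</th><th>trained_sentiment</th><th>origin_index</th><th>trained_sentiment_confidence</th><th>document</th><th>sentence</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>[0.013345886953175068, -0.021778283640742302, ...</td><td>negative</td><td>0</td><td>0.722135</td><td>the president of india just died</td><td>[the president of india just died]</td></tr>
</tbody>
</table>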
\n","
"],"text/plain":[" sentence_embedding_use ... sentence\n","0 [0.013345886953175068, -0.021778283640742302, ... ... [the president of india just died]\n","\n","[1 rows x 6 columns]"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"xflpwrVjjBVD"},"source":["## Configure pipe training parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"UtsAUGTmOTms","executionInfo":{"status":"ok","timestamp":1620214646475,"user_tz":-120,"elapsed":260995,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"47f312f8-22c4-432e-a7e0-e152aedf6585"},"source":["trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['sentiment_dl'] has settable params:\n","pipe['sentiment_dl'].setMaxEpochs(1) | Info: Maximum number of epochs to train | Currently set to : 1\n","pipe['sentiment_dl'].setLr(0.005) | Info: Learning Rate | Currently set to : 0.005\n","pipe['sentiment_dl'].setBatchSize(64) | Info: Batch size | Currently set to : 64\n","pipe['sentiment_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['sentiment_dl'].setEnableOutputLogs(True) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : True\n","pipe['sentiment_dl'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n",">>> pipe['use@tfhub_use'] has settable params:\n","pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n","pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n","pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3e286d29) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3e286d29\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 
'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2GJdDNV9jEIe"},"source":["## Retrain with new parameters"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"mptfvHx-MMMX","executionInfo":{"status":"ok","timestamp":1620214651490,"user_tz":-120,"elapsed":266005,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"d2862ef6-efc5-46b8-cf34-077b6ee3f4b5"},"source":["# Train longer!\n","trainable_pipe = nlu.load('train.sentiment')\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n","fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n","# Predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n","\n","# The sentence detector that is part of the pipe generates some NaNs. Let's drop them first\n","preds.dropna(inplace=True)\n","# The prediction column is named 'trained_sentiment' (see the output table below)\n","print(classification_report(preds['y'], preds['trained_sentiment']))\n","\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":[" precision recall f1-score support\n","\n"," negative 0.79 0.96 0.87 27\n"," neutral 0.00 0.00 0.00 0\n"," positive 1.00 0.09 0.16 23\n","\n"," accuracy 0.56 50\n"," macro avg 0.60 0.35 0.34 50\n","weighted avg 0.89 0.56 0.54 50\n","\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," 
\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
<table>
<thead>
<tr><th></th><th>sentence_embedding_use</th><th>trained_sentiment</th><th>y</th><th>origin_index</th><th>text</th><th>trained_sentiment_confidence</th><th>document</th><th>sentence</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>[0.060062434524297714, -0.05557167902588844, -...</td><td>negative</td><td>negative</td><td>0</td><td>how narendra modi has almost killed the indian...</td><td>0.689142</td><td>how narendra modi has almost killed the indian...</td><td>[how narendra modi has almost killed the india...</td></tr>
<tr><th>1</th><td>[0.05362718552350998, -0.004547705873847008, -...</td><td>negative</td><td>negative</td><td>1</td><td>you think was modi behind that accident</td><td>0.689483</td><td>you think was modi behind that accident</td><td>[you think was modi behind that accident]</td></tr>
<tr><th>2</th><td>[0.07274721562862396, -0.061593908816576004, -...</td><td>negative</td><td>negative</td><td>2</td><td>kamal haasan takes chowkidar modi kamal haasan...</td><td>0.707988</td><td>kamal haasan takes chowkidar modi kamal haasan...</td><td>[kamal haasan takes chowkidar modi kamal haasa...</td></tr>
<tr><th>3</th><td>[0.06106054410338402, -0.060213156044483185, -...</td><td>negative</td><td>negative</td><td>3</td><td>connected name with surname not bcz religion c...</td><td>0.675382</td><td>connected name with surname not bcz religion c...</td><td>[connected name with surname not bcz religion ...</td></tr>
<tr><th>4</th><td>[0.0737471729516983, 0.006071774289011955, -0....</td><td>negative</td><td>positive</td><td>4</td><td>anyone better than modi when nehruji expired s...</td><td>0.638730</td><td>anyone better than modi when nehruji expired s...</td><td>[anyone better than modi when nehruji expired ...</td></tr>
<tr><th>5</th><td>[0.05888386443257332, -0.0646616667509079, -0....</td><td>negative</td><td>negative</td><td>5</td><td>\\r\\nmodiji wont tired crying foul\\r\\nmain chow...</td><td>0.723110</td><td>modiji wont tired crying foul main chowkidar h...</td><td>[modiji wont tired crying foul main chowkidar ...</td></tr>
<tr><th>6</th><td>[0.058948416262865067, -0.029682165011763573, ...</td><td>negative</td><td>negative</td><td>6</td><td>poor chap modi hasn’ given him anything can ...</td><td>0.690602</td><td>poor chap modi hasn’ given him anything can ...</td><td>[poor chap modi hasn’ given him anything can...</td></tr>
<tr><th>7</th><td>[0.051331546157598495, -0.06789953261613846, -...</td><td>negative</td><td>negative</td><td>7</td><td>green underwear missing ive been doubting isi ...</td><td>0.705077</td><td>green underwear missing ive been doubting isi ...</td><td>[green underwear missing ive been doubting isi...</td></tr>
<tr><th>8</th><td>[0.044129759073257446, -0.06111813709139824, -...</td><td>neutral</td><td>positive</td><td>8</td><td>congress years wasnt able complete one rafale ...</td><td>0.561979</td><td>congress years wasnt able complete one rafale ...</td><td>[congress years wasnt able complete one rafale...</td></tr>
<tr><th>9</th><td>[0.03665374591946602, -0.03695330768823624, -0...</td><td>negative</td><td>negative</td><td>9</td><td>asked learn from how treat minority well does ...</td><td>0.746584</td><td>asked learn from how treat minority well does ...</td><td>[asked learn from how treat minority well does...</td></tr>
<tr><th>10</th><td>[0.07035735249519348, -0.06952506303787231, -0...</td><td>negative</td><td>negative</td><td>10</td><td>stop bull shitting worry about criminal vivek ...</td><td>0.768111</td><td>stop bull shitting worry about criminal vivek ...</td><td>[stop bull shitting worry about criminal vivek...</td></tr>
<tr><th>11</th><td>[0.013958276249468327, -0.030759528279304504, ...</td><td>neutral</td><td>positive</td><td>11</td><td>drswamys timesnow last year debate nearly mill...</td><td>0.511294</td><td>drswamys timesnow last year debate nearly mill...</td><td>[drswamys timesnow last year debate nearly mil...</td></tr>
<tr><th>12</th><td>[0.026277026161551476, -0.06238812580704689, -...</td><td>negative</td><td>negative</td><td>12</td><td>asshole bahujan radical marxist grow brain kno...</td><td>0.689268</td><td>asshole bahujan radical marxist grow brain kno...</td><td>[asshole bahujan radical marxist grow brain kn...</td></tr>
<tr><th>13</th><td>[0.07457270473241806, -0.058670494705438614, -...</td><td>negative</td><td>positive</td><td>13</td><td>from selling dreams 2014 selling tshirts 2019 ...</td><td>0.641822</td><td>from selling dreams 2014 selling tshirts 2019 ...</td><td>[from selling dreams 2014 selling tshirts 2019...</td></tr>
<tr><th>14</th><td>[0.061704088002443314, -0.04553354158997536, -...</td><td>negative</td><td>positive</td><td>14</td><td>very true sir thats why they are against modi ...</td><td>0.651230</td><td>very true sir thats why they are against modi ...</td><td>[very true sir thats why they are against modi...</td></tr>
<tr><th>15</th><td>[0.053420260548591614, -0.0038897113408893347,...</td><td>negative</td><td>negative</td><td>15</td><td>they are giving jobs citizen india what you ar...</td><td>0.706768</td><td>they are giving jobs citizen india what you ar...</td><td>[they are giving jobs citizen india what you a...</td></tr>
<tr><th>16</th><td>[0.027197618037462234, -0.036435648798942566, ...</td><td>negative</td><td>negative</td><td>16</td><td>congress has always attempted empower people g...</td><td>0.607062</td><td>congress has always attempted empower people g...</td><td>[congress has always attempted empower people ...</td></tr>
<tr><th>17</th><td>[0.06601184606552124, -0.020045213401317596, -...</td><td>negative</td><td>positive</td><td>17</td><td>have never said that modi succeed yet even als...</td><td>0.628577</td><td>have never said that modi succeed yet even als...</td><td>[have never said that modi succeed yet even al...</td></tr>
<tr><th>18</th><td>[0.046943631023168564, -0.06800007820129395, -...</td><td>neutral</td><td>positive</td><td>18</td><td>\\r\\nthe foundation for new india 2022 has alre...</td><td>0.547697</td><td>the foundation for new india 2022 has already ...</td><td>[the foundation for new india 2022 has already...</td></tr>
<tr><th>19</th><td>[0.05615750327706337, -0.002462629694491625, -...</td><td>negative</td><td>negative</td><td>19</td><td>only rahul gandhis politics love can defeat th...</td><td>0.632572</td><td>only rahul gandhis politics love can defeat th...</td><td>[only rahul gandhis politics love can defeat t...</td></tr>
<tr><th>20</th><td>[0.030352214351296425, -0.06195472553372383, 0...</td><td>negative</td><td>negative</td><td>20</td><td>one step time navigating thru looteyns when ev...</td><td>0.635106</td><td>one step time navigating thru looteyns when ev...</td><td>[one step time navigating thru looteyns when e...</td></tr>
<tr><th>21</th><td>[0.07535804808139801, -0.05643236264586449, -0...</td><td>negative</td><td>negative</td><td>21</td><td>why sir mam shabana azami hate much that have ...</td><td>0.738669</td><td>why sir mam shabana azami hate much that have ...</td><td>[why sir mam shabana azami hate much that have...</td></tr>
<tr><th>22</th><td>[0.05986170098185539, -0.0674145296216011, -0....</td><td>negative</td><td>negative</td><td>22</td><td>modi will remain for next 510 years and till t...</td><td>0.659078</td><td>modi will remain for next 510 years and till t...</td><td>[modi will remain for next 510 years and till ...</td></tr>
<tr><th>23</th><td>[0.023959942162036896, -0.01397246215492487, -...</td><td>neutral</td><td>positive</td><td>23</td><td>pledge your first vote for modi</td><td>0.555447</td><td>pledge your first vote for modi</td><td>[pledge your first vote for modi]</td></tr>
<tr><th>24</th><td>[0.04451165348291397, -0.06473662704229355, -0...</td><td>neutral</td><td>positive</td><td>24</td><td>why need modi lead bjp government again 2019 j...</td><td>0.578395</td><td>why need modi lead bjp government again 2019 j...</td><td>[why need modi lead bjp government again 2019 ...</td></tr>
<tr><th>25</th><td>[0.06561190634965897, -0.0614917054772377, -0....</td><td>negative</td><td>negative</td><td>25</td><td>raghuram rajan sent list high profile bank fra...</td><td>0.706507</td><td>raghuram rajan sent list high profile bank fra...</td><td>[raghuram rajan sent list high profile bank fr...</td></tr>
<tr><th>26</th><td>[0.05217093229293823, -0.05785880237817764, -0...</td><td>negative</td><td>negative</td><td>26</td><td>modi govts slashing indias education budget cl...</td><td>0.607360</td><td>modi govts slashing indias education budget cl...</td><td>[modi govts slashing indias education budget c...</td></tr>
<tr><th>27</th><td>[0.04579754173755646, -0.051767487078905106, -...</td><td>neutral</td><td>positive</td><td>27</td><td>why are you hell bent manoj tiwari just her ph...</td><td>0.588993</td><td>why are you hell bent manoj tiwari just her ph...</td><td>[why are you hell bent manoj tiwari just her p...</td></tr>
<tr><th>28</th><td>[0.047987841069698334, -0.050984784960746765, ...</td><td>negative</td><td>negative</td><td>28</td><td>know going into dirty details nehru family its...</td><td>0.753084</td><td>know going into dirty details nehru family its...</td><td>[know going into dirty details nehru family it...</td></tr>
<tr><th>29</th><td>[0.04509664326906204, -0.05019481107592583, -0...</td><td>negative</td><td>negative</td><td>29</td><td>momota begum will let her state become total s...</td><td>0.615988</td><td>momota begum will let her state become total s...</td><td>[momota begum will let her state become total ...</td></tr>
<tr><th>30</th><td>[0.04315190762281418, -0.04578147828578949, -0...</td><td>neutral</td><td>positive</td><td>30</td><td>thanks anu sharma will vote and make sure peop...</td><td>0.555271</td><td>thanks anu sharma will vote and make sure peop...</td><td>[thanks anu sharma will vote and make sure peo...</td></tr>
<tr><th>31</th><td>[0.0144237345084548, -0.052222371101379395, -0...</td><td>negative</td><td>positive</td><td>31</td><td>those who themselves dont know how many father...</td><td>0.631877</td><td>those who themselves dont know how many father...</td><td>[those who themselves dont know how many fathe...</td></tr>
<tr><th>32</th><td>[0.02492097206413746, -0.0531931146979332, -0....</td><td>neutral</td><td>positive</td><td>32</td><td>the star campaigner myth bjp lost more than as...</td><td>0.586682</td><td>the star campaigner myth bjp lost more than as...</td><td>[the star campaigner myth bjp lost more than a...</td></tr>
<tr><th>33</th><td>[0.040389616042375565, -0.06375984847545624, -...</td><td>neutral</td><td>negative</td><td>33</td><td>modi also live for few years only like you not...</td><td>0.587196</td><td>modi also live for few years only like you not...</td><td>[modi also live for few years only like you no...</td></tr>
<tr><th>34</th><td>[0.06742898374795914, -0.060488566756248474, -...</td><td>neutral</td><td>positive</td><td>34</td><td>narendra modi more brainy than all the drswamy...</td><td>0.533663</td><td>narendra modi more brainy than all the drswamy...</td><td>[narendra modi more brainy than all the drswam...</td></tr>
<tr><th>35</th><td>[0.06360629200935364, -0.06786973774433136, -0...</td><td>negative</td><td>negative</td><td>35</td><td>have started calling chowkidaar narendra modi ...</td><td>0.672972</td><td>have started calling chowkidaar narendra modi ...</td><td>[have started calling chowkidaar narendra modi...</td></tr>
<tr><th>36</th><td>[0.024233123287558556, -0.05243394151329994, -...</td><td>neutral</td><td>positive</td><td>36</td><td>this the difference confident leaders call upo...</td><td>0.510922</td><td>this the difference confident leaders call upo...</td><td>[this the difference confident leaders call up...</td></tr>
<tr><th>37</th><td>[0.039280060678720474, -0.05146652087569237, -...</td><td>negative</td><td>negative</td><td>37</td><td>jawans killed the border\\r\\ncrimes against wom...</td><td>0.701794</td><td>jawans killed the border crimes against women ...</td><td>[jawans killed the border crimes against women...</td></tr>
<tr><th>38</th><td>[0.05051109194755554, -0.0660049319267273, 0.0...</td><td>negative</td><td>negative</td><td>38</td><td>tag this fast growing youtuber cared abt this ...</td><td>0.714883</td><td>tag this fast growing youtuber cared abt this ...</td><td>[tag this fast growing youtuber cared abt this...</td></tr>
<tr><th>39</th><td>[-0.010975896380841732, -0.059168506413698196,...</td><td>neutral</td><td>positive</td><td>39</td><td>think hindus should back off and let them suff...</td><td>0.553189</td><td>think hindus should back off and let them suff...</td><td>[think hindus should back off and let them suf...</td></tr>
<tr><th>40</th><td>[0.02310813218355179, -0.027600247412919998, -...</td><td>positive</td><td>positive</td><td>40</td><td>yes cannot make any knee jerk moves drastic ac...</td><td>0.671809</td><td>yes cannot make any knee jerk moves drastic ac...</td><td>[yes cannot make any knee jerk moves drastic a...</td></tr>
<tr><th>41</th><td>[0.043231260031461716, -0.07101075351238251, -...</td><td>negative</td><td>negative</td><td>41</td><td>why picked chairman the devious aadhaar isnt h...</td><td>0.709371</td><td>why picked chairman the devious aadhaar isnt h...</td><td>[why picked chairman the devious aadhaar isnt ...</td></tr>
<tr><th>42</th><td>[0.04160398617386818, -0.06572042405605316, -0...</td><td>neutral</td><td>positive</td><td>42</td><td>due automation and artificial intelligence fur...</td><td>0.553482</td><td>due automation and artificial intelligence fur...</td><td>[due automation and artificial intelligence fu...</td></tr>
<tr><th>43</th><td>[-0.00038854932063259184, -0.04599419981241226...</td><td>negative</td><td>positive</td><td>43</td><td>weak state capacity exacerbated excessive acco...</td><td>0.609747</td><td>weak state capacity exacerbated excessive acco...</td><td>[weak state capacity exacerbated excessive acc...</td></tr>
<tr><th>44</th><td>[-0.02063656784594059, -0.07548005133867264, -...</td><td>neutral</td><td>positive</td><td>44</td><td>our narendra modi ordered indian air force tak...</td><td>0.513191</td><td>our narendra modi ordered indian air force tak...</td><td>[our narendra modi ordered indian air force ta...</td></tr>
<tr><th>45</th><td>[0.01779576577246189, -0.06789527088403702, -0...</td><td>negative</td><td>negative</td><td>45</td><td>why vote modi dynasty visionary 3no high level...</td><td>0.635148</td><td>why vote modi dynasty visionary 3no high level...</td><td>[why vote modi dynasty visionary 3no high leve...</td></tr>
<tr><th>46</th><td>[0.065566785633564, -0.04119298234581947, -0.0...</td><td>negative</td><td>negative</td><td>46</td><td>its modi chor corrupt maha thugbandhan janta w...</td><td>0.687171</td><td>its modi chor corrupt maha thugbandhan janta w...</td><td>[its modi chor corrupt maha thugbandhan janta ...</td></tr>
<tr><th>47</th><td>[0.03988223522901535, -0.04965453967452049, -0...</td><td>neutral</td><td>positive</td><td>47</td><td>before modis arrival 2014 all supported him fo...</td><td>0.557571</td><td>before modis arrival 2014 all supported him fo...</td><td>[before modis arrival 2014 all supported him f...</td></tr>
<tr><th>48</th><td>[0.010842484422028065, 0.01363383699208498, -0...</td><td>negative</td><td>positive</td><td>48</td><td>think you forgot dollar india handled exceptio...</td><td>0.615532</td><td>think you forgot dollar india handled exceptio...</td><td>[think you forgot dollar india handled excepti...</td></tr>
<tr><th>49</th><td>[-0.01967957802116871, 0.05570048466324806, -0...</td><td>positive</td><td>positive</td><td>49</td><td>tulsi gabbard rejected interviews with tyt but...</td><td>0.604604</td><td>tulsi gabbard rejected interviews with tyt but...</td><td>[tulsi gabbard rejected interviews with tyt bu...</td></tr>
</tbody>
</table>
\n","
"],"text/plain":[" sentence_embedding_use ... sentence\n","0 [0.060062434524297714, -0.05557167902588844, -... ... [how narendra modi has almost killed the india...\n","1 [0.05362718552350998, -0.004547705873847008, -... ... [you think was modi behind that accident]\n","2 [0.07274721562862396, -0.061593908816576004, -... ... [kamal haasan takes chowkidar modi kamal haasa...\n","3 [0.06106054410338402, -0.060213156044483185, -... ... [connected name with surname not bcz religion ...\n","4 [0.0737471729516983, 0.006071774289011955, -0.... ... [anyone better than modi when nehruji expired ...\n","5 [0.05888386443257332, -0.0646616667509079, -0.... ... [modiji wont tired crying foul main chowkidar ...\n","6 [0.058948416262865067, -0.029682165011763573, ... ... [poor chap modi hasn’ given him anything can...\n","7 [0.051331546157598495, -0.06789953261613846, -... ... [green underwear missing ive been doubting isi...\n","8 [0.044129759073257446, -0.06111813709139824, -... ... [congress years wasnt able complete one rafale...\n","9 [0.03665374591946602, -0.03695330768823624, -0... ... [asked learn from how treat minority well does...\n","10 [0.07035735249519348, -0.06952506303787231, -0... ... [stop bull shitting worry about criminal vivek...\n","11 [0.013958276249468327, -0.030759528279304504, ... ... [drswamys timesnow last year debate nearly mil...\n","12 [0.026277026161551476, -0.06238812580704689, -... ... [asshole bahujan radical marxist grow brain kn...\n","13 [0.07457270473241806, -0.058670494705438614, -... ... [from selling dreams 2014 selling tshirts 2019...\n","14 [0.061704088002443314, -0.04553354158997536, -... ... [very true sir thats why they are against modi...\n","15 [0.053420260548591614, -0.0038897113408893347,... ... [they are giving jobs citizen india what you a...\n","16 [0.027197618037462234, -0.036435648798942566, ... ... [congress has always attempted empower people ...\n","17 [0.06601184606552124, -0.020045213401317596, -... ... 
[have never said that modi succeed yet even al...\n","18 [0.046943631023168564, -0.06800007820129395, -... ... [the foundation for new india 2022 has already...\n","19 [0.05615750327706337, -0.002462629694491625, -... ... [only rahul gandhis politics love can defeat t...\n","20 [0.030352214351296425, -0.06195472553372383, 0... ... [one step time navigating thru looteyns when e...\n","21 [0.07535804808139801, -0.05643236264586449, -0... ... [why sir mam shabana azami hate much that have...\n","22 [0.05986170098185539, -0.0674145296216011, -0.... ... [modi will remain for next 510 years and till ...\n","23 [0.023959942162036896, -0.01397246215492487, -... ... [pledge your first vote for modi]\n","24 [0.04451165348291397, -0.06473662704229355, -0... ... [why need modi lead bjp government again 2019 ...\n","25 [0.06561190634965897, -0.0614917054772377, -0.... ... [raghuram rajan sent list high profile bank fr...\n","26 [0.05217093229293823, -0.05785880237817764, -0... ... [modi govts slashing indias education budget c...\n","27 [0.04579754173755646, -0.051767487078905106, -... ... [why are you hell bent manoj tiwari just her p...\n","28 [0.047987841069698334, -0.050984784960746765, ... ... [know going into dirty details nehru family it...\n","29 [0.04509664326906204, -0.05019481107592583, -0... ... [momota begum will let her state become total ...\n","30 [0.04315190762281418, -0.04578147828578949, -0... ... [thanks anu sharma will vote and make sure peo...\n","31 [0.0144237345084548, -0.052222371101379395, -0... ... [those who themselves dont know how many fathe...\n","32 [0.02492097206413746, -0.0531931146979332, -0.... ... [the star campaigner myth bjp lost more than a...\n","33 [0.040389616042375565, -0.06375984847545624, -... ... [modi also live for few years only like you no...\n","34 [0.06742898374795914, -0.060488566756248474, -... ... [narendra modi more brainy than all the drswam...\n","35 [0.06360629200935364, -0.06786973774433136, -0... ... 
[have started calling chowkidaar narendra modi...\n","36 [0.024233123287558556, -0.05243394151329994, -... ... [this the difference confident leaders call up...\n","37 [0.039280060678720474, -0.05146652087569237, -... ... [jawans killed the border crimes against women...\n","38 [0.05051109194755554, -0.0660049319267273, 0.0... ... [tag this fast growing youtuber cared abt this...\n","39 [-0.010975896380841732, -0.059168506413698196,... ... [think hindus should back off and let them suf...\n","40 [0.02310813218355179, -0.027600247412919998, -... ... [yes cannot make any knee jerk moves drastic a...\n","41 [0.043231260031461716, -0.07101075351238251, -... ... [why picked chairman the devious aadhaar isnt ...\n","42 [0.04160398617386818, -0.06572042405605316, -0... ... [due automation and artificial intelligence fu...\n","43 [-0.00038854932063259184, -0.04599419981241226... ... [weak state capacity exacerbated excessive acc...\n","44 [-0.02063656784594059, -0.07548005133867264, -... ... [our narendra modi ordered indian air force ta...\n","45 [0.01779576577246189, -0.06789527088403702, -0... ... [why vote modi dynasty visionary 3no high leve...\n","46 [0.065566785633564, -0.04119298234581947, -0.0... ... [its modi chor corrupt maha thugbandhan janta ...\n","47 [0.03988223522901535, -0.04965453967452049, -0... ... [before modis arrival 2014 all supported him f...\n","48 [0.010842484422028065, 0.01363383699208498, -0... ... [think you forgot dollar india handled excepti...\n","49 [-0.01967957802116871, 0.05570048466324806, -0... ... 
[tulsi gabbard rejected interviews with tyt bu...\n","\n","[50 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"qFoT-s1MjTSS"},"source":["# Try training with different Embeddings"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nxWFzQOhjWC8","executionInfo":{"status":"ok","timestamp":1620214651491,"user_tz":-120,"elapsed":266002,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"36b0db66-250d-499a-acce-976bd9e188e9"},"source":["# We can use nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n","nlu.print_components(action='embed_sentence')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed_sentence') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed_sentence.use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model tfhub_use_lg\n","nlu.load('en.embed_sentence.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed_sentence.electra') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model sent_electra_small_uncased\n","nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model sent_electra_base_uncased\n","nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model sent_electra_large_uncased\n","nlu.load('en.embed_sentence.bert') returns Spark NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_uncased') returns Spark 
NLP model sent_bert_base_uncased\n","nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model sent_bert_base_cased\n","nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model sent_bert_large_uncased\n","nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model sent_bert_large_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model sent_biobert_pubmed_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP model sent_biobert_pubmed_large_cased\n","nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model sent_biobert_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model sent_biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model sent_biobert_clinical_base_cased\n","nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model sent_biobert_discharge_base_cased\n","nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model sent_covidbert_large_uncased\n","nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model sent_small_bert_L2_128\n","nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model sent_small_bert_L4_128\n","nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model sent_small_bert_L6_128\n","nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model sent_small_bert_L8_128\n","nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model sent_small_bert_L10_128\n","nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model sent_small_bert_L12_128\n","nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model sent_small_bert_L2_256\n","nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model sent_small_bert_L4_256\n","nlu.load('en.embed_sentence.small_bert_L6_256') returns 
Spark NLP model sent_small_bert_L6_256\n","nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model sent_small_bert_L8_256\n","nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model sent_small_bert_L10_256\n","nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model sent_small_bert_L12_256\n","nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model sent_small_bert_L2_512\n","nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model sent_small_bert_L4_512\n","nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model sent_small_bert_L6_512\n","nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model sent_small_bert_L8_512\n","nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model sent_small_bert_L10_512\n","nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model sent_small_bert_L12_512\n","nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model sent_small_bert_L2_768\n","nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model sent_small_bert_L4_768\n","nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model sent_small_bert_L6_768\n","nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model sent_small_bert_L8_768\n","nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model sent_small_bert_L10_768\n","nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model sent_small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('fi.embed_sentence') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model sent_bert_finnish_cased\n","nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model sent_bert_finnish_uncased\n","For language NLU provides the following Models : \n","nlu.load('xx.embed_sentence') returns Spark NLP model 
sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model sent_bert_multi_cased\n","nlu.load('xx.embed_sentence.labse') returns Spark NLP model labse\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"IKK_Ii_gjJfF","executionInfo":{"status":"ok","timestamp":1620215021948,"user_tz":-120,"elapsed":636454,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"8669ed6f-b02a-4b6a-b1f4-9dc850d2d384"},"source":["trainable_pipe = nlu.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n","# We usually need to train longer and use a smaller LR for non-USE sentence embeddings\n","# We could tune the hyperparameters further with hyperparameter tuning methods like grid search\n","# Longer training also tends to improve accuracy\n","trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(100) \n","trainable_pipe['trainable_sentiment_dl'].setLr(0.0005) \n","fitted_pipe = trainable_pipe.fit(train_df)\n","# predict with the fitted pipeline on the dataset and get predictions\n","preds = fitted_pipe.predict(train_df,output_level='document')\n","\n","#sentence detector that is part of the pipe generates some NaNs. 
Let's drop them first\n","preds.dropna(inplace=True)\n","print(classification_report(preds['y'], preds['sentiment']))\n","\n","#preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sent_small_bert_L12_768 download started this may take some time.\n","Approximate size to download 392.9 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"," precision recall f1-score support\n","\n"," negative 0.78 0.67 0.72 300\n"," neutral 0.00 0.00 0.00 0\n"," positive 0.86 0.56 0.67 300\n","\n"," accuracy 0.61 600\n"," macro avg 0.54 0.41 0.46 600\n","weighted avg 0.82 0.61 0.70 600\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 5. Let's save the model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eLex095goHwm","executionInfo":{"status":"ok","timestamp":1620215202403,"user_tz":-120,"elapsed":816905,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"3b42a600-3686-4fe9-a430-93647db780af"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 6. Let's load the model from HDD.\n","This makes offline NLU usage possible! 
\n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"id":"SO4uz45MoRgp","colab":{"base_uri":"https://localhost:8080/","height":76},"executionInfo":{"status":"ok","timestamp":1620215216214,"user_tz":-120,"elapsed":830714,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"fce06315-d677-491f-9a59-c276b6d23b62"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('the president of india just died')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentimentsentence_embedding_from_diskorigin_indextextsentiment_confidencedocumentsentence
0[positive][[0.009460609406232834, -0.07943306118249893, ...8589934592the president of india just died[0.8165338]the president of india just died[the president of india just died]
\n","
"],"text/plain":[" sentiment ... sentence\n","0 [positive] ... [the president of india just died]\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":11}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"e0CVlkk9v6Qi","executionInfo":{"status":"ok","timestamp":1620215216215,"user_tz":-120,"elapsed":830711,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"d0978b3b-d8fd-4c14-e00b-7e5e34450ea7"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3c74a4cb) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3c74a4cb\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['bert_sentence@sent_small_bert_L12_768'] has settable params:\n","pipe['bert_sentence@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert_sentence@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 
768\n","pipe['bert_sentence@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert_sentence@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n","pipe['bert_sentence@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n",">>> pipe['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n","pipe['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n"],"name":"stdout"}]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "collapsed_sections": [ + "zkufh760uvF3" + ] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/binary_text_classification/NLU_training_sentiment_classifier_demo_twitter.ipynb)\n", + "\n", + "\n", + "\n", + "# Training a Sentiment Analysis Classifier with NLU\n", + "## 2-class Twitter classifier training\n", + "With the [SentimentDL model](https://nlp.johnsnowlabs.com/docs/en/annotators#sentimentdl-multi-class-sentiment-analysis-annotator) from Spark NLP you can achieve state-of-the-art results on any multi-class text classification problem\n", + "\n", + "This notebook showcases the following features:\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (Enables NLU offline mode)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Install Java 8 and NLU" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "!pip install -q johnsnowlabs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download Twitter sentiment dataset\n", + "https://www.kaggle.com/cosmos98/twitter-and-reddit-sentimental-analysis-dataset\n", + "#Context\n", + "\n", + "This dataset was created as part of the university project on Sentimental Analysis on Multi-Source Social Media Platforms using PySpark." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OrVb5ZMvvrQD" + }, + "source": [ + "! 
wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/classifier-dl/reddit_twitter_sentiment/Twitter_Data.csv\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "y4xSRWIhwT28", + "outputId": "cb744bdd-bf86-4bec-b200-e8249a2e3f1c" + }, + "source": [ + "import pandas as pd\n", + "train_path = '/content/Twitter_Data.csv'\n", + "\n", + "train_df = pd.read_csv(train_path)\n", + "# the text data to use for classification should be in a column named 'text'\n", + "# the label column must be named 'y' and be of type str\n", + "train_df = train_df.rename(columns={'clean_text': 'text'})\n", + "\n", + "columns=['text','y']\n", + "train_df = train_df[columns]\n", + "train_df" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " text y\n", + "0 new post added mumbai press official site prod... positive\n", + "1 not wrong the actual temperature might but and... positive\n", + "2 why pakistan crying name modi every day how na... negative\n", + "3 congress years wasnt able complete one rafale ... positive\n", + "4 public toilet near kanagadurga temple nizampet... positive\n", + ".. ... ...\n", + "595 jai hind modi very nice thought positive\n", + "596 after going thru all the comedy speeches shri ... positive\n", + "597 mistry man not then why drag modi the nri foll... negative\n", + "598 why modi have not held single press conference... negative\n", + "599 modi government which fails protect its women ... negative\n", + "\n", + "[600 rows x 2 columns]" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
texty
0new post added mumbai press official site prod...positive
1not wrong the actual temperature might but and...positive
2why pakistan crying name modi every day how na...negative
3congress years wasnt able complete one rafale ...positive
4public toilet near kanagadurga temple nizampet...positive
.........
595jai hind modi very nice thoughtpositive
596after going thru all the comedy speeches shri ...positive
597mistry man not then why drag modi the nri foll...negative
598why modi have not held single press conference...negative
599modi government which fails protect its women ...negative
\n", + "

600 rows × 2 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 5 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. Train Deep Learning Classifier using nlp.load('train.sentiment')\n", + "\n", + "Your dataset's label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "3ZIPkRkWftBG", + "outputId": "fbb3d3a1-9cf5-4ecc-89a4-fdb6e0d59d17" + }, + "source": [ + "from sklearn.metrics import classification_report\n", + "from johnsnowlabs import nlp\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# by default the Universal Sentence Encoder (USE) Sentence embeddings are used for generation\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "\n", + "# predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "#sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/pipeline.py:149: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " dataset.y = dataset.y.apply(str)\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 
0.00 19\n", + " positive 0.62 1.00 0.77 31\n", + "\n", + " accuracy 0.62 50\n", + " macro avg 0.31 0.50 0.38 50\n", + "weighted avg 0.38 0.62 0.47 50\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 new post added mumbai press official site prod... \n", + "1 not wrong the actual temperature might but and... \n", + "2 why pakistan crying name modi every day how na... 
\n", + "3 congress years wasnt able complete one rafale ... \n", + "4 public toilet near kanagadurga temple nizampet... \n", + "5 the foundation for new india 2022 has already ... \n", + "6 dear governorani can you let the people indian... \n", + "7 this daft donkey’ dick aap was born the iac mo... \n", + "8 major reason for social hatred and strife modi... \n", + "9 demo was black money caught modi did inspite r... \n", + "10 one the best ministers modis cabinet \n", + "11 raghuram rajan sent list high profile bank fra... \n", + "12 governor kalyan singh aligarh 23rd march all a... \n", + "13 this campaign low hanging fruit seems giving f... \n", + "14 strict policing that too health system sir reg... \n", + "15 shatrughan sinha was far bigger public figure ... \n", + "16 people wish your vision india and least intere... \n", + "17 chowkidar hee chor hain baap chor beta bada ch... \n", + "18 with modi all his drawbacks atleast know what ... \n", + "19 compare that with modi’ 2014 “vikas purush” el... \n", + "20 with welfare delivery gst ibc and feo place mo... \n", + "21 young and dynamic chowkidar says you are not w... \n", + "22 dont forget petrol prices have risen ₹ modi go... \n", + "23 prime minister narendra modi has urged voters ... \n", + "24 when someone asks random question economy sche... \n", + "25 from whatsapp have two options this elections ... \n", + "26 heres interesting section work which gives com... \n", + "27 know why you willpapu his chamchas nothing tar... \n", + "28 only rahul gandhis politics love can defeat th... \n", + "29 one see the fake calculation calsi chek kariye... \n", + "30 being born religion where female deities worsh... \n", + "31 narendra modi became and despite biased viciou... \n", + "32 think hindus should back off and let them suff... \n", + "33 from the very beginningmodi doing wada faramos... \n", + "34 women are powerful nature wat hinduism states ... \n", + "35 almost 4000 crore spent but the ganga more pol... 
\n", + "36 seriously must have sick mind call small kid r... \n", + "37 agree with you was unrequired was kinda uncomf... \n", + "38 rajdeep think 4got that imran not indian never... \n", + "39 the great modi trap ways congress has walked into \n", + "40 has come new meanings nationalism hindu and su... \n", + "41 modi govt hindus are behaving wildly \n", + "42 sir one request why bjp candidate contesting f... \n", + "43 vapas lana hai desh agey badana hai vote for m... \n", + "44 even with massive mandate 336 seats the nda go... \n", + "45 can promise what can delivered epf pension uni... \n", + "46 and even print this seriously whats this elect... \n", + "47 she has asked three questions from modi and he... \n", + "48 dear all tsunami favour modi 2019 coming from ... \n", + "49 rahul gandhis politics love can defeat the mod... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.2672112286090851, 0.22553667426109314, -0.... positive \n", + "1 [-1.243513822555542, 0.24190887808799744, -0.4... positive \n", + "2 [-0.7444853782653809, -0.0514342300593853, -0.... positive \n", + "3 [-0.34242647886276245, 0.46881920099258423, -0... positive \n", + "4 [-1.1381851434707642, 0.512217104434967, -0.74... positive \n", + "5 [-0.4057338237762451, 1.0029019117355347, -0.9... positive \n", + "6 [-1.4633989334106445, 0.0006002967129461467, -... positive \n", + "7 [0.04606145992875099, 0.3098487854003906, 0.02... positive \n", + "8 [-0.9470604658126831, 0.27183642983436584, -1.... positive \n", + "9 [-0.7136786580085754, 0.0788763239979744, -0.5... positive \n", + "10 [-0.3664279282093048, 0.3727397918701172, -0.4... positive \n", + "11 [-1.0506610870361328, 0.20963071286678314, -0.... positive \n", + "12 [-0.20803016424179077, 0.07477151602506638, -0... positive \n", + "13 [-1.2681258916854858, 0.24513696134090424, -0.... positive \n", + "14 [-0.9757605195045471, 1.0792640447616577, -0.4... 
positive \n", + "15 [-0.5276148319244385, 0.2546652853488922, -0.1... positive \n", + "16 [-1.0698672533035278, 0.5003032088279724, -0.4... positive \n", + "17 [-0.5458223223686218, -0.23064681887626648, -0... positive \n", + "18 [-0.8066104054450989, -0.014454682357609272, -... positive \n", + "19 [-0.5401442646980286, -0.41557249426841736, -0... positive \n", + "20 [-0.5764278769493103, 1.2036645412445068, -1.0... positive \n", + "21 [-0.08391488343477249, -0.43613195419311523, 0... positive \n", + "22 [-0.10981855541467667, 0.5448974370956421, -0.... positive \n", + "23 [0.27992355823516846, 0.0582934208214283, -0.2... positive \n", + "24 [-0.5603336691856384, -0.1663953959941864, -0.... positive \n", + "25 [0.02258816733956337, -0.18138407170772552, -0... positive \n", + "26 [-1.3662021160125732, 0.042950600385665894, 0.... positive \n", + "27 [-1.0259658098220825, 0.31534343957901, -0.242... positive \n", + "28 [-0.411965548992157, -0.5224093198776245, -0.6... positive \n", + "29 [-1.0891247987747192, -0.5220181345939636, -0.... positive \n", + "30 [-1.0681140422821045, -0.6835974454879761, -0.... positive \n", + "31 [-0.20442193746566772, -0.3683846890926361, -0... positive \n", + "32 [-0.9112239480018616, 0.17845268547534943, -0.... positive \n", + "33 [-1.3499250411987305, 0.23698307573795319, -0.... positive \n", + "34 [-0.47838863730430603, 0.19593238830566406, -0... positive \n", + "35 [-1.229308009147644, -0.07380063086748123, -0.... positive \n", + "36 [-0.8348453640937805, -0.03178590536117554, -0... positive \n", + "37 [-0.4827183485031128, 0.08657485246658325, -0.... positive \n", + "38 [-0.6942201852798462, -0.3642423152923584, -0.... positive \n", + "39 [-0.9660191535949707, -0.2219739854335785, -0.... positive \n", + "40 [-0.03318127244710922, 0.04329724237322807, 0.... positive \n", + "41 [-0.5563615560531616, 0.4725855588912964, 0.10... positive \n", + "42 [-0.6955257654190063, 0.37047961354255676, -0.... 
positive \n", + "43 [-0.8704789876937866, -0.22370557487010956, -0... positive \n", + "44 [-0.20755957067012787, 0.34972018003463745, -0... positive \n", + "45 [-1.3415300846099854, 1.6326956748962402, -0.6... positive \n", + "46 [-1.4377732276916504, -0.11478081345558167, 0.... positive \n", + "47 [-1.193617343902588, -0.02149897627532482, 0.1... positive \n", + "48 [-0.5854026675224304, -0.21378959715366364, -0... positive \n", + "49 [-0.39933672547340393, -0.5969381332397461, -0... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 8.0 new post added mumbai press official site prod... \n", + "1 8.0 not wrong the actual temperature might but and... \n", + "2 2.0 why pakistan crying name modi every day how na... \n", + "3 2.0 congress years wasnt able complete one rafale ... \n", + "4 1.0 public toilet near kanagadurga temple nizampet... \n", + "5 7.0 \\nthe foundation for new india 2022 has alread... \n", + "6 3.0 dear governorani can you let the people indian... \n", + "7 6.0 this daft donkey’ dick aap was born the iac mo... \n", + "8 5.0 major reason for social hatred and strife modi... \n", + "9 1.0 demo was black money caught modi did inspite r... \n", + "10 1.0 one the best ministers modis cabinet \n", + "11 8.0 raghuram rajan sent list high profile bank fra... \n", + "12 1.0 governor kalyan singh aligarh 23rd march all a... \n", + "13 3.0 this campaign low hanging fruit seems giving f... \n", + "14 1.0 strict policing that too health system sir reg... \n", + "15 6.0 shatrughan sinha was far bigger public figure ... \n", + "16 1.0 people wish your vision india and least intere... \n", + "17 7.0 chowkidar hee chor hain baap chor beta bada ch... \n", + "18 7.0 with modi all his drawbacks atleast know what ... \n", + "19 2.0 compare that with modi’ 2014 “vikas purush” el... \n", + "20 7.0 with welfare delivery gst ibc and feo place mo... \n", + "21 2.0 young and dynamic chowkidar says you are not w... 
\n", + "22 5.0 dont forget petrol prices have risen ₹ modi go... \n", + "23 1.0 prime minister narendra modi has urged voters ... \n", + "24 1.0 when someone asks random question economy sche... \n", + "25 2.0 from whatsapp have two options this elections ... \n", + "26 1.0 heres interesting section work which gives com... \n", + "27 6.0 know why you willpapu his chamchas nothing tar... \n", + "28 2.0 only rahul gandhis politics love can defeat th... \n", + "29 2.0 one see the fake calculation calsi chek kariye... \n", + "30 2.0 being born religion where female deities worsh... \n", + "31 3.0 narendra modi became and despite biased viciou... \n", + "32 6.0 think hindus should back off and let them suff... \n", + "33 3.0 from the very beginningmodi doing wada faramos... \n", + "34 5.0 women are powerful nature wat hinduism states ... \n", + "35 1.0 almost 4000 crore spent but the ganga more pol... \n", + "36 1.0 seriously must have sick mind call small kid r... \n", + "37 7.0 agree with you was unrequired was kinda uncomf... \n", + "38 4.0 rajdeep think 4got that imran not indian never... \n", + "39 6.0 the great modi trap ways congress has walked i... \n", + "40 2.0 has come new meanings nationalism hindu and su... \n", + "41 3.0 modi govt hindus are behaving wildly \n", + "42 1.0 sir one request why bjp candidate contesting f... \n", + "43 1.0 vapas lana hai\\ndesh agey badana hai\\nvote for... \n", + "44 2.0 even with massive mandate 336 seats the nda go... \n", + "45 3.0 can promise what can delivered epf pension uni... \n", + "46 2.0 and even print this seriously whats this elect... \n", + "47 1.0 she has asked three questions from modi and he... \n", + "48 2.0 dear all tsunami favour modi 2019 coming from ... \n", + "49 4.0 rahul gandhis politics love can defeat the mod... 
\n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 negative \n", + "3 positive \n", + "4 positive \n", + "5 positive \n", + "6 negative \n", + "7 negative \n", + "8 positive \n", + "9 positive \n", + "10 positive \n", + "11 negative \n", + "12 positive \n", + "13 positive \n", + "14 positive \n", + "15 positive \n", + "16 negative \n", + "17 positive \n", + "18 positive \n", + "19 negative \n", + "20 positive \n", + "21 positive \n", + "22 negative \n", + "23 positive \n", + "24 negative \n", + "25 positive \n", + "26 positive \n", + "27 negative \n", + "28 negative \n", + "29 negative \n", + "30 positive \n", + "31 negative \n", + "32 positive \n", + "33 negative \n", + "34 positive \n", + "35 positive \n", + "36 negative \n", + "37 positive \n", + "38 negative \n", + "39 positive \n", + "40 positive \n", + "41 positive \n", + "42 negative \n", + "43 positive \n", + "44 positive \n", + "45 positive \n", + "46 negative \n", + "47 positive \n", + "48 negative \n", + "49 negative " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documentsentence_embedding_small_bert_L2_128sentimentsentiment_confidencetexty
0new post added mumbai press official site prod...[-0.2672112286090851, 0.22553667426109314, -0....positive8.0new post added mumbai press official site prod...positive
1not wrong the actual temperature might but and...[-1.243513822555542, 0.24190887808799744, -0.4...positive8.0not wrong the actual temperature might but and...positive
2why pakistan crying name modi every day how na...[-0.7444853782653809, -0.0514342300593853, -0....positive2.0why pakistan crying name modi every day how na...negative
3congress years wasnt able complete one rafale ...[-0.34242647886276245, 0.46881920099258423, -0...positive2.0congress years wasnt able complete one rafale ...positive
4public toilet near kanagadurga temple nizampet...[-1.1381851434707642, 0.512217104434967, -0.74...positive1.0public toilet near kanagadurga temple nizampet...positive
5the foundation for new india 2022 has already ...[-0.4057338237762451, 1.0029019117355347, -0.9...positive7.0\\nthe foundation for new india 2022 has alread...positive
6dear governorani can you let the people indian...[-1.4633989334106445, 0.0006002967129461467, -...positive3.0dear governorani can you let the people indian...negative
7this daft donkey’ dick aap was born the iac mo...[0.04606145992875099, 0.3098487854003906, 0.02...positive6.0this daft donkey’ dick aap was born the iac mo...negative
8major reason for social hatred and strife modi...[-0.9470604658126831, 0.27183642983436584, -1....positive5.0major reason for social hatred and strife modi...positive
9demo was black money caught modi did inspite r...[-0.7136786580085754, 0.0788763239979744, -0.5...positive1.0demo was black money caught modi did inspite r...positive
10one the best ministers modis cabinet[-0.3664279282093048, 0.3727397918701172, -0.4...positive1.0one the best ministers modis cabinetpositive
11raghuram rajan sent list high profile bank fra...[-1.0506610870361328, 0.20963071286678314, -0....positive8.0raghuram rajan sent list high profile bank fra...negative
12governor kalyan singh aligarh 23rd march all a...[-0.20803016424179077, 0.07477151602506638, -0...positive1.0governor kalyan singh aligarh 23rd march all a...positive
13this campaign low hanging fruit seems giving f...[-1.2681258916854858, 0.24513696134090424, -0....positive3.0this campaign low hanging fruit seems giving f...positive
14strict policing that too health system sir reg...[-0.9757605195045471, 1.0792640447616577, -0.4...positive1.0strict policing that too health system sir reg...positive
15shatrughan sinha was far bigger public figure ...[-0.5276148319244385, 0.2546652853488922, -0.1...positive6.0shatrughan sinha was far bigger public figure ...positive
16people wish your vision india and least intere...[-1.0698672533035278, 0.5003032088279724, -0.4...positive1.0people wish your vision india and least intere...negative
17chowkidar hee chor hain baap chor beta bada ch...[-0.5458223223686218, -0.23064681887626648, -0...positive7.0chowkidar hee chor hain baap chor beta bada ch...positive
18with modi all his drawbacks atleast know what ...[-0.8066104054450989, -0.014454682357609272, -...positive7.0with modi all his drawbacks atleast know what ...positive
19compare that with modi’ 2014 “vikas purush” el...[-0.5401442646980286, -0.41557249426841736, -0...positive2.0compare that with modi’ 2014 “vikas purush” el...negative
20with welfare delivery gst ibc and feo place mo...[-0.5764278769493103, 1.2036645412445068, -1.0...positive7.0with welfare delivery gst ibc and feo place mo...positive
21young and dynamic chowkidar says you are not w...[-0.08391488343477249, -0.43613195419311523, 0...positive2.0young and dynamic chowkidar says you are not w...positive
22dont forget petrol prices have risen ₹ modi go...[-0.10981855541467667, 0.5448974370956421, -0....positive5.0dont forget petrol prices have risen ₹ modi go...negative
23prime minister narendra modi has urged voters ...[0.27992355823516846, 0.0582934208214283, -0.2...positive1.0prime minister narendra modi has urged voters ...positive
24when someone asks random question economy sche...[-0.5603336691856384, -0.1663953959941864, -0....positive1.0when someone asks random question economy sche...negative
25from whatsapp have two options this elections ...[0.02258816733956337, -0.18138407170772552, -0...positive2.0from whatsapp have two options this elections ...positive
26heres interesting section work which gives com...[-1.3662021160125732, 0.042950600385665894, 0....positive1.0heres interesting section work which gives com...positive
27know why you willpapu his chamchas nothing tar...[-1.0259658098220825, 0.31534343957901, -0.242...positive6.0know why you willpapu his chamchas nothing tar...negative
28only rahul gandhis politics love can defeat th...[-0.411965548992157, -0.5224093198776245, -0.6...positive2.0only rahul gandhis politics love can defeat th...negative
29one see the fake calculation calsi chek kariye...[-1.0891247987747192, -0.5220181345939636, -0....positive2.0one see the fake calculation calsi chek kariye...negative
30being born religion where female deities worsh...[-1.0681140422821045, -0.6835974454879761, -0....positive2.0being born religion where female deities worsh...positive
31narendra modi became and despite biased viciou...[-0.20442193746566772, -0.3683846890926361, -0...positive3.0narendra modi became and despite biased viciou...negative
32think hindus should back off and let them suff...[-0.9112239480018616, 0.17845268547534943, -0....positive6.0think hindus should back off and let them suff...positive
33from the very beginningmodi doing wada faramos...[-1.3499250411987305, 0.23698307573795319, -0....positive3.0from the very beginningmodi doing wada faramos...negative
34women are powerful nature wat hinduism states ...[-0.47838863730430603, 0.19593238830566406, -0...positive5.0women are powerful nature wat hinduism states ...positive
35almost 4000 crore spent but the ganga more pol...[-1.229308009147644, -0.07380063086748123, -0....positive1.0almost 4000 crore spent but the ganga more pol...positive
36seriously must have sick mind call small kid r...[-0.8348453640937805, -0.03178590536117554, -0...positive1.0seriously must have sick mind call small kid r...negative
37agree with you was unrequired was kinda uncomf...[-0.4827183485031128, 0.08657485246658325, -0....positive7.0agree with you was unrequired was kinda uncomf...positive
38rajdeep think 4got that imran not indian never...[-0.6942201852798462, -0.3642423152923584, -0....positive4.0rajdeep think 4got that imran not indian never...negative
39the great modi trap ways congress has walked into[-0.9660191535949707, -0.2219739854335785, -0....positive6.0the great modi trap ways congress has walked i...positive
40has come new meanings nationalism hindu and su...[-0.03318127244710922, 0.04329724237322807, 0....positive2.0has come new meanings nationalism hindu and su...positive
41modi govt hindus are behaving wildly[-0.5563615560531616, 0.4725855588912964, 0.10...positive3.0modi govt hindus are behaving wildlypositive
42sir one request why bjp candidate contesting f...[-0.6955257654190063, 0.37047961354255676, -0....positive1.0sir one request why bjp candidate contesting f...negative
43vapas lana hai desh agey badana hai vote for m...[-0.8704789876937866, -0.22370557487010956, -0...positive1.0vapas lana hai\\ndesh agey badana hai\\nvote for...positive
44even with massive mandate 336 seats the nda go...[-0.20755957067012787, 0.34972018003463745, -0...positive2.0even with massive mandate 336 seats the nda go...positive
45can promise what can delivered epf pension uni...[-1.3415300846099854, 1.6326956748962402, -0.6...positive3.0can promise what can delivered epf pension uni...positive
46and even print this seriously whats this elect...[-1.4377732276916504, -0.11478081345558167, 0....positive2.0and even print this seriously whats this elect...negative
47she has asked three questions from modi and he...[-1.193617343902588, -0.02149897627532482, 0.1...positive1.0she has asked three questions from modi and he...positive
48dear all tsunami favour modi 2019 coming from ...[-0.5854026675224304, -0.21378959715366364, -0...positive2.0dear all tsunami favour modi 2019 coming from ...negative
49rahul gandhis politics love can defeat the mod...[-0.39933672547340393, -0.5969381332397461, -0...positive4.0rahul gandhis politics love can defeat the mod...negative
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lVyOE2wV0fw_" + }, + "source": [ + "# Test the fitted pipe on a new example" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 150 + }, + "id": "qdCUg2MR0PD2", + "outputId": "7372f9d4-4492-4784-a7dd-d1bef74201bb" + }, + "source": [ + "fitted_pipe.predict('the president of india just died')" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence \\\n", + "0 the president of india just died \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.9852966070175171, 0.5659735798835754, -1.0... positive \n", + "\n", + " sentiment_confidence \n", + "0 1.0 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sentencesentence_embedding_small_bert_L2_128sentimentsentiment_confidence
0the president of india just died[-0.9852966070175171, 0.5659735798835754, -1.0...positive1.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xflpwrVjjBVD" + }, + "source": [ + "## Configure pipe training parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UtsAUGTmOTms", + "outputId": "e3503721-ee9a-4c98-b35b-d861cbc98880" + }, + "source": [ + "trainable_pipe.print_info()" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentiment_dl@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. 
| Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2GJdDNV9jEIe" + }, + "source": [ + "## Retrain with new parameters" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "mptfvHx-MMMX", + "outputId": "ffccac34-9684-41f4-d1e7-7f07951fd516" + }, + "source": [ + "# Train longer!\n", + "trainable_pipe = nlp.load('train.sentiment')\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(5)\n", + "fitted_pipe = trainable_pipe.fit(train_df.iloc[:50])\n", + "# Predict with the fitted pipeline on the dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df.iloc[:50],output_level='document')\n", + "\n", + "# The sentence detector that is part of the pipe generates some NaNs. 
lets drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "preds" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/pipeline.py:149: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " dataset.y = dataset.y.apply(str)\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n", + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/utils/data_conversion_utils.py:160: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " data['origin_index'] = data.index\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": 
[ + " precision recall f1-score support\n", + "\n", + " negative 0.00 0.00 0.00 19\n", + " positive 0.62 1.00 0.77 31\n", + "\n", + " accuracy 0.62 50\n", + " macro avg 0.31 0.50 0.38 50\n", + "weighted avg 0.38 0.62 0.47 50\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 new post added mumbai press official site prod... 
\n", + "1 not wrong the actual temperature might but and... \n", + "2 why pakistan crying name modi every day how na... \n", + "3 congress years wasnt able complete one rafale ... \n", + "4 public toilet near kanagadurga temple nizampet... \n", + "5 the foundation for new india 2022 has already ... \n", + "6 dear governorani can you let the people indian... \n", + "7 this daft donkey’ dick aap was born the iac mo... \n", + "8 major reason for social hatred and strife modi... \n", + "9 demo was black money caught modi did inspite r... \n", + "10 one the best ministers modis cabinet \n", + "11 raghuram rajan sent list high profile bank fra... \n", + "12 governor kalyan singh aligarh 23rd march all a... \n", + "13 this campaign low hanging fruit seems giving f... \n", + "14 strict policing that too health system sir reg... \n", + "15 shatrughan sinha was far bigger public figure ... \n", + "16 people wish your vision india and least intere... \n", + "17 chowkidar hee chor hain baap chor beta bada ch... \n", + "18 with modi all his drawbacks atleast know what ... \n", + "19 compare that with modi’ 2014 “vikas purush” el... \n", + "20 with welfare delivery gst ibc and feo place mo... \n", + "21 young and dynamic chowkidar says you are not w... \n", + "22 dont forget petrol prices have risen ₹ modi go... \n", + "23 prime minister narendra modi has urged voters ... \n", + "24 when someone asks random question economy sche... \n", + "25 from whatsapp have two options this elections ... \n", + "26 heres interesting section work which gives com... \n", + "27 know why you willpapu his chamchas nothing tar... \n", + "28 only rahul gandhis politics love can defeat th... \n", + "29 one see the fake calculation calsi chek kariye... \n", + "30 being born religion where female deities worsh... \n", + "31 narendra modi became and despite biased viciou... \n", + "32 think hindus should back off and let them suff... \n", + "33 from the very beginningmodi doing wada faramos... 
\n", + "34 women are powerful nature wat hinduism states ... \n", + "35 almost 4000 crore spent but the ganga more pol... \n", + "36 seriously must have sick mind call small kid r... \n", + "37 agree with you was unrequired was kinda uncomf... \n", + "38 rajdeep think 4got that imran not indian never... \n", + "39 the great modi trap ways congress has walked into \n", + "40 has come new meanings nationalism hindu and su... \n", + "41 modi govt hindus are behaving wildly \n", + "42 sir one request why bjp candidate contesting f... \n", + "43 vapas lana hai desh agey badana hai vote for m... \n", + "44 even with massive mandate 336 seats the nda go... \n", + "45 can promise what can delivered epf pension uni... \n", + "46 and even print this seriously whats this elect... \n", + "47 she has asked three questions from modi and he... \n", + "48 dear all tsunami favour modi 2019 coming from ... \n", + "49 rahul gandhis politics love can defeat the mod... \n", + "\n", + " sentence_embedding_small_bert_L2_128 sentiment \\\n", + "0 [-0.2672112286090851, 0.22553667426109314, -0.... positive \n", + "1 [-1.243513822555542, 0.24190887808799744, -0.4... positive \n", + "2 [-0.7444853782653809, -0.0514342300593853, -0.... positive \n", + "3 [-0.34242647886276245, 0.46881920099258423, -0... positive \n", + "4 [-1.1381851434707642, 0.512217104434967, -0.74... positive \n", + "5 [-0.4057338237762451, 1.0029019117355347, -0.9... positive \n", + "6 [-1.4633989334106445, 0.0006002967129461467, -... positive \n", + "7 [0.04606145992875099, 0.3098487854003906, 0.02... positive \n", + "8 [-0.9470604658126831, 0.27183642983436584, -1.... positive \n", + "9 [-0.7136786580085754, 0.0788763239979744, -0.5... positive \n", + "10 [-0.3664279282093048, 0.3727397918701172, -0.4... positive \n", + "11 [-1.0506610870361328, 0.20963071286678314, -0.... positive \n", + "12 [-0.20803016424179077, 0.07477151602506638, -0... positive \n", + "13 [-1.2681258916854858, 0.24513696134090424, -0.... 
positive \n", + "14 [-0.9757605195045471, 1.0792640447616577, -0.4... positive \n", + "15 [-0.5276148319244385, 0.2546652853488922, -0.1... positive \n", + "16 [-1.0698672533035278, 0.5003032088279724, -0.4... positive \n", + "17 [-0.5458223223686218, -0.23064681887626648, -0... positive \n", + "18 [-0.8066104054450989, -0.014454682357609272, -... positive \n", + "19 [-0.5401442646980286, -0.41557249426841736, -0... positive \n", + "20 [-0.5764278769493103, 1.2036645412445068, -1.0... positive \n", + "21 [-0.08391488343477249, -0.43613195419311523, 0... positive \n", + "22 [-0.10981855541467667, 0.5448974370956421, -0.... positive \n", + "23 [0.27992355823516846, 0.0582934208214283, -0.2... positive \n", + "24 [-0.5603336691856384, -0.1663953959941864, -0.... positive \n", + "25 [0.02258816733956337, -0.18138407170772552, -0... positive \n", + "26 [-1.3662021160125732, 0.042950600385665894, 0.... positive \n", + "27 [-1.0259658098220825, 0.31534343957901, -0.242... positive \n", + "28 [-0.411965548992157, -0.5224093198776245, -0.6... positive \n", + "29 [-1.0891247987747192, -0.5220181345939636, -0.... positive \n", + "30 [-1.0681140422821045, -0.6835974454879761, -0.... positive \n", + "31 [-0.20442193746566772, -0.3683846890926361, -0... positive \n", + "32 [-0.9112239480018616, 0.17845268547534943, -0.... positive \n", + "33 [-1.3499250411987305, 0.23698307573795319, -0.... positive \n", + "34 [-0.47838863730430603, 0.19593238830566406, -0... positive \n", + "35 [-1.229308009147644, -0.07380063086748123, -0.... positive \n", + "36 [-0.8348453640937805, -0.03178590536117554, -0... positive \n", + "37 [-0.4827183485031128, 0.08657485246658325, -0.... positive \n", + "38 [-0.6942201852798462, -0.3642423152923584, -0.... positive \n", + "39 [-0.9660191535949707, -0.2219739854335785, -0.... positive \n", + "40 [-0.03318127244710922, 0.04329724237322807, 0.... positive \n", + "41 [-0.5563615560531616, 0.4725855588912964, 0.10... 
positive \n", + "42 [-0.6955257654190063, 0.37047961354255676, -0.... positive \n", + "43 [-0.8704789876937866, -0.22370557487010956, -0... positive \n", + "44 [-0.20755957067012787, 0.34972018003463745, -0... positive \n", + "45 [-1.3415300846099854, 1.6326956748962402, -0.6... positive \n", + "46 [-1.4377732276916504, -0.11478081345558167, 0.... positive \n", + "47 [-1.193617343902588, -0.02149897627532482, 0.1... positive \n", + "48 [-0.5854026675224304, -0.21378959715366364, -0... positive \n", + "49 [-0.39933672547340393, -0.5969381332397461, -0... positive \n", + "\n", + " sentiment_confidence text \\\n", + "0 5.0 new post added mumbai press official site prod... \n", + "1 1.0 not wrong the actual temperature might but and... \n", + "2 1.0 why pakistan crying name modi every day how na... \n", + "3 8.0 congress years wasnt able complete one rafale ... \n", + "4 2.0 public toilet near kanagadurga temple nizampet... \n", + "5 9.0 \\nthe foundation for new india 2022 has alread... \n", + "6 2.0 dear governorani can you let the people indian... \n", + "7 1.0 this daft donkey’ dick aap was born the iac mo... \n", + "8 1.0 major reason for social hatred and strife modi... \n", + "9 1.0 demo was black money caught modi did inspite r... \n", + "10 5.0 one the best ministers modis cabinet \n", + "11 3.0 raghuram rajan sent list high profile bank fra... \n", + "12 8.0 governor kalyan singh aligarh 23rd march all a... \n", + "13 2.0 this campaign low hanging fruit seems giving f... \n", + "14 2.0 strict policing that too health system sir reg... \n", + "15 8.0 shatrughan sinha was far bigger public figure ... \n", + "16 3.0 people wish your vision india and least intere... \n", + "17 4.0 chowkidar hee chor hain baap chor beta bada ch... \n", + "18 1.0 with modi all his drawbacks atleast know what ... \n", + "19 6.0 compare that with modi’ 2014 “vikas purush” el... \n", + "20 1.0 with welfare delivery gst ibc and feo place mo... 
\n", + "21 5.0 young and dynamic chowkidar says you are not w... \n", + "22 1.0 dont forget petrol prices have risen ₹ modi go... \n", + "23 7.0 prime minister narendra modi has urged voters ... \n", + "24 2.0 when someone asks random question economy sche... \n", + "25 5.0 from whatsapp have two options this elections ... \n", + "26 1.0 heres interesting section work which gives com... \n", + "27 1.0 know why you willpapu his chamchas nothing tar... \n", + "28 6.0 only rahul gandhis politics love can defeat th... \n", + "29 6.0 one see the fake calculation calsi chek kariye... \n", + "30 2.0 being born religion where female deities worsh... \n", + "31 1.0 narendra modi became and despite biased viciou... \n", + "32 2.0 think hindus should back off and let them suff... \n", + "33 2.0 from the very beginningmodi doing wada faramos... \n", + "34 1.0 women are powerful nature wat hinduism states ... \n", + "35 2.0 almost 4000 crore spent but the ganga more pol... \n", + "36 4.0 seriously must have sick mind call small kid r... \n", + "37 2.0 agree with you was unrequired was kinda uncomf... \n", + "38 2.0 rajdeep think 4got that imran not indian never... \n", + "39 9.0 the great modi trap ways congress has walked i... \n", + "40 1.0 has come new meanings nationalism hindu and su... \n", + "41 1.0 modi govt hindus are behaving wildly \n", + "42 1.0 sir one request why bjp candidate contesting f... \n", + "43 1.0 vapas lana hai\\ndesh agey badana hai\\nvote for... \n", + "44 4.0 even with massive mandate 336 seats the nda go... \n", + "45 2.0 can promise what can delivered epf pension uni... \n", + "46 9.0 and even print this seriously whats this elect... \n", + "47 1.0 she has asked three questions from modi and he... \n", + "48 4.0 dear all tsunami favour modi 2019 coming from ... \n", + "49 5.0 rahul gandhis politics love can defeat the mod... 
\n", + "\n", + " y \n", + "0 positive \n", + "1 positive \n", + "2 negative \n", + "3 positive \n", + "4 positive \n", + "5 positive \n", + "6 negative \n", + "7 negative \n", + "8 positive \n", + "9 positive \n", + "10 positive \n", + "11 negative \n", + "12 positive \n", + "13 positive \n", + "14 positive \n", + "15 positive \n", + "16 negative \n", + "17 positive \n", + "18 positive \n", + "19 negative \n", + "20 positive \n", + "21 positive \n", + "22 negative \n", + "23 positive \n", + "24 negative \n", + "25 positive \n", + "26 positive \n", + "27 negative \n", + "28 negative \n", + "29 negative \n", + "30 positive \n", + "31 negative \n", + "32 positive \n", + "33 negative \n", + "34 positive \n", + "35 positive \n", + "36 negative \n", + "37 positive \n", + "38 negative \n", + "39 positive \n", + "40 positive \n", + "41 positive \n", + "42 negative \n", + "43 positive \n", + "44 positive \n", + "45 positive \n", + "46 negative \n", + "47 positive \n", + "48 negative \n", + "49 negative " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFoT-s1MjTSS" + }, + "source": [ + "# Try training with different Embeddings" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxWFzQOhjWC8", + "outputId": "6e781fd6-e1eb-4641-b3e1-6f6f4e50b7a8" + }, + "source": [ + "# We can use nlp.nlu.print_components(action='embed_sentence') to see every possible sentence embedding we could use. Let's use BERT!\n", + "nlp.nlu.print_components(action='embed_sentence')" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('am.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed_sentence') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert.base_uncased_legal') returns Spark NLP model_anno_obj sent_bert_base_uncased_legal\n", + "nlu.load('en.embed_sentence.bert.finetuned') returns Spark NLP model_anno_obj sbert_setfit_finetuned_financial_text_classification\n", + "nlu.load('en.embed_sentence.bert.pubmed') returns Spark NLP model_anno_obj sent_bert_pubmed\n", + "nlu.load('en.embed_sentence.bert.pubmed_squad2') returns Spark NLP model_anno_obj 
sent_bert_pubmed_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books') returns Spark NLP model_anno_obj sent_bert_wiki_books\n", + "nlu.load('en.embed_sentence.bert.wiki_books_mnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_mnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qnli') returns Spark NLP model_anno_obj sent_bert_wiki_books_qnli\n", + "nlu.load('en.embed_sentence.bert.wiki_books_qqp') returns Spark NLP model_anno_obj sent_bert_wiki_books_qqp\n", + "nlu.load('en.embed_sentence.bert.wiki_books_squad2') returns Spark NLP model_anno_obj sent_bert_wiki_books_squad2\n", + "nlu.load('en.embed_sentence.bert.wiki_books_sst2') returns Spark NLP model_anno_obj sent_bert_wiki_books_sst2\n", + "nlu.load('en.embed_sentence.bert_base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('en.embed_sentence.bert_base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "nlu.load('en.embed_sentence.bert_large_cased') returns Spark NLP model_anno_obj sent_bert_large_cased\n", + "nlu.load('en.embed_sentence.bert_large_uncased') returns Spark NLP model_anno_obj sent_bert_large_uncased\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_base\n", + "nlu.load('en.embed_sentence.bert_use_cmlm_en_large') returns Spark NLP model_anno_obj sent_bert_use_cmlm_en_large\n", + "nlu.load('en.embed_sentence.biobert.clinical_base_cased') returns Spark NLP model_anno_obj sent_biobert_clinical_base_cased\n", + "nlu.load('en.embed_sentence.biobert.discharge_base_cased') returns Spark NLP model_anno_obj sent_biobert_discharge_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pmc_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_base_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_large_cased') returns Spark NLP 
model_anno_obj sent_biobert_pubmed_large_cased\n", + "nlu.load('en.embed_sentence.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj sent_biobert_pubmed_pmc_base_cased\n", + "nlu.load('en.embed_sentence.covidbert.large_uncased') returns Spark NLP model_anno_obj sent_covidbert_large_uncased\n", + "nlu.load('en.embed_sentence.distil_roberta.distilled_base') returns Spark NLP model_anno_obj sent_distilroberta_base\n", + "nlu.load('en.embed_sentence.doc2vec') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_300') returns Spark NLP model_anno_obj doc2vec_gigaword_300\n", + "nlu.load('en.embed_sentence.doc2vec.gigaword_wiki_300') returns Spark NLP model_anno_obj doc2vec_gigaword_wiki_300\n", + "nlu.load('en.embed_sentence.electra') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.electra_base_uncased') returns Spark NLP model_anno_obj sent_electra_base_uncased\n", + "nlu.load('en.embed_sentence.electra_large_uncased') returns Spark NLP model_anno_obj sent_electra_large_uncased\n", + "nlu.load('en.embed_sentence.electra_small_uncased') returns Spark NLP model_anno_obj sent_electra_small_uncased\n", + "nlu.load('en.embed_sentence.roberta.base') returns Spark NLP model_anno_obj sent_roberta_base\n", + "nlu.load('en.embed_sentence.roberta.large') returns Spark NLP model_anno_obj sent_roberta_large\n", + "nlu.load('en.embed_sentence.small_bert_L10_128') returns Spark NLP model_anno_obj sent_small_bert_L10_128\n", + "nlu.load('en.embed_sentence.small_bert_L10_256') returns Spark NLP model_anno_obj sent_small_bert_L10_256\n", + "nlu.load('en.embed_sentence.small_bert_L10_512') returns Spark NLP model_anno_obj sent_small_bert_L10_512\n", + "nlu.load('en.embed_sentence.small_bert_L10_768') returns Spark NLP model_anno_obj sent_small_bert_L10_768\n", + "nlu.load('en.embed_sentence.small_bert_L12_128') returns Spark NLP model_anno_obj sent_small_bert_L12_128\n", + 
"nlu.load('en.embed_sentence.small_bert_L12_256') returns Spark NLP model_anno_obj sent_small_bert_L12_256\n", + "nlu.load('en.embed_sentence.small_bert_L12_512') returns Spark NLP model_anno_obj sent_small_bert_L12_512\n", + "nlu.load('en.embed_sentence.small_bert_L12_768') returns Spark NLP model_anno_obj sent_small_bert_L12_768\n", + "nlu.load('en.embed_sentence.small_bert_L2_128') returns Spark NLP model_anno_obj sent_small_bert_L2_128\n", + "nlu.load('en.embed_sentence.small_bert_L2_256') returns Spark NLP model_anno_obj sent_small_bert_L2_256\n", + "nlu.load('en.embed_sentence.small_bert_L2_512') returns Spark NLP model_anno_obj sent_small_bert_L2_512\n", + "nlu.load('en.embed_sentence.small_bert_L2_768') returns Spark NLP model_anno_obj sent_small_bert_L2_768\n", + "nlu.load('en.embed_sentence.small_bert_L4_128') returns Spark NLP model_anno_obj sent_small_bert_L4_128\n", + "nlu.load('en.embed_sentence.small_bert_L4_256') returns Spark NLP model_anno_obj sent_small_bert_L4_256\n", + "nlu.load('en.embed_sentence.small_bert_L4_512') returns Spark NLP model_anno_obj sent_small_bert_L4_512\n", + "nlu.load('en.embed_sentence.small_bert_L4_768') returns Spark NLP model_anno_obj sent_small_bert_L4_768\n", + "nlu.load('en.embed_sentence.small_bert_L6_128') returns Spark NLP model_anno_obj sent_small_bert_L6_128\n", + "nlu.load('en.embed_sentence.small_bert_L6_256') returns Spark NLP model_anno_obj sent_small_bert_L6_256\n", + "nlu.load('en.embed_sentence.small_bert_L6_512') returns Spark NLP model_anno_obj sent_small_bert_L6_512\n", + "nlu.load('en.embed_sentence.small_bert_L6_768') returns Spark NLP model_anno_obj sent_small_bert_L6_768\n", + "nlu.load('en.embed_sentence.small_bert_L8_128') returns Spark NLP model_anno_obj sent_small_bert_L8_128\n", + "nlu.load('en.embed_sentence.small_bert_L8_256') returns Spark NLP model_anno_obj sent_small_bert_L8_256\n", + "nlu.load('en.embed_sentence.small_bert_L8_512') returns Spark NLP model_anno_obj 
sent_small_bert_L8_512\n", + "nlu.load('en.embed_sentence.small_bert_L8_768') returns Spark NLP model_anno_obj sent_small_bert_L8_768\n", + "nlu.load('en.embed_sentence.tfhub_use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.tfhub_use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "nlu.load('en.embed_sentence.use') returns Spark NLP model_anno_obj tfhub_use\n", + "nlu.load('en.embed_sentence.use.lg') returns Spark NLP model_anno_obj tfhub_use_lg\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "nlu.load('es.embed_sentence.bert.base_uncased') returns Spark NLP model_anno_obj sent_bert_base_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed_sentence.bert') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "nlu.load('fi.embed_sentence.bert.cased') returns Spark NLP model_anno_obj bert_base_finnish_cased\n", + "nlu.load('fi.embed_sentence.bert.uncased') returns Spark NLP model_anno_obj bert_base_finnish_uncased\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj 
sent_xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed_sentence.bert.base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_base_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bert.cased_large_legal') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.1\n", + "nlu.load('pt.embed_sentence.bert.large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_gpl_sts\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.10.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.10\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.2.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.2\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.3.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.3\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.4.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.4\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.5.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.5\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.7.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.7\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.8.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.8\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v0.9.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_sts_v0.9\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_sts_v1.0.by_stjiris') returns Spark NLP model_anno_obj 
sbert_bert_large_portuguese_cased_legal_mlm_sts_v1.0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_gpl_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_gpl_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_nli_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_nli_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v0.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v0\n", + "nlu.load('pt.embed_sentence.bert.legal.cased_large_mlm_v0.11_sts_v1.by_stjiris') returns Spark NLP model_anno_obj sbert_bert_large_portuguese_cased_legal_mlm_v0.11_sts_v1\n", + "nlu.load('pt.embed_sentence.bert.v2_base_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma_v2\n", + "nlu.load('pt.embed_sentence.bert.v2_large_legal') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base_ma\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.assin2.base.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_base\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma\n", 
+ "nlu.load('pt.embed_sentence.bertimbau.legal.large_sts_ma_v3.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_ma_v3\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_sts_v4.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_sts_v4\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.large_tsdae_v4_gpl_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_tsdae_v4_gpl_sts\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_sts_v2.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_sts_large_v2\n", + "nlu.load('pt.embed_sentence.bertimbau.legal.v2_large_v2_sts.by_rufimelo') returns Spark NLP model_anno_obj sbert_legal_bertimbau_large_v2_sts\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed_sentence.bert.base_cased') returns Spark NLP model_anno_obj sent_bert_base_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed_sentence') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + "nlu.load('xx.embed_sentence.bert.cased') returns Spark NLP model_anno_obj sent_bert_multi_cased\n", + 
"nlu.load('xx.embed_sentence.bert.muril') returns Spark NLP model_anno_obj sent_bert_muril\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base\n", + "nlu.load('xx.embed_sentence.bert_use_cmlm_multi_base_br') returns Spark NLP model_anno_obj sent_bert_use_cmlm_multi_base_br\n", + "nlu.load('xx.embed_sentence.labse') returns Spark NLP model_anno_obj labse\n", + "nlu.load('xx.embed_sentence.xlm_roberta.base') returns Spark NLP model_anno_obj sent_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed_sentence.xlm_roberta') returns Spark NLP model_anno_obj sent_xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed_sentence.bert') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1\n", + "nlu.load('zh.embed_sentence.bert.distilled') returns Spark NLP model_anno_obj sbert_chinese_qmc_finance_v1_distill\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IKK_Ii_gjJfF", + "outputId": "ae16a73d-eb90-4d1f-f1d0-9769c28ce54f" + }, + "source": [ + "trainable_pipe = nlp.load('en.embed_sentence.small_bert_L12_768 train.sentiment')\n", + "# We need to train longer and user smaller LR for NON-USE based sentence embeddings usually\n", + "# We could tune the hyperparameters further with hyperparameter tuning methods like gridsearch\n", + "# Also longer training gives more accuracy\n", + "trainable_pipe['trainable_sentiment_dl'].setMaxEpochs(100)\n", + "trainable_pipe['trainable_sentiment_dl'].setLr(0.0005)\n", + "fitted_pipe = trainable_pipe.fit(train_df)\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict(train_df,output_level='document')\n", + "\n", + "#sentence detector that is part of the pipe generates sone NaNs. 
Let's drop them first\n", + "preds.dropna(inplace=True)\n", + "print(classification_report(preds['y'], preds['sentiment']))\n", + "\n", + "#preds" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "sent_small_bert_L12_768 download started this may take some time.\n", + "Approximate size to download 392.9 MB\n", + "[OK!]\n", + " precision recall f1-score support\n", + "\n", + " negative 0.78 0.65 0.71 300\n", + " neutral 0.00 0.00 0.00 0\n", + " positive 0.89 0.52 0.65 300\n", + "\n", + " accuracy 0.58 600\n", + " macro avg 0.55 0.39 0.45 600\n", + "weighted avg 0.83 0.58 0.68 600\n", + "\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/nlu/pipe/extractors/extractor_methods/base_extractor_methods.py:356: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " df[cols_to_explode] = df[cols_to_explode].apply(pad_same_level_cols, axis=1)\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. 
Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n", + "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n", + " _warn_prf(average, modifier, msg_start, len(result))\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 5. Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": 12, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 6. Let's load the model from disk.\n", + "This makes offline NLU usage possible! \n", + "Call `nlp.load(path=path_to_the_pipe)` to load a model/pipeline from disk." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SO4uz45MoRgp", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 133 + }, + "outputId": "5634381d-e25b-49dd-d720-860be456d9cd" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('the president of india just died')\n", + "preds" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document \\\n", + "0 the president of india just died \n", + "\n", + " sentence_embedding_from_disk sentiment \\\n", + "0 [0.009459968656301498, -0.07943318039178848, 0... positive \n", + "\n", + " sentiment_confidence \n", + "0 0.0 " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 13 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "e0CVlkk9v6Qi", + "outputId": "54c80119-8661-4b08-f9c3-952193759535" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": 14, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L12_768'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setDimension(768) | Info: Number of embedding dimensions | Currently set to : 768\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. 
| Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n", + ">>> component_list['sentiment_dl@sent_small_bert_L12_768'] has settable params:\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThreshold(0.6) | Info: The minimum threshold for the final result otheriwse it will be neutral | Currently set to : 0.6\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setThresholdLabel('neutral') | Info: In case the score is less than threshold, what should be the label. Default is neutral. | Currently set to : neutral\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setClasses(['positive', 'negative']) | Info: get the tags used to trained this SentimentDLModel | Currently set to : ['positive', 'negative']\n", + "component_list['sentiment_dl@sent_small_bert_L12_768'].setStorageRef('sent_small_bert_L12_768') | Info: unique reference name for identification | Currently set to : sent_small_bert_L12_768\n" + ] + } + ] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/entity_resolution/sentence_entity_resolution_training.ipynb b/examples/colab/Training/entity_resolution/sentence_entity_resolution_training.ipynb index b2e05ebc..9cb8e254 100644 --- a/examples/colab/Training/entity_resolution/sentence_entity_resolution_training.ipynb +++ b/examples/colab/Training/entity_resolution/sentence_entity_resolution_training.ipynb @@ -3,9 +3,7 @@ "nbformat_minor": 0, "metadata": { "colab": { - "name": "sentence_entity_resolution_training.ipynb", - "provenance": [], - "collapsed_sections": [] + "provenance": [] }, "kernelspec": { "display_name": "Python 3", @@ -32,19 +30,19 @@ "\n", "# 
Sentence Entity Resolution training\n", "Named Entities are sub pieces in textual data which are labled with classes. \n", - "These classes and strings are still ambious though and it is not possible to group semantically identically entities withouth any definition of `terminology`. \n", + "These classes and strings are still ambious though and it is not possible to group semantically identically entities withouth any definition of `terminology`.\n", "With the `Sentence Resolver` you can train a state of the art deep learning architecture to map entities to their unique terminological representation.\n", "\n", - "A concrete example would be : \n", + "A concrete example would be :\n", "\n", - "- The `TSLA` stock is good to buy. \n", + "- The `TSLA` stock is good to buy.\n", "- `Tesla, Inc`. is a great company to invest int\n", "- The price of `Teslas` stocks is going up\n", "\n", "`TSLA` , `Tesla`, `Teslas` can be extracted by an NER model an labled as `company` entity class. But we cannot tell programmatically, if all the referring to the same sematic concept, in this case company. \n", "\n", "To solve this abigous problem, we can introduce a Terminlogy, where the Tesla company has the ID 21 and every other company in our portfolio get a unique ID aswell. \n", - "With a defined terminology at hand and a labled dataset, we can train a chunk resolver to map textually different but semantically equivalent `company entities` to `the same id`. \n", + "With a defined terminology at hand and a labled dataset, we can train a chunk resolver to map textually different but semantically equivalent `company entities` to `the same id`.\n", "\n", "\n", "\n", @@ -53,9 +51,7 @@ "\n", "\n", "\n", - "## 1. 
Install NLU, dependecies and Authenticate\n", - "\n", - "See the [install docs](https://nlu.johnsnowlabs.com/docs/en/install#super-quickstart-on-google-colab-or-kaggle) and [authentification docs](https://nlu.johnsnowlabs.com/docs/en/examples_hc#authorize-access-to-licensed-features-and-install-healthcare-dependencies) for more infos \n" + "## 1. Colab Setup\n" ] }, { @@ -64,54 +60,60 @@ "id": "qlZgaz0oXtb6" }, "source": [ - "!wget http://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n", - "import nlu\n", - "import pandas as pd " + "# Install the johnsnowlabs library\n", + "! pip install -q johnsnowlabs" ], "execution_count": null, "outputs": [] }, { - "cell_type": "markdown", + "cell_type": "code", + "source": [ + "from google.colab import files\n", + "print('Please Upload your John Snow Labs License using the button below')\n", + "license_keys = files.upload()" + ], "metadata": { - "id": "hp4j0_IluV3y" + "id": "tiyJEsGmN9Hy" }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", "source": [ - "# Train Sentence Resolver\n", + "from johnsnowlabs import nlp\n", "\n", - "This is a mini example to make you familiar with the dataset structure you must provide for training. \n", - "Train a chunk resolver on a dataset with columns named `y` , `_y` and `text`. 
`y` is a label, `_y` is an extra identifier label, `text` is the raw text\n" - ] + "# After uploading your license run this to install all licensed Python Wheels and pre-download Jars the Spark Session JVM\n", + "nlp.install()" + ], + "metadata": { + "id": "adVFDvmDOBfG" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "code", + "source": [ + "spark=nlp.start()" + ], "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "BEHVidCG3wO2", - "outputId": "87628d73-4cd0-47d5-ad04-71b34fcea8df" + "id": "1AM75UupORAA" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hp4j0_IluV3y" }, "source": [ - "import nlu\n", - "SPARK_NLP_LICENSE =\"????\"\n", - "AWS_ACCESS_KEY_ID = \"????\"\n", - "AWS_SECRET_ACCESS_KEY =\"????\"\n", - "JSL_SECRET =\"????\"\n", - "nlu.auth(SPARK_NLP_LICENSE,AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,JSL_SECRET)" - ], - "execution_count": 1, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": {}, - "execution_count": 1 - } + "# Train Sentence Resolver\n", + "\n", + "This is a mini example to make you familiar with the dataset structure you must provide for training.\n", + "Train a chunk resolver on a dataset with columns named `y` , `_y` and `text`. `y` is a label, `_y` is an extra identifier label, `text` is the raw text\n" ] }, { @@ -119,14 +121,14 @@ "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 277 + "height": 559 }, "id": "cib1vJ_1tJRr", - "outputId": "ef575edd-a7a5-45b8-89b6-7edea30cb2ea" + "outputId": "e56c920a-fea6-46d2-bd91-9b040fe3aac8" }, "source": [ - "import pandas as pd \n", - "import nlu\n", + "import pandas as pd\n", + "\n", "dataset = pd.DataFrame({\n", " 'text': ['The Tesla company is good to invest is', 'TSLA is good to invest','TESLA INC. 
we should buy','PUT ALL MONEY IN TSLA inc!!'],\n", " 'y': ['23','23','23','23'],\n", @@ -134,29 +136,90 @@ "\n", "})\n", "\n", - "trainable_pipe = nlu.load('train.resolve_sentence')\n", + "trainable_pipe = nlp.load('train.resolve_sentence')\n", "fitted_pipe = trainable_pipe.fit(dataset)\n", "fitted_pipe.predict(dataset.text)" ], - "execution_count": 2, + "execution_count": 5, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "tfhub_use download started this may take some time.\n", - "Approximate size to download 923.7 MB\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "setInputCols in SentenceEntityResolverApproach_d4f860f61823 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "sent_small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", "[OK!]\n", "sentence_detector_dl download started this may take some time.\n", "Approximate size to download 354.6 KB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { "output_type": "execute_result", "data": { + "text/plain": [ + " document \\\n", + "0 The Tesla company is good to invest is \n", + "1 TSLA is good to invest \n", + "2 TESLA INC. we should buy \n", + "3 PUT ALL MONEY IN TSLA inc!! 
\n", + "\n", + " resolution_sentence_entity_resolver_code \\\n", + "0 23 \n", + "1 23 \n", + "2 23 \n", + "3 23 \n", + "\n", + " resolution_sentence_entity_resolver_confidence \\\n", + "0 1.0000 \n", + "1 1.0000 \n", + "2 1.0000 \n", + "3 1.0000 \n", + "\n", + " resolution_sentence_entity_resolver_distance \\\n", + "0 0.0000 \n", + "1 0.0000 \n", + "2 0.0000 \n", + "3 0.0000 \n", + "\n", + " resolution_sentence_entity_resolver_origin_sentence \\\n", + "0 0 \n", + "1 0 \n", + "2 0 \n", + "3 0 \n", + "\n", + " resolution_sentence_entity_resolver_resolved_text \\\n", + "0 TESLA \n", + "1 TESLA \n", + "2 TESLA \n", + "3 TESLA \n", + "\n", + " resolution_sentence_entity_resolver_target_text \\\n", + "0 The Tesla company is good to invest is \n", + "1 TSLA is good to invest \n", + "2 TESLA INC. we should buy \n", + "3 PUT ALL MONEY IN TSLA inc!! \n", + "\n", + " resolution_sentence_entity_resolver_token \\\n", + "0 The Tesla company is good to invest is \n", + "1 TSLA is good to invest \n", + "2 TESLA INC. we should buy \n", + "3 PUT ALL MONEY IN TSLA inc!! \n", + "\n", + " sentence_embedding_small_bert_L2_128 \n", + "0 [[0.5044986009597778, 0.7948187589645386, -0.6... \n", + "1 [[-1.1105577945709229, 0.8402332067489624, -1.... \n", + "2 [[-0.6380321979522705, 0.5634128451347351, -0.... \n", + "3 [[-1.7485851049423218, 0.26517942547798157, -0... " + ], "text/html": [ - "
\n", + "\n", + "
\n", + "
\n", "\n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" ] }, "metadata": {}, - "execution_count": 2 + "execution_count": 5 } ] }, @@ -255,39 +532,45 @@ "base_uri": "https://localhost:8080/" }, "id": "LYqhWOcWbqgD", - "outputId": "9bcb5e12-82f9-41ca-932f-2e9e0d0acf6a" + "outputId": "174f56aa-d817-4c36-ac13-b7067646837b" }, "source": [ "# We can configurevarious parameters on the Chunk resolver\n", "trainable_pipe.print_info()\n" ], - "execution_count": 4, + "execution_count": 6, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", - ">>> pipe['sentence_resolver'] has settable params:\n", - "pipe['sentence_resolver'].setNormalizedCol('_y') | Info: Column name for the original, normalized description | Currently set to : _y\n", - "pipe['sentence_resolver'].setDistanceFunction('EUCLIDIAN') | Info: What distance function to use for WMD: 'EUCLIDEAN' or 'COSINE' | Currently set to : EUCLIDIAN\n", - "pipe['sentence_resolver'].setNeighbours(25) | Info: Number of neighbours to consider in the KNN query to calculate WMD | Currently set to : 25\n", - "pipe['sentence_resolver'].setThreshold(1000.0) | Info: Threshold value for the last distance calculated | Currently set to : 1000.0\n", - "pipe['sentence_resolver'].setMissAsEmpty(True) | Info: whether or not to return an empty annotation on unmatched chunks | Currently set to : True\n", - "pipe['sentence_resolver'].setReturnCosineDistances(True) | Info: Extract Cosine Distances. 
TRUE or False | Currently set to : True\n", - "pipe['sentence_resolver'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", - ">>> pipe['use@tfhub_use'] has settable params:\n", - "pipe['use@tfhub_use'].setDimension(512) | Info: Number of embedding dimensions | Currently set to : 512\n", - "pipe['use@tfhub_use'].setLoadSP(False) | Info: Whether to load SentencePiece ops file which is required only by multi-lingual models. This is not changeable after it's set with a pretrained model nor it is compatible with Windows. | Currently set to : False\n", - "pipe['use@tfhub_use'].setStorageRef('tfhub_use') | Info: unique reference name for identification | Currently set to : tfhub_use\n", - ">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n", - "pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n", - "pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n", - "pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@31dd891e) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@31dd891e\n", - "pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n", - "pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n", - ">>> pipe['document_assembler'] has settable params:\n", - "pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n" + ">>> component_list['bert_sentence_embeddings@sent_small_bert_L2_128'] has settable params:\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + 
"component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setIsLong(False) | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_sentence_embeddings@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentence_entity_resolver@sent_small_bert_L2_128'] has settable params:\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setAux_label_col('aux_label') | Info: Auxiliary label which maps resolved entities to additional labels | Currently set to : aux_label\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setEnableInMemoryStorage(False) | Info: whether to load whole indexed storage in memory (in-memory lookup) | Currently set to : False\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setIncludeStorage(True) | 
Info: whether to include indexed storage in trained model | Currently set to : True\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setReturnAllKEmbeddings(False) | Info: Whether to return all embeddings of all K candidates of the resolution. Embeddings will be in the metadata. Increase in RAM usage to be expected | Currently set to : False\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setReturnEuclideanDistances(True) | Info: Whether to Euclidean distances of the k closest candidates for a chunk/token. | Currently set to : True\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setDistanceFunction('EUCLIDIAN') | Info: What distance function to use for Word Mover's Distance (WMD). Either 'EUCLIDEAN' or 'COSINE' | Currently set to : EUCLIDIAN\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setMissAsEmpty(True) | Info: whether or not to return an empty annotation on unmatched chunks | Currently set to : True\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setNeighbours(25) | Info: Number of neighbours to consider in the KNN query to calculate Word Mover's Distance (WMD) | Currently set to : 25\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setReturnCosineDistances(True) | Info: Extract Cosine Distances. TRUE or False | Currently set to : True\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setThreshold(1000.0) | Info: Threshold value for the last distance calculated | Currently set to : 1000.0\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setConfidenceFunction('SOFTMAX') | Info: what function to use to calculate confidence: Either 'INVERSE' or 'SOFTMAX'. 
| Currently set to : SOFTMAX\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setStorageRef('sent_small_bert_L2_128') | Info: unique reference name for identification | Currently set to : sent_small_bert_L2_128\n", + "component_list['sentence_entity_resolver@sent_small_bert_L2_128'].setUseAuxLabel(False) | Info: Use AuxLabel Col or not | Currently set to : False\n" ] } ] @@ -311,7 +594,7 @@ "!wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/data/AskAPatient.fold-0.test.txt\n", "!wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/data/AskAPatient.fold-0.train.txt" ], - "execution_count": 5, + "execution_count": 7, "outputs": [] }, { @@ -319,10 +602,10 @@ "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 414 + "height": 424 }, "id": "Fo30y4S6bRgz", - "outputId": "87fcb8a9-5f3f-4d6c-c448-e0e3de661516" + "outputId": "f335847d-973d-4081-877b-958d5c91b330" }, "source": [ "import pandas as pd\n", @@ -338,8 +621,26 @@ { "output_type": "execute_result", "data": { + "text/plain": [ + " y _y text\n", + "0 108367008 Dislocation of joint Dislocation of joint\n", + "1 3384011000036100 Arthrotec Arthrotec\n", + "2 166717003 Serum creatinine raised Serum creatinine raised\n", + "3 3877011000036101 Lipitor Lipitor\n", + "4 402234004 Foot eczema Foot eczema\n", + ".. ... ... ...\n", + "245 162290004 Dry eyes Dry eyes\n", + "246 419723007 Mentally dull Mentally dull\n", + "247 4216011000036104 Norvasc Norvasc\n", + "248 13791008 Asthenia Asthenia\n", + "249 162059005 Upset stomach Upset stomach\n", + "\n", + "[250 rows x 3 columns]" + ], "text/html": [ - "
\n", + "\n", + "
\n", + "
\n", "\n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" ] }, "metadata": {}, @@ -462,37 +955,241 @@ "id": "5ngBy_2CbwIY", "colab": { "base_uri": "https://localhost:8080/", - "height": 704 + "height": 808 }, - "outputId": "b178136e-cf46-46d9-a63d-dceb54a94d45" + "outputId": "50f25786-0dc4-448e-c5f5-d1cd417acd6b" }, "source": [ "# Healthcare Embeddings\n", - "trainable_pipe = nlu.load('en.embed_sentence.bert.jsl_tiny_umls_uncased train.resolve_sentence')\n", - "trainable_pipe['sentence_resolver'].setNeighbours(4) \n", + "trainable_pipe = nlp.load('en.embed_sentence.bert.jsl_tiny_umls_uncased train.resolve_sentence')\n", + "trainable_pipe['trainable_sentence_entity_resolver'].setNeighbours(4)\n", "fitted_pipe = trainable_pipe.fit(aap_tr)\n", "prediction = fitted_pipe.predict(aap_tr)\n", "prediction" ], - "execution_count": 11, + "execution_count": 12, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "sbert_jsl_tiny_umls_uncased download started this may take some time.\n", "Approximate size to download 15.8 MB\n", "[OK!]\n", + "setInputCols in SentenceEntityResolverApproach_2520cda31704 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", "sentence_detector_dl download started this may take some time.\n", "Approximate size to download 354.6 KB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { "output_type": "execute_result", "data": { + "text/plain": [ + " _y document \\\n", + "0 Dislocation of joint Dislocation of joint \n", + "1 Arthrotec Arthrotec \n", + "2 Serum creatinine raised Serum creatinine raised \n", + "3 Lipitor Lipitor \n", + "4 Foot eczema Foot eczema \n", + ".. ... ... 
\n", + "245 Dry eyes Dry eyes \n", + "246 Mentally dull Mentally dull \n", + "247 Norvasc Norvasc \n", + "248 Asthenia Asthenia \n", + "249 Upset stomach Upset stomach \n", + "\n", + " resolution_sentence_entity_resolver_code \\\n", + "0 108367008 \n", + "1 3384011000036100 \n", + "2 166717003 \n", + "3 3877011000036101 \n", + "4 402234004 \n", + ".. ... \n", + "245 162290004 \n", + "246 419723007 \n", + "247 4216011000036104 \n", + "248 13791008 \n", + "249 162059005 \n", + "\n", + " resolution_sentence_entity_resolver_confidence \\\n", + "0 0.9992 \n", + "1 0.9921 \n", + "2 0.9975 \n", + "3 1.0000 \n", + "4 0.9942 \n", + ".. ... \n", + "245 0.9981 \n", + "246 1.0000 \n", + "247 0.9864 \n", + "248 1.0000 \n", + "249 0.9960 \n", + "\n", + " resolution_sentence_entity_resolver_distance \\\n", + "0 0.0000 \n", + "1 0.0000 \n", + "2 0.0000 \n", + "3 0.0000 \n", + "4 0.0000 \n", + ".. ... \n", + "245 0.0000 \n", + "246 0.0000 \n", + "247 0.0000 \n", + "248 0.0000 \n", + "249 0.0000 \n", + "\n", + " resolution_sentence_entity_resolver_k_codes \\\n", + "0 [[108367008, 21288011000036105, 404640003]] \n", + "1 [[3384011000036100, 57676002]] \n", + "2 [[166717003, 39575007, 13644009, 124055002]] \n", + "3 NaN \n", + "4 [[402234004, 21930011000036101, 41710110000361... \n", + ".. ... \n", + "245 [[162290004, 238810007, 404640003]] \n", + "246 NaN \n", + "247 [[4216011000036104, 2929011000036108, 367391008]] \n", + "248 NaN \n", + "249 [[162059005, 271681002, 25064002]] \n", + "\n", + " resolution_sentence_entity_resolver_k_confidences \\\n", + "0 [[0.9992, 0.0005, 0.0004]] \n", + "1 [[0.9921, 0.0079]] \n", + "2 [[0.9975, 0.0011, 0.0009, 0.0005]] \n", + "3 NaN \n", + "4 [[0.9942, 0.0025, 0.0020, 0.0013]] \n", + ".. ... 
\n", + "245 [[0.9981, 0.0013, 0.0006]] \n", + "246 NaN \n", + "247 [[0.9864, 0.0080, 0.0056]] \n", + "248 NaN \n", + "249 [[0.9960, 0.0030, 0.0010]] \n", + "\n", + " resolution_sentence_entity_resolver_k_cos_distances \\\n", + "0 [[0.0000, 0.2300, 0.2344]] \n", + "1 [[0.0000, 0.0922]] \n", + "2 [[0.0000, 0.1798, 0.2049, 0.2325]] \n", + "3 NaN \n", + "4 [[0.0000, 0.1463, 0.1500, 0.1788]] \n", + ".. ... \n", + "245 [[0.0000, 0.1612, 0.2016]] \n", + "246 NaN \n", + "247 [[0.0000, 0.0863, 0.1000]] \n", + "248 NaN \n", + "249 [[0.0000, 0.1238, 0.1752]] \n", + "\n", + " resolution_sentence_entity_resolver_k_distances \\\n", + "0 [[0.0000, 7.7017, 7.9164]] \n", + "1 [[0.0000, 4.8368]] \n", + "2 [[0.0000, 6.7997, 7.0506, 7.5232]] \n", + "3 NaN \n", + "4 [[0.0000, 6.0054, 6.1894, 6.6619]] \n", + ".. ... \n", + "245 [[0.0000, 6.6192, 7.4328]] \n", + "246 NaN \n", + "247 [[0.0000, 4.8183, 5.1746]] \n", + "248 NaN \n", + "249 [[0.0000, 5.8050, 6.9000]] \n", + "\n", + " resolution_sentence_entity_resolver_k_resolution \\\n", + "0 [[Dislocation of joint, diclofenac, Dizziness]] \n", + "1 [[Arthrotec, Arthralgia]] \n", + "2 [[Serum creatinine raised, Urine looks dark, H... \n", + "3 NaN \n", + "4 [[Foot eczema, ezetimibe, Celebrex, Arthralgia]] \n", + ".. ... \n", + "245 [[Dry eyes, Flushing, Dizziness]] \n", + "246 NaN \n", + "247 [[Norvasc, Nexium, Malaise]] \n", + "248 NaN \n", + "249 [[Upset stomach, Stomach ache, Headache]] \n", + "\n", + " resolution_sentence_entity_resolver_origin_sentence \\\n", + "0 0 \n", + "1 0 \n", + "2 0 \n", + "3 0 \n", + "4 0 \n", + ".. ... \n", + "245 0 \n", + "246 0 \n", + "247 0 \n", + "248 0 \n", + "249 0 \n", + "\n", + " resolution_sentence_entity_resolver_resolved_text \\\n", + "0 Dislocation of joint \n", + "1 Arthrotec \n", + "2 Serum creatinine raised \n", + "3 Lipitor \n", + "4 Foot eczema \n", + ".. ... 
\n", + "245 Dry eyes \n", + "246 Mentally dull \n", + "247 Norvasc \n", + "248 Asthenia \n", + "249 Upset stomach \n", + "\n", + " resolution_sentence_entity_resolver_target_text \\\n", + "0 Dislocation of joint \n", + "1 Arthrotec \n", + "2 Serum creatinine raised \n", + "3 Lipitor \n", + "4 Foot eczema \n", + ".. ... \n", + "245 Dry eyes \n", + "246 Mentally dull \n", + "247 Norvasc \n", + "248 Asthenia \n", + "249 Upset stomach \n", + "\n", + " resolution_sentence_entity_resolver_token \\\n", + "0 Dislocation of joint \n", + "1 Arthrotec \n", + "2 Serum creatinine raised \n", + "3 Lipitor \n", + "4 Foot eczema \n", + ".. ... \n", + "245 Dry eyes \n", + "246 Mentally dull \n", + "247 Norvasc \n", + "248 Asthenia \n", + "249 Upset stomach \n", + "\n", + " sentence_embedding_bert \\\n", + "0 [[-0.9687817692756653, -0.31864216923713684, -... \n", + "1 [[-0.7108752131462097, -0.5266207456588745, -0... \n", + "2 [[-0.5410001277923584, -2.0953280925750732, 0.... \n", + "3 [[-0.45240962505340576, -1.394622564315796, -0... \n", + "4 [[-0.763110876083374, -0.40250054001808167, -0... \n", + ".. ... \n", + "245 [[-0.03702589124441147, -1.3459508419036865, -... \n", + "246 [[-0.9327226281166077, -1.3695887327194214, -0... \n", + "247 [[-0.4530910551548004, -1.576862096786499, -0.... \n", + "248 [[-0.5592130422592163, -1.6610543727874756, -0... \n", + "249 [[-2.242663860321045, -0.9422457814216614, 0.0... \n", + "\n", + " text y \n", + "0 Dislocation of joint 108367008 \n", + "1 Arthrotec 3384011000036100 \n", + "2 Serum creatinine raised 166717003 \n", + "3 Lipitor 3877011000036101 \n", + "4 Foot eczema 402234004 \n", + ".. ... ... \n", + "245 Dry eyes 162290004 \n", + "246 Mentally dull 419723007 \n", + "247 Norvasc 4216011000036104 \n", + "248 Asthenia 13791008 \n", + "249 Upset stomach 162059005 \n", + "\n", + "[250 rows x 17 columns]" + ], "text/html": [ - "
\n", + "\n", + "
\n", + "
\n", "\n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" ] }, "metadata": {}, - "execution_count": 11 + "execution_count": 12 } ] }, { "cell_type": "code", + "source": [], "metadata": { - "id": "yZT2id3Mxu_Q" + "id": "Yg52noo6PIYV" }, - "source": [ - "" - ], "execution_count": null, "outputs": [] } diff --git a/examples/colab/Training/named_entity_recognition/NLU_training_NER_demo.ipynb b/examples/colab/Training/named_entity_recognition/NLU_training_NER_demo.ipynb index 568a5a74..654405df 100644 --- a/examples/colab/Training/named_entity_recognition/NLU_training_NER_demo.ipynb +++ b/examples/colab/Training/named_entity_recognition/NLU_training_NER_demo.ipynb @@ -1 +1,2348 @@ -{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_NER_demo.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/named_entity_recognition/NLU_training_NER_demo.ipynb)\n","\n","\n","\n","# Training a Named Entity Recognition (NER) model with NLU \n","With the [NER_DL model](https://nlp.johnsnowlabs.com/docs/en/annotators#ner-dl-named-entity-recognition-deep-learning-annotator) from Spark NLP you can achieve State Of the Art results on any NER problem \n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. 
Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620191530267,"user_tz":-300,"elapsed":115129,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"23d94588-aeb0-4b83-fc5c-9345a0274e99"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n"," \n","\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:10:15-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","\r- 0%[ ] 0 --.-KB/s Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","\r- 100%[===================>] 1.63K --.-KB/s in 0.001s \n","\n","2021-05-05 05:10:16 (1.54 MB/s) - written to stdout [1671/1671]\n","\n","\u001b[K |████████████████████████████████| 204.8MB 72kB/s \n","\u001b[K |████████████████████████████████| 153kB 52.9MB/s \n","\u001b[K |████████████████████████████████| 204kB 22.3MB/s \n","\u001b[K |████████████████████████████████| 204kB 46.2MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. 
Download conll2003 dataset"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1620191531629,"user_tz":-300,"elapsed":116460,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"ba3d6f05-3c79-4939-f31e-437137762cd3"},"source":["! wget https://github.com/patverga/torch-ner-nlp-from-scratch/raw/master/data/conll2003/eng.train"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-05 05:12:10-- https://github.com/patverga/torch-ner-nlp-from-scratch/raw/master/data/conll2003/eng.train\n","Resolving github.com (github.com)... 140.82.114.3\n","Connecting to github.com (github.com)|140.82.114.3|:443... connected.\n","HTTP request sent, awaiting response... 302 Found\n","Location: https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.train [following]\n","--2021-05-05 05:12:10-- https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.train\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 3283420 (3.1M) [text/plain]\n","Saving to: ‘eng.train’\n","\n","eng.train 100%[===================>] 3.13M --.-KB/s in 0.09s \n","\n","2021-05-05 05:12:11 (36.4 MB/s) - ‘eng.train’ saved [3283420/3283420]\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. 
Train Deep Learning Classifier using nlu.load('train.ner')\n","\n","Your dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":199},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1620192283869,"user_tz":-300,"elapsed":868676,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"a62cbe12-b235-482a-87e9-98d5f5de5444"},"source":["import nlu\n","# load a trainable pipeline by specifying the train. prefix and fit it on the training dataset\n","# Here the training data is a CoNLL file, so its path is passed to fit() via dataset_path\n","train_path = '/content/eng.train'\n","trainable_pipe = nlu.load('train.ner')\n","fitted_pipe = trainable_pipe.fit(dataset_path=train_path)\n","\n","# predict with the fitted pipeline on a sample sentence and get predictions\n","preds = fitted_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions')\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n","glove_100d download started this may take some time.\n","Approximate size to download 145.3 MB\n","[OK!]\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentenceword_embedding_glovedocumententities_classentitiestokenorigin_index
0[Donald Trump and Angela Merkel dont share man...[[-0.5496799945831299, -0.488319993019104, 0.5...Donald Trump and Angela Merkel dont share many...[PER, PER][Donald Trump, Angela Merkel][Donald, Trump, and, Angela, Merkel, dont, sha...0
\n","
"],"text/plain":[" sentence ... origin_index\n","0 [Donald Trump and Angela Merkel dont share man... ... 0\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"owFhjKqzQiv5","executionInfo":{"status":"ok","timestamp":1620192283871,"user_tz":-300,"elapsed":868665,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"327dc82a-49c2-4362-c982-31e8208ca9f4"},"source":["# Check out the Parameters of the NER model we can configure\n","trainable_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['named_entity_recognizer_dl'] has settable params:\n","pipe['named_entity_recognizer_dl'].setMinEpochs(0) | Info: Minimum number of epochs to train | Currently set to : 0\n","pipe['named_entity_recognizer_dl'].setMaxEpochs(2) | Info: Maximum number of epochs to train | Currently set to : 2\n","pipe['named_entity_recognizer_dl'].setLr(0.001) | Info: Learning Rate | Currently set to : 0.001\n","pipe['named_entity_recognizer_dl'].setPo(0.005) | Info: Learning rate decay coefficient. Real Learning Rage = lr / (1 + po * epoch) | Currently set to : 0.005\n","pipe['named_entity_recognizer_dl'].setBatchSize(8) | Info: Batch size | Currently set to : 8\n","pipe['named_entity_recognizer_dl'].setDropout(0.5) | Info: Dropout coefficient | Currently set to : 0.5\n","pipe['named_entity_recognizer_dl'].setVerbose(0) | Info: Level of verbosity during training | Currently set to : 0\n","pipe['named_entity_recognizer_dl'].setUseContrib(True) | Info: whether to use contrib LSTM Cells. Not compatible with Windows. Might slightly improve accuracy. 
| Currently set to : True\n","pipe['named_entity_recognizer_dl'].setValidationSplit(0.0) | Info: Choose the proportion of training dataset to be validated against the model on each Epoch. The value should be between 0.0 and 1.0 and by default it is 0.0 and off. | Currently set to : 0.0\n","pipe['named_entity_recognizer_dl'].setEvaluationLogExtended(False) | Info: Choose the proportion of training dataset to be validated against the model on each Epoch. The value should be between 0.0 and 1.0 and by default it is 0.0 and off. | Currently set to : False\n","pipe['named_entity_recognizer_dl'].setIncludeConfidence(True) | Info: whether to include confidence scores in annotation metadata | Currently set to : True\n","pipe['named_entity_recognizer_dl'].setEnableOutputLogs(False) | Info: Whether to use stdout in addition to Spark logs. | Currently set to : False\n","pipe['named_entity_recognizer_dl'].setEnableMemoryOptimizer(False) | Info: Whether to optimize for large datasets or not. Enabling this option can slow down training. | Currently set to : False\n",">>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@4b329158) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@4b329158\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['glove@glove_100d'] has settable params:\n","pipe['glove@glove_100d'].setIncludeStorage(True) | Info: whether to include indexed storage in trained model | Currently set to : True\n","pipe['glove@glove_100d'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['glove@glove_100d'].setDimension(100) | Info: Number of embedding dimensions | Currently set to : 100\n","pipe['glove@glove_100d'].setStorageRef('glove_100d') | Info: 
unique reference name for identification | Currently set to : glove_100d\n",">>> pipe['default_tokenizer'] has settable params:\n","pipe['default_tokenizer'].setTargetPattern('\\S+') | Info: pattern to grab from text as token candidates. Defaults \\S+ | Currently set to : \\S+\n","pipe['default_tokenizer'].setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '\"', \"'\"]) | Info: character list used to separate from token boundaries | Currently set to : ['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '\"', \"'\"]\n","pipe['default_tokenizer'].setCaseSensitiveExceptions(True) | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True\n","pipe['default_tokenizer'].setMinLength(0) | Info: Set the minimum allowed legth for each token | Currently set to : 0\n","pipe['default_tokenizer'].setMaxLength(99999) | Info: Set the maximum allowed legth for each token | Currently set to : 99999\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['chunk_converter@entities'] has settable params:\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"25RTuUXMFyEA"},"source":["# 4. Let's use BERT embeddings instead of the default glove_100d ones!"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"QMxPpeiDGNVi","executionInfo":{"status":"ok","timestamp":1620192283872,"user_tz":-300,"elapsed":868623,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"3e37419d-f473-4282-9a72-5fd03a5153c2"},"source":["# We can use nlu.print_components(action='embed') to see every possible embedding we could use. 
Lets use bert!\n","nlu.print_components(action='embed')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["For language NLU provides the following Models : \n","nlu.load('en.embed') returns Spark NLP model glove_100d\n","nlu.load('en.embed.glove') returns Spark NLP model glove_100d\n","nlu.load('en.embed.glove.100d') returns Spark NLP model glove_100d\n","nlu.load('en.embed.bert') returns Spark NLP model bert_base_uncased\n","nlu.load('en.embed.bert.base_uncased') returns Spark NLP model bert_base_uncased\n","nlu.load('en.embed.bert.base_cased') returns Spark NLP model bert_base_cased\n","nlu.load('en.embed.bert.large_uncased') returns Spark NLP model bert_large_uncased\n","nlu.load('en.embed.bert.large_cased') returns Spark NLP model bert_large_cased\n","nlu.load('en.embed.biobert') returns Spark NLP model biobert_pubmed_base_cased\n","nlu.load('en.embed.biobert.pubmed_base_cased') returns Spark NLP model biobert_pubmed_base_cased\n","nlu.load('en.embed.biobert.pubmed_large_cased') returns Spark NLP model biobert_pubmed_large_cased\n","nlu.load('en.embed.biobert.pmc_base_cased') returns Spark NLP model biobert_pmc_base_cased\n","nlu.load('en.embed.biobert.pubmed_pmc_base_cased') returns Spark NLP model biobert_pubmed_pmc_base_cased\n","nlu.load('en.embed.biobert.clinical_base_cased') returns Spark NLP model biobert_clinical_base_cased\n","nlu.load('en.embed.biobert.discharge_base_cased') returns Spark NLP model biobert_discharge_base_cased\n","nlu.load('en.embed.elmo') returns Spark NLP model elmo\n","nlu.load('en.embed.use') returns Spark NLP model tfhub_use\n","nlu.load('en.embed.albert') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed.albert.base_uncased') returns Spark NLP model albert_base_uncased\n","nlu.load('en.embed.albert.large_uncased') returns Spark NLP model albert_large_uncased\n","nlu.load('en.embed.albert.xlarge_uncased') returns Spark NLP model 
albert_xlarge_uncased\n","nlu.load('en.embed.albert.xxlarge_uncased') returns Spark NLP model albert_xxlarge_uncased\n","nlu.load('en.embed.xlnet') returns Spark NLP model xlnet_base_cased\n","nlu.load('en.embed.xlnet_base_cased') returns Spark NLP model xlnet_base_cased\n","nlu.load('en.embed.xlnet_large_cased') returns Spark NLP model xlnet_large_cased\n","nlu.load('en.embed.electra') returns Spark NLP model electra_small_uncased\n","nlu.load('en.embed.electra.small_uncased') returns Spark NLP model electra_small_uncased\n","nlu.load('en.embed.electra.base_uncased') returns Spark NLP model electra_base_uncased\n","nlu.load('en.embed.electra.large_uncased') returns Spark NLP model electra_large_uncased\n","nlu.load('en.embed.covidbert') returns Spark NLP model covidbert_large_uncased\n","nlu.load('en.embed.covidbert.large_uncased') returns Spark NLP model covidbert_large_uncased\n","nlu.load('en.embed.bert.small_L2_128') returns Spark NLP model small_bert_L2_128\n","nlu.load('en.embed.bert.small_L4_128') returns Spark NLP model small_bert_L4_128\n","nlu.load('en.embed.bert.small_L6_128') returns Spark NLP model small_bert_L6_128\n","nlu.load('en.embed.bert.small_L8_128') returns Spark NLP model small_bert_L8_128\n","nlu.load('en.embed.bert.small_L10_128') returns Spark NLP model small_bert_L10_128\n","nlu.load('en.embed.bert.small_L12_128') returns Spark NLP model small_bert_L12_128\n","nlu.load('en.embed.bert.small_L2_256') returns Spark NLP model small_bert_L2_256\n","nlu.load('en.embed.bert.small_L4_256') returns Spark NLP model small_bert_L4_256\n","nlu.load('en.embed.bert.small_L6_256') returns Spark NLP model small_bert_L6_256\n","nlu.load('en.embed.bert.small_L8_256') returns Spark NLP model small_bert_L8_256\n","nlu.load('en.embed.bert.small_L10_256') returns Spark NLP model small_bert_L10_256\n","nlu.load('en.embed.bert.small_L12_256') returns Spark NLP model small_bert_L12_256\n","nlu.load('en.embed.bert.small_L2_512') returns Spark NLP model 
small_bert_L2_512\n","nlu.load('en.embed.bert.small_L4_512') returns Spark NLP model small_bert_L4_512\n","nlu.load('en.embed.bert.small_L6_512') returns Spark NLP model small_bert_L6_512\n","nlu.load('en.embed.bert.small_L8_512') returns Spark NLP model small_bert_L8_512\n","nlu.load('en.embed.bert.small_L10_512') returns Spark NLP model small_bert_L10_512\n","nlu.load('en.embed.bert.small_L12_512') returns Spark NLP model small_bert_L12_512\n","nlu.load('en.embed.bert.small_L2_768') returns Spark NLP model small_bert_L2_768\n","nlu.load('en.embed.bert.small_L4_768') returns Spark NLP model small_bert_L4_768\n","nlu.load('en.embed.bert.small_L6_768') returns Spark NLP model small_bert_L6_768\n","nlu.load('en.embed.bert.small_L8_768') returns Spark NLP model small_bert_L8_768\n","nlu.load('en.embed.bert.small_L10_768') returns Spark NLP model small_bert_L10_768\n","nlu.load('en.embed.bert.small_L12_768') returns Spark NLP model small_bert_L12_768\n","For language NLU provides the following Models : \n","nlu.load('ar.embed') returns Spark NLP model arabic_w2v_cc_300d\n","nlu.load('ar.embed.cbow') returns Spark NLP model arabic_w2v_cc_300d\n","nlu.load('ar.embed.cbow.300d') returns Spark NLP model arabic_w2v_cc_300d\n","nlu.load('ar.embed.aner') returns Spark NLP model arabic_w2v_cc_300d\n","nlu.load('ar.embed.aner.300d') returns Spark NLP model arabic_w2v_cc_300d\n","nlu.load('ar.embed.glove') returns Spark NLP model arabic_w2v_cc_300d\n","For language NLU provides the following Models : \n","nlu.load('bn.embed.glove') returns Spark NLP model bengaliner_cc_300d\n","nlu.load('bn.embed') returns Spark NLP model bengaliner_cc_300d\n","For language NLU provides the following Models : \n","nlu.load('fi.embed.bert.') returns Spark NLP model bert_finnish_cased\n","nlu.load('fi.embed.bert.cased.') returns Spark NLP model bert_finnish_cased\n","nlu.load('fi.embed.bert.uncased.') returns Spark NLP model bert_finnish_uncased\n","For language NLU provides the following Models : 
\n","nlu.load('he.embed') returns Spark NLP model hebrew_cc_300d\n","nlu.load('he.embed.glove') returns Spark NLP model hebrew_cc_300d\n","nlu.load('he.embed.cbow_300d') returns Spark NLP model hebrew_cc_300d\n","For language NLU provides the following Models : \n","nlu.load('hi.embed') returns Spark NLP model hindi_cc_300d\n","For language NLU provides the following Models : \n","nlu.load('fa.embed') returns Spark NLP model persian_w2v_cc_300d\n","nlu.load('fa.embed.word2vec') returns Spark NLP model persian_w2v_cc_300d\n","nlu.load('fa.embed.word2vec.300d') returns Spark NLP model persian_w2v_cc_300d\n","For language NLU provides the following Models : \n","nlu.load('zh.embed') returns Spark NLP model bert_base_chinese\n","nlu.load('zh.embed.bert') returns Spark NLP model bert_base_chinese\n","For language NLU provides the following Models : \n","nlu.load('ur.embed') returns Spark NLP model urduvec_140M_300d\n","nlu.load('ur.embed.glove.300d') returns Spark NLP model urduvec_140M_300d\n","nlu.load('ur.embed.urdu_vec_140M_300d') returns Spark NLP model urduvec_140M_300d\n","For language NLU provides the following Models : \n","nlu.load('xx.embed') returns Spark NLP model glove_840B_300\n","nlu.load('xx.embed.glove.840B_300') returns Spark NLP model glove_840B_300\n","nlu.load('xx.embed.glove.6B_300') returns Spark NLP model glove_6B_300\n","nlu.load('xx.embed.bert_multi_cased') returns Spark NLP model bert_multi_cased\n","nlu.load('xx.embed.bert') returns Spark NLP model bert_multi_cased\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":199},"id":"Xz7xnvbCFxE3","executionInfo":{"status":"ok","timestamp":1620193033467,"user_tz":-300,"elapsed":1617963,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"9b215cae-fab0-4333-dc4b-224fb80f29e4"},"source":["# Add bert word embeddings to pipe \n","fitted_pipe = nlu.load('bert 
train.ner').fit(dataset_path=train_path)\n","\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions')\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["small_bert_L2_128 download started this may take some time.\n","Approximate size to download 16.1 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
sentencedocumentword_embedding_bertentities_classentitiestokenorigin_index
0[Donald Trump and Angela Merkel dont share man...Donald Trump and Angela Merkel dont share many...[[-0.447601318359375, 1.0348621606826782, 0.51...[PER, PER][Donald Trump, Angela Merkel dont][Donald, Trump, and, Angela, Merkel, dont, sha...0
\n","
"],"text/plain":[" sentence ... origin_index\n","0 [Donald Trump and Angela Merkel dont share man... ... 0\n","\n","[1 rows x 7 columns]"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 5. Lets save the model"]},{"cell_type":"code","metadata":{"id":"eLex095goHwm","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620193052249,"user_tz":-300,"elapsed":1636720,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"b30131b2-f8ff-443f-d8ff-98b5ff26d4a2"},"source":["stored_model_path = './models/classifier_dl_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/classifier_dl_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 6. Lets load the model from HDD.\n","This makes Offlien NLU usage possible! \n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"id":"SO4uz45MoRgp","colab":{"base_uri":"https://localhost:8080/","height":97},"executionInfo":{"status":"ok","timestamp":1620193057841,"user_tz":-300,"elapsed":1642287,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"2e60866d-9d3b-4785-ce39-cefea4b2ea17"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions on laws about cheeseburgers')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
textsentenceorigin_indexdocumententities_classentitiestokenword_embedding_from_disk
0Donald Trump and Angela Merkel dont share many...[Donald Trump and Angela Merkel dont share man...8589934592Donald Trump and Angela Merkel dont share many...[PER, PER][Donald Trump, Angela Merkel dont][Donald, Trump, and, Angela, Merkel, dont, sha...[[-0.6870571374893188, 1.1118954420089722, 0.5...
\n","
"],"text/plain":[" text ... word_embedding_from_disk\n","0 Donald Trump and Angela Merkel dont share many... ... [[-0.6870571374893188, 1.1118954420089722, 0.5...\n","\n","[1 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":8}]},{"cell_type":"code","metadata":{"id":"e0CVlkk9v6Qi","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1620193057843,"user_tz":-300,"elapsed":1642279,"user":{"displayName":"ahmed lone","photoUrl":"","userId":"02458088882398909889"}},"outputId":"95443872-f5fc-4431-b803-20e2020949c1"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. 
| Currently set to : False\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@75127717) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@75127717\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n","pipe['sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n",">>> pipe['default_tokenizer'] has settable params:\n","pipe['default_tokenizer'].setCaseSensitiveExceptions(True) | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True\n","pipe['default_tokenizer'].setTargetPattern('\\S+') | Info: pattern to grab from text as token candidates. 
Defaults \\S+ | Currently set to : \\S+\n","pipe['default_tokenizer'].setMaxLength(99999) | Info: Set the maximum allowed length for each token | Currently set to : 99999\n","pipe['default_tokenizer'].setMinLength(0) | Info: Set the minimum allowed length for each token | Currently set to : 0\n",">>> pipe['bert@small_bert_L2_128'] has settable params:\n","pipe['bert@small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['bert@small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n","pipe['bert@small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n","pipe['bert@small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n","pipe['bert@small_bert_L2_128'].setStorageRef('small_bert_L2_128') | Info: unique reference name for identification | Currently set to : small_bert_L2_128\n",">>> pipe['named_entity_recognizer_dl@small_bert_L2_128'] has settable params:\n","pipe['named_entity_recognizer_dl@small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n","pipe['named_entity_recognizer_dl@small_bert_L2_128'].setIncludeConfidence(True) | Info: whether to include confidence scores in annotation metadata | Currently set to : True\n","pipe['named_entity_recognizer_dl@small_bert_L2_128'].setClasses(['O', 'B-ORG', 'I-ORG', 'I-MISC', 'I-PER', 'B-LOC', 'B-MISC', 'I-LOC']) | Info: get the tags used to trained this NerDLModel | Currently set to : ['O', 'B-ORG', 'I-ORG', 'I-MISC', 'I-PER', 'B-LOC', 'B-MISC', 'I-LOC']\n","pipe['named_entity_recognizer_dl@small_bert_L2_128'].setStorageRef('small_bert_L2_128') | Info: unique reference name for identification | Currently set to : small_bert_L2_128\n",">>> pipe['ner_to_chunk_converter'] has settable params:\n","pipe['ner_to_chunk_converter'].setPreservePosition(True) | Info: Whether to 
preserve the original position of the tokens in the original document or use the modified tokens | Currently set to : True\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"USD6d66Sw6_P"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/named_entity_recognition/NLU_training_NER_demo.ipynb)\n", + "\n", + "\n", + "\n", + "# Training a Named Entity Recognition (NER) model with NLU\n", + "With the [NER_DL model](https://nlp.johnsnowlabs.com/docs/en/annotators#ner-dl-named-entity-recognition-deep-learning-annotator) from Spark NLP you can achieve State Of the Art results on any NER problem\n", + "\n", + "This notebook showcases the following features :\n", + "\n", + "- How to train the deep learning classifier\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (Enables NLU offline mode)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Colab Setup" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "# Install the johnsnowlabs library\n", + "! pip install -q johnsnowlabs\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. 
Download conll2003 dataset" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "OrVb5ZMvvrQD", + "outputId": "3b928110-85ec-43b4-fb0b-59b7f8a1e82b" + }, + "source": [ + "! wget https://github.com/patverga/torch-ner-nlp-from-scratch/raw/master/data/conll2003/eng.train" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2023-10-27 13:23:45-- https://github.com/patverga/torch-ner-nlp-from-scratch/raw/master/data/conll2003/eng.train\n", + "Resolving github.com (github.com)... 140.82.112.4\n", + "Connecting to github.com (github.com)|140.82.112.4|:443... connected.\n", + "HTTP request sent, awaiting response... 302 Found\n", + "Location: https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.train [following]\n", + "--2023-10-27 13:23:45-- https://raw.githubusercontent.com/patverga/torch-ner-nlp-from-scratch/master/data/conll2003/eng.train\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 3283420 (3.1M) [text/plain]\n", + "Saving to: ‘eng.train’\n", + "\n", + "eng.train 100%[===================>] 3.13M --.-KB/s in 0.09s \n", + "\n", + "2023-10-27 13:23:46 (35.7 MB/s) - ‘eng.train’ saved [3283420/3283420]\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. 
Train Deep Learning Classifier using nlu.load('train.ner')\n", + "\n", + "Your dataset's label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + }, + "id": "3ZIPkRkWftBG", + "outputId": "fc7f1e4b-6286-4171-8b65-46d7567e607b" + }, + "source": [ + "from johnsnowlabs import nlp\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# Since there are no\n", + "train_path = '/content/eng.train'\n", + "trainable_pipe = nlp.load('train.ner')\n", + "fitted_pipe = trainable_pipe.fit(dataset_path=train_path)\n", + "\n", + "# predict with the fitted pipeline on a sample sentence and get predictions\n", + "preds = fitted_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions')\n", + "preds" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document entities_ner \\\n", + "0 Donald Trump and Angela Merkel dont share many... Donald Trump \n", + "0 Donald Trump and Angela Merkel dont share many... Angela Merkel dont \n", + "\n", + " entities_ner_class entities_ner_confidence entities_ner_origin_chunk \\\n", + "0 PER 0.9544 0 \n", + "0 PER 0.88476664 1 \n", + "\n", + " entities_ner_origin_sentence \\\n", + "0 0 \n", + "0 0 \n", + "\n", + " word_embedding_bert \n", + "0 [[-0.44760167598724365, 1.0348622798919678, 0.... 
\n", + "0 [[-0.44760167598724365, 1.0348622798919678, 0.... " + ], + "text/html": [ + "\n", + "
<table>\n", + "<thead><tr><th></th><th>document</th><th>entities_ner</th><th>entities_ner_class</th><th>entities_ner_confidence</th><th>entities_ner_origin_chunk</th><th>entities_ner_origin_sentence</th><th>word_embedding_bert</th></tr></thead>\n", + "<tbody>\n", + "<tr><th>0</th><td>Donald Trump and Angela Merkel dont share many...</td><td>Donald Trump</td><td>PER</td><td>0.9544</td><td>0</td><td>0</td><td>[[-0.44760167598724365, 1.0348622798919678, 0....</td></tr>\n", + "<tr><th>0</th><td>Donald Trump and Angela Merkel dont share many...</td><td>Angela Merkel dont</td><td>PER</td><td>0.88476664</td><td>1</td><td>0</td><td>[[-0.44760167598724365, 1.0348622798919678, 0....</td></tr>\n", + "</tbody>\n", + "</table>
\n" + ] + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "owFhjKqzQiv5", + "outputId": "7be23961-d30c-4176-cb25-ad9b6920806c" + }, + "source": [ + "# Check out the Parameters of the NER model we can configure\n", + "trainable_pipe.print_info()" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['bert_embeddings@small_bert_L2_128'] has settable params:\n", + "component_list['bert_embeddings@small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_embeddings@small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_embeddings@small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_embeddings@small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_embeddings@small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_embeddings@small_bert_L2_128'].setStorageRef('small_bert_L2_128') | Info: unique reference name for identification | Currently set to : small_bert_L2_128\n", + ">>> component_list['tokenizer'] has settable params:\n", + "component_list['tokenizer'].setTargetPattern('\\S+') | Info: 
pattern to grab from text as token candidates. Defaults \\S+ | Currently set to : \\S+\n", + "component_list['tokenizer'].setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '\"', \"'\"]) | Info: character list used to separate from token boundaries | Currently set to : ['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '\"', \"'\"]\n", + "component_list['tokenizer'].setCaseSensitiveExceptions(True) | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True\n", + "component_list['tokenizer'].setMinLength(0) | Info: Set the minimum allowed length for each token | Currently set to : 0\n", + "component_list['tokenizer'].setMaxLength(99999) | Info: Set the maximum allowed length for each token | Currently set to : 99999\n", + ">>> component_list['chunk_converter@entities'] has settable params:\n", + ">>> component_list['ner_dl@small_bert_L2_128'] has settable params:\n", + "component_list['ner_dl@small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['ner_dl@small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['ner_dl@small_bert_L2_128'].setIncludeConfidence(True) | Info: whether to include confidence scores in annotation metadata | Currently set to : True\n", + "component_list['ner_dl@small_bert_L2_128'].setIncludeAllConfidenceScores(False) | Info: whether to include all confidence scores in annotation metadata or just the score of the predicted tag | Currently set to : False\n", + "component_list['ner_dl@small_bert_L2_128'].setStorageRef('small_bert_L2_128') | Info: unique reference name for identification | Currently set to : small_bert_L2_128\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "25RTuUXMFyEA" + }, + "source": [ + "# 4. Let's use BERT embeddings instead of the default Glove_100d ones!" 
+ ] + }, + { + "cell_type": "code", + "source": [ + "nlp.nlu.print_components(action='embed')\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ixhWEXz6Yc7z", + "outputId": "b03dbb2d-7e23-43fc-a642-2a1f941f2719" + }, + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "For language NLU provides the following Models : \n", + "nlu.load('af.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('als.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('am.embed.am_roberta') returns Spark NLP model_anno_obj roberta_embeddings_am_roberta\n", + "nlu.load('am.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('am.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_amharic\n", + "For language NLU provides the following Models : \n", + "nlu.load('an.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ar.embed') returns Spark NLP model_anno_obj arabic_w2v_cc_300d\n", + "nlu.load('ar.embed.AraBertMo_base_V1') returns Spark NLP model_anno_obj bert_embeddings_AraBertMo_base_V1\n", + "nlu.load('ar.embed.Ara_DialectBERT') returns Spark NLP model_anno_obj bert_embeddings_Ara_DialectBERT\n", + "nlu.load('ar.embed.DarijaBERT') returns Spark NLP model_anno_obj bert_embeddings_DarijaBERT\n", + "nlu.load('ar.embed.MARBERT') returns Spark NLP model_anno_obj bert_embeddings_MARBERT\n", + "nlu.load('ar.embed.MARBERTv2') returns Spark NLP model_anno_obj bert_embeddings_MARBERTv2\n", + "nlu.load('ar.embed.albert') returns Spark NLP model_anno_obj albert_embeddings_albert_base_arabic\n", + "nlu.load('ar.embed.albert_large_arabic') returns Spark NLP model_anno_obj 
albert_embeddings_albert_large_arabic\n", + "nlu.load('ar.embed.albert_xlarge_arabic') returns Spark NLP model_anno_obj albert_embeddings_albert_xlarge_arabic\n", + "nlu.load('ar.embed.aner') returns Spark NLP model_anno_obj arabic_w2v_cc_300d\n", + "nlu.load('ar.embed.aner.300d') returns Spark NLP model_anno_obj arabic_w2v_cc_300d\n", + "nlu.load('ar.embed.arabert_c19') returns Spark NLP model_anno_obj bert_embeddings_arabert_c19\n", + "nlu.load('ar.embed.arbert') returns Spark NLP model_anno_obj bert_embeddings_ARBERT\n", + "nlu.load('ar.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_arbert\n", + "nlu.load('ar.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_base_arabert\n", + "nlu.load('ar.embed.bert.base.by_asafaya') returns Spark NLP model_anno_obj bert_embeddings_base_arabic\n", + "nlu.load('ar.embed.bert.base.v1.by_aubmindlab') returns Spark NLP model_anno_obj bert_embeddings_base_arabertv01\n", + "nlu.load('ar.embed.bert.base.v2.by_aubmindlab') returns Spark NLP model_anno_obj bert_embeddings_base_arabertv02\n", + "nlu.load('ar.embed.bert.base_mix.by_camel_lab') returns Spark NLP model_anno_obj bert_embeddings_base_arabic_camel_mix\n", + "nlu.load('ar.embed.bert.base_msa.by_camel_lab') returns Spark NLP model_anno_obj bert_embeddings_base_arabic_camel_msa\n", + "nlu.load('ar.embed.bert.base_msa_eighth.by_camel_lab') returns Spark NLP model_anno_obj bert_embeddings_base_arabic_camel_msa_eighth\n", + "nlu.load('ar.embed.bert.base_msa_half.by_camel_lab') returns Spark NLP model_anno_obj bert_embeddings_base_arabic_camel_msa_half\n", + "nlu.load('ar.embed.bert.base_msa_quarter.by_camel_lab') returns Spark NLP model_anno_obj bert_embeddings_base_arabic_camel_msa_quarter\n", + "nlu.load('ar.embed.bert.base_msa_sixteenth.by_camel_lab') returns Spark NLP model_anno_obj bert_embeddings_base_arabic_camel_msa_sixteenth\n", + "nlu.load('ar.embed.bert.by_ubc_nlp') returns Spark NLP model_anno_obj bert_embeddings_marbert\n", + 
"nlu.load('ar.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_ar_cased\n", + "nlu.load('ar.embed.bert.large') returns Spark NLP model_anno_obj bert_embeddings_large_arabertv02\n", + "nlu.load('ar.embed.bert.large.by_asafaya') returns Spark NLP model_anno_obj bert_embeddings_large_arabic\n", + "nlu.load('ar.embed.bert.medium') returns Spark NLP model_anno_obj bert_embeddings_medium_arabic\n", + "nlu.load('ar.embed.bert.mini') returns Spark NLP model_anno_obj bert_embeddings_mini_arabic\n", + "nlu.load('ar.embed.bert.v2') returns Spark NLP model_anno_obj bert_embeddings_marbertv2\n", + "nlu.load('ar.embed.bert.v2_base') returns Spark NLP model_anno_obj bert_embeddings_base_arabertv2\n", + "nlu.load('ar.embed.bert.v2_large') returns Spark NLP model_anno_obj bert_embeddings_large_arabertv2\n", + "nlu.load('ar.embed.bert_base_arabert') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabert\n", + "nlu.load('ar.embed.bert_base_arabertv01') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabertv01\n", + "nlu.load('ar.embed.bert_base_arabertv02') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabertv02\n", + "nlu.load('ar.embed.bert_base_arabertv02_twitter') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabertv02_twitter\n", + "nlu.load('ar.embed.bert_base_arabertv2') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabertv2\n", + "nlu.load('ar.embed.bert_base_arabic') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabic\n", + "nlu.load('ar.embed.bert_base_arabic_camelbert_mix') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabic_camelbert_mix\n", + "nlu.load('ar.embed.bert_base_arabic_camelbert_msa') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabic_camelbert_msa\n", + "nlu.load('ar.embed.bert_base_arabic_camelbert_msa_eighth') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabic_camelbert_msa_eighth\n", + 
"nlu.load('ar.embed.bert_base_arabic_camelbert_msa_half') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabic_camelbert_msa_half\n", + "nlu.load('ar.embed.bert_base_arabic_camelbert_msa_quarter') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabic_camelbert_msa_quarter\n", + "nlu.load('ar.embed.bert_base_arabic_camelbert_msa_sixteenth') returns Spark NLP model_anno_obj bert_embeddings_bert_base_arabic_camelbert_msa_sixteenth\n", + "nlu.load('ar.embed.bert_base_qarib') returns Spark NLP model_anno_obj bert_embeddings_bert_base_qarib\n", + "nlu.load('ar.embed.bert_base_qarib60_1790k') returns Spark NLP model_anno_obj bert_embeddings_bert_base_qarib60_1790k\n", + "nlu.load('ar.embed.bert_base_qarib60_860k') returns Spark NLP model_anno_obj bert_embeddings_bert_base_qarib60_860k\n", + "nlu.load('ar.embed.bert_large_arabertv02') returns Spark NLP model_anno_obj bert_embeddings_bert_large_arabertv02\n", + "nlu.load('ar.embed.bert_large_arabertv02_twitter') returns Spark NLP model_anno_obj bert_embeddings_bert_large_arabertv02_twitter\n", + "nlu.load('ar.embed.bert_large_arabertv2') returns Spark NLP model_anno_obj bert_embeddings_bert_large_arabertv2\n", + "nlu.load('ar.embed.bert_large_arabic') returns Spark NLP model_anno_obj bert_embeddings_bert_large_arabic\n", + "nlu.load('ar.embed.bert_medium_arabic') returns Spark NLP model_anno_obj bert_embeddings_bert_medium_arabic\n", + "nlu.load('ar.embed.bert_mini_arabic') returns Spark NLP model_anno_obj bert_embeddings_bert_mini_arabic\n", + "nlu.load('ar.embed.cbow') returns Spark NLP model_anno_obj arabic_w2v_cc_300d\n", + "nlu.load('ar.embed.cbow.300d') returns Spark NLP model_anno_obj arabic_w2v_cc_300d\n", + "nlu.load('ar.embed.distilbert') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_ar_cased\n", + "nlu.load('ar.embed.dziribert') returns Spark NLP model_anno_obj bert_embeddings_dziribert\n", + "nlu.load('ar.embed.electra.base') returns Spark NLP model_anno_obj 
electra_embeddings_araelectra_base_generator\n", + "nlu.load('ar.embed.glove') returns Spark NLP model_anno_obj arabic_w2v_cc_300d\n", + "nlu.load('ar.embed.mbert_ar_c19') returns Spark NLP model_anno_obj bert_embeddings_mbert_ar_c19\n", + "nlu.load('ar.embed.multi_dialect_bert_base_arabic') returns Spark NLP model_anno_obj bert_embeddings_multi_dialect_bert_base_arabic\n", + "For language NLU provides the following Models : \n", + "nlu.load('arz.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('as.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ast.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('az.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('azb.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ba.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('bar.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('bcl.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('be.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('bg.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_bg_cased\n", + "nlu.load('bg.embed.roberta.base') returns Spark NLP model_anno_obj roberta_embeddings_base_bulgarian\n", + "nlu.load('bg.embed.roberta.small') returns Spark NLP 
model_anno_obj roberta_embeddings_small_bulgarian\n", + "nlu.load('bg.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('bh.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('bn.embed') returns Spark NLP model_anno_obj bengali_cc_300d\n", + "nlu.load('bn.embed.bangala_bert') returns Spark NLP model_anno_obj bert_embeddings_bangla_bert_base\n", + "nlu.load('bn.embed.bangla_bert') returns Spark NLP model_anno_obj bert_embeddings_bangla_bert\n", + "nlu.load('bn.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_indic_transformers\n", + "nlu.load('bn.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_bangla_base\n", + "nlu.load('bn.embed.distil_bert') returns Spark NLP model_anno_obj distilbert_embeddings_indic_transformers\n", + "nlu.load('bn.embed.glove') returns Spark NLP model_anno_obj bengali_cc_300d\n", + "nlu.load('bn.embed.indic_transformers_bn_bert') returns Spark NLP model_anno_obj bert_embeddings_indic_transformers_bn_bert\n", + "nlu.load('bn.embed.indic_transformers_bn_distilbert') returns Spark NLP model_anno_obj distilbert_embeddings_indic_transformers_bn_distilbert\n", + "nlu.load('bn.embed.muril_adapted_local') returns Spark NLP model_anno_obj bert_embeddings_muril_adapted_local\n", + "nlu.load('bn.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_indic_transformers\n", + "nlu.load('bn.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('bn.embed.xlmr_roberta') returns Spark NLP model_anno_obj xlmroberta_embeddings_indic_transformers_bn_xlmroberta\n", + "For language NLU provides the following Models : \n", + "nlu.load('bo.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('bpy.embed.w2v_cc_300d') returns Spark NLP 
model_anno_obj w2v_cc_300d\n", + "For language
NLU provides the following Models : \n", + "nlu.load('br.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('bs.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ca.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ce.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ceb.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('co.embed.roberta.small') returns Spark NLP model_anno_obj roberta_embeddings_codeberta_small_v1\n", + "nlu.load('co.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('cs.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_fernet_c5\n", + "nlu.load('cs.embed.roberta.news.') returns Spark NLP model_anno_obj roberta_embeddings_fernet_news\n", + "nlu.load('cs.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('cv.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('cy.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('da.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_da_cased\n", + "nlu.load('da.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('de.embed.albert_german_ner') returns Spark NLP model_anno_obj albert_embeddings_albert_german_ner\n", + 
"nlu.load('de.embed.bert') returns Spark NLP model_anno_obj bert_base_german_cased\n", + "nlu.load('de.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_g_base\n", + "nlu.load('de.embed.bert.by_smanjil') returns Spark NLP model_anno_obj bert_embeddings_german_medbert\n", + "nlu.load('de.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_de_cased\n", + "nlu.load('de.embed.bert.cased_base.by_dbmdz') returns Spark NLP model_anno_obj bert_embeddings_dbmdz_base_german_cased\n", + "nlu.load('de.embed.bert.cased_base.by_uploaded by huggingface') returns Spark NLP model_anno_obj bert_embeddings_base_german_cased\n", + "nlu.load('de.embed.bert.finance') returns Spark NLP model_anno_obj bert_sentence_embeddings_financial\n", + "nlu.load('de.embed.bert.large') returns Spark NLP model_anno_obj bert_embeddings_g_large\n", + "nlu.load('de.embed.bert.uncased') returns Spark NLP model_anno_obj bert_base_german_uncased\n", + "nlu.load('de.embed.bert.uncased_base') returns Spark NLP model_anno_obj bert_embeddings_base_german_uncased\n", + "nlu.load('de.embed.bert_base_5lang_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_5lang_cased\n", + "nlu.load('de.embed.bert_base_de_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_de_cased\n", + "nlu.load('de.embed.bert_base_german_cased_oldvocab') returns Spark NLP model_anno_obj bert_embeddings_bert_base_german_cased_oldvocab\n", + "nlu.load('de.embed.bert_base_german_dbmdz_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_german_dbmdz_cased\n", + "nlu.load('de.embed.bert_base_german_dbmdz_uncased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_german_dbmdz_uncased\n", + "nlu.load('de.embed.bert_base_german_uncased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_german_uncased\n", + "nlu.load('de.embed.bert_base_historical_german_rw_cased') returns Spark NLP model_anno_obj 
bert_embeddings_bert_base_historical_german_rw_cased\n", + "nlu.load('de.embed.distilbert_base_de_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_de_cased\n", + "nlu.load('de.embed.distilbert_base_german_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_german_cased\n", + "nlu.load('de.embed.electra.base') returns Spark NLP model_anno_obj electra_embeddings_gelectra_base_generator\n", + "nlu.load('de.embed.electra.cased_base_64d') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_0_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_100000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_100000_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_1000000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_1000000_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_200000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_200000_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_300000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_300000_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_400000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_400000_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_500000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_500000_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_600000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_600000_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_700000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_700000_cased_generator\n", + 
"nlu.load('de.embed.electra.cased_base_gc4_64k_800000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_800000_cased_generator\n", + "nlu.load('de.embed.electra.cased_base_gc4_64k_900000.by_stefan_it') returns Spark NLP model_anno_obj electra_embeddings_electra_base_gc4_64k_900000_cased_generator\n", + "nlu.load('de.embed.electra.large') returns Spark NLP model_anno_obj electra_embeddings_gelectra_large_generator\n", + "nlu.load('de.embed.gbert_base') returns Spark NLP model_anno_obj bert_embeddings_gbert_base\n", + "nlu.load('de.embed.gbert_large') returns Spark NLP model_anno_obj bert_embeddings_gbert_large\n", + "nlu.load('de.embed.german_financial_statements_bert') returns Spark NLP model_anno_obj bert_embeddings_german_financial_statements_bert\n", + "nlu.load('de.embed.medbert') returns Spark NLP model_anno_obj bert_embeddings_German_MedBERT\n", + "nlu.load('de.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_hotelbert\n", + "nlu.load('de.embed.roberta.small') returns Spark NLP model_anno_obj roberta_embeddings_hotelbert_small\n", + "nlu.load('de.embed.roberta_base_wechsel_german') returns Spark NLP model_anno_obj roberta_embeddings_roberta_base_wechsel_german\n", + "For language NLU provides the following Models : \n", + "nlu.load('diq.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('dv.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('el.embed.bert.base_uncased') returns Spark NLP model_anno_obj bert_base_uncased\n", + "nlu.load('el.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_el_cased\n", + "nlu.load('el.embed.bert.uncased_base') returns Spark NLP model_anno_obj bert_embeddings_greeksocial_base_greek_uncased_v1\n", + "nlu.load('el.embed.roberta.uncased_base') returns Spark NLP model_anno_obj 
roberta_embeddings_palobert_base_greek_uncased_v1\n", + "For language NLU provides the following Models : \n", + "nlu.load('eml.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('en.embed') returns Spark NLP model_anno_obj glove_100d\n", + "nlu.load('en.embed.Bible_roberta_base') returns Spark NLP model_anno_obj roberta_embeddings_Bible_roberta_base\n", + "nlu.load('en.embed.COVID_SciBERT') returns Spark NLP model_anno_obj bert_embeddings_COVID_SciBERT\n", + "nlu.load('en.embed.DiLBERT') returns Spark NLP model_anno_obj bert_embeddings_DiLBERT\n", + "nlu.load('en.embed.FinancialBERT') returns Spark NLP model_anno_obj bert_embeddings_FinancialBERT\n", + "nlu.load('en.embed.SecBERT') returns Spark NLP model_anno_obj bert_embeddings_SecBERT\n", + "nlu.load('en.embed.SecRoBERTa') returns Spark NLP model_anno_obj roberta_embeddings_SecRoBERTa\n", + "nlu.load('en.embed.agriculture_bert_uncased') returns Spark NLP model_anno_obj bert_embeddings_agriculture_bert_uncased\n", + "nlu.load('en.embed.albert') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed.albert.base_uncased') returns Spark NLP model_anno_obj albert_base_uncased\n", + "nlu.load('en.embed.albert.large_uncased') returns Spark NLP model_anno_obj albert_large_uncased\n", + "nlu.load('en.embed.albert.xlarge_uncased') returns Spark NLP model_anno_obj albert_xlarge_uncased\n", + "nlu.load('en.embed.albert.xxlarge_uncased') returns Spark NLP model_anno_obj albert_xxlarge_uncased\n", + "nlu.load('en.embed.albert_base_v1') returns Spark NLP model_anno_obj albert_embeddings_albert_base_v1\n", + "nlu.load('en.embed.albert_xlarge_v1') returns Spark NLP model_anno_obj albert_embeddings_albert_xlarge_v1\n", + "nlu.load('en.embed.albert_xxlarge_v1') returns Spark NLP model_anno_obj albert_embeddings_albert_xxlarge_v1\n", + "nlu.load('en.embed.bert') returns Spark NLP model_anno_obj bert_base_uncased\n", + 
"nlu.load('en.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_v_2021_base\n", + "nlu.load('en.embed.bert.base_cased') returns Spark NLP model_anno_obj bert_base_cased\n", + "nlu.load('en.embed.bert.base_uncased') returns Spark NLP model_anno_obj bert_base_uncased\n", + "nlu.load('en.embed.bert.base_uncased_legal') returns Spark NLP model_anno_obj bert_base_uncased_legal\n", + "nlu.load('en.embed.bert.by_anferico') returns Spark NLP model_anno_obj bert_embeddings_for_patents\n", + "nlu.load('en.embed.bert.by_beatrice_portelli') returns Spark NLP model_anno_obj bert_embeddings_dilbert\n", + "nlu.load('en.embed.bert.by_law_ai') returns Spark NLP model_anno_obj bert_embeddings_incaselawbert\n", + "nlu.load('en.embed.bert.by_philschmid') returns Spark NLP model_anno_obj bert_embeddings_fin_pretrain_yiyanghkust\n", + "nlu.load('en.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_jobbert_base_cased\n", + "nlu.load('en.embed.bert.cased_base.by_ayansinha') returns Spark NLP model_anno_obj bert_embeddings_lic_class_scancode_base_cased_l32_1\n", + "nlu.load('en.embed.bert.cased_base.by_geotrend') returns Spark NLP model_anno_obj bert_embeddings_base_en_cased\n", + "nlu.load('en.embed.bert.cased_base.by_model_attribution_challenge') returns Spark NLP model_anno_obj bert_embeddings_model_attribution_challenge_base_cased\n", + "nlu.load('en.embed.bert.cased_base.by_uploaded by huggingface') returns Spark NLP model_anno_obj bert_embeddings_base_cased\n", + "nlu.load('en.embed.bert.cased_large') returns Spark NLP model_anno_obj bert_embeddings_large_cased\n", + "nlu.load('en.embed.bert.cased_large_whole_word_masking') returns Spark NLP model_anno_obj bert_embeddings_large_cased_whole_word_masking\n", + "nlu.load('en.embed.bert.contracts.large_small_finetuned_legal') returns Spark NLP model_anno_obj bert_embeddings_bert_small_finetuned_legal_contracts_larger20_5_1\n", + 
"nlu.load('en.embed.bert.contracts.large_small_finetuned_legal.by_muhtasham') returns Spark NLP model_anno_obj bert_embeddings_bert_small_finetuned_legal_contracts_larger4010\n", + "nlu.load('en.embed.bert.contracts.small_finetuned_legal') returns Spark NLP model_anno_obj bert_embeddings_bert_small_finetuned_legal_contracts10train10val\n", + "nlu.load('en.embed.bert.contracts.uncased_base') returns Spark NLP model_anno_obj bert_base_uncased_contracts\n", + "nlu.load('en.embed.bert.covid_bio_clinical.finetuned') returns Spark NLP model_anno_obj bert_embeddings_bioclinicalbert_finetuned_covid_papers\n", + "nlu.load('en.embed.bert.large') returns Spark NLP model_anno_obj bert_embeddings_v_2021_large\n", + "nlu.load('en.embed.bert.large_cased') returns Spark NLP model_anno_obj bert_large_cased\n", + "nlu.load('en.embed.bert.large_legal_7m') returns Spark NLP model_anno_obj bert_embeddings_legalbert_large_1.7m_1\n", + "nlu.load('en.embed.bert.large_legal_7m.by_pile_of_law') returns Spark NLP model_anno_obj bert_embeddings_legalbert_large_1.7m_2\n", + "nlu.load('en.embed.bert.large_uncased') returns Spark NLP model_anno_obj bert_large_uncased\n", + "nlu.load('en.embed.bert.legal') returns Spark NLP model_anno_obj bert_embeddings_inlegalbert\n", + "nlu.load('en.embed.bert.phs') returns Spark NLP model_anno_obj bert_embeddings_phs_bert\n", + "nlu.load('en.embed.bert.pubmed') returns Spark NLP model_anno_obj bert_pubmed\n", + "nlu.load('en.embed.bert.pubmed.uncased') returns Spark NLP model_anno_obj bert_biomed_pubmed_uncased\n", + "nlu.load('en.embed.bert.pubmed_squad2') returns Spark NLP model_anno_obj bert_pubmed_squad2\n", + "nlu.load('en.embed.bert.small_L10_128') returns Spark NLP model_anno_obj small_bert_L10_128\n", + "nlu.load('en.embed.bert.small_L10_256') returns Spark NLP model_anno_obj small_bert_L10_256\n", + "nlu.load('en.embed.bert.small_L10_512') returns Spark NLP model_anno_obj small_bert_L10_512\n", + "nlu.load('en.embed.bert.small_L10_768') returns Spark 
NLP model_anno_obj small_bert_L10_768\n", + "nlu.load('en.embed.bert.small_L12_128') returns Spark NLP model_anno_obj small_bert_L12_128\n", + "nlu.load('en.embed.bert.small_L12_256') returns Spark NLP model_anno_obj small_bert_L12_256\n", + "nlu.load('en.embed.bert.small_L12_512') returns Spark NLP model_anno_obj small_bert_L12_512\n", + "nlu.load('en.embed.bert.small_L12_768') returns Spark NLP model_anno_obj small_bert_L12_768\n", + "nlu.load('en.embed.bert.small_L2_128') returns Spark NLP model_anno_obj small_bert_L2_128\n", + "nlu.load('en.embed.bert.small_L2_256') returns Spark NLP model_anno_obj small_bert_L2_256\n", + "nlu.load('en.embed.bert.small_L2_512') returns Spark NLP model_anno_obj small_bert_L2_512\n", + "nlu.load('en.embed.bert.small_L2_768') returns Spark NLP model_anno_obj small_bert_L2_768\n", + "nlu.load('en.embed.bert.small_L4_128') returns Spark NLP model_anno_obj small_bert_L4_128\n", + "nlu.load('en.embed.bert.small_L4_256') returns Spark NLP model_anno_obj small_bert_L4_256\n", + "nlu.load('en.embed.bert.small_L4_512') returns Spark NLP model_anno_obj small_bert_L4_512\n", + "nlu.load('en.embed.bert.small_L4_768') returns Spark NLP model_anno_obj small_bert_L4_768\n", + "nlu.load('en.embed.bert.small_L6_128') returns Spark NLP model_anno_obj small_bert_L6_128\n", + "nlu.load('en.embed.bert.small_L6_256') returns Spark NLP model_anno_obj small_bert_L6_256\n", + "nlu.load('en.embed.bert.small_L6_512') returns Spark NLP model_anno_obj small_bert_L6_512\n", + "nlu.load('en.embed.bert.small_L6_768') returns Spark NLP model_anno_obj small_bert_L6_768\n", + "nlu.load('en.embed.bert.small_L8_128') returns Spark NLP model_anno_obj small_bert_L8_128\n", + "nlu.load('en.embed.bert.small_L8_256') returns Spark NLP model_anno_obj small_bert_L8_256\n", + "nlu.load('en.embed.bert.small_L8_512') returns Spark NLP model_anno_obj small_bert_L8_512\n", + "nlu.load('en.embed.bert.small_L8_768') returns Spark NLP model_anno_obj small_bert_L8_768\n", + 
"nlu.load('en.embed.bert.small_finetuned_legal') returns Spark NLP model_anno_obj bert_embeddings_bert_small_finetuned_legal_definitions\n", + "nlu.load('en.embed.bert.small_finetuned_legal.by_muhtasham') returns Spark NLP model_anno_obj bert_embeddings_bert_small_finetuned_legal_definitions_longer\n", + "nlu.load('en.embed.bert.tiny_finetuned_legal') returns Spark NLP model_anno_obj bert_embeddings_bert_tiny_finetuned_legal_definitions\n", + "nlu.load('en.embed.bert.uncased_base') returns Spark NLP model_anno_obj bert_embeddings_base_uncased\n", + "nlu.load('en.embed.bert.uncased_base.by_model_attribution_challenge') returns Spark NLP model_anno_obj bert_embeddings_model_attribution_challenge_base_uncased\n", + "nlu.load('en.embed.bert.uncased_base_finetuned_legal') returns Spark NLP model_anno_obj bert_embeddings_legal_bert_base_uncased_finetuned_rramicus\n", + "nlu.load('en.embed.bert.uncased_base_finetuned_legal.by_hatemestinbejaia') returns Spark NLP model_anno_obj bert_embeddings_legal_bert_base_uncased_finetuned_ledgarscotus7\n", + "nlu.load('en.embed.bert.uncased_large') returns Spark NLP model_anno_obj bert_embeddings_large_uncased\n", + "nlu.load('en.embed.bert.uncased_large_whole_word_masking') returns Spark NLP model_anno_obj bert_embeddings_large_uncased_whole_word_masking\n", + "nlu.load('en.embed.bert.wiki_books') returns Spark NLP model_anno_obj bert_wiki_books\n", + "nlu.load('en.embed.bert.wiki_books_mnli') returns Spark NLP model_anno_obj bert_wiki_books_mnli\n", + "nlu.load('en.embed.bert.wiki_books_qnli') returns Spark NLP model_anno_obj bert_wiki_books_qnli\n", + "nlu.load('en.embed.bert.wiki_books_qqp') returns Spark NLP model_anno_obj bert_wiki_books_qqp\n", + "nlu.load('en.embed.bert.wiki_books_squad2') returns Spark NLP model_anno_obj bert_wiki_books_squad2\n", + "nlu.load('en.embed.bert.wiki_books_sst2') returns Spark NLP model_anno_obj bert_wiki_books_sst2\n", + "nlu.load('en.embed.bert_base_5lang_cased') returns Spark NLP model_anno_obj 
bert_embeddings_bert_base_5lang_cased\n", + "nlu.load('en.embed.bert_base_en_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_en_cased\n", + "nlu.load('en.embed.bert_base_uncased_dstc9') returns Spark NLP model_anno_obj bert_embeddings_bert_base_uncased_dstc9\n", + "nlu.load('en.embed.bert_base_uncased_mnli_sparse_70_unstructured_no_classifier') returns Spark NLP model_anno_obj bert_embeddings_bert_base_uncased_mnli_sparse_70_unstructured_no_classifier\n", + "nlu.load('en.embed.bert_base_uncased_sparse_70_unstructured') returns Spark NLP model_anno_obj bert_embeddings_bert_base_uncased_sparse_70_unstructured\n", + "nlu.load('en.embed.bert_for_patents') returns Spark NLP model_anno_obj bert_embeddings_bert_for_patents\n", + "nlu.load('en.embed.bert_large_cased_whole_word_masking') returns Spark NLP model_anno_obj bert_embeddings_bert_large_cased_whole_word_masking\n", + "nlu.load('en.embed.bert_large_uncased_whole_word_masking') returns Spark NLP model_anno_obj bert_embeddings_bert_large_uncased_whole_word_masking\n", + "nlu.load('en.embed.bert_political_election2020_twitter_mlm') returns Spark NLP model_anno_obj bert_embeddings_bert_political_election2020_twitter_mlm\n", + "nlu.load('en.embed.biobert') returns Spark NLP model_anno_obj biobert_pubmed_base_cased\n", + "nlu.load('en.embed.biobert.clinical_base_cased') returns Spark NLP model_anno_obj biobert_clinical_base_cased\n", + "nlu.load('en.embed.biobert.discharge_base_cased') returns Spark NLP model_anno_obj biobert_discharge_base_cased\n", + "nlu.load('en.embed.biobert.pmc_base_cased') returns Spark NLP model_anno_obj biobert_pmc_base_cased\n", + "nlu.load('en.embed.biobert.pubmed.cased_base') returns Spark NLP model_anno_obj biobert_pubmed_base_cased_v1.2\n", + "nlu.load('en.embed.biobert.pubmed_large_cased') returns Spark NLP model_anno_obj biobert_pubmed_large_cased\n", + "nlu.load('en.embed.biobert.pubmed_pmc_base_cased') returns Spark NLP model_anno_obj biobert_pubmed_pmc_base_cased\n", 
+ "nlu.load('en.embed.bioformer.cased') returns Spark NLP model_anno_obj bert_embeddings_bioformer_cased_v1.0\n", + "nlu.load('en.embed.chEMBL26_smiles_v2') returns Spark NLP model_anno_obj roberta_embeddings_chEMBL26_smiles_v2\n", + "nlu.load('en.embed.chEMBL_smiles_v1') returns Spark NLP model_anno_obj roberta_embeddings_chEMBL_smiles_v1\n", + "nlu.load('en.embed.chemical_bert_uncased') returns Spark NLP model_anno_obj bert_embeddings_chemical_bert_uncased\n", + "nlu.load('en.embed.childes_bert') returns Spark NLP model_anno_obj bert_embeddings_childes_bert\n", + "nlu.load('en.embed.clinical_pubmed_bert_base_128') returns Spark NLP model_anno_obj bert_embeddings_clinical_pubmed_bert_base_128\n", + "nlu.load('en.embed.clinical_pubmed_bert_base_512') returns Spark NLP model_anno_obj bert_embeddings_clinical_pubmed_bert_base_512\n", + "nlu.load('en.embed.covidbert') returns Spark NLP model_anno_obj covidbert_large_uncased\n", + "nlu.load('en.embed.covidbert.large_uncased') returns Spark NLP model_anno_obj covidbert_large_uncased\n", + "nlu.load('en.embed.crosloengual_bert') returns Spark NLP model_anno_obj bert_embeddings_crosloengual_bert\n", + "nlu.load('en.embed.danbert_small_cased') returns Spark NLP model_anno_obj bert_embeddings_danbert_small_cased\n", + "nlu.load('en.embed.deberta_base_uncased') returns Spark NLP model_anno_obj bert_embeddings_deberta_base_uncased\n", + "nlu.load('en.embed.deberta_v3_base') returns Spark NLP model_anno_obj deberta_v3_base\n", + "nlu.load('en.embed.deberta_v3_large') returns Spark NLP model_anno_obj deberta_v3_large\n", + "nlu.load('en.embed.deberta_v3_small') returns Spark NLP model_anno_obj deberta_v3_small\n", + "nlu.load('en.embed.deberta_v3_xsmall') returns Spark NLP model_anno_obj deberta_v3_xsmall\n", + "nlu.load('en.embed.distil_bert') returns Spark NLP model_anno_obj distilbert_embeddings_test_text\n", + "nlu.load('en.embed.distil_bert.finetuned') returns Spark NLP model_anno_obj 
distilbert_embeddings_finetuned_sarcasm_classification\n", + "nlu.load('en.embed.distil_bert.uncased_base') returns Spark NLP model_anno_obj distilbert_embeddings_base_uncased\n", + "nlu.load('en.embed.distil_bert.uncased_base_sparse_85_unstructured_pruneofa.by_intel') returns Spark NLP model_anno_obj distilbert_embeddings_base_uncased_sparse_85_unstructured_pruneofa\n", + "nlu.load('en.embed.distil_bert.uncased_base_sparse_90_unstructured_pruneofa.by_intel') returns Spark NLP model_anno_obj distilbert_embeddings_base_uncased_sparse_90_unstructured_pruneofa\n", + "nlu.load('en.embed.distilbert') returns Spark NLP model_anno_obj distilbert_base_cased\n", + "nlu.load('en.embed.distilbert.base') returns Spark NLP model_anno_obj distilbert_base_cased\n", + "nlu.load('en.embed.distilbert.base.uncased') returns Spark NLP model_anno_obj distilbert_base_uncased\n", + "nlu.load('en.embed.distilbert_base_en_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_en_cased\n", + "nlu.load('en.embed.distilbert_base_uncased_sparse_85_unstructured_pruneofa') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_uncased_sparse_85_unstructured_pruneofa\n", + "nlu.load('en.embed.distilbert_base_uncased_sparse_90_unstructured_pruneofa') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_uncased_sparse_90_unstructured_pruneofa\n", + "nlu.load('en.embed.distilroberta') returns Spark NLP model_anno_obj distilroberta_base\n", + "nlu.load('en.embed.distilroberta_base') returns Spark NLP model_anno_obj roberta_embeddings_distilroberta_base\n", + "nlu.load('en.embed.distilroberta_base_climate_d') returns Spark NLP model_anno_obj roberta_embeddings_distilroberta_base_climate_d\n", + "nlu.load('en.embed.distilroberta_base_climate_d_s') returns Spark NLP model_anno_obj roberta_embeddings_distilroberta_base_climate_d_s\n", + "nlu.load('en.embed.distilroberta_base_climate_f') returns Spark NLP model_anno_obj 
roberta_embeddings_distilroberta_base_climate_f\n", + "nlu.load('en.embed.distilroberta_base_finetuned_jira_qt_issue_title') returns Spark NLP model_anno_obj roberta_embeddings_distilroberta_base_finetuned_jira_qt_issue_title\n", + "nlu.load('en.embed.distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies') returns Spark NLP model_anno_obj roberta_embeddings_distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies\n", + "nlu.load('en.embed.e') returns Spark NLP model_anno_obj bert_biolink_base\n", + "nlu.load('en.embed.electra') returns Spark NLP model_anno_obj electra_small_uncased\n", + "nlu.load('en.embed.electra.base') returns Spark NLP model_anno_obj electra_embeddings_electra_base_generator\n", + "nlu.load('en.embed.electra.base_uncased') returns Spark NLP model_anno_obj electra_base_uncased\n", + "nlu.load('en.embed.electra.large') returns Spark NLP model_anno_obj electra_embeddings_electra_large_generator\n", + "nlu.load('en.embed.electra.large_uncased') returns Spark NLP model_anno_obj electra_large_uncased\n", + "nlu.load('en.embed.electra.medical') returns Spark NLP model_anno_obj electra_medal_acronym\n", + "nlu.load('en.embed.electra.small') returns Spark NLP model_anno_obj electra_embeddings_electra_small_generator\n", + "nlu.load('en.embed.electra.small_uncased') returns Spark NLP model_anno_obj electra_small_uncased\n", + "nlu.load('en.embed.elmo') returns Spark NLP model_anno_obj elmo\n", + "nlu.load('en.embed.fairlex_ecthr_minilm') returns Spark NLP model_anno_obj roberta_embeddings_fairlex_ecthr_minilm\n", + "nlu.load('en.embed.fairlex_scotus_minilm') returns Spark NLP model_anno_obj roberta_embeddings_fairlex_scotus_minilm\n", + "nlu.load('en.embed.false_positives_scancode_bert_base_uncased_L8_1') returns Spark NLP model_anno_obj bert_embeddings_false_positives_scancode_bert_base_uncased_L8_1\n", + "nlu.load('en.embed.finbert_pretrain_yiyanghkust') returns Spark NLP model_anno_obj bert_embeddings_finbert_pretrain_yiyanghkust\n", + 
"nlu.load('en.embed.finest_bert') returns Spark NLP model_anno_obj bert_embeddings_finest_bert\n", + "nlu.load('en.embed.ge') returns Spark NLP model_anno_obj bert_biolink_large\n", + "nlu.load('en.embed.glove') returns Spark NLP model_anno_obj glove_100d\n", + "nlu.load('en.embed.glove.100d') returns Spark NLP model_anno_obj glove_100d\n", + "nlu.load('en.embed.hateBERT') returns Spark NLP model_anno_obj bert_embeddings_hateBERT\n", + "nlu.load('en.embed.legal.osf_lemmatized_legal') returns Spark NLP model_anno_obj word2vec_osf_lemmatized_legal\n", + "nlu.load('en.embed.legal.osf_raw_legal') returns Spark NLP model_anno_obj word2vec_osf_raw_legal\n", + "nlu.load('en.embed.legal.osf_replaced_lemmatized_legal') returns Spark NLP model_anno_obj word2vec_osf_replaced_lemmatized_legal\n", + "nlu.load('en.embed.legal.osf_replaced_raw_legal') returns Spark NLP model_anno_obj word2vec_osf_replaced_raw_legal\n", + "nlu.load('en.embed.legal_bert_base_uncased') returns Spark NLP model_anno_obj bert_embeddings_legal_bert_base_uncased\n", + "nlu.load('en.embed.legal_bert_small_uncased') returns Spark NLP model_anno_obj bert_embeddings_legal_bert_small_uncased\n", + "nlu.load('en.embed.legal_roberta_base') returns Spark NLP model_anno_obj roberta_embeddings_legal_roberta_base\n", + "nlu.load('en.embed.legalbert.legal.by_zlucia') returns Spark NLP model_anno_obj bert_embeddings_legalbert\n", + "nlu.load('en.embed.legalbert.legal.custom.by_zlucia') returns Spark NLP model_anno_obj bert_embeddings_custom_legalbert\n", + "nlu.load('en.embed.lic_class_scancode_bert_base_cased_L32_1') returns Spark NLP model_anno_obj bert_embeddings_lic_class_scancode_bert_base_cased_L32_1\n", + "nlu.load('en.embed.longformer') returns Spark NLP model_anno_obj longformer_base_4096\n", + "nlu.load('en.embed.longformer.base_legal') returns Spark NLP model_anno_obj legal_longformer_base\n", + "nlu.load('en.embed.longformer.clinical') returns Spark NLP model_anno_obj clinical_longformer\n", + 
"nlu.load('en.embed.longformer.large') returns Spark NLP model_anno_obj longformer_large_4096\n", + "nlu.load('en.embed.muppet_roberta_base') returns Spark NLP model_anno_obj roberta_embeddings_muppet_roberta_base\n", + "nlu.load('en.embed.muppet_roberta_large') returns Spark NLP model_anno_obj roberta_embeddings_muppet_roberta_large\n", + "nlu.load('en.embed.muril_adapted_local') returns Spark NLP model_anno_obj bert_embeddings_muril_adapted_local\n", + "nlu.load('en.embed.netbert') returns Spark NLP model_anno_obj bert_embeddings_netbert\n", + "nlu.load('en.embed.pmc_med_bio_mlm_roberta_large') returns Spark NLP model_anno_obj roberta_embeddings_pmc_med_bio_mlm_roberta_large\n", + "nlu.load('en.embed.pos.uncased_base') returns Spark NLP model_anno_obj bert_embeddings_false_positives_scancode_base_uncased_l8_1\n", + "nlu.load('en.embed.psych_search') returns Spark NLP model_anno_obj bert_embeddings_psych_search\n", + "nlu.load('en.embed.roberta') returns Spark NLP model_anno_obj roberta_base\n", + "nlu.load('en.embed.roberta.base') returns Spark NLP model_anno_obj roberta_base\n", + "nlu.load('en.embed.roberta.base.by_model_attribution_challenge') returns Spark NLP model_anno_obj roberta_embeddings_model_attribution_challenge_base\n", + "nlu.load('en.embed.roberta.base_finetuned') returns Spark NLP model_anno_obj roberta_embeddings_ruperta_base_finetuned_spa_constitution\n", + "nlu.load('en.embed.roberta.base_legal') returns Spark NLP model_anno_obj roberta_embeddings_legal_base\n", + "nlu.load('en.embed.roberta.cord19.1m') returns Spark NLP model_anno_obj roberta_embeddings_cord19_1m7k\n", + "nlu.load('en.embed.roberta.distilled_base') returns Spark NLP model_anno_obj roberta_embeddings_distil_base\n", + "nlu.load('en.embed.roberta.financial') returns Spark NLP model_anno_obj roberta_embeddings_financial\n", + "nlu.load('en.embed.roberta.large') returns Spark NLP model_anno_obj roberta_large\n", + "nlu.load('en.embed.roberta_pubmed') returns Spark NLP 
model_anno_obj roberta_embeddings_roberta_pubmed\n", + "nlu.load('en.embed.scibert.cord19_scibert.finetuned') returns Spark NLP model_anno_obj bert_embeddings_scibert_scivocab_finetuned_cord19\n", + "nlu.load('en.embed.scibert.covid_scibert.') returns Spark NLP model_anno_obj bert_embeddings_covid_scibert\n", + "nlu.load('en.embed.sec_bert_base') returns Spark NLP model_anno_obj bert_embeddings_sec_bert_base\n", + "nlu.load('en.embed.sec_bert_num') returns Spark NLP model_anno_obj bert_embeddings_sec_bert_num\n", + "nlu.load('en.embed.sec_bert_sh') returns Spark NLP model_anno_obj bert_embeddings_sec_bert_sh\n", + "nlu.load('en.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('en.embed.word2vec.gigaword') returns Spark NLP model_anno_obj word2vec_gigaword_300\n", + "nlu.load('en.embed.word2vec.gigaword_wiki') returns Spark NLP model_anno_obj word2vec_gigaword_wiki_300\n", + "nlu.load('en.embed.xlmr_roberta') returns Spark NLP model_anno_obj xlmroberta_embeddings_litlat_bert\n", + "nlu.load('en.embed.xlnet') returns Spark NLP model_anno_obj xlnet_base_cased\n", + "nlu.load('en.embed.xlnet_base_cased') returns Spark NLP model_anno_obj xlnet_base_cased\n", + "nlu.load('en.embed.xlnet_large_cased') returns Spark NLP model_anno_obj xlnet_large_cased\n", + "For language NLU provides the following Models : \n", + "nlu.load('eo.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('es.embed.RoBERTalex') returns Spark NLP model_anno_obj roberta_embeddings_RoBERTalex\n", + "nlu.load('es.embed.RuPERTa_base') returns Spark NLP model_anno_obj roberta_embeddings_RuPERTa_base\n", + "nlu.load('es.embed.alberti_bert_base_multilingual_cased') returns Spark NLP model_anno_obj bert_embeddings_alberti_bert_base_multilingual_cased\n", + "nlu.load('es.embed.bert.base_cased') returns Spark NLP model_anno_obj bert_base_cased\n", + "nlu.load('es.embed.bert.base_legal') returns 
Spark NLP model_anno_obj legalectra_base\n", + "nlu.load('es.embed.bert.base_uncased') returns Spark NLP model_anno_obj bert_base_uncased\n", + "nlu.load('es.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_es_cased\n", + "nlu.load('es.embed.bert.cased_base.by_dccuchile') returns Spark NLP model_anno_obj bert_embeddings_base_spanish_wwm_cased\n", + "nlu.load('es.embed.bert.small_legal') returns Spark NLP model_anno_obj legalectra_small\n", + "nlu.load('es.embed.bert.uncased_base') returns Spark NLP model_anno_obj bert_embeddings_base_spanish_wwm_uncased\n", + "nlu.load('es.embed.bert_base_5lang_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_5lang_cased\n", + "nlu.load('es.embed.bert_base_es_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_es_cased\n", + "nlu.load('es.embed.bertin_base_gaussian') returns Spark NLP model_anno_obj roberta_embeddings_bertin_base_gaussian\n", + "nlu.load('es.embed.bertin_base_gaussian_exp_512seqlen') returns Spark NLP model_anno_obj roberta_embeddings_bertin_base_gaussian_exp_512seqlen\n", + "nlu.load('es.embed.bertin_base_random') returns Spark NLP model_anno_obj roberta_embeddings_bertin_base_random\n", + "nlu.load('es.embed.bertin_base_random_exp_512seqlen') returns Spark NLP model_anno_obj roberta_embeddings_bertin_base_random_exp_512seqlen\n", + "nlu.load('es.embed.bertin_base_stepwise') returns Spark NLP model_anno_obj roberta_embeddings_bertin_base_stepwise\n", + "nlu.load('es.embed.bertin_base_stepwise_exp_512seqlen') returns Spark NLP model_anno_obj roberta_embeddings_bertin_base_stepwise_exp_512seqlen\n", + "nlu.load('es.embed.bertin_roberta_base_spanish') returns Spark NLP model_anno_obj roberta_embeddings_bertin_roberta_base_spanish\n", + "nlu.load('es.embed.bertin_roberta_large_spanish') returns Spark NLP model_anno_obj roberta_embeddings_bertin_roberta_large_spanish\n", + "nlu.load('es.embed.beto_gn_base_cased') returns Spark NLP model_anno_obj 
bert_embeddings_beto_gn_base_cased\n", + "nlu.load('es.embed.distilbert_base_es_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_es_cased\n", + "nlu.load('es.embed.distilbert_base_es_multilingual_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_es_multilingual_cased\n", + "nlu.load('es.embed.dpr_spanish_passage_encoder_allqa_base') returns Spark NLP model_anno_obj bert_embeddings_dpr_spanish_passage_encoder_allqa_base\n", + "nlu.load('es.embed.dpr_spanish_passage_encoder_squades_base') returns Spark NLP model_anno_obj bert_embeddings_dpr_spanish_passage_encoder_squades_base\n", + "nlu.load('es.embed.dpr_spanish_question_encoder_allqa_base') returns Spark NLP model_anno_obj bert_embeddings_dpr_spanish_question_encoder_allqa_base\n", + "nlu.load('es.embed.dpr_spanish_question_encoder_squades_base') returns Spark NLP model_anno_obj bert_embeddings_dpr_spanish_question_encoder_squades_base\n", + "nlu.load('es.embed.electra.base') returns Spark NLP model_anno_obj electra_embeddings_electricidad_base_generator\n", + "nlu.load('es.embed.jurisbert') returns Spark NLP model_anno_obj roberta_embeddings_jurisbert\n", + "nlu.load('es.embed.legal.cbow.cased_d100') returns Spark NLP model_anno_obj word2vec_cbow_legal_d100_cased\n", + "nlu.load('es.embed.legal.cbow.cased_d300') returns Spark NLP model_anno_obj word2vec_cbow_legal_d300_cased\n", + "nlu.load('es.embed.legal.cbow.cased_d50') returns Spark NLP model_anno_obj word2vec_cbow_legal_d50_cased\n", + "nlu.load('es.embed.legal.cbow.uncased_d100') returns Spark NLP model_anno_obj word2vec_cbow_legal_d100_uncased\n", + "nlu.load('es.embed.legal.cbow.uncased_d300') returns Spark NLP model_anno_obj word2vec_cbow_legal_d300_uncased\n", + "nlu.load('es.embed.legal.cbow.uncased_d50') returns Spark NLP model_anno_obj word2vec_cbow_legal_d50_uncased\n", + "nlu.load('es.embed.legal.skipgram.cased_d100') returns Spark NLP model_anno_obj word2vec_skipgram_legal_d100_cased\n", + 
"nlu.load('es.embed.legal.skipgram.cased_d300') returns Spark NLP model_anno_obj word2vec_skipgram_legal_d300_cased\n", + "nlu.load('es.embed.legal.skipgram.cased_d50') returns Spark NLP model_anno_obj word2vec_skipgram_legal_d50_cased\n", + "nlu.load('es.embed.legal.skipgram.uncased_d100') returns Spark NLP model_anno_obj word2vec_skipgram_legal_d100_uncased\n", + "nlu.load('es.embed.legal.skipgram.uncased_d300') returns Spark NLP model_anno_obj word2vec_skipgram_legal_d300_uncased\n", + "nlu.load('es.embed.legal.skipgram.uncased_d50') returns Spark NLP model_anno_obj word2vec_skipgram_legal_d50_uncased\n", + "nlu.load('es.embed.longformer.base_legal') returns Spark NLP model_anno_obj longformer_legal_base_8192\n", + "nlu.load('es.embed.longformer.legal') returns Spark NLP model_anno_obj longformer_legal_embeddings\n", + "nlu.load('es.embed.mlm_spanish_roberta_base') returns Spark NLP model_anno_obj roberta_embeddings_mlm_spanish_roberta_base\n", + "nlu.load('es.embed.roberta_base_bne') returns Spark NLP model_anno_obj roberta_embeddings_roberta_base_bne\n", + "nlu.load('es.embed.roberta_large_bne') returns Spark NLP model_anno_obj roberta_embeddings_roberta_large_bne\n", + "nlu.load('es.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('et.embed.camembert') returns Spark NLP model_anno_obj camembert_embeddings_est_roberta\n", + "nlu.load('et.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('eu.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_robasqu\n", + "nlu.load('eu.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('fa.embed') returns Spark NLP model_anno_obj persian_w2v_cc_300d\n", + "nlu.load('fa.embed.albert') returns Spark NLP model_anno_obj 
albert_embeddings_albert_fa_base_v2\n", + "nlu.load('fa.embed.albert_fa_zwnj_base_v2') returns Spark NLP model_anno_obj albert_embeddings_albert_fa_zwnj_base_v2\n", + "nlu.load('fa.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_fa_zwnj_base\n", + "nlu.load('fa.embed.bert.uncased_base') returns Spark NLP model_anno_obj bert_embeddings_fa_base_uncased\n", + "nlu.load('fa.embed.distilbert_fa_zwnj_base') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_fa_zwnj_base\n", + "nlu.load('fa.embed.roberta_fa_zwnj_base') returns Spark NLP model_anno_obj roberta_embeddings_roberta_fa_zwnj_base\n", + "nlu.load('fa.embed.word2vec') returns Spark NLP model_anno_obj persian_w2v_cc_300d\n", + "nlu.load('fa.embed.word2vec.300d') returns Spark NLP model_anno_obj persian_w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('fi.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_finnish_cased_v1\n", + "nlu.load('fi.embed.bert.uncased_base') returns Spark NLP model_anno_obj bert_embeddings_base_finnish_uncased_v1\n", + "nlu.load('fi.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('fr.embed.albert') returns Spark NLP model_anno_obj albert_embeddings_fralbert_base\n", + "nlu.load('fr.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_fr_cased\n", + "nlu.load('fr.embed.bert_5lang_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_5lang_cased\n", + "nlu.load('fr.embed.bert_base_fr_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_fr_cased\n", + "nlu.load('fr.embed.camembert') returns Spark NLP model_anno_obj camembert_embeddings_dummy\n", + "nlu.load('fr.embed.camembert.91m_generic') returns Spark NLP model_anno_obj camembert_embeddings_generic_model_r91m\n", + "nlu.load('fr.embed.camembert.adverse_drug_event_generic') returns Spark NLP model_anno_obj 
camembert_embeddings_adeimousa_generic_model\n", + "nlu.load('fr.embed.camembert.base') returns Spark NLP model_anno_obj camembert_embeddings_dataikunlp_camembert_base\n", + "nlu.load('fr.embed.camembert.by_ebtihal') returns Spark NLP model_anno_obj camembert_embeddings_arbertmo\n", + "nlu.load('fr.embed.camembert.by_ghani_25') returns Spark NLP model_anno_obj camembert_embeddings_summfinfr\n", + "nlu.load('fr.embed.camembert.by_hueynemud') returns Spark NLP model_anno_obj camembert_embeddings_das22_10_camembert_pretrained\n", + "nlu.load('fr.embed.camembert.by_jodsa') returns Spark NLP model_anno_obj camembert_embeddings_camembert_mlm\n", + "nlu.load('fr.embed.camembert.distilled_base') returns Spark NLP model_anno_obj camembert_embeddings_distilcamembert_base\n", + "nlu.load('fr.embed.camembert.generic') returns Spark NLP model_anno_obj camembert_embeddings_doyyingface_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_adam1224') returns Spark NLP model_anno_obj camembert_embeddings_adam1224_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_aliasdasd') returns Spark NLP model_anno_obj camembert_embeddings_aliasdasd_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_ankitkupadhyay') returns Spark NLP model_anno_obj camembert_embeddings_ankitkupadhyay_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_codingjacob') returns Spark NLP model_anno_obj camembert_embeddings_codingjacob_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_cylee') returns Spark NLP model_anno_obj camembert_embeddings_cylee_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_devtrent') returns Spark NLP model_anno_obj camembert_embeddings_devtrent_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_dianeshan') returns Spark NLP model_anno_obj camembert_embeddings_dianeshan_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_edge2992') returns Spark NLP model_anno_obj camembert_embeddings_edge2992_generic_model\n", + 
"nlu.load('fr.embed.camembert.generic.by_eduardopds') returns Spark NLP model_anno_obj camembert_embeddings_eduardopds_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_elliotsmith') returns Spark NLP model_anno_obj camembert_embeddings_elliotsmith_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_elusive_magnolia') returns Spark NLP model_anno_obj camembert_embeddings_elusive_magnolia_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_ericchchiu') returns Spark NLP model_anno_obj camembert_embeddings_ericchchiu_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_fjluque') returns Spark NLP model_anno_obj camembert_embeddings_fjluque_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_gulabpatel') returns Spark NLP model_anno_obj camembert_embeddings_new_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_h4d35') returns Spark NLP model_anno_obj camembert_embeddings_h4d35_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_hackertec') returns Spark NLP model_anno_obj camembert_embeddings_generic2\n", + "nlu.load('fr.embed.camembert.generic.by_hasanmurad') returns Spark NLP model_anno_obj camembert_embeddings_hasanmurad_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_hasanmuradbuet') returns Spark NLP model_anno_obj camembert_embeddings_hasanmuradbuet_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_henrywang') returns Spark NLP model_anno_obj camembert_embeddings_henrywang_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_jcai1') returns Spark NLP model_anno_obj camembert_embeddings_jcai1_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_joe8zhang') returns Spark NLP model_anno_obj camembert_embeddings_joe8zhang_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_jonathansum') returns Spark NLP model_anno_obj camembert_embeddings_jonathansum_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_juliencarbonnell') returns Spark NLP 
model_anno_obj camembert_embeddings_juliencarbonnell_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_katrin_kc') returns Spark NLP model_anno_obj camembert_embeddings_katrin_kc_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_katster') returns Spark NLP model_anno_obj camembert_embeddings_katster_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_kaushikacharya') returns Spark NLP model_anno_obj camembert_embeddings_kaushikacharya_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_leisa') returns Spark NLP model_anno_obj camembert_embeddings_leisa_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_lewtun') returns Spark NLP model_anno_obj camembert_embeddings_lewtun_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_lijingxin') returns Spark NLP model_anno_obj camembert_embeddings_lijingxin_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_linyi') returns Spark NLP model_anno_obj camembert_embeddings_linyi_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_mbateman') returns Spark NLP model_anno_obj camembert_embeddings_mbateman_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_mohammadrea76') returns Spark NLP model_anno_obj camembert_embeddings_mohammadrea76_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_myx4567') returns Spark NLP model_anno_obj camembert_embeddings_myx4567_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_osanseviero') returns Spark NLP model_anno_obj camembert_embeddings_generic_model_test\n", + "nlu.load('fr.embed.camembert.generic.by_peterhsu') returns Spark NLP model_anno_obj camembert_embeddings_peterhsu_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_pgperrone') returns Spark NLP model_anno_obj camembert_embeddings_pgperrone_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_safik') returns Spark NLP model_anno_obj camembert_embeddings_safik_generic_model\n", + 
"nlu.load('fr.embed.camembert.generic.by_sebu') returns Spark NLP model_anno_obj camembert_embeddings_sebu_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_seyfullah') returns Spark NLP model_anno_obj camembert_embeddings_seyfullah_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_sonny') returns Spark NLP model_anno_obj camembert_embeddings_sonny_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_tnagata') returns Spark NLP model_anno_obj camembert_embeddings_tnagata_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_tpanza') returns Spark NLP model_anno_obj camembert_embeddings_tpanza_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_wangst') returns Spark NLP model_anno_obj camembert_embeddings_wangst_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_weipeng') returns Spark NLP model_anno_obj camembert_embeddings_weipeng_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_xkang') returns Spark NLP model_anno_obj camembert_embeddings_xkang_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_yancong') returns Spark NLP model_anno_obj camembert_embeddings_yancong_generic_model\n", + "nlu.load('fr.embed.camembert.generic.by_ysharma') returns Spark NLP model_anno_obj camembert_embeddings_ysharma_generic_model_2\n", + "nlu.load('fr.embed.camembert.generic.by_zhenghuabin') returns Spark NLP model_anno_obj camembert_embeddings_zhenghuabin_generic_model\n", + "nlu.load('fr.embed.camembert.generic_v2.by_fjluque') returns Spark NLP model_anno_obj camembert_embeddings_fjluque_generic_model2\n", + "nlu.load('fr.embed.camembert.generic_v2.by_hackertec') returns Spark NLP model_anno_obj camembert_embeddings_hackertec_generic\n", + "nlu.load('fr.embed.camembert.generic_v2.by_lijingxin') returns Spark NLP model_anno_obj camembert_embeddings_lijingxin_generic_model_2\n", + "nlu.load('fr.embed.camembert.generic_v2.by_osanseviero') returns Spark NLP model_anno_obj 
camembert_embeddings_osanseviero_generic_model\n", + "nlu.load('fr.embed.camembert.generic_v2.by_peterhsu') returns Spark NLP model_anno_obj camembert_embeddings_tf_generic_model\n", + "nlu.load('fr.embed.camembert.tweet.base') returns Spark NLP model_anno_obj camembert_embeddings_bertweetfr_base\n", + "nlu.load('fr.embed.camembert_base') returns Spark NLP model_anno_obj camembert_base\n", + "nlu.load('fr.embed.camembert_base_ccnet') returns Spark NLP model_anno_obj camembert_base_ccnet\n", + "nlu.load('fr.embed.camembert_ccnet4g') returns Spark NLP model_anno_obj camembert_base_ccnet_4gb\n", + "nlu.load('fr.embed.camembert_large') returns Spark NLP model_anno_obj camembert_large\n", + "nlu.load('fr.embed.camembert_oscar_4g') returns Spark NLP model_anno_obj camembert_base_oscar_4gb\n", + "nlu.load('fr.embed.camembert_wiki_4g') returns Spark NLP model_anno_obj camembert_base_wikipedia_4gb\n", + "nlu.load('fr.embed.distilbert') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_fr_cased\n", + "nlu.load('fr.embed.electra.cased_base') returns Spark NLP model_anno_obj electra_embeddings_electra_base_french_europeana_cased_generator\n", + "nlu.load('fr.embed.french_roberta') returns Spark NLP model_anno_obj roberta_embeddings_french_roberta\n", + "nlu.load('fr.embed.roberta_base_wechsel_french') returns Spark NLP model_anno_obj roberta_embeddings_roberta_base_wechsel_french\n", + "nlu.load('fr.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('fr.embed.word2vec_wac_200') returns Spark NLP model_anno_obj word2vec_wac_200\n", + "nlu.load('fr.embed.word2vec_wiki_1000') returns Spark NLP model_anno_obj word2vec_wiki_1000\n", + "For language NLU provides the following Models : \n", + "nlu.load('frr.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('fy.embed.bert.cased_base') returns Spark NLP model_anno_obj 
bert_embeddings_base_dutch_cased_frisian\n", + "nlu.load('fy.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('gd.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('gl.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_robertinh\n", + "nlu.load('gl.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('gom.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('gu.embed.RoBERTa_hindi_guj_san') returns Spark NLP model_anno_obj roberta_embeddings_RoBERTa_hindi_guj_san\n", + "For language NLU provides the following Models : \n", + "nlu.load('gv.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ha.embed.bert.cased_multilingual_base_finetuned') returns Spark NLP model_anno_obj bert_embeddings_base_multilingual_cased_finetuned_hausa\n", + "nlu.load('ha.embed.bert.cased_multilingual_base_finetuned.by_davlan') returns Spark NLP model_anno_obj bert_embeddings_base_multilingual_cased_finetuned_swahili\n", + "nlu.load('ha.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_hausa\n", + "For language NLU provides the following Models : \n", + "nlu.load('he.embed') returns Spark NLP model_anno_obj hebrew_cc_300d\n", + "nlu.load('he.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_onlplab_aleph_base\n", + "nlu.load('he.embed.bert.legal') returns Spark NLP model_anno_obj bert_embeddings_legal_hebert\n", + "nlu.load('he.embed.bert.legal.by_avichr') returns Spark NLP model_anno_obj bert_embeddings_legal_hebert_ft\n", + "nlu.load('he.embed.cbow_300d') returns Spark NLP model_anno_obj 
hebrew_cc_300d\n", + "nlu.load('he.embed.glove') returns Spark NLP model_anno_obj hebrew_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('hi.embed') returns Spark NLP model_anno_obj hindi_cc_300d\n", + "nlu.load('hi.embed.RoBERTa_hindi_guj_san') returns Spark NLP model_anno_obj roberta_embeddings_RoBERTa_hindi_guj_san\n", + "nlu.load('hi.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_indic_transformers\n", + "nlu.load('hi.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_hi_cased\n", + "nlu.load('hi.embed.bert_hi_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_hi_cased\n", + "nlu.load('hi.embed.distil_bert') returns Spark NLP model_anno_obj distilbert_embeddings_indic_transformers\n", + "nlu.load('hi.embed.distilbert_base_hi_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_hi_cased\n", + "nlu.load('hi.embed.indic_transformers_hi_bert') returns Spark NLP model_anno_obj bert_embeddings_indic_transformers_hi_bert\n", + "nlu.load('hi.embed.indic_transformers_hi_distilbert') returns Spark NLP model_anno_obj distilbert_embeddings_indic_transformers_hi_distilbert\n", + "nlu.load('hi.embed.indic_transformers_hi_roberta') returns Spark NLP model_anno_obj roberta_embeddings_indic_transformers_hi_roberta\n", + "nlu.load('hi.embed.muril_adapted_local') returns Spark NLP model_anno_obj bert_embeddings_muril_adapted_local\n", + "nlu.load('hi.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_hindi\n", + "nlu.load('hi.embed.roberta.by_neuralspace_reverie') returns Spark NLP model_anno_obj roberta_embeddings_indic_transformers\n", + "nlu.load('hi.embed.xlmr_roberta') returns Spark NLP model_anno_obj xlmroberta_embeddings_indic_transformers_hi_xlmroberta\n", + "For language NLU provides the following Models : \n", + "nlu.load('hif.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language
NLU provides the following Models : \n", + "nlu.load('hr.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('hsb.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('hy.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('id.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_base_indonesian_1.5g\n", + "nlu.load('id.embed.bert.base_522m') returns Spark NLP model_anno_obj bert_embeddings_base_indonesian_522m\n", + "nlu.load('id.embed.distilbert') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_indonesian\n", + "nlu.load('id.embed.indo_roberta_small') returns Spark NLP model_anno_obj roberta_embeddings_indo_roberta_small\n", + "nlu.load('id.embed.indonesian_roberta_base') returns Spark NLP model_anno_obj roberta_embeddings_indonesian_roberta_base\n", + "nlu.load('id.embed.indonesian_roberta_large') returns Spark NLP model_anno_obj roberta_embeddings_indonesian_roberta_large\n", + "nlu.load('id.embed.roberta.base_522m') returns Spark NLP model_anno_obj roberta_embeddings_base_indonesian_522m\n", + "nlu.load('id.embed.roberta.small') returns Spark NLP model_anno_obj roberta_embeddings_indo_small\n", + "nlu.load('id.embed.roberta_base_indonesian_522M') returns Spark NLP model_anno_obj roberta_embeddings_roberta_base_indonesian_522M\n", + "For language NLU provides the following Models : \n", + "nlu.load('ig.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_igbo\n", + "For language NLU provides the following Models : \n", + "nlu.load('it.embed.BERTino') returns Spark NLP model_anno_obj distilbert_embeddings_BERTino\n", + "nlu.load('it.embed.bert') returns Spark NLP model_anno_obj bert_base_italian_cased\n", + "nlu.load('it.embed.bert.cased_base') 
returns Spark NLP model_anno_obj bert_embeddings_base_it_cased\n", + "nlu.load('it.embed.bert.cased_base.by_dbmdz') returns Spark NLP model_anno_obj bert_embeddings_base_italian_cased\n", + "nlu.load('it.embed.bert.cased_xxl_base') returns Spark NLP model_anno_obj bert_embeddings_base_italian_xxl_cased\n", + "nlu.load('it.embed.bert.uncased') returns Spark NLP model_anno_obj bert_base_italian_uncased\n", + "nlu.load('it.embed.bert.uncased_base') returns Spark NLP model_anno_obj bert_embeddings_base_italian_uncased\n", + "nlu.load('it.embed.bert.uncased_xxl_base') returns Spark NLP model_anno_obj bert_embeddings_base_italian_xxl_uncased\n", + "nlu.load('it.embed.bert_base_italian_xxl_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_italian_xxl_cased\n", + "nlu.load('it.embed.bert_base_italian_xxl_uncased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_italian_xxl_uncased\n", + "nlu.load('it.embed.bert_it_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_it_cased\n", + "nlu.load('it.embed.camembert.cased') returns Spark NLP model_anno_obj camembert_embeddings_umberto_commoncrawl_cased_v1\n", + "nlu.load('it.embed.camembert.uncased') returns Spark NLP model_anno_obj camembert_embeddings_umberto_wikipedia_uncased_v1\n", + "nlu.load('it.embed.chefberto_italian_cased') returns Spark NLP model_anno_obj bert_embeddings_chefberto_italian_cased\n", + "nlu.load('it.embed.distil_bert') returns Spark NLP model_anno_obj distilbert_embeddings_bertino\n", + "nlu.load('it.embed.distilbert_base_it_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_it_cased\n", + "nlu.load('it.embed.electra.cased_xxl_base') returns Spark NLP model_anno_obj electra_embeddings_electra_base_italian_xxl_cased_generator\n", + "nlu.load('it.embed.hseBert_it_cased') returns Spark NLP model_anno_obj bert_embeddings_hseBert_it_cased\n", + "nlu.load('it.embed.wineberto_italian_cased') returns Spark NLP model_anno_obj 
bert_embeddings_wineberto_italian_cased\n", + "nlu.load('it.embed.word2vec') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ja.embed.albert_base_japanese_v1') returns Spark NLP model_anno_obj albert_embeddings_albert_base_japanese_v1\n", + "nlu.load('ja.embed.bert.base') returns Spark NLP model_anno_obj bert_base_japanese\n", + "nlu.load('ja.embed.bert.base_whole_word_masking') returns Spark NLP model_anno_obj bert_embeddings_base_japanese_char_whole_word_masking\n", + "nlu.load('ja.embed.bert.base_whole_word_masking.by_cl_tohoku') returns Spark NLP model_anno_obj bert_embeddings_base_japanese_whole_word_masking\n", + "nlu.load('ja.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_ja_cased\n", + "nlu.load('ja.embed.bert.large') returns Spark NLP model_anno_obj bert_embeddings_large_japanese\n", + "nlu.load('ja.embed.bert.large.by_cl_tohoku') returns Spark NLP model_anno_obj bert_embeddings_large_japanese_char\n", + "nlu.load('ja.embed.bert.v2_base') returns Spark NLP model_anno_obj bert_embeddings_base_japanese_char_v2\n", + "nlu.load('ja.embed.bert.v2_base.by_cl_tohoku') returns Spark NLP model_anno_obj bert_embeddings_base_japanese_v2\n", + "nlu.load('ja.embed.bert.wiki.base.by_cl_tohoku') returns Spark NLP model_anno_obj bert_embeddings_base_japanese\n", + "nlu.load('ja.embed.bert.wiki.base_char.by_cl_tohoku') returns Spark NLP model_anno_obj bert_embeddings_base_japanese_char\n", + "nlu.load('ja.embed.bert_base_ja_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_ja_cased\n", + "nlu.load('ja.embed.bert_base_japanese_basic_char_v2') returns Spark NLP model_anno_obj bert_embeddings_bert_base_japanese_basic_char_v2\n", + "nlu.load('ja.embed.bert_base_japanese_char') returns Spark NLP model_anno_obj bert_embeddings_bert_base_japanese_char\n", + "nlu.load('ja.embed.bert_base_japanese_char_extended') returns Spark NLP model_anno_obj 
bert_embeddings_bert_base_japanese_char_extended\n", + "nlu.load('ja.embed.bert_base_japanese_char_v2') returns Spark NLP model_anno_obj bert_embeddings_bert_base_japanese_char_v2\n", + "nlu.load('ja.embed.bert_base_japanese_char_whole_word_masking') returns Spark NLP model_anno_obj bert_embeddings_bert_base_japanese_char_whole_word_masking\n", + "nlu.load('ja.embed.bert_base_japanese_v2') returns Spark NLP model_anno_obj bert_embeddings_bert_base_japanese_v2\n", + "nlu.load('ja.embed.bert_base_japanese_whole_word_masking') returns Spark NLP model_anno_obj bert_embeddings_bert_base_japanese_whole_word_masking\n", + "nlu.load('ja.embed.bert_large_japanese') returns Spark NLP model_anno_obj bert_embeddings_bert_large_japanese\n", + "nlu.load('ja.embed.bert_large_japanese_char') returns Spark NLP model_anno_obj bert_embeddings_bert_large_japanese_char\n", + "nlu.load('ja.embed.bert_large_japanese_char_extended') returns Spark NLP model_anno_obj bert_embeddings_bert_large_japanese_char_extended\n", + "nlu.load('ja.embed.bert_small_japanese') returns Spark NLP model_anno_obj bert_embeddings_bert_small_japanese\n", + "nlu.load('ja.embed.bert_small_japanese_fin') returns Spark NLP model_anno_obj bert_embeddings_bert_small_japanese_fin\n", + "nlu.load('ja.embed.distilbert_base_ja_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_ja_cased\n", + "nlu.load('ja.embed.electra.base') returns Spark NLP model_anno_obj electra_embeddings_electra_base_japanese_generator\n", + "nlu.load('ja.embed.electra.small') returns Spark NLP model_anno_obj electra_embeddings_electra_small_japanese_fin_generator\n", + "nlu.load('ja.embed.electra.small.by_cinnamon') returns Spark NLP model_anno_obj electra_embeddings_electra_small_japanese_generator\n", + "nlu.load('ja.embed.electra.small_paper_japanese_fin_generator.small.by_izumi_lab') returns Spark NLP model_anno_obj electra_embeddings_electra_small_paper_japanese_fin_generator\n", + 
"nlu.load('ja.embed.electra.small_paper_japanese_generator.small.by_izumi_lab') returns Spark NLP model_anno_obj electra_embeddings_electra_small_paper_japanese_generator\n", + "nlu.load('ja.embed.glove.cc_300d') returns Spark NLP model_anno_obj japanese_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('jv.embed.bert.imdb_javanese.small') returns Spark NLP model_anno_obj bert_embeddings_javanese_small_imdb\n", + "nlu.load('jv.embed.bert.small') returns Spark NLP model_anno_obj bert_embeddings_javanese_small\n", + "nlu.load('jv.embed.distil_bert.imdb_javanese.small') returns Spark NLP model_anno_obj distilbert_embeddings_javanese_small_imdb\n", + "nlu.load('jv.embed.distil_bert.small') returns Spark NLP model_anno_obj distilbert_embeddings_javanese_small\n", + "nlu.load('jv.embed.distilbert') returns Spark NLP model_anno_obj distilbert_embeddings_javanese_distilbert_small\n", + "nlu.load('jv.embed.javanese_bert_small') returns Spark NLP model_anno_obj bert_embeddings_javanese_bert_small\n", + "nlu.load('jv.embed.javanese_bert_small_imdb') returns Spark NLP model_anno_obj bert_embeddings_javanese_bert_small_imdb\n", + "nlu.load('jv.embed.javanese_distilbert_small_imdb') returns Spark NLP model_anno_obj distilbert_embeddings_javanese_distilbert_small_imdb\n", + "nlu.load('jv.embed.javanese_roberta_small') returns Spark NLP model_anno_obj roberta_embeddings_javanese_roberta_small\n", + "nlu.load('jv.embed.javanese_roberta_small_imdb') returns Spark NLP model_anno_obj roberta_embeddings_javanese_roberta_small_imdb\n", + "nlu.load('jv.embed.roberta.imdb_javanese.small') returns Spark NLP model_anno_obj roberta_embeddings_javanese_small_imdb\n", + "nlu.load('jv.embed.roberta.small') returns Spark NLP model_anno_obj roberta_embeddings_javanese_small\n", + "For language NLU provides the following Models : \n", + "nlu.load('ka.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models 
: \n", + "nlu.load('kn.embed.KNUBert') returns Spark NLP model_anno_obj roberta_embeddings_KNUBert\n", + "nlu.load('kn.embed.KanBERTo') returns Spark NLP model_anno_obj roberta_embeddings_KanBERTo\n", + "For language NLU provides the following Models : \n", + "nlu.load('ko.embed.KR_FinBert') returns Spark NLP model_anno_obj bert_embeddings_KR_FinBert\n", + "nlu.load('ko.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_bert_base\n", + "nlu.load('ko.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_kor_base\n", + "nlu.load('ko.embed.bert_base_v1_sports') returns Spark NLP model_anno_obj bert_embeddings_bert_base_v1_sports\n", + "nlu.load('ko.embed.bert_kor_base') returns Spark NLP model_anno_obj bert_embeddings_bert_kor_base\n", + "nlu.load('ko.embed.dbert') returns Spark NLP model_anno_obj bert_embeddings_dbert\n", + "nlu.load('ko.embed.electra') returns Spark NLP model_anno_obj electra_embeddings_kr_electra_generator\n", + "nlu.load('ko.embed.electra.base') returns Spark NLP model_anno_obj electra_embeddings_finance_koelectra_base_generator\n", + "nlu.load('ko.embed.electra.by_deeq') returns Spark NLP model_anno_obj electra_embeddings_delectra_generator\n", + "nlu.load('ko.embed.electra.small') returns Spark NLP model_anno_obj electra_embeddings_finance_koelectra_small_generator\n", + "nlu.load('ko.embed.electra.small.by_monologg') returns Spark NLP model_anno_obj electra_embeddings_koelectra_small_generator\n", + "nlu.load('ko.embed.electra.v2_base') returns Spark NLP model_anno_obj electra_embeddings_koelectra_base_v2_generator\n", + "nlu.load('ko.embed.koelelectra.base.by_monologg') returns Spark NLP model_anno_obj electra_embeddings_koelectra_base_generator\n", + "nlu.load('ko.embed.koelelectra.base_v3.by_monologg') returns Spark NLP model_anno_obj electra_embeddings_koelectra_base_v3_generator\n", + "nlu.load('ko.embed.roberta_ko_small') returns Spark NLP model_anno_obj roberta_embeddings_roberta_ko_small\n", + "For language NLU 
provides the following Models : \n", + "nlu.load('la.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_cicero_similis\n", + "For language NLU provides the following Models : \n", + "nlu.load('lb.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('lg.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_luganda\n", + "For language NLU provides the following Models : \n", + "nlu.load('lmo.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('lou.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_luo\n", + "For language NLU provides the following Models : \n", + "nlu.load('lt.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_lt_cased\n", + "nlu.load('lt.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('lu.embed.bert.medium') returns Spark NLP model_anno_obj bert_embeddings_medium_luxembourgish\n", + "For language NLU provides the following Models : \n", + "nlu.load('mai.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('mg.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('min.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('mk.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ml.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('mn.embed.w2v_cc_300d') returns Spark NLP model_anno_obj 
w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('mr.embed.albert') returns Spark NLP model_anno_obj albert_embeddings_marathi_albert\n", + "nlu.load('mr.embed.albert_v2') returns Spark NLP model_anno_obj albert_embeddings_marathi_albert_v2\n", + "nlu.load('mr.embed.distil_bert') returns Spark NLP model_anno_obj distilbert_embeddings_marathi\n", + "nlu.load('mr.embed.distilbert') returns Spark NLP model_anno_obj distilbert_embeddings_marathi_distilbert\n", + "nlu.load('mr.embed.marathi_bert') returns Spark NLP model_anno_obj bert_embeddings_marathi_bert\n", + "nlu.load('mr.embed.muril_adapted_local') returns Spark NLP model_anno_obj bert_embeddings_muril_adapted_local\n", + "nlu.load('mr.embed.xlmr_roberta') returns Spark NLP model_anno_obj xlmroberta_embeddings_marathi_roberta\n", + "For language NLU provides the following Models : \n", + "nlu.load('ms.embed.albert') returns Spark NLP model_anno_obj albert_embeddings_albert_large_bahasa_cased\n", + "nlu.load('ms.embed.albert_base_bahasa_cased') returns Spark NLP model_anno_obj albert_embeddings_albert_base_bahasa_cased\n", + "nlu.load('ms.embed.albert_tiny_bahasa_cased') returns Spark NLP model_anno_obj albert_embeddings_albert_tiny_bahasa_cased\n", + "nlu.load('ms.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_melayubert\n", + "nlu.load('ms.embed.distil_bert.small') returns Spark NLP model_anno_obj distilbert_embeddings_malaysian_small\n", + "nlu.load('ms.embed.distilbert') returns Spark NLP model_anno_obj distilbert_embeddings_malaysian_distilbert_small\n", + "nlu.load('ms.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('mt.embed.camembert') returns Spark NLP model_anno_obj camembert_embeddings_camembert_aux_amandes\n", + "nlu.load('mt.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + 
"nlu.load('mwl.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('my.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('myv.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('mzn.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('nah.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('nap.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('nds.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ne.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('new.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('nl.embed') returns Spark NLP model_anno_obj dutch_cc_300d\n", + "nlu.load('nl.embed.bert') returns Spark NLP model_anno_obj bert_base_dutch_cased\n", + "nlu.load('nl.embed.bert.base_cased') returns Spark NLP model_anno_obj bert_base_cased\n", + "nlu.load('nl.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_dutch_cased\n", + "nlu.load('nl.embed.bert.cased_base.by_geotrend') returns Spark NLP model_anno_obj bert_embeddings_base_nl_cased\n", + "nlu.load('nl.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_nl_cased\n", + "nlu.load('nl.embed.robbert_v2_dutch_base') returns Spark NLP model_anno_obj 
roberta_embeddings_robbert_v2_dutch_base\n", + "nlu.load('nl.embed.robbertje_1_gb_bort') returns Spark NLP model_anno_obj roberta_embeddings_robbertje_1_gb_bort\n", + "nlu.load('nl.embed.robbertje_1_gb_merged') returns Spark NLP model_anno_obj roberta_embeddings_robbertje_1_gb_merged\n", + "nlu.load('nl.embed.robbertje_1_gb_non_shuffled') returns Spark NLP model_anno_obj roberta_embeddings_robbertje_1_gb_non_shuffled\n", + "nlu.load('nl.embed.robbertje_1_gb_shuffled') returns Spark NLP model_anno_obj roberta_embeddings_robbertje_1_gb_shuffled\n", + "nlu.load('nl.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_medroberta.nl\n", + "nlu.load('nl.embed.roberta.conll.v2_base') returns Spark NLP model_anno_obj roberta_embeddings_pdelobelle_robbert_v2_dutch_base\n", + "nlu.load('nl.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('nn.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('no.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_norbert\n", + "nlu.load('no.embed.bert.by_ltgoslo') returns Spark NLP model_anno_obj bert_embeddings_norbert2\n", + "nlu.load('no.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_no_cased\n", + "nlu.load('no.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('nso.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('oc.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('or.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('os.embed.w2v_cc_300d') returns Spark NLP 
model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('pa.embed.muril_adapted_local') returns Spark NLP model_anno_obj bert_embeddings_muril_adapted_local\n", + "nlu.load('pa.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('pcm.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_naija\n", + "For language NLU provides the following Models : \n", + "nlu.load('pfl.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('pl.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_pl_cased\n", + "nlu.load('pl.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_pl_cased\n", + "nlu.load('pl.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('pms.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('pnb.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ps.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('pt.embed.BR_BERTo') returns Spark NLP model_anno_obj roberta_embeddings_BR_BERTo\n", + "nlu.load('pt.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_portuguese_cased\n", + "nlu.load('pt.embed.bert.cased_base.by_geotrend') returns Spark NLP model_anno_obj bert_embeddings_base_pt_cased\n", + "nlu.load('pt.embed.bert_base_cased_pt_lenerbr') returns Spark NLP model_anno_obj bert_embeddings_bert_base_cased_pt_lenerbr\n", + "nlu.load('pt.embed.bert_base_gl_cased') returns Spark NLP model_anno_obj 
bert_embeddings_bert_base_gl_cased\n", + "nlu.load('pt.embed.bert_base_portuguese_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_portuguese_cased\n", + "nlu.load('pt.embed.bert_base_portuguese_cased_finetuned_peticoes') returns Spark NLP model_anno_obj bert_embeddings_bert_base_portuguese_cased_finetuned_peticoes\n", + "nlu.load('pt.embed.bert_base_portuguese_cased_finetuned_tcu_acordaos') returns Spark NLP model_anno_obj bert_embeddings_bert_base_portuguese_cased_finetuned_tcu_acordaos\n", + "nlu.load('pt.embed.bert_base_pt_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_pt_cased\n", + "nlu.load('pt.embed.bert_large_cased_pt_lenerbr') returns Spark NLP model_anno_obj bert_embeddings_bert_large_cased_pt_lenerbr\n", + "nlu.load('pt.embed.bert_large_portuguese_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_large_portuguese_cased\n", + "nlu.load('pt.embed.bert_small_gl_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_small_gl_cased\n", + "nlu.load('pt.embed.biobert') returns Spark NLP model_anno_obj bert_embeddings_biobertpt_all\n", + "nlu.load('pt.embed.biobert.by_pucpr') returns Spark NLP model_anno_obj bert_embeddings_biobertpt_bio\n", + "nlu.load('pt.embed.biobert.clinical.by_pucpr') returns Spark NLP model_anno_obj bert_embeddings_biobertpt_clin\n", + "nlu.load('pt.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_pt_cased\n", + "nlu.load('pt.embed.gs_all') returns Spark NLP model_anno_obj biobert_embeddings_all\n", + "nlu.load('pt.embed.gs_biomedical') returns Spark NLP model_anno_obj biobert_embeddings_biomedical\n", + "nlu.load('pt.embed.gs_clinical') returns Spark NLP model_anno_obj biobert_embeddings_clinical\n", + "nlu.load('pt.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('qu.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For 
language NLU provides the following Models : \n", + "nlu.load('rm.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ro.embed.ALR_BERT') returns Spark NLP model_anno_obj albert_embeddings_ALR_BERT\n", + "nlu.load('ro.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_base_cased\n", + "nlu.load('ro.embed.bert.cased_base.by_geotrend') returns Spark NLP model_anno_obj bert_embeddings_base_ro_cased\n", + "nlu.load('ro.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_ro_cased\n", + "nlu.load('ro.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ru.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_ru_cased\n", + "nlu.load('ru.embed.bert_base_ru_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_ru_cased\n", + "nlu.load('ru.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_ru_cased\n", + "nlu.load('ru.embed.roberta_base_russian_v0') returns Spark NLP model_anno_obj roberta_embeddings_roberta_base_russian_v0\n", + "nlu.load('ru.embed.ruRoberta_large') returns Spark NLP model_anno_obj roberta_embeddings_ruRoberta_large\n", + "nlu.load('ru.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('rw.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_kinyarwanda\n", + "For language NLU provides the following Models : \n", + "nlu.load('sa.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sah.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sc.embed.w2v_cc_300d') 
returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('scn.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sco.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sd.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sh.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('si.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_sinhalaberto\n", + "nlu.load('si.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sk.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_fernet_cc\n", + "nlu.load('sk.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_slovakbert\n", + "nlu.load('sk.embed.roberta.news.') returns Spark NLP model_anno_obj roberta_embeddings_fernet_news\n", + "nlu.load('sk.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sl.embed.camembert') returns Spark NLP model_anno_obj camembert_embeddings_sloberta\n", + "nlu.load('sl.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('so.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sq.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sr.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides 
the following Models : \n", + "nlu.load('su.embed.sundanese_roberta_base') returns Spark NLP model_anno_obj roberta_embeddings_sundanese_roberta_base\n", + "nlu.load('su.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sv.embed.bert.base_cased') returns Spark NLP model_anno_obj bert_base_cased\n", + "nlu.load('sv.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_kb_base_swedish_cased\n", + "nlu.load('sv.embed.bert.cased_base.by_kblab') returns Spark NLP model_anno_obj bert_embeddings_kblab_base_swedish_cased\n", + "nlu.load('sv.embed.bert.distilled_cased') returns Spark NLP model_anno_obj bert_embeddings_kb_distilled_cased\n", + "nlu.load('sv.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('sw.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_sw_cased\n", + "nlu.load('sw.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('sw.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_swahili\n", + "For language NLU provides the following Models : \n", + "nlu.load('ta.embed.muril_adapted_local') returns Spark NLP model_anno_obj bert_embeddings_muril_adapted_local\n", + "nlu.load('ta.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('te.embed.bert') returns Spark NLP model_anno_obj bert_embeddings_indic_transformers\n", + "nlu.load('te.embed.distil_bert') returns Spark NLP model_anno_obj distilbert_embeddings_indic_transformers\n", + "nlu.load('te.embed.distilbert') returns Spark NLP model_anno_obj distilbert_uncased\n", + "nlu.load('te.embed.indic_transformers_te_bert') returns Spark NLP model_anno_obj bert_embeddings_indic_transformers_te_bert\n", + "nlu.load('te.embed.indic_transformers_te_roberta') 
returns Spark NLP model_anno_obj roberta_embeddings_indic_transformers_te_roberta\n", + "nlu.load('te.embed.muril_adapted_local') returns Spark NLP model_anno_obj bert_embeddings_muril_adapted_local\n", + "nlu.load('te.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_indic_transformers\n", + "nlu.load('te.embed.telugu_bertu') returns Spark NLP model_anno_obj bert_embeddings_telugu_bertu\n", + "nlu.load('te.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('te.embed.xlmr_roberta') returns Spark NLP model_anno_obj xlmroberta_embeddings_indic_transformers_te_xlmroberta\n", + "For language NLU provides the following Models : \n", + "nlu.load('tg.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('th.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_th_cased\n", + "nlu.load('th.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_th_cased\n", + "nlu.load('th.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('tk.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('tl.embed.electra.cased_base') returns Spark NLP model_anno_obj electra_embeddings_electra_tagalog_base_cased_generator\n", + "nlu.load('tl.embed.electra.cased_small') returns Spark NLP model_anno_obj electra_embeddings_electra_tagalog_small_cased_generator\n", + "nlu.load('tl.embed.electra.uncased_base') returns Spark NLP model_anno_obj electra_embeddings_electra_tagalog_base_uncased_generator\n", + "nlu.load('tl.embed.electra.uncased_small') returns Spark NLP model_anno_obj electra_embeddings_electra_tagalog_small_uncased_generator\n", + "nlu.load('tl.embed.roberta.base') returns Spark NLP model_anno_obj 
roberta_embeddings_tagalog_base\n", + "nlu.load('tl.embed.roberta.large') returns Spark NLP model_anno_obj roberta_embeddings_tagalog_large\n", + "nlu.load('tl.embed.roberta_tagalog_base') returns Spark NLP model_anno_obj roberta_embeddings_roberta_tagalog_base\n", + "nlu.load('tl.embed.roberta_tagalog_large') returns Spark NLP model_anno_obj roberta_embeddings_roberta_tagalog_large\n", + "nlu.load('tl.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('tn.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_tswanabert\n", + "For language NLU provides the following Models : \n", + "nlu.load('tr.embed.bert') returns Spark NLP model_anno_obj bert_base_turkish_cased\n", + "nlu.load('tr.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_tr_cased\n", + "nlu.load('tr.embed.bert.uncased') returns Spark NLP model_anno_obj bert_base_turkish_uncased\n", + "nlu.load('tr.embed.bert_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_tr_cased\n", + "nlu.load('tr.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_tr_cased\n", + "nlu.load('tr.embed.electra.cased_base') returns Spark NLP model_anno_obj electra_embeddings_electra_base_turkish_mc4_cased_generator\n", + "nlu.load('tr.embed.electra.uncased_base') returns Spark NLP model_anno_obj electra_embeddings_electra_base_turkish_mc4_uncased_generator\n", + "nlu.load('tr.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('tt.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('ug.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('uk.embed.bert.cased_base') returns Spark NLP 
model_anno_obj bert_embeddings_base_uk_cased\n", + "nlu.load('uk.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_uk_cased\n", + "nlu.load('uk.embed.ukr_roberta_base') returns Spark NLP model_anno_obj roberta_embeddings_ukr_roberta_base\n", + "nlu.load('uk.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('uk.embed.xlmr_roberta.base') returns Spark NLP model_anno_obj xlmroberta_embeddings_xlm_roberta_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('ur.embed') returns Spark NLP model_anno_obj urduvec_140M_300d\n", + "nlu.load('ur.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_ur_cased\n", + "nlu.load('ur.embed.bert_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_ur_cased\n", + "nlu.load('ur.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_ur_cased\n", + "nlu.load('ur.embed.glove.300d') returns Spark NLP model_anno_obj urduvec_140M_300d\n", + "nlu.load('ur.embed.muril_adapted_local') returns Spark NLP model_anno_obj bert_embeddings_muril_adapted_local\n", + "nlu.load('ur.embed.roberta_urdu_small') returns Spark NLP model_anno_obj roberta_embeddings_roberta_urdu_small\n", + "nlu.load('ur.embed.urdu_vec_140M_300d') returns Spark NLP model_anno_obj urduvec_140M_300d\n", + "nlu.load('ur.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('uz.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('vec.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('vi.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_vi_cased\n", + "nlu.load('vi.embed.bert_cased') returns Spark NLP 
model_anno_obj bert_embeddings_bert_base_vi_cased\n", + "nlu.load('vi.embed.distilbert.cased') returns Spark NLP model_anno_obj distilbert_base_cased\n", + "nlu.load('vi.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('vls.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('vo.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('wa.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('war.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('wo.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_wolof\n", + "For language NLU provides the following Models : \n", + "nlu.load('xmf.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('xx.embed') returns Spark NLP model_anno_obj glove_840B_300\n", + "nlu.load('xx.embed.albert.indic') returns Spark NLP model_anno_obj albert_indic\n", + "nlu.load('xx.embed.bert') returns Spark NLP model_anno_obj bert_multi_cased\n", + "nlu.load('xx.embed.bert.muril') returns Spark NLP model_anno_obj bert_muril\n", + "nlu.load('xx.embed.bert_base_multilingual_cased') returns Spark NLP model_anno_obj bert_base_multilingual_cased\n", + "nlu.load('xx.embed.bert_base_multilingual_uncased') returns Spark NLP model_anno_obj bert_base_multilingual_uncased\n", + "nlu.load('xx.embed.bert_multi_cased') returns Spark NLP model_anno_obj bert_multi_cased\n", + "nlu.load('xx.embed.distilbert') returns Spark NLP model_anno_obj distilbert_base_multilingual_cased\n", + "nlu.load('xx.embed.glove.6B_300') returns Spark NLP 
model_anno_obj glove_6B_300\n", + "nlu.load('xx.embed.glove.840B_300') returns Spark NLP model_anno_obj glove_840B_300\n", + "nlu.load('xx.embed.glove.glove_6B_100') returns Spark NLP model_anno_obj glove_6B_100\n", + "nlu.load('xx.embed.mdeberta_v3_base') returns Spark NLP model_anno_obj mdeberta_v3_base\n", + "nlu.load('xx.embed.xlm') returns Spark NLP model_anno_obj xlm_roberta_base\n", + "nlu.load('xx.embed.xlm.base') returns Spark NLP model_anno_obj xlm_roberta_base\n", + "nlu.load('xx.embed.xlm.twitter') returns Spark NLP model_anno_obj twitter_xlm_roberta_base\n", + "nlu.load('xx.embed.xlm_roberta_large') returns Spark NLP model_anno_obj xlm_roberta_large\n", + "nlu.load('xx.embed.xlm_roberta_xtreme_base') returns Spark NLP model_anno_obj xlm_roberta_xtreme_base\n", + "nlu.load('xx.embed.xlmr_roberta.base') returns Spark NLP model_anno_obj xlmroberta_embeddings_afriberta_base\n", + "nlu.load('xx.embed.xlmr_roberta.large') returns Spark NLP model_anno_obj xlmroberta_embeddings_afriberta_large\n", + "nlu.load('xx.embed.xlmr_roberta.large.by_hfl') returns Spark NLP model_anno_obj xlmroberta_embeddings_cino_large\n", + "nlu.load('xx.embed.xlmr_roberta.large_128d') returns Spark NLP model_anno_obj xlmroberta_embeddings_roberta_large_eng_ara_128k\n", + "nlu.load('xx.embed.xlmr_roberta.mini_lm_mini') returns Spark NLP model_anno_obj xlmroberta_embeddings_fairlex_fscs_minilm\n", + "nlu.load('xx.embed.xlmr_roberta.small') returns Spark NLP model_anno_obj xlmroberta_embeddings_afriberta_small\n", + "nlu.load('xx.embed.xlmr_roberta.v2_base') returns Spark NLP model_anno_obj xlmroberta_embeddings_cino_base_v2\n", + "nlu.load('xx.embed.xlmr_roberta.v2_large') returns Spark NLP model_anno_obj xlmroberta_embeddings_cino_large_v2\n", + "nlu.load('xx.embed.xlmr_roberta.v2_small') returns Spark NLP model_anno_obj xlmroberta_embeddings_cino_small_v2\n", + "For language NLU provides the following Models : \n", + "nlu.load('yi.embed.w2v_cc_300d') returns Spark NLP model_anno_obj 
w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('yo.embed.bert.cased_multilingual_base_finetuned') returns Spark NLP model_anno_obj bert_embeddings_base_multilingual_cased_finetuned_yoruba\n", + "nlu.load('yo.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('yo.embed.xlm_roberta') returns Spark NLP model_anno_obj xlm_roberta_base_finetuned_yoruba\n", + "For language NLU provides the following Models : \n", + "nlu.load('zea.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "For language NLU provides the following Models : \n", + "nlu.load('zh.embed') returns Spark NLP model_anno_obj bert_base_chinese\n", + "nlu.load('zh.embed.bert') returns Spark NLP model_anno_obj bert_base_chinese\n", + "nlu.load('zh.embed.bert.10l_128d_128d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_10_h_128\n", + "nlu.load('zh.embed.bert.10l_256d_256d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_10_h_256\n", + "nlu.load('zh.embed.bert.10l_512d_512d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_10_h_512\n", + "nlu.load('zh.embed.bert.10l_768d_768d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_10_h_768\n", + "nlu.load('zh.embed.bert.12l_128d_128d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_12_h_128\n", + "nlu.load('zh.embed.bert.12l_256d_256d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_12_h_256\n", + "nlu.load('zh.embed.bert.12l_512d_512d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_12_h_512\n", + "nlu.load('zh.embed.bert.12l_768d_768d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_12_h_768\n", + "nlu.load('zh.embed.bert.2l_128d_128d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_2_h_128\n", + "nlu.load('zh.embed.bert.2l_256d_256d') returns Spark NLP model_anno_obj 
bert_embeddings_chinese_roberta_l_2_h_256\n", + "nlu.load('zh.embed.bert.2l_512d_512d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_2_h_512\n", + "nlu.load('zh.embed.bert.2l_768d_768d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_2_h_768\n", + "nlu.load('zh.embed.bert.4l_128d_128d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_4_h_128\n", + "nlu.load('zh.embed.bert.4l_256d_256d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_4_h_256\n", + "nlu.load('zh.embed.bert.4l_512d_512d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_4_h_512\n", + "nlu.load('zh.embed.bert.4l_768d_768d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_4_h_768\n", + "nlu.load('zh.embed.bert.6l_128d_128d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_6_h_128\n", + "nlu.load('zh.embed.bert.6l_256d_256d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_6_h_256\n", + "nlu.load('zh.embed.bert.6l_512d_512d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_6_h_512\n", + "nlu.load('zh.embed.bert.6l_768d_768d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_6_h_768\n", + "nlu.load('zh.embed.bert.8l_128d_128d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_8_h_128\n", + "nlu.load('zh.embed.bert.8l_256d_256d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_8_h_256\n", + "nlu.load('zh.embed.bert.8l_512d_512d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_8_h_512\n", + "nlu.load('zh.embed.bert.8l_768d_768d') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_l_8_h_768\n", + "nlu.load('zh.embed.bert.base') returns Spark NLP model_anno_obj bert_embeddings_base_chinese\n", + "nlu.load('zh.embed.bert.base.by_model_attribution_challenge') returns Spark NLP model_anno_obj bert_embeddings_model_attribution_challenge_base_chinese\n", 
+ "nlu.load('zh.embed.bert.base.by_ptrsxu') returns Spark NLP model_anno_obj bert_embeddings_ptrsxu_base_chinese\n", + "nlu.load('zh.embed.bert.by_ptrsxu') returns Spark NLP model_anno_obj bert_embeddings_ptrsxu_chinese_wwm_ext\n", + "nlu.load('zh.embed.bert.by_qinluo') returns Spark NLP model_anno_obj bert_embeddings_wo_chinese_plus\n", + "nlu.load('zh.embed.bert.cased_base') returns Spark NLP model_anno_obj bert_embeddings_base_zh_cased\n", + "nlu.load('zh.embed.bert.chinese_wwm') returns Spark NLP model_anno_obj bert_embeddings_chinese_wwm\n", + "nlu.load('zh.embed.bert.large') returns Spark NLP model_anno_obj bert_embeddings_chinese_lert_large\n", + "nlu.load('zh.embed.bert.large.by_hfl') returns Spark NLP model_anno_obj bert_embeddings_chinese_mac_large\n", + "nlu.load('zh.embed.bert.lert.base.by_hfl') returns Spark NLP model_anno_obj bert_embeddings_chinese_lert_base\n", + "nlu.load('zh.embed.bert.mac.base.by_hfl') returns Spark NLP model_anno_obj bert_embeddings_chinese_mac_base\n", + "nlu.load('zh.embed.bert.mini') returns Spark NLP model_anno_obj bert_embeddings_minirbt_h256\n", + "nlu.load('zh.embed.bert.mini.by_hfl') returns Spark NLP model_anno_obj bert_embeddings_minirbt_h288\n", + "nlu.load('zh.embed.bert.rbt4_h312.by_hfl') returns Spark NLP model_anno_obj bert_embeddings_rbt4_h312\n", + "nlu.load('zh.embed.bert.small') returns Spark NLP model_anno_obj bert_embeddings_chinese_lert_small\n", + "nlu.load('zh.embed.bert.wwm') returns Spark NLP model_anno_obj chinese_bert_wwm\n", + "nlu.load('zh.embed.bert.wwm_ext.by_hfl') returns Spark NLP model_anno_obj bert_embeddings_hfl_chinese_wwm_ext\n", + "nlu.load('zh.embed.bert_5lang_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_5lang_cased\n", + "nlu.load('zh.embed.bert_base_chinese_jinyong') returns Spark NLP model_anno_obj bert_embeddings_bert_base_chinese_jinyong\n", + "nlu.load('zh.embed.bert_base_zh_cased') returns Spark NLP model_anno_obj bert_embeddings_bert_base_zh_cased\n", + 
"nlu.load('zh.embed.bert_large_chinese') returns Spark NLP model_anno_obj bert_embeddings_bert_large_chinese\n", + "nlu.load('zh.embed.chinese_bert_wwm_ext') returns Spark NLP model_anno_obj bert_embeddings_chinese_bert_wwm_ext\n", + "nlu.load('zh.embed.chinese_macbert_base') returns Spark NLP model_anno_obj bert_embeddings_chinese_macbert_base\n", + "nlu.load('zh.embed.chinese_macbert_large') returns Spark NLP model_anno_obj bert_embeddings_chinese_macbert_large\n", + "nlu.load('zh.embed.chinese_roberta_wwm_ext') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_wwm_ext\n", + "nlu.load('zh.embed.chinese_roberta_wwm_ext_large') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_wwm_ext_large\n", + "nlu.load('zh.embed.chinese_roberta_wwm_large_ext_fix_mlm') returns Spark NLP model_anno_obj bert_embeddings_chinese_roberta_wwm_large_ext_fix_mlm\n", + "nlu.load('zh.embed.distilbert_base_cased') returns Spark NLP model_anno_obj distilbert_embeddings_distilbert_base_zh_cased\n", + "nlu.load('zh.embed.env_bert_chinese') returns Spark NLP model_anno_obj bert_embeddings_env_bert_chinese\n", + "nlu.load('zh.embed.jdt_fin_roberta_wwm') returns Spark NLP model_anno_obj bert_embeddings_jdt_fin_roberta_wwm\n", + "nlu.load('zh.embed.jdt_fin_roberta_wwm_large') returns Spark NLP model_anno_obj bert_embeddings_jdt_fin_roberta_wwm_large\n", + "nlu.load('zh.embed.macbert4csc_base_chinese') returns Spark NLP model_anno_obj bert_embeddings_macbert4csc_base_chinese\n", + "nlu.load('zh.embed.mengzi_bert_base') returns Spark NLP model_anno_obj bert_embeddings_mengzi_bert_base\n", + "nlu.load('zh.embed.mengzi_bert_base_fin') returns Spark NLP model_anno_obj bert_embeddings_mengzi_bert_base_fin\n", + "nlu.load('zh.embed.mengzi_oscar_base') returns Spark NLP model_anno_obj bert_embeddings_mengzi_oscar_base\n", + "nlu.load('zh.embed.mengzi_oscar_base_caption') returns Spark NLP model_anno_obj bert_embeddings_mengzi_oscar_base_caption\n", + 
"nlu.load('zh.embed.mengzi_oscar_base_retrieval') returns Spark NLP model_anno_obj bert_embeddings_mengzi_oscar_base_retrieval\n", + "nlu.load('zh.embed.rbt3') returns Spark NLP model_anno_obj bert_embeddings_rbt3\n", + "nlu.load('zh.embed.rbt4') returns Spark NLP model_anno_obj bert_embeddings_rbt4\n", + "nlu.load('zh.embed.rbt6') returns Spark NLP model_anno_obj bert_embeddings_rbt6\n", + "nlu.load('zh.embed.rbtl3') returns Spark NLP model_anno_obj bert_embeddings_rbtl3\n", + "nlu.load('zh.embed.roberta.wwm_ext.by_hfl') returns Spark NLP model_anno_obj bert_embeddings_hfl_chinese_roberta_wwm_ext\n", + "nlu.load('zh.embed.roberta_base_wechsel_chinese') returns Spark NLP model_anno_obj roberta_embeddings_roberta_base_wechsel_chinese\n", + "nlu.load('zh.embed.sikubert') returns Spark NLP model_anno_obj bert_embeddings_sikubert\n", + "nlu.load('zh.embed.sikuroberta') returns Spark NLP model_anno_obj bert_embeddings_sikuroberta\n", + "nlu.load('zh.embed.uer_large') returns Spark NLP model_anno_obj bert_embeddings_uer_large\n", + "nlu.load('zh.embed.w2v_cc_300d') returns Spark NLP model_anno_obj w2v_cc_300d\n", + "nlu.load('zh.embed.wobert_chinese_base') returns Spark NLP model_anno_obj bert_embeddings_wobert_chinese_base\n", + "nlu.load('zh.embed.wobert_chinese_plus') returns Spark NLP model_anno_obj bert_embeddings_wobert_chinese_plus\n", + "nlu.load('zh.embed.wobert_chinese_plus_base') returns Spark NLP model_anno_obj bert_embeddings_wobert_chinese_plus_base\n", + "nlu.load('zh.embed.xlmr_roberta.mini_lm_mini') returns Spark NLP model_anno_obj xlmroberta_embeddings_fairlex_cail_minilm\n", + "nlu.load('zh.embed.xlnet') returns Spark NLP model_anno_obj chinese_xlnet_base\n", + "For language NLU provides the following Models : \n", + "nlu.load('zu.embed.roberta') returns Spark NLP model_anno_obj roberta_embeddings_zuberta\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 303 + }, + "id": 
"Xz7xnvbCFxE3", + "outputId": "39864822-1fde-4fbf-c44c-d15e62d9707a" + }, + "source": [ + "# Add bert word embeddings to pipe\n", + "fitted_pipe = nlp.load('bert train.ner').fit(dataset_path=train_path)\n", + "\n", + "# predict with the trainable pipeline on dataset and get predictions\n", + "preds = fitted_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions')\n", + "preds" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n", + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document entities_ner \\\n", + "0 Donald Trump and Angela Merkel dont share many... Donald Trump \n", + "0 Donald Trump and Angela Merkel dont share many... Angela Merkel dont \n", + "\n", + " entities_ner_class entities_ner_confidence entities_ner_origin_chunk \\\n", + "0 PER 0.9427 0 \n", + "0 PER 0.9236667 1 \n", + "\n", + " entities_ner_origin_sentence \\\n", + "0 0 \n", + "0 0 \n", + "\n", + " word_embedding_bert \n", + "0 [[-0.44760167598724365, 1.0348622798919678, 0.... \n", + "0 [[-0.44760167598724365, 1.0348622798919678, 0.... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documententities_nerentities_ner_classentities_ner_confidenceentities_ner_origin_chunkentities_ner_origin_sentenceword_embedding_bert
0Donald Trump and Angela Merkel dont share many...Donald TrumpPER0.942700[[-0.44760167598724365, 1.0348622798919678, 0....
0Donald Trump and Angela Merkel dont share many...Angela Merkel dontPER0.923666710[[-0.44760167598724365, 1.0348622798919678, 0....
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 5. Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/classifier_dl_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": 10, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 6. Let's load the model from disk.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlp.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SO4uz45MoRgp", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 199 + }, + "outputId": "6987b2bc-9d70-4082-b8b5-915477ecb6e7" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions on laws about cheeseburgers')\n", + "preds" + ], + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " document entities_from_disk \\\n", + "0 Donald Trump and Angela Merkel dont share many... Donald Trump \n", + "0 Donald Trump and Angela Merkel dont share many... 
Angela Merkel dont \n", + "\n", + " entities_from_disk_class entities_from_disk_confidence \\\n", + "0 PER 0.9282 \n", + "0 PER 0.8248 \n", + "\n", + " entities_from_disk_origin_chunk entities_from_disk_origin_sentence \\\n", + "0 0 0 \n", + "0 1 0 \n", + "\n", + " word_embedding_from_disk \n", + "0 [[-0.687057375907898, 1.1118954420089722, 0.58... \n", + "0 [[-0.687057375907898, 1.1118954420089722, 0.58... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
documententities_from_diskentities_from_disk_classentities_from_disk_confidenceentities_from_disk_origin_chunkentities_from_disk_origin_sentenceword_embedding_from_disk
0Donald Trump and Angela Merkel dont share many...Donald TrumpPER0.928200[[-0.687057375907898, 1.1118954420089722, 0.58...
0Donald Trump and Angela Merkel dont share many...Angela Merkel dontPER0.824810[[-0.687057375907898, 1.1118954420089722, 0.58...
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 11 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e0CVlkk9v6Qi", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "be0a118f-72a6-4cd7-9968-2f644036f4fa" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentence_detector_dl'] has settable params:\n", + "component_list['sentence_detector_dl'].setCustomBounds([]) | Info: characters used to explicitly mark sentence bounds | Currently set to : []\n", + "component_list['sentence_detector_dl'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n", + "component_list['sentence_detector_dl'].setMaxLength(99999) | Info: Set the maximum allowed length for each sentence | Currently set to : 99999\n", + "component_list['sentence_detector_dl'].setMinLength(0) | Info: Set the minimum allowed length for each sentence. | Currently set to : 0\n", + "component_list['sentence_detector_dl'].setUseCustomBoundsOnly(False) | Info: Only utilize custom bounds in sentence detection | Currently set to : False\n", + "component_list['sentence_detector_dl'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentence_detector_dl'].setSplitLength(2147483647) | Info: length at which sentences will be forcibly split. 
| Currently set to : 2147483647\n", + "component_list['sentence_detector_dl'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n", + "component_list['sentence_detector_dl'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3d7fecba) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@3d7fecba\n", + "component_list['sentence_detector_dl'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates - list of strings which a sentence can't end with | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n", + "component_list['sentence_detector_dl'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n", + ">>> component_list['tokenizer'] has settable params:\n", + "component_list['tokenizer'].setCaseSensitiveExceptions(True) | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True\n", + "component_list['tokenizer'].setTargetPattern('\\S+') | Info: pattern to grab from text as token candidates. 
Defaults \\S+ | Currently set to : \\S+\n", + "component_list['tokenizer'].setMaxLength(99999) | Info: Set the maximum allowed length for each token | Currently set to : 99999\n", + "component_list['tokenizer'].setMinLength(0) | Info: Set the minimum allowed length for each token | Currently set to : 0\n", + ">>> component_list['bert_embeddings@small_bert_L2_128'] has settable params:\n", + "component_list['bert_embeddings@small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['bert_embeddings@small_bert_L2_128'].setCaseSensitive(False) | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False\n", + "component_list['bert_embeddings@small_bert_L2_128'].setDimension(128) | Info: Number of embedding dimensions | Currently set to : 128\n", + "component_list['bert_embeddings@small_bert_L2_128'].setMaxSentenceLength(128) | Info: Max sentence length to process | Currently set to : 128\n", + "component_list['bert_embeddings@small_bert_L2_128'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['bert_embeddings@small_bert_L2_128'].setStorageRef('small_bert_L2_128') | Info: unique reference name for identification | Currently set to : small_bert_L2_128\n", + ">>> component_list['ner_dl@small_bert_L2_128'] has settable params:\n", + "component_list['ner_dl@small_bert_L2_128'].setBatchSize(8) | Info: Size of every batch | Currently set to : 8\n", + "component_list['ner_dl@small_bert_L2_128'].setIncludeAllConfidenceScores(False) | Info: whether to include all confidence scores in annotation metadata or just the score of the predicted tag | Currently set to : False\n", + "component_list['ner_dl@small_bert_L2_128'].setIncludeConfidence(True) | Info: whether to include confidence scores in annotation metadata | Currently set to : True\n", + "component_list['ner_dl@small_bert_L2_128'].setEngine('tensorflow') | Info: Deep 
Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['ner_dl@small_bert_L2_128'].setClasses(['O', 'B-ORG', 'I-ORG', 'I-MISC', 'I-PER', 'B-LOC', 'B-MISC', 'I-LOC']) | Info: get the tags used to trained this NerDLModel | Currently set to : ['O', 'B-ORG', 'I-ORG', 'I-MISC', 'I-PER', 'B-LOC', 'B-MISC', 'I-LOC']\n", + "component_list['ner_dl@small_bert_L2_128'].setStorageRef('small_bert_L2_128') | Info: unique reference name for identification | Currently set to : small_bert_L2_128\n", + ">>> component_list['ner_converter'] has settable params:\n", + "component_list['ner_converter'].setNerHasNoSchema(False) | Info: set this to true if your NER tags coming from a model that does not have a IOB/IOB2 schema | Currently set to : False\n", + "component_list['ner_converter'].setPreservePosition(True) | Info: Whether to preserve the original position of the tokens in the original document or use the modified tokens | Currently set to : True\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "USD6d66Sw6_P" + }, + "source": [], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/examples/colab/Training/part_of_speech/NLU_training_POS_demo.ipynb b/examples/colab/Training/part_of_speech/NLU_training_POS_demo.ipynb index a6a23e1a..ff85dd72 100644 --- a/examples/colab/Training/part_of_speech/NLU_training_POS_demo.ipynb +++ b/examples/colab/Training/part_of_speech/NLU_training_POS_demo.ipynb @@ -1 +1,891 @@ -{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLU_training_POS_demo.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"zkufh760uvF3"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/part_of_speech/NLU_training_POS_demo.ipynb)\n","\n","\n","\n","# Training a Named Entity Recognition (POS) model with NLU \n","With the [POS tagger](https://nlp.johnsnowlabs.com/docs/en/annotators#postagger-part-of-speech-tagger) from Spark NLP you can achieve State Of the Art results on any POS problem.\n","It uses an Averaged Percetron Model approach under the hood.\n","\n","This notebook showcases the following features : \n","\n","- How to train the deep learning POS classifier\n","- How to store a pipeline to disk\n","- How to load the pipeline from disk (Enables NLU offline mode)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"dur2drhW5Rvi"},"source":["# 1. Install Java 8 and NLU"]},{"cell_type":"code","metadata":{"id":"hFGnBCHavltY"},"source":["import os\n","! apt-get update -qq > /dev/null \n","# Install java\n","! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null\n","os.environ[\"JAVA_HOME\"] = \"/usr/lib/jvm/java-8-openjdk-amd64\"\n","os.environ[\"PATH\"] = os.environ[\"JAVA_HOME\"] + \"/bin:\" + os.environ[\"PATH\"]\n","! pip install nlu pyspark==2.4.7 > /dev/null \n","\n","import nlu"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"IWp5LbydCkqC"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"f4KkTfnR5Ugg"},"source":["# 2. Download French POS dataset"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OrVb5ZMvvrQD","executionInfo":{"status":"ok","timestamp":1607932039873,"user_tz":-60,"elapsed":80981,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"76f3b769-a646-444b-fdfc-d764d4b74e45"},"source":["! 
wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/fr/pos/UD_French/UD_French-GSD_2.3.txt"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2020-12-14 07:47:19-- https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/fr/pos/UD_French/UD_French-GSD_2.3.txt\n","Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.143.238\n","Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.143.238|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 3565213 (3.4M) [text/plain]\n","Saving to: ‘UD_French-GSD_2.3.txt’\n","\n","UD_French-GSD_2.3.t 100%[===================>] 3.40M 15.8MB/s in 0.2s \n","\n","2020-12-14 07:47:19 (15.8 MB/s) - ‘UD_French-GSD_2.3.txt’ saved [3565213/3565213]\n","\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"0296Om2C5anY"},"source":["# 3. Train Deep Learning Classifier using nlu.load('train.pos')\n","\n","You dataset label column should be named 'y' and the feature column with text data should be named 'text'"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"3ZIPkRkWftBG","executionInfo":{"status":"ok","timestamp":1607932112061,"user_tz":-60,"elapsed":153158,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"c6032381-0446-484a-8c4e-0ad9fc500c48"},"source":["import nlu\n","# load a trainable pipeline by specifying the train. 
prefix and fit it on a datset with label and text columns\n","# Since there are no\n","train_path = '/content/UD_French-GSD_2.3.txt'\n","trainable_pipe = nlu.load('train.pos')\n","fitted_pipe = trainable_pipe.fit(dataset_path=train_path)\n","\n","# predict with the trainable pipeline on dataset and get predictions\n","preds = fitted_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions')\n","preds"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
tokenpos
origin_index
0DonaldPROPN
0TrumpPROPN
0andCCONJ
0AngelaPROPN
0MerkelPROPN
0dontPRON
0shareVERB
0manyADJ
0oppinionsNOUN
\n","
"],"text/plain":[" token pos\n","origin_index \n","0 Donald PROPN\n","0 Trump PROPN\n","0 and CCONJ\n","0 Angela PROPN\n","0 Merkel PROPN\n","0 dont PRON\n","0 share VERB\n","0 many ADJ\n","0 oppinions NOUN"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"markdown","metadata":{"id":"2BB-NwZUoHSe"},"source":["# 4. Lets save the model"]},{"cell_type":"code","metadata":{"id":"eLex095goHwm","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1607932114637,"user_tz":-60,"elapsed":155726,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"24d34ea2-dcc1-42b2-a5c6-10d345b76a3c"},"source":["stored_model_path = './models/pos_trained' \n","fitted_pipe.save(stored_model_path)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Stored model in ./models/pos_trained\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"e_b2DPd4rCiU"},"source":["# 5. Lets load the model from HDD.\n","This makes Offlien NLU usage possible! 
\n","You need to call nlu.load(path=path_to_the_pipe) to load a model/pipeline from disk."]},{"cell_type":"code","metadata":{"id":"SO4uz45MoRgp","colab":{"base_uri":"https://localhost:8080/","height":485},"executionInfo":{"status":"ok","timestamp":1607932120301,"user_tz":-60,"elapsed":161383,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"db790b35-a51d-4226-8a0b-bb3e9e39e368"},"source":["hdd_pipe = nlu.load(path=stored_model_path)\n","\n","preds = hdd_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions on laws about cheeseburgers')\n","preds"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fitting on empty Dataframe, could not infer correct training method!\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
tokenpos
origin_index
0DonaldPROPN
0TrumpPROPN
0andCCONJ
0AngelaPROPN
0MerkelPROPN
0dontPRON
0shareVERB
0manyADJ
0oppinionsNOUN
0onPRON
0lawsVERB
0aboutADV
0cheeseburgersNOUN
\n","
"],"text/plain":[" token pos\n","origin_index \n","0 Donald PROPN\n","0 Trump PROPN\n","0 and CCONJ\n","0 Angela PROPN\n","0 Merkel PROPN\n","0 dont PRON\n","0 share VERB\n","0 many ADJ\n","0 oppinions NOUN\n","0 on PRON\n","0 laws VERB\n","0 about ADV\n","0 cheeseburgers NOUN"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"code","metadata":{"id":"e0CVlkk9v6Qi","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1607932120301,"user_tz":-60,"elapsed":161374,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"6bb7769e-f545-40b8-f0ef-90fd9f32c149"},"source":["hdd_pipe.print_info()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",">>> pipe['document_assembler'] has settable params:\n","pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",">>> pipe['sentence_detector'] has settable params:\n","pipe['sentence_detector'].setCustomBounds([]) | Info: characters used to explicitly mark sentence bounds | Currently set to : []\n","pipe['sentence_detector'].setDetectLists(True) | Info: whether detect lists during sentence detection | Currently set to : True\n","pipe['sentence_detector'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n","pipe['sentence_detector'].setMaxLength(99999) | Info: Set the maximum allowed length for each sentence | Currently set to : 99999\n","pipe['sentence_detector'].setMinLength(0) | Info: Set the minimum allowed length for each sentence. 
| Currently set to : 0\n","pipe['sentence_detector'].setUseAbbreviations(True) | Info: whether to apply abbreviations at sentence detection | Currently set to : True\n","pipe['sentence_detector'].setUseCustomBoundsOnly(False) | Info: Only utilize custom bounds in sentence detection | Currently set to : False\n",">>> pipe['regex_tokenizer'] has settable params:\n","pipe['regex_tokenizer'].setCaseSensitiveExceptions(True) | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True\n","pipe['regex_tokenizer'].setTargetPattern('\\S+') | Info: pattern to grab from text as token candidates. Defaults \\S+ | Currently set to : \\S+\n","pipe['regex_tokenizer'].setMaxLength(99999) | Info: Set the maximum allowed length for each token | Currently set to : 99999\n","pipe['regex_tokenizer'].setMinLength(0) | Info: Set the minimum allowed length for each token | Currently set to : 0\n",">>> pipe['sentiment_dl'] has settable params:\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"o3jCHbIsMZrn"},"source":[""],"execution_count":null,"outputs":[]}]} \ No newline at end of file +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "zkufh760uvF3" + }, + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/Training/part_of_speech/NLU_training_POS_demo.ipynb)\n", + "\n", + "\n", + "\n", + "# Training a Part of Speech (POS) model with NLU\n", + "With the [POS tagger](https://nlp.johnsnowlabs.com/docs/en/annotators#postagger-part-of-speech-tagger) from Spark NLP you can achieve state-of-the-art results on any POS problem.\n", + "It uses an Averaged
Perceptron model approach under the hood.\n", + "\n", + "This notebook showcases the following features:\n", + "\n", + "- How to train the POS tagger\n", + "- How to store a pipeline to disk\n", + "- How to load the pipeline from disk (enables NLU offline mode)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dur2drhW5Rvi" + }, + "source": [ + "# 1. Colab Setup" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hFGnBCHavltY" + }, + "source": [ + "# Install the johnsnowlabs library\n", + "! pip install -q johnsnowlabs" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4KkTfnR5Ugg" + }, + "source": [ + "# 2. Download French POS dataset" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "OrVb5ZMvvrQD", + "outputId": "3a9d194f-c90e-4159-f5c2-6603697e596a" + }, + "source": [ + "! wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/fr/pos/UD_French/UD_French-GSD_2.3.txt" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2023-10-27 13:16:24-- https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/fr/pos/UD_French/UD_French-GSD_2.3.txt\n", + "Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.217.10.150, 54.231.138.136, 52.217.116.224, ...\n", + "Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.217.10.150|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 3565213 (3.4M) [text/plain]\n", + "Saving to: ‘UD_French-GSD_2.3.txt’\n", + "\n", + "UD_French-GSD_2.3.t 100%[===================>] 3.40M 14.4MB/s in 0.2s \n", + "\n", + "2023-10-27 13:16:25 (14.4 MB/s) - ‘UD_French-GSD_2.3.txt’ saved [3565213/3565213]\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0296Om2C5anY" + }, + "source": [ + "# 3. 
Train the POS tagger using nlp.load('train.pos')\n", + "\n", + "Your dataset label column should be named 'y' and the feature column with text data should be named 'text'" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 418 + }, + "id": "3ZIPkRkWftBG", + "outputId": "8eeec2a8-5eeb-4303-d168-e25ac2b0da9d" + }, + "source": [ + "from johnsnowlabs import nlp\n", + "# load a trainable pipeline by specifying the train. prefix and fit it on a dataset with label and text columns\n", + "# Since there are no\n", + "train_path = '/content/UD_French-GSD_2.3.txt'\n", + "trainable_pipe = nlp.load('train.pos')\n", + "fitted_pipe = trainable_pipe.fit(dataset_path=train_path)\n", + "\n", + "# predict with the fitted pipeline and get predictions\n", + "preds = fitted_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions')\n", + "preds" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " pos token\n", + "0 9_NUM Donald\n", + "0 1_NUM Trump\n", + "0 6_NUM and\n", + "0 7_NUM Angela\n", + "0 7_NUM Merkel\n", + "0 7_NUM dont\n", + "0 7_NUM share\n", + "0 6_NUM many\n", + "0 940_NUM oppinions" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
postoken
09_NUMDonald
01_NUMTrump
06_NUMand
07_NUMAngela
07_NUMMerkel
07_NUMdont
07_NUMshare
06_NUMmany
0940_NUMoppinions
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BB-NwZUoHSe" + }, + "source": [ + "# 4. Let's save the model" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eLex095goHwm" + }, + "source": [ + "stored_model_path = './models/pos_trained'\n", + "fitted_pipe.save(stored_model_path)" + ], + "execution_count": 4, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e_b2DPd4rCiU" + }, + "source": [ + "# 5. Let's load the model from disk.\n", + "This makes offline NLU usage possible! \n", + "You need to call nlp.load(path=path_to_the_pipe) to load a model/pipeline from disk." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "SO4uz45MoRgp", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 509 + }, + "outputId": "d93e1dd6-71dd-45cb-9ca5-69c5002f94f0" + }, + "source": [ + "hdd_pipe = nlp.load(path=stored_model_path)\n", + "\n", + "preds = hdd_pipe.predict('Donald Trump and Angela Merkel dont share many oppinions on laws about cheeseburgers')\n", + "preds" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " pos token\n", + "0 9_NUM Donald\n", + "0 1_NUM Trump\n", + "0 6_NUM and\n", + "0 7_NUM Angela\n", + "0 7_NUM Merkel\n", + "0 7_NUM dont\n", + "0 7_NUM share\n", + "0 7_NUM many\n", + "0 7_NUM oppinions\n", + "0 7_NUM on\n", + "0 7_NUM laws\n", + "0 6_NUM about\n", + "0 940_NUM cheeseburgers" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
postoken
09_NUMDonald
01_NUMTrump
06_NUMand
07_NUMAngela
07_NUMMerkel
07_NUMdont
07_NUMshare
07_NUMmany
07_NUMoppinions
07_NUMon
07_NUMlaws
06_NUMabout
0940_NUMcheeseburgers
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 5 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "e0CVlkk9v6Qi", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "1d002cad-1b60-4ef8-8b42-90a4a34c97b5" + }, + "source": [ + "hdd_pipe.print_info()" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", + ">>> component_list['document_assembler'] has settable params:\n", + "component_list['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n", + ">>> component_list['sentence_detector_dl'] has settable params:\n", + "component_list['sentence_detector_dl'].setCustomBounds([]) | Info: characters used to explicitly mark sentence bounds | Currently set to : []\n", + "component_list['sentence_detector_dl'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n", + "component_list['sentence_detector_dl'].setMaxLength(99999) | Info: Set the maximum allowed length for each sentence | Currently set to : 99999\n", + "component_list['sentence_detector_dl'].setMinLength(0) | Info: Set the minimum allowed length for each sentence. | Currently set to : 0\n", + "component_list['sentence_detector_dl'].setUseCustomBoundsOnly(False) | Info: Only utilize custom bounds in sentence detection | Currently set to : False\n", + "component_list['sentence_detector_dl'].setEngine('tensorflow') | Info: Deep Learning engine used for this model | Currently set to : tensorflow\n", + "component_list['sentence_detector_dl'].setSplitLength(2147483647) | Info: length at which sentences will be forcibly split. 
| Currently set to : 2147483647\n", + "component_list['sentence_detector_dl'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97') | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97\n", + "component_list['sentence_detector_dl'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@1bf9ccd7) | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@1bf9ccd7\n", + "component_list['sentence_detector_dl'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']) | Info: Impossible penultimates - list of strings which a sentence can't end with | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']\n", + "component_list['sentence_detector_dl'].setModelArchitecture('cnn') | Info: Model architecture (CNN) | Currently set to : cnn\n", + ">>> component_list['tokenizer'] has settable params:\n", + "component_list['tokenizer'].setCaseSensitiveExceptions(True) | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True\n", + "component_list['tokenizer'].setTargetPattern('\\S+') | Info: pattern to grab from text as token candidates. 
Defaults \\S+ | Currently set to : \\S+\n", + "component_list['tokenizer'].setMaxLength(99999) | Info: Set the maximum allowed length for each token | Currently set to : 99999\n", + "component_list['tokenizer'].setMinLength(0) | Info: Set the minimum allowed length for each token | Currently set to : 0\n", + ">>> component_list['pos'] has settable params:\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "o3jCHbIsMZrn" + }, + "source": [], + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/examples/colab/spark_nlp_utilities/NLU_utils_for_Spark_NLP.ipynb b/examples/colab/spark_nlp_utilities/NLU_utils_for_Spark_NLP.ipynb index b1817d4e..8eb56d1d 100644 --- a/examples/colab/spark_nlp_utilities/NLU_utils_for_Spark_NLP.ipynb +++ b/examples/colab/spark_nlp_utilities/NLU_utils_for_Spark_NLP.ipynb @@ -1,1172 +1,1498 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "name": "NLU_utils_for_Spark_NLP.ipynb", - "provenance": [], - "collapsed_sections": [] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", - "\n", - "\n", - "\n", - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/spark_nlp_utilities/NLU_utils_for_Spark_NLP.ipynb)\n", - "# NLU utilities for Spark NLP\n", - "This notebook showcases various utils provided for Spark NLP by NLU" - ], - "metadata": { - "id": "CUpEfC3JcETa", - "pycharm": { - "name": "#%% md\n" - } - } - }, - { - "cell_type": "markdown", - "source": [ - "## Install and Authorize" - ], - "metadata": { - "id": "11sKij2tcQKo", - "pycharm": { - "name": "#%% md\n" - } - } - }, - { - "cell_type": "code", - "source": [ - "%%capture\n", - "import 
nlu\n", - "! pip install nlu pyspark==3.1.1\n", - "SPARK_NLP_LICENSE = \"YOUR SECRETS HERE\"\n", - "AWS_ACCESS_KEY_ID = \"YOUR SECRETS HERE\"\n", - "AWS_SECRET_ACCESS_KEY = \"YOUR SECRETS HERE\"\n", - "JSL_SECRET = \"YOUR SECRETS HERE\"\n", - "nlu.auth(SPARK_NLP_LICENSE, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, JSL_SECRET)" - ], - "metadata": { - "id": "WN9uZX9z2KDU", - "pycharm": { - "name": "#%%\n" - } - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## nlu.viz(pipe,data) \n", - "\n", - "Visualize input data with an already configured Spark NLP pipeline, \n", - "for Algorithms of type (Ner,Assertion, Relation, Resolution, Dependency) \n", - "using [Spark NLP Display](https://nlp.johnsnowlabs.com/docs/en/display) \n", - "Automatically infers applicable viz type and output columns to use for visualization. \n", - "\n", - "\n", - "\n", - "If a pipeline has multiple models candidates that can be used for a viz, \n", - "the first Annotator that is vizzable will be used to create viz. \n", - "You can specify which type of viz to create with the viz_type parameter \n", - " \n", - "Output columns to use for the viz are automatically deducted from the pipeline, by using the \n", - "first annotator that provides the correct output type for a specific viz. \n", - "You can specify which columns to use for a viz by using the \n", - "corresponding ner_col, pos_col, dep_untyped_col, dep_typed_col, resolution_col, relation_col, assertion_col, parameters.\n" - ], - "metadata": { - "id": "GpG5Bn3JarDD", - "pycharm": { - "name": "#%% md\n" - } - } - }, - { - "cell_type": "code", - "source": [ - "# works with Pipeline, LightPipeline, PipelineModel, List[Annotator] \n", - "from sparknlp.pretrained import PretrainedPipeline, LightPipeline\n", - "\n", - "ade_pipeline = PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')\n", - "text = \"\"\"I have an allergic reaction to vancomycin. 
\n", - " My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. \n", - " I would not recommend this dr\t- new conversion tool\n" - ], - "metadata": { + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { "colab": { - "base_uri": "https://localhost:8080/", - "height": 295 + "provenance": [] }, - "id": "gX5c-KFAFsO1", - "outputId": "609bdd0f-2a40-4018-d070-bd8592c4c8d8", - "pycharm": { - "name": "#%%\n" + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" } - }, - "execution_count": 26, - "outputs": [ + }, + "cells": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "explain_clinical_doc_ade download started this may take some time.\n", - "Approx size to download 462.2 MB\n", - "[OK!]\n" - ] + "cell_type": "markdown", + "source": [ + "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", + "\n", + "\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/spark_nlp_utilities/NLU_utils_for_Spark_NLP.ipynb)\n", + "# NLU utilities for Spark NLP\n", + "This notebook showcases various utils provided for Spark NLP by NLU" + ], + "metadata": { + "id": "CUpEfC3JcETa", + "pycharm": { + "name": "#%% md\n" + } + } }, { - "output_type": "display_data", - "data": { - "text/plain": [ - "" + "cell_type": "markdown", + "source": [ + "## Colab Setup" ], - "text/html": [ - "\n", - "\n", - " I have an allergic reaction ADEpresent to vancomycin.
My skin has be
itchy ADEpresent , sore throat/burning/itchy ADEpresent , and numbness in tongue and gums ADEpresent .
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.
" - ] - }, - "metadata": {} - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## nlu.to_pretty_df(pipe,data) \n", - "\n", - "Annotates a Pandas Dataframe/Pandas Series/Numpy Array/Spark DataFrame/Python List strings /Python String \n", - "with given Spark NLP pipeline, which is assumed to be complete and runnable and returns it in a pythonic pandas dataframe format.\n", - "\n", - "\n", - "Annotators are grouped internally by NLU into output levels `token`,`sentence`, `document`,`chunk` and `relation`\n", - "Same level annotators output columns are zipped and exploded together to create the final output df. \n", - "Additionally, most keys from the metadata dictionary in the result annotations will be collected and expanded into their own columns in the resulting Dataframe, with special handling for Annotators that encode multiple metadata fields inside of one, seperated by strings like `|||` or `:::`.\n", - "Some columns are omitted from metadata to reduce total amount of output columns, these can be re-enabled by setting `metadata=True`\n", - "\n", - "For a given pipeline output level is automatically set to the last anntators output level by default.\n", - "This can be changed by defining `to_preddty_df(pipe,text,output_level='my_level'` for levels `token`,`sentence`, `document`,`chunk` and `relation` . " - ], - "metadata": { - "id": "jqXbQZoZa4fA", - "pycharm": { - "name": "#%% md\n" - } - } - }, - { - "cell_type": "code", - "source": [ - "# works with Pipeline, LightPipeline, PipelineModel, List[Annotator] \n", - "\n", - "text = \"\"\"I have an allergic reaction to vancomycin. \n", - " My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. 
\n", - " I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.\"\"\"\n", - "ade_assert_cols = ['assertion', 'entities_ner_chunks_ade_assertion',\t 'entities_ner_chunks_ade_assertion_class','assertion_confidence']\n", - "df = nlu.to_pretty_df(ade_pipeline,text)\n", - "df[ade_assert_cols]\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 206 + "metadata": { + "id": "11sKij2tcQKo", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "!pip install -q johnsnowlabs" + ], + "metadata": { + "id": "WN9uZX9z2KDU", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from google.colab import files\n", + "print('Please Upload your John Snow Labs License using the button below')\n", + "license_keys = files.upload()" + ], + "metadata": { + "id": "Ke87QahOJQpM" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from johnsnowlabs import nlp\n", + "\n", + "# After uploading your license run this to install all licensed Python Wheels and pre-download Jars the Spark Session JVM\n", + "nlp.install()" + ], + "metadata": { + "id": "p_-OAEtMJmD9" + }, + "execution_count": null, + "outputs": [] }, - "id": "amg15blSGANo", - "outputId": "82985bb1-0c98-4ada-c3a9-cc67f2d975f3", - "pycharm": { - "name": "#%%\n" - } - }, - "execution_count": 66, - "outputs": [ { - "output_type": "execute_result", - "data": { - "text/plain": [ - " assertion entities_ner_chunks_ade_assertion \\\n", - "0 present allergic reaction \n", - "0 present itchy \n", - "0 present sore throat/burning/itchy \n", - "0 present numbness in tongue and gums \n", - "0 NaN NaN \n", - "\n", - " entities_ner_chunks_ade_assertion_class assertion_confidence \n", - "0 ADE 0.998 \n", - "0 ADE 0.8414 \n", - "0 ADE 0.9019 \n", - "0 ADE 0.9991 \n", - "0 NaN NaN " + 
"cell_type": "code", + "source": [ + "from johnsnowlabs import nlp\n", + "spark=nlp.start()" ], - "text/html": [ - "\n", - "
\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
assertionentities_ner_chunks_ade_assertionentities_ner_chunks_ade_assertion_classassertion_confidence
0presentallergic reactionADE0.998
0presentitchyADE0.8414
0presentsore throat/burning/itchyADE0.9019
0presentnumbness in tongue and gumsADE0.9991
0NaNNaNNaNNaN
\n", - "
\n", - " \n", - " \n", - " \n", - "\n", - " \n", - "
\n", - "
\n", - " " + "metadata": { + "id": "B_xUxj9QJpjX" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## nlu.viz(pipe,data)\n", + "\n", + "Visualize input data with an already configured Spark NLP pipeline, \n", + "for Algorithms of type (Ner,Assertion, Relation, Resolution, Dependency) \n", + "using [Spark NLP Display](https://nlp.johnsnowlabs.com/docs/en/display) \n", + "Automatically infers applicable viz type and output columns to use for visualization. \n", + "\n", + "\n", + "\n", + "If a pipeline has multiple model candidates that can be used for a viz, \n", + "the first Annotator that is vizzable will be used to create viz. \n", + "You can specify which type of viz to create with the viz_type parameter \n", + " \n", + "Output columns to use for the viz are automatically deduced from the pipeline, by using the \n", + "first annotator that provides the correct output type for a specific viz. \n", + "You can specify which columns to use for a viz by using the \n", + "corresponding ner_col, pos_col, dep_untyped_col, dep_typed_col, resolution_col, relation_col, assertion_col, parameters.\n" + ], + "metadata": { + "id": "GpG5Bn3JarDD", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "from johnsnowlabs import nlp\n", + "\n", + "ade_pipeline = nlp.PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')\n", + "\n", + "text = \"\"\"I have an allergic reaction to vancomycin.\n", + " My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums.\n", + " I would not recommend this dr\t- new conversion tool\"\"\"\n", + "nlp.nlu.viz(ade_pipeline,text)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 330 + }, + "id": "gX5c-KFAFsO1", + "outputId": "ba32dcd3-935f-4e28-cfac-5c5068918002", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "name":
"stdout", + "text": [ + "explain_clinical_doc_ade download started this may take some time.\n", + "Approx size to download 462.6 MB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "\n", + " I have an allergic reaction ADEpresent to vancomycin.
My skin has be
itchy ADEpresent , sore throat/burning/itchy ADEpresent , and numbness in tongue and gums ADEpresent .
I would not recommend this dr\t- new conversion tool
" + ] + }, + "metadata": {} + } ] - }, - "metadata": {}, - "execution_count": 66 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## nlu.to_nlu_pipe(pipe)\n", - "\n", - "Convert a pipeline or list of annotators into a NLU pipeline making `.predict()` and `.viz()` avaiable for every Spark NLP pipeline.\n", - "Assumes the pipeline is already runnable.\n" - ], - "metadata": { - "id": "KI5u-6aia8bo", - "pycharm": { - "name": "#%% md\n" - } - } - }, - { - "cell_type": "code", - "source": [ - "\n", - "text = \"\"\"I have an allergic reaction to vancomycin. \n", - " My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. \n", - " I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.\"\"\"\n", - "nlu_pipe = nlu.to_nlu_pipe(ade_pipeline)\n", - "nlu_pipe.viz(text)\n", - "nlu_pipe.predict(text)[ade_assert_cols]\n", - "\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 432 }, - "id": "xuZL1uTmF5SJ", - "outputId": "3ffc84d3-4da4-4a72-bedb-3831dc31b6dd", - "pycharm": { - "name": "#%%\n" - } - }, - "execution_count": 69, - "outputs": [ { - "output_type": "display_data", - "data": { - "text/plain": [ - "" + "cell_type": "markdown", + "source": [ + "## nlu.to_pretty_df(pipe,data)\n", + "\n", + "Annotates a Pandas Dataframe/Pandas Series/Numpy Array/Spark DataFrame/Python List strings /Python String \n", + "with given Spark NLP pipeline, which is assumed to be complete and runnable and returns it in a pythonic pandas dataframe format.\n", + "\n", + "\n", + "Annotators are grouped internally by NLU into output levels `token`,`sentence`, `document`,`chunk` and `relation`\n", + "Same level annotators output columns are zipped and exploded together to create the final output df.\n", + "Additionally, most keys from the metadata dictionary in the result annotations will be collected and expanded into their own columns in the resulting 
Dataframe, with special handling for Annotators that encode multiple metadata fields inside of one, separated by strings like `|||` or `:::`.\n", + "Some columns are omitted from metadata to reduce total amount of output columns, these can be re-enabled by setting `metadata=True`\n", + "\n", + "For a given pipeline, the output level is automatically set to the last annotator's output level by default.\n", + "This can be changed by defining `to_pretty_df(pipe,text,output_level='my_level')` for levels `token`,`sentence`, `document`,`chunk` and `relation` ." + ], + "metadata": { + "id": "jqXbQZoZa4fA", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "# works with Pipeline, LightPipeline, PipelineModel, List[Annotator]\n", + "\n", + "text = \"\"\"I have an allergic reaction to vancomycin.\n", + " My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums.\n", + " I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.\"\"\"\n", + "ade_assert_cols = ['assertion', 'entities_ner_chunks_ade_assertion',\t 'entities_ner_chunks_ade_assertion_class','assertion_confidence']\n", + "df = nlp.to_pretty_df(ade_pipeline,text)\n", + "df[ade_assert_cols]\n" ], - "text/html": [ - "\n", - "\n", - " I have an allergic reaction ADEpresent to vancomycin.
My skin has be
itchy ADEpresent , sore throat/burning/itchy ADEpresent , and numbness in tongue and gums ADEpresent .
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.
" + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 241 + }, + "id": "amg15blSGANo", + "outputId": "851139e4-3ee6-43e9-cddc-2081596ea650", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " assertion entities_ner_chunks_ade_assertion \\\n", + "0 present allergic reaction \n", + "0 present itchy \n", + "0 present sore throat/burning/itchy \n", + "0 present numbness in tongue and gums \n", + "0 NaN NaN \n", + "\n", + " entities_ner_chunks_ade_assertion_class assertion_confidence \n", + "0 ADE 0.9532 \n", + "0 ADE 0.9641 \n", + "0 ADE 0.9184 \n", + "0 ADE 0.9998 \n", + "0 NaN NaN " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
assertionentities_ner_chunks_ade_assertionentities_ner_chunks_ade_assertion_classassertion_confidence
0presentallergic reactionADE0.9532
0presentitchyADE0.9641
0presentsore throat/burning/itchyADE0.9184
0presentnumbness in tongue and gumsADE0.9998
0NaNNaNNaNNaN
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 10 + } ] - }, - "metadata": {} }, { - "output_type": "execute_result", - "data": { - "text/plain": [ - " assertion entities_ner_chunks_ade_assertion \\\n", - "0 present allergic reaction \n", - "0 present itchy \n", - "0 present sore throat/burning/itchy \n", - "0 present numbness in tongue and gums \n", - "0 NaN NaN \n", - "\n", - " entities_ner_chunks_ade_assertion_class assertion_confidence \n", - "0 ADE 0.998 \n", - "0 ADE 0.8414 \n", - "0 ADE 0.9019 \n", - "0 ADE 0.9991 \n", - "0 NaN NaN " + "cell_type": "markdown", + "source": [ + "## nlu.to_nlu_pipe(pipe)\n", + "\n", + "Convert a pipeline or list of annotators into a NLU pipeline making `.predict()` and `.viz()` available for every Spark NLP pipeline.\n", + "Assumes the pipeline is already runnable.\n" ], - "text/html": [ - "\n", - "
\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
assertionentities_ner_chunks_ade_assertionentities_ner_chunks_ade_assertion_classassertion_confidence
0presentallergic reactionADE0.998
0presentitchyADE0.8414
0presentsore throat/burning/itchyADE0.9019
0presentnumbness in tongue and gumsADE0.9991
0NaNNaNNaNNaN
\n", - "
\n", - " \n", - " \n", - " \n", - "\n", - " \n", - "
\n", - "
\n", - " " + "metadata": { + "id": "KI5u-6aia8bo", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "\n", + "text = \"\"\"I have an allergic reaction to vancomycin.\n", + " My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums.\n", + " I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.\"\"\"\n", + "nlu_pipe = nlp.to_nlu_pipe(ade_pipeline)\n", + "nlu_pipe.viz(text)\n", + "nlu_pipe.predict(text)[ade_assert_cols]\n", + "\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 467 + }, + "id": "xuZL1uTmF5SJ", + "outputId": "df5ffc84-ccde-467b-87a1-ca9214f85050", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + "\n", + " I have an allergic reaction ADEpresent to vancomycin.
My skin has be
itchy ADEpresent , sore throat/burning/itchy ADEpresent , and numbness in tongue and gums ADEpresent .
I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.
" + ] + }, + "metadata": {} + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " assertion entities_ner_chunks_ade_assertion \\\n", + "0 present allergic reaction \n", + "0 present itchy \n", + "0 present sore throat/burning/itchy \n", + "0 present numbness in tongue and gums \n", + "0 NaN NaN \n", + "\n", + " entities_ner_chunks_ade_assertion_class assertion_confidence \n", + "0 ADE 0.9532 \n", + "0 ADE 0.9641 \n", + "0 ADE 0.9184 \n", + "0 ADE 0.9998 \n", + "0 NaN NaN " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
assertionentities_ner_chunks_ade_assertionentities_ner_chunks_ade_assertion_classassertion_confidence
0presentallergic reactionADE0.9532
0presentitchyADE0.9641
0presentsore throat/burning/itchyADE0.9184
0presentnumbness in tongue and gumsADE0.9998
0NaNNaNNaNNaN
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 13 + } ] - }, - "metadata": {}, - "execution_count": 69 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## nlu.autocomplete_pipeline(pipe)\n", - "\n", - "Auto-Complete a pipeline or single annotator into a runnable pipeline by harnessing NLU's DAG Autocompletion algorithm and returns it as NLU pipeline.\n", - "The standard Spark pipeline is avaiable on the `.vanilla_transformer_pipe` attribute of the returned nlu pipe\n", - "\n", - "Every Annotator and Pipeline of Annotators defines a `DAG` of tasks, with various dependencies that must be satisfied in `topoligical order`.\n", - "NLU enables the completion of an incomplete DAG by finding or creating a path between\n", - "the very first input node which is almost always is `DocumentAssembler/MultiDocumentAssembler` \n", - "and the very last node(s), which is given by the `topoligical sorting` the iterable annotators parameter. \n", - "Paths are created by resolving input features of annotators to the corrrosponding providers with matching storage references.\n" - ], - "metadata": { - "id": "txeCLCtWbSVp", - "pycharm": { - "name": "#%% md\n" - } - } - }, - { - "cell_type": "code", - "source": [ - "from sparknlp_jsl.annotator import RelationExtractionModel\n", - "text = \"\"\"I have an allergic reaction to vancomycin. \n", - " My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums. 
\n", - " I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.\"\"\"\n", - "\n", - "re_model = RelationExtractionModel().pretrained(\"re_ade_clinical\", \"en\", 'clinical/models')\n", - "\n", - "nlu_pipe = nlu.autocomplete_pipeline(re_model)\n", - "df = nlu_pipe.predict(text)\n", - "cols = [\n", - "'relation_RelationExtractionModel_1fb1dfa024c7',\n", - "'relation_RelationExtractionModel_1fb1dfa024c7_confidence',\n", - "'relation_RelationExtractionModel_1fb1dfa024c7_entity1',\n", - "'relation_RelationExtractionModel_1fb1dfa024c7_entity2',\n", - "'relation_RelationExtractionModel_1fb1dfa024c7_entity2_class',\n", - "]\n", - "\n", - "df[cols]" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 846 }, - "id": "Pk8ThwOaGHW2", - "outputId": "3547f237-ca2d-4ff0-97cd-2dc13839fa7d", - "pycharm": { - "name": "#%%\n" - } - }, - "execution_count": 73, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "re_ade_clinical download started this may take some time.\n", - "Approximate size to download 10.9 MB\n", - "[OK!]\n", - "embeddings_clinical download started this may take some time.\n", - "Approximate size to download 1.6 GB\n", - "[OK!]\n", - "pos_anc download started this may take some time.\n", - "Approximate size to download 3.9 MB\n", - "[OK!]\n", - "dependency_conllu download started this may take some time.\n", - "Approximate size to download 16.7 MB\n", - "[OK!]\n", - "ner_jsl download started this may take some time.\n", - "[OK!]\n", - "sentence_detector_dl download started this may take some time.\n", - "Approximate size to download 354.6 KB\n", - "[OK!]\n" - ] + "cell_type": "markdown", + "source": [ + "## nlu.autocomplete_pipeline(pipe)\n", + "\n", + "Auto-Complete a pipeline or single annotator into a runnable pipeline by harnessing NLU's DAG Autocompletion algorithm and returns it as NLU pipeline.\n", + "The standard Spark 
pipeline is available on the `.vanilla_transformer_pipe` attribute of the returned nlu pipe\n", + "\n", + "Every Annotator and Pipeline of Annotators defines a `DAG` of tasks, with various dependencies that must be satisfied in `topological order`.\n", + "NLU enables the completion of an incomplete DAG by finding or creating a path between\n", + "the very first input node which is almost always `DocumentAssembler/MultiDocumentAssembler`\n", + "and the very last node(s), which is given by the `topological sorting` of the iterable annotators parameter.\n", + "Paths are created by resolving input features of annotators to the corresponding providers with matching storage references.\n" + ], + "metadata": { + "id": "txeCLCtWbSVp", + "pycharm": { + "name": "#%% md\n" + } + } }, { - "output_type": "execute_result", - "data": { - "text/plain": [ - " relation_RelationExtractionModel_1fb1dfa024c7 \\\n", - "0 1 \n", - "0 1 \n", - "0 1 \n", - "0 1 \n", - "0 1 \n", - "0 0 \n", - "0 1 \n", - "0 1 \n", - "0 1 \n", - "0 0 \n", - "0 0 \n", - "0 1 \n", - "0 0 \n", - "0 1 \n", - "0 1 \n", - "0 1 \n", - "\n", - " relation_RelationExtractionModel_1fb1dfa024c7_confidence \\\n", - "0 1.0 \n", - "0 0.9999999 \n", - "0 0.99998033 \n", - "0 0.9562254 \n", - "0 0.9990915 \n", - "0 0.94292736 \n", - "0 0.80632734 \n", - "0 0.52616316 \n", - "0 0.9999474 \n", - "0 0.9946185 \n", - "0 0.99416226 \n", - "0 0.9893037 \n", - "0 0.999969 \n", - "0 1.0 \n", - "0 1.0 \n", - "0 1.0 \n", - "\n", - " relation_RelationExtractionModel_1fb1dfa024c7_entity1 \\\n", - "0 allergic reaction \n", - "0 skin \n", - "0 skin \n", - "0 skin \n", - "0 skin \n", - "0 skin \n", - "0 itchy \n", - "0 itchy \n", - "0 itchy \n", - "0 itchy \n", - "0 sore throat/burning/itchy \n", - "0 sore throat/burning/itchy \n", - "0 sore throat/burning/itchy \n", - "0 numbness \n", - "0 numbness \n", - "0 tongue \n", - "\n", - " relation_RelationExtractionModel_1fb1dfa024c7_entity2 \\\n", - "0 vancomycin \n", - "0 itchy \n", - "0 sore
throat/burning/itchy \n", - "0 numbness \n", - "0 tongue \n", - "0 gums \n", - "0 sore throat/burning/itchy \n", - "0 numbness \n", - "0 tongue \n", - "0 gums \n", - "0 numbness \n", - "0 tongue \n", - "0 gums \n", - "0 tongue \n", - "0 gums \n", - "0 gums \n", - "\n", - " relation_RelationExtractionModel_1fb1dfa024c7_entity2_class \n", - "0 Drug_Ingredient \n", - "0 Symptom \n", - "0 Symptom \n", - "0 Symptom \n", - "0 External_body_part_or_region \n", - "0 External_body_part_or_region \n", - "0 Symptom \n", - "0 Symptom \n", - "0 External_body_part_or_region \n", - "0 External_body_part_or_region \n", - "0 Symptom \n", - "0 External_body_part_or_region \n", - "0 External_body_part_or_region \n", - "0 External_body_part_or_region \n", - "0 External_body_part_or_region \n", - "0 External_body_part_or_region " + "cell_type": "code", + "source": [ + "from johnsnowlabs import medical\n", + "text = \"\"\"I have an allergic reaction to vancomycin.\n", + " My skin has be itchy, sore throat/burning/itchy, and numbness in tongue and gums.\n", + " I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.\"\"\"\n", + "\n", + "re_model = medical.RelationExtractionModel().pretrained(\"re_ade_clinical\", \"en\", 'clinical/models')\n", + "\n", + "nlu_pipe = nlp.autocomplete_pipeline(re_model)\n", + "df = nlu_pipe.predict(text)\n", + "cols = [\n", + "'relation_RelationExtractionModel_1fb1dfa024c7',\n", + "'relation_RelationExtractionModel_1fb1dfa024c7_confidence',\n", + "'relation_RelationExtractionModel_1fb1dfa024c7_entity1',\n", + "'relation_RelationExtractionModel_1fb1dfa024c7_entity2',\n", + "'relation_RelationExtractionModel_1fb1dfa024c7_entity2_class',\n", + "]\n", + "\n", + "df[cols]" ], - "text/html": [ - "\n", - "
\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
relation_RelationExtractionModel_1fb1dfa024c7relation_RelationExtractionModel_1fb1dfa024c7_confidencerelation_RelationExtractionModel_1fb1dfa024c7_entity1relation_RelationExtractionModel_1fb1dfa024c7_entity2relation_RelationExtractionModel_1fb1dfa024c7_entity2_class
011.0allergic reactionvancomycinDrug_Ingredient
010.9999999skinitchySymptom
010.99998033skinsore throat/burning/itchySymptom
010.9562254skinnumbnessSymptom
010.9990915skintongueExternal_body_part_or_region
000.94292736skingumsExternal_body_part_or_region
010.80632734itchysore throat/burning/itchySymptom
010.52616316itchynumbnessSymptom
010.9999474itchytongueExternal_body_part_or_region
000.9946185itchygumsExternal_body_part_or_region
000.99416226sore throat/burning/itchynumbnessSymptom
010.9893037sore throat/burning/itchytongueExternal_body_part_or_region
000.999969sore throat/burning/itchygumsExternal_body_part_or_region
011.0numbnesstongueExternal_body_part_or_region
011.0numbnessgumsExternal_body_part_or_region
011.0tonguegumsExternal_body_part_or_region
\n", - "
\n", - " \n", - " \n", - " \n", - "\n", - " \n", - "
\n", - "
\n", - " " + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 601 + }, + "id": "Pk8ThwOaGHW2", + "outputId": "e118b2f3-2271-45c8-9447-da54314ef26d", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 14, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "re_ade_clinical download started this may take some time.\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n", + "embeddings_clinical download started this may take some time.\n", + "Approximate size to download 1.6 GB\n", + "[OK!]\n", + "pos_anc download started this may take some time.\n", + "Approximate size to download 3.9 MB\n", + "[OK!]\n", + "dependency_conllu download started this may take some time.\n", + "Approximate size to download 16.7 MB\n", + "[OK!]\n", + "ner_jsl download started this may take some time.\n", + "[OK!]\n", + "sentence_detector_dl download started this may take some time.\n", + "Approximate size to download 354.6 KB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " relation_RelationExtractionModel_1fb1dfa024c7 \\\n", + "0 1 \n", + "0 1 \n", + "0 1 \n", + "0 1 \n", + "0 1 \n", + "0 1 \n", + "0 0 \n", + "\n", + " relation_RelationExtractionModel_1fb1dfa024c7_confidence \\\n", + "0 0.5657 \n", + "0 0.9604 \n", + "0 0.9604 \n", + "0 0.9604 \n", + "0 0.667 \n", + "0 0.667 \n", + "0 0.75065005 \n", + "\n", + " relation_RelationExtractionModel_1fb1dfa024c7_entity1 \\\n", + "0 allergic reaction \n", + "0 skin \n", + "0 skin \n", + "0 skin \n", + "0 itchy \n", + "0 itchy \n", + "0 sore throat/burning/itchy \n", + "\n", + " relation_RelationExtractionModel_1fb1dfa024c7_entity2 \\\n", + "0 vancomycin \n", + "0 itchy \n", + "0 sore throat/burning/itchy \n", + "0 numbness in tongue and gums \n", + "0 sore throat/burning/itchy \n", + "0 numbness in tongue and gums \n", + 
"0 numbness in tongue and gums \n", + "\n", + " relation_RelationExtractionModel_1fb1dfa024c7_entity2_class \n", + "0 Drug_Ingredient \n", + "0 Symptom \n", + "0 Symptom \n", + "0 Symptom \n", + "0 Symptom \n", + "0 Symptom \n", + "0 Symptom " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
relation_RelationExtractionModel_1fb1dfa024c7relation_RelationExtractionModel_1fb1dfa024c7_confidencerelation_RelationExtractionModel_1fb1dfa024c7_entity1relation_RelationExtractionModel_1fb1dfa024c7_entity2relation_RelationExtractionModel_1fb1dfa024c7_entity2_class
010.5657allergic reactionvancomycinDrug_Ingredient
010.9604skinitchySymptom
010.9604skinsore throat/burning/itchySymptom
010.9604skinnumbness in tongue and gumsSymptom
010.667itchysore throat/burning/itchySymptom
010.667itchynumbness in tongue and gumsSymptom
000.75065005sore throat/burning/itchynumbness in tongue and gumsSymptom
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 14 + } ] - }, - "metadata": {}, - "execution_count": 73 - } - ] - }, - { - "cell_type": "code", - "source": [], - "metadata": { - "id": "9P3-EmI0l6Dx", - "pycharm": { - "name": "#%%\n" + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "9P3-EmI0l6Dx", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": null, + "outputs": [] } - }, - "execution_count": null, - "outputs": [] - } - ] + ] } \ No newline at end of file diff --git a/examples/colab/visualization/NLU_visualizations_tutorial.ipynb b/examples/colab/visualization/NLU_visualizations_tutorial.ipynb index 5b0daac7..7568c0da 100644 --- a/examples/colab/visualization/NLU_visualizations_tutorial.ipynb +++ b/examples/colab/visualization/NLU_visualizations_tutorial.ipynb @@ -3,9 +3,7 @@ "nbformat_minor": 0, "metadata": { "colab": { - "name": "NLU_visualizations_tutorial.ipynb", - "provenance": [], - "collapsed_sections": [] + "provenance": [] }, "kernelspec": { "display_name": "Python 3", @@ -26,7 +24,7 @@ "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/visualization/NLU_visualizations_tutorial.ipynb)\n", "\n", - "# With NLU and [Spark-NLP-Display](https://github.com/JohnSnowLabs/spark-nlp-display) you can visualize the outputs of various NLP models \n", + "# With NLU and [Spark-NLP-Display](https://github.com/JohnSnowLabs/spark-nlp-display) you can visualize the outputs of various NLP models\n", "visualization\n", "## Available vizualizations\n", "\n", @@ -43,74 +41,69 @@ "id": "5sQOK29tNYlL" }, "source": [ - "# Install NLU" + "# Colab Setup" ] }, { "cell_type": "code", "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Fgyayzo48ziU", - "outputId": "0ee261dd-47bb-441c-98dd-31c21ab39172" + "id": "Fgyayzo48ziU" }, "source": [ - "!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | 
bash\n", - "import nlu" + "# Install the johnsnowlabs library\n", + "! pip install -q johnsnowlabs" ], - "execution_count": 1, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "--2022-07-16 18:24:31-- https://setup.johnsnowlabs.com/nlu/colab.sh\n", - "Resolving setup.johnsnowlabs.com (setup.johnsnowlabs.com)... 51.158.130.125\n", - "Connecting to setup.johnsnowlabs.com (setup.johnsnowlabs.com)|51.158.130.125|:443... connected.\n", - "HTTP request sent, awaiting response... 302 Moved Temporarily\n", - "Location: https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh [following]\n", - "--2022-07-16 18:24:31-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n", - "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...\n", - "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 1662 (1.6K) [text/plain]\n", - "Saving to: ‘STDOUT’\n", - "\n", - "- 100%[===================>] 1.62K --.-KB/s in 0s \n", - "\n", - "2022-07-16 18:24:32 (31.1 MB/s) - written to stdout [1662/1662]\n", - "\n", - "Installing NLU 3.4.4 with PySpark 3.0.3 and Spark NLP 3.4.4 for Google Colab ...\n", - "Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]\n", - "Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease\n", - "Get:3 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]\n", - "Hit:4 http://archive.ubuntu.com/ubuntu bionic InRelease\n", - "Hit:5 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease\n", - "Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease\n", - "Hit:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release\n", - "Get:8 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]\n", - "Hit:9 http://ppa.launchpad.net/cran/libgit2/ubuntu bionic InRelease\n", - "Hit:10 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu bionic InRelease\n", - "Get:11 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]\n", - "Hit:12 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease\n", - "Get:13 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1,057 kB]\n", - "Get:14 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1,526 kB]\n", - "Get:15 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2,897 kB]\n", - "Get:17 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3,329 kB]\n", - "Get:18 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2,302 kB]\n", - "Fetched 11.4 MB in 4s (2,838 kB/s)\n", - "Reading package lists... 
Done\n", - "tar: spark-3.0.2-bin-hadoop2.7.tgz: Cannot open: No such file or directory\n", - "tar: Error is not recoverable: exiting now\n", - "\u001b[K |████████████████████████████████| 209.1 MB 49 kB/s \n", - "\u001b[K |████████████████████████████████| 145 kB 53.2 MB/s \n", - "\u001b[K |████████████████████████████████| 531 kB 54.8 MB/s \n", - "\u001b[K |████████████████████████████████| 198 kB 55.0 MB/s \n", - "\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n" - ] - } - ] + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Upload License for Licensed visualizations\n", + "After the NER example, the licensed visualizations begin, and they rely on licensed models. To use them, upload your license and run the following cells. If you do not plan to use the licensed visualizations, you can skip these cells.\n" + ], + "metadata": { + "id": "-lyRf41WGpyn" + } + }, + { + "cell_type": "code", + "source": [ + "from google.colab import files\n", + "print('Please Upload your John Snow Labs License using the button below')\n", + "license_keys = files.upload()" + ], + "metadata": { + "id": "qUpu6nmzGbgB" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from johnsnowlabs import nlp\n", + "\n", + "# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM\n", + "nlp.install()" + ], + "metadata": { + "id": "Yanv3DX1GcVK" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from johnsnowlabs import nlp\n", + "spark=nlp.start()" + ], + "metadata": { + "id": "Fw5Jm5gXGjd8" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -127,62 +120,28 @@ "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 636 + "height": 192 }, "id": "EwIk8DSLlje4", - "outputId": 
"9b39d98f-65b0-4fcf-cedf-640b9225aea3" + "outputId": "57f4a7f2-9427-4c15-a18a-3b6730b81cb5" }, "source": [ - "nlu.load('ner').viz(\"Donald Trump from America and Angela Merkel from Germany don't share many oppinions.\")" + "from johnsnowlabs import nlp\n", + "\n", + "nlp.load('ner').viz(\"Donald Trump from America and Angela Merkel from Germany don't share many oppinions.\")" ], - "execution_count": 2, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "onto_recognize_entities_sm download started this may take some time.\n", - "Approx size to download 160.1 MB\n", - "[OK!]\n" - ] - }, - { - "output_type": "stream", - "name": "stderr", - "text": [ - "WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.\n", - "Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.\n", - "To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.\n" - ] - }, - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", - "Collecting spark-nlp-display\n", - " Downloading spark_nlp_display-4.0-py3-none-any.whl (95 kB)\n", - "Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from spark-nlp-display) (1.3.5)\n", - "Requirement already satisfied: spark-nlp in /usr/local/lib/python3.7/dist-packages (from spark-nlp-display) (4.0.1)\n", - "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from spark-nlp-display) (1.21.6)\n", - "Requirement already satisfied: ipython in /usr/local/lib/python3.7/dist-packages (from spark-nlp-display) (5.5.0)\n", - "Collecting svgwrite==1.4\n", - " Downloading svgwrite-1.4-py3-none-any.whl (66 
kB)\n", - "Requirement already satisfied: decorator in /usr/local/lib/python3.7/dist-packages (from ipython->spark-nlp-display) (4.4.2)\n", - "Requirement already satisfied: pickleshare in /usr/local/lib/python3.7/dist-packages (from ipython->spark-nlp-display) (0.7.5)\n", - "Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.7/dist-packages (from ipython->spark-nlp-display) (5.1.1)\n", - "Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.7/dist-packages (from ipython->spark-nlp-display) (57.4.0)\n", - "Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/local/lib/python3.7/dist-packages (from ipython->spark-nlp-display) (1.0.18)\n", - "Requirement already satisfied: pygments in /usr/local/lib/python3.7/dist-packages (from ipython->spark-nlp-display) (2.6.1)\n", - "Requirement already satisfied: simplegeneric>0.8 in /usr/local/lib/python3.7/dist-packages (from ipython->spark-nlp-display) (0.8.1)\n", - "Requirement already satisfied: pexpect in /usr/local/lib/python3.7/dist-packages (from ipython->spark-nlp-display) (4.8.0)\n", - "Requirement already satisfied: wcwidth in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython->spark-nlp-display) (0.2.5)\n", - "Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython->spark-nlp-display) (1.15.0)\n", - "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->spark-nlp-display) (2.8.2)\n", - "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas->spark-nlp-display) (2022.1)\n", - "Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.7/dist-packages (from pexpect->ipython->spark-nlp-display) (0.7.0)\n", - "Installing collected packages: svgwrite, spark-nlp-display\n", - "Successfully installed spark-nlp-display-4.0 svgwrite-1.4\n" + "Approx size 
to download 159 MB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -272,7 +231,7 @@ " }\n", "\n", "\n", - " Donald Trump PERSON from America GPE and Angela Merkel PERSON from Germany GPE don't share many oppinions." + " Donald Trump PERSON from America GPE and Angela Merkel PERSON from Germany GPE don't share many oppinions." ] }, "metadata": {} @@ -285,7 +244,7 @@ "id": "7PbHZHel14Ot" }, "source": [ - "# Visualize Dependency tree \n", + "# Visualize Dependency tree\n", "Visualizes the structure of the labeled dependency tree and part of speech tags" ] }, @@ -294,29 +253,32 @@ "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 535 + "height": 580 }, "id": "VtA6h62SVxTQ", - "outputId": "139ec924-499d-4a13-aa0e-b46ad11d9f9e" + "outputId": "b3f096dc-9e64-4bd0-b1bb-66b444bdae33" }, "source": [ - "nlu.load('dep.typed').viz(\"Billy went to the mall\")" + "nlp.load('dep.typed').viz(\"Billy went to the mall\")" ], - "execution_count": 3, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "dependency_typed_conllu download started this may take some time.\n", "Approximate size to download 2.4 MB\n", "[OK!]\n", + "pos_anc download started this may take some time.\n", + "Approximate size to download 3.9 MB\n", + "[OK!]\n", "dependency_conllu download started this may take some time.\n", "Approximate size to download 16.7 MB\n", "[OK!]\n", - "pos_anc download started this may take some time.\n", - "Approximate size to download 3.9 MB\n", - "[OK!]\n" + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -330,7 +292,7 @@ " font-family: \"Lucida\"; \n", " src: url(\"data:application/x-font-ttf;charset=utf-8;base64,\"); \n", "}\n", - "]]>BillyNNPwentVBDtoTOtheDTmallNNnsubjnsubjmarknsubj" + 
"]]>BillyNNPwentVBDtoTOtheDTmallNNnsubjnsubjmarknsubj" ] }, "metadata": {} @@ -340,14 +302,54 @@ { "cell_type": "code", "metadata": { - "id": "0Sip0vRS19wi" + "id": "0Sip0vRS19wi", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 700 + }, + "outputId": "9c5ef714-d526-45c5-8fba-e8662c7ba517" }, "source": [ "#Bigger Example\n", - "nlu.load('dep.typed').viz(\"Donald Trump from America and Angela Merkel from Germany don't share many oppinions but they both love John Snow Labs software\")" + "nlp.load('dep.typed').viz(\"Donald Trump from America and Angela Merkel from Germany don't share many oppinions but they both love John Snow Labs software\")" ], "execution_count": null, - "outputs": [] + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "dependency_typed_conllu download started this may take some time.\n", + "Approximate size to download 2.4 MB\n", + "[OK!]\n", + "pos_anc download started this may take some time.\n", + "Approximate size to download 3.9 MB\n", + "[OK!]\n", + "dependency_conllu download started this may take some time.\n", + "Approximate size to download 16.7 MB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "DonaldNNPTrumpNNPfromINAmericaNNPandCCAngelaNNPMerkelNNPfromINGermanyNNPdon'tNNshareNNmanyJJoppinionsNNSbutCCtheyPRPbothDTloveNNJohnNNPSnowNNPLabsNNPsoftwareNNcccaseflatnsubjflatamodflatflatnsubjflatflatcompoundflatflatflatdetflatflatflatflat" + ] + }, + "metadata": {} + } + ] }, { "cell_type": "markdown", @@ -377,31 +379,38 @@ "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 349 + "height": 460 }, "id": "El2MNpo8_0H8", - "outputId": "d5858458-f8a2-4ff5-e798-dd198eca7eea" + 
"outputId": "69568e60-928a-4f3b-d68e-73acb1482fe3" }, "source": [ - "nlu.load('med_ner.jsl.wip.clinical en.resolve.rxnorm').viz(\"He took 2 pills of Aspirin daily\")" + "nlp.load('med_ner.jsl.wip.clinical en.resolve.rxnorm').viz(\"He took 2 pills of Aspirin daily\")" ], - "execution_count": 4, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "sbiobertresolve_rxnorm download started this may take some time.\n", "[OK!]\n", + "setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", "[OK!]\n", "sbiobert_base_cased_mli download started this may take some time.\n", "Approximate size to download 384.3 MB\n", - "[OK!]\n" + "[OK!]\n", + "setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. 
Which should be columns from the following annotators: ['sentence_embeddings']\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -491,7 +500,7 @@ " }\n", "\n", "\n", - " He took 2 pills of Aspirin MISC1114513 vitamine e acetate daily" + " He took 2 pills of Aspirin MISC1191 aspirin daily" ] }, "metadata": {} @@ -503,33 +512,40 @@ "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 611 + "height": 853 }, "id": "1klMOFjZmyGF", - "outputId": "1d4db86a-551e-4201-d7a6-199705d0401f" + "outputId": "c396427f-9094-4370-a921-672eeb4bc2c9" }, "source": [ "# bigger example\n", "data = \"This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .\"\n", - "nlu.load('med_ner.jsl.wip.clinical en.resolve.rxnorm').viz(data)\n" + "nlp.load('med_ner.jsl.wip.clinical en.resolve.rxnorm').viz(data)\n" ], - "execution_count": 5, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "sbiobertresolve_rxnorm download started this may take some time.\n", "[OK!]\n", + "setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. 
Which should be columns from the following annotators: ['sentence_embeddings']\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", "[OK!]\n", "sbiobert_base_cased_mli download started this may take some time.\n", "Approximate size to download 384.3 MB\n", - "[OK!]\n" + "[OK!]\n", + "setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_e8e8480527b5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -619,7 +635,7 @@ " }\n", "\n", "\n", - " This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree LOC1043301 prenexa with a non-ST MISC759577 nohist plus elevation MI and Guaiac MISC261530 gengraf positive stools , transferred to St LOC402085 factive . Margaret's Center for Women & Infants ORG1856515 levonorgestrel intrauterine system for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU ORG1602024 natpara for close monitoring , hemodynamically stable at the time of admission to the CCU ORG1602024 natpara ." 
+ " This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree LOC607707 bionect with a non-ST MISC1314253 nonanal elevation MI and Guaiac MISC1373150 guaiac positive stools , transferred to St LOC583344 accu . Margaret's Center for Women & Infants ORG1856515 levonorgestrel intrauterine system for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU ORG225636 clinicide for close monitoring , hemodynamically stable at the time of admission to the CCU ORG225636 clinicide ." ] }, "metadata": {} @@ -642,30 +658,37 @@ "id": "MFP2lPVlAQyf", "colab": { "base_uri": "https://localhost:8080/", - "height": 242 + "height": 373 }, - "outputId": "a2358953-f5b0-4971-a342-88d6dcdc3133" + "outputId": "655aad3a-7192-401d-999b-f048145313ce" }, "source": [ - "nlu.load('med_ner.jsl.wip.clinical resolve.icd10cm').viz('She was diagnosed with a respiratory congestion')" + "nlp.load('med_ner.jsl.wip.clinical resolve.icd10cm').viz('She was diagnosed with a respiratory congestion')" ], - "execution_count": 6, + "execution_count": 23, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "sbiobertresolve_icd10cm download started this may take some time.\n", "[OK!]\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. 
Which should be columns from the following annotators: ['sentence_embeddings']\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", "[OK!]\n", "sbiobert_base_cased_mli download started this may take some time.\n", "Approximate size to download 384.3 MB\n", - "[OK!]\n" + "[OK!]\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -768,32 +791,39 @@ "id": "lBQAsPSXm2_Y", "colab": { "base_uri": "https://localhost:8080/", - "height": 528 + "height": 663 }, - "outputId": "08374860-57bb-4e8d-8585-51f148791486" + "outputId": "1aefc1b6-23cc-4040-897b-8a070d82dbbc" }, "source": [ "# bigger example\n", "data = 'The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. Mom states she had no fever. Her appetite was good but she was spitting up a lot. She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. 
She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion'\n", - "nlu.load('med_ner.jsl.wip.clinical resolve.icd10cm').viz(data)" + "nlp.load('med_ner.jsl.wip.clinical resolve.icd10cm').viz(data)" ], - "execution_count": 7, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "sbiobertresolve_icd10cm download started this may take some time.\n", "[OK!]\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", "[OK!]\n", "sbiobert_base_cased_mli download started this may take some time.\n", "Approximate size to download 384.3 MB\n", - "[OK!]\n" + "[OK!]\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -883,7 +913,7 @@ " }\n", "\n", "\n", - " The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. Mom states she had no fever. Her appetite was good but she was spitting up a lot. 
She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM MISCH93212 Auditory recruitment, left ear was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex LOCF650 Fetishism and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion" + " The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. Mom states she had no fever. Her appetite was good but she was spitting up a lot. She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM MISCL02522 Furuncle left hand was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex LOCQ843 Anonychia and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. 
She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion" ] }, "metadata": {} @@ -906,19 +936,21 @@ "id": "d0a_GgOqnM7b", "colab": { "base_uri": "https://localhost:8080/", - "height": 242 + "height": 284 }, - "outputId": "62979688-1472-4a77-f7ba-50b2f2238ed9" + "outputId": "8d37fa99-ac18-45a4-b9ce-4c41c2131e1b" }, "source": [ - "nlu.load('med_ner.clinical assert').viz(\"The MRI scan showed no signs of cancer in the left lung\")" + "nlp.load('med_ner.clinical assert').viz(\"The MRI scan showed no signs of cancer in the left lung\")" ], - "execution_count": 8, + "execution_count": 26, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", @@ -929,7 +961,8 @@ "[OK!]\n", "embeddings_clinical download started this may take some time.\n", "Approximate size to download 1.6 GB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1032,21 +1065,23 @@ "id": "hQPS_-2y-Quu", "colab": { "base_uri": "https://localhost:8080/", - "height": 540 + "height": 606 }, - "outputId": "9d086a53-130e-4ff9-b9d9-73af5919e353" + "outputId": "c94d34d7-43ae-4d42-9b83-5f7525a61ae4" }, "source": [ "#bigger example\n", "data ='This is the case of a very pleasant 46-year-old Caucasian female, seen in clinic on 12/11/07 during which time MRI of the left shoulder showed no evidence of rotator cuff tear. She did have a previous MRI of the cervical spine that did show an osteophyte on the left C6-C7 level. Based on this, negative MRI of the shoulder, the patient was recommended to have anterior cervical discectomy with anterior interbody fusion at C6-C7 level. 
Operation, expected outcome, risks, and benefits were discussed with her. Risks include, but not exclusive of bleeding and infection, bleeding could be soft tissue bleeding, which may compromise airway and may result in return to the operating room emergently for evacuation of said hematoma. There is also the possibility of bleeding into the epidural space, which can compress the spinal cord and result in weakness and numbness of all four extremities as well as impairment of bowel and bladder function. However, the patient may develop deeper-seated infection, which may require return to the operating room. Should the infection be in the area of the spinal instrumentation, this will cause a dilemma since there might be a need to remove the spinal instrumentation and/or allograft. There is also the possibility of potential injury to the esophageus, the trachea, and the carotid artery. There is also the risks of stroke on the right cerebral circulation should an undiagnosed plaque be propelled from the right carotid. 
She understood all of these risks and agreed to have the procedure performed.'\n", - "nlu.load('med_ner.clinical assert').viz(data)" + "nlp.load('med_ner.clinical assert').viz(data)" ], - "execution_count": 9, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", @@ -1057,7 +1092,8 @@ "[OK!]\n", "embeddings_clinical download started this may take some time.\n", "Approximate size to download 1.6 GB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1147,7 +1183,7 @@ " }\n", "\n", "\n", - " This is the case of a very pleasant 46-year-old Caucasian MISChypothetical female, seen in clinic on 12/11/07 during which time MRI of the left shoulder showed no evidence of rotator cuff tear. She did have a previous MRI of the cervical spine that did show an osteophyte on the left C6-C7 level. Based on this, negative MRI of the shoulder, the patient was recommended to have anterior cervical discectomy with anterior interbody fusion at C6-C7 level. Operation MISChypothetical , expected outcome, risks, and benefits were discussed with her. Risks include, but not exclusive of bleeding and infection, bleeding could be soft tissue bleeding, which may compromise airway and may result in return to the operating room emergently for evacuation of said hematoma. There is also the possibility of bleeding into the epidural space, which can compress the spinal cord and result in weakness and numbness of all four extremities as well as impairment of bowel and bladder function. However, the patient may develop deeper-seated infection, which may require return to the operating room. 
Should the infection be in the area of the spinal instrumentation, this will cause a dilemma since there might be a need to remove the spinal instrumentation and/or allograft. There is also the possibility of potential injury to the esophageus, the trachea, and the carotid artery. There is also the risks of stroke on the right cerebral circulation should an undiagnosed plaque be propelled from the right carotid. She understood all of these risks and agreed to have the procedure performed." + " This is the case of a very pleasant 46-year-old Caucasian MISCpresent female, seen in clinic on 12/11/07 during which time MRI of the left shoulder showed no evidence of rotator cuff tear. She did have a previous MRI of the cervical spine that did show an osteophyte on the left C6-C7 level. Based on this, negative MRI of the shoulder, the patient was recommended to have anterior cervical discectomy with anterior interbody fusion at C6-C7 level. Operation MISCpresent , expected outcome, risks, and benefits were discussed with her. Risks include, but not exclusive of bleeding and infection, bleeding could be soft tissue bleeding, which may compromise airway and may result in return to the operating room emergently for evacuation of said hematoma. There is also the possibility of bleeding into the epidural space, which can compress the spinal cord and result in weakness and numbness of all four extremities as well as impairment of bowel and bladder function. However, the patient may develop deeper-seated infection, which may require return to the operating room. Should the infection be in the area of the spinal instrumentation, this will cause a dilemma since there might be a need to remove the spinal instrumentation and/or allograft. There is also the possibility of potential injury to the esophageus, the trachea, and the carotid artery. There is also the risks of stroke on the right cerebral circulation should an undiagnosed plaque be propelled from the right carotid. 
She understood all of these risks and agreed to have the procedure performed." ] }, "metadata": {} @@ -1170,19 +1206,21 @@ "id": "2AXuhrytnS7N", "colab": { "base_uri": "https://localhost:8080/", - "height": 342 + "height": 387 }, - "outputId": "68051435-1056-4bfa-ea10-a10e6132ca4d" + "outputId": "612642c9-1cd3-49bf-b8b1-78a0a0e44d6a" }, "source": [ - "nlu.load('med_ner.jsl.wip.clinical relation.temporal_events').viz('He developed cancer after a mercury poisoning in 1999 ') " + "nlp.load('med_ner.jsl.wip.clinical relation.temporal_events').viz('He developed cancer after a mercury poisoning in 1999')" ], - "execution_count": 10, + "execution_count": 14, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", @@ -1190,7 +1228,8 @@ "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1219,19 +1258,36 @@ "base_uri": "https://localhost:8080/", "height": 1000 }, - "outputId": "527dc314-5f7c-4961-a496-38cc11541379" + "outputId": "126624d5-7892-4367-f715-35340acddc0a" }, "source": [ "# bigger example\n", - "data = 'This is the case of a very pleasant 46-year-old Caucasian female, seen in clinic on 12/11/07 during which time MRI of the left shoulder showed no evidence of rotator cuff tear. She did have a previous MRI of the cervical spine that did show an osteophyte on the left C6-C7 level. Based on this, negative MRI of the shoulder, the patient was recommended to have anterior cervical discectomy with anterior interbody fusion at C6-C7 level. Operation, expected outcome, risks, and benefits were discussed with her. 
Risks include, but not exclusive of bleeding and infection, bleeding could be soft tissue bleeding, which may compromise airway and may result in return to the operating room emergently for evacuation of said hematoma. There is also the possibility of bleeding into the epidural space, which can compress the spinal cord and result in weakness and numbness of all four extremities as well as impairment of bowel and bladder function. However, the patient may develop deeper-seated infection, which may require return to the operating room. Should the infection be in the area of the spinal instrumentation, this will cause a dilemma since there might be a need to remove the spinal instrumentation and/or allograft. There is also the possibility of potential injury to the esophageus, the trachea, and the carotid artery. There is also the risks of stroke on the right cerebral circulation should an undiagnosed plaque be propelled from the right carotid. She understood all of these risks and agreed to have the procedure performed'\n", - "pipe = nlu.load('med_ner.jsl.wip.clinical relation.clinical').viz(data)" + "data =\"\"\"A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ),\n", + "one prior episode of HTG-induced pancreatitis three years prior to presentation, associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 ,\n", + "presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting . Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection .\n", + "She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . 
She had been on dapagliflozin for six months at the time of presentation.\n", + "Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity .\n", + "Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 .\n", + "Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia .\n", + "The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission .\n", + "However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L .\n", + "The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again .\n", + "The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours .\n", + "Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use .\n", + "The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day .\n", + "It was determined that all SGLT2 inhibitors should be discontinued indefinitely .\n", + "She had close follow-up with endocrinology post discharge .\n", + "\"\"\"\n", + "pipe = nlp.load('med_ner.jsl.wip.clinical 
relation.clinical').viz(data)" ], - "execution_count": 11, + "execution_count": 5, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", @@ -1239,7 +1295,8 @@ "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1249,11 +1306,11 @@ "" ], "text/html": [ - "Thisisthecaseofaverypleasant46-year-oldCaucasianfemale,seeninclinicon12/11/07duringwhichtimeMRIoftheleftshouldershowednoevidenceofrotatorcufftear.ShedidhaveapreviousMRIofthecervicalspinethatdidshowanosteophyteontheleftC6-C7level.Basedonthis,negativeMRIoftheshoulder,thepatientwasrecommendedtohaveanteriorcervicaldiscectomywithanteriorinterbodyfusionatC6-C7level.Operation,expectedoutcome,risks,andbenefitswerediscussedwithher.Risksinclude,butnotexclusiveofbleedingandinfection,bleedingcouldbesofttissuebleeding,whichmaycompromiseairwayandmayresultinreturntotheoperatingroomemergentlyforevacuationofsaidhematoma.Thereisalsothepossibilityofbleedingintotheepiduralspace,whichcancompressthespinalcordandresultinweaknessandnumbnessofallfourextremitiesaswellasimpairmentofbowelandbladderfunction.However,thepatientmaydevelopdeeper-seatedinfection,whichmayrequirereturntotheoperatingroom.Shouldtheinfectionbeintheareaofthespinalinstrumentation,thiswillcauseadilemmasincetheremightbeaneedtoremovethespinalinstrumentationand/orallograft.Thereisalsothepossibilityofpotentialinjurytotheesophageus,thetrachea,andthecarotidartery.Thereisalsotherisksofstrokeontherightcerebralcirculationshouldanundiagnosedplaquebepropelledfromtherightcarotid.Sheunderstoodalloftheserisksandagreedtohavetheprocedurep
erformed" + "]]>A28-year-oldfemalewithahistoryofgestationaldiabetesmellitusdiagnosedeightyearspriortopresentationandsubsequenttypetwodiabetesmellitus(T2DMMISC),onepriorepisodeofHTG-inducedpancreatitisthreeyearspriortopresentation,associatedwithanacutehepatitis,andobesitywithabodymassindex(BMI)of33.5kg/m2,presentedwithaone-weekhistoryofpolyuria,polydipsia,poorappetite,andvomiting.Twoweekspriortopresentation,shewastreatedwithafive-daycourseofamoxicillinforarespiratorytractinfection.Shewasonmetformin,glipizide,anddapagliflozinforT2DMMISCandatorvastatinandgemfibrozilforHTG.Shehadbeenondapagliflozinforsixmonthsatthetimeofpresentation.PhysicalMISCexaminationonpresentationwassignificantfordryoralmucosa;significantly,herabdominalexaminationwasbenignwithnotenderness,guarding,orrigidity.Pertinentlaboratoryfindingsonadmissionwere:serumglucose111mg/dl,bicarbonate18mmol/l,aniongap20,creatinine0.4mg/dL,triglycerides508mg/dL,totalcholesterol122mg/dL,glycatedhemoglobin(HbA1cMISC)10%,andvenouspH7.27.Serumlipasewasnormalat43U/L.Serumacetonelevelscouldnotbeassessedasbloodsampleskepthemolyzingduetosignificantlipemia.Thepatientwasinitiallyadmittedforstarvationketosis,asshereportedpoororalintakeforthreedayspriortoadmission.However,serumchemistryobtainedsixhoursafterpresentationrevealedherglucosewas186mg/dL,theaniongapwasstillelevatedat21,serumbicarbonatewas16mmol/L,triglyceridelevelpeakedat2050mg/dL,andlipasewas52U/L.Theβ-hydroxybutyratelevelwasobtainedandfoundtobeelevatedat5.29mmol/L-theoriginalsamplewascentrifugedandthechylomicronlayerremovedpriortoanalysisduetointerferencefromturbiditycausedbylipemiaagain.ThepatientwastreatedwithaninsulindripforeuDKAORGandHTGwithareductionintheaniongapto13andtriglyceridesto1400mg/dL,within24hours.HereuDKAwasthoughttobeprecipitatedbyherrespiratorytractinfectioninthesettingofSGLT2inhibitoruse.Thepatientwasseenbytheendocrinologyserviceandshewasdischargedon40unitsofinsulinglargineatnight,12unitsofinsulinlisprowithmeals,andmetformin1000mgtwotimesaday.Itwas
determinedthatallSGLT2MISCinhibitorsshouldbediscontinuedindefinitely.Shehadclosefollow-upwithendocrinologypostdischarge.TrAPTrAPTrAPTrAPTrAPTrAPTrAPTrAPTrAPTrAPTrAPTrAPTrAPTrAPTrAP" ] }, "metadata": {} @@ -1266,7 +1323,7 @@ "id": "nzLx9o7ks3lj" }, "source": [ - "# Configuring visualizations \n", + "# Configuring visualizations\n", "\n", "- `labels_to_viz` : Defines a subset of NER labels to viz i.e. ['PER'] , by default=[] which will display all labels. Applicable only for NER viz\n", "- `viz_colors` : Applicable for [ner, resolution, assert ] key = label, value=hex color, i.e. viz_colors={'TREATMENT':'#008080', 'problem':'#800080'}\n" @@ -1278,28 +1335,31 @@ "id": "HP2sq58_C6vW", "colab": { "base_uri": "https://localhost:8080/", - "height": 198 + "height": 244 }, - "outputId": "e4ebd6f3-f6a6-4e86-bd41-e77482ef683f" + "outputId": "8d2477a9-22f4-44fc-99e1-36193a7c2cf6" }, "source": [ "data = 'Dr. John Snow suggested that Fritz takes 5mg penicilin for his cough'\n", "# Define custom colors for labels\n", "viz_colors={'STRENGTH':'#800080', 'DRUG_BRANDNAME':'#77b5fe', 'GENDER':'#ebde34'}\n", - "nlu.load('med_ner.jsl.wip.clinical').viz(data,viz_colors =viz_colors)" + "nlp.load('med_ner.jsl.wip.clinical').viz(data,viz_colors =viz_colors)" ], - "execution_count": 12, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1389,7 +1449,7 @@ " }\n", "\n", "\n", - " Dr. John Snow PER suggested that Fritz PER takes 5mg penicilin for his cough" + " Dr. 
John Snow PER suggested that Fritz PER takes 5mg penicilin for his cough" ] }, "metadata": {} @@ -1402,28 +1462,31 @@ "id": "r6wMavwSnTDa", "colab": { "base_uri": "https://localhost:8080/", - "height": 151 + "height": 197 }, - "outputId": "45b8cf9f-a67a-4f2c-baf3-922beb640681" + "outputId": "24a0af92-6657-47fa-f910-750dff6424ff" }, "source": [ "data = 'Dr. John Snow suggested that Fritz takes 5mg penicilin for his cough'\n", "# Filter wich NER label to viz\n", "labels_to_viz=['SYMPTOM']\n", - "nlu.load('med_ner.jsl.wip.clinical').viz(data,viz_colors=viz_colors,labels_to_viz=labels_to_viz)" + "nlp.load('med_ner.jsl.wip.clinical').viz(data,viz_colors=viz_colors,labels_to_viz=labels_to_viz)" ], - "execution_count": 13, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1527,7 +1590,7 @@ }, "source": [ "# Specify visualization type manually\n", - "NLU tries to automatically infer a viz type if none is specified. 
\n", + "NLU tries to automatically infer a viz type if none is specified.\n", "You can manually specify which component to viz by setting `viz_type=type` for one type out of `ner,dep,resolution,relation`" ] }, @@ -1537,23 +1600,26 @@ "id": "Nw9OlQB26qd7", "colab": { "base_uri": "https://localhost:8080/", - "height": 143 + "height": 192 }, - "outputId": "5db17aaa-1f9c-4c62-d73c-c787c1efec9d" + "outputId": "db1a6576-6d6d-4861-8c95-e3de04d6ce4f" }, "source": [ "data = \"Donald Trump from America and Angela Merkel from Germany don't share many oppinions, but they both love John Snow Labs software!\"\n", - "nlu.load('ner').viz(data,viz_type='ner')" + "nlp.load('ner').viz(data,viz_type='ner')" ], - "execution_count": 14, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "onto_recognize_entities_sm download started this may take some time.\n", - "Approx size to download 160.1 MB\n", - "[OK!]\n" + "Approx size to download 159 MB\n", + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1643,7 +1709,7 @@ " }\n", "\n", "\n", - " Donald Trump PERSON from America GPE and Angela Merkel PERSON from Germany GPE don't share many oppinions, but they both love John Snow Labs ORG software!" + " Donald Trump PERSON from America GPE and Angela Merkel PERSON from Germany GPE don't share many oppinions, but they both love John Snow Labs ORG software!" 
] }, "metadata": {} @@ -1656,7 +1722,7 @@ "id": "t77NHAB6PegH" }, "source": [ - "# Viz Dependency " + "# Viz Dependency" ] }, { @@ -1665,21 +1731,22 @@ "id": "fmgUO8Eh9cks", "colab": { "base_uri": "https://localhost:8080/", - "height": 705 + "height": 750 }, - "outputId": "6acb7b66-76e3-408b-9775-d126f1dfcbc5" + "outputId": "a671e1ab-c89e-46fd-cb24-43a232bcaf1b" }, "source": [ - "import nlu\n", "data = \"Donald Trump from America and Angela Merkel from Germany don't share many oppinions, but they both love John Snow Labs software!\"\n", - "viz = nlu.load('dep.typed').viz(data,viz_type='dep')" + "viz = nlp.load('dep.typed').viz(data,viz_type='dep')" ], - "execution_count": 4, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "dependency_typed_conllu download started this may take some time.\n", "Approximate size to download 2.4 MB\n", "[OK!]\n", @@ -1688,7 +1755,8 @@ "[OK!]\n", "pos_anc download started this may take some time.\n", "Approximate size to download 3.9 MB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1702,7 +1770,7 @@ " font-family: \"Lucida\"; \n", " src: url(\"data:application/x-font-ttf;charset=utf-8;base64,\"); \n", "}\n", - "]]>DonaldNNPTrumpNNPfromINAmericaNNPandCCAngelaNNPMerkelNNPfromINGermanyNNPdon'tNNshareNNmanyJJoppinionsNNS,,butCCtheyPRPbothDTloveNNJohnNNPSnowNNPLabsNNPsoftwareNN!.<no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type><no-type>" + 
"]]>DonaldNNPTrumpNNPfromINAmericaNNPandCCAngelaNNPMerkelNNPfromINGermanyNNPdon'tNNshareNNmanyJJoppinionsNNS,,butCCtheyPRPbothDTloveNNJohnNNPSnowNNPLabsNNPsoftwareNN!.ccflatflatnsubjamodcaseflatflatnsubjflatflatcompoundflatflat<no-type>detflatflatflatflatdiscoursepunct" ] }, "metadata": {} @@ -1724,27 +1792,29 @@ "id": "Gfmn4hXa_nNd", "colab": { "base_uri": "https://localhost:8080/", - "height": 198 + "height": 244 }, - "outputId": "f52945c6-f488-467a-f7b2-c7df25481621" + "outputId": "8082845c-87eb-4041-c2f9-bc993ab63653" }, "source": [ - "import nlu\n", "data = \"Donald Trump and Angela Merkel from Germany don't share many oppinions, but they both fear cancer!\"\n", - "nlu.load('med_ner.jsl.wip.clinical').viz(data,viz_type='ner')" + "nlp.load('med_ner.jsl.wip.clinical').viz(data,viz_type='ner')" ], - "execution_count": 5, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1834,7 +1904,7 @@ " }\n", "\n", "\n", - " Donald Trump PER and Angela Merkel PER from Germany LOC don't share many oppinions, but they both fear cancer!" + " Donald Trump PER and Angela Merkel PER from Germany LOC don't share many oppinions, but they both fear cancer!" 
] }, "metadata": {} @@ -1856,34 +1926,40 @@ "id": "HuLmkCNcVV-R", "colab": { "base_uri": "https://localhost:8080/", - "height": 528 + "height": 663 }, - "outputId": "f17e6f48-33dc-4db1-a6db-291f897ee824" + "outputId": "c2325310-aed0-4920-857b-723d884ed295" }, "source": [ - "import nlu\n", "nlu_ref = 'med_ner.jsl.wip.clinical en.resolve.icd10cm'\n", "data = \"\"\"The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. Mom states she had no fever. Her appetite was good but she was spitting up a lot. She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion.\"\"\"\n", - "pipe = nlu.load(nlu_ref)\n", + "pipe = nlp.load(nlu_ref)\n", "viz = pipe.viz(data,viz_type='resolution')" ], - "execution_count": 8, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "sbiobertresolve_icd10cm download started this may take some time.\n", "[OK!]\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. 
Which should be columns from the following annotators: ['sentence_embeddings']\n", + "sbiobert_base_cased_mli download started this may take some time.\n", + "Approximate size to download 384.3 MB\n", + "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", "[OK!]\n", - "sbiobert_base_cased_mli download started this may take some time.\n", - "Approximate size to download 384.3 MB\n", - "[OK!]\n" + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_05f4bc95f0c5 expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -1973,7 +2049,7 @@ " }\n", "\n", "\n", - " The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. Mom states she had no fever. Her appetite was good but she was spitting up a lot. She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM MISCH93212 Auditory recruitment, left ear was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex LOCF650 Fetishism and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion." 
+ " The patient is a 5-month-old infant who presented initially on Monday with a cold, cough, and runny nose for 2 days. Mom states she had no fever. Her appetite was good but she was spitting up a lot. She had no difficulty breathing and her cough was described as dry and hacky. At that time, physical exam showed a right TM, which was red. Left TM MISCL02522 Furuncle left hand was okay. She was fairly congested but looked happy and playful. She was started on Amoxil and Aldex LOCQ843 Anonychia and we told to recheck in 2 weeks to recheck her ear. Mom returned to clinic again today because she got much worse overnight. She was having difficulty breathing. She was much more congested and her appetite had decreased significantly today. She also spiked a temperature yesterday of 102.6 and always having trouble sleeping secondary to congestion." ] }, "metadata": {} @@ -1995,32 +2071,38 @@ "id": "PGHhDU6QlO7z", "colab": { "base_uri": "https://localhost:8080/", - "height": 242 + "height": 353 }, - "outputId": "d050a730-2c03-4611-dbc8-e4ef66d2d260" + "outputId": "929a101b-0acf-4734-9dfa-fd5a8d5f6b00" }, "source": [ - "import nlu \n", "data = [\"\"\"He has a starvation ketosis but nothing found for significant for dry oral mucosa\"\"\"]\n", - "nlu.load('med_ner.jsl.wip.clinical resolve.icd10pcs').viz(data,viz_type='resolution' )" + "nlp.load('med_ner.jsl.wip.clinical resolve.icd10pcs').viz(data,viz_type='resolution' )" ], - "execution_count": 9, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "sbiobertresolve_icd10pcs download started this may take some time.\n", "[OK!]\n", + "setInputCols in ENTITY_7090bfd98dcf expecting 1 columns. 
Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "sbiobert_base_cased_mli download started this may take some time.\n", + "Approximate size to download 384.3 MB\n", + "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", "[OK!]\n", - "sbiobert_base_cased_mli download started this may take some time.\n", - "Approximate size to download 384.3 MB\n", - "[OK!]\n" + "setInputCols in ENTITY_7090bfd98dcf expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_7090bfd98dcf expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "setInputCols in ENTITY_7090bfd98dcf expecting 1 columns. Provided column amount: 2. Which should be columns from the following annotators: ['sentence_embeddings']\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -2132,32 +2214,35 @@ "id": "e4B8OtuqlPhb", "colab": { "base_uri": "https://localhost:8080/", - "height": 319 + "height": 361 }, - "outputId": "1c5fca1a-ae90-43c7-9ccb-2fbbc5b3e94f" + "outputId": "f0ef792b-b76d-41df-d5c4-f2a82314d2a6" }, "source": [ "nlu_ref = 'med_ner.jsl.wip.clinical assert'\n", "data = \"The patient was tested for cancer, but none was detected, he is free of cancer.\"\n", - "nlu.load(nlu_ref).viz(data,viz_type='assert')" + "nlp.load(nlu_ref).viz(data,viz_type='assert')" ], - "execution_count": 10, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", "assertion_dl download 
started this may take some time.\n", "[OK!]\n", + "embeddings_clinical download started this may take some time.\n", + "Approximate size to download 1.6 GB\n", + "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", "[OK!]\n", - "embeddings_clinical download started this may take some time.\n", - "Approximate size to download 1.6 GB\n", - "[OK!]\n" + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -2247,7 +2332,7 @@ " }\n", "\n", "\n", - " The patient MISCpresent was tested for cancer, but none was detected, he is free of cancer." + " The patient MISCpresent was tested for cancer, but none was detected, he is free of cancer." ] }, "metadata": {} @@ -2269,21 +2354,23 @@ "id": "KKBr-S5tlWlX", "colab": { "base_uri": "https://localhost:8080/", - "height": 342 + "height": 387 }, - "outputId": "7f4c4afd-9921-43c0-dcf4-196dee452f90" + "outputId": "e0e78958-7c91-4274-dc49-ffb3c1a46428" }, "source": [ "nlu_ref = 'med_ner.jsl.wip.clinical relation.temporal_events'\n", "data = \"He was advised chest X-ray or CT scan after checking his SpO2 which was <= 93%\"\n", - "pipe = nlu.load(nlu_ref).viz(data,viz_type='relation')" + "pipe = nlp.load(nlu_ref).viz(data,viz_type='relation')" ], - "execution_count": 11, + "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", "ner_wikiner_glove_840B_300 download started this may take some time.\n", "Approximate size to download 14.8 MB\n", "[OK!]\n", @@ -2291,7 +2378,8 @@ "[OK!]\n", "glove_840B_300 download started this may take some time.\n", "Approximate size to download 2.3 GB\n", - "[OK!]\n" + "[OK!]\n", + "Warning::Spark Session already created, some configs may not take.\n" ] }, { @@ -2305,12 +2393,21 @@ " font-family: \"Lucida\"; \n", " src: 
url(\"data:application/x-font-ttf;charset=utf-8;base64,\"); \n", "}\n", - "]]>HewasadvisedchestX-rayMISCorCTscanaftercheckinghisSpO2MISCwhichwas<=93%OVERLAP" + "]]>HewasadvisedchestX-rayMISCorCTscanaftercheckinghisSpO2MISCwhichwas<=93%OVERLAP" ] }, "metadata": {} } ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "3JdCqYL8ToXp" + }, + "execution_count": null, + "outputs": [] } ] } \ No newline at end of file