zingg-0.3.4-SNAPSHOT-spark-3.1.2
Lots of goodies in this release: a Python interface, stop words, and new match types.
What's Changed
- Extract Stop words by @navinrathore in #186
- CI: Executing maven compile at each commit by @edmondo1984 in #189
- Introduce CodeQL pipeline on each commit by @edmondo1984 in #190
- New InMemoryPipe has been added by @navinrathore in #209
- Changes to support databases that use Jdbc driver by @navinrathore in #214
- Data processing for Stop Words removal by @navinrathore in #191
- MatchType 'DONOT USE' updated in the docs and TC added by @navinrathore in #222
- Added new format 'bigquery' by @navinrathore in #233
- Documentation for BigQuery connector by @navinrathore in #236
- Renamed ZINGG_ARGS_EXTRA to ZINGG_EXTRA_SPARK_CONF and ZINGG_EXTRA to ZINGG_EXTRA_JARS by @navinrathore in #239
- Setting-Up Zingg Development Environment by @Aditya-R-Chakole in #240
- Documenter refactoring and handling error 'Path does not exist' by @navinrathore in #223
- Bump jackson-databind from 2.10.0 to 2.12.6.1 in client/pom.xml by @navinrathore in #195
- Bump poi-scratchpad from 3.16 to 5.2.1 in /client by @dependabot in #167
- From original data, select fields whose definition is provided in config to be written in output by @navinrathore in #217
- Removed dependencies of snowflake, mysql, cassandra, elastic, apache-poi, log4j from pom by @navinrathore in #248
- Exception handling in PipeUtil::read() by @navinrathore in #229
- Z columns doc to have different template by @navinrathore in #257
- GenerateDocs becomes independent of Data by @navinrathore in #270
- Working with StopWord file if its header does not include the column 'StopWord' by @navinrathore in #274
- specify path for ZINGG_HOME by @chetan453 in #280
- Updated installation document to install maven using sudo apt by @navinrathore in #282
- blockSize - a new config parameter for max size of the block by @navinrathore in #272
- moved getRecords & setRecords from InMemoryPipe to Pipe by @chetan453 in #286
- Revert "moved getRecords & setRecords from InMemoryPipe to Pipe" by @sonalgoyal in #288
- Revert "Revert "moved getRecords & setRecords from InMemoryPipe to Pipe"" by @sonalgoyal in #294
- resolved errors by @chetan453 in #293
- Match type pin code by @RavirajBaraiya in #290
- Removed EMAIL, LICENSE, SPARK_MEM and elastic references from zingg.sh by @navinrathore in #253
- Match type email by @RavirajBaraiya in #291
- Documenter testcases by @navinrathore in #281
- More blocking functions by @navinrathore in #292
- Checking if default zinggDir exists else create it by @navinrathore in #297
- moved config files for junits by @chetan453 in #298
- support for python phases in zingg.sh by @navinrathore in #301
- release 0.3.4 by @navinrathore in #305
- Python User script support in zingg.sh by @navinrathore in #311
- python unit at compilation time by @RavirajBaraiya in #318
- Updates in Python classes and 'assessModel' python phase by @navinrathore in #313
- env variables can be defined in zingg.conf in addition to spark properties by @navinrathore in #303
- new phase PeekModel by @RavirajBaraiya in #319
- added api,python,config dirs into distribution package by @navinrathore in #323
- added csv for testPeekModel by @RavirajBaraiya in #324
- new python phase exportModel by @RavirajBaraiya in #325
- API changes issue part 2,4,5 by @RavirajBaraiya in #326
- rename matchtype dont use to dont_use by @RavirajBaraiya in #328
- modified TestDSUtil by @RavirajBaraiya in #331
- Python API - Specialized Pipes for Snowflake, BigQuery etc. by @navinrathore in #327
- formatting of help message by @RavirajBaraiya in #332
- Documenter changes issue #335 by @RavirajBaraiya in #343
- pip package artifacts by @RavirajBaraiya in #344
- Revert "release 0.3.4" by @sonalgoyal in #345
- zingg pip package artifacts modification by @RavirajBaraiya in #356
- Null Pointer check in "Range" hash functions by @navinrathore in #350
- proper handling of case when zingg config file does not exist by @navinrathore in #358
- python api dir deleted and moved FebrlExample.py by @RavirajBaraiya in #362
- Removed test involving reading generated file by @navinrathore in #368
- To fix Databricks UserWarning: DataFrame constructor is internal... by @navinrathore in #373
- getUnmarkedRecords() - updated to the version with correct functionality and fixed its name by @navinrathore in #372
- inmemorypipe accepts pandas df by @navinrathore in #376
- fixed the broken link for pipes.md by @shefalika-thapa in #381
- Tests for getAs() for Double, Integer types by @navinrathore in #375
- python examples by @RavirajBaraiya in #382
- python api doc by @RavirajBaraiya in #385
- Added specific pipes property constants by @navinrathore in #374
- testWriteArgumentObjectToJSONFile class modification by @RavirajBaraiya in #387
- Double similarity function - null pointer exception by @navinrathore in #369
- MANIFEST.in changes - only add febrl and amazonGoogle example by @RavirajBaraiya in #394
- modification for jar and deps issue #308 by @RavirajBaraiya in #398
- added stopword functionality in zingg FieldDefinition by @RavirajBaraiya in #392
- recommender phase issue #336 by @RavirajBaraiya in #399
- TCs for String Similarity Distance function by @navinrathore in #371
- added python script to run all with febrl example python unittest by @RavirajBaraiya in #395
- Csvpipe by @RavirajBaraiya in #402
- modification according to pipes changes by @RavirajBaraiya in #405
- Pipe by @RavirajBaraiya in #406
- testGetAs changes by @RavirajBaraiya in #410
- Pipes changes issue #401 by @RavirajBaraiya in #411
- Revert "Pipe" by @sonalgoyal in #412
- setStopword changes by @RavirajBaraiya in #413
- Removed Format type by @RavirajBaraiya in #409
New Contributors
- @edmondo1984 made their first contribution in #189
- @Aditya-R-Chakole made their first contribution in #240
- @chetan453 made their first contribution in #280
- @RavirajBaraiya made their first contribution in #290
- @shefalika-thapa made their first contribution in #381
Full Changelog: v0.3.3...v0.3.4