All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- User documentation now has an example showing how to install a library in Python
- The user interface now checks whether the input file, the output folder and the log folder exist or can be created
  - If one of them does not exist, the user stays in the main configuration window and an error pop-up opens
  - If one of the folders was created, this is reported in the error pop-up, or in an informative pop-up if there was no error
- The main function now properly checks whether the input file, output folder or log folder do not exist or cannot be created, and reports it in the terminal (and in the logs if available)
- User documentation now correctly uses `BETTER_ITEM_MAPS` instead of `BETTER_ITEM_NO_MAPS`
- Processings `BETTER_ITEM_DVD` & `BETTER_ITEM_MAPS` no longer look only for dates in the origin database physical description (method `_special_better_item()`)
  - This behaviour is now opt-in; by default the data is kept raw
- Documentation now states which library versions were used for development (due to issues with `PySimpleGUI` in this project and `pymarc` in another project)
- Now uses the `pyisbn` library
- Added error `ISBN_979_CAN_NOT_BE_CONVERTED` for ISBN 13 not starting with `978` when trying to convert them to ISBN 10
- Operations `SEARCH_IN_SUDOC_BY_ISBN`, `SEARCH_IN_SUDOC_DVD`, `SEARCH_IN_SUDOC_NO_ISBN` & `SEARCH_IN_SUDOC_MAPS` now have an additional last action that queries only the title on the title index (with the document type filter)
- More specific errors were added when the title, the authors, the publisher or the dates are missing but required for the matching process, and when the configured document type is not supported
- Now developed for:
  - Python 3.12 (3.11 before)
  - `unidecode 1.3.8` (`1.3.6` before)
  - `python-Levenshtein 0.25.1` (`0.23.0` before)
  - `requests 2.32.3` (`2.28.1` before)
  - `python-dotenv 1.0.1` (`1.0.0` before)
- Changed graphic user interface library from `PySimpleGUI 4.2.2` to `FreeSimpleGUI 5.1.0`
- Light grey color was replaced by cream
- Replaced ISBN functions from `Abes_id2ppn` with the `pyisbn` library
- Updated `Abes_id2ppn` to the latest version
- Split big files into smaller ones & reduced dependencies between them
  - Notably, `cl_UDE.py` can now be used alone, as originally intended
  - Added a `classes_dependencies.md` file explaining the dependencies
- Reworked `request_action` to avoid redundancy on Sudoc SRU, Koha SRU & Sudoc's id2ppn webservice
- Documentation (developer & user) was updated (user English documentation was added)
- Removed unnecessary personal Python test scripts and files
- Restored the mention of Alexandre Faure's original script (which I apparently deleted)
- Fixed various syntax warnings with regular expressions
- Fixed fatal errors when trying to convert ISBNs containing characters other than digits or `X`
- Fixed the functions returning instances from `Enum` (though I think the bug did not affect the main script)
- Fixed the interface not loading the saved language properly
- `Universal_Data_Extractor.extract_list_of_strings` method no longer crashes if a subfield has no values
- ASCII ranges (hexadecimal) `21-2F`, `3A-40`, `5B-60` and `7B-7F` added to the noise list
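A minimal sketch of a noise filter built from these four ranges. It assumes noise characters are replaced by a space and whitespace is then collapsed; the real noise list also contains other characters (such as the dashes and `°` mentioned elsewhere in this changelog), and the names `NOISE_PATTERN` and `delete_noise` are hypothetical.

```python
import re

# The four hexadecimal ASCII ranges added to the noise list
NOISE_RANGES = [(0x21, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7F)]

# Build a character class such as [\x21-\x2F\x3A-\x40...]; escaping every bound
# keeps "[", "]", "\" and "-" from being read as regex syntax
NOISE_PATTERN = re.compile(
    "[" + "".join(f"\\x{lo:02X}-\\x{hi:02X}" for lo, hi in NOISE_RANGES) + "]"
)


def delete_noise(text: str) -> str:
    """Replace noise characters with spaces (assumption), then collapse whitespace."""
    return " ".join(NOISE_PATTERN.sub(" ", text).split())


print(delete_noise("L'Étranger : roman (2e éd.)"))  # L Étranger roman 2e éd
```

Digits, letters (including accented ones) and spaces fall outside these ranges, so they are kept.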
- New action requesting Koha SRU only using title on the title index
- The operation
SEARCH_IN_KOHA_SRU_VANILLA
now queries as a last resort Koha SRU using only the title
- Transformations applied before request actions using titles, authors or publishers now delete every word that does not start with a letter or a number
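The word filter described above can be sketched as below, assuming words are whitespace-separated and that "letter or number" means any character for which Python's `str.isalnum()` is true (so accented letters are kept). The function name is hypothetical.

```python
def delete_words_not_starting_with_alnum(text: str) -> str:
    """Keep only the words whose first character is a letter or a digit."""
    # str.split() never yields empty strings, so word[0] is always defined
    return " ".join(word for word in text.split() if word[0].isalnum())


print(delete_words_not_starting_with_alnum("Éloge de l'ombre / Tanizaki"))  # Éloge de l'ombre Tanizaki
```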
- Reordered some columns in CSV output:
  - Document type is now between erroneous ISBN and titles
  - Authors are now between document type and titles
  - Matched IDs, database ID and current ID are moved after FCR processed ID
- Added more dashes to the noise deletion process (Unicode U+2010 to U+2015)
- New exportable data: piece (UNIMARC 463, internal name is `linking_piece`) and document type
- New action: query title, publisher and publication date in the `any` index in Koha SRU (added to `SEARCH_IN_KOHA_SRU_VANILLA` operation)
- Processing `MARC_FILE_IN_KOHA_SRU` now exports:
  - Physical descriptions for both records
  - Item barcodes for both records
  - Document type for both records
  - Piece for both records
- Koha SRU now uses Filter 1 to filter items and item barcode information
- Mapping `KOHA_ARCHIRES` now uses `b` as filtering subfield for items and item barcodes
- Added `collectif` & `collectifs` as empty words to delete
- Added `°` to the noise list
- Added `UN` to the list of deleted empty words
- Added `+` to the list of deleted noise
- Logs now properly write the target database value instead of writing the origin database twice
- JSON output exports much more precise data, including raw data from records
- Lists containing one element and empty elements are now properly output in CSV export
- CSV output no longer replaces `D` with spaces
- Lists of lists should no longer crash the CSV output function
- New action `ISBN2PPN_MODIFIED_ISBN_SAME_KEY` was added to `SEARCH_IN_SUDOC_BY_ISBN` operation after `ISBN2PPN_MODIFIED_ISBN`: they behave the same, except that the new one keeps the check digit of the original input ISBN instead of recomputing it
- New FCR processed ID with the form `XXXXXZYYY`:
  - `XXXXX`: record index of the file being processed, with leading `0` (always 5 characters long)
  - `Z`:
    - `Z`: failed before getting origin database record
    - `O`: origin database record retrieved, failed before getting matched records
    - `M`: successfully matched records, failed before looping through matched records
    - `Y`: failed before getting the target database record
    - `T`: failed to analyze the target database record
    - `A`: successfully did all the checks
  - `YYY`: matched record index, `XXX` if it failed before looping through matched records
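The processed ID layout above can be illustrated with a small builder and parser. This is a hypothetical sketch, not the project's implementation: the function names are invented, and it assumes `YYY` is a zero-padded 3-digit index and that both indexes fit in their fixed widths.

```python
from __future__ import annotations

# Letters allowed in the Z position, as listed above
STEPS = {
    "Z": "failed before getting origin database record",
    "O": "origin database record retrieved, failed before getting matched records",
    "M": "successfully matched records, failed before looping through matched records",
    "Y": "failed before getting the target database record",
    "T": "failed to analyze the target database record",
    "A": "successfully did all the checks",
}


def build_processed_id(record_index: int, step: str, matched_index: int | None = None) -> str:
    """Build an XXXXXZYYY ID; matched_index is None before looping through matched records."""
    if step not in STEPS:
        raise ValueError(f"Unknown step letter: {step!r}")
    if not 0 <= record_index <= 99_999:
        raise ValueError("Record index does not fit in 5 characters")
    if matched_index is None:
        matched = "XXX"
    elif 0 <= matched_index <= 999:
        matched = f"{matched_index:03d}"
    else:
        raise ValueError("Matched record index does not fit in 3 characters")
    return f"{record_index:05d}{step}{matched}"


def parse_processed_id(processed_id: str) -> tuple[int, str, int | None]:
    """Split an XXXXXZYYY ID back into (record index, step letter, matched index or None)."""
    if len(processed_id) != 9 or processed_id[5] not in STEPS:
        raise ValueError(f"Malformed processed ID: {processed_id!r}")
    matched = processed_id[6:]
    return int(processed_id[:5]), processed_id[5], None if matched == "XXX" else int(matched)


print(build_processed_id(42, "A", 1))  # 00042A001
print(parse_processed_id("00007OXXX"))  # (7, 'O', None)
```

Keeping the ID fixed-width means it sorts by record index, then by step letter, when read as plain text.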
- Errors are now translated
- Internal changes on the management of processings, operations, databases, UI screens, screen tabs and errors
- Main function and classes are less dependent on the big execution settings variable
- The function exporting lists as strings no longer replaces `1` with a space
- Action `ISBN2PPN_MODIFIED_ISBN` now properly queries the modified ISBN instead of the original one
- Actions `ISBN2PPN_MODIFIED_ISBN` and `ISBN2PPN_MODIFIED_ISBN_SAME_KEY` now return a new error if a modified ISBN could not be created
- Old configuration script `define_default_settings_GUI.py`
- New extractable data : exported to digital library, maps horizontal scales, maps mathematical data, series, series link, geographical subject
- Processing `MARC_FILE_IN_KOHA_SRU` now exports whether the record was exported to the digital library
- New processings: `BETTER_ITEM_NO_ISBN` & `BETTER_ITEM_MAPS`
- Duplicated the Sudoc SRU actions limited to the `V` document type to query `B` and `K`
- Processing `BETTER_ITEM_DVD` now properly applies `BETTER_ITEM`'s special transformation before exporting data to CSV
- Updated `Koha_SRU` & `Abes_SRU` versions, including a fix preventing crashes when the SRU returned a `numberOfRecords` value that could not be converted to an integer
- `list_as_string()` now returns values separated by `,` without enclosing them in `[]`
- Fixed records missing general processing data crashing the CSV export function, which prevented all data from being exported
- Fixed crashes when applying regular expressions to some strings containing Unicode format characters after merging them with `list_as_string()`
- Report is more precise
- Results files are named `results` instead of `resultats`, and the report file is now called `results_report`
- Report is back
- `outputing.py` (moved its final function inside the `Report` class in `fcr_classes.py`)
- `scripts` folder, as nothing was in it anymore
- New file in user documentation explaining the exported data (only calculated fields)
- New processing analysing a local MARC file and querying a Koha SRU (`MARC_FILE_IN_KOHA_SRU`)
- Added actions for Koha SRU:
  - ISBN
  - Title, author, publisher and date using their own indexes
  - Title, author and date using their own indexes
  - Title, author, publisher and date using the `any` index
  - Title, author and date using the `any` index
- Added a new operation to query Koha SRU: `SEARCH_IN_KOHA_SRU_VANILLA`
- Added ISBN as extractable data from records
- Added `Utils` methods to `Database_Records` to get the first ISBN and the first EAN as a string
- Added `Utils` methods to `Database_Records` to get the other database IDs
- Added a new default MARC field mapping
- `OTHER_DB_IN_LOCAL_DB` was renamed `MARC_FILE_IN_KOHA_SRU`
- Updated `Koha_SRU` and `Koha_API_PublicBiblio` versions
- Filters are now properly implemented
- Records are now retrieved from the configured databases instead of always from Koha (origin database) and Sudoc (target database)
- Fixed the processing displayed in the Processing configuration screen, which did not update when the selected processing was changed in the main screen
- Fixed `Database_Record.utils` methods crashing the application if some data was not extracted from the record (or if no title is returned)
- Fixed some errors in the main function not being logged
- Fixed MARC field mapping using only `ORIGIN_DATABASE` and `TARGET_DATABASE`
- Fixed the universal data extractor not extracting anything from single-line coded data when the position range extends beyond the length of the field content
- Fixed crashes when other database IDs were not exported for the target database: a new value `SKIPPED` is used in those cases
- Authors are exported in processing `BETTER_ITEM`
- Changed Koha SRU connector version
- ID validation and analysis result now output text instead of internal codes
- Changed some columns names
- Fixed CSV column names configuration file browse type
- Fixed the `BETTER_ITEM` "target database has items" data not displaying the correct information
- Added changelog file
- New processing `BETTER_ITEM_DVD`
- New actions:
  - EAN to PPN
  - Sudoc SRU using title, authors, publishers and dates with the audiovisual document type
  - Sudoc SRU using title, authors and dates with the audiovisual document type
  - Sudoc SRU in all indexes using title, authors, publishers and dates with the audiovisual document type
  - Sudoc SRU in all indexes using title, authors and dates with the audiovisual document type
  - Sudoc SRU in all indexes using title, authors and publishers with the audiovisual document type
- New functions to:
  - Delete CBS boolean operators
  - Delete Sudoc empty words
  - Use both of the previous at the same time
  - Delete duplicate words in a string
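A minimal sketch of these four helpers. The word sets are assumptions: the operators are a guess at CBS query syntax, and the empty words are only the ones named in this changelog (`collectif`, `collectifs`, `UN`), not the full Sudoc list. Comparisons are assumed to be case-insensitive, and all names are hypothetical.

```python
# Assumed CBS boolean operators; check the Sudoc/CBS query documentation for the real list
CBS_BOOLEAN_OPERATORS = {"and", "or", "not"}
# Illustrative subset only: the empty words named in this changelog
SUDOC_EMPTY_WORDS = {"collectif", "collectifs", "un"}


def _delete_words(text: str, words: set) -> str:
    """Remove every whitespace-separated word found in `words` (case-insensitive)."""
    return " ".join(word for word in text.split() if word.casefold() not in words)


def delete_cbs_boolean_operators(text: str) -> str:
    return _delete_words(text, CBS_BOOLEAN_OPERATORS)


def delete_sudoc_empty_words(text: str) -> str:
    return _delete_words(text, SUDOC_EMPTY_WORDS)


def delete_cbs_boolean_operators_and_empty_words(text: str) -> str:
    # One pass over the union of both sets
    return _delete_words(text, CBS_BOOLEAN_OPERATORS | SUDOC_EMPTY_WORDS)


def delete_duplicate_words(text: str) -> str:
    """Keep the first occurrence of each word, compared case-insensitively."""
    seen = set()
    kept = []
    for word in text.split():
        key = word.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(word)
    return " ".join(kept)


print(delete_cbs_boolean_operators_and_empty_words("Un ouvrage collectif and notes"))  # ouvrage notes
print(delete_duplicate_words("Paris paris Lyon Paris"))  # Paris Lyon
```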
- `Class Database_Record` now has a `utils` property with methods returning formatted data
- New environment variable `CSV_OUTPUT_JSON_CONFIG_PATH`, selectable in the UI
- Log level is selectable in the UI
- Used action is now exportable
- New `json_configs` file: `csv_cols.json` (and a personalised `csv_cols_BETTER_ITEM.json`)
- Documentation on adding actions, operations and processings is clearer
- The noise deletion function deletes more noise
- Authors data defaults to fields `700`, `701`, `702`, `710`, `711` and `712`
- Some output functions have been moved to a new `output` property of `Class Original_Record`
- Internal changes to error management
- Internal changes to logs, notably updating outdated logged information and logging the query used only once
- Reworked CSV export
- Changed `id2ppn.py` version to fix an incorrect status and an incorrect returned value when only one record matched with JSON returned data
- Changed `Sudoc_SRU.py` version to fix XML parse errors when some queries using angle brackets were not properly transformed in Sudoc's response
- Creates the output folder if it does not exist
- The used query output now displays the actual query used
- File `csv_export_cols.json`
- `logs.py` (moved its function inside the `Logger` subclass in `fcr_classes.py`)
- Default analysis is now Titles 80% (3 out of 4), publishers 80%, dates
- Checking whether the ID from the origin database is in the target's list of other database IDs is now more precise (see `Enum Other_Database_Id_In_Target` in `fcr_enum.py`)
- Global validation now has 2 output columns: the previous one now returns values from `Enum Analysis_Final_Results` in `fcr_enum.py`, and the new one returns the number of successful checks
- Values filled in the UI without being saved are now properly sent to the main script, instead of using saved values
- Multiple matches are now all retrieved instead of throwing a fake error
- Does not check whether the biblionumber differs from those in the Sudoc if no biblionumbers are found in the Sudoc (previous behaviour)
- Removed the fake error concept
- Graphic user interface has been entirely remade and can now configure most configurations
- French user documentation updated
- Fixed an error on unknown ISBN in `isbn2ppn`
- Test files for the universal data extractor
- Application name is now Find and Compare Records
- List output in the CSV file is now an empty cell if the list is empty, or just the value if the list contains only one value
- Internal changes, notably centralizing data export functions so they are universal
- Technical documentation about the universal data extractor
- Technical documentation for the matching records part
- Can now query Sudoc's SRU
- Now queries Sudoc's SRU using ISBN if both queries on `isbn2ppn` failed
- Internal changes to start implementing classes
- Report is now broken
- Outputs erroneous ISBN from Koha & Sudoc
- Part of the settings were moved from `settings.json` to a `.env` file
- Changed `Koha_API_PublicBiblio.py` version to fix crashes when a Koha record had only empty subfields in the title
- Changed `Abes_isbn2ppn.py` version to fix invalid ISBN returned values
- Output years found inside Koha `215$a`
- `Abes_isbn2ppn.py` is now queried with a converted version of the ISBN (10 to 13 or the other way) if the first query did not return a result
- Output notes about edition for both Koha and Sudoc
- Internal changes to the matching records process
- Correctly outputs every lines, even if an error occurred
- Fixed Koha API encoding problem when using JSON
- Added a graphic user interface
- Now outputs:
  - PPN already present in Koha record
  - Items information already present in Sudoc record
- RCR can now be configured
- Internal changes (to a lot of things)
- Creating the report no longer crashes the script
- Correctly catches all errors instead of just HTTP ones
- Output local system number present in Sudoc record
- ILN can now be configured
- Documentation
- A new configuration file `settings.json`
- Internal changes, notably in the CSV export and better error handling
- Prevent crashes if no publisher was found in Koha or Sudoc record
- Correctly increases ISBN to PPN success for the report
- Correctly computes the number of unique matches in the report
- Released version for first executions