
Mainframe Condensed data #637

Open · manchikalapudi opened this issue Aug 8, 2023 · 3 comments
Labels: question (Further information is requested)

Comments

@manchikalapudi

Background

"Condensed" data refers to data that has been compressed or shortened to take up less space in memory. This is typically done by removing any unnecessary or redundant information from the data. For example, a condensed representation of a series of numbers might store only the differences between each number, rather than the actual values themselves. This can help reduce the amount of memory required to store the data, which can be especially important in systems with limited resources.

Question

Does Cobrix parse Condensed mainframe files?

manchikalapudi added the question (Further information is requested) label on Aug 8, 2023
@yruslan
Collaborator

yruslan commented Aug 10, 2023

Hi, no, it is not supported at the moment. We might add support depending on the complexity of the feature.

Could you please point me to some documentation with more details about the data format? Specifically:

  • How do condensed fields look in the copybook?
  • How is the compression/decompression performed?

@mark-weghorst

@yruslan I have a very similar requirement to the original issue raised, so I'm posting my use-case details here to get your advice on how to proceed. Here is what I'm trying to solve.

I have a packaged financial application that maintains a transactional journal in VBVR VSAM dataset(s). The datasets are somewhat large, and the system vendor has implemented a method by which unused sections of the copybook are skipped and not written to the VSAM.

My system vendor refers to this as compression, but @manchikalapudi referred to it as "condensed", which might be a better term. The data is not compressed with any compression codec; rather, entire sections of the copybook are skipped based on the contents of certain fields.

This has not been implemented using OCCURS DEPENDING ON; instead, entire sections of the copybook are skipped, and this can be different on each and every record.

Consider the following copybook (this simplified example has been adapted from our very complex copybook):

    01  TRAN-HIST.
        05 TR-LNGTH           PIC S9(04)  COMP.
        05 TR-ACCT-NUM        PIC 9(14) COMP-3.
        05 TR-EFF-DATE        PIC 9(9) COMP-3.
        05 TR-COM-DATA.
           10 TR-TRAN-CD       PIC X(4).
           10 TR-TRAN-TYPE     PIC X.
           10 TR-CAPTURE-BIT-CD PIC X.
           10 TR-CAPTURE-BIT-2-CD PIC X.
        05 TR-MONETARY-DATA.
           10 TR-TRAN-AMT      PIC S9(13)V99  COMP-3.
           10 TR-CHK-NUM       PIC X(10).
           10 RT-CURRNCY       PIC X(3).
           10 RT-CURRNCY-DEC   PIC X(01).
           10 TR-XCHG-IND      PIC X(01).
        05 TR-XCHG-DATA.
           10 TR-XCHG-CURRNCY  PIC X(03).
           10 TR-XCHG-AMT-DEC  PIC X(01).
           10 TR-XCHG-AMT      PIC S9(15)V99 COMP-3.
           10 TR-XCHG-RATE     PIC S9(07)V9(8) COMP-3.
           10 TR-XCHG-OVRD     PIC X(01).
        05 TR-CAPTURE-DATA.
           10 TR-ORIGIN        PIC X(4).
           10 TR-OPER          PIC X(8).
           10 TR-SYS-DATE      PIC 9(9) COMP-3.
           10 TR-ENTRY-TIME    PIC S9(7)  COMP-3.
           10 TR-BATCH         PIC S9(5)  COMP-3.
           10 TR-SEQ           PIC S9(5)  COMP-3.
           10 TR-COMMENT       PIC X(15).

TR-LNGTH, TR-ACCT-NUM, TR-EFF-DATE, and TR-COM-DATA are always populated on each and every record.

The TR-MONETARY-DATA group is only populated when TR-TRAN-TYPE is a C or D; otherwise this section is skipped and does not appear in the record (the bytes were never written to DASD).

The TR-XCHG-DATA group is only populated when TR-MONETARY-DATA.TR-XCHG-IND is a "Y"; otherwise this section is skipped and does not appear in the record (the bytes were never written to DASD).

The TR-CAPTURE-DATA group is always populated, but which members it contains depends on the TR-CAPTURE-BIT-CD and TR-CAPTURE-BIT-2-CD fields. In both cases, we need to extract the raw bits to derive 8 boolean flags. Based on these flags, we determine which fields to skip: for example, one bit controls whether TR-ORIGIN is present, another whether TR-OPER is present, and so on. If an individual bit is 0, then that field was never written to DASD.

Our actual use case is a lot more complicated, but I've tried to distill this down to a simpler representative example.
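To make the skipping rules concrete, here is a rough Scala sketch of the per-record decision logic. The group lengths and field offsets are my own hand-derivation from the simplified copybook above (packed-decimal size = digits / 2, rounded down, plus 1), and the EBCDIC code page and character constants are assumptions on my side:

    import java.nio.charset.Charset

    object CondensedRecordSketch {
      // Lengths derived by hand from the simplified copybook above.
      private val CommonLen   = 22  // TR-LNGTH + TR-ACCT-NUM + TR-EFF-DATE + TR-COM-DATA
      private val MonetaryLen = 23  // TR-MONETARY-DATA
      private val XchgLen     = 22  // TR-XCHG-DATA

      private val ebcdic = Charset.forName("Cp037")  // assumed code page

      private def ch(b: Byte): Char = new String(Array(b), ebcdic).charAt(0)

      /** Returns the optional group slices that are present, the capture flags, and the capture tail. */
      def splitGroups(record: Array[Byte]): (Option[Array[Byte]], Option[Array[Byte]], Seq[Boolean], Array[Byte]) = {
        val tranType = ch(record(19))              // TR-TRAN-TYPE (last part of TR-COM-DATA)
        var pos = CommonLen

        // TR-MONETARY-DATA is present only for transaction types C and D.
        val monetary =
          if (tranType == 'C' || tranType == 'D') {
            val slice = record.slice(pos, pos + MonetaryLen)
            pos += MonetaryLen
            Some(slice)
          } else None                              // group skipped, bytes simply absent

        // TR-XCHG-DATA is present only when TR-XCHG-IND (last byte of the monetary group) is 'Y'.
        val xchg = monetary match {
          case Some(m) if ch(m(MonetaryLen - 1)) == 'Y' =>
            val slice = record.slice(pos, pos + XchgLen)
            pos += XchgLen
            Some(slice)
          case _ => None
        }

        // TR-CAPTURE-DATA: which members are present is driven by the capture bit codes.
        // Only the raw flags are exposed here; the remaining bytes would be walked field
        // by field using them (e.g. bit 0 -> TR-ORIGIN present, bit 1 -> TR-OPER present, ...).
        val captureBits  = record(20) & 0xFF       // TR-CAPTURE-BIT-CD
        val presentFlags = (0 until 8).map(i => ((captureBits >> (7 - i)) & 1) == 1)
        val captureData  = record.slice(pos, record.length)

        (monetary, xchg, presentFlags, captureData)
      }
    }

Whether a skipped field ends up as null or as a decoded value would then be decided while walking TR-CAPTURE-DATA with those flags.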

At first I thought that generate_record_bytes might be useful when combined with a row processor, but I quickly realized that I would need to know how to access your lower-level methods and functions directly so that I could create a custom record parser.
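For context, the starting point I had in mind looked roughly like this. The generate_record_bytes option and the Record_Bytes column name are from memory of the README, so treat them as assumptions; VbvrRecordExtractor stands in for our own extractor from 2020, and splitGroups is the sketch above:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, udf}

    val spark = SparkSession.builder().getOrCreate()

    val df = spark.read
      .format("cobol")
      .option("copybook", "/path/to/tran_hist.cpy")                    // hypothetical path
      .option("record_extractor", "com.example.VbvrRecordExtractor")   // placeholder for our custom extractor
      .option("generate_record_bytes", "true")                         // assumed option name
      .load("/path/to/journal")

    // Hypothetical UDF wrapping the splitGroups sketch above, just to flag which
    // optional groups are present in each record.
    val monetaryPresent = udf((bytes: Array[Byte]) =>
      CondensedRecordSketch.splitGroups(bytes)._1.isDefined)

    df.withColumn("monetary_present", monetaryPresent(col("Record_Bytes"))).show()

But as I said, this only gets the raw bytes into a column; the per-field null-vs-value assignment is exactly where I would need your lower-level parsing code.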

I would be very interested in your advice on how to tackle this. When a section/field is skipped I would like to set its value to null; otherwise I want to assign the value from the conversion code.

Essentially, I need to decode sections of the file incrementally, and then use that data to know whether to skip or process data groups/fields. As I mentioned, this can differ on each and every record, and not all of the controlling variables are in the common (always present) section.

I'm sure you remember that in 2020 you and I collaborated on the VBVR support using a custom record extractor. Is there such a thing as a custom record parser, where I could write a class to do what I have described?

@yruslan
Collaborator

yruslan commented Jan 21, 2025

Thanks @mark-weghorst for the very detailed description of the issue!

I think support for such cases can be added. Several ideas come to mind; I will get back to you with a proposal.
