draft for path and path_list #513

Merged 39 commits, Jan 31, 2024

muescha commented Jan 4, 2024

Just playing around with how to implement --path-list.

TODO:

  • tests
  • _process to exclude
  • fix docs
  • add to readme (is done by readmegen.py)
  • examples
  • refactor to use pathlib
  • for windows path

muescha commented Jan 4, 2024

this is working:

echo "/abc/def/gh.txt:/xyz/uvw/ab.app" \
| jc --path-list -p
[
  {
    "url": "/abc/def/gh.txt",
    "scheme": null,
    "netloc": null,
    "path": "/abc/def/gh.txt",
    "parent": "/abc/def",
    "filename": "gh.txt",
    "stem": "gh",
    "extension": "txt",
    "path_list": [
      "abc",
      "def",
      "gh.txt"
    ],
    "query": null,
    "query_obj": null,
    "fragment": null,
    "username": null,
    "password": null,
    "hostname": null,
    "port": null,
    "encoded": {
      "url": "/abc/def/gh.txt",
      "scheme": null,
      "netloc": null,
      "path": "/abc/def/gh.txt",
      "parent": "/abc/def",
      "filename": "gh.txt",
      "stem": "gh",
      "extension": "txt",
      "path_list": [
        "abc",
        "def",
        "gh.txt"
      ],
      "query": null,
      "fragment": null,
      "username": null,
      "password": null,
      "hostname": null,
      "port": null
    },
    "decoded": {
      "url": "/abc/def/gh.txt",
      "scheme": null,
      "netloc": null,
      "path": "/abc/def/gh.txt",
      "parent": "/abc/def",
      "filename": "gh.txt",
      "stem": "gh",
      "extension": "txt",
      "path_list": [
        "abc",
        "def",
        "gh.txt"
      ],
      "query": null,
      "fragment": null,
      "username": null,
      "password": null,
      "hostname": null,
      "port": null
    }
  },
  {
    "url": "/xyz/uvw/ab.app",
    "scheme": null,
    "netloc": null,
    "path": "/xyz/uvw/ab.app",
    "parent": "/xyz/uvw",
    "filename": "ab.app",
    "stem": "ab",
    "extension": "app",
    "path_list": [
      "xyz",
      "uvw",
      "ab.app"
    ],
    "query": null,
    "query_obj": null,
    "fragment": null,
    "username": null,
    "password": null,
    "hostname": null,
    "port": null,
    "encoded": {
      "url": "/xyz/uvw/ab.app",
      "scheme": null,
      "netloc": null,
      "path": "/xyz/uvw/ab.app",
      "parent": "/xyz/uvw",
      "filename": "ab.app",
      "stem": "ab",
      "extension": "app",
      "path_list": [
        "xyz",
        "uvw",
        "ab.app"
      ],
      "query": null,
      "fragment": null,
      "username": null,
      "password": null,
      "hostname": null,
      "port": null
    },
    "decoded": {
      "url": "/xyz/uvw/ab.app",
      "scheme": null,
      "netloc": null,
      "path": "/xyz/uvw/ab.app",
      "parent": "/xyz/uvw",
      "filename": "ab.app",
      "stem": "ab",
      "extension": "app",
      "path_list": [
        "xyz",
        "uvw",
        "ab.app"
      ],
      "query": null,
      "fragment": null,
      "username": null,
      "password": null,
      "hostname": null,
      "port": null
    }
  }
]

With slurp, this is also working:

echo "/abc/def/gh.txt:/xyz/uvw/ab.app\n/def/hij/klm.txt:/efe/app.txt" \
| jc --path-list -p -s | jq ".[][] .path"
"/abc/def/gh.txt"
"/xyz/uvw/ab.app"
"/def/hij/klm.txt"
"/efe/app.txt"
echo "/abc/def/gh.txt:/xyz/uvw/ab.app\n/def/hij/klm.txt:/efe/app.txt" \
| jc --path-list -p -s
[
  [
    {
      "url": "/abc/def/gh.txt",
      "scheme": null,
      "netloc": null,
      "path": "/abc/def/gh.txt",
      "parent": "/abc/def",
      "filename": "gh.txt",
      "stem": "gh",
      "extension": "txt",
      "path_list": [
        "abc",
        "def",
        "gh.txt"
      ],
      "query": null,
      "query_obj": null,
      "fragment": null,
      "username": null,
      "password": null,
      "hostname": null,
      "port": null,
      "encoded": {
        "url": "/abc/def/gh.txt",
        "scheme": null,
        "netloc": null,
        "path": "/abc/def/gh.txt",
        "parent": "/abc/def",
        "filename": "gh.txt",
        "stem": "gh",
        "extension": "txt",
        "path_list": [
          "abc",
          "def",
          "gh.txt"
        ],
        "query": null,
        "fragment": null,
        "username": null,
        "password": null,
        "hostname": null,
        "port": null
      },
      "decoded": {
        "url": "/abc/def/gh.txt",
        "scheme": null,
        "netloc": null,
        "path": "/abc/def/gh.txt",
        "parent": "/abc/def",
        "filename": "gh.txt",
        "stem": "gh",
        "extension": "txt",
        "path_list": [
          "abc",
          "def",
          "gh.txt"
        ],
        "query": null,
        "fragment": null,
        "username": null,
        "password": null,
        "hostname": null,
        "port": null
      }
    },
    {
      "url": "/xyz/uvw/ab.app",
      "scheme": null,
      "netloc": null,
      "path": "/xyz/uvw/ab.app",
      "parent": "/xyz/uvw",
      "filename": "ab.app",
      "stem": "ab",
      "extension": "app",
      "path_list": [
        "xyz",
        "uvw",
        "ab.app"
      ],
      "query": null,
      "query_obj": null,
      "fragment": null,
      "username": null,
      "password": null,
      "hostname": null,
      "port": null,
      "encoded": {
        "url": "/xyz/uvw/ab.app",
        "scheme": null,
        "netloc": null,
        "path": "/xyz/uvw/ab.app",
        "parent": "/xyz/uvw",
        "filename": "ab.app",
        "stem": "ab",
        "extension": "app",
        "path_list": [
          "xyz",
          "uvw",
          "ab.app"
        ],
        "query": null,
        "fragment": null,
        "username": null,
        "password": null,
        "hostname": null,
        "port": null
      },
      "decoded": {
        "url": "/xyz/uvw/ab.app",
        "scheme": null,
        "netloc": null,
        "path": "/xyz/uvw/ab.app",
        "parent": "/xyz/uvw",
        "filename": "ab.app",
        "stem": "ab",
        "extension": "app",
        "path_list": [
          "xyz",
          "uvw",
          "ab.app"
        ],
        "query": null,
        "fragment": null,
        "username": null,
        "password": null,
        "hostname": null,
        "port": null
      }
    }
  ],
  [
    {
      "url": "/def/hij/klm.txt",
      "scheme": null,
      "netloc": null,
      "path": "/def/hij/klm.txt",
      "parent": "/def/hij",
      "filename": "klm.txt",
      "stem": "klm",
      "extension": "txt",
      "path_list": [
        "def",
        "hij",
        "klm.txt"
      ],
      "query": null,
      "query_obj": null,
      "fragment": null,
      "username": null,
      "password": null,
      "hostname": null,
      "port": null,
      "encoded": {
        "url": "/def/hij/klm.txt",
        "scheme": null,
        "netloc": null,
        "path": "/def/hij/klm.txt",
        "parent": "/def/hij",
        "filename": "klm.txt",
        "stem": "klm",
        "extension": "txt",
        "path_list": [
          "def",
          "hij",
          "klm.txt"
        ],
        "query": null,
        "fragment": null,
        "username": null,
        "password": null,
        "hostname": null,
        "port": null
      },
      "decoded": {
        "url": "/def/hij/klm.txt",
        "scheme": null,
        "netloc": null,
        "path": "/def/hij/klm.txt",
        "parent": "/def/hij",
        "filename": "klm.txt",
        "stem": "klm",
        "extension": "txt",
        "path_list": [
          "def",
          "hij",
          "klm.txt"
        ],
        "query": null,
        "fragment": null,
        "username": null,
        "password": null,
        "hostname": null,
        "port": null
      }
    },
    {
      "url": "/efe/app.txt",
      "scheme": null,
      "netloc": null,
      "path": "/efe/app.txt",
      "parent": "/efe",
      "filename": "app.txt",
      "stem": "app",
      "extension": "txt",
      "path_list": [
        "efe",
        "app.txt"
      ],
      "query": null,
      "query_obj": null,
      "fragment": null,
      "username": null,
      "password": null,
      "hostname": null,
      "port": null,
      "encoded": {
        "url": "/efe/app.txt",
        "scheme": null,
        "netloc": null,
        "path": "/efe/app.txt",
        "parent": "/efe",
        "filename": "app.txt",
        "stem": "app",
        "extension": "txt",
        "path_list": [
          "efe",
          "app.txt"
        ],
        "query": null,
        "fragment": null,
        "username": null,
        "password": null,
        "hostname": null,
        "port": null
      },
      "decoded": {
        "url": "/efe/app.txt",
        "scheme": null,
        "netloc": null,
        "path": "/efe/app.txt",
        "parent": "/efe",
        "filename": "app.txt",
        "stem": "app",
        "extension": "txt",
        "path_list": [
          "efe",
          "app.txt"
        ],
        "query": null,
        "fragment": null,
        "username": null,
        "password": null,
        "hostname": null,
        "port": null
      }
    }
  ]
]

muescha commented Jan 4, 2024

I know the raw_output type and the _process type are wrong.
Maybe I don't need _process at all; I only carried it over because there is an empty _process in url.py.

muescha commented Jan 4, 2024

Would it make sense to have a path parser that uses the url parser and then, in _process, strips the output down to only the fields needed for a path?

Do we need encoded and decoded?

    [
      {
        "path": "/abc/def/gh.txt",
        "parent": "/abc/def",
        "filename": "gh.txt",
        "stem": "gh",
        "extension": "txt",
        "path_list": [
          "abc",
          "def",
          "gh.txt"
        ],
        "encoded": {
          "path": "/abc/def/gh.txt",
          "parent": "/abc/def",
          "filename": "gh.txt",
          "stem": "gh",
          "extension": "txt",
          "path_list": [
            "abc",
            "def",
            "gh.txt"
          ]
        },
        "decoded": {
          "path": "/abc/def/gh.txt",
          "parent": "/abc/def",
          "filename": "gh.txt",
          "stem": "gh",
          "extension": "txt",
          "path_list": [
            "abc",
            "def",
            "gh.txt"
          ]
        }
      }
    ]

muescha commented Jan 4, 2024

Edge case:

  • I think on Windows the path delimiter is ; and an entry can start with a drive name, e.g. C:\Program Files;C:\Winnt;C:\Winnt\System32?
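
A minimal sketch of this edge case, assuming a hypothetical helper `split_windows_path_list` (it is not part of jc). It only illustrates why a `:` split cannot work for Windows-style lists: the colon also follows the drive letter, so `;` has to be the separator.

```python
from pathlib import PureWindowsPath
from typing import List


def split_windows_path_list(value: str) -> List[PureWindowsPath]:
    """Hypothetical helper: split a Windows-style path list on ';'."""
    # Empty entries (e.g. from a trailing ';') are skipped.
    return [PureWindowsPath(item) for item in value.split(";") if item]


paths = split_windows_path_list(r"C:\Program Files;C:\Winnt;C:\Winnt\System32")
drives = [p.drive for p in paths]  # drive part of every entry
names = [p.name for p in paths]    # last component of every entry

# For comparison: a ':' split cuts each entry apart after its drive letter.
naive = r"C:\Program Files;C:\Winnt".split(":")
```

`PureWindowsPath` is used here so the sketch behaves the same on any platform; it never touches the file system.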

muescha commented Jan 4, 2024

73bbfac: added _process to remove some fields (with --raw all the fields are visible)

muescha commented Jan 4, 2024

Edge Case:

  • there are also paths like ./path/to/file.txt?
echo "./path/to/file.txt" | jc --path-list -p

Note: the path_list now has a .path entry:

[
  {
    "path": "./path/to/file.txt",
    "parent": "path/to",
    "filename": "file.txt",
    "stem": "file",
    "extension": "txt",
    "path_list": [
      ".path",
      "to",
      "file.txt"
    ]
  }
]

the same with ~ and ..:

echo "~/path/to/file.txt" | jc --path-list -p
[
  {
    "path": "~/path/to/file.txt",
    "parent": "~/path/to",
    "filename": "file.txt",
    "stem": "file",
    "extension": "txt",
    "path_list": [
      "~path",
      "to",
      "file.txt"
    ]
  }
]
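
For comparison, a short sketch of how `pathlib` itself splits these relative forms. This is plain standard-library behavior, not the parser from this PR: a leading `./` is dropped, while `~` and `..` are kept as their own components.

```python
from pathlib import PurePosixPath

relative = PurePosixPath("./path/to/file.txt")
home = PurePosixPath("~/path/to/file.txt")
up = PurePosixPath("../path/to/file.txt")

# The leading "." is normalized away; "~" and ".." stay separate parts
# instead of being glued to the next component.
relative_parts = list(relative.parts)
home_parts = list(home.parts)
up_parts = list(up.parts)

relative_parent = str(relative.parent)
home_parent = str(home.parent)
```

Note that `~` is not expanded by the pure path classes; it is treated as an ordinary directory name.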

Comment on lines 295 to 303
raw_output: List[Dict] = []
if jc.utils.has_data(data):
    for line in data.split(":"):
        parsed_line = url.parse(
            line,
            raw=raw,
            quiet=quiet
        )
        raw_output.append(parsed_line)
Contributor Author

Suggestion from ChatGPT: it is shorter, but is it more readable?

Suggested change
- raw_output: List[Dict] = []
- if jc.utils.has_data(data):
-     for line in data.split(":"):
-         parsed_line = url.parse(
-             line,
-             raw=raw,
-             quiet=quiet
-         )
-         raw_output.append(parsed_line)
+ raw_output = [
+     url.parse(line, raw=raw, quiet=quiet)
+     for line in data.split(":")
+     if jc.utils.has_data(data)
+ ]
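
One way to weigh the two variants is sketched below. `has_data` and `parse_one` are stand-ins written for this example (assumptions, not the real `jc.utils.has_data` or `url.parse`). In the suggested comprehension the guard is re-evaluated for every item; hoisting it keeps the short form while checking the input only once.

```python
from typing import Dict, List


def has_data(data: str) -> bool:
    # Stand-in for jc.utils.has_data: False for empty or whitespace-only input.
    return bool(data and not data.isspace())


def parse_one(line: str) -> Dict:
    # Stand-in for url.parse: returns a trivial dict for illustration.
    return {"path": line}


def parse_loop(data: str) -> List[Dict]:
    # Original style: explicit loop behind a single guard.
    raw_output: List[Dict] = []
    if has_data(data):
        for line in data.split(":"):
            raw_output.append(parse_one(line))
    return raw_output


def parse_comprehension(data: str) -> List[Dict]:
    # Comprehension style with the guard hoisted so it runs once.
    if not has_data(data):
        return []
    return [parse_one(line) for line in data.split(":")]
```

Both functions return the same result for the same input, so the choice is mostly about readability.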

muescha commented Jan 4, 2024

The more I consider it, the more it seems preferable to create a path parser using pathlib rather than repurposing the url parser for paths. pathlib offers all the necessary functions for handling various edge cases effortlessly.
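
A rough sketch of what a pathlib-based parser could look like. The function name `parse_posix_path` is hypothetical and the field names are copied from the JSON earlier in this thread; the parser that was eventually merged may differ.

```python
from pathlib import PurePosixPath
from typing import Dict


def parse_posix_path(text: str) -> Dict:
    """Hypothetical sketch: derive the path fields with PurePosixPath."""
    p = PurePosixPath(text.strip())  # strip the trailing newline from input
    return {
        "path": str(p),
        "parent": str(p.parent),
        "filename": p.name,
        "stem": p.stem,
        "extension": p.suffix[1:],    # suffix without the leading dot
        "path_list": list(p.parts),   # note: parts includes the root "/"
    }


result = parse_posix_path("/abc/def/gh.txt\n")
```

Unlike the url-based output above, `parts` keeps the root `/` as the first element for absolute paths, which is a design decision the parser would have to make explicitly.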

@kellyjonbrazil
Owner

The more I consider it, the more it seems preferable to create a path parser using pathlib rather than repurposing the url parser for paths. pathlib offers all the necessary functions for handling various edge cases effortlessly.

I agree. I'm also curious how this parser would/could work with slurp. If you have a list of path lists, they could get bundled into an array of arrays. Another approach would be to have slurp use extend rather than append so it stays an array of objects.
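
The difference between the two slurp strategies can be sketched in a few lines. `parse_path_list` is a simplified stand-in for the parser (an assumption for illustration only); the point is just how `append` and `extend` shape the result.

```python
from typing import Dict, List


def parse_path_list(line: str) -> List[Dict]:
    # Simplified stand-in: one dict per ':'-separated entry.
    return [{"path": item} for item in line.split(":") if item]


lines = [
    "/abc/def/gh.txt:/xyz/uvw/ab.app",
    "/def/hij/klm.txt:/efe/app.txt",
]

nested: List[List[Dict]] = []  # append: one inner array per input line
flat: List[Dict] = []          # extend: a single array of objects

for line in lines:
    parsed = parse_path_list(line)
    nested.append(parsed)
    flat.extend(parsed)
```

With `append` the grouping per input line is preserved; with `extend` that grouping is lost but the output keeps the same shape as a non-slurped run.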

muescha commented Jan 4, 2024

I anticipate that the --slurp option should generate arrays of arrays when the initial command produces an array.

I'm uncertain whether a --slurp-flat command should be created.

A workaround involves eliminating the nested array using jq "[ .[] .[] ]":

echo "/abc/def/gh.txt:/xyz/uvw/ab.app\n/def/hij/klm.txt:/efe/app.txt" \
| jc --path-list -p -s | jq "[ .[] .[] ]"
[
  {
    "path": "/abc/def/gh.txt",
    "parent": "/abc/def",
    "filename": "gh.txt",
    "stem": "gh",
    "extension": "txt",
    "path_list": [
      "abc",
      "def",
      "gh.txt"
    ]
  },
  {
    "path": "/xyz/uvw/ab.app",
    "parent": "/xyz/uvw",
    "filename": "ab.app",
    "stem": "ab",
    "extension": "app",
    "path_list": [
      "xyz",
      "uvw",
      "ab.app"
    ]
  },
  {
    "path": "/def/hij/klm.txt",
    "parent": "/def/hij",
    "filename": "klm.txt",
    "stem": "klm",
    "extension": "txt",
    "path_list": [
      "def",
      "hij",
      "klm.txt"
    ]
  },
  {
    "path": "/efe/app.txt",
    "parent": "/efe",
    "filename": "app.txt",
    "stem": "app",
    "extension": "txt",
    "path_list": [
      "efe",
      "app.txt"
    ]
  }
]

@muescha muescha changed the title draft for path_list draft for path and path_list Jan 4, 2024

muescha commented Jan 6, 2024

I'd use PurePosixPath as I did for the URL parser. I ran into the same issue and have not yet investigated why there is different behavior on different platforms. I think we can document this parser is for POSIX compliant paths only.

On Windows systems, paths are parsed using Path as a Windows path, resulting in the use of \\ as the path delimiter instead of /. However, with PurePosixPath, the path is parsed following the conventions of *nix systems.

muescha commented Jan 6, 2024

POSIX

I did not know the term POSIX before. Would "Unix-style path" be a better name than "POSIX-style path"?

muescha commented Jan 6, 2024

I anticipate that the --slurp option should generate arrays of arrays when the initial command produces an array.

Makes sense - I can see how you might want to treat each array of paths separately instead of having them all in one bucket.

Also, if a parser returns a List instead of a Dict, this segment of code won't function as expected and may result in unexpected behavior.

I backed out the slurp flattening code.

👍

muescha commented Jan 6, 2024

I added some windows path checks in 66830e4
(debug prints still there to see what the windows CI produces)

Note: on Mac I need to use single quotes, otherwise sequences like \\x or \\f get interpreted as escapes...

echo "C:\\\\Windows\\\\Program Files\\\\xfolder\\\\file.txt" | jc --path -p
echo 'C:\\Windows\\Program Files\\xfolder\\file.txt' | jc --path -p

These are the drive and root values for Windows:

 path: /Library/Application Support/Script Editor/Templates/Cocoa-AppleScript Applet.app/Contents/Info.plist
drive: 
 root: /
 
 path: C:\Windows\Program Files\xfolder\file.txt
drive: C:
 root: \
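
The same values can be reproduced with the pure path classes from the standard library, which behave identically on every platform. The last assignment is an extra illustration (not from the PR) of what happens when the Windows string is read with POSIX rules.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath(
    "/Library/Application Support/Script Editor/Templates/"
    "Cocoa-AppleScript Applet.app/Contents/Info.plist"
)
windows = PureWindowsPath(r"C:\Windows\Program Files\xfolder\file.txt")

posix_drive, posix_root = posix.drive, posix.root        # no drive on POSIX
windows_drive, windows_root = windows.drive, windows.root

# Read with POSIX rules, backslashes are ordinary characters, so the whole
# Windows string collapses into a single component.
windows_as_posix = PurePosixPath(r"C:\Windows\Program Files\xfolder\file.txt")
```

This is why some form of Windows path detection is needed before choosing which class to use.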

muescha commented Jan 6, 2024

Fun fact: PureWindowsPath and PurePosixPath both see the \n at the end of the input, on the command line as well as in tests.
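
A small sketch of that effect using only the standard library: the newline is kept as part of the last component, so it leaks into the name and the suffix unless the input is stripped first.

```python
from pathlib import PurePosixPath

raw = "/abc/def/gh.txt\n"  # what a piped `echo` typically delivers

unstripped = PurePosixPath(raw)
stripped = PurePosixPath(raw.rstrip("\n"))

# The newline survives in the file name and in the suffix.
unstripped_name = unstripped.name
unstripped_suffix = unstripped.suffix

# After stripping, the fields look as expected.
stripped_name = stripped.name
stripped_suffix = stripped.suffix
```

Stripping only the trailing newline (rather than all whitespace) keeps file names with leading or trailing spaces intact.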

muescha commented Jan 8, 2024

fun fact after this PR

find . -type f -name '*.json' | wc -l
    1000

🎉

@kellyjonbrazil
Owner

Will you be ready for me to merge this soon?

@muescha muescha marked this pull request as ready for review January 31, 2024 01:45

muescha commented Jan 31, 2024

yes it is ready to merge

muescha commented Jan 31, 2024

I expect the doc updates to README.md, path.md, and path_list.md to be generated by docgen.sh, so they should not be included in this PR.

@kellyjonbrazil
Owner

Looks nice, thanks!

@kellyjonbrazil kellyjonbrazil merged commit d65d2af into kellyjonbrazil:dev Jan 31, 2024
21 checks passed
@muescha muescha deleted the feature/path-list branch February 1, 2024 17:22
kellyjonbrazil added a commit that referenced this pull request Feb 6, 2024
* draft for path_list

* updaate doc

* add input check

* fix types

* fix schema: add missing properties

* add _process

* fix _process docs

* refactor: extract path.py parser

* swap order of names alphabetically

* documentation and comments

* path parser: add early return for nodata

* path and path-list parser: add test and fixtures

* typo in file name

* add early return for nodata

* add test and fixtures

* typo in file name

* rename fixtures

* rename fixtures

* refactor to pathlib.Path

* failing on windows - use PurePosixPath

* changed the way to strip dot from suffix

* add POSIX to path

* test commit to see results on windows is failing

* test commit to see results on windows is failing

* add windows path detection

* somehow Path not like the newline from input line

* add test with more items

* remove debug print

* wrap test loops into into subTest

* remove print statements

* add path and path-list to CHANGELOG

---------

Co-authored-by: Kelly Brazil <[email protected]>