Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add new dir package for directory management #43

Merged
merged 8 commits into from
Sep 3, 2024

Conversation

wanliqun
Copy link
Contributor

@wanliqun wanliqun commented Aug 16, 2024

  • Introduced the FsNode structure for representing hierarchical directories and files.
  • Implemented encoding and decoding of FsNode to/from a binary format with versioning and validation.
  • Added support for comparing directory structures to identify changes.
  • Included unit tests and documentation.

This change is Reviewable

@wanliqun wanliqun requested a review from boqiu August 16, 2024 09:38
Copy link
Contributor

@boqiu boqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @wanliqun)


transfer/unixfs/fs.go line 30 at r1 (raw file):

	Name    string    `json:"name"`              // File or directory name
	Type    FileType  `json:"type"`              // Type of the node (file or directory)
	Size    int64     `json:"size,omitempty"`    // File size in bytes (omitted for directories)

only for files as below


transfer/unixfs/fs.go line 31 at r1 (raw file):

	Type    FileType  `json:"type"`              // Type of the node (file or directory)
	Size    int64     `json:"size,omitempty"`    // File size in bytes (omitted for directories)
	Hash    string    `json:"hash,omitempty"`    // Merkle root hash (only for files)

Why directory do not have root hash?


transfer/unixfs/fs.go line 52 at r1 (raw file):

	// Insert and sort the new file at the correct position
	node.Entries = append(node.Entries, file)
	node.sort()

use btree instead?


transfer/unixfs/fs.go line 58 at r1 (raw file):

// BatchAdd adds multiple files or directories to the current directory and sorts once.
func (node *FsNode) BatchAdd(entries []*FsNode) error {

Why not merge with Add method, e.g. Add(entries ...*FsNode)?


transfer/unixfs/fs.go line 64 at r1 (raw file):

	// Append all new entries
	node.Entries = append(node.Entries, entries...)

What if file already exists?


transfer/unixfs/fs.go line 72 at r1 (raw file):

// Update updates an existing file's metadata within the directory.
func (node *FsNode) Update(file *FsNode) error {

Could merge with Add method? Maybe insert is a better name.


transfer/unixfs/fs.go line 88 at r1 (raw file):

// Rename renames a file or directory within the directory and keeps the entries sorted.
func (node *FsNode) Rename(oldName, newName string) error {

please consider to provide diff cli for different versions of folders.


transfer/unixfs/util.go line 10 at r1 (raw file):

// GetMerkleTreeRootOfFile calculates and returns the Merkle tree root hash of a given file.
// It opens the file, generates the Merkle tree, and returns the root hash as a hexadecimal string.
func GetMerkleTreeRootOfFile(filename string) (string, error) {

How about moving this utility to relevant package?

@wanliqun wanliqun requested a review from boqiu August 20, 2024 07:40
Copy link
Contributor Author

@wanliqun wanliqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Reviewable status: 0 of 16 files reviewed, 8 unresolved discussions (waiting on @boqiu)


transfer/unixfs/fs.go line 30 at r1 (raw file):

Previously, boqiu (Bo QIU) wrote…

only for files as below

Done.


transfer/unixfs/fs.go line 31 at r1 (raw file):

Previously, boqiu (Bo QIU) wrote…

Why directory do not have root hash?

Done.


transfer/unixfs/fs.go line 52 at r1 (raw file):

Previously, boqiu (Bo QIU) wrote…

use btree instead?

Done.


transfer/unixfs/fs.go line 58 at r1 (raw file):

Previously, boqiu (Bo QIU) wrote…

Why not merge with Add method, e.g. Add(entries ...*FsNode)?

Done.


transfer/unixfs/fs.go line 64 at r1 (raw file):

Previously, boqiu (Bo QIU) wrote…

What if file already exists?

Done.


transfer/unixfs/fs.go line 72 at r1 (raw file):

Previously, boqiu (Bo QIU) wrote…

Could merge with Add method? Maybe insert is a better name.

Done.


transfer/unixfs/fs.go line 88 at r1 (raw file):

Previously, boqiu (Bo QIU) wrote…

please consider to provide diff cli for different versions of folders.

Done.


transfer/unixfs/util.go line 10 at r1 (raw file):

Previously, boqiu (Bo QIU) wrote…

How about moving this utility to relevant package?

Done.

@wanliqun wanliqun changed the title feat: Add UnixFS-based FsNode with encoding/decoding feat: Add new dir package for directory management Aug 20, 2024
- Implemented the FsNode data structure to represent file and directory hierarchies using UnixFS.
- Added support for encoding FsNode into UnixFS format and decoding back into the FsNode structure.
- Included unit tests to verify the correctness of serialization and deserialization processes etc.
@wanliqun wanliqun force-pushed the folder-ipld-definition branch from 395251d to d40e362 Compare August 26, 2024 09:48
Copy link
Contributor

@boqiu boqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 16 of 16 files at r2, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @wanliqun)


transfer/dir/codec.go line 19 at r2 (raw file):

// EncodeFsNode serializes an FsNode into a custom binary format with magic bytes and versioning.
func EncodeFsNode(node *FsNode) ([]byte, error) {

How about to implement the standard encoding.BinaryMarshaler interface for FsNode?


transfer/dir/codec.go line 48 at r2 (raw file):

// DecodeFsNode deserializes an FsNode from a custom binary format with magic bytes and versioning.
func DecodeFsNode(data []byte) (*FsNode, error) {

How about to implement the standard encoding.BinaryUnmarshaler interface for FsNode?


transfer/dir/codec.go line 63 at r2 (raw file):

	// Verify codec version
	var version uint16
	if err := binary.Read(buf, binary.BigEndian, &version); err != nil {

Is there any error if no enough data in buf?

Basically, it is very easy to check the length for magic number and version, and just use slice along with offset is enough to decode data.


transfer/dir/diff.go line 15 at r2 (raw file):

	Removed   DiffStatus = "removed"
	Modified  DiffStatus = "modified"
	Unchanged DiffStatus = "unchanged"

Better to and DiffStatus as prefix for all above const variables.


transfer/dir/fs.go line 19 at r2 (raw file):

const (
	File      FileType = "file"

Better to add prefix FileType.


transfer/dir/fs.go line 21 at r2 (raw file):

	File      FileType = "file"
	Directory FileType = "directory"
	Symbolic  FileType = "symbolic"

How about raw data? E.g. small files (1-4 KB) could be wrapped in IPLD object.


transfer/dir/fs.go line 28 at r2 (raw file):

	Name    string      `json:"name"`              // File or directory name
	Type    FileType    `json:"type"`              // File type of the node
	Hash    common.Hash `json:"hash,omitempty"`    // Merkle hash

Should be not empty all the time?

Code quote:

omitempty

transfer/dir/fs.go line 59 at r2 (raw file):

	root = crypto.Keccak256Hash(entries[len(entries)-1].Hash[:])
	for i := len(entries) - 2; i >= 0; i-- {

may overflow? e.g. only 1 entry.


transfer/dir/fs.go line 81 at r2 (raw file):

		Name: name,
		Type: Symbolic,
		Hash: crypto.Keccak256Hash([]byte(link)),

It seems Hash is useless for symbol link, right?


transfer/dir/fs.go line 89 at r2 (raw file):

func (node *FsNode) Search(fileName string) (*FsNode, bool) {
	i, found := sort.Find(len(node.Entries), func(i int) int {
		return strings.Compare(fileName, node.Entries[i].Name)

Why not return bool? I think you want to find the FsNode with exact name, right?


transfer/dir/fs.go line 115 at r2 (raw file):

	// Root directory represented as "."
	root.Name = "."

Retrieve the dir name from path?

Copy link
Contributor Author

@wanliqun wanliqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 16 files reviewed, 11 unresolved discussions (waiting on @boqiu)


transfer/dir/codec.go line 19 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

How about to implement the standard encoding.BinaryMarshaler interface for FsNode?

Done.


transfer/dir/codec.go line 48 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

How about to implement the standard encoding.BinaryUnmarshaler interface for FsNode?

Done.


transfer/dir/codec.go line 63 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

Is there any error if no enough data in buf?

Basically, it is very easy to check the length for magic number and version, and just use slice along with offset is enough to decode data.

Done.


transfer/dir/diff.go line 15 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

Better to and DiffStatus as prefix for all above const variables.

Done.


transfer/dir/fs.go line 19 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

Better to add prefix FileType.

Done.


transfer/dir/fs.go line 21 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

How about raw data? E.g. small files (1-4 KB) could be wrapped in IPLD object.

Done.


transfer/dir/fs.go line 28 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

Should be not empty all the time?

Done.


transfer/dir/fs.go line 59 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

may overflow? e.g. only 1 entry.

Done.


transfer/dir/fs.go line 81 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

It seems Hash is useless for symbol link, right?

It can be used for diff two symbolic links just like other types like directory or raw file etc.


transfer/dir/fs.go line 89 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

Why not return bool? I think you want to find the FsNode with exact name, right?

I guess you mean by using sort.Search instead, if so that would be rather verbose to achieve the same result:
`
// Ensure that entries are sorted by name
idx := sort.Search(len(node.Entries), func(i int) bool {
return node.Entries[i].Name >= fileName
})

// Check if the found entry matches the fileName
if idx < len(node.Entries) && node.Entries[idx].Name == fileName {
return node.Entries[idx], true
}
`

or do you mean Search function only returns a boolean?


transfer/dir/fs.go line 115 at r2 (raw file):

Previously, boqiu (Bo QIU) wrote…

Retrieve the dir name from path?

use root dir instead now.

Copy link
Contributor

@boqiu boqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r3, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @wanliqun)


transfer/dir/codec.go line 39 at r3 (raw file):

	// Calculate the total length needed for the byte slice:
	// MagicBytes + CodecVersion (2 bytes) + Metadata Length (4 bytes) + JSON Metadata + Optional Raw Data
	totalLength := len(CodecMagicBytes) + 2 + 4 + len(mdata)

If raw data supported, why not calculate the raw data size here?


transfer/dir/codec.go line 63 at r3 (raw file):

	// Write raw data to buffer
	buf := bytes.NewBuffer(data)

offset not changed after writing JSON data, or should to write raw data from data[offset:].

Is this case covered by unit test?


transfer/dir/codec.go line 92 at r3 (raw file):

// It decodes the FsNode from a binary format.
func (node *FsNode) UnmarshalBinary(data []byte) error {
	offset := int64(0)

offset is unnecessary, could change the data slice after value retrievd, e.g. data = data[offset:]


transfer/dir/codec.go line 137 at r3 (raw file):

	// Read and decode raw data
	buf := bytes.NewReader(nil)
	if offset < datalen {

is it possible?


transfer/dir/codec.go line 151 at r3 (raw file):

	switch node.Type {
	case FileTypeRaw:
		data := make([]byte, node.Size)

Uncessary mem alloc?


transfer/dir/codec.go line 169 at r3 (raw file):

	}

	return nil

Check the exact raw data length?


transfer/dir/fs.go line 20 at r3 (raw file):

func init() {
	if runtime.GOOS == "windows" {
		RootDir = "\\"

Unnecessary, root dir will not be used as path. Instead, it may be used in http path.


transfer/dir/fs.go line 73 at r3 (raw file):

	root = crypto.Keccak256Hash(entries[0].Hash[:])
	for i := 1; i < len(entries); i++ {
		root = crypto.Keccak256Hash(entries[i].Hash[:], root[:])

Suggest to calculate hash in order, e.g. root = keccak(root, right)


transfer/dir/fs.go line 91 at r3 (raw file):

// NewRawFsNode creates a new FsNode representing a raw file.
func NewRawFsNode(name string, data []byte) (*FsNode, error) {
	iterdata, err := core.NewDataInMemory(data)

What if data is empty?


transfer/dir/fs.go line 167 at r3 (raw file):

	case info.Mode().IsRegular():
		return buildFileNode(path, info)
	// TODO: build raw file node based on file size?

Allow user to specify a threshold to determine if file should be packed as raw data.

On the other hand, it's better add unit test to ensure the root hash of raw data or small file is the same.

Copy link
Contributor

@boqiu boqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @wanliqun)


transfer/dir/codec.go line 144 at r3 (raw file):

	}

	return nil

How about to verify root hash at the end of Unmarshal to ensure data not changed?

Copy link
Contributor Author

@wanliqun wanliqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 16 files reviewed, 11 unresolved discussions (waiting on @boqiu)


transfer/dir/codec.go line 39 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

If raw data supported, why not calculate the raw data size here?

deprecated due to raw file type removed


transfer/dir/codec.go line 63 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

offset not changed after writing JSON data, or should to write raw data from data[offset:].

Is this case covered by unit test?

deprecated due to raw file type removed


transfer/dir/codec.go line 92 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

offset is unnecessary, could change the data slice after value retrievd, e.g. data = data[offset:]

done


transfer/dir/codec.go line 137 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

is it possible?

deprecated due to raw file type removed


transfer/dir/codec.go line 144 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

How about to verify root hash at the end of Unmarshal to ensure data not changed?

Merkle hash is now unset for directory or symbolic file type.


transfer/dir/codec.go line 151 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

Uncessary mem alloc?

deprecated due to raw file type removed


transfer/dir/codec.go line 169 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

Check the exact raw data length?

deprecated due to raw file type removed


transfer/dir/fs.go line 20 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

Unnecessary, root dir will not be used as path. Instead, it may be used in http path.

done


transfer/dir/fs.go line 73 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

Suggest to calculate hash in order, e.g. root = keccak(root, right)

Merkle hash is now unset for directory or symbolic file type.


transfer/dir/fs.go line 91 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

What if data is empty?

deprecated due to raw file type removed


transfer/dir/fs.go line 167 at r3 (raw file):

Previously, boqiu (Bo QIU) wrote…

Allow user to specify a threshold to determine if file should be packed as raw data.

On the other hand, it's better add unit test to ensure the root hash of raw data or small file is the same.

deprecated due to raw file type removed

Copy link
Contributor

@boqiu boqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r4, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @wanliqun)


transfer/dir/codec.go line 88 at r4 (raw file):

	data = data[2:]

	// Read metadata length

Left data is all for IPLD json data, and length is unnecessary.


transfer/dir/fs.go line 27 at r4 (raw file):

	Name    string    `json:"name"`              // File or directory name
	Type    FileType  `json:"type"`              // File type of the node
	Hash    string    `json:"hash,omitempty"`    // Merkle hash (only for regular files)

Root or MerkleRoot is more meaningful.

Code quote:

Hash

transfer/dir/fs.go line 85 at r4 (raw file):

	switch node.Type {
	case FileTypeFile:
		return node.Hash == rhs.Hash && node.Size == rhs.Size

Compare the root hash is enough.

Copy link
Contributor Author

@wanliqun wanliqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 12 of 16 files reviewed, 3 unresolved discussions (waiting on @boqiu)


transfer/dir/codec.go line 88 at r4 (raw file):

Previously, boqiu (Bo QIU) wrote…

Left data is all for IPLD json data, and length is unnecessary.

done


transfer/dir/fs.go line 27 at r4 (raw file):

Previously, boqiu (Bo QIU) wrote…

Root or MerkleRoot is more meaningful.

done


transfer/dir/fs.go line 85 at r4 (raw file):

Previously, boqiu (Bo QIU) wrote…

Compare the root hash is enough.

done

Copy link
Contributor

@boqiu boqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @wanliqun)

@wanliqun wanliqun merged commit 2cf72f6 into main Sep 3, 2024
3 checks passed
@wanliqun wanliqun deleted the folder-ipld-definition branch September 3, 2024 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants