feat: add block archiving support #339

Open
Tracked by #309
a-saksena opened this issue Nov 13, 2024 · 0 comments · May be fixed by #485
Labels: Block Node (Issues/PR related to the Block Node), Improvement (Code changes driven by non business requirements), P1 (High priority issue. Required to be completed in the assigned milestone.)

a-saksena commented Nov 13, 2024

Story

AS A Block Node User
I WANT to have my block files archived
SO THAT I will have a reduced storage cost

Tech Notes:

  • DEPENDS ON feat: add compression mode for BlockAsFileWriter #282

  • See Design a directory structure #125 where we have defined the design of the new block-as-file.

  • The scope of this issue is to add archiving support to the block files that will be written to disk.

  • We need to implement a process/thread that scans the file system and archives block files that have already been written to disk.

  • The parameters of this process/thread should probably be configurable externally.


The Archiving Process:

  • we have a live root and an archive root
  • the number of blocks archived at a time (the archive unit) is
    configurable as a decimal order of magnitude (10, 100, 1000, ...)
  • archiving always stays one step behind the live root (writing)
    e.g.
    if we archive per 1000 and the live root is writing 2000-2999,
    then 1000-1999 is already written and we are archiving 0-999,
    so the live root always holds the fewer than 1000 most recent blocks
    plus the full 1000 before them
  • the "gap" should not be less than 1 archive unit. It could be more,
    but must not be less. Reasons for doing this:
    • we want to make sure the most recent x amount are rapidly available
    • we want to make sure that we do not accidentally try to archive an active block
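
The gap rule above can be sketched as a small calculation. This is only an illustration in Python, not the Block Node implementation: the function name `archivable_range` and the parameters `latest_written`, `unit`, and `gap_units` are hypothetical, and it assumes blocks are numbered from 0 and grouped into batches of `unit` blocks.

```python
from typing import Optional, Tuple


def archivable_range(
    latest_written: int, unit: int = 1000, gap_units: int = 1
) -> Optional[Tuple[int, int]]:
    """Return (first, last) block numbers of the newest batch that may be archived.

    latest_written: highest block number written under the live root (assumption).
    unit: blocks per archive batch; must be a power of ten.
    gap_units: full batches kept between the batch being written and the
        batch being archived; must be at least 1.
    """
    if unit < 1 or gap_units < 1:
        raise ValueError("unit and gap_units must be positive")
    # Reject anything that is not a power of ten (10, 100, 1000, ...).
    n = unit
    while n % 10 == 0:
        n //= 10
    if n != 1:
        raise ValueError("unit must be a power of ten")

    live_batch = latest_written // unit          # batch currently being written
    archive_batch = live_batch - gap_units - 1   # stay behind by the full gap
    if archive_batch < 0:
        return None                              # nothing is old enough yet
    first = archive_batch * unit
    return first, first + unit - 1
```

With the defaults, a latest written block of 2000 yields the batch 0-999 and a latest written block of 3000 yields 1000-1999, which matches the example in the list above.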

The Algorithm:

  1. Writing the blocks:

    1. We first write the block to its location in the live root
    2. Then, a process archives a fixed number of blocks at a time;
      that number is a power of 10
      • archiving must stay behind the live root by at least 1 full archive
        unit (the configured order of magnitude)
      • the distance between the live root and the archive root must never
        drop below that, so that no partially written block is archived and
        the latest blocks stay immediately available in the live root, which
        is always faster for fetching them
      • visualization:

        written 2000
        archived 0-999
        time passes...
        written 3000
        archived 1000-1999

    3. After the archiving is done, a symlink must be created in the live root
      • the symlink is named after the zip and placed in the live root
      • it points to the archive root, where the actual blocks now reside
      • the blocks can again be resolved from the live root, but they must now
        be looked up through the symlink
      • visualization:

        let's archive all between 7..0001 and 7..1000:
        ...
        live/0/0/0/7/0/0/0001.blk
        ...
        live/0/0/0/7/0/0/1000.blk

        • first of all, we zip all between 7..0001 and 7..1000
        • the dest. of the zip will be at archive/0/0/0/7.zip/...
        • archiving is now done completely
        • now we need to symlink live/0/0/0/7.zip -> archive/0/0/0/7.zip
    4. After archiving is done completely and the symlink is made, we need to delete
      what is in the live root
      • visualization:

        utilizing the previous example to archive all between 7..0001 and 7..1000:

        • we are certain the archiving is entirely done and symlink is made
        • we now delete live/0/0/0/7/...
        • blocks between 7..0001 and 7..1000 can now be found through the symlink
  2. Searching for a block:

    1. We first search for the block under the live root (currently implemented)
      • if found, we return it
    2. If not found, we search the archive, again starting from the live root:
      the symlink there points to the actual location in the archive root,
      so we look for the symlink rather than for the path where the block
      would otherwise be
      • if found there, we return it
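
The write-side steps (zip, symlink, delete) and the two-stage lookup described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the design from #125 or the linked pull request: `archive_dir` and `find_block` are hypothetical names, paths are passed relative to the roots, the zip stores entries relative to the archived directory, and the compression setting is left at the library default.

```python
import os
import shutil
import zipfile
from pathlib import Path
from typing import Optional


def archive_dir(live_root: Path, archive_root: Path, rel_dir: Path) -> Path:
    """Zip live_root/rel_dir to archive_root/rel_dir.zip, symlink it, delete the live copy."""
    src = live_root / rel_dir
    zip_path = archive_root / rel_dir.parent / (rel_dir.name + ".zip")
    link = live_root / rel_dir.parent / (rel_dir.name + ".zip")
    zip_path.parent.mkdir(parents=True, exist_ok=True)

    # Step 1: zip every block file, with entry names relative to the archived directory.
    with zipfile.ZipFile(zip_path, "w") as zf:
        for block_file in sorted(src.rglob("*.blk")):
            zf.write(block_file, block_file.relative_to(src).as_posix())

    # Step 2: only after the zip is complete, symlink it into the live root.
    os.symlink(zip_path.resolve(), link)

    # Step 3: only after the symlink exists, delete the original live files.
    shutil.rmtree(src)
    return link


def find_block(live_root: Path, rel_block: Path) -> Optional[bytes]:
    """Look for a block in the live root first, then through a '<dir>.zip' symlink."""
    direct = live_root / rel_block
    if direct.is_file():
        return direct.read_bytes()

    # Walk up the path: any ancestor directory may have been replaced by '<dir>.zip'.
    parts = rel_block.parts
    for i in range(len(parts) - 1, 0, -1):
        link = live_root / Path(*parts[: i - 1]) / (parts[i - 1] + ".zip")
        if link.is_file():  # is_file() follows the symlink into the archive root
            entry = "/".join(parts[i:])
            with zipfile.ZipFile(link) as zf:
                if entry in zf.namelist():
                    return zf.read(entry)
    return None
```

For the example above, `archive_dir(live, archive, Path("0/0/0/7"))` would create `archive/0/0/0/7.zip`, link `live/0/0/0/7.zip` to it, and remove `live/0/0/0/7/`; `find_block(live, Path("0/0/0/7/0/0/0001.blk"))` then resolves the block through the symlink. The ordering of the three steps is the point of the sketch: the live copy is removed last, so a block is always reachable by one of the two lookup paths.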

ata-nas changed the title from "Add block archiving support" to "feat: add block archiving support" on Nov 15, 2024
ata-nas added the Improvement, P1, and Block Node labels on Nov 15, 2024
ata-nas self-assigned this on Nov 15, 2024
ata-nas added this to the 0.3.0 milestone on Nov 15, 2024
ata-nas modified the milestones: 0.3.0, 0.4.0 on Dec 16, 2024
ata-nas linked a pull request on Jan 13, 2025 that will close this issue (2 tasks)