Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fetch remote schema async #5347

Open
wants to merge 21 commits into
base: master
Choose a base branch
from
Open

Conversation

cisse21
Copy link
Member

@cisse21 cisse21 commented Dec 6, 2024

Description

  • Gets rid of warehouseSchema in schema struct
  • Introduces sync.go which syncs schema from warehouse
  • Update the warehouse schema in wh_schemas table at a configured frequency
  • Skip updating the schema on error in fetching schema
  • Fetches local schema from wh_schemas table instead of fetching from warehouse for every sync

Linear Ticket

Fixes WAR-151

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch 3 times, most recently from 05f985f to 59ae13c Compare December 6, 2024 05:18
warehouse/router/sync.go Outdated Show resolved Hide resolved
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 59ae13c to 842a746 Compare December 9, 2024 10:14
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch 4 times, most recently from 7883385 to 9415887 Compare December 9, 2024 11:51
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 9415887 to 229b4c8 Compare December 9, 2024 11:59
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 1c7ebd9 to 29e1254 Compare December 11, 2024 09:13
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 29e1254 to 649b4a7 Compare December 11, 2024 09:36
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 57579b5 to 1161fbb Compare December 11, 2024 10:00
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 420a53e to 5702ec8 Compare December 11, 2024 12:42
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch 2 times, most recently from 63cb186 to bfc0ab2 Compare December 23, 2024 10:34
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch 4 times, most recently from 24afc54 to cd29d05 Compare December 24, 2024 10:24
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch 3 times, most recently from 4ad3f26 to 6ac93e5 Compare December 30, 2024 10:53
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 6ac93e5 to b013c87 Compare January 2, 2025 12:59
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 7e43ff9 to 4117ad2 Compare January 3, 2025 05:15
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 392f793 to 30d0f23 Compare January 3, 2025 07:46
@cisse21 cisse21 requested a review from shekhar-rudder January 3, 2025 08:43
@cisse21 cisse21 marked this pull request as ready for review January 3, 2025 09:01
warehouse/router/state_export_data.go Outdated Show resolved Hide resolved
warehouse/router/state_export_data.go Outdated Show resolved Hide resolved
warehouse/router/state_export_data.go Outdated Show resolved Hide resolved
warehouse/router/sync.go Outdated Show resolved Hide resolved
warehouse/router/sync.go Outdated Show resolved Hide resolved
warehouse/router/sync.go Outdated Show resolved Hide resolved
warehouse/router/sync_test.go Outdated Show resolved Hide resolved
warehouse/router/sync_test.go Outdated Show resolved Hide resolved
warehouse/router/sync_test.go Outdated Show resolved Hide resolved
warehouse/router/sync_test.go Outdated Show resolved Hide resolved
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 4db5814 to f27fb84 Compare January 8, 2025 07:27
Comment on lines +1301 to +1303
if rs.DB == nil {
return
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor:

if rs.DB != nil {
    _ = rs.DB.Close()
}

CC: @achettyiitr common interface might help reduce some duplicate code per warehouse. Just keeping a note of it.

FetchSchemaFromWarehouse(ctx context.Context, m schema.FetchSchemaRepo) (model.Schema, error)
}

func (r *Router) sync(ctx context.Context) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cisse21 Thinking of tracking below list as a backlog task:

  1. Metric for failures
  2. How long it takes for a schema sync job to run for all warehouse models
  3. Consider if we would want any parallelism more so for MT or even hosted.

CC: @achettyiitr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add metrics... didn't add parallelism in the first iteration as we plan to sync for every 12hrs or more

Comment on lines +342 to +344
// HasSchemaChanged compares the localSchema with the schemaInWarehouse
func (sh *Schema) HasSchemaChanged(schema model.Schema) bool {
return !reflect.DeepEqual(schema, sh.localSchema)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cisse21 Is the comment correct now? It will compare the schema with localSchema now?

Comment on lines +248 to +256
func (sh *Schema) UpdateLocalSchemaWithWarehouse(ctx context.Context, warehouseSchema model.Schema) error {
return sh.updateSchema(ctx, warehouseSchema)
}

func (sh *Schema) UpdateLocalSchema(ctx context.Context) error {
return sh.updateSchema(ctx, sh.localSchema)
}

func (sh *Schema) UpdateLocalSchema(ctx context.Context, updatedSchema model.Schema) error {
return sh.updateLocalSchema(ctx, updatedSchema)
func (sh *Schema) UpdateSchema(ctx context.Context, updatedSchema model.Schema) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cisse21
Too many functions with same name 😅
UpdateLocalSchemaWithWarehouse - Seems clear with func name.
UpdateLocalSchema - Seems to update in DB whatever is in memory?
UpdateSchema - ? Update in local and DB with whatever is passed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already had many functions with similar names :/ didn't want to get into much more refactoring in this PR

@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from f072690 to 195f9b2 Compare January 9, 2025 07:04
@cisse21 cisse21 force-pushed the feat.fetchRemoteSchemaAsync branch from 195f9b2 to 86fab9b Compare January 9, 2025 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants