Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add replication support for ClickHouse state table + Fixes ListMigration query for ClickHouse #520

Closed
wants to merge 12 commits into from

Conversation

arunk-s
Copy link

@arunk-s arunk-s commented May 16, 2023

This MR adds replication support for state table (goose_db_version) in ClickHouse.

  • Replication support can be enabled by calling AttachOptions to goose and provide additional parameters.
  • AttachOptions allows for free form variables to be passed to applicable drivers and currently only ClickHouse dialect inspects it.
  • ClickHouse dialect looks for ON_CLUSTER key present in the options and if enabled picks the cluster name provided or uses a {cluster} macro that is expected to be present in ClickHouse configuration.
  • Changes the table engine to KeeperMap which is a special table engine based on ClickHouse keeper that provides linearizable reads and consistent reads. This was picked to ensure that concurrent writes to state table have deterministic behavior. ClickHouse configuration allows for a strict mode that can ensure that migrations are only applied once.
  • Fixes a bug introduced with clickhouse: Fixed get db version query #240 which breaks AllowMissingMigrations for Clickhouse.
  • Changes timestamp column type to DateTime64 with nano seconds in precision to ensure as deterministic results as possible when returning rows for ListMigrations.
  • End to End tests are modified to include a test for Cluster mode and additional configuration is added to other tests as well to test with KeeperMap engine.
  • Addes testcontainers-go and google-cmp as dependencies for end to end tests. Testcontainers offers much better API for providing disk mounts and ensuring cleanup.
  • Bumps Go version to 1.20.

A fork of this library with these changes lives at https://gitlab.com/gitlab-org/opstrace/goose which is used in https://gitlab.com/gitlab-org/opstrace/opstrace.

@arunk-s
Copy link
Author

arunk-s commented May 16, 2023

@mfridman Huge thanks for maintaining this library.

I'm hoping to contribute the fixes back so if you can take some time to review and provide feedback. That will be great :)

@mfridman
Copy link
Collaborator

@mfridman Huge thanks for maintaining this library.

I'm hoping to contribute the fixes back so if you can take some time to review and provide feedback. That will be great :)

Thanks, it'll take me some time to get through this one. And I'm also not sure if the /v3 goose module is the right place for this change. (Most of my recent efforts have been on /v4).

Having said that, I'll drop a few comments to get the conversation started.

go.mod Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
@@ -34,6 +59,20 @@ func (c *Clickhouse) GetMigrationByVersion(tableName string) string {
}

func (c *Clickhouse) ListMigrations(tableName string) string {
q := `SELECT version_id, is_applied FROM %s ORDER BY version_id DESC`
q := `SELECT version_id, is_applied FROM %s ORDER BY tstamp DESC`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is fixing a bug? (I think you mentioned this in the description)

The core goose library relies on a non-user-defined order for when migrations get applied, this is typically a sequential id in all other databases. It seems here we were using version_id, but this is incorrect. In lieu of a sequential id, we should be using time..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. There is a no equivalent of an auto incrementing ID in clickhouse and it doesn't provide any way to get a globally incremented unique ID across its nodes.

So I relied on using timestamps, with just second precision, it was non-deterministic as migrations could have same timestamps.

With nanosecond precision, that chance is almost non-existent and the implementation further augments it with use of KeeeperMap Engine.


// QuerierOptions is an interface to provide specific options to a dialect.
// For ex: providing replication support to underlying migration table.
type QuerierOptions interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I initially added a "store" and "querier" the idea was to have the core goose library only interact with the store, and have no idea about the underlying queries.

The store abstracts the underlying querier which holds the raw string queries for the different database technologies.

Not sure if this made it easier/harder to evolve, but we should probably think about if there's a more elegant way to now thread options from the core library to those 2 internal layers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I struggled around this as well. I didn't find an easier way to hookup options all the way to the implementation specific detail. I tried to minimize the leaky abstraction(s) but I think it will have to exposed in one way.

If you have any suggestions, I can try those out.

)
ENGINE = MergeTree()
ORDER BY (date)`
ENGINE = KeeperMap('/goose_version')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been coming back to this PR on/off, trying to think through the implementation from the perspective of a "general purpose" migration tool. On the surface, everything looks reasonable with the exception of changing the engine.

For context, another PR #530 came in requesting another engine.

Should the default engine be MergeTree but allow this to be configurable?

Maybe we could put our heads together on how to solve this so the tool can be useful for all or most ClickHouse users? cc @iamluc @arunk-s

I think PR #530 is on the right track to setting a custom engine, and afaics this would satisfy:

  1. default engine: MergeTree()
  2. ReplicatedMergeTree()
  3. KeeperMap()
  4. ... etc

Other questions, does the ORDER BY or PARTITION BY need to be configurable? Hopefully not so we can make this as generic as possible.

Last thought, leaving ENGINE = %s to the user may be a footgun and I wish we could do more upfront validation, but then we have to account for all the possible ClickHouse engine types and this seems like a lot of work to maintain.

Maybe there's a world where the user is allowed to supply a template for the table and as long as it contains:

  • version_id
  • is_applied (legacy field, deprecated)
  • tstamp

Not sure I like this idea, since goose should own the table it creates to ensure the identifiers match up across all implementations.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
I had a quick look at the PR and it looks good to me (and could replace my own PR #530)

My only concern is about the engine. I'm not a ClickHouse expert and didn't know about the KeeperMap engine, but it seems it must be enabled (https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map) which could be not possible with hosted services.
I suggest to add the engine as a new option.

Copy link
Author

@arunk-s arunk-s Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an option to toggle underlying table engine can definitely work!

Other questions, does the ORDER BY or PARTITION BY need to be configurable? Hopefully not so we can make this as generic as possible.

I think these settings are affected by the choice of Engine so yeah they'll need to be configurable if the custom engine is allowed.

I can understand the concern with hosted services to not have KeeperMap though I doubt that platforms like ClickHouse Cloud disallows creating tables with this table engine.

Last thought, leaving ENGINE = %s to the user may be a footgun and I wish we could do more upfront validation, but then we have to account for all the possible ClickHouse engine types and this seems like a lot of work to maintain.

Definitely, convention over configuration is generally desirable however in this particular case I'm afraid the abstractions will leak a bit in order to properly support the underlying database. But in any case we need to have a sensible default and only selected number of choices.

The library does quite well supporting different databases but that does come at a cost I'm afraid.

I'll be happy to lend a hand with ClickHouse specific topics though it can take a bit of time for me to catch up.

@mfridman
Copy link
Collaborator

Dropping a thought here. We're soon (#379) going to add a goose provider, which unlocks new capabilities.

One option available to us is exposing the "sql store" interface so users could bring their own. The goose package offers a few default ones, but for more advanced use cases you can implement a thin interface while goose handles the migration logic.

Let me know what you think, as that would satisfy a lot of requests like this one.

Example

The "store" (unsure if that'll be the final name) is the interface goose leverges for all it's migration logic, and currently all implementations are internal: sqlite3, postgres, mysql, clickhouse, etc.

type customStore struct{}

var _ Store = (*customStore)(nil)

func (s *customStore) CreateVersionTable(ctx context.Context, db sqlextended.DBTxConn) error {
	return errors.New("not implemented")
}

func (s *customStore) InsertOrDelete(ctx context.Context, db sqlextended.DBTxConn, direction bool, version int64) error {
	return errors.New("not implemented")
}

func (s *customStore) GetMigration(ctx context.Context, db sqlextended.DBTxConn, version int64) (*GetMigrationResult, error) {
	return nil, errors.New("not implemented")
}

func (s *customStore) ListMigrations(ctx context.Context, db sqlextended.DBTxConn) ([]*ListMigrationsResult, error) {
	return nil, errors.New("not implemented")
}

The sqlextended is an interface to combine *sql.DB, *sql.TX and *sql.Conn.

type DBTxConn interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
	QueryContext(ctx context.Context, query string, args ...any) (*sql.Rows, error)
	QueryRowContext(ctx context.Context, query string, args ...any) *sql.Row
}

var (
	_ DBTxConn = (*sql.DB)(nil)
	_ DBTxConn = (*sql.Tx)(nil)
	_ DBTxConn = (*sql.Conn)(nil)
)

And with the new goose provider, we could have something like:

store := yourpkg.NewStore(...)

goose.NewProvider(goose.DialectCustom, db, fsys,
	provider.WithCustomStore(store),
)

By setting the dialect to DialectCustom and supplying your own "custom store" you unlock this capability.

Do you think that would satisfy your requirement?

@arunk-s
Copy link
Author

arunk-s commented Oct 24, 2023

@mfridman

Yeah I think having a custom store interface can satisfy a lot of these changes. However there can be still some configuration options that are specific to the store, how do you see those options attached to the internal implementation via goose ?

I still think there will need to be a helper function that can be called from the goose that passes arguments or options to the internal implementation. In this MR that role is fulfilled by AttachOptions(map[string]string).

@mfridman
Copy link
Collaborator

I still think there will need to be a helper function that can be called from the goose that passes arguments or options to the internal implementation. In this MR that role is fulfilled by AttachOptions(map[string]string).

Maybe not? Most (if not all) options can be defined when constructing the Provider. And if you're able to pass a custom Store implementation to the Provider, then you can also initialize it with whatever options.

In this MR you implemented the Querier, but what I'm suggesting is you instead implement this thin interface:

type Store interface {
	CreateVersionTable(ctx context.Context, db sqlextended.DBTxConn) error
	InsertOrDelete(ctx context.Context, db sqlextended.DBTxConn, direction bool, version int64) error
	GetMigration(ctx context.Context, db sqlextended.DBTxConn, version int64) (*GetMigrationResult, error)
	ListMigrations(ctx context.Context, db sqlextended.DBTxConn) ([]*ListMigrationsResult, error)
}

And then in your implementation / code you'd have:

type CustomStore struct {
	OnCluster    bool
	ClusterMacro string
}

var _ Store = (*CustomStore)(nil)

func NewCustomStore(onCluster bool, clusterMacro string) *CustomStore {
	return &CustomStore{
		OnCluster:    onCluster,
		ClusterMacro: clusterMacro,
	}
}

func (c *CustomStore) CreateVersionTable(context.Context, sqlextended.DBTxConn) error {
	return errors.New("not implemented")
}

func (c *CustomStore) InsertOrDelete(context.Context, sqlextended.DBTxConn, bool, int64) error {
	return errors.New("not implemented")
}

func (c *CustomStore) GetMigration(context.Context, sqlextended.DBTxConn, int64) (*GetMigrationResult, error) {
	return nil, errors.New("not implemented")
}

func (c *CustomStore) ListMigrations(context.Context, sqlextended.DBTxConn) ([]*ListMigrationsResult, error) {
	return nil, errors.New("not implemented")
}

And then when you start your app, you would:

func main() {
	// Bring your own Store implementation:
	// 	- CreateVersionTable
	// 	- InsertOrDelete
	// 	- GetMigration
	// 	- ListMigrations
	store := NewCustomStore(true, "clusterMacro")
	p, err := provider.NewProvider(CustomDialect, db, fsys, WithCustomStore(store))
}

Based on registered migrations, goose knows how to use the 4 store methods to do various migration-related tasks.

@iamluc
Copy link

iamluc commented Oct 25, 2023

I like your proposal @mfridman !
The only drawback I see is that it cannot be used with the "standard" goose binary. But as I already use a custom one, it's no big deal for me.

Also, a bonus point would be that the custom store could access the regular store to decorate it instead of rewriting all the methods.

@arunk-s
Copy link
Author

arunk-s commented Oct 26, 2023

Based on registered migrations, goose knows how to use the 4 store methods to do various migration-related tasks.

@mfridman, I see. Yeah that might work. I can try to give it a go but I am not yet sure when I'll be able to get to it.

@mfridman
Copy link
Collaborator

Based on registered migrations, goose knows how to use the 4 store methods to do various migration-related tasks.

@mfridman, I see. Yeah that might work. I can try to give it a go but I am not yet sure when I'll be able to get to it.

No worries whatsoever, the provider API is still internal and under active development. It'll land sometime in the next few weeks with a blog post about what's new and the more advanced use cases. I think it'll open a lot of opportunities for us to build upon.

@arunk-s
Copy link
Author

arunk-s commented Oct 17, 2024

Closing the MR as the provider support has landed and any additional requirements such as replication support on ClickHouse state table can be added via a custom store.
See reference if anyone is interested.

@arunk-s arunk-s closed this Oct 17, 2024
@arunk-s arunk-s deleted the arun/clickhouse-fixes branch October 17, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants