Config Backward Compatibility
- RFC PR: datafuselabs/databend#5324
- Tracking Issue: datafuselabs/databend#5297
Summary
Adding config backward compatibility will allow us to iterate quickly without breaking users' existing deployments.
Motivation
As early adopters start deploying databend by themselves, it's time for us to establish some contracts with users. Users should be able to upgrade their deployments without breaking backward compatibility. In this RFC, we will focus on config.
The config I mention here includes:
- config files read by databend-query and databend-meta
- config env vars read by databend-query and databend-meta
- application args accepted by databend-query and databend-meta
- protobuf messages generated by databend-query (stored inside databend-meta)
Out of scope:
- Tools like fuzz and metactl are not covered by this RFC.
- The command-line UX of databend-query and databend-meta is another topic and will not be covered in this RFC.
- Config input/output via the SQL/HTTP REST API is not covered (for example, the output of table system.config).
For convenience, I will use databend to refer to both databend-query and databend-meta.
With this RFC, users will be able to upgrade their deployments without breakage: old configs should always work with new implementations.
Guide-level explanation
Users do not need to take any action when upgrading their deployments: they upgrade databend by replacing the binaries or images directly.
Sometimes they will get DEPRECATED warnings for some config fields. It's up to users to decide whether to migrate them. Until we formally introduce versioned config, no config will be removed, and all config fields will keep working as before.
Reference-level explanation
Inside databend, we will split config into inner and outer:
inner
Config instances used inside databend. All logic SHOULD be implemented towards the inner config.
outer
Config instances used as the front office of databend. They are transformed into an inner config. Other modules SHOULD NOT depend on the outer config.
Take query for example:
The inner config of the query will be like this:
```rust
use serde::{Deserialize, Serialize};

#[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize)]
#[serde(default)]
pub struct Config {
    pub query: QueryConfig,
    pub log: LogConfig,
    pub meta: MetaConfig,
    pub storage: StorageConfig,
    pub catalog: HiveCatalogConfig,
}
```
The outer config of the query will be like this:
```rust
use clap::Parser;
use serde::{Deserialize, Serialize};

#[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize, Parser)]
#[clap(about, version, author)]
#[serde(default)]
pub struct ConfigV0 {
    #[clap(long, short = 'c', default_value_t)]
    pub config_file: String,

    #[clap(flatten)]
    pub query: QueryConfigV0,

    #[clap(flatten)]
    pub log: LogConfigV0,

    #[clap(flatten)]
    pub meta: MetaConfigV0,

    #[clap(flatten)]
    pub storage: StorageConfigV0,

    #[clap(flatten)]
    pub catalog: HiveCatalogConfigV0,
}
```
Users of an inner config have to maintain its outer config.
For example, common-io should provide the inner config StorageConfig. If query wants to include StorageConfig inside QueryConfig, query needs to:
- Implement a versioned outer config for StorageConfig called StorageConfigV0.
- Implement Into<StorageConfig> for StorageConfigV0.
- Refer to StorageConfig in QueryConfig.
- Refer to StorageConfigV0 in QueryConfigV0.
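The four steps above can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not databend's actual code. The fields `bucket` and `root` are hypothetical, and the serde/clap derives are omitted so that the example is self-contained. Implementing `From<StorageConfigV0> for StorageConfig` provides the required `Into<StorageConfig> for StorageConfigV0` through the standard library's blanket impl.

```rust
// Inner config: provided by common-io and used by all internal logic.
// NOTE: the fields are hypothetical, chosen only for illustration.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct StorageConfig {
    pub bucket: String,
    pub root: String,
}

// Outer config, version 0: the shape users write in files, env, and args.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct StorageConfigV0 {
    pub bucket: String,
    pub root: String,
}

// Implementing `From` yields `Into<StorageConfig> for StorageConfigV0`
// via the standard library's blanket impl.
impl From<StorageConfigV0> for StorageConfig {
    fn from(outer: StorageConfigV0) -> Self {
        StorageConfig {
            bucket: outer.bucket,
            root: outer.root,
        }
    }
}

// Query's inner config refers to the inner StorageConfig...
#[derive(Clone, Debug, Default, PartialEq)]
pub struct QueryConfig {
    pub storage: StorageConfig,
}

// ...while query's outer config refers to StorageConfigV0.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct QueryConfigV0 {
    pub storage: StorageConfigV0,
}

impl From<QueryConfigV0> for QueryConfig {
    fn from(outer: QueryConfigV0) -> Self {
        QueryConfig {
            storage: outer.storage.into(),
        }
    }
}

fn main() {
    let outer = QueryConfigV0 {
        storage: StorageConfigV0 {
            bucket: "databend".to_string(),
            root: "/data".to_string(),
        },
    };
    // Only the inner config travels past the config-loading boundary.
    let inner: QueryConfig = outer.into();
    println!("{:?}", inner);
}
```

Keeping the conversion in `From` impls means the rest of the code base only ever sees `QueryConfig`. A new outer version would then need just one more conversion.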
Config Maintenance
All of the following maintenance rules SHOULD be applied to the outer config structs.
- Add config: adding a field is compatible only if it has a default value; otherwise it is forbidden.
- Remove config: removing a field is not allowed. Mark it as DEPRECATED instead.
- Change config: changing a field's type or structure is not allowed.
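The rules above can be illustrated with a hypothetical outer log config. This is a sketch, not databend's real LogConfigV0. It assumes that `format` was added later with a default value, and that `log_dir` was superseded by `dir` but kept and marked DEPRECATED. All field names, default values, and the `into_inner` helper are hypothetical. The helper returns warnings instead of printing them so the caller decides how to report them.

```rust
// Hypothetical outer config showing the maintenance rules.
#[derive(Clone, Debug, PartialEq)]
pub struct LogConfigV0 {
    pub level: String,
    // Added later: compatible because it has a default value,
    // so old config files that omit it still load.
    pub format: String,
    // DEPRECATED: superseded by `dir`, but never removed and still honored.
    pub log_dir: Option<String>,
    pub dir: String,
}

impl Default for LogConfigV0 {
    fn default() -> Self {
        LogConfigV0 {
            level: "INFO".to_string(),
            format: "text".to_string(),
            log_dir: None,
            dir: "./logs".to_string(),
        }
    }
}

// Hypothetical inner config: no deprecated fields leak inside.
#[derive(Clone, Debug, PartialEq)]
pub struct LogConfig {
    pub level: String,
    pub format: String,
    pub dir: String,
}

impl LogConfigV0 {
    // Converts to the inner config and collects DEPRECATED warnings.
    pub fn into_inner(self) -> (LogConfig, Vec<String>) {
        let mut warnings = Vec::new();
        // Assumption: when the deprecated field is set, it takes effect,
        // so configs written before `dir` existed keep working.
        let dir = match self.log_dir {
            Some(old_dir) => {
                warnings.push("DEPRECATED: `log_dir` is deprecated, use `dir` instead".to_string());
                old_dir
            }
            None => self.dir,
        };
        let inner = LogConfig {
            level: self.level,
            format: self.format,
            dir,
        };
        (inner, warnings)
    }
}

fn main() {
    // A config written before `dir` and `format` existed.
    let old = LogConfigV0 {
        log_dir: Some("/var/log/databend".to_string()),
        ..Default::default()
    };
    let (inner, warnings) = old.into_inner();
    for w in &warnings {
        eprintln!("{}", w);
    }
    println!("{:?}", inner);
}
```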
Drawbacks
Maintenance burden
Introducing an outer config will increase the complexity of the config handler.
Rationale and alternatives
Why not use serde and related tools?
The most important point is that this RFC intends to split inner and outer config instances: keep the inner config as simple as possible and leave the userland interaction work for the outer config to handle.
serde doesn't work this way.
How to work with protobuf used by meta?
As described in the reference-level explanation, the config used by protobuf is another outer config and should handle versioning by itself. Given the current status of databend's common-proto-conv, we will keep all fields until we decide to increase OLDEST_COMPATIBLE_VER.
Prior art
None. This RFC is the first attempt at config backward compatibility.
Unresolved questions
None.
Future possibilities
Introduce versioned config
We can introduce a versioned config to allow users to specify the config versions:
- config file: version=42
- config env: export CONFIG_VERSION=42
- args: --config-version=42
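A minimal sketch of how each source could declare its version, using the three forms listed above. The helper names and `DEFAULT_CONFIG_VERSION` are hypothetical. The file parsing is deliberately simplified: it scans for a `version = N` line instead of using a real TOML parser. The env is passed in as a map rather than read through `std::env::var`, which keeps the sketch testable. Each source is read independently, because different sources may declare different versions.

```rust
use std::collections::HashMap;

// Hypothetical: sources that declare no version are treated as version 0.
const DEFAULT_CONFIG_VERSION: u64 = 0;

// Config file: a `version=42` (or `version = 42`) line.
fn version_from_file(content: &str) -> Option<u64> {
    content.lines().find_map(|line| {
        let (key, value) = line.split_once('=')?;
        if key.trim() == "version" {
            value.trim().parse().ok()
        } else {
            None
        }
    })
}

// Config env: `export CONFIG_VERSION=42`.
fn version_from_env(env: &HashMap<String, String>) -> Option<u64> {
    env.get("CONFIG_VERSION")?.trim().parse().ok()
}

// Args: `--config-version=42`.
fn version_from_args(args: &[String]) -> Option<u64> {
    args.iter()
        .find_map(|arg| arg.strip_prefix("--config-version="))
        .and_then(|v| v.parse().ok())
}

fn main() {
    let file = "version = 23\na = \"Version 23\"\n";
    let env = HashMap::from([("CONFIG_VERSION".to_string(), "42".to_string())]);
    let args = vec!["databend-query".to_string(), "--config-version=42".to_string()];

    let file_ver = version_from_file(file).unwrap_or(DEFAULT_CONFIG_VERSION);
    let env_ver = version_from_env(&env).unwrap_or(DEFAULT_CONFIG_VERSION);
    let args_ver = version_from_args(&args).unwrap_or(DEFAULT_CONFIG_VERSION);
    println!("file={} env={} args={}", file_ver, env_ver, args_ver);
}
```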
If a compatible change happens, such as adding a new config entry, databend will make sure that the entry has a default value.
If an incompatible change happens, such as a config entry being removed, renamed, or changed, databend will increase the config version. Configs of older versions will still be loaded according to their specified version and converted to the latest config internally. A DEPRECATED warning will also be printed for removed config fields, so users can decide whether to migrate them.
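A possible shape for "load by the specified version, then convert to the latest" is an enum over the supported versions, with a conversion into the latest one. This is a hypothetical sketch. The version numbers follow the examples in this section. The fields `a`, `b`, and `legacy_flag`, the default `"default-b"`, and the `into_latest` helper are all assumptions.

```rust
// Hypothetical config at version 23.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ConfigV23 {
    pub a: String,
    // Removed in version 42.
    pub legacy_flag: Option<bool>,
}

// Hypothetical config at version 42, the latest.
#[derive(Clone, Debug, PartialEq)]
pub struct ConfigV42 {
    pub a: String,
    // New in version 42; its default value keeps old configs loadable.
    pub b: String,
}

impl Default for ConfigV42 {
    fn default() -> Self {
        ConfigV42 {
            a: String::new(),
            b: "default-b".to_string(),
        }
    }
}

// A config loaded according to the version the user specified.
#[derive(Debug)]
pub enum VersionedConfig {
    V23(ConfigV23),
    V42(ConfigV42),
}

impl ConfigV23 {
    fn upgrade(self, warnings: &mut Vec<String>) -> ConfigV42 {
        if self.legacy_flag.is_some() {
            warnings.push(
                "DEPRECATED: `legacy_flag` was removed in config version 42".to_string(),
            );
        }
        ConfigV42 {
            a: self.a,
            ..ConfigV42::default()
        }
    }
}

impl VersionedConfig {
    // Converts any supported version into the latest config,
    // collecting DEPRECATED warnings along the way.
    pub fn into_latest(self) -> (ConfigV42, Vec<String>) {
        let mut warnings = Vec::new();
        let latest = match self {
            VersionedConfig::V23(c) => c.upgrade(&mut warnings),
            VersionedConfig::V42(c) => c,
        };
        (latest, warnings)
    }
}

fn main() {
    let old = VersionedConfig::V23(ConfigV23 {
        a: "Version 23".to_string(),
        legacy_flag: Some(true),
    });
    let (latest, warnings) = old.into_latest();
    for w in &warnings {
        eprintln!("{}", w);
    }
    println!("{:?}", latest);
}
```

With more versions, each one would only need to upgrade to its immediate successor, chaining up to the latest. This keeps each conversion small.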
Load different versions from config files and envs
It's possible to load different versions from config files and envs.
For example:
Old version from config files:
```toml
version = 23
a = "Version 23"
```
New version from env:
```shell
export CONFIG_VERSION=42
export QUERY_B="Version 42"
```
In the best case, we can load from env via version 42 and then load from the config file via version 23.
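One way to combine sources of different versions is to convert each into the latest layout first, then layer them. This sketch makes several assumptions beyond the RFC. Env values take precedence over the config file. `PartialConfig`, `load_file_v23`, and `load_env_v42` are hypothetical. `QUERY_A` is an invented env var alongside the example's `QUERY_B`. The loaders hard-code their versions instead of dispatching on the declared version, and file parsing is simplified to `key = "value"` lines.

```rust
use std::collections::HashMap;

// Latest-version config with optional fields, so sources can be layered.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct PartialConfig {
    pub a: Option<String>,
    pub b: Option<String>,
}

impl PartialConfig {
    // Fields set in `higher` win over fields set in `self`.
    pub fn overlay(self, higher: PartialConfig) -> PartialConfig {
        PartialConfig {
            a: higher.a.or(self.a),
            b: higher.b.or(self.b),
        }
    }
}

// Loads a version-23 config file and maps it to the latest layout.
// Version 23 has no `b`; other keys (including `version`) are skipped.
fn load_file_v23(content: &str) -> PartialConfig {
    let mut cfg = PartialConfig::default();
    for line in content.lines() {
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "a" {
                cfg.a = Some(value.trim().trim_matches('"').to_string());
            }
        }
    }
    cfg
}

// Loads version-42 env vars (passed in as a map for testability).
fn load_env_v42(env: &HashMap<String, String>) -> PartialConfig {
    PartialConfig {
        a: env.get("QUERY_A").cloned(),
        b: env.get("QUERY_B").cloned(),
    }
}

fn main() {
    let file = "version = 23\na = \"Version 23\"\n";
    let env = HashMap::from([
        ("CONFIG_VERSION".to_string(), "42".to_string()),
        ("QUERY_B".to_string(), "Version 42".to_string()),
    ]);
    // Assumed precedence: env overrides the config file.
    let merged = load_file_v23(file).overlay(load_env_v42(&env));
    println!("{:?}", merged);
}
```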