tigase/_server/server-core

Change data source assignment - by default use per-domain repositories (#1207)

Wojciech Kapcia (Tigase) opened 5 years ago

As a result of the discussion in #tigaseim-90, we should change the default behaviour of MDRepositoryBean* so that all components prefer the domain repository if it is present - you suggested changing tigase.db.beans.MDRepositoryBean.SelectorType to EveryUserRepository. We have the following possibilities (selectors) now:

List - Repository instances will be created for default data source and for data sources listed in configuration.
EveryDataSource - Repository instances will be created for every data source.
EveryUserRepository - Repository instances will be created for every data source for which user repository exists.
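
To make the difference between the three selectors concrete, here is a minimal Java sketch. It is a hypothetical model written for this thread, not the actual MDRepositoryBean code: the class SelectorSketch, the method repositoryNames and its parameters are all assumptions, and it only computes which repository names each selector would yield from the descriptions above.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of the selector descriptions; not Tigase code. */
final class SelectorSketch {

    /** Mirrors the three selector names listed above. */
    enum SelectorType { List, EveryDataSource, EveryUserRepository }

    /**
     * Returns the names of the repositories that would be created.
     *
     * @param dataSources      names of all configured data sources
     * @param listedInConfig   names listed explicitly in the bean configuration
     * @param userRepositories names for which a user repository exists
     */
    static Set<String> repositoryNames(SelectorType selector,
                                       Collection<String> dataSources,
                                       Collection<String> listedInConfig,
                                       Collection<String> userRepositories) {
        Set<String> result = new LinkedHashSet<>();
        switch (selector) {
            case List:
                // Default data source plus the ones listed in configuration.
                result.add("default");
                result.addAll(listedInConfig);
                break;
            case EveryDataSource:
                // One repository per data source, no filtering.
                result.addAll(dataSources);
                break;
            case EveryUserRepository:
                // Only data sources that also have a user repository.
                for (String name : dataSources) {
                    if (userRepositories.contains(name)) {
                        result.add(name);
                    }
                }
                break;
        }
        return result;
    }
}
```

With the data sources default, jabber.today, sure.im, tigase.im and service, and user repositories for everything except service, EveryUserRepository would skip service while EveryDataSource would include it.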

to-do:

1) change the selector to the preferred one
2) include information in the documentation on how to handle it (add repository configuration manually for each matching domain pointing to the default data-source)
3) include information in upgrade-schema that the configuration has to be adjusted or the selector should be changed

optionally: 4) add an automatic configuration updater (configure use of the old selector)

Activities

Wojciech Kapcia (Tigase) commented 5 years ago

@andrzej.wojcik Could you confirm that the last option doesn't require additional, dedicated configuration apart from data-sources? (It's not entirely clear what the default mapping of UserRepositories to data-sources is: a user-repository for every data-source, a user-repository for each data-source that matches a configured VHost, or only the repositories manually configured in the UserRepository bean in TDSL?)
Andrzej Wójcik (Tigase) commented 5 years ago

I think it is one for each bean entry configured in the userRepository section of TDSL. It has been quite a while since it was implemented, but from what I see in the code, it retrieves the names of all beans that are registered in the userRepository section and implement UserRepository, and then registers beans with the same names in MDRepositoryBean - that leads to the usage of data sources with the same names.
Wojciech Kapcia (Tigase) commented 5 years ago

I was pondering the issue some more: we were discussing that it's not possible to tell whether a particular dataSource is a "vhost" one without actually having access to some "main" configuration repository, which led to the conclusion that EveryUserRepository would be best, as it would allow the creation of explicit repositories, but in all components. However - couldn't we just add a configuration option to dataSource that would allow marking it as a "vhost datasource"? That way we wouldn't have to seemingly duplicate the configuration in the userRepository bean - what do you think?
Andrzej Wójcik (Tigase) commented 5 years ago
I think that we have 2 issues here.

we do not know which database/datasource to use for each vhost (the names do not need to match)

we do not have any info about vhosts until we initialize the repository, and the vhost repository is based on the user repository, so we need to have the user repository ready before we get the vhost repository

Moreover, the Kernel is prepared to inject what is needed, and it is very difficult (or impossible) to initialize a single user repository without initializing the rest of them.

Wojciech Kapcia (Tigase) commented 5 years ago

OK, my understanding is as follows:

I assume we always initialised default repository - correct?
In components/userrepository repositories will be created as follows: default for default datasource (and all non-configured repositories with names matching vhosts) and configured, per-vhost repositories (names have to match the vhost).
I assume, that the actual decision which repository to use (default or named) is done after initialisation (after default repository is initialised) - correct?

If so, then there shouldn't be any significant difference between specifying a named repository under repository (which may or may not match the vhost) and indicating that the data-source should be used as a vhost repository. For example, currently you have to explicitly configure both dataSource and userRepository:

dataSource {
    default () {
        uri = 'jdbc:mysql://localhost:3306/tigasedb?user=tigase&password=tigasepassword'
    }
    'jabber.today' () {
        uri = 'jdbc:mysql://localhost:3306/jabbertoday?user=tigase&password=tigasepassword'
    }
    'sure.im' () {
        uri = 'jdbc:mysql://localhost:3306/sureim?user=tigase&password=tigasepassword'
    }
    'tigase.im' () {
        uri = 'jdbc:mysql://localhost:3306/tigaseim?user=tigase&password=tigasepassword'
    }
    'service' () {
        uri = 'jdbc:mysql://localhost:3306/service?user=tigase&password=tigasepassword'
    }
}
userRepository {
    default () {}
    'jabber.today' () {}
    'sure.im' () {}
    'tigase.im' () {}
}

Which, with an additional dataSource parameter, could be translated to:

dataSource {
    default () {
        uri = 'jdbc:mysql://localhost:3306/tigasedb?user=tigase&password=tigasepassword'
    }
    'jabber.today' () {
        vhostDataSource = true
        uri = 'jdbc:mysql://localhost:3306/jabbertoday?user=tigase&password=tigasepassword'
    }
    'sure.im' () {
        vhostDataSource = true
        uri = 'jdbc:mysql://localhost:3306/sureim?user=tigase&password=tigasepassword'
    }
    'tigase.im' () {
        vhostDataSource = true
        uri = 'jdbc:mysql://localhost:3306/tigaseim?user=tigase&password=tigasepassword'
    }
    'service' () {
        uri = 'jdbc:mysql://localhost:3306/service?user=tigase&password=tigasepassword'
    }
}

(this is just an idea and possible improvement, if this is unfeasible then we can stick with using EveryUserRepository)

Andrzej Wójcik (Tigase) commented 5 years ago
@wojtek A few comments:

I assume we always initialised default repository - correct?

We always initialize all registered repositories. default is registered in the code, so it is always initialized, but it may not be the first one to be initialized.

In components/userrepository repositories will be created as follows: default for default datasource (and all non-configured repositories with names matching vhosts) and configured, per-vhost repositories (names have to match the vhost).

How would that happen? We need the user repository to get vhosts, so you would need to register (and unregister) new beans dynamically based on existing data sources and vhosts. To make this work, you would need to:

1) Register only the default user repository.

2) Initialize the default user repository and load vhosts.

3) After each reload of vhosts, automatically check all registered data sources and, for matching ones, register new beans if they are not registered yet. (You should also unregister a bean if its vhost is removed!) I suppose that we do not have any way to know when that happens, so we would need to use EventBus.

4) That will trigger creation of a bean by the kernel, which will register it and inject it into the running Tigase XMPP Server.

5) That should start to work.
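
The check performed after each vhost reload can be sketched roughly as below. This is only an illustration under stated assumptions: VHostBeanReconciler, Changes and reconcile are hypothetical names, matching is done purely by equal names, and the actual kernel registration and EventBus wiring are left out.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the reload check; not Tigase kernel code. */
final class VHostBeanReconciler {

    /** Result of one reconciliation pass. */
    static final class Changes {
        final Set<String> toRegister;
        final Set<String> toUnregister;

        Changes(Set<String> toRegister, Set<String> toUnregister) {
            this.toRegister = toRegister;
            this.toUnregister = toUnregister;
        }
    }

    /**
     * Compares the currently registered per-vhost repository beans with the
     * vhosts that have a data source of the same name.
     */
    static Changes reconcile(Set<String> registered,
                             Set<String> vhosts,
                             Set<String> dataSources) {
        // A bean is wanted for every vhost with a same-named data source.
        Set<String> desired = new TreeSet<>(vhosts);
        desired.retainAll(dataSources);

        // New matches that have no bean yet.
        Set<String> toRegister = new TreeSet<>(desired);
        toRegister.removeAll(registered);

        // Beans whose vhost (or data source) is gone.
        Set<String> toUnregister = new TreeSet<>(registered);
        toUnregister.removeAll(desired);

        return new Changes(toRegister, toUnregister);
    }
}
```

A real implementation would then hand toRegister and toUnregister to the kernel, which is exactly the part that resembles a hot reload of the configuration.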

NOTE: The changes in points 3-4 are "similar" to the changes made by a hot reload of the configuration (which you worry will crash the server) and may interact with changes applied by hot reload of the configuration once we have it (adding a repository bean is just a change in the configuration, but done on the fly).

I assume, that the actual decision which repository to use (default or named) is done after initialisation (after default repository is initialised) - correct?

This is done in the same way as before. There is a map in which the key is a domain name and the value is the repository to use.
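
A minimal sketch of such a domain-to-repository map is shown below. The class DomainRepositoryMap and its methods are hypothetical, and the fallback to the default repository for unknown domains is an assumption based on the rest of this discussion, not a description of the real code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical domain-to-repository lookup; not the Tigase implementation. */
final class DomainRepositoryMap<T> {

    private final Map<String, T> byDomain = new HashMap<>();
    private final T defaultRepository;

    DomainRepositoryMap(T defaultRepository) {
        this.defaultRepository = defaultRepository;
    }

    /** Associates a domain name with the repository to use for it. */
    void register(String domain, T repository) {
        byDomain.put(domain, repository);
    }

    /** Returns the named repository, or the default one (assumed fallback). */
    T forDomain(String domain) {
        return byDomain.getOrDefault(domain, defaultRepository);
    }
}
```

Note that the key does not have to equal the data source name, which is what allows a manually configured case such as one domain using another domain's repository.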

I do not want to add a new config field to the data source to mark it as a vhost data source. This is something which should be simple and generic. We may want to have a data source e.g. for the counter data logger, and why should it be marked (or even have a field to be marked) as a vhost data source?

Also, how will you solve the issue when a user repository named e.g. pandion.im should use the non-default repository jabber.one? Still with manual configuration?

How about the auth repository? By default they use user repository implementations, but not always. Would you configure them manually? Or would you just "generate" them as with user repositories?

If so, how will you tackle the usage of LDAP for authentication? What if I would like to have authentication for all domains in a single "credentials" repository while the rest of the data is in separate repositories? Would that still be possible?
Wojciech Kapcia (Tigase) commented 5 years ago

How would that happen? We need the user repository to get vhosts, so you would need to register (and unregister) new beans dynamically based on existing data sources and vhosts. To make this work, you would need to:

I was referring to the current case: if you specify a repository in userRepo (or in any component that uses repositories, actually) then the appropriate repositories are created. I think we are considering slightly different cases here...

1) Register only the default user repository.
2) Initialize the default user repository and load vhosts.
3) After each reload of vhosts, automatically check all registered data sources and, for matching ones, register new beans if they are not registered yet. (You should also unregister a bean if its vhost is removed!) I suppose that we do not have any way to know when that happens, so we would need to use EventBus.
4) That will trigger creation of a bean by the kernel, which will register it and inject it into the running Tigase XMPP Server.
5) That should start to work.

NOTE: The changes in points 3-4 are "similar" to the changes made by a hot reload of the configuration (which you worry will crash the server) and may interact with changes applied by hot reload of the configuration once we have it (adding a repository bean is just a change in the configuration, but done on the fly).

As we currently don't change repositories during runtime (as it requires a configuration change and a restart), I don't think we should address it here.

That would be a somewhat different use case, where it would be more dynamic. It could even be extended to automatic database/schema creation when a vhost is created (which would be interesting, but it's completely out of scope here).

I do not want to add a new config field to the data source to mark it as a vhost data source. This is something which should be simple and generic. We may want to have a data source e.g. for the counter data logger, and why should it be marked (or even have a field to be marked) as a vhost data source?

Valid point.

But we still have a case where we somewhat "mark" data-sources as special (by listing them in userRepo), and based on that we would create the respective repositories in other components. And while "marking" a dataSource could feel wrong, using userRepo as an indication that other beans should also have the same set of repositories as the user repo feels unintuitive. Having this information in dataSource seems more direct/cleaner.

Maybe to make it saner we could add a field/set "data-source-features" which could contain additional "capabilities" of the data-source, one of which would be "domain source"?

Also, how will you solve the issue when a user repository named e.g. pandion.im should use the non-default repository jabber.one? Still with manual configuration?

First of all - I don't want to eliminate the current mechanism - if someone desires to have such a configuration then go ahead. I only want to simplify the (default) configuration. Second thing - such a configuration is quite unusual in itself. Yes, we do have it on tigase.im, but only because previously two domains were sharing one database, which is just weird if you are going for per-domain databases...

How about the auth repository? By default they use user repository implementations, but not always. Would you configure them manually? Or would you just "generate" them as with user repositories? If so, how will you tackle the usage of LDAP for authentication?

Same as currently or when using EveryUserRepository. You use the default one, or the "domain repo" (if it was configured in userRepo), or override it with a dedicated repository.

What if I would like to have authentication for all domains in a single "credentials" repository while the rest of the data is in separate repositories? Would that still be possible?

Well, I'd say that with EveryUserRepository that wouldn't be possible either...

What I'm proposing with the additional configuration in the data-source is the same as listing domains/repositories under userRepo and using EveryUserRepository - all your points above apply in that case just as well.

(I think we need BeagleTime tomorrow ;-) )
Andrzej Wójcik (Tigase) commented 5 years ago

@wojtek

Maybe to make it saner we could add a field/set "data-source-features" which could contain additional "capabilities" of the data-source, one of which would be "domain source"?

That feels like a good solution. I would just want to keep being able to use the "current" version, so we would create an extension (e.g. DataSourceFeatureAware) which would provide this feature-set configuration option and would make it the default bean for the DataSourcepool bean. OK?

Well, I'd say that with EveryUserRepository that wouldn't be possible either...

Valid point.

I agree. We need to discuss during a call.
Wojciech Kapcia (Tigase) commented 5 years ago
Some conclusions from the call:

there is no hard decision on whether we should use EveryUserRepository or data-sources with an additional "vhost" feature (both have pros and cons; I'd lean toward the latter but with a huge IMHO)

authrepository also has to be adapted to follow the same selector type as other repositories

regarding handling the upgrade - we could introduce in ConfigurationConverter a feature to explicitly include the current selector type (if it's not configured), so that existing users won't be affected when the defaults change. (There was also the possibility of adding the current server version to the TDSL, but that would require handling of versions; not sure if that would be beneficial here.)
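
The upgrade idea from the last point could look roughly like the sketch below. It is not the real ConfigurationConverter: the class SelectorPinningSketch, the method pinSelector and the configuration key "selector" are hypothetical, and the bean configuration is modelled as a plain map only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of pinning the old selector during an upgrade. */
final class SelectorPinningSketch {

    /** Assumed key name; the real TDSL property may be named differently. */
    static final String SELECTOR_KEY = "selector";

    /**
     * Returns a copy of the bean configuration in which the previously
     * implicit selector is written out, unless one was already configured.
     */
    static Map<String, Object> pinSelector(Map<String, Object> beanConfig,
                                           String previousDefault) {
        Map<String, Object> converted = new LinkedHashMap<>(beanConfig);
        // An explicitly configured selector always wins over the old default.
        converted.putIfAbsent(SELECTOR_KEY, previousDefault);
        return converted;
    }
}
```

Existing installations would keep their behaviour because the old value becomes explicit, while fresh installations pick up the new default.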
Wojciech Kapcia (Tigase) batch edited 1 year ago

Name	Previous Value	Current Value
Iterations	empty	tigase-server-9.0.0

Type	Task
Priority	Normal
Assignee	Wojciech Kapcia (Tigase)
Version	tigase-server-9.0.0
Spent time	0

Iterations

tigase-server-9.0.0 Open

Reference

tigase/_server/server-core#1207