Create collection retrieval methods.

Description

The Configuration interface currently has no way to retrieve a list of items, such as might be represented by a TURF list or even a string with delimiters. See for comparison Apache Commons ImmutableConfiguration.getList(String key).

Implementing this feature will involve several steps:

  • Implement methods to retrieve a collection of objects, both as Collection<Object> and with a passed-in type.

  • Add a means to split strings (if so configured) in the string-based configurations.

  • Add convenience methods to get lists of other types, e.g. URI.

  • Update the key expression to retrieve indexed items in lists (if the collection happens to be a list), e.g. foo[2].bar.
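The string-splitting and typed-list steps above could be sketched roughly as follows. This is only an illustration: the class DelimitedValues and the methods split() and splitUris() are hypothetical names, not part of the existing Confound API, and the sketch assumes a literal (non-regex) delimiter with whitespace trimmed around each item.

```java
import java.net.URI;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper for string-based configurations; not part of Confound. */
final class DelimitedValues {

    private DelimitedValues() {}

    /**
     * Splits a raw string value on a literal delimiter, trims each item,
     * and converts it with the given function. A blank value yields an empty list.
     */
    static <T> List<T> split(String value, String delimiter, Function<String, T> converter) {
        if (value.trim().isEmpty()) {
            return List.of();
        }
        return Pattern.compile(Pattern.quote(delimiter)).splitAsStream(value)
                .map(String::trim)
                .map(converter)
                .collect(Collectors.toUnmodifiableList());
    }

    /** Convenience variant for one "other type", here URI. */
    static List<URI> splitUris(String value, String delimiter) {
        return split(value, delimiter, URI::create);
    }
}
```

A call such as DelimitedValues.splitUris("https://example.com/, https://example.org/x", ",") would then produce a two-element list of URIs. Whether splitting is enabled at all, and with which delimiter, would still need to be a per-configuration setting.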

Design Considerations

  • The current thinking is to use Collection<> as the interface. We assume that configuration values are small in number, so there would be little reason to return something less rich such as Iterable<>; that is, we should always expect to have the size. Stream<> is another option, but we can always get a stream from a collection, and the underlying value may be a Set<>, which we couldn't get back to (without collecting the values) if we returned a stream. Lastly we could return a List<>, but we might as well distinguish underlying implementations that are collections but may not be lists (again without forcing item collection).

  • We'll continue the get/find dichotomy we have for the single values. That is, findObjects(key) would return an Optional<Collection<Object>>. This will allow us to distinguish between a defined empty list and an undefined configuration key, because a configuration key defined as [] (an empty list) might be very different from an undefined value, which means we should use some default. Apache Commons Configuration considers a missing value as an empty list. Consumers who want that sort of functionality can simply use findObjects(key).orElse(Collections.emptyList()).

  • As with Apache Commons Configuration, we'll wrap single, non-collection objects in a collection of one. We'll probably use List<>, but maybe provide a facility to indicate the type of collection. (This could be used as variations of the normal retrieval methods as well.)
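The find/get dichotomy and the single-value wrapping described above might look something like this. It is a minimal sketch over a plain Map: the class CollectionLookupSketch, the method getObjects(), and the choice of NoSuchElementException for a missing key are all assumptions for illustration; only findObjects(key) is named in this ticket.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

/** Hypothetical map-backed stand-in for a configuration; names are illustrative only. */
final class CollectionLookupSketch {

    private final Map<String, Object> values;

    CollectionLookupSketch(Map<String, Object> values) {
        this.values = values;
    }

    /** Empty only if the key is undefined; a defined empty list comes back as present. */
    Optional<Collection<Object>> findObjects(String key) {
        final Object value = values.get(key);
        if (value == null) {
            return Optional.empty();
        }
        if (value instanceof Collection) {
            @SuppressWarnings("unchecked")
            final Collection<Object> collection = (Collection<Object>) value;
            return Optional.of(collection);
        }
        // A single, non-collection object is wrapped in a collection of one.
        final Collection<Object> single = List.of(value);
        return Optional.of(single);
    }

    /** The "get" variant fails for an undefined key (exception type is an assumption). */
    Collection<Object> getObjects(String key) {
        return findObjects(key)
                .orElseThrow(() -> new NoSuchElementException("Missing configuration key: " + key));
    }
}
```

Note that this sketch returns the underlying collection as is, so it says nothing yet about defensive copies or immutability.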

Environment

None

Activity

Garret Wilson
September 14, 2019, 6:18 PM

So far so good. But there is another complication. If we say that we're going to convert the items in the collection as necessary, doesn't that mean we need to create a new collection in many cases, or perhaps even always? We wouldn't want to update the existing collection, because it might be "live". But how do we know which sort of new collection to create? Do we require that the underlying collection implementation be Cloneable?

Which brings up another issue: would we ever want to blindly return the underlying collection, which might be mutable and allow the consumer to make changes (perhaps inadvertently) in the underlying configuration store? If not, we'd always want to do a defensive copy.

If we're always making a copy of the collection, do we want to convert lazily by returning an Iterable<>? (But the same consideration above applies: many use cases would probably want to know the number of things.) Or do we want to go the other way, and force everything into, say, a List<>? (But is the use case for "contains" very common? This is the main reason to use a set.)

The other possibility is to wrap the underlying collection in an immutable version. But would this present threading problems? (Does Configuration even provide any thread safety guarantees? Maybe not.)

Garret Wilson
September 14, 2019, 6:26 PM
Edited

Always making a copy of the collection seems inefficient. But we can't just wrap it in an immutable decorator if things need to be converted.

It may be a little (tiny) bit complicated, but maybe the best way forward is to check the elements for the correct type. If they are all of the correct type, wrap the existing collection in an immutable decorator. If not, create a List<> copy, converting the objects.

The implication is that if you need your collection elements converted (which would always be the case for string-based collections, for example):

  • Each lookup will require creation of a new collection.

  • You will lose any special collection characteristics of the underlying collection, e.g. if the underlying storage used a Set<>, you may not get back a Set<>.
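The check-then-wrap-or-copy approach proposed above could be sketched as follows. The class and method names are hypothetical, and the converter function stands in for whatever conversion facility the configuration would actually use.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of "wrap if already typed, otherwise copy and convert". */
final class CollectionConversionSketch {

    private CollectionConversionSketch() {}

    static <T> Collection<T> asCollectionOf(Collection<?> source, Class<T> type,
            Function<Object, T> converter) {
        boolean allOfType = true;
        for (final Object element : source) {
            if (!type.isInstance(element)) {
                allOfType = false;
                break;
            }
        }
        if (allOfType) {
            // No conversion needed: expose a read-only view of the existing collection.
            @SuppressWarnings("unchecked")
            final Collection<T> typed = (Collection<T>) source;
            return Collections.unmodifiableCollection(typed);
        }
        // Conversion needed: each lookup builds a new List<>, losing e.g. Set<> semantics.
        final List<T> converted = new ArrayList<>(source.size());
        for (final Object element : source) {
            converted.add(type.isInstance(element) ? type.cast(element) : converter.apply(element));
        }
        return Collections.unmodifiableList(converted);
    }
}
```

One caveat with this sketch: the wrapped branch is a live view, so later changes to the underlying collection show through, and Collections.unmodifiableCollection() does not delegate equals() or hashCode() to the wrapped collection.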

But then what about calling getObject(List.class)? Does that return a mutable or immutable collection? And do we want to create complicated logic to figure out which type of immutable wrapper to use? And what if they want something special, such as LinkedHashSet.class?

Garret Wilson
September 14, 2019, 6:51 PM

And here is the bigger concern, as far as Guise Mummy goes (which is the immediate need for this): If the configuration is based upon TURF, and we have a list of e.g. deployment targets such as [*S3], what type of list element would we use? Would we ask for an UrfObject.class? But not every configuration storage would support URF or even know what it is. Or would we request a type of S3.class? But we don't even have a system for creating an S3 type from TURF *S3 at the moment (we did in the original URF implementation), much less a way to integrate that into Confound. (Actually it's more complicated than this, because each element would be of some base type, such as DeploymentTarget, with S3 being one subtype.)

But here's another idea: each "object" inside the collection is really a subconfiguration, i.e. a subtree of the original Configuration. So maybe you could ask for a collection of subconfigurations! This would basically allow you to consider each element as an independent entity, but in terms of Configuration, which we already have!!

Then we could just do, e.g. for an S3 configuration (assuming there is at least one):

This would correspond to deploy.targets[0].bucket in the root configuration.
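As a rough illustration of the subconfiguration idea, the sketch below treats each element of a list as its own small configuration. Everything here is hypothetical: the class SubConfigurationSketch, the methods findString() and findSubConfigurations(), the flat "deploy.targets" key, and the use of nested maps as backing storage are assumptions, not the Confound API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical map-backed configuration whose list elements are subconfigurations. */
final class SubConfigurationSketch {

    private final Map<String, Object> properties;

    SubConfigurationSketch(Map<String, Object> properties) {
        this.properties = properties;
    }

    Optional<String> findString(String key) {
        return Optional.ofNullable(properties.get(key)).map(Object::toString);
    }

    /** Exposes each map element of the list stored at the key as an independent subconfiguration. */
    Optional<List<SubConfigurationSketch>> findSubConfigurations(String key) {
        final Object value = properties.get(key);
        if (!(value instanceof List)) {
            return Optional.empty();
        }
        final List<SubConfigurationSketch> result = new ArrayList<>();
        for (final Object element : (List<?>) value) {
            if (!(element instanceof Map)) {
                throw new IllegalArgumentException("Element is not a section: " + element);
            }
            @SuppressWarnings("unchecked")
            final Map<String, Object> section = (Map<String, Object>) element;
            result.add(new SubConfigurationSketch(section));
        }
        return Optional.of(Collections.unmodifiableList(result));
    }
}
```

With such a structure, taking the first element of findSubConfigurations("deploy.targets") and calling findString("bucket") on it would read the same value as deploy.targets[0].bucket.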

This is sounding good. The only missing ingredient is something we need anyway: a way to indicate the "type" of some configuration node. Not all configuration implementations would support this (i.e. they may not even have the concept of a "type"), but for those that do, such as URF, we need a way to get that in Configuration. Basically we should have a root Configuration.getType(). Then each node should probably have a Configuration.getType(key), and if we do get a subconfiguration, it should use that type as the type of the subconfiguration. This would allow for complex, typed configurations such as we used to have for the original Guise Framework, but without some pluggable object mapper; instead, each "object" is just a subconfiguration that also exposes a type.

So what do we return for Configuration.getType()? Not a Class<>, because for things like S3 we would want to indicate perhaps the URF handle S3. But what about things in URF that don't have handles, i.e. a type that is only represented by an URF tag?

I think we're very close to a solution here that solves various problems on different fronts. The big decision now, requiring a bit more thought, is what type of "type" to expose for nodes in a Configuration.

Garret Wilson
September 15, 2019, 1:45 PM
Edited

After much thought, I realized that a section in an INI file is no different than having an object as the value of the root object in an URF configuration file. That is the following INI file:

is semantically the same as the following TURF file (assuming there is a root object, or a RIB file in the future)

The only differences are that:

  • In the INI, something about the "property" (the brackets of [example]) indicates that there is a section, while in TURF all the properties look the same, and it is the value that determines whether the thing is a section (an URF object).

  • In URF the "section" can have a type.

So we need to add section support to Confound. Then lists of objects in TURF will naturally contain "sections".

Garret Wilson
September 18, 2019, 3:20 PM

Other issues: thread-safety and iterability. Even if later we allow pluggable synchronization for each configuration (as does Apache Commons Configuration), that is, letting each configuration determine how it handles thread-safety, there is no way for the caller to know how the original configuration handles synchronization. We could add something to the API so that each caller would have to coordinate with the configuration before and after accessing it to perform some sort of locking, but that seems onerous.

Better to require the configuration API to always return some thread-safe, iterable-safe collection (whether that is a copy or a concurrent collection).
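One way to meet that requirement is to hand out a snapshot copy, sketched below. The class SnapshotSketch and its methods are hypothetical, and using intrinsic locking (synchronized) is just one assumed strategy; a concurrent collection such as CopyOnWriteArrayList would be another.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical store that only ever hands out unmodifiable snapshots. */
final class SnapshotSketch {

    private final List<Object> store = new ArrayList<>();

    synchronized void add(Object value) {
        store.add(value);
    }

    /** Copies under the same lock used for writes, so callers can iterate without coordination. */
    synchronized Collection<Object> getObjects() {
        return Collections.unmodifiableList(new ArrayList<>(store));
    }
}
```

The caller can iterate over the returned collection while the configuration changes, at the cost of one copy per lookup.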

Assignee

Garret Wilson

Reporter

Garret Wilson

Labels

None

Priority

Major

Epic Name

Collections