Many data sources were not developed using a schema-first approach. For these datasets, neither an explicit schema nor a complete list of semantic constraints is available in advance.
In all these cases, reverse engineering methods are needed to reconstruct the schema and certain constraints from the data. We have developed such methods to derive a schema as well as schema statistics and information about structural outliers in datasets.
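The idea behind such a reverse engineering step can be illustrated with a minimal sketch. It is not the method referenced on this page. It assumes flat JSON documents already parsed into Python dictionaries, and the names `json_type`, `extract_schema`, and `outlier_threshold` are hypothetical, chosen only for illustration. The sketch counts how often each property occurs, collects the observed types, and flags documents that contain rarely used properties as structural outliers.

```python
from collections import Counter


def json_type(value):
    """Map a parsed JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def extract_schema(documents, outlier_threshold=0.1):
    """Derive a schema, property statistics, and outliers (top level only).

    Assumption: a property found in fewer than `outlier_threshold` of all
    documents is treated as a structural outlier. Nested objects are not
    traversed in this sketch.
    """
    total = len(documents)
    occurrences = Counter()  # property name -> number of documents containing it
    types = {}               # property name -> set of observed JSON types

    for doc in documents:
        for name, value in doc.items():
            occurrences[name] += 1
            types.setdefault(name, set()).add(json_type(value))

    properties = {}
    for name, observed in types.items():
        names = sorted(observed)
        # A single type is written as a string, several types as a list.
        properties[name] = {"type": names[0] if len(names) == 1 else names}

    schema = {
        "type": "object",
        "properties": properties,
        # Only properties present in every document are marked as required.
        "required": sorted(n for n, c in occurrences.items() if c == total),
    }

    # Share of documents in which each property occurs.
    statistics = {name: count / total for name, count in occurrences.items()}

    # Indices of documents that contain at least one rare property.
    rare = {name for name, share in statistics.items() if share < outlier_threshold}
    outliers = [i for i, doc in enumerate(documents)
                if any(name in rare for name in doc)]

    return schema, statistics, outliers
```

Calling `extract_schema(documents)` on a list of parsed documents returns the derived schema, the relative frequency of each property, and the positions of the outlier documents. The fixed threshold is a simplification; a real dataset may need a different criterion for what counts as an outlier.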
A solution for deriving a schema overview (in JSON Schema syntax) from JSON data, together with additional schema statistics and the structural outliers in the datasets, can be found here:
Schema versions (in JSON Schema syntax) can also be derived from JSON documents, together with the evolution operations that transform each schema version into the subsequent one. This is described in this publication:
Information on how these schema extraction approaches are integrated into the NoSQL evolution approach can be found here
and in these publications: