Schema Registry Service - Version 1.0-rc1

Abstract

This specification defines a Schema Registry extension to the xRegistry document format and API specification. A Schema Registry allows for the storage, management and discovery of schema documents.

Overview

This specification defines a Schema Registry extension to the xRegistry document format and API specification.

For easy reference, the JSON serialization of a Schema Registry adheres to this form:

{
  "specversion": "STRING",                         # xRegistry core attributes
  "registryid": "STRING",
  "self": "URL",
  "xid": "XID",
  "epoch": UINTEGER,
  "name": "STRING", ?
  "description": "STRING", ?
  "documentation": "URL", ?
  "labels": {
    "STRING": "STRING" *
  }, ?
  "createdat": "TIMESTAMP",
  "modifiedat": "TIMESTAMP",

  "model": { ... }, ?

  "schemagroupsurl": "URL",                        # SchemaGroups collection
  "schemagroupscount": UINTEGER,
  "schemagroups": {
    "KEY": {                                       # schemagroupid
      "schemagroupid": "STRING",                   # xRegistry core attributes
      "self": "URL",
      "xid": "XID",
      "epoch": UINTEGER,
      "name": "STRING", ?
      "description": "STRING", ?
      "documentation": "URL", ?
      "labels": { "STRING": "STRING" * }, ?
      "createdat": "TIMESTAMP",
      "modifiedat": "TIMESTAMP",

      "schemasurl": "URL",                         # Schemas collection
      "schemascount": UINTEGER,
      "schemas": {
        "KEY": {                                   # schemaid
          "schemaid": "STRING",                    # xRegistry core attributes
          "versionid": "STRING",
          "self": "URL",
          "xid": "XID",
          #  Start of default Version's attributes
          "epoch": UINTEGER,
          "name": "STRING", ?                      # Version level attrs
          "description": "STRING", ?
          "documentation": "URL", ?
          "labels": { "STRING": "STRING" * }, ?
          "createdat": "TIMESTAMP",
          "modifiedat": "TIMESTAMP",
          "ancestor": "STRING",

          "format": "STRING", ?

          "schemaurl": "URL", ?
          "schema": ANY, ?
          "schemabase64": "STRING", ?
          #  End of default Version's attributes

          "metaurl": "URL",                    # Resource level attrs
          "meta": {
            ... core spec metadata attributes ...
            "validation": BOOLEAN ?
          }, ?

          "versionsurl": "URL",
          "versionscount": UINTEGER,
          "versions": { ... } ?
        } *
      } ?
    } *
  } ?
}
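The form above lists three attributes that can carry or reference the schema document of a Version: schema, schemabase64, and schemaurl. The following Python sketch shows one way a client might pick the inline document out of a Version. It assumes that at most one of the three attributes is present on a given Version, which this section does not state; the function name and the sample Versions are hypothetical.

```python
import base64


def inline_schema_document(version: dict):
    """Return the inline schema document of a Version.

    Returns the value of "schema" if present, the decoded bytes of
    "schemabase64" if present, and None when the Version only carries
    a "schemaurl" reference (or no document at all).
    """
    if "schema" in version:
        return version["schema"]
    if "schemabase64" in version:
        return base64.b64decode(version["schemabase64"])
    return None


# Hypothetical Versions, one per attribute (assumption: only one is set).
inline = {"versionid": "1", "schema": 'syntax = "proto3";'}
encoded = {
    "versionid": "2",
    "schemabase64": base64.b64encode(b'syntax = "proto3";').decode("ascii"),
}
linked = {"versionid": "3", "schemaurl": "https://example.com/metrics.proto"}

for v in (inline, encoded, linked):
    print(v["versionid"], inline_schema_document(v))
```

A Version that only carries schemaurl yields None here, leaving it to the caller to decide whether to fetch the referenced document.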

Notations and Terminology

Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

For clarity, OPTIONAL attributes (specification-defined and extensions) are OPTIONAL for clients to use, but the servers' responsibility will vary. Server-unknown extension attributes MUST be silently stored in the backing datastore. Specification-defined, and server-known extension, attributes MUST generate an error if the corresponding feature is not supported or enabled. However, as with all attributes, if accepting the attribute would result in a bad state (such as exceeding a size limit or introducing a security issue), then the server MAY choose to reject the request.

In the pseudo JSON format snippets ? means the preceding attribute is OPTIONAL, * means the preceding attribute MAY appear zero or more times, and + means the preceding attribute MUST appear at least once. The presence of the # character means the remaining portion of the line is a comment. Whitespace characters in the JSON snippets are used for readability and are not normative.

Terminology

This specification defines the following terms:

Schema

A schema, in the sense of this specification, defines the structure of the body/payload of a serialized message, but also of a structured data element stored on disk or elsewhere. In self-describing serialization formats like JSON, XML, BSON, or MessagePack, schemas primarily help with validating whether a message body conforms with a set of rules defined in the schema and with generating code that can produce or consume the defined structure. For schema-dependent serialization formats like Protobuf or Apache Avro, a schema is needed to decode the structured data from its serialized, binary form.

We use the term schema (or schema Resource) in this specification as a logical grouping of schema Versions. A schema Version is a concrete document. The schema Resource is a semantic umbrella formed around one or more concrete schema Version documents. Per the definition of the compatibility attribute, all Versions of a single schema MUST adhere to the rules defined by the compatibility attribute. Any breaking change MUST result in a new schema being created.

In "Semantic Versioning" terms, you can think of a schema as a "major version" and the schema Versions as "minor versions".

Schema Registry Model

The formal xRegistry extension model of the Schema Registry resides in the model.json file.

Schema Group

A schema group is a container for schemas that are related to each other in some application-defined way. This specification does not impose any restrictions on what schemas can be contained in a schema group.

Schema Registry

The Schema Registry is a metadata store for organizing schemas and schema Versions of any kind; it is a document store.

The xRegistry API extension model of the Schema Registry, which is defined in detail below, is as follows:

{
  "groups": [
    {
      "singular": "schemagroup",
      "plural": "schemagroups",

      "resources": [
        {
          "singular": "schema",
          "plural": "schemas",
          "versions": 0,

          "attributes": {
            "format": {
              "name": "format",
              "type": "string",
              "description": "Schema format identifier for this schema version"
            }
          },
          "metaattributes": {
            "validation": {
              "name": "validation",
              "type": "boolean",
              "description": "Verify compliance with specified 'format'"
            }
          }
        }
      ]
    }
  ]
}

Implementations of this specification MAY include additional extension attributes, including the * attribute of type any.

Since the Schema Registry is an application of the xRegistry specification, all attributes for Groups, Resources, and Resource Version objects are inherited from there.

Schema Groups

The group (GROUP) name for the Schema Registry is schemagroups. The group does not have any specific extension attributes.

A schema group is a collection of schemas that are related to each other in some application-defined way. A schema group does not impose any restrictions on the contained schemas, meaning that a schema group MAY contain schemas of different formats. Every schema MUST reside inside a schema group.

Example:

The following abbreviated Schema Registry contents show a single schema group containing 5 schemas.

{
  "specversion": "1.0-rc1",
  # other xRegistry top-level attributes excluded for brevity

  "schemagroupsurl": "http://example.com/schemagroups",
  "schemagroupscount": 1,
  "schemagroups": {
    "com.example.schemas": {
      "schemagroupid": "com.example.schemas",
      # other xRegistry group-level attributes excluded for brevity

      "schemasurl": "http://example.com/schemagroups/com.example.schemas/schemas",
      "schemascount": 5
    }
  }
}
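As a minimal sketch of how a client might read such a document, the following Python snippet walks the schemagroups collection and reports the schemascount of each group. The function name is hypothetical, and the document is assumed to be already parsed from JSON.

```python
# Parsed form of the abbreviated example above.
registry = {
    "specversion": "1.0-rc1",
    "schemagroupsurl": "http://example.com/schemagroups",
    "schemagroupscount": 1,
    "schemagroups": {
        "com.example.schemas": {
            "schemagroupid": "com.example.schemas",
            "schemasurl": "http://example.com/schemagroups/com.example.schemas/schemas",
            "schemascount": 5,
        }
    },
}


def summarize_groups(doc: dict) -> dict:
    """Map each schemagroupid to the number of schemas it reports."""
    groups = doc.get("schemagroups", {})
    return {gid: group.get("schemascount", 0) for gid, group in groups.items()}


summary = summarize_groups(registry)
print(summary)  # {'com.example.schemas': 5}
```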

Schema Resources

The resources (RESOURCE) collection inside of schema groups is named schemas. The type of the resource is schema. Any single schema is a container for one or more Versions, which hold the concrete schema documents or schema document references.

All Versions of a schema MUST adhere to the semantic rules of the schema's compatibility attribute. This specification defines "compatibility" for schemas as follows; version B of a schema is said to be compatible with version A of a schema if all of the following are true:

Implementations of this specification MAY choose to support any of the compatibility values defined in the core xRegistry specification.

Implementations of this specification SHOULD use the xRegistry default algorithm for generating new versionid values and for determining which is the latest Version. See Version IDs for more information, but in summary it means:

When semantic versioning is used in a solution, it is RECOMMENDED to include a major version identifier in the schemaid, like "com.example.event.v1" or "com.example.event.2024-02", so that incompatible, but historically related schemas can be more easily identified by users and developers. The schema versionid then functions as the semantic minor version identifier.

The following extensions are defined for the schema Resource in addition to the core xRegistry Resource attributes:

validation

format

The following abbreviated example shows three embedded Protobuf/3 schema Versions for a schema named com.example.telemetrydata:

{
  "specversion": "1.0-rc1",
  # other xRegistry top-level attributes excluded for brevity

  "schemagroupsurl": "http://example.com/schemagroups",
  "schemagroupscount": 1,
  "schemagroups": {
    "com.example.telemetry": {
      "schemagroupid": "com.example.telemetry",
      # other xRegistry group-level attributes excluded for brevity

      "schemasurl": "http://example.com/schemagroups/com.example.telemetry/schemas",
      "schemascount": 1,
      "schemas": {
        "com.example.telemetrydata": {
          "schemaid": "com.example.telemetrydata",
          "versionid": "3",
          "description": "device telemetry event data",
          "ancestor": "2",
          "format": "Protobuf/3",
          # other xRegistry default Version attributes excluded for brevity

          "schema": "syntax = \"proto3\"; message Metrics { float metric = 1; string unit = 2; string description = 3; }",

          "metaurl": "http://example.com/schemagroups/com.example.telemetry/schemas/com.example.telemetrydata/meta",
          "versionsurl": "http://example.com/schemagroups/com.example.telemetry/schemas/com.example.telemetrydata/versions",
          "versionscount": 3,
          "versions": {
            "1": {
              "schemaid": "com.example.telemetrydata",
              "versionid": "1",
              "description": "device telemetry event data",
              "ancestor": "1",
              "format": "Protobuf/3",
              # other xRegistry Version-level attributes excluded for brevity

              "schema": "syntax = \"proto3\"; message Metrics { float metric = 1; }"
            },
            "2": {
              "schemaid": "com.example.telemetrydata",
              "versionid": "2",
              "description": "device telemetry event data",
              "ancestor": "1",
              "format": "Protobuf/3",
              # other xRegistry Version-level attributes excluded for brevity

              "schema": "syntax = \"proto3\"; message Metrics { float metric = 1; string unit = 2; }"
            },
            "3": {
              "schemaid": "com.example.telemetrydata",
              "versionid": "3",
              "description": "device telemetry event data",
              "ancestor": "2",
              "format": "Protobuf/3",
              # other xRegistry Version-level attributes excluded for brevity

              "schema": "syntax = \"proto3\"; message Metrics { float metric = 1; string unit = 2; string description = 3; }"
            }
          }
        }
      }
    }
  }
}
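The ancestor attributes in the example above link the three Versions into a chain. The following Python sketch follows that chain from a given Version back to its root. It treats a Version whose ancestor equals its own versionid as the root, which is an assumption drawn from Version "1" in the example; the function name is hypothetical.

```python
def lineage(versions: dict, versionid: str) -> list:
    """Return the versionids from the given Version back to its root."""
    chain = []
    seen = set()
    current = versionid
    # Stop when a versionid repeats: either the root names itself as its
    # own ancestor, or the data contains a cycle.
    while current not in seen:
        seen.add(current)
        chain.append(current)
        current = versions[current]["ancestor"]
    return chain


# Only the attributes needed for the walk are kept from the example.
versions = {
    "1": {"versionid": "1", "ancestor": "1"},
    "2": {"versionid": "2", "ancestor": "1"},
    "3": {"versionid": "3", "ancestor": "2"},
}

print(lineage(versions, "3"))  # ['3', '2', '1']
```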

Schema Formats

This section defines a set of common schema format values that MUST be used for the given formats, but applications MAY define extensions for other formats on their own.
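The examples in this document use format values such as Protobuf/3, which suggests a format identifier followed by a version after a "/" separator. The following Python sketch splits a format value on that basis. The splitting rule and the function name are assumptions made for illustration, not definitions from this specification.

```python
def split_format(value: str):
    """Split a format value into (identifier, version).

    Assumes the shape "<identifier>/<version>"; the version is None
    when no "/" separator is present.
    """
    identifier, separator, version = value.partition("/")
    return identifier, (version if separator else None)


print(split_format("Protobuf/3"))  # ('Protobuf', '3')
print(split_format("JsonSchema"))  # ('JsonSchema', None)
```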

JSON Schema

The format identifier for JSON Schema is JsonSchema.

When the format attribute is set to JsonSchema, the schema attribute of the schema Resource is a JSON object representing a JSON Schema document conformant with the declared version.

A URI-reference, like schemauri, that points to a JSON Schema document MAY use a JSON pointer expression to deep link into the schema document to reference a particular type definition. Otherwise, the top-level object definition of the schema is used.

The version of the JSON Schema format is the version of the JSON Schema specification that is used to define the schema. The version of the JSON Schema specification is defined in the $schema attribute of the schema document.
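The deep link and the $schema lookup described above can be sketched in Python as follows. The sketch assumes the JSON pointer is carried in the URI fragment and follows RFC 6901 escaping; the function names, the example URI, and the sample schema document are hypothetical.

```python
from urllib.parse import unquote, urldefrag


def resolve_pointer(document, pointer: str):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return document  # empty pointer: the top-level definition
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping: '~1' first, then '~0'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve_schema_reference(uri: str, document: dict):
    """Return the definition a URI-reference points at (assumed fragment form)."""
    _, fragment = urldefrag(uri)
    return resolve_pointer(document, unquote(fragment))


schema_doc = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "definitions": {
        "Metrics": {"type": "object", "properties": {"metric": {"type": "number"}}}
    },
}

uri = "https://example.com/schemas/telemetry.json#/definitions/Metrics"
print(resolve_schema_reference(uri, schema_doc)["type"])  # object
print(schema_doc.get("$schema"))  # the declared JSON Schema version
```

A URI without a fragment resolves to the whole document, matching the fallback to the top-level object definition.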

The identifiers for the following JSON Schema versions

are defined as follows:

which follows the exact convention as defined for JSON schema and expecting an eventually released version 1.0 of the JSON Schema specification using a plain version number.

XML Schema

The format identifier for XML Schema is XSD. The version of the XML Schema format is the version of the W3C XML Schema specification that is used to define the schema.

When the format attribute is set to XSD, the schema attribute of the schema Resource is a string containing an XML Schema document conformant with the declared version.

A URI-reference, like schemauri, that points to an XSD Schema document MAY use an XPath expression to deep link into the schema document to reference a particular type definition. Otherwise, the root element definition of the schema is used.
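As a rough illustration of such a deep link, the following Python sketch selects a type definition from an XSD document with the standard library's ElementTree, which supports only a limited subset of XPath. How the XPath expression is attached to the URI is not shown here. Reading "root element definition" as the first top-level xs:element is an assumption, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping used in the XPath expressions below.
XSD_NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

xsd_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="metrics" type="MetricsType"/>
  <xs:complexType name="MetricsType">
    <xs:sequence>
      <xs:element name="metric" type="xs:float"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def select_definition(xsd: str, xpath=None):
    """Return the element selected by xpath, or the assumed default."""
    root = ET.fromstring(xsd)
    if xpath is None:
        # Assumption: the default is the first top-level element declaration.
        return root.find("xs:element", XSD_NS)
    return root.find(xpath, XSD_NS)


selected = select_definition(xsd_text, "xs:complexType[@name='MetricsType']")
print(selected.get("name"))                     # MetricsType
print(select_definition(xsd_text).get("name"))  # metrics
```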

The identifiers for the following XML Schema versions:

are defined as follows:

Apache Avro Schema

The format identifier for Apache Avro Schema is Avro. The version of the Apache Avro Schema format is the version of the Apache Avro Schema release that is used to define the schema.

When the format attribute is set to Avro, the schema attribute of the schema Resource is a JSON object representing an Avro schema document conformant with the declared version.

Examples:

A URI-reference, like schemauri, that points to an Avro Schema document MUST reference an Avro record declaration contained in the schema document using a URI fragment suffix [:]{record-name}. The ':' character is used as a separator when the URI already contains a fragment.
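One possible reading of the [:]{record-name} rule is sketched below in Python: the record name is the whole fragment, or the part after the last ':' when the fragment already carries other content. This interpretation, the function names, the example URIs, and the sample Avro document are assumptions. The lookup only covers top-level unions and records nested directly in field types.

```python
from urllib.parse import urldefrag


def record_name_from_uri(uri: str) -> str:
    """Extract the record name from the URI fragment suffix."""
    _, fragment = urldefrag(uri)
    if not fragment:
        raise ValueError("URI does not reference a record declaration")
    # Text after the last ':' if present, else the whole fragment.
    return fragment.rsplit(":", 1)[-1]


def find_record(schema, name: str):
    """Find a record by short or namespace-qualified name, or return None."""
    if isinstance(schema, list):  # Avro union of schemas
        for item in schema:
            found = find_record(item, name)
            if found is not None:
                return found
    elif isinstance(schema, dict) and schema.get("type") == "record":
        short = schema.get("name", "")
        full = f'{schema["namespace"]}.{short}' if "namespace" in schema else short
        if name in (short, full):
            return schema
        for field in schema.get("fields", []):
            found = find_record(field.get("type"), name)
            if found is not None:
                return found
    return None


avro_doc = [
    {"type": "record", "name": "Metrics", "namespace": "com.example",
     "fields": [{"name": "metric", "type": "float"}]},
    {"type": "record", "name": "Device", "namespace": "com.example",
     "fields": [{"name": "id", "type": "string"}]},
]

name = record_name_from_uri("https://example.com/telemetry.avsc#com.example.Metrics")
print(name, find_record(avro_doc, name)["fields"][0]["name"])
```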

Examples:

Protobuf Schema

The format identifier for Protobuf Schema is Protobuf. The version of the Protobuf Schema format is the version of the Protobuf syntax that is used to define the schema.

When the format attribute is set to Protobuf, the schema attribute of the schema Resource is a string containing a Protobuf schema document conformant with the declared version.

A URI-reference, like schemauri, that points to a Protobuf Schema document MUST reference a Protobuf message declaration contained in the schema document using a URI fragment suffix [:]{message-name}. The ':' character is used as a separator when the URI already contains a fragment.
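A minimal Python sketch of checking such a reference follows. It reads the message name as the fragment text after the last ':' (or the whole fragment when there is none) and uses a regular expression to test whether the schema string declares that message. The interpretation of the suffix, the function names, and the example URI are assumptions; a regular expression is not a Protobuf parser and ignores comments and nesting.

```python
import re
from urllib.parse import urldefrag


def message_name_from_uri(uri: str) -> str:
    """Extract the message name from the URI fragment suffix."""
    _, fragment = urldefrag(uri)
    if not fragment:
        raise ValueError("URI does not reference a message declaration")
    return fragment.rsplit(":", 1)[-1]


def declares_message(proto_text: str, message_name: str) -> bool:
    """Return True if the text contains 'message <name> {'."""
    pattern = r"\bmessage\s+" + re.escape(message_name) + r"\s*\{"
    return re.search(pattern, proto_text) is not None


proto_text = 'syntax = "proto3"; message Metrics { float metric = 1; string unit = 2; }'

name = message_name_from_uri("https://example.com/telemetry.proto#Metrics")
print(name, declares_message(proto_text, name))  # Metrics True
```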

Examples: