[TINKERPOP-3261] Enable multiple labels on vertex with configurable label cardinality#3483
[TINKERPOP-3261] Enable multiple labels on vertex with configurable label cardinality#3483xiazcy wants to merge 12 commits into
Conversation
ac8825a to
9d2e0fe
Compare
- LabelCardinality enum (ONE, ONE_OR_MORE, ZERO_OR_MORE) on Graph.Features
- Element.labels() returns Set<String>, addLabel()/dropLabels() mutation steps
- LabelsStep for g.V().labels() traversal
- LabelCardinalityValidator for constraint enforcement
- LabelsDropVerificationStrategy prevents accidental label removal
- MergeVertex supports multi-label via T.label list in merge map
- elementMap()/valueMap() label output controlled by with("multilabel")/with("singlelabel")
- GremlinLang propagates with("multilabel")/with("singlelabel") in gremlin text
- Grammar rules for addLabel(), dropLabels(), labels(), addV(String...)
- Java test infrastructure: World.getMultiLabelGraphTraversalSource(),
@MultiLabel/@MultiLabelDefault/@SingleLabelDefault tags
- GraphBinary V4 serialization for multi-label vertices/edges
- TinkerVertex stores labels as CopyOnWriteArraySet - TinkerGraph.vertexLabelCardinality configuration property - addLabel/dropLabels wired to TinkerVertex mutation - Unit tests for label cardinality, mutation, merge, and GremlinLang
- GremlinLang with("multilabel")/with("singlelabel") propagation in all GLVs
- Fix: render options in gremlin text both when OptionsStrategy is first created
and when it already exists (from prior with() calls like with("language",...))
- Go: With() method now variadic (single-arg g.With("multilabel") works)
- Feature tests: full matrix for valueMap/elementMap with @multilabel,
@MultiLabelDefault, @SingleLabelDefault tags
- Existing valueMap/elementMap tests use with("singlelabel") for provider safety
- Test infrastructure: @MultiLabelDefault uses gmultilabel + with("multilabel")
programmatically (no server-side YAML config needed)
- Docker gremlin-server-integration.yaml: multilabel graph config
- Cucumber support in all GLVs: terrain/world/steps handle multilabel graphs
- Upgrade docs: user guide for with("multilabel")/with("singlelabel"),
provider guide for LabelCardinality and step overrides
- Reference docs: elementMap/valueMap label format, TinkerGraph configuration
- Provider semantics: valueMap with-options section
- For-committers: @MultiLabel/@MultiLabelDefault/@SingleLabelDefault gherkin tags
- CHANGELOG entry
9d2e0fe to
be5072e
Compare
| edges, properties) in the same order in which they were inserted into the graph. | ||
| * `@MetaProperties` - The scenario makes use of meta-properties. | ||
| * `@MultiLabel` - The scenario requires a graph that supports multi-label vertices (i.e. | ||
| `ZERO_OR_MORE` vertex label cardinality). Providers that only support single-label vertices should |
There was a problem hiding this comment.
is this truly ZERO_OR_MORE? like, have we structured it to be only that cardinality?
There was a problem hiding this comment.
Given the tests we currently have, I think this be a catch-all tag which essentially means "not single-label". I see this as similar to how we combined @multimetaproperties for many years before ultimately splitting them. I think it's ok to combine everything for now, but as we see what sort of configurations providers like to offer, we may want to split into more fine-grained control in the future. I could this evolve into a full set of tags including @MultiLabelV, @ZeroLabelV, @MultiLabelE, @ZeroLabelE.
| * `@MultiLabel` - The scenario requires a graph that supports multi-label vertices (i.e. | ||
| `ZERO_OR_MORE` vertex label cardinality). Providers that only support single-label vertices should | ||
| exclude these tests. | ||
| * `@MultiLabelDefault` - The scenario expects multi-label output as the default behavior for |
There was a problem hiding this comment.
is this correct? if it's testing a default, why are we setting with in this case. We didn't write that for single below. maybe the docs are just all off in this section?
There was a problem hiding this comment.
I agree the docs seem off here, the test infrastructure should not be explicitly setting with("multilabel") in these scenarios, they should be testing traversals which do not explicitly set with(), and the expectation is that the output would match a traversal with an with("multilabel") for these graphs.
| link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/AddVertexStartStep.java[source (start)], | ||
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#addvertex-step[reference] | ||
|
|
||
| === addLabel() |
There was a problem hiding this comment.
I made some changes to the structure of gremlin-semantics.asciidoc and the tinker-doc skill. Perhaps rebase with those changes and ask kiro or whoever to align these semantics docs with the new style (ensuring it triggers the skill). I did that on P(Traversal) branch and that worked pretty well.
| link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/LengthLocalStep.java[source (local)], | ||
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#length-step[reference] | ||
|
|
||
| === labels() |
There was a problem hiding this comment.
Alphabetically this should be above length. I think we should document label() semantics on this PR because introducing this feature affects its behavior as we need to document what label() does in the face of multiple labels being produced.
| * `moreLabels` - Additional labels to add. | ||
| * `labelTraversal` - A traversal that produces labels to add. | ||
|
|
||
| *Considerations:* |
There was a problem hiding this comment.
The graph must be configured with a
LabelCardinalitythat supports mutation (ONE_OR_MOREorZERO_OR_MORE).
"supports mutation"? First, that sounds a bit odd...LabelCardinality is not really "mutation" right? Does single imply immutability? is that right?
| ** 15 for including all | ||
|
|
||
| The label format in the map is controlled by source-level `with("multilabel")` and `with("singlelabel")` options, not | ||
| by the graph's `LabelCardinality`. When `"multilabel"` is present and `"singlelabel"` is not, the label value is a |
There was a problem hiding this comment.
When
"multilabel"is present and"singlelabel"is not,
that's an odd way to say that, no? it's not like they make any sense both being configured. ah, but then:
When both options are present on the same source,
"singlelabel"always takes precedence regardless of the order in which they were applied.
I'm not sure I like that. I suppose it's better than an error? Was this an explicit design choice?
| |gremlin.tinkergraph.vertexPropertyIdManager |The `IdManager` implementation to use for vertex properties. | ||
| |gremlin.tinkergraph.defaultVertexPropertyCardinality |The default `VertexProperty.Cardinality` to use when `Vertex.property(k,v)` is called. | ||
| |gremlin.tinkergraph.allowNullPropertyValues |A boolean value that determines whether or not `null` property values are allowed and defaults to `false`. | ||
| |gremlin.tinkergraph.vertexLabelCardinality |The `LabelCardinality` for vertices. Options are `ONE` (default, single immutable label, backward-compatible with 3.x), `ONE_OR_MORE` (at least one label, mutable), or `ZERO_OR_MORE` (fully flexible, zero or more labels). See <<tinkergraph-multi-label>>. |
There was a problem hiding this comment.
"fully flexible" is an odd description to me. just "zero or more labels" seems right.
| g.V().properties() | ||
| ---- | ||
|
|
||
| [[tinkergraph-multi-label]] |
There was a problem hiding this comment.
We're missing a lot of docs updates. TinkerGraph isn't the story here. The story is that TinkerPop now has multi-label support. The entire reference guide, recipes, tutorials, etc. need review. At minimum, we have to conceptually introduce this change to the "structure" at the front end of the reference docs: https://tinkerpop.apache.org/docs/current/reference/#graph-structure We should be looking to explain this feature throughout the docs in a seamless fashion.
| has label "a" OR label "b". To match vertices having both labels, chain multiple `hasLabel()` calls: | ||
| `hasLabel("a").hasLabel("b")`. | ||
|
|
||
| IMPORTANT: Edge labels are currently fixed at cardinality `ONE` and are not configurable. |
There was a problem hiding this comment.
do we still even have a notion of edge cardinality?
| `ONE_OR_MORE` or `ZERO_OR_MORE`. This enables use of the `addLabel()`, `dropLabel()`, `dropLabels()`, and `labels()` | ||
| traversal steps. | ||
|
|
||
| [source,groovy] |
There was a problem hiding this comment.
shouldn't this be executable and use our callout style?
| NOTE: When `LabelCardinality` is set to `ONE` (the default), calling `addLabel()` will throw an | ||
| `IllegalStateException` since labels are immutable in that mode. | ||
|
|
||
| See: <<labels-step,labels() step>>, <<droplabel-step,dropLabel() step>> |
There was a problem hiding this comment.
"See:"? that doesn't belong in the Reference Docs. I mean, it's an interesting idea to reference other related steps, but not sure now is the time to introduce things like that. Please remove in the other newly added step content too.
|
|
||
| [gremlin-groovy] | ||
| ---- | ||
| conf = new BaseConfiguration() |
There was a problem hiding this comment.
as a follow-on task, should we perhaps add a JIRA to create a Builder of sorts for TinkerGraph configurations?
| [[droplabel-step]] | ||
| === DropLabel Step | ||
|
|
||
| The `dropLabel()`-step (*sideEffect*) removes one or more specific labels from an element. The `dropLabels()`-step |
There was a problem hiding this comment.
I really dislike that our reference documentation is reading like semantics. For example:
Dropping a label that does not exist on the element is a no-op.
Leave specification statements for semantics. if we want to talk about a no-op condition then we should introduce a scenario that sets up that case. That statement on its own does not naturally flow from the prior sentence. The tinker-doc skill continues to need refinement, but even then, I think we need to pay much closer attention to what is being produced here. Perhaps we've already let too much of this shortened language into the reference docs and it all needs review.
| v = g.addV('person').property('name','marko').addLabel('employee').addLabel('manager').next() | ||
| g.V(v).labels() | ||
| g.V(v).dropLabel('manager').labels() | ||
| g.V(v).dropLabels().labels() |
There was a problem hiding this comment.
how about a g.V().labels().count() to confirm the drop of all at the end?
| final Object literalOrVar = antlr.argumentVisitor.visitStringArgument(stringArgs.get(i)); | ||
| moreLabels[i - 2] = literalOrVar instanceof String ? (String) literalOrVar : ((GValue<String>) literalOrVar).get(); | ||
| } | ||
| return t.addLabel(secondLabel, moreLabels); |
There was a problem hiding this comment.
is this the right way to do this translation? it won't represent exactly what the user sent, right? like, g.addV('dog','pet') and g.addV('dog').addLabel('pet')do the same thing, but shouldn't we be trying to construct the traversal in the fashion the user sent it? Similar issue inTraversalMethodVisitor`.
| * Adds dynamically computed labels to the current element. This is a side-effect step that passes the | ||
| * element through unchanged. | ||
| * | ||
| * @param labelTraversal the traversal that produces labels to add |
There was a problem hiding this comment.
the traversal that produces labels to add
labels plural? I assume it's not the case that we are going to iterate out all the Strings this traversal produces right? it's just pulling the first one, no? if not, i think that's an issue because that would cut against the by rule. i'm not sure what that means though because then we have nothing analogous to addLabel(final String label, final String... moreLabels) - not quite as nice to do multiple addLabel(Traversal) which may get ugly if you're trying to pick apart a list of labels. i dunno...not sure what we want to do here.
| * @return the traversal with the {@link AddVertexStepContract} added | ||
| * @since 4.0.0 | ||
| */ | ||
| public default GraphTraversal<S, Vertex> addV(final String label1, final String label2, final String... moreLabels) { |
There was a problem hiding this comment.
was it intentional to skip the multi-variation of addV(Traversal)? Is the intent to push those folks to addLabel(Traversal) or something?
| * | ||
| * @since 4.0.0 | ||
| */ | ||
| public final class LabelsDropVerificationStrategy |
There was a problem hiding this comment.
I don't think we need a new strategy here right? This isn't something you would turn off individually. It should probably just go in StandardVerificationStrategy.
If we do more of this syntax validation in strategies, we probably should consider creating a single SyntaxValidationStrategy with some kind of generalized type checker rules in it. That would be something someone might turn on/off as a whole. That would let StandardVerificationStrategy remain as an operational MUSTs and syntax validation be more about an optional correctness layer. That might be worth a JIRA.
| e.setId(deserializationContext.readValue(jsonParser, Object.class)); | ||
| } else if (jsonParser.getCurrentName().equals(GraphSONTokens.LABEL)) { | ||
| jsonParser.nextToken(); | ||
| final java.util.Set<String> labels = new java.util.LinkedHashSet<>(); |
There was a problem hiding this comment.
necessary for edges? they don't have multilabel concepts right?
| final Object id = context.read(buffer); | ||
| // reading single string value for now according to GraphBinaryV4 | ||
| final String label = (String) context.readValue(buffer, List.class, false).get(0); | ||
| // Read all labels as List<String> for multi-label support |
There was a problem hiding this comment.
no multi-label for edge, right?
There was a problem hiding this comment.
I don't think we should support multi-label or zero-label edges right now, but I think we should align the structure API and serializers to present the single label in a list anyways, it gives symmetry with Vertex as children of Element, and it sets us up for a future data model extension without the same hard breaks.
|
What are the implications for GraphML? Has that been considered? At minimum, there are some documentation concerns to take into account. Maybe, there's a knob on the IO builder to do the colon separator thingy. At least then it would still work. maybe other knobs to help it work in different capacities? if you don't want to figure it out for this PR, that's fine, create a JIRA that blocks for official 4.x and add appropriate warning docs to explain the current behavior ( i assume random label). |
| When iterated to list | ||
| Then the result should be unordered | ||
| | result | | ||
| | m[{"t[id]": "v[marko].id", "t[label]": "person", "name": ["marko"]}] | |
There was a problem hiding this comment.
We probably shouldn't assume the single label here is always "person" and not "employee"
| | result | | ||
| | m[{"t[id]": "v[marko].id", "t[label]": "s[person,employee]", "name": "marko"}] | | ||
|
|
||
| @MultiLabel @MultiLabelDefault |
There was a problem hiding this comment.
We should include an extra @MultilabelDefault tagged test in which the vertex only has a single label, but the label is still wrapped in a list.
| When iterated to list | ||
| Then the result should be unordered | ||
| | result | | ||
| | m[{"t[id]": "v[marko].id", "t[label]": "person", "name": "marko"}] | |
There was a problem hiding this comment.
We need additional scenarios with a zero-labelled vertex to validate the t.label field is completely absent from the map. We should include cases for with("singlelabel"), with("multilabel"), @MultiLabelDefault and @SingleLabelDefault. Similarly for valueMap(true)
| | person | | ||
| | person | | ||
| | person | | ||
|
|
There was a problem hiding this comment.
We need additional scenarios here which give full coverage of the labels() step for single label graphs as well (which will be skipping the @MultiLabel scenarios.
| g.V().elementMap("name", "age", null) | ||
| g.with("singlelabel").V().elementMap("name", "age", null) | ||
| """ | ||
| When iterated to list |
There was a problem hiding this comment.
We need full coverage of how with("singlelabel"), with("multilabel") and the varying defaults interact with edge elementMaps. I think edge labels should behave the same as vertices in this respect, despite us not currently allowing multilabel edges. It should be set up for future proofing.
| g.E().addLabel("friend").labels().fold() | ||
| """ | ||
| When iterated to list | ||
| Then the traversal will raise an error with message containing text of "Label mutation is not supported" |
There was a problem hiding this comment.
We need at least one test validating the errors thrown from single-label graphs.
| edges, properties) in the same order in which they were inserted into the graph. | ||
| * `@MetaProperties` - The scenario makes use of meta-properties. | ||
| * `@MultiLabel` - The scenario requires a graph that supports multi-label vertices (i.e. | ||
| `ZERO_OR_MORE` vertex label cardinality). Providers that only support single-label vertices should |
There was a problem hiding this comment.
Given the tests we currently have, I think this be a catch-all tag which essentially means "not single-label". I see this as similar to how we combined @multimetaproperties for many years before ultimately splitting them. I think it's ok to combine everything for now, but as we see what sort of configurations providers like to offer, we may want to split into more fine-grained control in the future. I could this evolve into a full set of tags including @MultiLabelV, @ZeroLabelV, @MultiLabelE, @ZeroLabelE.
| * `@MultiLabel` - The scenario requires a graph that supports multi-label vertices (i.e. | ||
| `ZERO_OR_MORE` vertex label cardinality). Providers that only support single-label vertices should | ||
| exclude these tests. | ||
| * `@MultiLabelDefault` - The scenario expects multi-label output as the default behavior for |
There was a problem hiding this comment.
I agree the docs seem off here, the test infrastructure should not be explicitly setting with("multilabel") in these scenarios, they should be testing traversals which do not explicitly set with(), and the expectation is that the output would match a traversal with an with("multilabel") for these graphs.
| * {@inheritDoc} | ||
| */ | ||
| @Override public T visitTraversalMethod_addE_String(final GremlinParser.TraversalMethod_addE_StringContext ctx) { notImplemented(ctx); return null; } | ||
| /** |
| final Object id = context.read(buffer); | ||
| // reading single string value for now according to GraphBinaryV4 | ||
| final String label = (String) context.readValue(buffer, List.class, false).get(0); | ||
| // Read all labels as List<String> for multi-label support |
There was a problem hiding this comment.
I don't think we should support multi-label or zero-label edges right now, but I think we should align the structure API and serializers to present the single label in a list anyways, it gives symmetry with Vertex as children of Element, and it sets us up for a future data model extension without the same hard breaks.
The grammar visitor for a source-spawned multi-label addV (e.g.
g.addV('a','b')) decomposed the call into addV('a').addLabel('b'),
producing a structurally different traversal than the user wrote.
TraversalSourceSpawnMethodVisitor now calls the native multi-arg
addV(String, String, String...) directly, matching the mid-traversal
TraversalMethodVisitor and yielding a single atomic addV step.
Assisted-by: Kiro:claude-opus-4.8
…ddV API Variable (GValue) bindings were collapsed to plain strings on the multi-label addV path, losing their variable-ness. The label is now threaded through end-to-end so variables survive parsing, the GraphTraversal API, and step storage, mirroring the single-label addV(GValue) path. The addV overloads are also simplified for 4.0.0: the single-arg addV(String)/addV(GValue) and the (first, second, more) forms are replaced by unified addV(String first, String... more) and addV(GValue<String> first, GValue<String>... more). A single label (empty 'more') keeps the prior scalar storage and serialization; two or more labels use the multi-label set path. Traversal-based addV remains single-arg for now. The Add*StepPlaceholder hierarchy stores multi-labels via a single Set<Object> constructor that registers any GValue variables, removing the redundant gvalueMarker constructor overload. Note: dropping the public addV(String)/addV(GValue) overloads is a binary-incompatible change appropriate for the 4.x major line. Assisted-by: Kiro:claude-opus-4.8
…l test coverage Label traversals now follow the by()-pattern (single result); a single label traversal producing a Collection spreads its elements into the label set. AddLabelStep no longer iterates all results. Adds addV(Traversal first, Traversal... more) traversal-varargs: each traversal contributes one label; a lone traversal may spread a Collection result. Requires a new Gremlin.g4 production; the addV grammar alternatives are also consolidated to align with the unified addV(label, additionalLabels) API shape. Zero-label vertices now omit T.label in single-label rendering and render an empty set in multi-label rendering (ElementMapStep, PropertyMapStep). Adds feature coverage for multi-label valueMap/ elementMap/labels/addLabel/dropLabel including single-label-graph error cases and edge label-format cases; single-label mutation-error scenarios are excluded from GraphComputer runners. Aligns the .NET AddV overloads to the (label, additionalLabels) shape and regenerates GLV test glue. Fixes the Go cucumber harness so name->vertex-id resolution works for dynamically-created vertices. Assisted-by: Kiro:claude-opus-4.8
Fixes gremlin-semantics ordering and wording (label()/labels() entries, LabelCardinality mutation phrasing, singlelabel/multilabel precedence), introduces vertex-label cardinality in the graph-structure reference, tidies the-traversal step docs (removes inline cross-refs, demonstrates no-op behavior via examples, confirms drop-all via labels().count()), corrects implementations-tinkergraph wording and makes the multi-label example executable, and clarifies the @MultiLabel/@MultiLabelDefault/ @SingleLabelDefault test-tag descriptions. Also removes a duplicate javadoc block in DefaultGremlinBaseVisitor. Assisted-by: Kiro:claude-opus-4.8
The labels().drop() rejection is an always-on structural check that is not individually toggleable, so it belongs with the other always-on verifications in StandardVerificationStrategy rather than as a separate default strategy. Moves the check (including its exact message), removes the standalone strategy and its registrations, and migrates the test coverage. Assisted-by: Kiro:claude-opus-4.8
The loosened addLabel(Traversal<?, ?>) overload broke diamond type inference for AddLabelStep (S extends Element cannot be inferred from a captured wildcard). Use the raw constructor, matching the existing raw (AddLabelStep) cast. Assisted-by: Kiro:claude-opus-4.8
The single-label-default valueMap/elementMap scenarios were previously weakened to a single-label vertex, defeating their purpose. Restore the multi-label vertex and assert the (non-deterministic) single-label result is one of the possible label renderings using 'the result should be of'. Assisted-by: Kiro:claude-opus-4.8
Introduces configurable label cardinality for graph elements. Vertex cardinality is user-configurable; edge cardinality defaults to ONE and is not yet exposed as a configuration option but uses the same underlying infrastructure.
Vertices can now have zero, one, or many labels controlled by LabelCardinality (defaults to ONE for full backward compatibility). New traversal steps labels(), addLabel(), dropLabel(), and dropLabels() enable label retrieval and mutation. Edge labels remain fixed at cardinality ONE.
Notes on design:
Commits:
Configuration
To Enable multi-label in TinkerGraph:
gremlin.tinkergraph.vertexLabelCardinality=ZERO_OR_MORE
Testing
VOTE +1