Skip to content

[SPARK-57855][SQL] Refactor JDBC value getters into a sealed type for readability#56936

Closed
Gamal72 wants to merge 2 commits into
apache:masterfrom
Gamal72:jdbc-value-getter-sealed-type
Closed

[SPARK-57855][SQL] Refactor JDBC value getters into a sealed type for readability#56936
Gamal72 wants to merge 2 commits into
apache:masterfrom
Gamal72:jdbc-value-getter-sealed-type

Conversation

@Gamal72

@Gamal72 Gamal72 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

JdbcUtils.makeGetter builds a per-column reader (JDBCValueGetter) that reads a column from a ResultSet, converts it to Spark's internal representation, and stores it into an InternalRow. JDBCValueGetter was a private type alias for (ResultSet, InternalRow, Int) => Unit, so makeGetter returned an anonymous closure per column.

This PR makes the getter an explicit type:

  • Adds JDBCValueGetter.scala with a private[jdbc] sealed trait JDBCValueGetter (single apply(rs, row, pos) method) and one named case per getter in its companion object — case object for stateless getters, final case class for parameterized ones — plus the getter-only helpers nullSafeConvert / arrayConverter.
  • JdbcUtils.makeGetter becomes a straightforward DataType -> getter selection.

Class hierarchy (sealed):

JDBCValueGetter
 ├─ BooleanGetter, DoubleGetter, FloatGetter, IntGetter, LongGetter, ShortGetter,
 │  ByteGetter, StringGetter, RowIdGetter, BytesGetter, BinaryLongGetter,
 │  BinaryBitGetter, TimeGetter, LogicalTimeGetter, PostgresBitArrayGetter, NullGetter   (case object)
 └─ DateGetter, DecimalGetter, TimestampGetter, LogicalTimeNTZGetter, TimestampNTZGetter,
    YearMonthIntervalGetter, DayTimeIntervalGetter, ArrayGetter                          (final case class)

Each getter's body is transcribed unchanged from the corresponding match arm, and the selection order and guards in makeGetter are unchanged.

Why are the changes needed?

Because JDBCValueGetter was only a function alias, the getter chosen for a column was not expressed in the type system: makeGetter read as a large match returning indistinguishable closures, and an individual getter could not be named or referred to. Turning it into a sealed trait with one named case per getter makes the set of getters explicit and self-documenting, and reduces makeGetter to a simple selection over the column's DataType, separating "which getter to use" from "what each getter does".

Does this PR introduce any user-facing change?

No. This is an internal, behavior-preserving refactor of private (private[jdbc]) symbols; the per-column read/convert/store logic is unchanged, and there is no public API or binary-compatibility (MiMa) impact.

How was this patch tested?

Existing JDBC test suites (JDBCSuite, JDBCV2Suite, JDBCWriteSuite, JdbcUtilsSuite, etc.). This is a behavior-preserving refactor with no new behavior, so no new tests were added — the getter logic is a direct transcription of the previous match arms.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

@Gamal72 Gamal72 force-pushed the jdbc-value-getter-sealed-type branch from f714a95 to 481b94e Compare July 1, 2026 19:27
@Gamal72 Gamal72 force-pushed the jdbc-value-getter-sealed-type branch from 481b94e to ac2d7e8 Compare July 1, 2026 19:30

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 0 non-blocking, 1 nits.
A clean, faithful, behavior-preserving refactor. Transcription verified arm-by-arm; every old match arm matches its new getter case, including ArrayGetter's error args (dt -> arrayType) and the case object singletons being stateless.

Nits: 1 minor item (see inline comments).

…rces/jdbc/JDBCValueGetter.scala

Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
@cloud-fan

Copy link
Copy Markdown
Contributor

thanks, merging to master/4.x

@cloud-fan cloud-fan closed this in 589e007 Jul 2, 2026
cloud-fan pushed a commit that referenced this pull request Jul 2, 2026
… readability

### What changes were proposed in this pull request?

`JdbcUtils.makeGetter` builds a per-column reader (`JDBCValueGetter`) that reads a column from a `ResultSet`, converts it to Spark's internal representation, and stores it into an `InternalRow`. `JDBCValueGetter` was a private type alias for `(ResultSet, InternalRow, Int) => Unit`, so `makeGetter` returned an anonymous closure per column.

This PR makes the getter an explicit type:

- Adds `JDBCValueGetter.scala` with a `private[jdbc]` sealed trait `JDBCValueGetter` (single `apply(rs, row, pos)` method) and one named case per getter in its companion object — `case object` for stateless getters, `final case class` for parameterized ones — plus the getter-only helpers `nullSafeConvert` / `arrayConverter`.
- `JdbcUtils.makeGetter` becomes a straightforward `DataType` -> getter selection.

Class hierarchy (sealed):

```
JDBCValueGetter
 ├─ BooleanGetter, DoubleGetter, FloatGetter, IntGetter, LongGetter, ShortGetter,
 │  ByteGetter, StringGetter, RowIdGetter, BytesGetter, BinaryLongGetter,
 │  BinaryBitGetter, TimeGetter, LogicalTimeGetter, PostgresBitArrayGetter, NullGetter   (case object)
 └─ DateGetter, DecimalGetter, TimestampGetter, LogicalTimeNTZGetter, TimestampNTZGetter,
    YearMonthIntervalGetter, DayTimeIntervalGetter, ArrayGetter                          (final case class)
```

Each getter's body is transcribed unchanged from the corresponding `match` arm, and the selection order and guards in `makeGetter` are unchanged.

### Why are the changes needed?

Because `JDBCValueGetter` was only a function alias, the getter chosen for a column was not expressed in the type system: `makeGetter` read as a large `match` returning indistinguishable closures, and an individual getter could not be named or referred to. Turning it into a sealed trait with one named case per getter makes the set of getters explicit and self-documenting, and reduces `makeGetter` to a simple selection over the column's `DataType`, separating "which getter to use" from "what each getter does".

### Does this PR introduce _any_ user-facing change?

No. This is an internal, behavior-preserving refactor of private (`private[jdbc]`) symbols; the per-column read/convert/store logic is unchanged, and there is no public API or binary-compatibility (MiMa) impact.

### How was this patch tested?

Existing JDBC test suites (`JDBCSuite`, `JDBCV2Suite`, `JDBCWriteSuite`, `JdbcUtilsSuite`, etc.). This is a behavior-preserving refactor with no new behavior, so no new tests were added — the getter logic is a direct transcription of the previous `match` arms.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

Closes #56936 from Gamal72/jdbc-value-getter-sealed-type.

Authored-by: Abdelrahman Gamal <a.gamal@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 589e007)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan

Copy link
Copy Markdown
Contributor

Merge Summary:

Posted by merge_spark_pr.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants