[SPARK-57847][SQL] Support the TIME data type in approx_count_distinct_for_intervals#56934

Open
yadavay-amzn wants to merge 2 commits into apache:master from yadavay-amzn:SPARK-57847

Conversation

@yadavay-amzn

Contributor

What changes were proposed in this pull request?

Adds TimeType to the input types accepted by the approx_count_distinct_for_intervals aggregate. TIME values are bucketed by their internal nanosecond-of-day Long representation, routed through the same Long -> Double path already used for TimestampType / DayTimeIntervalType.
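To illustrate the bucketing described above, here is a minimal Python sketch, not the Spark implementation. It assumes that N sorted endpoints define N-1 intervals, with the first closed on both sides and the rest left-open, and that values outside the endpoint range are skipped. Exact sets stand in for the approximate distinct-count sketches, and the helper names `nanos_of_day` and `distinct_per_interval` are hypothetical.

```python
from bisect import bisect_left
from datetime import time

NANOS_PER_SECOND = 1_000_000_000


def nanos_of_day(t: time) -> int:
    """Nanoseconds since midnight (datetime.time only resolves microseconds)."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * NANOS_PER_SECOND + t.microsecond * 1_000


def distinct_per_interval(values, endpoints):
    """Count distinct TIME values per interval (exact sets, for illustration).

    Assumed convention: [e0, e1], (e1, e2], ..., (e_{n-2}, e_{n-1}].
    """
    # Long -> Double step: every nanosecond-of-day is below 2**53,
    # so the conversion to float loses no precision.
    bounds = sorted(float(nanos_of_day(e)) for e in endpoints)
    seen = [set() for _ in range(len(bounds) - 1)]
    for v in values:
        if v is None:
            continue  # nulls do not contribute to any interval
        x = float(nanos_of_day(v))
        if x < bounds[0] or x > bounds[-1]:
            continue  # outside every interval
        # bisect_left returns the first index whose bound is >= x;
        # the value belongs to the interval ending at that bound.
        idx = bisect_left(bounds, x)
        seen[max(idx - 1, 0)].add(x)
    return [len(s) for s in seen]


values = [time(9, 0), time(9, 0), time(10, 30), time(12, 0), time(15, 45), time(23, 0)]
endpoints = [time(9, 0), time(12, 0), time(18, 0)]
print(distinct_per_interval(values, endpoints))  # [3, 1]
```

The largest nanosecond-of-day value is just under 8.64e13, well within the range a Double represents exactly, which is why routing TIME through the Long -> Double path does not merge distinct values.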

Why are the changes needed?

Before this change, approx_count_distinct_for_intervals accepted numeric, date, timestamp, and interval inputs but rejected TIME at analysis time. TIME values have a natural numeric ordering (nanoseconds since midnight), so they can be bucketed like the other temporal types.

Does this PR introduce any user-facing change?

Yes - approx_count_distinct_for_intervals now accepts TIME columns and endpoints.

How was this patch tested?

Extended ApproxCountDistinctForIntervalsSuite with TIME endpoint cases that assert the per-interval approximate distinct counts, and updated the error-message expectations to include TIME.

Was this patch authored or co-authored using generative AI tooling?

Authored with assistance by Claude Opus 4.8.
