branch-4.1: [fix](simplify agg) SimplifyAggGroupBy should verify injectivity #64335 #65109

Merged: yiguolei merged 1 commit into branch-4.1 from auto-pick-64335-branch-4.1 on Jul 2, 2026

@github-actions github-actions Bot commented Jul 1, 2026

Cherry-picked from #64335

## Problem

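`SimplifyAggGroupBy` simplified `GROUP BY f(x)` to `GROUP BY x` without
verifying that `f` is injective (one-to-one). When `f` maps different
inputs to the same value, grouping by `x` yields more groups than
grouping by `f(x)`, so the rewrite returned wrong results: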

| Expression | Why wrong |
|---|---|
| `a * 0` / `0 * a` | always evaluates to 0, so all rows fall into one group |
| `0 / a` | always evaluates to 0 |
| `a / 0` | division by zero |
| `a + NULL` / `a * NULL` / ... | always evaluates to NULL |
| `a * 0.1` with float/double | precision loss may map different inputs to the same result |
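
To make the failure mode concrete, here is a minimal sketch in plain Java (not Doris code; the class `InjectivityDemo` and the helper `groupCount` are hypothetical names introduced only for illustration). It counts how many groups a key expression produces, so the group count for `a` can be compared with the group count for `f(a)`:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class InjectivityDemo {
    // Hypothetical helper: number of distinct groups when grouping `rows` by `key`.
    static <T> int groupCount(List<T> rows, Function<T, Object> key) {
        Map<Object, Long> groups = rows.stream()
                .collect(Collectors.groupingBy(key, Collectors.counting()));
        return groups.size();
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 3);
        System.out.println(groupCount(a, x -> x));      // GROUP BY a      -> 3
        System.out.println(groupCount(a, x -> x + 1));  // GROUP BY a + 1  -> 3 (injective)
        System.out.println(groupCount(a, x -> x * 0));  // GROUP BY a * 0  -> 1 (not injective)

        // Two distinct doubles that collapse to 0.0 after multiplying by 0.1.
        List<Double> d = List.of(0.0, Double.MIN_VALUE);
        System.out.println(groupCount(d, x -> x));        // -> 2
        System.out.println(groupCount(d, x -> x * 0.1));  // -> 1
    }
}
```

Whenever the two counts differ, replacing `GROUP BY f(a)` with `GROUP BY a` changes the number of output rows, which is the class of bug described above.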

## Fix

1. **`isBinaryArithmeticSlot`**: restructured to separate the slot
   expression from the literal, then validate each independently. The
   float/double check runs early, before slot extraction.

2. **New `checkLiteral(expr, literal)`**: rejects a NULL literal and
   Multiply/Divide by zero.

3. **New `canExtractSlot(expr)`**: replaces the old unconditional
   `extractSlotOrCastOnSlot`. It accepts only a bare `Slot` or an implicit
   lossless widening cast (integral→integral, float→double,
   integral→decimal, decimal→decimal). Range and scale are compared
   directly for correctness.
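
The checks in items 2 and 3 could look roughly like the following. This is a simplified model and not the code from `SimplifyAggGroupBy.java`: the class `InjectivityChecksSketch`, the `Op` enum, the use of `BigDecimal` for literals, and `isLosslessDecimalCast` are all assumptions made for illustration, and the decimal rule shown (the target keeps at least as many integer digits and at least as much scale) is one plausible reading of "range and scale are compared directly".

```java
import java.math.BigDecimal;

public class InjectivityChecksSketch {
    enum Op { ADD, SUBTRACT, MULTIPLY, DIVIDE }

    // Hypothetical stand-in for checkLiteral: a null reference models a NULL literal.
    // Returns false when the literal makes the arithmetic non-injective.
    static boolean checkLiteral(Op op, BigDecimal literal) {
        if (literal == null) {
            return false; // a + NULL, a * NULL, ... always evaluate to NULL
        }
        boolean isZero = literal.signum() == 0;
        // Covers a * 0, 0 * a, 0 / a and a / 0: any zero literal in Multiply/Divide.
        return !(isZero && (op == Op.MULTIPLY || op == Op.DIVIDE));
    }

    // Assumed rule for a lossless decimal -> decimal widening cast:
    // neither the integer-digit range nor the scale may shrink.
    static boolean isLosslessDecimalCast(int fromPrecision, int fromScale,
                                         int toPrecision, int toScale) {
        int fromIntDigits = fromPrecision - fromScale;
        int toIntDigits = toPrecision - toScale;
        return toScale >= fromScale && toIntDigits >= fromIntDigits;
    }

    public static void main(String[] args) {
        System.out.println(checkLiteral(Op.ADD, new BigDecimal("1")));   // true
        System.out.println(checkLiteral(Op.MULTIPLY, BigDecimal.ZERO));  // false
        System.out.println(checkLiteral(Op.ADD, null));                  // false
        System.out.println(isLosslessDecimalCast(10, 2, 12, 4));         // true
        System.out.println(isLosslessDecimalCast(10, 2, 10, 4));         // false
    }
}
```

Adding or subtracting zero passes the check because `a + 0` is still one-to-one; only a zero literal in a multiplication or division collapses or invalidates the key.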

## Changes

- `SimplifyAggGroupBy.java`: +80 lines, rewritten core logic
- `ExpressionUtils.java`: -35 lines, removed unused `isSlotOrCastOnSlot` /
  `extractSlotOrCastOnSlot`
- `SimplifyAggGroupByTest.java`: +216 lines, 25 tests covering all new
  paths

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions github-actions Bot requested a review from yiguolei as a code owner July 1, 2026 09:57
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen

run buildall

@hello-stephen

FE UT Coverage Report

Increment line coverage 100.00% (27/27) 🎉
Increment coverage report
Complete coverage report

yujun777 commented Jul 1, 2026

run nonConcurrent

@hello-stephen

FE Regression Coverage Report

Increment line coverage 38.78% (19/49) 🎉
Increment coverage report
Complete coverage report

@yiguolei yiguolei merged commit 56c3abf into branch-4.1 Jul 2, 2026
29 of 32 checks passed