Native row-major PCA#3036
Conversation
| auto components_copy_view = raft::make_device_matrix_view<math_t, idx_t, LayoutPolicy>( | ||
| components_copy.data(), n_components, n_cols); |
There was a problem hiding this comment.
Why explicitly here instead of inline in the diff before this PR?
| auto components_copy_view = raft::make_device_matrix_view<math_t, idx_t, LayoutPolicy>( | ||
| components_copy.data(), n_components, n_cols); |
| void to_row_major(const T* col_major_src, T* row_major_dst, int n_rows, int n_cols) | ||
| { | ||
| std::vector<T> host_col(n_rows * n_cols); | ||
| std::vector<T> host_row(n_rows * n_cols); | ||
| raft::update_host(host_col.data(), col_major_src, n_rows * n_cols, stream); | ||
| RAFT_CUDA_TRY(cudaStreamSynchronize(stream)); | ||
| for (int i = 0; i < n_rows; ++i) { | ||
| for (int j = 0; j < n_cols; ++j) { | ||
| host_row[i * n_cols + j] = host_col[j * n_rows + i]; | ||
| } | ||
| } | ||
| raft::update_device(row_major_dst, host_row.data(), n_rows * n_cols, stream); | ||
| } |
There was a problem hiding this comment.
Why is this needed? You're generating 1-dimensional random data. It can be interpreted as row or col major.
| raft::stats::cov<input_row_major>( | ||
| handle, cov.data(), input.data_handle(), mu.data_handle(), n_cols, n_rows, true, true, stream); | ||
|
|
||
| // The eigendecomposition of the (symmetric) covariance matrix naturally produces a |
There was a problem hiding this comment.
But since covar matrix is symmetric, it shouldn't matter if it's column or row layout, right? Why do we need to transpose at all?
Co-authored-by: Divye Gala <divyegala@gmail.com>
|
Hi, all the comments should be addressed now in the recent batch of commits. |
| components_all.data(), n_cols, n_cols), | ||
| raft::make_device_matrix_view<math_t, idx_t, raft::col_major>( | ||
| components.data_handle(), n_components, n_cols)); | ||
| if constexpr (is_row_major) { |
There was a problem hiding this comment.
Do you not need to call trunc_zero_origin in this branch?
There was a problem hiding this comment.
It is doing the same thing as trunc_zero_origin except using a simple raft::copy.
The goal is to copy a slice of some rows from the src n_cols x n_cols matrix.
If src is row-major we can just get the prefix n_components x n_cols from memory.
If src is col-major then we need to copy some elements, skip some, and the copy some more etc. That is what trunc_zero_origin is doing.
|
@coderabbitai review |
✅ Action performedReview finished.
|
📝 WalkthroughSummary by CodeRabbit
WalkthroughPCA and tSVD internal and public functions across raft's linalg module are generalized with a ChangesPCA/tSVD Layout Generalization
Estimated code review effort: 4 (Complex) | ~60 minutes Related issues: None found in provided context. Related PRs: None found in provided context. Suggested labels: enhancement, cpp Suggested reviewers: None found in provided context. Poem 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tests/linalg/pca.cu (1)
135-198: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy liftHIGH:
advancedTestround-trip does not validate PCA correctness for row-major.With
n_components == params.n_col2(full rank) andprms.whiten = false, transform followed by inverse reduces toZ·V = X_c·VᵀV = X_cfor any orthonormalV. Thedata2 ≈ data2_backassertion therefore passes regardless of whether the row-major path produces the correct eigenvectors, the correct component ordering, or the correctsign_flip_componentsresult. The only reference-based checks (explained_vars/components/trans_data) run throughbasicTest, which is col-major only.Consider adding row-major coverage that (a) truncates (
n_components < n_cols) and (b) validates against a reference (e.g. compare eigenvalues/components against the col-major result on the same data). The whitening path (whiten = true, exercisingbinary_div_skip_zero/binary_mult_skip_zerowithLayoutPolicyviews) is also untested for both layouts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/linalg/pca.cu` around lines 135 - 198, `advancedTest` only checks an inverse-transform round trip, which can succeed even if the row-major PCA path is wrong. Update `advancedTest` to use a non-full-rank case with `n_components < n_cols` and add a reference comparison for the row-major `pca_fit_transform`/`pca_inverse_transform` results against the col-major path on the same input, including `components`, `explained_vars`, and `trans_view` ordering/sign handling. Also add coverage for the `prms.whiten = true` path so the `LayoutPolicy`-dependent branches in `binary_div_skip_zero`/`binary_mult_skip_zero` are exercised.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@cpp/tests/linalg/pca.cu`:
- Around line 135-198: `advancedTest` only checks an inverse-transform round
trip, which can succeed even if the row-major PCA path is wrong. Update
`advancedTest` to use a non-full-rank case with `n_components < n_cols` and add
a reference comparison for the row-major
`pca_fit_transform`/`pca_inverse_transform` results against the col-major path
on the same input, including `components`, `explained_vars`, and `trans_view`
ordering/sign handling. Also add coverage for the `prms.whiten = true` path so
the `LayoutPolicy`-dependent branches in
`binary_div_skip_zero`/`binary_mult_skip_zero` are exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d53b3a91-b4d4-40ca-a12f-a8e17df5734e
📒 Files selected for processing (4)
cpp/include/raft/linalg/detail/pca.cuhcpp/include/raft/linalg/detail/tsvd.cuhcpp/include/raft/linalg/pca.cuhcpp/tests/linalg/pca.cu
Currently PCA only supports col-major which is not ideal for consumers passing in row-major datasets.
This PR exposes row-major APIs for PCA and calls the corresponding row-major raft primitives based on templated row or col layout.
This PR is backward compatible with cuml and cuvs.