@jeffhiltz jeffhiltz commented Dec 19, 2025

The New Shortcut

This PR adds a shortcut for creating Glue tables that are backed by Iceberg. The new shortcut, GlueIcebergTable, is a subclass of the existing GlueTable class.

Manual Configuration

A few configuration options are not available in CloudFormation. To use them, you need to set them after the resource is created.

Optimizer Configuration

The Table Optimizer API has a number of configuration options that are not exposed in CloudFormation.

CompactionConfiguration

CompactionConfiguration can only be set via API calls (e.g., the CLI) after the resource has been created. This shortcut can enable compaction, but it cannot configure it. In many cases the default configuration may be sufficient. The following options require manual configuration after creation:

  • strategy: the default is binpack. Note that using sort or z-order requires the table's sort order to be set manually via Spark SQL.
  • minInputFiles: minimum number of files required to initiate a compaction, default is 100
  • deleteFileThreshold: minimum number of deletes that must be present in a data file to make it eligible for compaction, default is 1
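
For reference, here is a rough sketch of what the post-creation call could look like. It only builds the request body for Glue's UpdateTableOptimizer API; the helper name buildCompactionUpdate, the account ID, the role ARN, and the database and table names are all made up for illustration, and the field layout reflects my reading of the API shape rather than anything in this PR.

```javascript
// Sketch only: builds a request body for Glue's UpdateTableOptimizer API.
// buildCompactionUpdate is a hypothetical helper, not part of this PR.
// Defaults mirror the ones listed above (binpack, 100, 1).
function buildCompactionUpdate({
  catalogId,
  databaseName,
  tableName,
  roleArn,
  strategy = 'binpack',
  minInputFiles = 100,
  deleteFileThreshold = 1
}) {
  const allowed = ['binpack', 'sort', 'z-order'];
  if (!allowed.includes(strategy)) {
    throw new Error(`Unknown compaction strategy: ${strategy}`);
  }

  return {
    CatalogId: catalogId,
    DatabaseName: databaseName,
    TableName: tableName,
    Type: 'compaction',
    TableOptimizerConfiguration: {
      roleArn,
      enabled: true,
      compactionConfiguration: {
        icebergConfiguration: { strategy, minInputFiles, deleteFileThreshold }
      }
    }
  };
}

// Placeholder values for illustration only.
const compactionInput = buildCompactionUpdate({
  catalogId: '123456789012',
  databaseName: 'example_db',
  tableName: 'example_table',
  roleArn: 'arn:aws:iam::123456789012:role/example-optimizer-role',
  minInputFiles: 50
});
console.log(JSON.stringify(compactionInput, null, 2));
```

With the AWS SDK for JavaScript v3, I would expect this object to be passed to UpdateTableOptimizerCommand from @aws-sdk/client-glue, but that wiring is an assumption and is left out so the sketch runs without the SDK installed.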

OrphanFileDeletionConfiguration

CloudFormation includes support for setting the OrphanFileRetentionPeriodInDays property, but the following must be set using the API/CLI:

  • location: a sub-directory in which to look for files, default is the table location
  • runRateInHours: interval in hours between orphan file deletion job runs, default is 24

RetentionConfiguration

CloudFormation includes support for setting the cleanExpiredFiles, numberOfSnapshotsToRetain and snapshotRetentionPeriodInDays properties, but the following must be set using the API/CLI:

  • runRateInHours: interval in hours between retention job runs, default is 24
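
As a sketch of how the API-only options in the two sections above might be set, the helper below builds UpdateTableOptimizer request bodies for the retention and orphan file deletion optimizers. The helper name buildMaintenanceUpdate and all identifiers in the example values are hypothetical, and the nesting of the configuration keys is my assumption about the API shape. I have not verified whether an update replaces settings that CloudFormation already applied, so it may be necessary to repeat those properties in the same call.

```javascript
// Sketch only: request bodies for the retention and orphan file deletion
// optimizers. buildMaintenanceUpdate is a hypothetical helper.
const CONFIG_KEYS = {
  retention: 'retentionConfiguration',
  orphan_file_deletion: 'orphanFileDeletionConfiguration'
};

function buildMaintenanceUpdate(table, type, icebergConfiguration) {
  const key = CONFIG_KEYS[type];
  if (!key) {
    throw new Error(`Unsupported optimizer type: ${type}`);
  }

  return {
    CatalogId: table.catalogId,
    DatabaseName: table.databaseName,
    TableName: table.tableName,
    Type: type,
    TableOptimizerConfiguration: {
      roleArn: table.roleArn,
      enabled: true,
      // Computed key selects the configuration block matching the type.
      [key]: { icebergConfiguration }
    }
  };
}

// Placeholder table details for illustration only.
const exampleTable = {
  catalogId: '123456789012',
  databaseName: 'example_db',
  tableName: 'example_table',
  roleArn: 'arn:aws:iam::123456789012:role/example-optimizer-role'
};

// Run retention every 12 hours instead of the default 24.
const retentionInput = buildMaintenanceUpdate(exampleTable, 'retention', {
  runRateInHours: 12
});

// Limit orphan file deletion to a sub-directory (placeholder path).
const orphanInput = buildMaintenanceUpdate(exampleTable, 'orphan_file_deletion', {
  location: 's3://example-bucket/example_table/data/',
  runRateInHours: 48
});

console.log(JSON.stringify([retentionInput, orphanInput], null, 2));
```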

Sort Order

Sort order can only be set using Spark SQL.
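
Until the details are filled in, here is a minimal sketch of the kind of statement involved. Iceberg's Spark SQL extensions provide ALTER TABLE ... WRITE ORDERED BY for setting a table's write order; the helper below only assembles that statement as a string. The helper name writeOrderedBy and the catalog, database, table, and column names are hypothetical, identifiers are not escaped, and none of this has been run against a table created by this shortcut.

```javascript
// Sketch only: builds an Iceberg Spark SQL statement that sets the write order.
// writeOrderedBy is a hypothetical helper; it does no identifier escaping.
function writeOrderedBy(table, columns) {
  if (!Array.isArray(columns) || columns.length === 0) {
    throw new Error('At least one sort column is required');
  }

  const order = columns
    .map(({ name, direction = 'ASC' }) => {
      if (direction !== 'ASC' && direction !== 'DESC') {
        throw new Error(`Invalid sort direction: ${direction}`);
      }
      return `${name} ${direction}`;
    })
    .join(', ');

  return `ALTER TABLE ${table} WRITE ORDERED BY ${order}`;
}

// Placeholder names for illustration only.
const statement = writeOrderedBy('glue_catalog.example_db.example_table', [
  { name: 'event_date' },
  { name: 'customer_id', direction: 'DESC' }
]);
console.log(statement);
```

The statement would then be run in a Spark session that has the Iceberg SQL extensions enabled.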

TODO: add details

Testing

TODO:

  • use the shortcut to create some tables and use them
  • make sure that example Spark SQL code works for setting order (and that the table keeps working)
  • try making a table that uses bucketing (we don't need to do anything extra to support that, right? it's in partition definition? or?)

@jeffhiltz jeffhiltz requested a review from a team December 19, 2025 16:51
@jeffhiltz jeffhiltz added the ai AI coding agents co-authored the code label Dec 19, 2025
const isIcebergTable = filename.includes('glue-iceberg-table');
const ignoreChecks = isIcebergTable ? 'W,E3003,E3002' : 'W';

cp.exec(`cfn-lint ${filepath} --ignore-checks ${ignoreChecks}`, (err, stdout) => {

Check warning

Code scanning / CodeQL

Shell command built from environment values Medium test

This shell command depends on an uncontrolled file name.
This shell command depends on an uncontrolled absolute path.

Copilot Autofix

AI about 13 hours ago

In general, the problem should be fixed by avoiding dynamic shell command strings when incorporating environment-derived values (like file paths). Instead, invoke the target program directly (without a shell) and pass all variable parts as separate arguments. In Node.js, this means using child_process.execFile/execFileSync/spawn with an argument array, rather than exec with a concatenated string.

For this specific code, we can replace the call to cp.exec on line 25 with cp.execFile, passing "cfn-lint" as the command and supplying filepath and the --ignore-checks option as separate arguments. This removes shell interpretation entirely so any spaces or special characters in filepath cannot alter how the command is parsed. The rest of the logic (promisified interface, error handling via err and stdout) can remain unchanged. We don’t need new imports or helpers: child_process is already required as cp, and execFile is a standard method on that module.

Concretely, in test/shortcuts.test.js, inside the cfnLint function, change:

cp.exec(`cfn-lint ${filepath} --ignore-checks ${ignoreChecks}`, (err, stdout) => {

to:

cp.execFile('cfn-lint', [filepath, '--ignore-checks', ignoreChecks], (err, stdout) => {

This single change addresses both alert variants because it removes both the shell string and the dependence of command parsing on the absolute path.
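
To illustrate why the argument array matters, here is a small self-contained sketch. It is not the test file's code: cfnLintArgs and echoArgv are hypothetical helpers, and a child Node process stands in for cfn-lint so the example runs without cfn-lint installed. It shows that a path containing spaces and a semicolon reaches the child as a single argument when no shell is involved.

```javascript
const cp = require('child_process');

// Hypothetical helper mirroring the argument array in the suggested fix.
function cfnLintArgs(filepath, ignoreChecks) {
  return [filepath, '--ignore-checks', ignoreChecks];
}

// Spawns a child Node process (standing in for cfn-lint) that prints the
// argument vector it received, then returns that vector as an array.
function echoArgv(args) {
  const script = 'process.stdout.write(JSON.stringify(process.argv))';
  // '--' ends Node's own option parsing so '--ignore-checks' reaches the script.
  const out = cp.execFileSync(
    process.execPath,
    ['-e', script, '--', ...args],
    { encoding: 'utf8' }
  );
  return JSON.parse(out);
}

// A path that a shell would split and partly execute as a second command.
const trickyPath = '/tmp/my template; echo pwned.json';
const received = echoArgv(cfnLintArgs(trickyPath, 'W,E3003,E3002'));

// The last three entries are the arguments exactly as they were passed in.
console.log(received.slice(-3));
```

Because execFile and execFileSync do not start a shell by default, the semicolon is never interpreted as a command separator.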

Suggested changeset 1
test/shortcuts.test.js

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/test/shortcuts.test.js b/test/shortcuts.test.js
--- a/test/shortcuts.test.js
+++ b/test/shortcuts.test.js
@@ -22,7 +22,7 @@
     const isIcebergTable = filename.includes('glue-iceberg-table');
     const ignoreChecks = isIcebergTable ? 'W,E3003,E3002' : 'W';
 
-    cp.exec(`cfn-lint ${filepath} --ignore-checks ${ignoreChecks}`, (err, stdout) => {
+    cp.execFile('cfn-lint', [filepath, '--ignore-checks', ignoreChecks], (err, stdout) => {
       if (err) return reject(new Error(stdout));
       return resolve();
     });
EOF
