-
Notifications
You must be signed in to change notification settings - Fork 10
DATAPLT-1268 Add shortcut for Iceberg-backed Glue Table #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| const isIcebergTable = filename.includes('glue-iceberg-table'); | ||
| const ignoreChecks = isIcebergTable ? 'W,E3003,E3002' : 'W'; | ||
|
|
||
| cp.exec(`cfn-lint ${filepath} --ignore-checks ${ignoreChecks}`, (err, stdout) => { |
Check warning
Code scanning / CodeQL
Shell command built from environment values Medium test
file name
This shell command depends on an uncontrolled
absolute path
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 13 hours ago
In general, the problem should be fixed by avoiding dynamic shell command strings when incorporating environment-derived values (like file paths). Instead, invoke the target program directly (without a shell) and pass all variable parts as separate arguments. In Node.js, this means using child_process.execFile/execFileSync/spawn with an argument array, rather than exec with a concatenated string.
For this specific code, we can replace the call to cp.exec on line 25 with cp.execFile, passing "cfn-lint" as the command and supplying filepath and the --ignore-checks option as separate arguments. This removes shell interpretation entirely so any spaces or special characters in filepath cannot alter how the command is parsed. The rest of the logic (promisified interface, error handling via err and stdout) can remain unchanged. We don’t need new imports or helpers: child_process is already required as cp, and execFile is a standard method on that module.
Concretely, in test/shortcuts.test.js, inside the cfnLint function, change:
cp.exec(`cfn-lint ${filepath} --ignore-checks ${ignoreChecks}`, (err, stdout) => {to:
cp.execFile('cfn-lint', [filepath, '--ignore-checks', ignoreChecks], (err, stdout) => {This single change addresses both alert variants because it removes both the shell string and the dependence of command parsing on the absolute path.
-
Copy modified line R25
| @@ -22,7 +22,7 @@ | ||
| const isIcebergTable = filename.includes('glue-iceberg-table'); | ||
| const ignoreChecks = isIcebergTable ? 'W,E3003,E3002' : 'W'; | ||
|
|
||
| cp.exec(`cfn-lint ${filepath} --ignore-checks ${ignoreChecks}`, (err, stdout) => { | ||
| cp.execFile('cfn-lint', [filepath, '--ignore-checks', ignoreChecks], (err, stdout) => { | ||
| if (err) return reject(new Error(stdout)); | ||
| return resolve(); | ||
| }); |
The New Shortcut
This PR adds a shortcut for creating Glue tables that are backed by Iceberg. The new shortcut, GlueIcebergTable, is a subclass of the existing GlueTable class.
Manual Configuration
There are a few configuration options that are not available in CloudFormation and if you wish to use them you need to set them after the resource is created.
Optimizer Configuration
The Table Optimizer API has a number of configuration options that are not exposed in CloudFormation.
CompactionConfiguration
Setting CompactionConfiguration can only be done via API calls (ie: CLI) after the resource has been constructed. Compaction can be enabled using this shortcut, but it cannot be configured. For many cases, the default configuration may be sufficient. The following options require post-creation manual configuration:
strategy: the default isbinpack. Note that usingsortorz-orderrequires the table to have the sort order manually set via Spark SQL.minInputFiles: minimum number of files to in order to initiate a compaction, default is 100deleteFileThershold: minimum number of deletes that must be present in a data file to make it eligible for compaction, default is 1OrphanFileDeletionConfiguration
CloudFormation includes support for setting the
OrphanFileRetentionPeriodInDaysproperty, but the following must be set using the API/CLI:location: a sub-directory in which to look for files, default is the table locationrunRateInHours: interval in hours between orphan file deletion job runs, default is 24RetentionConfiguration
CloudFormation includes support for setting the
cleanExpiredFiles,numberOfSnapshotsToRetainandsnapshotRetentionPeriodInDaysproperties, but the following must be set using the API/CLI:runRateInHours: interval in hours between retention job runs, default is 24Sort Order
Sort order can only be set using Spark SQL.
TODO: add details
Testing
TODO: