Skip to content

Conversation

@kagalenko-m-b
Copy link

@kagalenko-m-b kagalenko-m-b commented Mar 4, 2025

This PR adds the ability to unpack a binary buffer while skipping some parts of it. The usage may be illustrated by this fragment from runtests.jl:

@io struct RawData
    A::UInt32       # offset 0
    dummy_1::UInt16 # offset 4
    B::UInt16       # offset 6
    dummy_2::UInt32 # offset 8
    C::UInt128      # offset 12
    dummy_3::UInt16 # offset 28
    dummy_4::UInt32 # offset 30
    D::UInt8        # offset 34
end align_packed

@io struct OffsetStruct
    A::UInt32  ~ 0
    C::UInt128 ~ 12
    B::UInt16  ~ 6
    D::UInt8   ~ 34
end align_offset

A,B,C,D = 1,2,3,4
raw_data = RawData(UInt32(A),0xBEEF,UInt16(B),0xDEADBEEF,UInt128(C),
                                      0xBEEF, 0xDEADBEEF, UInt8(D));
pack(buf, raw_data, endian)
seekstart(buf)
@test unpack(buf, OffsetStruct, endian) == OffsetStruct(A, C, B, D)

The pack() method for align_offset packing strategy is not yet implemented, because there are several ways to go about it, and I could use some discussion. If there's an interest in merging this PR, I'll fill the gaps.

@codecov
Copy link

codecov bot commented Mar 4, 2025

Codecov Report

❌ Patch coverage is 89.47368% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.93%. Comparing base (1b79251) to head (3f060fe).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
src/StructIO.jl 89.47% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
- Coverage   94.53%   92.93%   -1.60%     
==========================================
  Files           1        1              
  Lines         128      184      +56     
==========================================
+ Hits          121      171      +50     
- Misses          7       13       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@stevengj
Copy link
Member

stevengj commented Jul 2, 2025

Sorry, I don't think anyone has had a chance to look at this.

I'm a little skeptical of just entering the raw byte offset for each field — seems error prone (what if the user gives invalid offsets that produce overlapping fields?), and confusing to be able to specify the fields out of order.

An alternative would be to just support _ as a dummy field name that doesn't get put into the final struct, e.g.:

@io struct RawData
    A::UInt32 
    _::UInt16 # 2 bytes of padding
    B::UInt16
    _::UInt32 # 4 bytes of padding
    C::UInt128
    _::NTuple{6,UInt8} # 6 bytes of padding
    D::UInt8
end align_packed

(Strictly speaking this would be a breaking change, as Julia currently allows you to have fields named _, but as a practical matter I don't think this should be an issue. We could also have something like

@io struct RawData
    A::UInt32 
    _::2
    B::UInt16
    _::4
    C::UInt128
    _::6
    D::UInt8
end align_packed

in which _::N would denote a hidden/dummy NTuple{N,UInt8} field, which might be even clearer (and would be backward compatible since numeric types are not allowed).

(One question that occurs to me is whether the offset should be in bytes, or bits? But I guess we don't support bit-packing right now anyway.)

@kagalenko-m-b
Copy link
Author

kagalenko-m-b commented Jul 2, 2025

I'm a little skeptical of just entering the raw byte offset for each field — seems error prone (what if the user gives invalid offsets that produce overlapping fields?),

That produces assert error in unpack(), tested for in runtests.jl

and confusing to be able to specify the fields out of order.

Easy to remove that. I actually took extra steps to implement the functionality.

We could also have something like ... in which _::N would denote a hidden/dummy NTuple{N,UInt8} field

That seems to me clearer, indeed, compared to the dummy field name with an existing type.

The question is, how to do packing of this structure with gaps. The easiest to implement approach is just to ignore gaps. Ability to write to only a few needed offsets in existing binary file could be useful, however. Perhaps, the gap specification can be added to all strategies, and additional align_gapped strategy would allow to write out the "holey" struct, skipping the gaps.

One question that occurs to me is whether the offset should be in bytes, or bits? But I guess we don't support bit-packing right now anyway

The default may be in bytes, and if there's a need for bits offset, additional syntax can be added later:

@io struct RawData
    A::UInt32 
    _::2
    B::UInt16
    _::4
    C::UInt128
    _::_6bits
    D::UInt8
end align_gapped

But with no way to seek() or skip() non-integer number of bytes, implementation would involve fiddling with shifts and bitmasks.

@stevengj
Copy link
Member

stevengj commented Jul 7, 2025

The question is, how to do packing of this structure with gaps.

I thought that the point is that

@io struct RawData
    A::UInt32 
    _::2
    B::UInt16
    _::4
    C::UInt128
    _::6
    D::UInt8
end align_packed

would be exactly the same on-disk format as

@io struct RawData
    A::UInt32 
    _dummy1::NTuple{2,UInt8}
    B::UInt16
    _dummy2::NTuple{4,UInt8}
    C::UInt128
    _dummy3::NTuple{6,UInt8}
    D::UInt8
end align_packed

for I/O purposes, except that on the Julia side the struct would simply omit the _ fields and so they could be skipped over for reading/writing.

@kagalenko-m-b
Copy link
Author

The question is, how to do packing of this structure with gaps.

I thought that the point is that ...
would be exactly the same on-disk format as

Someone may want to pull just the needed fields from structure and then write them out in more compact file: "_packed" sort of conveys the idea of compacting the structure. So align_packed would write only

struct RawData
    A::UInt32 
    B::UInt16
    C::UInt128
    D::UInt8
end

whereas align_gapped would keep the gaps.

@stevengj
Copy link
Member

stevengj commented Jul 7, 2025

Someone may want to pull just the needed fields from structure and then write them out in more compact file: "_packed" sort of conveys the idea of compacting the structure. So align_packed would write only

I think in that case you would want to copy to a different struct.

@kagalenko-m-b
Copy link
Author

kagalenko-m-b commented Jul 8, 2025

I think in that case you would want to copy to a different struct.

Actually, more interesting question is what to do when encountering a node with gap specification like :(_::4) (or a sequence of them) . Those can be either removed, or replaced with something 'valid'.

  • If I remove it, I have to somehow associate it with the next following valid structure field, because unsafe_unpack() iterates over fields. Then it becomes necessary to detect every node that contains a valid structure field specification, with all the special cases that are allowed or will be allowed by the Julia compiler. Is it safe to assume that an expression node for a field will always be a Symbol followed by the double colon? If yes, then implementation is easy.

  • If I replace it with a field with a special type that does not actually read anything during unsafe_unpack(), that dummy field becomes visible to user, unless I also overload propertynames(), set- and get-methods and so forth.

The syntax currently proposed by this PR sidesteps those questions, because an offset is linked to the field.

@stevengj
Copy link
Member

stevengj commented Jul 9, 2025

Is it safe to assume that an expression node for a field will always be a Symbol followed by the double colon? If yes, then implementation is easy.

I think it would be safe to restrict the support to this case.

If I replace it with a field with a special type that does not actually read anything during unsafe_unpack(), that dummy field becomes visible to user,

The whole point is that the _ fields won't actually exist in the Julia struct. They just add offsets during I/O.

@kagalenko-m-b
Copy link
Author

I implemented the proposed syntax in #35

@kagalenko-m-b
Copy link
Author

Closed because #35 has more intuitive syntax

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants