@monchin monchin commented Jan 25, 2026

Hi, I referred to pymupdf while writing a library to extract PDF tables, and found a table-extraction bug that occurs when one strategy is "text" and the other is not. Please see monchin/tablers#8 for more details.

I have fixed it in my library, and since the same bug also occurs in pymupdf, I'd like to fix it here as well.

@JorjMcKie
Collaborator

We already support the parameters add_lines and add_boxes, which act like virtual vector graphics and therefore already produce additional edges.
So I fail to see the benefit of adding edges in the proposed way: just use add_lines.
In addition, I do not understand why a feature of this kind should be made dependent on specific detection strategies: if I have "external" information that helps the table algorithm succeed, then I should certainly supply all of it and not worry about redundancy. The algorithm is clever enough to drop what it doesn't need.
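
A minimal sketch of the add_lines approach mentioned above, for readers following the thread. It assumes pymupdf's `Page.find_tables` accepts `add_lines` as a list of point pairs; the helper names `horizontal_rules` and `find_tables_with_rules` and all coordinates are hypothetical and chosen only for illustration. Whether this resolves the mixed-strategy case discussed in this PR is not established here.

```python
def horizontal_rules(x0, x1, ys):
    """Build virtual horizontal lines as ((x0, y), (x1, y)) point pairs.

    Hypothetical helper: the coordinates are whatever the caller already
    knows about the table layout (for example, row boundaries).
    """
    return [((x0, y), (x1, y)) for y in ys]


def find_tables_with_rules(page, rules):
    """Run table detection with extra virtual lines.

    `page` is assumed to be a pymupdf.Page. The lines passed via
    `add_lines` are treated like drawn vector graphics.
    """
    return page.find_tables(add_lines=rules)


# Three made-up row boundaries between x=72 and x=540.
rules = horizontal_rules(72, 540, [100, 120, 140])

# Usage, assuming pymupdf is installed and "file.pdf" exists:
#   import pymupdf
#   doc = pymupdf.open("file.pdf")
#   tables = find_tables_with_rules(doc[0], rules)
```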

@monchin
Author

monchin commented Jan 31, 2026

Thank you for your reply!

> So I fail to see the benefit of adding edges in the proposed way: just use add_lines.

I proposed this PR because, IMHO, users want a library that is as easy to use and as correct as possible. add_lines is very flexible, but the problem in this scenario does exist, and users who want correct results here have to extend the text edges themselves. So why don't we do that for them?

> why a feature of this kind should be made dependent on specific detection strategies

extend_edges is available in the new code, and anyone can use it however they like. But as I said, although "one strategy text and the other non-text" is a specific configuration, the problem within that configuration is general. With the new code, users get correct results in this scenario without any extra effort, which is why I believe it would be good to extend edges automatically here. If users have other requirements, they can still add other lines as you said.
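
To illustrate the idea of extending text edges, here is a rough sketch. It is not the code from this PR: the function name `extend_text_edges`, the dict layout of an edge (`x0`, `x1`, `top`, `bottom`, `orientation`), and the rule "stretch each text edge to the extent of the other strategy's edges" are all assumptions made for the example.

```python
def extend_text_edges(text_edges, other_edges):
    """Stretch text-derived edges to the extent covered by the other edges.

    Hypothetical sketch. Each edge is a dict with x0, x1, top, bottom and
    an orientation of "v" or "h". Vertical text edges are stretched to the
    top/bottom extent of `other_edges`; horizontal ones to the left/right.
    """
    if not other_edges:
        return [dict(e) for e in text_edges]

    left = min(e["x0"] for e in other_edges)
    right = max(e["x1"] for e in other_edges)
    top = min(e["top"] for e in other_edges)
    bottom = max(e["bottom"] for e in other_edges)

    extended = []
    for edge in text_edges:
        edge = dict(edge)  # copy so the input is not modified
        if edge["orientation"] == "v":
            edge["top"] = min(edge["top"], top)
            edge["bottom"] = max(edge["bottom"], bottom)
        else:
            edge["x0"] = min(edge["x0"], left)
            edge["x1"] = max(edge["x1"], right)
        extended.append(edge)
    return extended


# Two drawn horizontal rules (the "lines" strategy) at y=100 and y=200.
line_edges = [
    {"x0": 50, "x1": 300, "top": 100, "bottom": 100, "orientation": "h"},
    {"x0": 50, "x1": 300, "top": 200, "bottom": 200, "orientation": "h"},
]
# A vertical edge inferred from text only spans the words, y=110..190,
# so it would not touch either rule.
text_edges = [
    {"x0": 120, "x1": 120, "top": 110, "bottom": 190, "orientation": "v"},
]

result = extend_text_edges(text_edges, line_edges)
```

In this made-up layout the stretched vertical edge spans y=100 to y=200 and so meets both rules, which is the kind of intersection the table finder needs to form cells.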
