Skip to content

Conversation

@acwhite211
Copy link
Member

@acwhite211 acwhite211 commented Sep 26, 2025

Fixes #6490

Look at Django docs, it seems that explicitly locking mysql tables from within a Django atomic transaction can lead to issues of tables not getting unlocked. The post_resource function has the transaction.atomic decorator, which calls create_obj -> autonumber_and_save -> do_autonumbering -> lock_tables. This needs to be avoid in order to evade the issue.

LOCK TABLES implicitly commits any active transaction, and while tables are locked the connection is restricted to those tables only. When Django later tries to manage the savepoint/transaction stack, the connection state can no longer matches expectations. This can leave the connection stuck.

This solution tries using GET_LOCK and RELEASE_LOCK, acting like a mutex. Using these instead of lock tables should avoid problems of query transactions getting stuck. These sql named locks are scoped so that only one autonumbering operation can only be performed on a table at a time. This will avoid race-conditions between autonumbering operations.

In addition to the sql named lock, we'll use Django's select_for_update function to perform row-level locking during the autonumbering operation. This implementation will replace the need to create explicit table locks. While the sql named lock will prevent race-conditions between autonumbering process, this row-level locking will ensure that other processes will not create/edit records that the autonumbering operation depends on while running. Indexes are added to the common autonumbering fields in order to get the row-level locking to work with Django.

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add pr to documentation list

Testing instructions

  • Test creating a collection object to make sure that the autonumbering functionality is working.
  • Try creating two collection objects, in two different windows, at the same time, and see that both get created without timing out.
  • Setup up two different medium-large batch edits in separate windows. Try running them both at relatively the same time. See that they both complete and one does not hang indefinitely.
  • Try running queries in the QB will a batch edit is running. Hopefully see that the queries complete before timing out.

@acwhite211
Copy link
Member Author

Hey @melton-jason, can you test this PR out before I open it up to other testers.

@acwhite211
Copy link
Member Author

I created some other solutions that try to avoid race conditions when doing autonumbering, but none are fully satisfactory:

@acwhite211 acwhite211 marked this pull request as ready for review October 24, 2025 20:25
@acwhite211 acwhite211 changed the title Fix autonumbering locking with GET_LOCK and RELEASE_LOCK Fix autonumbering locking with SQL named locks and Django row locking Oct 24, 2025
@acwhite211
Copy link
Member Author

TODO: add indexes to all the common used autonumbering fields.

Copy link
Contributor

@melton-jason melton-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long on this.
I haven't yet finished reviewing the autonumbering/locking changes, but here are my comments so far regarding the migrations!

@github-project-automation github-project-automation bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board Nov 19, 2025
Copy link
Contributor

@melton-jason melton-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had never heard of GET_LOCK before this PR!
Looks like it has potential and this implementation handles the implicit transaction commit problem with LOCK_TABLES mentioned in #6490 (comment) and #6490 (comment) 👏.

I don't think it's mentioned in this PR or on the Issue yet, but this PR would subsume/replace #5404.


I will provide a note here that this implementation utilizing the advisory GET_LOCKS and row level locks from FOR UPDATE is more vulnerable then the prior exclusive full table lock.
Notably, if there are any database triggers or people are interacting directly with the database, outside from the row level update locks, there are little safe guards in the way to prevent "messing up" or interfering with Specify's autoincrementing code (e.g., Specify is vulnerable to race conditions related to record inserts).
For example, it is now possible for some agent with a direct connection to the database to insert a record while Specify is autoincrementing (after it has read the largest record but before committing the new record), resulting in Specify and the agent inserting "duplicate" or "out-of-sync with the numbering scheme" records

melton-jason added a commit that referenced this pull request Jan 14, 2026
@grantfitzsimmons grantfitzsimmons added this to the 7.13.0 milestone Jan 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Dev Attention Needed

Development

Successfully merging this pull request may close these issues.

The Specify Worker unexpectedly stops responding

4 participants