Stream hooks #32

Draft
arnaud-lb wants to merge 18 commits into master from io-hooks

arnaud-lb (Owner) commented Jun 2, 2026

Introduce function Io\Hooks\set_hooks(Io\Hooks\Hooks $hooks). Methods on the given object are used as a replacement for php_pollfd_for(), and more.

Internally, php_pollfd_for() is called before any stream operation that may block, in order to implement timeouts. That makes it a natural place to switch contexts.

Example: https://github.com/arnaud-lb/php-src/blob/io-hooks/ext/standard/tests/streams/hooks/use-case.phpt

TODO:

  • Prevent concurrent stream accesses (or serialize accesses)
  • Network clients
    • curl (naive - one multi per exec)
    • curl (advanced - single multi handle)
    • mysqli/mysqlnd (uses streams internally)
    • pgsql
    • PDO (for mysql)
  • Rename poll_multi -> pollMulti
  • EINTR (Execute signal handlers before restarting syscalls php/php-src#22538)
  • Timers (for sleep() and related)
  • Timeouts
  • Are high level streams (SSL, HTTP) safe to consume in blocking mode?
  • Non-blocking fds by default
  • WeakHandle, automatic unwatch, and onRemove callback
  • Ensure that php_pollfd_for() is called when there is no timeout
  • Offloading-compatible API (io_uring)
  • Callback offloading: provide a callback that executes in the scheduler context instead of resuming a fiber
  • Improve API
  • Internal hooks (for TrueAsync)
  • Per-stream hooks
  • set_hooks(): Throw if hooks were already installed?
  • POLLPRI vs EdgeTriggering
  • Replace PollInfo::$timeout_ms by a parameter like on pollMulti()

Current API: https://github.com/arnaud-lb/php-src/blob/io-hooks/ext/standard/io_hooks.stub.php

arnaud-lb force-pushed the io-hooks branch 3 times, most recently from 886f6ca to 80cbb8d (June 4, 2026 17:26)
arnaud-lb changed the base branch from master to poll_api (June 4, 2026 17:26)
arnaud-lb force-pushed the io-hooks branch 2 times, most recently from 6d08274 to b6eabf6 (June 8, 2026 11:15)
iluuu1994 commented Jun 19, 2026

Interesting idea. Two unpolished questions:

  • Do I understand correctly that the hooks are called for every iteration of the IO loop? Normally we'd only need the first to register the handle in the context. The handler will have to prevent repeated registrations. Maybe this is something to be configured when registering the hooks. This might get a bit more tricky when the same handle is used from multiple fibers. When transferred to a second fiber, the hook wouldn't be triggered, unless this unification logic explicitly considers fibers.

  • How could a scheduler implementation remove handles from the watcher? I think what we'd roughly want is a weak reference from the Context so that unreferenced handles are automatically removed, given we can't possibly wait on them anywhere.

arnaud-lb (Owner, Author) commented Jun 24, 2026

Thank you for looking into it!

The idea of introducing stream hooks that allow user space to implement a scheduler originates from a suggestion Frode Børli made a long time ago. I'm working on this as part of STF.

> Do I understand correctly that the hooks are called for every iteration of the IO loop? Normally we'd only need the first to register the handle in the context. The handler will have to prevent repeated registrations. Maybe this is something to be configured when registering the hooks. This might get a bit more tricky when the same handle is used from multiple fibers. When transferred to a second fiber, the hook wouldn't be triggered, unless this unification logic explicitly considers fibers.

In the current iteration, poll() is called just before any I/O is performed on the given stream. The method is expected to return only once the stream is ready, so the I/O performed right after it will not block. The hook can implement multiplexing and concurrency by switching to another fiber while the stream is not ready.
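
As a rough illustration of that call pattern, here is a minimal C sketch of a replaceable wait function. Every name in it (wait_hook_fn, set_wait_hook, wait_for_fd, demo_hook) is hypothetical and invented for this example; it is not the code from this branch, and the real hook is a PHP object rather than a C function pointer. The default path is only loosely modeled on what a poll-based helper does.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical hook type: wait until `fd` is ready for `events` or the
 * timeout expires. Returns the ready events, 0 on timeout, -1 on error. */
typedef int (*wait_hook_fn)(int fd, short events, int timeout_ms);

static wait_hook_fn installed_hook = NULL;

/* Default behaviour when no hook is installed: block in poll(2). */
static int default_wait(int fd, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    return n > 0 ? pfd.revents : n;
}

/* Install a hook, or pass NULL to restore the default. */
static void set_wait_hook(wait_hook_fn hook)
{
    installed_hook = hook;
}

/* Called before any operation that may block. */
static int wait_for_fd(int fd, short events, int timeout_ms)
{
    assert(timeout_ms >= -1); /* -1 means "wait forever" in this sketch */
    return installed_hook ? installed_hook(fd, events, timeout_ms)
                          : default_wait(fd, events, timeout_ms);
}

/* Demo hook: records the call and reports the fd as ready right away.
 * A real scheduler would register the fd and switch to another fiber
 * here, returning only once the fd is actually ready. */
static int hook_calls = 0;
static int last_fd = -1;

static int demo_hook(int fd, short events, int timeout_ms)
{
    (void)timeout_ms;
    hook_calls++;
    last_fd = fd;
    return events;
}
```

The point of the indirection is that stream code keeps calling a single wait function, while the installed hook decides whether to block the thread or hand control to a scheduler.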

The plan is to change this logic a bit to make the system compatible with EdgeTriggered mode, which allows streams to be registered only once:

  • Make fds non-blocking at stream creation
  • Perform I/O optimistically before polling. When this results in EAGAIN, poll and try again (this is faster, and the fact that we only poll when I/O is not ready makes it compatible with EdgeTriggered mode)
  • Add a new unwatch() method to let userspace remove the stream from the context when it's about to be closed

This allows userspace to register streams only once. Systems without EdgeTriggered can emulate it.
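
The optimistic pattern described in the list above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the code from this branch: set_nonblocking and optimistic_read are hypothetical helper names, the `polls` counter exists only to make the behaviour observable, and plain poll(2) stands in for whatever the installed hook would do.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: put `fd` in non-blocking mode. Returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read optimistically and wait for readiness only when the kernel reports
 * EAGAIN. `*polls` (optional) counts how many times we had to wait.
 * Returns bytes read, 0 on EOF, or -1 with errno set (ETIMEDOUT on timeout). */
static ssize_t optimistic_read(int fd, void *buf, size_t len,
                               int timeout_ms, int *polls)
{
    assert(buf != NULL && len > 0); /* len > 0 keeps "0 means EOF" unambiguous */

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) {
            return n;               /* data or EOF, no polling needed */
        }
        if (errno == EINTR) {
            continue;               /* interrupted: just retry the read */
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -1;              /* real error */
        }

        /* Not ready yet: this is the only point where we wait. A
         * scheduler hook would switch to another fiber here instead. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        if (polls != NULL) {
            (*polls)++;
        }
        int r = poll(&pfd, 1, timeout_ms);
        if (r == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (r < 0 && errno != EINTR) {
            return -1;
        }
        /* Ready (or interrupted): loop and try the read again. Note that
         * this simple sketch restarts the full timeout after EINTR. */
    }
}
```

When data is already available the read succeeds without a single poll call, which is where the expected speedup comes from. Because waiting only ever follows an EAGAIN, the same loop also works with an edge-triggered watcher.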

Handling concurrent accesses to the same stream (which leads to multiple fibers waiting for the same stream) is on my TODO list, but I'm not sure yet how this should be handled. What's almost certain is that these accesses will need to wait in higher layers, not in php_pollfd_for() (we can likely call poll() during parameter parsing when the stream is already being polled).

> How could a scheduler implementation remove handles from the watcher? I think what we'd roughly want is a weak reference from the Context so that unreferenced handles are automatically removed, given we can't possibly wait on them anywhere.

Exactly. Right now the plan is to introduce an Io\Poll\WeakHandle so that we can register a stream without preventing it from closing, plus the Io\Hooks\Hooks::unwatch() method to allow the scheduler to clean up when a stream is closed.

Edit: All of these have been implemented now.

arnaud-lb added 15 commits June 25, 2026 17:59
Introduce function stream_set_hook(callable $hook). The given hook is called
just before performing a read or write operation on any stream, and must have
the following signature: function (/*resource*/ $fd, StreamOperation
$operation).

Closing a stream resource frees the stream itself, so closing it inside a hook
results in use-after-free bugs (UAFs) in a parent function that still uses the stream.

Possible solutions:

 * Deny fclose() during hook invocation
 * Never access streams after stream operations, or use the resource to check
   whether the stream was closed
 * Do not free the php_stream itself in fclose(). Replace ->ops with always-fail
   handlers, mark as eof.
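
The third option can be illustrated with a small C model. This is a sketch under loud assumptions: the `stream` and `stream_ops` types below are simplified, hypothetical stand-ins and not the real php_stream layout, only a read handler is modeled, and the eventual release of the struct is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for a stream; NOT the real php_stream. */
typedef struct stream stream;

typedef struct {
    ssize_t (*read)(stream *s, void *buf, size_t len);
} stream_ops;

struct stream {
    const stream_ops *ops;
    int eof;
    const char *data; /* backing buffer used by the demo ops */
    size_t pos, size;
};

/* Demo handler: read from an in-memory buffer. */
static ssize_t mem_read(stream *s, void *buf, size_t len)
{
    size_t left = s->size - s->pos;
    if (len > left) {
        len = left;
    }
    memcpy(buf, s->data + s->pos, len);
    s->pos += len;
    if (s->pos == s->size) {
        s->eof = 1;
    }
    return (ssize_t)len;
}
static const stream_ops mem_ops = { mem_read };

/* Always-fail handler installed when the stream is closed. */
static ssize_t closed_read(stream *s, void *buf, size_t len)
{
    (void)s; (void)buf; (void)len;
    errno = EBADF;
    return -1;
}
static const stream_ops closed_ops = { closed_read };

/* Close without freeing: the struct stays valid for any caller that still
 * holds a pointer to it, but every later operation fails cleanly. The
 * actual release would have to happen later (not shown here). */
static void stream_close(stream *s)
{
    assert(s != NULL);
    s->ops = &closed_ops;
    s->eof = 1;
}
```

With this shape, a parent function that resumes after a hook closed the stream sees a failed read and an EOF flag instead of touching freed memory.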

The stream hook is now a php_pollfd_for() replacement.

Stream ops typically call php_pollfd_for() before any blocking operations, to
implement timeouts. We can hook here to delegate polling to user-space.

TODO:
 * php_pollfd_for() is not called when there is no timeout. Ensure that we call
   it when a hook is installed.
 * Prevent concurrent stream ops (lock / serialize)
 * Timeouts should be handled by the hook

Rationale:
 * We can't ensure consistency of internal structures when a stream is accessed
   concurrently, at least for the same direction.
 * It may be possible to accept concurrent accesses from different directions
   (read and write)
 * Valid use-cases for concurrent accesses in the same direction are unknown.
   This would require synchronization from the consumer.

 * Introduce StreamPollWeakHandle, which holds its stream weakly
 * Context stops watching any Handle whose stream is collected
 * Introduce Context::onWatcherRemoved(?callable $callback). Callback is invoked
   when a weak handle is removed automatically.
 * Generalized to Curl's SocketHandle -> SocketWeakHandle

 * Mark OS sockets as non-blocking when creating a stream (without affecting the
   stream's blocking status)
 * Perform I/O optimistically before polling
 * Poll on EAGAIN only

This should be faster (less polling), and makes operations compatible with
edge-triggering.
arnaud-lb changed the base branch from poll_api to master (June 30, 2026 16:02)
arnaud-lb changed the title from "Stream hook" to "Stream hooks" (Jul 3, 2026)