
docs(domain-skills): add X (Twitter) articles + tweets reading recipe #484

Open
optemism wants to merge 1 commit into browser-use:main from optemism:feat/x-articles-domain-skill


@optemism optemism commented Jul 3, 2026


What

Adds domain-skills/x/articles.md — a field-tested recipe for reading X Articles (long-form posts) and regular tweets.

Key findings captured

  • X Articles share the tweet URL shape (x.com/{handle}/status/{id}) — there is no /article/ route; document.title tells you which you got.
  • The "See what's happening" login modal blocks clicks, not DOM reads — the article body is fully hydrated underneath and document.body.innerText reads straight through it. (Observed with a logged-in profile; fully logged-out is flagged as unverified.)
  • [data-testid="tweetText"] is empty for Articles — it works for regular tweets/threads only; Articles need the innerText path.
  • Tab-drift trap: navigate + extract must happen in ONE browser-harness -c invocation, or a follow-up call can attach to a stale/different tab. ensure_real_tab() referenced as the canonical remedy.
  • Hydration is lazy — wait_for_load() alone returns a short/empty body; needs wait(3–4).
  • Body-slicing helper anchored on the post-timestamp regex, with noted failure modes.
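The hydration finding above relies on a fixed wait(3–4). As a hedged alternative, and not what the skill file in this PR does, one could poll until the page text stops growing. In the sketch below, `read_text` is a hypothetical callable standing in for however the harness evaluates `document.body.innerText`; the `min_chars`, `timeout`, and `interval` defaults are illustrative assumptions, not measured values.

```python
import time


def wait_for_hydration(read_text, min_chars=200, timeout=8.0, interval=0.5,
                       sleep=time.sleep):
    """Poll read_text() until the text is long enough and has stopped growing.

    read_text is a hypothetical zero-argument callable returning the current
    page text; it is not part of any documented harness API.
    """
    polls = max(1, int(timeout / interval))
    previous = ""
    for _ in range(polls):
        current = read_text()
        # Accept only once two consecutive reads have the same length.
        if len(current) >= min_chars and len(current) == len(previous):
            return current
        previous = current
        sleep(interval)
    # Timed out: return whatever was read last so the caller can inspect it.
    return previous
```

A caller would still want to treat a short return value as "not hydrated" rather than as the article body.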

Follows the SKILL.md domain-skill conventions: no pixel coordinates, no run narration, no secrets; explicit "does not work / untested" section.

🤖 Generated with Claude Code


Summary by cubic

Adds domain-skills/x/articles.md, a concise recipe for reading X (Twitter) Articles and regular tweets via DOM text extraction. Covers URL shape, selectors, timing, and tab attachment to make extraction reliable even with the login modal.

  • New Features
    • Articles share the tweet URL x.com/{handle}/status/{id}; use document.title to distinguish.
    • For Articles, read document.body.innerText; the login modal blocks clicks but not reads.
    • For tweets/threads, use [data-testid="tweetText"] (empty for Articles).
    • Avoid tab drift by navigating and extracting in one invocation; wait 3–4s for hydration and use a timestamp-anchored slice to isolate the body.
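The two extraction paths above can be sketched as a small selector. The PR itself uses document.title to tell Articles from tweets; this sketch instead falls back on the observation that `[data-testid="tweetText"]` is empty for Articles, which is an assumption about how a caller might wire it up. `choose_text` and both of its parameters are hypothetical names, and the inputs are plain strings that the caller has already read from the page.

```python
def choose_text(tweet_text_nodes, body_inner_text):
    """Return (kind, text) for already-extracted page strings.

    tweet_text_nodes: list of strings, one per [data-testid="tweetText"] node.
    body_inner_text:  the string read from document.body.innerText.
    """
    # Drop missing or whitespace-only nodes; Articles are reported to yield none.
    parts = [t.strip() for t in tweet_text_nodes if t and t.strip()]
    if parts:
        return "tweet", "\n\n".join(parts)
    # No usable tweetText: assume an Article and use the innerText path.
    return "article", body_inner_text.strip()
```

For an Article the returned text is the whole page, so it would still need the body-slicing step before use.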

Written for commit 49a9922. Summary will update on new commits.


Field-tested against an X Article (long-form post). Covers: Articles share
the tweet URL shape, the login modal blocks clicks but not DOM reads
(observed logged-in; logged-out unverified), innerText extraction + body
slicing, tweetText selector for regular tweets (empty for Articles), and
the navigate-and-extract-in-one-invocation tab-drift trap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@browser-harness-review


✅ Skill review passed

Reviewed 1 file(s) — no findings.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="domain-skills/x/articles.md">

<violation number="1" location="domain-skills/x/articles.md:64">
P2: `x_article_body` can silently return a near-empty string when no newline follows the headline. When `full_text.find('\n', i)` returns `-1` (no newline after the headline), `start` becomes `-1`. Python slicing `full_text[-1:]` then evaluates to just the last character of the page, the regex fails, and the function returns a single character instead of the article body. Add a guard so `start` falls back to `i + len(headline)` when no newline is found.</violation>
</file>


def x_article_body(full_text, headline):
    # start just after the headline's first occurrence in the content area
    i = full_text.find(headline)
    start = full_text.find('\n', i) if i != -1 else 0

@cubic-dev-ai cubic-dev-ai Bot Jul 3, 2026


P2: x_article_body can silently return a near-empty string when no newline follows the headline. When full_text.find('\n', i) returns -1 (no newline after the headline), start becomes -1. Python slicing full_text[-1:] then evaluates to just the last character of the page, the regex fails, and the function returns a single character instead of the article body. Add a guard so start falls back to i + len(headline) when no newline is found.
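A minimal sketch of the guard this comment asks for. Only the first lines of `x_article_body` appear in the diff context, so the rest of the body here is an assumption: the timestamp regex is reconstructed from the "H:MM AM/PM · Mon DD, YYYY" shape described in the file's comments, and the no-match behavior (return everything after `start`) is inferred from the review text rather than copied from the PR.

```python
import re

# Assumed timestamp shape, e.g. "9:41 AM · Jul 3, 2026"; the PR's real regex may differ.
TIMESTAMP = re.compile(r"\d{1,2}:\d{2} [AP]M · [A-Z][a-z]{2} \d{1,2}, \d{4}")


def x_article_body(full_text, headline):
    # start just after the headline's first occurrence in the content area
    i = full_text.find(headline)
    if i == -1:
        start = 0
    else:
        newline = full_text.find("\n", i)
        # Guard: with no newline after the headline, find() returns -1 and
        # full_text[-1:] would be the last character only. Fall back to the
        # position right after the headline instead.
        start = newline if newline != -1 else i + len(headline)
    # end at the post timestamp, searching only AFTER start
    match = TIMESTAMP.search(full_text, start)
    end = match.start() if match else len(full_text)
    return full_text[start:end].strip()
```

With the guard in place, a page whose text ends at the headline yields an empty string, which a caller can detect, instead of a stray final character.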


<file context>
@@ -0,0 +1,131 @@
+def x_article_body(full_text, headline):
+    # start just after the headline's first occurrence in the content area
+    i = full_text.find(headline)
+    start = full_text.find('\n', i) if i != -1 else 0
+    # end at the post timestamp ("H:MM AM/PM · Mon DD, YYYY") — search AFTER
+    # start, or a timestamp-shaped string earlier in the page truncates the body
</file context>
