
fix(together): record failed spans on API errors#4083

Open
muhamedfazalps wants to merge 2 commits into traceloop:main from muhamedfazalps:together-http-error-spans

Conversation

@muhamedfazalps

@muhamedfazalps muhamedfazalps commented May 7, 2026

Summary

  • record exceptions and set ERROR span status when Together API calls raise
  • keep the exception behavior unchanged by re-raising after ending the span
  • add focused unit tests covering failed span recording

Testing

  • uv run --directory packages/opentelemetry-instrumentation-together --group test pytest tests -q

Related to #412. This keeps the fix scoped to the Together instrumentation package, matching the maintainer preference for small PRs.

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling in the Together AI integration: when a call fails, the exception is recorded on the span, the span is marked as an error, and the span is properly closed.
  • Tests

    • Added unit tests covering error scenarios to validate span recording, error status, and proper propagation of exceptions.

@coderabbitai

coderabbitai Bot commented May 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b4aedc82-7b4e-4956-bb34-7985fff171b4

📥 Commits

Reviewing files that changed from the base of the PR and between 996c4b7 and 9a6fb5c.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-together/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-together/tests/test_errors.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-together/tests/test_errors.py

📝 Walkthrough

The Together AI instrumentor adds explicit exception handling to the _wrap function: exceptions from the wrapped API call are recorded on the active span, the span status is set to ERROR (with the exception message when available), the span is ended, and the exception is re-raised. Tests cover these error scenarios.
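
The control flow described in this walkthrough can be sketched as follows. This is a minimal illustration, not the package's actual `_wrap`: `FakeSpan`, the local `StatusCode` enum, and `call_with_span` are hypothetical stand-ins written with the standard library only, and they merely mirror the span method names mentioned above.

```python
from enum import Enum


class StatusCode(Enum):
    """Local stand-in for a span status code (assumption: not the OpenTelemetry class)."""
    OK = 1
    ERROR = 2


class FakeSpan:
    """Hypothetical span that just remembers what was called on it."""

    def __init__(self):
        self.exceptions = []
        self.status = None
        self.ended = False

    def record_exception(self, exc):
        self.exceptions.append(exc)

    def set_status(self, code, description=None):
        self.status = (code, description)

    def end(self):
        self.ended = True


def call_with_span(span, wrapped, *args, **kwargs):
    """Call `wrapped`; mark `span` OK or ERROR and end it on both paths."""
    try:
        response = wrapped(*args, **kwargs)
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(StatusCode.ERROR, str(exc))
        span.end()
        raise  # bare raise keeps the original exception and traceback
    span.set_status(StatusCode.OK)
    span.end()
    return response
```

The bare `raise` is what keeps caller-visible behavior unchanged: the span is closed first, then the same exception object propagates.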

Changes

Together AI Error Handling

Layer / File(s) Summary
Error Handling Implementation
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
The _wrap function is changed to call the wrapped method inside a try/except; on exception it records the exception on the span, sets span status to ERROR with the exception message, ends the span, and re-raises. The success path still sets OK and ends the span.
Exception Handling Tests
packages/opentelemetry-instrumentation-together/tests/test_errors.py
Adds _make_wrapper() and two tests (test_chat_api_error_marks_span_failed, test_chat_api_error_records_exception_message) that assert exceptions from the wrapped callable cause exception recording, set_status(StatusCode.ERROR, ...) when applicable, span end, and propagation of the original exception.
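
For readers without the diff open, the shape of such a test might look like the sketch below. It is an assumption-laden illustration: `call_and_mark` is a hypothetical stand-in for the instrumented call path (not `_make_wrapper()` or `_wrap` from the PR), and the status is passed as plain strings rather than a real `StatusCode`.

```python
from unittest.mock import MagicMock


def call_and_mark(span, wrapped):
    """Hypothetical stand-in for the instrumented call path."""
    try:
        return wrapped()
    except Exception as exc:
        span.record_exception(exc)
        span.set_status("ERROR", str(exc))  # real code would pass a status object
        span.end()
        raise


def test_error_marks_span_failed():
    span = MagicMock()
    wrapped = MagicMock(side_effect=RuntimeError("401 Unauthorized"))

    try:
        call_and_mark(span, wrapped)
    except RuntimeError as exc:
        raised = exc
    else:
        raise AssertionError("expected RuntimeError to propagate")

    # The same exception object is recorded, the span is marked failed, then ended.
    span.record_exception.assert_called_once_with(raised)
    span.set_status.assert_called_once_with("ERROR", "401 Unauthorized")
    span.end.assert_called_once()
```

Mocking the span keeps the test independent of any exporter or tracer setup, at the cost of not exercising a real span implementation.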

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A nimble rabbit hums, alert and bright,
When calls go wrong I catch the flight,
I note the error, mark it true,
End the span and pass it through.
🌿 Together we hop, and log the night.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'fix(together): record failed spans on API errors' clearly and specifically describes the main change: adding error handling to record failed spans when Together API calls raise exceptions.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-together/tests/test_errors.py (2)

30-39: ⚡ Quick win

span.end assertion missing, and prefer direct enum comparison over .name.

Two small gaps in test_chat_api_error_records_exception_message:

  1. span.end.assert_called_once() is not asserted here — test 1 covers it for a different error type, but since this test independently constructs a new span mock it should verify the span is also ended for completeness.
  2. Comparing status_arg.status_code.name == "ERROR" via string is fragile; comparing directly against the imported StatusCode enum is more idiomatic and immune to name changes.
🔧 Proposed improvement
+from opentelemetry.trace.status import StatusCode
+
 def test_chat_api_error_records_exception_message():
     span, wrapper = _make_wrapper()
     wrapped = MagicMock(side_effect=ValueError("bad request"))
 
     with pytest.raises(ValueError, match="bad request"):
         wrapper(wrapped, None, [], {"model": "test-model"})
 
     status_arg = span.set_status.call_args.args[0]
-    assert status_arg.status_code.name == "ERROR"
+    assert status_arg.status_code == StatusCode.ERROR
     assert status_arg.description == "bad request"
+    span.end.assert_called_once()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opentelemetry-instrumentation-together/tests/test_errors.py` around
lines 30 - 39, The test test_chat_api_error_records_exception_message is missing
an assertion that the span was ended and uses a fragile string comparison for
the status code; update the test to assert span.end.assert_called_once() (or
span.end.assert_called_once_with() if specific args) after the error handling,
and replace the string comparison status_arg.status_code.name == "ERROR" with a
direct enum comparison using the imported StatusCode (e.g.,
status_arg.status_code == StatusCode.ERROR) so the test checks end() was invoked
and uses the stable StatusCode enum.

25-25: 💤 Low value

Strengthen record_exception assertion to verify the exception object.

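assert_called_once() confirms the call was made but not that the right exception was passed. Note that assert_called_once_with(ANY) only adds a check that exactly one positional argument was passed, since ANY matches any value; to verify the exception itself, capture the argument and check its type and message: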

🔧 Proposed improvement
-    span.record_exception.assert_called_once()
+    exc_arg = span.record_exception.call_args.args[0]
+    assert isinstance(exc_arg, RuntimeError)
+    assert "401 Unauthorized" in str(exc_arg)
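
To make the trade-off between the two options concrete, here is a small standalone sketch using only `unittest.mock`. The `KeyError` is deliberately the wrong exception, chosen for illustration; it is not taken from the PR.

```python
from unittest.mock import ANY, MagicMock

span = MagicMock()
span.record_exception(KeyError("unexpected"))  # wrong exception type on purpose

# ANY compares equal to everything, so this assertion passes even though
# the recorded exception is not the one the test expects.
span.record_exception.assert_called_once_with(ANY)

# Inspecting the captured argument does tell the two cases apart.
exc_arg = span.record_exception.call_args.args[0]
is_expected = isinstance(exc_arg, RuntimeError) and "401 Unauthorized" in str(exc_arg)
```

Here `is_expected` evaluates to `False`, which is the failure the `ANY` assertion would have missed. Note that `call_args.args` requires Python 3.8 or newer.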
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opentelemetry-instrumentation-together/tests/test_errors.py` at line
25, The test currently only checks that span.record_exception was called
(span.record_exception.assert_called_once()); strengthen it by verifying the
actual exception argument: replace or augment that assertion with
span.record_exception.assert_called_once_with(ANY) (importing unittest.mock.ANY)
or capture the call args (via span.record_exception.call_args) and assert the
exception instance/message matches the expected error string; reference
span.record_exception, assert_called_once_with, ANY or use
span.record_exception.call_args to inspect the passed exception.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1e8a54f-4fac-4992-a1a9-4763b0be065b

📥 Commits

Reviewing files that changed from the base of the PR and between 3735204 and 996c4b7.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
  • packages/opentelemetry-instrumentation-together/tests/test_errors.py

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


muhamedfazalps does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

