
Dataset reimplementation#2341

Draft
shagun-singh-inkeep wants to merge 18 commits into main from dataset-reimplementation

Conversation

@shagun-singh-inkeep
Collaborator

No description provided.

@changeset-bot

changeset-bot Bot commented Feb 25, 2026

⚠️ No Changeset found

Latest commit: ba763d5

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel

vercel Bot commented Feb 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Mar 31, 2026 5:58pm
agents-docs Ready Ready Preview, Comment Mar 31, 2026 5:58pm
agents-manage-ui Error Error Mar 31, 2026 5:58pm


claude[bot]

This comment was marked as outdated.

@github-actions github-actions Bot deleted a comment from claude Bot Feb 25, 2026
@itoqa

itoqa Bot commented Feb 25, 2026

Ito Test Report ❌

25 test cases ran. 20 passed, 5 failed.

This test run verified the Dataset Reimplementation feature (PR #2341), which re-enables the "Test Suites" sidebar link, adds new dataset run endpoints, and replaces old chat-API-based execution with scheduled trigger infrastructure. The core routing, API endpoints, and backend infrastructure are working correctly. However, several UI bugs were identified: the run detail page incorrectly displays progress and invocation data when all agent executions fail, the cross-product display groups by dataset item instead of showing per-invocation rows, and unauthenticated users can access protected UI routes.

✅ Passed (20)
Test Case Summary Timestamp Screenshot
ROUTE-1 Verified Test Suites link appears under Monitor section in sidebar with Database icon. Clicking it navigates to /default/projects/activities-planner/datasets showing dataset listing. 1:54 ROUTE-1_1-54.png
ROUTE-2 Created run config 'Test Run E2E' with 1 agent, auto-trigger fired creating 3 invocations (3 items x 1 agent), run appeared in runs list 6:16 ROUTE-2_6-16.png
ROUTE-3 Run detail page loads at correct URL pattern, displays run name 'Test Run E2E', shows 'Run in progress' indicator with spinner, progress bar and test cases table with 3 items. 14:35 ROUTE-3_14-35.png
ROUTE-5 All filter mechanisms work correctly: search input filters by text, Show Filters expands filter panel, Agent and Output Status dropdowns work, Clear Filters resets all. 28:07 ROUTE-5_28-07.png
ROUTE-6 GET dataset-runs/by-dataset endpoint returns 200 with run objects containing id, datasetId, status, totalItems, completedItems, failedItems. 32:09 ROUTE-6_32-09.png
ROUTE-7 GET dataset-runs/{runId}/items returns 200 with invocation objects containing id, agentId, datasetRunId, datasetItemId, status, attemptNumber, conversationId. 32:50 ROUTE-7_32-50.png
ROUTE-8 Trigger endpoint created 3 invocations (3 dataset items x 1 agent). API response shows totalItems=3, status=completed. 11:07 ROUTE-8_11-07.png
ROUTE-9 Created evaluator and run config with evaluator attached. Run detail page shows View Evaluation Job button. Clicking opens new tab with correct URL format. 48:55 ROUTE-9_48-55.png
ROUTE-10 Clicking 'Back to test suite' button on run detail page navigates to dataset detail page with the Runs tab active. 19:46 ROUTE-10_19-46.png
EDGE-1 Created run config on empty dataset (0 items). API confirmed totalItems:0. Run detail page correctly shows Test Cases (0) with No items found message. 51:12 EDGE-1_51-12.png
EDGE-4 API response shows run with all items failed (failedItems=3, completedItems=0) has status=completed. Confirms deriveRunStatus correctly returns completed when pending+running=0. 39:44 EDGE-4_39-44.png
EDGE-6 Timestamps display in local timezone format using browser's locale settings via Intl.DateTimeFormat. 14:44 EDGE-6_14-44.png
EDGE-7 API run status metadata has only totalItems/completedItems/failedItems fields. Cancelled invocations are counted under failedItems. 40:24 EDGE-7_40-24.png
EDGE-8 Created run config via API without triggering. Config exists in system but does not appear in runs list. Confirms partial failure handling. 44:37 EDGE-8_44-37.png
ADV-1 POST to trigger with non-existent runConfigId returns HTTP 404 with body {"res":{},"status":404}. 33:22 ADV-1_33-22.png
ADV-3 Clicked Create Run with empty Name field, validation message 'Name is required' appeared preventing submission. 7:04 ADV-3_7-04.png
ADV-4 Rapidly triple-clicked Create Run button. Only 1 run config was created. Button disabled after first click prevents duplicates. 12:37 ADV-4_12-37.png
ADV-5 Navigated to non-existent runId. Page shows error state with Error title and HTTP 404: Not Found message. 35:33 ADV-5_35-33.png
ADV-6 Created dataset item with script tag content. Verified HTML/script content rendered as plain text in JSON code view. No script execution detected. 37:29 ADV-6_37-29.png
LOGIC-1 POST /evals/run-dataset-items returns HTTP 404 Not Found. The old route has been successfully removed. 33:51 LOGIC-1_33-51.png
❌ Failed (5)
Test Case Summary Timestamp Screenshot
ROUTE-4 Table shows incorrect data: Agent column shows '-', Output shows 'Processing...' for failed items, Test Cases header shows (0) despite 3 rows displayed. 25:55 ROUTE-4_25-55.png
EDGE-2 API shows totalItems:0 but UI incorrectly shows 4 items with Agent dash and Processing status stuck. 54:28 EDGE-2_54-28.png
EDGE-3 API confirms 8 invocations (4 items x 2 agents) but UI only shows 4 rows grouped by dataset item. 59:59 EDGE-3_59-59.png
EDGE-5 Auto-refresh never stops. UI stuck showing 'Run in progress' even though API reports run completed with all items failed. 14:59 EDGE-5_14-59.png
ADV-2 API correctly returns 401 for unauthenticated requests, but UI loads datasets page without redirecting to login. 1:04:21 ADV-2_1-04-21.png
ROUTE-4: Dataset run detail shows test cases table with correct data – Failed
  • Where: Run detail page at /{tenantId}/projects/{projectId}/datasets/{datasetId}/runs/{runId}

  • Steps to reproduce:

    1. Navigate to a run detail page where all agent invocations have failed
    2. Observe the Test Cases table
  • What failed: The table displays incorrect information: (1) Agent column shows '-' instead of the agent ID, (2) Output column shows 'Processing...' with spinner for items where the API confirms status='failed', (3) Test Cases section header shows count '(0)' despite 3 rows being displayed in the table.

  • Code analysis: The UI derives display state from conversations array on each item. When invocations fail before creating a conversation, the conversations array is empty. The code at line 394-434 shows a placeholder row with Agent='-' and "Processing..." when no conversations exist, regardless of actual invocation status.

  • Relevant code:

    agents-manage-ui/src/app/[tenantId]/projects/[projectId]/datasets/[datasetId]/runs/[runId]/page.tsx (lines 394–435)

    const conversations = item.conversations || [];
    if (conversations.length === 0) {
      // No conversations yet - show placeholder row with loading state if run is in progress
      return (
        <TableRow key={item.id}>
          <TableCell>
            {/* ... */}
          </TableCell>
          <TableCell>
            <span className="text-sm text-muted-foreground">-</span>
          </TableCell>
          <TableCell>
            {conversationProgress.isRunning ? (
              <span className="flex items-center gap-2 text-sm text-muted-foreground">
                <Loader2 className="h-3 w-3 animate-spin" />
                Processing...
              </span>
            ) : (
              <span className="text-sm text-muted-foreground">No output</span>
            )}
          </TableCell>
          {/* ... */}
        </TableRow>
      );
    }

    agents-manage-ui/src/app/[tenantId]/projects/[projectId]/datasets/[datasetId]/runs/[runId]/page.tsx (lines 324–330)

    <CardTitle>
      Test Cases (
      {filteredItems.reduce((acc, item) => acc + (item.conversations?.length || 0), 0)}{' '}
      {/* This counts conversations, not invocations - shows 0 when all fail */}
      )
    </CardTitle>
  • Why this is likely a bug: The UI should display invocation status (from scheduled trigger invocations) rather than relying solely on conversations. When invocations fail, no conversation is created, but the UI should still show the failure status and agent ID from the invocation data.

  • Introduced by this PR: Yes – this PR modified the relevant code. The run detail page was part of the dataset reimplementation.

  • Timestamp: 25:55
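A minimal sketch of the suggested direction, with hypothetical names not taken from the PR: derive each row's display state from the invocation record itself, so a failed invocation still surfaces its agent ID and status even when no conversation was created.

```typescript
// Hypothetical row model derived from invocation status rather than from
// the conversations array. The Invocation shape here is an assumption.
interface Invocation {
  agentId: string;
  status: 'pending' | 'running' | 'completed' | 'failed';
}

// Map an invocation to the Agent and Output cells of one table row.
function rowLabel(inv: Invocation): { agent: string; output: string } {
  switch (inv.status) {
    case 'failed':
      return { agent: inv.agentId, output: 'Failed' };
    case 'completed':
      return { agent: inv.agentId, output: 'Completed' };
    default:
      return { agent: inv.agentId, output: 'Processing...' };
  }
}
```

With this shape, the '-' placeholder and the stuck 'Processing...' spinner only appear for invocations that are genuinely pending or running.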

EDGE-2: Run config with no agent relations produces zero invocations – Failed
  • Where: Run detail page for a run config created with no agents selected

  • Steps to reproduce:

    1. Create a run config with no agents selected on a populated dataset (4 items)
    2. Navigate to the run detail page
  • What failed: API correctly shows totalItems:0 for the run. However, the UI run detail page incorrectly shows 4 items with Agent column as dash, all showing "Processing..." with "Run in progress" status stuck. The UI displays dataset items even when there are no invocations.

  • Code analysis: The backend at datasetRuns.ts line 213-217 fetches ALL dataset items via listDatasetItems(db) regardless of how many invocations exist. The UI displays these items, creating a disconnect between what the API reports (0 invocations) and what the UI shows (4 dataset items with placeholder rows).

  • Relevant code:

    agents-api/src/domains/manage/routes/evals/datasetRuns.ts (lines 213–217)

    // Get all dataset items for this dataset
    const datasetItems = await listDatasetItems(db)({
      scopes: { tenantId, projectId, datasetId: run.datasetId },
    });

    agents-manage-ui/src/app/[tenantId]/projects/[projectId]/datasets/[datasetId]/runs/[runId]/page.tsx (lines 106–113)

    const conversationProgress = useMemo(() => {
      if (!run?.items) return { total: 0, completed: 0, isRunning: false };
      const total = run.items.length;  // Uses dataset items count, not invocations
      const completed = run.items.filter(
        (item) => item.conversations && item.conversations.length > 0
      ).length;
      return { total, completed, isRunning: completed < total && total > 0 };
    }, [run]);
  • Why this is likely a bug: When no agents are selected, the run has zero invocations (totalItems=0). The UI should show "No test cases" or display based on actual invocation count from the API's status metadata, not based on the number of dataset items.

  • Introduced by this PR: Yes – this PR modified the relevant code in both the API endpoint and UI page.

  • Timestamp: 54:28
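One possible shape for the fix, sketched with assumed names: choose the empty state from the API's invocation count (totalItems) rather than from the number of dataset items returned alongside the run.

```typescript
// Hypothetical view selector keyed off the run's status metadata, which the
// API already reports (totalItems/completedItems/failedItems).
interface RunMeta {
  totalItems: number;
  completedItems: number;
  failedItems: number;
}

// A run with zero invocations renders the empty state, regardless of how
// many dataset items exist in the underlying dataset.
function runView(meta: RunMeta): 'empty' | 'table' {
  return meta.totalItems === 0 ? 'empty' : 'table';
}
```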

EDGE-3: Run with multiple agents and items creates correct cross-product of invocations – Failed
  • Where: Run detail page for a run with 4 items x 2 agents

  • Steps to reproduce:

    1. Create a run config selecting 2 agents on a dataset with 4 items
    2. Trigger the run and navigate to the run detail page
  • What failed: The API correctly reports totalItems=8 (4 items x 2 agents = 8 invocations). However, the UI run detail page only displays 4 rows (grouped by dataset item) instead of the expected 8 rows (one per agent-item combination). The progress bar shows '0 of 4 completed' instead of '0 of 8'.

  • Code analysis: The UI iterates over filteredItems (dataset items) and then maps over item.conversations. Since no conversations were created (all failed), each dataset item shows a single placeholder row. The UI structure is designed to show one row per conversation, not one row per invocation.

  • Relevant code:

    agents-manage-ui/src/app/[tenantId]/projects/[projectId]/datasets/[datasetId]/runs/[runId]/page.tsx (lines 356–360)

    {filteredItems.flatMap((item) => {
      // ...
      const conversations = item.conversations || [];
      if (conversations.length === 0) {
        // Shows ONE placeholder row per dataset item, not per invocation
        return (
          <TableRow key={item.id}>

    agents-manage-ui/src/app/[tenantId]/projects/[projectId]/datasets/[datasetId]/runs/[runId]/page.tsx (lines 437–481)

    return conversations.map((conversation) => (
      <TableRow key={`${item.id}-${conversation.conversationId}`}>
        {/* This correctly shows one row per conversation when they exist */}
      </TableRow>
    ));
  • Why this is likely a bug: The UI should display one row per scheduled trigger invocation (from the API's invocations data), not per dataset item. When multiple agents are selected, the cross-product creates N x M invocations, and the UI should reflect this.

  • Introduced by this PR: Yes – this PR introduced the dataset run detail page as part of the reimplementation.

  • Timestamp: 59:59
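A small sketch of the expected row expansion, using illustrative types: the table should iterate the invocation list from the API, so a 4-item, 2-agent run yields 8 rows rather than 4.

```typescript
// Hypothetical invocation shape; the API's real objects also carry status,
// attemptNumber, conversationId, etc.
interface Invocation {
  datasetItemId: string;
  agentId: string;
}

// One row key per invocation (item x agent), not per dataset item.
function buildRowKeys(invocations: Invocation[]): string[] {
  return invocations.map((inv) => `${inv.datasetItemId}:${inv.agentId}`);
}
```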

EDGE-5: Auto-refresh stops when run completes – Failed
  • Where: Run detail page during and after run completion

  • Steps to reproduce:

    1. Navigate to a run detail page where all invocations have failed
    2. Observe the auto-refresh behavior and UI state
  • What failed: Auto-refresh does NOT stop when the run completes. The API reports status=completed with 3 failed items, but the UI remains stuck showing 'Run in progress' with '0 of 3 completed'. The polling continues indefinitely via repeated requests every ~3 seconds.

  • Code analysis: The isRunInProgress flag at line 116-117 depends on conversationProgress.isRunning. The conversation progress calculates completion based on conversations created. When all invocations fail, no conversations are created, so completed is always 0 and total is always > 0 (dataset items count), making isRunning perpetually true.

  • Relevant code:

    agents-manage-ui/src/app/[tenantId]/projects/[projectId]/datasets/[datasetId]/runs/[runId]/page.tsx (lines 106–117)

    const conversationProgress = useMemo(() => {
      if (!run?.items) return { total: 0, completed: 0, isRunning: false };
      const total = run.items.length;
      const completed = run.items.filter(
        (item) => item.conversations && item.conversations.length > 0
      ).length;
      return { total, completed, isRunning: completed < total && total > 0 };
    }, [run]);
    
    // Overall progress - run is complete only when both conversations AND evaluations are done
    const isRunInProgress =
      conversationProgress.isRunning || (evaluationProgress?.isRunning ?? false);

    agents-manage-ui/src/app/[tenantId]/projects/[projectId]/datasets/[datasetId]/runs/[runId]/page.tsx (lines 124–133)

    useEffect(() => {
      if (!isRunInProgress) return;
    
      const interval = setInterval(() => {
        loadRun(false); // Don't show loading state for refresh
      }, 3000); // Refresh every 3 seconds
    
      return () => clearInterval(interval);
    }, [isRunInProgress, loadRun]);
  • Why this is likely a bug: The UI should use the API's reported status (from deriveRunStatus which returns 'completed' when pending+running=0) to determine if the run is complete, not rely on conversation count. This causes infinite polling and a permanently stuck "in progress" state for any run where invocations fail.

  • Introduced by this PR: Yes – this PR introduced the run detail page with auto-refresh functionality.

  • Timestamp: 14:59
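A sketch of the suggested fix, under the assumption that the API's status metadata is available client-side: gate polling on the server-derived status rather than on conversation counts, so a run in which every invocation failed stops refreshing.

```typescript
// Hypothetical metadata shape mirroring the fields the report says the API
// returns (status plus totalItems/completedItems/failedItems).
interface RunStatusMeta {
  status: 'pending' | 'running' | 'completed';
  totalItems: number;
  completedItems: number;
  failedItems: number;
}

// Trust deriveRunStatus: once the API reports 'completed', polling stops
// even when zero conversations were ever created.
function isRunInProgress(meta: RunStatusMeta): boolean {
  return meta.status !== 'completed';
}
```

Wiring this into the existing useEffect would let the `!isRunInProgress` early return fire and clear the 3-second interval.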

ADV-2: Accessing dataset routes without authentication returns 401/403 – Failed
  • Where: UI datasets page at /default/projects/activities-planner/datasets

  • Steps to reproduce:

    1. Clear all cookies (unauthenticated state)
    2. Navigate directly to the datasets page URL
  • What failed: The API (port 3002) correctly returns 401 Unauthorized for unauthenticated requests. However, the UI (port 3000) does NOT redirect unauthenticated users to a login page. After clearing all cookies, navigating to the datasets page renders the full page with data visible.

  • Code analysis: The agents-manage-ui app does not have a Next.js middleware.ts file for route protection. The tenant layout ([tenantId]/layout.tsx) renders content without checking authentication status. Authentication is handled client-side via the AuthClientProvider context, but there's no server-side redirect for unauthenticated users.

  • Relevant code:

    agents-manage-ui/src/app/[tenantId]/layout.tsx (lines 9–42)

    const Layout: FC<LayoutProps<'/[tenantId]'>> = ({ children, breadcrumbs }) => {
      return (
        <AppSidebarProvider>
          <SentryScopeProvider>
            <SidebarInset>
              {/* Layout renders without auth check */}
              <main>
                <div className="flex-1 p-6">{children}</div>
              </main>
            </SidebarInset>
          </SentryScopeProvider>
        </AppSidebarProvider>
      );
    };

    agents-manage-ui/src/lib/api/api-config.ts (lines 44–68)

    // API requests include bypass secret for server-side calls
    const headers: HeadersInit = {
      'Content-Type': 'application/json',
      ...(isServer && process.env.INKEEP_AGENTS_MANAGE_API_BYPASS_SECRET
        ? {
            Authorization: `Bearer ${process.env.INKEEP_AGENTS_MANAGE_API_BYPASS_SECRET}`,
          }
        : {}),
    };
  • Why this is likely a bug: Protected routes should redirect unauthenticated users to the login page. The current implementation allows direct access to UI pages that display protected data because server-side rendering uses a bypass secret, but then renders the page to an unauthenticated user.

  • Introduced by this PR: No – pre-existing bug (authentication code not changed in this PR). However, this PR re-enabled the datasets routes which expose this issue.

  • Timestamp: 1:04:21
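The redirect decision could live in a Next.js middleware; the core check is small. A minimal sketch, assuming a session cookie named `session` and a `/login` route (both assumptions, not from this codebase):

```typescript
// Hypothetical route-protection predicate for a middleware.ts. Cookie name
// and login path are placeholders; the real app's auth scheme may differ.
function shouldRedirectToLogin(
  path: string,
  cookies: Record<string, string>
): boolean {
  const isProtected = !path.startsWith('/login');
  const hasSession = Boolean(cookies['session']);
  return isProtected && !hasSession;
}
```

The key point is that the check runs server-side before rendering, so the bypass secret used for server-to-API calls never leaks protected data to an unauthenticated browser.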

📋 View Recording

Screen Recording

@itoqa

itoqa Bot commented Feb 25, 2026

Ito Test Report ❌

32 test cases ran. 31 passed, 1 failed.

Testing verified the Dataset (Test Suite) reimplementation in PR #2341. The core functionality works well: sidebar navigation, dataset CRUD operations, tab switching, item creation, run config creation, run progress tracking, auto-refresh, filtering, XSS prevention, and error handling all passed. One validation bug was confirmed in the run config form where submitting without agents selected does not show an error.

✅ Passed (31)
Test Case Summary Timestamp Screenshot
ROUTE-1 Verified sidebar has Monitor section with Test Suites link positioned between Traces and Evaluations 3:38 ROUTE-1_3-38.png
ROUTE-2 Datasets page shows empty state with 'No test suites yet.' heading, description text, and 'Create test suite' link 5:12 ROUTE-2_5-12.png
ROUTE-3 Created dataset 'Playwright Test Suite' via the create form 6:14 ROUTE-3_6-14.png
ROUTE-4 Verified default tab is Items, clicking Runs tab shows runs content with URL ?tab=runs 7:54 ROUTE-4_7-54.png
ROUTE-5 Created a dataset item with role 'user' and content 'What is the weather in San Francisco?' 9:21 ROUTE-5_9-21.png
ROUTE-6 Successfully created run config 'Test Run Alpha' with Activities Planner agent selected 20:03 ROUTE-6_20-03.png
ROUTE-7 Run detail page shows progress tracking with 'Run in progress' banner, progress bar, and test cases table 20:29 ROUTE-7_20-29.png
ROUTE-8 Observed auto-refresh on run detail page with timestamp progressing from 'just now' to '2m ago' 22:38 ROUTE-8_22-38.png
ROUTE-9 Verified search filter, Show/Hide Filters toggle, Output Status filter, and Clear Filters button 25:54 ROUTE-9_25-54.png
ROUTE-10 DatasetItemViewDialog opened showing full Input messages with role and content 27:12 ROUTE-10_27-12.png
ROUTE-12 View Evaluation Job button appears on run detail page when evaluators are attached 42:08 ROUTE-12_42-08.png
ROUTE-13 Run detail page shows dual progress tracking for Test cases and Evaluations 42:08 ROUTE-13_42-08.png
ROUTE-14 Runs list shows 'Test Run Alpha' with relative creation timestamp and chevron icon 20:04 ROUTE-14_20-04.png
ROUTE-15 Run At column shows local timezone format, Created shows relative timestamp with clock icon 20:32 ROUTE-15_20-32.png
ROUTE-16 Runs tab empty state showing 'No runs yet' text and 'Add first run' button 48:32 ROUTE-16_48-32.png
ROUTE-17 Back to test suite button navigates to dataset page with Runs tab selected 32:32 ROUTE-17_32-32.png
ROUTE-18 Run config form showed 'Loading agents...' and 'Loading evaluators...' during data load 48:53 ROUTE-18_48-53.png
EDGE-1 Triggered run on empty dataset, graceful handling with 'No items found' message 52:58 EDGE-1_52-58.png
EDGE-3 Validation error 'Name is required' displayed when submitting empty name 49:39 EDGE-3_49-39.png
EDGE-4 Run detail page shows pending items correctly with 'Processing...' spinner and 'Pending...' text 20:30 EDGE-4_20-30.png
EDGE-6 Created Run B and Run C in quick succession, both appear as separate entries 60:04 EDGE-6_60-04.png
EDGE-7 Tab state persists via URL query parameter ?tab=runs 10:18 EDGE-7_10-18.png
EDGE-8 Complex message content formats all display correctly in run detail table 60:43 EDGE-8_60-43.png
EDGE-9 Search for non-matching term shows 'No test cases match the current filters' message 33:09 EDGE-9_33-09.png
EDGE-10 Long input text truncated at ~100 chars with ellipsis, dialog shows full content 60:46 EDGE-10_60-46.png
EDGE-11 Runs list shows skeleton loading placeholders during data fetch 65:32 EDGE-11_65-32.png
ADV-1 XSS payload rendered as plain escaped text, no script execution 68:50 ADV-1_68-50.png
ADV-2 Non-existent run ID shows Error card with HTTP 404 Not Found 69:57 ADV-2_69-57.png
ADV-3 Invalid tab query parameter falls back gracefully, tab switching works normally 10:56 ADV-3_10-56.png
ADV-4 Dev mode auto-authenticates, no redirect to login page 0:00 ADV-4_0-00.png
ADV-5 Rapid double-click on Create Run button prevented duplicate creation 19:26 ADV-5_19-26.png
❌ Failed (1)
Test Case Summary Timestamp Screenshot
EDGE-2 Form submitted successfully with 0 agents selected - expected validation error but got success 50:26 EDGE-2_50-26.png
EDGE-2: Run config form with no agents selected validation – Failed
  • Where: Dataset run config creation form dialog

  • Steps to reproduce:

    1. Navigate to a dataset's Runs tab
    2. Click 'Add first run' or 'New run' button
    3. Enter a name in the Name field (e.g., 'Validation Test Run')
    4. Do NOT select any agents from the Agents multi-selector
    5. Click 'Create Run' button
  • What failed: Expected a validation error preventing form submission when no agents are selected. Instead, the form submitted successfully, creating a run with 0 agents. The success toast 'Run config created successfully' appeared.

  • Code analysis: Examined the form validation schema and found the root cause. The UI shows the Agents field with an isRequired indicator (asterisk), but the Zod validation schema does not enforce a minimum of one agent.

  • Relevant code:

    agents-manage-ui/src/components/datasets/form/dataset-run-config-validation.ts (lines 3–8)

    export const datasetRunConfigSchema = z.object({
      name: z.string().min(1, 'Name is required'),
      description: z.string().optional(),
      agentIds: z.array(z.string()).default([]),  // Bug: missing .min(1) validation
      evaluatorIds: z.array(z.string()).default([]),
    });

    agents-manage-ui/src/components/datasets/form/dataset-run-config-form.tsx (lines 177–181)

    <FormItem>
      <div className="flex items-center gap-2">
        <FormLabel isRequired>Agents</FormLabel>  {/* Shows required indicator */}
        <Badge variant="count">{(agentIds as string[]).length}</Badge>
      </div>
  • Why this is likely a bug: The UI displays an asterisk (isRequired) on the Agents label indicating it's a required field, but the Zod schema only uses .default([]) without .min(1, ...). This creates a mismatch where users see a required indicator but can submit without selecting any agents. The fix is to change line 6 to: agentIds: z.array(z.string()).min(1, 'At least one agent is required').

  • Introduced by this PR: Yes – this PR modified the relevant code. This PR re-enabled the dataset run configs routes and modified the dataset run config actions. While the validation file itself may not be new, the feature re-enablement means this validation gap is now exposed to users.

  • Timestamp: 50:26
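The proposed one-line zod change (`agentIds: z.array(z.string()).min(1, 'At least one agent is required')`) is equivalent to the following plain-TypeScript check, shown here only to make the expected behavior concrete:

```typescript
// Plain-TS equivalent of adding .min(1, ...) to the agentIds schema:
// returns the validation message when no agents are selected, null otherwise.
function validateAgentIds(agentIds: string[]): string | null {
  return agentIds.length >= 1 ? null : 'At least one agent is required';
}
```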

📋 View Recording

Screen Recording

@shagun-singh-inkeep shagun-singh-inkeep force-pushed the dataset-reimplementation branch from d4643e1 to 6b72a69 Compare March 3, 2026 21:53
Made-with: Cursor
@itoqa

itoqa Bot commented Mar 10, 2026

Ito Test Report ❌

11 test cases ran. 10 passed, 1 failed.

The run validated core feedback, branch, and dataset flows that were executable in this environment. One user-facing defect was confirmed through code inspection: invalid feedback query parameters are forwarded without bounds sanitization, which can trigger a hard load error instead of graceful coercion.

✅ Passed (10)
Test Case Summary Timestamp Screenshot
ROUTE-1 Feedback page loaded at /default/projects/default/feedback without runtime crash and displayed a valid empty state. 0:00 ROUTE-1_0-00.png
ROUTE-2 Created positive message-scoped feedback via localhost API fallback and verified the positive row with messageId renders in Feedback UI. 10:45 ROUTE-2_10-45.png
ROUTE-5 UI delete removed the feedback row and repeat delete via API returned not found, confirming non-false-success behavior after deletion. 10:45 ROUTE-5_10-45.png
ROUTE-8 Clean branch merge API returned success and no conflicts. 14:07 ROUTE-8_14-07.png
ROUTE-10 Non-main branch deletion succeeded and protected main-branch deletion was correctly rejected. 14:07 ROUTE-10_14-07.png
ROUTE-11 Created a dataset run config with an agent relation and verified automatic run creation in UI; API trigger endpoint returned 202 with datasetRunId. 38:01 ROUTE-11_38-01.png
ROUTE-12 Run detail showed consistent status and counters, and dataset-runs items API returned 200 with matching datasetRunId, status, and attempt fields for all items. 38:16 ROUTE-12_38-16.png
EDGE-1 Branches page rendered a valid empty state with 'No branches' messaging and no broken table artifacts. 14:07 EDGE-1_14-07.png
ADV-2 Rapid repeated clicks on merge and delete confirmations produced one effective mutation each pending cycle; UI prevented duplicate destructive requests and ended with a single branch deletion outcome. 25:53 ADV-2_25-53.png
ADV-3 Unauthorized feedback create and branch merge mutation calls were both denied with 401 responses, confirming mutation boundaries were enforced. 42:23 ADV-3_42-23.png
❌ Failed (1)
Test Case Summary
EDGE-2 Invalid feedback query parameters rendered a failed-load state instead of being safely coerced to valid pagination bounds.
EDGE-2: Feedback page query-param coercion – Failed
  • Where: Feedback route (/{tenantId}/projects/{projectId}/feedback) server-side load path.

  • Steps to reproduce: Open the feedback page with out-of-range pagination params (for example ?page=999999&limit=100000).

  • What failed: The page passes unbounded numeric query values directly to the API, receives a validation error for oversized limit, and falls into the full-page error state instead of coercing inputs to safe bounds.

  • Code analysis: The page parser only checks numeric parse/finite-ness, not bounds; the API route validates query params with strict pagination schema, so oversized values are rejected and bubble up to error UI.

  • Relevant code:

    agents-manage-ui/src/app/[tenantId]/projects/[projectId]/feedback/page.tsx (lines 31-39)

    const pageNumber = page ? Number.parseInt(page, 10) : 1;
    const limitNumber = limit ? Number.parseInt(limit, 10) : 25;
    
    const response = await fetchFeedback(tenantId, projectId, {
      conversationId,
      messageId,
      page: Number.isFinite(pageNumber) ? pageNumber : 1,
      limit: Number.isFinite(limitNumber) ? limitNumber : 25,
    });

    agents-api/src/domains/manage/routes/feedback.ts (lines 39-43)

    query: PaginationQueryParamsSchema.extend({
      conversationId: z.string().optional().describe('Optionally filter by conversation ID'),
      messageId: z.string().optional().describe('Optionally filter by message ID'),
    }),

    agents-api/src/domains/manage/routes/feedback.ts (lines 58-65)

    const { conversationId, messageId, page = 1, limit = 10 } = c.req.valid('query');
    
    const result = await listFeedback(runDbClient)({
      scopes: { tenantId, projectId },
      conversationId,
      messageId,
      pagination: { page, limit },
    });
  • Why this is likely a bug: The UI path explicitly intends query-param handling for feedback pagination, but out-of-range values are not sanitized before strict API validation, producing a user-visible load failure.

  • Introduced by this PR: Yes - this PR modified the relevant code.
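One way to close the gap, sketched with assumed bounds (the API's real pagination limits may differ): clamp parsed values into a safe range before calling fetchFeedback, instead of forwarding them raw.

```typescript
// Hypothetical coercion helper for the feedback page. MAX_PAGE and
// MAX_LIMIT are illustrative; they should mirror the API's
// PaginationQueryParamsSchema bounds.
const MAX_PAGE = 10000;
const MAX_LIMIT = 100;

function coercePagination(page?: string, limit?: string) {
  const p = Number.parseInt(page ?? '', 10);
  const l = Number.parseInt(limit ?? '', 10);
  return {
    page: Number.isFinite(p) ? Math.min(Math.max(p, 1), MAX_PAGE) : 1,
    limit: Number.isFinite(l) ? Math.min(Math.max(l, 1), MAX_LIMIT) : 25,
  };
}
```

With coercion in place, `?page=999999&limit=100000` degrades to in-range values and renders a (possibly empty) page rather than the full-page error state.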

📋 View Recording

Screen Recording

