User Guide
Evaluation Studio (Eval Studio) is for running structured chat tests, reviewing other testers’ chats, and recording ratings with short evidence-based notes. Skim the headings, then follow the step-by-step lists.
Quick start (2 minutes): fastest path
If you only do one thing: capture a realistic prompt, then leave a short rating with evidence.
- Log in: sign in as a Cognito user who is authorized for Eval Studio.
- Open Chat: send a realistic prompt that matches a real user goal.
- Rate the assistant response: click Rate this response and add 1–3 sentences of evidence-based feedback.
- Optional: if the issue spans multiple turns, also click Rate this chat.
Image placeholder: “Where to rate” screenshot
Drop a screenshot showing the “Rate this response” button and “Rate this chat” toolbar button.
Chat: create sessions, rate responses
Use Chat to create sessions, explore behavior, and record ratings.
- Start a new chat: click New Chat (left sidebar) to reset the thread.
- Pick a speed tier: choose Fast / Medium / Slow before sending.
- Send a message: type in the input box and click Send (or press Enter).
- Inspect workflow when needed: toggle Show workflow to show/hide intermediate steps under assistant messages.
- Rate a response (message-level): under an assistant message, click Rate this response. After you rate, the button changes to Edit the response.
- Rate the whole chat (session-level): click Rate this chat in the toolbar.
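The two rating levels differ in what a rating attaches to: a response rating targets one assistant message, while a chat rating targets the whole session. The TypeScript sketch below illustrates that distinction and the button-label change. The names `RatingTarget`, `ratingKey`, and `responseButtonLabel`, and the key format, are hypothetical; this is not Eval Studio's actual data model or API.

```typescript
// Hypothetical shape: a rating targets either one assistant message or the whole session.
type RatingTarget =
  | { scope: "response"; sessionId: string; messageId: string }
  | { scope: "chat"; sessionId: string };

// Label under an assistant message, depending on whether you have already rated it.
function responseButtonLabel(alreadyRated: boolean): string {
  return alreadyRated ? "Edit the response" : "Rate this response";
}

// Hypothetical key: one rating slot per assistant message, plus one per session.
function ratingKey(target: RatingTarget): string {
  return target.scope === "response"
    ? `${target.sessionId}/${target.messageId}`
    : `${target.sessionId}/chat`;
}
```

Under this model, rating a response and rating the chat never overwrite each other, which matches the guide's advice to add a chat rating only when an issue spans multiple turns.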
Image placeholder: Chat layout
Drop a screenshot showing: workflow toggle, message rating button, and session rating button.
Peer Review: read full context, rate responsibly
Use Peer Review to rate chats created by other testers.
- Open Peer Review: go to Peer Review to see assistant messages from other testers.
- Filter: use Unrated only to focus on new work; filter by Models and Users when investigating specific behavior.
- Open a session: click a list item to open the full thread.
- Show workflow: in the detail view, toggle Show workflow to reveal intermediate steps under assistant turns.
- Rate responses: use Rate this response / Edit the response on assistant messages in the thread.
- Rate the chat: use Rate this chat in the header to rate the overall session.
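As a rough mental model of the Peer Review filters, the sketch below assumes that all active filters must match (AND), that an empty Models or Users selection means no restriction, and that your own messages are hidden because Peer Review lists other testers' work. The `ReviewItem` and `ReviewFilters` shapes and `filterReviewItems` are hypothetical and not the app's real code.

```typescript
// Hypothetical shape of one assistant message in the Peer Review list.
interface ReviewItem {
  sessionId: string;
  model: string;
  user: string;
  rated: boolean;
}

// Hypothetical filter state mirroring the Unrated only, Models, and Users controls.
interface ReviewFilters {
  unratedOnly: boolean;
  models: string[]; // empty array = no model restriction
  users: string[]; // empty array = no user restriction
}

// Assumption: every active filter must match, and the current tester's own messages are excluded.
function filterReviewItems(items: ReviewItem[], filters: ReviewFilters, currentUser: string): ReviewItem[] {
  return items.filter(
    (item) =>
      item.user !== currentUser &&
      (!filters.unratedOnly || !item.rated) &&
      (filters.models.length === 0 || filters.models.indexOf(item.model) !== -1) &&
      (filters.users.length === 0 || filters.users.indexOf(item.user) !== -1)
  );
}
```

For example, turning on Unrated only with no model or user selected would leave only other testers' unrated messages.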
Image placeholder: Peer review detail
Drop a screenshot showing the workflow toggle and rating buttons inside the thread + header.
Feedback template (easy + short): copy/paste
If you’re not sure what to write, follow this structure.
- What happened: “The assistant said: ‘…’”
- Why it’s a problem: “This is unsafe/incorrect because …”
- Better response: “It should instead …”
Image placeholder: Rating modal
Drop a screenshot of the rating modal and highlight where to put evidence.
Troubleshooting: quick checks
- Rate this chat button does nothing: confirm that the page URL includes sessionId=.... If it does and the API call fails, an error message appears above the thread.
- No workflow shown: make sure Show workflow is toggled on. The workflow appears only when intermediate steps were captured for that assistant turn, so some turns have none.
- Model tier confusion: Fast, Medium, and Slow map to backend default models unless environment variables override them.
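The first and last checks above can be sketched in TypeScript. `sessionIdFromUrl` assumes sessionId is carried in the URL query string (for example `?sessionId=abc123`), and `resolveTierModel` shows the override-then-default lookup. The environment variable names and default model IDs are made up for illustration; they are not Eval Studio's real configuration.

```typescript
// Returns the sessionId query parameter, or null when it is missing or empty.
function sessionIdFromUrl(href: string): string | null {
  const value = new URL(href).searchParams.get("sessionId");
  return value ? value : null;
}

type SpeedTier = "fast" | "medium" | "slow";

// Hypothetical env var names, for illustration only.
const TIER_ENV_VARS: Record<SpeedTier, string> = {
  fast: "EVAL_MODEL_FAST",
  medium: "EVAL_MODEL_MEDIUM",
  slow: "EVAL_MODEL_SLOW",
};

// Hypothetical backend default model IDs, for illustration only.
const BACKEND_DEFAULTS: Record<SpeedTier, string> = {
  fast: "default-fast-model",
  medium: "default-medium-model",
  slow: "default-slow-model",
};

// A set, non-empty env var override wins; otherwise the backend default applies.
function resolveTierModel(tier: SpeedTier, env: Record<string, string | undefined>): string {
  const override = env[TIER_ENV_VARS[tier]];
  return override ? override : BACKEND_DEFAULTS[tier];
}
```

In a browser, passing `window.location.href` to `sessionIdFromUrl` returns null when the page has no session open, which is the case where Rate this chat has nothing to rate.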