zjunlp/Chat2Workflow-Evaluation · Datasets at Hugging Face

Dataset Viewer (auto-converted to Parquet)
| Column | Type | Statistics |
|---|---|---|
| task | stringclasses | 27 values |
| round | int64 | 1 to 4 |
| node_selection | stringlengths | 8 to 805 |
| design_principle | stringlengths | 132 to 6.1k |
| workflow | stringlengths | 506 to 14.9k |
| valid | bool | 2 classes |
| has_been_judged | bool | 2 classes |
| reason | stringlengths | 198 to 14.9k |
| api_key | stringlengths | 28 to 28 |
| test1 | bool | 2 classes |
| llm_evaluate_reason_test1 | stringlengths | 375 to 1.99k |
| llm_output_test1 | stringlengths | 10 to 165k |
| test2 | bool | 2 classes |
| llm_evaluate_reason_test2 | stringlengths | 397 to 1.88k |
| llm_output_test2 | stringlengths | 10 to 167k |
| test3 | bool | 2 classes |
| llm_evaluate_reason_test3 | stringlengths | 403 to 1.74k |
| llm_output_test3 | stringlengths | 10 to 176k |
| fail_step | stringclasses | 4 values |
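In the preview that follows, each record is flattened into 19 consecutive lines, one per column in the schema order, with `null` marking an empty cell. A minimal Python sketch for regrouping such a dump into dictionaries, assuming exactly one line per cell; the names `parse_cell` and `parse_preview` are hypothetical helpers, not part of any Hugging Face API. For real analysis, loading the auto-converted Parquet files (for example through the `datasets` library) is likely preferable to parsing the viewer text.

```python
from typing import Any, Dict, List, Optional

# Column order of the dataset, as listed in the schema (19 columns).
COLUMNS: List[str] = [
    "task", "round", "node_selection", "design_principle", "workflow",
    "valid", "has_been_judged", "reason", "api_key",
    "test1", "llm_evaluate_reason_test1", "llm_output_test1",
    "test2", "llm_evaluate_reason_test2", "llm_output_test2",
    "test3", "llm_evaluate_reason_test3", "llm_output_test3",
    "fail_step",
]
BOOL_COLUMNS = {"valid", "has_been_judged", "test1", "test2", "test3"}


def parse_cell(column: str, raw: str) -> Optional[Any]:
    """Convert one preview cell to a Python value; the literal "null" becomes None."""
    raw = raw.strip()
    if raw == "null":
        return None
    if column in BOOL_COLUMNS:
        if raw not in ("true", "false"):
            raise ValueError(f"{column}: expected true/false, got {raw!r}")
        return raw == "true"
    if column == "round":  # the only int64 column
        return int(raw)
    return raw  # every other column is a string


def parse_preview(lines: List[str]) -> List[Dict[str, Any]]:
    """Regroup a flat list of cell lines into one dict per record."""
    width = len(COLUMNS)
    if len(lines) % width != 0:
        raise ValueError(f"{len(lines)} lines is not a multiple of {width}")
    return [
        {col: parse_cell(col, cell) for col, cell in zip(COLUMNS, lines[i:i + width])}
        for i in range(0, len(lines), width)
    ]
```

Applied to the `Code` round-2 record, for instance, this yields `round == 2`, `valid is False`, `has_been_judged is True`, `fail_step == "step_1_4"`, and `None` for `api_key` and every test field.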
BatchFiles
1
Start, Iteration, Iteration-Start, Document Extractor, LLM (3 instances), Template (2 instances), Markdown Exporter, End
The workflow needs to process multiple documents iteratively, performing three sequential LLM operations on each document, then aggregating all results into a formatted output. **Workflow Structure:** 1. **Start Node (id: 1)**: Accepts an array of documents via variable `doc_list` 2. **Iteration Node (id: 2)**: Proces...
{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["doc_list", "array[file]"] ] } }, { "id": "2", "type": "iteration", "param": { "iterator_selector": ["doc_list", "1"], "output_selector": ["output"...
true
true
Rule 1: Ground-Truth Node Coverage. gt_nodes: ['llm', 'iteration', 'template-transform', 'markdown-exporter', 'document-extractor']. node_selection: Start, Iteration, Iteration-Start, Document Extractor, LLM (3 instances), Template (2 instances), Markdown Exporter, End. Check each gt_node type against node_selection: ...
app-8h02nUV03I5SVii8l6Lo2eHS
true
The latest instruction in the queries requires the workflow to process a list of documents, parse their structure, analyze the main content and function of each part, and summarize the articles. The summaries of all documents should be aggregated into a single text through template conversion, which is then converted ...
summary: # Document Summaries Report ## Document 1 Summary Here is a **comprehensive, integrated summary** of the Reuters article—synthesizing its structure, content, purpose, and significance—based on your detailed analyses: --- ### **Article Summary: “Daniel Radcliffe Gains Access to £20M Fortune on 18th Birthday...
true
The latest instruction in the queries requires the workflow to process a list of documents, parse their structure, analyze their main content and function, and summarize each document. The summaries should then be aggregated into a single text, converted into Markdown format, and output as both a summary text and a DO...
summary: # Document Summaries Report ## Document 1 Summary Here is a **comprehensive, integrated summary** of the CNN news report on the **I-35W Mississippi River Bridge collapse in Minneapolis (August 1, 2007)**, synthesizing its structure, content, journalistic strategy, and thematic resonance—based on your detaile...
false
null
null
null
BatchFiles
2
Start, Iteration, Iteration-Start, LLM, Template (2 instances), Markdown Exporter, End
The workflow is modified to process images instead of documents. Since images can be directly processed by multimodal LLMs without requiring extraction, the Document Extractor node is removed. **Key Changes:** 1. **Start Node**: Changed input variable from `doc_list` (array[file]) to `pic_list` (array[file]) for image...
{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["pic_list", "array[file]"] ] } }, { "id": "2", "type": "iteration", "param": { "iterator_selector": ["pic_list", "1"], "output_selector": ["output"...
true
true
Rule 1: Ground-Truth Node Coverage gt_nodes: ['llm', 'iteration', 'template-transform', 'markdown-exporter'] node_selection: Start, Iteration, Iteration-Start, LLM, Template (2 instances), Markdown Exporter, End Check if all node types in gt_nodes are present in node_selection: - 'llm' is present as 'LLM' (case-insens...
app-3qrvKyV399A8VhciEahrzJK8
true
The latest instruction in the queries (query2) requires the workflow to summarize the content of the images one by one. The output provided contains detailed summaries for three images, each structured in a clear and organized manner. The summaries include key information such as the title, authors, core objectives, m...
summary: # Image Summaries Report ## Image 1 Summary This image is a **conference paper page** (likely from ICLR, as indicated by the logo) titled: **“A Fast and Provable Algorithm for Sparse Phase Retrieval”** by Jian-Feng Cai, Yu Long, Ruixue Wen, and Jiaxi Ying. Code: https://github.com/jxying/SparsePR The p...
true
The latest instruction in the queries (query2) changes the input to a list of images (variable pic_list) and tasks the workflow with summarizing the content of the images one by one. The output provided (summary) contains detailed summaries of two images, which aligns with the requirement to summarize the content of t...
summary: # Image Summaries Report ## Image 1 Summary This image is a research poster presented at the **ICML 2024** (International Conference on Machine Learning), titled: **“Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency”** Authored by Runqi Lin, Chaojian Yu, Bo H...
true
The latest instruction in the queries (query2) requires the workflow to summarize the content of a list of images one by one. The output provided (summary) contains detailed summaries of two images, each structured with clear headings and subheadings that break down the content into sections such as "Core Problem & Mo...
summary: # Image Summaries Report ## Image 1 Summary This image is a **conference poster** titled: > **“Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning”** Authored by: - Yulai Zhao (Princeton University) - Zhuoran Yang (Yale University) - Zhaoran Wang (Northwestern University...
null
BatchFiles
3
Start, List Operator (2 instances), If-Else, Iteration (2 instances), Iteration-Start (2 instances), Document Extractor, LLM (2 instances), Template (4 instances), Variable Aggregator, Markdown Exporter, End
The workflow now needs to handle mixed file types (documents and images) and process them differently based on their type. This requires filtering the input array and creating separate processing paths. **Workflow Structure:** 1. **Start Node (id: 1)**: Accepts `file_list` as array[file] containing both documents and...
{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["file_list", "array[file]"] ] } }, { "id": "2", "type": "list-operator", "param": { "variable": ["file_list", "1"], "operator": ["filter_by", "in",...
true
true
First, check Rule 1: Ground-Truth Node Coverage. The gt_nodes are: ['llm', 'iteration', 'template-transform', 'markdown-exporter', 'document-extractor', 'list-operator']. The node_selection includes: Start, List Operator (2 instances), If-Else, Iteration (2 instances), Iteration-Start (2 instances), Document Extractor...
app-lqcV7Nv9NUnfig69kcL3YERK
false
The latest instruction in the queries (query3) requires the workflow to handle a mixed input of documents and images (variable file_list), sorting them by category and processing accordingly. However, the provided output only contains a summary of a single document, which does not align with the requirement to process...
summary: # File Summaries Report ## Document 1 Summary **Comprehensive Summary of the Document:** On Friday, the White House announced that Press Secretary **Tony Snow** would step down from his position on **September 14**, to be succeeded by his deputy, **Dana Perino**. Snow, who has been undergoing treatment for ...
false
The latest instruction in the queries (query3) requires the input to contain both documents and images (variable file_list) and to process them according to their category. However, the input provided is empty, which means the workflow did not receive any files to process. Despite this, the output contains a summary o...
summary: # File Summaries Report ## Document 1 Summary **Comprehensive Summary of the CNN Report:** On Friday morning, Niranjan Desai, a resident of Jersey City, New Jersey, discovered an empty **AT4 anti-tank rocket launcher tube** lying on her front lawn. The device—a 20-year-old, one-time-use shoulder-fired weapo...
false
The latest instruction in the queries (query3) requires the workflow to handle a mixed input of documents and images (variable file_list), sort them by category, and process them accordingly. However, the provided output only contains a summary of a single document, which does not align with the requirement to process...
summary: # File Summaries Report ## Document 1 Summary This CNN report details President George W. Bush’s strategic effort to reframe the national debate over the Iraq War ahead of a pivotal progress assessment by General David Petraeus and Ambassador Ryan Crocker in September 2007. Facing declining public support (6...
null
BookCharacter
1
Start, LLM, End
This workflow is designed to create a "three-minute speed-read" summary of a book based on user input: 1. **Start Node**: Collects two input variables from the user: - `book`: The title of the book - `author`: The author's name 2. **LLM Node**: Uses a language model to generate a comprehensive three-minute summ...
{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["book", "string"], ["author", "string"] ] } }, { "id": "2", "type": "llm", "param": { "system": "You are an expert book summarizer who creates co...
true
true
First, evaluate Rule 1: Ground-Truth Node Coverage. gt_nodes is ['llm']. node_selection is Start, LLM, End. The node type 'llm' is present in node_selection (as "LLM", ignoring case). Therefore, all gt_nodes are covered. Rule 1 satisfied. Next, evaluate Rule 2: Consistency and Exact Node Matching. a. Logical Consiste...
app-n6o7iiT1DvbQwqyuq0CExxct
true
The workflow execution aligns with the requirements of the current round instruction. The instruction was to create a 'Finish a Book in Three Minutes' workflow that generates a three-minute speed-read version of the book's content based on the provided book title and author name. The output includes a detailed summary...
result: Absolutely! Here's a **three-minute speed-read summary** of *The Three-Body Problem* by Liu Cixin — sharp, vivid, and packed with insight: --- **📖 What It’s About (30 seconds):** *The Three-Body Problem* is the first novel in Liu Cixin’s award-winning *Remembrance of Earth’s Past* trilogy. Set against the ...
true
The workflow execution aligns with the requirements of the current round instruction. The instruction specifies creating a 'Finish a Book in Three Minutes' workflow, where the user provides the book title and author name as input, and the workflow outputs a three-minute speed-read summary of the book's content. The in...
result: Absolutely—here’s a vibrant, three-minute speed-read summary of *One Hundred Years of Solitude* by Gabriel García Márquez: --- **What It’s About:** *One Hundred Years of Solitude* (1967) is the foundational masterpiece of magical realism—a sweeping, multi-generational saga centered on the Buendía family and...
true
The workflow execution aligns with the requirements of the current round instruction. The instruction was to create a 'Finish a Book in Three Minutes' workflow that generates a three-minute speed-read version of the book's content based on the provided book title and author name. The output, stored in the variable 're...
result: Absolutely! Here's a vibrant, three-minute speed-read summary of **Jin Yong’s *Demi-Gods and Semi-Devils***—a masterpiece of wuxia fiction that blends martial arts, philosophy, identity, and tragedy with breathtaking scope. --- ### 📖 **What It’s About** *Demi-Gods and Semi-Devils* (1963) is the second nove...
null
BookCharacter
2
Start, LLM (existing), LLM (new for character biography), Template, Markdown Exporter, End
Building on the existing workflow, I'm adding functionality to generate a character biography for the book's protagonist and export it as a DOCX file: 1. **Existing nodes (Start, first LLM)**: These remain unchanged, collecting book title and author information. 2. **New LLM Node (id: 4)**: Generates a detailed chara...
{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["book", "string"], ["author", "string"] ] } }, { "id": "2", "type": "llm", "param": { "system": "You are an expert book summarizer who creates co...
true
true
Rule 1: Ground-Truth Node Coverage - Check if node types in gt_nodes is a subset of node_selection. gt_nodes: ['llm', 'markdown-exporter']. node_selection: Start, LLM (existing), LLM (new for character biography), Template, Markdown Exporter, End. This includes 'LLM' and 'Markdown Exporter'. Node type only needs to ap...
app-SOsMqULPSOrmphm4YZkJnMEH
true
null
null
false
null
null
true
null
null
null
BookCharacter
3
Start, LLM (for summary), LLM (for character biography), Template, Markdown Exporter, LLM (new for image prompt), Text to Image, End
Building on the existing workflow, I'm adding functionality to generate an illustration based on the character biography: 1. **Existing nodes (Start, LLMs, Template, Markdown Exporter)**: These remain unchanged, generating the character biography. 2. **New LLM Node (id: 7)**: Creates a detailed image generation promp...
{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["book", "string"], ["author", "string"] ] } }, { "id": "2", "type": "llm", "param": { "system": "You are an expert book summarizer who creates co...
true
true
Rule 1: Ground-Truth Node Coverage gt_nodes contains 'llm' and 'text2image'. node_selection lists 'LLM (for summary)', 'LLM (for character biography)', 'LLM (new for image prompt)', and 'Text to Image'. All required node types from gt_nodes ('llm' and 'text2image') are present in node_selection. Rule 1 is satisfied. ...
app-uZt3a57drRRNMKJOstgXF7pB
false
null
null
false
null
null
false
null
null
null
Code
1
Start, Document Extractor, LLM, End
This workflow is designed to translate code from one programming language to another, focusing on generating the main structure in the first stage: 1. **Input Collection (Start Node)**: - Accept three inputs: source_code (file), source_language (string), and target_language (string) - The source_code is a file ...
{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["source_code", "file"], ["source_language", "string"], ["target_language", "string"] ] } }, { "id": "2", "type": "document-extractor", "param":...
true
true
Rule 1: Ground-Truth Node Coverage. The gt_nodes are ['llm', 'document-extractor']. Both 'llm' and 'document-extractor' appear in node_selection: Start, Document Extractor, LLM, End. Rule 1 is satisfied. Rule 2: Consistency and Exact Node Matching. a. Logical Consistency: The workflow design follows the design_princi...
app-QnFPjMCQwsyBDZmdqVP162jE
true
The workflow execution aligns with the requirements of the current round instruction. The task was to translate the original C code into Python, focusing on generating the main structure of the code without providing specific implementations for classes and functions. The output `target_code` correctly includes the si...
target_code: ```python # Dependencies Analysis: # Built-in libraries needed: # - typing: for type hints (List, Optional, etc.) # - copy: potentially for deep copying (though we'll avoid it in structure stage) # # Custom/local dependencies: None (all functionality is self-contained) from typing import List, Optiona...
true
The workflow execution aligns with the requirements of the current round instruction. The instruction specifies that the workflow should translate the original code into the target code, generating only the main structure of the code in the first stage. For classes and functions, only the corresponding signatures shou...
target_code: ```go // Package nqueens solves the N-Queens problem using backtracking. // This file contains the main structural skeleton for the solution in Go. package nqueens import ( // No external dependencies needed for basic structure ) // Note: In Go, we avoid global variables. Instead, we'll use struct-based...
true
The workflow execution aligns with the requirements of the current round instruction. The instruction specifies that the workflow should translate the original code into the target code, generating only the main structure of the code in the first stage. This includes providing function signatures and class declaration...
target_code: ```javascript // Dependencies Analysis: // Built-in libraries: None required for basic structure (JavaScript has no imports for core functionality) // Custom/local dependencies: None (all functions are self-contained in the original C code) /** * N-Queens solver implementation structure in JavaScript. *...
null
Code
2
Start, Document Extractor, LLM, Code, If-Else, Variable Aggregator, End
This workflow translates source code to Python, validates it, executes it, and returns both the code and execution results: 1. **Input Collection (Start Node)**: - Accept two inputs: source_code (file) and case_input (string) - Removed target_language since Python is now the fixed target - Added case_input f...
{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["source_code", "file"], ["source_language", "string"], ["case_input", "string"] ] } }, { "id": "2", "type": "document-extractor", "param": { ...
false
true
First, check Rule 1: Ground-Truth Node Coverage. gt_nodes are ['llm', 'document-extractor', 'code']. node_selection includes Start, Document Extractor, LLM, Code, If-Else, Variable Aggregator, End. Comparing node types (ignoring case): 'document-extractor' matches 'Document Extractor', 'llm' matches 'LLM', 'code' matc...
null
null
null
null
null
null
null
null
null
null
step_1_4
Code
3
Start, Document Extractor, LLM, LLM, Markdown Exporter, End
"This workflow translates source code to Python and generates comprehensive explanations in Markdown(...TRUNCATED)
"{\n \"nodes_info\": [\n {\n \"id\": \"1\",\n \"type\": \"start\",\n \"param\": {(...TRUNCATED)
true
true
"\nFirst, check Rule 1: Ground-Truth Node Coverage. The gt_nodes are ['llm', 'document-extractor', '(...TRUNCATED)
app-RNTUf02oYPjCfPOOXszfsWAF
true
"\nThe latest instruction in the queries (query3) requires the workflow to generate an explanation p(...TRUNCATED)
"explanation: # Comprehensive Explanation of N-Queens Solver\n\n## 1. Overview\n\n### What the Code (...TRUNCATED)
true
"\nThe latest instruction in the queries (query3) requires the workflow to generate an explanation p(...TRUNCATED)
"explanation: # Sudoku Solver Implementation Documentation\n\nThis document provides a comprehensive(...TRUNCATED)
true
"\nThe latest instruction in the queries (query3) requires the workflow to generate an explanation p(...TRUNCATED)
"explanation: # Comprehensive Code Explanation: Floyd-Warshall All-Pairs Shortest Path Implementatio(...TRUNCATED)
null
ContractReview
1
1. Start 2. Document Extractor 3. Parameter Extractor 4. LLM 5. Template 6. End
"This workflow is designed to perform a comprehensive contract risk review by following these steps:(...TRUNCATED)
"{\n \"nodes_info\": [\n {\n \"id\": \"1\",\n \"type\": \"start\",\n \"param\": {(...TRUNCATED)
true
true
"\nFirst, apply Rule 1: Ground-Truth Node Coverage.\ngt_nodes: ['llm', 'document-extractor', 'templa(...TRUNCATED)
app-pVpPU65MQPdDmWfNMVoGy8Nu
true
"\nThe workflow execution aligns well with the requirements of the current round instruction. The in(...TRUNCATED)
"result: # CONTRACT RISK REVIEW REPORT\n\n## Review Information\n- **Reviewing Party**: Tenant\n- **(...TRUNCATED)
true
"\nThe workflow execution aligns well with the requirements of the current round instruction. The in(...TRUNCATED)
"result: # CONTRACT RISK REVIEW REPORT\n\n## Review Information\n- **Reviewing Party**: Recipient\n-(...TRUNCATED)
true
"\nThe workflow execution aligns with the requirements of the current round instruction as specified(...TRUNCATED)
"result: # CONTRACT RISK REVIEW REPORT\n\n## Review Information\n- **Reviewing Party**: Sponsor\n- *(...TRUNCATED)
null
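The `reason` cells record a rule-based check whose first step, "Rule 1: Ground-Truth Node Coverage", asks whether every node type in `gt_nodes` appears in `node_selection`, ignoring case and annotations such as "(3 instances)". A minimal Python sketch of that check follows. It is not the benchmark's evaluator: the alias table (`Template` to `template-transform`, `Text to Image` to `text2image`) and the normalization regexes are assumptions inferred from the preview rows, and `workflow_node_types` assumes the `nodes_info` list of `{"id", "type", "param"}` objects visible in the truncated `workflow` cells.

```python
import json
import re
from typing import Iterable, List, Set

# Hypothetical aliases: display names whose lower-cased, hyphenated form
# differs from the node-type strings used in gt_nodes.
ALIASES = {
    "template": "template-transform",
    "text-to-image": "text2image",
}


def normalize_selection(node_selection: str) -> Set[str]:
    """Map a node_selection string to a set of node-type strings.

    Accepts comma-separated lists ("Start, LLM (3 instances), End") and
    numbered lists ("1. Start 2. LLM 3. End").
    """
    types: Set[str] = set()
    for part in re.split(r",|\d+\.\s+", node_selection):
        # Drop annotations such as "(3 instances)" or "(new for image prompt)".
        name = re.sub(r"\(.*?\)", "", part).strip()
        if not name:
            continue
        key = re.sub(r"\s+", "-", name.lower())
        types.add(ALIASES.get(key, key))
    return types


def missing_gt_nodes(gt_nodes: Iterable[str], node_selection: str) -> List[str]:
    """Rule 1 sketch: ground-truth node types absent from node_selection."""
    selected = normalize_selection(node_selection)
    return sorted({node.lower() for node in gt_nodes} - selected)


def covers_ground_truth(gt_nodes: Iterable[str], node_selection: str) -> bool:
    """True when every ground-truth node type is covered."""
    return not missing_gt_nodes(gt_nodes, node_selection)


def workflow_node_types(workflow_json: str) -> Set[str]:
    """Collect the "type" field of every entry in a workflow's nodes_info list."""
    workflow = json.loads(workflow_json)
    return {node["type"] for node in workflow.get("nodes_info", [])}
```

On the BatchFiles round-3 row, for example, `list-operator` is matched by "List Operator (2 instances)", so the coverage check passes.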
README.md exists but content is empty.
Downloads last month: 93
