zjunlp/Chat2Workflow-Evaluation · Datasets at Hugging Face

Split (1)

test · 3.56k rows

task (string, 27 classes)	round (int64, 1 to 4)	node_selection (string, 8 to 805 chars, nullable)	design_principle (string, 132 to 6.1k chars, nullable)	workflow (string, 506 to 14.9k chars, nullable)	valid (bool)	has_been_judged (bool)	reason (string, 198 to 14.9k chars, nullable)	api_key (string, 28 chars, nullable)	test1 (bool)	llm_evaluate_reason_test1 (string, 375 to 1.99k chars, nullable)	llm_output_test1 (string, 10 to 165k chars, nullable)	test2 (bool)	llm_evaluate_reason_test2 (string, 397 to 1.88k chars, nullable)	llm_output_test2 (string, 10 to 167k chars, nullable)	test3 (bool)	llm_evaluate_reason_test3 (string, 403 to 1.74k chars, nullable)	llm_output_test3 (string, 10 to 176k chars, nullable)	fail_step (string, 4 classes)
BatchFiles	1	Start, Iteration, Iteration-Start, Document Extractor, LLM (3 instances), Template (2 instances), Markdown Exporter, End	The workflow needs to process multiple documents iteratively, performing three sequential LLM operations on each document, then aggregating all results into a formatted output. Workflow Structure: 1. Start Node (id: 1): Accepts an array of documents via variable `doc_list` 2. Iteration Node (id: 2): Proces...	{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["doc_list", "array[file]"] ] } }, { "id": "2", "type": "iteration", "param": { "iterator_selector": ["doc_list", "1"], "output_selector": ["output"...	true	true	Rule 1: Ground-Truth Node Coverage. gt_nodes: ['llm', 'iteration', 'template-transform', 'markdown-exporter', 'document-extractor']. node_selection: Start, Iteration, Iteration-Start, Document Extractor, LLM (3 instances), Template (2 instances), Markdown Exporter, End. Check each gt_node type against node_selection: ...	app-8h02nUV03I5SVii8l6Lo2eHS	true	The latest instruction in the queries requires the workflow to process a list of documents, parse their structure, analyze the main content and function of each part, and summarize the articles. The summaries of all documents should be aggregated into a single text through template conversion, which is then converted ...	summary: # Document Summaries Report ## Document 1 Summary Here is a comprehensive, integrated summary of the Reuters article—synthesizing its structure, content, purpose, and significance—based on your detailed analyses: --- ### **Article Summary: “Daniel Radcliffe Gains Access to £20M Fortune on 18th Birthday...	true	The latest instruction in the queries requires the workflow to process a list of documents, parse their structure, analyze their main content and function, and summarize each document. 
The summaries should then be aggregated into a single text, converted into Markdown format, and output as both a summary text and a DO...	summary: # Document Summaries Report ## Document 1 Summary Here is a comprehensive, integrated summary of the CNN news report on the I-35W Mississippi River Bridge collapse in Minneapolis (August 1, 2007), synthesizing its structure, content, journalistic strategy, and thematic resonance—based on your detaile...	false	null	null	null
BatchFiles	2	Start, Iteration, Iteration-Start, LLM, Template (2 instances), Markdown Exporter, End	The workflow is modified to process images instead of documents. Since images can be directly processed by multimodal LLMs without requiring extraction, the Document Extractor node is removed. Key Changes: 1. Start Node: Changed input variable from `doc_list` (array[file]) to `pic_list` (array[file]) for image...	{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["pic_list", "array[file]"] ] } }, { "id": "2", "type": "iteration", "param": { "iterator_selector": ["pic_list", "1"], "output_selector": ["output"...	true	true	Rule 1: Ground-Truth Node Coverage gt_nodes: ['llm', 'iteration', 'template-transform', 'markdown-exporter'] node_selection: Start, Iteration, Iteration-Start, LLM, Template (2 instances), Markdown Exporter, End Check if all node types in gt_nodes are present in node_selection: - 'llm' is present as 'LLM' (case-insens...	app-3qrvKyV399A8VhciEahrzJK8	true	The latest instruction in the queries (query2) requires the workflow to summarize the content of the images one by one. The output provided contains detailed summaries for three images, each structured in a clear and organized manner. The summaries include key information such as the title, authors, core objectives, m...	summary: # Image Summaries Report ## Image 1 Summary This image is a conference paper page (likely from ICLR, as indicated by the logo) titled: “A Fast and Provable Algorithm for Sparse Phase Retrieval” by Jian-Feng Cai, Yu Long, Ruixue Wen, and Jiaxi Ying. Code: https://github.com/jxying/SparsePR The p...	true	The latest instruction in the queries (query2) changes the input to a list of images (variable pic_list) and tasks the workflow with summarizing the content of the images one by one. The output provided (summary) contains detailed summaries of two images, which aligns with the requirement to summarize the content of t...	
summary: # Image Summaries Report ## Image 1 Summary This image is a research poster presented at the ICML 2024 (International Conference on Machine Learning), titled: “Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency” Authored by Runqi Lin, Chaojian Yu, Bo H...	true	The latest instruction in the queries (query2) requires the workflow to summarize the content of a list of images one by one. The output provided (summary) contains detailed summaries of two images, each structured with clear headings and subheadings that break down the content into sections such as "Core Problem & Mo...	summary: # Image Summaries Report ## Image 1 Summary This image is a conference poster titled: > “Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning” Authored by: - Yulai Zhao (Princeton University) - Zhuoran Yang (Yale University) - Zhaoran Wang (Northwestern University...	null
BatchFiles	3	Start, List Operator (2 instances), If-Else, Iteration (2 instances), Iteration-Start (2 instances), Document Extractor, LLM (2 instances), Template (4 instances), Variable Aggregator, Markdown Exporter, End	The workflow now needs to handle mixed file types (documents and images) and process them differently based on their type. This requires filtering the input array and creating separate processing paths. Workflow Structure: 1. Start Node (id: 1): Accepts `file_list` as array[file] containing both documents and...	{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["file_list", "array[file]"] ] } }, { "id": "2", "type": "list-operator", "param": { "variable": ["file_list", "1"], "operator": ["filter_by", "in",...	true	true	First, check Rule 1: Ground-Truth Node Coverage. The gt_nodes are: ['llm', 'iteration', 'template-transform', 'markdown-exporter', 'document-extractor', 'list-operator']. The node_selection includes: Start, List Operator (2 instances), If-Else, Iteration (2 instances), Iteration-Start (2 instances), Document Extractor...	app-lqcV7Nv9NUnfig69kcL3YERK	false	The latest instruction in the queries (query3) requires the workflow to handle a mixed input of documents and images (variable file_list), sorting them by category and processing accordingly. However, the provided output only contains a summary of a single document, which does not align with the requirement to process...	summary: # File Summaries Report ## Document 1 Summary Comprehensive Summary of the Document: On Friday, the White House announced that Press Secretary Tony Snow would step down from his position on September 14, to be succeeded by his deputy, Dana Perino. Snow, who has been undergoing treatment for ...	false	The latest instruction in the queries (query3) requires the input to contain both documents and images (variable file_list) and to process them according to their category. 
However, the input provided is empty, which means the workflow did not receive any files to process. Despite this, the output contains a summary o...	summary: # File Summaries Report ## Document 1 Summary Comprehensive Summary of the CNN Report: On Friday morning, Niranjan Desai, a resident of Jersey City, New Jersey, discovered an empty AT4 anti-tank rocket launcher tube lying on her front lawn. The device—a 20-year-old, one-time-use shoulder-fired weapo...	false	The latest instruction in the queries (query3) requires the workflow to handle a mixed input of documents and images (variable file_list), sort them by category, and process them accordingly. However, the provided output only contains a summary of a single document, which does not align with the requirement to process...	summary: # File Summaries Report ## Document 1 Summary This CNN report details President George W. Bush’s strategic effort to reframe the national debate over the Iraq War ahead of a pivotal progress assessment by General David Petraeus and Ambassador Ryan Crocker in September 2007. Facing declining public support (6...	null
BookCharacter	1	Start, LLM, End	This workflow is designed to create a "three-minute speed-read" summary of a book based on user input: 1. Start Node: Collects two input variables from the user: - `book`: The title of the book - `author`: The author's name 2. LLM Node: Uses a language model to generate a comprehensive three-minute summ...	{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["book", "string"], ["author", "string"] ] } }, { "id": "2", "type": "llm", "param": { "system": "You are an expert book summarizer who creates co...	true	true	First, evaluate Rule 1: Ground-Truth Node Coverage. gt_nodes is ['llm']. node_selection is Start, LLM, End. The node type 'llm' is present in node_selection (as "LLM", ignoring case). Therefore, all gt_nodes are covered. Rule 1 satisfied. Next, evaluate Rule 2: Consistency and Exact Node Matching. a. Logical Consiste...	app-n6o7iiT1DvbQwqyuq0CExxct	true	The workflow execution aligns with the requirements of the current round instruction. The instruction was to create a 'Finish a Book in Three Minutes' workflow that generates a three-minute speed-read version of the book's content based on the provided book title and author name. The output includes a detailed summary...	result: Absolutely! Here's a three-minute speed-read summary of The Three-Body Problem by Liu Cixin — sharp, vivid, and packed with insight: --- 📖 What It’s About (30 seconds): The Three-Body Problem is the first novel in Liu Cixin’s award-winning Remembrance of Earth’s Past trilogy. Set against the ...	true	The workflow execution aligns with the requirements of the current round instruction. The instruction specifies creating a 'Finish a Book in Three Minutes' workflow, where the user provides the book title and author name as input, and the workflow outputs a three-minute speed-read summary of the book's content. The in...	
result: Absolutely—here’s a vibrant, three-minute speed-read summary of One Hundred Years of Solitude by Gabriel García Márquez: --- What It’s About: One Hundred Years of Solitude (1967) is the foundational masterpiece of magical realism—a sweeping, multi-generational saga centered on the Buendía family and...	true	The workflow execution aligns with the requirements of the current round instruction. The instruction was to create a 'Finish a Book in Three Minutes' workflow that generates a three-minute speed-read version of the book's content based on the provided book title and author name. The output, stored in the variable 're...	result: Absolutely! Here's a vibrant, three-minute speed-read summary of *Jin Yong’s Demi-Gods and Semi-Devils*—a masterpiece of wuxia fiction that blends martial arts, philosophy, identity, and tragedy with breathtaking scope. --- ### 📖 What It’s About** Demi-Gods and Semi-Devils (1963) is the second nove...	null
BookCharacter	2	Start, LLM (existing), LLM (new for character biography), Template, Markdown Exporter, End	Building on the existing workflow, I'm adding functionality to generate a character biography for the book's protagonist and export it as a DOCX file: 1. Existing nodes (Start, first LLM): These remain unchanged, collecting book title and author information. 2. New LLM Node (id: 4): Generates a detailed chara...	{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["book", "string"], ["author", "string"] ] } }, { "id": "2", "type": "llm", "param": { "system": "You are an expert book summarizer who creates co...	true	true	Rule 1: Ground-Truth Node Coverage - Check if node types in gt_nodes is a subset of node_selection. gt_nodes: ['llm', 'markdown-exporter']. node_selection: Start, LLM (existing), LLM (new for character biography), Template, Markdown Exporter, End. This includes 'LLM' and 'Markdown Exporter'. Node type only needs to ap...	app-SOsMqULPSOrmphm4YZkJnMEH	true	null	null	false	null	null	true	null	null	null
BookCharacter	3	Start, LLM (for summary), LLM (for character biography), Template, Markdown Exporter, LLM (new for image prompt), Text to Image, End	Building on the existing workflow, I'm adding functionality to generate an illustration based on the character biography: 1. Existing nodes (Start, LLMs, Template, Markdown Exporter): These remain unchanged, generating the character biography. 2. New LLM Node (id: 7): Creates a detailed image generation promp...	{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["book", "string"], ["author", "string"] ] } }, { "id": "2", "type": "llm", "param": { "system": "You are an expert book summarizer who creates co...	true	true	Rule 1: Ground-Truth Node Coverage gt_nodes contains 'llm' and 'text2image'. node_selection lists 'LLM (for summary)', 'LLM (for character biography)', 'LLM (new for image prompt)', and 'Text to Image'. All required node types from gt_nodes ('llm' and 'text2image') are present in node_selection. Rule 1 is satisfied. ...	app-uZt3a57drRRNMKJOstgXF7pB	false	null	null	false	null	null	false	null	null	null
Code	1	Start, Document Extractor, LLM, End	This workflow is designed to translate code from one programming language to another, focusing on generating the main structure in the first stage: 1. Input Collection (Start Node): - Accept three inputs: source_code (file), source_language (string), and target_language (string) - The source_code is a file ...	{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["source_code", "file"], ["source_language", "string"], ["target_language", "string"] ] } }, { "id": "2", "type": "document-extractor", "param":...	true	true	Rule 1: Ground-Truth Node Coverage. The gt_nodes are ['llm', 'document-extractor']. Both 'llm' and 'document-extractor' appear in node_selection: Start, Document Extractor, LLM, End. Rule 1 is satisfied. Rule 2: Consistency and Exact Node Matching. a. Logical Consistency: The workflow design follows the design_princi...	app-QnFPjMCQwsyBDZmdqVP162jE	true	The workflow execution aligns with the requirements of the current round instruction. The task was to translate the original C code into Python, focusing on generating the main structure of the code without providing specific implementations for classes and functions. The output `target_code` correctly includes the si...	target_code: ```python # Dependencies Analysis: # Built-in libraries needed: # - typing: for type hints (List, Optional, etc.) # - copy: potentially for deep copying (though we'll avoid it in structure stage) # # Custom/local dependencies: None (all functionality is self-contained) from typing import List, Optiona...	true	The workflow execution aligns with the requirements of the current round instruction. The instruction specifies that the workflow should translate the original code into the target code, generating only the main structure of the code in the first stage. For classes and functions, only the corresponding signatures shou...	target_code: ```go // Package nqueens solves the N-Queens problem using backtracking. 
// This file contains the main structural skeleton for the solution in Go. package nqueens import ( // No external dependencies needed for basic structure ) // Note: In Go, we avoid global variables. Instead, we'll use struct-based...	true	The workflow execution aligns with the requirements of the current round instruction. The instruction specifies that the workflow should translate the original code into the target code, generating only the main structure of the code in the first stage. This includes providing function signatures and class declaration...	target_code: ```javascript // Dependencies Analysis: // Built-in libraries: None required for basic structure (JavaScript has no imports for core functionality) // Custom/local dependencies: None (all functions are self-contained in the original C code) /** * N-Queens solver implementation structure in JavaScript. *...	null
Code	2	Start, Document Extractor, LLM, Code, If-Else, Variable Aggregator, End	This workflow translates source code to Python, validates it, executes it, and returns both the code and execution results: 1. Input Collection (Start Node): - Accept two inputs: source_code (file) and case_input (string) - Removed target_language since Python is now the fixed target - Added case_input f...	{ "nodes_info": [ { "id": "1", "type": "start", "param": { "variables": [ ["source_code", "file"], ["source_language", "string"], ["case_input", "string"] ] } }, { "id": "2", "type": "document-extractor", "param": { ...	false	true	First, check Rule 1: Ground-Truth Node Coverage. gt_nodes are ['llm', 'document-extractor', 'code']. node_selection includes Start, Document Extractor, LLM, Code, If-Else, Variable Aggregator, End. Comparing node types (ignoring case): 'document-extractor' matches 'Document Extractor', 'llm' matches 'LLM', 'code' matc...	null	null	null	null	null	null	null	null	null	null	step_1_4
Code	3	Start, Document Extractor, LLM, LLM, Markdown Exporter, End	"This workflow translates source code to Python and generates comprehensive explanations in Markdown(...TRUNCATED)	"{\n \"nodes_info\": [\n {\n \"id\": \"1\",\n \"type\": \"start\",\n \"param\": {(...TRUNCATED)	true	true	"\nFirst, check Rule 1: Ground-Truth Node Coverage. The gt_nodes are ['llm', 'document-extractor', '(...TRUNCATED)	app-RNTUf02oYPjCfPOOXszfsWAF	true	"\nThe latest instruction in the queries (query3) requires the workflow to generate an explanation p(...TRUNCATED)	"explanation: # Comprehensive Explanation of N-Queens Solver\n\n## 1. Overview\n\n### What the Code (...TRUNCATED)	true	"\nThe latest instruction in the queries (query3) requires the workflow to generate an explanation p(...TRUNCATED)	"explanation: # Sudoku Solver Implementation Documentation\n\nThis document provides a comprehensive(...TRUNCATED)	true	"\nThe latest instruction in the queries (query3) requires the workflow to generate an explanation p(...TRUNCATED)	"explanation: # Comprehensive Code Explanation: Floyd-Warshall All-Pairs Shortest Path Implementatio(...TRUNCATED)	null
ContractReview	1	1. Start 2. Document Extractor 3. Parameter Extractor 4. LLM 5. Template 6. End	"This workflow is designed to perform a comprehensive contract risk review by following these steps:(...TRUNCATED)	"{\n \"nodes_info\": [\n {\n \"id\": \"1\",\n \"type\": \"start\",\n \"param\": {(...TRUNCATED)	true	true	"\nFirst, apply Rule 1: Ground-Truth Node Coverage.\ngt_nodes: ['llm', 'document-extractor', 'templa(...TRUNCATED)	app-pVpPU65MQPdDmWfNMVoGy8Nu	true	"\nThe workflow execution aligns well with the requirements of the current round instruction. The in(...TRUNCATED)	"result: # CONTRACT RISK REVIEW REPORT\n\n## Review Information\n- Reviewing Party: Tenant\n- **(...TRUNCATED)	true	"\nThe workflow execution aligns well with the requirements of the current round instruction. The in(...TRUNCATED)	"result: # CONTRACT RISK REVIEW REPORT\n\n## Review Information\n- Reviewing Party: Recipient\n-(...TRUNCATED)	true	"\nThe workflow execution aligns with the requirements of the current round instruction as specified(...TRUNCATED)	"result: # CONTRACT RISK REVIEW REPORT\n\n## Review Information\n- Reviewing Party: Sponsor\n- *(...TRUNCATED)	null
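
The `reason` cells in the rows above describe "Rule 1: Ground-Truth Node Coverage": every node type in `gt_nodes` must appear in `node_selection`, matched case-insensitively and regardless of annotations such as "(3 instances)". The Python sketch below is one way such a check might look. The normalization steps and the `ALIASES` table are assumptions inferred from the preview rows (for example, "Template" standing for `template-transform`), not the dataset authors' judging code, and it only handles comma-separated `node_selection` strings.

```python
import re

# Hypothetical alias table: display names seen in node_selection mapped to the
# type identifiers seen in gt_nodes. Inferred from the preview rows only.
ALIASES = {
    "template": "template-transform",
    "text-to-image": "text2image",
}


def normalize(name: str) -> str:
    """Turn a display name like 'LLM (3 instances)' into a type id like 'llm'."""
    name = re.sub(r"\(.*?\)", "", name)  # drop '(3 instances)', '(existing)', ...
    slug = re.sub(r"\s+", "-", name.strip().lower())
    return ALIASES.get(slug, slug)


def covers(gt_nodes: list[str], node_selection: str) -> bool:
    """Return True if every ground-truth node type appears in node_selection."""
    selected = {normalize(part) for part in node_selection.split(",")}
    return all(gt.lower() in selected for gt in gt_nodes)
```

Applied to the first BatchFiles row, `covers` would be called with that row's `gt_nodes` list and its `node_selection` string; extra nodes such as Start and End do not affect the result, since the rule as quoted only requires coverage.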

End of preview.

README.md exists but content is empty.
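
With no README, the column header of the preview is the only schema description available. As an illustration of how the `task`, `test1`, `test2`, and `test3` columns could be combined, here is a minimal Python sketch that computes a per-task pass rate from rows represented as dictionaries. Treating null cells as "no result" and skipping them is an assumption, as is the choice of pass rate as the metric; neither is documented on this page.

```python
from collections import defaultdict

TEST_COLUMNS = ("test1", "test2", "test3")


def pass_rates(rows):
    """Per task, return the fraction of non-null test results that are True."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        for col in TEST_COLUMNS:
            result = row.get(col)
            if result is None:  # assumption: null means no result was recorded
                continue
            total[row["task"]] += 1
            passed[row["task"]] += bool(result)
    # Tasks with only null results never enter `total`, so they are omitted.
    return {task: passed[task] / total[task] for task in total}
```

If the dataset is reachable, the rows could presumably be obtained with the Hugging Face `datasets` library, e.g. `load_dataset("zjunlp/Chat2Workflow-Evaluation", split="test")`, and passed to `pass_rates` directly, since iterating such a dataset yields dictionaries keyed by column name.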

Downloads last month: 93
