Benchmarking Human-AI Collaboration for Common Evidence Appraisal Tools
Tim Woelfle, 04/2024
Accuracy & Reliability
Accuracy: (1) Human Raters / (2) Individual LLMs / (3) Combined LLMs
Accuracy: (4) Human-AI Collaboration
Reliability
Method Overview
Combined LLMs Overview
Formatting & Quoting Accuracy