About the Role
We are looking for an Evaluation Scientist who can work across both hands-on experimentation and automation infrastructure. The role begins with running manual evaluations (e.g., executing and monitoring individual experiments) and progresses toward building the scripts, tools, and infrastructure that streamline and automate these processes, with the long-term goal of minimizing manual work.
The ideal candidate will also bring expertise in coding agents and quality evaluation, enabling them to design robust experiments and improve workflows. While high-level guidance will be provided, the candidate should be able to independently define and implement the lower-level details of experiment setup after ramping up. For example, given a high-level requirement for a new type of evaluation, they should be able to propose and execute an implementation plan with well-defined steps, metrics, and automation.
Key Responsibilities