LLM-as-Judge

Using models to evaluate model outputs: positional bias, self-preference, and calibration failures.