
We introduce CLEVER, the first curated benchmark for evaluating the generation of specifications and formally verified code in Lean. The benchmark comprises 161 programming problems; …
Submissions | OpenReview
Jan 22, 2025 · Promoting openness in scientific communication and the peer-review process
Evaluating the Robustness of Neural Networks: An Extreme Value...
Our analysis yields a novel robustness metric called CLEVER, which is short for Cross Lipschitz Extreme Value for nEtwork Robustness. The proposed CLEVER score is attack-agnostic and …
In this paper, we have proposed a novel counterfactual framework CLEVER for debiasing fact-checking models. Unlike existing works, CLEVER is augmentation-free and mitigates …
Chimera: Diagnosing Shortcut Learning in Visual-Language
Sep 18, 2025 · We evaluate 15 open-source VLMs from 7 model families on Chimera and find that their seemingly strong performance largely stems from shortcut behaviors: visual …
RAHP: Robustness-Aware Head Pruning for Certified
Sep 15, 2025 · Across evaluated tasks, RAHP yields compact models with stronger CLEVER lower bounds and minimal change in clean accuracy, and it improves resistance to a wide …
Alias-Free Mamba Neural Operator | OpenReview
Sep 25, 2024 · Functionally, MambaNO achieves a clever balance between global integration, facilitated by the state space model of Mamba, which scans the entire function, and local integration, …
STAIR: Improving Safety Alignment with Introspective Reasoning
May 1, 2025 · One common approach is training models to refuse unsafe queries, but this strategy can be vulnerable to clever prompts, often referred to as jailbreak attacks, which can …
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is...
Jun 18, 2024 · With a clever usage of the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines (i) a …