  1. We introduce CLEVER, the first curated benchmark for evaluating the generation of specifications and formally verified code in Lean. The benchmark comprises 161 programming problems; …

  2. Submissions | OpenReview

    Jan 22, 2025 · Promoting openness in scientific communication and the peer-review process

  3. Evaluating the Robustness of Neural Networks: An Extreme Value...

    Our analysis yields a novel robustness metric called CLEVER, which is short for Cross Lipschitz Extreme Value for nEtwork Robustness. The proposed CLEVER score is attack-agnostic and …

  4. In this paper, we have proposed a novel counterfactual framework CLEVER for debiasing fact-checking models. Unlike existing works, CLEVER is augmentation-free and mitigates …

  5. Forum | OpenReview

    Promoting openness in scientific communication and the peer-review process

  6. Chimera: Diagnosing Shortcut Learning in Visual-Language

    Sep 18, 2025 · We evaluate 15 open-source VLMs from 7 model families on Chimera and find that their seemingly strong performance largely stems from shortcut behaviors: visual …

  7. RAHP: Robustness-Aware Head Pruning for Certified

    Sep 15, 2025 · Across evaluated tasks, RAHP yields compact models with stronger CLEVER lower bounds and minimal change in clean accuracy, and it improves resistance to a wide …

  8. Alias-Free Mamba Neural Operator | OpenReview

    Sep 25, 2024 · Functionally, MambaNO achieves a clever balance between global integration, facilitated by state space model of Mamba that scans the entire function, and local integration, …

  9. STAIR: Improving Safety Alignment with Introspective Reasoning

    May 1, 2025 · One common approach is training models to refuse unsafe queries, but this strategy can be vulnerable to clever prompts, often referred to as jailbreak attacks, which can …

  10. Provably Mitigating Overoptimization in RLHF: Your SFT Loss is...

    Jun 18, 2024 · With a clever usage of the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines (i) a …