• pkjqpg1h@lemmy.zip
    5 days ago

    According to the AA-Omniscience benchmark, among the most expensive models, Opus 4.6 has a 60% hallucination rate and a 46% accuracy rate, and Gemini 3.1 Pro Preview has a 50% hallucination rate and a 55% accuracy rate.

    And the questions aren’t even open-ended.

    I don’t even need to tell you about the other models.

    • Kairos@lemmy.today
      5 days ago

      “Opus 4.6,” like every other LLM, has a 100% hallucination rate, because that’s literally the only thing they do.