<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Ai-Safety on rkoush</title>
    <link>https://www.rkoush.com/categories/ai-safety/</link>
    <description>Recent content in Ai-Safety on rkoush</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sun, 17 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.rkoush.com/categories/ai-safety/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Do AI models cheat?</title>
      <link>https://www.rkoush.com/posts/reward-hacking/</link>
      <pubDate>Sun, 17 May 2026 00:00:00 +0000</pubDate>
      <guid>https://www.rkoush.com/posts/reward-hacking/</guid>
      <description>&lt;h1 id=&#34;reward-hacking&#34;&gt;Reward Hacking&lt;/h1&gt;
&lt;p&gt;Do AI systems cheat or always perform what the intended task is? In this post we explore an idea where AI can exhibit
sycophantic behavior or user pleasing behavior.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why is AI Safety Challenging?</title>
      <link>https://www.rkoush.com/posts/interp/</link>
      <pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate>
      <guid>https://www.rkoush.com/posts/interp/</guid>
      <description>&lt;h2 id=&#34;safety-for-weak-capability&#34;&gt;Safety for weak capability?&lt;/h2&gt;
&lt;p&gt;We are postulating that in near future AI capabilities will grow so much, that it will be superior to human intelligence.
In such a reality, enforcing some mechanism that allows AI to still work for the progress of humanity will be necessary.
I have been reading the book &amp;ldquo;Superintelligence&amp;rdquo; and an opinion that exists among researchers and thinkers, is that
we have once chance to build that level of AI that surpasses human intelligence and when we get there, if we don&amp;rsquo;t do
it in a safe way, that could possibly mean in worst case end of humanity.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
