Do AI models cheat?

Sun, 17 May 2026 00:00:00 +0000

Reward Hacking

Do AI systems cheat, or do they always perform the intended task? In this post we explore the idea that AI can exhibit sycophantic, or user-pleasing, behavior.

Why is AI Safety Challenging?

Fri, 01 May 2026 00:00:00 +0000

Safety for weak capability?

We postulate that in the near future AI capabilities will grow so much that they will surpass human intelligence. In such a reality, it will be necessary to enforce some mechanism that keeps AI working for the progress of humanity. I have been reading the book “Superintelligence”, and one opinion among researchers and thinkers is that we have one chance to build AI that surpasses human intelligence; if we do not do it safely when we get there, that could, in the worst case, mean the end of humanity.

Ai-Safety on rkoush
