Do AI models cheat?
Reward Hacking
Do AI systems cheat, or do they always perform the task they were intended to do? In this post we explore the idea that AI can exhibit sycophantic, or user-pleasing, behavior.
Safety for weak capability?
We are postulating that in the near future AI capabilities will grow so much that AI will become superior to human intelligence. In such a reality, we will need some mechanism that ensures AI still works toward the progress of humanity. I have been reading the book "Superintelligence", and one opinion held among researchers and thinkers is that we have only one chance to build AI that surpasses human intelligence. If we don't get there in a safe way, the worst case could be the end of humanity. ...