Reward Hacking
Do AI systems cheat, or do they always perform the intended task? In this post we explore the idea that AI systems can exhibit sycophantic, or user-pleasing, behavior.