Reinforcement learning is one of the critical components of machine learning today. Every right prediction or action is rewarded; every wrong one is punished. Each reward or punishment tells the algorithm which behavior to repeat and which to stop. Over time, the algorithm works to maximise its rewards.
What is critical here? In most situations, there are far more wrong answers than right ones: many wrong steps and only a few right steps, or just one. So what is key for the algorithm to learn fast? Fail more.
The more wrong answers you avoid (based on past learning), the closer you are to the right one. You might chance upon the right answer very quickly, but the learning happens by finding the wrong ones and avoiding them in future.
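To make this concrete, here is a minimal sketch in Python of a toy learner, not a full reinforcement learning algorithm. It assumes one right action among several, a reward of +1 for the right action and -1 for a wrong one. The function name `learn_by_elimination` and its parameters are made up for illustration.

```python
import random

def learn_by_elimination(n_actions, right_action, remember=True, seed=0):
    """Try actions until the rewarded one is found.

    With remember=True the learner never repeats an action it was
    punished for; with remember=False it ignores its past failures.
    Returns (number of attempts, recorded reward per action).
    """
    rng = random.Random(seed)
    values = [0.0] * n_actions  # 0.0 = untried, -1.0 = known wrong, 1.0 = right
    attempts = 0
    while True:
        if remember:
            # Past punishments shrink the set of actions worth trying.
            candidates = [a for a in range(n_actions) if values[a] >= 0]
        else:
            candidates = list(range(n_actions))
        action = rng.choice(candidates)
        attempts += 1
        # Hypothetical reward signal: +1 for the right action, -1 otherwise.
        reward = 1.0 if action == right_action else -1.0
        values[action] = reward
        if reward > 0:
            return attempts, values

if __name__ == "__main__":
    with_memory, _ = learn_by_elimination(10, right_action=7, remember=True)
    without_memory, _ = learn_by_elimination(10, right_action=7, remember=False)
    print("attempts when failures are remembered:", with_memory)
    print("attempts when failures are ignored:", without_memory)
```

A learner that remembers its failures needs at most as many attempts as there are actions, because every punishment removes one wrong option for good. A learner that ignores them has no such bound.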
Yet this behavior, so intrinsic to us in early childhood, eludes us later in life. We try to plan and prepare for 100% success at the first attempt, and we are woefully unprepared for any other result. Owners of failed projects are crucified, vilified, and taunted. Failed project teams try to hide under the table or act as if they have committed a crime.
Without wrong actions, there is no learning (as we have seen from reinforcement learning). Instead of celebrating learning, we are forever scared of failures and do not prepare for them. What is worse, we do not want to start a project unless we are 200% sure it will work.
While such conservatism worked in the past, it may not work now, as every industry and business is being disrupted from unknown quarters. ‘Fail Fast’ is not a buzzword, and it is certainly not optional!