Reinforcement, Negative and Positive
Well I promised to start a thread on this topic, and as I am sat at home with a cold it seemed like a good time to do it.
Reinforcement. What is it? Simply put, reinforcement is anything that makes a particular behaviour more likely to be repeated. But it needn't be just us who do the reinforcing. Animals use reinforcement amongst each other all the time, and they don't have a textbook to tell them how to do it. It is a perfectly natural process, but by understanding it we can use that process for our own ends.
It is even used within human society. On any particular day we are subjected to positive and negative reinforcement without even thinking about it. If you think this is not the case, then I guess you didn't answer a phone once today, because that is a classic case of negative and positive reinforcement.
I like to think of reinforcement as being on a continuum. Say a straight line with negative reinforcement on the left and positive reinforcement on the right. In the middle is neutral where nothing happens. There is also a time line here from left to right. The stimulus for negative reinforcement occurs before the behaviour occurs, the stimulus for positive reinforcement happens after the behaviour occurs. This relationship is worth bearing in mind as we discuss each form of reinforcement separately.
I start with negative reinforcement as it is the most common form of reinforcement used in training horses. I would say 95% of our reinforcement is of the negative variety.
Negative reinforcement sounds like a nasty way to train an animal (or spouse!) but don't let the connotations of the word "negative" cloud your mind over this. In this instance the word "negative" is used in the scientific sense of "to remove or subtract".
For example, let us consider the simple case of asking a horse to lower its head by applying downward pressure on the halter. This applies pressure to the poll of the horse, which is uncomfortable (technically, an aversive stimulus), and the horse will seek to get rid of the uncomfortable feeling. To start with, they will have no idea how to get rid of the pressure, but nature teaches prey animals to push into pressure, so that is what they are likely to do first. When that doesn't work they may try head shaking or other behaviour.
Eventually, by chance, they may try lowering their head, at which point the handler, with exquisite timing, removes the pressure by letting go of the halter. This is the negative reinforcement bit: the removal of the aversive stimulus. The closer in time the negative reinforcement (NR) occurs to the desired behaviour, the quicker the horse will learn the correct response to the stimulus.
I strongly suspect that one of the major factors in being a great horseman, as opposed to an all right one, is the timing of the release, especially when the horse is being difficult.
Nearly every ridden aid we give a horse is NR based. Take, for example, asking the neck to bend in a lateral flexion. Given the nature of the horse to oppose pressure, there is no particular reason why they should understand that pulling on one side of the mouth means "bend the neck". But we can train that by applying the pressure and releasing it the instant that the horse even "thinks" about bending the neck. Then ask for a little more next time, and so on.
This is where the phrase "a good horseman has hands that close slowly but open quickly" comes from. A good horseman is always looking for a reason to open his hands, not close them. The snag is that humans, being predators and formerly tree dwellers, have hands that automatically close quickly and open slowly. When swinging among trees that was probably a good thing, but if not modified it leads to lousy timing of negative reinforcement. If you don't think this is the case, try swinging a stick into the palm of your hand. The instinct is to grab it and hold on.
Positive reinforcement (PR) is just the other side of neutral from NR. It is the addition of a positive stimulus after a particular behaviour has been performed. It is a favourite of animal trainers who cannot have direct contact with their animals, and so cannot apply the aversive stimuli that NR relies on. Sea mammals are an obvious case.
The positive stimuli could be food, water, air, praise. The stimulus is applied after the animal has performed a desired behaviour and so is likely to reinforce that behaviour being repeated in the future. Again the timing is critical: the stimulus must take place as soon after the behaviour occurs as possible. If you are in close proximity to the subject that is easy. When riding, a quick stroke on the neck can be given as soon as that perfect lead change has happened.
But what about the distant subject, such as the horse at liberty? By the time you manage to stop the horse and get it to come to you, or you go to it, a myriad of other behaviours have occurred. What behaviour are you actually rewarding? The last one to occur before you gave the treat. This can cause real problems. Suppose you are trying to reward that really good forward walk with PR. By the time you deliver the treat the horse has probably disengaged the hind quarters, stopped and faced you. So rather than rewarding the forward walk you are rewarding the exact opposite.
This is where bridging cues come in. A bridging cue is a cue that is given as the required behaviour occurs and is a "contract" that a treat is forthcoming. Once the bridging cue has been established, the "happy hormones" in the subject's brain are released immediately as the reward is anticipated.
This bridging cue need not be a clicker. There is nothing magical about a bit of plastic that goes "click". Personally I find the clicker impractical when training a horse, as my hands are often occupied doing something else and the timing would be poor. I use a "cluck" noise with my tongue. It is always available and never tied up doing something else.
The delivery of the treat should be made as soon as possible after the bridging cue is given to maximise its effect.
I should mention that you can even use "pre-bridging" cues. In this instance you are in effect saying "you are getting closer to the solution, keep trying". I use "good, Good, Good Girl" as a string of increasingly intense bridging cues with Filly and they work well. This is a slightly more advanced concept, so if you are just starting out using PR I would stick to the single bridging cue.
In horsemanship we can, of course, use a combination of NR and PR to speed up the teaching of new cues and their associated behaviours. Thus a lateral flexion would start with a gentle aversive stimulus of applying pressure to a single rein, slowly increasing in intensity until the flexion is achieved, followed by the NR of releasing the pressure. Concurrent with the release, a bridging cue can be given to enter the PR side of the reinforcement continuum.
As this behaviour becomes more established then the PR side of the continuum can be put on an "intermittent schedule" i.e. not every time the behaviour is displayed. With a fine balance of how often to use PR the behaviour can actually be strengthened as the subject tries harder to get the PR.
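For anyone who finds it easier to see in code, here is a minimal sketch of the simplest kind of intermittent schedule: the treat follows a correct response only some of the time, chosen at random. The function names, the fixed seed and the 0.3 probability are illustrative assumptions, not recommended values; finding the right balance for a given horse is a matter of feel.

```python
import random


def should_treat(rng: random.Random, probability: float) -> bool:
    """Decide whether this correct response earns a treat (hypothetical helper)."""
    # rng.random() is in [0.0, 1.0), so 1.0 always treats and 0.0 never does.
    return rng.random() < probability


def count_treats(responses: int, probability: float, seed: int = 0) -> int:
    """Count how many of `responses` correct responses would be treated."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return sum(should_treat(rng, probability) for _ in range(responses))


if __name__ == "__main__":
    # Continuous schedule: every correct response is rewarded.
    print(count_treats(20, 1.0))
    # Intermittent schedule: roughly a third are rewarded (0.3 is an assumption).
    print(count_treats(20, 0.3))
```

Setting the probability to 1.0 gives the every-time schedule used while a behaviour is new, and lowering it gives the intermittent schedule described above.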
There is much much more to discuss on this topic. For example
: NR: the incremental way the aversive stimulus is made more aversive until the desired behaviour occurs
: PR: the whole subject of how much positive stimulus to apply, and the concept of intermittency and bonus rewards
: for both there is the subject of shaping and conflicting cues
There are many many good books on this subject. For PR I would thoroughly recommend "Don't Shoot the Dog" by Karen Pryor. For a much more academic approach try "Animal Learning and Cognition" by John M Pearce, but be aware it is very very academic and contains results from animal experiments some might find distasteful. I did but still learnt a lot from it.
Back to the telephone. When a phone rings it tends to make an annoying noise, which is an aversive stimulus. We get negative reinforcement when we answer the phone and the noise, with exquisite timing, stops. We then often get strong positive reinforcement when we get to talk to a loved one. This is why humans are so strongly conditioned to answering the phone. How many times have you had a strong urge to answer a ringing phone that is not even yours? This is why!
Hope this is of interest, if a bit more academic than most posts on here. I hope the subsequent discussion/debate will take us into more practical areas :-)
you forgot to mention how punishment fits in
Behavioral theory is based primarily on the work of psychologist B. F. Skinner, who worked mostly with pigeons and rats. At the most... extreme... behaviorists believe that there's no such thing as free will and that every "decision" that is made or action that takes place is because of a stimulus generating a response.
Here's a quick overview. It applies to horses, dogs, people,...
Positive Reinforcement involves the introduction of something positive to encourage a behavior.
Negative Reinforcement involves the removal of something negative to encourage a behavior.
Positive Punishment involves the introduction of something negative to discourage a behavior.
Negative Punishment involves removing something pleasant to discourage a behavior.
Reinforcement is used when you want a behavior to continue.
Punishment is used when you want a behavior to cease.
Positive actions introduce a stimulus to elicit a response.
Negative actions remove a stimulus to elicit a response.
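The same four terms can be laid out as a two-by-two lookup, which some people find easier to remember. This is only a sketch of the overview above; the `Change`, `Effect` and `quadrant` names are made up for illustration.

```python
from enum import Enum


class Change(Enum):
    ADDED = "positive"    # a stimulus is introduced
    REMOVED = "negative"  # a stimulus is taken away


class Effect(Enum):
    MORE_LIKELY = "reinforcement"  # you want the behavior to continue
    LESS_LIKELY = "punishment"     # you want the behavior to cease


def quadrant(change: Change, effect: Effect) -> str:
    """Name the quadrant for one stimulus change and its intended effect."""
    return f"{change.value} {effect.value}"


if __name__ == "__main__":
    # Releasing halter pressure when the head drops: stimulus removed, behavior encouraged.
    print(quadrant(Change.REMOVED, Effect.MORE_LIKELY))  # negative reinforcement
    # A treat after a good forward walk: stimulus added, behavior encouraged.
    print(quadrant(Change.ADDED, Effect.MORE_LIKELY))    # positive reinforcement
```

Note that "positive" and "negative" only say whether something was added or removed; whether it counts as reinforcement or punishment depends on what happens to the behavior.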
Sorry about the omission of punishment. I should have at least mentioned it. Maybe someone else can run with that baton?
B.F. Skinner was indeed the man who really started our understanding of the subject, though it has since been expanded by many others.
If you want an equestrian slant on the subject then http://www.amazon.co.uk/Equitation-Science-Paul-D-McGreevy/dp/1405189053 is a good, if again somewhat academic, read. But then I guess the title suggests that. It is interesting to note that the authors of this book are involved in lobbying the FEI on more ethical standards in equitation sport. Flash nosebands and use of sticks (in horse racing) are of particular concern. They are drawing on their academic research to highlight the ethical and sporting issues that these items raise. May they succeed.
Glad you posted this.
Any thoughts on the part of reinforcement where you make the wrong thing harder to make the right thing easier?
Mark Rashid made a comment to the effect of: instead of making the wrong thing harder, why not focus on making the right thing easier, as they are often behaving in the 'bad attitude way' because they are finding the 'right thing' hard for some reason.
Clear, accurate and to the point! GREAT POST! It deserves repeating!
I think Mark's idea is that some folks go too far with making the wrong thing harder and wind up in a flight response instead of the horse just trying different behaviours to get the right answer. He is trying to get folks to concentrate on being very quick with the release rather than the application of pressure. Quick, timely release is the very essence of negative reinforcement. If you have not applied sufficient pressure then little harm will be done, it will just take longer to get a result. If you fail to release the pressure you have a) missed a learning opportunity and b) possibly made the horse discount that behaviour as the correct one. It may take some time before they offer that behaviour again.
If they wind up in instinctive flight (right brain in Parelli land) then they are no longer really thinking, just reacting and there is little chance that even if they do happen upon the right behaviour they will even notice the release of pressure.
To teach a horse you must get its mind first, then its body. If you lose its mind by applying inappropriately firm pressure then no learning can happen.
So for each horse you have to determine the maximum amount of pressure you can apply and then, however frustrated you get, never go beyond it. In general left brain horses (confident horses) can take more pressure than inherently right brain (reactive) horses.
I should emphasise that you should be reading the mind set of the horse in front of you this second and behave appropriately, but an understanding of the inherent horsenality of the horse will help inform your assessment in difficult circumstances.
in the example you gave when you apply downward pressure on the poll, you are positively punishing everything the horse does other than lowering its head, and you are negatively reinforcing when the horse does lower its head.
another example would be if you were walking along on a horse and you used whatever aids are normal for you to have your horse trot. then when your horse does trot you release the pressure. you've negatively reinforced trotting but you've also (and equally) positively punished walking. next time you apply the same aid your horse will be more likely to stop walking because you are +punishing it, and will be more likely to start trotting because you have -reinforced it in the past.
anyway i mention this because a lot of people i know seem to think that reinforcement is how you get things done with a horse, and think that punishment is "taboo" or "bad". which always humors me a bit because how can you remove an aversive stimulus if you didn't first put it there?