Well I promised to start a thread on this topic, and as I am sat at home with a cold it seemed like a good time to do it.
Reinforcement. What is it? Simply put, reinforcement is anything that makes a particular behaviour more likely to be repeated. But it needn't be just us who do the reinforcing. Animals use reinforcement amongst each other all the time and they don't have a textbook to tell them how to do it. It is a perfectly natural process, but by understanding it we can use that process for our own ends.
It is even used within human society. On any particular day we are subjected to positive and negative reinforcement without even thinking about it. If you think this is not the case then I guess you didn't answer a phone once today because that is a classic case of negative and positive reinforcement.
I like to think of reinforcement as being on a continuum. Picture a straight line with negative reinforcement on the left and positive reinforcement on the right. In the middle is neutral, where nothing happens. There is also a time line here from left to right: the stimulus for negative reinforcement occurs before the behaviour, whereas the stimulus for positive reinforcement happens after it. This relationship is worth bearing in mind as we discuss each form of reinforcement separately.
I start with negative reinforcement (NR), as it is the most common form of reinforcement used in training horses. I would say 95% of our reinforcement is of the negative variety.
Negative reinforcement sounds like a nasty way to train an animal (or spouse !) but don't let the negative connotations of the word "negative" cloud your mind over this. In this instance the word "negative" is used in the scientific context of "to remove or subtract".
For example, let us consider the simple case of asking a horse to lower its head by applying downward pressure on the halter. This applies pressure to the poll of the horse, which is uncomfortable, and the horse will seek to get rid of the uncomfortable feeling, technically called an aversive stimulus. To start with they will have no idea how to get rid of the pressure, but nature teaches prey animals to push into pressure, so that is what they are likely to do first. When that doesn't work they may try head shaking or other behaviour.
Eventually, by chance, they may try to lower their head, at which point the pressure is removed by the handler, with exquisite timing, letting go of the halter. This is the negative reinforcement bit: the removal of the aversive stimulus. The closer in time the NR occurs to the desired behaviour, the quicker the horse will learn the correct response to the stimulus.
I strongly suspect that one of the major factors to being a great horseman as opposed to an all right one is the timing of the release, especially when the horse is being difficult.
Nearly every ridden aid we give a horse is NR based. For example asking the neck to bend in a lateral flexion. Given the nature of the horse to oppose pressure there is no particular reason as to why they should understand that pulling on one side of the mouth means "bend the neck". But we can train that by applying the pressure and releasing it the instant that the horse even "thinks" about bending the neck. Then ask for a little more next time etc.
This is where the phrase "a good horseman has hands that close slowly but open quickly" comes from. A good horseman is always looking for a reason to open his hands, not close them. The snag is that humans, being predators and formerly tree dwellers, have hands that automatically close quickly and open slowly. For swinging among trees that was probably a good thing, but if not modified it leads to lousy timing of negative reinforcement. If you don't think this is the case, try swinging a stick into the palm of your hand: the instinct is to grab it and hold on.
Positive reinforcement (PR) sits just the other side of neutral from NR. It is the addition of a positive stimulus after a particular behaviour has been performed. It is a favourite of animal trainers who cannot have direct contact with their animals and so cannot apply aversive stimuli to use the principle of NR. Sea mammals are an obvious case.
The positive stimuli could be food, water, air, praise. The stimulus is applied after the animal has performed a desired behaviour and so is likely to reinforce that behaviour being repeated in the future. Again the timing is critical: the stimulus must take place as soon after the behaviour occurs as possible. If you are in close proximity to the subject that is easy. When riding, a quick stroke on the neck can be given as soon as that perfect lead change has happened.
But what about the distant subject, such as the horse at liberty? By the time you manage to stop the horse and get it to come to you, or you go to it, a myriad of other behaviours have occurred. What behaviour are you actually rewarding? The last one to occur before you gave the treat. This can cause real problems. Suppose you are trying to reward that really good forward walk with PR. By the time you deliver the treat the horse has probably disengaged the hind quarters, stopped and faced you. So rather than rewarding the forward walk you are rewarding the exact opposite.
This is where bridging cues come in. A bridging cue is a cue that is given as the required behaviour occurs and is a "contract" that a treat is forthcoming. Once the bridging cue has been established, the "happy hormones" in the subject's brain are released immediately as the reward is anticipated.
This bridging cue need not be a clicker, there is nothing magical about a bit of plastic that goes "click". Personally I find the clicker impractical when training a horse as my hands are often occupied doing something else and the timing would be poor. I use a "cluck" noise with my tongue. It is always available and never tied up doing something else.
The delivery of the treat should be made as soon as possible after the bridging cue is made to maximise its effect.
I should mention that you can even use "pre-bridging" cues. In this instance you are in effect saying "you are getting closer to the solution, keep trying". I use "good, Good, Good Girl" as a string of increasingly intense bridging cues with Filly and they work well. This is a slightly more advanced concept, so if you are just starting out using PR I would stick to the single bridging cue.
In horsemanship we can, of course, use a combination of NR and PR to speed up the teaching of new cues and their associated behaviours. Thus a lateral flexion would start with a gentle aversive stimulus of applying pressure to a single rein, slowly increasing in intensity until the flexion is achieved, followed by the NR of releasing the pressure. Concurrent with the release a bridging cue can be given to enter the PR side of the reinforcement continuum.
As this behaviour becomes more established, the PR side of the continuum can be put on an "intermittent schedule", i.e. not every time the behaviour is displayed. With a fine balance of how often to use PR, the behaviour can actually be strengthened as the subject tries harder to get the PR.
There is much, much more to discuss on this topic. For example:
NR: the incremental way the aversive stimulus is made more aversive until the desired behaviour occurs.
PR: the whole subject of how much positive stimulus to apply, and the concept of intermittency and bonus rewards.
Both: the subject of shaping and conflicting cues.
There are many, many good books on this subject. For PR I would thoroughly recommend "Don't Shoot the Dog" by Karen Pryor. For a much more academic approach try "Animal Learning and Cognition" by John M Pearce, but be aware it is very, very academic and contains results from animal experiments some might find distasteful. I did, but still learnt a lot from it.
Back to the telephone. When a phone rings it tends to make an annoying noise, which is an aversive stimulus. We get negative reinforcement when we answer the phone and the noise, with exquisite timing, stops. We then often get strong positive reinforcement when we get to talk to a loved one. This is why humans are so strongly conditioned to answering the phone. How many times have you had a strong urge to answer a ringing phone that is not even yours? This is why!
Hope this is of interest, if a bit more academic than most posts on here. I hope the subsequent discussion/debate will take us into more practical areas.