Specific troubleshooting techniques
After applying some of the general
troubleshooting tips to narrow the scope of a problem's
location, several techniques are useful for isolating it
further. Here are a few:
Swap identical components
In a system with identical or parallel
subsystems, swap components between those subsystems and see
whether or not the problem moves with the swapped component.
If it does, you've just swapped the faulty component; if it
doesn't, keep searching!
This is a powerful troubleshooting method,
because it gives you both a positive and a negative
indication of the swapped component's fault: when the bad
part is exchanged between identical systems, the formerly
broken subsystem will start working again and the formerly
good subsystem will fail.
I was once able to troubleshoot an elusive
problem with an automotive engine ignition system using this
method: I happened to have a friend with an automobile
sharing the exact same model of ignition system. We swapped
parts between the engines (distributor, spark plug wires,
ignition coil -- one at a time) until the problem moved to
the other vehicle. The problem happened to be a "weak"
ignition coil, and it only manifested itself under heavy
load (a condition that could not be simulated in my garage).
Normally, this type of problem could only be pinpointed
using an ignition system analyzer (or oscilloscope) and
a dynamometer to simulate loaded driving conditions. This
technique, however, confirmed the source of the problem with
100% accuracy, using no diagnostic equipment whatsoever.
Occasionally you may swap a component and
find that the problem still exists, but has changed in some
way. This tells you that the components you just swapped are
somehow different (different calibration, different
function), and nothing more. However, don't dismiss this
information just because it doesn't lead you straight to the
problem -- look for other changes in the system as a whole
as a result of the swap, and try to figure out what these
changes tell you about the source of the problem.
An important caveat to this technique is the
possibility of causing further damage. Suppose a component
has failed because of another, less conspicuous failure in
the system. Swapping the failed component with a good
component will cause the good component to fail as well. For
example, suppose that a circuit develops a short, which
"blows" the protective fuse for that circuit. The blown fuse
is not evident by inspection, and you don't have a meter to
electrically test the fuse, so you decide to swap the
suspect fuse with one of the same rating from a working
circuit. As a result of this, the good fuse that you move to
the shorted circuit blows as well, leaving you with two
blown fuses and two non-working circuits. At least you know
for certain that the original fuse was blown, because
the circuit it was moved to stopped working after the swap,
but this knowledge was gained only through the loss of a
good fuse and the additional "down time" of the second
circuit.
Another example to illustrate this caveat is
the ignition system problem previously mentioned. Suppose
that the "weak" ignition coil had caused the engine to
backfire, damaging the muffler. If swapping ignition system
components with another vehicle causes the problem to move
to the other vehicle, damage may be done to the other
vehicle's muffler as well. As a general rule, the technique
of swapping identical components should be used only when
there is minimal chance of causing additional damage. It is
an excellent technique for isolating non-destructive
problems.
Example 1: You're working on a CNC
machine tool with X, Y, and Z-axis drives. The Y axis is not
working, but the X and Z axes are working. All three axes
share identical components (feedback encoders, servo motor
drives, servo motors).
What to do: Exchange these identical
components, one at a time, between the Y axis and either one
of the working axes (X or Z), and see after each swap whether
or not the problem has moved with the swap.
Example 2: A stereo system
produces no sound on the left speaker, but the right speaker
works just fine.
What to do: Try swapping respective
components between the two channels and see if the problem
changes sides, from left to right. When it does, you've
found the defective component. For instance, you could swap
the speakers between channels: if the problem moves to the
other side (i.e. the same speaker that was dead before is
still dead, now that it's connected to the right channel
cable) then you know that speaker is bad. If the problem
stays on the same side (i.e. the speaker formerly silent is
now producing sound after having been moved to the other
side of the room and connected to the other cable), then you
know the speakers are fine, and the problem must lie
somewhere else (perhaps in the cable connecting the silent
speaker to the amplifier, or in the amplifier itself).
If the speakers have been verified as good,
then you could check the cables using the same method. Swap
the cables so that each one now connects to the other
channel of the amplifier and to the other speaker. Again, if
the problem changes sides (i.e. the right speaker is now
"dead" and the left speaker now produces sound), then the
cable now connected to the right speaker must be defective.
If neither swap (the speakers nor the cables) causes the
problem to change sides from left to right, then the problem
must lie within the amplifier (i.e. the left channel output
must be "dead").
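The swap-and-observe reasoning in this stereo example can be sketched as a small simulation. This is only an illustration under stated assumptions: each channel is modeled as an amplifier output, a cable, and a speaker in series, there is exactly one fault, and all names (`Part`, `make_channel`, `find_fault_by_swapping`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    good: bool = True


def make_channel(side):
    """Build one hypothetical channel: amplifier output -> cable -> speaker."""
    return {
        "amp": Part(f"{side} amplifier output"),
        "cable": Part(f"{side} cable"),
        "speaker": Part(f"{side} speaker"),
    }


def channel_works(channel):
    """A channel makes sound only if every part in its chain is good."""
    return all(part.good for part in channel.values())


def dead_side(left, right):
    """Report which side is silent, or None if both sides work."""
    if not channel_works(left):
        return "left"
    if not channel_works(right):
        return "right"
    return None


def find_fault_by_swapping(left, right):
    """Swap speakers, then cables, and watch whether the dead side changes.

    Assumes a single fault. Returns the name of the suspect part,
    or None if both channels already work.
    """
    start = dead_side(left, right)
    if start is None:
        return None
    for kind in ("speaker", "cable"):
        left[kind], right[kind] = right[kind], left[kind]    # swap
        moved = dead_side(left, right) != start
        # The part that came from the dead side now sits on the other side.
        suspect = (right if start == "left" else left)[kind]
        left[kind], right[kind] = right[kind], left[kind]    # swap back
        if moved:
            return suspect.name
    # Neither swap moved the problem, so the part that cannot be swapped is suspect.
    return (left if start == "left" else right)["amp"].name


left, right = make_channel("left"), make_channel("right")
left["cable"].good = False
print(find_fault_by_swapping(left, right))  # left cable
```

Swapping back after each trial keeps the two channels otherwise identical, so each swap tests only one kind of component.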
Remove parallel components
If a system is composed of several parallel
or redundant components which can be removed without
crippling the whole system, start removing these components
(one at a time) and see if things start to work again.
Example 1: A "star" topology
communications network between several computers has failed.
None of the computers are able to communicate with each
other.
What to do: Try unplugging the
computers from the network, one at a time, and see if the
network starts working again after one of them is unplugged.
If it does, then that last unplugged computer may be the one
at fault (it may have been "jamming" the network by
constantly outputting data or noise).
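The unplug-and-retest loop for this network example might look like the following sketch. The `network_ok` callback is a hypothetical stand-in for whatever check you would actually perform (for instance, seeing whether the remaining computers can reach each other), and the computers stay unplugged as the loop proceeds, as in the description above.

```python
def find_jamming_node(nodes, network_ok):
    """Unplug nodes one at a time until the network recovers.

    nodes      -- names of the connected computers, in the order to unplug them
    network_ok -- hypothetical test: takes the list of still-connected
                  nodes and returns True if they can communicate
    Returns the last node unplugged before recovery (a suspect, not a proof),
    or None if the network never recovers.
    """
    connected = list(nodes)
    for node in nodes:
        connected.remove(node)          # unplug this computer
        if network_ok(connected):
            return node                 # network came back: suspect this one
    return None


# Simulated fault: computer "C" is flooding the network.
def simulated_test(connected):
    return "C" not in connected


print(find_jamming_node(["A", "B", "C", "D"], simulated_test))  # C
```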
Example 2: A household fuse keeps
blowing (or the breaker keeps tripping open) after a short
amount of time.
What to do: Unplug appliances from
that circuit until the fuse or breaker quits interrupting
the circuit. If you can eliminate the problem by unplugging
a single appliance, then that appliance might be defective.
If you find that unplugging almost any appliance solves the
problem, then the circuit may simply be overloaded by too
many appliances, none of them defective.
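The difference between one defective appliance and a simple overload can be illustrated with a little arithmetic. The sketch below assumes you know (or can estimate) each appliance's current draw; the appliance names, currents, and the 15 A rating are made-up illustration values, and the function name is hypothetical.

```python
def diagnose_circuit(loads_amps, breaker_rating_amps):
    """Classify a tripping circuit from per-appliance current draws.

    loads_amps -- dict mapping appliance name to current in amps
    Returns (verdict, fixes), where fixes lists the appliances whose
    removal alone brings the total within the breaker rating.
    """
    total = sum(loads_amps.values())
    if total <= breaker_rating_amps:
        return "within rating", []
    fixes = [name for name, amps in loads_amps.items()
             if total - amps <= breaker_rating_amps]
    if len(fixes) == 1:
        verdict = "suspect appliance"        # only one removal helps
    elif len(fixes) >= len(loads_amps) - 1:
        verdict = "probably overloaded"      # almost any removal helps
    else:
        verdict = "inconclusive"
    return verdict, fixes


# One appliance drawing far too much current:
print(diagnose_circuit({"lamp": 0.5, "tv": 1.5, "heater": 25.0, "fan": 1.0}, 15))
# Several healthy appliances that together exceed the rating:
print(diagnose_circuit({"heater": 6, "microwave": 5, "toaster": 5}, 15))
```

In the first case only removing the heater helps, pointing at that appliance; in the second, removing any one load brings the total under 15 A, which points at the circuit being overloaded rather than at any single device.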
Divide system into sections and test those sections
In a system with multiple sections or
stages, carefully measure the variables going in and out of
each stage until you find a stage where things don't look
right.
Example 1: A radio is not working
(producing no sound at the speaker).
What to do: Divide the circuitry into
stages: tuning stage, mixing stages, amplifier stage, all
the way through to the speaker(s). Measure signals at test
points between these stages to tell whether or not each stage
is working properly.
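Stage-by-stage signal tracing amounts to walking the test points in signal-flow order and stopping at the first reading that looks wrong. Here is a minimal sketch; the stage names, expected levels, and the 20% tolerance are all hypothetical illustration values, not figures for any real radio.

```python
def first_suspect_stage(readings, tolerance=0.2):
    """Find the first stage whose output is out of tolerance.

    readings  -- list of (stage_name, expected, measured), ordered from
                 the input of the system toward its output
    tolerance -- allowed fractional deviation from the expected value
    Every stage before the returned one measured good, so the returned
    stage has a good input and a bad output.
    """
    for name, expected, measured in readings:
        if abs(measured - expected) > tolerance * abs(expected):
            return name
    return None


radio = [
    ("tuner",           0.002, 0.0021),
    ("mixer",           0.02,  0.019),
    ("IF amplifier",    0.5,   0.0),     # signal disappears here
    ("audio amplifier", 4.0,   0.0),     # dead only because its input is dead
]
print(first_suspect_stage(radio))  # IF amplifier
```

Note that the audio amplifier also reads zero, but it is not blamed: a stage can only be judged once its input is known to be good.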
Example 2: An analog summer
circuit is not functioning properly.
What to do: I would test the passive
averager network (the three resistors at the lower-left
corner of the schematic) to see that the proper (averaged)
voltage was seen at the noninverting input of the op-amp. I
would then measure the voltage at the inverting input to see
if it was the same as at the noninverting input (or,
alternatively, measure the voltage difference between the
two inputs of the op-amp, as it should be zero). Continue
testing sections of the circuit (or just test points within
the circuit) to see if you measure the expected voltages and
currents.
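Before probing the summer circuit, it helps to work out what the "expected" voltages should be. The sketch below assumes an ideal op-amp operating with negative feedback (not saturated), three equal-value averaging resistors, and a noninverting gain stage set by two hypothetical resistors `r_feedback` and `r_ground`; the actual component values must come from the schematic.

```python
def expected_summer_voltages(v_inputs, r_feedback=2000.0, r_ground=1000.0):
    """Expected voltages for a noninverting summer (ideal op-amp assumed).

    v_inputs   -- input voltages feeding the equal-value averaging resistors
    r_feedback -- resistor from output to inverting input (assumed value)
    r_ground   -- resistor from inverting input to ground (assumed value)
    """
    v_avg = sum(v_inputs) / len(v_inputs)   # passive averager output
    gain = 1 + r_feedback / r_ground        # noninverting amplifier gain
    return {
        "noninverting input": v_avg,
        "inverting input": v_avg,           # feedback holds the inputs equal
        "output": gain * v_avg,
    }


for point, volts in expected_summer_voltages([1.0, 2.0, 3.0]).items():
    print(f"{point}: {volts} V")
```

With three inputs and a gain of 3, the output equals the sum of the inputs. A measured difference between the two op-amp inputs that is far from zero suggests the op-amp is saturated or the feedback path is broken.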
Simplify and rebuild
Closely related to the strategy of dividing
a system into sections, this is actually a design and
fabrication technique useful for new circuits, machines, or
systems. It's always easier to begin the design and
construction process in little steps, leading to larger and
larger steps, rather than to build the whole thing at once
and try to troubleshoot it as a whole.
Suppose that someone were building a custom
automobile. He or she would be foolish to bolt all the parts
together without checking and testing components and
subsystems as they went along, expecting everything to work
perfectly after it's all assembled. Ideally, the builder
would check the proper operation of components along the way
through the construction process: start and tune the engine
before it's connected to the drivetrain, check for
wiring problems before all the cover panels are put
in place, check the brake system in the driveway before
taking it out on the road, etc.
Countless times I've witnessed students
build a complex experimental circuit and have trouble
getting it to work because they didn't stop to check things
along the way: test all resistors before plugging
them into place, make sure the power supply is regulating
voltage adequately before trying to power anything
with it, etc. It is human nature to rush to completion of a
project, thinking that such checks are a waste of valuable
time. However, more time will be wasted in troubleshooting a
malfunctioning circuit than would be spent checking the
operation of subsystems throughout the process of
construction.
Take the analog summer circuit in the
previous section, for example: what if it
wasn't working properly? How would you simplify it and test
it in stages? Well, you could reconnect the op-amp as a
basic comparator and see if it's responsive to differential
input voltages, and/or connect it as a voltage follower
(buffer) and see if it outputs the same analog voltage as
what is input. If it doesn't perform these simple functions,
it will never perform its function in the summer circuit! By
stripping away the complexity of the summer circuit, paring
it down to an (almost) bare op-amp, you can test that
component's functionality and then build from there (add
resistor feedback and check for voltage amplification, then
add input resistors and check for voltage summing), checking
for expected results along the way.
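The build-up sequence just described can be written down as an ordered checklist that stops at the first step that fails. In this sketch each check is a hypothetical function returning True when the circuit, as built so far, behaves as expected; here the results are simulated with fixed values.

```python
def build_and_test(steps):
    """Run build steps in order, checking the circuit after each one.

    steps -- list of (description, check), where check() returns True
             if the partially built circuit behaves as expected
    Returns (passed, failed): the steps that passed, and the first step
    that failed (None if everything passed).
    """
    passed = []
    for description, check in steps:
        if not check():
            return passed, description   # fault lies in what was just added
        passed.append(description)
    return passed, None


# Simulated results: the fault appears when feedback resistors are added.
steps = [
    ("op-amp as comparator",          lambda: True),
    ("op-amp as voltage follower",    lambda: True),
    ("add feedback resistors",        lambda: False),
    ("add input (summing) resistors", lambda: True),
]
passed, failed = build_and_test(steps)
print("passed:", passed)
print("first failure:", failed)
```

Because every earlier step passed, a failure points at whatever was added or changed in the most recent step, which is a much smaller search area than the finished circuit.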
Trap a signal
Set up instrumentation (such as a datalogger,
chart recorder, or multimeter set on "record" mode) to
monitor a signal over a period of time. This is especially
helpful when tracking down intermittent problems, which have
a way of showing up the moment you've turned your back and
walked away.
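What a recording instrument does with the signal can be sketched in a few lines: keep the minimum and maximum seen, and note every sample that strays too far from the normal value. The function name, the sample data, and the tolerance below are hypothetical; real meters and dataloggers differ in sampling rate and in what they store.

```python
def record_signal(samples, nominal, tolerance):
    """Summarize a logged signal, loosely mimicking a meter's record mode.

    samples   -- sequence of readings taken at regular intervals
    nominal   -- the value the signal normally holds
    tolerance -- how far a reading may stray before it is flagged
    Returns (minimum, maximum, excursions), where excursions is a list
    of (sample_index, value) for every out-of-tolerance reading.
    """
    excursions = [(i, v) for i, v in enumerate(samples)
                  if abs(v - nominal) > tolerance]
    return min(samples), max(samples), excursions


# Simulated loop voltage with one brief dropout at sample 3.
log = [12.0, 12.1, 11.9, 3.2, 12.0]
print(record_signal(log, nominal=12.0, tolerance=0.5))
# (3.2, 12.1, [(3, 3.2)])
```

The recorded minimum and the flagged sample capture the glitch even if nobody was watching when it happened.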
This may be essential for proving what
happens first in a fast-acting system. Many fast systems
(especially shutdown "trip" systems) have a "first out"
monitoring capability to provide this kind of data.
Example 1: A turbine control
system shuts down automatically in response to an abnormal
condition. By the time a technician arrives at the scene to
survey the turbine's condition, however, everything is in a
"down" state and it's impossible to tell what signal or
condition was responsible for the initial shutdown, as all
operating parameters are now "abnormal."
What to do: One technician I knew
used a video camera to record the turbine control panel, so
he could see what happened (by indications on the gauges)
first in an automatic-shutdown event. Simply by looking at
the panel after the fact, there was no way to tell which
signal shut the turbine down, but the videotape playback
would show what happened in sequence, down to a
frame-by-frame time resolution.
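Once the recording has been reviewed, finding the "first out" condition is a matter of ordering the observed events in time. The sketch below assumes each event has been noted by the video frame in which the gauge first went abnormal; the indicator names and frame numbers are invented for illustration.

```python
def first_out(events):
    """Order abnormal indications by the frame in which they first appear.

    events -- list of (frame_number, indicator_name)
    Returns (ordered, earliest): all events in time order, and the names
    of the indicator(s) seen in the earliest frame. More than one name
    means the recording cannot resolve which came first.
    """
    ordered = sorted(events)
    first_frame = ordered[0][0]
    earliest = [name for frame, name in ordered if frame == first_frame]
    return ordered, earliest


observed = [
    (451, "low lube oil pressure"),
    (448, "high vibration"),
    (455, "overspeed"),
]
ordered, earliest = first_out(observed)
print(earliest)  # ['high vibration']
```

Two indications landing in the same frame is the limit of this method: events closer together than one frame period cannot be put in order from the recording alone.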
Example 2: An alarm system is
falsely triggering, and you suspect it may be due to a
specific wire connection going bad. Unfortunately, the
problem never manifests itself while you're watching it!
What to do: Many modern digital
multimeters are equipped with "record" settings, whereby
they can monitor a voltage, current, or resistance over time
and note whether that measurement deviates substantially
from a regular value. This is an invaluable tool for
diagnosing "intermittent" electronic system failures.