MotoDan
Joined: 30 Dec 2011 Posts: 55
|
Possible Baud Rate Tolerance Issue |
Posted: Wed Nov 19, 2014 2:51 pm |
|
|
I'm working on a first production run problem on a pair of boards (PIC16F1516) that communicate over a 6 ft cable at 19.2kb. Both are using the internal osc running at 2 MHz.
According to the datasheet, the baud rate error at this frequency calculates to be 0.16%. The trouble we're having is that a small number of board pairs will occasionally fail due to data reception errors. The data for both PICs decodes correctly on my scope - even when the transmission fails. This could just be due to the scope's wider acceptance of baud rate tolerance errors.
The PICs are all programmed by a 3rd party. I don't have any reason to believe that they would be altering OSCAL, so I assume the internal oscillators are within the +/- 8% @ 5V. I'm also not sure whether two PICs that happened to be running at opposite worst-case frequencies would cause the UARTs to fail to receive reliably.
The communication between the PICs is a simple ASCII fprintf/fgets affair consisting of 8 bytes (master) with a 3 byte slave reply. No CRC or other error checking is currently being used. I've looked at the failing data using a second (software) uart and PC connection and the value that is failing does not appear to be random like first suspected. Instead, the failing value is always the same.
I've noticed that the problem disappears when I change Fosc to 4 MHz on one of the PICs. The baud rate error at 4 MHz is the same 0.16% so on the surface it doesn't look like I'm improving anything by increasing Fosc.
I've tried heating up the PICs to see if I can change the failure rate, but so far, temperature doesn't seem to have an effect. I have also made some timing measurements on the UART data from both PICs, which looks to be reasonably close to the target 19.2kb. I have rs232 'errors' enabled, but am not currently testing for framing errors, etc.
I'm looking for any suggestions on how to verify that the failing data is actually due to a baud rate issue. My main concern is that the fix I end up with will be reliable and that the failure won't just reappear on another set of boards which are slightly different in some way (Fosc, etc). My suspicion is that the 2 MHz to 4 MHz change is solving the problem on the set of boards that I am troubleshooting, but may not work on another set of boards with oscillators running at slightly different frequencies. |
|
|
Mike Walne
Joined: 19 Feb 2004 Posts: 1785 Location: Boston Spa UK
|
|
Posted: Wed Nov 19, 2014 3:33 pm |
|
|
You don't tell us much about what else you're doing, so we're second guessing.
However, I encountered a similar problem many moons ago.
Two boards were communicating down a single wire.
So both boards had to turn round from TX to RX and vice versa.
The error rates were of the order of 1% so quite rare but not acceptable.
To diagnose the problem I looked at what data each board thought it was getting, rather than what my scope and other test gear saw.
What came out from the analysis was that the errors were being caused by a corruption of the leading edge of the first byte, rather than a baud-rate issue.
It's possible that your problem is being caused by some part of one board not reacting quickly enough to the incoming data.
Moving from 2MHz to 4MHz is allowing for a faster response and thus masking things.
Try experimenting with different baud & clock rates.
If your system is robust enough it should be able to tolerate timing errors of several % at both ends.
Mike |
|
|
MotoDan
Joined: 30 Dec 2011 Posts: 55
|
|
Posted: Wed Nov 19, 2014 4:01 pm |
|
|
Great information Mike. I failed to mention that these boards also communicate over a 1-wire connection. There is ample time (5 ms) between master/slave bus activity. You bring up a very interesting point about corruption at the start of the data. I'll have to take a look at what the PIC is receiving to see if I can tell whether this is what's happening or not.
Thanks for the reply! |
|
|
newguy
Joined: 24 Jun 2004 Posts: 1909
|
|
Posted: Wed Nov 19, 2014 4:06 pm |
|
|
Can you add a small deadtime between each transmitted packet? If you already have such a thing, how about a small deadtime between each transmitted character?
Another trick you could try is to take a page from simple RF transmissions, and preface all data transmissions with a series of superfluous synchronization bytes. Instead of master sending 8 bytes to slave, send 8 + x, where the first x bytes are your unique sync bytes. On the slave, alter your data reception routine/logic to automatically discard any number of the sync bytes and instead focus on the "start of transmission" character instead, then continue as you have been. Similarly add slave sync bytes to what the slave transmits. The assumption here is that the UART of either master or slave isn't properly delineating each transmitted character....the UART may think it saw a start of character when the other UART is actually in the middle of a character. |
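The receive side of the sync-byte idea above could be sketched roughly like this. It is a minimal illustration, not code from the thread: `START_BYTE`, `rx_state` and `rx_feed` are hypothetical names, the start marker value is an assumption, and the 8-byte payload length is taken from the original post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define START_BYTE  0x02u  /* hypothetical start-of-transmission marker (ASCII STX) */
#define PAYLOAD_LEN 8u     /* master message length mentioned in the thread */

typedef struct {
    bool    in_packet;             /* true once the start marker has been seen */
    size_t  count;                 /* payload bytes collected so far */
    uint8_t payload[PAYLOAD_LEN];
} rx_state;

/* Feed one received byte; returns true when a full payload has been collected. */
bool rx_feed(rx_state *s, uint8_t byte)
{
    assert(s != NULL);
    if (!s->in_packet) {
        /* Hunting: throw away sync bytes and any garbage until the marker arrives. */
        if (byte == START_BYTE) {
            s->in_packet = true;
            s->count = 0;
        }
        return false;
    }
    s->payload[s->count++] = byte;
    if (s->count == PAYLOAD_LEN) {
        s->in_packet = false;      /* go back to hunting for the next packet */
        return true;
    }
    return false;
}
```

Because everything before the marker is discarded, a sync byte that gets mangled during line turn-round costs nothing. This only works if the marker value can never appear among the sync bytes, which holds for a plain ASCII payload like the one described.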
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9246 Location: Greensville,Ontario
|
|
Posted: Wed Nov 19, 2014 5:36 pm |
|
|
Another area to look into is the power supply for the PICs. My rule of thumb is to have a PSU good for at least 5X the max you 'think' the PCB will draw AND be sure to have proper filtering. ANY 'unguarded' pin could easily let a 'glitch' or 'gremlin' in causing no end of grief and hair pulling. Also be sure you don't have any unused pins 'floating', use a pull resistor to ensure a good high.
Sounds like you've gone the 'economical' route of no xtal and 2 caps and I understand the penny saving idea but....it'd be interesting if your problem 'goes away' if you added them back in.
I know 6 feet of wire isn't a lot BUT it can easily become an antenna, pick up some EMI (cell phone, wireless modem, etc.) and the 'fun' begins.
Be sure to have a GOOD ground between the two PCBs.
I know most of this you've probably thought of, but sometimes it's the little detail you miss that comes back to bite you.
good luck
report back with what you find.
Jay |
|
|
gpsmikey
Joined: 16 Nov 2010 Posts: 588 Location: Kirkland, WA
|
|
Posted: Wed Nov 19, 2014 5:54 pm |
|
|
Don't forget that according to Murphy's law, tolerances will add up in the worst way possible (so one will be on the high end of the spec and the other will be on the low end of the spec). See if the ones that are giving errors are also giving errors if they are paired with a different board.
mikey _________________ mikey
-- you can't have too many gadgets or too much disk space !
old engineering saying: 1+1 = 3 for sufficiently large values of 1 or small values of 3 |
|
|
RF_Developer
Joined: 07 Feb 2011 Posts: 839
|
|
Posted: Thu Nov 20, 2014 4:41 am |
|
|
I too am still worried about timing, as well as line turn-round. With typical 8N1 async comms the acceptable end-to-end baud rate error is about 4%. It's not all that helpful to state it as +/-2% per device; it's better to consider it as the total error between the transmitter and the receiver.
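Where a figure of roughly 4% comes from can be sketched with a simple idealised model. This is an illustration only, not something from the datasheet. It assumes the receiver samples each bit at its centre, that the last bit which must be read correctly is the stop bit, and that the start edge is only located to within one oversampling tick. Noise, edge rise times and majority-vote sampling are ignored. `max_mismatch_pct` is a hypothetical name.

```c
#include <assert.h>

/* Largest tolerable transmitter/receiver baud mismatch, in percent,
 * under the idealised mid-bit sampling model described above. */
double max_mismatch_pct(int data_bits, int parity_bits, int oversample)
{
    assert(data_bits > 0 && parity_bits >= 0 && oversample > 0);
    /* Centre of the stop bit, in bit times after the start edge:
     * 1 start bit + data bits + parity bits + half a bit. */
    double last_sample = 1.0 + data_bits + parity_bits + 0.5;
    /* Half a bit of slack, minus one oversampling tick of start-edge uncertainty. */
    double margin = 0.5 - 1.0 / oversample;
    return 100.0 * margin / last_sample;
}
```

For 8N1 with 16x oversampling, `max_mismatch_pct(8, 0, 16)` comes out at about 4.6%, and that is the total budget shared by both ends. Any corruption of the start edge eats directly into the `margin` term.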
If devices use a non-baudrate related clock frequency such as 4MHz there will generally be a fixed (as in non-varying) error, as the baud rate generator simply can't precisely produce most or all of the standard baud rates. In the case of your PIC16F1516 at 2MHz the smallest that error can be is 0.16%. So, instead of producing 19200 baud, the best it can do, if configured optimally, is 19230 baud. It may be far worse than that: it could be -6.99%, or 17857 baud, depending on how the baud rate generator is set up. With CCS I assume it tries to set up for the lowest error, but does it? At 4MHz it's easier to set up, but still the minimum error is 0.16%.
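The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes the usual EUSART relationship, baud = Fosc / (prescale * (BRG + 1)), with a prescale of 4, 16 or 64 depending on the generator mode. `brg_best` and `brg_result` are hypothetical names, not part of any compiler library.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t brg;        /* value to load into the baud rate generator */
    double   actual;     /* baud rate actually produced */
    double   error_pct;  /* signed error relative to the requested rate */
} brg_result;

/* Pick the BRG value closest to the requested baud rate for a given prescale. */
brg_result brg_best(uint32_t fosc_hz, uint32_t baud, uint32_t prescale)
{
    assert(baud > 0 && prescale > 0);
    uint32_t step = prescale * baud;
    uint32_t div = (fosc_hz + step / 2) / step;   /* rounded value of BRG + 1 */
    if (div == 0) div = 1;

    brg_result r;
    r.brg = div - 1;
    r.actual = (double)fosc_hz / ((double)prescale * (double)div);
    r.error_pct = 100.0 * (r.actual - (double)baud) / (double)baud;
    return r;
}
```

With a 2 MHz clock and 19200 baud, a prescale of 4 gives BRG = 25, about 19230.8 baud and +0.16%. A prescale of 16 gives BRG = 6, about 17857.1 baud and -6.99%. At 4 MHz both prescales land on about 19230.8 baud again.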
That though only matters if the two ends of the link are different. If they are both PICs running at 2MHz set up the same way then there will be no end-to-end error due to baudrate generation: just the link happens to run at 17857 baud, or whatever.
If the PIC is talking to a PC (and by no means do all PCs and serial ports do better than PICs, indeed they can share the same baudrate generation limitations) then there's potentially more of a problem than with two PICs talking to each other. These days most serial ports are USB to serial converters. What clock are they running at? How do they generate their baudrate clock?
A bigger problem is clock generation. Baudrates are, of course, derived by division from some clock source and are totally dependent on that source. Internal oscillators on PICs are not great clock sources. They drift, especially with temperature and supply voltage. It's not so bad when an internally clocked device talks to a crystal controlled device - there's only one significant source of drift and clock error - but when two internally clocked devices talk there's twice the error, and a whole lot more chance of problems. What looks fine to a PC might not be interpreted correctly by another internally clocked PIC.
Oscilloscopes vary a lot in their timebase and time measurement accuracy, and even good ones need careful use to get anything like accurate measurements. Just because a scope thinks it's OK doesn't necessarily mean it really is.
The bottom line is that even at low baud rates, PIC internal oscillators are barely adequate for async comms under benign environmental conditions, and not good enough over wide temperature and supply ranges, and totally inadequate for CAN and USB and other comms standards that have tighter frequency tolerances.
With a half-duplex line there's also turnround issues as others have said. If a transmit driver is turned on only just before the start of the first character then there's likely to be problems. Async comms generally rely on accurately detecting the leading edge of the start bit. If that's corrupted then the margin for timing error drops significantly. That's the point of sync characters before packets/message: to give a clean "run-in". There should be more than one if there is any chance the first may get corrupted by turn-round issues.
Also, running straight "TTL" (in fact almost no one uses TTL levels these days, with their inherently poor noise margins. PICs effectively have HCMOS levels) over even a few feet of cable can have a lot of problems. Noise and ground differences can wreak havoc on signals. There are good reasons for all the data transmission standards that have been developed over the years, all of them better than the raw signals produced by PICs. I never use raw logic levels to transfer data in my systems. But then, I do usually have potentially large levels of RF floating around to worry about, and I can't take grounds/chassis potential for granted as they have both large return currents AND RF! I have adopted CAN as my preferred intra-system comms standard due to its inherent robustness.
So, what's causing your problems? I don't know, it could well be a combination of turnround timing and clock drift. This looks likely as some pairs work and some don't - that says timing error to me. Personally, I'd not try to run async comms off the PIC's internal clock. I did once for a prototype on a tiny board, as the space taken up by a crystal or ceramic resonator was not acceptable. Even with a non-standard baud rate to eliminate clock gen error, it would NOT have worked reliably over the required temperature range of the final product. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19551
|
|
Posted: Thu Nov 20, 2014 4:53 am |
|
|
I wonder about one thing. Still timing related.
This PIC says it uses a clock at 16* the baud rate for its internal UART sampling. At 4MHz, this can be generated directly by division from the oscillator, but at 2MHz, this frequency is not achievable directly. So it'll probably have to generate a half cycle division. Depending on the symmetry of the clock, this can produce yet another slight timing error. Add this to the tolerance between the oscillators (already potentially out of spec), and things are getting worse....
Given that one doesn't actually 'care' about the real baud rate (no other devices than the PICs involved), why not think of some way of knowing if the packet has failed, and if it does, enable the ABDEN bit in the USART, and let it recalculate the best fit timing? |
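One cheap way of "knowing if the packet has failed" is to append a checksum byte. The sketch below is only an illustration: the thread states that no CRC or other check is currently used, and `checksum8` and `packet_ok` are hypothetical names. A simple 8-bit sum will not catch every error pattern, so a CRC would be the stronger choice if code space allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Two's-complement checksum: payload bytes plus the checksum sum to zero (mod 256). */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return (uint8_t)(0u - sum);
}

/* Validate a received packet whose last byte is the checksum. */
bool packet_ok(const uint8_t *packet, size_t len_with_checksum)
{
    assert(packet != NULL && len_with_checksum >= 1);
    uint8_t sum = 0;
    for (size_t i = 0; i < len_with_checksum; i++) {
        sum = (uint8_t)(sum + packet[i]);
    }
    return sum == 0;
}
```

The sender would transmit its 8 bytes followed by `checksum8(payload, 8)`. A receiver that sees `packet_ok` return false could then request a retry, or trigger whatever baud rate recalibration is chosen.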
|
|
Mike Walne
Joined: 19 Feb 2004 Posts: 1785 Location: Boston Spa UK
|
|
Posted: Thu Nov 20, 2014 5:38 am |
|
|
There are still too many loose ends here you're not telling us about.
How are the PICs connected, directly, via ttl buffer, ttl to RS232 converters, ttl to 485 converters, whatever?
How are you achieving the turn round?
How are you measuring the actual baud/oscillator rates?
What is the accuracy of your measurements?
Mike |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9246 Location: Greensville,Ontario
|
|
Posted: Thu Nov 20, 2014 6:31 am |
|
|
Just looked at the datasheet and have a few questions
1) Why 4MHz operation when it can do 16MHz? Usually most programmers go for the max.
1a) If you try at the higher clock does the problem still exist ?
2) PIC has 'autobaud' capacity. Is that turned on? Confirm by the listing and see the config bits, never assume the 'defaults' are right!
3) Any chance the PIC is in 'sleep' mode? Potentially a problem..'lazy PIC itis'.
4) How is your 'one wire' hardware and software done? If the pullup on the 'bus' is weak, the signal will be 'lazy' or slow. If you show us your 'one wire' code, someone might see 'there's the problem'...
5) WDT enabled ? Maybe a red herring but be nice to know.
That PIC has a lot of nice features but since the problem only appears to be a few boards maybe they're not clean?
Please confirm, you ARE using the hardware UART and running a good 5 volt power supply.
hth
jay |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19551
|
|
Posted: Thu Nov 20, 2014 7:06 am |
|
|
Just a comment, the AutoBaud on that chip is not something you want on, _unless_ you are adjusting the rate. The CCS code will not turn it on.
It works by switching the internal clocks so that it 'counts' the next incoming character, and works out what count value should be loaded in the BRG to give the best result. You have to set it, have a character sent by the other end, then read the time recorded, subtract one, and write it back (it records the division count required, but the BRG divides by value+1).
It definitely would be the way to adjust the BRG to get the best results, though it too would like a higher clock rate (at the existing divisor of /26, the adjustments would be /25 or /27 -> nearly 4%, while at a higher clock the adjustment becomes finer). |
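The granularity point can be put in numbers with a short sketch. It assumes the divide-by-4 generator mode, so the divisor is Fosc / (4 * baud), and it treats one count of the divisor as roughly a 100/divisor percent change in baud rate. `brg_step_pct` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate percent change in baud rate when the divisor (BRG value + 1)
 * moves by one count, for the divisor nearest to the requested baud rate. */
double brg_step_pct(uint32_t fosc_hz, uint32_t baud, uint32_t prescale)
{
    assert(fosc_hz > 0 && baud > 0 && prescale > 0);
    uint32_t step = prescale * baud;
    uint32_t div = (fosc_hz + step / 2) / step;   /* rounded value of BRG + 1 */
    if (div == 0) div = 1;
    return 100.0 / (double)div;
}
```

At 2 MHz and 19200 baud the divisor is 26, so each count is worth about 3.8%. At 16 MHz the divisor is 208 and each count is worth about 0.48%, which is why any trimming scheme is much happier with a faster clock.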
|
|
MotoDan
Joined: 30 Dec 2011 Posts: 55
|
|
Posted: Wed Dec 03, 2014 4:50 pm |
|
|
Thanks for all of the insightful input. Really appreciate everyone's efforts.
Here's a little more info on what's going on. Just got another set of boards from the mfgr in China and this time the UART output from the Master unit is not correct. The Remote unit is not responding as before, but now the UART decoder on my Rigol scope is not seeing the correct bytes.
Suspecting an error in Fosc, I decided to skew the '#use delay' statement to see if it had an effect on the UART output. I first went to 1.9 MHz, which made the decoding worse. I then went to 2.1 MHz, which fixed the problem. The scope is decoding the correct data and, more importantly, the Remote is too.
To the question about whether I'm using the hardware UART - yes. UART1 is specified in the #use RS232 statement.
I'm assuming the reason that increasing the delay statement to 2.1 MHz works is that the compiler adjusts the baud rate generator values, which then allows the UART to transmit/receive at the correct rate of 19.2kb.
Another question was related to why 2 MHz instead of 16 MHz. The reason I'm using 2 MHz is to reduce current consumption as this is a battery-powered device. Also, to your question about a clean supply, these devices are running at 5V via a switching supply. The output was measured at 5V with about 25 mV ripple so it's pretty clean. The switcher also has ample current to run this circuitry.
These boards use the CLKOUT which produces Fosc/4 at an output pin. I changed the internal osc to 16 MHz (for this test) and measured the frequency with an accurate counter to be 4.15 MHz which correlates to an Fosc of 16.6 MHz. This is only about 4% high which is within the +/- 6.5% (over temp) spec. These boards are all at room temp so I would think the factory calibration would be much closer than this.
So at this point my assumption is that the internal 16 MHz osc is in tolerance, but slightly high. This makes me wonder if perhaps the factory is using PIC knock-offs. The other possibility might be that the factory calibration of the internal osc has somehow been altered. Either way, there is no way I can achieve the +/- 2.5% baud rate tolerance (that others have cited) when the Fosc is 4% high.
The possible solutions that I'm coming up with are: 1) switch to an ext osc with a frequency - preferably one that is a multiple of the 19.2kb rate, 2) perform a software calibration based on Fosc/4 which corrects the BRG values, or 3) select a custom baud rate that is a 1:1 multiple at 2 MHz. The latter wouldn't make much difference since the 19.2kb rate is only 0.16% off at 2 MHz. |
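Option 2 can be sketched as follows, using the measurement quoted above (4.15 MHz on CLKOUT, i.e. an Fosc of 16.6 MHz at the 16 MHz setting). This is a rough illustration under two assumptions that are not confirmed in the thread: that the 2 MHz clock is divided down from the same internal 16 MHz source, so the same percentage error carries over, and that the baud rate generator runs in divide-by-4 mode. `estimate_fosc` and `calibrated_brg` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a nominal Fosc by the ratio measured at the 16 MHz setting
 * (assumes both clocks derive from the same internal source). */
uint32_t estimate_fosc(uint32_t nominal_hz, uint32_t measured_16m_hz)
{
    assert(nominal_hz > 0 && measured_16m_hz > 0);
    uint64_t scaled = (uint64_t)nominal_hz * measured_16m_hz;
    return (uint32_t)((scaled + 8000000u) / 16000000u);   /* rounded */
}

/* BRG value for the requested baud rate, divide-by-4 mode assumed. */
uint32_t calibrated_brg(uint32_t actual_fosc_hz, uint32_t baud)
{
    assert(baud > 0);
    uint32_t step = 4u * baud;
    uint32_t div = (actual_fosc_hz + step / 2) / step;    /* rounded BRG + 1 */
    return div > 0 ? div - 1 : 0;
}
```

For this board the estimate is 2.075 MHz, which calls for BRG = 26 instead of the nominal 25. That gives about 19213 baud (+0.07%) rather than about 19952 baud (+3.9%). Under the same assumptions it is also consistent with the '#use delay' experiment: declaring 2.1 MHz would round to a divisor of 27, the same as the calibrated value, while declaring 1.9 MHz would round to 25 and push the real rate to about 20750 baud, roughly 8% high.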
|
|
PCM programmer
Joined: 06 Sep 2003 Posts: 21708
|
|
Posted: Wed Dec 03, 2014 5:58 pm |
|
|
Quote: |
1) switch to an ext osc with a frequency - preferably one that is a multiple
of the 19.2kb rate.
|
This means re-doing the layout or re-working existing boards.
Quote: |
2) perform a software calibration based on Fosc/4 which corrects the BRG values.
|
This means every board must be individually calibrated against an
accurate frequency reference. I assume you don't have one on the
board, so it must be done in the lab or possibly could be coded to
be done in production/test.
Quote: | 3) select a custom baud rate that is a 1:1 multiple at 2 MHz
|
Given that each board can be +/- 6.5% of the nominal 2 MHz, how will
a custom baud rate solve anything ? You can't trust that all boards will
be 4% high. Some might be 4% low. The data sheet says +/- 6.5% over
a small temperature and voltage range, and +/- 8% overall.
Microchip's QA test might pass any PIC that is +/- 4%. Given production
tolerances, then any PIC shipped would meet their overall spec.
Your assumption is that Microchip is aiming for a +/- 0.1 % spec and
just accidentally shipped some PICs with the oscillator 4% high.
I don't think so, given the +/- 8% spec in the data sheet.
4) Possibly switch to a pin-compatible PIC that has a +/- 2% internal oscillator. Using this tool,
http://www.microchip.com/ParamChartSearch/chart.aspx?branchID=1002&mid=10&lang=en&pageId=74
and selecting 28 pins and 512 RAM bytes, I get:
16F1713
16F1783
16LF1906
16F1936
The 19xx series are specialized LCD PICs so maybe just look at the
first two. |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9246 Location: Greensville,Ontario
|
|
Posted: Wed Dec 03, 2014 8:30 pm |
|
|
Sorry you've found out the hard way that the int osc isn't too good for serial communications(BTDT). It's also not good for other 'timer' related operations.
Any chance you can add a xtal/2caps AND use a bigger battery ? Using the xtal will give you rock solid communications, and well, a bigger battery isn't that much money or size these days. Maybe add a supercap to reduce peak demands?
Another possible thing to try is to reduce from 19k2 down to say 9600. Slower speeds are more 'forgiving'. Heck I run at 24 Baud (yes, 24 bits per second) and never have 'issues'.
Not too sure what else to suggest...
Jay |
|
|