RossJ
Joined: 25 Aug 2004 Posts: 66
Stack depth is unexpectedly high
Posted: Mon Dec 16, 2013 7:31 pm |
Hello,
I am looking into why my program compiles so differently for debug and release builds.
Code:
                      DEBUG     RELEASE
Compile time          30 sec    180 sec
ROM used              81%       91%
Inlined functions     49        493
Stack used (main)     24        19
Stack used (ints)     6         8
There are some very small differences in the code between the two builds, which appear to be forcing the compiler to inline more functions in the release version. This takes more time and more ROM.
I decided to try to determine where the function nesting is occurring so that I could alter the code structure a bit and reduce the need for the compiler to inline functions to stay under the 31-level stack limit. My problem is that the .tre file only shows a maximum depth of 13 functions from main(), and it's the same for both builds. Why would the compiler think there is a greater depth, and do massive amounts of inlining? Or perhaps something else is going on here...
I'm building for PIC18F87J11 with PCH 5.016 in single file mode.
Thanks, Ross
ckielstra
Joined: 18 Mar 2004 Posts: 3680 Location: The Netherlands
Posted: Tue Dec 17, 2013 2:27 am |
The difference between the Debug and Release versions is because in the Release version the compiler applies more aggressive optimization for both speed and size. See the compiler settings: a higher level means more optimization. Too bad CCS doesn't explain the differences between the optimization levels. Often level 9 is chosen.
Aren't you looking into a problem that isn't a problem at all?
Stack depth isn't in the critical zone yet. Your debug version is using 24 of the 31 available levels. Considering that you are already using 81% of the available ROM, stack usage is unlikely to grow much higher.
Quote: "This takes more time and more ROM."
Here you make a mistake.
Inlined functions do take more ROM because the same function is created multiple times in code space. But execution is faster, as there is no call and return overhead.
If this is a new product then I would be worrying more about the high memory usage of 81% / 91%. For a new product you can expect more features to be added in the following years. As a rule of thumb I like to ship a first product version with at most 60% ROM usage. With higher ROM usage you had better consider choosing the next larger processor with more memory.
Ttelmah
Joined: 11 Mar 2010 Posts: 19589
Posted: Tue Dec 17, 2013 2:51 am |
The latest compilers primarily optimise for speed on the release code, which is why V5 compilers often give larger ROM sizes than the older compilers.
There is a new optimisation command #OPT COMPRESS, which instead makes it try to optimise for minimum size (very aggressive...).
I'd worry about the stack if you were within a couple of levels, not when it is half empty. As Ckielstra says, worry more about the ROM size. Look for any 'low priority' functions (things where you don't care about speed) which are being inlined, and then declare these as #separate. This will increase your stack usage, but decrease ROM usage.
Remember that 'simple' things like arithmetic will often involve several call levels. I don't think the compiler shows these in the .tre file. That may change if you remove the #nolist option.
As a comment, I think you may be getting misled about what you are seeing. The 'main' figure given by the compiler, _includes_ the potential interrupt calls inside the main.
So the 19 levels shown is only half way up the stack.
Look at the figure at the top of the lst file, which is the most informative one.
Best Wishes
RossJ
Joined: 25 Aug 2004 Posts: 66
Posted: Thu Jan 02, 2014 11:13 pm |
Hi guys,
Thanks for your comments and my apologies for taking so long to respond. I have since done further investigation on this and believe I have the issue sorted. I have also realised that I omitted a couple of details from my original post which may have influenced your responses. So below are some additional points of relevance and some of my conclusions.
1. I am not employing any CCS feature when building explicitly for debug or release, except for using #fuses DEBUG which doesn't impact the build other than setting the appropriate fuse bit. There are a couple of small code differences controlled through pre-processor conditions. This seems to be what's triggering the differences in compiler behavior as the release code introduces a function which uses several additional stack levels.
2. The aforementioned 'release only' function is actually part of an error handler which always leads to a restart. I had attempted to remove this code from the call tree by locating it at a fixed address, resetting STKPTR on entry and using goto_address() to reach it. The problem with this is that the compiler treats the isolated function in the same way as it treats interrupt routines. Thus all of its stack levels become an overhead across the entire program, and not just from the point at which they are needed. This is why 'Stack used (ints)' is so high (8).
3. The same optimisation level is used for debug and release in the table quoted earlier (#opt 8). I normally use 9, but had changed to 8 due to a bug in 5.013. I'm now back to using 9.
4. The debug build quoted above uses 24 + 6 = 30 of 31 stack levels. The release build added another 5 or so levels, which forced the compiler to inline many functions. That led to the increased ROM size (as expected) and a massive increase in compile time (from 30 sec to 3 min).
5. I have now refactored the error handling code so that all work is done following the restart (instead of prior to it). My program now compiles with 68% ROM and 24+3 stack levels. I also tried the #opt compress option you mentioned; it reduces ROM usage to 63%, but stack usage rises to 26+3. The compiler is not performing inlining to free up stack in either case. The code is 'reasonably' mature so I am not too concerned about this ROM usage. The next PIC up is a PIC24.
FINALLY SOME OBSERVATIONS ABOUT THE COMPILER...
1. The .tre file is actually compressed. Any function which is called multiple times is only included once, with subsequent occurrences replaced by a *. This misled me when I was trying to determine which parts of my code were at peak stack depth. I wrote a small Java utility to expand the .tre file, and that produced a tree consistent with the summary at the top of the .lst file. I don't think the compiler always did this. Maybe it should be optional...
2. Optimisation doesn't appear to affect the .tre file. Compiler-generated functions (arithmetic, delay, sprintf etc.) are shown in the .tre file.
3. When the compiler inlines a function, it is still shown in the .tre file but marked as such. Inlined functions are not counted toward the stack usage.
4. When the compiler replaces CALL with GOTO/BRA to enter a function, as it does when a function is only called once, this still counts as a stack level!!! The compiler counts it towards the summary at the top of the .lst file and may begin inlining functions early due to a perceived shortage of stack. In fact, I was able to create a test program in which the compiler inlined many functions because it 'ran out' of stack, and yet there were no CALLs at all in the compiled code. Given that the main purpose of replacing CALL with GOTO/BRA has to be saving stack, this must be a bug?
5. When a program is using too many stack levels (main + ints > available), the compiler automatically inlines functions to reduce stack usage to within hardware constraints. This step occurs after the individual files and main are compiled, and can take considerable time. The PCH GUI window is also non-responsive during this activity.
Cheers, Ross.