Thursday, January 20, 2011

Thoughts following 2010 FogBugz and Kiln World Tour





Yesterday I attended Joel Spolsky's world tour for FogBugz and Kiln.


I must admit that until yesterday I thought that Joel had a great blog about SW development, but I didn't quite understand who needs yet another bug tracking tool... is there still a real market for that, when there are so many good free tools today!?


A few thoughts following the event:



  1. Joel's presentation skills are no less impressive than his writing. The gathering started with a projected analog countdown clock, and Joel went on stage right at the moment the countdown reached its end. A pre-planned bug at the end of the countdown produced some fake errors and blue screens, which led Joel to open a bug and start his presentation. A response to his newly opened bug then "popped up accidentally" in the middle of the presentation (interestingly enough, while his Outlook was closed :-) ), which led him into the code itself using Kiln, comparing versions and eventually "solving" the bug and checking in. A full bug detection and solving cycle in 10 minutes, with a few jokes here and there, while going quickly through the products' abilities and strong points.


  2. The traditional tools for bug tracking and source control are OK. But even in this "already solved" domain, there is still room for improvement and for new players, either open source or commercial. When we have an idea to develop something, we usually tend to check who has already done it and how good it is. And when we see that there are already several reasonable solutions, we assume that the market is closed to us. Well, Joel shows that there is always a market for really good products. You don't have to invent a new big thing; you may, however, need to invent many small things.


  3. Distributed source version control - Git, Mercurial, Kiln - decouples the developer's own copy from the main repository, while preserving full history and repository context. This concept has several significant advantages:
    (a) You do not postpone your check-ins for fear of hurting the main repository: check-ins are made into your private copy. When you are ready, you push your copy back to the main repository. Your check-ins preserve full history during your development! This is also important when you perform a merge and realize that your original file was overridden by a wrong change - you don't have to worry, as you have your full history at hand and you can get back to your last check-in.
    (b) You can push your developments to another repository, e.g. to a QA repository, thus allowing relevant fixes to be pushed quickly, without releasing them yet to the development repository.
    (c) Upon merge, it's easier to see the exact origin of each change. When managing several customer releases, you can see which changes in the main release were merged into the customer release and which were not.

Thursday, January 13, 2011

Online IDE for almost any SW lang you can think of

Take a look at this one: http://ideone.com/


It supports:
Ada, Assembler, AWK, Bash, bc, Brainf**k, C, C#, C++ se, C++0x, C99 strict, CLIPS, Clojure, COBOL, COBOL 85, Common Lisp (clisp), D (dmd), Erlang, F#, Factor, Falcon, Forth, Fortran, Go, Groovy, Haskell, Icon, Intercal, Java, JavaScript (rhino), JavaScript (spidermonkey), Lua, Nemerle, Nice, Nimrod, Objective-C, Ocaml, Oz, Pascal (fpc), Pascal (gpc), Perl, Perl 6, PHP, Pike, Prolog (gnu), Prolog (swi), Python, Python 3, R, Ruby, Scala, Scheme (guile), Smalltalk, SQL, Tcl, Text, Unlambda, Visual Basic .NET, Whitespace


It's very useful when you want to check little programs, like the ones I tried when writing Freaking behavior of a small little C/C++ bug:

[1]


[2]

Freaking behavior of a small little C/C++ bug

Oh boy.
Read the event and its root cause through to the end. Important morals follow below.





We run systems that, during high-capacity events, handle thousands of transactions per second. One of the heaviest-traffic periods is New Year's Eve, the 31st of December, when most of our systems around the world are under heavy stress, stress that tends to diffuse to our support teams. Structured and strict preparations usually get us through this heavy-traffic day properly in most, if not all, sites. Happily, that was the case this year as well.

Shockingly, on January 2nd we had a crash in two sites.

Analyzing the crash led to a timer that, instead of re-scheduling itself every 5 seconds, kept firing abruptly at intervals of milliseconds.

While we were still analyzing the case and reproducing it in our labs, the problem vanished as suddenly as it had appeared, at the end of the same day. On January 3rd, 00:00, the systems went back to behaving nicely.

That's really odd. How does the bug relate to the date? Is it a coincidence? It doesn't look so, as the problem disappears a second after midnight. Trying to reproduce it in the lab, we got the same behavior: it is the bug of January 2nd, 2011. (By the way, when running the system in our labs in debug mode, the problem didn't reproduce! The bug appears only when running without debug! That's common for memory-related bugs, memory smears etc.)

To some of us, it sounded like the iPhone alarm bug: the alarm was also reported not to work properly at the start of 2011, and to fix itself by January 3rd.

http://www.tipb.com/2010/12/31/iphone-bugs-alarms-working-2011/


Maybe it's the same bug?

The iPhone runs iOS, which is Unix-based (Darwin rather than Linux, strictly speaking), and we run on Linux. Maybe there is something with Unix timers at the beginning of 2011?
Looking in this direction led to nothing.

On the other hand, analytical investigation led to the following:

  1. The timer, when it wakes up, calls our callback function. The callback function must return an int value. Any value except 1 says "OK"; 1 says "please call me again".
  2. Our callback function didn't return a value at all.
Wait... - is it legal not to return a value from a non-void function?
Unfortunately, in C/C++ it compiles, and the behavior is undefined. The caller still reads a return value; in some environments it will be whatever was last left in the return register. And, well, occasionally it can be 1.
See:
http://stackoverflow.com/questions/1610030/why-can-you-return-from-a-non-void-function-without-returning-a-value-without-pro/1610454#1610454
http://stackoverflow.com/questions/2598084/function-with-missing-return-value-behavior-at-runtime
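
To make the failure mode concrete, here is a minimal sketch in C of the callback contract described above. All names here (timer_callback, on_tick, run_timer, the TIMER_* constants) are hypothetical and are not our real code; run_timer is only a toy stand-in for the real timer. The buggy shape appears only in a comment, so the sketch itself has no undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contract, mirroring the one described above:
   1 means "please call me again", any other value means "OK, stop". */
enum { TIMER_STOP = 0, TIMER_RESCHEDULE = 1 };

typedef int (*timer_callback)(void *ctx);

/* Buggy shape (kept as a comment on purpose): control reaches the end of a
   non-void function, so the caller reads an indeterminate value.

   int on_tick_buggy(void *ctx) { int *remaining = ctx; (*remaining)--; }
*/

/* Fixed callback: every path returns an explicit value. */
int on_tick(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
    }
    return (*remaining > 0) ? TIMER_RESCHEDULE : TIMER_STOP;
}

/* Toy stand-in for the timer: keeps calling cb while it asks to be
   rescheduled (up to max_calls) and returns the number of invocations. */
int run_timer(timer_callback cb, void *ctx, int max_calls)
{
    int calls = 0;
    assert(cb != NULL);
    while (calls < max_calls) {
        calls++;
        if (cb(ctx) != TIMER_RESCHEDULE) {
            break;
        }
    }
    return calls;
}
```

Calling run_timer(on_tick, &remaining, 100) with remaining set to 3 invokes the callback three times and then stops. If the commented-out version is compiled in, gcc with -Wall reports "control reaches end of non-void function".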
What shall be done?
Read:
http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
-Wreturn-type
Warn whenever a function is defined with a return-type that defaults to int. Also warn about any return statement with no return-value in a function whose return-type is not void (falling off the end of the function body is considered returning without a value), and about a return statement with an expression in a function whose return-type is void. For C++, a function without return type always produces a diagnostic message, even when -Wno-return-type is specified. The only exceptions are `main' and functions defined in system headers. This warning is enabled by -Wall.

Morals
  • Listen to compiler warnings!
    Solve all warnings; you should have a zero-warnings policy.
    The problem above could have been caught and solved as a warning (-Wreturn-type).
  • If you don't keep a policy of zero warnings, which you should, turn bad warnings like the one above into errors with a compilation flag, e.g.: -Werror=return-type
  • You may want to test your software in future time. For example, have a test system that constantly runs 30 days ahead; if there is a time-related bug, it may help catch it in time. It probably won't catch everything, but it could have caught the problem we had above!
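
One possible way to run a test system "in the future" is to route every time query through a single function that adds a configurable offset. The sketch below is only an illustration under assumptions: the names shift_time, offset_days_from_env and app_now, and the environment variable APP_TIME_OFFSET_DAYS, are all hypothetical, and it assumes a POSIX-style time_t that counts seconds.

```c
#include <stdlib.h>
#include <time.h>

#define SECONDS_PER_DAY 86400L

/* Pure helper: shift a timestamp by a whole number of days.
   Assumes time_t counts seconds, as on POSIX systems. */
time_t shift_time(time_t real_now, long offset_days)
{
    return real_now + (time_t)(offset_days * SECONDS_PER_DAY);
}

/* Read the offset from a hypothetical environment variable;
   an unset variable means "no shift". */
long offset_days_from_env(void)
{
    const char *value = getenv("APP_TIME_OFFSET_DAYS");
    return value ? strtol(value, NULL, 10) : 0;
}

/* The application calls this instead of time(NULL) everywhere. */
time_t app_now(void)
{
    return shift_time(time(NULL), offset_days_from_env());
}
```

With APP_TIME_OFFSET_DAYS=30 set on the test system only, app_now() there reports a time 30 days ahead, while production, where the variable is unset, sees the real clock. This only helps for code that actually goes through app_now(); timers driven directly by the OS clock are not affected.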