Tuesday, 17 November 2009

Lessons from the Recent Deadlock Issues

I usually don't do this on this blog... I keep my own lessons in my private blog diary, but this one has become critical enough to be worth sharing. The recent deadlock issues really, really tested my patience... they started on 30th October and finally seem to have ended today. But before I lose track of what we did, how we did it, and what we should take care of, here are my lessons from the issue:

1. A bug should be treated as immutable - it is either removed or created, never fixed.
Every time a bug is created in an application it causes a lot of issues. People say, "We fixed the bug." The truth is that every time a bug is said to be fixed, either the same one has been removed or another one has been created. Never assume that a fix has actually fixed the underlying problem.

2. List the problems, don't fix them yet
We usually make the mistake of finding one problem and starting to fix it right away. A better approach is to build a long list of problems in 3 categories:
1. Actual Problems
2. Potential Problems
3. Past problems in the same area

Prioritize the list with 3 different sets of people: QA, Dev and Project Management. Agree on the list and then start working through it. In the meantime you may get ad hoc requests to jump to one area or another. Do not move around unless you have convincing inputs. Fix one problem at a time and make sure the fix doesn't break anything else.

3. Pick the one you think is obvious and dig deeper into the non-obvious
When you investigate tricky issues, you end up hunting for big problems... but more often than not, the answer to a big problem is a very small fix. I remember one of the alerting issues we had in the past came down to a typo, and for 2 years 3 hardcore developers could not get the fix working.
When you step into a problem area, make sure you also cover the linked non-obvious areas... there may be something very silly in there.

4. Never bring only experts onto the issue
Experts tend to focus only on technical directions. Many times they have a rigid view of how things work and don't want to move away from it to find problems. 3 times during this deadlock issue, non-experts came in very handy:
1st, when they traced that downloads block... by accident, by luck, or by hawk eye
2nd, when they found that uploads take time when done in parallel
3rd, when somebody not even on the system suggested looking at areas that are used in conjunction but are only loosely linked
All 3 times we got good traces and inputs to move further and find out what's going on.

5. Analyze, List, Analyze, Detail, Analyze, Fix, Analyze, Test, Analyze, Analyze the Analyzed
Follow it hard... whatever you do, just analyze and re-analyze.

6. Trust your Findings, Analysis, Team, Fix, Test and Release
Trust whatever you do. Stand by it. Make sure it succeeds.

7. Nothing can be found and fixed in 1 go
It never can be. Make sure you find, remove, and keep finding.

8. There is never an end to a problem.
It will come back. Very soon. Be prepared.

9. Don't lose track of what you have done.
List everything out. Put it in a checklist and tick items on and off. Revisit them.

10. Wiki your approach and the tools and technology you used to fix it
Document your approach.

11. Spend hours and hours looking at what's going on, minutes looking at the exact problem area
It may give you more ideas on where exactly to look for the problem.

12. Leverage the dependency factors, isolate what is not common
Add flexibility to your analysis.

13. Put a Team on the issue, not individuals.
Individuals focus on certain areas based on their likings and feelings. Bring in a Team that can drive discussions in the right direction. Add Team members who can add value to the discussions.

14. Group to find it, Individualize to analyze, Pair to fix, Group to Test
Bring in groups to find the issues, since it helps to have different people looking. Have individuals analyze them. Have developers pair up to fix them, as pairs can be more effective. Bring the group back to test.

15. Revisit what is done earlier
Go back and see what has been done so far and what has been done in the past.

16. Track everything you do to resolve the issue
Keep it in a form you can refer to later.

17. Buy lunch for those who fixed it
Finally!!!!