Epic Bitter: January 2015

Thursday, January 29, 2015

What do users want?

One big challenge when dealing with user support is time management. Years ago I read a book called Time Management for System Administrators by Thomas Limoncelli. It was a great book at the time because it opened my eyes to a concept of huge importance that I only came to fully understand when I started to study TPS and Lean.
In this book the author insists that it is the system administrator's responsibility to talk to users and gather their requests. He called it The Check-in-with-Customers Walk-around, and his idea was inspired by a coworker who used to visit and talk with all users every day.

When I first started to use this idea I was the only tech support person in an office of around 60 people. After a few weeks, the first thing I noticed was the amount of time I had freed up. I was really surprised by that, so I started to wonder: how could I have more free time if I was actually spending more time talking to people?

After putting some thought into it, I realized that learning about issues beforehand, when I came to talk with the users, allowed me to better organize my work and priorities. In the end, I had lots of free time for things like research, improvements and playing video games, while the work was all done and users were happy.

People only look for solutions and help when they actually need them. It is normal for people to be focused on other things and only realize that they need your help when the issue becomes urgent. When it becomes urgent, you have little time to deal with it, and in most cases you will also have to handle the user's frustration. By talking to them on a regular basis you reach their needs before they become problems. This gives you more time to solve each one. It also allows you to organize your work more optimally, resulting in free time to invest in other activities.

After doing this for years I learned a few tricks that really help in gathering the right information from users. For instance, if you just walk by and ask "is everything ok?", the reply will most likely be a nod and "it's all good". The reason is that they are not focused on your question. You need to break their focus on what they are doing and get them to focus on your question. When I visit tables nowadays, I first ask how everything is. Most times people will say it is fine. Then I start to get specific. How is the internet? We upgraded the link, did you notice a difference? How are the meeting rooms? How are your laptops? Does your project need anything? Usually, after a few of these questions, something will surface. The questions lead them to really think about it, making it easier to remember what they need. I take notes on every request and they all go to my backlog. This is the main way we collect requirements.

After doing this consistently for some time, users will get used to it. Many times I saw them organizing their needs so they could report them during the walks we perform in the office. It also discourages them from looking for a solution by themselves. In a classical ticket system, where users need to send emails or call a service desk, it is not unusual for users to try to solve issues by themselves instead of going through the bureaucratic process of opening a ticket. This can have many negative impacts on our work:

Sometimes user solutions ignore important aspects such as cost, security, monitoring and availability.
You will never know for sure the real demand you have, since some problems are solved by the users themselves.
Many times those solutions do not follow standards, but the users will still ask you to help with their solution whenever they have an issue with it.
They are spending time on an activity that is not their focus. This could have a business impact on the project they are working on.

Another element behind this practice is that it brings the clients and the support team closer together. Email addresses and phone numbers are too distant, and many times the user would like to have a face associated with the issue. It is psychological, but most people would rather know the person who is helping them. If they can have the person's contact number, even better. Most times it will not come to the point where people actually call you, but it helps to increase confidence and the sense of security.

Here in our company we started with the walks once a week and now we are increasing to twice a week (Tuesdays and Fridays).

Also you can assign different members of your team to collect the requirements from different teams in your company. This can be used in larger offices and whenever you have remote clients.

It is not important who will solve the issue, but it is important who is responsible for gathering it and keeping track of its status. This can simply be done during the daily standup meeting.

Friday, January 23, 2015

Cards, metrics and how to deal with them.

Organizing workflow is not a simple task. Your goal is to have a system that provides you with valid metrics, works on pull, has a continuous improvement process built in, and focuses on quality and on heijunka (even flow). Not a simple task.
It is strange to start writing a post about it when we have not yet totally achieved it. We are introducing small changes continuously to ensure that the results we get are the ones we expect. So this is a post about where we want to go, how we plan to get there and what we have done so far.

Before we can proceed with the analysis of the work itself, we need to understand the statuses that work can have:

Backlog
Working
Blocked
Done

Backlog is the queue where tasks wait until someone can start to work on them.
Working is the status where the team is performing activities related to that card.
Blocked is when an activity is not finished, but the team is waiting for an external party in order to either resume the work or move it to Done.
Done is the status when all the work related to that activity is finished.

It is truly important to understand what kind of work you and your team do. Different types of work require different approaches, and knowing them lets you define more precise processes for how to perform each type and especially for how the types relate to each other. In our infrastructure/support work, we ended up with three classifications for work:

Fix
BAU - Business as usual.
Improvement

Fix is for any system that was working and is no longer functional.
This type of activity never goes into the backlog, skipping directly into Working. This means that whenever a Fix becomes necessary, the resource required to deal with it needs to be freed, and whatever work that resource was performing goes into the Blocked state with a special flag identifying why the task is currently blocked.
Because a Fix means a disrupted service, the service needs to be restored as quickly as possible. Once the service is resumed, the card can be moved to Done, but additional steps become necessary. Every Fix occurrence generates a new card in the Improvement Backlog. This card needs to be tagged as a Hi-Priority card.

The reason for this behavior is to ensure that the root cause of the problem that generated the Fix will not generate additional Fix requests in the future.
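To make the Fix rules concrete, here is a minimal sketch in Python. The class names, field names and the example cards are all assumptions made up for illustration; they are not taken from any real tool we use. It only shows the three behaviors described above: a Fix skips the backlog, the interrupted card becomes Blocked with a flag, and closing a Fix queues a Hi-Priority improvement card.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    BACKLOG = "Backlog"
    WORKING = "Working"
    BLOCKED = "Blocked"
    DONE = "Done"


@dataclass
class Card:
    title: str
    kind: str                          # "Fix", "BAU" or "Improvement"
    status: Status = Status.BACKLOG
    owner: Optional[str] = None
    blocked_reason: Optional[str] = None
    hi_priority: bool = False


@dataclass
class Board:
    cards: List[Card] = field(default_factory=list)

    def raise_fix(self, title: str, owner: str) -> Card:
        """A Fix skips the backlog: block the owner's current work, then start."""
        for card in self.cards:
            if card.owner == owner and card.status is Status.WORKING:
                card.status = Status.BLOCKED
                # The "special flag" explaining why the card is blocked.
                card.blocked_reason = f"preempted by Fix: {title}"
        fix = Card(title, "Fix", Status.WORKING, owner)
        self.cards.append(fix)
        return fix

    def close_fix(self, fix: Card) -> Card:
        """Service restored: close the Fix and queue a hi-priority root-cause card."""
        fix.status = Status.DONE
        follow_up = Card(f"Root cause of: {fix.title}", "Improvement", hi_priority=True)
        self.cards.append(follow_up)
        return follow_up


# Hypothetical example data.
board = Board()
bau = Card("Buy new monitors", "BAU", Status.WORKING, owner="ana")
board.cards.append(bau)
fix = board.raise_fix("Wi-Fi is down", owner="ana")
follow_up = board.close_fix(fix)
```

After this runs, the BAU card is Blocked with a reason attached, the Fix is Done, and a new Hi-Priority card is waiting in the Improvement Backlog. Resuming the blocked BAU card is left out of the sketch on purpose.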

BAU, or Business as Usual, is the daily activity. It can include purchases, support activities, housekeeping, etc. This type of activity should always be driven by an existing process. Obviously, when you start to adopt this methodology, processes for most activities will not exist. In this case, the card is solved in the best possible way and a new request for a process is placed on the Improvement Backlog. If there is a process in place, the card will be solved following the existing process. If the person executing the process detects that there is room for improvement in that specific process, a new card for the improvement is created in the Improvement Backlog. This is extremely important to ensure that continuous improvement, or Kaizen, is in place.

Another fundamental point around BAU is how much Work in Progress (WIP) you can allow.
Not limiting WIP can push your team past the amount of work it can optimally perform. The overwork can have direct effects on the quality of the solutions and on the ability to finish activities. The amount of work that can be allowed as WIP varies from project to project, and it is not a subject for discussion in this post. We currently allow only one card in the working column per team member.
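As a rough illustration of pulling work under a WIP limit, the sketch below only hands a backlog card to a team member who is under the limit. The function name, the data structures and the card titles are assumptions for the example; the limit of one card per member is the one we use, and other teams may choose differently.

```python
from collections import deque

WIP_LIMIT_PER_MEMBER = 1  # one card in the working column per team member


def pull_next(backlog, working, member):
    """Move the next backlog card to `member` only if they are under the WIP limit."""
    in_progress = sum(1 for owner in working.values() if owner == member)
    if in_progress >= WIP_LIMIT_PER_MEMBER or not backlog:
        return None
    card = backlog.popleft()
    working[card] = member
    return card


# Hypothetical example data.
backlog = deque(["Renew licenses", "Order keyboards"])
working = {}
first = pull_next(backlog, working, "ana")   # ana pulls the first card
second = pull_next(backlog, working, "ana")  # refused: ana already has one card
```

The second pull returns nothing, so the second card stays in the backlog until the first one leaves the working column.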

Improvement is your key element for quality. The amount of improvement that you have performed in your environment is directly related to your area's maturity level. Improvements are, as the name says, changes that aim to turn a current behavior into a more optimal one. Unlike BAU and Fix, Improvements are based on a cycle time, which means that they always happen following a pre-defined cycle. Improvement cycles always follow a PDCA (Plan Do Check Act) model. In the Plan phase, an inception meeting is held with all the team members to discuss the work. The goal here is a deep analysis of the problem that generates an action plan and acceptance criteria. Once we have a plan and acceptance criteria defined, we move to the next step, which is the implementation according to the plan. After the implementation it is necessary to wait for the change to converge. In this phase we collect feedback that allows us to evaluate whether the change has met the acceptance criteria defined in the first phase. Feedback can come in different forms and from different sources, like people, monitoring systems, logs, etc.
If the acceptance criteria were not met, a new cycle is started and a new inception meeting is required to revisit the change (or sometimes to revisit the acceptance criteria). If the acceptance criteria are met, then the new process is accepted and documented. The card is moved to Done and a new election process determines which improvement from the backlog will be selected for the next cycle. The election process must obey an order:

The ticket cannot have prerequisites that are not met.
The ticket is marked as hi-priority.
Impact that the improvement will have on the BAU.
Other improvements.
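The election order above can be sketched as a filter followed by a sort key. This is only one possible reading of the rules: the numeric `bau_impact` score, the field names and the example cards are assumptions for illustration, and ties are broken by backlog order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Improvement:
    title: str
    hi_priority: bool = False
    bau_impact: int = 0  # hypothetical score: higher means more BAU work saved
    prerequisites: Set[str] = field(default_factory=set)


def elect(backlog: List[Improvement], done: Set[str]) -> Optional[Improvement]:
    """Pick the next improvement following the election order."""
    # 1. Cards with unmet prerequisites are not eligible at all.
    eligible = [card for card in backlog if card.prerequisites <= done]
    if not eligible:
        return None
    # 2. Hi-priority first, 3. then highest BAU impact, 4. then everything else.
    return min(eligible, key=lambda card: (not card.hi_priority, -card.bau_impact))


# Hypothetical example data.
backlog = [
    Improvement("Automate onboarding", bau_impact=5),
    Improvement("Replace meeting room webcam", hi_priority=True,
                prerequisites={"Approve budget"}),
    Improvement("Fix root cause of Wi-Fi outage", hi_priority=True),
    Improvement("Tidy the wiki"),
]
elected = elect(backlog, done=set())
backlog.remove(elected)
next_elected = elect(backlog, done=set())
```

In this example the webcam card is hi-priority but is skipped because its prerequisite is not done, so the Wi-Fi root-cause card is elected first and the onboarding card second.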

The number of improvement cycles that can exist is directly related to your team's availability and to how big your BAU backlog is. If your BAU backlog is under control and you have the resources available, there is no reason not to have more than one cycle. This balance will change from team to team and it is not the objective of this post to discuss it.

Improvements can have a drastic effect on the number of Fix and BAU cards you will have in the future.

A big advantage of using a Kanban wall is the ability to get metrics. Metrics will show you many elements around your team's work and can be an excellent base for decisions.
Some useful metrics that are normally extracted from a Kanban system are:

Lead and cycle time - These metrics represent the average time a task takes: lead time is usually counted from the moment the request is made, cycle time from the moment work on it starts. They are normally compared with the project takt time in order to predict performance and allow corrections. They are also crucial for creating continuous flow or heijunka.
Cards solved per week - Similar to your cycle time, it gives you an idea of the number of tickets your team solves in an average week.
Backlog growth - This metric is directly associated with your demand. It is the number of new cards (work) that your clients demand every week. It is normally compared with the cards solved per week in order to determine how well your team keeps up with its demand.
Work in Progress - Measures the amount of work that is being executed at a specific moment. It is usually compared with cycle time in order to predict delivery time and other CFD (cumulative flow diagram) info.
Blocked time - Can be used to identify improvement opportunities in processes that require interaction with third parties.
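A small sketch of how these numbers can be computed from card dates. The card records and weekly counts below are made-up example data, and the variable names are assumptions. Lead time is measured here from card creation to done and cycle time from start of work to done; the last calculation uses Little's Law (average cycle time = WIP / throughput), which only holds as an approximation for a reasonably stable flow.

```python
from datetime import date
from statistics import mean

# Hypothetical card records: (created, started, finished)
cards = [
    (date(2015, 1, 5), date(2015, 1, 6), date(2015, 1, 8)),
    (date(2015, 1, 5), date(2015, 1, 7), date(2015, 1, 9)),
    (date(2015, 1, 6), date(2015, 1, 9), date(2015, 1, 11)),
]

# Average lead time (request to done) and cycle time (start to done), in days.
lead_time = mean((done - created).days for created, _, done in cards)
cycle_time = mean((done - started).days for _, started, done in cards)

# Hypothetical weekly counts.
new_cards_this_week = 12   # backlog growth: demand coming in
solved_this_week = 10      # cards solved per week: throughput
net_backlog_change = new_cards_this_week - solved_this_week  # positive: demand exceeds capacity

# Little's Law estimate of how long a card takes, in weeks, for the current WIP.
wip = 4
expected_cycle_weeks = wip / solved_this_week

print(lead_time, cycle_time, net_backlog_change, expected_cycle_weeks)
```

If the net backlog change stays positive week after week, the team is receiving more work than it can finish, which is usually a sign to look at the Improvement Backlog for ways to reduce BAU demand.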

Tuesday, January 13, 2015

Thank you Ms. Satir - A study on change.

When I decided to bring this blog back from the dead, one of the thoughts constantly on my mind was the question "where to start?". What is the most logical point to start with? I decided that a (not so) brief introduction to the knowledge base I would be using in my posts was a good place to start. So I wrote posts about Lean and TPS. Now, having done that, the question visited me again.
Considering that all the tools and techniques discussed in this blog will possibly represent a change, I decided that a good course of action would be to analyze change itself, its impacts and how to handle it in a controlled way.

One of the best studies on change I could find was in the work of Ms. Virginia Satir. Ms. Satir was a social worker and a researcher on family therapy. Amongst other works she is responsible for the creation of the Virginia Satir Change Process Model. In this change model, focused on human and family behavior changes, Virginia identified that change had 5 distinct stages:

Old status quo
Resistance
Chaos
Integration
New status quo

Although this model was developed focused on human behavior and interaction, it was acknowledged as having a great importance to understand change itself.

The first stage, or Old Status Quo, describes a stable system, where occurrences are predictable, familiar and comfortable. From an organization's perspective, the Old Status Quo represents a stable system where everyone understands their roles and attributions, although this is not necessarily an optimal system. In 1A, change gets introduced into the system. According to Ms. Satir's analysis, the first reaction to change is Resistance. In an organization this could mean slow adherence to the change. Several reasons could be behind slow compliance with the change, from disagreement to lack of familiarity (it is easier to keep the Old Status Quo). After some time, people will get more used to the idea and attempt to adopt the change. This is where Chaos becomes apparent. The lack of experience with the new scenario directly impacts functionality. While people learn and become familiar with the new standards, it is normal to expect delays, questions, misunderstandings and errors, which show up as a drop in overall performance. Given proper time, questions will get answered, people will become familiar with the new standards and error rates will drop. Ms. Satir calls this period Integration. During this period, performance starts to rise again, until it stabilizes in a New Status Quo, where people understand their new roles in the process and new ways of working are bedded down. The New Status Quo can be positive, neutral or negative, depending on the impact of the change on the organization. A positive change will have a New Status Quo with higher performance than the Old Status Quo. A neutral change will leave performance practically unchanged, while a negative one will result in worse performance.
The size of a change impacts the size of the curve proportionally. A large change tends to generate more Chaos (and a bigger performance drop) than a small change. One big risk when introducing changes in the environment is that your performance drops below your minimal acceptable performance (MAP). The MAP is an estimated line in your performance level that represents an impact the business cannot live with. In these cases, it is not unusual for the change to get rolled back to the Old Status Quo.

The larger the change, the greater the risk of hitting your MAP. One approach used to avoid this issue is to break the change into smaller changes, each with a smaller impact on performance. This way, it is still possible to carry out the change while keeping the impact at an acceptable level.

Another great advantage of smaller changes is related to the outcomes. Even if you get a negative change, the shorter cycle time of small changes and the continuous improvement of the process make it easier to review and start a new change.

Lean uses this concept in the form of kaizen and standardization.
Masaaki Imai, in his outstanding book Gemba Kaizen: A Commonsense, Low-Cost Approach to Management, describes this process in a comprehensive way.

"...SDCA (Standardize-Do-Check-Act) standardizes and stabilizes the current process, while PDCA (Plan-Do-Check-Act) improves them. SDCA refers to maintenance and PDCA refers to improvement" - Gemba Kaizen.

When a change is introduced, it comes in the form of a standard (Standardize), stabilizing the process. Do refers to implementing the standard. Check refers to determining whether the implementation remains on track and has brought the planned improvement. Act refers to performing and adopting the new procedure to prevent recurrence of the original problem, or to setting the goals for a new improvement in case it does not meet expectations. Then, as part of continuous improvement, a PDCA cycle is called. P refers to planning the improvement, D to implementing it, C to evaluating it and A to adopting it or planning a new cycle.

When applying the cycles to the Virginia Satir Change Model, we use SDCA in the first iteration. If the results are below what was expected, then a PDCA cycle can be used to improve the change process. This step can be repeated until we have a satisfactory result from the change. Later, if the team identifies that there is still room for improvement, we repeat every step of the process, starting with a new SDCA and iterating with PDCA until the results match expectations.

To illustrate the use of this process, I will give a real example that happened last year in our Porto Alegre office. One of the critical components of our infrastructure is the virtual conference rooms. This is due to the distributed nature of our company and the fact that the majority of our clients are located in different countries. The meeting room setup that we use for virtual conferences has a flat screen TV, one Mac mini, one conference microphone and a Full HD webcam. During the day, different groups and projects share the meeting rooms, and it is not unusual for them to change configurations. Different versions and brands of communication software are used, based on what the client has available. In order to keep all this updated and functional, we discussed and created a process. In this process we would automate the deployment of images on a daily basis. The images would receive software updates once a month. We implemented the process and waited. During the next two weeks, we collected feedback from the users about this new process. Many users were not satisfied because their temporary files and configurations were erased when the new image was deployed. At this point, we decided that the results were below what we expected and started a new PDCA cycle to improve the process. After discussing the problem, we decided to change the deploy frequency from daily to monthly. We implemented it and once again observed. In the feedback sessions with the users we found that it was better, but it was still impacting them. So we started a new cycle. This time we decided to change the deploy from monthly to every 3 months, based on research into how often companies release new patches. We also changed the image generation to every 3 months, to eliminate unnecessary work. We applied the changes and observed. From the feedback that we received, the users were, overall, comfortable with the solution.
At this point, we decided that we had reached the desired result and documented the process. The process was then distributed to the other offices of the region to be deployed. All the changes that we made were small changes, with low risk to the business. When we discussed the issue, we always attempted to uncover the root cause and work directly on it. Also, in the check phases, we gave the users two weeks to become familiar with the change and only then actively looked for feedback. In this case, it took us 3 cycles to get to a state where the results of the change were satisfactory. It is really difficult to get it right on the first try. Many of the changes that we applied in our offices required at least two cycles to get right. This is not waste. By ensuring that your change has satisfactory results you will always be adding value and eliminating waste from the system.

The conclusion is that the use of continuous improvement and standardization to control small continuous changes will lead your environment to a positive transformation, without the addition of unnecessary risks.

Monday, January 12, 2015

Toyota Production System (Part II of II)

This post is a continuation from this post here.

Principle 9. Grow Leaders Who Thoroughly Understand the Work, Live the Philosophy, and Teach It to Others.

If Toyota considers culture to be one of the most important elements of TPS, the importance of having leaders who truly understand it becomes obvious. Toyota's leaders are developed inside the company and not brought in from outside. Toyota wants someone who is familiar with the culture. They want someone who can understand how the work is done on the line. And they want someone who is committed to passing the culture on to the people they lead.

Another important aspect of Toyota leadership is the leader's presence. Toyota believes that leaders should always be present in gemba. Gemba is translated as "the real place". Toyota leaders all practice genchi gembutsu, the art of "go-and-see". This is about leaders and managers being present in the place where the product is being developed. It is said that new leaders in Toyota are taken to a place on the line, and a chalk circle is drawn on the floor. Then the new manager is asked to stay in that circle and observe. Hours later, someone will come and ask what they saw. This is a common way in Toyota to teach the importance of gemba.

Leaders are also expected to fully understand the processes and the work, which makes it very difficult to hire an external leader. Leaders in Toyota often interact with their workers, and it is not unusual inside Toyota for a leader to cover for a worker in their absence.

Principle 10. Develop Exceptional People and Teams Who Follow Your Company's Philosophy

In order to develop teams, one must first consider what the role of leadership is.
The main role of a leader inside TPS is to sustain the development of Lean thinking inside the company. This is done by coaching and by example, from the highest levels of leadership to the team leader on the line. The leaders, aside from their work responsibilities, have to coach and help to spread the Toyota culture throughout the company. This is an attribution that comes with every leadership position in Toyota.

The hiring process in Toyota is also worth mentioning. It focuses on identifying characteristics in the candidates that are aligned with the company's values. It is a three-stage process that includes a job fair and several interviews. This kind of effort ensures that the newly hired person has the potential to develop inside the company's culture and values.
After the job offer, a new hire receives instruction on the culture and TPS. Until individuals and teams really understand the culture and TPS, they are not in a position to be empowered. The road to team development is usually long. Toyota understands that groups have to develop over time and are not able to jump right into functional, efficient work.

Another contrast between TPS and traditional models is where problems are solved. TPS understands that problem solving is a responsibility of the working groups, and that leadership should act as a facilitator for this process. The reason behind this approach is the familiarity that the working groups have with the process and its eventual problems, which places them in an optimal position to improve the process and solve any related issues. This also empowers the workers to contribute to kaizen, or continuous improvement.

Principle 11. Find Solid Partners and Grow Together to Mutual Benefit in the Long Term.

In many businesses, the quality of your partners and suppliers will be a determining factor in the quality of your product. Toyota recognizes this and always aims for long-term relationships with its partners and suppliers.
If it is normal for your company to go through a complex hiring process to ensure that your workers will be aligned with the company's culture, then why should it be any different when selecting partners and suppliers?
Toyota usually considers new suppliers with caution and normally tests them with small orders to determine quality and commitment. It is not unusual for Toyota managers to visit new suppliers' lines in order to evaluate their processes and work. Toyota will also teach new suppliers and partners about TPS and the Toyota Way. A partner or supplier is part of the Toyota family and as such has the same expectations and opportunities as a regular employee. Once a partnership is established, it is very difficult for it to be broken.

Principle 12. Go and See for Yourself to Thoroughly Understand the Situation (Genchi Gembutsu)

Fujio Cho was the first president of Toyota's plant in Georgetown, Kentucky. It is said that he would frequently be found on the line, observing the teams working. Workers and managers would say that he was really focused on the line, staring, in a trance-like state. After some time he would break focus, say good morning to the people around him and go back to his office. It was not unusual for a request to come from the president's office later in the day, asking to tighten a process or to correct the flow in the area. This is what Toyota calls Genchi Gembutsu, or go-and-see.
Leaders in Toyota understand that the most important place for them to be is the production line, or Gemba (the real place). Gemba is the place where value is added. It's the place where continuous improvement, or Kaizen, happens. It is the heart of TPS.
Another famous story tells of a group of Toyota managers visiting a possible new supplier in the US. The supplier's team had scheduled a full day of presentations about their product with the sales team and other managers. When the Toyota managers arrived at the site, they asked to go directly to the production line. The sales team insisted on having them attend the presentation first, but they showed little interest, insisting on going directly to the line. Once they got to the line, they stayed there, staring at the workers and the machines for some time. Then they turned to the sales managers and said that unfortunately it would not be possible to conduct business with them. This is how seriously Toyota takes Genchi Gembutsu.

Principle 13. Make Decisions Slowly by Consensus, Thoroughly Considering All Options; Implement Rapidly.

Decisions are, without any doubt, an important part of any business. The way businesses make their decisions varies from company to company. Toyota's decision process is slow and usually requires a significant amount of work, research and discussion. Toyota collects all possible data on a subject before making a decision. Although this is a costly process, it has proven quite effective and fundamental for TPS. Decision making has five major elements:

Genchi Gembutsu to find what is really happening.
5 whys to reach the root cause.
Map and analyze all possible courses of actions.
Involve teams, stakeholders and partners in the discussion.
Use efficient communication to facilitate steps one to four.

This methodology is called Nemawashi in TPS. The literal translation would be "going around the root", and the original meaning was just that: digging around the roots to prepare a tree for transplant. Its main goal is to allow everyone to propose possible solutions, focus on the core issue and decide by consensus which approach will be taken.

Principle 14. Become a Learning Organization Through Relentless Reflection (Hansei) and continuous improvement (Kaizen)

Learning to learn is a big thing inside Toyota. They achieve it by merging standardization and innovation, two concepts often regarded as opposites, into a functional methodology. Standardizing the processes creates an environment where improvement through innovation is possible.
The key element for this is employee empowerment: allowing individual and team innovation to be spread through the organization in the form of a standard. Standardization is punctuated by innovation, which leads to new standards.
Problem solving is one place where standardization meets innovation to generate new standards. When studying a new problem in a process, the flow usually starts by gathering more information about the situation. Many times this is done using Genchi Gembutsu. The next stage is focused on finding the real problem. This is often done by repeatedly asking the question why. Why did A happen? Because of B. Why did B happen? Because of C. Repeat until you find the fundamental issue. Apply a countermeasure. Observe how well the countermeasure fixes the problem. Repeat until satisfactory. Standardize.

Inside this method hide two important aspects of the TPS culture: hansei and kaizen.
Hansei is self-reflection. It is the ability to look inside (at oneself) and identify what is wrong. In quality, this is of paramount importance, since it is impossible to solve a problem you don't acknowledge. It is said in Toyota that no problem is a problem. There is a story of a senior director from Toyota who was visiting a plant in the US. After a few days, the director asked the local plant manager how many times they had stopped the line (because of problems). The manager said that, luckily, they had not had any problems, and so the line had not had to be stopped. The Toyota director replied to the manager, "no problem is a problem".
Only through hansei can kaizen exist. This is a very strong statement that means "there is no improvement without first identifying what needs to be improved". So, in TPS, finding your weaknesses is what enables you to improve. And the way to improve is through kaizen.
Kaizen, as described in this post, is the constant search for improvement. The literal translation would be continuous improvement, and it is at the core of TPS.

These last three posts:

Lean
Toyota Production System (Part I of II)
Toyota Production System (Part II of II)

They form the theoretical base for what we will explore in this blog in future, more practical posts based on my experiences. I hope I was able to touch on each concept in a way that is not boring or tedious (although I might have failed). What I can promise is that the next posts will be more practical and less conceptual. I hope you guys are still there...