
SRE- The first take - Brandish the sword with care and caution

While nearly everyone with transformation in mind is buzzing about the Site Reliability Engineering (SRE) philosophy (rightly so!) and trying to find the best fit for their organization to embrace SRE culturally, there is also a thought process about being cautious in the approach.


Today, organizations small, medium and large, old, new and ancient, are looking to use SRE as an instrument of success. It is a war-winning instrument, but there are aspects that must be addressed for SREs to unleash their true potential. Almost all monolithic setups (read: traditional companies) are looking for a turnkey solution from SRE, an instant gratification for their age-old problems. It can be compared to a rusted giant wheel that is being turned. Two things are required for this wheel to turn: one, lubrication, and two, force. Both in equal measure.


Coming out of the metaphoric world and into the real one, SREs need the thought process, freedom, authority and tools to lead the transformation. They cannot be looked at as an object for instant gratification (like gaining efficiencies on the first day, a DevOps tool chain the next day, and so on). It's a journey and it's going to take a bit of time. It's not a turnkey solution, else every organization would have achieved it by now. It's an investment for the future, to make sure new business can be catered to for the next decade to come.

So today, the discussion is around what should NOT be done with SRE when the organization starts the journey.

1. Instrument of gratification

2. Inculcating the principles and training in isolation 

3. Heavy duty governance

4. Analytics of data from micro management perspective

5. Non technology centric approach

6. Not ready to take tearing bold decisions

7. Not making cautious cultural shift 


Not that I want to sound negative, but we shall discuss these in detail, with a positive take on what should be done instead.


Instrument of Gratification 

The instrument of gratification refers to the accountability entrusted to SREs because people start looking at them as someone who can do the heavy lifting alone, while others in the system clap at their success or criticize their failure. SREs are the change agents, not the change itself. Binding SREs to a target of a certain number of hours saved is totally uncalled for. SREs cannot take on that accountability; the service owners should. An error budget can be defined, and SREs can recommend good new areas for automation, leaning or tool usage for optimization, but they cannot own the target for that. The ownership of these sorts of targets (if they exist at all) lies with the service owners. SREs are part of day-to-day operations, and those targets can be maintained to ensure that they can automate themselves out of their jobs 😉
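
To make the error budget idea a little more concrete, here is a minimal sketch in Python. The class name, its fields and the sample numbers are all illustrative assumptions, not taken from any particular tool or from a real service. It only shows the usual arithmetic: the budget is the share of requests that the SLO allows to fail, and what remains is that allowance minus the failures already seen.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that did not meet the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget: the fraction the SLO tolerates, applied to the traffic seen.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Budget left after subtracting the failures already observed.
        return self.allowed_failures - self.failed_requests

    @property
    def exhausted(self) -> bool:
        # True once the observed failures have used up the whole budget.
        return self.remaining <= 0


# Assumed sample numbers, purely for illustration.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"remaining budget: {budget.remaining:.0f}")
print(f"budget exhausted: {budget.exhausted}")
```

A number like this is something the service owner can own as a target, with the SRE using it as input when suggesting where automation or tooling would help most.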


Inculcating the principles and training in isolation 

A transformation cannot have all of its change agents coming from outside the ecosystem. A certain degree of change is required in the current setup, and a certain amount of reskilling needs to happen inside the organization to ensure a critical mass of change agents is in place. Training has been the best way to do that, but for topics like SRE, training alone will not help. These newly turned SREs need a certain degree of hand-holding. One of the cracks most organizations fall through is believing that training is the solution to every reskilling need. These cultural shifts have to be nurtured through pilots, early adopters and the like. Once the training is complete, the SREs need to go to the ground to understand the service, and the methodology, which varies from service to service, needs to be discussed: which tools to use, how to use them, and so on.


Heavy duty governance


There is a general tendency to assume that whenever these transformations happen, clear governance is required. That is true to an extent, but it has to be understood that micromanagement is not required, and too much governance will pull the SREs in different directions. For example, the SREs cannot have one level of governance at the service operations level, another at the architecture level, another at the community level and maybe yet another at the department level. This sort of governance would damage the thought process and disrupt the entire flow. The SREs need to be trusted more. Governance can certainly be overarching, but it should not interfere with the thought process and philosophy. The philosophy and thought process have to be managed centrally and slowly percolated down into the organization, not the other way around.


Analytics of data from micro management perspective


Analytics is a very vague and overused word in the industry. The analytics thought process ranges from a simple Excel filter to AI/ML. All these latest technologies do essentially what Excel and a human mind can do, only exponentially faster. Another important point is where we apply this analytics. We can have all the data we want, but using it in the correct form and with the correct thought process is very important. This data should NEVER be used to directly derive the amount of time spent. It's not possible to calculate that in any form of IT service or product setup. Analytics should be used in a deterministic way, where the setup produces a deterministic output and not an approximate one, since an approximate output demands double the effort of going back and verifying it.

The other aspect is that data should also be used to take decisions at run time, not only to identify gaps or improvements. The monitoring element of SRE is still underrated, even though it's one of its key principles. Analytics should be used for the technology side of things, not to micromanage the human side. It can be used to get a rough idea of where the gaps are from a process standpoint, but that has to be verified again, and it's not a foolproof solution. There is a reason AI/ML is so much talked about: it can do a certain degree of magic in terms of speed. But the thought process has to be right, and only then does the magic become a reality.


Non technology centric approach


This aspect is very much connected to the first point, Instrument of gratification. SREs are supposed to use technology to bring about change or automation. Automation is the end product; it's the final act. A lot more happens behind the scenes to get there. This can be analysis of data, looking at ops work, challenging the process and, one very important topic that is often missed, the legacy monitoring and automation setup. Some of these tools might have served the organization well, but everything has a shelf life, and the organization must be cognizant of that fact. It's a very important aspect and a key lever for modernization from the bottom up. Sometimes this tool modernization itself brings a certain degree of transformation by default. Integrable solutions (including open source) are the future, and there are enough examples in the market to suggest that they are good enough.


Not ready to take tearing bold decisions


This is one of the key points and cannot be overlooked. Sometimes it is necessary to tear down a process and build it from scratch rather than try to change an existing setup. The same applies to tooling, analysis and ways of working. We should strive to reuse as many of the existing standard tools as possible, but legacy tooling has limitations, and the best approach is to embrace both old and new and make the entire ecosystem integrable. One advantage of having legacy systems is that we have a plethora of data (years of it), and that is a very good starting point for an analytics setup. All of this has to be part of the thought process. Modern tools need to be assessed for being “fit for purpose”. More or less, they are designed to ease the work of the operator, and that should be the thought process behind SRE tooling decisions. Security plays a big part anywhere, but in monolithic organizations it can be a point of delay, and it has to be factored in to ensure the new tools can be rolled out.


Not making cautious cultural shift 


While we look at SRE from every modern and service assurance aspect, we tend to forget that 50% of the time SREs are going to be part of the service delivery team, doing day-to-day operations. This matters because they are going to be the change agents at the lowest plank of service delivery. Introducing blameless postmortems, taking a cohesive approach with the team on new tools, discussing data analysis with the team and always looking for automation opportunities is a cultural shift that can easily rub off on the way the team delivers. That is the bottom-up cultural shift which will eventually start the real transformation in the strict sense. Remember, SRE is as much a cultural topic as it is a service and tech topic.



Conclusion:

In the words of Benjamin Treynor Sloss, VP Engineering at Google: “SRE is what happens when you ask a software engineer to design an operations team.” It is as simple as that, but it can be difficult to implement, as changes are slow and yield results over time, and only if enabled properly. The last thing anyone wants is to set and nurture the wrong expectations of SREs, because it takes years to get it right but days to undo it and get it horribly wrong. Empowerment and cultural shift are the keys to success and should rarely be compromised.


©2021 by SRE- The first Take. Proudly created with Wix.com
