Archive

Archive for November, 2006

Submarine compartments

November 8th, 2006

Submarines are built with compartments separated by bulkheads. These prevent the spread of fire and limit the impact of damage to one compartment. Good technical architecture follows exactly the same principle, but it normally applies to:

  • Compromise of one system leading to compromise of other(s)
  • Failure of one system leading to failure of other(s)

Interestingly, wherever this principle is applied there is a tradeoff, normally in managing it, and this limits just how far to go with it. From experience I don’t think people go far enough.

Networks

The most obvious application of this is with networks where it is quite common to split them into VLANs or even separate LANs with firewalls or routers acting as the bulkheads. Access between them is limited and specific thought is given to what happens if one is compromised.

The tradeoff is that this requires more kit, more management and things that would otherwise just work have to be allowed to work. But these are generally such small tradeoffs that no professional omits firewalls for those reasons.

Servers and Applications

Less common, outside the world of financial institutions and other high value targets, is the separation of applications between different servers. I joined one organisation where this principle had been applied fairly stringently for some years, to the point of allowing only one or two application ports open per server. Whilst extreme, I think it was a major contributor to the lack of downtime and systems failure.

We all know that different applications on the same system can interfere with each other, particularly in a Windows environment with shared libraries and poor memory protection. But tracing such problems generally requires special tools and a thorough investigation. If a fault only occurs very occasionally then this analysis is rarely done. From working in an environment with applications so completely partitioned on different servers, I think the impact is dramatically reduced failure rates.

Of course there are sound security reasons for this separation as well. If one application is compromised then it is much more difficult for that to lead to compromise of other applications if there aren’t any others on the same server.

The major tradeoff here is that many servers are left grossly underused and that has knock-on implications, particularly for space, power and air-con in the datacentre. These are increasingly important concerns and I would urge anyone to resist allowing those concerns to dictate server planning. Just buy more racks, supply more power and fit more air-con.

Passwords

Even less common is the separation of passwords (or keys) into domains. This is only about security and it is the application of submarine compartments just as with networks. If someone manages to crack a password then where else can they use it? The same thing goes with keys, especially SSH keys.

Passwords, and keys in particular, are much more carefully guarded than most information because the inherent security value of them is instantly recognisable. Despite that, I doubt that many people do a formal analysis on what happens if a password or key is compromised – just how far can people get and what is there to stop them?

Conclusion

That leads me on to the final point that this principle is not one that needs a blanket imposition, but rather a risk-based analysis to determine where best to use it. This is undertaken by asking the basic questions of ‘what else can be affected if this system fails?’ or ‘what other systems can be targeted if this system is compromised?’. Asking these questions can produce some uncomfortable answers.
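One way to make those two questions concrete is to write down which systems can be reached from which, and then walk the result. The following is only a minimal sketch in Python: the system names and the `REACHABLE_FROM` map are invented for illustration, and in practice the map would have to be drawn up from firewall rules, shared passwords, keys and trust relationships.

```python
from collections import deque

# Hypothetical map: system -> systems that can be reached from it if it is
# compromised or fails (shared credentials, open ports, trusted connections).
REACHABLE_FROM = {
    "web-server": ["app-server"],
    "app-server": ["database", "mail-relay"],
    "database": [],
    "mail-relay": [],
    "backup-server": ["web-server", "app-server", "database"],
}


def blast_radius(start, reachable_from):
    """Return every system reachable, directly or indirectly, from start."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in reachable_from.get(current, []):
            if neighbour != start and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Which systems are exposed if each one is the starting point?
for system in REACHABLE_FROM:
    print(system, "->", sorted(blast_radius(system, REACHABLE_FROM)))
```

In this made-up example the backup server, which is easy to overlook, turns out to reach everything else, which is exactly the kind of uncomfortable answer the analysis is meant to surface.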

Categories: Machines

Delegation, empowerment and decision making

November 6th, 2006

Delegation and empowerment are two very trendy buzzwords used by modern managers. But I’m not sure that many people, both managers and staff, have anything more than a vague notion of what they mean in practice.

Every now and then someone comes to me and says,

“I’ve got this really good idea how to sort out this problem. It means us changing the way we work like this …”

to which I reply,

“Great idea. As you’ve thought of it I’d like to delegate managing this change to you.”

Now a very small percentage of people can’t wait to go and tell everyone else what to do, but most people are horrified at the idea and instead reply with either

“Thanks. So can I go and tell everyone that you’ve decided they have to change and if they have any problems to talk to you?”

or

“Thanks. So will you go and tell everyone that you’ve decided they have to change?”

All three of them are making the same mistake, which is to assume that a management decision is about telling people what to do. So they react according to their personality when they think this is what they are being asked to do. I’ve learnt to spot a flicker in the eyes that gives away this internal conflict (or in some cases the eyes lighting up at the prospect) and explain exactly what I expect.

The steps are:

  • Start off by working out who will be affected by the change. Not just colleagues but managers as well.
  • Then consider how they are likely to react to the change.
  • If it looks like they will have some issues, then work out what to do about it.
  • Then go and consult with the people who will be affected, explaining the change, listening to their views armed with the planning you have already done.
  • If there are any concerns, suggestions or other views, then listen to them and change your idea to accommodate these views. Don’t be stubborn, negotiate.
  • Aim for consensus, and when you have it then get a firm plan agreed for implementation.
  • Keep the people affected informed every step of the way.

And that’s all there is to delegation, empowerment and decision making – reaching consensus amongst colleagues and keeping people informed.

The most important thing I want them to realise is that none of this takes the official role of ‘management’ to achieve. You can do it whether you are the most junior person in the team or the most senior; all it takes is good listening, persistence and a desire to get things done. Interestingly, it was following these steps before I was a manager that got me recognised as someone who got things done and helped me get promoted to management.

Unfortunately there are quite a few managers who don’t realise that this is how they should reach decisions, by consensus, not by the management big stick. I had one manager working for me a few years ago who was so bad at this that I had to draw a diagram in Visio to teach him how to make decisions without trampling over all his staff. He never really got it and was both unpopular and ineffectual as a result.

Categories: Organisations, People

Loosely coupled systems

November 3rd, 2006

It is increasingly common to find that applications have remote API access. With the ubiquitous use of web technology people can add XML based remote interfaces quite quickly.

The temptation is now there to connect together all sorts of systems using these APIs in an event driven fashion. For example, if we want to connect our customer database to our mailing list system, which has a web based API for managing users, then this can be easy to set up. Whenever there is a change to our customer database this triggers some code that uses the mailing list API and we have the change propagated in real time. Seeing systems connected together like this with real time interaction is addictive.

If we’re not careful though, we end up building tightly coupled systems when this might not be what we want.

To explain this, let’s talk about a sending system and a receiving system and go through some scenarios:

What happens if we want to take the receiving system down for an upgrade?

Hopefully that’s something that was planned for and the sending system can detect if the receiving system is not available and queue the changes. Then when the receiving system comes back on line the sending system can process the queue. So that one was easy enough.

What happens if the receiving system dies and has to be restored from a backup tape?

Now that’s a bit more complicated because the changes might have been lost. To get around this all we need do is ensure that every change is written out to a table or file of changes. That way we can replay the changes lost by restoring from backup to get the receiving system back to the right state.

What happens when the format of the changes, changes (if you see what I mean)?

This happens all the time. The format changes so we change the code that fires off the changes but forget to change the code that stores the changes. Either that or we do remember but make a mistake in the storing of the changes. Well, this is also easy to get around: we just ensure that the process on the sending system that sends changes to the receiving system uses the stored changes as its source of information, after they are stored. That way, there is only one source and we can’t introduce an error we don’t find until much later.

What happens if for some reason we don’t want to send the changes?

Let’s just imagine the receiving system develops a fault in the API, say after an upgrade, and trying to use the API causes it to crash. Obviously we don’t want to turn off the sending system so we could firewall the receiving system, but that’s a bit of a pain and may not be possible. The obvious thing to do is to turn off the process on the sending system that fires off the changes.

And there you have it, a loosely coupled system. All it takes is for us to develop the sending system so that it:

  • writes out every change to a table or file of changes
  • uses the written out changes as the source of data for publishing those changes
  • makes the process that publishes those changes a separate process that can be started and stopped independently of any other systems
  • enables that process to start from any point in the stored list of changes so that you can replay changes.
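
To make those four points a little more concrete, here is a minimal sketch in Python. It is an illustration under assumptions rather than a finished design: the class names `ChangeLog` and `Publisher`, the one-JSON-document-per-line file format and the `send` callable standing in for the receiving system’s API are all invented for the example.

```python
import json
import os
import tempfile


class ChangeLog:
    """Append-only file of changes, one JSON document per line."""

    def __init__(self, path):
        self.path = path

    def append(self, change):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(change) + "\n")

    def read_from(self, position):
        """Yield (position, change) pairs starting at the given line index."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for index, line in enumerate(f):
                if index >= position:
                    yield index, json.loads(line)


class Publisher:
    """Reads only the stored changes and sends them to the receiver."""

    def __init__(self, log, send, position=0):
        self.log = log
        self.send = send          # hypothetical callable wrapping the receiver's API
        self.position = position  # next line to publish; a real system would persist this

    def run_once(self):
        """Publish pending changes, stopping at the first failure so none are skipped."""
        for index, change in self.log.read_from(self.position):
            try:
                self.send(change)
            except Exception:
                break  # receiver unavailable: keep the position and retry later
            self.position = index + 1
        return self.position


# Usage: the sending system only ever appends to the log.
log = ChangeLog(os.path.join(tempfile.mkdtemp(), "changes.log"))
log.append({"customer": 1, "email": "a@example.com"})
log.append({"customer": 2, "email": "b@example.com"})

received = []                      # stands in for the receiving system
publisher = Publisher(log, received.append)
publisher.run_once()               # sends both stored changes

publisher.position = 0             # replay from the start, e.g. after a restore
publisher.run_once()
```

Because the publisher reads nothing but the stored changes, stopping it is just a matter of not calling `run_once`, and a replay is just a matter of moving `position` back.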

The final thing to point out is that this has an unexpected benefit. If we want to test a new version of the receiving system API then we can just set the sending system process to replay the last few days of changes and see if the receiving system copes. No special code needed.

Categories: Machines

Buy any book you want

November 2nd, 2006

One of the most unusual policies that I operate in my technical team is to let them buy whatever books they want provided they are at least loosely work related. All they have to do is send an email to purchasing and they will have the book delivered, no questions asked.

I don’t mind if they might start reading the book, decide it is rubbish and stop reading. Nor do I mind if they want a book that their neighbour has on their desk. People read books in different ways.

The rationale is

  • the more they know the better they will be at their job
  • in order to do their job they have to keep up to date with new things
  • books are a very cheap way of learning
  • having a book nearby as a reference can be an enormous timesaver

Most people on this scheme spend around £250 per year, but even the most determined person won’t exceed £1,000 per year on books, which is about a third of the cost of an average technical training course. For a 25 person team I budget around £5,000.

You might be wondering how I stop technical staff from building a library at home. The answer is that I don’t actually care if they do. Technical books are not, on the whole, like classic novels. They are much more transient, almost disposable. Nobody reads a FoxPro manual any more because it is stale information.

But we do stamp all the books with a ‘property of …’ stamp, just to limit their resale (not that anyone in my team would do that) and to remind people whose books they are.

Interestingly, with this policy in place, I find myself getting suspicious of those in the team that don’t buy any books. Can you really keep up with new technology just by reading the web?

Categories: People

Indirection – the life saver

November 1st, 2006

This is one of the very few technical principles that I absolutely insist upon and I tend to have a sense of humour failure when it is not adhered to. This has saved my life so many times that I have lost count.

The principle is simple:

All applications and end users should, wherever possible, only ever connect to a service by an aliased name, never the real name of the service.

By service I mean server, remote application, or anything with a name on the network.

In other words, always use the magic of indirection:

client -> alias -> service

So why is this so important? Well it is so much easier, orders of magnitude easier in fact, to change an alias to point at a different service than to change all the clients to point at a different name or to reinstall the service elsewhere with the same name. When things need moving, either for planned maintenance or because they break, this gives us so much flexibility and control. Unfortunately too many techies only think about that when they are planning the move, by which time it is a nightmare.

The normal technical way to do this is using a CNAME in DNS. So we create a DNS record for the service like this

my-service       IN        A          10.0.0.1

and add the CNAME

my-alias         IN        CNAME      my-service

But don’t forget, it is not just DNS where this can be done.
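
As one illustration of indirection outside DNS, an application can look up the real service name in a small alias table instead of hard-coding it. This is only a sketch in Python: the `ALIASES` table, the host names and the two helper functions are hypothetical, and in practice the table might live in a config file or a directory service.

```python
# Hypothetical alias table: alias -> real service name.
ALIASES = {
    "reports-db": "dbserver07.example.internal",
    "mail-out": "smtp02.example.internal",
}


def resolve(alias, aliases=ALIASES):
    """Return the real service name for an alias."""
    try:
        return aliases[alias]
    except KeyError:
        raise LookupError(f"unknown service alias: {alias}") from None


def connection_string(alias, port):
    """Clients build connections from the alias, never from the real name."""
    return f"{resolve(alias)}:{port}"


print(connection_string("reports-db", 5432))

# Moving the service means changing one entry, not every client.
ALIASES["reports-db"] = "dbserver12.example.internal"
print(connection_string("reports-db", 5432))
```

The shape is the same as the CNAME: client -> alias -> service, with the alias being the only thing that has to change when the service moves.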

Categories: Machines

The dangers of technical monoculture

November 1st, 2006

When you manage a large infrastructure there are plenty of good reasons to standardise on particular hardware and software. It makes training simpler, it makes movement of people around the organisation easier, it reduces support and maintenance costs and so on.

However there is a strong movement to impose the same standardisation on the IT department, usually following the reasoning “we should be seen to be doing the same thing that we expect of the rest of the company”. Well, having been guilty of this in the past, I came to realise this is quite wrong.

The purpose of end users using computers is to help them do their work and in order to do that they need to learn how to use computers as effectively as possible. But the one thing that is special for people in an IT department is that they must learn about all sorts of things that are *not* used within the organisation. That’s because it is their job to improve the infrastructure and one of the best ways to do that is to introduce new or different hardware and software. And the only way the IT people are going to know to do that is if they get to use all sorts of things not on the approved list in real depth.

So give your IT people the freedom to choose their own hardware and their own software. Obviously this sits outside the normal support structures so make them responsible for supporting it themselves, that way they will learn even faster. Of course they still need access to the ‘corporate’ desktop for all those applications that only run there and to make sure they know the end user experience. But don’t force them to see everything only through the eyes of the end user.

The alternative is monoculture. Your staff get stale as their skills only develop in a limited framework and innovation dries up. This then becomes self perpetuating as you as a manager end up being frustrated that they don’t seem to be innovative and increasingly listen to the things that outsiders say (i.e. consultants and suppliers) because they are the only people saying anything different. Then, of course, the staff feel threatened by the consultants and resist, which compounds the view that they are not innovative.

So avoid the dangers of monoculture and let your staff use whatever kit they want and if it goes wrong then make them fix it. They will be a lot more innovative as a result.

Categories: Machines