Archive

Archive for the ‘Machines’ Category

Statistical significance

December 18th, 2006 1 comment

All too often I see results of a survey displayed where someone says ‘so 66% of people are in favour of this change so we should get on and do it’ and I find myself resisting the urge to jump up and say ‘that’s nonsense, don’t you know about statistical significance?’. This happens so often with information that is used for important management decisions that it scares me. Here’s an explanation of just what I’m on about.

The basics

The best way to find out what a group of people think is to ask everyone and get an answer from everyone. When only some people answer, however, there is a mathematical way of working out just how reliable any answer is, and that is statistical significance. This method works out, from the sample you did get, how many people would say the same thing if you asked everyone, and just how likely that answer is to be correct. It cannot give you an exact answer, but it can give you a range within which the actual answer lies, and you have to determine whether that range is too big to be of any use.

It can also tell you how many replies you need to get in order to get a significant result. It cannot tell you how many to ask since not everyone will reply and that is down to psychology.

The theory

Statisticians over the centuries have noticed that the answers to surveys (amongst other things) fit into patterns of distribution and that with enough answers you can work out what that distribution pattern would be. The more answers you get the more certain you can be about the distribution pattern.

You can never be 100% certain what everyone thinks but with enough answers you can get close. So most statisticians work on the basis of trying to be 95% certain about an answer. However there are times when you might want to be 90% certain or even 99% certain. This degree of certainty about the answer is called the Confidence Level. For most management statistics a 95% Confidence Level is fine. If you are dealing with things like infection rates you might want to use 99%.

The significance of any answer depends on three things:

  • Population. The total number of people that you could ask if you asked everyone.
  • Sample. The number of people from whom you actually got a response.
  • Percentage. The percentage who gave a particular answer to a particular question.

What you get back is a spread of percentages, which is the measure of statistical significance, known as the Confidence Interval. So for example if you did a survey and 75% answered yes then the calculation might tell you that if you asked everyone the actual number who said yes would be between 35% and 115%. However with a larger sample size it might tell you that if you asked everyone the number who said yes would be between 70% and 80%. As you can see the first result is meaningless but the second is very useful.

There are some interesting points to note about the way this spread changes:

  • The larger the sample size, the smaller the spread. But it is not linear, so calculate it, don’t try to guess it.
  • For a larger population, a smaller sample relative to the population is needed to get the same degree of spread.
  • The closer the percentage of people who give one answer gets to 50% (from above or below), the wider the spread becomes.
  • The spread is the same whether you take the positive or the negative answer. So for example if 75% said yes and 25% said no then you would get the same answer whichever way you did the calculation.

Examples

There are actually two ways you can use this calculation. One is after you have the answer and the other is before you send a survey. Both of these assume a Confidence Level of 95%.

Example 1 – How significant is this answer?

Assume we have a population of 3000 people and we send a survey to which a sample of 300 reply and out of those replies 225 (percentage 75%) say yes to a particular question.

When we do the calculation we find out that if we were to ask everyone the same question then the percentage who would say yes would be between 70.35% and 79.65%. So we can be certain that the majority agree.

The actual calculation returns a Confidence Interval of 4.65%, so the lower limit is given by 75% – 4.65% = 70.35% and the upper limit by 75% + 4.65% = 79.65%.

If, however, we only got replies from 30 people and of those 22 (73%) said yes, then when we do the calculation we find out that if we asked everyone the same question the number who said yes would be between 58% and 89%. Still a majority, but a much wider spread of possible results.
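
For anyone who wants to check the numbers, here is a minimal Python sketch of the standard normal-approximation formula with a finite population correction. The function name is mine, and I am assuming (not guaranteeing) that this is essentially what the calculator linked below does; it does reproduce the figures above to within rounding.

    from math import sqrt

    def confidence_interval(population, sample, percentage, z=1.96):
        """Half-width (in percent) of the confidence interval for a survey answer.

        Normal approximation with a finite population correction.
        z = 1.96 corresponds to a 95% Confidence Level (use 2.58 for 99%).
        """
        p = percentage / 100.0
        standard_error = sqrt(p * (1 - p) / sample)
        # Correction for having sampled a noticeable fraction of a finite population.
        fpc = sqrt((population - sample) / (population - 1))
        return z * standard_error * fpc * 100

    # 300 replies out of a population of 3000, 75% saying yes: about +/- 4.65%
    print(round(confidence_interval(3000, 300, 75), 2))
    # Only 30 replies, 22 saying yes: about +/- 16%, i.e. roughly 58% to 89%
    print(round(confidence_interval(3000, 30, 100 * 22 / 30), 2))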

Example 2 – How many do I need to ask?

Assume we have a population of 3,000,000 then we can do the reverse calculation to find out what sample size we need to get answers from to get a spread of just 1% on any answer they give. In this case it is 9573.

As the spread changes depending on the percentage who give a particular answer, this calculation assumes the worst case, which is that 50% give one particular answer. If we did get 9573 replies and more or less than 50% gave the same answer then the spread would be narrower.
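
The reverse calculation can be sketched the same way; again this is the standard formula as I remember it, rather than a guarantee of what the linked calculator does internally.

    def sample_size(population, spread_percent, z=1.96, percentage=50):
        """Number of replies needed so the Confidence Interval is +/- spread_percent.

        Assumes the worst case of a 50% answer unless told otherwise.
        """
        p = percentage / 100.0
        e = spread_percent / 100.0
        n_infinite = (z ** 2) * p * (1 - p) / (e ** 2)
        # The finite population correction reduces the number needed slightly.
        return round(n_infinite / (1 + (n_infinite - 1) / population))

    # A population of 3,000,000 and a spread of 1%: 9573 replies needed
    print(sample_size(3000000, 1))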

The calculation

The actual calculation is too complicated to explain (and I’ve forgotten it) but you can visit the web site

http://www.surveysystem.com/sscalc.htm

which will do the calculation for you both ways. You can also save this page and do the calculation offline if you want as the code that performs the calculation is all stored with the page.

Categories: Machines, People

Building systems like a tailor made suit

December 11th, 2006 4 comments

There seems to be an increasing trend for people building systems to ensure they meet only current needs, and then to take that even further by customising them to fit just so. Or sometimes it is expressed as a refusal to consider future needs unless those are clearly expressed and the need for them is well established. Maybe this is an old trend from mainframe days that is going through a resurgence.

Here are some examples of what I mean:

  • In software just writing code that does what is needed now without the hooks for what might come later.
  • In hardware just configuring the capacity that is needed now without planning for future growth. (As an aside, the thing I find most irritating here is cutting cables to exactly the right length).

To me, almost everything I design must have a virtually obsessive degree of future-proofing built into it. Over the years I’ve built up a whole set of techniques to minimise the impact of this need, so that I get the future-proofing I want, but without the hassle.

Maybe some people just don’t know how to do that and so they avoid building in the future proofing as they can only see it adding complexity and cost?

I’m fairly convinced that my approach has two significant benefits:

  • The first and most obvious is that it enables later change to happen much quicker and with much less difficulty than otherwise. Of course that relies on the nature of the future proofing work that was done originally, but then that is the skill that needs to be learnt.
  • The second and potentially more contestable is that designing with the future in mind actually produces better design now. It forces me down the road of making things generic, designing distinct interfaces or protocols and, most importantly, building the whole design around a conceptual framework.

This last point about the conceptual framework is critical to the understanding of any system. So long as you understand that everything fits into a conceptual framework, and know what that framework is, you can far more easily extrapolate your understanding to areas of detail that you don’t know. In other words, it is so much easier to take a sensible guess. It is also so much easier to understand how you would extend it.
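
To give a flavour of what I mean by designing around distinct interfaces, here is a small sketch in Python. The names are mine and the example is deliberately trivial; the point is that the calling code is written against a generic interface now, so a future store can be dropped in later without touching it.

    class DocumentStore:
        """The conceptual framework: anything that stores documents looks like this."""

        def save(self, name, content):
            raise NotImplementedError

        def load(self, name):
            raise NotImplementedError


    class LocalFileStore(DocumentStore):
        """The only store we actually need today: plain files in a directory."""

        def __init__(self, directory):
            self.directory = directory

        def save(self, name, content):
            with open(f"{self.directory}/{name}", "w") as f:
                f.write(content)

        def load(self, name):
            with open(f"{self.directory}/{name}") as f:
                return f.read()


    def publish_report(store, name, content):
        # Written against the interface, not the implementation, so a networked
        # or versioned store can be swapped in later without changing this code.
        store.save(name, content)

The extra cost now is a handful of lines; the hook for what might come later already exists.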

The counter-argument to my whole proposition here is that modern technology and modern methodology make changing systems so much easier than before. For example, with virtualisation tools a server or disk volume can be moved and resized with a single click. Or, with modern IDEs, software can be re-factored with a single command.

However, both of those would be even faster, or perhaps even unnecessary if the design was done right in the first place.

Categories: Machines

The power of a technical blog

December 9th, 2006 No comments

Imagine I am holding my hands out in front of me, two feet apart and imagine that represents all the work an average technical team does. Then think about just how much your customers actually get to see. For me I think that is about the last two inches. So we have all that effort, all the brilliance, all the achievement and yet apart from those in the technical team, nobody ever gets to hear about it.

My answer to this is a strategy for exposing this wealth of hidden experience and expertise. For me this means doing two things: running a technical blog and getting people onto the presentation circuit. In this article I am only going to cover the former – the power of the technical blog.

First of all let me get the basics out of the way. Blog software that a whole team can use is easy to come by (we use WordPress), and hosting a server should be easy for a technical team; if not, a cheap hosting company will do it for a fiver a month. There should be no practical barrier to doing it.

The purpose of the blog is for my team to document the technical things they do that nobody would normally find out about. So that ranges from:

  • Documenting that great OS bug they found that required a special patch from the manufacturer
  • Describing the correct configuration for that obscure piece of software that they had to struggle to find out
  • Recording the results of some testing they did on a new piece of kit
  • Sharing the things they learnt at a technical conference
  • Promoting that great technical idea they have that manufacturers should all be adopting

and so on. I’m repeating myself, but basically anything that is technical and would otherwise not be seen, goes.

I don’t authorise the articles in advance, or even know about them until they are published. In fact, I only have two rules that I ask of people:

  1. It must be about a technical subject
  2. It must not be too rude

The next step is marketing the blog. Now, provided you are sticking to the purpose above and you don’t want to use it for product marketing, you can simply brand it as a peek into the work of the technical team. Then, link to a relevant article wherever the opportunity arises. In fact it is generally preferable not to repeat too much of an article in the other context, as people might then not bother to follow the link. I tend to check the referrer stats to see just how much traffic has been generated by the placement of a link.

The trickier element is ensuring that the titles of the articles, the way they are written and the categorisation given, meet with search engine requirements to ensure that our articles appear at the top on focussed searches. But so long as people stick to the plan and write one article at a time then this should get better over time.

Once the initial internal promotion to a sceptical technical team is out of the way I find myself left with very little work to do. I have to remind people, when they are near to completing an important piece, that now is the best time to blog it. I have to check categorisation and add or change categories as needed to cope with the changing nature of the articles. I check the web stats to see what groups are reading it. Finally I avidly check it every day to see if there is something new.

The power of a technical blog is that it makes the work of the team transparent, it establishes credibility for the team, it gives them the recognition they deserve and it builds a community. It also makes great reading. Priceless.

Categories: Machines, Organisations

The horror of the Windows registry

November 15th, 2006 No comments

Whilst there are plenty of people who mistakenly follow a technical religion, there are actually some very good technical reasons for minimising your use of Microsoft Windows. The main one of these is the horror of the Windows Registry.

At the heart of Windows lies a group of binary files that store almost all the configuration information for both the operating system itself and the installed applications, collectively called the Windows registry. Other operating systems have individual configuration files, sometimes binary and sometimes text.

The registry has lots of problems:

  • It can’t be edited by hand with a text editor; a specialist tool is needed. I realise this applies to other configuration systems too (like OS X plists) and I would make the same comment about them.
  • Spurious entries and corruption in the registry can cause slowness and operational problems. It’s for this reason that a great market has grown up in registry cleaning software.
  • Corruption is hard to spot. This is caused partly by the binary format and partly by the cross-linking between various parts of the registry. This is another reason why registry cleaning software is needed.
  • The configuration information for a particular application appears all over the place, not just one place in the registry, so you can’t easily see all the information you need at once.
  • Swapping between two configuration sets for an application involves considerable hassle. With config files it is of course easy enough to just keep various copies and move them around as needed.
  • De-installation of applications is a nightmare: because the entries can be scattered all over the place, normally only the installing application can remove them. Of course if you don’t have the installing application, or it does not have a good de-installer, then we need a registry cleaner, again.
  • The registry is a single point of failure. If you lose one of these files (say the software one) then you lose all the configuration information for all applications.
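
To make the contrast concrete, here is a small Python sketch of the two approaches. The config file, registry key path and setting names are all invented for the example; the registry part is Windows-only and uses the standard winreg module.

    import configparser
    import shutil

    # A separate text config file: readable, diffable and easy to swap wholesale.
    with open("myapp.conf", "w") as f:                  # hypothetical application config
        f.write("[server]\nlisten_port = 8080\n")

    config = configparser.ConfigParser()
    config.read("myapp.conf")
    print(config.get("server", "listen_port"))

    # Swapping between two configuration sets is just a file copy away.
    shutil.copy("myapp.conf", "myapp.conf.backup")

    # The registry needs the registry API (or regedit), and the entries for one
    # application may be scattered across several keys.
    import winreg                                       # Windows only

    key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"Software\MyApp")
    value, value_type = winreg.QueryValueEx(key, "ListenPort")
    winreg.CloseKey(key)
    print(value)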

Therefore, the impact of the registry compared to separate config files is:

  • It takes a lot more time to get anything done.
  • It takes much more effort to learn.
  • The routes open to a sysadmin for testing and debugging are much more limited than on other OSs.
  • When it goes wrong, it takes much longer to fix.
  • It ends up costing a lot more money to support.
  • Systems are less stable.

All very good reasons for just sticking with separate configuration files, as other operating systems do, preferably in text format.

Categories: Machines

Beware technology religion

November 14th, 2006 No comments

This is one of the most difficult problems to deal with – when people get religion – in other words when people get extremely strong views on technology because that’s what the entrails of the chicken told them.

Now don’t get me wrong, I’m not equating strong views with a belief in voodoo, it is actually much more complicated than that.  There are plenty of people with strong views who have detailed reasoning for those views.  The issue is those views that are more religion than science.

For example, there are plenty of Windows avoiders out there.  But there is a real difference between those who avoid Windows because they see Microsoft as the evil empire and those who avoid it because they understand just what a disaster the Windows registry is.

So whenever I encounter any strong views I have to dig down to see if there is reasoning behind them, to separate science from religion.  On the surface the distinction is not obvious.  Occasionally I find that the strong views are based on an intuitive understanding of the issues that can’t easily be put into words.  In these cases the process of digging down can sometimes lead to a switch from religion to science (a deconversion?).

The reason this bothers me so much is because I generally find that those for whom this is religion, can unpredictably change their views and worst of all, I don’t think they really understand technology.

At the same time I’m slightly concerned when I meet people who have been working in IT for several years and yet don’t have strong views.  We all come across so much rubbish that anyone who fails to be polarised by it is probably not paying attention.

Categories: Machines, People

Islands of information

November 14th, 2006 2 comments

For some years now I’ve viewed the practice of users storing data files in directories on local hard disks or even servers as completely anachronistic. In fact I think of it much the same way as I think of data held on a floppy disk – I know it’s there but getting at it is so hard it might as well not be.

To me, all this data is isolated in islands of information and needs to be rescued by being stored in a different way.

What I want is for all information to be:

  • Searchable
  • Categorised, with multiple categorisations
  • Catalogued. In other words described in a table of contents
  • Versioned
  • Accessible from anywhere on the net in a controlled way

and that is not done by current operating systems. So, for that reason I think we need to see a shift towards all data being stored in specialist applications that provide these functions. Yes, I realise that desktop search has dealt with the searching issues, but it still doesn’t tackle my other points.
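
To make the wish list concrete, this is roughly the shape of record I want kept for every piece of information. It is a hypothetical sketch of my own in Python, not the schema of any particular product.

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class CatalogueEntry:
        """What a piece of information needs before it stops being an island."""
        title: str
        summary: str                                    # the table-of-contents description
        categories: list = field(default_factory=list)  # multiple categorisations
        version: int = 1
        url: str = ""                                   # accessible from anywhere, in a controlled way
        updated: datetime = field(default_factory=datetime.now)

    entry = CatalogueEntry(
        title="Q3 capacity plan",
        summary="Projected storage and network growth for the third quarter",
        categories=["infrastructure", "planning"],
        url="https://intranet.example/docs/capacity-q3",
    )
    print(entry)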

The product I prefer for this at the moment is Lotus Notes, but the moment something better comes along, I’ll switch. For example if there was an Internet based app that combined webdav, caldav, email, opendoc format files and so on then that might do it.

The final thing to note is that in any organisation the highly structured information is already stored in a specialist application, called a database. All I’m talking about is taking the same approach to loosely structured information, which is actually a first step in a knowledge management strategy.

Categories: Machines

Convention over configuration

November 9th, 2006 No comments

When I used to be a programmer I was very hot on my own internal naming conventions and I was using my own variant of Hungarian notation long before I discovered there was such a thing. Writing systems by myself meant that I could enforce this internal consistency and make my life simpler. But I always longed for my conventions to be understood by the compiler. So, just by giving something a specific name the compiler would understand how to connect it to other code, without me needing to explicitly configure it.

I’ve now seen that in action in Ruby on Rails and I am well impressed. It even has a name – ‘convention over configuration’.

An example of how it works in Rails is simple enough. If you have a URL http://domain/x/y then Rails expects to invoke a method called y in a controller called x. So easy. But Rails applies the same convention in a few more places and is always looking to extend its use further.
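
You can see the shape of the idea outside Rails too. Here is a little sketch in Python (not Rails itself, and the names are mine) of dispatching a URL purely by naming convention, with no routing configuration at all.

    class ArticlesController:
        """By convention a URL like /articles/show ends up here, unconfigured."""

        def show(self):
            return "showing an article"

        def list(self):
            return "listing articles"


    def dispatch(path):
        # /x/y -> the method y on a class named XController, found purely by
        # naming convention, with no routing table or XML file to maintain.
        _, controller_name, action_name = path.split("/")
        controller_class = globals()[controller_name.capitalize() + "Controller"]
        return getattr(controller_class(), action_name)()

    print(dispatch("/articles/show"))    # -> showing an article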

I find that this approach makes so many things easier that it should be considered as a general principle for software design, not just a quirk of Rails. The benefits that it brings are:

  • You have to write much less code.
  • Spotting a naming mistake is simpler because it stands out more.
  • If you don’t know the name of a particular function then you have a lot more chance of making a successful guess.
  • You have more chance of guessing how and where a function is used just from the name and context.

Brains are naturally pattern matching machines and I think convention over configuration appeals to them much more than a long XML file. I’m surprised I don’t see it used more often, but I bet we will in the future.

Categories: Machines

Forcing password changes

November 9th, 2006 1 comment

A small bugbear of mine. Why do auditors ask me every year “Do you force your users to change their passwords regularly?”. Since when did this become such an unchallenged tenet of security?

Well I don’t agree that this is a good idea at all, and I patiently explain that to the auditors each year. In fact I take the view that forcing people to change their password regularly can actually reduce the security of most systems. My reasons for this are fairly clear:

  • Most people find passwords difficult to remember and the more complex the password, the more likely they are to forget it. This is probably because there is no obvious handle for the memory to be hooked onto, i.e. no mnemonic, no event, no place etc.
  • If you force people to change regularly then you also force them to develop mechanisms for storing their passwords other than just remembering them, because the burden on their memories is just too much.
  • These mechanisms are nearly always insecure. In some cases it is a post-it note stuck to the screen, sometimes a note in a drawer or on the desk or even a text file on their computer.
  • Choosing a password is actually quite hard for many people. They can’t think of anything and so they tend to go for simple, memorable names. Just the kind of thing that is vulnerable to a dictionary attack.
  • Only once someone has finally remembered a password, generally through repeated use, do they then destroy the physical record of it.

So what really matters is to teach users:

  1. How to choose a strong password. I’ve seen some nice online tools that evaluate the password strength as you type it. I’m sure they’re a bit corny and generally don’t include dictionary searching, but they do help the user (a rough sketch of the idea follows this list).
  2. How to choose a memorable password, so they don’t have to record it anywhere.
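
A rough sketch in Python of that kind of strength check; the scoring and thresholds are entirely my own invention, and a real tool would also check the password against a dictionary.

    import string

    def password_strength(password):
        """Very rough strength score: length plus variety of character classes."""
        classes = [
            any(c in string.ascii_lowercase for c in password),
            any(c in string.ascii_uppercase for c in password),
            any(c in string.digits for c in password),
            any(c in string.punctuation for c in password),
        ]
        score = len(password) + 4 * sum(classes)
        if score < 14:
            return "weak"
        if score < 22:
            return "reasonable"
        return "strong"

    print(password_strength("fred"))            # weak
    print(password_strength("Tr4vel-2-Mars!"))  # strong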

Now if auditors asked me whether we do that or not, then that would be a sensible question to ask.

Categories: Machines

Submarine compartments

November 8th, 2006 No comments

Submarines are built with compartments separated by bulkheads. These prevent the spread of fire and limit the impact of damage to one compartment. Good technical architecture follows exactly the same principle, but it normally applies to:

  • Compromise of one system leading to compromise of other(s)
  • Failure of one system leading to failure of other(s)

Interestingly, wherever this principle is applied there is a tradeoff, normally in managing it, and this limits just how far to go with it. From experience I don’t think people go far enough.

Networks

The most obvious application of this is with networks where it is quite common to split them into VLANs or even separate LANs with firewalls or routers acting as the bulkheads. Access between them is limited and specific thought is given to what happens if one is compromised.

The tradeoff is that this requires more kit and more management, and things that would otherwise just work have to be explicitly allowed to work. But these are generally such small tradeoffs that no professional omits firewalls for those reasons.

Servers and Applications

Less common, outside the world of financial institutions and other high value targets, is the separation of applications between different servers. I joined one organisation where this principle had been applied fairly stringently for some years, to the point of only allowing one or two application ports open per server. Whilst extreme, I think it was a major contributor to the lack of downtime and systems failure.

We all know that different applications on the same system can interfere with each other, particularly in a Windows environment with shared libraries and poor memory protection. But tracing such problems generally requires special tools and a thorough investigation. If a fault only occurs very occasionally then this analysis is rarely done. From working in an environment with applications so completely partitioned onto different servers, I think the effect is a dramatically reduced failure rate.

Of course there are sound security reasons for this separation as well. If one application is compromised then it is much more difficult for that to lead to compromise of other applications if there aren’t any.

The major tradeoff here is that many servers are left grossly underused, and that has knock-on implications, particularly for space, power and air-con in the datacentre. These are increasingly important concerns and I would urge anyone to resist allowing those concerns to dictate server planning. Just buy more racks, supply more power and fit more air-con.

Passwords

Even less common is the separation of passwords (or keys) into domains. This is only about security and it is the application of submarine compartments just as with networks. If someone manages to crack a password then where else can they use it? The same thing goes for keys, especially ssh keys.

Passwords, and keys in particular, are much more carefully guarded than most information because the inherent security value of them is instantly recognisable. Despite that, I doubt that many people do a formal analysis on what happens if a password or key is compromised – just how far can people get and what is there to stop them?

Conclusion

That leads me on to the final point, which is that this principle is not one that needs a blanket imposition, but rather a risk-based analysis to determine where best to use it. This is undertaken by asking the basic questions of ‘what else can be affected if this system fails?’ or ‘what other systems can be targeted if this system is compromised?’. Asking these questions can produce some uncomfortable answers.

Categories: Machines

Loosely coupled systems

November 3rd, 2006 No comments

It is increasingly common to find that applications have remote API access. With the ubiquitous use of web technology people can add XML based remote interfaces quite quickly.

The temptation is now there to connect together all sorts of systems using these APIs in an event driven fashion. For example, if we want to connect our customer database to our mailing list system, which has a web based API for managing users, then this can be easy to set up. Whenever there is a change to our customer database this triggers some code that uses the mailing list API and we have the change propagated in real time. Seeing systems connected together like this with real time interaction is addictive.

If we’re not careful though, we end up building tightly coupled systems when this might not be what we want.

To explain this, let’s talk about a sending system and a receiving system and go through some scenarios:

What happens if we want to take the receiving system down for an upgrade?

Hopefully that’s something that was planned for and the sending system can detect if the receiving system is not available and queue the changes. Then when the receiving system comes back on line the sending system can process the queue. So that one was easy enough.

What happens if the receiving system dies and has to be restored from a backup tape?

Now that’s a bit more complicated because the changes might have been lost. To get around this, all we need do is ensure that every change is written out to a table or file of changes. That way we can replay the changes that were lost in the restore from backup and get the receiving system back to the right state.

What happens when the format of the changes, changes (if you see what I mean)?

This happens all the time. The format changes, so we change the code that fires off the changes but forget to change the code that stores them. Either that, or we do remember but make a mistake in the storing of the changes. Well, this is also easy to get around: we just ensure that the process on the sending system that sends changes to the receiving system uses the stored changes as its source of information. That way there is only one source, and we can’t introduce an error that we don’t find until much later.

What happens if for some reason we don’t want to send the changes?

Let’s just imagine the receiving system develops a fault in the API, say after an upgrade, and trying to use the API causes it to crash. Obviously we don’t want to turn off the sending system so we could firewall the receiving system, but that’s a bit of a pain and may not be possible. The obvious thing to do is to turn off the process on the sending system that fires off the changes.

And there you have it, a loosely coupled system (a minimal sketch in code follows the list below). All it takes is for us to develop the sending system so that it:

  • writes out every change to a table or file of changes
  • uses the written out changes as the source of data for publishing those changes
  • makes the process that publishes those changes a separate process that can be started and stopped independently of any other systems
  • enables that process to start from any point in the stored list of changes so that you can replay changes.
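
Put together, a minimal sketch of the sending side might look like the following. The file-based change log and the function names are my own, purely to show the shape; in real life the log would be a database table and send() would be a proper API client.

    import json
    import time

    CHANGE_LOG = "changes.log"                  # in real life, a database table

    def record_change(change):
        """Every change is written out here first; the log is the only source."""
        with open(CHANGE_LOG, "a") as f:
            f.write(json.dumps(change) + "\n")

    def publish_changes(send, start_from=0):
        """The separate publisher process: reads the log and pushes each change.

        It can be stopped, started and replayed from any point (start_from),
        independently of the system that records the changes.
        """
        with open(CHANGE_LOG) as f:
            for position, line in enumerate(f):
                if position < start_from:
                    continue
                change = json.loads(line)
                while True:
                    try:
                        send(change)            # call the receiving system's API
                        break
                    except IOError:
                        time.sleep(60)          # receiver unavailable: wait and retry

    record_change({"customer": 42, "email": "new@example.com"})
    publish_changes(send=print)                 # print stands in for a real API client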

The final thing to point out is that this has an unexpected benefit. If we want to test a new version of the receiving system API then we can just set the sending system process to replay the last few days of changes and see if the receiving system copes. No special code needed.

Categories: Machines