Tuesday, November 06, 2007

Paremus colleagues continuing to flag articles to me, and a partial restoration of vigor (New Year and all that), have conspired to overcome my Q4-07 blogger's block.

I'll start by briefly paraphrasing the recent crop of virtualization articles.

Virtualization is great because...
  • You can increase data centre resource utilization.
  • You can simply restart a service on a new physical platform, should the current physical platform fail.
However there may be some dark clouds on the horizon because...
  • Resource / Dependency Management and Security are problematic
  • Operational Risk may be adversely affected
Well, I'll go to the foot of our stairs!

Isn't it obvious that "visible" runtime complexity is increased by current forms of virtualization? Given this, it is surely no surprise that virtualization can negatively impact manageability, OPEX and, ironically, service availability.

Increasing server utilization at the expense of increased runtime complexity seems like a poor trade; especially if you remember that complexity is proportional to the number of skilled personnel required - and so to OPEX. From a recent survey by Sun Microsystems (Sun Survey) it would appear that many CIOs would agree.

Meanwhile, we're told that:

"the IT industry will develop a new generation of management tools to address manageability and security issues created by virtualization. A great opportunity for start-ups and large IT companies alike".


Doubt this? Then check the current datacentre virtualization hype, and the number of VC funded companies in this market sector. Clearly the adage - "Identify the Pain - and sell them the Aspirin" - is still in vogue with our VC friends. Unfortunately such strategies are at best simplistic; at worst, they demonstrate both the level of stupidity only achievable via a fully qualified MBA, and the lemming behavior of the IT industry.

This time, the patient (Enterprise IT) really does need more than yet another in a long sequence of expensive Aspirins.

But perhaps the established IT vendors will address the problem?

Let's see. How many established vendors, after 15 years of client-server computing, have enterprise management frameworks that are:
  • Simple to Use
  • Cost Effective
  • Simple to deploy
  • Able to address simple requirements like configuration management for software, servers, storage and networks.
It's been a while since I've been involved in this area (i.e. HP OpenView, Tivoli and the like), but I suspect the answer is still the same.

So what real hope is there for extending such solutions to address the new complications posed by service virtualization?

I'll let the reader come to their own conclusions.

Wednesday, September 19, 2007

Complexity - Part I: What would IT Marketing do without it?

For all its press coverage, little effort has been made to define "Complexity" in a manner that is relevant to the modern enterprise.

So here goes...

We'll start by imagining two abstract distributed "systems"; each system an infinite 3-dimensional lattice in a 3-dimensional space -- we'll avoid distractions caused by non-Euclidean geometry :).

However, whereas the first lattice comprises regularly spaced identical nodes, the second lattice has randomly spaced identical nodes.

Here is the crunch.

The first, regular lattice may be simply, and completely, described in terms of:

* A description of a node
* A description of the offset of a selected node from your chosen co-ordinate system, and
* The 3 parameters that describe the spacing between the nodes.

In contrast, the second, random lattice needs an infinite number of spacing parameters to describe the system to the same level of accuracy.

By choosing to model each system in this manner, the first system is seen as trivial, whereas the second is infinitely complex!

Now, let us assume that relative node position is not important, and that instead we use an emergent property; in this case the density (the number of nodes within a given volume of space).

Now the amount of information required to describe each system is identical, and reduces to:

* Composition of a node
* Density of nodes in a given volume of space.

Whilst "density" is only an abstract concept, it nevertheless captures important characteristics of each system with minimal information, so hiding, in the case of the random lattice, an infinite amount of structural complexity.

I'll now define System Complexity as a measure of the amount of information required to describe a System; but crucially, with respect to the System Properties that are of interest to us. Furthermore, by defining/modeling a system w.r.t. relevant Emergent properties, we can dramatically reduce the amount of information required to usefully describe the system. The model, representing the System and its emergent properties, isolates us from potentially vast amounts of internal structure / complexity.

Also, for a given System, the abstraction / model that describes the relevant emergent properties with the least information provides the least complex representation of that system.
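
To make the lattice argument concrete, here is a throwaway sketch - my own illustration, with invented numbers, not anything from a real system - contrasting the size of an exact structural description of each lattice with the size of a density-based description:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Throwaway illustration of the lattice argument. The regular lattice is
 * completely described by an origin plus three spacings; an exact description
 * of the random lattice grows with every node; a density summary is the same
 * tiny size for both.
 */
public class LatticeDescription {

    public static void main(String[] args) {
        // Regular lattice: complete structural description = 6 numbers.
        double[] regular = { 0.0, 0.0, 0.0,    // offset of a chosen node from the origin
                             1.0, 1.0, 1.0 };  // spacing along x, y and z

        // Random lattice: an exact description costs 3 numbers per node, for ever.
        Random rnd = new Random(42);
        List<double[]> random = new ArrayList<double[]>();
        int sampled = 1000000;                 // and this is only a finite sample of it
        for (int i = 0; i < sampled; i++) {
            random.add(new double[] { rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble() });
        }

        System.out.println("Regular lattice, exact description : " + regular.length + " numbers");
        System.out.println("Random lattice, exact description  : " + (3L * sampled) + " numbers (and counting)");

        // Emergent-property description: identical in size for both systems.
        double density = sampled / 1.0;        // nodes per unit volume of the sampled region
        System.out.println("Density summary, either lattice    : 1 number (" + density + ")");
    }
}
```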


Back to IT

Whilst IT professionals are no longer required to understand:
* The arrangement of silicon atoms required to produce semi-conductors
* The detailed architecture of the processor or memory chip in use
* The firmware used
* The specific considerations in an OS kernel design

The resultant distributed systems are still "complex"; complicated by the fact that they consist of many inter-dependent components and services, each of which must continue to function within a volatile runtime environment.

The response to this "complexity" can be seen in every FT/Fortune 1000 company.

* Attempts are made to lock down the runtime infrastructure, to completely describe it, and to prevent changes to it. More recently, attempts are made to virtualize / abstract the runtime infrastructure in a manner that presents an unchanging persona to the static business systems.
* Meanwhile, software middleware is treated as a strategic investment - physical silos of grid computing, ESBs and data caching are introduced into these environments, and the mandate then made that these infrastructure services must be used.

What is wrong with this consensus approach? Quite simply, as with the random lattice example, organizations are viewing their systems, and so the associated system complexity, in the wrong frame of reference! They then attempt to address perceived complexity issues with a series of measures that actually drive up operational costs whilst impacting service agility and availability.

Enough for today - next blogging session I'll provide, what I believe to be, the answer ;)

Thursday, July 19, 2007

The Death of Middleware??


Recent attempts - no, I'm not saying who :) - to justify centralized approaches to enterprise middleware, in the light of current application modularization trends, triggered a fond memory of driving from San Francisco to Palo Alto, probably sometime in 2004.

In between the usual process of struggling with US road signs, the in-car navigation system, and, for us Brits, being on the wrong side of the road, I noticed two advertisements. The first, "Middleware Everywhere", was courtesy of IBM, seemingly in response to a billboard a mile or so further down the road (or vice versa), "The End of Middleware", courtesy of Sun Microsystems.

Ironically, counter to what traditional middleware companies may have you believe, both marketing messages may now be rapidly realized by application modularization, fueled by OSGi, and dynamic composition, exemplified by SCA.

Whilst SCA allows Service bindings to be defined at application composition time, OSGi allows these bindings to be dynamically loaded and used by dynamically assembled runtime applications. In principle, the relevant infrastructure messaging / caching components may also be dynamically deployed alongside business logic; see the Newton open source project, and its commercial big brother - Infiniflow - for examples of this approach to dynamic "Business System" assembly.
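
By way of a minimal sketch - and emphatically not the Newton or Infiniflow implementation - the fragment below shows the flavour of this late binding using the standard OSGi ServiceTracker; the MessageChannel interface and OrderPublisher class are names invented purely for illustration.

```java
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.util.tracker.ServiceTracker;

/** Hypothetical binding contract - invented for this sketch, not a real API. */
interface MessageChannel {
    void send(String message);
}

/**
 * Minimal sketch: business logic that binds, at runtime, to whichever
 * MessageChannel implementation happens to be deployed in the OSGi framework.
 */
public class OrderPublisher implements BundleActivator {

    private ServiceTracker channelTracker;

    public void start(BundleContext context) {
        // Track any MessageChannel service - JMS-backed, space-based, in-memory...
        // The binding is resolved when used, not hard-wired at build time.
        channelTracker = new ServiceTracker(context, MessageChannel.class.getName(), null);
        channelTracker.open();
    }

    public void publish(String order) {
        MessageChannel channel = (MessageChannel) channelTracker.getService();
        if (channel != null) {
            channel.send(order);   // use whatever binding is present at this instant
        }
        // else: no binding is currently deployed - the caller may queue or retry
    }

    public void stop(BundleContext context) {
        channelTracker.close();
    }
}
```

The point is that the bundle expresses no opinion about which messaging binding is present; whatever is deployed at that instant is what gets used, and it may change beneath the running application.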

So - no longer strategic, high cost, high risk, monolithic frameworks that constrain application agility and scalability - "middleware" will simply be the ensemble, or aggregate, of all application bindings and associated infrastructure components - in use - at each point in time!

The impact on the industry should be significant. Enterprise Service Buses, Space Based Architectures, message centric, direct synchronous / asynchronous communication, Web Services?? Ultimately, why should we care? Rather than purchasing that strategic "all-purpose" Hammer, and treating all Enterprise inter-service interaction as Nails, let's start using the right tool for the right job; dynamically deploy the appropriate infrastructure services alongside the applications they serve!! And whilst we're at it, let's do this in a manner that increases overall resilience and rips OPEX costs out of the operational environment.

So perhaps IBM's "Middleware is Everywhere" was nearer the mark - that said perhaps Sun's response should now be "Yes But - Enterprise Middleware is rapidly becoming Irrelevant".

Friday, July 06, 2007

We live in exciting times!

Java EE 6 is announced. The Interface21 folks think it's finally "right", and the daggers are drawn as the old JBoss boys feel the need to defend their position as the popular open source JEE appserver vendor (see TheServerSide).

Extensibility and Profiling are a couple of key features in Java EE 6.

Mmmm. So I can take my very bloated Java EE infrastructure and reduce it to merely bloated.

I'm almost sold on the idea ;-)

But hang on - what about OSGi and SCA? Can I not already dynamically build very sophisticated distributed composite applications that adapt and evolve to their resource landscapes? Such distributed application services loading and running only what is required at each specific point in time; these solutions self-managing, self-configuring and self-healing?

Well actually, yes I can - and Java EE - in any form - doesn't figure!

On a finishing note - a nice article (concerning Web Services) whose underlying message is, I'd suggest, equally applicable to the monolithic Java EE vs. composite OSGi / SCA debate.

Wednesday, May 30, 2007

Venture Capitalists embrace Command Economy in preference to Free Market!

A recent article, Interesting Times for Distributed DataCentres, by Paul Strong (eBay Distinguished Research Scientist) makes a number of interesting points:
  • For Web2.0 Services to scale, you MUST back-end these onto massively horizontally scaled processing environments.
  • Most Enterprise datacentre environments are moving towards, or could be considered as, primordial Grid type architectures.
  • What is really missing is the Data Centre MetaOperating System - to provide the resource scheduling and management functions required.
Whilst these arguments are correct, and highlight a real need, the industry and VC response seems entirely inappropriate.

Whilst VCs and major Systems Vendors are happily throwing money into expounding the virtues of loosely coupled business models enabled by Web2.0 and all things WS-SOA, somewhat perplexingly they also continue to invest in management / virtualization / infrastructure solutions which drive tight couplings through the infrastructure stack. Examples include data centre "virtualization" or, as per my previous blog entry on the Complexity Crisis, configuration / deployment management tools.

Hence, industry investment seems to continue to favor the technology equivalent of the "command economy", in which the next generation of distributed Grid data centre is really just one more iteration on today's: a central IT organisation controls, manages and allocates IT resource in a rigid hierarchical command-and-control structure. The whole environment is viewed as a rigid system which one centrally controls at each layer of the ISO stack; approaches that continue the futile attempt to make distributed environments behave like Mainframes!

What is actually needed is a good dose of Free Market Economics!
  • Business Services dynamically compete for available resources at each point in time,
  • Resources may come and go - as they see fit!
  • Infrastructure and Systems look after their own interests, and optimise their behaviors to ensure overall efficiency within the Business Ecosystem.
Successful next generation MetaOperating Systems will heavily leverage such principles at the core of their architectures!

You simply cannot beat an efficient Market!
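
To make the idea slightly less abstract, here is a deliberately naive sketch of market-style allocation, in which services bid a value for resource and free resources go to the highest bidders. It is purely illustrative - all names are invented, and no real MetaOperating System would be this simple.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Deliberately naive sketch of market-style resource allocation: services
 * express a bid reflecting the business value of obtaining a resource right
 * now, and each free resource goes to the current highest bidder.
 */
public class ResourceMarket {

    static class Bid {
        final String service;
        final double value;          // business value per unit of resource, right now
        Bid(String service, double value) { this.service = service; this.value = value; }
    }

    /** Allocate 'freeResources' units, one at a time, to the highest standing bids. */
    static void clearMarket(List<Bid> bids, int freeResources) {
        bids.sort(Comparator.comparingDouble((Bid b) -> b.value).reversed());
        for (int i = 0; i < freeResources && i < bids.size(); i++) {
            System.out.println(bids.get(i).service + " wins a resource at bid " + bids.get(i).value);
        }
    }

    public static void main(String[] args) {
        List<Bid> bids = new ArrayList<>();
        bids.add(new Bid("settlement-engine", 9.5));   // peak load, bids high
        bids.add(new Bid("report-generator", 2.0));    // batch work, bids low
        bids.add(new Bid("pricing-service", 7.0));
        clearMarket(bids, 2);   // two machines just came free - they may vanish again later
    }
}
```
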
A new survey posted on GRID today highlights the Risks associated with Infrastructure Complexity. Interesting highlights include:
  • Each hour of downtime costs Fortune 1000 companies in excess of $300,000 according to 1/3 of the survey responses.
Of course, dependent on the specific Industry, these figures could be so much larger! Everyone tends to focus on availability/scaling issues for the new Internet based companies (Google, Yahoo, Amazon, Ebay). However, if you want to see real risk - consider the impact on some of the core systems that support global Banking / Financial systems.
  • Troubleshooting the problem can take more than a day, according to 1/3 of survey responses.
So if these are the same guys that have the $300,000 an hour loss - the figures are starting to mount up.
  • Change Management for Fortune/FT 1000 companies occupies 11 full-time people!
  • Installation and configuration of core applications is a major resource sink; taking 4 days to configure a complete application infrastructure stack.
The report then goes on to justify change management / configuration management products. The implication being that to address the complexity issues, these Fortune/FT 1000 companies need to purchase and configure yet more enterprise software?

So Layering Complexity upon Complexity!!

I wonder, just what is the Production impact if, after all this automation, you lose the systems that are doing the automation and configuration?? I suspect recovery would be significantly longer than 1 working day!

The truth of the matter is that Enterprise Systems including those based upon the latest ESB, Grid, WS-SOA Marketectures are the root cause of the explosive increase in Complexity.

Each of these approaches implicitly assume that:
  • The compute resource landscape is static,
  • Software functionality is static
  • Provisioning is thought of as a one-time event, and
  • Failure is treated as an exception.
Whereas in reality:
  • Compute resource landscape is dynamic
  • Software functionality needs to evolve and adapt
  • Provisioning is an on-going process, driven by continual re-optimisation against the shifting compute landscape and by recovery from failure (a naive sketch of this reconciliation loop follows below).
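
As a naive sketch of that last point - provisioning as a never-ending reconciliation loop rather than a one-off event - consider the following; all names are invented for illustration and this is not the Infiniflow implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Naive sketch of provisioning as a continuous reconciliation loop: desired
 * state is continually compared with the observed, shifting resource
 * landscape, and corrective deployments are issued.
 */
public class ReconciliationLoop {

    /** Desired replica count per service, e.g. derived from a business-level target. */
    private final Map<String, Integer> desired = new HashMap<>();

    /** Replicas currently observed as healthy on the resource landscape. */
    private final Map<String, Integer> observed = new HashMap<>();

    void reconcileOnce() {
        for (Map.Entry<String, Integer> target : desired.entrySet()) {
            int have = observed.getOrDefault(target.getKey(), 0);
            int want = target.getValue();
            if (have < want) {
                deploy(target.getKey(), want - have);   // recover from failure / scale up
            } else if (have > want) {
                retire(target.getKey(), have - want);   // release resource for other services
            }
        }
    }

    void deploy(String service, int count) { /* find resource, install, start */ }
    void retire(String service, int count) { /* stop surplus replicas */ }

    // In a real system this loop never terminates: the landscape keeps shifting,
    // so reconciliation - i.e. provisioning - is an on-going activity.
    void run() throws InterruptedException {
        while (true) {
            reconcileOnce();
            Thread.sleep(5000);
        }
    }
}
```
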
So how do these Fortune/FT 1000 companies dig themselves out of their current Complexity Crisis?

By building the correct IT foundations for their businesses! Fortune 1000 companies need to implement enterprise-wide solutions where configuration, adaptation and recovery are core design features. Systems configure, deploy and maintain themselves as part of what they do (by way of an example - see Infiniflow)! Such solutions will also heavily leverage industry trends towards modularization via OSGi & SCA.

Whether you are the CIO of a Global Bank, a Gaming Company or a Telecoms company, once the correct technology foundations have been put in place - no easy task - significant OPEX savings WILL follow. However, take the easy route - fail with the foundations, avoid necessary change - and no amount of management, configuration or deployment software band-aid will save you!

Saturday, May 12, 2007

It's been a hectic month, with Paremus working on a number of projects including the SSOA demonstrator for DODIIS, our product release, and getting ready for JavaOne!

[Photo: the Paremus stand at JavaOne 2007]

As can be seen to the far right in the above photo, Paremus shipped some 16 Mac Minis to JavaOne to provide real-time demonstrations of multiple distributed SCA / OSGi based Systems running across an Infiniflow Enterprise Service Fabric! Each SCA system was dynamically instantiated, and we demonstrated the isolation and recovery capabilities of the Service Fabric by failing compute resources (well, actually, visitors to the stand insisted on pulling the cables) - without impact to the running Systems!

I was on stand duty for much of the time, so didn't get first hand experience of the presentations. However, feedback indicated that the usual keynote presentations were, well, as expected; but that both the OSGi and SCA standards are at the "tipping point" - with a significant increase in delegate interest and vendor activity relative to last year.

In addition to the usual conversations about OSGi, SCA, WS & Jini, those passing the Paremus stand may have overheard conversations concerning genetic algorithms, the importance of CAS and even the apparent failure of string theory - the latter, I hasten to add, having nothing to do with Infiniflow's architecture :)


I'm almost looking forward to JavaOne 2008!

Wednesday, April 11, 2007

A new white paper concerning the synergies between OSGi, SCA and Spring can be found on the OSOA site; well worth reading for those who want an introduction to this field of activity.

The white paper concludes that "SCA, OSGi and Spring together provide powerful capabilities for building service implementations from simple sets of simple Java Beans using a few simple API's".

The one interesting omission is any real discussion of the challenges of building distributed OSGi, SCA and Spring based systems. Whilst the white paper explains the virtues of dynamic dependency resolution, this is only within the context of a static resource landscape, and so fails to acknowledge the additional challenges presented by Peter Deutsch's 8 Fallacies of Distributed Computing.

For those interested in dynamic distributed systems based on OSGi and SCA that support Spring, check out the Newton project. Newton is built from the ground up to be a robust distributed runtime environment using OSGi, SCA and Jini as foundation technologies, providing a "robust" - in the true, non-marketing sense of the word - enterprise runtime platform for Java POJO based applications, including Spring.

A final thought - there still seems to be a real lack of understanding within the industry w.r.t. the fundamental relationship between agility, distributed systems, complexity and OPEX (operational expenditure). A subject I suspect I'll post more on, once I've caught up with my day job.

Saturday, February 17, 2007

The "Hidden Costs of Virtualization"

An interesting article argues that whereas OS virtualisation is sold on the cost savings achieved by higher CPU utilisation of existing resource, several cost factors seem to be overlooked. The most important of these is that operational costs scale with the number of OS instances; it is immaterial whether these instances are real or virtual. The article also points out that commercial OS virtualization software is not cheap; around $20,000 for VMware ESX for a 4-way Intel box, though open source solutions should in due course pull this pricing down.

However, I'd suggest that there are a number of additional considerations.

OS virtualisation, in itself, does nothing to address the inherent complexity issues within modern enterprise environments. Instead of a sprawl of physical machines with poorly managed applications and configurations, one can now extend these complexity and management issues into a virtualized resource space! Obviously, OS virtualization management is needed, and is indeed provided, as commercial products, by the virtualization vendors.

In many respects, are we not back to where we started? Sure, we can now drive up CPU utilisation, but the runtime infrastructure is more complex than ever. Meanwhile business applications are still as brittle, as tightly coupled and as change resistant as they ever were!

Also, is increased CPU utilisation, at the cost of increased complexity, a good trade?

Driving up CPU utilization has got to be good, right? Indeed, many CIOs want to make dramatic OPEX cost savings by driving CPU utilisation to ~100%.

Yet, whilst running a large datacentre's CPU resource at single figure utilisation levels is an obvious cost issue, what seems to be overlooked are the issues associated with running resources at near maximum utilisation.

Load volatility is an obvious concern. If you achieve, on average, 80% utilization across your resource population, just how do you cope with peaks that require, say, 50% more resource - that is, 120% of the capacity you actually own? The standard response may be to outsource the extra resource requirement to a third party utility compute supplier. Yet, whilst frequently discussed by the industry, I'm not aware of many over-capacity deals. Quite the contrary: early entrants into the Utility Compute Market have recently dismantled facilities due to lack of commercial interest.

Yet there is a more important issue; namely operational risk. Evidence suggests that compute resource under excessive load is statistically more likely to experience software failure (reference). Moreover, any complex, tightly coupled system may suffer cascading failures; i.e. an initial component failure cascading into a major system outage.

Hopefully, such cascading failures are the exception; however, component failure within a heavily loaded environment will always be more intrusive than for a lightly loaded equivalent, as, depending upon the priority of the lost service/component, other running services may need to be terminated to free sufficient resource.

Hence, Operational Risk and Data Centre Resource Utilization are issues that are fundamentally linked; linked by the sizes of the potential failure domains within the system.

Failure domains may be defined by / mapped to:
  • Physical locality / Physically shared resource.
  • Hardware type
  • Software type / version
  • Management / Security domains
Considering "physical" failure domains, the following usually exist:
  • A data centre facility - (complete power failure, halon release, collapse of network infrastructure)
  • A shared PDU - (possibly affecting 25% of data centre resource, assuming critical systems like SAN storage and IP networking are wired into at least two PDUs)
  • A network switch failure - perhaps impacting 50 servers if each is single-homed.
  • A cabinet failure - perhaps affecting the 100 processors in that cabinet.
  • Single, physical machine.

Without spare usable capacity, re-provisioning the OS instances lost to a single cabinet or even machine failure may prove challenging.

Perhaps we need to think about things in a slightly different way?

Conventional Wisdom:

Large datacentres are running out of space and / or are limited by environmental considerations (power, air-conditioning requirements). Usually there is little possibility of building a secondary large datacentre facility within the Metro area, because costs can be substantial and suitable real estate is not available. Hence, virtualize compute instances and maximally use what resource is already there.


The Alternative:

Instead of a single large datacentre facility, adopt modular datacentres, distributed over a larger geographic region. The largest failure domain that we care about is then a datacentre module instance, so for 'N' active modular datacentres we need 'N+1' to allow for complete failure of any instance. So by modularizing and virtualizing the datacentre, we actually increase the ability to use spare CPU resource per datacentre instance, without impacting operational risk.
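
As a back-of-the-envelope illustration of the 'N+1' arithmetic - my own numbers, nothing more - the sketch below computes the maximum average utilisation an estate of modular datacentres can sustain whilst still tolerating the complete loss of any one module:

```java
/**
 * Back-of-the-envelope illustration: with N active modules plus one spare,
 * the estate can safely run at N/(N+1) average utilisation and still absorb
 * the complete loss of any single module.
 */
public class ModularCapacity {

    /** Maximum safe average utilisation for n active modules plus one spare. */
    static double safeUtilisation(int activeModules) {
        int total = activeModules + 1;          // N+1 modules deployed
        return (double) activeModules / total;  // survivors must carry all of the load
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 2, 4, 8 }) {
            System.out.printf("N=%d (+1 spare): safe average utilisation %.0f%%%n",
                    n, 100 * safeUtilisation(n));
        }
        // N=1: 50%, N=2: 67%, N=4: 80%, N=8: 89% - the more modules,
        // the harder the estate can be driven without raising operational risk.
    }
}
```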

In this respect, Sun's Black Box strategy was an interesting marketing move. I say "marketing" as, in reality, the hardware required to realise a modular datacentre is the easy bit! The difficulty lies in the unstated, but very real, requirement for a distributed, extremely robust, composite application resource broker that seamlessly knits these modular resources together into one robust distributed Enterprise/Utility Service Fabric.

Along with some of the Paremus team, I attended the London JSIG meeting hosted by Alex Blewitt, the EclipseZone editor, and Neil Bartlett. In summary - a really good session!

Alex and Neil gave an excellent introductory presentation and demonstration of OSGi usage. Some of OSGi's key differentiators in the areas of versioning behaviour / management were communicated in a clear and compelling fashion. Well worth looking through the slides, which are now available on the JSIG site.

Prior to the meeting I also had the pleasure of meeting Glyn Normington from IBM. Glyn, who had travelled up to London for the JSIG, is spec lead for JSR 291 and contributor to both JSR 277 and JSR 294. Those interested in development in these areas should certainly keep an eye on Glyn's blog.

Finally, a question to Alex - clearly a master of Apple's Keynote presentation software :) I have Keynote 3.02, but I cannot find any of the transitions you used. So what is the secret?

Tuesday, February 06, 2007

EclipseCon 2007

For those of you attending this year's EclipseCon conference in Santa Clara, Robert Dunne - one of our lead Infiniflow / Newton developers - will be presenting a short piece on the classloader considerations when attempting to integrate OSGi with Java RMI.

Paremus will also have a stand in the exhibition centre, so feel free to drop by and quiz us on our approach to OSGi, Jini, SCA, or any aspect of distributed pervasive autonomic computing ;)

Saturday, January 20, 2007

cogito ergo sum?

Unfortunately, 'I blog, therefore I am' seems to have greater resonance these days.

So, dragged kicking and screaming by my Paremus colleagues, I've agreed to start what feels to be a somewhat unnatural behavior: 'blogging'.

The Usual Disclaimer:

Being the founder and CEO of an enterprise software company based in the UK (yes - I did say the UK!), my views hopefully influence my colleagues and company direction ;) That said, my views are my own, and do not formally represent those of my colleagues or company.

In summary, my career started in Astrophysics, but a family and mortgage finally convinced me that I needed to earn a living, and so I became an economic migrant, moving into an IT career with a major Investment Bank with offices based in London. After 7 very interesting years, I made the difficult decision to leave the Bank, and set up Paremus in 2001.

"Great timing!", I hear from those of you whom started ventures around the same time ;)

I've always been more interested in the underlying principles, concepts and fundamental "truths", rather than the specifics of a system implementation. With the creation of Paremus, I found these interests translated into a deep curiosity concerning Complex Adaptive Systems (CAS), Recovery Oriented Computing (ROC) and how these concepts might finally address some of the fundamental issues faced by modern distributed enterprise systems.

Whilst these interests have ongoing influence on our internal research and product development programs, they were also a key driver for the formation of the codeCauldron community. The Cauldron community, founded by Paremus in 2006, has the simple intention of fostering the development of the next generation of distributed autonomic systems. For our part, Paremus engineers have successfully leveraged some of the CAS / ROC design principles to which I refer to create Newton: a distributed OSGi / Jini / SCA based service framework, again hosted by the codeCauldron open-source community.

So, if you find my subsequent ramblings of interest, you may find a visit to Cauldron worthwhile.

Thursday, January 11, 2007

A not so recent report on end-user adoption of SOA by Saugatuck Technology makes interesting reading; especially if one shares my misgivings about the Industry's ongoing Web Services mantra.

I found the following extracts particularly enlightening:

"... it became clear that many ( early SOA adopters) are merely managing a collection of Web services, and have yet to make a strong commitment to SOA as a management discipline — as opposed to an integration technology."

additionally:

"... ironically whilst 57% of end users cited cost reduction as the primary driver for the adoption of SOA, no evidence was found for short-term operational cost savings, though longer term cost savings were expected".

and finally

"only 23% of adopters expected to increase business agility from their SOA any time soon."

So wrapping an existing business service to create a Web Service has no immediate effect on either operational costs or system agility?

Well, I'll be damned!

But why the surprise?

In reality, the Web Services revolution had little to do with making enterprise business systems more agile or robust. Rather, Web Services enable existing monolithic, operationally brittle and expensive services to be delivered through corporate firewall infrastructure. This allows for the potential of service outsourcing or the use of alternative white-labelled or ASP type services; service delivery models of great interest to the giants of the IT industry.

Hence, for those that rely on a "wrap it and make it a Web Service" approach, cost reduction, resilience and agility benefits will most likely remain elusive, unobtainable goals.

To achieve these objectives, one needs to radically re-think one's approach to enterprise IT ;)