Saturday, July 24, 2010

Dalai's First Law

PACS IS the Radiology department. 

Agfa IMPAX 6.3 has been completely dead for the last three-plus hours.  This takes down a Level One trauma center and two smaller hospitals.  So far, I have no explanation, and no end in sight.  The trauma center might have to consider a Code Yellow, as we have seen in Western Australia.

I'm thinking we need to reconsider the concept of distributed architecture, which was discarded in favor of the central archive and production processor model that is currently betraying us.  Even with three redundant application servers, we are down and dead.

I have no further comments at this time.  I will let everyone know what happens when we get back up and running, although there will be a very large number of studies to be read when that happens.

I will invite Agfa to submit their narrative of these events once the dust settles.  This will, I'm sure, prove interesting and informative.

ADDENDUM:

We seem to be back up, after three and a half hours of downtime.  I've got some work to do, if you'll excuse me...

9 comments:

PACSFerret (II) (in York) said...

To complete your metaphor... PACS IS your trauma centre/hospital/clinic. The lesson we HAVE to take from the last 10 years or so is that failure of certain systems (including PACS) is simply not an option.

The answer will NEVER come from a single vendor. That is certainly the lesson I have learned. A break-the-glass alternative is a MUST even if it isn't as UI-friendly as the primary system.

PS Sorry for shouting.

Anonymous said...

At 3 hours of downtime... why wasn't the downtime plan activated? Systems are GOING to go down; there is no avoiding it. No PACS vendor can have 100% uptime. The Radiology department should have had a downtime plan in place and ready to go after the first 30 minutes. You can't just keep sending images into the system and hope someone will be able to see them. This was a trauma center (right?). To not have an easily executed disaster/downtime plan in place is inexcusable.

Yeah, Agfa should have been on it harder and faster, but all the blame for your pain does NOT rest with the PACS vendor. The hospital has to take responsibility for not having a backup plan in place.

stacey said...

"...distributed architecture, which was discarded for the central archive and production processor model which is currently betraying us. Even with three redundant application servers, we are down and dead"...

I think that the "distributed architecture of the future"... is the cloud...

Anonymous said...

Agh... Sigh... Another slam of Agfa. I think you would be completely embarrassed if you really learned how Amicas operated in a hospital environment. Just embarrassed. I know it works for you, but it has the SAME problems that every system has, in MANY cases much worse, as it is not ready for primetime use in a hospital. Not even close.

Dalai said...

I take it you have Agfa PACS there in Springdale, Arkansas. I can tell you that the two AMICAS systems we use have NEVER had this sort of downtime, and even if they did, it wouldn't excuse Agfa's glitch. Did someone in the main office put you up to this comment, or did you decide to defend this on your own?

3.5 hours of downtime is completely unacceptable, especially at a Level I trauma center.

I have yet to hear from Agfa as to what went wrong.

Anonymous said...

All three app servers down? Sounds like a network problem to me. Does Agfa maintain your network?

Anonymous said...

All 3 app servers down at once? Perhaps an expired SSL cert common to all 3 servers? If that should be the case and patient care was impacted, I would think an FDA customer complaint to Agfa would be in order. If it was an LDAP issue, a chat with your network folks would be the place to start. Please keep us informed as to the root cause.

Anonymous said...

Enterprise workflow manager down?
Load balancer failure (High Avail option not bought)?
Database failure (no Data Guard)? Could be a common cert or an LDAP issue, as mentioned before.
Network issues can isolate a department from PACS (especially if the core is offsite).

Things like this do happen from time to time and it sucks for all parties involved when it does. Nothing like finding a single point of failure the hard way. However, no failure should ever be repeated.

Anonymous said...

What was the root cause of the downtime?