[aclug-L] Preliminary outage details
The below note arrived from my upstream hosting provider today. Some of you
may have noticed outages today; some of you not. No mail was lost (the
backup MX was reachable by complete.org all day).
This is the first outage we have experienced since moving to the new hosting
provider following the nasty implosion at the previous one. I am confident
that everything they say below is true and reliable. My own diagnosis
throughout the morning corroborates much of what they have said below, and
indeed I was able to access certain machines from alternate addresses.
Those of you in Europe seem to be most likely to have noticed no outage
whatsoever.
-- John Goerzen
----- Forwarded message from support@[hidden] -----
From: support@[hidden]
Date: Fri, 20 Sep 2002 12:18:04 -0700 (PDT)
To: jgoerzen@xxxxxxxxxxxx
Subject: Preliminary outage details - johncompanies
Bleah,
This is a short, preliminary response to address the major network problems
in Southern California that have affected our network greatly. I will be
mailing out a full and detailed message later today.
Here is, briefly, what we know so far:
1. There was a big, long network outage.
2. At no time did any of our machines or network components go down - it all
happened several hops upstream from us. When you log in to your server (if
you haven't already) you will see that the `uptime` output confirms that
we did not have any failures on our end. This is actually very important
because:
3. You were _not_ down/inaccessible to everyone in the world - just a fair
amount of it. As many of you have already written me, you can actually
log in to and ping your system from other places if not from your normal
location. In fact, a good portion of you may be wondering what all this is
about because you experienced no outage at all from your location (although
it is a good bet that other people in other locations may not have been able
to reach your system).
4. We don't know the cause, and we aren't getting any semblance of a straight
answer from Sprint. In my experience, when I don't get a straight answer it
is because something _really dumb_ happened (like a backhoe cut a fiber line).
However the fact that there was network connectivity from different places
throughout the entire outage makes it seem like the "really dumb" action
was more along the lines of a router misconfiguration - perhaps a new
firmware injection gone bad. We'll tell you when we find out, if ever.
5. And finally, as of right now, it looks like _everything_ is back to
normal, and we have had some tentative reassurances that it will stay this
way. As some of you noticed, everything did come back to normal about
two hours ago, but then after 15 minutes it started going bad again. This
time is different, I'm told :)
This is the first time we have ever had a massive upstream network outage
in two years of being located at this datacenter. Rest assured that we are
doing everything we can and leaning/screaming on/at the right people to make
sure it doesn't happen again.
Again, more details to come later tonight as we get them. Please don't
respond to these mass mails unless you really need to - our inbox is flooded
as it is. Regardless of how technical or boring the final explanation is,
we will pass it along to you - as most of you know we never leave people in
the dark.
And finally, if you are one of the lucky ones whose routes were unaffected,
then please disregard all of this and have a nice day.
--john
----- End forwarded message -----
-- This is the discussion@xxxxxxxxx list. To unsubscribe,
visit http://www.complete.org/cgi-bin/listargate-aclug.cgi