Today I changed the name of this blog to reflect my endeavor to post more than just little snippets of notes and references for problem solving. I will continue to post those as always, but in addition I am going to attempt to chronicle my life as an IT Manager at a professional services firm.
I wear many hats even though I work as part of a management team in a fairly large IT organization. I get to be "specialized" to the point of being focused on enterprise-wide infrastructure operations and projects. Although I tend to hate my job, I love the technical work that I do and am privileged to work with a diverse team of fun people. Aside from that, I think my greatest challenge, as in most organizations, is the organization of the business itself, right down to the organization of the IT department. Part of being in this sub-department includes being a catch-all for unknowable questions and the final destination for superstitious help requests that nobody else can figure out and so deems "network issues." My documented responsibilities are enterprise-wide design and support of server-side systems, including physical servers and storage, messaging, server workloads (VMware guests), and infrastructure such as DNS, DHCP, etc. The enterprise team is unfortunately without a network manager, and hiring is frozen indefinitely. That means I get to split network responsibilities with my boss. This adds some aspects to my job that I enjoy, but it makes for busy days.
- arrive at work, urgently needing the restroom after a long commute.
- before visiting the facilities I get pulled into troubleshooting an odd problem
with the server supporting our cardkey system. Long story, but a recovery
operation conducted on a large number of VMs to fix a storage issue somehow
resulted in the software being "unregistered," since it didn't
see the server as the same hardware. Delegated someone to work with this
totally useless vendor to re-register or reinstall or whatever.
- discussed with my guys some alerts we received overnight and a help call
regarding access to our intranet from a specific office. Suspected an issue
with the Riverbed (much more on this thing in the future...). However, the
problem had gone away by the time we got to troubleshoot it.
- restroom break - finally.
- respond to stupid e-mails from stupider people about a project that has been a goat rodeo since its botched initiation phase. As the non-stupid person on the project team, it is my task to do all the useful work and answer every single idiotic question they come up with, now that the vendor seems to be checking out since the poorly defined
requirements are 50-85% complete. (I find I cope better when I have low expectations and don't try to be the superhero of every project, especially the more stupid ones.)
- received an update that a completely uncooperative local tech in one of our Asia offices failed to follow instructions for connecting a new physical server to the network switch, and we will not be able to move forward with installing it until another day is wasted asking her to correct the situation.
- one-on-one meeting with my most experienced guy. Discussed many little items that had cropped up while he was out of the office on leave, and walked through the security architecture of a get-it-done-yesterday extranet project that has long been whined for, but every time we get into the details and ask questions about requirements, the effort fizzles out.
- Got an update on the cardlock server, which is dragging on for half the day to do what the useless vendor said would take 5 minutes.
- Get coffee and run into extranet developer, successfully dodge questions and avoid
standing there all day getting grilled about stuff that isn't approved.
- research into Scrutinizer netflow aggregation software. Briefly compared it
with SolarWinds Orion with a colleague who had used both in the past.
- use Scrutinizer to further investigate a QoS issue for an office that was
recently expanded and reconfigured. [Yesterday was spent watching the service-policy
counters report dropped packets from the priority queue (over 2,000 in 24 hours). Tried using IP accounting to sort out whether something strange was going on or there is just too much traffic.] Scrutinizer is a great utility and I am going to recommend
purchasing it. The free version is useful on its own -- I gained several
insights today -- but actually buying the real product will give us tons more
features/options and the ability to store the data for longer than 24 hours.
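For context, the drop counts I keep quoting come straight from the router's service-policy counters. On a Cisco box the checks look roughly like this (interface name is a placeholder, not our actual config):

```
! view priority-queue drop counters on the WAN edge interface
show policy-map interface Serial0/1 output

! turn on IP accounting to see which source/destination pairs
! are actually crossing the link
interface Serial0/1
 ip accounting output-packets

! then review the per-flow packet/byte counts
show ip accounting
```

The catch with both of these is short memory, which is exactly why a netflow collector that keeps history is worth paying for.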
An aside here -- even though the telephone system is under our department's responsibility, it is very tightly siloed, and many system details are hidden deep in documents that nobody knows exist, or on screens in systems that we don't have a logon to access. Paranoia and limited networking skills on the part of our one primary resource are a big headache as we integrate more with voice systems. We have been in the same department for 18-24 months, and yet we are still dragging out details about how the IP trunks have been configured and what systems a
voice call passes over.
- Get more coffee. It's already after 2 and our cut-to-the-bone staff is light today due to vacations and sickness and I'd probably get called back if I tried to go to lunch so I'll just eat this coffee cake that somebody dumped here in the kitchen after
nobody ate it at some meeting or something.
- After finding some interesting things in the netflow data, I have some questions to ask of the voice person/vendor. I also see some odd things hitting the priority queue that shouldn't be there. I check the router configuration, see an obvious error, and correct it. However, I am still seeing packets getting dropped; after over 2,000 drops again today, we shall see what tomorrow brings.
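To give an idea of the kind of "obvious error" I mean (purely illustrative -- the class and policy names are made up, not our actual config): a voice class that matches an access list instead of the EF marking will happily shove ordinary data traffic into the strict-priority queue.

```
! before: anything permitted by ACL 110 is treated as voice
class-map match-any VOICE
 match access-group 110

! after: only packets actually marked EF get priority treatment
class-map match-all VOICE
 match ip dscp ef

! the priority queue itself, 256 kbps strict priority
policy-map WAN-EDGE
 class VOICE
  priority 256
```

When the match is too broad, the data traffic both steals bandwidth from voice and inflates the drop counters on the priority queue, which matches what I was seeing in the netflow data.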
- Talk to yet another useless network service provider about a chronic problem with a circuit on a backup WAN link to one of our domestic offices. This circuit went down
shortly after activation and went unnoticed for a week or so, since it was a redundant T1 on a backup WAN router. When it was reported, it spent a week getting passed from the service provider to the LEC and back, only for them all to report it was fine, even though nobody ever called us about it and it was still down. The ticket was reopened and then mysteriously got closed without any action. After over a month we were escalated to a service manager, and this guy was so scatterbrained. He put me on hold about 5 times while he talked to people about my issue, got paged by his boss about something else, etc. Every time he got back he apologized again that the ticket had been closed for no reason last time. He made lame small talk while his slow system came up and expressed a sentiment that I share but would never tell one of
my customers: "I love routers, I'd rather work with routers than people..." ROTFL ... So anyway, while I'm on the phone trying to get him to call and yell at the LEC again to get them to finally come onsite and check this out, apparently the LEC is testing this circuit for the 3rd time. And by the time I get off the phone with this crazy guy, the circuit is actually working. Then I get to call him back and go through a long process of convincing him that it's now working, after he reads the number wrong, pulls up the wrong interface, and swears to God and me that it's still down. Finally he is convinced and will call the LEC and tell them not to go onsite and charge us a fortune.
- Get a negative response to a backchannel request to get the RTM
of Windows 2008 R2 now, instead of 2-3 weeks from now, so that we can proceed
with upgrading our domain prior to another big project that is likely to affect
AD. Realize I've never gotten a response to my official request from our TAM,
who seems to spend his life e-mailing everybody patch notices and e-mails full
of links to stuff on the MS website.