2006/10/17

Write something

It's been over a month since I last posted. It's not like I haven't been dealing with lots of enterprise-SA type material, just that I've been too busy to even breathe, much less distill my thoughts into something for this site. But since I'm sick right now, I sorta have a little bit of time on my hands...

Some of the recent topics that are worth discussing (probably in their own posts, or several posts)...
  • Thoughts from the monitoring meeting (discussions about what we need for enterprise monitoring, but not all related to monitoring): false buy vs. build dichotomy; fundamental architectural difference between BB-style and SNMP trap... (no explicit "OK" status); industry combination of monitoring tools with management tools; the myth of agentless monitoring; SNMP support on Windows (SNMP Informant)
  • An Infrastructures.org mailing list post Message-ID: <20060818174228.B26037@so.lanier.com>
  • The usefulness of professional services and consultancy in enterprise application deployment: experiences with CA, EMC, and Hyperion
  • Why the hell can't I keep my desk clean?
  • I miss going to conferences: VMworld is on now, LISA is in December. I'm expecting a new baby about halfway between, and there's no way I can go out of town for a week.
  • I hate being sick. Daytime TV sucks even with satellite and a DVR. If I'd known I was going to be sick this long, I would have joined NetFlix.
Not a bad topic list... now, discuss amongst myself.

--Joe

2006/08/25

The Ultimate P2V

There's been a lot of talk about the "Blue Pill" trick, where a hypothetical virus would use the new x86 virtualization features (VT or Pacifica) to move a running OS under a hypervisor (where the virus would run undetectably). It would be very interesting to extend this into a positive technology...

Imagine a program that uses Blue Pill to move the OS under a hypervisor. That's fine, but the OS is still coupled to the physical devices (network cards, disks, etc.). Now have the hypervisor generate a virtual (hotplug) PCI bus and attach it to the running OS, then hotplug a vmnic and an emulated SCSI controller. The OS notices the new redundant paths to the disks (standard multipathing software) and fails over all the network connections onto the virtual card. Then the hypervisor virtually unplugs the real PCI bus, and we're left with a completely virtualized (i.e. VMotion-able) machine. Without any downtime.

That would be really cool.

This would require:
  • A bluepill-compatible hypervisor that can create virtual hotplug PCI buses, and that can transport running VMs across physical machines
  • An OS that supports PCI hotplug, dynamic disk multipathing, and transparent network failover
  • All the disks on the physical system being on a SAN or otherwise multihosted
--Joe

2006/07/18

DamnDamnDamnDamn

The hard drive in my work laptop is in the process of dying. That is to say, it has died (bluescreen: kernel inpage error) but has occasionally spun up enough to boot Windows.

Just long enough for the backup software to load and start a backup, not long enough for the backup to finish.

On the bright side, Support has sent me a new drive, and it's an 80GB: a 20GB upgrade from what I had. So I should have enough space now for some of the virtual machines I've been meaning to create.

Unfortunately, I still haven't finished installing my software on the new image (so far going on 4 hours of work). The only reason I have email is because OWA actually works through Firefox on Linux. Whoda thunk?

--Joe

2006/06/22

Enterprise Monitoring

In the grand quest for the "One True Ring^W^W^WSilver Bullet^W^WIntegrated Solution", this week's goal is to reduce the number of tools we're using for enterprise monitoring.

Currently we have 5 major players: MOM, Big Brother, MRTG, Cesura, and a pile of scripts.
MOM has "Management support" (and, therefore, $$$), Big Brother has a rich history of success (and is free), MRTG is tightly integrated with the way Networking does their stuff, Cesura has gone out of business (but they had some really cool demo technology), and of course, nobody really knows what those scripts do.

On the bright side, there's this Hobbit project I've been following for a while, which looks like a better Big Brother than BB...

On the really bright side, I've not been tasked with getting all this crap together.

I just get called on to get it working because $COWORKER[0] doesn't know Solaris at all (production enterprise is Solaris) but he's the MOM wizard, and $COWORKER[1] (relatively new guy) needs to learn more about our environment and needs some visibility in the larger organization. I just happen to be the only expert in the monitoring world, just like everywhere else.

So because there's money for MOM, we're looking to see if there's any way to get non-Windows platforms to work with this Microsoft solution. As it happens, there are several third-party addons (management pack extensions) that purport to "monitor" non-Windows clients. Also the Windows guys love MOM because it has links to MSKB articles about how to tune Exchange servers when there's a low memory alert, for example.

The first extension we tried (from eXc Software) sucked. Not so much that the eXc software sucked, but OOTB, it monitors 4 items: total CPU usage (alert if CPU is >10% busy), CPU usage by process (alert if a single process is eating more than 10% of the CPU), disk free space, and swap space usage. And that's it. Anything more and we have to write our own JScript (or VB) test that runs on the MOM server, leverages their "clientless" (aka telnet) interface to gather status on the server, and then parses the output to create a MOM/WMI event. And then maintain that code. Not exactly what we had in mind.

But eXc also has an SNMP extension agent to monitor Solaris via SNMP, so we'll try that too. A few clickety-clicks later, I've configured the basic SNMP service that's installed with the community names and it's running on our test box. Except that the software is exclusively trap-driven. And the Solaris side doesn't have any (readily apparent) way to throw the traps. Basically the eXc stack is just the Solaris trap MIBs pre-configured.

Well, if we're going down the SNMP route, let's see what MOM can do on its own. After all, it says it can monitor via SNMP. One KB article later (and after lowering the monitoring standards significantly), I have our Linux-based Digi CM console server happily SNMP-trapping into MOM. OK, it was a lot more than just the KB article: there was also some registry editing, MIB compiling, MIB editing so it would be acceptable to the MS SMI compiler, interpreting the help page for the MS SMI compiler, some minor VB scripting and, finally, turning on a checkbox on the Digi. And we still only have SNMP traps. No queries, no performance trending, no performance alerts. Also, no MIB translation (so you have to be able to recognize that 1.3.6.1.4.1.332.10.14.14.0.2 means "authentication failure", which I'm sure we'll get good at in no time at all).
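Until something does the MIB translation for us, a tiny lookup table is one way to avoid memorizing OIDs. This is only a rough Python sketch: the one mapping in it is the authentication-failure OID from the Digi above, and the table name, the describe_trap helper and the "unknown trap" fallback are made up for illustration, not anything MOM or the Digi provides.

```python
# Hypothetical lookup table: trap OID -> human-readable meaning.
# Only this first entry comes from the post; others get added as they are identified.
TRAP_NAMES = {
    "1.3.6.1.4.1.332.10.14.14.0.2": "authentication failure",
}


def describe_trap(oid: str) -> str:
    """Return a readable name for a trap OID, or flag it as unknown."""
    oid = oid.strip().lstrip(".")  # some tools print OIDs with a leading dot
    name = TRAP_NAMES.get(oid)
    if name is None:
        return f"unknown trap ({oid})"
    return f"{name} ({oid})"


if __name__ == "__main__":
    print(describe_trap(".1.3.6.1.4.1.332.10.14.14.0.2"))
```

It is no substitute for a compiled MIB, but it keeps the numbers out of people's heads while there are only a handful of traps to care about.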

So back to the drawing board... there are two other extension packs for MOM that we're going to try out: one from here in Cincinnati (version 1.0 was released last week) and one that appears to be a whole management infrastructure that surrounds and integrates MOM (and happens to do non-Windows clients too).

The really unfortunate thing is (as I mentioned above) there's this Hobbit project, which would leverage our existing Big Brother clients and successes, looks like it would be fairly straightforward to implement, and has a reasonably sane, extensible architecture (but it isn't MOM -- the Windows guys really like MOM).

So I ask myself "what would it take to make Hobbit work with MOM?" (at least as well as the SNMP integration or the other products did)

Hobbit's backend consists of passing messages along "channels". Messages such as "serverX is down" and channels such as "status" or "page" (or "data"), passed via IPC to worker modules. It should just be a Small Matter Of Programming to create a worker module that would accept "stachg" (status change) and/or "data" channels, massage them into something like WBEM events, and toss them across to the WMI receiver on the MOM server. I mean heck, if VB can massage SNMP traps into WBEM, surely it can't be that hard. There are even sample channels in the Hobbit distribution.
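To make the worker-module idea a bit more concrete, here is a rough Python sketch of the parsing half. It rests on several assumptions that need checking against the Hobbit documentation: that the worker receives channel messages as text on stdin, that each "stachg" message starts with a pipe-delimited header line beginning with "@@stachg", and that host, test, new color and previous color sit at the field positions given in the constants. The StatusChange type and the forward_to_mom function are hypothetical names, and forward_to_mom only builds a string; the actual hand-off to MOM/WMI is the part that still has to be written.

```python
import sys
from typing import Iterable, Iterator, NamedTuple


class StatusChange(NamedTuple):
    host: str
    test: str
    color: str
    prev_color: str


# ASSUMPTION: positions of the interesting fields in the "@@stachg" header line.
# Verify these against the hobbitd channel documentation before relying on them.
HOST, TEST, COLOR, PREV_COLOR = 4, 5, 7, 8


def parse_stachg(lines: Iterable[str]) -> Iterator[StatusChange]:
    """Yield one StatusChange per '@@stachg' header line; skip everything else."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("@@stachg"):
            continue  # message body, terminator line, or some other channel
        fields = line.split("|")
        if len(fields) <= PREV_COLOR:
            continue  # malformed or unexpected header, ignore it
        yield StatusChange(fields[HOST], fields[TEST],
                           fields[COLOR], fields[PREV_COLOR])


def forward_to_mom(change: StatusChange) -> str:
    """Placeholder: build the text a real module would turn into a MOM/WMI event."""
    return f"{change.host}.{change.test}: {change.prev_color} -> {change.color}"


def main() -> None:
    """Read channel messages from stdin and print one line per status change."""
    for change in parse_stachg(sys.stdin):
        print(forward_to_mom(change))
```

A real worker would call main() and replace the print with whatever actually raises the event on the MOM side; keeping the parsing separate means it can be tested with canned messages first.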

I think it'd take a couple of days of programming (and learning how MOM is different than Microsoft's WMI is different than WBEM). Unfortunately I'm the only one in the group who can code. And with everything else that's going on, the chances of me taking a couple of days are exceptionally slim.

Oh well, maybe somebody else will read this and think it's a cool, easy idea.

--Joe

2006/06/02

Need to build a secure (public) download site

I have a fairly simple task in front of me: Provide a place for random internet users to download (via anon. ftp, http and/or https) one of a set of several 300MB files. (Oh yeah, and they have no budget for hardware)

From this, I add the "usual" Enterprise Systems requirements: It has to be
  • manageable
  • secure
  • reliable
Seems straightforward: We have a Solaris 10 system in the DMZ in the central datacenter, it has enough mirrored disk space (over 20GB free) and it's running an application that's "more important" than this little download site, so reliability isn't a problem. If I create a zone on this server, it will be no less manageable than any of the rest (ok, the other) of the DMZ-based virtualization servers we have deployed.

That just leaves the "secure" requirement. There's lots of "interesting" opportunities there, though...

I think ideally the zone would be a minimally installed zone (with just enough software to make apache and ftpd work) with everything mounted read-only from the global zone, and with a helper zone (only accessible from the LAN side) having read-write access to the space (accessed via scp), with firewall rules allowing only (anyone->dlserver:80,443, and ftp) and (lan->helper:22). Oh yeah, and with traffic shaping to prevent this from eating too much of our outbound internet feed.
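Written out as data, the intended access policy is small enough to check by hand. The Python sketch below is only a model of the two rules above with a default-deny fallback, not firewall configuration. The LAN range is a made-up placeholder (the real one isn't given here), the Rule type and allowed function are illustrative names, and ftp is reduced to its control port (21), which ignores the extra data connections a real ftp rule has to cope with.

```python
import ipaddress
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    source: Optional[str]  # CIDR block, or None meaning "anyone"
    dest: str              # zone name
    ports: frozenset       # allowed destination TCP ports


# ASSUMPTION: placeholder LAN range; substitute the real internal network.
LAN = "10.0.0.0/8"

RULES = [
    # anyone -> dlserver: ftp (control port only), http, https
    Rule(None, "dlserver", frozenset({21, 80, 443})),
    # lan -> helper: ssh/scp
    Rule(LAN, "helper", frozenset({22})),
]


def allowed(src_ip: str, dest: str, port: int) -> bool:
    """Default deny: permit only traffic that matches one of RULES."""
    addr = ipaddress.ip_address(src_ip)
    for rule in RULES:
        if rule.dest != dest or port not in rule.ports:
            continue
        if rule.source is None or addr in ipaddress.ip_network(rule.source):
            return True
    return False
```

For example, allowed("203.0.113.7", "dlserver", 443) is permitted while allowed("203.0.113.7", "helper", 22) is not, which is the whole point of splitting the read-only download zone from the read-write helper.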

The firewall rules are easy... that's someone else's problem. "They" don't do traffic shaping, however, so I get to figure out the Solaris IPQOS functionality, if I get that far.

So how do you create a minimalist zone? Answers as I find them...

--Joe