Year of the Nimble Llama

Toolshed: Year of the Nimble Llama

Happy New Year from Tools!

This is the first of the monthly announcements I will be sending out about Network Management Tools news.

Tools Quarterly Report

I'm putting together a Tools quarterly report so my managers can see Tools costs and services (projects, initially). A draft is available now.


KPI project complete

The Key Performance Indicators (KPI) project has been completed. In addition to Network Availability we now provide charts for Service Points, Network Devices, and Network Switch Ports.

Beyond these KPIs, we now have a facility for producing similar charts much more quickly in the future.

Thomas Nash, Heather, and Jody did a great job on the project!

EDB Project done first of February

The Equipment Database (EDB) project will be done and in production in the first part of February.

The primary purpose of this project is to improve UW Technology's ability to plan and budget around equipment costs. It does this by recording the purchase cost and by letting you register Lifecycle Next Actions against the equipment. Each Next Action has a type (Replace, Retire, Virtualize, etc.), a Date, and a Cost.
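As a rough illustration of how Next Actions could feed a budget, here is a minimal sketch that totals planned costs by year. The class names, field names, and sample equipment are all hypothetical; they are not the actual EDB schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class NextAction:
    action: str   # e.g. "Replace", "Retire", "Virtualize"
    due: date     # when the action is planned
    cost: float   # expected cost of the action


@dataclass
class Equipment:
    name: str
    purchase_cost: float
    next_actions: list = field(default_factory=list)


def planned_cost_by_year(equipment):
    """Sum the cost of every registered Next Action, grouped by due year."""
    totals = defaultdict(float)
    for item in equipment:
        for next_action in item.next_actions:
            totals[next_action.due.year] += next_action.cost
    return dict(totals)


# Hypothetical inventory, for illustration only.
inventory = [
    Equipment("switch-a", 4000.0,
              [NextAction("Replace", date(2012, 6, 1), 4500.0)]),
    Equipment("server-b", 9000.0,
              [NextAction("Virtualize", date(2011, 3, 1), 1200.0),
               NextAction("Retire", date(2012, 1, 15), 300.0)]),
]

for year, total in sorted(planned_cost_by_year(inventory).items()):
    print(year, total)
```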

You can also attach Agreements to equipment, which lets you track Warranties and Service Contracts along with their descriptions and dates.

The Equipment Database will be synchronized nightly from OASIS, and a separate process updates OASIS when you change information in EDB.

Thomas is the Project Manager, and Nick and Jody are working on the implementation.

Alpha/SMS Paging

Andrew is working on updating the `nocpage` tool so that it no longer uses DS Unix servers for paging. DS Unix will be retiring these servers. Moving forward, `nocpage` will use the USA Mobility alpha paging web service directly. Additionally, we will support SMS paging using the Clickatell web service.

Since we won't be using internal servers any longer, paging failure scenarios are different. If the USA Mobility (or Clickatell) web services out on the Internet don't reply successfully, then `nocpage` will tell you it couldn't deliver the page. You then must manually page the person you are trying to reach.

To register your SMS device, add a line to the port:/usr/groups/netops/local/beepers.txt file in the form <cell number>: <name>: sms: <uwnetid>, where 'sms' is the literal string 'sms'. The existing entries in the file show the format.
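For anyone scripting against that file, here is a minimal sketch of parsing one entry in the colon-separated form described above. It is not the parser `nocpage` uses. The function name, the sample entry, and the handling of blank or '#' comment lines are assumptions for illustration.

```python
def parse_beeper_line(line):
    """Split '<number>: <name>: <type>: <uwnetid>' into a dict.

    Returns None for blank lines, '#' comment lines (an assumption about
    the file), and lines that do not have exactly four fields.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = [part.strip() for part in line.split(":")]
    if len(fields) != 4:
        return None
    number, name, kind, uwnetid = fields
    return {"number": number, "name": name, "type": kind, "uwnetid": uwnetid}


# Hypothetical entry; the number and names are made up.
entry = parse_beeper_line("2065550100: Pat Example: sms: pexample")
print(entry)
```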

`notch` notifications that use the alpha paging system will also be updated.

These should be updated this week.

New Unix interactive server 'port' replacing 'shiva'

There are no show-stopper issues on port and we expect to disable shiva accounts next Tuesday, January 12th.

One feature that will be coming soon is the ability to mount your port home directory on your nebula desktop. This will be a different filesystem than your nebula home directory: on shiva your home directory was your mounted nebula filesystem, but that is not the case on port. We will announce the feature when it is available.

Everyone in tools has been helping this move forward. Andrew and Heather, in particular, have been tracking issues down and resolving them.

HFS Registration Project starting soon

At the end of this month or the beginning of next month we will be starting the HFS Registration project. This will provide an EasyReg-like registration portal for the HFS wired networks. It will require users to read and agree to the User Agreement before registering with their UWNetID and password. Users on those subnets will need to re-register quarterly.

We are planning to roll this out by Spring quarter.

Jim Srnec will be leading this and Nick and Jody will be working on the implementation.

Switch firmware upgrades coming

To support enabling rogue DHCP blocking on DLink switches, the firmware needs to be updated on most of them. The OM Edge team is leading this effort.

Ben will be following up on the rogue DHCP work that had to be tabled.

ETAR Re-architecture project

Andrew updated our nsync network discovery software to collect VRF information from Cisco and Juniper routers. He also updated puma to display VRF information.

Finishing up Link Visualization

Now that the ETAR Re-architecture project is complete, we will be starting up the LinkViz project again. The current end date is 1/30. Andrew and I will evaluate this week whether we can make that date and, if not, what the actual date will be.

Other items of note

enableip, killip, doa CLI flags converged

Nick modified these command-line tools so that they share the same, or at least similar, options where appropriate.

Datacenter Facilities

Nick and I have been working with Datacenter Facilities to see whether they would get value out of using our tools to manage their datacenter power strips. We've set some things up in puma and Ting for them to consider.

Meerkat/RT:Outages Integration

Last month Sri deployed Meerkat/RT:Outages Integration. This allows you to populate an outage directly with the devices and/or interfaces from meerkat alerts.

Also, he deployed the Meerkat Alert Circuits page which allows you to easily find all the circuits related to a meerkat alert so you can copy and paste them into an email (to DIS for example).

Puma updates

Ben created a standard header, breadcrumbs, and footer that we can use to do more look-and-feel standardization in our tools. He updated these in puma and the barista index.

Andrew added a VRF tile for routers. If an interface is in a VRF, the VRF is displayed next to the interface name.

Creeps upgrade

Sri is working to upgrade creeps, our UW data collection server.

Prove you've read this far by showing me the "happy llama" hand next time I see you (smile).

Today Geoff and I met with Joseph Wolfgram, Russ[?] from UW School of Medicine, and folks from Apptio. Apptio provides a service that takes your log data, asset data, ticket data (and more) and can, to some extent, discover your services. Beyond that, it can show you how much your services and infrastructure cost in terms of capital, facilities, FTEs, and operations. It solves a very important problem.

The nice thing for UW Tech host/net management teams is that it fills a need we generally don't fill today, and it sounds like it would integrate with our data and systems. We would simply need to send our data to the service.

The meeting also showed me how others are looking at measuring IT costs and prices, including FTE and operational costs. These ideas aren't particularly novel, but they are worth writing down:

If operations support staff are ticketing all of their work, we should be able to measure their rate of ticket processing and, using the general cost of the positions, estimate the cost per ticket. Thus, we could generally show how much an incident cost to resolve. Even using just the information we collect now, we could show within an order of magnitude what an incident cost.
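The arithmetic behind that estimate is simple. Here is a minimal sketch, assuming a hypothetical position cost, working hours per year, and ticket processing rate. None of these figures come from our data.

```python
def cost_per_ticket(annual_cost, hours_per_year, tickets_per_hour):
    """Estimate what one ticket costs in staff time.

    annual_cost: general yearly cost of the position
    hours_per_year: working hours in a year
    tickets_per_hour: measured rate of ticket processing
    """
    hourly_cost = annual_cost / hours_per_year
    return hourly_cost / tickets_per_hour


# Hypothetical inputs, for illustration only.
estimate = cost_per_ticket(annual_cost=80000.0,
                           hours_per_year=2000.0,
                           tickets_per_hour=2.0)
print(f"${estimate:.2f} per ticket")  # prints "$20.00 per ticket"
```

Even if the inputs are only rough, an estimate like this is enough for the order-of-magnitude figure described above.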

Further, over time, we could include this number in our calculations of the cost of the service. For example, over a quarter, we could see how much time was being spent on layer 2 incidents and put that into our calculations of the total cost of ownership for the service. It could go into our per-port calculations. Then we could perhaps calculate how much money we would save or lose if we chose more expensive switches with better management features.

Another opportunity could be to measure, and then perhaps improve, NOC employee ticket processing performance. If the average ticket processing rate per Tier 1 employee is X, then you can partially measure an employee's productivity by comparing their particular ticket processing rate to X. Certainly this measure would need to be taken with a grain of salt because, without further instrumentation, it wouldn't take into account the quality of customer interactions or other important metrics. Still, it's a useful measure. It can also provide a feedback loop for employees to see how they are doing compared to the average which, as they modify their behavior to stay average or better, would improve the average. This can all seem kind of Machiavellian, but I think that if it's used with care it could really benefit the quantity and quality of our operations service.
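As a minimal sketch of the comparison to X, the following computes each employee's rate relative to the group average. The function name, employee labels, and rates are hypothetical.

```python
def relative_rates(tickets_per_hour):
    """Return each person's ticket rate divided by the group average X.

    A value of 1.0 means exactly average; above 1.0 is faster than average.
    """
    average = sum(tickets_per_hour.values()) / len(tickets_per_hour)
    return {name: rate / average for name, rate in tickets_per_hour.items()}


# Hypothetical Tier 1 rates, for illustration only.
rates = {"employee_a": 1.5, "employee_b": 2.0, "employee_c": 2.5}
ratios = relative_rates(rates)
for name, ratio in ratios.items():
    print(name, ratio)
```

As noted above, a ratio like this says nothing about the quality of the work, so it would be only one input among several.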
