Wednesday, May 23, 2012

STEEDd: Another Implementation of The Near-Real-Time Control Charting and EV Calculating


Thierry Déléris is a French mainframe system programmer on a team dedicated to performance, metrology and capacity planning. He used some ideas published in Trubin's CMG papers to implement the following:

1. The first part of the solution sends a daily eMail per CEC with a spreadsheet by LPAR and Workload. Thresholds are calculated with the R language by day of the week, hour of the day, LPAR name and WLM Workload, based on 6 months of historical data (SMF72 records), with outliers excluded using Tukey's statistical method.
This initial part of the solution has one big drawback: it produces the spreadsheet for a CEC only the next day, because it is based on the previous day's SMF 72-3 records collected overnight by TDSz...
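The threshold step can be sketched roughly as follows. This is only an illustration in Python, not Thierry's R code: the function names, the record layout and the Tukey multiplier of 1.5 are assumptions, and so is the choice of mean plus or minus 3 standard deviations for the limits, since the post does not say how the thresholds are derived once the outliers are removed.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev


def tukey_fences(values, k=1.5):
    """Return the (low, high) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def thresholds_by_group(records, k=1.5, sigmas=3.0):
    """Compute (low, high) thresholds per (weekday, hour, lpar, workload).

    records: iterable of (weekday, hour, lpar, workload, value) tuples,
    a hypothetical layout standing in for the SMF72-based history.
    """
    groups = defaultdict(list)
    for weekday, hour, lpar, workload, value in records:
        groups[(weekday, hour, lpar, workload)].append(value)

    result = {}
    for key, values in groups.items():
        if len(values) < 4:
            continue  # too little history to estimate quartiles
        fence_low, fence_high = tukey_fences(values, k)
        kept = [v for v in values if fence_low <= v <= fence_high]
        if len(kept) < 2:
            continue  # stdev needs at least two points
        # Assumption: limits are mean +/- sigmas * stdev of the cleaned data.
        m, s = mean(kept), stdev(kept)
        result[key] = (m - sigmas * s, m + sigmas * s)
    return result
```

With six months of history, each weekday/hour group has only about 26 samples per LPAR and workload, so a single extreme interval would noticeably widen the limits if it were not excluded first.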

2. The second part of the solution is called STEEDd (Statistical Tool for Enhanced Exceptions Detection and Diagnosis; the name is also a reference to John Steed, the character from the British TV show "The Avengers", and his legendary bowler hat). It was developed in Java and uses the same R-calculated thresholds, but checks them every 15 minutes: it interacts with BMC Mainview on the host to collect the current data (in fact, the data for the last 15 minutes). This solution provides a main screen to select the metric to control, and a control screen per metric. An eMail alert is sent to the team if the value of some metric is above the high threshold or below the low threshold.
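The 15-minute check itself is simple to illustrate. The sketch below is in Python rather than the Java that STEEDd uses, and everything in it is hypothetical: the function name, the dictionary layouts and the integer weekday key (0 = Monday). Collecting the data from Mainview and sending the eMail are left out.

```python
from datetime import datetime


def check_interval(samples, thresholds, when):
    """Compare the latest 15-minute values with the stored thresholds.

    samples:    {(lpar, workload): value} for the last interval
    thresholds: {(weekday, hour, lpar, workload): (low, high)}
    when:       datetime of the interval, used to pick weekday and hour
    Returns a list of (lpar, workload, "HIGH" or "LOW", value, limit).
    """
    alerts = []
    for (lpar, workload), value in samples.items():
        key = (when.weekday(), when.hour, lpar, workload)
        limits = thresholds.get(key)
        if limits is None:
            continue  # no history for this combination, nothing to compare
        low, high = limits
        if value > high:
            alerts.append((lpar, workload, "HIGH", value, high))
        elif value < low:
            alerts.append((lpar, workload, "LOW", value, low))
    return alerts


if __name__ == "__main__":
    thresholds = {(2, 10, "LPAR1", "ONLINE"): (20.0, 70.0)}
    samples = {("LPAR1", "ONLINE"): 85.0}
    print(check_interval(samples, thresholds, datetime(2012, 5, 23, 10, 15)))
```

A non-empty result is what would trigger the eMail to the team.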

As an example, here is a picture of the control screen used for the CPU metric by Workload & LPAR:

Legend:
When the icon is selected, the associated control chart pops up, showing the metric for the last 12 hours, like the one shown below:


The idea of EV (Extra Value or Exception Value, introduced in Trubin's CMG papers and discussed in this blog) is used there (red bars for EV+ and yellow bars for EV- in the picture above). This helps to separate the right alerts from the false ones.
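For readers new to EV, here is a minimal Python sketch of one way to compute it. It assumes EV is the sum of the excursions beyond the control limits over the charted window (EV+ above the upper limit, EV- below the lower limit); the function name and argument layout are hypothetical, and this is not taken from the STEEDd code.

```python
def exception_values(actual, lower, upper):
    """Return (ev_plus, ev_minus) for one charted window.

    actual, lower, upper: equal-length sequences with one value per interval.
    ev_plus  sums how far the metric rose above the upper limit (>= 0).
    ev_minus sums how far it fell below the lower limit (<= 0).
    """
    ev_plus = sum(max(a - u, 0.0) for a, u in zip(actual, upper))
    ev_minus = sum(min(a - l, 0.0) for a, l in zip(actual, lower))
    return ev_plus, ev_minus


if __name__ == "__main__":
    actual = [50.0, 80.0, 20.0, 60.0]
    lower = [30.0] * 4
    upper = [70.0] * 4
    print(exception_values(actual, lower, upper))
```

Both values are zero while the metric stays inside its limits, and their size grows with how far and how long it stays outside, which is what makes EV useful for ranking alerts.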

3. Third part of the solution: on the way! An Artificial Intelligence solution based on a rule engine is being studied to explore a detected problem hierarchically... This application will be used to enhance the analysis of the metric alerts in the manner of an "expert system".

(Posted with Thierry Déléris's permission)

Monday, May 7, 2012

SEDS-Lite Presentation at Southern CMG Meeting in the SAS Institute

Last Friday I gave my presentation, which was announced here: SEDS-Lite: Using Open Source Tools (R, BIRT and MySQL) to Report and Analyze Performance Data. It was presented at the Southern CMG Meeting at the SAS Institute, Cary, NC. The presentation slides are linked within the AGENDA and can also be downloaded from HERE


I plan to write a paper based on this presentation and to submit it to this year's CMG'12 conference.


Friday, April 20, 2012

Building IT-Control Chart with COGNOS

I am developing SEDS elements using IBM Cognos. Here is the first result, which is just a proof-of-concept prototype of an IT-Control Chart report.
I used the test data (a date-hour stamped utilization metric) that I had developed to build the same IT-Control Charts with other tools (BIRT, MySQL, R). I have published some information about that in my previous blog posts (e.g. R-script to plot IT-Control Chart against MySQL).

This time I developed the simplest possible metadata package against an ODBC connection to the MySQL database using Cognos Framework Manager, and published it in TCR locally on my laptop. Then I used Cognos Report Studio to build the following report:

The result of running the report is the following:

I got the same result as I got using R or BIRT, but I noticed some nice features in COGNOS that helped me build it faster and more accurately (e.g. adding the dates on the X-axis).

I am going to mention this progress, with some details, in my upcoming SCMG presentation:

SEDS-Lite: Using Open Source Tools (R, BIRT and MySQL) to Report and Analyze Performance Data

Tuesday, April 17, 2012

Southern CMG Spring 2012 Meeting in Richmond - MXG is our Sponsor!

At SCMG we usually have two meetings each season (two in spring and two in fall, in both Richmond and Raleigh). Last season (fall 2011) I gave my presentation; see the following post: "My Southern CMG Presentation in Richmond Is About Open Source Tools for Capacity Management". The presentation slides are published here: slides

This spring I have a similar but updated presentation at our Raleigh SCMG meeting: "SEDS-Lite: Using Open Source Tools (R, BIRT and MySQL) to Report and Analyze Performance Data"

So this time I am not presenting in Richmond, but I found a very good sponsor for that meeting: Merrill Consultants (http://www.mxg.com). Barry Merrill himself responded to my invitation, and now we have a great opportunity to see and listen to the legendary Capacity Management inventor!

Please consider attending our Richmond VA SCMG meeting on May 11, 2012: 
 
http://regions.cmg.org/regions/scmg/spring_12/richmond/meeting.htm


Wednesday, April 11, 2012

SEDS-Lite: Using Open Source Tools (R, BIRT and MySQL) to Report and Analyze Performance Data

My presentation with this title has been scheduled for the next Southern CMG meeting at the SAS Institute:

SCMG Meeting Raleigh
May 04, 2012

You are welcome to attend!

Thursday, April 5, 2012

Prehistory of SEDS: Virtual CMG'90 Trip Report about Control Chart Usage. Part 1.

Using the keyword "Control Chart", I found in the www.CMG.org knowledge base a few very old CMG papers with some discussion of using the classical SPC approach on computer performance data.

Here is the first one:

 Fine-Grain Analysis (FGA): A Methodology for Analyzing Intermittent Performance Problems
  By Robert Berry & Jeffrey Hedglin 

 

The paper describes which mainframe metrics are good candidates for control charting. They should be of two types: a. a Performance Quality Measure, which sounds like a modern KPI (e.g. response time); b. system performance metrics (e.g. CPU queue length). The paper then describes how an intermittent problem can be detected just by plotting SPC control charts for both types of metrics in sync (correlated).
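To make the "in sync" idea concrete, here is a small Python sketch. It is not the paper's method: it assumes classical Shewhart-style limits (mean plus or minus 3 standard deviations of a baseline period), and all function names are hypothetical. It flags the intervals where the quality measure and the system metric are out of control at the same time.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Classical SPC limits from a baseline: mean +/- sigmas * stdev."""
    m, s = mean(baseline), stdev(baseline)
    return m - sigmas * s, m + sigmas * s


def out_of_control(series, limits):
    """Indexes of the intervals where the series is outside its limits."""
    low, high = limits
    return [i for i, v in enumerate(series) if v < low or v > high]


def correlated_exceptions(quality, system, q_limits, s_limits):
    """Intervals where both metrics are out of control simultaneously."""
    both = set(out_of_control(quality, q_limits)) & set(out_of_control(system, s_limits))
    return sorted(both)


if __name__ == "__main__":
    response_time = [1.0, 2.0, 1.1, 3.0]   # performance quality measure
    cpu_queue = [2, 6, 7, 1]               # system performance metric
    print(correlated_exceptions(response_time, cpu_queue, (0.5, 1.5), (0, 4)))
```

An interval that shows up in both charts points to a likely cause (the system metric) for a visible symptom (the quality measure), which is what makes the paired charts useful for intermittent problems.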

I use that approach a lot now, but with the MASF type of control chart and specifically my IT-Control Charts. BTW, I am now writing my next CMG paper and plan to add a couple of very persuasive examples of correlated IT-Control Charts, such as the number of concurrent user LOGONS vs. the number of physical CPUs used by LPARs on some p770 AIX frame....

To be continued....