Ideas, Solutions, Work in progress

and other things

Java Double and NaN Weirdness

We learn something every day. We don’t always realise it, but we do. Sometimes the thing you learn isn’t new at all, but something you sort of knew and never really thought about too much.

I recently learned that my understanding of what causes a NaN value in Java’s double was wrong.

The story

I was working on an integration project and received a bug report on one of my services. The report said that my service is returning an HTTP code ‘500’ for a specific input message.

During my investigation I found the cause of the exception was an unexpected value returned from a downstream service. It was a SOAP service which returned something like the following in its XML response:

<SomeNumberField type="number">NaN</SomeNumberField>

I was a bit surprised to see the NaN there since I would expect them to either leave the field off or set it to null if they don’t have a value. This looked like a calculation bug since we all know that, in Java and C# at least, dividing a double by 0 results in a NaN. (Spoiler: it doesn’t.)

However, this got me thinking and I tried to remember what I know about double and NaN. This resulted in an embarrassingly deep spiral down the rabbit hole.

NaN

Well if you think about it NaN is kind of like a number in this case, even though NaN means Not-a-Number. It exists to enable calculations with indeterminate results to be represented as a “number” in the set of valid double values. Without NaN you could get completely wrong results or you’ll get an exception, which isn’t ideal either. NaN is defined, same as Infinity, to be part of the set of valid doubles.

System.out.println(Double.isNaN(Double.NaN)); //true
System.out.println(Double.POSITIVE_INFINITY == Double.POSITIVE_INFINITY); //true
System.out.println(Double.NEGATIVE_INFINITY == Double.NEGATIVE_INFINITY); //true

I played around with double a bit and I thought to share it in a post, because I think the various edge cases of double are interesting.

I started with the following experiment:

//Lets make a NaN!
double NaN = 5.0/0;
System.out.println("NaN: " + NaN);

>> NaN: Infinity

Wait. What?

Turns out that I have lived with this misconception about what happens when you divide a double by zero. I seriously expected that a double divided by 0 is NaN. Well it turns out I was wrong. You get:

POSITIVE_INFINITY

double infinity = 5.0/0;
System.out.println((infinity == Double.POSITIVE_INFINITY)); //true

I can sort of rationalise that the answer could be infinity because you are dividing something largish by something much, much smaller. In fact, you are dividing it by nothing, so you could argue the result should be infinitely large. Mathematically, though, this does not make any sense: x/0 is undefined since there is no number that you can multiply by 0 to get back to x again (for x ≠ 0).
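To make the distinction concrete, here is a small sketch of the three cases side by side. The class and method names are made up for illustration; the behaviour shown is standard Java arithmetic: a non-zero double divided by zero gives a signed infinity, zero divided by zero gives NaN, and only integer division by zero throws.

```java
final class DivisionByZero {
    /** Floating-point division never throws; it follows the IEEE 754 rules. */
    static double divide(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(divide(5.0, 0));   // Infinity
        System.out.println(divide(-5.0, 0));  // -Infinity
        System.out.println(divide(0.0, 0));   // NaN

        // Integer division is the case that actually fails loudly.
        try {
            int zero = 0;
            System.out.println(5 / zero);
        } catch (ArithmeticException e) {
            System.out.println("int division throws ArithmeticException");
        }
    }
}
```

So the NaN in the downstream response more likely came from something like 0.0/0 than from a plain "number divided by zero".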

Anyway, let’s play with NaN a bit.

double NaN = Double.NaN;
System.out.println("NaN: " + NaN); //NaN: NaN

System.out.println((NaN + 10)); //(NaN + 10): NaN
System.out.println((NaN - 10)); //(NaN - 10): NaN
System.out.println((NaN - NaN)); //NaN - NaN: NaN
System.out.println((NaN / 0));     //NaN / 0: NaN
System.out.println((NaN * 0));     //NaN * 0: NaN

Well no surprises here. Once a NaN always a NaN.

I used Double.NaN above to be sure I have a NaN but if you want to make one yourself then calculating the square root of a negative number is an easy way:

System.out.println((Math.sqrt(-1))); //NaN
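One more NaN property is worth a quick sketch because it bites in practice: NaN is not equal to anything, including itself, so a check like value == Double.NaN never succeeds. The helper name isUsable below is hypothetical; it shows one way a service could reject a parsed "NaN" field such as the one in the SOAP response above.

```java
final class NaNComparison {
    /** Hypothetical helper: true for ordinary values, false for NaN and both infinities. */
    static boolean isUsable(double value) {
        return Double.isFinite(value);
    }

    public static void main(String[] args) {
        double nan = Math.sqrt(-1);

        System.out.println(nan == nan);                  // false
        System.out.println(nan != nan);                  // true
        System.out.println(nan < 1.0 || nan >= 1.0);     // false: every ordered comparison fails
        System.out.println(Double.isNaN(nan));           // true

        // The boxed type deliberately disagrees with ==, so NaN can live in collections.
        System.out.println(Double.valueOf(nan).equals(Double.NaN));              // true
        System.out.println(Double.compare(nan, Double.POSITIVE_INFINITY) > 0);   // true

        // Parsing the text "NaN" succeeds and yields a NaN, so it needs an explicit check.
        System.out.println(isUsable(Double.parseDouble("NaN")));                 // false
    }
}
```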

Max and Min value

Before we get to infinity, let’s take a quick look at Double.MAX_VALUE and Double.MIN_VALUE. These are constants defined on Double. Double.MAX_VALUE is the largest finite value a double can represent; a result that rounds to anything larger overflows to Double.POSITIVE_INFINITY. Despite its name, Double.MIN_VALUE is not the most negative value: it is the smallest positive non-zero double (about 4.9*10^-324). The most negative finite value is -Double.MAX_VALUE, and going beyond it overflows to Double.NEGATIVE_INFINITY.
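A short sketch of where the two constants actually sit. The class and method names are illustrative only. Note that Double.MIN_VALUE is a tiny positive number, so the value that overflows to negative infinity is -Double.MAX_VALUE.

```java
final class DoubleLimits {
    /** The most negative finite double; there is no dedicated constant for it. */
    static double mostNegativeFinite() {
        return -Double.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Double.MIN_VALUE > 0);        // true: smallest positive value
        System.out.println(Double.MIN_VALUE / 2);        // 0.0: underflows to zero
        System.out.println(Double.MAX_VALUE * 2);        // Infinity
        System.out.println(mostNegativeFinite() * 2);    // -Infinity
    }
}
```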

Something to note about double is that it can represent ridiculously large numbers using a measly 64 bits. The maximum value is larger than 1.7*10^308 !

System.out.println("Double.MAX_VALUE is large! : " + (Double.MAX_VALUE == 1.7976931348623157 * Math.pow(10,308)));

> Double.MAX_VALUE is large! : true

It can represent these large numbers because it encodes a number as a fixed-size significand multiplied by a power of two. See the IEEE spec.

Let’s see what it takes to make Double.MAX_VALUE overflow to infinity.

double max = Double.MAX_VALUE;

System.out.println((max == (max + 1))); //true
System.out.println((max == (max + 1000))); //true
System.out.println("EVEN...");
System.out.println((max == (max + Math.pow(10,291)))); //true

System.out.println("HOWEVER...");
System.out.println((max == (max + Math.pow(10,292)))); //false
System.out.println((max + Math.pow(10,292))); //Infinity

This ability to represent seriously large numbers comes at a price of accuracy. After a while only changes in the most significant parts of the number can be reflected. As seen in the following code snippet:

double large_num = Math.pow(10,200);
System.out.println("large_num == (large_num + 1000): " + (large_num == (large_num + 1000))); //true

At large values the steps between adjacent doubles are very, very large, since a double has no place to record a change that doesn’t affect its 16 or so most significant decimal digits. As shown above, 1000 plus a very large number is still that same very large number.
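The size of those steps can be inspected directly with Math.ulp, which returns the gap between a double and the next larger one. The sketch below uses a made-up helper, isAbsorbed, to show when an addition is lost to rounding. It also explains the overflow experiment above: the gap at Double.MAX_VALUE is 2^971 (roughly 2*10^292), so adding 10^291 is less than half a step and rounds back down, while adding 10^292 is more than half a step and rounds up past the largest finite value.

```java
final class DoubleSpacing {
    /** True when adding delta to value is completely lost to rounding. */
    static boolean isAbsorbed(double value, double delta) {
        return value + delta == value;
    }

    public static void main(String[] args) {
        double large = Math.pow(10, 200);

        System.out.println(Math.ulp(1.0));    // gap between 1.0 and the next double
        System.out.println(Math.ulp(large));  // gap around 10^200, vastly larger than 1000

        // The gap at the top of the range is exactly 2^971.
        System.out.println(Math.ulp(Double.MAX_VALUE) == Math.scalb(1.0, 971)); // true

        System.out.println(isAbsorbed(large, 1000));              // true
        System.out.println(isAbsorbed(large, Math.ulp(large)));   // false: a full step registers
    }
}
```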

Infinity

Java’s double supports two kinds of infinity: positive and negative infinity. The easiest way to make them is to divide a non-zero number by 0.

double pos_infinity = 5.0/0;
System.out.println("POSITIVE_INFINITY == pos_infinity: " + (Double.POSITIVE_INFINITY == pos_infinity));

double neg_infinity = -5.0/0;
System.out.println("NEGATIVE_INFINITY == neg_infinity: " + (Double.NEGATIVE_INFINITY == neg_infinity));

In maths infinity is a numerical concept representing the idea of an infinitely large number. It is used, for example in calculus, to describe an unbounded limit - some number that can grow without bound.

In this case things are pretty much the same as in maths, where POSITIVE_INFINITY and NEGATIVE_INFINITY are used to represent numbers that are infinitely large. However they function more as a way to know something went wrong in your calculation. You are either trying to calculate something that is too large to store in a double or there is some bug in the code.

There are once again some interesting things to note when playing with positive and negative infinity.

double pos = Double.POSITIVE_INFINITY;

System.out.println("POSITIVE_INFINITY + 1000 = " + (pos + 1000));
System.out.println("POSITIVE_INFINITY + 10^1000 = " + (pos + Math.pow(10,1000)));
System.out.println("POSITIVE_INFINITY * 2 = " + (pos * 2));

Once the value is infinity it stays there even if you add or subtract ridiculously large numbers. However there is one interesting case, when you subtract infinity from infinity:

double pos = Double.POSITIVE_INFINITY;
double neg = Double.NEGATIVE_INFINITY;

System.out.println("POSITIVE_INFINITY - POSITIVE_INFINITY = " + (pos - pos));
System.out.println("POSITIVE_INFINITY + NEGATIVE_INFINITY = " + (pos + neg));

Subtracting infinity from infinity yields NaN and as you would expect adding or subtracting NaN yields a NaN again.

System.out.println("POSITIVE_INFINITY + NaN = " + (pos + Double.NaN));
System.out.println("POSITIVE_INFINITY - NaN = " + (pos - Double.NaN));
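Subtraction is not the only way to turn infinity into NaN. The sketch below lists a few more indeterminate forms and adds a hypothetical guard, requireFinite, that a service could call before a computed value leaves it. The guard is an illustration of the idea, not something taken from the original service.

```java
final class InfinityArithmetic {
    /** Hypothetical guard: rejects NaN and both infinities. */
    static double requireFinite(double value) {
        if (!Double.isFinite(value)) {
            throw new ArithmeticException("non-finite result: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        double pos = Double.POSITIVE_INFINITY;

        System.out.println(pos - pos);  // NaN
        System.out.println(pos * 0);    // NaN
        System.out.println(pos / pos);  // NaN
        System.out.println(1 / pos);    // 0.0: a finite value divided by infinity is zero

        System.out.println(requireFinite(42.0)); // 42.0
        try {
            requireFinite(pos * 0);
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```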

In closing

Both Java’s float and double types follow the IEEE 754-1985 standard for representing floating point numbers. I am not going to go into great detail on the internals of double, but suffice it to say that double and float are not perfectly accurate when you use them to perform arithmetic. The Java primitive type documentation says:

This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead.

If precision is your main concern then it is generally better to stick with good old java.math.BigDecimal. BigDecimal is immutable which makes it nice to work with, but the most important thing is precision. You control the scale and rounding mode explicitly, without the rounding or overflow surprises you get with double and float. However, if performance is the main concern it is better to stick with float or double and live with the inaccuracies.
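The classic illustration of the difference is adding 0.1 and 0.2. The sketch below uses a made-up helper, total, that sums decimal strings with BigDecimal and then fixes the scale to two places with an explicit rounding mode. The scale and rounding mode are assumptions picked for a currency-like example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class MoneyMath {
    /** Sums decimal strings exactly, then rounds to two decimal places. */
    static BigDecimal total(String... amounts) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String amount : amounts) {
            // Construct from strings: new BigDecimal(0.1) would inherit the double's error.
            sum = sum.add(new BigDecimal(amount));
        }
        return sum.setScale(2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2 == 0.3);       // false
        System.out.println(0.1 + 0.2);              // 0.30000000000000004
        System.out.println(total("0.10", "0.20"));  // 0.30
    }
}
```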

For more information on how Java handles NaN, infinity and rounding read the documentation here.

Extending Metrics for Complex Dashboards in AppDynamics

Overview

Some time ago, I was tasked to replicate one of our client’s Wily Introscope dashboards in AppDynamics. The Wily dashboard displayed a number of status lights indicating how recently activity was detected from a particular client of the application.

The status light colours were assigned as follows:

Status    Meaning
GREY      No activity since 5am this morning
RED       No activity in the last hour, but something since 5am
YELLOW    No activity in the last 10 minutes, but some in the last hour
GREEN     Activity in the last 10 minutes
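The rules in the table can be written down as a small decision function, checked from the most recent window outwards. This is only a sketch of the logic the dashboard has to reproduce; the class, enum and parameter names are invented for illustration and do not come from Introscope or AppDynamics.

```java
final class ClientStatus {
    enum Light { GREY, RED, YELLOW, GREEN }

    /** Maps call counts over the three windows to a light colour. */
    static Light classify(long callsLast10Min, long callsLastHour, long callsSince5am) {
        if (callsLast10Min > 0) {
            return Light.GREEN;   // activity in the last 10 minutes
        }
        if (callsLastHour > 0) {
            return Light.YELLOW;  // quiet for 10 minutes, but active in the last hour
        }
        if (callsSince5am > 0) {
            return Light.RED;     // quiet for an hour, but something since 5am
        }
        return Light.GREY;        // nothing at all since 5am
    }

    public static void main(String[] args) {
        System.out.println(classify(3, 10, 500)); // GREEN
        System.out.println(classify(0, 2, 2));    // YELLOW
        System.out.println(classify(0, 0, 3519)); // RED
        System.out.println(classify(0, 0, 0));    // GREY
    }
}
```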

 
The data for each light was gathered by Introscope using custom instrumentation points looking for calls to a particular POJO method. The first parameter to this method was the client identifier, so Introscope collected metrics for each call to this method grouping it by 10 minutes, 1 hour and 1 day.

In this post I will describe what I did to reproduce the dashboard in AppDynamics. Even though it is a rather hacky work around, it is still interesting. The solution works by extracting metrics from AppDynamics using the REST API and sending it back in as new metrics, which can be used by health rules to drive status lights.

The code and examples in this post are from an example application built to illustrate the solution more clearly.

See github: https://github.com/dirkvanrensburg/blogs-appd-metrics-for-dashboards

Status lights in AppDynamics

The status light in AppDynamics relies on a health rule to drive the state of the light. The AppDynamics status light is green by default to indicate no health rule violations, yellow for WARNING rule violations and red for CRITICAL rule violations. The status light in Introscope is grey when there is no data, so it essentially has four states compared to the three states available in AppDynamics.

As mentioned, the AppDynamics status light uses one health rule, which means you cannot tie the different colours of the light to metrics gathered over different time ranges. The time range for the light is determined by the setting on the widget or the dashboard, whereas the Introscope status light can use separate metrics for each status.

Getting the information

The first step to solving the problem is to gather the information we need to display. We can look at the Introscope probe configuration to see what it uses for the status light:

TraceOneMethodWithParametersOfClass: SomeCentralClass loadItems(Ljava/lang/String;)V BlamedMethodRateTracer "SYS|Service By Client|{0}:Invocations Per Second"

This means that Introscope will count the invocations per second of a method called loadItems, on an instance of the class SomeCentralClass and group this count by the client identifier (the String parameter to loadItems).

Information points

To capture that type of information in AppDynamics you use information points. Information points track calls to a method on a POJO and collect metrics such as Calls per Minute and Average Response Time. AppDynamics does not allow information points to be “split” by parameter in a generic way. That means to get the required information, we have to create an information point for every client.

You create information points by opening the Information Points view from the side menu and clicking on New.

Analyse -> Information Points -> New

Information points track calls to specific methods, so you need to provide the class name and the method name of the method to collect metrics for. In this case we want separate information points based on the parameter to the method call, so we need to set a match condition.

The information point will then start collecting data for Average Response Time, Calls per minute, and Errors per minute as seen on the following dashboard view.

Once defined, the information points are also available in the metric browser where you can plot the different metrics of each information point on the same graph. The following image shows the Average Response Time for CLIENT2 and CLIENT4

Analyse -> Metric Browser

Using the REST API

The AppDynamics controller provides a REST API, which enables you to programmatically extract information out of the controller and, in the case of configuration, send information to the controller. This means that we can call the controller to get the metric information of the information points we just configured. The URL to an information point metric can be retrieved from the metric browser. Right click on the information point and the metric you are interested in, Calls per Minute in our case, and select Copy REST URL

"Rest URL from metric browser"

This will copy the URL to the clipboard and you can test it by pasting it into a new tab in your web browser. You should see something like this

"Example REST results"

The URL can be changed to get the information over different time ranges by changing the time-range-type and duration-in-mins fields. The time-range-type field is used to get information relative to a point in time, so for example it can be used to get information for the last 15 minutes or for the 30 minutes after 10 am this morning. We can use this to get the information we are after out of AppDynamics. We can get the number of times each client called the service in the last 10, 60 or 960 minutes by changing these fields and calling the controller.

Having the information available as a REST service call is one thing, but we need it in the controller so we can create a dashboard. It is of no real use on the outside. To get metrics into the controller we need to use the Standalone Machine Agent.

The Standalone Machine Agent

The Standalone Machine Agent is a Java application whose primary function is to monitor machine statistics such as CPU, Memory utilisation and Disk IO. It also provides a way to send metrics into AppDynamics by means of a Monitoring Extension. The extension can supplement the existing metrics in AppDynamics by sending your custom metrics to the controller. A custom metric can be common across the nodes or associated with a specific tier. You specify the path, as seen in the metric browser, where the metrics should be collected relative to the root Custom Metrics

Get the information out

As mentioned before the metrics we are interested in can be extracted from the AppDynamics controller using the REST API and using the Standalone Machine Agent we can create new metrics, which we can use for the dashboard. Using the following REST API call, we can get the metrics captured by our information points rolled up to the different time ranges. The call below will get the Calls per Minute metric of CLIENT1

http://controller:8090/controller/rest/applications/ComplexDashboardTest/metric-data?metric-path=Information Points|C1|Calls per Minute&time-range-type=BEFORE_NOW&duration-in-mins=10
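The metric path contains spaces and pipe characters, which a browser encodes for you but a script has to encode itself. Below is a minimal sketch of building the same URL in Java. The class and method names are made up, the host and application name are the ones from the example above, and it assumes the controller accepts standard form encoding (+ for a space, %7C for a pipe) in the query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class MetricUrl {
    /** Builds a BEFORE_NOW metric-data URL for the given look-back window in minutes. */
    static String beforeNow(String baseUrl, String application, String metricPath, int minutes) {
        return baseUrl + "/controller/rest/applications/" + application
                + "/metric-data?metric-path=" + URLEncoder.encode(metricPath, StandardCharsets.UTF_8)
                + "&time-range-type=BEFORE_NOW"
                + "&duration-in-mins=" + minutes;
    }

    public static void main(String[] args) {
        String metricPath = "Information Points|C1|Calls per Minute";
        // One URL per time range the dashboard needs.
        for (int minutes : new int[] {10, 60}) {
            System.out.println(
                    beforeNow("http://controller:8090", "ComplexDashboardTest", metricPath, minutes));
        }
    }
}
```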

By making the above REST call multiple times for every client we can get values for Calls per Minute rolled up over the periods we are interested in (10, 60 and 960 minutes). However, just getting the values of the last 960 minutes (16 hours) is not good enough since it will give incorrect values early in the day. Before 13h00 it could still pick up calls from the previous day, so we need a different approach. To do this we change the time-range-type to AFTER_TIME and provide a start time of 5am that morning. This will then only return values for the 960 minutes after 5am.

The following REST call will do that - replace the ${timeat5am} value with the UNIX time for 5am of that day.

http://controller:8090/controller/rest/applications/ComplexDashboardTest/metric-data?metric-path=Information Points|C1|Calls per Minute&time-range-type=AFTER_TIME&start-time=${timeat5am}000&duration-in-mins=960
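The start time can be computed with java.time. The sketch below is one way to do it and uses invented names. It returns the value in milliseconds, which is what the URL above ends up with after appending 000 to the UNIX time in seconds. The time zone is passed in explicitly because "5am" only means something relative to a zone; which zone is right depends on where the clients operate.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;

final class FiveAm {
    /** Epoch milliseconds for 05:00 on the given day in the given time zone. */
    static long epochMillisAt5am(LocalDate day, ZoneId zone) {
        return day.atTime(LocalTime.of(5, 0))
                .atZone(zone)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.systemDefault();
        long startTime = epochMillisAt5am(LocalDate.now(zone), zone);
        // Use this value for the start-time parameter of the AFTER_TIME call.
        System.out.println("start-time=" + startTime);
    }
}
```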

Send the information back in

To send the information back in we need to create the monitoring extension. This is essentially a script that the Standalone Machine Agent calls periodically; any values the script writes to standard output are forwarded to the controller. We want the script to send metrics such as the following:

name=Custom Metrics|Information Points|CLIENT1|Calls per 10 Minutes,value=0
name=Custom Metrics|Information Points|CLIENT1|Calls per 60 Minutes,value=2
name=Custom Metrics|Information Points|CLIENT1|Calls per 960 Minutes,value=2
name=Custom Metrics|Information Points|CLIENT2|Calls per 10 Minutes,value=0
name=Custom Metrics|Information Points|CLIENT2|Calls per 60 Minutes,value=1
name=Custom Metrics|Information Points|CLIENT2|Calls per 960 Minutes,value=3519

...And so on for all the clients
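Producing those lines is just string formatting. The sketch below shows the output side only, written in Java to match the rest of the examples. The class name and the hard-coded sample counts are placeholders; in a real extension the counts would come from the REST calls described earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MetricLines {
    /** Formats one metric line in the name=...,value=... form shown above. */
    static String format(String client, int minutes, long calls) {
        return "name=Custom Metrics|Information Points|" + client
                + "|Calls per " + minutes + " Minutes,value=" + calls;
    }

    public static void main(String[] args) {
        // Placeholder counts for one client, keyed by window length in minutes.
        Map<Integer, Long> callsByWindow = new LinkedHashMap<>();
        callsByWindow.put(10, 0L);
        callsByWindow.put(60, 2L);
        callsByWindow.put(960, 2L);

        // The machine agent reads whatever is written to standard output.
        for (Map.Entry<Integer, Long> entry : callsByWindow.entrySet()) {
            System.out.println(format("CLIENT1", entry.getKey(), entry.getValue()));
        }
    }
}
```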

Once we have the extension installed and reporting, the new metrics will show up in the AppDynamics metric browser at the following location, assuming the machine agent is reporting for the tier called ‘OneTier’.

Application Infrastructure Performance -> OneTier -> Custom Metrics

There will be a section for each client (CLIENT1 to CLIENTx) and each will have a metric for each of the time ranges we are interested in (10, 60 and 960 minutes)

"The new metrics displayed in the browser"

Health Rules

Health rules provide a way to specify conditions which the system will consider WARNING or CRITICAL conditions. You specify the metric to monitor and the threshold or baseline to compare it to for both the WARNING and CRITICAL condition.

We can now create health rules to track these metrics, so that the dashboard lights can show how recently a particular client accessed the system. To create a health rule we use the side menu in the AppDynamics controller.

Alert & Response -> Health Rules -> (click on +)

First specify a name for the rule, the type of metric we will use and the time range to use when evaluating the health rule. The last 5 minutes is good enough since the machine agent will send a value every minute and the value it sends is already summed over the period in question.

We need to create one health rule for every client.

"Create the health rule"

The WARNING condition is raised if there were no calls in the last 10 minutes, but some in the last 60 minutes.

"Create the health rule"

The CRITICAL condition is raised if there were no calls in the last 60 minutes.

"Create the health rule"

Putting it all together

Now we have all the information we need to start assembling the dashboard. Status lights only work on Custom Dashboards as opposed to Node/Tier Dashboards. To create a Custom Dashboard we click on the AppDynamics logo at the top left and choose Custom Dashboards.

"Create Custom Dashboard"

Next we create a new dashboard by clicking on Create Dashboard and set the layout of the canvas to absolute. This is because the grid layout does not support metric labels on top of other widgets and we need this to complete the dashboard.

"Create new Dashboard"

Put a label with the text Client Access and place it at the top of the dashboard, add a label for the first status light with the text CLIENT 1 and then add the status light for client 1. The status light is linked to the health rule for CLIENT1 by selecting it in the status light properties.

"Status light properties"

We can now repeat these steps for the remaining 5 clients, linking each to the appropriate health rule. The finished dashboard looks like this:

"All the status lights"

As mentioned at the start of the post, the Introscope status light can be in four states and the AppDynamics status light only three. To represent the fourth state we can put the value of the Calls per 960 Minutes metric on the status light as a dynamic label.

"Metric label"

The label background is set as transparent, and sized so that it will fit on the status light for client 1. After adding a metric label for each client, the dashboard is complete. We now have a fully functional dashboard which displays the same information as the original Introscope dashboard. In fact, it shows a little more information because we added the ‘calls today’ label on the status to make up for the missing fourth state. Knowing the number of calls for the day is much better than just having a red light meaning ‘some calls today but nothing in the last hour’.

"Completed Dashboard"

Conclusion

Using the AppDynamics REST API and Standalone Machine Agent allows you to do powerful enrichment of the metric information in AppDynamics. You could, for example, monitor an internal legacy system through a custom built API and combine that data with information already captured by AppDynamics. This can then be stored as a new metric which you can use to visualise the data.