Wednesday, September 4, 2013

No data in TNPM reports? Some things to investigate...

Hi,

Last week I saw something in TNPM that reminded me of one of the most common issues when processing and reporting performance data: no data in the reports. In this case it is obvious that something is wrong... but where is the problem?

I would like to share the steps I always take to investigate this problem in TNPM.

Was data ever collected before?

This is the first thing to check: go back in time in the report (preferably a report with a chart, so you can see the raw data over time). Is there any datapoint available? If yes, when was the last one?
Most of the time you will notice that the issue is not in TNPM itself but in the data source. It can be an unannounced patch applied to the device by the engineering department (and they always forget to tell you :). It can be a network change caused by a new product release. It can be a change in the SFTP credentials used to collect data, etc. Looking back in time and identifying exactly the last collection point should be your first step.
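If you export the report data, finding the last collection point is easy to script. The sketch below assumes you already have the datapoints as hypothetical (timestamp, value) pairs, with `None` where the report shows no data; it is not tied to any TNPM export format.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# A datapoint as read from a (hypothetical) report export: (timestamp, value).
# A value of None means the report shows no data for that period.
Point = Tuple[datetime, Optional[float]]

def last_collection_point(points: Sequence[Point]) -> Optional[datetime]:
    """Timestamp of the most recent datapoint that has a value (None if never)."""
    collected = [ts for ts, value in points if value is not None]
    return max(collected) if collected else None

def data_gap(points: Sequence[Point], now: datetime) -> Optional[timedelta]:
    """How long ago the last datapoint was collected (None if never)."""
    last = last_collection_point(points)
    return None if last is None else now - last

if __name__ == "__main__":
    # Hypothetical hourly samples: data stops after 02:00.
    samples = [
        (datetime(2013, 9, 2, 0, 0), 12.5),
        (datetime(2013, 9, 2, 1, 0), 13.1),
        (datetime(2013, 9, 2, 2, 0), 12.9),
        (datetime(2013, 9, 2, 3, 0), None),
        (datetime(2013, 9, 2, 4, 0), None),
    ]
    now = datetime(2013, 9, 4, 10, 0)
    print("last datapoint:", last_collection_point(samples))  # 2013-09-02 02:00:00
    print("no data for:", data_gap(samples, now))              # 2 days, 8:00:00
```

The timestamp of the last datapoint is what you then compare against patch windows, network changes or credential changes on the data source side.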

Are the datachannel components running correctly?

Go to the Datachannel server and run dccmd status all. Check whether all components are on time or whether one or more of them are delayed by some hours. Look for the column called "ES DURATION": the numbers indicate how long the component has been in a "fixed state". Ideally you should see small numbers (let's say smaller than 100), indicating that something is being processed. The exception is the DISC component, which usually shows a large number.
Pay special attention here if you are using CSE/CME formulas that depend on input from different subchannels. If one subchannel is delayed, all the subchannels that depend on it will be delayed as well.
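The two checks in this step can be sketched as follows. This is a minimal illustration, assuming you have copied the component names and ES DURATION values from `dccmd status all` by hand (the sketch does not parse the real output), and that you know which subchannels feed which through CSE/CME formulas. The component names and subchannel numbers below are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Rule of thumb from the text: ES DURATION below ~100 means the
# component is still processing. DISC normally shows a large value.
ES_DURATION_LIMIT = 100

def stuck_components(rows: Iterable[Tuple[str, int]],
                     limit: int = ES_DURATION_LIMIT) -> List[str]:
    """Components (other than DISC) whose ES DURATION exceeds `limit`.

    `rows` holds (component, es_duration) pairs copied by hand from
    `dccmd status all`.
    """
    return [name for name, duration in rows
            if not name.startswith("DISC") and duration > limit]

def affected_subchannels(depends_on: Dict[str, List[str]],
                         delayed: Set[str]) -> Set[str]:
    """Subchannels delayed directly or through a CSE/CME dependency.

    `depends_on` maps a subchannel to the subchannels whose data
    its formulas consume.
    """
    # Invert the graph: who consumes each subchannel's data?
    consumers: Dict[str, List[str]] = {}
    for sub, inputs in depends_on.items():
        for src in inputs:
            consumers.setdefault(src, []).append(sub)

    result = set(delayed)
    stack = list(delayed)
    while stack:  # depth-first walk along the consumer edges
        for consumer in consumers.get(stack.pop(), []):
            if consumer not in result:
                result.add(consumer)
                stack.append(consumer)
    return result

if __name__ == "__main__":
    # Hypothetical values, not real dccmd output.
    status = [("FTE.1.1", 12), ("CME.1.1", 4530), ("LDR.1.1", 30),
              ("DISC", 86000)]
    print(stuck_components(status))  # ['CME.1.1']

    # Hypothetical topology: subchannel 3 combines data from 1 and 2,
    # and subchannel 4 consumes the output of 3.
    deps = {"3": ["1", "2"], "4": ["3"]}
    print(sorted(affected_subchannels(deps, {"2"})))  # ['2', '3', '4']
```

The second function is just a graph walk: a delay in one subchannel propagates to every subchannel downstream of it, so a single late subchannel can explain missing data in several reports.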

Are there ".bof" files being generated?

Go to the datachannel server and check the done directory of the subchannel with problems. You should see files with the extension ".bof" in it (except for BCOL, where you will find ".pvline" files). Remember that the data flows in the following order:

BCOL
                => FTE   =>  CME   =>  LDR  => DLDR
SNMP

So, if you find ".bof" files in the FTE/done and CME/done directories but not in the LDR/done directory for a specific subchannel, the problem is in the CME and not in the LDR.
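The same reasoning can be sketched as a small script that counts ".bof" files per stage and points at the last stage that still has them. The directory layout `<base>/<STAGE>/done` is a hypothetical simplification of the "FTE/done" notation above; the real directory names in your Datachannel installation will differ, so adjust the paths accordingly.

```python
import sys
from pathlib import Path
from typing import Dict, Optional, Sequence

# Order in which the data flows (BCOL/SNMP feed the FTE).
FLOW = ["FTE", "CME", "LDR", "DLDR"]

def count_bof(done_dir: Path) -> int:
    """Number of .bof files in a done directory (0 if it does not exist)."""
    if not done_dir.is_dir():
        return 0
    return sum(1 for _ in done_dir.glob("*.bof"))

def suspect_component(counts: Dict[str, int],
                      flow: Sequence[str] = FLOW) -> Optional[str]:
    """The last stage that still has .bof files before the first stage
    that has none, following the reasoning in the text.

    Returns "BCOL/SNMP" if even the FTE is empty, None if every stage has files.
    """
    previous = "BCOL/SNMP"
    for stage in flow:
        if counts.get(stage, 0) == 0:
            return previous
        previous = stage
    return None

if __name__ == "__main__":
    # Hypothetical layout <base>/<STAGE>/done; pass <base> as the first argument.
    base = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    counts = {stage: count_bof(base / stage / "done") for stage in FLOW}
    print(counts)
    print("suspect:", suspect_component(counts))
```

With files in FTE and CME but none in LDR, `suspect_component` returns `"CME"`, matching the example above.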

Can you find any metrics for your subelement in the ".bof" files?

Select one subelement that should have data displayed in the report and get its dbIndex (the easiest way is to export the RST table to CSV and look in the resource column).
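Once the RST table is exported to CSV, the lookup can be scripted. This is a sketch only: the column names `name` and `resource` are assumptions (the exported headers in your environment may differ), so both are parameters.

```python
import csv
import io
from typing import Optional, TextIO

def find_dbindex(export: TextIO, subelement: str,
                 name_col: str = "name",
                 index_col: str = "resource") -> Optional[str]:
    """Return the dbIndex of `subelement` from a CSV export, or None.

    `name_col` and `index_col` are hypothetical column names; adjust
    them to the headers of your own export.
    """
    for row in csv.DictReader(export):
        if (row.get(name_col) or "").strip() == subelement:
            return (row.get(index_col) or "").strip() or None
    return None

if __name__ == "__main__":
    # Hypothetical export content, for illustration only.
    sample = io.StringIO("name,resource\nrouterA_if1,1234567\nrouterA_if2,1234568\n")
    print(find_dbindex(sample, "routerA_if2"))  # 1234568
```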

Go to the data directory, find a recent ".bof" file and run:

bofDump -r <dbIndex> <filename>.bof 

If nothing comes out, then the problem may be in the collection tree or the collection requests (keep reading...).
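Checking a single file can be misleading, so it may help to run the same `bofDump -r <dbIndex> <filename>.bof` command over the few most recent files. The sketch below only wraps that command line as shown above; it assumes `bofDump` is on the PATH, and the script usage in the comment is hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, List

def bofdump_output(db_index: str, bof_file: Path) -> str:
    """Run `bofDump -r <dbIndex> <file>` and return whatever it prints."""
    result = subprocess.run(["bofDump", "-r", db_index, str(bof_file)],
                            capture_output=True, text=True, check=False)
    return result.stdout

def newest_bof_files(directory: Path, count: int = 5) -> List[Path]:
    """Most recently modified .bof files in `directory`, newest first."""
    if not directory.is_dir():
        return []
    files = sorted(directory.glob("*.bof"),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]

def files_with_metrics(db_index: str, files: List[Path],
                       dump: Callable[[str, Path], str] = bofdump_output) -> List[Path]:
    """Files for which bofDump printed something for this dbIndex."""
    return [f for f in files if dump(db_index, f).strip()]

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_bof.py <dbIndex> <directory with .bof files>
    db_index, data_dir = sys.argv[1], Path(sys.argv[2])
    hits = files_with_metrics(db_index, newest_bof_files(data_dir))
    if not hits:
        print("No metrics found: check the collection tree and requests.")
    for f in hits:
        print("metrics found in", f)
```

The `dump` parameter exists only so the selection logic can be exercised without a TNPM installation.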

Is the subelement in the correct folder in the collection tree?

Open the "pvm", go to the "resource editor" and check if the subelement is in the correct collection folder. If not, please check your grouping rules using the "rule editor".

Is a collection formula deployed and active for the collection folder?

Open the "request editor" and check if at least one collection formula is deployed and active for the collection folder.

NOTE: One small trick here. If the subelement exists in the collection folder, the formula is deployed and active, and you can see data for other subelements in the same folder but not for this specific one, grep the tnpm.log file for the dbIndex of this subelement. You will probably find an error saying that the subelement was dropped because no request exists for it.
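A plain grep for the dbIndex works. A Python equivalent is sketched below; it adds one refinement of its own: the dbIndex must not be part of a longer number, so searching for 12345 does not also return lines about 123456. The script usage in the comment is hypothetical, and it does not assume any particular wording for the log message.

```python
import re
from typing import Iterable, Iterator

def lines_mentioning(lines: Iterable[str], db_index: str) -> Iterator[str]:
    """Yield log lines that mention `db_index` as a whole number."""
    # Not preceded or followed by another digit.
    pattern = re.compile(r"(?<!\d)" + re.escape(db_index) + r"(?!\d)")
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")

if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        # Hypothetical usage: python grep_dbindex.py tnpm.log <dbIndex>
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for line in lines_mentioning(log, sys.argv[2]):
                print(line)
```

Look through the matching lines for the message about the subelement being dropped.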

If this is the case, there is a problem with the CME local metadata image. Try the following:

1) Bounce the CME in question (dccmd bounce CME.X.Y)
2) If that doesn't solve it, go to the "request editor", select the metrics deployed in the collection folder, disable them, save, then enable them and save again. In some cases this was the only solution for me. I know it smells like a bug, but I don't have time right now to open a PMR (feel free to do it if you face the same problem :) )