Sunday, October 26, 2014

Piped merge error - what is wrong?

Last week a friend of mine came to me to ask about a strange error he was getting on TNPM. Basically, he was seeing many gaps in the report data for all devices, and the problem was apparently intermittent.

The error message on the log was the following:


V1:3353 2014.10.22-02.08.43 UTC LDR.1-21884:9897        SQLLDR  2 SQL Loader started
V1:3354 2014.10.22-02.08.43 UTC LDR.1-21884:13196       SQLLDR  3 Starting Piped Merge
V1:3355 2014.10.22-02.09.10 UTC LDR.1-21884:13196       MERGE_ERROR     GYMDC10118F Piped Merge Error: No such device or address:Transfer error
V1:3356 2014.10.22-02.09.10 UTC LDR.1-21884:13196       SQLLDRKILL      GYMDC10102W Killed sqlldr pid=a UnixProcess (Inactive: exitStatus nil, Error: Success) , result=a UnixProcess (Inactive: exitStatus nil, Error: Success)
V1:3357 2014.10.22-02.09.10 UTC LDR.1-21884:9897        ORASQLLDR       GYMDC10104F  ErrorCode=nil CommandLine=$ORA_HOME/11.2.0-client32/bin/sqlldr userid=PV_LDR_01/xxxx@pv log=...datachannel/LDR.1/state/2014.10.22-00/MERGED~000.1DGA.BOF.log control=...datachannel/LDR.1/loader.gagg.ctl logErrors=

After some investigation, it became evident that, for some reason, the Unix pipe created during the data merge was getting corrupted.

It is important to know that the LDR component has two options for merging and loading the data files. In the topology editor, if the option "USE_PIPE" is false, LDR generates an intermediate file with the merged data and then uploads that file via Oracle sqlldr. If "USE_PIPE" is true, it creates a Unix named pipe (FIFO) and sqlldr reads the data directly from it. Using the pipe is often said to be faster, because no intermediate file has to be written, but it can cause issues as well, as we will see.
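
The difference between the two modes can be sketched in a few lines of shell. This is a simplified illustration only: the file names are made up, and `cat` stands in for Oracle's sqlldr, which is what actually consumes the data in TNPM.

```shell
# Illustrative sketch of the two LDR loading modes (not the actual
# TNPM implementation; `cat` stands in for sqlldr).
workdir=$(mktemp -d)
printf 'row1\nrow2\n' > "$workdir/part1.dat"
printf 'row3\n'       > "$workdir/part2.dat"

# USE_PIPE=false: merge into a real intermediate file, then load it.
sort -m "$workdir/part1.dat" "$workdir/part2.dat" > "$workdir/MERGED.dat"
cat "$workdir/MERGED.dat" > "$workdir/loaded_from_file.out"   # sqlldr would read MERGED.dat

# USE_PIPE=true: the loader reads from a named pipe (FIFO) while the
# merge writes into it, so no intermediate file is ever written to disk.
mkfifo "$workdir/merge.pipe"
sort -m "$workdir/part1.dat" "$workdir/part2.dat" > "$workdir/merge.pipe" &
cat "$workdir/merge.pipe" > "$workdir/loaded_from_pipe.out"   # sqlldr would read the FIFO
wait
```

Both modes produce the same loaded data; the pipe mode just skips the intermediate write, which is where the claimed speedup comes from.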

The TNPM system I mentioned had been using the pipe method for quite a long time before the issue occurred. So it had to be something in the system itself that had changed. And indeed, it was.

When using the pipe method, the pipe is created under /tmp/LDR.X, where X is the LDR channel number. This works fine as long as /tmp is mounted locally, but if, for any reason, the IT team decides to mount it from a remote data store... well, you will have problems. That is exactly what happened: the IT team decided to mount /tmp from a remote data store for the VMware cluster, and once the datachannel was running on the cluster, the piped load was affected.
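
Before trusting USE_PIPE=true, it is worth checking what filesystem actually backs /tmp. A quick check, assuming GNU df on Linux:

```shell
# Print the filesystem type backing /tmp (GNU df assumed). Local types
# such as ext4, xfs or tmpfs are safe for FIFOs; nfs/cifs and other
# remote types are the situation where the piped merge broke.
tmp_fstype=$(df -PT /tmp | awk 'NR==2 {print $2}')
echo "/tmp is on: $tmp_fstype"
case "$tmp_fstype" in
  nfs*|cifs*) echo "remote filesystem: avoid USE_PIPE=true" ;;
  *)          echo "local filesystem: FIFOs under /tmp should be fine" ;;
esac
```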

So we deactivated the pipe (USE_PIPE=false) in the topology editor, deployed the topology, and the problem was solved.

I could not find a way to change where the pipe is created, so the path must be hard-coded somewhere. If you know how to do it, let me know :)

Sunday, October 19, 2014

link($DC_HOME/bin/visual,CMGR_visual) failed with error 18

If you ever installed the old version of TNPM (Proviso), you know that it was not possible to split the datachannel binaries and the data into two different locations. This was not very smart, considering that such a split is a common practice in many companies.

Since version 4.4.1 (I believe), this option exists and you can configure different locations for your datachannel binary files and the data files.

But there is a catch if you use different partitions for the split (which is also the common practice: your data partition is usually remote-mounted from the company's data cluster).

The limitation is in the visual binary: you cannot execute it from a remote partition. If you try, you will receive the following error on the screen:

link($DC_HOME/bin/visual,CMGR_visual) failed with error 18

This happens for all the tools that use the visual binary to bootstrap themselves (cmgr, amgr, frmi, etc.).
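
On Linux and most other UNIXes, error 18 is EXDEV ("Invalid cross-device link"): link(2) cannot create a hard link across filesystems. So the bootstrap link fails whenever the current working directory sits on a different partition than the binary being linked. You can predict the failure by comparing device IDs; in the sketch below, /tmp and the current directory are placeholders for your data partition and $DC_HOME/bin, and `stat -c %d` assumes GNU coreutils.

```shell
# Compare the device IDs of two directories; a hard link between them
# can only work when the IDs match. The two paths here are placeholders
# standing in for the data partition and $DC_HOME/bin.
dev_a=$(stat -c %d /tmp)
dev_b=$(stat -c %d .)
if [ "$dev_a" != "$dev_b" ]; then
  msg="different filesystems: link(2) would fail with EXDEV (error 18)"
else
  msg="same filesystem: hard links work"
fi
echo "$msg"
```

This is also why the fix below works: cd-ing back to $DC_BIN_HOME makes the working directory and the binary share the same filesystem.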

Fortunately, the solution is very simple:

1) Go to the $DC_HOME/bin folder
2) Open the run script used to start the tool, for instance "cmgr" for the CMGR component.
3) Go to the end of the script and add the first line below (the cd command) right above the last line:

cd $DC_BIN_HOME

$DC_BIN_HOME/pvexec CMGR_visual $DC_BIN_HOME/visual -nologo -noherald $DC_BIN_HOME/dc.im -headless -a CMGR "$@"

This makes sure you go back to the partition where your datachannel binaries are installed before executing the visual command.

You have to do the same for all run scripts.
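
If you have many run scripts to patch, the edit can be scripted. The loop below is only a sketch, demonstrated on a throwaway directory: GNU sed is assumed, the fake script content is illustrative, and you should point it at your real $DC_HOME/bin (with the full list of tool scripts) only after testing.

```shell
# Demo on a disposable directory; replace demo_bin with $DC_HOME/bin
# for real use. Inserts 'cd $DC_BIN_HOME' right above the final line.
mkdir -p demo_bin
printf '#!/bin/sh\n$DC_BIN_HOME/pvexec CMGR_visual "$@"\n' > demo_bin/cmgr
for tool in demo_bin/cmgr; do
  cp "$tool" "$tool.bak"                  # keep a backup first
  sed -i '$i\cd $DC_BIN_HOME' "$tool"     # insert above the last line
done
tail -n 2 demo_bin/cmgr
```

Note that sed does not expand shell variables inside single quotes, so the literal text cd $DC_BIN_HOME ends up in the script, which is what we want.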

That's it.