Sunday, October 26, 2014

Piped merge error - what is wrong?

Last week a friend of mine asked me about a strange error he was getting on TNPM. The report data for all devices had many gaps, and the problem appeared to be intermittent.

The error message in the log was the following:


V1:3353 2014.10.22-02.08.43 UTC LDR.1-21884:9897        SQLLDR  2 SQL Loader started
V1:3354 2014.10.22-02.08.43 UTC LDR.1-21884:13196       SQLLDR  3 Starting Piped Merge
V1:3355 2014.10.22-02.09.10 UTC LDR.1-21884:13196       MERGE_ERROR     GYMDC10118F Piped Merge Error: No such device or address:Transfer error
V1:3356 2014.10.22-02.09.10 UTC LDR.1-21884:13196       SQLLDRKILL      GYMDC10102W Killed sqlldr pid=a UnixProcess (Inactive: exitStatus nil, Error: Success) , result=a UnixProcess (Inactive: exitStatus nil, Error: Success)
V1:3357 2014.10.22-02.09.10 UTC LDR.1-21884:9897        ORASQLLDR       GYMDC10104F  ErrorCode=nil CommandLine=$ORA_HOME/11.2.0-client32/bin/sqlldr userid=PV_LDR_01/xxxx@pv log=...datachannel/LDR.1/state/2014.10.22-00/MERGED~000.1DGA.BOF.log control=...datachannel/LDR.1/loader.gagg.ctl logErrors=

After some investigation, it became evident that, for some reason, the Unix pipe created during the data merge was getting corrupted.

It is important to know that the LDR component has two options for merging and loading the data files. In the topology editor, if the option "USE_PIPE" is false, the LDR generates an intermediate file with the merged data and then loads that file via Oracle sqlldr. If "USE_PIPE" is true, it creates a Unix pipe and Oracle sqlldr reads the merged data from it. Some people say that using the pipe is faster, because no intermediate file has to be written, but it can cause issues as well, as we will see.
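To make the difference between the two modes concrete, here is a minimal Python sketch of the general mechanism. It is not TNPM code: the function names, file names and sample rows are all made up for illustration, and load() is only a stand-in for sqlldr. It assumes a Unix-like system where os.mkfifo is available.

```python
import os
import tempfile
import threading


def merge_to_file(parts, merged_path):
    """Write all parts, in order, to merged_path (a regular file or a FIFO)."""
    with open(merged_path, "w") as out:
        for part in parts:
            out.write(part)


def load(path):
    """Hypothetical stand-in for the loader: read everything from path."""
    with open(path) as src:
        return src.read()


def merge_through_pipe(parts, fifo_path):
    """Pipe mode: stream the merged data through a named pipe."""
    os.mkfifo(fifo_path)  # raises OSError if the FIFO cannot be created
    result = {}

    def reader():
        result["data"] = load(fifo_path)

    consumer = threading.Thread(target=reader)
    consumer.start()
    # Opening the FIFO for writing blocks until the reader has opened it.
    merge_to_file(parts, fifo_path)
    consumer.join()
    os.remove(fifo_path)
    return result["data"]


def demo():
    parts = ["a,1\n", "b,2\n"]  # made-up sample rows
    with tempfile.TemporaryDirectory() as workdir:
        # File mode: the merged data lands on disk before it is loaded.
        merged = os.path.join(workdir, "merged.dat")
        merge_to_file(parts, merged)
        via_file = load(merged)
        # Pipe mode: writer and reader run at the same time, nothing is stored.
        via_pipe = merge_through_pipe(parts, os.path.join(workdir, "merge.fifo"))
    return via_file, via_pipe
```

Both paths deliver the same bytes to the loader. The difference is that in pipe mode the writer and the reader depend on each other and on the FIFO while the transfer is running, so a problem with the pipe surfaces in the middle of the load.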

The TNPM system I mentioned had been using the pipe method for quite a long time before the issue occurred, so something in the system itself must have changed. And indeed, it had.

When using the pipe method, the pipe pointer is created under /tmp/LDR.X, where X is the LDR channel number. This works fine if /tmp is mounted locally, but if for any reason the IT team decides to mount it using a remote data store... well, you will have problems. This was exactly what happened. The IT team decided to mount /tmp using a remote data store for the VMware cluster. Once the datachannel was running on the cluster, the pipe load was affected.
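If you suspect the same situation, a quick check is to see which filesystem backs the pipe location. The sketch below is my own illustration, not something shipped with TNPM: it parses text in the /proc/mounts format (so it assumes Linux), and the list of "remote" filesystem types is an incomplete assumption that you should adapt to your environment.

```python
import os

# Filesystem types treated as remote here; illustrative and incomplete.
REMOTE_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "glusterfs", "lustre"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the deepest mount point containing path."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point win, as it shadows
        # the earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_remote(path, mounts_text):
    """True if path sits on one of the filesystem types listed above."""
    return fs_type_for(path, mounts_text) in REMOTE_FS_TYPES
```

On the datachannel host you could call it as is_remote("/tmp/LDR.1", open("/proc/mounts").read()), replacing the channel number with your own. A True result means /tmp is not local and the pipe method is at risk of the error above.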

So we deactivated the pipe (USE_PIPE=false) in the topology editor, deployed the topology, and the problem was solved.

I could not find a way to change where to create the pipe pointer, so it must be hard-coded somewhere. If you know how to do it, let me know :)
