The error message in the log was the following:
V1:3353 2014.10.22-02.08.43 UTC LDR.1-21884:9897 SQLLDR 2 SQL Loader started
V1:3354 2014.10.22-02.08.43 UTC LDR.1-21884:13196 SQLLDR 3 Starting Piped Merge
V1:3355 2014.10.22-02.09.10 UTC LDR.1-21884:13196 MERGE_ERROR GYMDC10118F Piped Merge Error: No such device or address:Transfer error
V1:3356 2014.10.22-02.09.10 UTC LDR.1-21884:13196 SQLLDRKILL GYMDC10102W Killed sqlldr pid=a UnixProcess (Inactive: exitStatus nil, Error: Success) , result=a UnixProcess (Inactive: exitStatus nil, Error: Success)
V1:3357 2014.10.22-02.09.10 UTC LDR.1-21884:9897 ORASQLLDR GYMDC10104F ErrorCode=nil CommandLine=$ORA_HOME/11.2.0-client32/bin/sqlldr userid=PV_LDR_01/xxxx@pv log=...datachannel/LDR.1/state/2014.10.22-00/MERGED~000.1DGA.BOF.log control=...datachannel/LDR.1/loader.gagg.ctl logErrors=
After some investigation, it became evident that, for some reason, the Unix pipe created during the data merge was getting corrupted.
It is important to know that the LDR component has two options for merging and loading the data files. In the topology editor, if the option "USE_PIPE" is false, the LDR generates an intermediate file with the merged data and then feeds that file to Oracle sqlldr. If "USE_PIPE" is true, it creates a Unix named pipe and sqlldr loads the merged data directly from it. Some people say the pipe method is faster, because you don't have to create the intermediate file, but it can cause issues as well, as we will see.
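To make the difference concrete, here is a minimal, hypothetical sketch in Python of the two modes. This is not the actual LDR code: the merge step, the file and pipe paths, and the sqlldr command line are illustrative only (the userid and control file name are just borrowed from the log above):

#!/usr/bin/env python3
"""Conceptual sketch of the two LDR merge/load modes (not TNPM code)."""
import os
import subprocess

SQLLDR = "sqlldr"                    # assumed to be on the PATH
CONTROL_FILE = "loader.gagg.ctl"     # control file name taken from the log above

def merge_rows():
    """Stand-in for the merge step: yields merged data lines."""
    for i in range(3):
        yield f"row-{i}\n"

def load_with_intermediate_file(merged_path="MERGED.dat"):
    # USE_PIPE=false: write the merged data to a regular file first...
    with open(merged_path, "w") as out:
        out.writelines(merge_rows())
    # ...then point sqlldr at that file.
    subprocess.run([SQLLDR, "userid=PV_LDR_01/xxxx@pv",
                    f"control={CONTROL_FILE}", f"data={merged_path}"], check=True)

def load_with_pipe(fifo_path="/tmp/LDR.1/merge.pipe"):
    # USE_PIPE=true: create a named pipe (FIFO) and let sqlldr read from it
    # while the merge writes into it; no intermediate file lands on disk.
    os.makedirs(os.path.dirname(fifo_path), exist_ok=True)
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)
    loader = subprocess.Popen([SQLLDR, "userid=PV_LDR_01/xxxx@pv",
                               f"control={CONTROL_FILE}", f"data={fifo_path}"])
    # Opening the FIFO for writing blocks until sqlldr opens the read end.
    with open(fifo_path, "w") as pipe:
        pipe.writelines(merge_rows())
    loader.wait()

In both cases sqlldr simply reads from whatever path it is given; the only difference is whether that path is a regular file or a FIFO living under /tmp.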
The TNPM system I mentioned had been using the pipe method for quite a long time before the issue occurred, so something in the system itself must have changed. And indeed, it had.
When using the pipe method, the named pipe is created under /tmp/LDR.X, where X is the LDR channel number. This works fine as long as /tmp is mounted locally, but if, for any reason, the IT team decides to mount it on a remote data store... well, you will have problems. That is exactly what happened here: the IT team decided to mount /tmp on a remote data store for the VMware cluster, and once the DataChannel was running on the cluster, the pipe load was affected.
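If you want to catch this kind of environment change before it breaks the load, a small check like the one below can report which filesystem /tmp actually lives on. This is my own sketch, not part of TNPM, and the whitelist of "local" filesystem types is an assumption you may want to adjust:

#!/usr/bin/env python3
"""Report the filesystem type backing the LDR pipe directory (Linux only)."""
import os

# Filesystem types assumed safe for named pipes; adjust for your environment.
LOCAL_FS_TYPES = {"ext2", "ext3", "ext4", "xfs", "btrfs", "tmpfs", "zfs"}

def fs_type_of(path):
    """Return the filesystem type of the mount point that contains `path`,
    by parsing /proc/mounts and keeping the longest matching mount point."""
    path = os.path.realpath(path)
    best_len, best_type = -1, "unknown"
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _dev, mount_point, fs_type = line.split()[:3]
            prefix = mount_point.rstrip("/") + "/"
            if (path == mount_point or path.startswith(prefix)) \
                    and len(mount_point) > best_len:
                best_len, best_type = len(mount_point), fs_type
    return best_type

if __name__ == "__main__":
    fs = fs_type_of("/tmp/LDR.1")
    if fs in LOCAL_FS_TYPES:
        print(f"/tmp is on '{fs}': named pipes should behave normally")
    else:
        print(f"/tmp is on '{fs}': consider USE_PIPE=false")

On a system where /tmp has been remounted from a remote data store, the reported type will no longer be one of the local ones, which is exactly the situation that bit us here.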
So we deactivated the pipe (USE_PIPE=false) in the topology editor, deployed the topology, and the problem was solved.
I could not find a way to change where the pipe is created, so the location must be hard-coded somewhere. If you know how to do it, let me know :)