You have to do it at the application level anyway, no matter which
transactional DBMS you use. If the DBMS aborts the transaction, the
application has to consider retrying it and report a failure after,
for example, 3 attempts.
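
A minimal sketch of that application-level retry, in Python. The names
TemporaryError and run_with_retry are made up for illustration and are
not part of any MySQL or NDB API; a real application would catch
whatever "transaction aborted, safe to retry" error its driver raises.

```python
import time


class TemporaryError(Exception):
    """Hypothetical stand-in for a driver's 'aborted, retry is safe' error."""


def run_with_retry(txn, max_attempts=3, delay=0.0):
    """Call txn() until it succeeds or max_attempts attempts have failed.

    txn is assumed to run one complete transaction (begin, work, commit)
    and to raise TemporaryError if the DBMS aborted it.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except TemporaryError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait a little longer before each new attempt.
                time.sleep(delay * attempt)
    # Every attempt was aborted: give up and report the failure.
    raise RuntimeError(
        f"transaction failed after {max_attempts} attempts"
    ) from last_error
```

The whole transaction is rerun from the start on each attempt, since an
aborted transaction has been rolled back and nothing of it can be
resumed. Only errors the application knows to be temporary are retried;
anything else propagates immediately.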
In a high-availability, distributed DBMS cluster, not doing this in
the DBMS is a cop-out.
Is it possible to rerun the part of the transaction that failed on the
replica of the data node that failed instead of having to run it in a
larger cluster?
I'm just thinking that on an NDB install with a LOT of nodes, a
data node failing would require a LOT of extra work on the nodes that
stay online.
Also, shouldn't there be a section about this in the manual?
There should be a wiki for NDB that's actually used. The HashMySQL
wiki doesn't have a cluster section that I saw.