October 31, 2018 at 9:37 am #23674
We have three OpenDJ nodes with replication enabled (by the first node). If the replication service cannot start on one of the nodes for some reason (e.g. this bug, as in our case), we cannot write to that node, but reads are still possible.
How is failover handled in this situation? Can OpenDJ handle it somehow, assuming a correct setup of course, or is it up to the LDAP SDK (like UnboundID) to handle that?
Niclas
October 31, 2018 at 9:55 am #23675
Gentjan Kocaqi
Participant
I believe the term ‘failover’ is not used correctly in your description, or maybe I did not understand your question correctly.
You are saying that you have 3 instances of OpenDJ and that these instances are part of the same replication topology. If one of your instances hits the bug you reported, I agree that this instance should continue to serve ‘reads’ but not ‘writes’. And that makes sense if you think about it. Do you really want to allow ‘writes’ on an instance that is not working properly within the replication topology? No, you don’t, because you would end up with instances that are out of sync since replication is not working properly. If you read the workaround on that bug, it is clear that you need to disable replication, clean changelogDb and enable replication again (I would also add an initialization step to align this instance with the others).
Cheers
October 31, 2018 at 10:18 am #23676
I’m new to OpenDJ and directory databases/servers, so my terminology might not be correct. Failover functionality is what I’m looking for for our application, though. I agree that we don’t want to write to an instance that is not working properly, but in those cases we would like to write to one of the other instances instead. I would guess from your response and other forum posts that this is not something that OpenDJ supports? Rather, it would be up to each individual application to supply this functionality?
Regarding the bug, we have solved that already and will move to a newer version as soon as possible to avoid this situation in the future. We do have high requirements on stability though, which is why we are looking into these questions.
Br
October 31, 2018 at 10:32 am #23677
Gentjan Kocaqi
Participant
One thing you can do to speed things up in a situation like the one you ran into is to have monitoring in place for your OpenDJ instances. You could monitor whether reads and writes are working correctly on each instance and fire an alert if they are not.
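A rough sketch of that kind of read/write monitoring could look like the following. It is not OpenDJ-specific: `read_probe` and `write_probe` are hypothetical callables that you would implement with your LDAP client of choice (for example, a search on a known entry and a modify on a dedicated test entry), and they are assumed to raise an exception on failure. The host names in the usage note are made up too.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ProbeResult:
    host: str
    can_read: bool
    can_write: bool


def _succeeds(probe: Callable[[str], None], host: str) -> bool:
    """Return True if the probe finishes without raising an exception."""
    try:
        probe(host)
        return True
    except Exception:
        return False


def check_instances(
    hosts: Iterable[str],
    read_probe: Callable[[str], None],   # hypothetical: e.g. search a known entry
    write_probe: Callable[[str], None],  # hypothetical: e.g. modify a test entry
) -> List[ProbeResult]:
    """Run the read and the write probe against every instance."""
    return [
        ProbeResult(h, _succeeds(read_probe, h), _succeeds(write_probe, h))
        for h in hosts
    ]


def alerts_for(results: Iterable[ProbeResult]) -> List[str]:
    """Turn probe results into alert messages, one per failing operation."""
    alerts = []
    for r in results:
        if not r.can_read:
            alerts.append(f"{r.host}: reads failing")
        if not r.can_write:
            alerts.append(f"{r.host}: writes failing")
    return alerts
```

Run periodically (from cron or your monitoring system), something like `alerts_for(check_instances(["dj1", "dj2", "dj3"], read_probe, write_probe))` would flag an instance that still answers reads but rejects writes, which is exactly the state described in this thread.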
Regarding failover, you do this at the application level, but that might not solve this issue for you because, as far as I know, failover at the application level checks whether a certain first_service_in_failover is available and, if not, fails over to the next one, second_service_in_failover, and so on. Failover will not catch the case where your writes are not working properly. Anyhow, upgrading and having monitoring in place are my best suggestions here.
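To illustrate the difference, here is a minimal sketch of ordered failover where the health check is pluggable. The function `pick_server`, the server names and the two example checks are all hypothetical and only stand in for whatever your application or LDAP SDK actually does. With an availability-only check, the first server is chosen even though it cannot accept writes; with a check that tests writes, the selection moves on to the next server.

```python
from typing import Callable, Iterable, Optional


def pick_server(
    servers: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first server, in failover order, that passes the health check.

    A check that raises is treated the same as a failed check.
    Returns None if no server passes.
    """
    for server in servers:
        try:
            if is_healthy(server):
                return server
        except Exception:
            continue
    return None


if __name__ == "__main__":
    # Hypothetical state: every instance is reachable, but the first one
    # rejects writes because its replication service did not start.
    failover_order = ["first_service_in_failover", "second_service_in_failover"]
    reachable = {"first_service_in_failover", "second_service_in_failover"}
    writable = {"second_service_in_failover"}

    # Availability-only failover still picks the broken instance.
    print(pick_server(failover_order, lambda s: s in reachable))
    # A write-aware check skips it and picks the next one.
    print(pick_server(failover_order, lambda s: s in writable))
```

The point of passing the check in as a parameter is that the failover order stays the same while the definition of "healthy" can be tightened from "port is open" to "a test write succeeds".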
Cheers
October 31, 2018 at 11:45 am #23691