This topic has 4 replies, 2 voices, and was last updated 3 years ago by Tillman.

  • #23674
     Tillman
    Participant

    Hi,

    We have three OpenDJ nodes with replication enabled (set up from the first node). If the replication service cannot start on one of the nodes for some reason (e.g. this bug, as in our case), we cannot write to that node, but reads are still possible.

    How is failover handled in this situation? Can OpenDJ handle it itself, assuming a correct setup of course, or does it fall to the LDAP SDK (such as UnboundID) to handle it?

    Br
    Niclas

    #23675
     Gentjan Kocaqi
    Participant

    Hello @tillman,

    I believe the term ‘failover’ is not used correctly in your description, or perhaps I did not understand your question correctly.
    You are saying that you have 3 instances of OpenDJ and that these instances are part of the same replication topology. If one of your instances hits the bug you reported, I agree that this instance should continue to serve ‘reads’ but not ‘writes’. That makes sense if you think about it. Do you really want to allow ‘writes’ on an instance that is not working properly within the replication topology? No, you don’t, because you would end up with instances that are out of sync while replication is not working properly. If you read the workaround for that bug, it is clear that you need to disable replication, clean changelogDb, and enable replication again (I would also add an initialization step to align this instance with the others).

    Cheers

    #23676
     Tillman
    Participant

    Hi @gentjan-kocaqi,

    I’m new to OpenDJ and directory databases/servers, so my terminology might not be correct. Failover functionality is what I’m looking for in our application, though. I agree that we don’t want to write to an instance that is not working properly, but in those cases we would like to write to one of the other instances instead. I would guess from your response and other forum posts that this is not something that OpenDJ supports? Rather, it would be up to each individual application to supply this functionality?

    Regarding the bug, we have already solved that and will move to a newer version as soon as possible to avoid this situation in the future. We do have high stability requirements, though, which is why we are looking into these questions.

    Br

    #23677
     Gentjan Kocaqi
    Participant

    Hi @tillman,

    One thing you can do to speed things up in a situation like the one you ran into is to have monitoring in place for your OpenDJ instances. You could monitor whether reads and writes are working correctly on each instance and fire an alert if they are not.
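
    A minimal sketch of such a read/write check, in Python, might look like the following. It is not an OpenDJ feature or API: the `Instance` class, the probe callables, and the `alert` callback are hypothetical names chosen for illustration. In a real setup each probe would wrap an LDAP search or a modify of a dedicated test entry using whatever SDK you have; those calls are left out here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A probe is any zero-argument callable that raises on failure.
# In a real setup it would wrap an LDAP search, or a modify of a
# dedicated test entry; those LDAP calls are not part of this sketch.
Probe = Callable[[], None]


@dataclass
class Instance:
    name: str          # hypothetical label, e.g. "opendj-1"
    read_probe: Probe
    write_probe: Probe


def probe_ok(probe: Probe) -> bool:
    """Return True if the probe completes without raising."""
    try:
        probe()
        return True
    except Exception:
        return False


def check_instances(instances: List[Instance],
                    alert: Callable[[str], None]) -> Dict[str, Dict[str, bool]]:
    """Run both probes on every instance and alert on each failure."""
    report = {}
    for inst in instances:
        status = {"read": probe_ok(inst.read_probe),
                  "write": probe_ok(inst.write_probe)}
        for operation, ok in status.items():
            if not ok:
                alert(f"{inst.name}: {operation} probe failed")
        report[inst.name] = status
    return report
```

    Checking reads and writes separately matters in the situation described in this thread, where an instance still answers reads but rejects writes. Such a function would typically be run on a schedule, with `alert` wired to your own notification channel.
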
    Regarding failover, you do this at the application level, but that might not solve this issue for you because, as far as I know, failover at the application level checks whether a certain first_service_in_failover is available and, if it is not, fails over to the next one, second_service_in_failover, and so on. Failover will not detect that your writes are not working properly. Anyhow, upgrading and having monitoring in place are my best suggestions here.
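
    To illustrate the difference, here is a rough Python sketch of application-level failover that moves on when the write itself fails, not only when the server is unreachable. This is an assumption about how an application could be written, not something OpenDJ or a particular LDAP SDK provides. The `do_write` callable and the server names in the usage note are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllServersFailed(Exception):
    """Raised when the write failed on every server in the list."""


def write_with_failover(servers: Sequence[str],
                        do_write: Callable[[str], T]) -> T:
    """Try the write on each server in order; return the first success.

    `do_write(server)` is a hypothetical function that performs the LDAP
    write against one server and raises if the server is unreachable or
    rejects the write.
    """
    errors = []
    for server in servers:
        try:
            return do_write(server)
        except Exception as exc:  # narrow this to your SDK's exceptions
            errors.append(f"{server}: {exc}")
    raise AllServersFailed("; ".join(errors))
```

    A call such as `write_with_failover(["dj1", "dj2", "dj3"], do_write)` would try the servers in that order. Retrying on another server is only safe if a failed write was definitely not applied, or if the write is idempotent, so the exceptions caught here should be limited to cases where that holds.
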

    Cheers

    #23691
     Tillman
    Participant

    Hi @gentjan-kocaqi,

    I will look into monitors and what the application level can do. Thanks for educating me.

    Cheers


©2021 ForgeRock - we provide an identity and access platform to secure every online relationship for the enterprise market, educational sector and even entire countries. Click to view our privacy policy and terms of use.
