开发者

Quartz retry when failure

开发者 https://www.devze.com 2023-01-29 16:47 出处:网络
Let\'s say I have a trigger configured this way: <bean id=\"updateInsBBTrigger\" class=\"org.springframework.scheduling.quartz.CronTriggerBean\">

Let's say I have a trigger configured this way:

<bean id="updateInsBBTrigger"         
    class="org.springframework.scheduling.quartz.CronTriggerBean">
    <property name="jobDetail" ref="updateInsBBJobDetail"/>
    <!--  run every morning at 5 AM  -->
    <property name="cronExpression" value="0 0 5 * * ?"/>
</bean>

The trigger have to connect with another application and 开发者_如何学运维if there is any problem (like a connection failure) it should to retry the task up to five times every 10 minutes or until success. There is any way to configure the trigger to work like this?


I would recommend an implementation like this one to recover the job after a fail:

final JobDataMap jobDataMap = jobCtx.getJobDetail().getJobDataMap();
// the keys doesn't exist on first retry
final int retries = jobDataMap.containsKey(COUNT_MAP_KEY) ? jobDataMap.getIntValue(COUNT_MAP_KEY) : 0;

// to stop after awhile
if (retries < MAX_RETRIES) {
  log.warn("Retry job " + jobCtx.getJobDetail());

  // increment the number of retries
  jobDataMap.put(COUNT_MAP_KEY, retries + 1);

  final JobDetail job = jobCtx
      .getJobDetail()
      .getJobBuilder()
       // to track the number of retries
      .withIdentity(jobCtx.getJobDetail().getKey().getName() + " - " + retries, "FailingJobsGroup")
      .usingJobData(jobDataMap)
      .build();

  final OperableTrigger trigger = (OperableTrigger) TriggerBuilder
      .newTrigger()
      .forJob(job)
       // trying to reduce back pressure, you can use another algorithm
      .startAt(new Date(jobCtx.getFireTime().getTime() + (retries*100))) 
      .build();

  try {
    // schedule another job to avoid blocking threads
    jobCtx.getScheduler().scheduleJob(job, trigger);
  } catch (SchedulerException e) {
    log.error("Error creating job");
    throw new JobExecutionException(e);
  }
}

Why?

  1. It will not block Quartz Workers
  2. It will avoid back pressure. With setRefireImmediately the job will be fired immediately and it could lead to back pressure issues


Source: Automatically Retry Failed Jobs in Quartz

If you want to have a job which keeps trying over and over again until it succeeds, all you have to do is throw a JobExecutionException with a flag to tell the scheduler to fire it again when it fails. The following code shows how:

class MyJob implements Job {

    public MyJob() {
    }

    public void execute(JobExecutionContext context) throws JobExecutionException {

        try{
            //connect to other application etc
        }
        catch(Exception e){

            Thread.sleep(600000); //sleep for 10 mins

            JobExecutionException e2 = new JobExecutionException(e);
            //fire it again
            e2.setRefireImmediately(true);
            throw e2;
        }
    }
}

It gets a bit more complicated if you want to retry a certain number of times. You have to use a StatefulJob and hold a retryCounter in its JobDataMap, which you increment if the job fails. If the counter exceeds the maximum number of retries, then you can disable the job if you wish.

class MyJob implements StatefulJob {

    public MyJob() {
    }

    public void execute(JobExecutionContext context) throws JobExecutionException {
        JobDataMap dataMap = context.getJobDetail().getJobDataMap();
        int count = dataMap.getIntValue("count");

        // allow 5 retries
        if(count >= 5){
            JobExecutionException e = new JobExecutionException("Retries exceeded");
            //make sure it doesn't run again
            e.setUnscheduleAllTriggers(true);
            throw e;
        }


        try{
            //connect to other application etc

            //reset counter back to 0
            dataMap.putAsString("count", 0);
        }
        catch(Exception e){
            count++;
            dataMap.putAsString("count", count);
            JobExecutionException e2 = new JobExecutionException(e);

            Thread.sleep(600000); //sleep for 10 mins

            //fire it again
            e2.setRefireImmediately(true);
            throw e2;
        }
    }
}


I would suggest for more flexibility and configurability to better store in your DB two offsets: the repeatOffset which will tell you after how long the job should be retried and the trialPeriodOffset which will keep the information of the time window that the job is allowed to be rescheduled. Then you can retrieve these two parameters like (I assume you are using Spring):

String repeatOffset = yourDBUtilsDao.getConfigParameter(..);
String trialPeriodOffset = yourDBUtilsDao.getConfigParameter(..);

Then instead of the job to remember the counter it will need to remember the initalAttempt:

Long initialAttempt = null;
initialAttempt = (Long) existingJobDetail.getJobDataMap().get("firstAttempt");

and perform the something like the following check:

long allowedThreshold = initialAttempt + Long.parseLong(trialPeriodOffset);
        if (System.currentTimeMillis() > allowedThreshold) {
            //We've tried enough, time to give up
            log.warn("The job is not going to be rescheduled since it has reached its trial period threshold");
            sched.deleteJob(jobName, jobGroup);
            return YourResultEnumHere.HAS_REACHED_THE_RESCHEDULING_LIMIT;
        }

It would be a good idea to create an enum for the result of the attempt that is being returned back to the core workflow of your application like above.

Then construct the rescheduling time:

Date startTime = null;
startTime = new Date(System.currentTimeMillis() + Long.parseLong(repeatOffset));

String triggerName = "Trigger_" + jobName;
String triggerGroup = "Trigger_" + jobGroup;

Trigger retrievedTrigger = sched.getTrigger(triggerName, triggerGroup);
if (!(retrievedTrigger instanceof SimpleTrigger)) {
            log.error("While rescheduling the Quartz Job retrieved was not of SimpleTrigger type as expected");
            return YourResultEnumHere.ERROR;
}

        ((SimpleTrigger) retrievedTrigger).setStartTime(startTime);
        sched.rescheduleJob(triggerName, triggerGroup, retrievedTrigger);
        return YourResultEnumHere.RESCHEDULED;
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号