M/M/c Process-based with Failures

Processes represent typical event sequences. If "external" events such as failures alter those sequences, we have to handle them as interrupts or exceptions.

The following example shows how this can be done. We assume external streams of failure events (Poisson processes) that interrupt our server processes. We also assume that failures and repairs are independent of each other, so a repair can itself be interrupted by another failure.
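To make the notion of a Poisson failure stream concrete, here is a minimal sketch that samples the failure instants of one such stream. The helper `failure_times` and its parameters are hypothetical names used only for illustration; they are not part of DiscreteEvents. It relies only on `randexp` from the `Random` standard library.

```julia
using Random

# Hypothetical helper (not part of DiscreteEvents): failure instants of a
# Poisson stream with mean inter-failure time `mtbf`, up to time `horizon`.
function failure_times(rng::AbstractRNG, mtbf::Real, horizon::Real)
    times = Float64[]
    t = mtbf * randexp(rng)        # first exponential inter-failure gap
    while t <= horizon
        push!(times, t)
        t += mtbf * randexp(rng)   # gaps are independent and exponential
    end
    return times
end

rng = MersenneTwister(1)
ft = failure_times(rng, 8.0, 20.0)
println(ft)
```

Because the gaps are drawn independently of anything the server does, a failure can arrive while a repair is still under way, which is the situation assumed above.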

We introduce state variables that track whether a failure has occurred and whether a job is being worked on, and collect them in a server structure:

using DiscreteEvents, Printf, Distributions, Random

struct Failure end      # a failure event

mutable struct Server
    id::Int
    input::Channel
    output::Channel
    Ts::Distribution    # service time distribution
    Tr::Distribution    # repair time distribution
    job::Int            # current job number, 0 if the server has no job
    failed::Bool        # true while the server is failed or under repair
end

Our serve process then has two branches: one for regular service and another for failure handling. With a try ... catch ... end block we catch any exception and let a Failure event switch the process to the failure handling branch.

The regular service branch also has to be modified: we must take! a job from the input channel only if no other job is present, and we must reset the job status variable when a job is finished. We assume here that the service time or the repair time is restarted from scratch after any failure:

# describe the server process
function serve(c::Clock, s::Server)
    try
        if !s.failed
            if s.job == 0 
                s.job = take!(s.input)
                print(c, @sprintf("%5.3f: server %d serving customer %d\n", tau(c), s.id, s.job))
            end
            if s.job > 0
                delay!(c, s.Ts)
                print(c, @sprintf("%5.3f: server %d finished serving %d\n", tau(c), s.id, s.job))
                put!(s.output, s.job)
                s.job = 0
            end
        else
            print(c, @sprintf("%5.3f: server %d fails\n", tau(c), s.id))
            delay!(c, s.Tr)
            print(c, @sprintf("%5.3f: server %d back to work\n", tau(c), s.id))
            s.failed = false
        end
    catch exc
        if exc isa PrcException && exc.event isa Failure
            s.failed = true
        else
            rethrow(exc)
        end
    end
end

Compared with the serve process of the previous example without failures, this one has clearly become more complicated.
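The filtering done in the catch branch can be looked at in isolation. The following sketch uses plain Julia without a clock or tasks; `Breakdown`, `Interrupt`, `State` and `guarded` are hypothetical stand-ins for `Failure`, `PrcException`, `Server` and `serve` and are assumptions made for illustration only.

```julia
struct Breakdown end              # hypothetical failure event type

struct Interrupt <: Exception     # hypothetical stand-in for PrcException
    event
end

mutable struct State
    failed::Bool
end

# Run `work`; a Breakdown interrupt only flips the state flag,
# any other exception is passed on to the caller.
function guarded(work, s::State)
    try
        work()
    catch exc
        if exc isa Interrupt && exc.event isa Breakdown
            s.failed = true
        else
            rethrow()
        end
    end
    return s
end

s = guarded(() -> throw(Interrupt(Breakdown())), State(false))
println(s.failed)
```

Only the expected interrupt is absorbed. An unrelated error, for example one raised by `error("boom")` inside `work`, still propagates, so genuine bugs in the process function are not silently hidden.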

Next we need an arrivals function for a repeating event and some constants:

# model the arrivals
function arrive(c::Clock, input::Channel, jobno::Vector{Int})
    jobno[1] += 1
    @printf("%5.3f: customer %d arrived\n", tau(c), jobno[1])
    put!(input, jobno[1])
end

Random.seed!(8710)          # set random number seed for reproducibility
const N = 10                # total number of customers
const c = 2                 # number of servers
const μ = 1.0 / c           # service rate
const λ = 0.9               # arrival rate
const M₁ = Exponential(1/λ) # inter-arrival time distribution
const M₂ = Exponential(1/μ) # service time distribution
const F₁ = Exponential(8)   # inter-failure time distribution
const F₂ = Exponential(2)   # repair time distribution 
const jobno = [0]           # job counter
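As a rough plausibility check of these constants, the sketch below computes the nominal utilization and a simple capacity estimate with failures. It assumes the textbook steady-state availability MTBF / (MTBF + MTTR), which should be reasonable here because all times are exponential (memoryless); the variable names are illustrative and deliberately differ from the constants above.

```julia
arrival_rate = 0.9                 # λ
n_servers    = 2                   # c
service_rate = 1.0 / n_servers     # μ per server
mtbf, mttr   = 8.0, 2.0            # means of F₁ and F₂

# nominal utilization without failures: ρ = λ / (c μ)
utilization = arrival_rate / (n_servers * service_rate)

# assumed steady-state availability of one server
availability = mtbf / (mtbf + mttr)

# estimated long-run service capacity of the whole system with failures
capacity = n_servers * service_rate * availability

println((utilization, availability, capacity))
```

Under these assumptions the estimated capacity lies below the arrival rate, so the system would be overloaded if arrivals continued indefinitely. In this example only N customers arrive, so the queue still drains eventually.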

In parallel with the two servers and their serve processes, we start two Poisson processes that interrupt! them at random, exponentially distributed inter-failure times.

# initialize simulation environment
clock = Clock()
input = Channel{Int}(Inf)
output = Channel{Int}(Inf)
for i in 1:c
    s = Server(i, input, output, M₂, F₂, 0, false)
    p = Prc(i, serve, s)
    process!(clock, p)
    event!(clock, fun(interrupt!, p, Failure(), nothing), every, F₁)
end
event!(clock, fun(arrive, clock, input, jobno), every, M₁, n=N)

run!(clock, 20)

We get the following output:

0.231: customer 1 arrived
0.231: server 1 serving customer 1
0.672: server 1 finished serving 1
0.743: server 2 fails
0.885: server 1 fails
1.478: server 2 back to work
2.140: customer 2 arrived
2.140: server 2 serving customer 2
2.439: server 1 fails
3.040: customer 3 arrived
3.979: server 1 back to work
3.979: server 1 serving customer 3
...
11.035: server 2 back to work
11.035: server 2 serving customer 8
11.193: customer 10 arrived
11.375: server 1 finished serving 4
11.375: server 1 serving customer 9
11.658: server 1 finished serving 9
11.658: server 1 serving customer 10
12.059: server 2 finished serving 8
12.526: server 1 finished serving 10
12.870: server 1 fails
12.877: server 1 fails
13.034: server 1 back to work
"run! finished with 71 clock events, 0 sample steps, simulation time: 20.0"

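As you can see, we get repeated failures, longer cycle times and more disturbance in the job service sequence than in the example without failures. Note also that a repair can be interrupted by a new failure, as with server 1 at 12.870 and 12.877.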