My stack is as follows:

  1. EventBridge fires a Glue job at a regular interval.
  2. Said Glue job runs Python scripts, which run as Step Functions.
  3. The output of these scripts is saved to S3.

How can I monitor this? Ideally, I would like AWS (presumably CloudWatch talking to SNS) to email me if the Glue job fails, but my definition of "fails" is so broad that I feel like I'm solving the wrong problem. Below is a list of possible failures, but I'm hoping for a single solution general enough to cover all of them. If I could both always be alerted when there is a Glue error and make most possible failure cases trigger a Glue error, then I would probably be happy.

  • The Glue job simply doesn't fire.
  • The Glue job fires, but errors.
  • The Glue job fires, but the Step Functions do not.
  • The Step Functions error.

1 Answer

In the documentation there are examples of how to trigger events, and they all start with something like:

exit 1 unless topic_exists?(
  Aws::SNS::Client.new(region: 'us-east-1'),
  'arn:aws:sns:us-east-1:111111111111:aws-doc-sdk-examples-topic'
)

By POSIX convention, exit(1) returns a nonzero exit status, which should be treated as an error, and per the EventBridge monitoring docs, you should be able to see the counters incrementing when a rule exits with exit(1).
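The exit-status convention can be sketched in Python, the language of the Glue scripts in the question. This is a minimal illustration, not a confirmed Glue behavior: `run_step` is a hypothetical helper, and whether a given Glue job type marks a run as failed on a nonzero exit status is an assumption worth verifying against a test run.

```python
import sys


def run_step(step):
    """Run one unit of work and map the outcome to an exit status.

    Returns 0 on success and 1 if the step raised any exception,
    following the convention that a nonzero status means failure.
    """
    try:
        step()
    except Exception as exc:
        # Report the failure on stderr so it shows up in the job logs.
        print(f"step failed: {exc}", file=sys.stderr)
        return 1
    return 0


# Hypothetical usage at the end of a script:
#     sys.exit(run_step(main))
```

Funnelling every failure case through one nonzero exit means only a single condition has to be alerted on.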

I think you first have to start all your rules with the exit 1 unless... construct for triggering rules. Then check whether CloudWatch increments the error counter.
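Checking the counter could look roughly like the sketch below. It assumes the counter in question is the `FailedInvocations` metric that EventBridge publishes in the `AWS/Events` namespace (which counts failed target invocations), that `boto3` is installed with credentials configured, and that the rule name and region are placeholders to replace. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_failed_invocations_query(rule_name, end_time, hours=24):
    """Build get_metric_statistics parameters for one EventBridge rule."""
    return {
        "Namespace": "AWS/Events",
        "MetricName": "FailedInvocations",
        "Dimensions": [{"Name": "RuleName", "Value": rule_name}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Sum"],
    }


def total_failures(datapoints):
    """Add up the hourly sums returned by CloudWatch."""
    return sum(point["Sum"] for point in datapoints)


def fetch_failed_invocations(rule_name, region="us-east-1"):
    """Query CloudWatch for the rule's failures over the last 24 hours."""
    import boto3  # imported here so the pure helpers work without boto3

    client = boto3.client("cloudwatch", region_name=region)
    query = build_failed_invocations_query(rule_name, datetime.now(timezone.utc))
    response = client.get_metric_statistics(**query)
    return total_failures(response["Datapoints"])
```

A result above zero from `fetch_failed_invocations("my-rule")` would suggest the rule fired but its target could not be invoked; it says nothing about errors inside the Glue job itself.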

I hope this helps.

  • This is more of a guess than an answer.
    – J. Mini
    Commented May 13 at 18:08
