Intodata Nederland

Elastic observability helping reduce time-to-resolution

By Hugo D’Arcy – Data Engineer & Elastic Expert

During my time in tech, one problem keeps recurring: why did this break? Usually followed by: oh, this has been broken for days. We've all been there, and finding ways of getting to the root cause is nice; finding ways of getting there quickly is amazing. I've been working with the Elastic Stack for 18 months now, and whilst the standard version helped a lot with understanding various aspects of operational work and finding errors within data integrations, some of the features in the Gold tier on Elastic Cloud add some much-needed proactiveness to the monitoring.

This article covers some of the Gold-tier Elastic Cloud features I am looking forward to tinkering with to greatly reduce time-to-resolution, through a mix of proactive monitoring, more in-depth logs, metrics and traces, and a high-level overview of application environments. The example environment in this article consists of ActiveMQ (middleware), Talend (integration/ETL) and Postgres (source database); feel free to swap each component for your own environment's equivalent.

Probably the most important feature for finding root causes significantly easier and faster is third-party alerting: when a process fails, an alert is sent to Slack, email or Teams, based on specific custom rules. This serves as the catalyst for our root cause analysis. For our example, the rule fires when the ActiveMQ dead-letter queue contains a message, which gives us proactive monitoring instead of someone stumbling across it days later.
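As a rough sketch of what such a rule could look like when created through Kibana's Alerting API, the snippet below builds the request payload. The `.es-query` rule type is a real built-in, but the index pattern, the queue-depth field name and the connector id are placeholders for whatever your metrics pipeline actually ships, not values from a real environment:

```python
import json

def build_dlq_alert_rule(connector_id: str,
                         index: str = "metrics-activemq.queue-default") -> dict:
    """Sketch of a Kibana alert rule (POST /api/alerting/rule) that fires
    when the ActiveMQ dead-letter queue reports queued messages."""
    return {
        "name": "ActiveMQ DLQ not empty",
        "rule_type_id": ".es-query",        # built-in Elasticsearch query rule
        "consumer": "alerts",
        "schedule": {"interval": "1m"},     # evaluate every minute
        "params": {
            "index": [index],
            "timeField": "@timestamp",
            # Hypothetical field name: match any doc where DLQ depth > 0.
            "esQuery": json.dumps({
                "query": {"range": {"activemq.queue.dlq.count": {"gt": 0}}}
            }),
            "threshold": [0],
            "thresholdComparator": ">",
            "size": 1,
            "timeWindowSize": 5,
            "timeWindowUnit": "m",
        },
        # Route the firing alert to Slack/email/Teams via a connector.
        "actions": [{
            "group": "query matched",
            "id": connector_id,
            "params": {"message": "Dead-letter queue has messages - investigate."},
        }],
    }

rule = build_dlq_alert_rule("slack-connector-id")
```

Posting this payload to Kibana (with the right API key) is all that separates "someone notices days later" from "the on-call channel pings within a minute".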

Another useful feature is Synthetic Monitoring, which lets us write scripts that check various states of our applications. I will experiment with a script that checks the status of the dead-letter queue, returning a response when it is not empty, and tie this to an alert rule. This triggers our catalyst, allowing the on-call team to start their incident analysis the moment the issue occurs and pointing them at the precise moments to investigate. This is particularly useful because we can craft elegant checks tailored to our own environments. From there, all the logs and metrics from ActiveMQ and Talend are already in our Elastic Stack, so we can pinpoint exactly where the issue came from.
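The core of such a check is tiny. Elastic's synthetic monitors are themselves written in TypeScript or configured in YAML, so the Python below is purely an illustration of the logic: ActiveMQ exposes JMX over HTTP via Jolokia (on a default broker at /api/jolokia), and the broker name `localhost` and queue name `ActiveMQ.DLQ` are the ActiveMQ defaults; adjust them for your environment:

```python
import json
import urllib.request

# Default ActiveMQ MBean for the dead-letter queue; both names may differ
# in your broker configuration.
DLQ_MBEAN = ("org.apache.activemq:type=Broker,brokerName=localhost,"
             "destinationType=Queue,destinationName=ActiveMQ.DLQ")

def dlq_status(jolokia_reply: dict) -> tuple:
    """Decide from a Jolokia 'read' reply whether the DLQ needs attention.

    Returns (alert, depth): alert is True when the queue is not empty.
    """
    depth = int(jolokia_reply.get("value", 0))
    return depth > 0, depth

def check_dlq(base_url: str = "http://localhost:8161") -> tuple:
    """Fetch the DLQ depth from a live broker and return (alert, depth)."""
    url = f"{base_url}/api/jolokia/read/{DLQ_MBEAN}/QueueSize"
    with urllib.request.urlopen(url) as resp:
        return dlq_status(json.load(resp))

# The decision logic alone, exercised on a canned Jolokia reply:
alert, depth = dlq_status({"status": 200, "value": 3})
```

Whether this runs as a synthetic monitor, a cron job, or a heartbeat check matters less than the fact that the "is the DLQ empty?" question gets asked every minute instead of never.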

As a result, by using precise alerting rules and leveraging our knowledge of the middleware, we can proactively notify ourselves, greatly reducing time-to-resolution.

Now let us turn to another avenue for improving observability: traces. A trace is a group of transactions and spans with a common root. I am intrigued to place an Elastic APM agent on a Talend remote engine's JVM to see what kind of traces it returns and how these can improve efficiency and performance. We can use this to surface exceptions and errors within our jobs, routes and connectors to other applications such as databases, thereby aiding the development and optimization of our data integrations. Together with the two features above, this will greatly aid our quest to find root causes quickly and efficiently.
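Attaching the agent is mostly a matter of JVM flags. The `-javaagent` flag and the `elastic.apm.*` system properties are the Java agent's documented configuration mechanism, but the jar paths, service name and APM server URL below are placeholders; this sketch just assembles the command line so the shape is visible:

```python
import shlex

def jvm_command_with_apm(app_jar: str,
                         agent_jar: str = "/opt/elastic-apm-agent.jar",
                         service_name: str = "talend-remote-engine",
                         apm_server: str = "https://apm.example.com:8200") -> list:
    """Build a java command line with the Elastic APM agent attached."""
    return [
        "java",
        f"-javaagent:{agent_jar}",                        # load the agent at startup
        f"-Delastic.apm.service_name={service_name}",     # how it appears in Kibana
        f"-Delastic.apm.server_url={apm_server}",         # where traces are shipped
        "-Delastic.apm.application_packages=org.talend",  # trim stack frames to ours
        "-jar", app_jar,
    ]

cmd = jvm_command_with_apm("/opt/talend/remote-engine.jar")
print(shlex.join(cmd))
```

Once the engine restarts with these flags, every job run shows up as a trace, and a slow database span stands out immediately instead of hiding in a log file.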

Lastly, the services inventory gives us a high-level overview of all applications within an environment, with all their dependencies and connections. At a glance we can see useful metrics and logs. Sometimes an error creeps into our dead-letter queue but stems from somewhere completely different, say a faulty database connector. Using the services inventory, we can see that the logs for the database are spiking due to an error in the connector, and that this was the root cause of our issue.
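The same "which service's errors spiked first?" question can also be asked of the log indices directly. A minimal sketch of an Elasticsearch aggregation that buckets error logs per service over time is below; the field names follow ECS conventions (`log.level`, `service.name`, `@timestamp`) but whether your ingest pipeline populates them is an assumption:

```python
def error_spike_query(lookback: str = "now-1h") -> dict:
    """Aggregation body: error counts per service, per minute, over a window."""
    return {
        "size": 0,  # we only want the aggregations, not the documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"log.level": "error"}},
                    {"range": {"@timestamp": {"gte": lookback}}},
                ]
            }
        },
        "aggs": {
            "per_service": {
                "terms": {"field": "service.name", "size": 10},
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "fixed_interval": "1m",
                        }
                    }
                },
            }
        },
    }

q = error_spike_query()
```

Run against the environment's log indices, the earliest bucket that spikes, the database connector in our example, points at the real culprit rather than the queue where the symptom surfaced.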

In conclusion, I am really intrigued by the full scale of features within the Elastic Stack for proactive observability. Be it a precise alert on a dead-letter queue no longer being empty, a slow span from a Talend remote engine, or a faulty connector surfaced in the services inventory, these features can save hours if not days of downtime, not to mention the time saved on root cause analysis itself.
