Rhize Up

Rhize Up w/ David Schultz: Manufacturing Data Hub? (feat. Andy German and Geoff Nunan)


David: Good morning, good afternoon, good evening, and welcome to the Rhize Up podcast. My name is David Schultz, and today we’re going to talk about the Manufacturing Data Hub.

We’ll continue to draw on our conversation around the Unified Namespace (UNS) and the Uber Broker. We’ll also talk about some of the elements of the ISA-95 standard, going back to our first two episodes of the podcast. It’s going to contain some of the same elements. So, we’re going to talk a little bit about composition versus inheritance. We’re going to talk a little bit about orchestration versus choreography. And we’re even going to talk a little bit about event-driven architecture.

But before we get to this, let’s introduce our two guests. I am joined today by Geoff Nunan and Andy German. So, Geoff, take a moment. And for people who don’t know you, make a quick introduction. Tell us a little bit about who you are.

Geoff: Thanks, David. My name is Geoff, and I’m the co-founder and CTO of Rhize. I’ve been doing control systems and manufacturing software since 1994, so 30 years this year. I’m obviously Australian, as you will tell from the accent, but I have been doing this around the world for so many years. Yeah, I’m excited to be here and excited for the conversation.

David: Excellent. So, Andy, not an Aussie but a Brit. Let’s learn a little bit about you.

Andy: Yeah, absolutely. I’m Andy. I’m based out of the UK – somewhere close to Manchester. Which is a town some people may recognize – mainly football followers, I think.

I’ve been with Rhize for three years now. But I have a much longer history. I’ve been around for about 25 years in this space. The first half of that was mostly in software engineering roles and, more recently, in architecture/leadership roles. So, yeah, it makes me feel old when I say 25 years.

David: Hey, I’ve been doing it for 25 years as well, so I totally get it. But I still feel young. When I say “the young people,” I sometimes still think of me in that as well. So, Andy and Geoff, welcome to the podcast. I’m looking forward to this conversation. Thanks for joining.

REVISITING UNIFIED NAMESPACE

I want to do a quick level set. I’m just going back to a couple of our previous episodes. One was around the Unified Namespace, and the purpose of that episode was to talk a little bit about the definition. Let’s socialize that definition, just so there’s a good understanding and an agreed set of terms there.

Where we landed is it’s an approach for an event-driven architecture that uses a Pub/Sub technology like a broker. It also applies a DataOps tool that defines the data models, as well as the topics, and where that data will be exchanged. 
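To make that concrete, here is a minimal sketch of what subscribing to one branch of a UNS on a Pub/Sub broker can look like. The broker host, topic path, and JSON payload are illustrative assumptions, not a prescribed UNS or Rhize interface:

```python
# Minimal sketch: consuming one branch of a UNS from an MQTT broker (paho-mqtt).
# Broker host, topic hierarchy, and payload shape are illustrative assumptions.
import json
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)  # model defined by the DataOps layer, e.g. {"value": 72.4}
    print(f"{msg.topic}: {payload}")

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.local", 1883)
client.subscribe("acme/dallas/packaging/line1/#")  # hypothetical enterprise/site/area/line path
client.loop_forever()
```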

One of the things that we all agreed on was that’s a pretty generic definition.

And that’s a good segue into what we were talking about with Rick Bullotta—what he calls his Uber Broker. I think it’s a codename for what it is he’s trying to build. The Uber Broker tries to solve some of the things that the UNS doesn’t offer.

First, there’s the ability to call functions against some sort of endpoint; I need to be able to call methods on that. There needs to be the ability to query the UNS. And I also need to start carrying some metadata in there. Bringing all those things together, I realize there are a few more parts that are going to be part of it. The UNS really just creates that core architecture, that core ecosystem.

And then we’re going to expand that. So, we now have the ability to get the data back. What’s really important to us is understanding what’s actually happening on our plant floor, and having the ability to interrogate and analyze and do all the things we need with that data.

So with that, let’s talk a little bit about what a Manufacturing Data Hub is, then. How is that different? My position is that a Data Hub contains a UNS. I would say it contains the Uber Broker, or has elements of that. Geoff, can you tell us a little bit about when somebody asks you what a Manufacturing Data Hub is, what do you tell them?

HOW DO YOU DEFINE A MANUFACTURING DATA HUB

Geoff: Yeah. Great question. And obviously, it’s one that we get asked about a lot. So, if we start with the context of the UNS, which is that Pub/Sub broker and the unified topic namespace, it’s really a mechanism for communicating between systems. Right? It’s an integration pattern. It’s somewhere that you can go and get the current context of whatever is in there.

But it’s also the mechanism for systems to communicate with each other. So, then you’ve got what Rick Bullotta was talking about, which is okay, but I need to be able to kind of do a little bit more than that. I need to be able to separate things out into different topics. I need to call functions. I might need to be able to query both that current state and history, which means that history is being stored in some way. Right? So, if a Manufacturing Data Hub is an onion, the UNS is the layer around the outside.

It’s how systems communicate. But then inside that, you’ve got functions, and inside that, you’ve got storage. You’ve got a database. You’ve got a schema that you’re working with. You’ve got an application logic layer to have functions. And you can integrate that in with all the things that are connected to your UNS. It’s an application that sits inside the context of a UNS. So it’s that unified application layer if you like. 

Read more about defining an MDH → 

How it came about is really from two different places. As I said, I’ve been working in this industry for 30 years. Many years ago, I did a lot of MES (manufacturing execution system) implementations using off-the-shelf MES software.

Read MES is Dead! Long Live MES → 

I’ve done a lot of those over the years, and I really got frustrated with how difficult it had become to do quite simple things. I was always battling against the hard constraints of an off-the-shelf system to do whatever the customer or the business wanted to do. It was never a fantastic fit between the requirements and the capabilities of the platform. Maybe you get 80%. But then the other 20% of the requirements would take 80% of the time to build. 

So, I was trying to come up with the concept of saying, “Well, what if we followed the pattern that exists in other industries of a headless pattern of having the hard computer science part done in a platform?” But then being able to leverage low-code, front-end application development tools to be able to do the application development bit a whole lot simpler. 

So, it came from that and the other set of requirements, which were born out of seeing a lot of data scientists come into manufacturing and spending a whole bunch of their time doing the tedious bit of collecting and contextualizing and cleaning the data so that the data science team could actually then do the magic—the data science bit.

And so these two things go together: the need for a platform that can collect, contextualize, and cleanse this data so that I can leverage low-code, front-end tools to build applications, but also the need to stream that nice, clean, contextualized data set out to enable the AI and data science teams to be a whole lot more efficient. To save them from 80% of the data collecting and cleansing work and be able to stream good data to them.

So it came from those two needs that we said, “Okay if we were to take the best of modern technology and put it together, what would it look like to solve those two problems?” The answer was it’s kind of like a UNS, with some storage and with an application layer, to be able to do the application functions and the stream processing and the other bits to it. That’s really what a Manufacturing Data Hub is.

David: Yeah. Excellent. We talked a lot about “don’t buy your MES, build your MES.” So it sounds like you actually built your own MES. And, of course, the next thing you do is make this commercially available.

But I think what’s different here is that it’s much like the UNS. There’s an approach to how you’re doing it. As we define a Manufacturing Data Hub, there are some elements to it that need to be there. And, of course, within the Rhize platform itself, there are some services that need to run in there to make all of this stuff happen.

UNPACKING THE COMPONENTS OF A MANUFACTURING DATA HUB

With that, Andy, let’s just talk a little bit about some of those services that are available. There’s an edge agent. We have a rule engine. There’s a workflow engine. We utilize a graph database, or a knowledge graph, to use another term for it. And, of course, there’s a queryable endpoint. So, let’s just talk a little bit about some of the services that you would need, generically speaking, for a Manufacturing Data Hub, but specifically for the Rhize Manufacturing Data Hub platform. What’s the role of the edge agent? What is it doing?

Explore the Rhize Manufacturing Data Hub → 

Andy: Okay. So, in broad terms, what we generally try to do with any of our project implementations is bring data in, land it into a persistence layer, generally speaking, into the graph, and make sure it goes in the right place as it comes in.

ABOUT THE GRAPH DATABASE

So, working back from the graph database. The graph database is a persistence layer with a predefined schema. It’s an ISA-95 schema. We’ve got the whole object model within our graph database. We’re looking to land data from the process and also put definition data into the graph database, which sets us up to be ready to respond to events that we’ve got coming in from the shop floor.

I normally describe it as two streams of data that we get from a shop floor. We have what I would describe as telemetry data, process data, or tag data. Generally speaking, we would be abstracted from the low-level protocol layers. We’re not necessarily talking to PLCs. We’re normally looking to connect to an OPC UA server, an MQTT broker, or some other broker – something that can provide our data at a basic level for telemetry purposes on a kind of topic structure.
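As a rough illustration of that kind of abstracted connection, here is a sketch of polling a couple of tags from an OPC UA server with the python-opcua library; the endpoint URL and node ids are made up:

```python
# Sketch: reading telemetry tags from an OPC UA server (python-opcua).
# The endpoint URL and node ids are illustrative assumptions.
from opcua import Client

client = Client("opc.tcp://opcua.example.local:4840")
client.connect()
try:
    run_speed = client.get_node("ns=2;s=Line1.RunSpeed").get_value()
    order_no = client.get_node("ns=2;s=Line1.OrderNumber").get_value()
    print({"run_speed": run_speed, "order_number": order_no})
finally:
    client.disconnect()
```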

ABOUT THE EDGE AGENT

So our edge agent is the part that takes care of that. The edge agent, broadly speaking, provides the connectivity layer to bring the data in. It provides a bit of filtering and that first layer of contextualization. The edge agent is able to determine, according to its configuration, what to do when a particular tag value comes in from the process, so we can land that data on the right piece of equipment.

I would say the edge agent provides us with this stream of tag values and allows us to position the data alongside the correct equipment, in a time series context, and in a broker context as well. So the data becomes available in its current state via subscription to the broker. Also it’s queryable, because the agent has pushed that data through a sequence of steps that allows the data to arrive down in the time series layer.

ABOUT THE RULE ENGINE

So, we’ve got our ingress processing for tag data, which also includes a rule engine. That’s another one of the services that we’ve got in the stack. What we do with the rule engine, again, it’s a configuration piece. We configure rules that will effectively sit on a tag or a collection of tags on a particular type of equipment or a particular equipment class.

We’ll look at those values coming in from the process, and we’re able to do simple filtering, simple arithmetic, and simple evaluation on those discrete values. We can decide whether we’re going to produce a second type of event, which I would describe as a complex event. Simply put, something’s happened. So what the rule engine does is allow us to look at a stream of data coming off a particular piece of equipment and then infer from that data whether something significant has happened.

The use cases are reasonably simple. If we see that the order number, the product number, or a barcode scan has changed, that means the material may have changed. So that may be an event like the order has changed on a particular piece of equipment. Or we may see the run speed go to a certain value, which is higher than a predetermined threshold value. So we now know that equipment is running at full speed, and we now know that equipment is out of setup, for example. Our rule engine allows us to interpret a raw stream of data coming in from the equipment and create or infer more complex events from that.
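As a sketch of the kind of inference Andy describes (not the Rhize rule engine’s actual configuration format), a rule over a couple of tags might look something like this, with the tag names, threshold, and event shapes as assumptions:

```python
# Sketch: rule-engine-style inference over a stream of tag values.
# Tag names, the threshold, and event shapes are illustrative assumptions.
FULL_SPEED_THRESHOLD = 120.0  # e.g. units per minute

def infer_events(previous: dict, current: dict) -> list:
    """Compare consecutive tag snapshots and emit inferred 'complex' events."""
    events = []
    if current.get("order_number") != previous.get("order_number"):
        events.append({"type": "OrderChanged",
                       "equipment": current["equipment"],
                       "order_number": current.get("order_number")})
    if (previous.get("run_speed", 0) < FULL_SPEED_THRESHOLD
            <= current.get("run_speed", 0)):
        events.append({"type": "SetupComplete",  # running at full speed => out of setup
                       "equipment": current["equipment"]})
    return events

# Example: the order changes and the line ramps up to full speed.
prev = {"equipment": "Line1", "order_number": "WO-1001", "run_speed": 15.0}
curr = {"equipment": "Line1", "order_number": "WO-1002", "run_speed": 130.0}
print(infer_events(prev, curr))
```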

And what the rule engine does is allows us to say, “I’ve found something interesting in our process, what are we going to do with this?” So, at that point, the rule engine will bundle the context that’s available at that moment in time, and it can be configured to push the event details across to our workflow engine. Our workflow engine can then be used to compose a sequence of business logic effectively, which will allow us to persist something more complicated and more sophisticated into the database. Then, at that stage it can be used to resolve some of the relationships that we’ve got. This is where it becomes a little bit more abstract because when you’re talking about workflows and rules and events and relationships and that kind of thing, it’s the point at which you can kind of get lost and find yourself talking about concepts and abstractions rather than the actual things.

I’ll give you a concrete example of what I was saying before. If we can detect that the process values have changed in such a way that we know a new order is running on a piece of equipment, then what we can do is access our graph and run a couple of queries to find out what the next order is, for example, or whether there are any orders in the system that match the conditions we’ve just detected. We can head into the graph and start performing mutations to say that a new order was started at this time, and we know that this user was logged onto the line because we’ve got some other context. We can then start to create a new context for the next bunch of data that’s coming in from the line. So, we may create a new set of records, job responses, to use the ISA-95 terminology, and all the new process data that’s coming in will be streamed into that specific job response context. The job response will have a relationship in our graph to a process segment definition, an operations segment definition, or a scheduled order. So, it’s this chain of events that allows us to explore and understand the contextualization, enrich what’s happening, and persist that into our graph layer.
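A workflow step like the one Andy walks through could be sketched as below. The GraphQL endpoint, query fields, and mutation names are illustrative assumptions, not the actual Rhize schema:

```python
# Sketch: a workflow step reacting to an inferred "order changed" event.
# Endpoint URL, query, and mutation are illustrative assumptions, not the Rhize schema.
import requests

GRAPHQL_URL = "https://rhize.example.local/graphql"  # hypothetical endpoint

FIND_ORDER = """
query FindOrder($equipment: String!, $orderNumber: String!) {
  jobOrders(filter: {equipment: $equipment, id: $orderNumber, state: SCHEDULED}) {
    id
  }
}
"""

START_JOB_RESPONSE = """
mutation StartJobResponse($jobOrderId: String!, $startTime: DateTime!) {
  addJobResponse(input: {jobOrder: $jobOrderId, startDateTime: $startTime}) {
    id
  }
}
"""

def handle_order_changed(event: dict) -> None:
    # Look for a scheduled order in the graph that matches the inferred event.
    resp = requests.post(GRAPHQL_URL, json={
        "query": FIND_ORDER,
        "variables": {"equipment": event["equipment"], "orderNumber": event["order_number"]},
    })
    orders = resp.json()["data"]["jobOrders"]
    if not orders:
        return  # nothing scheduled that matches; a real workflow might raise an exception event
    # Record a new job response so subsequent process data lands in that context.
    requests.post(GRAPHQL_URL, json={
        "query": START_JOB_RESPONSE,
        "variables": {"jobOrderId": orders[0]["id"], "startTime": event["timestamp"]},
    })

handle_order_changed({"equipment": "Line1", "order_number": "WO-1002",
                      "timestamp": "2024-05-01T08:00:00Z"})
```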

That’s one part of it. The other side of that, which I’ve not really touched on, is that I’m only talking about equipment tag data. But the other interesting events that we can have coming through our system are complex events coming from other systems or other actors within the environment.

So it could be that an SAP order is released, for example. Or it could be that an MES system emits an event that says an order is stopped, or an order is started, or a machine has gone down. Alongside the raw data that we’re ingesting into the platform, we’re also ingesting these more complex events that have themselves got embedded relationships within the data that’s being published.

An SAP order, for example, is a very complex object. There will be an order number. There will be a bunch of materials to be consumed, materials to be produced, and that kind of thing. We’re processing that kind of data not through the rule engine but through our workflow engine in a slightly more sophisticated way. The goal here is to make sure that we’re persisting data into the right place. So we’re persisting data into time series, but we’re also persisting these complex objects in the graph so they can be queried at a later point. We’re trying to build up a full picture of what’s going on in the manufacturing environment, in graph and time series, so that this data can be consumed in a number of different ways later on.
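For illustration, here is a sketch of reshaping a simplified ERP order payload into an ISA-95-style operations request before persisting it; the field names are assumptions, not an actual SAP document or Rhize format:

```python
# Sketch: reshaping a simplified ERP order payload into an ISA-95-style
# operations request. Field names are assumptions, not an actual SAP or Rhize format.
def to_operations_request(erp_order: dict) -> dict:
    return {
        "id": erp_order["order_number"],
        "segmentRequirements": [{
            "materialRequirements": (
                [{"materialDefinition": m["material"], "quantity": m["qty"], "use": "Consumed"}
                 for m in erp_order["components"]]
                + [{"materialDefinition": erp_order["product"],
                    "quantity": erp_order["target_qty"], "use": "Produced"}]
            )
        }],
    }

erp_order = {
    "order_number": "WO-1002",
    "product": "FG-500",
    "target_qty": 10000,
    "components": [{"material": "RM-12", "qty": 250}, {"material": "PKG-7", "qty": 10000}],
}
print(to_operations_request(erp_order))
```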

ABOUT THE GRAPHQL ENDPOINT

The last thing is that this all comes through and is accessible through a query endpoint, and our query endpoint is a GraphQL endpoint. That allows a consumer to consume the data that we’ve been persisting via query. So you can hit our GraphQL endpoint and look at the job orders that are running and the job orders that have run on the line. You can look at the current process values through the GraphQL endpoint. We federate our time series database, so you can also get history values right from the middle of a GraphQL query. Because we’ve got quite a convenient architecture within Rhize – we’ve got the NATS broker embedded, we’ve got the graph embedded, and we’ve got the time series embedded. They’re quite well integrated together.
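A consumer-side query against such an endpoint might look roughly like the sketch below; the endpoint URL and field names, including the federated history field, are illustrative assumptions:

```python
# Sketch: one request pulling graph objects and federated time-series history together.
# Endpoint and field names are illustrative assumptions, not the actual Rhize schema.
import requests

QUERY = """
query LineHistory($equipment: String!, $start: DateTime!, $end: DateTime!) {
  jobResponses(filter: {equipment: $equipment, startedAfter: $start}) {
    id
    startDateTime
    jobOrder { id }
  }
  # federated time-series values returned alongside the graph objects
  history(tag: "Line1.RunSpeed", start: $start, end: $end) {
    timestamp
    value
  }
}
"""

response = requests.post(
    "https://rhize.example.local/graphql",
    json={"query": QUERY,
          "variables": {"equipment": "Line1",
                        "start": "2024-05-01T00:00:00Z",
                        "end": "2024-05-01T08:00:00Z"}},
)
print(response.json()["data"])
```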

We’ve also got quite a number of options for getting data via subscription as well. We’ve got a natural Pub/Sub option. So if, for example, an order has been started, it’s a fairly trivial exercise to get Rhize set up so that we emit that event and make that available either through our broker, through MQTT, or maybe we push it down to a queue like Kafka or something like that.
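Consuming one of those emitted events over NATS could look something like this sketch, using the nats-py client with a hypothetical server URL and subject name:

```python
# Sketch: consuming an emitted "order started" event over NATS (nats-py).
# The server URL and subject name are illustrative assumptions.
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://nats.example.local:4222")

    async def on_order_started(msg):
        print(f"{msg.subject}: {msg.data.decode()}")

    await nc.subscribe("events.jobOrder.started", cb=on_order_started)
    await asyncio.sleep(60)  # listen for a minute, then shut down cleanly
    await nc.drain()

asyncio.run(main())
```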

So when the data comes in and gets persisted, we’ve got a bunch of convenient methods for not only persisting that data but also allowing it to bounce back out of the platform, into other platforms, and be published to other interested parties. I think the difference is that what we’re doing is reacting to the data as it gets persisted into the knowledge graph, and the moment after it’s been persisted, the data becomes available for Pub/Sub again. At that next stage of Pub/Sub, the graph gives us convenient access to the relationships.

A long explanation there, David, but, I think that tries to sum up some of the services that we’ve got in the platform and what we’re trying to do for the use cases that we’re interested in.

David: Yeah. Awesome. There’s certainly a lot in there. So let me see if I understand, and I’ll relay what I just heard back to you.

I would think that the edge agent is, in many ways, what we’re going to call the UNS piece of the Manufacturing Data Hub. This is all that edge data. Ideally, it’s been contextualized and normalized in some way. That’s mostly where we’re going to notice something change down at the PLC or in the SCADA layer. Something is occurring there. Is that a fair comparison to the edge agent?

Andy: It’s correct. Yeah. And the agent’s responsible for bringing that in.

Geoff: It’s also doing a bit of what Rick Bullotta was describing in the Uber Broker, which is republishing or breaking out a message and publishing it into different bits. What the edge agent is doing is taking a sensor and putting it in the context of your UNS equipment hierarchy. Okay, let’s say we’ve got a temperature sensor, which might be “Temperature Sensor 1” in the PLC. It will come in through the edge agent as “Temperature Sensor 1.” But it will get re-published into the UNS as equipment, maybe it’s “injection molding machine input temperature.”

So, it’s doing the data collection and the Pub/Sub broker bits, but it’s also binding into the standardized model. I know people tend to have different areas for sensor data in the UNS structures that then get re-published into a unified equipment hierarchy structure. The edge agent is doing that bit, as well, in that re-publishing and the breaking out of one message into multiple messages.
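A bare-bones sketch of that re-publish step, with a hypothetical mapping from raw PLC tags to equipment-context topics:

```python
# Sketch: the re-publish step Geoff describes. A raw PLC tag value is broken out
# and re-published under its place in the equipment model. Names are illustrative.
TAG_TO_EQUIPMENT_TOPIC = {
    "PLC1/TemperatureSensor1": "site/area/injection-molding-machine-1/input-temperature",
}

def republish(topic: str, payload: bytes) -> list:
    """Return the (topic, payload) messages to publish back into the UNS."""
    target = TAG_TO_EQUIPMENT_TOPIC.get(topic)
    return [(target, payload)] if target else []

print(republish("PLC1/TemperatureSensor1", b'{"value": 72.4}'))
```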

David: Yeah. To me, the rule engine is great for when the data I’m getting at the edge doesn’t let me determine what’s actually happening. So that’s when I create a rule to say that the temperature sensor changed, or perhaps the state of the machine changed, but based on its relationship to other data. That’s what the rule engine is doing: this thing changed. What does that mean? Ah, it means this is the outcome there. Now, I can apply the rule to it. And it’s that republish of Temperature Sensor 1. Now, it’s a piece of equipment, and something is occurring there. Is that correct?

Geoff: Not quite. So that’s all in the edge agent. This temperature has come in, and it’s republishing that and putting that in context and saying, okay that’s actually injection molding machine input temperature.

What the rule engine is doing is applying rules to that. And those rules are data rules. For example, we receive the temperature. But is that temperature good or bad? Is it too high? Is it too low?

So the rule engine is actually somewhere you could apply limits to the temperature and then republish an event that says, “actually we’ve had an over temperature event here.”

It’s putting that temperature in the context of other streams of data or other topics. It also allows you to run an expression across multiple topics to determine whether something interesting has happened.
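For example, an expression spanning two topics might only raise an over-temperature event while the machine is actually running; in this sketch the topic names and the limit are assumptions:

```python
# Sketch: a rule expression across two topics. Only raise an over-temperature event
# while the machine is running. Topic names and the limit are illustrative assumptions.
HIGH_LIMIT_C = 85.0

latest = {}  # most recent value per topic, updated as messages arrive

def on_value(topic: str, value):
    latest[topic] = value
    too_hot = latest.get("imm1/input-temperature", 0) > HIGH_LIMIT_C
    running = latest.get("imm1/state") == "RUNNING"
    if too_hot and running:
        return {"type": "OverTemperature", "equipment": "imm1",
                "temperature": latest["imm1/input-temperature"]}
    return None

on_value("imm1/state", "RUNNING")
print(on_value("imm1/input-temperature", 91.2))  # -> OverTemperature event
```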

David: Perfect. Well, thanks for that explanation because I definitely misunderstood there. Great clarification.

Once we have the rule and we’ve just said, “Yep, this is proper; this is good,” now we’re going to instantiate a workflow. To me, this is really where it gets hard in manufacturing execution or manufacturing operations management. What are all the steps that actually need to occur in this workflow? It could be just some movement of data, but in this workflow, there could also be some interactions with other systems that need to occur, in the traditional way we look at workflow. Is ours doing the same thing there as well?

Geoff: Here’s an example most people can kind of relate to – the car. If we were to run a Manufacturing Data Hub for our car, we would have an edge agent collecting the data. We might have a rule that’s looking at the engine temperature and monitoring the engine temperature between upper and lower limits. Maybe we’ve got that rule set up so that if the engine gets too hot, I want to trigger a workflow, and the workflow might automatically go and book my car into the mechanic and have it checked out. So, the workflow is triggered by an event of some sort.

But it’s a series of activities or actions that can happen in response to that event. That can be some sort of data pipeline, or it can be something that’s more of a long-running workflow. For example, I’m going to book my car for a service at the mechanic, and then all the steps of paying for that and picking it up afterward, and all of the sequence of actions that happen after I put my car in for a service.

David: I think related to the graph database, and this is when I first was introduced to what Rhize is doing, this to me is the true value. I’m pre-defining all the relationships. And I’ve also built it against a schema that is a widely accepted standard, but it’s a knowledge graph in the sense that we’re not constrained to a relational database, which has existed for a long time.

That’s the whole point. Every time we’re writing, we’re making these mutations, as they are called, within a graph database. The data that’s written in there is written in a way that is already predetermined, and there are rules for what this data needs to look like before we can run that mutation or make the change there.

So a lot of it in these previous steps is making sure that we’re getting it all ready to go. Then we’re going to pop it into the graph, and the beauty is that it’s a knowledge graph. And I’m not constrained to that relationship. Is that fair?

Andy: Yeah, I think that’s fair. I think our graph database, and the way that we’ve gone about implementing it, emphasizes defining the schema upfront.

We’re not emphasizing schema on write, which is kind of the way that a lot of graph databases assume they might be used, where you’ve got this sort of flexibility of mutation that allows you to push the relationships in as you push the data and lets those relationships evolve. We can do that with our graph database too.

But what we’ve actually done, for the specifics of the ISA-95 ontology part of the database is we predefined the schema. We’ve pre-codified all the relationships between all the objects that ISA-95 exposes as part of that standard. When you push data into our Manufacturing Knowledge Graph, you need to respect the relationships that have already been pre-established within that schema.

Read more about the Manufacturing Knowledge Graph →

So that gives you a way to express your data with ISA-95. It allows you to know in advance what the relationships should be, and what the queries back out of that database should be, based on the ontology, on the schema that’s defined in ISA-95. Which is difficult to do in anything other than a graph database because of the complexity of ISA-95. It’s an object model. It’s quite a complicated object model with a lot of recursion. When you want to access the knowledge graph, because of the hierarchical nature of ISA-95 and the long chains of relationships from materials all the way through to equipment to schedule, there’s often a real need to traverse and recurse those data objects when you’re accessing that data. I think you can only really do that with a graph database. I think that heavy recursion and traversing and joining across entities gets wildly complicated when you’re working with a SQL server, just because of the object-relational impedance mismatch. Which is probably a bit too technical, but that’s one of the advantages when you go with graph.

David: I was just going to say I’ve run into those queries where you’re joining across ten different tables just so you can get that data back. I think that’s where you really start to bog down the system, especially if 20 people are trying to run a similar type of query.

But querying the data, that’s what’s important. We use a GraphQL API. It allows people to access not only the graph database but to bring in other data as well; it’s not purely limited to the graph. And I think this relates to when we were talking about the Uber Broker. It’s about having that common endpoint. There are going to be all these various nodes that exist within that ecosystem, and if I want to get data out of them, as we look at it from the context of the UNS, I have to go to all of those various nodes. In this case, I can have a common node, that GraphQL API, and I can have it hit those other services if it’s relevant, whether it’s time-series data or some data warehouse information we need to bring back. Just having that open GraphQL API gives us the ability to access the data very quickly. And the way you ask it questions has already been predefined. Is that right?

Andy: Yeah, that’s exactly right. But there’s something else with the way that GraphQL works, which makes it particularly convenient compared to all the other ways of expressing an API.

For example, if you’re using a REST API, a very basic REST API, you would tend to have an object on an endpoint. So you are able to determine, generally speaking, which object you’re going to access. So you might have, you know, “my API slash version one slash products,” and then some parameters that allow you to sort of supply which product you are interested in.

Then you hit that product endpoint, which is effectively like a path in Windows Explorer, for example. When you hit that endpoint, you’re going to get the files that exist in that folder. It’s analogous to a file system in a way. That’s not the whole story; I’m just trying to find an analogy.

But if you’re accessing multiple entities, if you’re accessing products but also orders or materials or other objects that are sort of hidden behind this database layer, what you end up having to do on the client side, if you’re interested in multiple objects, is make multiple calls out to that REST API. Suppose you’re going to bring those objects back in your client code, in memory, or whatever system you’re working in. In that case, you’ve got to rebuild the relationships at runtime between those objects when you’ve queried for them, which limits your ability to manage complex object relationships on the client side. It’s cognitively difficult to code for that kind of thing.
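To make that concrete, here is a sketch of the client-side stitching a plain REST API tends to force, with hypothetical endpoints and fields:

```python
# Sketch: the client-side stitching a plain REST API tends to force:
# separate round trips per entity, then a manual join in memory.
# The endpoints and field names are hypothetical.
import requests

BASE = "https://api.example.local/v1"

orders = requests.get(f"{BASE}/orders", params={"line": "Line1"}).json()
products = requests.get(f"{BASE}/products").json()

# Rebuild the order -> product relationship at runtime, keyed by product id.
products_by_id = {p["id"]: p for p in products}
for order in orders:
    order["product"] = products_by_id.get(order["product_id"])

print(orders)
```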

And it’s almost exactly the same problem if you’re working with a broker. If you’re working with an MQTT broker, the topic structure defines part of the data. Your equipment hierarchy is defined by the path, the topic structure, and that kind of thing. So if you want to try to make relationships or understand how things in different parts of the broker are related, then you’ve got to do multiple subscriptions. Then there’s a temporal element to the way that you deal with the data that then arrives that allows you to connect that data back up.

It works because we’ve got this idea that we can democratize data with an MQTT subscription or a REST endpoint, but we do run into this next problem of rebuilding relationships on the client side. You don’t get that with GraphQL. You query, and you can traverse and recurse across the objects that are available in the graph, but when you get your data back, the relationships are in place.
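By contrast, a single nested GraphQL query (field names illustrative) brings the related objects back with the relationships already embedded in the response:

```python
# Sketch: one nested GraphQL query returns orders with their product already
# embedded in the JSON response, so there is no client-side join to rebuild.
# Endpoint and field names are illustrative assumptions.
import requests

QUERY = """
query OrdersWithProducts($line: String!) {
  orders(filter: {line: $line}) {
    id
    # relationship resolved server-side
    product {
      id
      description
    }
  }
}
"""

data = requests.post("https://rhize.example.local/graphql",
                     json={"query": QUERY, "variables": {"line": "Line1"}}).json()
print(data["data"]["orders"])
```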

They’re embedded in the JSON response that you’ve got back. There’s a convenience there for the front-end developer, or even for the back-end developer who’s putting together the business logic. You can choose the data you want at query time, and you can bring it back into a predefined structure with the relationships in place. Then, you can work on that data. So it avoids the need to orchestrate multiple different object types and many visits to the database, because you get them in one hit on a single endpoint.
