This topic is very broad, and what I am presenting is by no means an exhaustive list; there are probably hundreds of different ways this can be done. Below are three different methods for displaying information in your Sitecore website that comes from an external source. You might be wondering why I haven't included a Sitecore job in this list. That was done purposefully. As we move to a more composable technology landscape, we need to architect platform communications with microservices and APIs and move away from platform customization when possible. The days of integrated data crunching are numbered, in favor of external processes acting on platforms via API.
Using a Microservice to Load Data Directly from a Source into Sitecore
Imagine your client has a requirement to present press releases on the Sitecore site, but the press releases aren't actually authored in Sitecore. They're authored in a separate platform that provides authoring as well as distribution of the press releases on behalf of the client. Typically in these scenarios, the platform will have either an XML feed of press releases or some sort of API. Regardless, they will (or at least should) provide a way to get press releases.
In this architecture, we are using a simple microservice/console job that runs on a scheduled interval. Perhaps it's 5 minutes, or perhaps it's an hour. This microservice is responsible for querying the data from the press release source, transforming it in the same run, and then using the RESTful ItemService API (or another Sitecore API) to load the press release item into Sitecore. Simple and straightforward.
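To make this concrete, here is a minimal sketch of such a job in TypeScript. It assumes Node 18+ (for the built-in fetch) and that the Sitecore ItemService is enabled on your instance; the feed URL, item IDs, field names, and credentials are all placeholders rather than real values:

```typescript
// Minimal sketch of the scheduled press-release sync.
// Assumes the Sitecore ItemService is enabled; all URLs/IDs are placeholders.

const FEED_URL = "https://press.example.com/api/releases"; // hypothetical source API
const SITECORE = "https://cm.example.com";
const PARENT_ID = "{PARENT-ITEM-ID}";                      // e.g. the Press Releases folder
const TEMPLATE_ID = "{PRESS-RELEASE-TEMPLATE-ID}";

interface PressRelease { id: string; title: string; body: string; }

async function login(): Promise<string> {
  // The ItemService exposes a cookie-based login endpoint.
  const res = await fetch(`${SITECORE}/sitecore/api/ssc/auth/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      domain: "sitecore",
      username: process.env.SC_USER,
      password: process.env.SC_PASSWORD,
    }),
  });
  // Reuse the auth cookie on subsequent requests.
  return res.headers.get("set-cookie") ?? "";
}

async function run(): Promise<void> {
  const cookie = await login();
  const releases: PressRelease[] = await (await fetch(FEED_URL)).json();

  for (const pr of releases) {
    // Transform and create one Sitecore item per press release.
    // (A real job would first check whether the item already exists.)
    await fetch(`${SITECORE}/sitecore/api/ssc/item/${PARENT_ID}?database=master`, {
      method: "POST",
      headers: { "Content-Type": "application/json", cookie },
      body: JSON.stringify({ ItemName: pr.title, TemplateID: TEMPLATE_ID, Body: pr.body }),
    });
  }
}

run().catch(console.error); // the scheduler (cron, WebJob, timer, etc.) handles the interval
```

The scheduler itself is whatever fits your hosting (cron, an Azure WebJob, a container on a timer); the job just needs to be idempotent enough to run safely every interval.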
Using Webhooks and Message Queueing
This scenario is similar to the previous example, except that we've introduced a more complex set of data we need to distill down and enter into Sitecore. In this example, imagine a client has a set of locations they are adding as a list inside Sitecore that will eventually be loaded on a map and have some data graphed about them. The locations are simply an address, but the location item has other fields such as latitude and longitude, monthly crime rate, and average monthly temperature and rainfall. Since there are multiple sources of information, and it might take a long time to acquire and transform that data, we should add some other infrastructure to support this scenario.
Let's assume that in an item:saved event in Sitecore, we are taking that information, bundling it up, and making an HTTP POST request to a Logic App. The data in that POST eventually makes it to a service bus. When the Location Item Added topic is picked up from the service bus, we can then introduce our longer-running process to aggregate the required data for this topic. This microservice does not need to know about Sitecore. It just needs to know how to connect to the various APIs and aggregate the data, then place it back on the bus as a Location Aggregated Data topic.
The second microservice listens for the Location Aggregated Data topic and connects directly to Sitecore via API to add or update the content.
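Here is a hedged sketch of the aggregation microservice using the @azure/service-bus package (v7). The topic and subscription names, the message shape, and the enrichment APIs are all assumptions for illustration:

```typescript
// Sketch of the aggregation microservice; topic names and APIs are placeholders.
import { ServiceBusClient, ServiceBusReceivedMessage } from "@azure/service-bus";

const client = new ServiceBusClient(process.env.SB_CONNECTION_STRING!);
const receiver = client.createReceiver("location-item-added", "aggregator");
const sender = client.createSender("location-aggregated-data");

async function aggregate(address: string) {
  // Long-running enrichment: each call is a hypothetical external API.
  const q = encodeURIComponent(address);
  const [geo, crime, weather] = await Promise.all([
    fetch(`https://geocode.example.com/lookup?q=${q}`).then(r => r.json()),
    fetch(`https://crime.example.com/stats?q=${q}`).then(r => r.json()),
    fetch(`https://weather.example.com/averages?q=${q}`).then(r => r.json()),
  ]);
  return { address, ...geo, ...crime, ...weather };
}

receiver.subscribe(
  {
    async processMessage(msg: ServiceBusReceivedMessage) {
      // msg.body carries the itemId + address bundled up by the item:saved handler.
      const enriched = await aggregate(msg.body.address);
      // Hand the result back to the bus for the second microservice to consume.
      await sender.sendMessages({ body: { itemId: msg.body.itemId, ...enriched } });
      await receiver.completeMessage(msg);
    },
    async processError(args) {
      console.error(args.error);
    },
  },
  { autoCompleteMessages: false } // we complete explicitly after the send succeeds
);
```

The second microservice follows the same subscribe pattern on the location-aggregated-data topic, and its handler pushes the aggregated fields into Sitecore with the same kind of ItemService call shown in the first example.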
What this sort of architecture does is start to separate concerns, lean into microservice architecture, and allow for processing of large volumes of data. Sitecore doesn't currently offer webhooks; however, this can be simulated by making an HTTPS POST request in the item:saved event handler. In the future, if Sitecore adds webhook support, you can imagine easily transitioning to that new feature with this architecture in place. This is just one example of how this sort of decoupling can be done, and it is not the answer for all scenarios. It is good for us to start thinking in this way, though.
Create Separate Architecture and Load Directly from the UI
This architecture is based on a real scenario in which a client was using a Digital Asset Management (DAM) platform to host and manage assets: PDFs, videos, images, audio, etc. The requirement was to have all the assets available and filterable on their website by their website visitors. We did not transfer the actual assets; rather, the data we collected was metadata about the assets, data needed for filtering and faceting, language-specific fields, and various links to download or preview them. In this case the external data source is the DAM.
Because the data in the DAM was spread across multiple API endpoints, a WebJob (console app) was created to query the APIs and aggregate the data. It is a rather complicated job that has to negotiate authentication and the various API endpoints, and finally perform the database updates. In total, there are four tables that store information about the job every time it runs, plus one master asset table.
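A heavily simplified sketch of that job's shape, assuming the mssql package for database access and hypothetical DAM endpoints (the bookkeeping across the four audit tables is omitted here):

```typescript
// Rough outline of the WebJob: authenticate, aggregate, upsert.
import sql from "mssql";

async function runJob(): Promise<void> {
  const pool = await sql.connect(process.env.ASSET_DB_CONNECTION_STRING!);

  // 1. Negotiate authentication with the DAM (hypothetical token endpoint).
  const { token } = await fetch("https://dam.example.com/oauth/token", { method: "POST" })
    .then(r => r.json());
  const headers = { Authorization: `Bearer ${token}` };

  // 2. Pull asset metadata; a real job merges several endpoints per asset.
  const assets: Array<{ id: string }> =
    await fetch("https://dam.example.com/api/assets", { headers }).then(r => r.json());

  // 3. Upsert each asset into the master asset table.
  for (const a of assets) {
    await pool.request()
      .input("id", sql.NVarChar, a.id)
      .input("metadata", sql.NVarChar, JSON.stringify(a))
      .query(`MERGE Assets AS t
              USING (SELECT @id AS Id) AS s ON t.Id = s.Id
              WHEN MATCHED THEN UPDATE SET Metadata = @metadata
              WHEN NOT MATCHED THEN INSERT (Id, Metadata) VALUES (@id, @metadata);`);
  }
  await pool.close();
}

runJob().catch(console.error);
```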
In the database, SQL Server's change tracking feature was enabled, which allowed an indexer to be connected to the database's asset table and gave the Azure Cognitive Search index the ability to update automatically when rows in the database are created, updated, or deleted. A very nice feature.
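For illustration, the data source side of that wiring can be set up through the Azure Cognitive Search REST API; the service, table, and key names below are placeholders:

```typescript
// Register a SQL data source that uses integrated change tracking,
// so the indexer picks up creates/updates/deletes automatically.
const SEARCH = "https://my-search.search.windows.net"; // hypothetical service
const API_VERSION = "2020-06-30";

async function createDataSource(): Promise<void> {
  await fetch(`${SEARCH}/datasources/assets-ds?api-version=${API_VERSION}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json", "api-key": process.env.SEARCH_ADMIN_KEY! },
    body: JSON.stringify({
      name: "assets-ds",
      type: "azuresql",
      credentials: { connectionString: process.env.ASSET_DB_CONNECTION_STRING },
      container: { name: "Assets" }, // the master asset table
      // Requires change tracking to be enabled on the database/table first
      // (an ALTER DATABASE / ALTER TABLE setting on the SQL side).
      dataChangeDetectionPolicy: {
        "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy",
      },
    }),
  });
}
```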
The final piece of this was an Azure Function that used Apollo Server to provide the GraphQL functionality, and which utilized the Azure Search SDK for querying Azure Cognitive Search.
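A sketch of what that function can look like, using the apollo-server-azure-functions and @azure/search-documents packages; the index name, schema, and fields are illustrative assumptions, not the actual implementation:

```typescript
// GraphQL facade over Azure Cognitive Search, hosted as an Azure Function.
import { ApolloServer, gql } from "apollo-server-azure-functions";
import { SearchClient, AzureKeyCredential } from "@azure/search-documents";

interface AssetDoc { id: string; title: string; assetType: string; downloadUrl: string; }

const search = new SearchClient<AssetDoc>(
  "https://my-search.search.windows.net", // hypothetical service
  "assets-index",
  new AzureKeyCredential(process.env.SEARCH_QUERY_KEY!)
);

const typeDefs = gql`
  type Asset { id: ID! title: String assetType: String downloadUrl: String }
  type Query { assets(term: String, assetType: String): [Asset] }
`;

const resolvers = {
  Query: {
    async assets(_: unknown, args: { term?: string; assetType?: string }) {
      const res = await search.search(args.term ?? "*", {
        filter: args.assetType ? `assetType eq '${args.assetType}'` : undefined,
        top: 50,
      });
      const docs: AssetDoc[] = [];
      for await (const r of res.results) docs.push(r.document);
      return docs;
    },
  },
};

const server = new ApolloServer({ typeDefs, resolvers });
// The function's HTTP in/out bindings (function.json) point at this handler.
export default server.createHandler();
```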
On the Sitecore side, we created a specific React rendering that sends GraphQL requests to the function and renders the data it gets back. The React rendering was also responsible for the grid display, filtering and sorting, pagination, etc.
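Stripped of the grid, filtering, and pagination concerns, the data-fetching core of such a rendering might look like this (the function URL and field names follow the hypothetical schema above):

```tsx
// Simplified React rendering that queries the GraphQL function.
import { useEffect, useState } from "react";

interface Asset { id: string; title: string; assetType: string; downloadUrl: string; }

const QUERY = `query Assets($term: String) {
  assets(term: $term) { id title assetType downloadUrl }
}`;

export function AssetGrid({ term }: { term: string }) {
  const [assets, setAssets] = useState<Asset[]>([]);

  useEffect(() => {
    fetch("https://my-func.azurewebsites.net/api/graphql", { // hypothetical function URL
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query: QUERY, variables: { term } }),
    })
      .then(r => r.json())
      .then(json => setAssets(json.data.assets));
  }, [term]);

  // The real rendering also handles filtering, sorting, and pagination.
  return (
    <ul>
      {assets.map(a => (
        <li key={a.id}><a href={a.downloadUrl}>{a.title}</a></li>
      ))}
    </ul>
  );
}
```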
Summary
As always, there are many options to choose from when architecting solutions. Complexity, cost of implementation and infrastructure, and maintenance must all be balanced as we make architectural decisions. Beyond that, is there a pattern that works for your organization when working with data and Sitecore? Establishing the pattern can be difficult and time consuming, but once it's established, adding microservices onto the pattern should be easy.