Let’s look at how we can improve the reliability of the process for passing Orders from WizAStore to EntMine. The classic solution to this problem is to buffer the information locally on the sender in a file, then ask the receiving system to collect the information when it is ready; this is usually done through batch file processing. The sender writes all the orders to a known file location, and the receiving system has a job that wakes up every hour or so, copies that file locally and processes it.
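To make the sender side concrete, here is a minimal sketch in Python. The file location, function name and order fields are illustrative assumptions, not the actual WizAStore implementation; each order is simply appended as a line of JSON to a known local file.

```python
import json
import time

ORDERS_FILE = "/var/spool/wizastore/orders.jsonl"  # illustrative location

def record_order(customer_id, items, path=ORDERS_FILE):
    """Append one order as a line of JSON to the local orders file."""
    order = {"customer": customer_id, "items": items, "created": time.time()}
    # Open, append, close per order: each write is quick and local, so the
    # ordering code is only blocked for the duration of this append.
    with open(path, "a") as f:
        f.write(json.dumps(order) + "\n")
```

Because the file lives on the same machine, the only cost per order is a short local disk write.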
So what is notable here? The first thing is that, because order numbers are produced in EntMine and not WizAStore, we have no way of displaying these orders immediately back to the user. Remember, order numbers are only generated at the end, when EntMine stores the orders back to the database, and by that time we may have waited 12 hours for the batch job to run and the customer has already gone home. This could be solved by, say, sending an email confirmation to the user, but that gives us a disjointed process.
The second, more positive, thing to point out is that the WizAStore software is only blocked for the amount of time it takes to append the order to the local file. As this is reasonably quick, we don’t have to worry about any of the nifty multi-tasking tricks we used earlier, which simplifies the solution somewhat.
You should also notice that the file store we have chosen to use as a temporary store is local to the WizAStore software (specifically, on the same physical machine). This is no accident; it is there for two reasons. Firstly, we don’t want to incur any network delay, as we would if we stored this on an NFS mount (or equivalent). Secondly, and more importantly, if we put this information on a different machine we are simply hiding the problem of a synchronous push model down a level (moving the problem down the stack to NFS rather than HTTP, for example) and not actually solving the problem of what happens when the storage (network) is not available.
Resilience is nicely built into this process because the orders from WizAStore are persisted (i.e. stored to disk and not just floating about in memory somewhere), so if that program fails the orders are not lost. Equally, if the network is not available when EntMine requests a copy of the latest information, that is fine: the orders are still being safely appended by WizAStore and will still be there to be processed once IT fix the network issue a few hours later.
Another nice feature of this file-based model is that pretty much every development language out there can write to and read from files, which means we don’t have to add any third-party libraries to our customisations to get the communications working. All these positives go some way to explaining why this pattern is still so common in enterprises today.
It’s not all plain sailing, though. We need to be careful that we don’t run into file-locking problems, that is, WizAStore trying to append to the orders file at the same moment EntMine is trying to copy or delete it. There are, to be fair, many standard patterns for handling this. We also need to remember that if the local disk runs out of space we could still lose orders. And if we have multiple instances of WizAStore running – for load balancing, resilience or scalability, say – we could have lots of order files from these different machines to manage.
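One of those standard patterns is for the collector to atomically rename the live file to a unique name before processing it, so the sender simply starts a fresh file on its next append. This is a sketch under assumptions (the file names are illustrative, the sender must open-append-close per order, and `os.rename()` is only atomic on the same filesystem), not a complete locking solution:

```python
import os
import time

def claim_batch(live_path="orders.jsonl"):
    """Atomically move the live orders file aside so it can be processed
    without racing against the sender's next append."""
    if not os.path.exists(live_path):
        return None  # nothing to collect on this run
    claimed = "%s.%d" % (live_path, int(time.time()))
    os.rename(live_path, claimed)  # atomic on POSIX, same filesystem
    return claimed
```

After the rename, the collector can copy and process the claimed file at its leisure while new orders accumulate in a fresh live file.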
This model is also not well suited to event-driven architectures, as we are relying on batch processing, i.e. something waking up at pre-defined intervals and processing a batch of information. We can reduce the time interval, but in reality, and especially with file processing, we will start running into contention problems if we take this below around 60 seconds.
In many senses a file processed in this way is a very basic example of a queue: the first order of the batch is at the top of the file and is generally processed first when the file is transferred to the processing system. Other queueing approaches include shared databases and message queueing systems. You could argue that by replacing the file with a local database or a persistent queue you can achieve exactly the same result, and you may also give yourself more flexibility to reduce the polling interval further than you could with the file storage described earlier.
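The queue-like, first-in-first-out behaviour falls out of nothing more than reading the file top to bottom. A minimal sketch of the receiving side, where `process_order` stands in for whatever EntMine actually does with each order (the names here are assumptions for illustration):

```python
import json

def drain_batch(path, process_order):
    """Process every order in a collected batch file, oldest first."""
    processed = 0
    with open(path) as f:
        for line in f:  # the first order appended is the first one read
            process_order(json.loads(line))
            processed += 1
    return processed
```

A local database or persistent queue would replace this loop with its own consumer API, but the underlying first-in-first-out idea is the same.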
If you used a corporate database or a corporate message queueing system instead of a local store, you would not really have solved the issue of what to do when that system is unavailable – just as with moving the local store to an NFS mount somewhere, the problem has been shifted, not solved. I am purposely not covering purportedly 100%-uptime systems because, from experience, they are not; again, see how well Microsoft, AWS and Google manage it.
Arguments abound in this area as to whether a shared queue system or a replicated database can be relied upon to give you a persistent transport mechanism. In all honesty they can, and you could use two local databases or queues, installed on each machine in a cluster, to replicate the information instead of relying on the push or pull models described. However, if you do this, be aware that whilst you are not performing those pushes and pulls yourself, your replicated database or fancy queueing system is just doing them for you under the hood. Even worse, you have tied your two systems together using a single vendor technology (usually proprietary). This may be acceptable within a software suite, but it is really not a great idea in Enterprise Architectures, as it means you are creating integrations tied to a specific technology (and usually vendor) through what is termed ‘consumer to implementation coupling’.