Where Data Science meets Freight Logistics

“The rise in demand for fresh food requires the food industry to reinvent food logistics from farm to fork,” says Ronnie Garrett from Food Logistics.[1] At DIVERGENCE.ai, we’re obsessed with delivering, packaging and marketing fresh and tasty food. In this article, we’ll first cover the fields of Data Science that apply to Freight Logistics, and then present the top 5 areas where Data Science techniques can be applied in Freight Logistics.

Data Science techniques in the context of Freight Logistics

1. Combinatorial Optimization:

Company A has a fleet of 100 trucks across the US and specializes in supplying fresh produce and meats to restaurants. From the routes each truck must take to meet its delivery windows to the scheduling of drivers on specific days, many decisions have to be made within specific “constraints” so that an “objective function” (Eg: the cost of labor and fuel) is minimized.

Coming up with a feasible (or optimal) solution (Eg: a routing schedule) given specific constraints (Eg: the 11 hour driving limit per driver[2], cost of fuel, cost of labor etc) and objective functions (Eg: profit after fuel and labor costs) falls in the realm of Combinatorial Optimization. Routing, Job Shop Scheduling and Bin Packing (Eg: for Less-Than-Truckload use cases) are classic examples of this field. For any logistics company, these decisions are central to day-to-day operations.
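To make the idea of constraints and an objective function concrete, here is a minimal sketch in Python. The stop names, drive times and the cost per hour are made-up numbers for illustration, and trying every visiting order only works for a handful of stops; real fleets need a proper solver.

```python
from itertools import permutations

# Hypothetical drive times (hours) between a depot and three restaurants.
DRIVE_HOURS = {
    ("depot", "A"): 2.0, ("depot", "B"): 3.0, ("depot", "C"): 4.0,
    ("A", "B"): 1.5, ("A", "C"): 2.5, ("B", "C"): 1.0,
}
MAX_DRIVING_HOURS = 11.0  # constraint: driving limit per driver
COST_PER_HOUR = 60.0      # assumed combined fuel and labor cost per driving hour


def hours(a, b):
    """Look up the drive time between two locations in either direction."""
    key = (a, b) if (a, b) in DRIVE_HOURS else (b, a)
    return DRIVE_HOURS[key]


def route_hours(stops):
    """Total driving time for depot -> stops in the given order -> depot."""
    path = ["depot", *stops, "depot"]
    return sum(hours(a, b) for a, b in zip(path, path[1:]))


def best_route(stops):
    """Try every visiting order and keep the cheapest one within the limit."""
    feasible = [p for p in permutations(stops)
                if route_hours(p) <= MAX_DRIVING_HOURS]
    if not feasible:
        return None, None  # no order satisfies the constraint
    best = min(feasible, key=route_hours)
    return list(best), route_hours(best) * COST_PER_HOUR


route, cost = best_route(["A", "B", "C"])
print(route, cost)
```

The driving limit acts as the constraint (it filters out infeasible orders) and the cost is the objective function being minimized.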

2. AI, ML and Statistics

Note however that the constraints and objective functions provided as inputs to the above Combinatorial Optimization algorithms can come from domain experts, from traditional Business Intelligence or from AI/ML/Statistical techniques (Eg: “Statistical Time Series Forecasting for Demand” or “Impending Truck failure event detection from Vehicle Telematics data”).

For example, in the above case for Company A, for the routing schedule to be optimal, the forecasted demand for a specific day is fed into the routing algorithm. Needless to say, the route plan that the routing algorithm gives out is only as good as the forecast.

In the context of this article, I’m clubbing AI/ML/Statistics into a single bin. AI/ML and Statistics typically do not come up with a routing schedule. They come up with better predictions for demand or supply, which can in turn be fed as constraints into the Routing/Scheduling algorithms.
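As a rough illustration of a forecast feeding a routing decision, the sketch below uses simple exponential smoothing, one of the most basic time series forecasting methods. The demand history, the smoothing factor and the truck capacity are assumptions made up for this example.

```python
import math


def exponential_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing."""
    if not history:
        raise ValueError("history must not be empty")
    level = history[0]
    for observed in history[1:]:
        # Blend the newest observation with the running level.
        level = alpha * observed + (1 - alpha) * level
    return level


# Hypothetical daily demand (cases of produce) for one delivery region.
daily_cases = [100, 120, 110, 130]
TRUCK_CAPACITY_CASES = 50  # assumed capacity of one truck

forecast = exponential_smoothing_forecast(daily_cases, alpha=0.5)
trucks_needed = math.ceil(forecast / TRUCK_CAPACITY_CASES)
print(forecast, trucks_needed)
```

The forecast itself says nothing about routes. It only produces the number (here, trucks needed tomorrow) that the routing algorithm then has to work with.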

Of course there’s a lot more to AI/ML and Statistics and pardon me for lumping such vast areas together, but in this context, I want to contrast these fields with Combinatorial Optimization techniques as well as Process Mining (explained below) techniques.

3. Process Mining

Every process generates time event data. Consider the example of tracking a shipment from the time it is received at the source to the moment it is delivered at the destination. At each stage in the process, an event (activity) is generated. Each activity has a unique time stamp.

Process Mining is a relatively new field in Data Science that automatically reconstructs (reverse engineers) the underlying process just by looking at the time series event data!

Now, why is this important? Remember that the truck route that was computed is only as good as the constraints it was fed. What if an unexpected event (Eg: a driver getting sick or a truck breaking down) occurs? Now the constraint has to be updated and a new routing schedule must be generated. What would you update the constraint with? This is where Process Mining can be effective. Since Process Mining reverse engineers underlying processes based on past time event data, it can automatically figure out bottlenecks in the system and, more importantly, quantify those bottlenecks, which in turn can be fed as constraints into the Routing/Scheduling algorithms.
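The core idea can be sketched in a few lines of Python: group events by shipment, order them by time stamp, and measure how long each activity-to-activity step takes on average. The event log below is invented for illustration, and real Process Mining tools do far more than this.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (shipment id, activity, time stamp).
EVENT_LOG = [
    ("S1", "Received", "2024-01-01 08:00"),
    ("S1", "Loaded", "2024-01-01 09:00"),
    ("S1", "Delivered", "2024-01-01 15:00"),
    ("S2", "Received", "2024-01-01 08:30"),
    ("S2", "Loaded", "2024-01-01 11:30"),
    ("S2", "Delivered", "2024-01-01 16:30"),
]


def transition_hours(event_log):
    """Average hours spent on each observed activity-to-activity step."""
    traces = defaultdict(list)
    for case_id, activity, stamp in event_log:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        traces[case_id].append((when, activity))

    durations = defaultdict(list)
    for events in traces.values():
        events.sort()  # order each shipment's events by time stamp
        for (t0, a0), (t1, a1) in zip(events, events[1:]):
            durations[(a0, a1)].append((t1 - t0).total_seconds() / 3600)
    return {step: sum(v) / len(v) for step, v in durations.items()}


averages = transition_hours(EVENT_LOG)
bottleneck = max(averages, key=averages.get)  # slowest step on average
print(averages, bottleneck)
```

The steps that show up in `averages` are the reconstructed process, and the slowest step is a quantified bottleneck that could be handed to a scheduling algorithm as an updated constraint.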

The concept of delivering the right message to the right customer, at the right time, and via the right channel has been around for some time. Next best action marketing can best be described as an evolution of this concept, evaluating the customer’s past behavior, recent actions, interests, and needs in the context of the organization’s marketing goals to identify the most effective action (making an offer, a promotion, reaching out by phone, sending an email, etc.) to achieve desired outcomes.

Here’s an architecture diagram of how the above fields fit together in a typical Freight Logistics company:

Data Science in Freight Logistics

Now that we have covered the major areas of Data Science that can be applied to Logistics & Freight, let’s dig into the top 5 areas where these techniques can be applied.

Top 5 areas Data Science techniques can be applied in Freight Logistics

1. Dynamic Routing: Time Series Forecasting and Process Mining

Picture Courtesy of blogs.bing.com[3]

Until the advent of managed services like Azure Event Hubs/Kafka/AWS Kinesis in the last 12-18 months, it took quite a bit of up-front capital and engineering expenditure to set up the infrastructure needed to process such large volumes of telematics data. However, the ability of enterprises to bring up such services on demand is making it more affordable than ever to respond to environmental and accidental disruptions.

As mentioned earlier, Process Mining can be extremely effective in quantifying specific bottlenecks based on historic data. In addition to these techniques, Truck Routing APIs available from services like Azure and the open sourcing of Combinatorial Optimization tools like Google OR-Tools are bringing the most cutting edge technologies to the masses.

All of the above pieces need to work together to make Dynamic Routing viable.

2. IoT and Predictive Maintenance

IoT and telematics data are nothing new, but placing multiple sensors on a vehicle and managing them effectively and securely was a challenge until the advent and general availability of mesh networking technologies like Thread [4] and Particle.io.

As mentioned earlier, even ingesting and processing such large volumes of data in near real time was a challenge before the advent of managed services from cloud providers.

Note that using IoT data and Time/Frequency analysis, we can predict that a disruptive event is about to happen. The real value of Predictive Maintenance is achieved only when this event data is processed and fed into the dynamic routing (or maintenance) algorithms.
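Here is a small sketch of what frequency analysis on sensor data can look like. It runs a naive Discrete Fourier Transform over a synthetic vibration signal and raises an alert when the strongest frequency falls inside a band that we assume, purely for illustration, to be a fault signature. The sample rate, the frequencies and the band are all made up.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) with the largest magnitude in a naive DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


SAMPLE_RATE_HZ = 64
FAULT_BAND_HZ = (10, 15)  # assumed band in which a failing part vibrates

# Synthetic signal: a normal 5 Hz component plus a stronger 12 Hz component.
signal = [
    math.sin(2 * math.pi * 5 * i / SAMPLE_RATE_HZ)
    + 2.0 * math.sin(2 * math.pi * 12 * i / SAMPLE_RATE_HZ)
    for i in range(SAMPLE_RATE_HZ)
]

peak = dominant_frequency(signal, SAMPLE_RATE_HZ)
alert = FAULT_BAND_HZ[0] <= peak <= FAULT_BAND_HZ[1]
print(peak, alert)
```

In practice a library FFT would replace the naive loop, and the alert would be the event that gets pushed to the routing or maintenance scheduler.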

3. AI Driven Document Processing as a Service

Picture (modified) Courtesy of receipts-templates.com

Sure, you can write an app for everything, give an iPhone to each of your staff and exchange only EDI [5] data across systems. But I’m reminded of the popular (if apocryphal) story of how NASA spent $3 Billion on a pen that writes in zero gravity while the Russians used a pencil. We see many cases where paper just works and switching over to an app requires a heavy investment of time and capital. Even the requirements for what the app should actually look like are often not data driven, making the UI design challenging.

At DIVERGENCE.ai, we were able to use Deep Learning augmented Document Processing to automate data entry into back end ERP/Financial systems, and we do so without disrupting existing processes. Often, the front end users barely notice any change in their processes. And as we process more documents, we get a better understanding of what kind of mobile app can best serve each use case. It has been a painful and memorable lesson for us that getting drawn in by the sexiness of AI and automating without profiling is a fast way to burn cash 🙁

4. Optimizing Processes using Process Mining (Eg: On-Time-In-Full (OTIF), Order-To-Cash etc)

“Walmart has changed their vendor guidelines and scorecard parameters a few times in recent years. They went from requiring a four-day shipping window in 2016 to a two-day shipping window in 2017 and a no-day shipping window in 2018. They have since relaxed rules to allow carriers to deliver one day early (as of April 2018).” [6]

From the time Walmart started enforcing OTIF guidelines in 2016, it has been extremely challenging for suppliers and logistics companies to comply and very few companies have 90% compliance rates, which probably explains why Walmart had to relax the rules in 2018.
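To see how sensitive an OTIF score is to the shipping window, here is a small worked example. A delivery counts only if it arrives inside the window and with the full quantity. The deliveries below are invented, and the scoring rule is a simplified assumption rather than Walmart’s actual scorecard logic.

```python
from datetime import date

# Hypothetical deliveries:
# (must-arrive-by date, actual arrival date, cases ordered, cases delivered)
DELIVERIES = [
    (date(2018, 5, 10), date(2018, 5, 10), 100, 100),  # on time, in full
    (date(2018, 5, 10), date(2018, 5, 9), 80, 80),     # one day early, in full
    (date(2018, 5, 10), date(2018, 5, 11), 50, 50),    # one day late
    (date(2018, 5, 10), date(2018, 5, 10), 60, 55),    # on time, but short
]


def otif_rate(deliveries, early_days_allowed=1):
    """Share of deliveries that arrive inside the window with full quantity."""
    hits = 0
    for due, actual, ordered, delivered in deliveries:
        days_early = (due - actual).days
        on_time = 0 <= days_early <= early_days_allowed
        in_full = delivered >= ordered
        if on_time and in_full:
            hits += 1
    return hits / len(deliveries)


relaxed = otif_rate(DELIVERIES, early_days_allowed=1)  # one day early allowed
strict = otif_rate(DELIVERIES, early_days_allowed=0)   # exact day only
print(relaxed, strict)
```

With the same four deliveries, allowing one early day doubles the score, which gives a feel for how much a tighter window hurts compliance.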

The path from the time you receive an order for shipment to the time you deliver the goods and get paid for them is a fairly complex process within most organizations. While the Business Process experts within your company can point to what the intended process is via Process Flow diagrams, most organizations have a hard time pinpointing what the actual process looks like and where exactly the bottlenecks in the process are. Traditional techniques like Lean-Six-Sigma are valuable for narrowing down and root causing specific bottlenecks, but they do not give an overall understanding of the underlying processes.

This is an area where Process Mining is most effective. As the time-stamped events from various ERP systems get fed into Process Mining tools (Eg: Disco, Celonis), they reverse engineer the real underlying processes and help find where the bottlenecks are, as well as cases where the real process deviates from the intended process.
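A toy version of the “real process versus intended process” comparison can be written as follows. The activity names and order traces are hypothetical, and this is only a sketch of the idea, not how Disco or Celonis work internally.

```python
from collections import Counter

# Intended Order-To-Cash flow, as a process diagram might describe it.
INTENDED = ["Order Received", "Picked", "Shipped",
            "Delivered", "Invoiced", "Paid"]

# Hypothetical traces reconstructed from ERP time stamps, one per order.
TRACES = {
    "O1": ["Order Received", "Picked", "Shipped",
           "Delivered", "Invoiced", "Paid"],
    "O2": ["Order Received", "Picked", "Shipped",
           "Invoiced", "Delivered", "Paid"],    # invoiced before delivery
    "O3": ["Order Received", "Picked", "Picked", "Shipped",
           "Delivered", "Invoiced", "Paid"],    # picking had to be redone
    "O4": ["Order Received", "Picked", "Shipped",
           "Delivered", "Invoiced", "Paid"],
}


def variants(traces):
    """Count how many orders follow each distinct sequence of activities."""
    return Counter(tuple(trace) for trace in traces.values())


def deviating_cases(traces, intended):
    """Order ids whose actual sequence differs from the intended one."""
    return sorted(cid for cid, trace in traces.items() if trace != intended)


variant_counts = variants(TRACES)
deviations = deviating_cases(TRACES, INTENDED)
print(variant_counts.most_common(1), deviations)
```

The variant counts show what the real process looks like, and the deviating orders are the ones worth a closer look.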

5. Driver behavior prediction

In the US alone there are about 2.5 million accidents every year and 64 percent of those accidents are caused by distracted drivers. Forty-seven percent of drivers are comfortable either texting manually or using voice controls while driving according to a survey conducted by the National Safety Council (NSC). But in reality, texting is more dangerous than drunk driving.[7]

These are use cases where Deep Learning based Vision systems are now a viable solution. Note that until recently, Deep Learning on Edge devices was an engineering challenge. With the advent of fully managed AI services like customvision.ai, it has become practical to deploy, maintain and update deep learning machine vision systems on the edge using model containerization.

Note that vision systems are not the only way to quantify driver behavior. Accelerometer sensors are also pretty good at detecting driver behavior, according to Brian Kursar at Toyota Connected [8].
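As a simple illustration of the accelerometer approach, the sketch below counts harsh braking events in a stream of longitudinal acceleration readings. The threshold, the minimum run length and the readings are assumptions chosen for the example and are not taken from the Toyota Connected work.

```python
HARSH_BRAKE_THRESHOLD_G = -0.4  # assumed deceleration threshold, in g
MIN_SAMPLES = 3                 # assumed consecutive readings needed


def harsh_brake_events(longitudinal_g,
                       threshold=HARSH_BRAKE_THRESHOLD_G,
                       min_samples=MIN_SAMPLES):
    """Count runs of at least min_samples readings at or below threshold."""
    events, run = 0, 0
    for g in longitudinal_g:
        if g <= threshold:
            run += 1
            if run == min_samples:  # count each sustained run exactly once
                events += 1
        else:
            run = 0
    return events


# Hypothetical readings (g) sampled at a fixed rate during one trip.
readings = [0.0, -0.1, -0.5, -0.6, -0.5, -0.1, 0.0,
            -0.45, -0.2, -0.5, -0.5, -0.5, -0.5, 0.1]

event_count = harsh_brake_events(readings)
print(event_count)
```

Requiring several consecutive readings filters out single noisy spikes, such as a pothole, that would otherwise be counted as braking.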

Implications in Packaging and Warehouse management and beyond

Most of the above techniques are just as applicable in Packaging and Warehouse management. Also note that most of the individual techniques have been around for at least a decade, and quite a few date back to World War II. However, most techniques were only available to a privileged few, and even a few years ago it was an engineering and logistical marvel to connect disparate systems. Getting (near) real time results on Big Data has also been a relatively recent phenomenon.

With cloud adoption increasing rapidly and more services becoming “managed”, companies are now able to focus directly on the problems as opposed to the logistical nightmares of getting licenses for the tools and provisioning them. This also calls for a workforce that thinks radically differently: one that can learn fast, fall in love with the problems (not the tools) and just pick up the right tools and techniques when needed.

We hope you found this article helpful in connecting the various Data Science Techniques and would love to hear your thoughts.


Divergence.AI is a Full service Management and AI consulting firm. We are based in Dallas, TX.
Our deep capabilities in strategy, process, analytics and technology help our clients improve their performance. We provide expert, objective advice to help solve complex business and technology challenges. We bring our knowledge and experience to develop and integrate AI-driven solutions within the customer’s business environments.

About the Author: Vish Puttagunta

As the CTO and Principal Data Scientist at DIVERGENCE.ai, Vish helps companies incubate Data Driven teams centered around Marketing, Operational Excellence, Fraud Detection and Food Safety.

As the Director of Data Science Programs at Divergence Academy, he teaches and continuously evolves the curriculum for Data Science on Big Data/Cloud based on feedback from various consulting engagements and market research.



[1] https://www.foodlogistics.com/cold-chain/article/12332506/logistics-gets-fresh

[2] https://www.fmcsa.dot.gov/regulations/hours-service/summary-hours-service-regulations

[3] https://blogs.bing.com/maps/2017-05/truck-routing-and-more-exciting-news-from-build-2017

[4] https://en.wikipedia.org/wiki/Thread_(network_protocol)

[5] https://en.wikipedia.org/wiki/Electronic_data_interchange

[6] https://ziplinelogistics.com/blog/walmart-on-time-in-full-otif-program/

[7] https://www.linkedin.com/pulse/financial-impact-distracted-driving-tony-summerville/

[8] https://towardsdatascience.com/data-science-at-toyota-connected-69bf50982b09

