If you had to train a GEOINT (geospatial intelligence) object detection algorithm to counter an emerging threat, how quickly could you do it?
The past decade has witnessed a seismic shift in the technology landscape, courtesy of the deep learning revolution. Gone are the days when the sole emphasis was on crafting intricate algorithms or merely boosting processor speeds. Today, innovation pivots on one unparalleled asset: data. For those immersed in the technological fray, the saying “data is the new oil” has evolved from a business cliché into a foundational truth. The crux of training sophisticated machine learning systems lies not just in algorithms or computing capability, but in the depth and integrity of the data propelling them.
Expedition Technology’s (EXP) journey through the multifaceted world of applied machine learning revealed an inescapable reality: managing data, fine-tuning training pipelines, and safeguarding reproducibility are not peripheral tasks; they are the very heart of the mission.
Over the past few years, EXP has built and deployed a platform called the “Training Data Storefront” (TDS). The system is designed to address a long-standing problem inhibiting adoption of machine learning (ML) in the national defense space: it is extremely difficult to identify, gather, normalize, and extract ML-ready datasets for intelligence challenges.
Due to technological challenges as well as political hurdles, analysts have historically faced a tortuous road in assembling raw input for their algorithms. Fragments of potential datasets for many missions are scattered across multiple government agencies, with no mechanism for discovery and no standardization of data formats. TDS ingests and normalizes many disparate streams of proto-training data (both human-generated and algorithm-generated), resolving formatting differences and greatly streamlining the extraction of an ML-ready dataset. With a few button clicks, users of TDS can search for annotations relevant to their mission and extract a dataset in the precise format needed by their algorithm.
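To make that normalization step concrete, here is a minimal Python sketch of the general pattern: map each incoming annotation format onto a common internal record, then export in whatever layout a training pipeline expects. All field names, formats, and the ontology mapping are hypothetical; TDS’s actual schema is not public.

```python
from dataclasses import dataclass

# Hypothetical common record; TDS's actual internal schema is not public.
@dataclass
class Annotation:
    image_id: str
    class_label: str  # resolved against a shared class ontology
    bbox: tuple[float, float, float, float]  # (x, y, width, height) in pixels
    source: str  # provenance: which agency or stream produced the label

def from_geojson_feature(feature: dict, ontology: dict[str, str]) -> Annotation:
    """Normalize one GeoJSON-style feature into the common record."""
    xmin, ymin, xmax, ymax = feature["properties"]["pixel_bounds"]
    raw_label = feature["properties"]["label"]
    return Annotation(
        image_id=feature["properties"]["image_id"],
        class_label=ontology.get(raw_label, raw_label),  # reconcile naming
        bbox=(xmin, ymin, xmax - xmin, ymax - ymin),     # corners -> x,y,w,h
        source=feature["properties"].get("source", "unknown"),
    )

def to_coco(annotations: list[Annotation], categories: dict[str, int]) -> list[dict]:
    """Export normalized records as COCO-style detection annotations."""
    return [
        {"image_id": a.image_id,
         "category_id": categories[a.class_label],
         "bbox": list(a.bbox)}
        for a in annotations
    ]
```

Each additional source format gets its own small adapter like from_geojson_feature; once everything lives in the common record, exporting to a new training format is a single function like to_coco.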
TDS takes a distinctive approach to managing dataset errors and inconsistencies. All real training datasets contain noise, whether in class labels, positional coordinates, or both. Rather than ignore this fact, TDS is designed from the ground up to encourage collaborative curation of annotations. Taking inspiration from Wikipedia’s philosophy of communal editing, it provides visual tools that greatly expedite the correction (and creation) of data annotations.
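The data model behind that kind of communal editing can be sketched simply: store each annotation as an append-only list of revisions, so that corrections are attributable and reversible rather than destructive. The sketch below uses hypothetical field names and is illustrative only; TDS’s real data model is not public.

```python
import datetime
from dataclasses import dataclass, field

# Hypothetical sketch of Wikipedia-style annotation history: edits are
# appended as revisions rather than overwriting the record, so every
# correction is attributable and reversible.
@dataclass
class Revision:
    author: str
    timestamp: datetime.datetime
    class_label: str
    bbox: tuple[float, float, float, float]  # (x, y, width, height)
    comment: str = ""

@dataclass
class CuratedAnnotation:
    annotation_id: str
    revisions: list[Revision] = field(default_factory=list)

    def current(self) -> Revision:
        """The live annotation is simply the most recent revision."""
        return self.revisions[-1]

    def edit(self, author: str, class_label: str,
             bbox: tuple[float, float, float, float], comment: str = "") -> None:
        """Append a correction instead of destroying the prior state."""
        self.revisions.append(Revision(
            author=author,
            timestamp=datetime.datetime.now(datetime.timezone.utc),
            class_label=class_label,
            bbox=bbox,
            comment=comment,
        ))

    def revert(self, author: str) -> None:
        """Roll back to the previous revision (assumes at least two exist)."""
        prior = self.revisions[-2]
        self.edit(author, prior.class_label, prior.bbox, comment="revert")
```

In this scheme, an analyst who spots a mislabeled bounding box calls edit(), a reviewer who disagrees calls revert(), and the full history remains available for audit.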
For “few-shot” missions in which labeled real data is in short supply, TDS also offers synthetic data generation. These tools leverage high-fidelity simulation engines to generate large quantities of synthetic data using the principle of “domain randomization.” As a bonus, because the labels are derived directly from the simulator’s scene geometry rather than human markup, these datasets are guaranteed to have pixel-perfect annotations. Such synthetic datasets can be very effective at pre-training an algorithm for a high-priority few-shot mission.
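Domain randomization is simple to illustrate: sample the nuisance parameters of each rendered scene (lighting, viewpoint, object pose, clutter, sensor noise) from deliberately broad distributions, so that a model pre-trained on the renders cannot latch onto any single appearance. The parameter names and ranges below are invented for illustration; they are not TDS’s actual simulation configuration.

```python
import random

def sample_scene_params(rng: random.Random) -> dict:
    """Draw one randomized scene configuration for a simulation engine.

    Every parameter here is a hypothetical example of a nuisance factor
    worth randomizing; real choices depend on the sensor and the mission.
    """
    return {
        "sun_elevation_deg": rng.uniform(10.0, 80.0),     # lighting
        "sensor_altitude_m": rng.uniform(500.0, 5000.0),  # viewpoint / resolution
        "look_angle_deg": rng.uniform(0.0, 45.0),
        "target_yaw_deg": rng.uniform(0.0, 360.0),        # object pose
        "ground_texture": rng.choice(["desert", "urban", "forest", "snow"]),
        "clutter_count": rng.randint(0, 20),              # distractor objects
        "sensor_noise_sigma": rng.uniform(0.0, 0.05),     # additive pixel noise
    }

rng = random.Random(42)  # fixed seed keeps the synthetic dataset reproducible
scenes = [sample_scene_params(rng) for _ in range(10_000)]
# Each parameter set is handed to the renderer, which produces the image and
# derives pixel-perfect bounding boxes directly from the scene geometry.
```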
While the early focus of TDS was on GEOINT data to feed computer vision algorithms, its ambitions have broadened considerably to handle new types of data. For example, we recently added support for storing vehicle track data (intended for pattern-recognition tasks) and will soon integrate SIGINT (signals intelligence) annotations. We have deployed TDS to a number of different networks and classification levels, leveraging the unique sensor data available on each.
Today, TDS boasts tens of millions of labels rooted in authentic data, a robust class ontology, and a REST API primed to dovetail with community automation tools.
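For a concrete sense of what that integration might look like, here is a purely hypothetical Python client sketch; the endpoint path, parameters, and response fields are invented for illustration and are not the published TDS API.

```python
import requests  # third-party HTTP library: pip install requests

# Hypothetical base URL and endpoint; the real TDS API is not public.
BASE_URL = "https://tds.example.mil/api/v1"

def export_dataset(class_names: list[str], bbox_format: str = "coco") -> str:
    """Search for annotations by ontology class and kick off an export job."""
    resp = requests.post(
        f"{BASE_URL}/datasets/export",
        json={"classes": class_names, "format": bbox_format},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]  # poll this job until the archive is ready

job_id = export_dataset(["ship", "aircraft"])  # illustrative class names
```

At Expedition Technology, our solutions don’t merely challenge norms; they reinvent them. Embarking on a career here signifies more than joining a company: it is a commitment to a movement poised to shape the forthcoming epoch for our nation. If spearheading innovation ignites your passion, we invite you to be part of our transformative journey.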