LGND Raises $9M Seed to Build 'ChatGPT for the Earth' with Geospatial Vector Embeddings

The Earth is awash in data about itself. Every single day, satellites orbiting our planet capture an estimated 100 terabytes of imagery, alongside countless other forms of sensor data, environmental readings, and geographic information. This deluge of data holds immense potential for understanding our world, monitoring climate change, managing resources, planning urban development, and much more. However, extracting meaningful insights from this massive, complex, and often unstructured geospatial data is a significant challenge.

Seemingly simple questions about the physical world can be fiendishly complex to answer using traditional data processing methods. Consider a question of vital economic and safety importance to regions like California: How many effective fire breaks exist across the state, and how have these critical barriers changed since the last wildfire season? Answering this requires analyzing vast stretches of land, identifying features like roads, rivers, lakes, or cleared areas, and assessing their characteristics (width, surrounding vegetation, etc.) to determine their potential effectiveness against a wildfire.

The Challenge of Traditional Geospatial Data Analysis

Historically, analyzing geographic data, especially from imagery, was a labor-intensive process. Nathaniel Manning, co-founder and CEO of LGND, explains the traditional approach: “Originally, you’d have a person look at pictures. And that only scales so far.” Manual inspection is slow, expensive, and impractical for analyzing data covering vast areas or changing rapidly over time.

More recently, advancements in artificial intelligence, particularly neural networks and machine learning, have offered a more scalable solution. Machine learning experts and data scientists can train algorithms to identify specific features, like fire breaks, within satellite imagery. This involves curating large, labeled datasets of images showing examples of the features they want to detect and then training a model to recognize patterns associated with those features.

While this machine learning approach is more efficient than purely manual methods, it still presents significant hurdles. Training highly accurate models for specific tasks requires substantial resources. Manning notes the cost: “You probably sink, you know, couple hundred thousand dollars — if not multiple hundred thousand dollars — to try to create that data set, and it would only be able to do that one thing.” Each new type of feature or query often requires a new dataset and a new training effort, making it costly and time-consuming to build comprehensive analytical capabilities.

Furthermore, geospatial data is typically stored as pixels (in imagery) or as traditional vector geometry (points, lines, and areas with associated attributes). These formats are standard and flexible, but getting an AI model to interpret the relationships and meaning within them requires significant computational effort and specialized expertise. Connecting disparate pieces of geographic information – say, finding all roads wider than 10 meters that are within 50 meters of a forest edge and haven't been overgrown in the last year – involves spatial queries that are computationally expensive and difficult to formulate.
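To give a sense of what such a hand-built query involves, here is a minimal sketch of the road example in plain Python. It is an illustration, not anything LGND or a GIS package ships: the function names, the dictionary layout, and the toy coordinates (planar, in meters) are all assumptions, and a real workflow would run against a spatial database or geometry library.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates, meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the projection to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_edge(p, edge):
    """Smallest distance from p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(edge, edge[1:]))

def candidate_roads(roads, forest_edge, min_width_m=10.0, max_dist_m=50.0):
    """Names of roads that are wide enough, not overgrown, and near the forest edge."""
    hits = []
    for road in roads:
        if road["width_m"] <= min_width_m or road["overgrown"]:
            continue
        # A road qualifies if any sampled centerline point is close enough.
        if any(distance_to_edge(p, forest_edge) <= max_dist_m for p in road["points"]):
            hits.append(road["name"])
    return hits

# Hypothetical toy data: a straight forest edge and four roads.
forest_edge = [(0.0, 0.0), (1000.0, 0.0)]
roads = [
    {"name": "ridge_road", "width_m": 12.0, "overgrown": False,
     "points": [(100.0, 30.0), (200.0, 40.0)]},
    {"name": "narrow_track", "width_m": 4.0, "overgrown": False,
     "points": [(300.0, 10.0)]},
    {"name": "far_highway", "width_m": 20.0, "overgrown": False,
     "points": [(500.0, 400.0)]},
    {"name": "old_cut", "width_m": 15.0, "overgrown": True,
     "points": [(700.0, 20.0)]},
]

print(candidate_roads(roads, forest_edge))  # prints ['ridge_road']
```

Even in this toy form, every rule (width, distance, overgrowth) has to be spelled out by hand and backed by its own data layer, which is the burden the article describes.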

Introducing Geospatial Vector Embeddings

LGND aims to fundamentally change this paradigm by making geospatial data dramatically more accessible and usable for AI. Their core product revolves around creating vector embeddings of geographic data. This approach draws inspiration from the success of embeddings in other AI domains, particularly natural language processing (NLP), where words or phrases are converted into numerical vectors that capture their semantic meaning and relationships.

In the geospatial context, LGND's vector embeddings summarize spatial data in a high-dimensional numerical format. These embeddings are designed to capture the essential characteristics and relationships of geographic features and areas, regardless of their original format (imagery, sensor data, traditional vectors, etc.). The goal is to represent complex spatial information in a dense, numerical form that AI models can process much more efficiently than raw pixels or complex traditional vector structures.

Bruno Sánchez-Andrade Nuño, LGND’s co-founder and chief scientist, highlights the efficiency gains: “Embeddings get you 90% of all the undifferentiated compute up front.” Instead of performing extensive, task-specific computation every time a query is made or a model needs to analyze data, the heavy lifting of understanding and summarizing the spatial information is done once when the embeddings are created. “Embeddings are the universal, super-short summaries that embody 90% of the computation you have to do anyways,” he adds.

This means that once the geospatial data is converted into these vector embeddings, subsequent analysis, querying, and model training become significantly faster and less computationally expensive. AI models can work directly with these numerical summaries, which are designed to highlight relevant features and relationships, rather than having to interpret raw, high-resolution data from scratch for each task.
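The "compute once, query many times" idea can be sketched in a few lines. This is a hedged illustration only: `embed_tile` is a made-up stand-in that reduces a tile to three summary statistics, whereas a real geospatial embedding would come from a learned model and have far more dimensions. The tile names and brightness values are invented.

```python
import math

def embed_tile(pixels):
    """Stand-in encoder (assumption): summarize a tile of 0..1 brightness
    values as [mean, standard deviation, fraction of bright pixels]."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in pixels) / n)
    bright = sum(1 for v in pixels if v > 0.5) / n
    return [mean, std, bright]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_index(tiles):
    """Run the encoder once per tile and keep only the short summaries."""
    return {name: embed_tile(pixels) for name, pixels in tiles.items()}

def rank(index, query_embedding):
    """Tile names ordered from most to least similar; no raw pixels are read."""
    return sorted(
        index,
        key=lambda name: cosine_similarity(index[name], query_embedding),
        reverse=True,
    )

# Hypothetical archive of raw tiles (tiny lists standing in for imagery).
tiles = {
    "bare_strip": [0.9, 0.8, 0.9, 0.85],
    "gravel_road": [0.8, 0.9, 0.7, 0.9],
    "forest": [0.1, 0.2, 0.15, 0.1],
}

index = build_index(tiles)  # the up-front cost, paid once

# Two different questions reuse the same index.
bare_like = embed_tile([0.85, 0.9, 0.8, 0.9])
forest_like = embed_tile([0.12, 0.18, 0.1, 0.15])
print(rank(index, bare_like))
print(rank(index, forest_like))
```

The archive is encoded a single time in `build_index`; each later question only compares short vectors, which is where the claimed savings would come from.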

Applying Embeddings to Real-World Problems

Let's revisit the fire break example. Fire breaks can manifest in various forms on a map or in satellite imagery – they might be paved roads, unpaved tracks, natural riverbeds, lakeshores, or intentionally cleared strips of land. Each of these looks different visually, but they share common characteristics relevant to their function as a fire break: a lack of flammable vegetation within a certain width, and a minimum width requirement that might depend on surrounding conditions.

Traditionally, identifying all potential fire breaks would require separate models or complex rule sets for each type of feature (roads, rivers, cleared land) and then combining that information, while also considering attributes like width and vegetation density derived from other data layers. With geospatial embeddings, LGND can create numerical representations of different areas that capture these underlying characteristics. The embeddings for a road, a river, or a cleared strip, if they function as a fire break, would share similarities in the embedding space that distinguish them from areas with dense, flammable vegetation. This makes it much easier and faster to query the embedding space to find all locations that match the criteria for a fire break, regardless of the specific surface type.

This approach allows for more flexible and nuanced queries. Instead of just asking “Where are the roads?” or “Where are the rivers?”, users can ask questions based on the functional characteristics captured by the embeddings, such as “Where are all the linear features wider than X meters with minimal vegetation?” This shifts the focus from identifying specific object types to identifying areas with desired spatial properties and relationships.
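One way to picture a query by function rather than by object type is search-by-example: average the embeddings of a few known fire breaks and look for anything nearby in embedding space. The sketch below assumes this approach for illustration; it is not a description of LGND's product. The four-dimensional vectors, the location names, and the 0.95 threshold are all invented, and real embeddings would be learned and much larger.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def search(index, query, threshold):
    """(name, score) pairs whose cosine similarity to the query meets the
    threshold, best match first."""
    q = normalize(query)
    scored = [
        (name, sum(a * b for a, b in zip(normalize(vec), q)))
        for name, vec in index.items()
    ]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Invented embeddings for five locations (assumption: hand-made values).
index = {
    "paved_road_17": [0.9, 0.1, 0.8, 0.1],
    "dry_riverbed_3": [0.8, 0.2, 0.9, 0.2],
    "cleared_strip_8": [0.85, 0.05, 0.7, 0.15],
    "dense_forest_42": [0.1, 0.9, 0.1, 0.8],
    "chaparral_5": [0.2, 0.8, 0.2, 0.9],
}

# Two locations an analyst has already confirmed as fire breaks.
examples = [index["paved_road_17"], index["dry_riverbed_3"]]
query = centroid(examples)

matches = search(index, query, threshold=0.95)
print([name for name, _ in matches])
```

In this toy data the cleared strip is returned alongside the road and the riverbed even though it was never given as an example, while the two vegetated locations fall well below the threshold.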

Building the Platform and Vision

LGND is building both an enterprise application and an API to deliver their geospatial embedding capabilities. The enterprise app is designed to help large companies and organizations answer complex questions involving spatial data without needing deep expertise in geospatial analysis or machine learning. The API allows users with more specific or technical needs to integrate LGND's embedding service directly into their own workflows and applications.

Manning envisions LGND's embeddings enabling entirely new ways for companies and individuals to interact with and query geospatial data. He uses the example of an AI-powered travel agent to illustrate the potential. A user might ask for a short-term rental with specific criteria: “find a short-term rental with three rooms that’s close to good snorkeling. But also, I want to be on a white sand beach. I want to know that there’s very little seaweed in February, when we’re going to go, and maybe most importantly, at this time of booking, there’s no construction happening within one kilometer of the house.”

Answering such a complex, multi-faceted query using traditional geospatial methods would be incredibly challenging and time-consuming. It would require accessing and integrating data layers on property locations, room counts, proximity to coastlines, beach type, historical seaweed data, current construction permits, and performing complex spatial analyses (buffering, intersection, etc.). With geospatial embeddings, the relevant characteristics of each location (beach type, proximity to snorkeling, construction activity, etc.) could be encoded into its vector representation. The AI travel agent could then query the embedding space to find locations that match the user's criteria much more efficiently.

This vision extends far beyond travel. It could apply to everything from supply chain logistics (finding optimal routes considering real-time road conditions, weather, and regulatory zones) to environmental monitoring (identifying areas experiencing specific types of land-use change or pollution patterns) to resource exploration and management.

Funding the Future of Earth Data

To bring this vision to fruition, LGND recently secured seed funding. The company told TechCrunch exclusively that it has raised a $9 million seed round. The round was led by Javelin Venture Partners, a venture capital firm known for investing in early-stage technology companies.

Participation in the round came from a diverse group of investors, including AENU, Clocktower Ventures, Coalition Operators, MCJ (My Climate Journey), Overture, Ridgeline, and Space Capital. The involvement of firms like MCJ and Space Capital underscores the potential applications of LGND's technology in climate-related fields and the burgeoning space-based data industry.

A number of prominent angel investors also joined the round, lending further validation to LGND's approach. These include John Hanke, the founder of Keyhole (which became the foundation for Google Earth), Karim Atiyeh, co-founder of Ramp, and Suzanne DiBianca, an executive at Salesforce. The participation of individuals with deep experience in geospatial technology, fintech, and enterprise software highlights the broad potential impact of LGND's platform.

The $9 million in seed funding will provide LGND with the resources needed to further develop their core technology, build out their platform and APIs, and begin engaging with enterprise customers. It will also likely be used to expand their team of engineers and scientists.

Market Potential and Vision

If LGND can successfully deliver a tool that makes complex geospatial data analysis accessible and efficient, it has the potential to capture a significant share of a rapidly growing market. The global geospatial solutions market was valued near $400 billion, according to a report cited in TechCrunch's coverage. This market encompasses everything from GIS software and services to satellite imagery, remote sensing, and location-based services.

By providing a foundational layer – the geospatial embeddings – that can power a wide range of applications and analyses, LGND aims to become a critical piece of infrastructure for anyone working with Earth data. Manning articulated this ambitious goal, stating, “We’re trying to be the Standard Oil for this data.” This comparison suggests a desire to become the dominant provider of the underlying data representation layer that fuels numerous applications and industries built on top of geospatial information.

The vision is not to replace human analysts or existing geospatial tools entirely, but to augment them dramatically. As Sánchez-Andrade Nuño put it, “We’re not looking to replace people doing these things. We’re looking to make them 10 times more efficient, one hundred times more efficient.” By abstracting away the complexity of raw spatial data and providing a readily usable, AI-friendly format, LGND hopes to unlock new possibilities for innovation and analysis across countless domains that rely on understanding the physical world.

The success of large language models like ChatGPT has demonstrated the power of embeddings to make vast amounts of unstructured text data understandable and queryable by AI. LGND is applying a similar principle to the equally vast and complex world of Earth data. If they succeed, they could indeed become a foundational layer, enabling a new generation of AI applications that help us better understand, manage, and interact with our planet.