Site Selection Software – Solving the Data Problem Holding Back the AI Buildout

Four years ago, generative AI didn’t exist as a consumer product. Now, four billion AI prompts are issued across the major LLMs every day. Models and their AI-native applications run on physical hardware housed in data centers across the country and around the globe. The scale of that hardware requirement has grown faster than the existing infrastructure’s ability to support it, and hyperscalers are spending hundreds of billions annually to bring new capacity online. The result is a data center construction boom unlike anything we saw during the growth of cloud software, and there is intense pressure to complete these projects quickly and efficiently.

 

The Problem: Site Selection

Building a new data center can be a three-to-seven-year undertaking. This process moves through a predictable sequence – site selection, design, engineering, construction, and commissioning for operation. Each stage comes with its own complexities, but the one that drives the most uncertainty and can determine the ultimate success of the project is the first: site selection.

Developers (whether wholesale developers, construction firms, or the hyperscalers themselves) must clear a number of independent hurdles, each with different stakeholders, timelines, and approval processes. The data required to navigate these hurdles is scattered across agencies and operators with no central aggregator, so firms are forced to commit considerable capital and time to a project before the full risk assessment is complete.

The dominant constraint is uncertainty around power access, with interconnection queues at major utilities stretching five or more years in the most competitive markets. This is pushing developers toward more rural areas rather than the obvious tier-one markets, and in some cases to supplement grid power with on-site generation (often natural gas or renewables).

Developers must also contend with zoning and regulatory risk as communities tighten restrictions around energy consumption and land use. Water access, particularly in arid markets where cooling demands are difficult to meet, is increasingly important. There are also physical risk criteria that rule out flood zones, seismic hazard areas, and other potential geospatial risks.

Understanding and resolving these bottlenecks is critical to meeting AI demand across the country – data center vacancy rates are now at all-time lows (<2%) and nearly three-quarters of new capacity under construction has already been preleased. An eight-to-sixteen-month site selection process is painful enough, but when nearly 90% of all data center projects face delays of some kind and 10-20% fail altogether, it becomes a structural drag on the economy that grows more costly as AI demand continues to outpace our current supply infrastructure.

 

The Software Layer

Until recently, site selection was a manual process built on relationships and sequential data gathering. A new generation of software is beginning to disrupt that model, aggregating site evaluation data and enabling analysis in days that previously took months. These companies are early but moving quickly, and the category is attracting serious attention from developers and investment from capital allocators.

On the power supply side, platforms are applying AI and grid modeling to map available capacity onto existing infrastructure, identifying power-viable sites before developers commit to land or enter interconnection queues. The traditional approach required months-long grid studies, but these new tools ingest utility filings, transmission records, and interconnection queue data to produce a probability-weighted picture of where capacity exists. This not only shortens the time to the start of construction but also affects the long-term success rate of the project. An example is GridCARE, which uses generative AI and grid modeling to identify hidden capacity on existing utility infrastructure, giving developers a data-driven picture of where power is most accessible before committing to a site.
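To make the idea of a probability-weighted capacity picture concrete, here is a minimal sketch in Python. It is not any vendor's method: the field names (`headroom_mw`, `p_deliverable`), the sample sites, and the simple "usable megawatts times probability" formula are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    name: str
    requested_mw: float   # load the developer wants to connect
    headroom_mw: float    # hypothetical estimate of spare capacity nearby
    p_deliverable: float  # hypothetical probability (0-1) that the capacity is actually deliverable


def expected_deliverable_mw(site: CandidateSite) -> float:
    """Usable megawatts (capped at the request) weighted by the chance of getting them."""
    usable = min(site.requested_mw, site.headroom_mw)
    return usable * site.p_deliverable


def rank_by_power(sites):
    """Order candidate sites from most to least expected deliverable power."""
    return sorted(sites, key=expected_deliverable_mw, reverse=True)


# Illustrative, made-up inputs.
sites = [
    CandidateSite("site_a", requested_mw=100.0, headroom_mw=150.0, p_deliverable=0.40),
    CandidateSite("site_b", requested_mw=100.0, headroom_mw=80.0, p_deliverable=0.90),
    CandidateSite("site_c", requested_mw=100.0, headroom_mw=30.0, p_deliverable=0.95),
]
ranked = rank_by_power(sites)
```

In this toy example the site with the most nominal headroom does not rank first, because its lower probability of delivery pulls its expected capacity below that of a smaller but more certain site.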

Land and property insight platforms are also an important layer. Developers need discrete data on the land parcel they intend to buy – think of ownership records, acreage, purchase price history, zoning status, or colocation lease rates. Platforms such as LandGate consolidate this parcel-level data and produce a composite score for each candidate site, giving land teams a structured, comparable view across their pipeline. Again, what previously took weeks or months can now take days or hours and result in lower failure rates.
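One simple way a composite score could be built is a weighted sum of normalized parcel attributes. The sketch below assumes that approach; the feature names, weights, and sample parcels are hypothetical and do not reflect how LandGate or any other platform actually scores sites.

```python
def min_max(values):
    """Rescale a list of numbers to the 0-1 range (0.5 for all if they are identical)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(parcels, weights):
    """Score each parcel from 0 to 1.

    parcels: list of dicts keyed by feature name.
    weights: {feature: weight}; a negative weight means "lower is better".
    """
    columns = {f: min_max([p[f] for p in parcels]) for f in weights}
    total_weight = sum(abs(w) for w in weights.values())
    scores = []
    for i in range(len(parcels)):
        score = 0.0
        for feature, weight in weights.items():
            x = columns[feature][i]
            if weight < 0:
                x = 1.0 - x  # invert so that a low raw value earns a high score
            score += abs(weight) * x
        scores.append(score / total_weight)
    return scores


# Illustrative, made-up parcels and weights.
parcels = [
    {"name": "parcel_a", "acreage": 200, "price_per_acre": 10000, "zoned_industrial": 1},
    {"name": "parcel_b", "acreage": 100, "price_per_acre": 5000, "zoned_industrial": 0},
    {"name": "parcel_c", "acreage": 150, "price_per_acre": 20000, "zoned_industrial": 1},
]
weights = {"acreage": 0.3, "price_per_acre": -0.3, "zoned_industrial": 0.4}
scores = composite_scores(parcels, weights)
```

Because every feature is rescaled before weighting, parcels with very different acreage or pricing land on the same 0-to-1 scale, which is what makes a pipeline of sites comparable at a glance.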

Then there is the issue of local zoning and community approval. Permitting requirements vary by district and are often opaque until a project is already underway, meaning unforeseen issues sometimes arise after a site has been selected and construction has begun. By that point, relocating the project is not a viable option, so either the already lengthy construction timeline is extended even further or the project is killed entirely. Software that maps these requirements before a site is selected is a game changer, and companies like Pulley and PermitFlow are helping address this problem now.

The same logic applies to geospatial and environmental constraint screening. Data centers must avoid flood zones, seismic risk areas, protected habitats, and wetlands, while also satisfying environmental impact requirements in each jurisdiction. Historically, this analysis happened late in the evaluation process through ad hoc consultant work, often surfacing disqualifying constraints that killed projects after significant time and capital had already been committed to a site. Platforms like Transect and PVcase automate this screening and bring it to the front of the funnel, allowing developers to focus resources on the most viable locations.
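As a rough illustration of front-of-funnel screening, the sketch below filters candidate sites on disqualifying flags before any deeper diligence. The flag names and site records are hypothetical, and a real platform would derive them from geospatial data layers rather than hand-entered booleans.

```python
# Hypothetical disqualifying flags, mirroring the constraints named above.
DISQUALIFIERS = (
    "in_flood_zone",
    "in_seismic_hazard_area",
    "overlaps_protected_habitat",
    "overlaps_wetland",
)


def screen(site):
    """Return the list of disqualifying flags that are set for a site.

    Assumption: a missing flag is treated as "not flagged". A stricter
    pipeline might instead treat missing data as "needs review".
    """
    return [flag for flag in DISQUALIFIERS if site.get(flag, False)]


def split_viable(sites):
    """Separate sites that pass screening from those rejected, with reasons."""
    viable, rejected = [], {}
    for site in sites:
        reasons = screen(site)
        if reasons:
            rejected[site["name"]] = reasons
        else:
            viable.append(site["name"])
    return viable, rejected


# Illustrative, made-up candidate sites.
candidates = [
    {"name": "site_a", "in_flood_zone": False, "overlaps_wetland": False},
    {"name": "site_b", "in_flood_zone": True, "overlaps_wetland": True},
    {"name": "site_c", "in_seismic_hazard_area": True},
]
viable, rejected = split_viable(candidates)
```

Recording the reasons alongside each rejection keeps the screen auditable, so a land team can see why a site was dropped rather than only that it was.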

A common thread across these tools is that they accelerate discovery by surfacing disqualifying information earlier, before meaningful investment is made. This matters because the most expensive outcome in data center development isn’t always a slow project; it’s one that fails late. Environmental remediation costs that weren’t in the model, longer-than-expected interconnection queues, and permitting delays from local governments can all lead to failure. Paces is built around this core value prop – the ability to identify and move on from a bad site early is a critical advantage.

 

What We Are Looking For

At Catalyst, we believe the companies described above are each solving a distinct and important problem in a market that is still in the early stages of being defined. Winning companies in site selection software will continue to expand their product capabilities across other areas of site selection and power access, building breadth across the workflow and depth in proprietary data that together make the solution defensible long-term. As those moats grow, we expect advanced platforms to move toward agentic solutions that partially automate the decision-making process itself. The infrastructure buildout driving this opportunity is not slowing down, and the software layer that supports it is only beginning to take shape.