The Problem: 200 Billion Rows of Invisible Data

For more than a decade after NASA's NEOWISE telescope began its reactivated mission, one of the most comprehensive infrared surveys of the entire sky sat largely untapped in institutional archives. While the primary asteroid-tracking data was processed, an astronomical treasure trove remained hidden: variable objects like flickering quasars, pulsing stars, and eclipsing binaries buried within 200 billion individual detections.

Davy Kirkpatrick, a Caltech senior research scientist at IPAC (Infrared Processing and Analysis Center), had spent years contemplating this dataset. "At that point, we were creeping up towards 200 billion rows in the table of every single detection that we had made over the course of over a decade," Kirkpatrick explained. The data had grown too large for conventional analysis methods.

His original summer project proposal was modest: analyze a small patch of sky, manually find some variable stars, publish them as proof of concept. Then 17-year-old Matteo Paz from Pasadena High School walked into his laboratory with a different idea entirely.

VARnet: The Model That Runs at 53 Microseconds Per Star

Matteo Paz wasn't your typical high school intern. He had been attending Caltech's public stargazing lectures with his mother since elementary school. By summer 2023, when he joined Kirkpatrick's lab through the university's Summer Research Connection program, he had already completed AP Calculus in eighth grade through Pasadena Unified's accelerated Math Academy and was studying undergraduate-level mathematics.

On his first day, Paz told Kirkpatrick he wanted to publish a paper. Rather than the manual analysis Kirkpatrick had planned, Paz proposed building an AI model to analyze the entire NEOWISE database—all 10.5 years of observations covering the complete sky.

The result was VARnet, a sophisticated three-stage machine learning pipeline designed specifically for astronomical time series data:

The performance metrics, reported in a peer-reviewed paper in The Astronomical Journal, are striking: VARnet processes each astronomical source in less than 53 microseconds on a GPU with 22 gigabytes of VRAM and achieves an F1 score of 0.91 on a validation set of known variable objects. That works out to more than 18,000 sources per second on a single GPU, so the model can work through the entire NEOWISE catalog, a task that would have taken human astronomers years, in a matter of hours.
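For readers who want to check the arithmetic behind those figures, here is a minimal sketch. It takes the 53-microsecond bound from the article and converts it into a throughput and a wall-clock estimate, and it shows how an F1 score is computed from precision and recall. The catalog size N_SOURCES and the precision/recall pair are hypothetical placeholders chosen for illustration; neither is a value reported for NEOWISE or VARnet.

```python
# Back-of-the-envelope check of the quoted throughput.
# The 53-microsecond figure comes from the article; N_SOURCES and the
# precision/recall pair below are hypothetical, for illustration only.

SECONDS_PER_SOURCE = 53e-6  # upper bound quoted in the article


def sources_per_second(seconds_per_source: float = SECONDS_PER_SOURCE) -> float:
    """Sources processed per second at a fixed per-source cost."""
    return 1.0 / seconds_per_source


def hours_to_process(n_sources: int,
                     seconds_per_source: float = SECONDS_PER_SOURCE) -> float:
    """Wall-clock hours for n_sources on one GPU, ignoring I/O overhead."""
    return n_sources * seconds_per_source / 3600.0


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    N_SOURCES = 100_000_000  # hypothetical catalog size, not a NEOWISE number
    print(f"{sources_per_second():,.0f} sources per second")
    print(f"{hours_to_process(N_SOURCES):.2f} hours for {N_SOURCES:,} sources")
    # Hypothetical precision and recall, not values reported for VARnet
    print(f"F1 = {f1_score(0.90, 0.92):.3f}")
```

Because the estimate scales linearly with the number of sources, swapping in a different catalog size changes the hours in direct proportion. Real runs would also pay for data loading and preprocessing, which this sketch deliberately ignores.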

Mentorship: The Tennessee Connection

Kirkpatrick's approach to mentoring Paz was deeply informed by his own experience. When he was growing up in a farming community in Tennessee, his ninth-grade chemistry and physics teacher, Marilyn Morrison, told him and his mother that he had scientific potential and outlined the courses he would need for college. She was, Kirkpatrick said, the reason he became an astronomer.

"I wanted to pass on that same sort of mentoring to someone else and hopefully many someone elses," Kirkpatrick reflected. "If I see their potential, I want to make sure that they are reaching it. I'll do whatever I can to help them out."

That philosophy shaped every decision in the summer project. When Paz proposed the ambitious AI approach, Kirkpatrick didn't dismiss it as too complex for a high school student. Instead, he connected Paz with Caltech researchers Shoubaneh Hemmati, Daniel Masters, Ashish Mahabal, and Matthew Graham—experts in machine learning techniques for astronomy and the analysis of variable objects on different timescales.

Paz acknowledged the impact: "He has allowed an unbridled learning experience. I think that's why I've grown so much as a scientist."

The 1.5 Million Candidates: What They Mean

VARnet flagged 1.5 million potential variable objects in the NEOWISE archive. This figure requires careful interpretation: these aren't 1.5 million confirmed new discoveries in the traditional sense. Each flagged source is a candidate requiring follow-up observation and classification by astronomers.

The breakdown will likely include:

The full catalog is scheduled for publication in 2025. Once released, it will provide the astronomical community with a dataset large enough to support statistical studies of infrared variability across the entire sky—a capability that has never existed before. Instead of the piecemeal analyses that have characterized infrared variable star research, astronomers will be able to study populations, distributions, and rare phenomena at scale.

Beyond Astronomy: Time Series AI at Scale

Paz, now employed at IPAC while finishing high school, sees applications far beyond the night sky. "The model I implemented can be used for other time domain studies in astronomy, and potentially anything else that comes in a temporal format," he explained.

The implications are significant:

The architecture's ability to handle irregularly sampled data, a common challenge in real-world time series, makes it particularly valuable beyond astronomy. Most real-world time series come with missing data points, irregular intervals, or varying measurement quality. VARnet was designed from the ground up to handle these realities.
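To make the irregular-sampling problem concrete, the sketch below evaluates a classical periodogram directly at unevenly spaced timestamps, so no resampling onto a regular grid is needed. This is a generic textbook technique used here purely for illustration; the article does not describe VARnet's internals, and nothing below should be read as its actual method. The function names, the synthetic light curve, and all parameter values are assumptions.

```python
import cmath
import math
import random


def irregular_power_spectrum(times, values, frequencies):
    """Classical periodogram evaluated directly at irregular timestamps.

    Each trial frequency is tested against the actual observation times,
    so gaps and uneven spacing need no interpolation.
    """
    n = len(times)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    powers = []
    for f in frequencies:
        total = sum(c * cmath.exp(-2j * math.pi * f * t)
                    for c, t in zip(centered, times))
        powers.append(abs(total) ** 2 / n)
    return powers


def make_irregular_light_curve(period, n_points=200, span=100.0, seed=0):
    """Synthetic noisy sinusoid sampled at random (irregular) times."""
    rng = random.Random(seed)
    times = sorted(rng.uniform(0.0, span) for _ in range(n_points))
    values = [math.sin(2 * math.pi * t / period) + rng.gauss(0.0, 0.1)
              for t in times]
    return times, values


if __name__ == "__main__":
    # Hypothetical signal with period 4.0, i.e. frequency 0.25
    times, values = make_irregular_light_curve(period=4.0)
    freqs = [0.01 * k for k in range(1, 101)]  # trial frequencies 0.01 to 1.00
    powers = irregular_power_spectrum(times, values, freqs)
    best = freqs[powers.index(max(powers))]
    print(f"strongest trial frequency: {best:.2f} (injected: 0.25)")
```

The direct sum costs one pass over the data per trial frequency, which is fine for a single light curve but is one reason large surveys turn to faster, learned, or approximate methods.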

The Democratization of Scientific Discovery

Perhaps the most profound aspect of this story isn't the technical achievement—though that's impressive—but what it represents for the future of scientific research. A high school student, given access to institutional data, mentorship from established researchers, and computational resources, accomplished what professional astronomers had considered a multi-year institutional project.

This democratization is happening across scientific disciplines:

The collaboration also revealed constraints that institutional researchers hadn't fully appreciated. NEOWISE's observational rhythm—scanning in great circles centered on the Sun—means it cannot systematically detect objects that flash once and fade, or those that change gradually over years. Some classes of variable phenomena will remain invisible to any automated survey based on NEOWISE data alone. Understanding these limitations required both Paz's fresh perspective and the institutional knowledge of the Caltech team.

What Comes Next

The catalog's publication in 2025 will mark the beginning, not the end. Astronomers worldwide will use VARnet's candidate list to:

For Paz personally, the project has opened doors. Now working at IPAC while completing high school, he's already thinking about extending VARnet's capabilities and exploring new applications. Kirkpatrick's mentorship model—treating a talented high school student as a genuine research collaborator rather than an intern to be given simplified tasks—has produced exactly what he hoped: a young scientist reaching their potential and contributing meaningfully to their field.

The Bigger Picture: AI as Scientific Amplifier

This breakthrough exemplifies AI's role in modern science: not replacing human insight but amplifying it. The NEOWISE data existed for over a decade, accessible to any astronomer. What changed wasn't the availability of information but the tools to process it at scale.

Machine learning models like VARnet act as scientific amplifiers, letting researchers ask questions that were previously computationally infeasible. The pattern repeats across disciplines:

In each case, human expertise guides the questions, interprets the results, and provides domain knowledge to shape the models. But AI handles the scale—processing more data, faster, than any human team could manage.

Conclusion: Mapping the Invisible

"I mapped the invisible," is how Paz described his work, a fitting summary for a project that surfaced 1.5 million candidate variable objects hidden in plain sight within publicly available data. The story is about more than astronomy or AI. It's about what becomes possible when we give talented young people access to powerful tools, pair them with generous mentors, and trust them to tackle genuinely difficult problems.

As Kirkpatrick reflected on the experience, he emphasized the importance of recognizing potential: "If I see their potential, I want to make sure that they are reaching it." That philosophy—combined with open data, accessible computational resources, and machine learning frameworks anyone can use—is democratizing scientific discovery in ways that would have seemed impossible a generation ago.

The 1.5 million candidate variable objects are just the beginning. VARnet's real legacy may be demonstrating that the next major scientific breakthrough could come from anywhere—including a high school student in Pasadena with access to NASA data and an AI model running at 53 microseconds per star.