The Yhat Blog

machine learning, data science, engineering

Analyzing the conditions for studying stars

by Philipp Plewa

Astronomical Weather

Even a cloudless sky can mean bad weather for one of the world's biggest optical telescopes. The reason is turbulence in Earth's atmosphere, which constantly changes the optical properties of different layers of air. As a result stars appear to twinkle and any deep images of the night sky get blurred.

For astronomers it is extremely useful to know the current observing conditions, not least because prime observing time is precious and optimal scheduling is critical. At the Paranal Observatory, home to the Very Large Telescope (VLT), conditions are recorded every minute during regular operations.

Image: ESO / S. Brunier

We will explore the historical observing conditions at the VLT over the past 16 years and answer a lingering question in the minds of telescope operators: What is the probability that good conditions will last?

The Python packages we need are the high-level toolkits pandas for data analysis and seaborn for data visualization.

# Python 3.5.2
import pandas as pd # 0.18.1
import seaborn as sns # 0.7.1

The ESO Ambient Conditions Database

A database of ambient observing conditions at the VLT is available from the European Southern Observatory (ESO). There is also a neat online monitor. The various parameters are measured by a small auxiliary telescope called a 'Differential Image Motion Monitor' (DIMM). It works by keeping track of a bright reference star that is imaged along two different lines of sight through the atmosphere.

Let's import a dump of this database containing all DIMM measurements made between January 1999 and December 2015 (3 million rows).

data = pd.read_csv("dimm_paranal.csv", usecols=[
        "raw_datetime",  # timestamp (UT)
        "raw_seeing",    # seeing (arcsec = 1/3600 degree)
        "raw_tau"])      # coherence time (seconds)

# discard incomplete or bad data
data = data[(data["raw_seeing"] > 0) & (data["raw_tau"] > 0)]

data.index = pd.DatetimeIndex(pd.to_datetime(data["raw_datetime"]), tz="UTC")
data["localtime"] = data.index.tz_convert("America/Santiago")

data[["raw_seeing", "raw_tau", "localtime"]].tail(3)
                           raw_seeing   raw_tau                 localtime
2015-12-31 08:59:34+00:00        1.45  0.001967 2015-12-31 05:59:34-03:00
2015-12-31 09:00:39+00:00        1.62  0.001965 2015-12-31 06:00:39-03:00
2015-12-31 09:01:45+00:00        1.52  0.001762 2015-12-31 06:01:45-03:00

Seeing and Coherence Time

The most important astronomical weather parameter is the seeing. It is measured as the apparent (angular) size of a point-like star on the sky and is a direct indicator of the 'blurriness' of the atmosphere. As a general rule, the smaller the seeing the better.
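For intuition, the seeing is commonly related to the Fried parameter r0 (the coherence length of the turbulent atmosphere) by the standard approximation FWHM ≈ 0.98 λ/r0, in radians. A minimal sketch of that relation (the function name and default wavelength are illustrative choices, not from the DIMM data):

```python
import math

def seeing_arcsec(r0_m, wavelength_m=500e-9):
    """Approximate seeing FWHM from the Fried parameter r0,
    using the standard relation FWHM ~ 0.98 * lambda / r0 (radians)."""
    return math.degrees(0.98 * wavelength_m / r0_m) * 3600
```

At a wavelength of 500 nm, r0 = 10 cm corresponds to a seeing of about 1 arcsecond, which matches the typical value quoted for Paranal below.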

The VLT is located at an exceptional observing site in Chile that lies isolated on a high and dry mountaintop in the Atacama desert. A typical seeing of around 1 arcsecond makes it one of the best sites in the world for stargazing.

data["seeing"] = data["raw_seeing"] # (arcsec)


Yet the monthly seeing statistics show that there are significant changes in the average seeing over the course of a year and also on longer timescales. The frequency of good or excellent observing conditions can thus be expected to change as well.

sns.boxplot(x=data["localtime"].apply(lambda d: (d.year, d.month)), y=data["seeing"])
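The same seasonal trend can also be summarized numerically with a monthly resample. A sketch on a synthetic stand-in for the seeing series (the `demo` frame and its lognormal values are hypothetical, only the resampling pattern carries over to the real data):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the DIMM seeing series, indexed by UTC timestamp
# like the real data; the values themselves are made up.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=2000, freq="6h", tz="UTC")
demo = pd.DataFrame({"seeing": rng.lognormal(0.0, 0.3, len(idx))}, index=idx)

# Median seeing per calendar month: a quick numeric view of the trend.
monthly_median = demo["seeing"].resample("M").median()
```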

Another important parameter is the coherence time. It is effectively the time it takes one turbulent pocket of air to pass over the telescope. A longer coherence time implies a more stable atmosphere, which is advantageous.

data["tau"] = data["raw_tau"]*1000 # (milliseconds)

A good coherence time of a few milliseconds usually occurs together with good seeing, but it is particularly relevant for technology that functions on similar timescales, such as adaptive optics systems (with or without a laser guide star, as in the image above) or 'fringe-tracking' interferometers.
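As a rule of thumb, the coherence time follows the Greenwood relation τ0 ≈ 0.314 r0 / v, where v is an effective wind speed. A sketch of that relation (the function name and the default wind speed are illustrative assumptions):

```python
def coherence_time_ms(r0_m, wind_speed_ms=10.0):
    """Approximate coherence time in milliseconds from the Fried
    parameter r0, using tau0 ~ 0.314 * r0 / v (Greenwood)."""
    return 0.314 * r0_m / wind_speed_ms * 1e3
```

With r0 = 10 cm and a 10 m/s wind, this gives a coherence time of about 3 ms, consistent with the 'good' values of a few milliseconds mentioned above.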

data.plot.hexbin(x="seeing", y="tau")

The VLT Weather History

To analyze the weather history in more detail, let's start by writing a function that classifies a single observing night as either 'good' or 'bad' based on the ambient conditions.

The is_good function first checks each measurement in data against the limits on the seeing (max_seeing) and the coherence time (min_tau). It then calculates the time differences between successive measurements and finds the longest uninterrupted interval during which one could have observed within these limits. Finally, this value is compared to a required duration.

def is_good(timestamp, data, max_seeing, min_tau, duration):
    # flag measurements that satisfy both limits
    flag = (data["seeing"] < max_seeing) & (data["tau"] > min_tau)
    # time between successive measurements, in hours
    delta = data["localtime"].diff().fillna(pd.Timedelta(0))
    delta /= pd.Timedelta(hours=1)
    # label contiguous runs of identical flag values
    block = (flag != flag.shift().fillna(False)).cumsum()
    # longest uninterrupted stretch within the limits vs. required duration
    good = (delta*flag).groupby(block).sum().max() >= duration
    return good, timestamp
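The run-length bookkeeping inside is_good (flagging, blocking contiguous runs, then taking the longest good run) can be checked on a toy series, here with measurements a fixed 0.5 hours apart:

```python
import pandas as pd

# Toy example: seven measurements 0.5 h apart, with a
# three-measurement "good" stretch at the end.
flag = pd.Series([False, True, True, False, True, True, True])
delta = pd.Series([0.5] * len(flag))

# label contiguous runs of identical flag values
block = (flag != flag.shift().fillna(False)).cumsum()
# longest uninterrupted good stretch, in hours
longest = (delta * flag).groupby(block).sum().max()
```

The final good run covers three measurements, so `longest` comes out as 1.5 hours.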

Now it is straightforward to determine for each month a probability of being able to observe in certain conditions by counting the number of 'good' nights in that month.

def probability(freq="1M", *args, **kwargs):
    # classify every noon-to-noon "night" that contains measurements
    nights = pd.Series(*zip(*[is_good(*group, *args, **kwargs) for group
        in data.groupby(pd.TimeGrouper("24H", base=12, key="localtime"))
        if len(group[1]) > 0]))
    # fraction of good nights per period
    return nights.groupby(pd.TimeGrouper(freq)) \
        .apply(lambda group: group.mean())
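A note on portability: pd.TimeGrouper was removed in later pandas releases. The same noon-to-noon night binning can be sketched with pd.Grouper, where the offset argument replaces the old base (the timestamps here are synthetic and illustrative only):

```python
import pandas as pd

# Two synthetic nights of hourly timestamps in local Chilean time.
idx = pd.date_range("2015-01-01 18:00", periods=48, freq="1h",
                    tz="America/Santiago")
demo = pd.DataFrame({"localtime": idx, "seeing": 1.0})

# Noon-to-noon bins: in newer pandas, `offset="12h"` plays the role
# of the old `base=12` argument to TimeGrouper.
nights = demo.groupby(pd.Grouper(freq="24h", key="localtime", offset="12h"))
```

The 48 hourly timestamps span three noon-to-noon bins, so this grouping yields three 'nights'.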

Because not every observing program demands the best ambient conditions, it is practical to consider three different seeing limits: 'fair' (1.2 arcsec), 'good' (0.8 arcsec) and 'excellent' (0.6 arcsec). A reasonable minimum coherence time is 2 ms, assuming one observing block lasts 1 hour (following the paper by Mérand et al. 2014).

result = {}
for condition, seeing in (("fair", 1.2), ("good", 0.8), ("excellent", 0.6)):
    result[condition] = probability(max_seeing=seeing, min_tau=2, duration=1)
result = pd.DataFrame(result)



The influence of the seasons on the observing conditions stands out at first glance. For example, the probability of having 'fair' conditions has recently been as small as 40%-60% in the winter semester (April to September), but still larger than 80% in the summer semester (October to March). However, 'excellent' conditions have been somewhat more likely in the winter months.

More surprising, though, is that since 2006 the probability of 'excellent' conditions has never exceeded 20% for any extended period, and since 2010 it has stayed below roughly 10%. The earlier years must have had much better weather overall, with the exception of 2002. The most recent El Niño event, one of the strongest on record, presumably caused the latest stretch of poor conditions starting in 2015.

Parting Thoughts

For further analysis it would be interesting to also have comparison statistics from other observatories.

In any case, it looks like astronomers should be prepared for more bad weather to come.
