James Thinks

writing is a kind of thinking

My last post about analysing my sleep data had plenty of caveats, but despite my caution I started to wonder whether I was taking an interest in the right variables.

I'm aiming to sleep better for the sake of my health and to feel more alert during the day. My first thought was to find out what influences how many hours I sleep each night. The figure I used was a guesstimate based on roughly when I fell asleep and woke up, minus any trips to the bathroom or time spent staring at the ceiling in frustration. Then I'd compare this to various lifestyle measures such as how much I'd eaten, exercise, screen time, etc. to see what, if anything, correlated with a long sleep. Despite buying a gadget to help measure it, I'm not sure I have a more accurate measure of sleep quality, so approximate time asleep is what I tried.

However, I've realised that there are several ways in which "Hours that night", as I call it, might not be the most useful measure. For example, there are times when I can't get a full night's sleep no matter how well prepared my body is for it. Sometimes I have to get up early for work, to go on holiday or because I have an audax that starts at 6am. Occasionally my daughter is ill and will wake me up several times. These things are thankfully rare, but could skew the results. I could simply delete any results where my maximum possible sleep was less than six hours, but that would still leave the less extreme cases in the data.

I also recorded the maximum possible hours I could get each night. In my spreadsheet I subtracted the "Hours that night" from this to get "Missed sleep", thinking that would be a better measure. On the other hand, if I can only get three hours maximum and I miss none, is that really better than having a Saturday lie-in for up to nine hours, but only sleeping for eight, meaning missed sleep is one hour? Who knows how many hours I might have got if I'd tried to sleep for more than three hours?
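
To make that comparison concrete, here's a rough sketch of the plain "Missed sleep" subtraction applied to the two nights described above. The numbers are the hypothetical ones from the paragraph, not rows from the real spreadsheet, and the helper name missed_sleep is made up for illustration.

```python
def missed_sleep(max_possible_hrs, hours_that_night):
    """Plain difference between the sleep that was possible and the sleep obtained."""
    return max_possible_hrs - hours_that_night

# Two illustrative nights taken from the paragraph above (not recorded data).
short_night = {"Max possible (hrs)": 3.0, "Hours that night": 3.0}
lie_in = {"Max possible (hrs)": 9.0, "Hours that night": 8.0}

short_night_missed = missed_sleep(
    short_night["Max possible (hrs)"], short_night["Hours that night"]
)
lie_in_missed = missed_sleep(
    lie_in["Max possible (hrs)"], lie_in["Hours that night"]
)

print(short_night_missed)  # 0.0
print(lie_in_missed)       # 1.0
```

By this measure the three-hour night scores better than the eight-hour lie-in, which is exactly the problem.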

So I tried working out some kind of scaling adjustment, so that "missing" one hour out of a possible nine gives a better score than missing one hour out of a possible seven. I could ignore anything over eight hours as most people are unlikely to sleep that long unless they've missed out on sleep the night before. But that makes a hard cut-off, which feels wrong.

So I've come up with a simple scaling algorithm which looks like this:-

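def missed_sleep_scaled(row):
    # target_sleep and hours_noise_threshold are settings defined elsewhere in the script.
    # Cap the achievable sleep at the target, so a long lie-in doesn't inflate the denominator.
    useful_max = min(target_sleep, row['Max possible (hrs)'])
    if useful_max == 0:
        # No sleep was possible, so the score is undefined; -1 marks the result as invalid.
        return -1
    # Hours missed relative to the capped maximum, ignoring any sleep beyond the target.
    useful_missed_sleep = useful_max - min(row['Hours that night'], target_sleep)
    # Treat small shortfalls as measurement noise.
    if useful_missed_sleep <= hours_noise_threshold:
        useful_missed_sleep = 0.0
    # Scale to a score from 0 (nothing missed) to 10 (everything missed).
    return 10.0 * useful_missed_sleep / useful_max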
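
To see how the function behaves, here's a small self-contained sketch that runs it over a few nights. The function is repeated in compact form so the example runs on its own. The two settings are assumptions: the values of target_sleep and hours_noise_threshold aren't stated above, so 8.0 and 0.25 are placeholders, and the nights are made up for illustration.

```python
# ASSUMED settings: the real values are not given in the post.
target_sleep = 8.0            # hours regarded as a full night (assumption)
hours_noise_threshold = 0.25  # shortfalls at or below this count as noise (assumption)

def missed_sleep_scaled(row):
    # Same logic as the function above, repeated so this sketch is runnable alone.
    useful_max = min(target_sleep, row['Max possible (hrs)'])
    if useful_max == 0:
        return -1  # no sleep was possible, so the result is invalid
    useful_missed_sleep = useful_max - min(row['Hours that night'], target_sleep)
    if useful_missed_sleep <= hours_noise_threshold:
        useful_missed_sleep = 0.0
    return 10.0 * useful_missed_sleep / useful_max

# Made-up nights; plain dicts stand in for spreadsheet rows.
nights = [
    {'Max possible (hrs)': 9.0, 'Hours that night': 8.0},  # lie-in, one hour "missed"
    {'Max possible (hrs)': 7.0, 'Hours that night': 6.0},  # one hour missed out of seven
    {'Max possible (hrs)': 3.0, 'Hours that night': 3.0},  # very short night, none missed
    {'Max possible (hrs)': 0.0, 'Hours that night': 0.0},  # no sleep possible
]

scores = [missed_sleep_scaled(night) for night in nights]
for night, score in zip(nights, scores):
    print(night, round(score, 2))
```

With these assumed settings, the lie-in scores 0 because the target of eight hours was met, while missing one hour out of a possible seven scores about 1.43. The three-hour night still scores 0, so very short nights aren't penalised by this formula.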

This "sleep score" correlates less strongly with "Max possible (hrs)" than "missed sleep" did (0.104 vs 0.198). That seems like a step in the right direction. I'm uncertain about whether I should tweak it until it doesn't correlate with "Max possible (hrs)" at all.
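
For anyone wanting to run the same kind of check, here's a minimal sketch of a correlation calculation. It assumes Pearson's correlation coefficient, which is a guess, since the method behind the figures above isn't stated. The data is made up, so the result won't match 0.104 or 0.198, and pearson_r is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values purely for illustration, not real sleep data.
max_possible = [9.0, 7.0, 8.0, 6.5, 8.5]
missed = [1.0, 0.5, 1.5, 0.0, 2.0]

r_missed = pearson_r(max_possible, missed)
print(round(r_missed, 3))
```

A value near zero means the measure barely moves with "Max possible (hrs)". Any nights flagged as invalid with -1 would probably need removing before calculating this, otherwise they'd be treated as real scores.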


James Bradbury

I write about whatever is on my mind. I do so mostly to help me think more clearly. If other people find it interesting that's good too. :-)
