Supporting research and encouraging collaboration
With excitement and gratitude, we are pleased to announce that over 10,000 anonymous diabetes datasets have been donated through the Tidepool Big Data Donation Project.
The Tidepool Big Data Donation Project helps drive innovation and research in the diabetes space by providing students, academics, and industry partners with anonymized datasets donated by people living with diabetes.
More than 14,000 users have signed up to donate their data, and over 10,000 of them have uploaded and donated both pump and continuous glucose monitor (CGM) data so far. This data provides rich information on basal levels, bolus amounts, and glucose values uploaded from a variety of diabetes devices.
Here at Tidepool, we’re big believers in collaboration, and we are proud to support research and innovation through the Tidepool Big Data Donation Project. We want to thank the Tidepool community for helping to make it happen, and we want to share what we’ve learned, thanks to you.
Visualizing the Big Data Donation Project
In celebration of this important milestone, one of our Data Science Interns, Anne Evered, created an exploratory (interactive) data visualization. This tool allows you to compare CGM information across groups of Tidepool users, and to select glycemic outcome metrics from a pull-down menu. For example, you can look at the percent time in range, the average BG value, or the percent of time that different age groups spend below 54 mg/dL.
Find out what over 10,000 Tidepool users’ worth of continuous glucose monitor (CGM) data looks like.
In this view, you can compare glycemic metrics across different ages. For example, donors between the ages of 7 and 24 spend about 59% of their time in range, while those 25 and older are closer to 70%. The plots are interactive, so you can hover over the bars to see details, and you can select different glycemic metrics using the pull-down menu at the top. Happy exploring!
Also, looking at percent above 250 mg/dL shows some pretty significant shifts between donor age groups (figure above). What do you think are some causes for those differences?
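For readers who want to try metrics like these on their own CGM readings, here is a minimal sketch of how they might be computed. It is not the code behind the visualizations. The 54 and 250 mg/dL thresholds come from this post; the 70-180 mg/dL "in range" window is a commonly used convention that we are assuming here, and the function name `glycemic_metrics` and the sample readings are made up for illustration. It also assumes evenly spaced readings, so the percent of readings stands in for the percent of time.

```python
from statistics import mean, pstdev

# Thresholds in mg/dL. 54 and 250 are named in the post; the 70-180
# "in range" window is an assumed convention, not stated in the post.
LOW, RANGE_LO, RANGE_HI, HIGH = 54, 70, 180, 250


def glycemic_metrics(cgm_values):
    """Summarize a list of CGM readings (mg/dL), assumed evenly spaced in time."""
    if not cgm_values:
        raise ValueError("need at least one CGM reading")
    n = len(cgm_values)

    def pct(condition):
        # Percent of readings that satisfy the condition.
        return 100.0 * sum(1 for v in cgm_values if condition(v)) / n

    return {
        "mean_bg": mean(cgm_values),
        # Population standard deviation; a sample SD is an equally valid choice.
        "sd_bg": pstdev(cgm_values),
        "pct_below_54": pct(lambda v: v < LOW),
        "pct_in_range": pct(lambda v: RANGE_LO <= v <= RANGE_HI),
        "pct_above_250": pct(lambda v: v > HIGH),
    }


# Made-up readings, purely for illustration.
readings = [50, 90, 120, 150, 200, 260, 110, 100, 95, 300]
metrics = glycemic_metrics(readings)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```

Real CGM data usually has gaps and uneven sampling, so a more careful version would weight each reading by the time it covers.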
Here are some other visualization examples from this donated dataset for you to explore and enjoy, starting with CGM data grouped by years since diagnosis.
Something that stood out to us in the figure below is that the standard deviation of CGM values was lowest among newly diagnosed (<1 year living with diabetes) donors.
Finally, here's what the data looks like when you factor in both age and years since diagnosis. Try selecting a different metric from the dropdown menu. Does any of this data surprise you?
Thanks to you
We’d like to give one last thanks to everyone who has chosen to donate their data, in the name of progress – we literally couldn’t do it without you! If you’re new to the Tidepool Big Data Donation Project and interested in donating your data, please see this walkthrough for details – it only takes a few clicks, and it’s completely anonymous.
While the milestone of 10,000 donations represents an exciting step in Tidepool’s commitment to fostering data sharing and collaboration, it is far from the end of this project. We encourage Tidepool users to continue to donate their data to help research and innovation grow, and help us support other amazing diabetes nonprofits.
Questions about the data or the Tidepool Big Data Donation Project that aren't addressed in the FAQs? Send an email to bigdata@tidepool.org.
Frequently Asked Questions
I’m an academic/student/industry partner/citizen scientist/other party and am interested in viewing or getting access to the datasets.
Great! Thank you for your interest. We have several ways for you (or your organization) to engage with the project.
If you are a researcher or industry professional interested in getting access to the anonymized donor datasets, please email bigdata@tidepool.org and we’ll be in touch.
We also plan to make a subset of the data donated through the Tidepool Big Data Donation Project available at no cost to give everyone an idea of what’s possible with Tidepool’s donated datasets.
If you are not looking to license the detailed person-level datasets, you can also continue to explore the data at a high level in the data visualizations in this blog post, which contain additional CGM metrics and user groupings.
Can I still donate to the Tidepool Big Data Donation Project?
Yes, definitely. Please visit the Tidepool Big Data Donation Project page to learn more about how you can donate your data to support diabetes research and innovation. The 10,000 donor milestone represents an important step for this project, but we are continuing to add more donor datasets and would be proud to have you on board.
What’s next for the Big Data Donation Project?
We have some exciting projects coming up, including partnerships with JDRF and some other fun explorations of the donated datasets.
I have additional questions about how the data was analyzed. How do I find out more?
First, you can check out our technical notes about the data below, which contain additional information about the analysis and metrics. If you have additional questions, definitely get in touch at bigdata@tidepool.org.
---
Technical notes on this data
- The Age and Years Living With buckets, which are written in the visualizations with shorthand such as 0-7, are inclusive on the lower bound and exclusive on the upper bound. For example, 0-7 stands for 0 ≤ x < 7. These buckets match those used in the Jaeb Loop Observational Study.
- “N” is one year of data donated by a Tidepool user. A single user may have donated more than one year of data. For example, consider an 8 year old who has donated three years of data, from when they were 6, 7, and 8 years old. Because the buckets are exclusive on the upper bound, they will contribute N=1 to the 0-7 bucket and N=2 to the 7-14 bucket. Likewise, for the Years Living With buckets, a single user may contribute an N to multiple buckets. You can see the number of values per bucket in the “(n= x)” notation in the visualizations.
- Donors to the Tidepool Big Data Donation Project are self-selecting (they volunteered); they are not randomly selected.
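
To make the bucketing rule in these notes concrete, here is a small sketch of how donor-years could be assigned to buckets that are inclusive on the lower bound and exclusive on the upper bound. This is an illustration, not our actual analysis code. The edge list `AGE_EDGES`, the helper names, and the example donor (one year of data at each of ages 5, 6, and 7) are hypothetical; only the 0-7 and 7-14 buckets are taken from the notes above.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical edges covering only the two buckets named in the notes.
AGE_EDGES = [0, 7, 14]


def bucket_label(value, edges):
    """Return 'lo-hi' for the bucket with lo <= value < hi, or None if outside."""
    i = bisect_right(edges, value) - 1
    if i < 0 or i >= len(edges) - 1:
        return None  # below the first edge, or at/above the last edge
    return f"{edges[i]}-{edges[i + 1]}"


def count_donor_years(ages, edges=AGE_EDGES):
    """Count one N per donated year, keyed by the bucket that year falls in."""
    labels = (bucket_label(age, edges) for age in ages)
    return Counter(label for label in labels if label is not None)


# Hypothetical donor with one year of data at each of ages 5, 6, and 7.
counts = count_donor_years([5, 6, 7])
print(dict(counts))
```

Because the upper bound is exclusive, the year at age 7 is counted in 7-14 rather than 0-7, so this one donor contributes an N to two different buckets.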