Data Umbrella Newsletter: May 2022

We organize data science events for the community.

Apr 30, 2022

About Data Umbrella

Data Umbrella is a non-profit global community for underrepresented persons in data science. We organize online data science events for the community. All levels are welcome, from beginners to experts. Our Code of Conduct applies to all of our spaces.

Announcements

Call for Volunteers: Video Timestamps

We are looking for assistance in adding video timestamps.

Timestamps make it easy for viewers to get to the part in the video they are interested in.
Timestamps also help potential viewers find the video based on their search terms.

We have instructions on how you can contribute to this project on GitHub. Help us help the community. Pick a video and get started!

Here is an example of timestamps for the GitHub Actions video:

Community Blogs

Check out our latest blogs:

Call for Speakers

We are looking for speakers on the following topics. If you or someone you know can speak on these topics, please email us: info@dataumbrella.org

Open Source Literacy (history, challenges, education or other related topics)
How to Debug in Python

Data Umbrella Impact

Would you like to share an impact that Data Umbrella events and resources have had in your data journey? Send it to us (info@dataumbrella.org), and we will feature it on our Impact Page.

Upcoming Events

Introduction to PyTorch (+Scaling Up with LightningLite)

Sebastian Raschka & Adrian Wälchli

This talk will introduce attendees to using PyTorch for deep learning. We will start by covering PyTorch from the ground up and learn how it can be both powerful and convenient. At times, machine learning models can become so large that they can't be trained on a notebook anymore. Being able to take advantage of AI-optimized accelerators such as GPUs or TPUs and to scale the training of models to hundreds of these devices is essential to the researcher and data scientist.
However, adding support for one or several of these in the source code can be complex, time-consuming and error-prone. What starts as a fun research project ends up being an engineering problem with hard-to-debug code. This talk will introduce LightningLite, an open source library that removes this burden completely. You will learn how you can accelerate your PyTorch training script in just under ten lines of code to take advantage of multi-GPU, TPU, multi-node, mixed-precision training and more.

Contributing to SciPy (Melissa Weber Mendonça)

SciPy is one of the foundational libraries in the PyData/Scientific Python stack, and is a popular tool for scientists, data scientists, industry professionals and students. It builds upon the NumPy array structures and provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics and many other classes of problems. In this talk, we will walk through the steps necessary to contribute to SciPy, including code, documentation, triaging issues and pull request reviews. We will also talk about community and the different ways you can interact with SciPy maintainers.

Community Events

The Wikimedia Hackathon

Wikimedia Hackathon 2022 will be held online May 20-22, 2022. The Research team also holds monthly office hours (Wikimedia Research/Office hours - MediaWiki) and Research Showcases (Wikimedia Research/Showcase - MediaWiki). These are good opportunities to learn more about how to get involved and the types of research being done on or related to Wikimedia projects.

Recent Events

In case you missed our recent events, the videos have been posted. Subscribe to our Data Umbrella YouTube channel to receive notifications when the videos premiere.

Intro to GraphQL for Data Scientists (William Lyon)

This session will introduce GraphQL, with a focus on the use of GraphQL in data science. We will start with an overview of GraphQL, including the advantages/disadvantages of using GraphQL. Next, we will explore common patterns for building GraphQL APIs. Finally, we will see how GraphQL can be integrated into common data science workflows using Python and Jupyter notebooks.
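As a rough illustration of the last point, the sketch below shows one way a GraphQL query could be prepared and its nested response flattened in Python. It is not taken from the session: the `talks` schema, its field names, and the helper functions are hypothetical, and no real endpoint is contacted.

```python
import json

# Hypothetical schema: a `talks` field exposing `title` and a nested `speaker`.
QUERY = """
query Talks($first: Int!) {
  talks(first: $first) {
    title
    speaker { name }
  }
}
"""


def build_request_body(query: str, variables: dict) -> str:
    # GraphQL over HTTP conventionally sends a JSON object with
    # "query" and "variables" keys as the POST body.
    return json.dumps({"query": query, "variables": variables})


def flatten_talks(response: dict) -> list:
    # Turn the nested response into flat rows, a shape that is easy
    # to load into a table or DataFrame in a notebook.
    return [
        {"title": talk["title"], "speaker": talk["speaker"]["name"]}
        for talk in response["data"]["talks"]
    ]
```

In a notebook, the string returned by `build_request_body` would be posted to a GraphQL endpoint with an HTTP client, and the rows from `flatten_talks` could then be passed to a tabular library.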

Editing Wikipedia: Because Someone Has to... (Rob Lanphier)

Rob Lanphier, a longtime editor of English Wikipedia, will describe how to get over one's existential dread of SCREWING UP WIKIPEDIA and start fixing some of its mistakes. As it turns out, there are a few mistakes on English Wikipedia (some of which were introduced by Rob), not to mention the hundreds of other language editions. Rob will also describe some of the resources available to Python developers, and describe how to see which articles get loaded by people all over the world. He'll also leave plenty of time for questions, to each of which his answer may have something to do with the question.

Arrays, Linked Lists & Graphs (Clair Sullivan)

With a Leetcode example!

It is easy and convenient to treat all data as an array. Arrays are the basis of much of Python and a simple data structure to deal with. But there are times when arrays fail us, such as on element insertion and the pre-allocation of memory. There is power and efficiency in linking data through data structures such as linked lists. We will see how using linked lists can reduce Big O complexity and solve a variety of problems. Then we will explore how graph data structures take this a step further and open up a world of new options and opportunities for efficient computation.
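To make the insertion point concrete, here is a minimal sketch (not from the talk; the class and function names are illustrative) comparing insertion at the front of a Python list with insertion at the head of a singly linked list. Each `list.insert(0, x)` shifts every existing element, so it costs O(n), while the linked list only rewires its head pointer, which is O(1).

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


class LinkedList:
    """Minimal singly linked list (illustrative only)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def push_front(self, value: int) -> None:
        # O(1): only the head pointer changes, no elements are shifted.
        self.head = Node(value, self.head)

    def __iter__(self) -> Iterator[int]:
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


def build_with_array(n: int) -> list:
    items = []
    for i in range(n):
        # O(len(items)) per call: every existing element shifts right.
        items.insert(0, i)
    return items


def build_with_linked_list(n: int) -> list:
    linked = LinkedList()
    for i in range(n):
        linked.push_front(i)
    return list(linked)
```

Both functions produce the same ordering, but building n items by front insertion is O(n^2) overall with the list and O(n) with the linked list.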

Introduction to HoloViz (Julia Signell)

HoloViz provides a set of Python packages that make viz easier, more accurate, and more powerful: Panel for making apps and dashboards for your plots from any supported plotting library, hvPlot to quickly generate interactive plots from your data, HoloViews to help you make all of your data instantly visualizable, GeoViews to extend HoloViews for geographic data, Datashader for rendering even the largest datasets, Param to create declarative user-configurable objects, and Colorcet for perceptually uniform colormaps.

Featured Resources

Video Playlists

Highlighted Resource

Collecting Gender Data

Data Umbrella Team

In this section, we share content from our team.

Blog

Deploying A Deep Learning Model on Mobile Using TensorFlow and React

From the Vault

Here we share one of our favorite and most impactful videos in open source:

Supporting Data Umbrella

Data Umbrella is now on Benevity. If your company uses Benevity, which is a donation platform for employer-matching contributions to non-profits, please consider making a contribution to Data Umbrella. Note: this link is active for registered users of Benevity: Data Umbrella on Benevity
For users not on Benevity, donations can be made directly to the Open Collective.
Supporters can donate company stock to Data Umbrella Open Collective. Normally when you sell stock, you have to pay capital gains tax. But if you donate it to a tax-exempt nonprofit, it can be sold tax-free. So everyone wins: the donor gets tax benefits, and the Collective gets a donation to support its mission. Contact us for more information: info@dataumbrella.org

Data Umbrella Resources

Visit our blog site: blog.dataumbrella.org, and see articles written by our community members on their experience in recent sprints.
We have a Job Board. You can post jobs (for free), search jobs, and subscribe to a weekly update to see postings.
Our Data Umbrella YouTube channel is growing! Subscribe to receive notifications when our event videos are posted.

Accessibility Corner

Accessibility Update: Closed Captioning

Our webinars have closed captioning available! This feature makes our live events more accessible to those with hearing needs and to anyone who likes to follow the transcript live during the presentation to fully process the information.