Energy Data Analytics

Good afternoon and welcome to this
discussion of how data science techniques are starting to transform the
energy sector. My name is Kyle Bradbury. I’m the managing director of the Energy Data Analytics Lab here at the Duke University Energy Initiative, and special thanks to the Alfred P. Sloan Foundation for making possible both this event and the larger PhD
Fellows Program which motivated it. So, today what we’re going
to do is talk about how, like I said, data science techniques are starting to
change how we think about energy systems problems, starting with a short, short explanation. I’m calling it the world’s shortest explanation of data science. I’ll go on to talk about how energy data themselves are increasing, how we’re getting more and more sources of energy data emerging through sensors and other aspects of the energy system, and then dive into some of the specific data science energy applications, and talk about a number of those that are really
changing the face of how we generate, transmit, and consume energy. Also, of course, there are some challenges around data science in
this space, security and privacy challenges. We’ll talk a little bit about that, and
discuss how to get involved in Duke’s Energy Data Analytics Lab, both for PhD students who may be involved in the PhD Fellows Program as well as undergraduates and graduate students interested in this space. So you know, we have a wealth of emerging energy data resources. But to have the right tools to make use of these data, we need to think about what are some of the disciplines involved and tools that are available. What is this data science thing? So, one way to think about it is, it’s actually a fusion of a number of disciplines. Right, so probability, statistics, and math, certainly a very important part of it. Computer science and programming, another key component, but also domain expertise. And so if you look at the intersections of each of these, you know, domain expertise and computer science, you think of software development for specific applications. When you think of computer science and probability, stats, and math, well, one of the children of that couple happens to be machine learning. And then for domain expertise and prob, stats, and math, you find a lot of traditional research that kind of fits into that space. And then at the nexus of those is where we see data science fitting. So it’s bringing together a number of disciplines for the purpose of gaining insights from increasing amounts of data. And so you know, when we talk about data science,
machine learning inevitably comes up as one of the key components of it. But you
know, there’s a lot of terminology that floats around. So you start off with hearing about artificial intelligence, kind of the big umbrella. That’s everything from
cybernetics and statistical learning to symbolic AI, which would be creating a
lot of very specific rules that are written by experts, translating human
thought directly into code. But then, there’s a level below that, that gets more specific and homes in on the statistical learning piece, which is
machine learning, which can do a number of things. So, we can uncover structure
in data; that’s typically unsupervised learning. Make predictions from data. Now with predictions, I’m not always talking about the future. It’s a broader topic than that. But making predictions, that’s
supervised learning, learning from examples. And then learning by doing, that’s
reinforcement learning. If any of you have heard of AlphaGo,
or any of the key breakthroughs in you know, a computer beating a chess master,
that’s what you think of with reinforcement learning. Learning to take actions that help you achieve a strategy. Okay. And so then within
machine learning, there’s deep learning, which basically is a set of
techniques that is structured in such a way that it allows us to really make the
greatest use of advances in computation, things like graphics processing units,
which allow extreme increases in computational speed, which are
required to do these vast processes, to train deep learning models. So
that’s a little bit of the hierarchy there. But if we have these
tools at the ready, we have to think about what sort of data we have
in the energy space to work on. So I mentioned at the very start of that
talk, smart meters. So as you start to see more and more of these deployed, that’s making available more and more of these large time series data sets at the individual building level, okay. And then of course, we have grid-level data: phasor measurement units, or PMUs, which measure voltage and current
throughout the grid. We have smart appliances. So these are things that you might have heard of: a Nest thermostat, or a smart thermostat. There are certainly lighting solutions that are also within this space that have other
additional sensors in them, and a whole connected home type of environment,
that’s not only making your house slightly more comfortable or more energy
efficient, but it may be also helping to generate data about different types of
behavioral patterns in the home that you might be able to utilize to better
understand your own energy usage, or third parties might be able to utilize
to help make suggestions for improvements and cost savings. And then
of course, we have power markets producing more and more vast quantities
of data, you know, every five minutes, and with
some of the real-time markets that are available, these can produce a number of
different indicators about what’s driving electricity prices, and
what’s the status of the capacity that’s available for use, you know, in
electricity systems. And of course, we have vehicles, right? So you know, electric
vehicles and even some, you know, conventional vehicles are now generating
data. How many of you have driven in a vehicle that has given you your fuel economy? Okay,
okay. That’s giving behavioral feedback, which, you know, certainly we can
use as humans, but also may be able to be analyzed through data science techniques. And lastly, an unexpected kind of source of energy data: satellite imagery. You can
actually see different components of the energy system from above. So we’re
going to think a little bit about this and first talk about a roadmap of
how we go about making use of these data for specific applications. But it
requires kind of going through some of the different components of the energy
system and talking about key areas that can most benefit from the application of
data science techniques. So we’ll kind of go from resources
through end use. So of course, we’ll talk about renewable generation and oil
and gas systems, and we can certainly use those to make electricity and to supply active
electricity markets. And of course, there’s transmission and distribution to
consider there, and collectively, this top piece makes up our electricity system, in part. And then you know, the electricity can go to buildings, we can have oil and gas potentially going to heat buildings as well, and both oil and gas
and electricity may be going to fuel our vehicles and transportation systems.
And kind of crossing all these boundaries is, you know, system level
assessment and planning, how we can take a bigger picture
look at all of these systems collectively. So we’re going to start and
go through them in this order to talk about specific energy data applications.
So starting with renewable generation, there’s actually a number of
interesting innovations going on in this space. So, first of all, generation
prediction and forecasting. So here, you know, we’re talking about wind power
forecasting and solar power forecasting. Stochastic generators on the grid lead
to challenges in maintaining balance between supply and demand. And so
improved predictions can significantly increase system reliability,
but also reduce system cost by eliminating the need for additional
backups. So there have been a number of companies that have been investing in
this space. There’s an interesting anecdote from IBM. So they purchased The
Weather Company, and a product called Deep Thunder, which is an interesting deep learning technique that is making hyperlocal weather predictions. So why might you care about that? Well, with hyperlocal weather predictions, we can be predicting specific cloud cover over individual solar panels, right. So this can significantly improve forecasts, and they’re advertising 50 percent improvements in prediction accuracy, so potentially
significant gains. But when they announced the purchase of The Weather
Company, there was a tongue-in-cheek remark that IBM had gotten confused about what it meant to do computation in the cloud. So yeah, renewable generation, definitely a significant potential for improvement there. And then of course, you have other areas that can gain from data science and machine learning techniques: optimally siting wind and solar installations to produce the most power, but also to do so in a way which best leads to reliable integration into the grid. And of course, optimal sizing to do the same. So, these are complex problems with many, many
variables, and so making exhaustive searches of the possibilities
is usually not feasible. That’s often where different machine learning
techniques and data science techniques can really shine. And then there’s
another interesting area of materials discovery– finding ways of
identifying new materials. So I mean, perhaps it’s photovoltaic materials,
perhaps it’s energy storage materials, by again, looking at many possible
combinations of molecular patterns to find new materials that otherwise may take an exceedingly long amount of time to find through trial-and-error research. So that’s another area where there are some enhancements going on in this space. So, you know, moving from renewable generation, there’s a lot of interesting stuff going on in oil and gas as well. So, you know, on the
exploration side, analyzing seismic data. For those of you who are not aware,
seismic data, we’re talking about these, they have these large trucks that are
trying to sense below the ground, whether or not there is oil and gas. So they do
that by pounding the ground and waiting for that vibration to travel
below the surface and bounce back, right. And by doing that, they can get
information about what might potentially be down there. But this is a very complex
process and results in these enormous three-dimensional cubes of data. So
how do you identify in a really efficient way, whether or not there is
oil and gas? Of course, what’s at stake is a very expensive potential drilling process to go through there. So it requires some careful analysis. Another piece is on the production side. So you know, what can we try to do here? Well, optimize output while minimizing cost and impact. So you know, this might be choosing well pressures and flow rates very carefully, so that you can optimize each of these pieces. Of course, we have human expertise that can be brought into play here. But are there ways that different types of data science, maybe reinforcement
learning, can be used to improve that process and increase the efficiency of
these types of systems? So then when we take our resources, whether it’s from renewables or oil and gas, we can talk about electricity
markets and how things might play out there. So first of all, forecasting, again, is another really important topic here. So, when you’re looking at market clearing prices and bids into the market, a generator may choose to bid X dollars per megawatt-hour or per megawatt, depending on whether it’s an energy or capacity market. This type of information certainly would be of interest to traders, right? Who are trading in these energy markets, and players in the market, of
course, as well. And the more information we have about these pieces, the greater
the potential for increasing, once again, system efficiency. You’re going to hear a
number of these terms, things like efficiency and optimization, repeated over
and over again, but that’s critical because each of these will lead
to some specific outputs. And demand forecasting, so one thing is, how much will the price of electricity be, or the price of oil and gas be? Another piece could be, how much demand will there be? What do I have to prepare for?
The closer that we can get the forecasted demand to the actual demand,
the greater the efficiency of the system. Because I don’t need, again, to schedule
lots and lots of reserve requirements on the system. And then another one which
is probably one that may be least thought of when this slide was initially put up: enabling distributed
peer-to-peer transactions. So the seemingly ethereal word blockchain gets
thrown in here. So what are we talking about here? So, imagine that you have
solar PV on your house, and your neighbor, well, would actually like to purchase your
solar PV electricity. Right now there’s not really any way to facilitate that direct transaction. This essentially, could allow something like that. Distributed peer-to-peer transactions, that are made secure because of what’s
known as a distributed ledger. Blockchain in a nutshell is the idea that, if you have a traditional, let’s say you were a bank, right. You have your
central database, somebody wants to make a transaction, they tell you about making
this transaction with this person. You register that in your database and you
are the one maintaining that. You have to keep that ultra secure, be very, very
careful with all of that because if it gets out, there’s a lot of potential loss
of personally identifiable information. Or people could steal money from that, obviously. Now, what if this was distributed instead? So, what does that mean? Instead of just this central bank having all of the information about every transaction and every participant, thousands of nodes around the world have
the entire set of transactions, so everybody is collectively recording
this, and as you go through that, you can say okay, if one person
tries to fake something on the ledger and that one is different from the other
thousand, you can say, “oh no that’s, something weird’s going on there. We’re
going to trust the masses in this case because it would be
very hard for one hacker to hit all of those different nodes.” So that’s the
basic idea. You distribute the accounting, essentially, across many different individuals, and that’s essentially
the nature of blockchain. So this could significantly transform how we do
energy transactions, and enable, sort of, these smaller scale, peer-to-peer transactions. So certainly would be something to look towards in the next few years. So then, moving from energy markets, we’re talking, okay we’ve sold things, there needs to be power flowing, we need to get our energy from the
generator to the end-user. So what are our challenges here or
opportunities here? So one, detecting and predicting line faults. Was there a
failure or is there going to be a failure? If we can do each of those more
quickly, we can prevent outages and save system costs, and you know, thereby
increasing the reliability, as well. Also, preventive maintenance: can we anticipate
what parts of the system are going to need the most servicing going forward,
and proactively perform maintenance, rather than reactively, when something
fails. And then non-technical loss, or “theft,” detection is another important
piece. Can we see if there’s a line on which there’s some power flowing
that’s not accounted for anywhere on the system and is not explained by
natural losses, due to power flowing on the line and losing a little bit of
energy along the way due to heating and whatnot? Another
piece of course is anomaly detection, which covers many of these things, but
can we just figure out if something weird is going on in the system? Again, it
kind of gets back up to that top point, on detecting and predicting faults.
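To make "something weird is going on" concrete, one very simple approach is to flag readings that sit far from a known-normal baseline. The Python sketch below uses a z-score test. The function name, the baseline size, the threshold of 3 standard deviations, and the line-current readings are all illustrative assumptions, not details from the talk, and a production system would need something considerably more robust.

```python
from statistics import mean, stdev


def flag_anomalies(readings, baseline_size=5, threshold=3.0):
    """Return indices of readings that deviate strongly from a baseline window.

    The first `baseline_size` readings are assumed to represent normal operation.
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for i, value in enumerate(readings[baseline_size:], start=baseline_size):
        # Flag readings more than `threshold` standard deviations from the baseline mean.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical line-current readings in amps; one reading spikes.
line_current_amps = [100.0, 101.0, 99.0, 100.5, 99.5, 100.2, 140.0, 100.1]
anomalies = flag_anomalies(line_current_amps)
print(anomalies)
```

With these made-up readings, only the spike at index 6 is flagged; the other readings stay well within the threshold.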
Can we detect that beforehand though, or at least while it’s going on,
before it disrupts the system significantly? Okay, so now we’ve got our
power flowing through, onto the transmission lines across from the
generators, and we end up at one of our first categories of end-users:
buildings. Okay, so what can we do with data science
for buildings when it comes to energy? One piece is internet of things devices, and how we
can gain energy insights from those. So, you can imagine, we certainly talked
about before, smart thermostats, different smart appliances. These different
appliances could potentially provide information that could help the individual
building owner to increase the efficiency of their homes by allowing
these systems to be coordinated, perhaps making use of time-of-use pricing
changes, where the price is lower at some times of the day and higher at other times of the day. Identifying inefficient appliances and getting insight into those
appliances and perhaps even acting as a little bit of a decentralized
demand response. So what’s demand response? Demand response is if
there’s increased demand on the system and we don’t have enough
generation to meet load, I can either increase generation or reduce demand.
Both of them put the two back into sync. But to do that typically requires some sort of demand response aggregator to be actively
managing all of that and turning off appliances and turning things on as
needed, to do that. What if you had a system that was able to tell what the
demand was on the system, on the larger grid, based on measuring voltage and current, and use that to proactively enable demand response? Certainly a possibility there, but it would take some data science
techniques to enable that to happen. Automated demand management is another piece, highly related to what we were just talking about there. You know, the idea of peak
shifting, from expensive times of day to less expensive times of day.
Arbitrage, buying energy when the price is low and selling when it’s high.
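As a toy version of the arbitrage idea, the Python sketch below finds the single best buy-then-sell pair in a series of hourly prices. It assumes one charge and discharge cycle and ignores storage losses, capacity limits, and fees. The function name and the prices are hypothetical.

```python
def best_arbitrage(prices):
    """Return (buy_index, sell_index, profit) for one buy-then-sell cycle.

    Returns None when no profitable pair exists. Buying must happen before selling.
    """
    best = None
    min_index = 0  # index of the cheapest price seen so far
    for i in range(1, len(prices)):
        profit = prices[i] - prices[min_index]
        if profit > 0 and (best is None or profit > best[2]):
            best = (min_index, i, profit)
        if prices[i] < prices[min_index]:
            min_index = i
    return best


# Hypothetical hourly prices in dollars per megawatt-hour.
hourly_price = [30.0, 20.0, 50.0, 80.0, 40.0]
plan = best_arbitrage(hourly_price)
print(plan)
```

Here the best plan is to buy in hour 1 and sell in hour 3. A real system would have to make this decision from forecast prices rather than known ones.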
This could be enabled if you had automated demand management systems, and
demand charge reduction. So typically, we’re used to paying a dollars-per-kilowatt-hour
cost, so a cost per unit of energy. But many customers, especially commercial and industrial customers, pay that
dollars-per-kilowatt-hour energy cost plus a capacity cost in dollars per
kilowatt. That is, not just the total amount of energy used, but what was the
peak amount of demand that you added to the grid at any point in time. If it was
when, you know, you turned on your pool pump in the middle
of the day, and things spiked by an extra 400 watts, then you’re going to
pay an added cost based on that addition to your peak demand. And so
that’s another piece that can be potentially predicted and mitigated
through the use of data science techniques. Automated building energy
auditing. So what if we took all of this information that’s
coming from our various smart meters and put it through a data
science algorithm to break it down into device-level information that
corresponds to how much energy my refrigerator is using, my HVAC system is
using, my TV is using? That provides insight into how we’re consuming
electricity and provides feedback that we can act on. Another word
for that is non-intrusive load monitoring, because you don’t have an active sensor
in the home; you’re using the smart meter. Demand-side management aggregation,
talked about that a moment ago. You know, this idea of aggregating all of
these, you know, demand response units there. Another piece is storage
aggregation and certainly, energy storage is something that has come down
significantly in cost, but still has a ways to go. It’s still somewhat
expensive, but there are a number of entrepreneurial companies that are
looking at how they can install a lot of individual energy storage systems, that provide backup and maybe demand charge
reduction and energy management, but collectively, form a grid resource that
they can bid into the market. And so, storage aggregation and
optimized operation, these are decisions that can’t easily be made by
an operator in a control room when you have thousands of assets all around, you know,
with very different needs. This is something where you need some sort of
artificial intelligence to assist, and companies like Stem, of
California, have been working on this for some time. And the last piece of this
section is customer segmentation. So if we have information on smart meter use,
then perhaps we can then take that information and determine what
type of customer is in this pile over here and what type of customer is in
this pile over here. Maybe this type of customer, you know, all these customers have swimming pools and, you know, happen to live in
affluent areas. Well maybe it’s useful to know that to provide energy efficiency
rebates that would be most targeted for them or other types of targeted
opportunities. So certainly there are many commercial
customers and utilities that, understandably, would be interested
in that sort of technology. That brings us to one of
the other end uses, which is transportation. So transportation, obviously with vehicles, and these are very complex systems. Vehicles on roads and highways and we’re not just talking about cars here, but this could be buses, this
could be, potentially, airplanes, trains. You know, the old movie, Planes, Trains, and Automobiles. Whatever it may be! But
because there is so much complexity in these systems, having some amount of
infrastructure planning is pretty critical. So being able to eliminate
bottlenecks and increase the efficiency there,
optimizing traffic signals to make sure that the flow is a little bit
smoother, and of course investment decisions in all of the above.
Improving engine design in individual cars is another piece that’s
quite important. You can imagine there are thousands of possible
technologies that could be added in many different combinations, meaning there are
millions or billions of possible combinations of ways that engines
could be designed, right. So you know, finding a way, for example, if you wanted
to meet the Corporate Average Fuel Economy, or CAFE, standards, or a fuel
emissions standard, what might be some of the most cost-effective ways of doing
that? Maybe you can use data science techniques to help in the design process
without having to build hundreds of prototypes, but actually
find ones that will meet those standards in the least cost way. Of course, if we
had an entire fleet of cars available that we could autonomously control, that
may allow us to very carefully route them in a way that leads to
increased energy efficiency among other objectives–safety being a major one, of
course, right? But you can imagine autonomous vehicle fleets providing that
service. In addition, if that fleet happens to be electric vehicles, maybe
that’s also a great resource, but it would need management and potentially
machine learning techniques to help out there. And the last component of the system: the system level assessment and planning. So here, it’s not a mistake that there’s a satellite image here. You know, certainly a broad scope to
a lot of these types of pieces. You can imagine using this to assess installed
generation capacity. Looking at solar photovoltaic arrays that are on rooftops
and using the size of them to estimate distributed installed capacity.
Maybe it’s finding power plants that have recently been installed, where there
aren’t necessarily records available of that. You know, it can also be kind of the
other problem, predicting where future distributed capacity may be installed,
that may be of interest of course to solar PV installers
looking to put it out there, but may also be of significant interest to utility
planners and system planners, looking to see where the grid’s going to grow, how
systems may need to be upgraded to accommodate that. And lastly, an
interesting application: monitoring global oil supply. So there’s a company
called Orbital Insight that actually uses images of oil tanks and you can see the
top of the oil tank will go down or up, depending on the level of oil that’s
contained within it. So by using satellite images that are taken frequently,
you can see economic flows happening as that oil tank top goes up and down. So with all of this, there’s a really important takeaway here, which is, you’ve
heard again and again things like improved forecasts across these, optimized
operation, enhanced planning, and maybe in some cases, feeding into that,
expanded system visibility, whether it’s through satellite imagery or
other new data sources. For all of these, one of the outcomes they can lead to is increased efficiency. And what does increased efficiency mean?
Decreased energy consumption, decreased environmental impacts, and decreased costs.
So collectively, you know, even though there were a lot of applications that
we’ve just discussed here, they’re going towards these overarching
goals. So I think that’s pretty exciting. And of course, though,
there are some challenges here. We have a bit of a trade-off. On
one hand, we have wanting to maintain privacy and ensure absolute data security. On the other hand, we have making data available, increasing the availability of data for researchers, policy makers, planners, whoever it may be. On the privacy side, there are a few things that you might be able to learn from some of the data that’s here. From smart devices, things like activities in a building, potentially, though to what level of resolution and accuracy, there’s a big question mark. But that might also reveal presence in a home, and some of these things can get at, potentially,
some sensitive information, or information that some would view
as sensitive. And some of the databases that are involved here, if it’s building data or whatnot, may contain other, more personally identifiable information. So it makes sharing some of that difficult. On the other hand, on the data availability side, without having access to
huge amounts of data on all topics, this is reducing innovation, to a
certain extent, and inhibiting system understanding and insight. If
we couple that with the fact that data are often proprietary or
restricted, so you either have to pay for them or you might not get access to them
at all, this is a bit of a limitation. So you know, increasing data
availability versus increasing privacy, this is sort of a question that comes up
and we have to be considering this with every application that was mentioned
here. What’s more important in each case? So to kind of wrap up this part
of the talk, I just want to mention all of this work. There’s a lot of math, and we’re not going to go through all of it, but the Energy Data Analytics
Lab, which is the organization that is sponsoring this, and is in part
sponsored by the Duke University Energy Initiative, the Information Initiative at Duke,
and the Social Science Research Institute, is trying to come together to
tackle a lot of the problems that we’re talking about here,
investing in research to go from our energy application means to
those system performance improvements that we talked about earlier. So you know,
there are many of us in this room that are actively engaged in that, and you
know, if there are opportunities for you to be
involved, we would love to explore that. So in particular, options to get involved:
Data+ and Bass Connections. Data+ is a ten-week summer program
of intensive exploration of a particular
data science question, and so this is showing a project from this
past summer, where we started looking into how we can use satellite imagery to
identify transmission and distribution lines, and automatically assess the
presence of those lines. And that project will continue as part of Bass Connections.
So typically, there are a couple undergrads per team, and a graduate student mentor. So for grad students in the room, if that’s something of interest,
you know, there will certainly be an application that comes open there. And
Bass Connections is a two-semester opportunity during the academic year, in
which students can again, engage in a project spanning a number of possible
topics. We have one going on now, with the Energy Data Analytics Lab, that’s following up on this project that I just mentioned here. Again, those applications will open up in the beginning of next year, so keep your eyes open for that if you’re interested in being a graduate student mentor in either of those. And for the PhD students in the room, we are gathered together here today because of the Energy Data Analytics PhD Fellows Program, again, with funding provided by the Alfred P. Sloan Foundation, and there are a number of benefits associated with this program. It’s really meant to bring together energy domain expertise and data science tools and research into single projects. And so, if this is something that sounds
interesting and you are a full-time doctoral student, consider applying this
year for that program.
