|
1
|
- What is an Intelligent Power Grid, and why do we need it?
- Why do we need neural networks?
- How can we make neural nets really work here, & in
diagnostics/”prediction”/”control” in general?
|
|
2
|
|
|
3
|
- True intelligence (like brain) ⇒ foresight, ⇒ ability to learn to coordinate all pieces, for optimal
expected performance on the bottom line in future despite random
disturbances.
- Managing complexity is easy– if you don’t aim for best possible
performance! The challenge is to come as close as possible to optimal
performance of whole system.
- Bottom line utility function includes value added, quality of service
(reliability), etc. A general concept. Nonlinear robust control is just
a special case.
- Enhanced communication/chips/sensing/actuation/HPC needed for max
benefit (cyberinfrastructure, EPRI roadmap)
- Brain-like intelligence = embodied intelligence, ≠ AI
|
|
4
|
- DSOPF02 started from EPRI
question: can we optimally
manage & plan the whole grid as one system, with foresight, etc.?
- Closest past precedent: Momoh’s OPF integrates & optimizes many grid
functions – but deterministic and without foresight. UPGRADE!
- ADP math required to add
foresight and stochastics, critical to
more complete integration.
|
|
5
|
- As Gas Prices ↑, Imports ↑ & Nuclear Tech in unstable areas ↑, human extinction is a serious risk. Need to move
faster.
- Optimal time-shifting – big boost to rapid adjustment, $
|
|
6
|
- For optimal performance in the general nonlinear case (nonlinear control
strategies, state estimators, predictors, etc…), we need to adaptively
estimate nonlinear functions. Thus we must use universal nonlinear
function approximators.
- Barron (Yale) proved that basic ANNs (MLPs) are much better than Taylor series,
RBF, etc., at approximating smooth functions of many inputs. Similar
theorems for approximating dynamic systems, etc., especially with more
advanced, more powerful, MLP-like ANNs.
- ANNs more “chip-friendly” by definition: Mosaix chips, CNN here today,
for embedded apps, massive thruput
|
|
7
|
- Neural Nets, A Route to Learning/Intelligence
- goals, history, basic concepts, consciousness
- State of the Art -- Working Tools Vs. Toys and Fads
- static prediction/classification
- dynamic prediction/classification
- control: cloning experts, tracking, optimization
- Advanced Brain-Like Capabilities & Grids
|
|
8
|
|
|
9
|
- 4th Gen: Your PC. One VLSI CPU chip executes one sequential stream of C
code.
- 5th Gen: “MPP”, “Supercomputers”: Many CPU chips in 1 box. Each does 1
stream. HPCC.
- 6th Gen or “ZISC.” Ks or Millions of simple streams per chip or
optics. Neural nets may be
defined as designs for 6th gen + learning. (Psaltis, Mead.)
- New interest; Moore, SRC; Mosaix, JPL sugarcube, CNN.
- 7th Gen: Massively parallel quantum computing? General? Grover like
Hopfield?
|
|
10
|
|
|
11
|
|
|
12
|
- The Physical Layer – Devices and Networks
- National Nanofabrication Users Network (NNUN)
- Ultra-High-Capacity Optical Communications and Networking
- Electric Power Sources, Distributed Generation and Grids
- Information Layer – Algorithms, Information and Design
- General tools for distributed, robust, adaptive, hybrid control &
related tools for modeling, system identification, estimation
- General tools for sensors-to-information & to decision/control
- Generality via computational intelligence, machine learning, neural
networks & related pattern recognition, data mining etc.
- Integration of Physical Layer and Information Layer
- Wireless Communication Systems
- Self-Organizing Sensor and Actuator Networks
- System on Chip for Information and Decision Systems
- Reconfigurable Micro/Nano Sensor Arrays
- Efficient and Secure Grids and Testbeds for Power Systems
|
|
13
|
|
|
14
|
|
|
15
|
|
|
16
|
|
|
17
|
|
|
18
|
- Hebbian Learning Rules Are All Based on Correlation Coefficients
- Good Associative Memory: one component of the larger brain (Kohonen,
ART, Hassoun)
- Linear decorrelators and predictors
- Hopfield f(u) minimizers never scaled, but:
- Gursel Serpen and SRN minimizers
- Brain-Like Stochastic Search (Needs R&D)
|
|
19
|
|
|
20
|
|
|
21
|
- We don’t believe in neural networks – see Minsky
(Anderson&Rosenfeld, Talking Nets)
- Prove that your backwards differentiation works. (That is enough for a
PhD thesis.) The critic/DP stuff published in ’77,’79,’81,’87..
- Applied to affordable vector ARMA statistical estimation, general TSP
package, and robust political forecasting
|
|
22
|
|
|
23
|
|
|
24
|
|
|
25
|
|
|
26
|
|
|
27
|
|
|
28
|
- All 3 train predictors, use sensor data X(t), other data u(t),
fault classifications F1 to Fm
- Type 1: predict Fi(t) from X(t), u(t), MEMORY
- Others: first train to predict X(t+1) from X,u,MEM
- Type 2: when actual X(t+1) differs from prediction, ALARM
- Type 3: if prediction net predicts BAD X(t+T), ALARM
- Combination best. See PJW in Maren, ed., Handbook of Neural Computing Applications, Academic, 1990.
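Types 2 and 3 can be sketched as follows. This is a minimal sketch under stated assumptions: a made-up scalar plant, a linear least-squares one-step predictor standing in for the trained neural predictor, and a hypothetical 6-sigma alarm threshold; Type 1 (a direct fault classifier) is not shown, and all function names are illustrative. Type 2 alarms when the measured X(t+1) departs from its prediction; Type 3 rolls the predictor forward over planned inputs u and alarms if a limit would be crossed.

```python
import math
import random


def simulate(n, a=0.9, b=0.5, noise=0.05, seed=0):
    """Hypothetical plant: x(t+1) = a*x(t) + b*u(t) + noise, with u(t) = sin(0.1*t)."""
    rng = random.Random(seed)
    u = [math.sin(0.1 * t) for t in range(n)]
    x = [0.0]
    for t in range(n - 1):
        x.append(a * x[t] + b * u[t] + rng.gauss(0.0, noise))
    return x, u


def fit_predictor(x, u):
    """Least-squares fit of x(t+1) ~ a*x(t) + b*u(t); also returns the residual std. dev."""
    xs, us, ys = x[:-1], u[:-1], x[1:]
    sxx = sum(p * p for p in xs)
    suu = sum(q * q for q in us)
    sxu = sum(p * q for p, q in zip(xs, us))
    sxy = sum(p * y for p, y in zip(xs, ys))
    suy = sum(q * y for q, y in zip(us, ys))
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    resid = [y - (a * p + b * q) for p, q, y in zip(xs, us, ys)]
    sd = math.sqrt(sum(r * r for r in resid) / len(resid))
    return a, b, sd


def type2_alarms(model, x, u, k=6.0):
    """Type 2: alarm at time t+1 when actual X(t+1) differs from its prediction by > k sigma."""
    a, b, sd = model
    return [t + 1 for t in range(len(x) - 1)
            if abs(x[t + 1] - (a * x[t] + b * u[t])) > k * sd]


def type3_alarm(model, x_now, u_plan, limit):
    """Type 3: alarm now if the predictor forecasts X above `limit` within len(u_plan) steps."""
    a, b, _ = model
    x = x_now
    for ut in u_plan:
        x = a * x + b * ut
        if x > limit:
            return True
    return False


model = fit_predictor(*simulate(300, seed=1))          # train on fault-free data
test_x, test_u = simulate(200, seed=2)
faulty_x = [xi + (2.0 if t >= 120 else 0.0) for t, xi in enumerate(test_x)]  # sensor offset from t=120
alarms = type2_alarms(model, faulty_x, test_u)
print("first Type 2 alarm at t =", alarms[0])
print("Type 3 alarm:", type3_alarm(model, 0.0, [1.0] * 20, limit=4.0))
```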
|
|
29
|
|
|
30
|
|
|
31
|
|
|
32
|
|
|
33
|
- For short-term memory, for state estimation, for fast adaptation – time-lagged
recurrence needed. (TLRN = time-lagged recurrent net)
- For better Y=F(X,W) mapping, Simultaneous Recurrent Networks Needed. For
large-scale tasks, SRNs WITH SYMMETRY tricks needed – cellular SRN,
Object Nets
- For robustness over time, “recurrent training”
|
|
34
|
- E.g.: law X sends extra $ to schools with low test scores
- Does negative correlation of $ with test scores imply X is a bad
program? No! Under such a law, negative correlation is hard-wired. Low
test scores cause $ to be there! No evidence + or – re the program
effect!
- Solution: compare $ at time t with performance changes from t to t+1!
More generally/accurately: train dynamic model/network – essential to
any useful information about causation or for decision!
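The school-funding example can be checked with a small simulation. This is a sketch under made-up assumptions: the hypothetical law pays max(0, 80 - score) dollars, next year's score is this year's score plus a program effect plus noise, and there is no regression to the mean (with real, noisy test scores even the change-based comparison would need the fuller dynamic model the slide recommends). The level correlation stays strongly negative whether or not the program works; only the correlation of $ with the t to t+1 change responds to the true effect.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_law(effect, n_schools=2000, seed=0):
    """Hypothetical law X: a school scoring s gets max(0, 80 - s) dollars.
    Next year's score = s + effect * dollars + noise.
    Returns (corr of $ with score level, corr of $ with score change t -> t+1)."""
    rng = random.Random(seed)
    score_t = [rng.gauss(70.0, 10.0) for _ in range(n_schools)]
    money = [max(0.0, 80.0 - s) for s in score_t]       # low scores CAUSE the money
    score_t1 = [s + effect * m + rng.gauss(0.0, 5.0) for s, m in zip(score_t, money)]
    change = [s1 - s0 for s0, s1 in zip(score_t, score_t1)]
    return pearson(money, score_t), pearson(money, change)


for effect in (0.0, 0.3):
    level, change = simulate_law(effect)
    print(f"true effect {effect}: corr($, score) = {level:+.2f}, corr($, change) = {change:+.2f}")
```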
|
|
35
|
|
|
36
|
- “Simple BP” – incorrect derivatives due to truncated calculation,
robustness problem
- BTT – exact, efficient, see Roots of BP (’74), but not brain-like (back
time calculations)
- Forward propagation – many kinds (e.g, Roots, ch.7, 1981) – not
brainlike, O(nm)
- Error Critic – see Handbook ch. 13, Prokhorov
- Simultaneous BP – SRNs only.
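To make the contrast concrete, the sketch below uses a toy one-weight recurrent unit h(t) = tanh(w·h(t-1) + v·x(t)) with squared error (chosen for illustration only). It computes dL/dw three ways: BTT (a backward sweep over the stored history), forward propagation (carrying dh(t)/dw forward in time; with n state variables and m weights this means O(nm) derivative terms per step), and "simple BP", which treats h(t-1) as a fixed input. A central finite difference serves as an independent check.

```python
import math


def run(w, v, xs):
    """Forward pass of h(t) = tanh(w*h(t-1) + v*x(t)) with h(-1) = 0."""
    hs, h = [], 0.0
    for x in xs:
        h = math.tanh(w * h + v * x)
        hs.append(h)
    return hs


def loss(w, v, xs, ys):
    return 0.5 * sum((h - y) ** 2 for h, y in zip(run(w, v, xs), ys))


def grad_btt(w, v, xs, ys):
    """Backpropagation through time: exact, but sweeps backwards over the stored history."""
    hs = run(w, v, xs)
    g, delta_next = 0.0, 0.0
    for t in reversed(range(len(xs))):
        delta = hs[t] - ys[t]                     # direct error at time t
        if t + 1 < len(xs):                       # plus the influence of h(t) on h(t+1)
            delta += delta_next * (1 - hs[t + 1] ** 2) * w
        h_prev = hs[t - 1] if t > 0 else 0.0
        g += delta * (1 - hs[t] ** 2) * h_prev
        delta_next = delta
    return g


def grad_forward(w, v, xs, ys):
    """Forward propagation: carry s(t) = dh(t)/dw forward in time (exact, no backward sweep)."""
    g, s, h_prev = 0.0, 0.0, 0.0
    for h, y in zip(run(w, v, xs), ys):
        s = (1 - h * h) * (h_prev + w * s)
        g += (h - y) * s
        h_prev = h
    return g


def grad_truncated(w, v, xs, ys):
    """'Simple BP': treats h(t-1) as a constant input, dropping the recurrent path."""
    g, h_prev = 0.0, 0.0
    for h, y in zip(run(w, v, xs), ys):
        g += (h - y) * (1 - h * h) * h_prev
        h_prev = h
    return g


xs = [math.sin(0.5 * t) for t in range(30)]
ys = [0.3 * math.cos(0.5 * t) for t in range(30)]
w, v, eps = 0.8, 0.5, 1e-6
numeric = (loss(w + eps, v, xs, ys) - loss(w - eps, v, xs, ys)) / (2 * eps)
for name, fn in [("BTT", grad_btt), ("forward", grad_forward), ("simple BP", grad_truncated)]:
    print(f"{name:10s} {fn(w, v, xs, ys): .8f}   (finite difference {numeric: .8f})")
```

BTT and forward propagation agree with the finite difference; the truncated version does not, which is the "incorrect derivatives" problem in the first bullet.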
|
|
37
|
- Bugs – need good diagnostics
- “Bumpy error surface” – Schmidhuber says is common, Ford not. Sticky
neuron, RPROP, DEKF (Ford), etc.
- Shallow plateaus – adaptive learning rate, DEKF etc., new in works…
- Local minima – shaping, unavoidable issues, creativity
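RPROP adapts a separate step size for each weight from the sign of the gradient alone, so it is indifferent to how steep or shallow the error surface is along each weight. A minimal sketch of the variant usually called iRprop−, with its commonly quoted default constants, applied to a made-up two-weight error function whose curvatures differ by a factor of 10^5 (the function and names are illustrative assumptions):

```python
def irprop_minus(grad, w0, steps=300, delta0=0.1, eta_plus=1.2, eta_minus=0.5,
                 delta_max=50.0, delta_min=1e-6):
    """Sign-based RPROP (iRprop- variant): per-weight step sizes, gradient magnitude ignored."""
    w = list(w0)
    delta = [delta0] * len(w)
    g_prev = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            if g[i] * g_prev[i] > 0:        # same sign as last time: accelerate
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif g[i] * g_prev[i] < 0:      # sign flip: we overshot, so shrink the step
                delta[i] = max(delta[i] * eta_minus, delta_min)
                g[i] = 0.0                  # iRprop-: no move this step, reset sign memory
            if g[i] > 0:
                w[i] -= delta[i]
            elif g[i] < 0:
                w[i] += delta[i]
            g_prev[i] = g[i]
    return w


def grad(w):
    """Gradient of the hypothetical error 0.001*(w0 - 3)^2 + 100*(w1 + 1)^2."""
    return [0.002 * (w[0] - 3.0), 200.0 * (w[1] + 1.0)]


w_final = irprop_minus(grad, [0.0, 0.0])
print(w_final)
```

Plain gradient descent with a single learning rate would either crawl along the shallow first weight or diverge along the steep second one; the sign-based steps handle both.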
|
|
38
|
|
|
39
|
|
|
40
|
|
|
41
|
|
|
42
|
- 4 General Object Types (busbar, wire, G, L)
- Net should allow arbitrary number of the 4 objects
- How design ANN to input and output FIELDS -- variables like the SET of
values for current ACROSS all objects?
|
|
43
|
- One System does it all -- not just a collection of chapters or methods
- Domain-specific info is 2-edged sword:
- need to use it; need to be able to do without it
- Neural Nets demand/inspire new work on general-purpose prior
probabilities and on dynamic robustness (See HIC chapter 10)
- SEDP&Kohonen: general nonlinear stochastic ID of partially observed
systems
|
|
44
|
- Bayesian: Maximize Pr(Model|data)
- “Prior probabilities” essential when many inputs
- Minimize “bottom line” directly
- Vapnik: “empirical risk” static SVM and “structural risk” error bars
around same like linear robust control on nonlinear system
- Werbos ’74 thesis: “pure robust” time-series
- Reality: Combine understanding and bottom line.
- Compromise method (Handbook)
- Model-based adaptive critics
- Suykens, Land????
|
|
45
|
|
|
46
|
|
|
47
|
|
|
48
|
|
|
49
|
|
|
50
|
|
|
51
|
|
|
52
|
|
|
53
|
|
|
54
|
- Robust or H Infinity Control (Oak Tree)
- Adaptive Control (Grass)
- Learn Offline/Adaptive Online (Maren 90)
- “Multistreaming” (Ford, Feldkamp et al)
- Need TLRN Controller, Noise Wrapper
- ADP Versions: Online or “Devil Net”
|
|
55
|
|
|
56
|
|
|
57
|
|
|
58
|
|
|
59
|
|
|
60
|
|
|
61
|
- Basic thrust is scientific. Bellman gives exact optima for 1 or 2
continuous state vars. New work allows 50-100 (thousands sometimes).
Goal is to scale up in space and time -- the math we need to know to
know how brains do it. And unify the recent progress.
- Low lying fruit -- missile interception, vehicle/engine control,
strategic games
- New book from ADP02 workshop in Mexico www.eas.asu.edu/~nsfadp (IEEE
Press, 2004, Si et al eds)
|
|
62
|
- IEEE Computational Intelligence (CI) Society, new in 2004, about 2000
people in meetings.
- Central goal: “end-to-end learning” from sensors to actuators to
maximize performance of plant over future, with general-purpose learning
ability.
- This is DARPA’s “new cogno” in the new nano-info-bio-cogno convergence
- This is end-to-end cyberinfrastructure
- See hot link at bottom of www.eng.nsf.gov/ecs
- What’s new is a path to make it real
|
|
63
|
- Model-free (levels 0-2)*
- Barto-Sutton-Anderson (BSA) design, 1983
- Model-based (levels 3-5)*
- Werbos Heuristic dynamic programming with backpropagated adaptive
critic, 1977, Dual heuristic programming and Generalized dual heuristic
programming, 1987
- Error Critic (TLRN, cerebellum models)
- 2-Brain, 3-Brain models
|
|
64
|
- Basic thrust is scientific. Bellman gives exact optima for 1 or 2
continuous state vars. New work allows 50-100 (thousands sometimes).
Goal is to scale up in space and time -- the math we need to know to
know how brains do it. And unify the recent progress.
- Low lying fruit -- missile interception, vehicle/engine control,
strategic games
- Workshops: ADP02 in Mexico ebrains.la.asu.edu/~nsfadp; coordinated
workshop on anticipatory optimization for power.
|
|
65
|
- Neural Network Engineering
- Widrow 1st ‘Critic’ (’73), Werbos ADP/RL (’68-’87)
- Wunsch, Lendaris, Balakrishnan, White, Si, LDW......
- Control Theory
- Ferrari/Stengel (Optimal), Sastry, Lewis, VanRoy
(Bertsekas/Tsitsiklis), Nonlinear Robust...
- Computer Science/AI
- Barto et al (‘83), TD, Q, Game-Playing, ..........
- Operations Research
- Original DP: Bellman, Howard; Powell
- Fuzzy Logic/Control
|
|
66
|
|
|
67
|
|
|
68
|
- Stabilized voltage & reactance under intense disturbance where
neuroadaptive & usual methods failed
- Being implemented in full-scale experimental grid in South Africa
- Best paper award IJCNN99
|
|
69
|
- HDP=TD For DISCRETE set of
Choices
- DHP when action variables u are continuous
- GDHP when you face a mix of both (but put zero weight on undefined
derivative)
- See arXiv.org, nlin-sys area, adap-org/9810001 for detailed history,
equations, stability
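For the discrete case, the HDP critic update toward the target U(t) + J(t+1) coincides with tabular TD(0). A minimal sketch, assuming a fixed policy on a hypothetical three-state Markov chain and a discount factor of 0.8 (an added assumption so the sums converge; the slide does not specify one), checked against the exact J obtained by iterating the Bellman equation on the known model. DHP and GDHP, which learn derivatives of J, are not shown.

```python
import random

P = [[0.5, 0.5, 0.0],    # hypothetical transition probabilities P[s][s'] under a fixed policy
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
U = [1.0, 0.0, 2.0]      # utility U in each state
GAMMA = 0.8              # discount factor (assumption)


def exact_J(iters=500):
    """Reference answer: iterate J = U + GAMMA * P J using the known model."""
    J = [0.0, 0.0, 0.0]
    for _ in range(iters):
        J = [U[s] + GAMMA * sum(P[s][k] * J[k] for k in range(3)) for s in range(3)]
    return J


def td_critic(steps=300_000, alpha=0.001, seed=0):
    """Model-free HDP/TD(0) critic: nudge J(s) toward U(s) + GAMMA * J(s') after each observed move."""
    rng = random.Random(seed)
    J = [0.0, 0.0, 0.0]
    s = 0
    for _ in range(steps):
        s_next = rng.choices(range(3), weights=P[s])[0]
        target = U[s] + GAMMA * J[s_next]
        J[s] += alpha * (target - J[s])
        s = s_next
    return J


J_exact = exact_J()
J_td = td_critic()
print("exact:", [round(j, 3) for j in J_exact])
print("TD   :", [round(j, 3) for j in J_td])
```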
|
|
70
|
- ANNs For Distributed/Network I/O: “spatial chunking,” ObjectNets, Cellular SRNs
- Ways to Learn Levels of a Hierarchical Decision System – Goals,
Decisions
- “Imagination” Networks, which learn from domain knowledge how to escape
local optima (Brain-Like Stochastic Search BLiSS)
- Predicting True Probability Distributions
|
|
71
|
- 4 General Object Types (busbar, wire, G, L)
- Net should allow arbitrary number of the 4 objects
- How design ANN to input and output FIELDS -- variables like the SET of
values for current ACROSS all objects?
|
|
72
|
- Train 4 FF Nets, one for each TYPE of object, over all data on that
object.
- E.g.: Predict Busbar(t+1) as function of Busbar(t) and Wire(t) for all 4
wires linked to that busbar (imposing symmetry).
- Dortmund diagnostic system uses this idea
- This IMPLICITLY defines a global FF net which inputs X(t) and outputs
grid prediction
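A minimal sketch of the weight-sharing idea, restricted to busbars: one local model with shared weights predicts each busbar's next state from its own state and the SUM over its attached wires, so the same two weights serve a grid of any size or topology. The busbar dynamics, the rule that a wire carries v[i] - v[j], and the linear least-squares local model (standing in for a trained FF net) are all assumptions made for illustration; wire, G and L objects would get their own shared models in the same way.

```python
import random


def net_outflow(i, v, wires):
    """Sum, over the wires touching busbar i, of the hypothetical wire flow v[i] - v[other end]."""
    total = 0.0
    for end1, end2 in wires:
        if end1 == i:
            total += v[end1] - v[end2]
        elif end2 == i:
            total += v[end2] - v[end1]
    return total


def true_step(v, wires, rng=None):
    """Made-up 'real' grid dynamics used to generate data (noise-free if rng is None)."""
    noise = (lambda: rng.gauss(0.0, 0.01)) if rng is not None else (lambda: 0.0)
    return [0.9 * v[i] - 0.1 * net_outflow(i, v, wires) + noise() for i in range(len(v))]


def collect(n_bus, wires, episodes, steps, rng):
    """Pool (own state, wire sum, next state) samples over ALL busbars and times."""
    rows = []
    for _ in range(episodes):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n_bus)]
        for _ in range(steps):
            v_next = true_step(v, wires, rng)
            rows += [(v[i], net_outflow(i, v, wires), v_next[i]) for i in range(n_bus)]
            v = v_next
    return rows


def fit_local_model(rows):
    """Least squares for next ~ a*own + b*wire_sum, one (a, b) shared by every busbar."""
    s11 = sum(p * p for p, _, _ in rows)
    s22 = sum(q * q for _, q, _ in rows)
    s12 = sum(p * q for p, q, _ in rows)
    s1y = sum(p * y for p, _, y in rows)
    s2y = sum(q * y for _, q, y in rows)
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def predict_grid(v, wires, a, b):
    """The implicit global FF net: the same local model applied at every busbar."""
    return [a * v[i] + b * net_outflow(i, v, wires) for i in range(len(v))]


rng = random.Random(0)
ring6 = [(i, (i + 1) % 6) for i in range(6)]
a, b = fit_local_model(collect(6, ring6, episodes=30, steps=10, rng=rng))

ring10 = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5)]   # different size, extra chord
v = [rng.uniform(-1.0, 1.0) for _ in range(10)]
max_err = max(abs(p - q) for p, q in zip(predict_grid(v, ring10, a, b), true_step(v, ring10)))
print(f"shared weights a={a:.3f}, b={b:.3f}; max error on unseen 10-bus grid: {max_err:.4f}")
```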
|
|
73
|
- Define a global FF Net, FF, as the combination of local object model
networks, as before
- Add an auxiliary vector, y, defined as a field over the grid (just like X
itself)
- The structure of the object net
is an SRN:
- y[k+1] = FF(X(t), y[k], W)
- prediction (e.g. X(t+1)) = g(y[∞])
- Train SRNs as in xxx.lanl.gov, adap-org 9806001
- General I/O Mapping -- Key to Value Functions
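A minimal sketch of the relaxation loop y[k+1] = FF(X(t), y[k], W) as a cellular SRN: every busbar runs the same small cell (one tanh unit with hypothetical, hand-set weights) on its own input, its previous y, and the mean y of its neighbors, and the loop runs until y stops changing, giving y[∞]. The weights are chosen with |w_nbr| + |w_self| < 1 so the iteration is a contraction and is guaranteed to settle; training the weights (as in adap-org 9806001) is not shown, and the output map g is just a fixed scaling here.

```python
import math


def neighbors(n_bus, wires):
    nbr = [[] for _ in range(n_bus)]
    for i, j in wires:
        nbr[i].append(j)
        nbr[j].append(i)
    return nbr


def ff(X, y, nbr, W):
    """One pass of the shared cell: y_i <- tanh(w_in*X_i + w_nbr*mean(y_nbrs) + w_self*y_i + bias)."""
    w_in, w_nbr, w_self, bias = W
    out = []
    for i in range(len(X)):
        m = sum(y[j] for j in nbr[i]) / len(nbr[i]) if nbr[i] else 0.0
        out.append(math.tanh(w_in * X[i] + w_nbr * m + w_self * y[i] + bias))
    return out


def srn_settle(X, nbr, W, tol=1e-10, max_iter=1000):
    """Iterate y[k+1] = FF(X, y[k], W) from y[0] = 0 until the change drops below tol."""
    y = [0.0] * len(X)
    for k in range(1, max_iter + 1):
        y_next = ff(X, y, nbr, W)
        if max(abs(p - q) for p, q in zip(y_next, y)) < tol:
            return y_next, k
        y = y_next
    raise RuntimeError("SRN did not settle; check that the weights give a contraction")


def g(y, scale=2.0):
    """Hypothetical output layer: prediction = g(y[inf])."""
    return [scale * yi for yi in y]


W = (1.0, 0.5, 0.3, 0.0)   # |w_nbr| + |w_self| = 0.8 < 1, so FF is a contraction in y
for n in (5, 12):           # same weights, different grid sizes
    ring = [(i, (i + 1) % n) for i in range(n)]
    X = [math.sin(i) for i in range(n)]
    y_inf, iters = srn_settle(X, neighbors(n, ring), W)
    print(n, "busbars settled in", iters, "iterations; prediction[0] =", round(g(y_inf)[0], 4))
```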
|
|
74
|
- ANNs For Distributed/Network I/O: “spatial chunking,” ObjectNets, Cellular SRNs
- Ways to Learn Levels of a Hierarchical Decision System
- “Imagination” Networks, which learn from domain knowledge how to escape
local optima (Brain-Like Stochastic Search BLiSS)
- Predicting True Probability Distributions
|
|
75
|
- Brute Force, Fixed “T”, Multiresolution
- “Clock Based Synchronization”, NIST
- e.g., in Go, predict 20 moves ahead
- Action Schemas or Task Modules
- “Event Based Synchronization”:BRAIN
- Miller/Galanter/Pribram, Bobrow, Russell, me...
|
|
76
|
|
|
77
|
|
|
78
|
- Usual Way: $J(0) = U,\; J(n+1) = U + M^{T} J(n)$
- After n iterations, J(t) approximates
- $U(t) + U(t+1) + \dots + U(t+n)$
- DOUBLING TRICK shows one can be faster: $J^{T} = U^{T}(I+M)(I+M^{2})(I+M^{4})\cdots$
- After n BIG iterations (n factors), J(t) approximates
- $U(t) + U(t+1) + \dots + U(t+2^{n}-1)$
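A small numerical check of the doubling trick, assuming M is a column-stochastic transition matrix (so J = U + M^T J is the fixed-policy recursion) and using an arbitrary three-state example. With n big iterations the product has n factors, (I+M)(I+M^2)...(I+M^(2^(n-1))), so it reproduces exactly what the usual recursion gives after 2^n - 1 ordinary iterations, at the cost of n matrix squarings.

```python
def vec_mat(v, M):
    """Row vector times matrix: (v M)_j = sum_i v_i * M[i][j]."""
    n = len(v)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def usual(U, M, n_iter):
    """J(0) = U, J(n+1) = U + M^T J(n), written with row vectors as J^T <- U^T + J^T M."""
    J = list(U)
    for _ in range(n_iter):
        J = [u + x for u, x in zip(U, vec_mat(J, M))]
    return J


def doubling(U, M, n_big):
    """J^T = U^T (I+M)(I+M^2)(I+M^4)... truncated after n_big factors."""
    J, P = list(U), [row[:] for row in M]
    for _ in range(n_big):
        J = [a + b for a, b in zip(J, vec_mat(J, P))]   # J^T <- J^T (I + P)
        P = mat_mul(P, P)                                # P <- P^2 (the costly step)
    return J


M = [[0.5, 0.0, 0.5],      # column-stochastic: each column sums to 1 (assumed convention)
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]
U = [1.0, 0.0, 2.0]
for n in range(6):
    fast = doubling(U, M, n)
    slow = usual(U, M, 2 ** n - 1)
    print(n, "big:", [round(x, 6) for x in fast], "|", 2 ** n - 1, "ordinary:", [round(x, 6) for x in slow])
```

The saving in iterations is paid for in the squarings: in the flat lookup-table case the powers of M can quickly become dense even when M itself is sparse.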
|
|
79
|
- $M^{2^{n}}$ Becomes a MESS
- Instead use the following equation, the key result for the flat lookup
table case:
|
|
80
|
|
|
81
|
|
|
82
|
|
|
83
|
|