(Don't read this page. It is a work in progress for a Fall'19 graduate automated SE subject at NC State. Come back in mid-October!)
The premise of this book is that AI tools offer a rich tapestry of choices that software engineers can weave through to reach a variety of goals (including ethical goals). This chapter offers specific examples of that process.
We explore ethics since, more and more, AI tools are making decisions that affect people’s lives in high-stakes applications such as mortgage lending, hiring, admissions, and criminal sentencing1,2,3,4,5,6,7,8,9. Unfortunately, some of those AI tools are known to exhibit “group discrimination”; i.e. their decisions are inappropriately affected by attributes like race, gender, age, etc:
We say that, to some degree, the ethical impact of AI tools can be controlled by the developers building that software. We stress “to some degree” since the best ethical intentions of any developer can be defeated by malevolent forces, or even by just dumb luck. So it is wrong to say that if our guidelines are followed then the resulting AI tool will always adhere to socially-accepted ethical standards.
But it is also wrong to say that, just because some ethical goals are not always reached, we should not strive towards them. Developers should always try to adhere to ethical standards. Or, at the very least, they should monitor their AI tools and report unethical usage or consequences.
Our point will be that, in the 21st century, the wise software engineer knows how different AI tools offer different services, and how some of those services can achieve certain ethical goals. We offer fair warning to the reader versed in the standard texts on, say, data mining. The technologies discussed below roam far away from standard discussions of (say) classification vs regression vs whatever else. Once we introduce ethical goals like inclusiveness or fairness, the technology choices become very different.
The following methods are discussed below, very briefly (for more details, see later in this book):
For the industrial practitioner who wishes to distinguish themselves within the currently crowded AI market, the above list might be a marketing opportunity. By augmenting their current toolkit with some of the above, industrial practitioners might be able to offer services that their rivals lack.
For the researcher who is an advocate of a particular AI tool, the above list might inspire a research challenge:
For us, this list is like a specification for an ideal “ethics machine”. Later in this book we offer a version 0.1 implementation of that ethics machine. As will be seen, that implementation requires much extension and improvement. Nevertheless, it does show that a surprisingly large portion of the above can be created in a relatively simple manner. It is hoped that that implementation seeds a research community devoted to exploring algorithms with ethical effects.
The Institute of Electrical and Electronics Engineers (IEEE) has recently discussed general principles for implementing autonomous and intelligent systems (A/IS). They propose that the design of such A/IS systems satisfy certain criteria:
Other organizations, like Microsoft, offer their own principles for AI:
Ethics is a rapidly evolving concept, so it is hardly surprising that mapping the stated ethical concerns of one organization (Microsoft) into another (IEEE) is not easy. Nevertheless, the following table shows one way we might map together these two sets of ethical concerns. Note that:
| IEEE ↓ / Microsoft → | Accountable | Transparent | Fairness | Rely+Safe | Inclusive | Private+Secure |
|---|---|---|---|---|---|---|
| Accountability | ✔ | | | | | |
| Transparency | | ✔ | | | | |
| Well-being + aware of misuse | | ✔ | | ✔ | | |
| Human-rights | | | ✔ | | ✔ | ✔ |
| Data agency | | ✔ | | | | ✔ |
| Effectiveness | ✔ | ✔ | | ✔ | | |
The reader might dispute this mapping, perhaps saying that we have missed, or misrepresented, some vital ethical concern. This would be a good thing since it would mean you are now engaging in discussions about software and ethics. In fact, the best thing that could happen here is that you say “that is wrong; a better way to do that would be…” As George Box said, all models are wrong; but some are useful.
In any case, what the above table does demonstrate is that:
The above table maps between ethical concerns from different organizations. The rest of this chapter discusses how different algorithm choices enable these ethical goals.
It is unethical to deliver an AI tool that is performing poorly, particularly when there are so many ways to make an AI tool perform better. As discussed in our chapter on Baselines, no AI tool works best for all problems. Hence, when exploring new problems, there must be a commissioning process where different AI tools are explored and/or adjusted to the local problem:
The faster the algorithm, the easier it is to fiddle with. So, measured in terms of commissioning effort, we prefer linear-time methods (e.g. Naive Bayes) to very slow algorithms (e.g. KNN, which scales very poorly to large data sets).
It is important to stress that the commissioning effort cannot be the only way we assess an AI tool. For high dimensional image data, deep learning has proved to be very effective.
Training such learners can be a very slow process, so tuning and comparing them with other learners may be impractical. In this book we make no case that deep learning (or any other AI tool) is inherently better or worse. Rather, our goal is to map the trade-offs associated with each AI tool such that the best one can be selected for the next problem.
Machine learning software, by its nature, is always a form of statistical discrimination. This discrimination becomes objectionable when it places certain social groups (e.g. those characterized by age, sex, gender, nationality) at a systematic advantage or disadvantage.
There are various measures of unfairness. The first step is to identify “protected attributes” (e.g. race, age, gender, etc). Next, we use all attributes (protected and otherwise) to build a classifier. Thirdly, we measure unfairness using measures like:
After that, handling unfairness becomes a hyperparameter optimization issue. Recent results show that hyperparameter tuning can find fairer models (where “fair” is measured by EOD and AOD). The trick here is that such optimization must strive for fairness AND performance (precision, recall, etc.) since experiments show that optimizing for performance separately from fairness means that we can succeed on one and fail on the other.
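To make these measures concrete, here is a minimal sketch of how EOD (equal opportunity difference) and AOD (average odds difference) might be computed, assuming binary labels and a binary protected attribute (0 = unprivileged, 1 = privileged). The function names here are ours, not from any particular fairness library.

```python
import numpy as np

def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for one group."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def fairness(y_true, y_pred, protected):
    """EOD = difference in TPRs between groups;
       AOD = mean of the TPR and FPR differences.
       Values near zero indicate a fairer model."""
    priv = protected == 1
    tpr_p, fpr_p = rates(y_true[priv], y_pred[priv])
    tpr_u, fpr_u = rates(y_true[~priv], y_pred[~priv])
    eod = tpr_u - tpr_p
    aod = 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))
    return eod, aod
```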
AI tools that include humans in their reasoning process must do several things:
Inclusiveness is helped by AI tools that generate succinct models, since humans can read and understand such models. Rule-based learners like contrast set learners and FFTrees are useful for generating such succinct models:
Another interesting approach to explanation is to use locality reasoning. The LIME explanation algorithm builds some model using examples near the example of interest (LIME does not specify which model is used). Next, LIME builds a local regression model using the predictions from that model. The coefficients of that regression are then informative as to what factors are most influential. For example, in the diagram at right, the example of interest is marked with a red cross and the coefficients would reveal why this example is labeled (say) red, not blue.
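Here is a small sketch of that locality reasoning (not the official LIME library): perturb the example of interest, query the black-box model, then fit a weighted linear surrogate whose coefficients act as the explanation. The `blackbox.predict_proba` interface and all parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(blackbox, x, X, n_samples=500, kernel_width=0.75):
    """Explain blackbox's prediction at x via a weighted local linear model."""
    rng = np.random.default_rng(1)
    sigma = X.std(axis=0)
    # sample perturbed points around the instance of interest
    Z = x + rng.normal(0, sigma, size=(n_samples, len(x)))
    # query the black box (assumed binary classifier) at those points
    y = blackbox.predict_proba(Z)[:, 1]
    # weight samples by closeness to x (nearer samples matter more)
    d = np.linalg.norm((Z - x) / (sigma + 1e-12), axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    # fit a simple, readable surrogate; its coefficients are the explanation
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)
    return surrogate.coef_
```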
For a discussion of other explanation algorithms, see Gosiewska and Biecek.
Once a system can explain itself, then most probably humans will want to change some part of it. Active learning is a general framework within which humans and AI can learn from each other, in the context of specific examples.
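As a deliberately tiny sketch of that idea, here is pool-based active learning with uncertainty sampling. `oracle_label` stands for the human in the loop; the learner, seed size, and query budget are illustrative choices (and the seed sample is assumed to contain both classes).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_pool, oracle_label, n_seed=10, n_queries=20):
    rng = np.random.default_rng(1)
    labeled = list(rng.choice(len(X_pool), n_seed, replace=False))
    y = {i: oracle_label(X_pool[i]) for i in labeled}        # ask the human for seed labels
    for _ in range(n_queries):
        model = LogisticRegression().fit(X_pool[labeled], [y[i] for i in labeled])
        probs = model.predict_proba(X_pool)[:, 1]
        unlabeled = [i for i in range(len(X_pool)) if i not in y]
        # most uncertain example = predicted probability closest to 0.5
        ask = min(unlabeled, key=lambda i: abs(probs[i] - 0.5))
        y[ask] = oracle_label(X_pool[ask])                    # the human labels it
        labeled.append(ask)
    return model, labeled
```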
One of the lessons of research into requirements engineering is that the stakeholders for software have many competing goals. Simple AI tools know how to chase a single goal (e.g. a classifier might try to maximize the accuracy of its predictions). Better AI tools know how to trade off between the multiple competing goals of different stakeholders.
One way to trade off between competing goals is to use multi-goal reasoners. Pareto frontiers were introduced in Chapter 3 in the section discussing how data miners use optimizers. Recall that, given many solutions floating in a space of multiple goals, the “Pareto frontier” is the set of solutions that are not demonstrably worse than anything else. In the figure at right, if we wish to maximize both quantities, then “heaven” is top right, so “K,N” are not on the frontier (since there are other items between them and heaven). On the other hand, “A,B,C,D,E,F,G,H” are on the frontier since they have a clear line of sight to heaven.
There are many methods for finding the Pareto frontier, including genetic algorithms, sequential model-based optimization, and the three data mining methods described below. Once the frontier is found, the reasoning can stop. Alternatively, in multi-generational reasoning, this frontier becomes the seed for a new round of reasoning.
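For concreteness, here is a minimal brute-force sketch of Pareto-frontier selection, assuming every goal is to be maximized (flip the sign of any goal you want to minimize). Real multi-goal optimizers are smarter than this quadratic-time filter, but the underlying idea is the same.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on every goal and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

# e.g. two goals, both maximized
points = [(1, 9), (3, 7), (5, 5), (2, 2), (7, 3), (9, 1), (4, 4)]
print(pareto_frontier(points))   # prints the non-dominated points
```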
One of the lessons of this book is that building ethical systems is not hard. If developers really understand how their AI tools work, it is possible to refactor them and produce simpler systems that can better achieve the desired goals. For example, in this section, we offer three very simple data mining methods that implement multi-goal optimization.
One way to use a data mining method to implement multi-goal reasoning is via recursive random projections (RRP). Krall et al. and Chen et al. applied RRP to randomly generated candidates. Instead of evaluating all candidates, Krall and Chen just evaluated the “east,west” pairs. Their approach achieved similar (and sometimes better) results than standard optimizers while running much faster (for one large model, RRP terminated in minutes, not the hours required for standard optimizers). Chen et al. improved on Krall’s work by showing that if the initial candidate size was very large then (a) multi-generational reasoning was not required while at the same time leading to (b) results competitive with other methods.
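The following is a very small sketch of that east-west style of reasoning (in the spirit of Krall’s GALE and Chen’s SWAY, not a reference implementation of either). `evaluate` stands for the expensive model evaluation (higher is assumed better); only the two poles are evaluated at each level of recursion.

```python
import random, math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split(candidates):
    """Pick two distant 'poles' (east, west), project everyone onto that line."""
    anybody = random.choice(candidates)
    east = max(candidates, key=lambda c: dist(c, anybody))
    west = max(candidates, key=lambda c: dist(c, east))
    c = dist(east, west)
    def proj(x):                      # cosine-rule projection onto the east-west line
        a, b = dist(x, east), dist(x, west)
        return (a * a + c * c - b * b) / (2 * c + 1e-12)
    ordered = sorted(candidates, key=proj)
    mid = len(ordered) // 2
    return east, west, ordered[:mid], ordered[mid:]

def rrp(candidates, evaluate, enough=32):
    """Recurse toward the better half, evaluating only the two poles each time."""
    if len(candidates) <= enough:
        return candidates
    east, west, easts, wests = split(candidates)
    better_half = easts if evaluate(east) > evaluate(west) else wests
    return rrp(better_half, evaluate, enough)
```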
A second way to use data mining to implement multi-goal reasoning is via frugal trees. Recall from the above that a frugal tree generator ranks different divisions of data columns according to the goal of the learning. Chen et al. out-performed the prior state of the art (in one area) by ranking their divisions using a pair of two-dimensional goals:
A third way to use data mining to implement multi-goal reasoning is to use contrast set learning and the Zitzler and Künzli indicator measure. In the following equation, x_i and y_i are the i-th goals of rows x and y, normalized 0..1 for min..max. Each of the N goals has a weight w_i of +1 or −1, depending on whether we seek to maximize or minimize it:

loss(x,y) = ( Σ_{i=1..N} e^( w_i (x_i − y_i) / N ) ) / N

Row x is better than row y if we “lose more” going to y than going to x; i.e. loss(x,y) > loss(y,x). Rows can be sorted according to how many times they are better than a small random sample of other rows. Contrast set learning can then be applied to discover what selects for the (say) 20% top scoring rows (while avoiding the rest). Note that, in practice, we have seen this indicator measure work well for up to 5 goals.
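The same indicator, written as code (a sketch consistent with the equation above; the sample size used for ranking is an illustrative choice):

```python
import math, random

def loss(x, y, w):
    """What we lose when moving from row x to row y, summed over all goals.
       Goal values are assumed already normalized 0..1; w[i] is +1 to maximize
       goal i and -1 to minimize it."""
    n = len(w)
    return sum(math.e ** (w[i] * (x[i] - y[i]) / n) for i in range(n)) / n

def better(x, y, w):
    """Row x beats row y if we lose more going to y than going to x."""
    return loss(x, y, w) > loss(y, x, w)

def rank(rows, w, samples=30):
    """Sort rows by how often they beat a small random sample of other rows."""
    def wins(row):
        others = random.choices(rows, k=samples)
        return sum(better(row, other, w) for other in others)
    return sorted(rows, key=wins, reverse=True)
```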
These three examples demonstrate the value of understanding AI tools. All the above refactor existing AI tools (RRP, frugal trees, contrast set learning) to achieve better systems:
These three points are an excellent demonstration of the main point of this book: AI tools give software developers more choices in how to implement a system. Developers can use those choices to great benefit (including ethical benefit).
Formally, reliability is the probability of failure-free software operation for a specified period of time in a specified environment. Since modern software is so complex, we cannot usually assign such a probability. Instead, a more pragmatic goal for “reliable software” is to decrease the odds that it will do harm in the future.
Four tools for pragmatically assessing AI tool reliability are
Further to the last point, certification envelopes have two major issues. Firstly, from a privacy perspective, it is problematic to share data lest it reveal private information. For more on this point, see the next section.
Secondly, from a systems perspective, it can be prohibitively expensive to pass around large amounts of data with each AI tool. To address this problem, we suggest:
TBD
To share data, while maintaining privacy, two important tricks are prototype selection and mutation. Prototype selection was discussed above. Privacy-preserving mutation must be done with care: as Grechanik et al. and [Brickell et al.](brickell-2008) warn, the more we obfuscate data (to maintain privacy), the worse the effectiveness of the models learned from that data.
Peters et al. address this problem via supervised mutation methods. In that approach, after prototype selection, data is mutated by a random amount up to, but not beyond, the hyperspace boundary between classes.
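Here is a small sketch of that idea, in the spirit of Peters et al.’s MORPH mutator (the mutation range and all names are illustrative): each row moves a random fraction of its distance to the nearest row with a different class label, so the mutated row stays on its own side of the class boundary as estimated by that nearest unlike neighbor.

```python
import numpy as np

def morph(X, y, r_min=0.15, r_max=0.35, seed=1):
    """Mutate numeric rows X (labels y) for privacy, without crossing class boundaries."""
    rng = np.random.default_rng(seed)
    X_new = X.copy().astype(float)
    for i, row in enumerate(X):
        unlike = X[y != y[i]]                       # rows with a different label
        d = np.linalg.norm(unlike - row, axis=1)
        z = unlike[np.argmin(d)]                    # nearest unlike neighbor
        r = rng.uniform(r_min, r_max, size=row.shape)
        sign = rng.choice([-1, 1], size=row.shape)
        X_new[i] = row + sign * r * (row - z)       # small, boundary-bounded mutation
    return X_new
```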
Privacy: centralized data is a target for hackers; distributed data means transmissions, and data theft can occur during those transmissions. Why send all the data? With prototype generation, only a small sample need ever be shared.
Transparency:
- transparency makes users of a system aware of the use and misuse of that system
- see explanation work [Feather'02] [Menzies'07] [Gay'12] [Matheer'16]
Repair: reliability planning; EFFECTIVENESS, RELIABILITY; Krishna (Ph.D. 2019?):
Gigerenzer’s thinking was influenced by the Nobel-Prize winning economist and AI pioneer Herbert Simon. Simon argued that humans do not make optimal decisions, since such optimality assumes complete knowledge about a situation. Rather, says Simon, humans reason via “satisficing” (a portmanteau of satisfy and suffice) in which they seek solutions good enough for the current context. ↩