\documentstyle[12pt,times,fullpage,epsfig]{article}
\newcommand{\bm}[1]{\mbox{\boldmath$#1$}}
\def\baselinestretch{1.5}
%\input{psfig}
\begin{document}

\vspace*{2.75in}

\begin{center}
A Note on High Breakdown \\
Regression Estimators \\
by \\
R.W. Oldford \\
Technical Report No. 39 \hspace*{1.0in} August 1983
\end{center}

\vspace*{2.0in}

\renewcommand{\baselinestretch}{1.0}
\large\normalsize

\begin{center}
Massachusetts Institute of Technology \\
Center for Computational Research in Economics and Management Science \\
Alfred P. Sloan School of Management
\end{center}

\renewcommand{\baselinestretch}{1.5}
\large\normalsize

\newpage
\begin{center}
{\bf Abstract}
\end{center}

High breakdown, without other measures of estimator resistance, is an inadequate
goal for regression estimators.  This is shown by constructing an easily computed 
regression estimator with 50\% breakdown.  The estimator is essentially least squares.

\vspace*{2.0in}

\begin{center}
{\bf Acknowledgements} 
\end{center}

Special thanks go to David Belsley, David Donoho, Peter Rousseeuw and Alexander Samarov
for their helpful comments on an earlier draft and to Karen Martel for typing this paper.
\newpage
\noindent
{\bf 1.  Introduction}

Suppose we have the regression model
$$
y _{i} = {\bm x} _{i} ^ {T} {\bm \beta} + e _{i} \qquad i = 1,...,n
$$
where ${\bm \beta} \in \Re ^ {p}$ is the parameter vector and $( {\bm x} _{i} ^ {T} , y _{i} ) \in
\Re ^ {p+1}$ is the $i ^ {\mbox{th}}$ observation vector.  Let
$D _{n} = \{ ( {\bm x} _{i} ^ {T} , y _{i} ) \ : \ i = 1,...,n \}$ denote the sample
of $n$ observations and ${\bm b} (D _{n} )$ be any estimator of ${\bm \beta}$ based on
$D _{n}$.  Further let $D _{n,m} \subset \Re ^{p+1}$ be any set of cardinality
$n$ such that $D _{n,m} \cap D _{n}$ has cardinality $n-m$ (that is, $m$ of the original observations have been
replaced) and let ${\cal D} _{n,m}$ be the set containing all such sets for fixed $m$ and $n$.

The finite sample replacement breakdown [Donoho and Huber (1983)] is defined to be
$\epsilon ^ {\ast} = m/n$ where $m$ is the smallest integer such that

\renewcommand{\theequation}{1.\arabic{equation}}
\setcounter{equation}{0}

\begin{equation}
\sup _{D _{n,m} \in {\cal D} _{n,m} } \parallel {\bm b} (D _{n,m} ) - {\bm b} (D _{n} ) \parallel =
\infty
\end{equation}
where $\parallel \cdot \parallel$ denotes the Euclidean norm.
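To make the definition concrete, the following numerical sketch (in Python with numpy; the data and function names are illustrative and not part of the formal development) shows why ordinary least squares has replacement breakdown $\epsilon ^ {\ast} = 1/n$: replacing a single response can drive the estimate arbitrarily far.

```python
import numpy as np

def ols_slope(x, y):
    # Least squares estimate of beta in the no-intercept model y_i = x_i * beta + e_i.
    return float(x @ y / (x @ x))

x = np.linspace(1.0, 2.0, 20)
y = 2.0 * x                      # clean data: beta = 2 exactly

slope_clean = ols_slope(x, y)    # recovers beta = 2

# Replace m = 1 observation (replacement contamination, as in D_{n,m}):
# the estimate can be made as large as desired, so the supremum in (1.1)
# is already infinite at m = 1 and epsilon* = 1/n for least squares.
y_bad = y.copy()
y_bad[0] = 1.0e6
slope_bad = ols_slope(x, y_bad)  # far from 2
```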

Recently [Siegel (1982), Donoho and Huber (1983), Rousseeuw (1982)] there has been increasing
interest in the construction of regression estimators which have high breakdown.  It is generally
felt that the resulting estimates will provide good starting values for more efficient robust
estimation procedures [e.g., Andrews (1974)].  Alternatively, such estimates may be used for exploratory
data analytic purposes.

The purpose of the present note is to demonstrate the inadequacy of the definition (1.1) of breakdown for regression
estimators.  We do this by proposing a new estimator which has breakdown of 1/2, but which is clearly unsuitable for data
analysis.  In addition, this estimator is equivariant to non-singular transformations of the explanatory variables, is highly
efficient at the standard Gaussian model, and is trivially computed.  Other high breakdown estimators, the repeated median (RM) [Siegel
(1979, 1982)] and least median of squares (LMS) [Rousseeuw (1982)], do not possess all of these attributes.

The estimator exploits a peculiarity inherent in the definition of breakdown (1.1), namely, that the supremum must be
infinite.  Since the parameters being estimated are slope and/or intercept parameters, most estimators will give infinite estimates
only if some of the $| y _{i} |$'s $\rightarrow \infty$.  The proposed estimator prevents this occurrence. \\
\ \\
{\bf 2.  The Estimator}

\renewcommand{\theequation}{2.\arabic{equation}}
\setcounter{equation}{0}

Let $y ^{\prime}$ be the median of the $y _{i}$'s and MAD be their median absolute deviation, median
$( | y _{i} - y ^{\prime} | )$.  An estimator $b ^ {\ast}  (D _{n} )$ may be constructed as a weighted
least squares estimate with weights $w _{i}$ given by
\begin{equation}
w _{i} = \left\{
\begin{array}{ll}
1 & \mbox{if} \ \ | y _{i} - y ^{\prime} | \leq c \cdot \mbox{MAD} \\
0 & \mbox{otherwise},
\end{array}
\right.
\end{equation}
where $c$ is some constant satisfying $1 \leq c < \infty$.  The weights ignore the $x$-data entirely.
Even so, we may prove that the resulting estimator has breakdown 1/2 (as $n \rightarrow \infty$).
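A minimal computational sketch of $b ^ {\ast}$ (in Python with numpy; the choice $c = 3$ and all names are illustrative) makes the construction explicit: the weights are computed from the $y$-data alone and the fit is an ordinary weighted least squares.

```python
import numpy as np

def b_star(X, y, c=3.0):
    # Weighted least squares with the 0/1 weights of (2.1):
    # w_i = 1  iff  |y_i - median(y)| <= c * MAD(y).
    # Note the weights never look at the x-data.
    med = np.median(y)
    mad = np.median(np.abs(y - med))
    w = (np.abs(y - med) <= c * mad).astype(float)
    Xw = X * w[:, None]                       # rows of X scaled by the weights
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

n = 21
x = np.linspace(1.0, 3.0, n)
X = x[:, None]                                # p = 1, no intercept
y = 2.0 * x                                   # beta = 2 exactly

b_clean = b_star(X, y)

# Corrupt m = 9 < n - [n/2] - p = 10 responses: the median/MAD weights
# zero out the wild y-values and the estimate stays bounded (near 2).
y_bad = y.copy()
y_bad[:9] = 1.0e8
b_contaminated = b_star(X, y_bad)
```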

We make the following assumptions on the elements of $D _{n}$:

\noindent
\begin{tabular}{llll}
A1. & $y _{i} \in \Re$, & $|y _{i} | < \infty$ & $i = 1,...,n$ \\
A2. & ${\bm x} _{i} \in \Re ^ {p} $, & $\parallel {\bm x} _{i} \parallel < \infty$ & $ i = 1,...,n$
\end{tabular}

\noindent
\begin{itemize}
\item[\rm A3.]
${\bm x} _{1} ,..., {\bm x} _{n} $ are in general position, that is, no $p+1$ points lie on a $(p-1)$-dimensional
linear manifold.
\end{itemize}
And we introduce the following notation:  let
\begin{itemize}
\item[\rm (i)]
$\parallel A \parallel$ denote the spectral norm of a matrix $A$,
\item[\rm (ii)]
$J _{m} \subset \{ 1,...,n \}$ of cardinality $m$ and $\overline{J} _{m}$ its complement in this set be defined implicitly by
$$
D _{n,m} = \{ {\bm z} _{i} \ : \ {\bm z} _{i} ^ {T} = ( {\bm u} _{i} ^ {T} , v _{i} ) \ \ \forall \ i \in J _{m} \ \ \mbox{and} \ \ {\bm z} _{i} ^{T} =
( {\bm x} _{i} ^ {T} , y _{i} ) \ \ \forall \ i \in \overline{J} _{m} \} \subset \Re ^{p+1} ,
$$
\item[\rm (iii)]
$W(D) = \mbox{diag} (w _{1} (D) ,..., w _{n} (D))$ where $w _{i} (D)$'s are weights
as in (2.1) based on a data set $D$,
\item[\rm (iv)]
$X(D)$ and ${\bm y} (D)$ be the usual $X$ matrix and $y$-vector constructed from a data set $D$,
\item[\rm (v)]
$\lambda _{j} (A)$ the $j ^{\mbox{th}}$ largest eigenvalue of $A$,
\item[\rm (vi)]
$I(D) = \{ i \ : \ w _{i} (D) = 1 \} \subset \{ 1,...,n \}$,
\item[\rm (vii)]
$[a]$ denote the largest integer less than or equal to $a$.
\end{itemize}
We may now prove the following result. \\
\ \\
{\bf Theorem}: Given A1-A3, the weighted least squares estimator ${\bm b} ^ {\ast} (D _{n} )$ with weights
given by (2.1) has finite sample replacement breakdown of
\begin{equation}
\epsilon ^ {\ast} \geq \frac{ n-[n/2] -p}{n} .
\end{equation}
{\bf Proof}:  By A1-A3, $\parallel {\bm b} ^ {\ast} (D _{n} ) \parallel < \infty$ and we need only examine
$\parallel {\bm b} ^ {\ast} (D _{n,m} ) \parallel$ in (1.1).  By a property of the spectral norm [Wilkinson (1965)]
$$
\parallel {\bm b} ^ {\ast} (D _{n,m} ) \parallel \ \leq \ \parallel (X ^ {T} (D _{n,m} ) W (D _{n,m} ) 
X( D _{n,m} )) ^ {-1} X ( D _{n,m} ) ^ {T} W (D _{n,m} ) \parallel
$$
\begin{equation}
\cdot \parallel W ( D _{n,m} ) {\bm y} ( D _{n,m} ) \parallel
\end{equation}
and \\
$ \parallel (X ^ {T} (D _{n,m} ) W (D _{n,m} ) X (D _{n,m} )) ^ {-1} X (D _{n,m} ) ^{T} W (D _{n,m} ) \parallel ^ {2}$
\begin{eqnarray}
\ & = &  \lambda _{p} ^ {-1} \left( \sum _{i \in J _{m}} w _{i} (D _{n,m} ) {\bm u} _{i} {\bm u} _{i} ^ {T} +
\sum _{i \in \overline{J} _{m}} w _{i} (D _{n,m} ) {\bm x} _{i} {\bm x} _{i} ^ {T}  \right) \nonumber \\
\ & \leq &  \lambda _{p} ^ {-1} \left( \sum _{i \in \overline{J} _{m}} w _{i} ( D _{n,m} ) {\bm x} _{i} {\bm x} _{i} ^ {T} \right) \nonumber \\
\ & = & \lambda _{p} ^ {-1} \left( \sum _{i \in \overline{J} _{m} \cap I( D _{n,m} ) } {\bm x} _{i} {\bm x} _{i} ^ {T} \right) .
\end{eqnarray}
The inequality may be found for example in Lawson and Hanson (1974).  By A3, (2.4) is finite provided the cardinality of
$\overline{J} _{m} \cap I(D _{n,m} )$ is greater than $p$.  The cardinality of $\overline{J} _{m}$ is $n-m$.  Let that of
$I(D _{n,m} ) $ be $k$.  The cardinality of their intersection is guaranteed to be greater than $p$ if $(n-m) + k -n > p$,
that is, if $m < k-p$.  Since $k \geq n - [n/2]$, this condition is certainly satisfied if $m < n - [n/2] -p$.
Also,
$$
\parallel W( D _{n,m} ) {\bm y} ( D _{n,m} ) \parallel ^ {2} = \sum _{i \in J _{m}} w _{i} (D _{n,m} ) v _{i} ^ {2} +
\sum _{i \in \overline{J} _{m}} w _{i} ( D _{n,m} ) y _{i} ^ {2}
$$
which is always finite provided $m < \left[ \frac{n}{2} \right]$.  Therefore
$$
\parallel {\bm b} ^ {\ast} ( D _{n,m} ) \parallel \ < \infty \qquad \forall D _{n,m}
$$
for all $m < n - [n/2] -p$ and the theorem is proved.

Since $\epsilon ^ {\ast}$ is bounded from above by $\frac{1}{2}$, as $n \rightarrow \infty$,
$\epsilon ^ {\ast} \rightarrow \frac{1}{2}$.  Note also that the theorem could be proven with the inclusion of an
intercept term. \\
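The lower bound (2.2) is trivial to evaluate; the following sketch (Python; the function name is illustrative) shows it approaching $1/2$ as $n$ grows, for fixed $p = 1$.

```python
def breakdown_bound(n, p):
    # Lower bound (2.2): epsilon* >= (n - [n/2] - p) / n,
    # where [a] is the largest integer <= a (integer division here).
    return (n - n // 2 - p) / n

# For p = 1 the bound is 0.4 at n = 10, 0.49 at n = 100,
# and tends to 1/2 as n -> infinity.
bounds = [breakdown_bound(n, 1) for n in (10, 100, 1000, 10**6)]
```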
\ \\
{\bf 3.  Discussion}

In addition to having a high breakdown value, the estimator is equivariant to non-singular transformations of the explanatory variables and easily computed.
For $c$ sufficiently large in (2.1), the estimator is essentially least squares and therefore highly efficient at the usual
Gaussian model.  Further, $c$ can be chosen large enough so that, for practical purposes, the estimator is equivariant to location
transformations of the regression parameters.

However, it is clearly not a resistant regression estimator.
While $\sup \parallel {\bm b} ^ {\ast} (D _{n} ) - {\bm b} ^ {\ast} (D _{n,m} ) \parallel$ is bounded for $m < n - [n/2] -p$,
it may be quite large even for $m=1$ (the finite-sample influence, or sensitivity, function).
Figure 1 demonstrates that this can also be the case for the LMS estimator.
Here a single point essentially determines the line.  Rousseeuw (1982) gives an example with $p=2$, where RM is substantially
affected by 40\% contamination but LMS is not.  Clearly these estimators differ on other resistance properties.
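The non-resistance of $b ^ {\ast}$ is easy to verify numerically.  In the sketch below (Python with numpy; \texttt{b\_star} re-implements the weighted least squares estimator with weights (2.1), and $c = 3$ is an illustrative choice), a single replaced observation with an enormous $x$-value and a $y$-value at the sample median retains weight one, since the weights ignore the $x$-data, and essentially determines the fitted slope.

```python
import numpy as np

def b_star(X, y, c=3.0):
    # Weighted least squares with the 0/1 weights of (2.1); the weights
    # depend on the y-data only, so x-outliers are never downweighted.
    med = np.median(y)
    mad = np.median(np.abs(y - med))
    w = (np.abs(y - med) <= c * mad).astype(float)
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

n = 21
x = np.linspace(1.0, 3.0, n)
y = 2.0 * x                                   # beta = 2 exactly
b_clean = b_star(x[:, None], y)

# Replace a single observation (m = 1) by a leverage point: x huge,
# y equal to the sample median, so its weight remains 1.
x_bad, y_bad = x.copy(), y.copy()
x_bad[0] = 1.0e6
y_bad[0] = np.median(y)
b_leverage = b_star(x_bad[:, None], y_bad)    # dragged from 2 to nearly 0
```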

Breakdown describes a worst-possible-case scenario.  Because of this it is often more easily assessed than an estimator's
sensitivity curve (e.g., LMS).  However, without assessment of other resistance properties, it may be misleading.
High breakdown implies bounded sensitivity but the bounds may be high enough to be ineffectual.

At least two alternative definitions of breakdown are possible.
The first would replace $\infty$ in (1.1) with some constant $k$.  Breakdown would now be a function
$\epsilon ^ {\ast} (k)$ and more difficult to assess.  Moreover, breakdown would no longer be invariant to a linear
transformation of ${\bm \beta}$.

The second possibility is closer in spirit to the original finite sample version of Andrews et al. (1972).
It consists of fixing the sets $D _{n}$ and ${\cal D} _{n,m}$ to be investigated.  In such cases $\infty$ is usually
replaced by $k < \infty$.

Donoho and Rousseeuw (1983; personal communication) have suggested that the {\em exact fit property} (EFP) of a regression
estimator be investigated.  Basically the set $D _{n}$ is chosen to be such that all $( {\bm x} _{i} ^ {T} , y _{i} )$ lie exactly
on a plane.  ${\cal D} _{n,m}$ is as before and ${\bm b} (D _{n} )$ is said to have the exact fit property $\epsilon = m/n$ if $m$
is the largest integer such that \\
$\displaystyle{\sup _{D _{n,m} \in {\cal D} _{n,m} } } \parallel {\bm b} (D _{n} ) - {\bm b} (D _{n,m} ) \parallel = 0$.
Siegel (1979) and Rousseeuw (1983) proved that the RM and LMS respectively have EFP of $\epsilon = 1/2$.

Other possibilities for $D _{n}$ and ${\cal D} _{n,m}$ should be investigated.  It may be the case that estimators
which are reasonable on other grounds ``break down'' only for pairs $(D _{n} , D _{n,m} )$ which rarely occur
in practice.  It would be helpful to have breakdown type assessments for commonly occurring $(D _{n} , D _{n,m} )$ and exploratory
techniques for recognizing those infrequent pairs $(D _{n} , D _{n,m} )$ which are dangerous to the estimator being used.

\begin{center}
\par
\centerline{\psfig{figure=figure1.ps,width=4.0in,height=3.0in}}

Figure 1.  The LMS estimate for a particular $D _{n,m}$.
\end{center}
\noindent
{\bf References}
\begin{description}
\item[\rm Andrews,]
D.F., P.J. Bickel, F.R. Hampel, P.J. Huber, W.H. Rogers, and J.W. Tukey (1972), {\em Robust Estimates of Location:  Survey
and Advances}, Princeton University Press:  Princeton, New Jersey.
\item[\rm Andrews,]
D.F. (1974), ``A Robust Method for Multiple Linear Regression'', {\em Technometrics 16}, pp. 523--531.
\item[\rm Donoho,]
D.L. and P.J. Huber (1983), ``The Notion of Breakdown Point'', {\em E.L. Lehmann Festschrift}, Bickel, P.J., K. Doksum
and J.L. Hodges, Jr. (eds.), Wadsworth Press.
\item[\rm Lawson,]
C.L. and R.J. Hanson (1974), {\em Solving Least Squares Problems}, Prentice-Hall: Englewood Cliffs, New Jersey.
\item[\rm Rousseeuw,]
P.J. (1982), ``Least Median of Squares Regression'', Technical Report, Centrum voor Statistiek en Operationeel
Onderzoek, Vrije Universiteit Brussel.
\item[\rm Siegel,]
A.F. (1979), ``The Repeated Median Algorithm'', unpublished working paper, University of Wisconsin.
\item[\rm Siegel,]
A.F. (1982), ``Robust Regression using Repeated Medians'', {\em Biometrika, 69}, pp. 242--244.
\item[\rm Wilkinson,]
J.H. (1965), {\em The Algebraic Eigenvalue Problem}, Oxford University Press: London.
\end{description}
\end{document}
