AN EFFECTIVE SELECTION OF REGRESSION VARIABLES
WHEN THE ERROR DISTRIBUTION IS INCORRECTLY SPECIFIED*
Wolfgang Härdle
FB Mathematik
Johann-Wolfgang-Goethe-Universität
D-6000 Frankfurt/M.
WEST GERMANY
and
Department of Statistics
University of North Carolina
Chapel Hill, North Carolina 27514
Abstract
In the situation where the statistician estimates regression parameters by maximum likelihood methods but fails to choose a likelihood function matching the true error distribution, an asymptotically efficient selection of regression variables is considered. The proposed procedure is especially useful when a robust regression is applied but the data in fact do not require that treatment. Examples are given, and relationships to other selectors such as Mallows' C_p are investigated.
Keywords and Phrases:
variable selection, regression analysis, robust regression,
model choice
AMS Subject Classification: Primary 62J05, Secondary 62G99

*Research supported by Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 123 "Stochastische Mathematische Modelle" and AFOSR Contract No. F49620 82 C 0009.
1. Introduction and Results

Suppose that Y = (Y_1, ..., Y_n)' is a random vector of n observations with mean μ = (μ_1, ..., μ_n)', and assume that the i-th observation is associated with a covariate x_i, i = 1, ..., n. The dependence of μ_i on x_i is specified by an infinite parameter vector β; therefore at most n parameters can be estimated on the basis of the observations.
Suppose that a certain likelihood function, not necessarily matching the true error distribution, has been chosen and that parameter estimates in a finite dimensional submodel p have been obtained by the maximum likelihood principle. The regression curve μ_i at x_i is estimated by μ̂_i(p), and a loss L_n(p) = ‖μ − μ̂(p)‖² is suffered. We shall consider an efficient model selection procedure that asymptotically minimizes the loss L_n(p) over a certain class of finite dimensional models of increasing dimension.
This paper complements earlier work in various ways. Shibata (1981) and Breiman and Freedman (1983) consider the problem of selecting regression variables when the true error distribution is known to be Gaussian and derive selectors that are equivalent to ours in this case. In the setting of least squares estimation, Li (1984) recently gave conditions for asymptotic efficiency of model choice procedures based on cross validation, FPE and other means. Schrader and Hettmansperger (1980) provided a robust analysis of variance based on Huber's M-estimates and propose a likelihood ratio type of test for testing between finite dimensional submodels. This approach was pursued by Ronchetti (1985), who derived a "robust model selection" procedure that is related to ours.
The mismatch of the chosen likelihood function and of the true error distribution occurs, for instance, in the situation that a Gaussian maximum likelihood estimate (i.e. the least squares estimate) is computed but the true error distribution is different, possibly long tailed. This mismatch is mathematically reflected in the way that a larger constant in the dimensionality penalty term has to be chosen. For instance, Akaike's (1970) AIC has the penalty constant 2, whereas in case of mismatch this constant is changed depending on the type of mismatch. If there are outliers in the data and we apply AIC based on a Gaussian maximum likelihood estimate, we will overfit the data since the model selection procedure tries to fit the outliers. On the other hand, if we apply the model selection procedure to be presented below, a high dimensional model is penalized more, since the penalty constant is bigger than 2.
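As a small numerical illustration (added here, not taken from the original argument), suppose the statistician uses a standard Gaussian likelihood, so that ψ(u) = u, while the errors actually follow the contaminated normal F = 0.9 N(0,1) + 0.1 N(0,9). With γ as defined below,

    γ = E_F ψ²(e) / (E_F ψ'(e))² = E_F e² = 0.9·1 + 0.1·9 = 1.8,

so the dimensionality penalty 2γk(p) = 3.6 k(p) is indeed larger than the AIC penalty 2k(p).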
In the simple case that the data is Gaussian and the statistician chooses a Gaussian likelihood function, our model selection procedure is equivalent to Mallows' C_p (1973) (see Section 4). This entails equivalence to many other selectors such as FPE, AIC and GCV, as was shown by Li (1984).
We will assume that the control variables x_i = (x_{i1}, x_{i2}, ...)', i = 1, ..., n, and the parameter vector β = (β_1, β_2, ...)' are in ℓ_2. The model can then be written as

    Y = Xβ + e,

where e = (e_1, ..., e_n)' is the vector of the independent observation errors having distribution F with density f, and X' = (x_1 x_2 ... x_n) is considered as a linear operator from ℓ_2 to ℝ^n. By p = (p_1, p_2, ..., p_{k(p)}) we denote a finite dimensional submodel with parameter

    β'(p) = (0, ..., β_{p_1}, 0, ..., β_{p_2}, 0, ..., β_{p_{k(p)}}, 0, ...),

where p_1 < p_2 < ... < p_{k(p)}, k(p) ≥ 1.
The statistician chooses a likelihood function h which he believes to represent the true error distribution f and estimates the parameters in a submodel p by maximizing

    ∏_{i=1}^n h(Y_i − x_i'(p)β(p)),

where x_i'(p) = (0, ..., x_{i p_1}, 0, ..., x_{i p_2}, 0, ..., x_{i p_{k(p)}}, 0, ...)'. Call this maximum likelihood estimate β̂(p). The regression surface point μ_i is estimated by μ̂_i(p) = <x_i, β̂(p)>.
2.
2
Define ~(u) = - du log h(u)t Y = EF~ (e)/(EFw'(e»
t B
n
a family of models p.
= X'X
and let P
A possible selection rule for chootsing a model p
could be defined by W'(p) = -IIO(p)11
n
= -2110(p)
_v11 2
2
+ 2yk(p) +
- 2{-1I0(p)
IIvl1 2 t
n
€
be
P
,
n
since
_v1l 2 +<~(p) -SJ(P»B
}
n
,
+ 2yk(p)
= 2{yk(p) -
<~(p) - SJ(P»B }
n
(1)
where <u, v>_{B_n} denotes the bilinear form u'B_n v for vectors u, v ∈ ℓ_2. Suppose that it can be shown that the last term in (1) is tending to a constant uniformly over the model class P_n. Then minimizing W_n'(p) over P_n will be the same task, at least asymptotically, as minimizing L_n(p). However, W_n' cannot be computed directly from the data since it depends on the unknown regression curve μ. But note that the last term in W_n'(p) is independent of the model p. We will therefore define

    W_n(p) = −‖μ̂(p)‖² + 2γk(p)

as the score function that is to be minimized over P_n. The problem of simultaneously estimating γ from the data, in order to make W_n completely data driven, is considered in Section 4.
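As an illustration only (not part of the paper), the following minimal Python sketch computes the score W_n(p) = −‖μ̂(p)‖² + 2γk(p) over a nested family of submodels, with μ̂(p) obtained from a Huber-type M-fit. The function names (huber_psi, m_estimate, select_model), the Huber constant c and the assumption that γ is known are illustrative choices, not prescriptions from the text.

    import numpy as np

    def huber_psi(u, c=1.345):
        # psi(u) = -d/du log h(u) for a Huber-type likelihood h
        return np.clip(u, -c, c)

    def m_estimate(X, y, c=1.345, n_iter=50):
        # Huber M-estimate via iteratively reweighted least squares
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        for _ in range(n_iter):
            r = y - X @ beta
            w = np.where(np.abs(r) < c, 1.0, c / np.maximum(np.abs(r), 1e-12))
            beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
        return beta

    def select_model(X, y, gamma, c=1.345):
        # minimize W_n(p) = -||mu_hat(p)||^2 + 2*gamma*k(p) over nested models p = (1,...,k)
        n, p_max = X.shape
        scores = []
        for k in range(1, p_max + 1):
            mu_hat = X[:, :k] @ m_estimate(X[:, :k], y, c)
            scores.append(-np.sum(mu_hat ** 2) + 2.0 * gamma * k)
        return int(np.argmin(scores)) + 1, scores

Its only role is to make the definition of W_n(p) concrete; in practice γ has to be replaced by an estimate such as the one discussed in Section 4.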
Remark 1
If the statistician is in the happy situation of knowing f, then he will choose h = f. If f is symmetric, then by partial integration E_F ψ² = E_F ψ' = I(F), and therefore the constant γ reduces to (E_F ψ')^{-1} = I(F)^{-1}, the inverse of the Fisher information number in a location family with density f.
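A simple worked case (added for illustration, not from the original): if f is the N(0, σ²) density and h = f, then ψ(u) = u/σ², E_F ψ' = 1/σ² = I(F), and hence γ = I(F)^{-1} = σ², so the penalty 2γk(p) becomes the familiar 2σ²k(p) of Mallows' C_p.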
We will use the concept of asymptotic efficiency as in Shibata (1981), Li (1984) and Stone (1984): a selected p̂ is called asymptotically optimal if, as n → ∞,

    L_n(p̂) / inf_{p∈P_n} L_n(p) →_P 1.            (2)
The following condition on ψ will be needed.

Condition 1
[...]

In a submodel p, the estimate μ̂(p) will be compared with a Gauss-Markov estimate based on the (unobservable) pseudodata Ỹ_i = μ_i + ẽ_i, ẽ_i = ψ(e_i)/q, where q = E_F ψ'(e). Define X(p) as the n × k(p) matrix containing the nonzero control variables in model p, and assume that B_n(p) = X'(p)X(p) has full rank k(p). Then the Gauss-Markov estimate of μ based on the pseudodata Ỹ = (Ỹ_1, ..., Ỹ_n)' is defined as μ̃(p) = M_n(p)Ỹ, where M_n(p) = X(p)B_n^{-1}(p)X'(p) denotes the hat matrix in model p. The loss for the Gauss-Markov estimate is L̃_n(p) = ‖μ̃(p) − μ‖², which, as will be seen later on, approximates L_n(p).
A well behaved design is guaranteed by

Condition 2
There exists a positive integer N such that, with R_n(p) = γk(p) + ‖μ − μ̄(p)‖² and μ̄(p) = M_n(p)μ,

    Σ_{p∈P_n} R_n(p)^{-N} → 0,  as n → ∞.
Let h(p) be the largest diagonal element of the hat matrix M_n(p). The speed of h(p) is controlled by

Condition 3

    sup_{p∈P_n} h(p)R_n(p) → 0,  as n → ∞.

Remark 2
It follows from Condition 3 that

    k²(p)/n → 0,  as n → ∞,

since R_n(p) = γk(p) + ‖μ − μ̄(p)‖², μ̄(p) = M_n(p)μ. This should be seen as an analogue of the necessary condition p²/n → 0 that can be found in Huber (1981, p. 166). Conditions 2 and 3 also imply

    Σ_{p∈P_n} h(p)^N → 0.            (3)
Remark 3
If ψ is bounded, as is assumed in robust regression analysis, Condition 2 can be weakened. It is seen from the proofs that in this case Bernstein's inequality could be used instead of Whittle's (1960, Theorem 2). Condition 2 could be weakened to Σ_{p∈P_n} exp(−CR_n(p)) → 0, for some C > 0.

Denote by p̂ a model p ∈ P_n that minimizes W_n(p) over P_n. The main result is as follows.
Theorem
Under Conditions 1-3, p̂ is asymptotically optimal.

The rest of the paper is organized in four sections. In Section 2 the theorem above is shown, in Section 3 we give a variety of examples that satisfy our conditions, and in Section 4 the estimation of γ and the relation to other model selection procedures is investigated. Section 5 is devoted to detailed proofs of the lemmata that are needed in showing the asymptotic optimality.
2. Proof of the Theorem
In the proof of the Theorem, the following Lemmata will be used.
Lemma 1
Under the conditions of the Theorem, for all ε > 0,

    P{sup_{p∈P_n} ‖μ̂(p) − μ̃(p)‖²/R_n(p) > ε} → 0,  as n → ∞.
Lemma 2
Under the conditions of the Theorem, for all ε > 0,

    P{sup_{p∈P_n} |L̃_n(p) − R_n(p)|/R_n(p) > ε} → 0,  as n → ∞.
Lemma 3
Under the conditions of the Theorem, for all ε > 0,

    P{sup_{p∈P_n} |γk(p) − <β̃(p) − β, β̃(p)>_{B_n}|/R_n(p) > ε} → 0,  as n → ∞.

Recall that the Gauss-Markov estimate based on the pseudodata Ỹ is β̃(p) = B_n^{-1}(p)X'(p)Ỹ. The crossterm in (1) will be approximated by the corresponding crossterm based on the linearized estimate β̃(p):

    <β̂(p) − β, β̂(p)>_{B_n} = ‖μ̂(p) − μ̃(p)‖²
        + <β̂(p) − β̃(p), β̃(p) − β(p)>_{B_n}
        + <β̂(p) − β̃(p), β(p)>_{B_n}
        + <β̃(p) − β, β̂(p) − β̃(p)>_{B_n}
        + <β̃(p) − β, β̃(p)>_{B_n}.            (4)
By Lemma 1, the first term is of lower order than R_n(p) uniformly over P_n; the second term is bounded by the Cauchy-Schwarz inequality, after which Lemma 1 and Lemma 2 are applied. The third term is handled by formula (8), given in the proof of Lemma 1, by setting η = β̃(p) and a proportional to β(p). The fourth term is handled as the second term.
Suppose that

    sup_{p,p'∈P_n} |(W_n(p) − W_n(p')) − (L_n(p) − L_n(p'))| / (L_n(p) + L_n(p')) = o_P(1),            (5)

and let p* denote a minimizer of L_n(p) over P_n. Then by (5), with probability greater than 1 − ε,

    |W_n(p̂) − W_n(p*) − (L_n(p̂) − L_n(p*))| ≤ ε(L_n(p̂) + L_n(p*)).

By the definition of p̂, W_n(p̂) − W_n(p*) ≤ 0; therefore,

    −(L_n(p̂) − L_n(p*)) ≥ −ε(L_n(p̂) + L_n(p*)),
    L_n(p*)(1 + ε) ≥ L_n(p̂)(1 − ε),
    1 ≤ L_n(p̂)/L_n(p*) ≤ (1 + ε)/(1 − ε),

which shows that (2) holds, i.e. p̂ is asymptotically optimal. Formula (5) follows by observing that
    (W_n(p) − W_n(p')) − (L_n(p) − L_n(p')) = (W_n'(p) − L_n(p)) − (W_n'(p') − L_n(p'))

and

    (W_n'(p) − L_n(p))/L_n(p) = [(W_n'(p) − L_n(p))/R_n(p)] · [R_n(p)/L̃_n(p)] · [L̃_n(p)/L_n(p)].

The first factor is tending to zero in probability, uniformly over P_n, by Lemma 3 and formula (4). The two other factors tend to one in probability, uniformly over P_n, by Lemma 1 and Lemma 2.
3. Examples

We start with a reformulation of Condition 2 in the case of hierarchical model sequences, i.e. P_n = {(1), (1,2), ..., (1,2,...,P_n)} with P_n tending to infinity. In this case Condition 2 follows for N = 2 from

Condition 2'

    inf_{p∈P_n} R_n(p) → ∞,  as n → ∞.
We slightly abuse notation by writing j for (1, ..., j). Then Condition 2 follows from

    R_n(j) = γj + ‖μ̄(j) − μ‖²

and

    Σ_{j=1}^{P_n} R_n(j)^{-2} ≤ J_n {inf_{p∈P_n} R_n(p)}^{-2} + γ^{-2} Σ_{j=J_n+1}^{∞} j^{-2} → 0,

if J_n tends to infinity slowly enough. In the following examples we assume that P_n represents a hierarchical model sequence. The following lemma, which is due to Shibata (1981), is useful in checking Condition 2'.
Lemma 4
Assume that, with a positive divergent sequence {c_n}, the linear operator c_n^{-1}B_n converges weakly to a nonsingular operator B: ℓ_2 → ℓ_2, such that every p × p principal submatrix B(p) has full rank p for all p > 0. If β has infinitely many nonzero coordinates, then Condition 2' holds and p* diverges to infinity, as n → ∞.
Are the conditions of the Theorem fulfilled for typical examples? We check the conditions in examples given by Shibata (1981).
Example 1
Consider the polynomial regression on the interval [0,1). Here

    x_{ij} = (i/n)^{j-1},  i = 1, ..., n,  j = 1, 2, ...,

and

    Y_i = μ_i + e_i,  i = 1, ..., n,

is observed. Condition 1 is model independent and is an assumption about the error distribution. Condition 2' is satisfied via Lemma 4 (set c_n = n). It remains to check Condition 3. The symmetric matrix B_n^{-1}(p) has a spectral decomposition (Mardia, Kent and Bibby, 1979, Theorem A.6.4)

    B_n^{-1}(p) = Γ Λ Γ',

where Λ = diag(λ_1(p), ..., λ_p(p)) and Γ = (γ_1, ..., γ_p), γ_j = (γ_{1j}, ..., γ_{pj})' the j-th normalized eigenvector of B_n^{-1}(p). Lemma 4 ensures that λ_min(p), the smallest eigenvalue of n^{-1}B_n(p), is bounded away from zero by a constant C. Therefore each diagonal element h_i of M_n(p) can be estimated by

    h_i ≤ p² (n λ_min(p))^{-1} ≤ C^{-1} p²/n.
So Condition 3 is fulfilled if we ask for

    sup_{p∈P_n} k(p)² R_n(p)/n → 0,  as n → ∞.            (6)

A necessary condition is p³/n → 0, which is slightly stronger than Huber's (1981) conditions.
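For a purely numerical illustration (not part of the paper), the following sketch computes h(p), the largest diagonal element of the hat matrix, for the polynomial design of Example 1 and places it next to the bound of order p²/n; the function names and the choice n = 200 are arbitrary assumptions.

    import numpy as np

    def polynomial_design(n, p):
        # x_ij = (i/n)^(j-1), i = 1..n, j = 1..p, as in Example 1
        t = np.arange(1, n + 1) / n
        return np.vander(t, p, increasing=True)

    def max_leverage(X):
        # h(p): largest diagonal element of the hat matrix M_n(p) = X (X'X)^{-1} X'
        Q, _ = np.linalg.qr(X)               # leverages are the squared row norms of Q
        return float(np.max(np.sum(Q ** 2, axis=1)))

    n = 200
    for p in (2, 5, 10):
        print(p, max_leverage(polynomial_design(n, p)), p ** 2 / n)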
Example 2
Consider the following representation of the regression curve:

    μ_i = Σ_{j=1}^{∞} β_j cos(π(j−1)(i−1)/n).

Here the observations are taken at the points x_i = (i−1)/n, i = 1, ..., n. As in the example above, Condition 2 is satisfied by Lemma 4, setting c_n = n/2. Condition 3 is satisfied by similar arguments as above if we assume that (6) holds.
Example 3
Consider the robust M-estimation of location at different units x_j. Observations are taken repeatedly at p_n different units, and n/p_n observations are taken at the point x_j, j = 1, ..., p_n. Assume that E_F ψ(e) = 0; then Condition 1 is satisfied if ψ and ψ'' are bounded. Shibata (1981, p. 51) shows that Condition 2 is satisfied if the vectors of the control variables (x_1, ..., x_{p_n}) are linearly independent. Condition 3 can be checked as above.
4. Other methods and estimation of γ

There are a variety of other model selection methods, most of which were shown to be equivalent to Mallows' C_p. We therefore compare our method with C_p only. For simplicity, we work with the linearized estimate μ̃(p) based on the pseudodata Ỹ. Mallows' (1973) score function reads
    C_p(p) = ‖Ỹ − μ̃(p)‖² + 2γk(p)
           = ‖ẽ‖² + L̃_n(p) − 2(μ̃(p) − μ)'ẽ + 2γk(p).

The first term is independent of p, and the third and the last term taken together vanish uniformly over model classes P_n, as can be seen in the next section. This shows that a model selected by C_p is asymptotically optimal.

It can now be seen that W_n(p) has a similar structure:

    W_n(p) = −‖μ̃(p)‖² + 2γk(p)
           = L̃_n(p) − 2(μ̃(p) − μ)'ẽ + 2γk(p) − 2μ'ẽ − ‖μ‖².
Here the last two terms are independent of the model. The remaining terms are identical to those in Mallows' C_p, which shows that W_n(p) is equivalent to C_p.
n
It could be argued that the score function that is proposed
,
.
h~re
p
is not
'v
very reasonable in a practical application since the constant y is unknown to
..
the statistician.
However, if the constant y can be consistently estimated (in-
dependent of p) then the score function based on an estimated y is also
optimal.
ic~lly
A
1\
consistent estimate Y
.
n
n
-1 \
L ~
2
'1
~=
n
n
1\
(e.(p »!(n
-1 \
~n
L ~
'1
~=
'
01 Y
1\
a3)~ptot-
is provided, for instance, by
(e.(p »)
2
~n
1\
where £.(p ) denote residuals from a fit with a deterministic model Pn' in~
n
creasing in magnitude as n
inequality show that ~n
~
g y,
00.
A Taylor expansion and the Cauchy-Schwarz
as n ~
00
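Continuing the illustrative sketch given after the definition of W_n(p) (and reusing its m_estimate function), a plug-in version of this estimate for the Huber ψ might look as follows; the pilot dimension and the name estimate_gamma are again only illustrative assumptions, not part of the paper.

    import numpy as np

    def estimate_gamma(X, y, c=1.345, pilot_dim=None):
        # gamma_hat = mean(psi(e_hat)^2) / mean(psi'(e_hat))^2 from pilot-fit residuals
        n, p_max = X.shape
        k = pilot_dim or p_max
        r = y - X[:, :k] @ m_estimate(X[:, :k], y, c)   # residuals e_hat_i(p_n)
        psi = np.clip(r, -c, c)                          # Huber psi
        psi_prime = (np.abs(r) < c).astype(float)        # psi'(r)
        return float(np.mean(psi ** 2) / np.mean(psi_prime) ** 2)

    # The data-driven score is then W_n(p) with gamma replaced by estimate_gamma(X, y).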
5. Proofs
In this section we give the proofs of Lemmas 1-3. The proof of Lemma 1 follows a related proof in Huber (1981), Section 7.4. Similar ideas were used by Cox (1983), who considered M-type smoothing splines. In order to simplify notation we will consider the hierarchical case only, i.e. the model "p" is identified with "(1, 2, ..., k(p)), k(p) = p". Furthermore we assume without loss of generality that the coordinate system in the p-dimensional subspace of the first p components has been chosen so that X'(p)X(p) = I_p.
p
4>:E
P
-+
E P•
4>k(T')
T') = (T')l ••.. ' T')p)'
a zero of
~Ik (T')
pseudodata
Y.
=
€
_q-lr~ ll)i(Y. - r~ lxi.n.)x. k , k =
1=
1
J=
J J 1
EP •
Consider the mapping
1, •••• p where
A zero (with respect to n) of 4> will be co~pared with
The
= T')k - r~1= leu.1 + e.)x·
1 1k , where e.1 = l)i(e 1.)/q. q = EF~' (e);
....
....
the least squares estimate
B(p) = X (p)Y based on the
Consider an arbitrary normalized vector a
€
lRP ,
II a II
= 1.
Taylor
expansion of 4>, using Condition 1, leads to
..
= _q-l
I
I
~a
(~' (e.) - q)(
x. 6.)x'
k=l k i=l
1
j=p+l 1 j J 1k
1 -1
- -2 q
l?
n
l?
P
2,
2 8 k L ~"(e. +v( L x .. B. - L x .. n.»)( L x .. t3. - L x .. n.) x ik
k=l
i=l
1
j=l 1J J j=l 1J J j=l 1J J j=l 1J J
(p)
= T
l ,n
00
+ T2· ,n (p) + T3 ,n (p).v
00
£
(-1,1).
We will now show that each of these terms uniformly vanishes over P_n, in the sense that

    P{sup_{p∈P_n} |T_{a,n}(p)|/R_n^{1/2}(p) > ε} → 0,  a = 1, 2, 3,            (7)

for all (η, a) in the set

    F_n = {(η, a): Σ_{i=1}^n (Σ_{j=1}^p x_{ij}(β_j − η_j))² ≤ K R_n(p) for all p ∈ P_n, ‖a‖ = 1}.
Define, for i = 1, ..., n,

    V_i = q^{-1}(ψ'(e_i) − q),
    s_i = Σ_{k=1}^p a_k x_{ik},
    B_{i,n}(p) = Σ_{j=p+1}^{∞} x_{ij}β_j,
    Δ_{i,n}(p) = Σ_{j=1}^p x_{ij}(β_j − η_j).

Note that

    ‖s‖² = Σ_{i=1}^n s_i² = Σ_{i=1}^n (Σ_{k=1}^p a_k x_{ik})² = 1.
The first term T_{1,n}(p) is estimated as follows:

    P{sup_{p∈P_n} |T_{1,n}(p)|/R_n^{1/2}(p) > ε} ≤ Σ_{p∈P_n} ε^{-2N} E{|Σ_{i=1}^n s_i B_{i,n}(p) V_i|^{2N}} / R_n^N(p).

Applying Condition 1 and Whittle's (1960, Theorem 2) inequality, this term is bounded by

    C_1 ε^{-2N} Σ_{p∈P_n} (max_i s_i²)^N (Σ_{i=1}^n B_{i,n}²(p))^N / R_n^N(p),  C_1 > 0.

Recall that R_n(p) = γp + ‖(I_n − M_n(p))μ‖² ≥ Σ_{i=1}^n B_{i,n}²(p). Conditions 2 and 3 (see Remark 2, formula (3)) and the simple inequality s_i² ≤ Σ_{j=1}^p x_{ij}² Σ_{k=1}^p a_k² = h_i(p), where h_i(p) denotes the i-th diagonal element of the hat matrix M_n(p), imply that (7) holds for a = 1.
The second term is estimated similarly; we omit some details. If (η, a) ∈ F_n then, as above,

    P{sup_{p∈P_n} |T_{2,n}(p)|/R_n^{1/2}(p) > ε} ≤ C_1 ε^{-2N} Σ_{p∈P_n} h(p)^N (Σ_{i=1}^n Δ_{i,n}²(p))^N / R_n^N(p).

Now apply Conditions 2 and 3 as described in Remark 2, formula (3).
The third term, involving the second derivative of ψ, is bounded as follows. If (η, a) ∈ F_n we obtain, with a constant C_2,

    sup_{p∈P_n} |T_{3,n}(p)|/R_n^{1/2}(p) ≤ C_2 sup_{p∈P_n} {h(p)R_n(p)}^{1/2},

which tends to zero by Condition 3. Altogether we have shown that

    sup_{p∈P_n} |a'(Φ(η) − Ψ(η))|/R_n^{1/2}(p) →_P 0            (8)

for all (η, a) ∈ F_n. This entails that, for all η in the set

    G_n = {η ∈ ℝ^p: Σ_{i=1}^n (Σ_{k=1}^p (η_k − β_k)x_{ik})² ≤ K R_n(p) for all p ∈ P_n},

    sup_{p∈P_n} ‖Φ(η) − Ψ(η)‖/R_n^{1/2}(p) →_P 0.            (9)
Condition 2 and bounds on higher moments as above imply that, with probability greater than 1 − δ,

    sup_{p∈P_n} ‖β̃(p) − β(p)‖²/R_n(p) < γ + ε.            (10)

This shows that β̃(p) is in G_n with high probability.
Note that

    ‖Φ(η) − η‖ = ‖(Φ(η) − Ψ(η)) − (β̃(p) − β(p)) − β(p)‖
               ≤ ‖Φ(η) − Ψ(η)‖ + ‖β̃(p) − β(p)‖ + ‖β(p)‖.            (11)

From formula (9) we know that the first term vanishes asymptotically. From (10) we conclude that, for K big enough, the second term is less than ½ K^{1/2} R_n^{1/2}(p) with probability greater than 1 − δ. Certainly the third term can be made less than ½ K^{1/2} R_n^{1/2}(p) as well. Thus the function η − Φ(η) has a fixed point η* in the compact, convex set G_n. Since this fixed point is necessarily a zero of Φ, it is seen that β̂(p) is in G_n with probability greater than 1 − δ. Substituting β̂(p) into formula (9) shows that Lemma 1 holds.
Lemma 2 is seen by the following equation:

    L̃_n(p) − R_n(p) = ẽ'M_n(p)ẽ − γk(p).

Condition 2 implies that

    sup_{p∈P_n} |‖M_n(p)ẽ‖² − γk(p)|/R_n(p) →_P 0,

which shows Lemma 2. Lemma 3 follows similarly, observing that

    <β̃(p) − β, β̃(p)>_{B_n} = ẽ'M_n(p)ẽ + ẽ'M_n(p)μ.
Acknowledgement
I would like to thank Charles J. Stone and Johanna Behrens for helpful discussions.
References

Akaike, H. (1970). Statistical predictor identification. Ann. Inst. Statist. Math. 22, 203-217.

Breiman, L. and Freedman, D. (1983). How many variables should be entered in a regression equation? J. Amer. Statist. Assoc. 78, 131-136.

Cox, D. (1983). Asymptotics for M-type smoothing splines. Ann. Statist. 11, 530-551.

Huber, P. (1973). Robust regression: asymptotics, conjectures, and Monte Carlo. Ann. Statist. 1, 799-821.

Huber, P. (1981). Robust Statistics. Wiley, New York.

Li, K.C. (1984). Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: discrete index set. Manuscript.

Mallows, C. (1973). Some comments on C_p. Technometrics 15, 661-675.

Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979). Multivariate Analysis. Academic Press, London.

Ronchetti, E. (1985). Robust model selection in regression. Statist. Prob. Letters 3, 21-23.

Schrader, R.M. and Hettmansperger, T.P. (1980). Robust analysis of variance based upon a likelihood ratio criterion. Biometrika 67, 93-101.

Shibata, R. (1981). An optimal selection of regression variables. Biometrika 68, 45-54.

Stone, C.J. (1984). An asymptotically optimal window selection rule for kernel density estimates. Ann. Statist. 12, 1285-1297.

Whittle, P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theor. Prob. Appl. 5, 302-305.