Fast (constant-time) computation of the leftmost 1 in the CRCW PRAM model

Input: any binary (filled with 1s and 0s) array A[0,..,n-1]
Output: variable Location = i, if i is the position of the leftmost 1 in A, and Location = -1 if A is without 1s
Observation: If A contains at least one 1, there is exactly one prefix A[0,..,i] such that A[j]=0 for all j<i and A[i]=1.
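The input/output specification and the observation can be checked with a small sequential reference. This is only a sketch for testing purposes, not a PRAM algorithm; the function names leftmost_one_reference and witnesses are hypothetical and do not come from the notes.

```python
def leftmost_one_reference(A):
    """Sequential reference: index of the leftmost 1 in A, or -1 if A has no 1s."""
    for i, bit in enumerate(A):
        if bit == 1:
            return i
    return -1


def witnesses(A):
    """All i with A[j] = 0 for every j < i and A[i] = 1 (the observation's prefixes)."""
    return [i for i in range(len(A))
            if A[i] == 1 and all(A[j] == 0 for j in range(i))]


A = [0, 0, 1, 0, 1]
print(leftmost_one_reference(A))  # 2
print(witnesses(A))               # [2]
```

When A has no 1s, witnesses(A) is empty and the reference returns -1, which matches the required output.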
Algorithm 1 [suboptimal] with O(n^2) processors P0, P1, …, split into groups [P0,…,P(n-1)], [Pn,…,P(2n-1)], …, [P(i·n),…,P((i+1)·n-1)], …,
where group [P(i·n),…,P((i+1)·n-1)] is responsible for the prefix A[0,..,i].
Stage 1.A Identify the prefix A[0,..,i], where i is the position of the leftmost 1 in A; the output is an array ID[0,..,n-1] with ID[i]=1 and all other entries set to 0.
Location = -1; {this initialisation can be done by processor P0}
for each processor Pj do in parallel
  Prefix = j div n; Position = j mod n;
  ID[Prefix] = 1; {initialise output array ID; all 1s in ID but at most one will be culled}
  if (Position < Prefix) and (A[Position] = 1) then ID[Prefix] = 0; {cull the 1 in ID if a 1 appears too early in A[0,..,i]}
  if (Position = Prefix) and (A[Position] = 0) then ID[Prefix] = 0; {cull the 1 in ID if A[i] = 0}
  if (ID[Prefix] = 1) and (Position = 0) then Location = Prefix; {only one processor, P(i·n), reports the prefix A[0,..,i]}
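The pseudocode above can be sanity-checked with a sequential simulation. The sketch below assumes that each statement of the parallel loop is one synchronous PRAM step, so it runs every virtual processor through a step before moving to the next one. The function name leftmost_one_crcw is hypothetical. Within a step all concurrent writes to the same cell carry the same value, so the order in which the simulation visits processors does not matter.

```python
def leftmost_one_crcw(A):
    """Simulate Algorithm 1 with n*n virtual processors in synchronous steps."""
    n = len(A)
    location = -1                                  # initialisation done by P0
    ID = [0] * n
    # (Prefix, Position) = (j div n, j mod n) for each processor Pj
    procs = [divmod(j, n) for j in range(n * n)]

    # Step 1: every processor writes 1 into its group's cell of ID.
    for prefix, _ in procs:
        ID[prefix] = 1
    # Step 2: cull prefix i if a 1 appears strictly before position i.
    for prefix, position in procs:
        if position < prefix and A[position] == 1:
            ID[prefix] = 0
    # Step 3: cull prefix i if A[i] itself is 0.
    for prefix, position in procs:
        if position == prefix and A[position] == 0:
            ID[prefix] = 0
    # Step 4: the first processor of the surviving group reports its prefix.
    for prefix, position in procs:
        if ID[prefix] == 1 and position == 0:
            location = prefix
    return location


print(leftmost_one_crcw([0, 0, 1, 0, 1]))  # 2
print(leftmost_one_crcw([0, 0, 0]))        # -1
```

The simulation takes O(n^2) sequential time; on the PRAM the same four steps take constant time because all n^2 processors act at once.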
Algorithm 2 [optimal] with O(n) processors P0, P1, …, P(n-1), sketched below
Hint: (1) Split A into n^(1/2) consecutive blocks Bi of size n^(1/2), (2) test each block Bi for having 1s, (3) identify
the leftmost block Bx with 1s, (4) identify the position y of the leftmost 1 in Bx, (5) Location = x·n^(1/2) + y
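One possible reading of the hint is sketched below as a sequential simulation. It rests on several assumptions that are not stated in the notes: the block size is taken as b = ceil(n^(1/2)), A is padded with 0s up to b·b cells so that n need not be a perfect square, and steps (3) and (4) reuse the quadratic method of Algorithm 1 on arrays of b cells, which needs about b·b = O(n) processors. The names leftmost_one_quadratic and leftmost_one_linear are hypothetical.

```python
import math


def leftmost_one_quadratic(A):
    """Stand-in for Algorithm 1 on m cells, simulating m*m virtual processors."""
    m = len(A)
    ID = [1] * m
    for j in range(m * m):                      # virtual processor Pj
        prefix, position = divmod(j, m)
        if position < prefix and A[position] == 1:
            ID[prefix] = 0                      # a 1 appears before A[prefix]
        if position == prefix and A[position] == 0:
            ID[prefix] = 0                      # A[prefix] itself is 0
    survivors = [i for i in range(m) if ID[i] == 1]
    return survivors[0] if survivors else -1


def leftmost_one_linear(A):
    """Sketch of Algorithm 2 following steps (1)-(5) of the hint."""
    n = len(A)
    if n == 0:
        return -1
    b = math.isqrt(n)
    if b * b < n:
        b += 1                                  # b = ceil(sqrt(n))
    # (1) Assumption: pad with 0s so that A splits into b blocks of size b.
    padded = list(A) + [0] * (b * b - n)
    # (2) One processor per cell marks its block if the cell holds a 1.
    has_one = [0] * b
    for j, bit in enumerate(padded):
        if bit == 1:
            has_one[j // b] = 1
    # (3) Leftmost block with a 1: quadratic method on b cells.
    x = leftmost_one_quadratic(has_one)
    if x == -1:
        return -1
    # (4) Leftmost 1 inside block x: quadratic method on b cells.
    y = leftmost_one_quadratic(padded[x * b:(x + 1) * b])
    # (5) Combine block index and offset.
    return x * b + y


print(leftmost_one_linear([0] * 10 + [1] + [0] * 5))  # 10
```

Padding cannot change the answer, because the added cells hold only 0s and lie to the right of every original position.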