Supplemental Materials:

I. Weighted semi-partial correlation matrices

Table 1 (Supplemental material): Weighted semi-partial correlation matrices for Calanoids (upper) and Daphniids (lower) involving phylogeny and biogeography (left side) and phylogeny and environment (right side). Statistically significant correlations are highlighted in red (with FDR corrections for multiple comparisons) or pink (uncorrected). Blank entries are correlations that could not be calculated for technical reasons, either because of insufficient data (the number of occupied lakes must be greater than the number of variables tested) or, for biogeographic contrasts, because all species in the clade were absent from any of the biogeographic areas tested. $R^2_{adj}$ is the adjusted coefficient of determination of the regression considering all predictors within the Biogeographical or Environmental sets.

II. Phylogenetic Coding and Calculation of $\bar{P}$

Note that parts of the text in the paper are repeated here in order to provide a better link between this expansion on the phylogenetic coding and the paper.

The species-by-node matrix P contains the phylogenetic coding. In principle, there are other possible coding schemes (discussed below), but we use a scheme based on a node-by-node approach inspired by Felsenstein’s phylogenetic independent contrasts (PIC, Felsenstein 1985). We code the species-by-node P matrix to facilitate the calculation of a set of node-by-community statistics contained in the matrix $\bar{P}$. In matrix P, all species descending from one of the branches emanating from a node are given negative values, and those descending from the other branch are given positive values (which branch is given the negative sign is arbitrary). If a species is not a descendant of a particular node, it is given a value of 0.
The species codes on the two branches sum to 1 and -1, respectively, so species located in more species-rich branches are downweighted. In the same way that there are several ways of representing phylogenetic diversity in communities (e.g., Cadotte et al. 2010), there are different possibilities for coding phylogenetic information, related to the weights that can be given to each species to reflect the phylogenetic topology and branch lengths. For the analyses in this paper, we used a coding that is based on ancestral state reconstruction but does not account for branch lengths (which are not available for our composite phylogeny); we suggest below a coding scheme that accounts for branch length variation as well. In our scheme, codes begin at 1 (or -1, depending on the arbitrary sign of the branch) at the origin of the branch, and are halved at each bifurcating node until the species are reached. Thus a single species on a branch has a code of 1, two species on a branch have codes of (0.5, 0.5), three species have codes of (0.5, 0.25, 0.25), and so forth (for examples, see Figure 2). More formally, each entry in the matrix (species i and node j) was given a value of $p_{ij} = (0.5)^{d_{ij}}$, where $d_{ij}$ is the number of intermediate nodes passed on the path between the focal node j and the species in question i, and the value is taken as negative for species on the branch assigned the negative sign. So if the species is the only daughter species on a branch emanating from a node, it has a value of 1, and if it is one of two species on a branch, it has a value of 0.5, and so forth. If the focal node j is not passed between the root and species i (i.e., species i is not a daughter species of node j), the entry takes a value of 0 and that species is not included in the analysis for that node. The coding is quite simple and, with the values for the first node (tree root), one can rebuild the entire tree.
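As an illustration of the coding rule, the following minimal Python sketch builds the species-by-node matrix for a hypothetical four-species tree ((A,B),C),D. The tree, the node labels I to III, and the helper names `branch_codes` and `coding_matrix` are assumptions made for this example and are not part of the original analysis; the first-listed branch of each node is arbitrarily given the negative sign.

```python
# Hypothetical tree ((A,B),C),D. Each node maps to its two daughter
# branches; a daughter is either a species name or another node.
TREE = {
    "III": ("II", "D"),
    "II": ("I", "C"),
    "I": ("A", "B"),
}
SPECIES = ["A", "B", "C", "D"]


def branch_codes(child, tree, depth=0):
    """Return {species: 0.5 ** d} for every species below `child`.

    `depth` counts the intermediate nodes crossed since leaving the
    focal node, so a lone species on a branch gets 0.5 ** 0 = 1.
    """
    if child not in tree:  # `child` is a tip (a species)
        return {child: 0.5 ** depth}
    codes = {}
    for grandchild in tree[child]:
        codes.update(branch_codes(grandchild, tree, depth + 1))
    return codes


def coding_matrix(tree, species):
    """Build P as a nested dict: P[species][node]."""
    P = {sp: {} for sp in species}
    for node, (first, second) in tree.items():
        negative = branch_codes(first, tree)   # arbitrary choice of sign
        positive = branch_codes(second, tree)
        for sp in species:
            if sp in negative:
                P[sp][node] = -negative[sp]
            elif sp in positive:
                P[sp][node] = positive[sp]
            else:
                P[sp][node] = 0.0  # species does not descend from this node
    return P


P = coding_matrix(TREE, SPECIES)
for sp in SPECIES:
    print(sp, P[sp])
```

For the root node III of this hypothetical tree the codes are A: -0.25, B: -0.25, C: -0.5 and D: 1, so the codes on the two sides of the node sum to -1 and 1.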
Once the phylogenetic coding matrix P is built, the community value $\bar{P}$ for a given node-community pair is simply the sum of the $p_{ij}$ of all the species i descended from node j that occur in the local community. This can be achieved by simply multiplying the incidence matrix I by P (i.e., $\bar{P} = IP$; Figure 2). Therefore, $\bar{P}$ is a combined function of the location of the species on the phylogeny and their distribution in the landscape. Note that because of the multiplication, species with entries of 0 for a particular focal node have no influence on $\bar{P}$ even if they are present at a particular site.

The entries of the node-by-community $\bar{P}$ matrix are simply the sums of the $p_{ij}$ values for all species that are both descendants of a given node and occupants of the community (note $\bar{P} = IP$). The $\bar{P}$ values reflect the differential representation of species in the community along the two different branches emanating from a node in the phylogeny. The values in $\bar{P}$ range between 1, where all daughter species of one of the branches are present but none of the other are present, and -1, the opposite scenario in which all of the second but none of the first are present. If all species that are daughters of a given node are present at a particular site, then the value is 0 for that site, which means there is no evidence from that community that the two branches differ in their propensity to occur under the given local conditions. Note that the expected value of $\bar{P}$ is 0 if all species have the same probability of occurrence, and is nonzero if species from one branch have a different probability of occurrence than species of the other branch, regardless of whether the two sides have different numbers of descendant species.
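The matrix product can be sketched in a few lines of Python. The coding values below are those of the root node of a hypothetical tree ((A,B),C),D with the (A,B,C) branch given the negative sign, and the incidence rows are invented for illustration; `node_community_values` is a hypothetical helper name.

```python
def node_community_values(incidence, coding):
    """Multiply a sites-by-species incidence matrix by a
    species-by-nodes coding matrix (plain nested lists)."""
    n_nodes = len(coding[0])
    return [
        [sum(row[s] * coding[s][j] for s in range(len(row)))
         for j in range(n_nodes)]
        for row in incidence
    ]


# Species order: A, B, C, D; a single column for one focal node.
P = [[-0.25], [-0.25], [-0.5], [1.0]]

# One row per community (1 = present, 0 = absent).
I = [
    [1, 1, 1, 1],  # every species present
    [0, 0, 0, 0],  # no species present
    [0, 0, 0, 1],  # only the D branch present
    [1, 1, 1, 0],  # only the A, B, C branch present
    [1, 0, 0, 1],  # A and D present
]

P_bar = node_community_values(I, P)
for row, value in zip(I, P_bar):
    print(row, value[0])
```

The first two communities give 0, the third and fourth give the extremes 1 and -1, and the last gives 0.75, in line with the range described above.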
However, if one side of the node contains species with greater average site occupancy (i.e., occupying more sites, regardless of whether that side has more species), then, under chance alone (i.e., random site occupancy), $\bar{P}$ will tend to differ from zero (smaller or greater depending on whether the species with greater average occupancy were coded as negative or positive, respectively). However, because the standardization procedure (see section V) is based on site occupancy, this bias in $\bar{P}$ is corrected. We used simulations (not shown here) to confirm that the standardization scheme does correct for the potential bias described. Moreover, we also used simulations (not shown here) to assess the possibility of statistical bias (i.e., elevated type I error rates, in which the probability of our procedure rejecting the null hypothesis when there is no association between $\bar{P}$ and a predictor would be greater than the expected alpha; see Peres-Neto et al. 2001 for an overview). The results indicate that our entire framework (i.e., contrast, standardization, weighted regression and permutation tests) has correct type I error rates (i.e., they equal the pre-established alphas of 0.01 and 0.05).

The matrix $\bar{P}$ is a combined function of the location of the species on the phylogeny and their site occupancies. Looking across many communities, correlations between these node-community values $\bar{P}$ and the site variables of those communities reveal that the two branches emanating from the node have diverged in their response to an environmental filter (e.g., different temperature tolerances) or biogeographic event (e.g., on different sides of a historical dispersal barrier).
This procedure is equivalent to calculating a Phylogenetic Independent Contrast (PIC) at the node, by coding the states of all present species as 1 and of absent species as 0, reconstructing the “ancestral states” of the two branches, and subtracting them. Note again that while the procedure is quantitatively inspired by PIC, we do not interpret our ancestral state values as statements of whether the actual ancestor of the species occurred in the community; rather, the procedure is a quantitatively useful way of coding phylogenetic composition across sites for our purpose of evaluating differential representation of the two branches across communities.

Examples: We present a number of illustrated examples intended to demonstrate the phylogenetic coding procedure (creating the P matrix), the calculation of $\bar{P}$, and the logic behind both. Consider node III in the phylogeny depicted in figure S1. We use the names abc and d to identify the branches emanating from node III and leading to species ABC and D, respectively (see figure S1). After arbitrarily assigning the species of one of the branches (abc) negative values, the values given by the formula $p_{ij} = (0.5)^{d_{ij}}$ are (A: -0.25, B: -0.25, C: -0.5, D: 1). The $d_{ij}$ for A and B is 2, as a path from A or B to node III passes through two intermediate nodes; C passes 1 and D passes 0. $\bar{P}$ is simply the sum of the $p_{ij}$ of all species that occur in the local community and are daughters of the node in question. In case 1, all species are present and the sum is 0; in case 2, all are absent and the sum is also 0. Likewise one could use a mathematically equivalent PIC approach, by coding the states of all present species as 1 and of absent species as 0, reconstructing the states of the abc and d lineages, and then subtracting them.
This gives all species a value of 1 in the first case or 0 in the second case, so the reconstructed branch states are abc: 1 and d: 1, or abc: 0 and d: 0, and the subtraction gives a value of 0 in both cases.

Now consider four additional scenarios of occurrence and the calculations for node III: case 3) D present only, case 4) A present only, case 5) AD present, and case 6) CD present.

In case 3, one entire branch (d) emanating from node III is present in the local community but no species of the other (abc) are present, which is the maximum difference that can occur in one community. The $\bar{P}$ of all present species is simply the value for D, 1. Likewise, in the opposite situation, when only ABC are present, the sum of those scores is -1. Note that if we were reconstructing a PIC for case 3, A, B and C would get values of 0 and D would get a value of 1. The reconstructed branches would still be 0 and 1, respectively, and their subtraction (d-abc) would give us 1.

In case 4, A is the only species occurring in the site. There is information in the partial occurrence of the abc lineage, and in the absence of species B, C, and D. As A shares recent evolutionary history with B, it individually contributes less to the reconstruction of abc than does species C. The sum of all values ($\bar{P}$) is simply the value for A, -0.25. This implies that the abc branch is weakly more represented than the d branch under the conditions of the site. Using a PIC approach, we would first take the mean of the states of A (1, present) and B (0, absent), which is 0.5. Then we would take the mean of this value with the state of C (0, absent), which gives us the state of the abc lineage, 0.25. The state of the d lineage is 0 (absent). So subtracting (d-abc) gives us our value of -0.25.

In case 5, A and D are present, so the $\bar{P}$ for the community is 0.75.
This indicates that all species of the d lineage are present, but only part of the abc clade is present, so that d is more represented in the local community. Again, using PIC, we would take the average of A and B, which is 0.5, then average that with C (0, absent), to give a value of 0.25 for the abc branch. The subtraction, d-abc, gives us our value of 0.75.

In case 6, C and D are present, so the sum of the values (-0.5+1) is 0.5. Using the PIC approach, the mean of A and B is 0, which we then average with the value for C (1), to give us abc = 0.5. The value of d is 1, and after the subtraction (d-abc), we have our final value of 0.5.

III. Incorporating phylogenetic branch lengths

If branch lengths are available, one can incorporate this information by reallocating weights reflecting both topology and lengths, while preserving our original scheme in which the weights of each side of a node sum to -1 and 1, respectively. As there are many ways to measure phylogenetic diversity (Cadotte et al. 2010), there is likely no single “correct” weighting scheme to code for phylogenetic relationships. Indeed, although it is beyond the scope of this paper, we anticipate that an interesting area of future work would be exploring and modifying the phylogenetic coding to reflect different scenarios and applications of the method, to increase the statistical power of our framework or perhaps to test different aspects of community phylogenetics.

Here we propose a simple scheme to allocate weights based on trees with variable branch lengths, but where each species is the same distance from the root. The coding is based on the intuition that species that share recent evolutionary history should be individually down-weighted because they are not evolutionarily independent. Thus, we down-weight species based on how much branch length they share with other species.
We propose the formula for species i and node j, $p_{ij} = \left(l_1 + \frac{l_2}{2} + \frac{l_3}{3} + \dots + \frac{l_n}{n}\right) / B$, where $l_1$ is the length of the terminal branch of species i, $l_2$ is the length of the branch that species i shares with 1 other species, and $l_n$ is the length of the branch that species i shares with n-1 other species. The first term sums over only those branches (l values) between species i and node j. The second value, B, is the sum of all branch lengths on the one side of node j, and dividing by it normalizes the values to sum to 1. In supplemental figure 2, we work through the calculation for several nodes in an example phylogeny with variable branch lengths.

Note that, although species on long terminal branches get higher weights, this coding is consistent with the idea that long branches should be down-weighted in reconstructions, because a long branch implies more opportunity to change. In the example (figure S2) for node III, note that while the weight for C > A, B, the sum (A+B) > C. In other words, the common ancestor of A and B, located at node II, is closer to node III and has a greater weight than C. In summary, the individual species A and B are downweighted relative to C because they share evolutionary history, but their combined weight is greater because their mean is likely to be closer to the ancestor than C.

IV. The meaning of $\bar{P}$

The value of $\bar{P}$ for a single community simply tells us the representation of the two branches on either side of a node relative to each other in the local community. When compared across many communities, it can be used to correlate representation of different branches with site characteristics. So, if in the example above for node III, the ABC clade tends to occur in shallow ponds but D tends to occur in deep ponds, then $\bar{P}$ should be correlated with depth across sites.
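A small deterministic sketch in Python illustrates this expectation. The six sites, their depths and their species lists are invented for the example, the node III codes are those of the worked example in section II (A: -0.25, B: -0.25, C: -0.5, D: 1), and `pearson` is a hypothetical helper. The sketch computes only a raw correlation and does not include any null-model test.

```python
# Node III codes from the worked example (abc branch negative).
CODES = {"A": -0.25, "B": -0.25, "C": -0.5, "D": 1.0}

# Invented data: (depth, species present). The ABC clade is placed
# mostly in shallow sites and D mostly in deep sites.
SITES = [
    (0.5, ["A", "B", "C"]),
    (1.0, ["A", "C"]),
    (2.0, ["C", "D"]),
    (5.0, ["A", "D"]),
    (8.0, ["D"]),
    (12.0, ["D"]),
]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


depths = [depth for depth, _ in SITES]
# Node-community value for each site: sum of codes of the species present.
p_bar = [sum(CODES[sp] for sp in present) for _, present in SITES]
r = pearson(depths, p_bar)
print(p_bar, round(r, 3))
```

With these invented data the correlation between depth and the node III values is strongly positive, because the positively coded d branch dominates the deeper sites.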
This approach is potentially susceptible to artifacts due to species having different total occupancies, or other similar effects. This is why a simple correlation must be compared with appropriate null models to assess significance, which are described in the main text.

V. Standardization procedure of the matrix P of phylogenetic contrast and matrix E of environment

Matrix P (species x nodes) was standardized as follows:

$$P_{std} = \left[ P - 1_k 1_k^T W_k P \,(1/\mathrm{trace}(W_k)) \right] \oslash \left\{ \left[ 1_k 1_k^T W_k (P \circ P) - \left( (1_k^T W_k P) \circ (1_k^T W_k P) \right)(1/\mathrm{trace}(W_k)) \right] (1/(\mathrm{trace}(W_k) - 1)) \right\}^{\circ 0.5}$$

where $P_{std}$ is the matrix of standardized node contrasts, $1_k$ is a (k x 1) column vector of ones, k is the number of species, T denotes matrix transpose, and $W_k$ is a (k x k) diagonal matrix with elements equal to the sum of the columns of the incidence matrix I, where the first non-zero element of $W_k$ (i.e., $[W_k(1,1)]$) is the sum of the occurrences across all sites of species 1, and so on. Thus, trace($W_k$) equals the total sum of all species’ occurrences. ($\circ$) denotes the Hadamard multiplication (i.e., element-wise product), ($\oslash$) denotes the Hadamard division (i.e., element-wise division) and $^{\circ 0.5}$ denotes the element-wise square root. The second term inside the braces is a row vector (one value per node) and is understood to be subtracted from every row of the first term.

We standardized the environmental matrix E as follows:

$$E_{std} = \left[ E - 1_n 1_n^T W_n E \,(1/\mathrm{trace}(W_n)) \right] \oslash \left\{ \left[ 1_n 1_n^T W_n (E \circ E) - \left( (1_n^T W_n E) \circ (1_n^T W_n E) \right)(1/\mathrm{trace}(W_n)) \right] (1/(\mathrm{trace}(W_n) - 1)) \right\}^{\circ 0.5}$$

where $E_{std}$ is the matrix of standardized environmental predictors. We also standardize the biogeographic matrix in the same way by replacing E by B. $1_n$ is a (n x 1) column vector of ones, n is the number of sites (or patches), and $W_n$ is a (n x n) diagonal matrix with elements equal to the sum of the rows of the incidence matrix I, where the first non-zero element of $W_n$ (i.e., $[W_n(1,1)]$) is the sum of the occurrences across all species for patch 1, and so on. Thus, trace($W_n$) also equals the total sum of all species’ occurrences.
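A minimal Python sketch of the same computation, done one column at a time, may help in reading the formulae. It is one reading of the equations rather than the code used for the analyses; the function name `weighted_standardize` and the example weights are assumptions for illustration, and columns with zero weighted variance are not handled.

```python
def weighted_standardize(X, weights):
    """Standardize each column of X (nested lists) by its weighted mean
    and weighted standard deviation.

    `weights` holds the diagonal of W: species occupancies when X is P,
    or site richness when X is E. Its sum plays the role of trace(W).
    """
    total = sum(weights)  # trace(W)
    out = [row[:] for row in X]
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        s1 = sum(w * x for w, x in zip(weights, column))      # 1^T W x
        s2 = sum(w * x * x for w, x in zip(weights, column))  # 1^T W (x o x)
        mean = s1 / total
        variance = (s2 - s1 * s1 / total) / (total - 1)
        sd = variance ** 0.5
        for i in range(len(X)):
            out[i][j] = (X[i][j] - mean) / sd
    return out


# Example: one node column for four species, with invented occupancies.
P = [[-0.25], [-0.25], [-0.5], [1.0]]
occupancy = [3, 2, 4, 1]  # number of sites occupied by each species
P_std = weighted_standardize(P, occupancy)
print(P_std)
```

After this step the weighted mean of each column is 0 and its weighted variance (with trace(W) - 1 in the denominator) is 1. Setting every weight to one gives the usual unweighted standardization.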
The complexity of both formulae is due to the fact that they standardize all nodes across all species and all environmental variables across all sites at once; though complex, they simply standardize each node or environmental variable by its weighted average and variance, where weights are based on the number of sites occupied by each species ($P_{std}$) and the number of species in each site ($E_{std}$). Note, however, that our method is flexible enough to incorporate other weighting schemes. These include traditional standardization (i.e., mean = 0 and variance = 1), where all diagonal values in $W_k$ and $W_n$ are set to one, and schemes that give higher weights to species with intermediate occupancy and sites with intermediate richness, which should carry the most information (see Peres-Neto et al. 2006), where diagonal values in $W_k$ and $W_n$ are set equal to the variance of species columns and the variance of site rows, respectively.

Figure S1: A graphical depiction of the phylogenetic coding scheme, and calculation of the $\bar{P}$ statistic. In each case we calculate $\bar{P}$ for node III in two ways, the method inspired by PIC and the matrix-based method. P is the phylogenetic coding and I is the row from the incidence matrix corresponding to that site.

Figure S2: An example of a phylogeny with variable branch lengths and an adjusted P matrix accounting for those branch lengths. The basic algorithm is presented and an example is calculated for node III.