CICI: Secure and Resilient Architecture: Building

1. Introduction:
Researchers in many disciplines depend on data sets from a variety of sources, including sensors,
instruments, regulated data sets, and/or other collaborators. Edge infrastructure supplied for research
staff is inconsistent across data types and is often a duplicated effort among IT groups across multiple
support units. Support for researchers at the edge can lack compliance understanding and leave data
vulnerable to exploitation. While the Science DMZ model has proven very effective for moving large data
sets, its reach has been limited to data-intensive areas. Opportunities to expand the network to support
any research activity are already underway. This proposal seeks to establish host, network, identity, and
storage security baselines around the research networking architecture.
This project will demonstrate, through researcher engagement and persistent monitoring, how to scale
the Science DMZ around data compliance standards. While current Research Data Categorization
typically treats research data as restricted, specific federal standards or governance requirements apply
depending on the regulated data source, such as NIST 800-53, NIST 800-171, FISMA, and NIH dbGaP.
These Research Data Compliance levels are inconsistently applied at the edge where researchers connect
and complete their investigations.
This project is a collaborative effort between the Applied Research Laboratory (ARL), ITS
Telecommunications and Networking Services (TNS), ITS Services and Solutions (SAS), the Institute for
CyberScience (ICS), the Office of Information Security (OIS), and the Office of the Vice-President for
Research (VPR).
2.0 Rationale for a Secure Data Architecture for scaling the Science DMZ to accommodate any research
connection to Advanced CyberInfrastructure (ACI)
While local researcher support is sufficient overall, some researchers require more resources than
are available locally and others fend for themselves. Inconsistent performance, uneven security controls,
insufficient storage, and restrictive local policies are too common. Efforts by many IT groups to solve
these problems are often duplicated and waste resources. The difficulty of meeting specific guidelines,
requirements, or compliance levels threatens eligibility for future research funding.
2.1 The Science DMZ:
Traditional HPC has served compute and storage but has lacked consistent network connectivity at
campus and national scale. The new ICS ACI provides high-quality, advanced computing and storage for
researchers. Penn State's 100G Science DMZ was planned, built, and is operated to facilitate
high-performance research data transfers to and from ACI. Twelve campus locations are connected via
two 10 Gb/s fiber interconnects back to ACI in what will soon be two different data centers. This has
worked very well for the use cases so far, but it does not scale to other researchers needing similar
connectivity. New connectivity options, based on research requirements, have been designed and are
being trialed to build the “Science DMZ as a Service”; these are defined in a later section. This proposal
seeks to build a security model around these new connectivity types. It also seeks funding for a Packet
Broker Router, a telemetry device that will receive mirrored traffic from the Science DMZ interconnects
and can redistribute all traffic, or a filtered subset, to multiple security tools.
2.2 Edge Firewalls:
Penn State operates a Bro intrusion detection system (IDS) cluster to monitor border traffic and deploys
border Access Control Lists (ACLs) to mitigate common security threats or known offenders. The
University does not deploy border firewalls and operates an open perimeter network. Colleges, institutes,
and departments must each deploy a firewall solution where their LAN uplinks to the University Enterprise
Network (UEN). TNS offers a managed edge firewall service, which can be purchased; otherwise, a
locally managed firewall solution can be installed and operated. IT groups that run their own firewall
solution have varying funding and staff resources, resulting in a range of quality. Across the enterprise,
this does not guarantee consistent, secure, or high-performance connectivity for researchers behind
these firewalls. perfSONAR nodes have been set up just outside and inside some of these firewalls. Test
results have been as poor as 37% of capacity with 65% packet loss, while other locations operate at
acceptable rates.
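To make these comparisons concrete, the following is a minimal sketch of an ad hoc throughput check
using iperf3, the same class of test perfSONAR schedules between measurement nodes. The host name is
a placeholder, and an iperf3 server is assumed to be listening on the far side of the firewall under
evaluation.

    # Minimal throughput check with iperf3 (placeholder host; assumes an
    # iperf3 server is running on the far side of the firewall under test).
    import json
    import subprocess

    def measured_throughput(server: str, seconds: int = 10) -> float:
        """Run one iperf3 test and return the received throughput in Gb/s."""
        out = subprocess.run(
            ["iperf3", "-c", server, "-t", str(seconds), "--json"],
            capture_output=True, text=True, check=True,
        ).stdout
        return json.loads(out)["end"]["sum_received"]["bits_per_second"] / 1e9

    if __name__ == "__main__":
        gbps = measured_throughput("perfsonar-inside.example.psu.edu")
        print(f"{gbps:.2f} Gb/s measured ({gbps / 10:.0%} of a 10 Gb/s uplink)")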
This proposal outlines plans to scale out Science DMZ connectivity options. Private VLANs over Virtual
Private LAN Service (VPLS) 10 Gb/s connections can be provided for any research data type or group.
This option removes edge firewall bottlenecks and provides a consistent platform on which to implement
security. This proposal seeks to investigate security models around VPLS connections, private or
RFC 1918 subnetting, HTTP(S) proxy services, use of Globus Personal endpoints, and Globus Connect
Data Transfer Nodes (DTN).
2.3 Storage:
Research requires storage. Some researchers store data on storage attached to traditional HPC, while
others use commodity storage pools requested on demand. Even with access to storage in place, some
have experienced inconsistent performance and access, as noted in the previous section on firewall
inconsistency. This has frustrated researchers, as transfer speeds are slow and dropped connections
leave files open. Local department storage may be available but is usually not enough, and some buy an
array of hard drives as a desktop solution.
While presenting the Science DMZ concepts on campus, we have heard of researchers carrying or
bicycling hard drives and USB memory sticks across campus because of nested firewalls and multiple
layers of NAT. We believe the connectivity options referenced above will reduce latency, lost data, and
locked files when connecting to the new ACI storage options. This proposal seeks to fund security
monitoring for data transfers to and from ACI storage, whether originating within the Science DMZ or
outside it.
2.4 Hosts or Servers connecting through the Science DMZ to ACI:
As mentioned previously with firewalls, research desktop hosts and servers that are not centrally
managed are inconsistently operated, and effort is duplicated maintaining similar hardware and software
across the university. Some researchers can opt out of local support and maintain devices themselves.
Others have figured out how to trick the vulnerability scanners into skipping their machines. Leased
workstations connected to instruments can be contractually obligated to use public IP addresses or very
open firewall rule sets.
Host security represents a large opportunity for a consistent approach to a secure data architecture
model. This proposal seeks to investigate host minimum security baselines, as well as advanced
configurations for more sensitive data. Depending on data categorization and researcher requirements, a
separate host with a separate connection to ACI may be requested; the new connectivity models will
have this capability. Additionally, client-side host security applications may be needed. This proposal
would like to establish a table of researcher data types with corresponding host security requirements,
as sketched below. These host requirements are built around the researcher's data, to keep it safe and to
protect eligibility for future funding. If the data types are extended to cover research data, host security is
no longer framed solely around academic or administrative models. This could keep some researchers
from opting out of policy, which ultimately leaves their data further exposed. A researcher could be told
what they can do instead of what they cannot. The more inconsistently the hosts are managed, the less
time people have to keep researcher data secure.
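As a purely illustrative example of the proposed table, the sketch below encodes a hypothetical mapping
from the data categories defined in section 3.1.1 to host security requirements. The specific controls and
field names are assumptions for illustration, not an approved baseline.

    # Hypothetical mapping of research data categories (see section 3.1.1) to
    # host security requirements; the controls listed are illustrative only.
    RESEARCH_DATA_HOST_REQUIREMENTS = {
        "public": {
            "minimum_security_baseline": True,
            "full_disk_encryption": False,
            "dedicated_aci_connection": False,
        },
        "internal_controlled": {
            "minimum_security_baseline": True,
            "full_disk_encryption": True,
            "dedicated_aci_connection": False,
        },
        "restricted": {
            "minimum_security_baseline": True,
            "full_disk_encryption": True,
            "dedicated_aci_connection": True,
            "access_logging": True,
        },
        "regulated": {  # proposed fourth category (see section 3.1.1)
            "minimum_security_baseline": True,
            "full_disk_encryption": True,
            "dedicated_aci_connection": True,
            "access_logging": True,
            "governing_standard": "per data use agreement (e.g., NIST 800-171)",
        },
    }

    def host_requirements(data_type: str) -> dict:
        """Look up the controls a host must satisfy before on-boarding."""
        return RESEARCH_DATA_HOST_REQUIREMENTS[data_type.lower()]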
2.5 Identity and Access Management
Identity and Access Management (IAM) in a collaborative environment can be difficult to manage. Web
access with Cosign has provided Single Sign-On (SSO) for a number of resources. Kerberos, RADIUS,
and LDAP are additional services available. Federated authentication is provided through InCommon.
Multi-factor authentication (MFA) and two-factor authentication (2FA) have recently been implemented
across the university and the new ACI infrastructure. How are shipped hard drives, USB sticks, and
downloaded research data verified? Was an MD5 checksum included and processed to validate data
integrity?
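The sketch below shows one way such a check could be scripted, assuming a simple manifest of
"<digest>  <filename>" pairs accompanies the shipment; the manifest format is an assumption for
illustration, and SHA-256 is shown with MD5 still accepted for legacy manifests.

    # Validate a delivered data set against a published checksum manifest
    # before it is released onto the Research Network. Manifest format is an
    # assumed "<digest>  <filename>" per line, purely for illustration.
    import hashlib
    from pathlib import Path

    def file_digest(path: Path, algorithm: str = "sha256") -> str:
        """Stream the file so multi-terabyte data sets do not exhaust memory."""
        h = hashlib.new(algorithm)
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_manifest(manifest: Path, algorithm: str = "sha256") -> bool:
        """Return True only if every listed file matches its recorded digest."""
        ok = True
        for line in manifest.read_text().splitlines():
            expected, name = line.split(maxsplit=1)
            if file_digest(manifest.parent / name, algorithm) != expected:
                print(f"MISMATCH: {name}")
                ok = False
        return ok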
This proposal will work with InCommon federated authentication and 2FA with Globus when available.
Also, as the Internet2 TIER program matures, there may be integration opportunities with that IAM
stack.
3.0 Research Science DMZ On-boarding
Part of the proposed plan will contain a section for on-boarding researchers to the network and ACI. Penn
State's institutional data is a mission-critical asset. Policies and protections on the network have been
crafted around researcher data types and requirements to safeguard this data. As technology has
evolved, larger data sets and the need for high-speed, low-latency research networks have become a
business requirement for Penn State to remain competitive in the research space. The intent of the new
100 Gb/s research network is to maximize the bandwidth available to researchers who need high-speed
connections in support of their research activities. TNS, ICS, and OIS have worked together to provide
1/10G networking at the edge for researchers. The new network, based on a Science DMZ design, will be
referred to as the “Research Network”. TNS will maintain the network, OIS will provide oversight, and ICS
will provide compute and storage. This section will serve as the basis for a memorandum of
understanding (MOU) and Operating Level Agreement (OLA).
3.1 Researcher Engagement
The principal investigator (PI) is responsible for requesting authorization to use the Research Network. A
web form will be available for this purpose and will be integrated with a Service Management Workflow. If
the PI will not be the primary person managing the endpoint, then a designated person whom the PI
manages will be named on the request authorization form. The form will be reviewed and approved by
OIS to ensure compliance with data, host, and access requirements. The review process will normally
take a maximum of one work week, but expedited handling can be requested via the form for
time-sensitive research needs.
The PI will certify that they accept responsibility for other users in their group who will have access to
endpoints on the Research Network. The PI will be responsible for making users aware of the Shared
Governance of Research Computing and Cyberinfrastructure (RCCI) groups and for reviewing policies
and expectations arising from the RCCI groups' recommendations. There is an Advisory Council, an
Executive Committee, and a Senior Advisor (aka the Research Guru). The Guru serves on the EC, the
AC, and each of the AC's working groups: Data Centers, Data Governance, High Performance
Computing, IT/HR Job Classification/Compensation, Research Network (Science DMZ) and Data
Classification Policies, and Software.
The PI certifies that the endpoint adheres, and will continue to adhere, to the Minimum Security
Baseline (MSB) while connected to the Science DMZ. Because of the high-speed nature of the network,
portions of the MSB related to network firewalls are exempted.
The PI is responsible for the acquisition and maintenance of the hardware/software associated with the
endpoint connected to the Science DMZ.
The PI will notify TNS, ICS, and OIS when the endpoint is removed or decommissioned from the
Research Network via an online form.
3.1.1 Data Types
Three categories of sensitivity shall exist with regard to data used within the University. These are Public,
Internal/Controlled, and Restricted. The following definitions apply:
Public: Public data are intended for distribution to the general public, both internal and external to the
University. The release of the data would cause no or minimal damage to the institution.
Internal/Controlled: Internal/controlled data are intended for distribution within the University only,
generally to defined subsets of the user population. The release of the data has the potential to create
moderate damage to the institution. (Such damage may be legal, academic [loss or alteration of
intellectual property], financial, or intangible [loss of reputation].)
Restricted: Restricted data are those which the University has legal, regulatory, policy, or contractual
obligations to protect. Access to restricted data must be strictly and individually controlled and logged.
The release of such data has the potential to create major damage to the institution. (Such damage may
be legal, academic [loss or alteration of intellectual property], financial, or intangible [loss of reputation].)
This proposal will investigate a possible fourth data type: regulated data. Data generated outside
Penn State at a federal institution or under another governing body could carry additional infrastructure
controls, mandated policies, and/or specific conditions in data use agreements. The TNS, ICS, OIS, and
VPR offices must address these documents and create matching infrastructure.
3.1.2 The Governance, Risk, and Compliance (GRC) tool, also known as Modulo Risk Manager,
automates GRC processes to manage risk. The tool integrates different areas and activities to allow for
centralized reporting related to risk management, compliance with laws and standards, and enterprise
risk. GRC Support provides help in using the tool as well as in building risk prevention programs such as
audits, legal activities, internal and external controls, and compliance with health and safety requirements.
3.1.3 Transfer Data to/from Research Network
Globus Online will be the primary and preferred method of transferring data between Science DMZ
endpoints and any other device on or off the Science DMZ. SSH, GridFTP, SFTP, and rsync may also be
used to transfer data to and from the Science DMZ.
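As a sketch of how a scripted Globus transfer between a researcher endpoint and an ACI DTN might
look, the snippet below uses the public globus-sdk Python package. The client ID, endpoint UUIDs, and
paths are placeholders, and in practice tokens would be managed by the on-boarding tooling rather than
an interactive prompt.

    # Minimal scripted Globus transfer using globus-sdk. All identifiers and
    # paths are placeholders; token handling is simplified for illustration.
    import globus_sdk

    CLIENT_ID = "00000000-0000-0000-0000-000000000000"   # hypothetical native app
    SRC_ENDPOINT = "researcher-lab-endpoint-uuid"         # placeholder
    DST_ENDPOINT = "ics-aci-dtn-endpoint-uuid"            # placeholder

    # Interactive native-app login; 2FA/InCommon is enforced by Globus Auth.
    auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    auth.oauth2_start_flow()
    print("Log in at:", auth.oauth2_get_authorize_url())
    tokens = auth.oauth2_exchange_code_for_tokens(input("Auth code: ").strip())
    transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
    )

    # Describe the transfer; checksum-based sync also verifies data integrity.
    task = globus_sdk.TransferData(
        tc, SRC_ENDPOINT, DST_ENDPOINT,
        label="Lab instrument to ACI storage",
        sync_level="checksum",
    )
    task.add_item("/instrument/run42/", "/storage/group/run42/", recursive=True)
    print("Submitted task:", tc.submit_transfer(task)["task_id"])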
Other means of transferring data to or from the Science DMZ will require submission to, and approval
from, the ICS, OIS, and TNS units.
3.1.4 Remote Connectivity
If remote access to the Science DMZ is needed from a unit-level system, access must be requested on
the initial request form, with the specific network IP addresses that need to access the Science DMZ
endpoint(s) for which the researcher is responsible. Approval will also be necessary from the unit’s
Network Contact if unit firewall rules must be modified to facilitate connections. On the Science DMZ,
router ACLs will be set up based on the same information.
Wireless access to the Science DMZ is generally not allowed. Exceptions must be requested by the PI
and will normally be facilitated by a VPN specific to the Science DMZ or a dedicated host within ICS-ACI.
3.2 Responsibilities by Group
3.2.1 Responsibilities of Telecommunications and Networking Services (TNS)
TNS will ensure that the IP addresses associated with the Science DMZ are segregated, via either router
ACLs or firewall rules, from the main academic and administrative networks of the University. The
exception would be devices for which Remote Access is authorized one-way from the unit to the Science
DMZ, to allow researchers to access their servers from their work environment. TNS will also maintain
current and future connectivity options, fulfill connectivity service for the Science DMZ and access to
ACI, and operate perfSONAR to maintain optimal performance. TNS will maintain a web form with the
Service Management Office that will handle the on-boarding and removal of devices from the network.
Exceptions to the MSB or unique cases will be reviewed by the RCCI RNWG. TNS will troubleshoot
suspected network issues in collaboration with the PI or a designated contact. Capacity planning will
cover current headroom and future upgrades.
3.2.2 Responsibilities of Office of Information Security (OIS)
OIS will analyze network data and traffic to ensure compliance with the MSB and this documentation.
3.2.3 Responsibilities of the Network Contact
The Network Contact is responsible for working with the researcher to ensure that local policies and
guidelines are being followed. The researcher and Network Contact will need to negotiate support
conditions once equipment is moved onto the Research Network.
3.2.4 Responsibilities of the Senior Advisor (Research GURU)
Because the Research GURU participates in all of the RCCI councils, committees, and working groups,
any technical, compliance, or security requirement will be communicated across them.
3.2.5 Responsibilities of ICS
ICS will operate ACI systems and storage. ICS will provide feedback on connectivity options and notify
TNS of any required upgrades.
3.3.0 Security, Performance, and Compliance Monitoring
The basic premises of the Science DMZ for access to ACI are as follows:
1. Deny all traffic by default, then build ACLs around researcher requirements.
2. MAC lockdown per port can be applied when needed.
3. Private VLANs can be applied if requested.
4. Vulnerability scanning will be established.
5. Prefer private IP addresses for hosts, endpoints, instruments, and workstations: public IPv6, then
RFC 1918 private IPv4, to limit public access to workstations and servers.
6. Prefer Globus DTNs for public IP addressing and data transfer.
3.3.1 The proposal seeks to implement the following on the network to monitor security:
1. sFlow data will be sampled from all switches, all ports.
2. OIS Bro deep packet inspection is performed on all data that crosses the PSU border.
3. A packet broker router will collect traffic on 100G Layer 3 ports, 10G Layer 2 uplinks, and 100G VPLS
connections and forward it to Broala appliances for deep packet inspection and/or any other
existing security tools.
4. Science DMZ router ACLs will syslog an event for every denied packet.
3.3.2 The following tools will be available to address problematic behavior or hardware:
1. Deactivation of a port to prevent further access to the Science DMZ
2. Border ACLs that prevent access to resources beyond the Penn State border
3. Science DMZ ACLs
3.3.3 Development, changes, and growth of the network will be handled by the RCCI Research
Networking and Data Compliance Working Group.
3.4 Compliance
OIS and the Research GURU will be responsible for security oversight and for notification of violations of
this document or of a compromise. Violation notifications will be made to the PI and the Network Contacts.
The PI is responsible for providing access to the endpoint and credentials for logging into the endpoint in
the event of a violation or compromise.
The Network Contacts are responsible for completing the violation or compromise protocols before the
endpoint is permitted back on the Science DMZ.
4.0 Secure Data Architecture components
In section 2, lessons learned and opportunities from the Science DMZ implementation were presented. In
section 3, researcher engagement illustrated the need to scale and handle more data types. In this
section, we build on the two previous sections by describing the services and tools requested.
4.1 Science DMZ Connectivity
The current implementation of the Penn State Science DMZ Research Network consists of a central
network core and edge (Brocade VDX 6740) switches, which were paid for by the National Science
Foundation CC-NIE program (NSF 12-541). To ascertain the extent of the data movement problem,
network research flows were monitored on the existing network and locations were identified where the
largest data movements were occurring. Edge switches were added to those buildings in an effort to
address the lion’s share of the research data movement. This had the inherent effect of making utilization
of the Science DMZ contingent upon the location of the researcher.
With a desire to remedy the location dependence of the existing Science DMZ, we have developed the
following plan to scale out the network and make it more accessible to all researchers with large data
sets, faster visualization, or data compliance requirements.
4.1.1 Premium Switch (Available Now): At the premium level, a researcher, department, or College could
purchase additional Brocade VDX 6740 switches to expand the network out to their “big data” location.
This option is the most similar to an edge connection of the existing Science DMZ. A group considering
this should meet with our Engagement and Implementation teams to discuss the specifications for the
device and any physical or geographic limitations. This 20 Gb/s option provides two (2), 10 Gb/s
connections to the Science DMZ Core in each data center and up to 48, 10 Gb/s or 1 Gb/s connections to
computers and equipment (these can be “mixed and matched”). This option has the same advanced
switching capability that the original Science DMZ locations have.
4.1.2 Data Center (Available Now): Researchers with equipment already in a Data Center (either the
Computer Building Data Center or the forthcoming Data Center on Tower Road) are encouraged to
connect to the Penn State Science DMZ Research Network via the DMZ aggregation switches in those
Data Centers at 10 Gb/s. This will also provide that researcher with direct connections to ICS-ACI
compute clusters and resources located in those Data Centers. Provisions can be made for those
connections to comply with different levels of Federal and/or granting agency requirements. This option
also includes the above mentioned advanced switching capability.
4.1.3 Ethernet Fabric VPLS (Available Fall 2016: testing complete, in trials): Another high-speed option
consists of a 10 Gb/s Ethernet Fabric Switch. This option provides one (1), 10 Gb/s connection from the
switch to the Science DMZ, an additional 10 Gb/s fiber edge port, and either 24 or 48 1 Gb/s connections
to individual research workstations or instruments. This option will provide faster access to other points on
the Science DMZ including the ACI equipment and reduce network congestion on a department or
College’s firewall and local area network (LAN). The existing building wiring plant should suffice to allow
for 1 Gb/s connections over Category 6/5e, copper Ethernet connections and wall jacks. We are
investigating the design of a Federal/granting agency compliant solution on a switch-by-switch basis.
Again, this option should be coordinated with our Engagement and Implementation teams to assure
seamless integration into the Science DMZ.
4.1.4 Compliance Port (Proof of Concept): At the base level of connectivity, we can provision an individual
“research or compliance port” on an existing ITS managed, converged network switch. Using the
capabilities of these switches, a wall jack network connection can be “virtualized” as a connection on the
Research Network. This will be the least expensive solution. It is unclear at audit level whether the virtual
port can be made Federal/granting agency compliant. Further investigation is needed. As with the above
solution, this will provide a single, 1 Gb/s connection to the Science DMZ.
4.2 sFlow
sFlow is an industry-standard packet sampling technology built into most switch vendors' hardware.
Through sampling, sFlow is extremely scalable, even at 100 Gb/s. The samples can be exported, along
with interface counters, to a collector for further analysis. This is intriguing for a number of reasons. If
each interface in the Science DMZ exports packet samples, counters, and hardware information, the
interfaces act as sensors for the network. On most security platforms, you must normally select a
collection point and either mirror or tap that traffic back to an analyzer. If the uplink ports are mirrored to a
security collector for full packet analysis and all ports are sampled with sFlow, sFlow provides visibility
everywhere. sFlow has an additional feature: it is near real-time. As packets flow through devices at
thousands per second, the real-time data allows you to create a security signature, set a threshold,
trigger an event, and act on a control tied to that signature. This is how sFlow-based DDoS mitigation
tools work.
4.2.1 InMon Traffic Sentinel
As networks converge and operate more seamlessly, visibility into the switch layer is key to successfully
merging network research resources. Moving from the Science DMZ idea to “Science DMZ as a
Service” will require scalable monitoring of a multi-tenant infrastructure. This one tool provides visibility
into routers, switches, data center fabrics, Linux/Windows servers (with the host-sflow agent), storage
networks, virtual machines, virtual switching, applications, and Docker containers to create a single view
of the entire Science DMZ. This system will collect sFlow from every port within the Science DMZ and
from any systems or applications forwarding metrics to it.
4.2.2 sFlow-RT SDN controller
This sFlow analytics engine sits inline with the sFlow collector mentioned above. sFlow-RT can match on
Layer 2 through Layer 7 flows in real time. The way you would query a database for certain fields and
filter out what you do or do not want is the same way you can build, match, and threshold a flow with
sFlow-RT. It has been used to create real-time dashboards for monitoring network traffic, thresholds,
CPUs, countries of origin, etc. sFlow-RT's SDN controller function has been used to mitigate DDoS
attacks, steer flows when other links are congested, forward flows (as virtual TAPs) to security devices,
and mark flows with a higher priority to guarantee bandwidth.
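A minimal sketch of driving sFlow-RT's REST API from Python to define a flow signature, attach a
threshold, and poll for threshold events is shown below. The collector host, flow name, and threshold
value are assumptions for illustration; a real controller would translate the events into ACL or SDN
actions as described above.

    # Define a large-UDP-flow signature and threshold in sFlow-RT, then poll
    # for events. Host/port, flow name, and threshold value are placeholders.
    import requests

    RT = "http://sflow-rt.example.psu.edu:8008"   # hypothetical collector

    # Flow keyed on source/destination IP, counting UDP bytes.
    flow = {"keys": "ipsource,ipdestination", "value": "bytes",
            "filter": "ipprotocol=17"}
    requests.put(f"{RT}/flow/udp_flood/json", json=flow).raise_for_status()

    # Fire an event when any single flow exceeds roughly 1 Gb/s (bytes/second).
    threshold = {"metric": "udp_flood", "value": 125_000_000}
    requests.put(f"{RT}/threshold/udp_flood/json", json=threshold).raise_for_status()

    # Poll for threshold events; a controller would act on each one (drop,
    # redirect, or mirror the offending flow to a security tool).
    events = requests.get(f"{RT}/events/json", params={"maxEvents": 10}).json()
    for ev in events:
        print(ev.get("agent"), ev.get("metric"), ev.get("flowKey"))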
4.3 Router Security
The Science DMZ routers have Access Control Lists (ACLs) applied to deny all traffic into the Science
DMZ by default. Every denied packet creates a syslog event that is forwarded to the Security office for
aggregation and review. ACLs have been opened to Penn State-wide resources like DNS, NTP, proxy
services, etc. After engaging the researchers about their data security and risk, additional ACLs can be
customized and opened based on the researcher's requirements. This means that a researcher's server
in the data center can have one ACL built and their desktop in their office can have another. This ACL
setup works for now, but it is not going to scale forever. We would like to design a more efficient ACL
workflow for the future using SDN tools. Also, we have requested a BCP 38 feature set for our current
routers.
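The sketch below illustrates what a more automated workflow could look like, generating extended-ACL
entries directly from a structured on-boarding request. The request fields and the Cisco/Brocade-style
entry format are illustrative assumptions rather than the deployed schema.

    # Generate per-researcher ACL entries from an on-boarding request.
    # Field names and the extended-ACL text format are illustrative only.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DmzRequest:
        pi: str
        endpoint_ip: str                 # host on the Science DMZ
        allowed_sources: List[str]       # unit-level hosts granted remote access
        tcp_ports: List[int] = field(default_factory=lambda: [22, 2811, 443])

    def acl_entries(req: DmzRequest) -> List[str]:
        """Build deny-by-default entries opening only what the PI requested."""
        rules = [f"remark --- {req.pi} ---"]
        for src in req.allowed_sources:
            for port in req.tcp_ports:
                rules.append(
                    f"permit tcp host {src} host {req.endpoint_ip} eq {port}"
                )
        rules.append(f"deny ip any host {req.endpoint_ip} log")  # syslog denies
        return rules

    if __name__ == "__main__":
        req = DmzRequest(pi="Dr. Example", endpoint_ip="10.20.30.40",
                         allowed_sources=["128.118.0.10"])
        print("\n".join(acl_entries(req)))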
4.4 Syslog with Elastic Search, Logstash, and Kibana (ELK)
OIS and ICS currently operate logging services for analysis, which can receive server, host, and network
data. Syslog servers will collect the router ACL accounting data in addition to logs from the Bro cluster
and the proposed Broala appliances.
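As a sketch of how an ACL-deny syslog record could become a structured document for review in
Kibana, the snippet below parses an assumed deny format and posts it to Elasticsearch over its REST
API. The log format, index name, and host are placeholders, and in production Logstash would normally
own this parsing.

    # Parse an ACL-deny syslog line and index it into Elasticsearch.
    # Log format, index name, and host are assumptions for illustration.
    import re
    import requests

    ES_URL = "http://elk.example.psu.edu:9200/scidmz-acl-denies/_doc"  # placeholder

    DENY_RE = re.compile(
        r"denied (?P<proto>\w+) (?P<src>[\d.]+)\((?P<sport>\d+)\) -> "
        r"(?P<dst>[\d.]+)\((?P<dport>\d+)\)"
    )

    def index_deny(syslog_line: str) -> None:
        match = DENY_RE.search(syslog_line)
        if not match:
            return               # not an ACL deny; let other filters handle it
        doc = {k: match.group(k) for k in ("proto", "src", "sport", "dst", "dport")}
        doc["raw"] = syslog_line
        requests.post(ES_URL, json=doc, timeout=5).raise_for_status()

    if __name__ == "__main__":
        index_deny(
            "Jan 10 12:00:00 dmz-rtr1 ACL: list scidmz-in denied "
            "tcp 192.0.2.10(51515) -> 10.20.30.40(23), 1 packet"
        )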
4.5 Border monitoring
The Penn State Office of Information Security currently operates a Bro cluster to monitor all border traffic.
ACLs are in place to block common security threats as well as known suspicious IPs or ports. ACL
accounting can be turned on to syslog each time an ACL is matched. This proposal will assist in the
development of SDN tools on the Science DMZ which could be adapted for our border routers.
4.6 For specific Science DMZ security and deep packet inspection monitoring, the proposed security for
the Science DMZ will introduce a Packet Broker. This will serve as a telemetry solution for distributing
Science DMZ traffic to numerous security platforms. Mirroring both of the 100 Gb/s Layer 3,
Internet-bound connections as well as the proposed 100 Gb/s Layer 2, VPLS campus connections will
provide complete, full packet capture of all traffic to and from the Science DMZ. sFlow will provide
visibility at every port, and if a signature is matched, a virtual TAP can be set up to mirror all traffic back to
the Packet Broker. By using a traditional router as a packet broker to do what more expensive
purpose-built security solutions do, you get a cost-effective solution with the benefits of SDN and
routing/switching features.
4.6.1 Brocade Visibility Manager provides an application to accept, filter, replicate, and drop traffic from
the Science DMZ SPAN ports. This would allow the OIS group to steer L3 IP, L2 VLAN, or VPLS traffic to
SIEM, forensic, IDS/IPS, NPM, IT management, and APM tools. These could include Bro, Broala,
Wireshark, Splunk, FireEye, and Palo Alto. With the sFlow queries mentioned above, OIS can build an
sFlow 'flow' definition out of packet headers. With the SPAN ports, Packet Broker, and Visibility Manager,
a similar approach can be taken to investigate further into the packets to look for a signature, executable,
or exploit.
4.7 Broala
Broala is a turnkey appliance running the Bro IDS with vendor support. By proposing an appliance rather
than more equipment for our existing cluster, we hope to investigate the build vs. buy question common to
many open source solutions. If the purchased appliance meets 80% of your requirements, and you
develop the remaining 20% against the appliance API to meet local needs, you end up with a complete
solution. Many of the features OIS would like to implement in Bro have already been developed in
Broala, including the NetControl framework for writing SDN rules for DDoS mitigation, large-flow bypass
(pass elephant flows once they are vetted), drop, redirect, shunt, whitelist, and quarantine. This can
create a closed loop for security analysis and threat mitigation.
5.0: Use Cases
5.1 Weather information has grown in volume and importance to the global community, and for more than
two decades, the NSF-sponsored UCAR/Unidata program has operated an Internet Data Distribution
(IDD) network that disseminates weather information to universities and government agencies around the
globe. Penn State plays an important role in this network as a top-level relay ingesting over 700 GB of
data per day and distributing nearly 5 TB of data per day to more than 20 external sites. Three servers in
a high-availability configuration on the Penn State Science DMZ Research Network handle the transfer of
these data which include a variety of types such as ground weather observations, satellite imagery, radar
imagery, and complex digital output from the world’s most advanced weather forecasting models. In
addition to providing these data as a service to external sites, the data are also used within Penn State for
research, classroom instruction and public service. The Penn State Electronic Map Wall (the “e-Wall”)
has become a well-known source for processed graphical weather information and is largely supported by
the UCAR/Unidata IDD. As new sources of weather information become available and downstream sites
are added to the IDD network, network demands will continue to increase, perhaps doubling in 3-5 years.
Indeed, there are already many datasets available that are not included on the IDD network due to lack of
infrastructure in either the network or at the endpoints.
5.2
HydroTerre is a research prototype platform developed at Penn State for the hydrology community. It
provides access to aggregated scientific data sets that are useful for hydrological modeling and research.
HydroTerre’s frontend is a web service, and a user query can request creation of a data bundle whose
size can vary from a few megabytes to hundreds of gigabytes. As part of this use case, we present
software tuning and optimization strategies for various hardware configurations of the HydroTerre
platform. Our goal is to minimize access time for a wide range of data bundle creation queries from
users. We use automated schemes to estimate the computational work required for various queries and
to identify the best-performing hardware/software configuration. We hope this study is instructive for
researchers developing similar data management cyberinfrastructure in other science and engineering
fields.
5.3
Penn State Health maintains electronic medical records and financial data for patients. This data contains
PHI and PII and is frequently used for secondary purposes in research. Specific uses include the
following:
1. transfer of data within and external to Penn State for integration with other data sources such as
geospatial, genomic, social, and other variables
2. transfer of data to biostatisticians and transcribers
3. transfer of data for large scale data networks across hubs
4. controlled use of data such as from CMS and other sources
5. use of patient data for computer scientists, machine learning, and predictive analytics
5.4 BioBehavioral Health receives a large amount of data from Add Health. Most of the requirements are
based on the host and not the network. The data itself is typically restricted (both non-PHI and PHI).
Recently, we have started receiving data from the Dept of Public Welfare. These data sets require much
tighter security, as the data contains identifiers that relate to minors. For now we do not allow these
computers on the network. We require BIOS passwords, full disk encryption, disabled alternate boot
devices, complex Windows passwords, and prevention of the use of USB drives.
6.0 Research aspect
While much of this may be viewed as an operational model, we view the Science DMZ as an opportunity
to research and innovate techniques in technology, security, and research engagement. The Applied
Research Laboratory Cyber, Data and Image Sciences Division was involved to review and provide
feedback on documentation of Science DMZ router, switch, host, server, storage, and application
security; the security tools and devices used; and the SDN applications and controls put in place.
6.1 Conclusion:
The proposed plan will design, implement, and operate a secure data architecture model to provide
enterprise-wide connectivity to Advanced CyberInfrastructure through Penn State's Science DMZ as a
Service. The proposal will investigate a consistent, higher-level baseline (NIST 800-53 moderate) for
securing research data, connectivity, and workflow. Scalable Science DMZ connectivity models, with
additional monitoring from sFlow on every port and Bro on all uplinks, interconnects, and virtual TAPs,
will provide continuous network control of traffic to reduce threats, exploits, DDoS, and other unwanted
incidents. Isolating traffic by data type, regulation, or lab can now be implemented. Creating security
zones for sensitive projects or contractual restrictions will not be a problem, whether physically or
virtually. Science DMZ On-boarding, Network Requirements, and Security and Data Policies are built on
faculty and IT working groups. The Research GURU position is poised to facilitate Penn State resources
collectively, fix problems, and join forces while dealing with data compliance, data use agreements, etc.
ICS-ACI is building out innovative compute and storage infrastructure to accelerate discovery. The
Science DMZ has speeds not expected a few years ago and can now connect research data among any
of the commonwealth campuses and the medical center. This proposal seeks to build an innovative,
consistent, managed, university-wide approach to securing the Science DMZ and access to Advanced
CyberInfrastructure.