ATLAS FroNTier cache consistency stress testing
David Front, Weizmann Institute
September 2009

Outline
• Goals
• ??
• Parameters to test

Handling cache consistency
[Architecture diagram: the client machine runs Athena/COOL/CORAL with the FroNTier client and a local squid; the server site runs a squid and a Tomcat FroNTier servlet, which consults table modification times to decide whether or not to invalidate cached results; the Oracle DB server memorizes the modification time of each table.]

Goals
1) Verify that the table modification time trigger does not consume too many resources (I expect this to be trivial)
2) Show that the performance of FroNTier with cache consistency matches expectations
3) Compare the performance of FroNTier to a direct Oracle connection
Testing will be done at CERN, and possibly also by sending requests from remote sites.

Testing
• To create a high load on the caching system, multiple instances of a job created by Richard Hawkings will be submitted
• Jobs will be submitted as grid jobs
• Two ATLAS FroNTier server machines that are currently being configured will be used for this testing

Testing challenge
• According to Dave Dykstra, loading a FroNTier server requires many tens of squids, since squid is a single-threaded application
• The reasonable number of squids to install on a machine is its number of cores
• Allocating dedicated servers to act as pools of squids does not seem to be the right approach
• An alternative is to have a squid available on each client machine

A squid per client machine
In order to have a squid per client machine:
– Tens of worker nodes should be pre-allocated
– As suggested by Johannes Elmsheuser, HammerCloud cannot accommodate such a setting efficiently
– Rather, local submission to the scheduler, using a dedicated job queue, may be done
– For this, the appropriate resources should be allocated (Dario Barberis?)
– In addition, a way (script) is needed to set up a squid at the beginning of a test and tear it down at the end; a minimal sketch follows this slide
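Squid setup/tear-down: illustrative sketch
A minimal sketch, assuming squid is already installed on each worker node, of what such a setup/tear-down script could look like. The binary location, scratch paths, port and frontier server name below are placeholders chosen for illustration, not the real site configuration.

```python
#!/usr/bin/env python
# Illustrative sketch only, not the actual test harness: start a throwaway
# squid at the beginning of a test and shut it down at the end.
# Paths, port and the frontier server name are assumptions made for this example.
import os
import subprocess

SQUID_BIN = "/usr/sbin/squid"          # assumed location of the squid binary
SQUID_CONF = "/tmp/stress_squid.conf"  # hypothetical per-node config file
CACHE_DIR = "/tmp/stress_squid_cache"  # hypothetical local cache directory
PID_FILE = "/tmp/stress_squid.pid"

def setup_squid(frontier_server, port=3128):
    """Write a throwaway config, initialise the cache directories and start squid."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(SQUID_CONF, "w") as conf:
        conf.write("http_port %d\n" % port)
        conf.write("cache_dir ufs %s 1000 16 256\n" % CACHE_DIR)
        conf.write("pid_filename %s\n" % PID_FILE)
        conf.write("cache_peer %s parent 8000 0 no-query\n" % frontier_server)
    subprocess.check_call([SQUID_BIN, "-f", SQUID_CONF, "-z"])  # create cache dirs
    subprocess.check_call([SQUID_BIN, "-f", SQUID_CONF])        # start the daemon

def teardown_squid():
    """Shut down the squid started by setup_squid() and remove its scratch files."""
    subprocess.check_call([SQUID_BIN, "-f", SQUID_CONF, "-k", "shutdown"])
    for path in (SQUID_CONF, PID_FILE):
        if os.path.exists(path):
            os.remove(path)
```

The testing manager would call setup_squid() on each worker node before spawning reader jobs there, and teardown_squid() once the test is over.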
Testing elements
• Squid setup and tear-down
• Spawning writer jobs, one every 15 minutes (or 30 or 60)
• Spawning reader jobs via the grid
• Monitoring

FroNTier stress testing (at CERN)
[Test setup diagram: a testing manager sets up and tears down the squids, spawns writer jobs, and spawns Athena reader jobs via the grid onto tens of pre-allocated worker nodes using a dedicated job queue; once in a while (every 15 minutes) a writer job writes fresh data to Oracle; reader jobs reach the Oracle DB server through the squids and the Tomcat FroNTier servlet; the DB server, the FroNTier machines and the squids are monitored.]

Parameters to test with

Parameter: arrival rate of reading client jobs
– I do not know what the expected rate is
– Taking into account the number of queries per Athena job (~1-3k), and comparing with the capability of CMS FroNTier to answer queries, I expect the whole system to be able to handle 2 to 300 jobs per minute (roughly 100 to 20,000 jobs per hour)
– Hence, testing may be repeated with the following rates of reader jobs per minute: 10, 100, 1000

Parameters to test with
• The rate at which ATLAS COOL tables are updated:
  – Baseline: once every 15 minutes
    • This is the rate at which PVSS2COOL runs
    • According to Fred Luehring, other COOL tables are updated at a slower rate
  – Testing may be repeated with slower periods:
    • Once every 30 minutes
    • Once every 60 minutes

Compare performance of FroNTier to Oracle
Network latency is an important issue, hence test:
– Reading clients run from CERN or from a remote near/far location
– TBD: candidate remote locations may be
– For each of these 3 (4) sites, run the example workload via FroNTier and directly against Oracle at CERN

Cache consistency policy
• A comparison of the performance with and without cache consistency may be expected
• Test so that both compared cases use the same FroNTier delays (5 minutes)

FroNTier delays
• We are currently using 5 minutes for each of the three delays (squid / FroNTier server / DB trigger)
• I do not see the need to test with shorter delays
• Yet, a comparison of performance between different delay times, in minutes: [5, 10, 15], may be expected; see the sketch after this slide
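FroNTier delays: illustrative staleness estimate
A small sketch, not FroNTier's actual code, of the modification-time check described on the architecture slide and of the worst-case staleness the three delays imply. The function names, and the assumption that in the worst case the three delays simply add up, are mine, made for illustration only.

```python
# Illustrative sketch of the cache-consistency idea, not FroNTier's implementation:
# the DB trigger memorizes a per-table modification time, and a cached answer is
# reused only if it was fetched after the table last changed. The delay values
# mirror the 5-minute squid / FroNTier-server / DB-trigger delays on this slide.
from datetime import datetime, timedelta

SQUID_DELAY = timedelta(minutes=5)    # how long squid may serve without revalidating
SERVER_DELAY = timedelta(minutes=5)   # how long the FroNTier server caches mod times
TRIGGER_DELAY = timedelta(minutes=5)  # granularity of the table-modification trigger

def cached_answer_is_usable(cached_at, table_last_modified):
    """Reuse a cached query result only if it was fetched after the table's
    memorized modification time (hypothetical helper)."""
    return cached_at > table_last_modified

def worst_case_staleness():
    """Rough upper bound on how old the data served to a client can be,
    assuming the three delays add up in the worst case."""
    return SQUID_DELAY + SERVER_DELAY + TRIGGER_DELAY

if __name__ == "__main__":
    now = datetime.utcnow()
    # Cache entry fetched 2 minutes ago, table last modified 10 minutes ago: reusable.
    print(cached_answer_is_usable(now - timedelta(minutes=2),
                                  now - timedelta(minutes=10)))
    print(worst_case_staleness())  # 0:15:00 with the current 5-minute delays
```

With the [5, 10, 15] minute variants above, the same estimate would grow to 30 and 45 minutes respectively.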
Monitoring

Testing time frame
• Each test will be run for a given period (one hour) at steady load, after ramping up the load
• Ramp-up may take on the order of 1-2 hours, according to Johannes Elmsheuser
• After job spawning stops, the load will die down (maybe within an hour or two as well)
• Hence each test is expected to last about 3-5 hours

Links
• David Dykstra: poster of the CMS solution
  http://frontier.cern.ch/dist/Poster_CHEP09_Frontier-newcaching.pdf
• Richard Hawkings: ATLAS COOL reference workloads and tests
  https://twiki.cern.ch/twiki/bin/view/Atlas/CoolRefWork
• David Front: previous related presentations
  http://indico.cern.ch/getFile.py/access?contribId=5&resId=1&materialId=slides&confId=59928
  http://indico.cern.ch/getFile.py/access?subContId=2&contribId=2&resId=1&materialId=slides&confId=62120