A Developer's Guide to Coprocessors
John Weatherford
https://github.com/jweatherford
HBaseCon 2013

Who Is Telescope?
  Telescope is the leading provider of interactive television, audience participation and customer engagement solutions. Clients include TV networks, producers, digital platforms, studios, and sponsors seeking to reach, engage, and retain mass audiences and consumers in real time.

What Is a Coprocessor?
  Arbitrary code that can run on each server
  Extends the functionality of HBase
  Avoids bothering the core committers

Two Types of Coprocessors
  Observers
    React to an event
    Run code before or after
  Endpoints
    Call a function explicitly
    Execute code on all regions
  [Diagram: an observer wraps a client action as Pre-Action, Action, Post-Action; a client call to an endpoint runs on Region 1, Region 2, and Region 3.]

What Can I Do With Coprocessors?
  Access control
  Secondary indexes
  Optimized search
  Data aggregation

Ideas for What Can Be Done
  Real-time analytics
  Email split alerts
  Cache requests
  Reduce result sets
  Control compaction times

A Short Story
  Nothing ventured
  Nothing gained

Getting Started With Code
    preGet(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<KeyValue> result)
    postGet(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<KeyValue> result)
    prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, boolean writeToWAL)
    postPut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, boolean writeToWAL)
    preDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, boolean writeToWAL)
    postDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, boolean writeToWAL)

Our First Observer
  Intercept and modify the action
  Consider all circumstances that will trigger the observer
  Compile your jar for the same version of Java that runs your HBase region servers
  Look for output from the coprocessor

Our First Observer: Motivation
  Apache Flume only writes one column per put

  JSON
    {twitter: { name: "loljk4u", message: "<3", length: 2, registered: true }, favorite: { name: "Taylor" ...

  Single Row Put
    key: id-1332343
    family: twitter
    qualifier: json_raw
    value: "{twitter: {name: \"loljk4u\", message: \"<3\", length: 2, registered: true ...

  prePut()
    key: id-1332343
    twitter:name: "loljk4u"
    twitter:message: "<3"
    twitter:length: 0x2
    twitter:registered: 0xFF
    favorite:name: "Taylor"
    favorite:song: "I knew you were trouble"

  put

JsonColumnExpander
    // Get the arguments passed to the coprocessor
    public void start(CoprocessorEnvironment env) throws IOException {
        Configuration c = env.getConfiguration();
        families = c.get("families", "").split(":");
    }

    public void prePut(ObserverContext<…> e, Put put, WALEdit edit, boolean writeToWAL) {
        // Check for the json_raw column
        if (!put.has(FAMILY, JSON_COLUMN)) {
            return;
        }
        String json = Bytes.toString(put.get(FAMILY, JSON_COLUMN).get(0).getValue());
        // Loop through the json (parsing it into family and columns is not shown on the slide)
        for (Entry<String, ?> column : columns.entrySet()) {
            String value = (String) column.getValue();
            put.add(family, Bytes.toBytes(column.getKey()), Bytes.toBytes(value));
        }
        // Replace the original json in the put with a placeholder
        put.add(FAMILY, JSON_COLUMN, Bytes.toBytes("--removed--"));
    }

Loading the Coprocessor
  Push the jar to where your cluster can find it
    $> hadoop fs -put JsonColumnExpander.jar /
  Alter the table to enable the coprocessor
    hbase> alter 'test', METHOD => 'table_att', 'coprocessor'=>'hdfs:///JsonColumnExpander.jar|telescope.hbase.JsonColumnExpander|1001|arg1=1,arg2=2'
  Verify the load by checking the master web UI.

Running the Code
  Trigger the coprocessor with a put on the table
    Put put = new Put(Bytes.toBytes("rowkey"));
    put.add(Bytes.toBytes("goat"), Bytes.toBytes("json_raw"), json_data);
  Check each server's local logs
    http://regionnode:60030/logs/hbase-hbase-regionserver-node2.dev-hadoop.telescope.tv.out

Creating Your First Endpoint
  Define the available methods in a protocol
  Implement the protocol
  Extend BaseEndpointCoprocessor
  Load the endpoint on the table

Endpoint Example
    public interface TrendsProtocol extends CoprocessorProtocol {
        HashMap<String, Long> getData() throws IOException;
    }

    // The endpoint class implements the protocol we wrote above
    public class TrendsEndpoint extends BaseEndpointCoprocessor implements TrendsProtocol {
        @Override
        public HashMap<String, Long> getData() throws IOException {
            HashMap<String, Long> trends = new HashMap<String, Long>();
            RegionCoprocessorEnvironment environment = (RegionCoprocessorEnvironment) getEnvironment();
            Scan s = new Scan(); // scan every row held by this region
            InternalScanner scanner = environment.getRegion().getScanner(s);
            try {
                List<KeyValue> curVals = new ArrayList<KeyValue>();
                boolean hasMore;
                do {
                    curVals.clear();
                    hasMore = scanner.next(curVals); // fills curVals with the next row
                    for (KeyValue pair : curVals) {
                        // loop through values on the region and process
                    }
                } while (hasMore);
            } finally {
                scanner.close();
            }
            return trends;
        }
    }

Endpoint Returned Results
    htable = HBaseDB.getTable(connection, "hbase_demo");
    Map<byte[], HashMap<String, Long>> results = null;
    results = htable.coprocessorExec(
        TrendsProtocol.class,
        null, // start row
        null, // end row
        new Batch.Call<TrendsProtocol, HashMap<String, Long>>() {
            @Override
            public HashMap<String, Long> call(TrendsProtocol trends) throws IOException {
                return trends.getData();
            }
        }
    );
    for (Map.Entry<byte[], HashMap<String, Long>> entry : results.entrySet()) {
        // process results from each region
    }

Addendum to Endpoints
  0.96 is changing Endpoints to use protobuf
    public static abstract class RowCountService implements com.google.protobuf.Service {
        ...
        public interface Interface {
            public abstract void getRowCount(
                com.google.protobuf.RpcController controller,
                CountRequest request,
                com.google.protobuf.RpcCallback done);
            public abstract void getKeyValueCount(
                com.google.protobuf.RpcController controller,
                CountRequest request,
                com.google.protobuf.RpcCallback done);
        }
    }

Telescope's Coprocessors
  Observers
    Collect real-time analytics data for our moderation platform and create aggregate tables for the streaming data.
  Endpoints
    Optimize searches and transmit only the necessary data.
    Perform simple reporting queries that don't need the full power of MapReduce.

Questions?
  Already using coprocessors? I would love to hear about it.
  Curious to know more about a specific part?
  All code samples and table definitions can be found at https://github.com/jweatherford
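
Follow-up to the Endpoint Returned Results slide: the client receives one HashMap<String, Long> per region, keyed by region name, and the slide leaves the processing step as a comment. One plausible way to process those results is to sum the counts across regions. The sketch below uses only the Java standard library so it runs without a cluster; the class name RegionResultMerger, the method merge, and the sample data are hypothetical and do not come from the talk's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of HBase or the talk's repository):
// combines the per-region maps an endpoint call returns into one total.
public class RegionResultMerger {

    /** Sums the counts from every region into one map, sorted by key. */
    public static Map<String, Long> merge(Map<byte[], HashMap<String, Long>> perRegion) {
        Map<String, Long> totals = new TreeMap<>();
        for (HashMap<String, Long> regionCounts : perRegion.values()) {
            if (regionCounts == null) {
                continue; // tolerate a region that returned nothing
            }
            for (Map.Entry<String, Long> e : regionCounts.entrySet()) {
                if (e.getValue() != null) {
                    totals.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by coprocessorExec; the region names are made up.
        Map<byte[], HashMap<String, Long>> results = new LinkedHashMap<>();

        HashMap<String, Long> region1 = new HashMap<>();
        region1.put("taylor", 3L);
        region1.put("lol", 1L);
        results.put("region-1".getBytes(StandardCharsets.UTF_8), region1);

        HashMap<String, Long> region2 = new HashMap<>();
        region2.put("taylor", 5L);
        results.put("region-2".getBytes(StandardCharsets.UTF_8), region2);

        System.out.println(merge(results)); // prints {lol=1, taylor=8}
    }
}
```

Summing on the client keeps each endpoint simple: every region reports only its own counts, and only the small per-region maps cross the network rather than the raw rows.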