For the source code, see: https://github.com/hiszm/hadoop-train
User behavior log overview
- Records of each search and click made by users
- Historical behavior data, such as past orders

=> Use this data to drive recommendations and improve user conversion (the ultimate goal)
Log content
Two sample records (fields are separated by ^A, i.e. \001; stretches that did not survive extraction are abbreviated with …):

```
20979872853^Ahttp://www.yihaodian.com/1/?type=3&tracker_u=10974049258^A^A^A3^ABAWG49VCYYTMZ6VU9XX74KPV5CCHPAQ2A4A5^A^A^A^A^APPG68XWJNUSSX649S4YQTCT6HBMS9KBA^A10974049258^A\N^A27.45.216.128^A^A,unionKey:10974049258^A^A2013-07-21 18:58:21^A\N^A^A1^A^A\N^Anull^A247^A^A^A^A^AMozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)^AWin32^A^A^A^A^A^A^A^A^AGuangdong^A20^A^AHuizhou City^A^A^A^A^A^A^A\N^A\N^A\N^A\N^A2013-07-21
20977690754^Ahttp://www.yihaodian.com/1/?type=3&tracker_u=10781349260^A^A^A3^A49FDCP696X2RDDRC2ED6Y4JVPTEVFNDADF1D^A^A^A^A^APPGCTKD92UT3DR7KY1VFZ92ZU4HEP479^A10781349260^A\N^A101.…^A…^A2013-07-21 18:11:46^AShanghai^A^A^A^A^A^A\N^A\N^A\N^A\N^A2013-07-21
```
- Field 2: URL => page ID
- Field 14: IP => region (province/city)
- Field 18: access time (see the extraction sketch below)
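Given these positions, the fields can be pulled straight out of a split on \001. A minimal sketch (a hypothetical FieldExtractor helper, not part of the project code):

```java
// Minimal sketch: extract url (field 2), ip (field 14) and time (field 18)
// from one ^A-delimited log line. Indexes are 0-based after the split.
public class FieldExtractor {

    public static void extract(String log) {
        String[] fields = log.split("\001"); // ^A is \001
        if (fields.length >= 18) {
            String url  = fields[1];  // field 2: page URL
            String ip   = fields[13]; // field 14: client IP
            String time = fields[17]; // field 18: access time
            System.out.println(url + "\t" + ip + "\t" + time);
        }
    }
}
```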
Common terms of e-commerce
1. Ad Views: the number of times online advertisements are viewed by users.
2. PV (Page View): counted every time a user opens or refreshes a page; the total number of page views across the site. A single visitor may generate a dozen or more PVs.
3. Impression: each display of a page or ad requested by a user counts as one impression. If an advertiser wants 100,000 people to see an ad, that is 100,000 impressions. It is one measure of advertising effectiveness.
4. UV (Unique Visitor): one visitor on one client machine counts as one UV; the same client is counted only once within 24 hours.
5. IP (independent IP): the count of distinct IP addresses; the same IP address is counted only once within 24 hours.
6. URL (Uniform Resource Locator): gives the location of any server, file or image on the Internet; users reach specific content by following a URL over the hypertext protocol. This is also what the landing page URL refers to.
7. Keyword: a search term.
8. HTML (Hypertext Markup Language): a text-based page description language; the common authoring language of web pages.
9. Bandwidth: the amount of information (text, pictures, audio, video) a transmission line can carry in a given time. Higher bandwidth means faster page loads; limited bandwidth is why image files on a page should be kept as small as possible.
10. Browser Cache: to speed up browsing, the browser stores recently visited pages on the hard disk; on a revisit it can serve the page from disk instead of from the server.
11. Cookie: a file on the user's machine that records the user's behavior on the network; a site can use cookies to recognize whether a user has visited before.
12. Database: in network marketing, collecting, filing and managing users' personal information over the Internet, e.g. name, gender, age, address, telephone number, hobbies, consumption behavior.
13. Targeting: delivering the most appropriate advertisement to users via content matching, user composition or filtering; that is, ad targeting and customer targeting (what Baidu calls finding precise customers).
14. Traffic: the number and type of pages visited by users.
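To make the PV/UV distinction above concrete, here is a minimal standalone sketch (the visitor IDs are made up for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// PV counts every page request; UV counts each distinct visitor once.
public class PvUvExample {
    public static void main(String[] args) {
        // each entry stands for one page request by a visitor id
        List<String> requests = Arrays.asList("u1", "u1", "u2", "u3", "u2", "u1");

        long pv = requests.size();             // 6: every request counts
        Set<String> visitors = new HashSet<>(requests);
        long uv = visitors.size();             // 3: u1, u2, u3

        System.out.println("PV=" + pv + ", UV=" + uv);
    }
}
```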
Project requirements description
- Count total page views (PV)
- Count visits per province (via IP)
- Count visits per page (via URL)
Data processing flow and technical architecture
Implementing the browsing statistics
Page view (PV) statistics
Map each record to one fixed key with the value 1; the reducer then sums all the 1s.
```java
package com.bigdata.hadoop.mr.project.mrv1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * First version of the page-view (PV) statistics job.
 */
public class PVStatApp {

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();

        Job job = Job.getInstance(configuration);
        job.setJarByClass(PVStatApp.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path("input/raw/trackinfo_20130721.data"));
        FileOutputFormat.setOutputPath(job, new Path("output/pv1"));

        job.waitForCompletion(true);
    }

    // Every record is mapped to the same fixed key with a value of 1.
    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private Text KEY = new Text("key");
        private LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(KEY, ONE);
        }
    }

    // All the 1s arrive at a single reducer, which counts them.
    static class MyReducer extends Reducer<Text, LongWritable, NullWritable, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long count = 0;
            for (LongWritable value : values) {
                count++;
            }
            context.write(NullWritable.get(), new LongWritable(count));
        }
    }
}
```
Output: 300000
Province visit statistics
Count the traffic from each province; conceptually:

```sql
select province, count(1) from xxx group by province
```
Region information <= IP resolution: we need a way to convert an IP address into province/city information. A quick test of the project's IPParser utility:
```java
package com.bigdata.hadoop.hdfs;

import com.bigdata.hadoop.mr.project.utils.IPParser;
import org.junit.Test;

public class Iptest {

    @Test
    public void testIP() {
        IPParser.RegionInfo regionInfo = IPParser.getInstance().analyseIp("58.32.19.255");
        System.out.println(regionInfo.getCountry());
        System.out.println(regionInfo.getProvince());
        System.out.println(regionInfo.getCity());
    }
}
```
```
China
Shanghai
null

Process finished with exit code 0
```
IP library resolution
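The IPParser utility ships with the project and is not reproduced in this post; from how it is used here, its public surface looks roughly like this (a sketch of the interface only; the real class also loads and queries the bundled IP library):

```java
package com.bigdata.hadoop.mr.project.utils;

// Sketch of IPParser's surface as used in this project: a singleton that
// resolves an IP string to a region. The database-loading logic is omitted.
public class IPParser {

    private static final IPParser instance = new IPParser();

    public static IPParser getInstance() {
        return instance;
    }

    // Returns null when the IP cannot be resolved (callers check for this).
    public RegionInfo analyseIp(String ip) {
        // ... look the IP up in the bundled IP library ...
        return null;
    }

    public static class RegionInfo {
        private String country;
        private String province;
        private String city;

        public String getCountry() { return country; }
        public String getProvince() { return province; }
        public String getCity() { return city; }
    }
}
```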
LogParser performs the per-record parsing and wraps the IP resolution:

```java
package com.bigdata.hadoop.mr.project.utils;

import org.apache.commons.lang.StringUtils;

import java.util.HashMap;
import java.util.Map;

public class LogParser {

    public Map<String, String> parse(String log) {
        IPParser ipParser = IPParser.getInstance();
        Map<String, String> info = new HashMap<>();
        if (StringUtils.isNotBlank(log)) {
            String[] splits = log.split("\001"); // raw log is ^A-delimited
            String ip = splits[13];              // field 14: client IP

            String country = "-";
            String province = "-";
            String city = "-";
            IPParser.RegionInfo regionInfo = ipParser.analyseIp(ip);
            if (regionInfo != null) {
                country = regionInfo.getCountry();
                province = regionInfo.getProvince();
                city = regionInfo.getCity();
            }
            info.put("ip", ip);
            info.put("country", country);
            info.put("province", province);
            info.put("city", city);
        }
        return info;
    }
}
```
Implementation
```java
package com.bigdata.hadoop.mr.project.mrv1;

import com.bigdata.hadoop.mr.project.utils.IPParser;
import com.bigdata.hadoop.mr.project.utils.LogParser;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.Map;

public class ProvinceStatApp {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();

        // delete the output directory if it already exists, so the job can be re-run
        FileSystem fileSystem = FileSystem.get(configuration);
        Path outputPath = new Path("output/v1/provincestat");
        if (fileSystem.exists(outputPath)) {
            fileSystem.delete(outputPath, true);
        }

        Job job = Job.getInstance(configuration);
        job.setJarByClass(ProvinceStatApp.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path("input/raw/trackinfo_20130721.data"));
        FileOutputFormat.setOutputPath(job, outputPath);

        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private LongWritable ONE = new LongWritable(1);
        private LogParser logParser;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            logParser = new LogParser();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Map<String, String> info = logParser.parse(value.toString());

            // resolve the IP to a province; unresolvable records are grouped under "-"
            String ip = info.get("ip");
            if (StringUtils.isNotBlank(ip)) {
                IPParser.RegionInfo regionInfo = IPParser.getInstance().analyseIp(ip);
                if (regionInfo != null && StringUtils.isNotBlank(regionInfo.getProvince())) {
                    context.write(new Text(regionInfo.getProvince()), ONE);
                } else {
                    context.write(new Text("-"), ONE);
                }
            } else {
                context.write(new Text("-"), ONE);
            }
        }
    }

    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long count = 0;
            for (LongWritable value : values) {
                count++;
            }
            context.write(key, new LongWritable(count));
        }
    }
}
```
```
-	923
Shanghai	72898
Yunnan Province	1480
Inner Mongolia Autonomous Region	1298
Beijing	42501
Taiwan Province	254
Jilin Province	1435
Sichuan Province	4442
Tianjin	11042
Ningxia	352
Anhui Province	5429
Shandong Province	10145
Shanxi Province	2301
Guangdong Province	51508
Guangxi	1681
Xinjiang	840
Jiangsu Province	25042
Jiangxi Province	2238
Hebei Province	7294
Henan Province	5279
Zhejiang Province	20627
Hainan	814
Hubei Province	7187
Hunan Province	2858
Macao Special Administrative Region	6
Gansu Province	1039
Fujian Province	8918
Tibet	110
Guizhou Province	1084
Liaoning Province	2341
Chongqing City	1798
Shaanxi Province	2487
Qinghai Province	336
Hong Kong Special Administrative Region	45
Heilongjiang Province	1968
```
Page visit statistics
Extract the page ID that matches the rule from each URL, then aggregate the counts.
Extracting the page ID
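ContentUtils.getPageId is used by the jobs below but its source is not shown in this post. One plausible implementation, assuming the page ID appears in the URL as a topicId=... query parameter and that unmatched URLs fall into the "-" bucket seen in the output further down (check the repository for the actual rule):

```java
package com.bigdata.hadoop.mr.project.utils;

import org.apache.commons.lang.StringUtils;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentUtils {

    private static final Pattern TOPIC_ID = Pattern.compile("topicId=([0-9]+)");

    // Returns the page id embedded in the url, or "-" when nothing matches
    // (which would explain "-" being the largest bucket in the output below).
    public static String getPageId(String url) {
        if (StringUtils.isBlank(url)) {
            return "-";
        }
        Matcher matcher = TOPIC_ID.matcher(url);
        if (matcher.find()) {
            return matcher.group(1);
        }
        return "-";
    }
}
```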
LogParser is extended so that parse also exposes the url and time fields, and gains a parseV2 for the tab-separated ETL output introduced later:

```java
package com.bigdata.hadoop.mr.project.utils;

import org.apache.commons.lang.StringUtils;

import java.util.HashMap;
import java.util.Map;

public class LogParser {

    // Parses one raw ^A-delimited log line.
    public Map<String, String> parse(String log) {
        IPParser ipParser = IPParser.getInstance();
        Map<String, String> info = new HashMap<>();
        if (StringUtils.isNotBlank(log)) {
            String[] splits = log.split("\001");

            String ip = splits[13]; // field 14: client IP
            String country = "-";
            String province = "-";
            String city = "-";
            IPParser.RegionInfo regionInfo = ipParser.analyseIp(ip);
            if (regionInfo != null) {
                country = regionInfo.getCountry();
                province = regionInfo.getProvince();
                city = regionInfo.getCity();
            }
            info.put("ip", ip);
            info.put("country", country);
            info.put("province", province);
            info.put("city", city);

            info.put("url", splits[1]);   // field 2: page URL
            info.put("time", splits[17]); // field 18: access time
        }
        return info;
    }

    // Parses one tab-separated line of the ETL output (see ETLApp below),
    // where the region fields have already been resolved.
    public Map<String, String> parseV2(String log) {
        Map<String, String> info = new HashMap<>();
        if (StringUtils.isNotBlank(log)) {
            String[] splits = log.split("\t");
            info.put("ip", splits[0]);
            info.put("country", splits[1]);
            info.put("province", splits[2]);
            info.put("city", splits[3]);
            info.put("url", splits[4]);
        }
        return info;
    }
}
```
Implementation
```java
package com.bigdata.hadoop.mr.project.mrv1;

import com.bigdata.hadoop.mr.project.utils.ContentUtils;
import com.bigdata.hadoop.mr.project.utils.LogParser;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.Map;

public class PageStatApp {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();

        // delete the output directory if it already exists, so the job can be re-run
        FileSystem fileSystem = FileSystem.get(configuration);
        Path outputPath = new Path("output/v1/pagestat");
        if (fileSystem.exists(outputPath)) {
            fileSystem.delete(outputPath, true);
        }

        Job job = Job.getInstance(configuration);
        job.setJarByClass(PageStatApp.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path("input/raw/trackinfo_20130721.data"));
        FileOutputFormat.setOutputPath(job, outputPath);

        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private LongWritable ONE = new LongWritable(1);
        private LogParser logParser;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            logParser = new LogParser();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Map<String, String> info = logParser.parse(value.toString());
            String url = info.get("url");
            if (StringUtils.isNotBlank(url)) {
                // reduce the URL to its page ID and count it
                String pageId = ContentUtils.getPageId(url);
                context.write(new Text(pageId), ONE);
            }
        }
    }

    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long count = 0;
            for (LongWritable value : values) {
                count++;
            }
            context.write(key, new LongWritable(count));
        }
    }
}
```
```
-	298827
13483	19
13506	15
13729	9
13735	2
13736	2
14120	28
14251	1
14572	14
14997	2
15065	1
17174	1
17402	1
17449	2
17486	2
17643	7
18952	14
18965	1
18969	32
18970	27
18971	1
18972	3
18973	8
18977	10
18978	5
18979	11
18980	8
18982	50
18985	5
18988	2
18991	27
18992	4
18994	3
18996	3
18997	3
18998	2
18999	4
19000	5
19004	23
19006	4
19009	1
19010	1
19013	1
20154	2
20933	1
20953	5
21208	11
21340	1
21407	1
21484	1
21826	8
22068	1
22107	4
22114	4
22116	5
22120	6
22123	13
22125	1
22127	16
22129	3
22130	3
22140	1
22141	5
22142	8
22143	5
22144	1
22146	5
22169	1
22170	20
22171	51
22180	4
22196	75
22249	4
22331	6
22372	1
22373	1
22805	3
22809	3
22811	5
22813	11
23203	1
23481	194
23541	1
23542	1
23704	1
23705	1
3541	2
8101	36
8121	32
8122	38
9848	2
9864	1
```
ETL for data processing
```
[INFO ] method:org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1008) kvstart = 26214396; length = 6553600
Counters: 30
	File System Counters
		FILE: Number of bytes read=857754791
		FILE: Number of bytes written=23557997
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=300000
		Map output records=299797
		Map output bytes=3001739
		Map output materialized bytes=3601369
		Input split bytes=846
		Combine input records=0
		Combine output records=0
		Reduce input groups=92
		Reduce shuffle bytes=3601369
		Reduce input records=299797
		Reduce output records=92
		Spilled Records=599594
		Shuffled Maps =6
		Failed Shuffles=0
		Merged Map outputs=6
		GC time elapsed (ms)=513
		Total committed heap usage (bytes)=3870818304
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=173576072
	File Output Format Counters
		Bytes Written=771
```
We found that every job spends most of its time re-parsing the raw data, so the data should be preprocessed once up front.
This is where ETL comes in.
ETL, short for Extract-Transform-Load, describes the process of extracting data from a source, transforming it, and loading it into a destination.
- The full raw dataset is inconvenient to compute on directly, so it is processed step by step into the dimensions needed for statistical analysis
- Resolve the data we need up front: ip => region information
- Remove the fields we do not need
- Keep: ip / time / url / page_id / country / province / city
```java
package com.bigdata.hadoop.mr.project.mrv2;

import com.bigdata.hadoop.mr.project.utils.ContentUtils;
import com.bigdata.hadoop.mr.project.utils.LogParser;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.Map;

public class ETLApp {

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();

        // delete the ETL output directory if it already exists
        FileSystem fileSystem = FileSystem.get(configuration);
        Path outputPath = new Path("input/etl");
        if (fileSystem.exists(outputPath)) {
            fileSystem.delete(outputPath, true);
        }

        Job job = Job.getInstance(configuration);
        job.setJarByClass(ETLApp.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path("input/raw/trackinfo_20130721.data"));
        FileOutputFormat.setOutputPath(job, outputPath);

        job.waitForCompletion(true);
    }

    // Map-only pass: parse each raw record once, resolve the IP to a region,
    // extract the page ID, and write out the cleaned, tab-separated record.
    static class MyMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private LogParser logParser;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            logParser = new LogParser();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Map<String, String> info = logParser.parse(value.toString());

            StringBuilder builder = new StringBuilder();
            builder.append(info.get("ip")).append("\t");
            builder.append(info.get("country")).append("\t");
            builder.append(info.get("province")).append("\t");
            builder.append(info.get("city")).append("\t");
            builder.append(info.get("url")).append("\t");
            builder.append(info.get("time")).append("\t");
            builder.append(ContentUtils.getPageId(info.get("url")));

            context.write(NullWritable.get(), new Text(builder.toString()));
        }
    }
}
```
```
106.3.114.42	China	Beijing	null	http://www.yihaodian.com/2/?tracker_u=10325451727&tg=boomuserlist%3A%3A2463680&pl=www.61baobao.com&creative=30392663360&kw=&gclid=CPC2idPRv7gCFQVZpQodFhcABg&type=2	2013-07-21 11:24:56	-
58.219.82.109	China	Jiangsu Province	Wuxi City	http://www.yihaodian.com/5/?tracker_u=2225501&type=4	2013-07-21 13:57:11	-
58.219.82.109	China	Jiangsu Province	Wuxi City	http://search.yihaodian.com/s2/c0-0/k%25E7%25A6%258F%25E4%25B8%25B4%25E9%2597%25A8%25E9%2587%2591%25E5%2585%25B8%25E7%2589%25B9%25E9%2580%2589%25E4%25B8%259C%25E5%258C%2597%25E7%25B1%25B35kg%2520%25E5%259B%25BD%25E4%25BA%25A7%25E5%25A4%25A7%25E7%25B1%25B3%2520%25E6%2599%25B6%25E8%258E%25B9%25E5%2589%2594%25E9%2580%258F%2520%25E8%2587%25AA%25E7%2584%25B6%2520%2520/5/	2013-07-21 13:50:48	-
58.219.82.109	China	Jiangsu Province	Wuxi City	http://search.yihaodian.com/s2/c0-0/k%25E8%258C%25B6%25E8%258A%25B1%25E8%2582%25A5%25E7%259A%2582%25E7%259B%2592%25202213%2520%25E5%258D%25AB%25E7%2594%259F%25E7%259A%2582%25E7%259B%2592%2520%25E9%25A6%2599%25E7%259A%2582%25E7%259B%2592%2520%25E9%25A2%259C%25E8%2589%25B2%25E9%259A%258F%25E6%259C%25BA%2520%2520/5/	2013-07-21 13:57:16	-
58.219.82.109	China	Jiangsu Province	Wuxi City	http://www.yihaodian.com/5/?tracker_u=2225501&type=4	2013-07-21 13:50:13	-
218.11.179.22	China	Hebei Province	Xingtai City	http://www.yihaodian.com/2/?tracker_u=10861423206&type=1	2013-07-21 08:00:13	-
218.11.179.22	China	Hebei Province	Xingtai City	http://www.yihaodian.com/2/?tracker_u=10861423206&type=1	2013-07-21 08:00:20	-
123.123.202.45	China	Beijing	null	http://search.1mall.com/s2/c0-0/k798%25203d%25E7%2594%25BB%25E5%25B1%2595%2520%25E5%259B%25A2%25E8%25B4%25AD/2/	2013-07-21 11:55:28	-
123.123.202.45	China	Beijing	null	http://t.1mall.com/100?grouponAreaId=3&uid=1ahrua02b8mvk0952dle&tracker_u=10691821467	2013-07-21 11:55:21	-
...
```
Upgrading the jobs
Based on the ETL output, the jobs only need to change how they parse each log line:
```java
// Upgraded parseV2: buckets the unresolved province "null" under "other"
// and exposes the remaining ETL fields.
public Map<String, String> parseV2(String log) {
    Map<String, String> info = new HashMap<>();
    if (StringUtils.isNotBlank(log)) {
        String[] splits = log.split("\t");

        info.put("ip", splits[0]);
        info.put("country", splits[1]);

        String province = splits[2];
        if (province.equals("null")) {
            province = "other"; // group unresolved provinces together
        }
        info.put("province", province);

        info.put("city", splits[3]);
        info.put("url", splits[4]);
        info.put("time", splits[5]);
        // note: given the ETL output above, this seventh field actually holds
        // the page ID written by ETLApp
        info.put("client", splits[6]);
    }
    return info;
}
```
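With parseV2 in place, a v2 job only needs to read input/etl and skip the IP lookup entirely. A sketch of what the upgraded province mapper could look like (hypothetical class name; the driver scaffolding is assumed to match ProvinceStatApp with the input path switched to the ETL output):

```java
package com.bigdata.hadoop.mr.project.mrv2;

import com.bigdata.hadoop.mr.project.utils.LogParser;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.Map;

// Hypothetical v2 province mapper: reads the tab-separated ETL output,
// so no IP resolution happens at statistics time.
public class ProvinceStatV2Mapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private final LongWritable ONE = new LongWritable(1);
    private LogParser logParser;

    @Override
    protected void setup(Context context) {
        logParser = new LogParser();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> info = logParser.parseV2(value.toString());
        String province = info.get("province"); // already resolved by the ETL step
        if (StringUtils.isNotBlank(province)) {
            context.write(new Text(province), ONE);
        }
    }
}
```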