Douyin data crawling and the Taobao data behind it

Background analysis

As of July this year, Douyin's daily active users (DAU) exceeded 320 million. Douyin president Zhang Nan predicts that by 2020 the total DAU of China's short-video industry will reach 1 billion. Douyin has launched several monetization channels with the goal of helping 10 million creators make money, and there are many ways to cash in. What I want to share today is the chain that links Douyin to Taobao. While scrolling through Douyin, you will notice that some videos promote Taobao products; this is one of the channels creators use to earn money.

From the Taobao shop's perspective, when a Douyin influencer helps promote its products, the shop pays the influencer an advertising fee. From Taobao's perspective, Taobao runs a platform called Taobao Alliance, which classifies everyone who helps sell Taobao goods as a Taoke (Taobao affiliate); whenever a product promoted by a Taoke is purchased, Taobao Alliance pays the Taoke a percentage of the sale as commission. In short, a Douyin influencer's income has two parts: the Taobao merchant's advertising fee plus the Taobao Alliance commission (paid only when a sale completes). This article analyzes the chain from Douyin influencers' posts to Taobao.
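The income split described above can be written as a one-line calculation. This is a minimal sketch: the function name, its parameters and the example figures are hypothetical, and real advertising fees and Taobao Alliance commission rates vary by deal and product category.

```javascript
// Hypothetical model of an influencer's income from one promoted product:
// a fixed advertising fee from the Taobao shop, plus a Taobao Alliance
// commission that is only earned on completed sales.
function estimateIncome({ adFee, price, commissionRate, unitsSold }) {
  // Commission = price x rate for each completed sale
  const commission = price * commissionRate * unitsSold;
  return adFee + commission;
}

// Made-up numbers: 500 yuan ad fee, a 100-yuan item, 25% commission, 30 completed sales
console.log(estimateIncome({ adFee: 500, price: 100, commissionRate: 0.25, unitsSold: 30 })); // 1250
```

With zero completed sales the result falls back to the advertising fee alone, which matches the "commission only on a successful transaction" condition.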

 

Douyin posts


Post text

 

In the bottom-left corner there is a shopping-cart icon. It is a link to a Taobao product; tapping it opens the following page:


This is the product the influencer's post promotes; tapping it jumps to the Taobao app.

In summary, we can crawl the products behind an influencer's posts and analyze that data, which also gives us the data of the corresponding Taobao shops.

 

Douyin app packet capture

 

I used Douyin version 8.0.0 on an iPhone, with anyproxy as the packet-capture proxy.

anyproxy is an excellent proxy library developed by Alibaba; a foreign alternative is mitmproxy.

For the anyproxy installation guide, see:

https://www.jianshu.com/p/d978d3b8f2aa

Official anyproxy site (it seems to need a stable connection to the international internet to load):

http://anyproxy.io

anyproxy project repository:

https://github.com/alibaba/anyproxy

 

Either anyproxy or mitmproxy can serve as the capture and analysis tool:

anyproxy is developed in Node.js (recommended if you are familiar with Node.js)

mitmproxy is developed in Python (recommended if you are familiar with Python)

Both tools can intercept and forward data, and both work on the man-in-the-middle (MITM) principle; the crawler built here relies on the same principle. If you only want to inspect the data, an ordinary packet-capture tool such as Fiddler or Charles is enough.

After installing anyproxy, the phone needs a trusted certificate and a proxy setting.

Proxy settings: anyproxy uses port 8001 as its proxy port by default.


Proxy settings


Trusting the certificate

 

 

On the phone, open the Douyin app and go to an influencer's list of posts.


An influencer's post list page

 

On the computer, open http://localhost:8002/ to see all the data flowing through the phone, including the Douyin app's traffic, among which is the request for the influencer's posts.

Filter the requests by URL: https://api-hl.amemv.com/aweme/v1/aweme/post/
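The douyin.js rule matches this endpoint with a plain indexOf substring check. A slightly stricter alternative, sketched here as an assumption rather than what the author used, parses the URL with Node's built-in URL class and compares the scheme, host and path exactly, so the changing query string does not matter and URLs that merely contain the endpoint somewhere else are not matched. The function name isDouyinPostApi is hypothetical.

```javascript
// Hypothetical helper: does this request URL hit the Douyin post-list API?
const POST_API_HOST = 'api-hl.amemv.com';
const POST_API_PATH = '/aweme/v1/aweme/post/';

function isDouyinPostApi(rawUrl) {
  let u;
  try {
    u = new URL(rawUrl); // global URL class (Node.js 10+)
  } catch (e) {
    return false; // not an absolute URL
  }
  return u.protocol === 'https:' && u.hostname === POST_API_HOST && u.pathname === POST_API_PATH;
}

console.log(isDouyinPostApi('https://api-hl.amemv.com/aweme/v1/aweme/post/?user_id=123&count=20')); // true
console.log(isDouyinPostApi('https://example.com/?next=https://api-hl.amemv.com/aweme/v1/aweme/post/')); // false
```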


Looking at the fields of each post, we find simple_promotions. Analysis shows that this field carries the information of the promoted product, so we can save its ID and then use the ID to fetch the product's Taobao shop information.
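A sketch of that extraction step follows. The response layout is an assumption the original does not show: it assumes the body has an aweme_list array, each post has an aweme_id, and simple_promotions is either an array or a JSON-encoded string of objects carrying a product_id (or promotion_id). Check the real field names in the anyproxy web interface before relying on it; the function name extractPromotions is hypothetical.

```javascript
// Hypothetical extractor. Assumed body shape:
// { aweme_list: [{ aweme_id, simple_promotions }] }, where simple_promotions is an
// array or a JSON string of objects with product_id / promotion_id.
function extractPromotions(responseBody) {
  const data = JSON.parse(responseBody);
  const results = [];
  for (const post of data.aweme_list || []) {
    let promos = post.simple_promotions;
    if (!promos) continue; // this post promotes nothing
    if (typeof promos === 'string') {
      try {
        promos = JSON.parse(promos); // assumed: field may arrive serialized
      } catch (e) {
        continue; // unparseable field, skip this post
      }
    }
    if (!Array.isArray(promos)) continue;
    for (const p of promos) {
      if (!p) continue;
      const id = p.product_id || p.promotion_id;
      if (id) results.push({ postId: post.aweme_id, productId: String(id) });
    }
  }
  return results;
}

// Made-up sample body
const sample = JSON.stringify({
  aweme_list: [
    { aweme_id: '1', simple_promotions: '[{"product_id":"5566"}]' },
    { aweme_id: '2' },
  ],
});
console.log(extractPromotions(sample)); // [ { postId: '1', productId: '5566' } ]
```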


anyproxy's default interception and forwarding settings


A note on this: running anyproxy -i in a terminal (-i enables HTTPS interception) makes anyproxy load the default rule file /usr/local/lib/node_modules/anyproxy/lib/rule_default.js. To intercept Douyin data, I created a douyin.js file alongside it and passed it as the rule (anyproxy -i --rule douyin.js); anyproxy then intercepts and forwards traffic according to the logic in douyin.js. That path is the default location on a Mac; on Windows, search the disk for rule_default.js to find it.

The code of douyin.js is as follows:

 

    'use strict';

    const http = require('http');
    const querystring = require('querystring');

    // Douyin app endpoint that returns an influencer's list of posts
    const POST_API = 'https://api-hl.amemv.com/aweme/v1/aweme/post/';

    module.exports = {

      summary: 'Intercept Douyin post lists and forward them to a local server',

      /**
       * Called before a request is sent upstream; returning null sends it unchanged.
       * @param {object} requestDetail
       */
      *beforeSendRequest(requestDetail) {
        return null;
      },

      /**
       * Intercept Douyin data: if the response comes from the post-list API,
       * forward its body to the local WebCrawler service.
       * @param {object} requestDetail
       * @param {object} responseDetail
       */
      *beforeSendResponse(requestDetail, responseDetail) {
        if (requestDetail.url.indexOf(POST_API) >= 0) {   // influencer post details (app side)
          const body = responseDetail.response.body.toString();
          const postPath = '/WebCrawler/douyin/AppUserData';
          HttpPost(body, requestDetail.url, postPath);
          console.log('Forwarded influencer post details from the app');
        }
        // Returning null passes the original response to the phone unchanged
        return null;
      },

      /**
       * Default: return null to follow the global setting (e.g. anyproxy -i).
       * A rule that does implement this hook MUST return a boolean.
       * @param {any} requestDetail
       */
      *beforeDealHttpsRequest(requestDetail) {
        return null;
      },

      *onError(requestDetail, error) {
        return null;
      },

      *onConnectError(requestDetail, error) {
        return null;
      },

      *onClientSocketError(requestDetail, error) {
        return null;
      },
    };

    // Send the captured data to our own local server, which stores it in the database.
    // json: response body; url: the intercepted request URL; path: path of the receiving program
    function HttpPost(json, url, path) {
      console.log('Starting forward');
      try {
        // url is encoded here and again by stringify, so the receiver must decode it twice
        const data = querystring.stringify({
          json: json,
          url: encodeURIComponent(url),
          data: 'Im jiehuhu',
        });
        const options = {
          method: 'POST',
          host: '127.0.0.1',   // host name only, no http:// prefix
          port: 8080,
          path: path,
          headers: {
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Content-Length': Buffer.byteLength(data),
          },
        };
        const req = http.request(options, function (res) {
          res.setEncoding('utf8');
          res.on('data', function (chunk) {
            console.log('BODY: ' + chunk);
          });
        });
        req.on('error', function (e) {
          console.log('Problem with request: ' + e.message);
        });
        req.write(data);
        req.end();
      } catch (e) {
        console.log('Error: ' + e);
      }
      console.log('Forward finished');
    }
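Instead of the command line described earlier, anyproxy 4.x can also be started from a Node script through its ProxyServer class. The sketch below assumes anyproxy 4.x is installed locally (npm install anyproxy), that the root certificate has already been generated and trusted on the phone, and that douyin.js is the rule file above; the helper names buildProxyOptions and startProxy are hypothetical. forceProxyHttps: true plays the role of -i, and webPort 8002 is the web interface used earlier to browse captured traffic.

```javascript
// Sketch: start anyproxy from Node instead of the CLI (assumes anyproxy 4.x).
function buildProxyOptions(ruleModule) {
  return {
    port: 8001,                                    // proxy port configured on the phone
    rule: ruleModule,                              // e.g. require('./douyin.js')
    webInterface: { enable: true, webPort: 8002 }, // browse traffic at http://localhost:8002/
    forceProxyHttps: true,                         // decrypt HTTPS, like `anyproxy -i`
    wsIntercept: false,
    silent: false,
  };
}

function startProxy(rulePath) {
  // Required lazily so this file can be loaded without anyproxy installed
  const AnyProxy = require('anyproxy');
  const path = require('path');
  const server = new AnyProxy.ProxyServer(buildProxyOptions(require(path.resolve(rulePath))));
  server.on('ready', () => console.log('anyproxy listening on port 8001'));
  server.on('error', (e) => console.error('anyproxy error:', e));
  server.start();
  return server; // call server.close() when finished
}
```

Usage: const server = startProxy('./douyin.js'); and later server.close(). Keeping the options in a separate pure function lets you inspect or adjust them without launching the proxy.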


On the back end, a dedicated project receives the data that anyproxy intercepts and forwards; mine is a Java Web project named WebCrawler that handles these requests.

The Douyin app data-collection flow is roughly as follows:


The stack here is Java + Tomcat 8 + MySQL, which is what I used a year ago; these days I prefer MongoDB and Python, which let me process data faster.

Python + MongoDB can also be used to process the data forwarded by anyproxy.
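For illustration, here is the receiving end sketched in Node.js rather than the author's Java/Tomcat WebCrawler project. It is an assumption-based sketch: it answers POSTs to the /WebCrawler/douyin/AppUserData path that douyin.js posts to, decodes the form fields HttpPost sends (json, url, data), and only logs them where a real receiver would write to MySQL or MongoDB. Because HttpPost applies encodeURIComponent to url before querystring.stringify, the url field arrives encoded twice, so it is decoded once more here. The function names are hypothetical.

```javascript
const http = require('http');
const querystring = require('querystring');

// Decode the form body produced by HttpPost in douyin.js
function parseForwardedForm(body) {
  const fields = querystring.parse(body); // undoes querystring.stringify
  return {
    json: fields.json,
    url: decodeURIComponent(fields.url || ''), // undo the extra encodeURIComponent
    marker: fields.data,
  };
}

function createReceiver() {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/WebCrawler/douyin/AppUserData') {
      res.statusCode = 404;
      return res.end();
    }
    const chunks = [];
    req.on('data', (c) => chunks.push(c));
    req.on('end', () => {
      const record = parseForwardedForm(Buffer.concat(chunks).toString('utf8'));
      // Placeholder: a real receiver would insert the record into MySQL/MongoDB here
      console.log('received', record.url, (record.json || '').length, 'chars');
      res.end('ok');
    });
  });
}

// Usage: createReceiver().listen(8080, '127.0.0.1');
```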


The automation part is not finished yet; for it you can use phone automation or testing tools such as Appium or QuickMacro.

Results: part of the crawled Douyin data


Getting Taobao shop data from the product ID


Getting the Taobao shop information from a product ID requires writing another crawler. I won't explain it in detail here; it is somewhat harder, because the key to crawling Taobao products is understanding Taobao's signature mechanism.

If you are interested in Taobao's H5 signature mechanism, you can research it yourself... Anyway, I worked it out, hahaha.

 

The data I crawled is on Baidu Cloud at the following link, if you're interested:

Link: https://pan.baidu.com/s/1O5CYJeJYiL6uB7e56_WPUA Password: 1abc

 

That covers the Douyin data crawling process, and the general idea for extending it to Taobao:

    Use anyproxy to capture all posts of a Douyin influencer from the Douyin app

    Extract the IDs of the promoted products from the posts, and fetch the shop information for each product ID

    Analyze which products an influencer actually promotes and which shops they cooperate with

    With large-scale crawling and analysis, identify which shops run large-scale promotions on Douyin
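The last two steps boil down to grouping. Assuming each crawled record has already been joined into a hypothetical { influencer, productId, shopId } shape (the shop ID coming from the Taobao crawler, which is not shown here), counting distinct influencers and products per shop gives a first ranking of which shops promote heavily on Douyin. The record layout and the function name rankShops are illustrative.

```javascript
// Hypothetical analysis step: rank shops by how many distinct influencers promote them.
function rankShops(records) {
  const shops = new Map(); // shopId -> { influencers: Set, products: Set }
  for (const { influencer, productId, shopId } of records) {
    if (!shops.has(shopId)) shops.set(shopId, { influencers: new Set(), products: new Set() });
    const s = shops.get(shopId);
    s.influencers.add(influencer);
    s.products.add(productId);
  }
  return [...shops.entries()]
    .map(([shopId, s]) => ({ shopId, influencers: s.influencers.size, products: s.products.size }))
    // More influencers first; ties broken by number of promoted products
    .sort((a, b) => b.influencers - a.influencers || b.products - a.products);
}

// Made-up sample records
const records = [
  { influencer: 'A', productId: 'p1', shopId: 'shop1' },
  { influencer: 'B', productId: 'p1', shopId: 'shop1' },
  { influencer: 'B', productId: 'p2', shopId: 'shop2' },
];
console.log(rankShops(records)); // shop1 (2 influencers) is ranked ahead of shop2 (1 influencer)
```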

About me: I know a bit of crawling, a bit of backend, a bit of front end, a bit of data analysis and a bit of algorithms; a bit like Eason Chan?

You can contact me here:


This is an original article that I typed out word by word, and it took real effort. If you repost it, please credit the original link.

This article is intended only for studying crawler techniques. If anyone uses the techniques described here illegally, the consequences are the sole responsibility of that person, and neither this article nor its author bears any responsibility.

 
