In 2019 the number of gaokao (college entrance examination) applicants reached a record 10.31 million. I took the gaokao three years ago and am now a would-be programmer. Working overtime just before this year's exam, I built a small mini program from scratch for looking up university admission cut-off scores, as a gaokao gift for the younger students at my old school (Benedictine College Shanghai Campus). It passed 1,000 users within a month of launch. I won't introduce the mini program itself here; you can try it by searching for it (college entrance examination scores over the years [search query]). Today I mainly want to talk about how it works and the technical implementation details.

Data Sources

The mini program's backend holds nearly 300,000 records in total. These include admission scores by arts/sciences track for every admission batch at all major universities from 2008 to 2017. They also include provincial batch cut-off lines from 2008 to 2018, covering provinces that use the national new-curriculum papers I, II and III as well as provinces that set their own papers. The batches range from the early-admission batch (Tiqian Pi) down to the vocational (zhuanke) batch, so the data is fairly comprehensive.

All data was collected from the admissions websites of universities and other related sites. Because of the volume, I used ThreadPoolExecutor from the concurrent.futures module (part of the standard library since Python 3.2) to build a thread pool that runs multiple crawling tasks concurrently and speeds up the crawl.
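A minimal sketch of the thread-pool pattern follows. It assumes a hypothetical `fetch_page(url)` that downloads and parses one admissions page into a list of records. Here it is stubbed out so the example runs offline, and the URLs and `max_workers=8` are illustrative choices, not the author's settings.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url):
    """Hypothetical stand-in for downloading and parsing one admissions page.

    A real crawler would make an HTTP request here; that I/O-bound waiting
    is why threads speed things up. This stub returns one fake record.
    """
    return [{"url": url}]


def crawl_all(urls, max_workers=8):
    """Fetch every URL on a thread pool and flatten the per-page records."""
    records = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, even if pages finish out of order
        for page_records in pool.map(fetch_page, urls):
            records.extend(page_records)
    return records


if __name__ == "__main__":
    # Hypothetical URLs, one per year
    urls = [f"https://example.edu/admission/{year}" for year in range(2008, 2018)]
    print(len(crawl_all(urls)))  # 10: one stub record per page
```

Threads suit this job because crawling time is dominated by network waits, during which Python releases the GIL. CPU-heavy parsing would benefit more from a `ProcessPoolExecutor`.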

The database is PostgreSQL (PgSQL), often called the world's most advanced open-source database. All the data lives in a database named gaokao, which contains two tables: university (university admission scores) and province (provincial batch cut-off lines).

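The university table has the following columns:

School name (name)
Students' home province (stu)
Arts or sciences (stu_wl)
Admission batch (pc)
Year (year)
Average admission score

The province table has the following columns:

Candidates' province (stu_loc)
Year (year)
Arts or sciences (stu_wl)
Admission batch (pc)
Minimum control line for the batch (control)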


With 300,000 records crawled concurrently from many sites, data collisions are unavoidable. Before insertion, incomplete records are filtered out first. For example, if a record bound for the university table is missing its pc field, it is discarded.

The most serious problem is duplicate data. My solution is to first query whether the record to be inserted already exists. The primary key of the university table is (name, stu, stu_wl, pc, year), because in practice a university has only one average admission score per year for a given home province, track and batch. If the record does not exist yet, I execute the insert and then commit the transaction.
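The filter-then-check-then-insert logic can be sketched as below. To stay runnable without a PostgreSQL server, the sketch uses Python's built-in sqlite3 as a stand-in; with psycopg2 the placeholders would be `%s` instead of `?`. The `avg_score` column name, the helper name `insert_if_new` and the sample row are hypothetical, since the original gives only the primary-key columns.

```python
import sqlite3

# Primary-key columns of the university table, as described above
KEY_FIELDS = ("name", "stu", "stu_wl", "pc", "year")


def insert_if_new(conn, record):
    """Insert one crawled record unless it is incomplete or already stored.

    Returns True if a row was inserted, False if it was skipped.
    """
    # Discard incomplete records, e.g. one missing its pc (batch) field
    if any(record.get(f) in (None, "") for f in KEY_FIELDS + ("avg_score",)):
        return False
    key = tuple(record[f] for f in KEY_FIELDS)
    cur = conn.cursor()
    # Query first: does a row with this primary key already exist?
    cur.execute(
        "SELECT 1 FROM university "
        "WHERE name=? AND stu=? AND stu_wl=? AND pc=? AND year=?",
        key,
    )
    if cur.fetchone() is not None:
        return False
    cur.execute(
        "INSERT INTO university (name, stu, stu_wl, pc, year, avg_score) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        key + (record["avg_score"],),
    )
    conn.commit()  # commit the transaction
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE university (name TEXT, stu TEXT, stu_wl TEXT, pc TEXT, "
    "year INTEGER, avg_score REAL, PRIMARY KEY (name, stu, stu_wl, pc, year))"
)
# Hypothetical placeholder row, not real admission data
row = {"name": "Example University", "stu": "Shanghai", "stu_wl": "science",
       "pc": "batch 1", "year": 2017, "avg_score": 560}
print(insert_if_new(conn, row))                  # True: new row
print(insert_if_new(conn, row))                  # False: duplicate
print(insert_if_new(conn, {**row, "pc": None}))  # False: incomplete
```

With several threads writing at once, a separate SELECT and INSERT can still race. In PostgreSQL 9.5 and later, `INSERT ... ON CONFLICT DO NOTHING` against the primary key does the same de-duplication in a single statement.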

Building the backend

With the 300,000 records in hand, I planned to build the backend with Flask + PgSQL and deployed it on an Alibaba Cloud server. The mini program client worked when debugged in the WeChat developer tools, but publishing it ran into a big problem. A live mini program cannot reach its backend through a bare IP address; it must go through a domain name that has completed ICP filing. Buying a domain is easy, but the filing takes more than a week, and there were fewer than five days left before the gaokao. Just when I was out of options, I happened to come across Mini Program Cloud Development. The official introduction reads:

Developers can use Cloud Development to build WeChat mini programs and mini games and use cloud capabilities without having to set up servers.

Cloud Development gives developers complete native cloud support and WeChat service support, playing down the concepts of backend and operations. Without building a server, developers can implement their core business with the APIs the platform provides and launch and iterate quickly. This capability is compatible with the cloud services developers already use; they are not mutually exclusive.

In other words, once the data is imported into the mini program's built-in backend, the mini program can access it through the platform's APIs. I had known about third-party backend services such as LeanCloud and Bmob, but didn't expect mini programs to integrate these features themselves now. I have to hand it to Tencent.

So the next task was mainly to import the data into the mini program's cloud backend, which supports importing files in JSON or CSV format. I wrote a script to export the data from the local database to a JSON file:

import psycopg2
import json

# Connect to the PostgreSQL database; the password is hidden for privacy
conn = psycopg2.connect(database="gaokao", user="postgres", password="*******", host="", port="5432")
cur = conn.cursor()

cur.execute('select stu_loc,year,stu_wl,pc,control from province')
result = []
query_res = cur.fetchall()
for i in query_res:
    item = {}
    item['stu_loc'] = i[0]
    item['year'] = i[1]
    item['wl'] = i[2]
    item['pc'] = i[3]
    item['score'] = i[4]
    # Collect each row as a dict
    result.append(item)
cur.close()
conn.close()

# indent=2 controls the indentation of the JSON output
# ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
with open("province.json", 'w', encoding="utf-8") as f:
    f.write(json.dumps(result, indent=2, ensure_ascii=False))

There is one more pitfall to explain: the JSON format the mini program backend requires differs slightly from JSON as we usually know it. The content must not be wrapped in [ and ], and the data items, each enclosed in { and }, must not be separated by commas.

I opened the original JSON file in Notepad++ and fixed it with the Replace function: replace [ and ] with spaces, and replace }, with }.
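Instead of editing the file by hand, the export step could write the import format directly. The minimal sketch below assumes the cloud console accepts one JSON object per line, with no enclosing brackets and no separating commas, as in the manual edits above but without the indentation. `to_json_lines` and `write_json_lines` are hypothetical helper names.

```python
import json


def to_json_lines(records):
    """Serialize records as one JSON object per line: no [ ], no commas."""
    # ensure_ascii=False keeps Chinese text readable in the output
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def write_json_lines(records, path):
    """Write the line-per-record form to a UTF-8 file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_json_lines(records))


# Placeholder records with the same keys as the export script
sample = [
    {"stu_loc": "示例省", "year": 2018, "wl": "理科", "pc": "本科", "score": 500},
    {"stu_loc": "示例省", "year": 2018, "wl": "文科", "pc": "本科", "score": 480},
]
print(to_json_lines(sample), end="")
```

In the export script, the final `with open(...)` block could then become `write_json_lines(result, "province.json")`.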

After the modified JSON file is imported through the mini program's cloud backend, the backend setup is essentially complete.

Writing the mini program client

On writing the mini program client, I'll mainly share two experiences. The first is page layout, for example an interface with several drop-down filter boxes (region, college and so on).

At first I had no idea how to achieve this effect; in the end the idea came from custom modal pop-ups. Initially, the layout areas belonging to the region and college drop-down boxes are hidden in the wxml file with hidden="true". When the user taps the region or college drop-down, its hidden attribute is set to false. If any other drop-down's layout currently has hidden set to false, those are all set back to true at the same time, so the other layouts are hidden. These true/false values have to be changed dynamically in JS with setData(), which passes the modified data from the logic layer to the view layer for rendering.
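The "show the tapped panel, hide all the others" rule is a small state update. The sketch below models it in Python as a pure function over a dict of `hidden` flags, the kind of object one would pass to `setData()`. The panel keys and the name `show_only` are hypothetical.

```python
def show_only(hidden_flags, panel):
    """Show `panel` and hide every other drop-down panel.

    `hidden_flags` maps panel name -> hidden (True means not shown).
    Returns a new dict rather than mutating the input.
    """
    if panel not in hidden_flags:
        raise KeyError(panel)
    new_flags = {name: True for name in hidden_flags}  # hide everything
    new_flags[panel] = False  # then reveal the tapped panel
    return new_flags


flags = {"region": True, "college": True, "batch": True}
flags = show_only(flags, "region")
print(flags)  # {'region': False, 'college': True, 'batch': True}
flags = show_only(flags, "college")
print(flags)  # {'region': True, 'college': False, 'batch': True}
```

Recomputing every flag on each tap, instead of tracking which panel was open before, keeps the state consistent no matter how many drop-downs the page has.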

The second is a limitation of native cloud development: a single query from the mini program client returns at most 20 records. Getting all matching results at once means solving two problems.

The first is the obvious one. After fetching the first 20 records, the second query skips 20 and takes the next 20, the third skips 40 and takes 20, and so on.

The second problem is more serious: the query API delivers its results through an asynchronous callback. To be sure the data is complete, the second query has to be written inside the callback of the first, the third inside the callback of the second, and so on. Yet you cannot know in advance how many queries are needed, and therefore how many levels of nesting to write. On top of that, shared variables keep getting overwritten. This is what is known as callback hell. The fix is to make the asynchronous calls read like synchronous code, using async/await, as follows:

First, import runtime.js in the JS file of the page that needs this feature, and put the runtime.js file in the corresponding folder:

const regeneratorRuntime = require("../runtime");

runtime.js Download: https: //

The following code, modeled on the official example, implements the business logic:

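const db = wx.cloud.database()
// A single client-side query returns at most 20 records
const MAX_LIMIT = 20

// A method inside Page({ ... }); the method name is illustrative.
// name (province) and pici (batch) are the user's selections.
async queryScores(name, pici) {
  // Querying may be slow, so it is best to show a loading animation
  wx.showLoading({
    title: '加载中', // "Loading"
  })
  // Count the matching records first
  const countResult = await db.collection('province').where({
    stu_loc: name,
    pc: pici,
  }).count()
  const total = countResult.total
  // Number of 20-record batches needed to fetch everything
  const batchTimes = Math.ceil(total / MAX_LIMIT)
  // Array that accumulates the results of all read operations
  const newResult = []
  for (let i = 0; i < batchTimes; i++) {
    // await runs the batches one after another without nested callbacks
    const res = await db.collection('province').where({
      stu_loc: name,
      pc: pici,
    }).skip(i * MAX_LIMIT).limit(MAX_LIMIT).get()
    for (let j = 0; j < res.data.length; j++) {
      const item = {}
      item.code = i * MAX_LIMIT + j
      item.year = res.data[j].year
      item.wl = res.data[j].wl
      item.pc = res.data[j].pc
      item.score = res.data[j].score
      newResult.push(item)
    }
  }
  if (newResult.length != 0) {
    this.setData({
      hasdataFlag: true,
      resultData: newResult
    })
  } else {
    this.setData({
      hasdataFlag: false,
      resultData: newResult
    })
  }
  // Hide the loading animation
  wx.hideLoading()
},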

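The skip/limit arithmetic and the await-in-a-loop structure do not depend on the platform. The Python asyncio sketch below runs against a hypothetical in-memory `FakeCollection`; its `get(skip, limit)` caps every response at 20 records to mimic the client-side limit, so it is an illustration, not the WeChat API.

```python
import asyncio
import math

MAX_LIMIT = 20  # maximum number of records one query may return


class FakeCollection:
    """Hypothetical stand-in for a cloud collection with a 20-record cap."""

    def __init__(self, rows):
        self.rows = rows

    async def count(self):
        await asyncio.sleep(0)  # pretend this is a network round trip
        return len(self.rows)

    async def get(self, skip, limit):
        await asyncio.sleep(0)
        limit = min(limit, MAX_LIMIT)  # never return more than 20 records
        return self.rows[skip:skip + limit]


async def fetch_all(collection):
    """Page through the whole collection with sequential awaits."""
    total = await collection.count()
    batch_times = math.ceil(total / MAX_LIMIT)
    results = []
    for i in range(batch_times):
        # Batch i skips the first i * 20 rows; no nested callbacks needed
        results.extend(await collection.get(skip=i * MAX_LIMIT, limit=MAX_LIMIT))
    return results


all_rows = asyncio.run(fetch_all(FakeCollection(list(range(45)))))
print(len(all_rows))  # 45, fetched in 3 batches (20 + 20 + 5)
```

Each batch's offset depends only on its index, so the batches could also be issued concurrently: `asyncio.gather` in Python, or collecting the promises and awaiting `Promise.all` in JavaScript. That trades more simultaneous requests for lower total latency.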
Those are some of the ideas and lessons from this project. Criticism is welcome.


