An alternative approach to daily-active-user rankings on large data sets

1. Data analysis

Note: the total star count is not uploaded yet, so it is left out of the ranking for now.

String iosSql = "select " +
        "uid," +
        "level," +
        "row_number() over(order by level desc) rank " +
        "from  " +
        "(select " +
        "uid, " +
        "max(str2Num(params['maxlevel'])) level  " +
        "from bricksdb.s3_stage   " +
        "WHERE ym='"+ym+"' and day='"+day+"'  " +
        "and  appname='iOS_BricksGame' "+
        "group by uid   " +
        "order by max(str2Num(params['maxlevel'])) desc  " +
        "limit 3000) t order by level desc ";
spark.sql(iosSql).write().mode(SaveMode.Overwrite).jdbc(JdbcUtils.URL, "ios_rank_level", JdbcUtils.getProJdbcInfo());

2. URL path and parameters

http://{ip}:{port}/u3d/getRank
Content-Type:application/json
uid:4e6a1dafa2ae1674
version:100067
Client-Name:BricksGame|iOS_BricksGame|CN_BricksGame|CN_iOS_BricksGame

Format of each rankList entry in the response:
uid+"#"+name+"#"+photo+"#"+photoFrame+"#"+level+"#"+starts+"#"+rank

That is: user ID + "#" + user name + "#" + avatar + "#" + avatar frame + "#" + level + "#" + stars + "#" + rank

{
    "msg": "Operation successful!",
    "code": 1,
    "data": {
        "rankList": [
            "151a1a0326f5477e#151a1a0326f5477e#1#1#5000#0#1",
            "d7c43ad3a3dee434#d7c43ad3a3dee434#1#1#5000#0#2",
            "8c7081e07f3fa45e#8c7081e07f3fa45e#1#1#5000#0#3",
            "18be19e5d7feb0fa#18be19e5d7feb0fa#1#1#5000#0#4",
            "44250e4332eb8d79#44250e4332eb8d79#1#1#5000#0#5",
            			...
            "2895dd9c50cf1068#2895dd9c50cf1068#1#1#4623#0#95",
            "279b94e344635720#279b94e344635720#1#1#4622#0#96",
            "314291b1b724d9bd#314291b1b724d9bd#1#1#4612#0#97",
            "ac461aee63589dc9#ac461aee63589dc9#1#1#4611#0#98",
            "d6a0b17fc0828aca#d6a0b17fc0828aca#1#1#4610#0#99",
            "9c0bb22866ee2f68#9c0bb22866ee2f68#1#1#4606#0#100"
        ]
    }
}
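Since every entry packs seven fields into one "#"-delimited string, a client has to split it before use. The following is a minimal sketch of such a parser; the names `RankEntry` and `parse_rank_entry` are hypothetical, and it assumes that no field (the user name in particular) contains a "#" itself.

```python
from typing import NamedTuple


class RankEntry(NamedTuple):
    """One parsed rankList entry (hypothetical helper type)."""
    uid: str
    name: str
    photo: str
    photo_frame: str
    level: int
    stars: int
    rank: int


def parse_rank_entry(raw: str) -> RankEntry:
    """Split one '#'-delimited rankList string into typed fields."""
    parts = raw.split("#")
    # Assumption: exactly seven fields, none of which contains '#'.
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {raw!r}")
    uid, name, photo, photo_frame, level, stars, rank = parts
    return RankEntry(uid, name, photo, photo_frame, int(level), int(stars), int(rank))


entry = parse_rank_entry("151a1a0326f5477e#151a1a0326f5477e#1#1#5000#0#1")
print(entry.uid, entry.level, entry.rank)
```

If user names may contain "#", a different delimiter or a structured JSON object per entry would be safer.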

3. Front-end logic

1. The client fetches the top 100 entries and displays them. (Assume below that the player ranked 100th is at level 3500.)

2. Obtaining the player's own rank:

(a) If the local level is greater than or equal to 3500, traverse the top 100 entries. If the player is in the list, display the player's entry directly (local data takes precedence); if the player is not in the list, remove the last entry and insert the player's own entry into the top 100.

(b) If the local level is less than 3500, the rank is estimated instead. The relationship between level and rank for the top 10,000 players is fitted with machine learning (the server supplies the formula), and the client uses this relationship to display the player's rank.
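The two cases can be sketched as a single client-side function. This is an illustration only: `resolve_player_rank` and its parameters are hypothetical names, entries are reduced to `(uid, level)` pairs, and the insert position for a player missing from the list (after every entry with a level at least as high) is an assumption, since the text does not say where the entry goes.

```python
from typing import Callable, List, Optional, Tuple

Entry = Tuple[str, int]  # (uid, level); the list is sorted by level, highest first


def resolve_player_rank(
    top100: List[Entry],
    uid: str,
    local_level: int,
    estimate_rank: Callable[[int], Optional[int]],
) -> Tuple[List[Entry], Optional[int]]:
    """Return the list to display and the player's rank (None means not ranked)."""
    board = list(top100)
    if not board:
        return board, estimate_rank(local_level)

    cutoff = board[-1][1]  # level of the last listed player, e.g. 3500
    if local_level < cutoff:
        # Case (b): below the cutoff, use the server-supplied formula.
        return board, estimate_rank(local_level)

    # Case (a): at or above the cutoff, look for the player in the list.
    for i, (entry_uid, _) in enumerate(board):
        if entry_uid == uid:
            board[i] = (uid, local_level)  # local data takes precedence
            return board, i + 1

    board.pop()  # not in the list: drop the last entry to make room
    # Assumption: place the player after every entry with level >= local level.
    pos = sum(1 for _, lvl in board if lvl >= local_level)
    board.insert(pos, (uid, local_level))
    return board, pos + 1


demo = [("a", 5000), ("b", 4000), ("c", 3500)]
print(resolve_player_rank(demo, "me", 4200, lambda level: None))
```

Passing the estimator in as a callable keeps the list handling independent of whichever formula the server currently supplies.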

4. Relationship between level and ranking

[Figure: scatter plot of level against rank; the original image is unavailable.]

The plot shows that the curve falls into two segments: the slope is steep over the range 1-400 and much gentler over the remaining range 400-5000.

5. Machine learning algorithm evaluation

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
import pandas as pd
import matplotlib.pyplot as plt


def getRank():
    """
    Compare several regression algorithms on the level/rank data.
    """
    data = pd.read_csv("./data/android.csv")
    # data = data.query("rank < 10001")
    # Almost every row is used for training; the tiny test split is only a spot check.
    x_train, x_test, y_train, y_test = train_test_split(data[['level']], data[['rank']], test_size=0.00001)

    # Scatter plot of level against rank
    plt.figure(figsize=(20, 8), dpi=80)
    plt.scatter(data[['level']], data[['rank']])
    plt.title("android levels and ranks")
    plt.xlabel('levels')
    plt.ylabel('ranks')
    plt.show()

    # Ordinary least squares (normal equation) fit and prediction
    lr = LinearRegression()
    lr.fit(x_train, y_train)
    print("Linear regression coefficient:", lr.coef_, "Intercept:", lr.intercept_)
    y_lr_predict = lr.predict(x_test)
    print("Linear regression predicted ranks for the test set:", y_lr_predict)
    print("Linear regression mean squared error:", mean_squared_error(y_test, y_lr_predict))

    # Ridge regression fit and prediction
    rd = Ridge(alpha=1.0)
    rd.fit(x_train, y_train)
    print("Ridge regression coefficient:", rd.coef_, "Intercept:", rd.intercept_)
    y_rd_predict = rd.predict(x_test)
    print("Ridge regression predicted ranks for the test set:", y_rd_predict)
    print("Ridge regression mean squared error:", mean_squared_error(y_test, y_rd_predict))


if __name__ == '__main__':
    getRank()
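The script above fits one straight line to all of the data, while the formulas in the conclusion are given per level range. One plausible way to obtain per-range coefficients is to fit a separate least-squares line on each range, sketched below in plain Python. The function names, the range boundaries, and the demo data are all made up for illustration; this is not necessarily how the published coefficients were derived.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_segments(levels, ranks, boundaries):
    """Fit rank against level separately on each (low, high] level range."""
    fits = []
    for low, high in zip(boundaries[:-1], boundaries[1:]):
        pairs = [(l, r) for l, r in zip(levels, ranks) if low < l <= high]
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        if len(set(xs)) < 2:
            continue  # a line needs at least two distinct levels
        slope, intercept = fit_line(xs, ys)
        fits.append((low, high, slope, intercept))
    return fits


# Synthetic, made-up data used only to exercise the function.
demo_levels = list(range(1001, 3001))
demo_ranks = [-10.0 * l + 21000 if l <= 2000 else -1.0 * l + 3000 for l in demo_levels]
for low, high, slope, intercept in fit_segments(demo_levels, demo_ranks, [1000, 2000, 3000]):
    print(low, high, round(slope, 3), round(intercept, 1))
```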

6. Conclusion

The relationship between level and rank on Android is as follows:

-0.242474 * Rank + 3850 = Level (100 < Rank <= 10000)

Inverting this gives roughly 1400 < Level < 3825.

-17.31919503 * Level + 43013 = Rank (1500 < Level <= 2000)

-6.24650545 * Level + 20734 = Rank (2000 < Level <= 3000)

-2.1950991 * Level + 8891 = Rank (3000 < Level <= 4000)

-0.51153987 * Level + 2457 = Rank (4000 < Level <= 4600)

Note: a player whose Level is below 1500 is not on the list. For any other Level, substitute it into the piecewise function and round the result. If the rank is greater than 10000, the player is shown as not on the list; otherwise the rank is displayed.

The relationship between level and rank on iOS is as follows:

-0.10760038 * Rank + 2073 = Level (100 < Rank <= 10000)

Inverting this gives roughly 1000 < Level < 2000.

-10.40723106 * Level + 21202 = Rank (1000 < Level <= 2000)

-1.55131557 * Level + 4769 = Rank (2000 < Level <= 3000)

-0.48629439 * Level + 1785 = Rank (3000 < Level < 3500)

Note: a player whose Level is below 1000 is not on the list. For any other Level, substitute it into the piecewise function and round the result. If the rank is greater than 10000, the player is shown as not on the list; otherwise the rank is displayed.
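The piecewise formulas and the display rule can be combined into one lookup, sketched below. The coefficients are copied from the formulas above; the names `ANDROID_SEGMENTS`, `IOS_SEGMENTS`, and `estimate_rank` are hypothetical, levels are assumed to be integers, and a level outside every listed range is treated as not on the list.

```python
from typing import List, Optional, Tuple

# (low, high, slope, intercept): Rank = slope * Level + intercept for low < Level <= high
Segment = Tuple[int, int, float, float]

ANDROID_SEGMENTS: List[Segment] = [
    (1500, 2000, -17.31919503, 43013),
    (2000, 3000, -6.24650545, 20734),
    (3000, 4000, -2.1950991, 8891),
    (4000, 4600, -0.51153987, 2457),
]

IOS_SEGMENTS: List[Segment] = [
    (1000, 2000, -10.40723106, 21202),
    (2000, 3000, -1.55131557, 4769),
    # The text gives this range as Level < 3500; with integer levels
    # (an assumption) that is the same as Level <= 3499.
    (3000, 3499, -0.48629439, 1785),
]

MAX_RANK = 10000  # ranks beyond this are shown as "not on the list"


def estimate_rank(level: int, segments: List[Segment]) -> Optional[int]:
    """Return the estimated rank, or None when the player is not on the list."""
    for low, high, slope, intercept in segments:
        if low < level <= high:
            rank = round(slope * level + intercept)
            return rank if rank <= MAX_RANK else None
    return None  # level falls outside every fitted range


print(estimate_rank(2500, ANDROID_SEGMENTS))  # 5118
print(estimate_rank(1500, IOS_SEGMENTS))      # 5591
print(estimate_rank(1600, ANDROID_SEGMENTS))  # None: estimated rank exceeds 10000
```

Keeping the coefficients in a table rather than in branching code makes it straightforward to replace them when the server publishes a new fit.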

Posted by Eman on Sun, 15 May 2022 16:27:27 +0300