Big Data Fundamentals Review--Junior Year 1

1.Python

1. Data Type

1. Numbers

  • Integer: int
  • Floating point: float
  • Complex number: complex
  • Boolean: bool

2. String

  • String: str

3. Differences from Java

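  • Python 3 has three built-in numeric types: int, float, and complex (bool is a subclass of int). Python 2 also had a separate long type; in Python 3, int has arbitrary precision.

  • Java has eight primitive types: byte, short, int, long, float, double, char, and boolean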
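
A minimal sketch of the numeric types listed above, assuming Python 3. The variable names are made up for illustration.

```python
# Python 3 numeric types; the variable names are illustrative only
whole = 42
real = 3.14
comp = 2 + 3j
flag = True

type_names = [type(v).__name__ for v in (whole, real, comp, flag)]
print(type_names)  # ['int', 'float', 'complex', 'bool']

# int has arbitrary precision, so large values do not overflow into another type
big = 2 ** 100
print(type(big).__name__, big)

# bool is a subclass of int
print(isinstance(flag, int))  # True
```

Unlike Java, there is no separate char type: a single character is just a str of length 1.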

4. List

  • A list is an ordered sequence of python objects

  • list creation

    list1 = [1,2.0,'hello']
    
  • Accessing list values

    ## Using the list above as an example
    list1[0]
    ## Output: 1
    list1[len(list1)-1]   # same element as list1[-1]
    ## Output: 'hello'
    
  • List slicing

    list2 = [1,2.0,3,4,5,6]
    list2[2:-1]
    ## Output: [3, 4, 5]
    
  • Deleting a list element

    ## Using list2 = [1,2.0,3,4,5,6] as an example
    del list2[0]
    ## list2 is now [2.0, 3, 4, 5, 6]
    
  • Common methods

    ## Continuing with list2 = [2.0, 3, 4, 5, 6]
    ##	1. Delete the element at a given position
    del list2[0]
    ##	2. Return the number of occurrences of an element in the list
    list2.count(3)
    ##	3. Return the first position where the element appears (raises ValueError if it is absent)
    list2.index(4)
    ##	4. Append an element to the end of the list
    list2.append(1)
    ##	5. Remove and return the element at the specified position
    list2.pop(0)
    ##	6. Sort the list in place (the elements must be comparable with each other)
    list2.sort()
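
The list operations above can be tried in one run. This is only a sketch: the list `nums` and its values are made up, and each comment shows the state after that line.

```python
# Illustrative list; the values are made up for this sketch
nums = [3, 1, 2, 1]

nums.append(5)             # [3, 1, 2, 1, 5]
ones = nums.count(1)       # 2
first_one = nums.index(1)  # 1 (position of the first 1)
removed = nums.pop(0)      # removes and returns 3 -> [1, 2, 1, 5]
del nums[-1]               # [1, 2, 1]
nums.sort()                # [1, 1, 2]
tail = nums[1:]            # slice from position 1 to the end -> [1, 2]

print(nums, ones, first_one, removed, tail)
```

Note that sort() and append() change the list in place and return None, while slicing returns a new list.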
    

5. Tuples

  • Tuples are immutable: once created, they cannot be changed

  • Tuple creation

    ##	1. Create a tuple with a pair of parentheses
    t = (1,2,3,4,5)
    ##	2. With two or more elements, the parentheses can be omitted
    t = 1,2,3,4,5
    ##	3. A one-element tuple needs a trailing comma; (1) alone is just the integer 1
    single = (1,)
    
  • Accessing tuple values

    ## Using t = (1,2,3,4,5) as an example
    t[0]
    ## Output: 1
    
  • Tuple slicing

    ## Using t = (1,2,3,4,5) as an example
    t[1:3]
    ## Output: (2, 3)
    
  • Because tuples are immutable, they have no modifying methods such as insert or append
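
A short sketch of the tuple rules above, with illustrative variable names. It shows the one-element case, slicing, and what happens when you try to modify a tuple.

```python
single = (1,)       # the trailing comma makes a one-element tuple
not_tuple = (1)     # without the comma this is just the integer 1
t = 1, 2, 3, 4, 5   # parentheses are optional with several elements

print(type(single).__name__, type(not_tuple).__name__)  # tuple int
print(t[1:3])  # (2, 3)

# Assigning to an element raises TypeError because tuples are immutable
try:
    t[0] = 99
except TypeError as exc:
    error_message = str(exc)
    print("cannot modify:", error_message)

# Unpacking reads the values out instead
first, *rest = t
print(first, rest)  # 1 [2, 3, 4, 5]
```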

6. Dictionary

  • A dictionary is a data structure made up of key-value pairs

  • Note: keys are unique. If the same key is assigned more than once, the last value overwrites the earlier ones, so a key never appears twice. Since Python 3.7, dictionaries preserve insertion order; in older versions they were unordered.

  • dictionary creation

    ##	1. Create an empty dictionary
    a = {}
    ##	2. Direct assignment
    b = {'one':1,'two':2}
    
  • Insert an element into a dictionary by key

    ## Using a above as an example
    a['one'] = 1
    ## Result: a changes from an empty dictionary to a dictionary with one element: {'one': 1}
    
  • The same syntax assigns a new value to an existing key

  • Common methods

    ## Using b above as an example
    ##	1. keys() method
    b.keys()
    ## Output: dict_keys(['one', 'two'])
    
    ##	2. values() method
    b.values()
    ## Output: dict_values([1, 2])
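
The dictionary behaviour described above can be checked with a small sketch, assuming Python 3.7 or later for the ordering. The dictionary `scores` and its contents are made up.

```python
# A repeated key in the literal keeps only the last value
scores = {'one': 1, 'two': 2, 'one': 10}
print(scores)  # {'one': 10, 'two': 2}

scores['three'] = 3   # new key: inserted
scores['two'] = 20    # existing key: value replaced, position unchanged

keys = list(scores.keys())      # ['one', 'two', 'three']
values = list(scores.values())  # [10, 20, 3]

# get() returns a default instead of raising KeyError for a missing key
missing = scores.get('four', 0)

for key, value in scores.items():
    print(key, value)
```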
    

2. Loops

1.for loop

  • Fixed number of iterations

  • for i in range(5):
        print(i)
    ## Output
    0
    1
    2
    3
    4
    
  • Traversing a sequence

  • languages = ["C", "C++", "Perl", "Python"]
    for x in languages:
        print(x)
    """
    Output:
    C
    C++
    Perl
    Python
    """
    

2.while loop

  • n = 100

    total = 0   # named total to avoid shadowing the built-in sum()
    counter = 1
    while counter <= n:
        total = total + counter
        counter += 1
    print("The sum of 1 to %d is: %d" % (n, total))
    
    ## Output: The sum of 1 to 100 is: 5050
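
As a rough comparison of the two loop styles, the sketch below numbers a list with enumerate() and computes the same 1-to-100 total with a for loop, then cross-checks it two ways. The names `numbered`, `builtin_total` and `formula_total` are made up for this example.

```python
languages = ["C", "C++", "Perl", "Python"]

# enumerate() yields (position, item) pairs, so no manual counter is needed
numbered = ["%d. %s" % (position, name)
            for position, name in enumerate(languages, start=1)]
print("\n".join(numbered))

# The 1-to-n total as a for loop; range(1, n + 1) includes n itself
n = 100
total = 0
for counter in range(1, n + 1):
    total += counter

# Two cross-checks: the built-in sum() and the closed form n(n+1)/2
builtin_total = sum(range(1, n + 1))
formula_total = n * (n + 1) // 2
print("The sum of 1 to %d is: %d" % (n, total))
```

A for loop fits when the number of iterations is known in advance; a while loop fits when the stopping condition depends on what happens inside the loop.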
    

3. Upright and inverted triangles (multiplication table source code)

  • Right-aligned

  • # Print the 9x9 multiplication table as an upper-right triangle
    for i in range(1,10):
        for k in range(1,i):
            print(" " * 9, end="")    # pad by one cell: 8 characters plus the separator
        for j in range(i,10):
            print("%d*%d = %2d" % (i,j,i*j),end=" ")
        print(" ")
            
    # Print the 9x9 multiplication table as a lower-right triangle
    for i in range(1,10):
        for k in range(1,10-i):
            print(" " * 9, end="")    # pad by one cell: 8 characters plus the separator
        for j in range(1,i+1):
            print("%d*%d = %2d" % (i,j,i*j),end=" ")
        print(" ")
    
  • Left-aligned

  • # Print the 9x9 multiplication table as a lower-left triangle
    for i in range(1,10):
        for j in range(1,i+1):
            print("%d*%d = %2d" % (i,j,i*j),end=" ")
        print(" ")
        
    # Print the 9x9 multiplication table as an upper-left triangle
    for i in range(1,10):
        for j in range(i,10):
            print("%d*%d = %2d" % (i,j,i*j),end=" ")
        print(" ")
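
One way to keep the triangle variants straight is to build each row as a string and change only the column range and the padding. This is a sketch with made-up helper names (`table_row`, `lower_left`, `upper_right`), not the original course code.

```python
def table_row(i, columns):
    """Format the cells i*j for every j in columns, separated by one space."""
    return " ".join("%d*%d = %2d" % (i, j, i * j) for j in columns)

# Each cell such as "1*1 =  1" is 8 characters wide, plus 1 separator
CELL_WIDTH = len("%d*%d = %2d" % (1, 1, 1)) + 1

def lower_left():
    # Row i holds columns 1..i, with no padding
    return [table_row(i, range(1, i + 1)) for i in range(1, 10)]

def upper_right():
    # Row i holds columns i..9, shifted right by i-1 cells
    return [" " * (CELL_WIDTH * (i - 1)) + table_row(i, range(i, 10))
            for i in range(1, 10)]

print("\n".join(lower_left()))
print("\n".join(upper_right()))
```

The other two shapes follow the same pattern: upper-left uses columns i..9 without padding, and lower-right uses columns 1..i with 9-i cells of padding.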
    

4. Points to note

  • Python syntax
  • Indentation is strict: it defines the code blocks
  • A loop header ends with a colon

2.Pandas

1. Basic Concepts

  • Series

    1. Series: a Series, also known as a sequence, stores one row or one column of data together with the index associated with it

      Series([data 1, data 2, ...], index=[index 1, index 2, ...])
      
    2. Note the following points

      • A Series is a one-dimensional, array-like object
      • Its values can be of any data type
      • It has an index, similar to the keys of a dictionary
      • A Series behaves like both an array and a dictionary
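
A small sketch of the "array and dictionary" behaviour of a Series, assuming pandas is installed. The data and the name `ages` are illustrative.

```python
from pandas import Series

# Illustrative data: values with string labels as the index
ages = Series([26, 29, 24], index=['Ken', 'Jerry', 'Ben'])

print(ages['Jerry'])    # dictionary-style lookup by label: 29
print(ages.iloc[0])     # array-style lookup by position: 26
print('Ben' in ages)    # membership is tested against the index: True

# Array-style vectorised comparison, used as a filter
older = ages[ages > 25]
print(list(older.index))  # ['Ken', 'Jerry']
```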
  • DataFrame

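    1. A DataFrame is a data structure that stores data in multiple rows and columns; it acts as a container for Series

      from pandas import Series
      from pandas import DataFrame
      # Give every Series the same index; otherwise the rows are aligned by label and missing labels become NaN
      idx = [1, 2, 3]
      df = DataFrame({'age':Series([26,29,24],index=idx),'name':Series(['Ken','Jerry','Ben'],index=idx)})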
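
To see how a DataFrame contains Series, the sketch below builds the same small table from plain lists (so the row labels come only from index=) and then pulls out a column, a row, and a filtered subset. The variable names are illustrative.

```python
from pandas import DataFrame

# Plain lists take their row labels from index=, so nothing needs aligning
df = DataFrame({'age': [26, 29, 24], 'name': ['Ken', 'Jerry', 'Ben']},
               index=[1, 2, 3])

age_column = df['age']        # one column is a Series
row = df.loc[2]               # one row, selected by index label
adults = df[df['age'] > 25]   # rows where the condition holds

print(df.shape)               # (3, 2)
print(row['name'])            # Jerry
print(list(adults['name']))   # ['Ken', 'Jerry']
```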
      

2. Group analysis

  • Group analysis: a method that divides the data into groups according to a grouping field and then compares the groups to analyze the differences between them

  • Commonly used statistics are count, sum, and mean

  • Common form

    df.groupby(by=['Category 1', 'Category 2', ...])['Column to aggregate'].agg(column_alias_1=statistical_function_1, column_alias_2=statistical_function_2)
    
    # Named aggregation (pandas 0.25 and later); the older dict-of-aliases form is rejected by pandas 1.0 and later
    df.groupby(by=['class', 'gender'])['military training'].agg(total_score='sum', number_of_people='size', average_value='mean', variance='var', standard_deviation='std', highest_score='max', lowest_score='min')
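
A runnable sketch of the grouping pattern, assuming pandas 0.25 or later. The table, its column names and the scores are made up for illustration and are not the course data.

```python
import pandas as pd

# Made-up scores for illustration only
df = pd.DataFrame({
    'class': ['A', 'A', 'B', 'B', 'B'],
    'gender': ['F', 'M', 'F', 'M', 'M'],
    'score': [90, 80, 70, 60, 80],
})

# Group by two fields, then compute several named statistics on one column
summary = df.groupby(by=['class', 'gender'])['score'].agg(
    total='sum', count='size', average='mean', highest='max', lowest='min')

print(summary)
# The ('B', 'M') group holds 60 and 80: total 140, count 2, average 70.0
print(summary.loc[('B', 'M')])
```

The result is indexed by the grouping fields, so a single group is read with summary.loc[(class, gender)].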
    


3. Data visualization

  • Scatter chart: a scatter chart plots one variable on the horizontal axis and another on the vertical axis, and uses the distribution of the points to show the relationship between the two variables. The relevant calls are:

    import matplotlib.pyplot as plt
    plt.plot(x,y,'.',color=(r,g,b))
    plt.xlabel('x axis coordinates')
    plt.ylabel('y axis coordinates')
    plt.grid(True)
    # x and y are the sequences of x-axis and y-axis values. The marker '.' draws small points ('o' draws larger ones).
    # color sets the point color, either as a color name or letter, or as an RGB tuple (red, green, blue) with each component between 0 and 1.
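
A self-contained sketch of the scatter calls above, assuming matplotlib is installed. The data points are made up, and the Agg backend and in-memory buffer are assumptions chosen so the example runs without a display; in an interactive session plt.show() would be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Made-up data points for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# 'o' draws large points; the RGB components must each lie between 0 and 1
lines = plt.plot(x, y, 'o', color=(0.2, 0.4, 0.8))
plt.xlabel('x axis coordinates')
plt.ylabel('y axis coordinates')
plt.grid(True)

# Render to an in-memory PNG instead of opening a window
buffer = io.BytesIO()
plt.savefig(buffer, format='png')
```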
    
  • Line chart: P134

3. Bayesian

Omitted.

Posted by jsbrown on Thu, 05 May 2022 06:29:28 +0300