Categories
Uncategorized

Script code obfuscation -Python articles -pyminifier (1)

Foreword

Recent studies a bit confusing method of scripting languages, such as python, javascript and so on. Scripting languages ​​are dynamic languages, most of the code can not be directly translated into binary machine code, source code release scripts basically the equivalent of exposure, which for some commercial applications is unacceptable. So the script code reinforcement, become first choice for many applications. A measure to reinforce the code is code obfuscation, increasing the difficulty of reading the code reverse logic of personnel, time delay being cracked.

Today, explain the confusion method Python code, Python code is generally used as a web, provides a service interface, but there are some desktop applications, this part of the code needs to confusion protection. To an open source project pyminifier (https://github.com/qiyeboy/pyminifier) ​​method to illustrate the technique of confusion, this project has been four years did not update, some bug, but still we can learn and get started.

Project structure

Framework Details:

analyze.py - 用于分析Python代码
compression.py - 使用压缩算法压缩代码
minification.py - 用于简化Python代码
obfuscate.py - 用于混淆Python 代码
token_utils.py - 用于收集Python Token

  

From the project code, you can see confusion pyminifier method is based on the Token, which is based on lexical analysis, if we confuse done before, then it should belong to the primary program confusing, because such confusion does not modify the code of the original logical structure.

Token extract

Token Python language how to extract it? Python provides a special package lexical analysis: tokenize. Is simple to use, in token_utils.py code as follows:

def listified_tokenizer(source):

"""Tokenizes *source* and returns the tokens as a list of lists."""
     io_obj = io.StringIO(source)
     return [list(a) for a in tokenize.generate_tokens(io_obj.readline)]

First, read the source file and produces a token list by tokenize.generate_tokens. It will save up to extract token of this function, and then let him extract himself, look at the structure token list.

[[1, 'def', (1, 0), (1, 3), 'def listified_tokenizer(source):\n'],
[1, 'listified_tokenizer', (1, 4), (1, 23), 'def listified_tokenizer(source):\n'],
[53, '(', (1, 23), (1, 24), 'def listified_tokenizer(source):\n'],
[1, 'source', (1, 24), (1, 30), 'def listified_tokenizer(source):\n'],
[53, ')', (1, 30), (1, 31), 'def listified_tokenizer(source):\n'],
[53, ':', (1, 31), (1, 32), 'def listified_tokenizer(source):\n'],
[4, '\n', (1, 32), (1, 33), 'def listified_tokenizer(source):\n'],
......

 

Each Token corresponds to a list, in a first row [1, ‘def’, (1,0), (1,3), ‘def listified_tokenizer (source): \ n’] will be explained as an example:

  1. 1 represents the token type

  2. def extracted token string

  3. (1, 0) represents the starting row and column token string

  4. (1, 3) represents the end of the row and column token string

  5. ‘Def listified_tokenizer (source): \ n’ represents the line where the

Restore Token Code

From the source file to extract token list, the list is how to restore the token source code? In fact, very simple, because there are a list of extracted token string information and location information, so the strings can be spliced.

def untokenize(tokens):
    """
    Converts the output of tokenize.generate_tokens back into a human-readable
    string (that doesn't contain oddly-placed whitespace everywhere).
    .. note::

        Unlike :meth:`tokenize.untokenize`, this function requires the 3rd and
        4th items in each token tuple (though we can use lists *or* tuples).
    """
    out = ""
    last_lineno = -1
    last_col = 0
    for tok in tokens:
        token_string = tok[1]
        start_line, start_col = tok[2]
        end_line, end_col = tok[3]
        # The following two conditionals preserve indentation:
        if start_line > last_lineno:
            last_col = 0
        if start_col > last_col and token_string != '\n':
            out += (" " * (start_col - last_col))
        out += token_string
        last_col = end_col
        last_lineno = end_line
    return out

Streamlining and compressed code

In pyminifier, there are two methods to reduce the Python code: is a streamlined manner, the other is to use a compression algorithm.

streamline

Is used in the minification.py streamlined manner, specific code as follows:

def minify(tokens, options):
    """
    Performs minification on *tokens* according to the values in *options*
    """
    # Remove comments
    remove_comments(tokens)
    # Remove docstrings
    remove_docstrings(tokens)
    result = token_utils.untokenize(tokens)
    # Minify our input script
    result = multiline_indicator.sub('', result)
    result = fix_empty_methods(result)
    result = join_multiline_pairs(result)
    result = join_multiline_pairs(result, '[]')
    result = join_multiline_pairs(result, '{}')
    result = remove_blank_lines(result)
    result = reduce_operators(result)
    result = dedent(result, use_tabs=options.tabs)
    return result 

The above code uses a total of nine kinds of methods to reduce the size of the script:

remove_comments

Removing the comments in the code, but there are two categories to be retained: a script interpreter path 2. Script Encoder

#!/usr/bin/env python
# -*- coding: utf-8 -*-

 

remove_docstrings

Doc remove specified content, example:

__doc__ = """\
Module for minification functions.

"""

 

fix_empty_methods

Modify empty function becomes pass

def myfunc():
'''This is just a placeholder function.'''

transform into:

def myfunc():pass

 

join_multiline_pairs

(1) first case:

test = (
"This is inside a multi-line pair of parentheses"
)

transform into:

test = ( "This is inside a multi-line pair of parentheses")

 

(2) The second situation:

test = [
"This is inside a multi-line pair of parentheses"
]

transform into:

test = [ "This is inside a multi-line pair of parentheses"]

  

(3) Third case:

test = {


"parentheses":"This is inside a multi-line pair of parentheses"


}

transform into:

test = { "parentheses":"This is inside a multi-line pair of parentheses"}

 

remove_blank_lines

Remove blank lines.

test = "foo"

 

test2 = "bar"

transform into:

test = "foo"
test2 = "bar"

 

reduce_operators

Remove spaces between operators.

def foo(foo, bar, blah):
    test = "This is a %s" % foo

change into:

def foo(foo,bar,blah):
    test="This is a %s"%foo

  

dedent

Alternatively indentation between codes, such as replacing a single space

def foo(bar):

    test = "This is a test"

change into:

def foo(bar):

 test = "This is a test"

 

compression

compression.py in this project, providing four code compression method in which three principles are the same, but the compression algorithm used is not the same.

bz2, gz, lzma compression implementation of the principle of

If a new 1.py, and save the following:

if __name__=="__main__":

    print(__name__)

In bz2 as an example, using the first algorithm bz2 compressed code, and then converted to base64 encoding.

code='''


if __name__=="__main__":


    print(__name__)


'''

import bz2,base64
compressed_source = bz2.compress(code.encode("utf-8"))
print(base64.b64encode(compressed_source).decode('utf-8'))

Output:

QlpoOTFBWSZTWdfQmoEAAAHbgEAQUGAAEgAAoyNUACAAIam1NNGgaaFNMjExMQ2Za0TTvJepAjgXb2pDBBGoliFIT04+LuSKcKEhr6E1Ag==

After the completion of code compression, how to implement it? In fact, it uses exec function / keyword. Good content encoding, decoding base64 first, then use bz2 decompression algorithm, and finally get real code and execute using exec

import bz2, base64
exec(bz2.decompress(base64.b64decode("QlpoOTFBWSZTWdfQmoEAAAHbgEAQUGAAEgAAoyNUACAAIam1NNGgaaFNMjExMQ2Za0TTvJepAjgXb2pDBBGoliFIT04+LuSKcKEhr6E1Ag==")))

 

This code represents the most primitive code, using gz, lzma compression, the zlib or bz2 package lzma can be replaced.

the implementation of the principle of zip

Many of my friends do not know, Python can run the zip file directly (special), mainly for the convenience of developers to manage and publish the project. Python can directly execute a __main__.py directory or zip file contains.

for example:

|—— ABC/

|—— A.py

|—— __main__.py

Sample code:

# A.py
def echo():

    print('ABC!')

# __main__.py
if __name == '__main__':

    import A
    A.echo()

  

Multiple files can be directly compressed into a zip file, zip file can be run directly. Directory Structure:

|—— ABC.zip/

|—— A.py

|—— __main__.py

  

Operation:

$ python ABC.zip

ABC!

To be continued. . .

At last

No public attention: seven nights Security blog

    [1] Reply to: receive Python data analysis tutorial spree

    [2] Reply to: receive a full tutorial Python Flask

    Reply [3]: to get a college course on machine learning

    [4] Reply to: receive reptile tutorial

    [5] Reply to: receive compiler theory tutorial

    Reply [6]: to get Penetration Testing Tutorial

    [7] Reply to: receive artificial intelligence mathematical foundation course

This article belongs to original works, welcome to reprint to share, modify the contents of the article is prohibited. Respect for the original, reproduced please specify from: seven-night story http://www.cnblogs.com/qiyeboy/

 

Leave a Reply