开发者

Simplifying for-if messes with better structure? [closed]

开发者 https://www.devze.com 2023-02-04 20:23 出处:网络
Closed. This question needs to be more focused. It is not currently accepting answers. Want to improve this question? Update the question so it focuses on one problem only by editing this
Closed. This question needs to be more focused. It is not currently accepting answers.

Want to improve this question? Update the question so it focuses on one problem only by editing this post.

Closed 3 years ago.

Improve this question

Please, move this question to Code Review -area. It is better suited there because I know the code below is junk and I want critical feedback to complete rewrite. I am pretty much reinventing the wheel.

# Description: you are given a bitwise pattern and a string
# you need to find the number of times the pattern matches in the string.
# The pattern is determined by markov chain.
# For simplicity, suppose the ones and zeros as unbiased coin flipping
# that stops as it hits the pattern, below.
#
# Any one liner or simple pythonic solution?

import random

def matchIt(yourString, yourPattern):
        """find the number of times yourPattern occurs in yourString"""

        count = 0
        matchTimes = 0

        # How can you simplify the for-if structures?
        # THIS IS AN EXAMPLE HOW NOT TO DO IT, hence Code-Smell-label
        # please, read clarifications in [Update]

        for coin in yourString:
            #return to base
            if  count == len(pattern):
                    matchTimes = matchTimes + 1
                    count = 0

            #special case to return to 2, there could be more this type of conditions
            #so this type of if-conditionals are screaming for a havoc
            if count == 2 and pattern[count] == 1:
                    count = count - 1

            #the work horse
            #it could be simpler by breaking the intial string of lenght 'l'
            #to blocks of pattern-length, the number of them is 'l - len(pattern)-1'
            if coin == pattern[count]:
                    count=count+1

        average = len(yourString)/matchTimes

        return [average, matchTimes]



# Generates the list
myString =[]
for x in range(10000):
    myString= myString + [int(random.random()*2)]

pattern = [1,0,0]
result = matchIt(myString, pattern)

print("The sample had "+str(result[1])+" matches and its size was "+str(len(myString))+".\n" +
        "So it took "+str(result[0])+" steps in average.\n" +
        "RESULT: "+str([a for a in "FAILURE" if result[0] != 8]))


# Sample Output
# 
# The sample had 1656 matches and its size was 10000.
# So it took 6 steps in average.
# RESULT: ['F', 'A', 'I', 'L', 'U', 'R', 'E']

[Update]

I will explain here a bit about theory, perhaps, the problem can be simplified that way. The above code try to construct the markov chain with transition matrix A below. The pattern 100 that you can imagine as coin flipping corresponds to it.

>>> Q=numpy.matrix('0.5 0.5 0; 0 0.5 0.5; 0 0.5 0')     
>>> I=numpy.identity(3)
>>> I
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> Q
matrix([[ 0.5,  0.5,  0. ],
        [ 0. ,  0.5,  0.5],
        [ 0. ,  0.5,  0. ]])
>>> A=numpy.matrix('0.5 0.5 0 0; 0 0.5 0.5 0; 0 0.5 0 0.5; 0 0 0 1')
>>> A
matrix([[ 0.5,  0.5,  0. ,  0. ],
        [ 0. ,  0.5,  0.5,  0. ],
        [ 0. ,  0.5,  0. ,  0.5],
        [ 0. ,  0. ,  0. ,  1. ]])

The average 8 in the question becomes the sum of values on the first row in the matrix N=(I-Q)^-1 where Q above.

>>> (I-Q)**-1
matrix([[ 2.,  4.,  2.],
        [ 0.,  4.,  2.],
        [ 0.,  2.,  2.]])
>>> numpy.sum(((I-Q)**-1)[0])
8.0

Now, you can probably see that this apparently-only-pattern-matching-problem becomes a markov chain. I cannot see a reason why you could not substitute the messy for-while-if conditions with something similar to matrices or matrices. I don't know how to implement them but iterators could be a way to go, researching, particularly with开发者_JAVA技巧 more states where you need to decompose.

But a problem emerged with Numpy, what are the things -Inf and NaN for? Check the values to which they should converge, above, from (I-Q)**-1 matrix. The N is from N=I+Q+Q^2+Q^3+...=\frac{I-Q^{n}}{I-Q}.

>>> (I-Q**99)/(I-Q)
matrix([[  2.00000000e+00,   1.80853571e-09,             -Inf],
        [             NaN,   2.00000000e+00,   6.90799171e-10],
        [             NaN,   6.90799171e-10,   1.00000000e+00]])
>>> (I-Q**10)/(I-Q)
matrix([[ 1.99804688,  0.27929688,        -Inf],
        [        NaN,  1.82617188,  0.10742188],
        [        NaN,  0.10742188,  0.96679688]])


def matchIt(yourString, yourPattern):
        """find the number of times yourPattern occurs in yourString"""

Are you allowed to use the following?

yourString.count(yourPattern)

In your case you could create myString as a real string of 10000 characters and the pattern also as a string and then count the occurence in a simple pythonic way.

EDIT

A one-liner that gives you the number of (overlapping) occurences of pattern in text (which can be either a string or a list), could look like this:

nbOccurences = sum(1 for i in xrange(len(text)-len(pattern)) if text[i:i+len(pattern)] == pattern)


Ok - a standard(-ish) string search:

def matchIt(needle, haystack):
    """
    @param needle:   string, text to seek
    @param haystack: string, text to search in

    Return number of times needle is found in haystack,
        allowing overlapping instances.

    Example: matchIt('abab','ababababab') -> 4
    """
    lastSeenAt = -1
    timesSeen = 0
    while True:
        nextSeen = haystack.find(needle, lastSeenAt+1)
        if nextSeen==-1:
            return timesSeen
        else:
            lastSeenAt = nextSeen
            timesSeen += 1

but you want to do this to a list of numbers? No problem; we just need to make a list class with a find() method, like so:

import itertools
class FindableList(list):
    def find(self, sub, start=None, end=None):
        """
        @param sub: list, pattern to look for in self

        @param start: int, first possible start-of-list
            If not specified, start at first item

        @param: end: int, last+1 possible start-of-list
            If not specified, end such that entire self is searched

        Returns;
            Starting offset if a match is found, else -1
        """
        if start is None or start < 0:
            start = 0

        # N.B. If end is allowed to be too high,
        # zip() will silently truncate the list comparison
        # and you will probably get extra spurious matches.
        lastEnd = len(self) - len(sub) + 1
        if end is None or end > lastEnd:
            end = lastEnd

        rng = xrange if xrange else range
        iz  = itertools.izip
        isl = itertools.islice

        for pos in rng(start, end):
            if all(a==b for a,b in iz(sub, isl(self, pos, end))):
                return pos

        # no match found
        return -1

then the example looks like

matchIt([1,2,1,2], FindableList([1,2,1,2,1,2,1,2,1,2])) -> 4

and your code becomes:

# Generate a list
randIn = lambda x: int(x*random.random())
myString =[randIn(2) for i in range(10000)]

pattern = [1,0,0]
result = matchIt(pattern, myString)

print("The sample had {0} matches and its size was {1}.\n".format(result, len(myString)))


This is not ready.

Similar question but main focus on graph libraries here and similar question but in C#, maybe useful.

The files that are relevant to this question are ./networkx/generators/degree_seq.py (997 lines, about generating graps with given degree sequence) and ./networkx/algorithms/mixing.py (line 20, function degree_assortativity(G) about probability based graphs) and also note that its source code refer to 92 references, not sure whether you want to reinvent the wheel. For igraph, please, read the line 835 of the file convert.c about weighted edges. You can get the source for Networkx here and the source for igraph here. Please, note that the former is under BSD license and done in Python while igraph under GNU (GPL) and done in C.

To get started with Networkx, a helpful line about creating weighted graph from its jUnits test_convert_scipy.py -file:

def create_weighted(self, G): 
    g = cycle_graph(4)
    e = g.edges()
    source = [u for u,v in e]
    dest = [v for u,v in e]
    weight = [s+10 for s in source]
    ex = zip(source, dest, weight)
    G.add_weighted_edges_from(ex)
    return G

So to create your Markov Chain, help about directed weighted graph here, something like this perhaps:

>>> DG=nx.DiGraph()
>>> DG.add_weighted_edges_from([(0,0,0.5),(1,1,0.5),(3,3,1),(0,1,0.5),(1,2,0.5),(2,3,0.5), (2,1,0.5)])

or perhaps there is some ready markov chain generation tool as there are for some other stochastic processes, more here. Cannot find an algorithm to analyze the graph with excepted value or do trials with different sets as in your example, perhaps there is none, and you must stick with other repliers' solutions.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号