开发者

Python search list of objects that contain objects, partial matches

开发者 https://www.devze.com 2023-02-07 11:54 出处:网络
I\'m trying to build a simple search engine for a small website. My initial thought is to avoid using larger packages such as Solr, Haystack, etc. because of the simplistic nature of my search needs.

I'm trying to build a simple search engine for a small website. My initial thought is to avoid using larger packages such as Solr, Haystack, etc. because of the simplistic nature of my search needs.

My hope is that with some guidance I can make my code more pythonic, efficient, and most importantly function properly.

Intended functionality: return product results based on full or partial matches of item_number, product name, or category name (currently no implementation of category matching)

Some code:


import pymssql
import utils #My utilities  

class Product(object):  
   def __init__(self, item_number, name, description, category, msds):
        self.item_number = str(item_number).strip()
        self.name = name
        self.description = description
        self.category = category
        self.msds = str(msds).strip()

class Category(object):  
    def __init__(self, name, categories):
        self.name = name
        self.categories = categories
        self.slug = utils.slugify(name)
        self.products = []

categories = (
    Category('Food', ('123', '12A')),
    Category('Tables', ('354', '35A', '310', '31G')),
    Category('Chemicals', ('845', '85A', '404', '325'))
)

products = []

conn = pymssql.connect(...)
curr = conn.cursor()

for Category in categories:
    开发者_运维知识库for c in Category.categories:
        curr.execute('SELECT item_number, name, CAST(description as text), category, msds from tblProducts WHERE category=%s', c)
        for row in curr:
            product = Product(row[0], row[1], row[2], row[3], row[4])
            products.append(product)
            Category.products.append(product)

conn.close()

def product_search(*params):
    results = []
    for product in products:
        for param in params:
            name = str(product.name)
            if (name.find(param.capitalize())) != -1:
                results.append(product)
            item_number = str(product.item_number)
            if (item.number.find(param.upper())) != -1:
                results.append(product)
    print results

product_search('something')


MS SQL database with tables and fields I cannot change.

At most I will pull in about 200 products.

Some things that jump out at me. Nested for loops. Two different if statements in the product search which could result in duplicate products being added to the results.

My thought was that if I had the products in memory (the products will rarely change) I could cache them, reducing database dependence and possibly providing an efficient search.

...posting for now... will come back and add more thoughts

Edit: The reason I have a Category object holding a list of Products is that I want to show html pages of Products organized by Category. Also, the actual category numbers may change in the future and holding a tuple seemed like simple painless solution. That and I have read-only access to the database.

The reason for a separate list of products was somewhat of a cheat. I have a page that shows all products with the ability to view MSDS (safety sheets). Also it provided one less level to traverse while searching.

Edit 2:


def product_search(*params):
    results = []
    lowerParams = [ param.lower() for param in params ]

    for product in products:
        item_number = (str(product.item_number)).lower()
        name = (str(product.name)).lower()
        for param in lowerParams:
           if param in item_number or param in name:
               results.append(product)
    print results


Prepare all variables outside of the loops and use in instead of .find if you don't need the position of the substring:

def product_search(*params):
    results = []
    upperParams = [ param.upper() for param in params ]

    for product in products:
        name = str(product.name).upper()
        item_number = str(product.item_number).upper()
        for upperParam in upperParams:
            if upperParam in name or upperParam in item_number:
                results.append(product)
    print results


If both the name and number matches the search parameters, the product will appear twice on the result list.

Since the products count is a small number, I recommend constructing a SELECT query like:

def search(*args):
    import operator
    cats = reduce(operator.add, [list(c.categories) for c in categories], [])

    query = "SELECT * FROM tblProducts WHERE category IN (" + ','.join('?' * len(cats)) + ") name LIKE '%?%' or CAST(item_number AS TEXT) LIKE '%?%' ..."
    curr.execute(query, cats + list(args)) # Not actual code
    return list(curr)
0

精彩评论

暂无评论...
验证码 换一张
取 消