开发者

Assuming a many to many relationship model in sqlalchemy with mysql, how to update data ignoring duplicates?

开发者 https://www.devze.com 2023-03-16 10:42 出处:网络
I am a little stuck with sqlalchemy trying to update some data. I have a many to many and a one to many relationship. The first is a relationship between an author and the possible spellings of his n

I am a little stuck with sqlalchemy trying to update some data.

I have a many to many and a one to many relationship. The first is a relationship between an author and the possible spellings of his name. The second is l开发者_C百科inking authors to their written literature. A paper may have several authors and vice versa.

Assuming an author "Peter Shaw" who has already 4 papers stored and linked to him in the database. No I want to "add" a new set of 6 papers for "Peter Shaw". Unfortunately 4 of the 6 papers are already stored in the database. This is why session.commit() results in a duplicate error.

Is there a common way to avoid the duplicate errors, and to tell sqlalchemy to just fill in the holes instead of complaining about the duplicates? Neither the docus of sqlalchemy nor google could enlighten me with an explicit answer/approach, so any suggestions are well apreciated.

These are the models I am testing with:

class NameSpelling(Base):
    __tablename__ = 'name_spellings'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False, unique=True, index=True)
    authors_id = Column(Integer, ForeignKey('authors.id'))

    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        return "NameSpelling(%r)" % (self.name)

class Author(Base):
    __tablename__ = 'authors'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=True, unique=True, index=True)

    papers = relationship('Paper',
                          secondary=author_paper,
                          backref='authors')

    name_spellings = relationship(NameSpelling,
                                  order_by=NameSpelling.id,
                                  backref="author",
                                  cascade="all, delete, delete-orphan")

    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        return "Authors(%r, %r)" % (self.name_spellings, self.name)


class Paper(Base):
    __tablename__ = 'papers'

    id = Column(Integer, primary_key=True)
    title = Column(String(1500), nullable=False, index=True)
    url = Column(String(255), nullable=False, unique=True, index=True)
    date = Column(Date(), nullable=True)

    def __init__(self, title=None, url=None, date=None):
        self.title = title
        self.url = url
        self.date = date

    def __repr__(self):
        return "Paper(%r)" % (self.title)


I have the exact same problem with an SQLAlchemy project. What I ended up doing (and what is likely a bad way of handling the issue), is to check the relationship collections before adding a new instance to the session and replacing the related instances with the result of session.merge(), if any.

It looks somewhat like this:

def add_instance_to_session(instance, session):
    '''
    Add instance to session, while checking for existing child instances in
    the relationship collection instance.child_list.
    '''
    def _merge_and_replace(child):
        with session.no_autoflush:
            merged_child = session.merge(child)
            if id(merged_child) != id(child):
                try:
                    session.expunge(child)
                except sqlalchemy.exc.InvalidRequestError:
                    # child wasn't in the session to begin with
                    pass
                return merged_child
            else:
                return child
    instance.child_list = map(_merge_and_replace, instance.child_list)
    session.add(instance)

This seems to work for me, but comes across as pretty bad performace-wise, especially if you have many childs. Maybe there is a better way utilizing the ON DUPLICATE KEY idiom that mySQL offers, or similar constructs.

[edit] The session.expunge() part is probably unnecessary if only the above method is used to add instances to a session, as the children cannot be in the session at this point. At least that's how I think it is...

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号